-
Notifications
You must be signed in to change notification settings - Fork 405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 5673: (SLTS) Fix wrong GC config #5674
Issue 5673: (SLTS) Fix wrong GC config #5674
Conversation
Signed-off-by: Sachin Joshi <sachin.joshi@emc.com>
Codecov Report
@@ Coverage Diff @@
## master #5674 +/- ##
============================================
+ Coverage 84.73% 84.74% +0.01%
- Complexity 13884 13889 +5
============================================
Files 917 917
Lines 51445 51441 -4
Branches 5305 5305
============================================
+ Hits 43592 43595 +3
+ Misses 4962 4959 -3
+ Partials 2891 2887 -4 Continue to review full report at Codecov.
|
@@ -41,8 +41,8 @@ | |||
public static final Property<Integer> GARBAGE_COLLECTION_DELAY = Property.named("garbage.collection.delay.seconds", 60); | |||
public static final Property<Integer> GARBAGE_COLLECTION_MAX_CONCURRENCY = Property.named("garbage.collection.concurrency.max", 10); | |||
public static final Property<Integer> GARBAGE_COLLECTION_MAX_QUEUE_SIZE = Property.named("garbage.collection.queue.size.max", 16 * 1024); | |||
public static final Property<Integer> GARBAGE_COLLECTION_SLEEP = Property.named("garbage.collection.sleep.seconds", 60); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to flood our logs if we ever want to use DEBUG level for a period of time to have more details about the activity of the system. In our system tests, this will for instance make the size of logs much larger without need. Also, it looks a super low interval for something that is supposed to periodically delete garbage, which is going to induce a significant amount of CPU usage without a clear justification. I could agree on lowering this value, but anything below 1-second I think that deserves a justification that I don't see in the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- GC polling runs on its own dedicated thread.
- During each iteration the code blocks until at least one metadata item is eligible for deletion. (So iteration don't finish until there is something to delete. This way there are no empty iterations creating lots of useless logs.)
- Each iteration it only deletes GARBAGE_COLLECTION_MAX_CONCURRENCY number of items.
- However when there are lots of chunks to be deleted the queue becomes full quickly if we don't drain the queue fast enough.
- By reducing interval between iterations we are responding to increased delete queue faster, at the same time when queue is empty we are blocked thus adjusting delete speed to queue length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so my next question is: if the behavior of the GC thread is blocking as you mention (this could be mentioned in the parameter javadoc, by the way), what is the point of having a sleep at all? That is, it will delete chucks as long as there are chunks to delete, or it will remain blocked otherwise. This parameter seems a candidate for removal, so we could simplify the configuration of SLTS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parameter allow us to throttle the rate at which background delete tasks are run. However I don't know what is a good value. In such cases it's better to have config value that you can tweak and experiment with.
Changed SLTS GC time unit from seconds to milliseconds. Also changed the level of some log messages to reduce noise. Signed-off-by: Sachin Joshi <sachin.joshi@emc.com>
Changed SLTS GC time unit from seconds to milliseconds. Also changed the level of some log messages to reduce noise. Signed-off-by: Sachin Joshi <sachin.joshi@emc.com> Signed-off-by: Tom Kaitchuck <tom.kaitchuck@emc.com>
Changed SLTS GC time unit from seconds to milliseconds. Also changed the level of some log messages to reduce noise. Signed-off-by: Sachin Joshi <sachin.joshi@emc.com>
Changed SLTS GC time unit from seconds to milliseconds. Also changed the level of some log messages to reduce noise. Signed-off-by: Sachin Joshi <sachin.joshi@emc.com>
Changed SLTS GC time unit from seconds to milliseconds. Also changed the level of some log messages to reduce noise. Signed-off-by: Sachin Joshi <sachin.joshi@emc.com>
Changed SLTS GC time unit from seconds to milliseconds. Also changed the level of some log messages to reduce noise. Signed-off-by: Sachin Joshi <sachin.joshi@emc.com>
Cherry-picks following changes 0.9 Issue 5518: (SLTS) Fix Integer overflow. (#5520) Issue 5606: BoundedInputStream::markSupported() should always return false.. (#5615) Issue 5597 : ExtendedS3ChunkStorage and ExtendedS3Storage should close S3Client. (#5598) Issue 5535: (SegmentStore) Refactoring ExtendedS3 Test Mocks (#5537) Issue 5456: (SLTS) - Handle possible failure during deletion of metadata keys can cause version mismatch in next attempt (#5457) Issue 5458: (SLTS) For read operation, read all chunks in parallel. (#5459) Issue 4967: (SLTS) Add createWithContent overload that creates object and writes to it in a single call. (#5455) Issue 5664: SLTS - fix possible thread visibility issues.. (#5665) Issue 5475: Eliminate unnecessary pessimistic checks. (#5662) Issue 5570: (SLTS) AsyncBaseChunkStorage latency should not include time spent in waiting to be executed. (#5661) Issue 4967: (SLTS) No need to check last chunk length for claimOwnership for no append mode. (#5663) Issue 5673: (SLTS) Fix wrong GC config (#5674) Issue 5737: (SLTS) Improve metrics (#5746) Issue 5460: (SLTS) Add read index virtual block entries to metadata. (#5461) Issue 5772: (SLTS) - BaseMetadataStore does not evict entries from buffer. (#5773) Issue 5798: (SLTS) Fix close in GarbageCollector, ExtendedS3ChunkStorage and HDFSChunkStorage. (#5800) Issue 5788: (SLTS) Remove ACL operations from ExtendedS3ChunkStorage (#5790) Issue 5808: SLTS - Fix Preconditions checks to include enough information in message. (#5809) Issue 5853: SLTS - BaseMetadataStore.get does not return deep copy when loading from store. #5858 Issue 5866: (SLTS) ReadIndexCache stats not reported. (#5867) Signed-off-by: Sachin Joshi <sachin.joshi@emc.com>
Signed-off-by: Sachin Joshi sachin.joshi@emc.com
Change log description
Change unit from seconds to milliseconds.
Also change log.info to log.debug to reduce noise.
Purpose of the change
Fixes #5673
What the code does
How to verify it
No regression