Issue 5154: (SegmentStore) Fixed a stats concurrency bug in the Read Index #5170
Merged
sachin-j-joshi
merged 3 commits into
pravega:master
from
andreipaduroiu:issue-5154-cache-eviction
Sep 14, 2020
…ndex summaries to get out of sync, thus causing the Cache Manager to evict more than needed. Signed-off-by: Andrei Paduroiu <andrei.paduroiu@emc.com>
Codecov Report
@@             Coverage Diff             @@
##             master    #5170     +/-  ##
============================================
- Coverage     84.78%   84.75%   -0.03%
- Complexity    12229    12233       +4
============================================
  Files           795      795
  Lines         45067    45066       -1
  Branches       4693     4693
============================================
- Hits          38210    38196      -14
- Misses         4364     4376      +12
- Partials       2493     2494       +1
tkaitchuck
approved these changes
Sep 11, 2020
andreipaduroiu
added a commit
to andreipaduroiu/pravega
that referenced
this pull request
Sep 14, 2020
…Index (pravega#5170) Fixed a concurrency bug in StreamSegmentReadIndex which could allow index summaries to get out of sync, thus causing the Cache Manager to evict more than needed. Signed-off-by: Andrei Paduroiu <andrei.paduroiu@emc.com> Co-authored-by: Tom Kaitchuck <tkaitchuck@users.noreply.github.com>
ravisharda
pushed a commit
that referenced
this pull request
Sep 15, 2020
andreipaduroiu
added a commit
to andreipaduroiu/pravega
that referenced
this pull request
Sep 24, 2020
…Index (pravega#5170) Fixed a concurrency bug in StreamSegmentReadIndex which could allow index summaries to get out of sync, thus causing the Cache Manager to evict more than needed. Signed-off-by: Andrei Paduroiu <andrei.paduroiu@emc.com> Co-authored-by: Tom Kaitchuck <tkaitchuck@users.noreply.github.com>
andreipaduroiu
added a commit
that referenced
this pull request
Sep 28, 2020
…ranch r0.7 (#5217) Cherry-pick #5207, #5154, #5155 and #5119 into branch r0.7: * Issue 5119: (SegmentStore) Copy-on-Read for Table Segment Compaction * Issue 5155: (SegmentStore) Enabling copy-on-read for all Segment Reads * Issue 5154: (SegmentStore) Fixed a stats concurrency bug in the Read Index (#5170) * Issue 5207: (SegmentStore) Read Index Bug Fixes Signed-off-by: Andrei Paduroiu <andrei.paduroiu@emc.com>
tkaitchuck
added a commit
to tkaitchuck/pravega-1
that referenced
this pull request
Feb 15, 2021
…Index (pravega#5170) Fixed a concurrency bug in StreamSegmentReadIndex which could allow index summaries to get out of sync, thus causing the Cache Manager to evict more than needed. Signed-off-by: Andrei Paduroiu <andrei.paduroiu@emc.com> Co-authored-by: Tom Kaitchuck <tkaitchuck@users.noreply.github.com> Signed-off-by: Tom Kaitchuck <tom.kaitchuck@emc.com>
Change log description
Fixed a concurrency bug in StreamSegmentReadIndex which could allow index summaries to get out of sync, thus causing the Cache Manager to evict more than needed.
Purpose of the change
Fixes #5154.
What the code does
The Cache Manager relies on accurate data from its clients to decide whether to advance its "Oldest Generation" (i.e., tell clients to evict data). If the Oldest Generation reported by the clients does not change as a result of this process, the Cache Manager keeps advancing it in subsequent iterations until it can go no further (it equals the Current Generation). If Oldest Generation == Current Generation, cache entries do not live long in the cache, which renders it mostly useless.
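The feedback loop described above can be sketched with a deliberately simplified model. This is not Pravega's actual Cache Manager code: the class `OldestGenerationSketch`, the method `advanceUntilReportMoves`, and the idea of modelling a client as a function from "requested cutoff" to "reported oldest generation" are all assumptions made for illustration.

```java
import java.util.function.IntUnaryOperator;

final class OldestGenerationSketch {
    /**
     * Simplified model of repeated Cache Manager iterations (hypothetical, not Pravega's code).
     *
     * @param oldest  the manager's Oldest Generation at the start
     * @param current the manager's Current Generation (upper bound for oldest)
     * @param client  hypothetical client: given the cutoff it was asked to evict up to,
     *                returns the oldest generation it reports afterwards
     * @return the Oldest Generation once the client's report moves or the bound is reached
     */
    static int advanceUntilReportMoves(int oldest, int current, IntUnaryOperator client) {
        int lastReported = client.applyAsInt(oldest);
        while (oldest < current) {
            oldest++; // ask clients to evict one more generation
            int reported = client.applyAsInt(oldest);
            if (reported != lastReported) {
                break; // the report moved, so the eviction made visible progress
            }
        }
        return oldest;
    }
}
```

In this model, a client with accurate stats (for example `cutoff -> Math.max(cutoff, 3)`) stops the loop after a single step, while a client whose stats are stuck on a stale generation (for example `cutoff -> 2`) drives Oldest Generation all the way up to Current Generation.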
The StreamSegmentReadIndex keeps a parallel statistic of how many entries it has per generation (it can have tens, if not hundreds, of thousands of entries, and querying them every time the Cache Manager needs this information would be too time-consuming). This statistic should be updated in lockstep with the rest of the index (upon inserts, removals, appends and retrievals). For consistency, all these updates (except evictions) should be done while holding the read index lock, as they may end up changing the Generation of the entry (i.e., it was touched). It seems that one code path was not updating the stats under the lock, which meant that a "touch" could alter the stats of the wrong generation (should a concurrent touch/update execute).
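One way such a desynchronization could arise is sketched below. The interleaving is an assumption chosen for illustration, not a reconstruction of the actual code path in StreamSegmentReadIndex, and `StaleTouchSketch` and its members are hypothetical names. The touch is split into two steps so that the race can be replayed deterministically on a single thread.

```java
import java.util.TreeMap;

final class StaleTouchSketch {
    // generation -> number of entries; hypothetical stand-in for the index summary
    final TreeMap<Integer, Integer> countByGeneration = new TreeMap<>();
    // generation of the single entry tracked by this sketch
    int entryGeneration;

    StaleTouchSketch(int initialGeneration) {
        entryGeneration = initialGeneration;
        countByGeneration.put(initialGeneration, 1);
    }

    /** Step 1 of an unlocked touch: observe the entry's current generation. */
    int readGeneration() {
        return entryGeneration;
    }

    /** Step 2: move the entry and adjust the stats, trusting a possibly stale observation. */
    void applyTouch(int observedGeneration, int newGeneration) {
        entryGeneration = newGeneration;
        // Decrement the generation we believe the entry came from (drop the key at zero).
        countByGeneration.computeIfPresent(observedGeneration, (g, c) -> c <= 1 ? null : c - 1);
        countByGeneration.merge(newGeneration, 1, Integer::sum);
    }

    int totalCount() {
        return countByGeneration.values().stream().mapToInt(Integer::intValue).sum();
    }

    int reportedOldestGeneration() {
        return countByGeneration.firstKey();
    }
}
```

If two touches both call `readGeneration()` before either calls `applyTouch(...)`, the second one decrements a generation that no longer holds the entry. The summary then counts two entries where only one exists, and the phantom count keeps the reported oldest generation from moving.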
This change moves the problematic stats update under the read index lock, which ensures that no other update on that entry may produce adverse side effects.
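The shape of the fix can be illustrated with a minimal sketch in which the entry's generation and the per-generation count are only ever changed inside the same critical section. The class `ReadIndexSketch`, its fields, and its methods are hypothetical and much simpler than the real StreamSegmentReadIndex; only the locking discipline is the point.

```java
import java.util.HashMap;
import java.util.Map;

final class ReadIndexSketch {
    private final Object lock = new Object();
    // offset -> generation of the entry at that offset (hypothetical index)
    private final Map<Long, Integer> entryGeneration = new HashMap<>();
    // generation -> number of entries (hypothetical summary)
    private final Map<Integer, Integer> countByGeneration = new HashMap<>();

    void insert(long offset, int generation) {
        synchronized (lock) {
            Integer previous = entryGeneration.put(offset, generation);
            if (previous != null) {
                decrement(previous);
            }
            countByGeneration.merge(generation, 1, Integer::sum);
        }
    }

    /** Moves an entry to a newer generation; index and stats change under one lock. */
    void touch(long offset, int newGeneration) {
        synchronized (lock) {
            Integer old = entryGeneration.get(offset);
            if (old == null || old == newGeneration) {
                return;
            }
            entryGeneration.put(offset, newGeneration);
            decrement(old);
            countByGeneration.merge(newGeneration, 1, Integer::sum);
        }
    }

    int generationOf(long offset) {
        synchronized (lock) {
            return entryGeneration.getOrDefault(offset, -1);
        }
    }

    int count(int generation) {
        synchronized (lock) {
            return countByGeneration.getOrDefault(generation, 0);
        }
    }

    int totalCount() {
        synchronized (lock) {
            return countByGeneration.values().stream().mapToInt(Integer::intValue).sum();
        }
    }

    // Caller must hold the lock. Removes the key once its count reaches zero.
    private void decrement(int generation) {
        countByGeneration.computeIfPresent(generation, (g, c) -> c <= 1 ? null : c - 1);
    }
}
```

Because every read of the entry's generation and the matching stats adjustment happen atomically, concurrent touches of the same entry cannot leave a count behind in a generation the entry has already left.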
NOTE: there may be more efficient ways of operating on the Read Index, probably with finer-grained locks. However, this has not posed a problem so far and hasn't gotten in our way. The goal of this PR is to fix a bug, not to improve theoretical performance.
How to verify it
All tests must pass. Unfortunately, the nature of this change means it cannot be tested via unit tests.