[Test] Fix SharedBlobCacheServiceTests.testGetMultiThreaded #112322
A cacheFileRegion can be concurrently evicted while it's being incref'd. See this comment:
https://github.com/elastic/elasticsearch/blob/98fe686da4c5cb82d4b03719977be428dc7934e7/x-pack/plugin/blob-cache/src/main/java/org/elasticsearch/blobcache/shared/SharedBlobCacheService.java#L1812-L1813
The tryRead method also performs a null check and an eviction check for io before returning true:
https://github.com/elastic/elasticsearch/blob/98fe686da4c5cb82d4b03719977be428dc7934e7/x-pack/plugin/blob-cache/src/main/java/org/elasticsearch/blobcache/shared/SharedBlobCacheService.java#L926-L931
Resolves: elastic#112314
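To make the race in the description concrete, here is a minimal single-threaded sketch of why a successful incref does not imply a non-null io. Everything in it is hypothetical: `RegionSketch`, `evictAndTakeIo`, and `ioPresentOrEvicted` are illustrative names, and the eviction order (flip the flag first, then clear io while the cache still holds its own reference) is an assumption drawn from the discussion, not a copy of SharedBlobCacheService.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for a cache region; not the Elasticsearch class.
final class RegionSketch {
    // Starts at 1: the reference held by the cache itself.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private final AtomicBoolean evicted = new AtomicBoolean(false);
    // Plain (non-volatile) field, mirroring the non-volatile io read discussed in the PR.
    private Object io = new Object();

    // Succeeds whenever the count is still positive, regardless of the evicted flag.
    boolean tryIncRef() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                return false;
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    void decRef() {
        refCount.decrementAndGet();
    }

    boolean isEvicted() {
        return evicted.get();
    }

    Object nonVolatileIo() {
        return io;
    }

    // Assumed eviction order: flip the flag first, then clear io if only the
    // cache's own reference remains. The count is not decremented here, so a
    // later tryIncRef() can still succeed.
    boolean evictAndTakeIo() {
        if (evicted.compareAndSet(false, true) == false) {
            return false;
        }
        if (refCount.get() == 1) {
            io = null;
        }
        return true;
    }

    // The relaxed condition: io is present, or the region is known to be evicted.
    boolean ioPresentOrEvicted() {
        return nonVolatileIo() != null || isEvicted();
    }

    public static void main(String[] args) {
        RegionSketch region = new RegionSketch();
        region.evictAndTakeIo();
        boolean increfed = region.tryIncRef();
        System.out.println("incref succeeded: " + increfed);
        System.out.println("io is null: " + (region.nonVolatileIo() == null));
        System.out.println("relaxed assertion holds: " + region.ioPresentOrEvicted());
    }
}
```

Under these assumptions, a plain "io is not null" assertion inside an incref block can fail after eviction, while the combined condition still holds.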
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
original-brownbear
left a comment
Hmm @henningandersen could you also take a look here? I wonder if this doesn't show a concurrency issue now?
In a sense, this result means to me that the tryRead path could now randomly fail, because it does a non-volatile read on the IO that was just set.
It will eventually work, but for stuff that was just (for some measure of "just") written to the cache, there is a chance the fast path is broken, isn't there? Maybe I'm missing a detail?
Thanks for the ping @original-brownbear . I think it is ok. In the path where we set io to null, we evict first, then make assumptions on the refcount. As stated in the comment, we As for fast path read / |
henningandersen
left a comment
Looks good, thanks for jumping on this. I want to give Armin a chance to respond before approving though.
    if (yield[i] == 0) {
        Thread.yield();
    }
-   assertNotNull(cacheFileRegion.testOnlyNonVolatileIO());
+   assertTrue(cacheFileRegion.testOnlyNonVolatileIO() != null || cacheFileRegion.isEvicted());
I think we can now also do this outside the incref block, I think it could be good to move it there. I originally intended this to be a safe spot for it, but given the extra check for evicted we may as well just do it prior to the incref block.
Fair point. Pushed 7f50458
One more note on the safety here: we may also in the past (prior to the non-volatile io PR) not have seen the transition to null of |
I do find the combined usage of evicted and refcount a bit tricky to reason about. It surprised me that incref does not always prevent eviction. That said, I believe it is safe on the |
The |
Yea, that was the idea: it's fine to read some garbage bytes optimistically as long as we can check that they're garbage afterwards. But I get it now, sorry, I was tired last night :D This test definitely does not show a problem with IO != null becoming visible, just the other direction, which doesn't matter much since we have the flag :) => LGTM :) thanks both!
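The "read optimistically, validate afterwards" idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual tryRead implementation: `OptimisticReadSketch`, `markEvicted`, and `releaseIo` are hypothetical names, and a `ByteBuffer` stands in for the real io object.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a fast-path read that validates after copying.
final class OptimisticReadSketch {
    private final AtomicBoolean evicted = new AtomicBoolean(false);
    // Plain (non-volatile) field; a reader may see a stale value, which is
    // why the evicted flag is checked after the copy.
    private ByteBuffer io;

    OptimisticReadSketch(byte[] data) {
        this.io = ByteBuffer.wrap(data.clone());
    }

    // Eviction is modelled as two separate steps so the window between them is visible.
    void markEvicted() {
        evicted.set(true);
    }

    void releaseIo() {
        io = null;
    }

    // Copies dst.length bytes starting at position, then reports whether the
    // copy can be trusted. A false result means the caller must discard dst.
    boolean tryRead(byte[] dst, int position) {
        ByteBuffer local = io; // read the plain field exactly once
        if (local == null) {
            return false;
        }
        if (position < 0 || position > local.capacity() - dst.length) {
            return false;
        }
        ByteBuffer view = local.duplicate(); // independent position, shared content
        view.position(position);
        view.get(dst); // optimistic copy; may be garbage if eviction raced with us
        return evicted.get() == false; // validate after the read
    }

    public static void main(String[] args) {
        OptimisticReadSketch region = new OptimisticReadSketch(new byte[] { 1, 2, 3, 4 });
        System.out.println("read before eviction: " + region.tryRead(new byte[2], 1));
        region.markEvicted();
        System.out.println("read after eviction flag: " + region.tryRead(new byte[2], 1));
    }
}
```

In this sketch, a reader that still sees a non-null io after the flag has been set copies bytes but gets `false` back, so stale data is never treated as valid.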
henningandersen
left a comment
LGTM.
I'll merge this to reenable the test coverage of this. |
…112322) A cacheFileRegion can be concurrently evicted while it's being incref'd. See this comment https://github.com/elastic/elasticsearch/blob/98fe686da4c5cb82d4b03719977be428dc7934e7/x-pack/plugin/blob-cache/src/main/java/org/elasticsearch/blobcache/shared/SharedBlobCacheService.java#L1812-L1813 The tryRead method also performs null and eviction check for io before returning true. https://github.com/elastic/elasticsearch/blob/98fe686da4c5cb82d4b03719977be428dc7934e7/x-pack/plugin/blob-cache/src/main/java/org/elasticsearch/blobcache/shared/SharedBlobCacheService.java#L926-L931 Resolves: elastic#112314
A cacheFileRegion can be concurrently evicted while it's being incref'd.
See this comment
elasticsearch/x-pack/plugin/blob-cache/src/main/java/org/elasticsearch/blobcache/shared/SharedBlobCacheService.java
Lines 1812 to 1813 in 98fe686
Resolves: #112314