ISPN-3959 JdbcBinaryStore's expiration locks buckets indefinitely #2368
Conversation
@@ -62,9 +62,18 @@ private void pollUntilEmpty() {
 * @return True if all currently scheduled tasks have already been completed, false otherwise;
 */
public boolean isAllCompleted() {
   pollUntilEmpty();
@rvansa I think it would be simpler to implement CompletionService directly, similar to ExecutorCompletionService, creating a FutureTask that overrides done() and updates the counters directly.
Actually, after looking closer at the purging code, I think what you need is a regular ExecutorCompletionService, with the BucketPurger tasks returning the bucket id instead of null.
Correction: bucket ids, since a batch can have more than one bucket.
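The suggestion above can be sketched roughly as follows: purge batches run through a plain ExecutorCompletionService whose tasks return the bucket ids they touched (a batch may span several buckets), so the caller can unlock exactly those buckets as each result arrives. ExecutorCompletionService is the real java.util.concurrent class; everything else here (the batch data, the "unlock" bookkeeping) is illustrative, not the PR's actual code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PurgeSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        // Each completed task yields the set of bucket ids it purged.
        CompletionService<Set<Integer>> ecs = new ExecutorCompletionService<>(executor);

        // Two hypothetical purge batches, each covering several bucket ids.
        List<Set<Integer>> batches = Arrays.asList(
                new HashSet<>(Arrays.asList(1, 2)),
                new HashSet<>(Arrays.asList(3, 4, 5)));
        for (Set<Integer> batch : batches) {
            // A real task would purge the batch, then return its bucket ids.
            ecs.submit(() -> batch);
        }

        Set<Integer> unlocked = new HashSet<>();
        for (int i = 0; i < batches.size(); i++) {
            // Block until some batch finishes, then "unlock" its buckets.
            unlocked.addAll(ecs.take().get());
        }
        System.out.println(unlocked.size()); // prints 5
        executor.shutdown();
    }
}
```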
Compilation error in the infinispan-cachestore-remote test suite:
@rvansa Did you get around to fixing the issue @danberindei spotted?
Oops, I forgot about this PR... I will try to fix it this afternoon.
Rebased, fixed RemoteStore and RestStore tests...
We should hold off integrating this until the server testsuite is stable again, to make sure we don't introduce further bugs. @mmarkus is looking into that.
} catch (Exception ex) {
// if something happens make sure buckets locks are being release
// if something happens make sure buckets locks are released
I haven't noticed this before, but I think you need to keep track of all the Futures you started and wait for them to finish before releasing the locks - otherwise you might have worker threads modifying buckets without holding the appropriate locks.
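The point above can be sketched as follows: collect every Future the purge submits and wait for all of them before releasing the bucket locks, so no worker thread can still be modifying a bucket after its lock is dropped. The task body and the commented-out unlock call are stand-ins for the store's real purge and lock handling, not the PR's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WaitBeforeUnlock {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < 4; i++) {
                // Each submitted task would purge one batch of buckets.
                futures.add(executor.submit(() -> { /* purge one batch */ }));
            }
        } finally {
            for (Future<?> f : futures) {
                try {
                    f.get(); // wait: every worker must finish before locks go
                } catch (ExecutionException e) {
                    // log and keep going so the remaining tasks are still awaited
                }
            }
            // releaseAllBucketLocks(); // safe now: no worker holds a bucket
        }
        executor.shutdown();
        System.out.println("all tasks finished: " + futures.size());
    }
}
```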
That's right, I have to fix the exception handling.
@danberindei Updated. I've also changed the processing of empty buckets so that they are not locked for the full duration of the purge but are instead removed as soon as their number exceeds BATCH_SIZE.
@rvansa Sorry for the late feedback, but it seems
I've rerun the test and it's working locally... Does it work for you, locally? Can we let it run in CI with logging?
@galderz ^ Have you tried to run the tests on your local machine? All tests are working for me, and without verbose logging, I can't find what kind of problem CI has.
// wait until all tasks have completed
while (tasksCompleted < tasksScheduled) {
   tasksCompleted += unlockCompleted(ecs);
   Thread.yield();
Is this really necessary?
What do you mean by "this"? The yielding? I think that when we busy-wait for another thread to finish, it's better to yield in order to let it finish (even on multicores). So no, it's not strictly necessary.
I didn't realize it was in a busy-wait loop, couldn't you use ExecutorCompletionService.take() instead of poll() in unlockCompleted()?
You're right, I can pass another parameter and make unlockCompleted block in this final wait.
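The fix discussed here can be sketched as replacing the busy-wait above with a blocking drain: the final wait calls take(), which blocks until a task completes, so neither poll() nor Thread.yield() is needed. The counter names mirror the snippet above; the task bodies and the inline "unlock" comment are illustrative, not the PR's actual unlockCompleted().

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingDrain {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        CompletionService<Integer> ecs = new ExecutorCompletionService<>(executor);
        int tasksScheduled = 3;
        for (int i = 0; i < tasksScheduled; i++) {
            final int id = i;
            ecs.submit(() -> id); // stand-in for one purge task
        }
        int tasksCompleted = 0;
        while (tasksCompleted < tasksScheduled) {
            ecs.take();       // blocks until one task finishes; no yield needed
            tasksCompleted++; // the real code would unlock that task's buckets here
        }
        System.out.println(tasksCompleted); // prints 3
        executor.shutdown();
    }
}
```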
@galderz All (JDBC) tests pass for me as well, and I haven't seen another failure in CI. @rvansa We could create another job in CI to run the PR tests with trace logging enabled, but it would have to be triggered manually because the CI machine is overloaded enough as it is. And some issues prefer to only appear when trace logs are disabled anyway :)
@danberindei OK, as the failure was in a time-limited test, it's possible that the cause was GC or some other hiccup.
@danberindei So, it uses take() when finishing now. I am a bit ashamed how many re-pushes it needed to look correct...
Never be ashamed about it, the important thing is getting it right :) However, I still see
Don't worry about how many updates it takes @rvansa! It sounds likely that the leveldb failures are caused by your BaseStoreTest changes, though.
@danberindei Ouch, I thought this was already integrated! The LevelDB store has been fixed.
Testing this...
Testing completed and integrated, thx @rvansa :)
https://issues.jboss.org/browse/ISPN-3959