Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISPN-3959 JdbcBinaryStore's expiration locks buckets indefinitely #2368

Closed
wants to merge 2 commits into from

Conversation

rvansa
Copy link
Member

@rvansa rvansa commented Jan 31, 2014

@@ -62,9 +62,18 @@ private void pollUntilEmpty() {
* @return True if all currently scheduled tasks have already been completed, false otherwise;
*/
public boolean isAllCompleted() {
pollUntilEmpty();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvansa I think it would be simpler to implement CompletionService directly, similar to ExecutorCompletionService, creating a FutureTask that overrides done() and updates the counters directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after looking closer at the purging code, I think what you need is a regular ExecutorCompletionService, with the BucketPurger tasks returning the bucket id instead of null.
Correction: bucket ids, since a batch can have more than one bucket.

@danberindei
Copy link
Member

Compilation error in the infinispan-cachestore-remote test suite:

[12:28:33][ERROR] /mnt/ebs/TeamCity/buildAgent/work/c8db63ad778e4d73/persistence/remote/src/test/java/org/infinispan/persistence/remote/RemoteStoreTest.java:[99,4] method does not override or implement a method from a supertype
  [12:28:33][ERROR] /mnt/ebs/TeamCity/buildAgent/work/c8db63ad778e4d73/persistence/remote/src/test/java/org/infinispan/persistence/remote/RemoteStoreRawValuesTest.java:[101,4] method does not override or implement a method from a supertype

@galderz
Copy link
Member

galderz commented Apr 10, 2014

@rvansa Did you get around to fixing the issue @danberindei spotted?

@rvansa
Copy link
Member Author

rvansa commented Apr 10, 2014

Oops, I forgot about this PR... I will try to fix it this afternoon.

@rvansa
Copy link
Member Author

rvansa commented Apr 10, 2014

Rebased, fixed RemoteStore and RestStore tests...

@galderz
Copy link
Member

galderz commented Apr 15, 2014

We should hold off integrating this until the server testsuite is stable again, to make sure we don't introduce further bugs. @mmarkus is looking into that.


} catch (Exception ex) {
// if something happens make sure buckets locks are being release
// if something happens make sure buckets locks are released
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't noticed this before, but I think you need to keep track of all the Futures you started and wait for them to finish before releasing the locks - otherwise you might have worker threads modifying buckets without holding the appropriate locks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, I have to fix the exception handling.

@rvansa rvansa changed the title ISPN-3956 JdbcBinaryStore's expiration locks buckets indefinitely ISPN-3959 JdbcBinaryStore's expiration locks buckets indefinitely Apr 29, 2014
@rvansa
Copy link
Member Author

rvansa commented May 6, 2014

@danberindei Updated. I've also changed the processing of emtpy buckets so that these are not locked for the full duration of purge but instead are removed as soon as the amount gets larger than BATCH_SIZE.

@galderz
Copy link
Member

galderz commented May 26, 2014

@rvansa Sorry for the late feedback, but it seems BinaryStoreWithManagedConnectionTest.testLoadAndStoreWithIdle is failing in CI. Could it be related to your changes?

@rvansa
Copy link
Member Author

rvansa commented May 26, 2014

I've reran the test and it's working locally... Does it work for you, locally? Can we let it run in CI with logging?

@rvansa
Copy link
Member Author

rvansa commented May 29, 2014

@galderz ^ Have you tried to run the tests on your local machine? All tests are working for me, and without verbose logging, I can't find what kind of problem CI has.

// wait until all tasks have completed
while (tasksCompleted < tasksScheduled) {
tasksCompleted += unlockCompleted(ecs);
Thread.yield();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "this"? The yielding? I think when we busy wait for another thread to finish, it's better to yield in order to let it finish (even on multicores). So, it's not exactly necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize it was in a busy wait loop, couldn't you use ExecutorCompletionService.take() instead of poll() in unlockCompleted()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I can pass another parameter and make unlockCompleted block in this final wait.

@danberindei
Copy link
Member

@galderz All (JDBC) tests pass for me as well, and I haven't seen another failure in CI.

@rvansa We could create another job in CI to run the PR tests with trace logging enabled, but it would have to be triggered manually because the CI machine is overloaded enough as it is. And some issues prefer to only appear when trace logs are disabled anyway :)

@rvansa
Copy link
Member Author

rvansa commented May 29, 2014

@danberindei OK, as the failure was in time-limited test, it's possible that the cause was GC or some other hiccup.

@rvansa
Copy link
Member Author

rvansa commented May 30, 2014

@danberindei So, it uses take() when finishing, now.

I am a bit ashamed how many re-pushes it needed to look correct...

@galderz
Copy link
Member

galderz commented Jun 2, 2014

Never be ashamed about it, the important thing is getting it right :)

However, I still see testLoadAndStoreWithIdle failures on Friday runs. I don't really see such failures in master, so we really need to get to the bottom of them :(

@danberindei
Copy link
Member

Don't worry about how many updates it takes @rvansa!

It sounds likely that the leveldb failures are caused by your BaseStoreTest changes, though.

@rvansa
Copy link
Member Author

rvansa commented Jun 18, 2014

@danberindei Ouch, I thought this was already integrated! The LevelDB has been fixed.

@galderz
Copy link
Member

galderz commented Jun 27, 2014

Testing this...

@galderz
Copy link
Member

galderz commented Jun 27, 2014

Testing completed and integrated, thx @rvansa :)

@galderz galderz closed this Jun 27, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants