Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HHH-12457 Local Infinispan read-write caches become stale on rollback #6010

Merged
merged 2 commits into from Aug 24, 2018

Conversation

rvansa
Copy link
Member

@rvansa rvansa commented May 24, 2018

https://hibernate.atlassian.net/browse/HHH-12457

Backport #5970 already integrated, but this has dependency to #5954 and #5964

https://issues.jboss.org/browse/ISPN-9205 which was introduced in the backport is fixed here.

@tristantarrant
Copy link
Member

Needs rebase

@galderz
Copy link
Member

galderz commented Jun 14, 2018

@rvansa Needs rebasing

@rvansa
Copy link
Member Author

rvansa commented Jun 14, 2018

@galderz Done

Copy link
Member

@galderz galderz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments

@@ -21,7 +21,7 @@
*/
public abstract class InvalidationCacheAccessDelegate implements AccessDelegate {
protected static final InfinispanMessageLogger log = InfinispanMessageLogger.Provider.getLog( InvalidationCacheAccessDelegate.class );
protected static final boolean TRACE_ENABLED = log.isTraceEnabled();
protected static final boolean trace = log.isTraceEnabled();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? trace is used later on...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary refactoring of the name ;)

@@ -32,7 +30,6 @@ public void beforeCompletion() {}

@Override
public void afterCompletion(int status) {
log.tracef("After completion callback with status %d", status);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove log message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find it too useful, and to keep it symmetric with LocalInvalidationSynchronization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it useful when debugging ISPN-8026. PFLValidator's endInvalidatingKey can be called from multiple places. This log message clarified for me what the origin of the end invalidation message was.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll put that back then to both places.

// Exceptions in #afterCompletion() are silently ignored, since the transaction
// is already committed in DB. However we must not return until we update the cache.
FutureUpdate futureUpdate = new FutureUpdate(uuid, region.nextTimestamp(), success ? this.value : null);
for (;;) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this get noisy? Couldn't we just remove the remove the entry from the cache if we have any issues (and log those silently) rather than trying until it works?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A removal is as likely to fail as any other update.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.

sync.registerBeforeCommit(future);
} else {
log.trace("Removal was not applied immediatelly, waiting.");
future.join();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long does this wait for? Endlessly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as any sync command would - a combination of locking timeouts and rpc timeouts. But not endlessly.

Object task = tasks[i];
if (task instanceof CompletableFuture) {
try {
((CompletableFuture) task).join();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing about this join too

CompletableFuture<?> cf = (CompletableFuture<?>) tasks[i];
if (cf != null) {
try {
cf.join();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another

log.trace("Tombstone was not applied immediatelly, waiting.");
// Slow path: there's probably a locking conflict and we need to wait
// until the command gets applied (and replicated, too!).
future.join();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

@rvansa
Copy link
Member Author

rvansa commented Jun 27, 2018

Rebased, still on top of #5964 (that should go in first).

@galderz
Copy link
Member

galderz commented Jul 10, 2018

Needs rebasing

@rvansa rvansa force-pushed the HHH-12457 branch 4 times, most recently from feb8c27 to 79c7c87 Compare July 11, 2018 15:04
@rvansa
Copy link
Member Author

rvansa commented Jul 12, 2018

Rebased. There's one test failure in 2LC, though it's not in local mode...

@galderz
Copy link
Member

galderz commented Jul 13, 2018

If there's a failure this is not ready.

@galderz
Copy link
Member

galderz commented Jul 13, 2018

To be clear: I've spent a lot of time in the past year removing all random failures in 2LC testsuite. Regardless the failure is related to the change or not, the moment one appears it needs to be resolved there and then. Otherwise the changes don't go in.

@rvansa
Copy link
Member Author

rvansa commented Aug 16, 2018

@galderz Forum reference: https://developer.jboss.org/message/984624

I am scheduled to other tasks right now but if you could get a trace log from the failing test I could take a peek for couple of hours.

@galderz
Copy link
Member

galderz commented Aug 20, 2018

@rvansa I'll handle this, thx

@Override
public CompletableFuture<Void> invoke(boolean success) {
if (trace) {
log.tracef("After completion callback, success=%d", success);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be success=%b

@@ -31,7 +33,9 @@ public InvalidationInvocation(NonTxPutFromLoadInterceptor nonTxPutFromLoadInterc

@Override
public CompletableFuture<Void> invoke(boolean success) {
log.tracef("After completion callback, success? %s", success);
if (trace) {
log.tracef("After completion callback, success=%d", success);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be success=%b

@diegolovison

This comment has been minimized.

* Also making sure that we use simple cache and not the async chain
directly in local mode

Signed-off-by: Radim Vansa <rvansa@redhat.com>
@pruivo
Copy link
Member

pruivo commented Aug 24, 2018

@rvansa @galderz is this ready for this week release? or it will be moved to the next one?

@rvansa
Copy link
Member Author

rvansa commented Aug 24, 2018

@pruivo Galder has reservations since we've seen random failure in the CI on this PR. There's a good chance that it's not related but Galder gave it red and I was not able to reproduce it.

@galderz
Copy link
Member

galderz commented Aug 24, 2018

@pruivo @rvansa I've been running the 5.1 testsuite continuously with TRACE for several days and I've not seen it fail. I'll do one last check of the PR and if it looks good integrate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants