Suspicious behaviour of testRecovery_singleInstanceRemaining #5923

Closed
jerrinot opened this issue Aug 11, 2015 · 5 comments

@jerrinot
Contributor

commented Aug 11, 2015

public void testRecovery_singleInstanceRemaining() throws XAException {
seems to have two operational modes:

  • fast mode (<10s)
  • slow mode (>2 minutes)

There is nothing in between. My guess is that either our transaction recovery or the test itself is racy: sometimes the transaction is recovered successfully, and sometimes the test waits for a timeout (2 minutes).

https://hazelcast-l337.ci.cloudbees.com/job/Hazelcast-3.x-OpenJDK6/com.hazelcast$hazelcast/623/testReport/com.hazelcast.xa/HazelcastXATest/testRecovery_singleInstanceRemaining/history/


@jerrinot jerrinot self-assigned this Aug 11, 2015

@jerrinot jerrinot added this to the 3.6 milestone Aug 11, 2015

@jerrinot

Contributor Author

commented Aug 13, 2015

It seems to be a side effect of #5602, the recent changes in the lock operations.

I can see that the transaction is recovered and even committed, but the record is not unlocked because of this check. When the entry stays locked, the get() operation waits for the transaction timeout, which explains the 2-minute slow mode.
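To make the failure mode concrete, here is a minimal sketch of how a reference check can swallow an unlock. It is a simplified model, not the actual Hazelcast lock code: the class LockRecordSketch, its fields, and the rule "an operation carrying the same referenceId as the last applied one is treated as a retry" are assumptions made for illustration.

```java
// Simplified, hypothetical model of a lock record. Not the real Hazelcast class.
final class LockRecordSketch {
    private String owner;               // null while the record is unlocked
    private long lastReferenceId = -1;  // referenceId of the last applied operation

    /** Acquires the lock if it is free and remembers the operation's referenceId. */
    boolean lock(String owner, long referenceId) {
        if (this.owner != null) {
            return false;
        }
        this.owner = owner;
        this.lastReferenceId = referenceId;
        return true;
    }

    /** Returns true only if the lock was actually released. */
    boolean unlock(String owner, long referenceId) {
        if (!owner.equals(this.owner)) {
            return false;
        }
        if (referenceId == lastReferenceId) {
            // Assumed check: same referenceId means "retry of an already applied
            // operation", so the unlock is silently ignored.
            return false;
        }
        this.owner = null;
        this.lastReferenceId = referenceId;
        return true;
    }

    boolean isLocked() {
        return owner != null;
    }

    public static void main(String[] args) {
        LockRecordSketch record = new LockRecordSketch();
        record.lock("tx-1", 42L);                       // lock carries referenceId 42
        boolean released = record.unlock("tx-1", 42L);  // unlock happens to carry 42 too
        // prints "released=false, stillLocked=true"
        System.out.println("released=" + released + ", stillLocked=" + record.isLocked());
    }
}
```

In this model the commit itself succeeds, but the entry stays locked, so a later reader would block until some timeout expires.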

@jerrinot

Contributor Author

commented Aug 13, 2015

I can reproduce and detect the problem reliably by annotating the testRecovery_singleInstanceRemaining test with

    @Test(timeout = 60 * 1000 * 1)
    @Repeat(100)

When I randomize the initial callId on the members, the issue goes away.

Now the question is how to fix this properly. I could split the referenceId into lockReferenceId and unlockReferenceId, which would probably fix this particular test. But I'm afraid it's not a real solution, as a transaction rollback would still be affected. @mdogan: What's your view?

jerrinot added a commit to jerrinot/hazelcast that referenced this issue Aug 13, 2015

@mdogan

Member

commented Aug 13, 2015

I don't understand how lock and unlock end up using the same callId. Aren't they separate invocations?

@jerrinot

Contributor Author

commented Aug 13, 2015

They are. However, in a transactional map the two invocations might be initiated by different members, and they might have the same callId purely by chance. That's why randomizing the initial callId makes this go away.
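A small sketch of why the collision is plausible, assuming each member hands out call ids from its own local counter. The class CallIdSketch and its starting values are hypothetical and only illustrate the idea; the real call id sequence in Hazelcast may work differently.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-member call id generator, used only to illustrate the collision.
final class CallIdSketch {
    private final AtomicLong next;

    CallIdSketch(long initialValue) {
        this.next = new AtomicLong(initialValue);
    }

    long nextCallId() {
        return next.incrementAndGet();
    }

    public static void main(String[] args) {
        // Two freshly started members whose counters begin at the same value.
        CallIdSketch memberA = new CallIdSketch(0);
        CallIdSketch memberB = new CallIdSketch(0);
        long lockCallId = memberA.nextCallId();    // lock sent by member A
        long unlockCallId = memberB.nextCallId();  // unlock sent by member B
        // prints "same start, collision: true"
        System.out.println("same start, collision: " + (lockCallId == unlockCallId));

        // Different initial values (standing in for a randomized start).
        CallIdSketch memberC = new CallIdSketch(1_000);
        CallIdSketch memberD = new CallIdSketch(5_000);
        // prints "different start, collision: false"
        System.out.println("different start, collision: "
                + (memberC.nextCallId() == memberD.nextCallId()));
    }
}
```

Randomizing the start only makes a collision unlikely, not impossible, which is why it hides the symptom rather than fixing the cause.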

@mdogan

Member

commented Aug 13, 2015

Ah, I see. lock and unlock can be called by different members in transactions. For normal lock operations, this cannot happen.

Maybe we can use a different reference-id (different from the call-id) for transactional locks. Or we can remove the reference check for them...
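The second option, dropping the reference check for transactional locks, could look roughly like the sketch below. It is a simplified model under stated assumptions: TxAwareLockSketch, the transactional flag, and the retry rule are hypothetical names and behaviour, not the actual Hazelcast implementation.

```java
// Hypothetical lock record that skips the referenceId check for transactional locks.
final class TxAwareLockSketch {
    private String owner;               // null while the record is unlocked
    private boolean transactional;      // true if the current lock was taken by a transaction
    private long lastReferenceId = -1;  // referenceId of the last applied operation

    boolean lock(String owner, long referenceId, boolean transactional) {
        if (this.owner != null) {
            return false;
        }
        this.owner = owner;
        this.transactional = transactional;
        this.lastReferenceId = referenceId;
        return true;
    }

    boolean unlock(String owner, long referenceId) {
        if (!owner.equals(this.owner)) {
            return false;
        }
        // Only non-transactional locks treat a repeated referenceId as a retry.
        // Transactional lock and unlock may come from different members, so an
        // equal referenceId there says nothing about duplication.
        if (!transactional && referenceId == lastReferenceId) {
            return false;
        }
        this.owner = null;
        this.transactional = false;
        this.lastReferenceId = referenceId;
        return true;
    }

    boolean isLocked() {
        return owner != null;
    }

    public static void main(String[] args) {
        TxAwareLockSketch txRecord = new TxAwareLockSketch();
        txRecord.lock("tx-1", 42L, true);
        // prints "transactional unlock released: true"
        System.out.println("transactional unlock released: " + txRecord.unlock("tx-1", 42L));

        TxAwareLockSketch plainRecord = new TxAwareLockSketch();
        plainRecord.lock("thread-1", 42L, false);
        // prints "plain unlock released: false"
        System.out.println("plain unlock released: " + plainRecord.unlock("thread-1", 42L));
    }
}
```

The trade-off in this sketch is that transactional locks lose the duplicate-operation protection the check was meant to provide, while plain locks keep it.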

jerrinot added a commit to jerrinot/hazelcast that referenced this issue Aug 14, 2015

Fix hazelcast#5923
For transactional locks the referenceId is not taken into consideration, as lock/unlock operations might
be initiated by different members -> they might have the same referenceId and the unlock might be lost.

See hazelcast#5923 and hazelcast#5954 for details.
