AssertionError: nodeId out of range: -1 invokeWithArguments(MethodHandle.java:627) #16609

Closed
Danny-Hazelcast opened this issue Feb 5, 2020 · 2 comments · Fixed by #17671
Labels
Source: Internal PR or issue was opened by an employee Team: Core Type: Defect

Comments

@Danny-Hazelcast
Contributor

http://jenkins.hazelcast.com/view/freeze/job/freeze-all/13/console

Hazelcast Enterprise 4.1-SNAPSHOT (20200205 - 37bffaf, 1e23c41)

02:21:26.259 /disk1/jenkins/workspace/freeze-all/4.0/2020_02_04-11_09_34/stable/destroy/create-use-destroy Failed
02:21:26.265 fail HzClient4HZ _create-use-destroy_createUseDestroy_reliableId hzcmd.id.flake.CreateUseDestroy threadId=0 java.lang.AssertionError: nodeId out of range: -1
02:21:26.500 http://54.147.27.51/~jenkins/workspace/freeze-all/4.0/2020_02_04-11_09_34/stable/destroy/create-use-destroy
java.lang.AssertionError: nodeId out of range: -1
	at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
	at com.hazelcast.spi.impl.AbstractInvocationFuture.tryWrapInSameClass(AbstractInvocationFuture.java:1954)
	at com.hazelcast.spi.impl.AbstractInvocationFuture.wrapError(AbstractInvocationFuture.java:1945)
	at com.hazelcast.spi.impl.AbstractInvocationFuture.wrapOrPeel(AbstractInvocationFuture.java:1931)
	at com.hazelcast.spi.impl.AbstractInvocationFuture$ExceptionalResult.wrapForJoinInternal(AbstractInvocationFuture.java:1395)
	at com.hazelcast.spi.impl.AbstractInvocationFuture.resolveAndThrowForJoinInternal(AbstractInvocationFuture.java:592)
	at com.hazelcast.spi.impl.AbstractInvocationFuture.joinInternal(AbstractInvocationFuture.java:576)
	at com.hazelcast.client.impl.proxy.ClientFlakeIdGeneratorProxy.newIdBatch(ClientFlakeIdGeneratorProxy.java:58)
	at com.hazelcast.client.impl.proxy.ClientFlakeIdGeneratorProxy.access$000(ClientFlakeIdGeneratorProxy.java:33)
	at com.hazelcast.client.impl.proxy.ClientFlakeIdGeneratorProxy$1.newIdBatch(ClientFlakeIdGeneratorProxy.java:45)
	at com.hazelcast.flakeidgen.impl.AutoBatcher.newId(AutoBatcher.java:58)
	at com.hazelcast.client.impl.proxy.ClientFlakeIdGeneratorProxy.newId(ClientFlakeIdGeneratorProxy.java:52)
	at hzcmd.id.flake.CreateUseDestroy.timeStep(CreateUseDestroy.java:11)
	at remote.bench.marker.MetricsMarker.flatOut(MetricsMarker.java:53)
	at remote.bench.marker.MetricsMarker.bench(MetricsMarker.java:40)
	at remote.bench.BenchThread.call(BenchThread.java:41)
	at remote.bench.BenchThread.call(BenchThread.java:14)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
@Danny-Hazelcast Danny-Hazelcast modified the milestones: 4.0.1, 4.1 Feb 5, 2020
@mmedenjak mmedenjak added the Source: Internal PR or issue was opened by an employee label Feb 11, 2020
@Danny-Hazelcast
Contributor Author

4.1-SNAPSHOT (20200225 - 8b8c625, 4fa4827)

00:10:20.761 /disk1/jenkins/workspace/freeze-x/4.1-SNAPSHOT/2020_02_27-11_38_20/create-use-destroy Failed
00:10:20.769 fail HzClient1HZ _create-use-destroy_createUseDestroy_reliableId hzcmd.id.flake.CreateUseDestroy threadId=0 java.lang.AssertionError: nodeId out of range: -1
00:10:20.961 http://54.147.27.51/~jenkins/workspace/freeze-x/4.1-SNAPSHOT/2020_02_27-11_38_20/create-use-destroy
java.lang.AssertionError: nodeId out of range: -1
	at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
	at com.hazelcast.spi.impl.AbstractInvocationFuture.tryWrapInSameClass(AbstractInvocationFuture.java:1954)
	at com.hazelcast.spi.impl.AbstractInvocationFuture.wrapError(AbstractInvocationFuture.java:1945)
	at com.hazelcast.spi.impl.AbstractInvocationFuture.wrapOrPeel(AbstractInvocationFuture.java:1931)
	at com.hazelcast.spi.impl.AbstractInvocationFuture$ExceptionalResult.wrapForJoinInternal(AbstractInvocationFuture.java:1395)
	at com.hazelcast.spi.impl.AbstractInvocationFuture.resolveAndThrowForJoinInternal(AbstractInvocationFuture.java:592)
	at com.hazelcast.spi.impl.AbstractInvocationFuture.joinInternal(AbstractInvocationFuture.java:576)
	at com.hazelcast.client.impl.proxy.ClientFlakeIdGeneratorProxy.newIdBatch(ClientFlakeIdGeneratorProxy.java:58)
	at com.hazelcast.client.impl.proxy.ClientFlakeIdGeneratorProxy.access$000(ClientFlakeIdGeneratorProxy.java:33)
	at com.hazelcast.client.impl.proxy.ClientFlakeIdGeneratorProxy$1.newIdBatch(ClientFlakeIdGeneratorProxy.java:45)
	at com.hazelcast.flakeidgen.impl.AutoBatcher.newId(AutoBatcher.java:58)
	at com.hazelcast.client.impl.proxy.ClientFlakeIdGeneratorProxy.newId(ClientFlakeIdGeneratorProxy.java:52)
	at hzcmd.id.flake.CreateUseDestroy.timeStep(CreateUseDestroy.java:11)
	at remote.bench.marker.MetricsMarker.flatOut(MetricsMarker.java:53)
	at remote.bench.marker.MetricsMarker.bench(MetricsMarker.java:40)
	at remote.bench.BenchThread.call(BenchThread.java:41)
	at remote.bench.BenchThread.call(BenchThread.java:14)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

@jerrinot jerrinot self-assigned this Sep 29, 2020
@jerrinot
Contributor

There appears to be a race in the proxy initialization.

Consider this: there is a newly created flake ID proxy, and two user threads accessing the same proxy end up in the getNode() method. At this point the nextNodeIdUpdate field is set to Long.MIN_VALUE and the nodeId field to -1 (= NODE_ID_NOT_YET_SET).

The first thread enters the if branch and sets the nextNodeIdUpdate field before setting the nodeId field.

The second thread will then NOT enter the if branch (because nextNodeIdUpdate <= nanoTime is no longer true) and will return the uninitialized nodeId, which is still -1 and therefore violates the assertion.

Possible solutions:

  1. Fix the race inside getNode() (see the sketch after this list).
  2. Initialize nodeId in the proxy initialization hook so it can never be observed as -1.
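
A minimal Java sketch of the interleaving described above. The class name, supplier, and update interval are hypothetical placeholders, not the actual Hazelcast FlakeIdGeneratorProxy source; only the write ordering matters here.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

// Hypothetical sketch of the racy ordering described in the comment above.
// Field names mirror the description; the interval and supplier are made up.
class NodeIdCacheSketch {
    private static final int NODE_ID_NOT_YET_SET = -1;
    private static final long NODE_ID_UPDATE_INTERVAL_NS = TimeUnit.MINUTES.toNanos(2); // arbitrary

    private volatile long nextNodeIdUpdate = Long.MIN_VALUE;
    private volatile int nodeId = NODE_ID_NOT_YET_SET;

    // Racy variant: the timestamp is published BEFORE nodeId, so a second thread
    // arriving between the two writes skips the if branch and returns -1.
    int getNodeId(long nanoTime, IntSupplier clusterNodeIdLookup) {
        int localNodeId = nodeId;                                      // may still be -1
        if (nextNodeIdUpdate <= nanoTime) {
            nextNodeIdUpdate = nanoTime + NODE_ID_UPDATE_INTERVAL_NS;  // published first (the bug)
            // a concurrent caller reading here sees the new timestamp but the old nodeId
            localNodeId = clusterNodeIdLookup.getAsInt();
            nodeId = localNodeId;                                      // published too late
        }
        return localNodeId;                                            // the second thread returns -1
    }
}
```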

jerrinot added a commit that referenced this issue Oct 2, 2020
Fixes #16609

Reasoning:
The `nextNodeIdUpdate` timestamp was updated BEFORE the `nodeId` field.
This means that when two threads were initializing the proxy concurrently,
the second thread could see the timestamp already updated, skip the
IF branch, and return a stale version of nodeId which had not been updated yet.

This problem is fixed by updating the timestamp after the nodeId field.

Additionally, when the IF branch is not taken, we still have to dereference
the nodeId field; otherwise we would return a stale value that we had
dereferenced before the IF branch.
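
Continuing the hypothetical sketch above, the ordering the commit message describes would look roughly like this (a sketch of the described fix, not the merged code):

```java
// Fixed ordering (sketch): publish nodeId before the timestamp, and always
// re-read the field instead of returning a value cached before the branch.
int getNodeIdFixed(long nanoTime, IntSupplier clusterNodeIdLookup) {
    if (nextNodeIdUpdate <= nanoTime) {
        nodeId = clusterNodeIdLookup.getAsInt();                   // nodeId first
        nextNodeIdUpdate = nanoTime + NODE_ID_UPDATE_INTERVAL_NS;  // timestamp second
    }
    return nodeId;  // dereference the field, never a pre-branch local
}
```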
jerrinot added a commit that referenced this issue Oct 5, 2020
Fixes #16609

mmedenjak pushed a commit that referenced this issue Oct 5, 2020
Fixes #16609
