
[scheduled-executor] ScheduledTaskDescriptor.stopForMigration Nullpointer #11047

Closed
ruslan-belinskyy opened this Issue Aug 7, 2017 · 3 comments

3 participants

ruslan-belinskyy commented Aug 7, 2017

Guys,
I'm currently getting:

java.lang.NullPointerException
at com.hazelcast.scheduledexecutor.impl.ScheduledTaskDescriptor.stopForMigration(ScheduledTaskDescriptor.java:146) ~[hazelcast-all-3.8.3.jar:3.8.3]

when trying to schedule the task:

IScheduledExecutorService service = hazelcastInstance.getScheduledExecutorService(executorName);
service.scheduleAtFixedRate(new HzcastTimerExchangeSender(this.getEndpoint().getEndpointUri(), executorName), endpoint.getDelay(), endpoint.getPeriod(), TimeUnit.MILLISECONDS);

I looked at the fix in #10604, and it doesn't look safe; it could cause the issue I'm seeing.

You had:

try {
    descriptor.cancel(true);
    descriptor.setScheduledFuture(null);
    descriptor.setTaskOwner(false);
} catch (Exception ex) {
    throw rethrow(ex);
}

And now:

try {
    descriptor.stopForMigration();
} catch (Exception ex) {
    throw rethrow(ex);
}

where descriptor.stopForMigration() is:

void stopForMigration() {
    // Result is not set, allowing task to get re-scheduled, if/when needed.
    this.isTaskOwner = false;
    this.future.cancel(true); // NullPointerException here
    this.future = null;
}
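To make the failure mode concrete, here is a minimal, JDK-only model of the method above. MigratingTaskDescriptor is a hypothetical stand-in, not Hazelcast's class, and calling stopForMigration() twice on the same descriptor is only one assumed way to reach a null future (a descriptor that was never scheduled on this member would be another); the thread does not establish which path triggers it in 3.8.3. The point it shows is simply that, because the method nulls the field itself and never checks it, any later call dereferences null.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class UnguardedStopDemo {

    // Hypothetical stand-in for ScheduledTaskDescriptor (not Hazelcast's class);
    // only the two fields touched by stopForMigration() are modelled.
    static class MigratingTaskDescriptor {
        volatile boolean isTaskOwner = true;
        volatile ScheduledFuture<?> future;

        // Same logic as the method quoted in the issue: no null check.
        void stopForMigration() {
            this.isTaskOwner = false;
            this.future.cancel(true); // throws NullPointerException once future is null
            this.future = null;
        }
    }

    // Schedules a no-op periodic task, stops it twice, and reports whether
    // the second call failed with a NullPointerException.
    static boolean secondCallThrowsNpe() {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        try {
            MigratingTaskDescriptor d = new MigratingTaskDescriptor();
            d.future = pool.scheduleAtFixedRate(() -> { }, 100, 100, TimeUnit.MILLISECONDS);
            d.stopForMigration(); // first call cancels the task and clears the field
            try {
                d.stopForMigration(); // second call dereferences the now-null field
                return false;
            } catch (NullPointerException expected) {
                return true;
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("second stopForMigration() threw NPE: " + secondCallThrowsNpe());
    }
}
```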

The old code instead called:

descriptor.cancel(true);

which is:

boolean cancel(boolean mayInterrupt)
        throws ExecutionException, InterruptedException {
    if (!resultRef.compareAndSet(null, new ScheduledTaskResult(true)) || future == null) {
        return false;
    }

    return future.cancel(mayInterrupt);
}

This looks much safer, because it checks future for null before calling cancel() on it.

I get this NullPointerException on two machines running the code shown above.
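For comparison, a null-safe variant following the same idea as the old cancel() guard could look like the sketch below. It is an assumption about how a fix might look, not the patch that was actually merged; MigratingTaskDescriptor is again a hypothetical stand-in built only on java.util.concurrent. Reading the field once into a local makes the null check and the cancel() operate on the same reference, so repeated calls and never-scheduled descriptors are both handled.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class GuardedStopDemo {

    // Hypothetical stand-in for ScheduledTaskDescriptor (not Hazelcast's class).
    static class MigratingTaskDescriptor {
        volatile boolean isTaskOwner = true;
        volatile ScheduledFuture<?> future;

        // Null-safe variant: the field is read once, so the null check and
        // cancel() see the same reference even if the field is cleared later.
        void stopForMigration() {
            this.isTaskOwner = false;
            ScheduledFuture<?> f = this.future;
            if (f != null) {
                f.cancel(true);
            }
            this.future = null;
        }
    }

    // Returns true when a repeated call and a never-scheduled descriptor
    // are both handled without an exception.
    static boolean stopIsIdempotent() {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        try {
            MigratingTaskDescriptor scheduled = new MigratingTaskDescriptor();
            ScheduledFuture<?> f = pool.scheduleAtFixedRate(() -> { }, 100, 100, TimeUnit.MILLISECONDS);
            scheduled.future = f;
            scheduled.stopForMigration();
            scheduled.stopForMigration(); // second call is now a no-op cancel

            MigratingTaskDescriptor neverScheduled = new MigratingTaskDescriptor();
            neverScheduled.stopForMigration(); // future was null from the start

            return f.isCancelled()
                    && scheduled.future == null
                    && !scheduled.isTaskOwner
                    && !neverScheduled.isTaskOwner;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("null-safe stopForMigration() ok: " + stopIsIdempotent());
    }
}
```

Unlike the old cancel(), this sketch does not touch the result reference, which keeps the "result is not set" behaviour that the stopForMigration() comment relies on for re-scheduling.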

mmedenjak added this to the 3.9 milestone Aug 7, 2017

mmedenjak changed the title from ScheduledTaskDescriptor.stopForMigration Nullpointer to [scheduled-executor] ScheduledTaskDescriptor.stopForMigration Nullpointer Aug 7, 2017

tkountis self-assigned this Aug 8, 2017


Contributor

tkountis commented Aug 8, 2017

Thanks for the detailed report, @Batter2014. Yes, the fix should be trivial, but could you also share the logs, so we can see what action led to this? I will try to push a fix shortly, but I would prefer to have a solid reproducer as a test.


ruslan-belinskyy commented Aug 8, 2017

Nothing special, just two hosts running this code:

IScheduledExecutorService service = hazelcastInstance.getScheduledExecutorService(executorName);
service.scheduleAtFixedRate(new HzcastTimerExchangeSender(this.getEndpoint().getEndpointUri(), executorName), endpoint.getDelay(), endpoint.getPeriod(), TimeUnit.MILLISECONDS);

When I start them one after another, I always get that exception.

Full stack trace:

java.lang.NullPointerException
	at com.hazelcast.scheduledexecutor.impl.ScheduledTaskDescriptor.stopForMigration(ScheduledTaskDescriptor.java:146) ~[hazelcast-all-3.8.3.jar:3.8.3]
	at com.hazelcast.scheduledexecutor.impl.ScheduledExecutorContainer.prepareForReplication(ScheduledExecutorContainer.java:294) ~[hazelcast-all-3.8.3.jar:3.8.3]
	at com.hazelcast.scheduledexecutor.impl.ScheduledExecutorPartition.prepareReplicationOperation(ScheduledExecutorPartition.java:70) ~[hazelcast-all-3.8.3.jar:3.8.3]
	at com.hazelcast.scheduledexecutor.impl.DistributedScheduledExecutorService.prepareReplicationOperation(DistributedScheduledExecutorService.java:140) ~[hazelcast-all-3.8.3.jar:3.8.3]
	at com.hazelcast.internal.partition.operation.MigrationRequestOperation.prepareMigrationOperations(MigrationRequestOperation.java:218) [hazelcast-all-3.8.3.jar:3.8.3]
	at com.hazelcast.internal.partition.operation.MigrationRequestOperation.run(MigrationRequestOperation.java:86) [hazelcast-all-3.8.3.jar:3.8.3]
	at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:186) [hazelcast-all-3.8.3.jar:3.8.3]
	at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:120) [hazelcast-all-3.8.3.jar:3.8.3]
	at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.run(OperationThread.java:100) [hazelcast-all-3.8.3.jar:3.8.3]

Contributor

tkountis commented Aug 10, 2017

@Batter2014 I posted a fix for 3.9, which I will also backport to 3.8 (the version you are using) once it is merged. However, I wasn't able to reproduce the issue even once, and since you said you see it every time, I want to make sure I am not missing something. Your finding is absolutely valid; it is just the "always" part that caught my attention. Could you share your Runnable code (or part of it)? Logs from the nodes would also help, to see whether something goes wrong during migration between the nodes that causes this misbehaviour.
