Fix rescheduling on stopped tasks after migrating back to origin #10604
return "ScheduledTaskStatisticsImpl{ runs=" + runs + ", createdAt="
        + createdAt + ", firstRunStart=" + firstRunStart + ", lastRunStart=" + lastRunStart + ", lastRunEnd=" + lastRunEnd
        + ", lastIdleTime=" + lastIdleTime + ", totalRunTime=" + totalRunTime + ", totalIdleTime=" + totalIdleTime + '}';
return "ScheduledTaskStatisticsImpl{ runs=" + runs
`ScheduledTaskStatisticsImpl{runs=` without a space matches the majority of our `toString()` methods.
HazelcastInstance second = factory.newHazelcastInstance();
waitAllForSafeState(first, second);

// Kill the second member, tasks should now get rescheduled back in first member
How can you be sure the task was migrated to the second member in between? `assertTrueEventually(new AllTasksRunning(scheduler));` should also pass if there was no migration at all, shouldn't it?
Well, I am not sure. I can add an extra check to see specifically whether there are tasks in the second member, but I tried to keep the task count rather high so that at least some will migrate. But yes, it's not deterministic. I will modify it to make it so.
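The extra check mentioned above could be sketched as follows. This is a minimal, self-contained sketch, assuming the test can obtain a member-to-tasks map (for example from something like `scheduler.getAllScheduledFutures()`; that call, the class name `MigrationCheckSketch`, and the helper name `tasksOn` are assumptions for illustration, not code from this PR).

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the PR: lets a test verify that a given
// member really holds tasks before that member is shut down.
final class MigrationCheckSketch {

    private MigrationCheckSketch() {
    }

    // Returns how many tasks are recorded for the given member,
    // or 0 if the member has no entry in the map.
    static <M, T> int tasksOn(Map<M, List<T>> tasksByMember, M member) {
        List<T> tasks = tasksByMember.get(member);
        return tasks == null ? 0 : tasks.size();
    }
}
```

A test could then assert that `tasksOn(tasksByMember, secondMember) > 0` before terminating the second member, so that a pass actually proves the tasks migrated there and back.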
HazelcastInstance first = factory.newHazelcastInstance();

int tasksCount = 1000;
final IScheduledExecutorService scheduler = getScheduledExecutor(new HazelcastInstance[] {first }, "scheduler");
I thought there would be more logic inside the `getScheduledExecutor()` method, but since there isn't, I would just use `first.getScheduledExecutorService("scheduler");`
It's used in the client tests, and it's slightly different there, e.g. `ClientScheduledExecutorServiceBasicTest`.
I don't want to remove the whole method. But this test doesn't seem to need it, since it requires constructing an array (which looks more complex than just retrieving the scheduler).
Test PASSed.
1cd9ac8 to c094417
Test PASSed.
Guys,
When trying to schedule the Task:
I have looked at the fix you have here; it doesn't look safe and could be causing the issue I have. You had:
And now:
Where `descriptor.stopForMigration()` has:
And in the old code:
Which has:
This looks way safer. I get the NullPointerException on 2 machines running the code you can see above.
@Batter2014 can you please submit a new issue for this?
Yep, just did that: #11047
Starting a task on a single node and then adding one more member causes some of the tasks to migrate to partitions owned by the new member.
During this process, the container stops the currently running task, leaving it in a `cancelled` state. If this state doesn't get disposed (e.g. the cluster never gets bigger), then when the second member goes down, the tasks migrate back to the first one. However, since their state there is marked as cancelled, they never get rescheduled.
The fix introduces a `stop` method, which still cancels the task but doesn't interact with the state.
Fix #10603
Also, minor cleanup and naming.
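The difference between cancelling and stopping described above can be illustrated with a small, self-contained sketch. It is not the Hazelcast implementation: the class `TaskDescriptorSketch`, its `State` enum, and the `shouldReschedule()` helper are hypothetical names, and a plain `java.util.concurrent` scheduler stands in for the real container.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for a task descriptor (assumption, not the real class).
final class TaskDescriptorSketch {

    enum State { ACTIVE, CANCELLED }

    private State state = State.ACTIVE;
    private ScheduledFuture<?> future;

    // Starts (or restarts) the periodic execution on the local executor.
    synchronized void schedule(ScheduledExecutorService executor, Runnable task) {
        future = executor.scheduleAtFixedRate(task, 0, 1, TimeUnit.SECONDS);
    }

    // User-initiated cancel: stops the execution AND records the cancelled state,
    // so the task is never picked up for rescheduling again.
    synchronized void cancel() {
        stop();
        state = State.CANCELLED;
    }

    // Migration-time stop: cancels the local execution only and leaves the state
    // untouched, so the task can be rescheduled if its partition comes back.
    synchronized void stop() {
        if (future != null) {
            future.cancel(true);
            future = null;
        }
    }

    // A task qualifies for rescheduling when it is not running locally
    // and was not explicitly cancelled.
    synchronized boolean shouldReschedule() {
        return state == State.ACTIVE && future == null;
    }
}
```

In this sketch, a task stopped for migration still reports `shouldReschedule() == true`, while a task that went through `cancel()` does not. That mirrors the bug: using the cancelling path during migration left tasks permanently ineligible on the origin member.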