Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wrong scheduled task count caused by backup replicas #9704

Merged

Conversation

tkountis
Copy link
Contributor

When scheduling tasks on random partitions, the getAllScheduled call
returns twice as many futures from the expected result. The reason
seems to be the backup replicas which are also returned, even though they are
just stashed placeholders having no future scheduled. To address this, we introduced
a flag, marking the main copy of the task as master upon initital scheduling or promotions
and using that when building the list of tasks to return to the API call.

Fixes #9694

When scheduling tasks on random partitions, the getAllScheduled call
returns twice as many futures from the expected result. The reason
seems to be the backup replicas which are also returned, even though they are
just stashed placeholders having no future scheduled. To address this, we introduced
a flag, marking the main copy of the task as master upon initital scheduling or promotions
and using that when building the list of tasks to return to the API call
@tkountis tkountis added this to the 3.8 milestone Jan 19, 2017
@tkountis tkountis self-assigned this Jan 19, 2017
@devOpsHazelcast
Copy link
Collaborator

Test PASSed.

@jerrinot
Copy link
Contributor

When there is a migration rolllback then the source side will always set itself as a master replica. is this the right thing to do?

@tkountis
Copy link
Contributor Author

@jerrinot thanks for the comment, but to be honest I can't say I understand the concern :/
In a migration rollback, the source side will take ownership of the task and re-schedule it once again, meaning it is going to be the master replica. Am I understanding something wrong from your comment ? What would you expect the correct behaviour to be ?

@mmedenjak mmedenjak self-requested a review January 20, 2017 09:34
Copy link
Contributor

@mmedenjak mmedenjak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok, just minor change and check the rollbackMigration behaviour in case when the source is not the partition owner.

* This flag is set to true only on ititial scheduling of a task, and on after a promotion (stashed or migration),
* in the latter case the other replicas get disposed.
*/
private transient boolean masterReplica;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps for clarity rename to isPartitionOwner since "master" is a different thing in the cluster.

@tkountis
Copy link
Contributor Author

@mmedenjak addressed both comments ;)

Copy link
Contributor

@jerrinot jerrinot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. it would be good to squash commits before merge.

@devOpsHazelcast
Copy link
Collaborator

Test PASSed.

1 similar comment
@devOpsHazelcast
Copy link
Collaborator

Test PASSed.

@tkountis tkountis merged commit d7bef66 into hazelcast:master Jan 20, 2017
@tkountis tkountis deleted the fix/3.8/sched_exec_wrong_task_count branch January 20, 2017 15:36
@mmedenjak mmedenjak added the Source: Internal PR or issue was opened by an employee label Apr 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Source: Internal PR or issue was opened by an employee Team: Core Type: Defect
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong getAllScheduledCount on Scheduled Executor Service
4 participants