Fix wrong scheduled task count caused by backup replicas #9704
When scheduling tasks on random partitions, the getAllScheduled call returns twice as many futures as expected. The cause appears to be the backup replicas, which are also returned even though they are just stashed placeholders with no future scheduled. To address this, we introduced a flag that marks the main copy of the task as master upon initial scheduling or promotion, and we use that flag when building the list of tasks to return to the API call.
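For illustration only, here is a minimal sketch of the idea described above, not the actual patch. The class `OwnerReplicaFilterSketch`, the `TaskDescriptor` type, and the methods `promote` and `scheduledTaskNames` are hypothetical stand-ins; only the `masterReplica` flag name comes from this PR. It assumes backup copies are stored alongside the main copy and are told apart solely by the flag.

```java
import java.util.ArrayList;
import java.util.List;

public class OwnerReplicaFilterSketch {

    /** Hypothetical stand-in for a stored task entry; not the Hazelcast class. */
    static final class TaskDescriptor {
        final String name;
        // True only for the copy that actually has a future scheduled.
        private boolean masterReplica;

        TaskDescriptor(String name, boolean masterReplica) {
            this.name = name;
            this.masterReplica = masterReplica;
        }

        // Assumed to be called when a stashed backup is promoted.
        void promote() {
            this.masterReplica = true;
        }

        boolean isMasterReplica() {
            return masterReplica;
        }
    }

    /** Returns only the tasks whose local copy is flagged as the main one. */
    static List<String> scheduledTaskNames(List<TaskDescriptor> all) {
        List<String> result = new ArrayList<>();
        for (TaskDescriptor descriptor : all) {
            if (descriptor.isMasterReplica()) {
                result.add(descriptor.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<TaskDescriptor> descriptors = new ArrayList<>();
        descriptors.add(new TaskDescriptor("task-1", true));   // main copy
        descriptors.add(new TaskDescriptor("task-1", false));  // stashed backup
        descriptors.add(new TaskDescriptor("task-2", true));   // main copy
        descriptors.add(new TaskDescriptor("task-2", false));  // stashed backup

        // Without the filter all four entries would be reported.
        System.out.println(scheduledTaskNames(descriptors));
    }
}
```

Without the flag check, every stored entry would be reported, which is the doubling described above when each task has one backup.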
Test PASSed.
When there is a migration rollback, the source side will always set itself as the master replica. Is this the right thing to do?
@jerrinot thanks for the comment, but to be honest I can't say I understand the concern :/
Looks OK, just a minor change; also check the rollbackMigration
behaviour in the case where the source is not the partition owner.
* This flag is set to true only on initial scheduling of a task, and after a promotion (stashed or migration),
* in the latter case the other replicas get disposed.
*/
private transient boolean masterReplica;
Perhaps for clarity rename to isPartitionOwner since "master" is a different thing in the cluster.
@mmedenjak addressed both comments ;)
looks good. it would be good to squash commits before merge.
Test PASSed.
When scheduling tasks on random partitions, the getAllScheduled call
returns twice as many futures as expected. The cause appears to be
the backup replicas, which are also returned even though they are
just stashed placeholders with no future scheduled. To address this, we introduced
a flag that marks the main copy of the task as master upon initial scheduling or promotion,
and we use that flag when building the list of tasks to return to the API call.
Fixes #9694