Fix job server race condition #3769
Thanks for the pull request! Here is what will happen next:
Thank you for contributing!
@thesentinels approve
🤘 I am testing your branch against master before merging it. We do this to ensure that the master branch is never failing tests.
Travis CI has started testing this PR.
💔 Travis CI reports this PR failed to pass the test suite. The next step is to examine the job and figure out why. If it is transient, you can try re-triggering the Travis CI Job - if it passes, this PR will be automatically merged. If it is not transient, you should fix the issue and update this pull request, and issue
jobsrv::JobState::Pending |
jobsrv::JobState::Processing |
jobsrv::JobState::Dispatched => false,
_ => true,
I would recommend explicitly listing every state instead of using the `_` catch-all, so the compiler flags this function if somebody later adds another state that should count as "incomplete".
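To illustrate the suggestion, here is a minimal sketch of an exhaustive match. The `JobState` enum below is a hypothetical stand-in for `jobsrv::JobState`: only `Pending`, `Processing`, and `Dispatched` appear in the diff, and the `Complete` and `Failed` variants are assumptions made for the example. The free function `is_job_complete` is likewise illustrative, not the method from this PR.

```rust
/// Hypothetical stand-in for `jobsrv::JobState`. Only the first three
/// variants appear in the diff; `Complete` and `Failed` are assumed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobState {
    Pending,
    Processing,
    Dispatched,
    Complete,
    Failed,
}

/// Returns true when the job has reached a terminal state.
///
/// There is no `_` arm, so adding a new variant to `JobState` makes this
/// match non-exhaustive and the compiler rejects it until the new state
/// is classified explicitly.
pub fn is_job_complete(state: JobState) -> bool {
    match state {
        JobState::Pending | JobState::Processing | JobState::Dispatched => false,
        JobState::Complete | JobState::Failed => true,
    }
}
```

With the wildcard arm, a newly added state would silently be treated as complete; with the explicit arms, the build fails until someone decides which group it belongs to.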
self.delete_worker(&worker)?;
worker.ready()
if !self.is_job_complete(worker.job_id.unwrap())? {
// Handle potential race condition where a Ready heartbeat
@chefsalim one minor suggestion but other than that g2g. I gotta revisit the networking between jobsrv and worker eventually |
Signed-off-by: Salim Alam <salam@chef.io>
@thesentinels approve
🤘 I am testing your branch against master before merging it. We do this to ensure that the master branch is never failing tests.
Travis CI has started testing this PR.
💖 Travis CI reports this PR passed. It always makes me feel nice when humans approve of one another's work. I'm merging this PR now. I just want you and the contributor to answer me one question:
This change fixes a race condition in which the job server mishandles busy worker state and can lose track of a busy worker. If that busy worker is then restarted in the middle of a job, its interrupted job will not get restarted. There are also a couple of other minor tweaks.
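As a rough illustration of the kind of guard described here, the sketch below shows one plausible reading of the race: a Ready heartbeat arrives for a worker that was just handed a job, and the server must not drop its busy record while that job is still incomplete. Everything in it (`WorkerTracker`, `Worker`, `WorkerState`, `on_ready_heartbeat`, the in-memory maps) is hypothetical and is not the job server's actual code or data model.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical worker states for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkerState {
    Ready,
    Busy,
}

/// Hypothetical worker record: its state and the job it holds, if any.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Worker {
    pub state: WorkerState,
    pub job_id: Option<u64>,
}

/// Hypothetical in-memory tracker standing in for the server's bookkeeping.
#[derive(Default)]
pub struct WorkerTracker {
    workers: HashMap<String, Worker>,
    completed_jobs: HashSet<u64>,
}

impl WorkerTracker {
    /// Record that `ident` has been dispatched `job_id` and is now busy.
    pub fn assign(&mut self, ident: &str, job_id: u64) {
        let worker = Worker { state: WorkerState::Busy, job_id: Some(job_id) };
        self.workers.insert(ident.to_string(), worker);
    }

    /// Record that a job reached a terminal state.
    pub fn mark_job_complete(&mut self, job_id: u64) {
        self.completed_jobs.insert(job_id);
    }

    /// Handle a Ready heartbeat. If the worker is tracked as busy with a
    /// job that is not complete, treat the heartbeat as stale and keep the
    /// busy record; otherwise mark the worker ready.
    pub fn on_ready_heartbeat(&mut self, ident: &str) -> WorkerState {
        let job_done = match self.workers.get(ident).and_then(|w| w.job_id) {
            Some(id) => self.completed_jobs.contains(&id),
            None => true, // unknown worker, or no job assigned
        };
        if job_done {
            let worker = Worker { state: WorkerState::Ready, job_id: None };
            self.workers.insert(ident.to_string(), worker);
            WorkerState::Ready
        } else {
            WorkerState::Busy
        }
    }

    /// The job currently tracked for `ident`, if any.
    pub fn job_for(&self, ident: &str) -> Option<u64> {
        self.workers.get(ident).and_then(|w| w.job_id)
    }
}
```

Because the busy record survives the stale heartbeat, the server in this sketch still knows which job the worker held and could requeue it if the worker restarts mid-job.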
Signed-off-by: Salim Alam <salam@chef.io>