Fix job server race condition #3769

Merged
merged 1 commit into from
Oct 13, 2017

Conversation

chefsalim
Contributor

This change fixes a race condition that can cause the job server to mishandle busy worker state and lose track of a busy worker. If that busy worker is then restarted in the middle of a job, its interrupted job will not get restarted. There are also a couple of other minor tweaks.

Signed-off-by: Salim Alam <salam@chef.io>

@thesentinels
Contributor

Thanks for the pull request! Here is what will happen next:

  1. Your PR will be reviewed by the maintainers
  2. If everything looks good, one of them will approve it, and your PR will be merged.

Thank you for contributing!

Contributor

@raskchanky raskchanky left a comment

tenor-264980007

@chefsalim
Contributor Author

@thesentinels approve

@thesentinels
Contributor

🤘 I am testing your branch against master before merging it. We do this to ensure that the master branch is never failing tests.

@thesentinels
Contributor

:neckbeard: Travis CI has started testing this PR.

@thesentinels
Contributor

💔 Travis CI reports this PR failed to pass the test suite.

The next step is to examine the job and figure out why. If it is transient, you can try re-triggering the Travis CI Job - if it passes, this PR will be automatically merged. If it is not transient, you should fix the issue and update this pull request, and issue approve again. If you believe it will never pass, and you are feeling :godmode:, you can issue a force to merge this PR anyway.

jobsrv::JobState::Pending |
jobsrv::JobState::Processing |
jobsrv::JobState::Dispatched => false,
_ => true,
Collaborator

I would recommend explicitly exhausting all match arms, so the compiler flags this function for anyone who adds an additional state that may be considered an "incomplete" state.
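To illustrate the suggestion, here is a minimal sketch of an exhaustive match. The `JobState` enum below is a hypothetical stand-in for `jobsrv::JobState`: only `Pending`, `Processing`, and `Dispatched` appear in the diff above, and the remaining variants (`Complete`, `Rejected`, `Failed`) as well as the function name `is_complete` are assumptions made for the example.

```rust
// Hypothetical stand-in for jobsrv::JobState. Only Pending, Processing and
// Dispatched are taken from the diff; the other variants are assumed.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum JobState {
    Pending,
    Processing,
    Dispatched,
    Complete,
    Rejected,
    Failed,
}

// Every variant is named explicitly and there is no `_` arm, so adding a
// new variant to JobState makes this match fail to compile until someone
// decides whether the new state counts as complete.
pub fn is_complete(state: JobState) -> bool {
    match state {
        JobState::Pending | JobState::Processing | JobState::Dispatched => false,
        JobState::Complete | JobState::Rejected | JobState::Failed => true,
    }
}
```

With the wildcard arm from the diff, a newly added in-progress state would silently be treated as complete. With the exhaustive form, the omission surfaces at compile time.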

self.delete_worker(&worker)?;
worker.ready()
if !self.is_job_complete(worker.job_id.unwrap())? {
// Handle potential race condition where a Ready heartbeat
Collaborator

I'm so sorry for this. It's because we're going through two sockets instead of one. I fixed all of this in the other servers but didn't refactor the network code for JobSrv->Worker.
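For readers unfamiliar with this kind of race, the following is a rough sketch of one plausible way to guard against a stale Ready heartbeat. It is not the code from this pull request: the `Tracker` type, the `WorkerState` enum, the `finished` map, and the method `on_ready_heartbeat` are all hypothetical names invented for the example, and the policy shown (keep the worker marked busy while its job is incomplete) is an assumption based on the description above.

```rust
use std::collections::HashMap;

// Hypothetical types; none of these names come from the pull request.
#[derive(Debug, PartialEq)]
pub enum WorkerState {
    Ready,
    Busy { job_id: u64 },
}

pub struct Tracker {
    pub workers: HashMap<String, WorkerState>,
    // job_id -> true once the job has reached a terminal state
    pub finished: HashMap<u64, bool>,
}

impl Tracker {
    fn is_job_complete(&self, job_id: u64) -> bool {
        self.finished.get(&job_id).copied().unwrap_or(false)
    }

    /// Process a Ready heartbeat and return the state recorded afterwards.
    pub fn on_ready_heartbeat(&mut self, ident: &str) -> &WorkerState {
        // A Ready heartbeat for a worker whose job is still incomplete is
        // treated as stale: it may have been sent before the job was
        // dispatched and arrived late on a different socket.
        let stale = match self.workers.get(ident) {
            Some(WorkerState::Busy { job_id }) => !self.is_job_complete(*job_id),
            _ => false,
        };
        if !stale {
            self.workers.insert(ident.to_string(), WorkerState::Ready);
        }
        &self.workers[ident]
    }
}
```

In this sketch the busy record survives a stale heartbeat, so a later restart of that worker can still be matched to its interrupted job.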

@reset
Collaborator

reset commented Oct 13, 2017

@chefsalim one minor suggestion but other than that g2g. I gotta revisit the networking between jobsrv and worker eventually

Signed-off-by: Salim Alam <salam@chef.io>
@chefsalim
Contributor Author

@thesentinels approve

@thesentinels
Contributor

🤘 I am testing your branch against master before merging it. We do this to ensure that the master branch is never failing tests.

@thesentinels
Contributor

:neckbeard: Travis CI has started testing this PR.

@thesentinels
Contributor

💖 Travis CI reports this PR passed.

It always makes me feel nice when humans approve of one another's work. I'm merging this PR now.

I just want you and the contributor to answer me one question:

gif-keyboard-3280869874741411265

@thesentinels thesentinels merged commit 64c6f5c into master Oct 13, 2017
@thesentinels thesentinels deleted the SA/tweaks branch October 13, 2017 17:33