
race condition: deployment logic & tagging sometimes makes MRO fail #382

Closed
clemensutschig opened this issue Jun 19, 2020 · 3 comments · Fixed by #385
Comments

@clemensutschig
Member

clemensutschig commented Jun 19, 2020

latest master.

  1. Provisioned a new nodejs quickstarter and waited for the initial build to finish
  2. Provisioned a new release manager and configured nodejs as a repo in metadata.yml
  3. Ran the release manager with default params - WIP

Error below (it only happens sometimes). It seems to be a race condition (the new image triggers a new deployment, setting latest triggers a new deployment, and then the explicit redeploy comes along).

The default nodejs quickstarter contains an 'image change' trigger ...
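
That trigger can be seen on the DC itself, for example (an illustrative check using the same namespace and DC names as in the log below, not part of the original report):

```sh
# Show the triggers configured on the DeploymentConfig; the nodejs quickstarter
# ships with an ImageChange trigger on nodejs:latest, which starts a rollout
# whenever the "latest" tag moves.
oc -n hello-dev get dc/nodejs -o jsonpath='{.spec.triggers}'
```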

[Deploy to OpenShift-nodejs] **** STARTING stage 'Deploy to OpenShift' for component 'nodejs' branch 'master' ****
[Pipeline] stage
[Pipeline] { (Deploy to OpenShift)
[Pipeline] sh (Check existance of DeploymentConfig nodejs)
oc -n hello-dev get DeploymentConfig/nodejs
[Pipeline] sh (Get latest version of dc/nodejs)
oc -n hello-dev get dc/nodejs -o 'jsonpath={.status.latestVersion}'
[Pipeline] fileExists
[Pipeline] sh (Get container images for deploymentconfigs (nodejs))
oc -n hello-dev get dc nodejs -o 'jsonpath={.spec.template.spec.containers[*].image}'
[Pipeline] sh (Check existance of ImageStream nodejs)
oc -n hello-dev get ImageStream/nodejs
[Pipeline] sh (Set tag latest on is/nodejs)
oc -n hello-dev tag nodejs:2-619469ea nodejs:latest
Tag nodejs:latest set to nodejs@sha256:073421ece7d9357989c8f273e287b0ffcf100e4c9d438894ac6d2b9ef90ec904.
[Pipeline] sh (Get latest version of dc/nodejs)
oc -n hello-dev get dc/nodejs -o 'jsonpath={.status.latestVersion}'
[Pipeline] sh (Rollout latest version of dc/nodejs)
oc -n hello-dev rollout latest dc/nodejs
**error: #3 is already in progress (New).**
[Pipeline] }
[Pipeline] // stage
[Pipeline] [Deploy to OpenShift-nodejs] **** ENDED stage 'Deploy to OpenShift' for component 'nodejs' branch 'master' **** (took 2458 ms)
[Pipeline] }
[Pipeline] // wrap
[Pipeline] stage
[Pipeline] { (odsPipeline error)
[Pipeline] WARN: [nodejs] ***** Finished ODS Pipeline for nodejs (with error) ***** (took 155437 ms)
@clemensutschig clemensutschig added the bug Something isn't working label Jun 19, 2020
@clemensutschig clemensutschig added this to To Do in OpenDevStack 3.0 via automation Jun 19, 2020
@clemensutschig clemensutschig changed the title from "master / deployment logic & tagging sometimes fails" to "master / race condition: deployment logic & tagging sometimes makes MRO fail" Jun 19, 2020
@michaelsauter
Member

OK. I guess the best way forward here is to avoid failing and instead, if a rollout has already been started since the last check, just watch that rollout. I'll need to find out the safest way to determine that the rollout actually started automatically between the check for the latest version and the rollout command.
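
A minimal sketch of that check with plain oc (names taken from the log above; this is not the shared library code):

```sh
# Read the DC's latestVersion before and after the tag operation. If it moved,
# an image trigger has already started the rollout and we should only watch it.
before=$(oc -n hello-dev get dc/nodejs -o 'jsonpath={.status.latestVersion}')
oc -n hello-dev tag nodejs:2-619469ea nodejs:latest
after=$(oc -n hello-dev get dc/nodejs -o 'jsonpath={.status.latestVersion}')
if [ "$after" != "$before" ]; then
  echo "Rollout #$after was already started by the image trigger - just watch it."
fi
```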

@clemensutschig clemensutschig changed the title from "master / race condition: deployment logic & tagging sometimes makes MRO fail" to "race condition: deployment logic & tagging sometimes makes MRO fail" Jun 21, 2020
@clemensutschig
Member Author

@michaelsauter I gave this some thought. I guess the only way (as we trigger the tagging) is to remove the image trigger in the DC, so we really own the deployment.

@michaelsauter
Member

I also thought about it, but I came to a different conclusion 😄

Removing the trigger is complicated, I think, because many existing projects will have it, and making them update will be hard since I find the UI/UX around image triggers quite surprising. I'm not a fan of image triggers though, and it also seems like OpenShift is favouring Kubernetes-native Deployment resources going forward, which do not have an image trigger feature at all. So we could think about removing the trigger for new components?
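
For illustration, dropping the trigger with plain oc could look like this (a hypothetical sketch; the image stream tag nodejs:latest is an assumption, and this is not something the shared library does today):

```sh
# Remove the ImageChange trigger so that only an explicit "oc rollout latest"
# starts a deployment.
oc -n hello-dev set triggers dc/nodejs --remove --from-image=nodejs:latest

# Verify which triggers remain on the DC.
oc -n hello-dev set triggers dc/nodejs
```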

My suggestion to fix this particular race condition:
When the explicit rollout fails, we check again for the latest version of the DC. This time, the version should really be updated, so we just watch the rollout. If the version is still not updated, we just start the rollout again - it is of course likely to fail, but then we cannot do anything about it. I think this solution is safe and should fix the race condition, so there is nothing to lose.
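
Roughly, the suggested fallback could look like this with plain oc (a sketch only; the actual fix is the change in the shared library, see #385):

```sh
# Remember the version we saw before attempting the explicit rollout.
version=$(oc -n hello-dev get dc/nodejs -o 'jsonpath={.status.latestVersion}')

if ! oc -n hello-dev rollout latest dc/nodejs; then
  # The explicit rollout failed - check whether an image trigger got there first.
  new_version=$(oc -n hello-dev get dc/nodejs -o 'jsonpath={.status.latestVersion}')
  if [ "$new_version" = "$version" ]; then
    # No new version, so it was not the race condition - retry once and let
    # any further error surface.
    oc -n hello-dev rollout latest dc/nodejs
  fi
  # Otherwise a rollout is already in progress; fall through and watch it.
fi

# In both cases, watch the rollout until it completes.
oc -n hello-dev rollout status dc/nodejs --watch
```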

@michaelsauter michaelsauter moved this from To Do to In Progress in OpenDevStack 3.0 Jun 22, 2020
michaelsauter added a commit to BIX-Digital/ods-jenkins-shared-library that referenced this issue Jun 22, 2020
There seems to be a race condition, in which the version is updated by
an image trigger just between asking for the version and starting the
rollout.

Fixes opendevstack#382.
OpenDevStack 3.0 automation moved this from In Progress to Done Jun 23, 2020