
race condition: deployment logic & tagging sometimes makes MRO fail #382

Closed
clemensutschig opened this issue Jun 19, 2020 · 3 comments · Fixed by #385
Comments

@clemensutschig
Member

clemensutschig commented Jun 19, 2020

latest master.

  1. Provisioned a new nodejs quickstarter and waited for the initial build to finish
  2. Provisioned a new release manager and configured nodejs as a repo in metadata.yml
  3. Ran the release manager with default params - WIP

Error below (it only happens sometimes). It seems to be a race condition (the new image triggers a new deployment, setting latest triggers a new deployment, and then the explicit redeploy comes along).

The default nodejs quickstarter contains an 'image change' trigger ...
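
That trigger can be seen on the DC itself, for example (an illustrative check using the same namespace and DC names as in the log below, not part of the original report):

```sh
# Show the triggers configured on the DeploymentConfig; the nodejs quickstarter
# ships with an ImageChange trigger on nodejs:latest, which starts a rollout
# whenever the "latest" tag moves.
oc -n hello-dev get dc/nodejs -o jsonpath='{.spec.triggers}'
```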

[Deploy to OpenShift-nodejs] **** STARTING stage 'Deploy to OpenShift' for component 'nodejs' branch 'master' ****
[Pipeline] stage
[Pipeline] { (Deploy to OpenShift)
[Pipeline] sh (Check existance of DeploymentConfig nodejs)
oc -n hello-dev get DeploymentConfig/nodejs
[Pipeline] sh (Get latest version of dc/nodejs)
oc -n hello-dev get dc/nodejs -o 'jsonpath={.status.latestVersion}'
[Pipeline] fileExists
[Pipeline] sh (Get container images for deploymentconfigs (nodejs))
oc -n hello-dev get dc nodejs -o 'jsonpath={.spec.template.spec.containers[*].image}'
[Pipeline] sh (Check existance of ImageStream nodejs)
oc -n hello-dev get ImageStream/nodejs
[Pipeline] sh (Set tag latest on is/nodejs)
oc -n hello-dev tag nodejs:2-619469ea nodejs:latest
Tag nodejs:latest set to nodejs@sha256:073421ece7d9357989c8f273e287b0ffcf100e4c9d438894ac6d2b9ef90ec904.
[Pipeline] sh (Get latest version of dc/nodejs)
oc -n hello-dev get dc/nodejs -o 'jsonpath={.status.latestVersion}'
[Pipeline] sh (Rollout latest version of dc/nodejs)
oc -n hello-dev rollout latest dc/nodejs
**error: #3 is already in progress (New).**
[Pipeline] }
[Pipeline] // stage
[Pipeline] [Deploy to OpenShift-nodejs] **** ENDED stage 'Deploy to OpenShift' for component 'nodejs' branch 'master' **** (took 2458 ms)
[Pipeline] }
[Pipeline] // wrap
[Pipeline] stage
[Pipeline] { (odsPipeline error)
[Pipeline] WARN: [nodejs] ***** Finished ODS Pipeline for nodejs (with error) ***** (took 155437 ms)
@clemensutschig clemensutschig added the bug Something isn't working label Jun 19, 2020
@clemensutschig clemensutschig added this to To Do in OpenDevStack 3.0 via automation Jun 19, 2020
@clemensutschig clemensutschig changed the title from "master / deployment logic & tagging sometimes fails" to "master / race condition: deployment logic & tagging sometimes makes MRO fail" Jun 19, 2020
@michaelsauter
Member

OK. I guess the best way forward here is to avoid failing and instead, if a rollout has already been started since the last check, just watch that rollout. I'll need to find out the safest way to determine that the rollout actually started automatically between the check for the latest version and the rollout command.
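
A minimal sketch of that check with plain oc (names taken from the log above; this is not the shared library code):

```sh
# Read the DC's latestVersion before and after the tag operation. If it moved,
# an image trigger has already started the rollout and we should only watch it.
before=$(oc -n hello-dev get dc/nodejs -o 'jsonpath={.status.latestVersion}')
oc -n hello-dev tag nodejs:2-619469ea nodejs:latest
after=$(oc -n hello-dev get dc/nodejs -o 'jsonpath={.status.latestVersion}')
if [ "$after" != "$before" ]; then
  echo "Rollout #$after was already started by the image trigger - just watch it."
fi
```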

@clemensutschig clemensutschig changed the title from "master / race condition: deployment logic & tagging sometimes makes MRO fail" to "race condition: deployment logic & tagging sometimes makes MRO fail" Jun 21, 2020
@clemensutschig
Member Author

@michaelsauter I gave this some thought. I guess the only way (as we trigger the tagging) is to remove the image trigger in the DC, so we really own the deployment.

@michaelsauter
Member

I also thought about it, but I came to a different conclusion 😄

Removing the trigger is complicated, I think, because many existing projects will have it, and making them update will be hard since I find the UI/UX around image triggers quite surprising. I'm not a fan of image triggers though, and it also seems like OpenShift is favouring Kubernetes-native Deployment resources going forward, which do not have an image trigger feature at all. So we could think about removing the trigger for new components?
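
For illustration, dropping the trigger with plain oc could look like this (a hypothetical sketch; the image stream tag nodejs:latest is an assumption, and this is not something the shared library does today):

```sh
# Remove the ImageChange trigger so that only an explicit "oc rollout latest"
# starts a deployment.
oc -n hello-dev set triggers dc/nodejs --remove --from-image=nodejs:latest

# Verify which triggers remain on the DC.
oc -n hello-dev set triggers dc/nodejs
```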

My suggestion to fix this particular race condition:
When the explicit rollout fails, we check again for the latest version of the DC. This time, the version should really be updated, so we just watch the rollout. If the version is still not updated, we just start the rollout again - it is of course likely to fail, but then we cannot do anything about it. I think this solution is safe and should fix the race condition, so there is nothing to lose.
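
Roughly, the suggested fallback could look like this with plain oc (a sketch only; the actual fix is the change in the shared library, see #385):

```sh
# Remember the version we saw before attempting the explicit rollout.
version=$(oc -n hello-dev get dc/nodejs -o 'jsonpath={.status.latestVersion}')

if ! oc -n hello-dev rollout latest dc/nodejs; then
  # The explicit rollout failed - check whether an image trigger got there first.
  new_version=$(oc -n hello-dev get dc/nodejs -o 'jsonpath={.status.latestVersion}')
  if [ "$new_version" = "$version" ]; then
    # No new version, so it was not the race condition - retry once and let
    # any further error surface.
    oc -n hello-dev rollout latest dc/nodejs
  fi
  # Otherwise a rollout is already in progress; fall through and watch it.
fi

# In both cases, watch the rollout until it completes.
oc -n hello-dev rollout status dc/nodejs --watch
```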

@michaelsauter michaelsauter moved this from To Do to In Progress in OpenDevStack 3.0 Jun 22, 2020
michaelsauter added a commit to BIX-Digital/ods-jenkins-shared-library that referenced this issue Jun 22, 2020
There seems to be a race condition, in which the version is updated by
an image trigger just between asking for the version and starting the
rollout.

Fixes opendevstack#382.
OpenDevStack 3.0 automation moved this from In Progress to Done Jun 23, 2020