Fix rollout race condition and display events in Jenkins log #385

Merged

michaelsauter commented Jun 22, 2020

Fixes #382.

If starting a rollout fails, we check whether the version has advanced in the meantime. If it has, we do not start a rollout ourselves.
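The check described above can be sketched as follows. This is a minimal, language-neutral illustration, not the code from this PR: `start_rollout_guarded`, `get_version` and `start_rollout` are hypothetical names, and the two callables stand in for whatever actually reads the version and triggers the rollout.

```python
from typing import Callable


def start_rollout_guarded(
    get_version: Callable[[], int],
    start_rollout: Callable[[], None],
) -> bool:
    """Start a rollout unless a concurrent one got there first.

    Returns True if this call started the rollout, and False if starting
    failed but the version had already advanced (for example because an
    image trigger started a rollout in between).
    """
    # Remember the version we saw before attempting the rollout.
    version_before = get_version()
    try:
        start_rollout()
        return True
    except Exception:
        # Starting failed. If the version moved on in the meantime, assume
        # another actor already started a rollout and do not start one here.
        if get_version() > version_before:
            return False
        # The version did not change, so this is a genuine error: propagate it.
        raise
```

Re-raising when the version is unchanged keeps real failures visible; only the specific "someone else already advanced the version" case is swallowed.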

Further, this PR introduces a new feature that I believe will help users a lot when debugging issues. When a deployment goes wrong, the cause can usually be found either in the events of the replication controller (some error prevents pod creation) or in the events of the pod (some error prevents the pod from becoming active). Unfortunately, many people don't know where to look, or don't have the rights to do so. To improve this, we now query for these events and display them in the Jenkins log whenever something prevents a successful rollout. It may look like this:

WARN: Error: hudson.AbortException: Deployment timed out. Observed related event messages:
=== Events from ReplicationController 'foo-20' ===
Created pod: foo-20-b6wmt
=== Events from Pod 'foo-20-b6wmt' ===
Successfully assigned myproject/foo-20-b6wmt to ip-172-31-52-140.eu-east-1.compute.internal
pulling image "172.30.21.193:5000/myproject/foo@sha256:ca88cd1ff2572cee6b74f7efdc5afc6f6d649a1e1616394a80da18dbe9601f4c"
Successfully pulled image "172.30.21.193:5000/myproject/foo@sha256:ca88cd1ff2572cee6b74f7efdc5afc6f6d649a1e1616394a80da18dbe9601f4c"
Created container
Started container
Readiness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:235: starting container process caused "exec: \"exit 1\": executable file not found in $PATH"
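A report in the shape shown above could be assembled roughly like this. It is a minimal sketch that assumes the event messages have already been fetched and grouped per object (replication controller first, then its pods). How the events are actually queried is not shown in this PR, and `format_event_report` and `EventGroup` are hypothetical names.

```python
from typing import List, Tuple

# (kind, name, messages) for each object whose events were looked up.
EventGroup = Tuple[str, str, List[str]]


def format_event_report(reason: str, groups: List[EventGroup]) -> str:
    """Render the failure reason followed by the event messages per object."""
    lines = [f"{reason} Observed related event messages:"]
    for kind, name, messages in groups:
        # One header per object, followed by its event messages in order.
        lines.append(f"=== Events from {kind} '{name}' ===")
        lines.extend(messages)
    return "\n".join(lines)


report = format_event_report(
    "Deployment timed out.",
    [
        ("ReplicationController", "foo-20", ["Created pod: foo-20-b6wmt"]),
        ("Pod", "foo-20-b6wmt", ["Created container", "Started container"]),
    ],
)
print(report)
```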

I have also reduced the code duplication between the component and orchestration pipelines. I have done a basic test of the orchestration pipeline, but I plan to test it more extensively anyway, since I am working on #367, which requires a fair amount of refactoring.

There seems to be a race condition in which the version is updated by
an image trigger right between asking for the version and starting the
rollout.

Fixes opendevstack#382.
michaelsauter added the enhancement label (New feature or request) Jun 22, 2020
michaelsauter self-assigned this Jun 22, 2020
michaelsauter merged commit ec8adab into opendevstack:master Jun 23, 2020
michaelsauter added this to Done in OpenDevStack 3.0 via automation Jun 23, 2020
michaelsauter deleted the fix/rollout-race-condition branch June 23, 2020 07:07
Development

Successfully merging this pull request may close these issues.

race condition: deployment logic & tagging sometimes makes MRO fail