Bug 2046181: baremetal: wait for image-customization to come up #5579
@dtantsur: This pull request references Bugzilla bug 2046181, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla (yporagpa@redhat.com), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test e2e-metal-ipi
This is a fix worth making, but I don't think it addresses the bug: the end user will still get an Ironic-related error from the installer.
I guess we have a whole different story in that the installer doesn't care if bootstrap fails. I don't think we can have a better fix, though, unless we rework how Ironic is started.
if [ $attempt -eq 15 ]; then
    echo The image-customization controller did not come up in 30 seconds
    podman logs image-customization
    exit 1
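The excerpt above gives up on the 15th attempt and reports a 30-second budget, which suggests a probe repeated every 2 seconds. A minimal sketch of such a bounded retry loop follows. The helper name `wait_until` and the `PROBE_URL` variable in the usage comment are hypothetical and are not taken from the PR.

```shell
# Hypothetical helper: run a probe command until it succeeds or the
# attempt budget is used up.
# Usage: wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Discard the probe's output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Possible use in a start-up script (PROBE_URL is an assumed placeholder):
#   if ! wait_until 15 2 curl --silent "$PROBE_URL"; then
#       echo "The image-customization controller did not come up in 30 seconds"
#       podman logs image-customization
#       exit 1
#   fi
```

Passing the probe as arguments keeps the loop independent of how availability is checked, so the same helper works with `curl`, `podman`, or any other command.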
This will just restart the script I think? https://github.com/openshift/installer/blob/master/data/data/bootstrap/baremetal/systemd/units/ironic.service#L12
We really need a way to stop installation.
Using https://github.com/openshift/installer/blob/master/docs/dev/bootstrap_services.md#bootstrap-service-records looks like it would really help for debugging, but I'm not clear on whether anything in the installer will actually bail out once a failure is recorded. @staebler what is the recommended way of bailing out from a non-recoverable error?
We can tell systemd to treat some exit codes as non-restartable with RestartPreventExitStatus=42 and use exit 42 here. Will it work as a quick workaround?
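A sketch of the script side of that suggestion is shown here. It assumes the unit file gains the `RestartPreventExitStatus=42` line proposed above. The helper name `die_unrecoverable` and the variable `UNRECOVERABLE_EXIT_CODE` are hypothetical.

```shell
# Assumed convention: 42 marks a failure that systemd should not retry.
# It only has that effect if the unit file also carries:
#   [Service]
#   RestartPreventExitStatus=42
UNRECOVERABLE_EXIT_CODE=42

# Hypothetical helper: report the problem on stderr and stop for good.
die_unrecoverable() {
    echo "unrecoverable error: $*" >&2
    exit "$UNRECOVERABLE_EXIT_CODE"
}

# Possible use:
#   podman logs image-customization
#   die_unrecoverable "image-customization did not come up"
```

Any other non-zero exit status would still be subject to the unit's normal restart policy, so only failures that cannot succeed on retry should go through this helper.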
There is no current way for the bootstrap machine to inform the installer that it should bail out of waiting for the installation to complete.
Thank you! Then, I think, aborting this script is the most obvious thing we can do.
@@ -204,6 +204,19 @@ podman run -d --net host --privileged --name image-customization \
    --secret pull-secret,mode=400 \
    ${CUSTOMIZATION_IMAGE}

# We're not interested in the exit code, just that the server is available
It would actually be sufficient to just check if the container has exited after a short interval (perhaps after the sleep 10 on line 233).
Are we sure 10 seconds are always enough? I guess it's not going to take too long...
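The alternative discussed in this thread, checking whether the container has already exited after a short wait, could look roughly like the sketch below. It uses `podman inspect` with a Go template to read the container state. The helper name `container_is_running` is hypothetical, and the 10-second wait in the usage comment is the interval mentioned above, not a verified value.

```shell
# Hypothetical helper: succeed only if the named container is currently
# in the "running" state according to podman.
container_is_running() {
    # An inspect failure (for example, no such container) counts as not running.
    state=$(podman inspect --format '{{.State.Status}}' "$1" 2>/dev/null) || return 1
    [ "$state" = "running" ]
}

# Possible use after the existing sleep:
#   sleep 10
#   if ! container_is_running image-customization; then
#       podman logs image-customization
#       exit 1
#   fi
```

This check only catches a container that crashes within the wait window, which is the concern raised in the reply about whether 10 seconds is always enough.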
d11ff03 to ba6effe
/retest
/test e2e-metal-ipi
/bugzilla refresh The requirements for Bugzilla bugs have changed (BZs linked to PRs on master branch need to target OCP 4.11), recalculating validity.
@openshift-bot: This pull request references Bugzilla bug 2046181, which is invalid:
/bugzilla refresh
@dtantsur: This pull request references Bugzilla bug 2046181, which is valid. 3 validation(s) were run on this bug
e2e-alibaba is permanently failing; I think we should override it for the time being.
Yes, e2e-alibaba is permanently failing. /approve
/bugzilla refresh
@wking: This pull request references Bugzilla bug 2046181, which is valid. 3 validation(s) were run on this bug
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ardaguclu, elfosardo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
data/data/bootstrap/baremetal/files/usr/local/bin/startironic.sh.template
It crashes if the provided network configuration is invalid. In this case inspection currently fails with a generic message.
/lgtm
/retest-required Please review the full test history for this PR and help us cut down flakes.
8 similar comments
@dtantsur: The following tests failed, say
@dtantsur: All pull requests linked via external trackers have merged: Bugzilla bug 2046181 has been moved to the MODIFIED state. In response to this: