tests/installed: Make reboot task less racy #1548
This took a whole lot of experimentation. I hit upon the idea of doing a `systemctl stop sshd` to avoid the situation where we might ssh back into the system while it's in the process of shutting down. Ultimately the other fix is disabling `ControlMaster`; see for example: ansible/ansible#17935 Closes: #1548 Approved by: jlebon
💔 Test failed - status-atomicjenkins |
This took a whole lot of experimentation. I hit upon the idea of doing a `systemctl stop sshd` to avoid the situation where we might ssh back into the system while it's in the process of shutting down. Ultimately the other fix is disabling `ControlMaster`; see for example: ansible/ansible#17935
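As a rough illustration of the `ControlMaster` point, the sketch below builds an `ssh` command line that opts out of connection multiplexing, so a probe issued after the reboot cannot reuse a master connection left over from before the shutdown. `ControlMaster` and `ControlPath` are real OpenSSH client options; the helper name `build_ssh_command`, the host name, and the probe command are assumptions made for this example and are not taken from the PR.

```python
import shlex


def build_ssh_command(host, remote_command):
    """Return an argv list for ssh with connection sharing turned off.

    Hypothetical helper, for illustration only.
    """
    return [
        "ssh",
        # Never create or reuse a shared master connection.
        "-o", "ControlMaster=no",
        # "none" disables the control socket entirely.
        "-o", "ControlPath=none",
        host,
        remote_command,
    ]


if __name__ == "__main__":
    # Assumed host name and probe command; substitute your own.
    argv = build_ssh_command("testnode.example", "systemctl is-system-running")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

With multiplexing off, every probe opens a fresh TCP connection, so a successful login is better evidence that the machine has really come back up.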
It seems like 240 retries is just not long enough for all the non-destructive tests running in parallel to finish. Let's crank that up to 500 retries.
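To make the retry budget concrete, here is a minimal polling sketch. Only the figures 240 and 500 come from the commit message above; the function names, the delay between attempts, and the simulated condition are assumptions, and the real task is presumably expressed in the test playbooks rather than in Python.

```python
import time


def wait_until(check, retries=500, delay=1.0, sleep=time.sleep):
    """Call check() up to `retries` times, sleeping `delay` seconds between tries.

    Returns the 1-based attempt number on which check() first returned True.
    Raises TimeoutError if every attempt fails.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        if attempt < retries:
            sleep(delay)
    raise TimeoutError(f"condition not met after {retries} attempts")


def make_flaky_check(ready_on):
    """Build a check() that starts returning True on its `ready_on`-th call."""
    calls = {"n": 0}

    def check():
        calls["n"] += 1
        return calls["n"] >= ready_on

    return check


if __name__ == "__main__":
    no_sleep = lambda _seconds: None  # skip real waiting in the demo
    # A condition that becomes true on the 300th poll (an assumed figure):
    # a 240-retry budget gives up, a 500-retry budget succeeds.
    try:
        wait_until(make_flaky_check(300), retries=240, sleep=no_sleep)
    except TimeoutError as exc:
        print("240 retries:", exc)
    attempt = wait_until(make_flaky_check(300), retries=500, sleep=no_sleep)
    print("500 retries: ready on attempt", attempt)
```

The total wait is roughly `retries * delay`, so raising the retry count and lengthening the delay are two ways of buying the same extra time.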
12dac6a to 9414bea
Ahh OK, that's an instance of #1549. |
🙀 |
This took a whole lot of experimentation. I hit upon the idea of doing a `systemctl stop sshd` to avoid the situation where we might ssh back into the system while it's in the process of shutting down. Ultimately the other fix is disabling `ControlMaster`; see for example: ansible/ansible#17935 Closes: #1548 Approved by: cgwalters
It seems like 240 retries is just not long enough for all the non-destructive tests running in parallel to finish. Let's crank that up to 500 retries. Closes: #1548 Approved by: cgwalters
Whoops! I have this filter that marks all @rh-atomic-bot notifications as read unless it's "Test timed out" or "Test failed". I tweaked it to also leave the "try again" emails unread. |
💔 Test failed - status-atomicjenkins |
@rh-atomic-bot retry |
💔 Test failed - status-atomicjenkins |
@rh-atomic-bot retry |
💔 Test failed - status-atomicjenkins |
@rh-atomic-bot retry |
💔 Test failed - status-atomicjenkins |
@rh-atomic-bot retry :cry: |
Let's only print if the commit isn't already partial; this addresses a spam of "marking commit partial" from fsck. Closes: #1548 Approved by: cgwalters
No need to poll every second, there's going to be some latency here and we want to avoid the overhead of polling. Closes: #1548 Approved by: cgwalters
Hopefully we'll fix this soon. Closes: #1548 Approved by: cgwalters
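The "marking commit partial" change above is an instance of logging only on a state transition. The sketch below shows the general pattern; every name in it is hypothetical and none is taken from the project's code.

```python
def mark_commit_partial(partial_commits, checksum, log=print):
    """Record `checksum` as partial and log only the first time.

    `partial_commits` is a set of checksums already marked partial.
    Returns True if the state changed, False if it was already partial.
    """
    if checksum in partial_commits:
        # Already partial: nothing to do and nothing to print.
        return False
    partial_commits.add(checksum)
    log(f"Marking commit as partial: {checksum}")
    return True


if __name__ == "__main__":
    seen = set()
    mark_commit_partial(seen, "abc123")  # logs once
    mark_commit_partial(seen, "abc123")  # silent: already partial
```

Guarding on the current state keeps repeated fsck passes over the same commit from emitting the same message again.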
@rh-atomic-bot, stop being such a jerk! |
💔 Test failed - status-atomicjenkins |
OK, I thought I had it, it seemed fairly robust locally but apparently no dice. I briefly looked at doing a custom Ansible module, but (again just from initial investigation) it seems we'd really need the engine to understand this; we basically want an atomic combination of |
Going back to longer timeouts seems to be OK in some manual testing in the CI env. We'll just live with races temporarily...I'll investigate other approaches. |
💔 Test failed - status-atomicjenkins |
Well, the PR test definitely passed, so 🍾 🎊. |
⚡ Test exempted: pull fully rebased and already tested. |