tests/installed: Make reboot task less racy #1548

cgwalters · 2018-04-19T19:59:55Z

This took a whole lot of experimentation. I hit upon the idea
of doing a `systemctl stop sshd` to avoid the situation where we
might ssh back into the system while it's in the process of shutting
down.

Ultimately the other fix is disabling `ControlMaster`; see
for example: ansible/ansible#17935
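
For illustration only, here is a minimal Python sketch of what "disabling `ControlMaster`" means at the ssh level. It is not taken from this PR: the helper name `ssh_argv` and the host name are hypothetical. Only the `ControlMaster`, `ControlPath`, and `BatchMode` options are standard `ssh_config` settings.

```python
def ssh_argv(host, remote_cmd, multiplex=False):
    """Build an ssh command line (hypothetical helper, not from the PR).

    With multiplex=False, connection sharing is turned off so every
    invocation opens a fresh connection instead of reusing a master
    socket that may belong to a system that is shutting down.
    """
    opts = ["-o", "BatchMode=yes"]
    if not multiplex:
        # Standard ssh_config options: never act as or reuse a control master.
        opts += ["-o", "ControlMaster=no", "-o", "ControlPath=none"]
    return ["ssh", *opts, host, remote_cmd]


if __name__ == "__main__":
    # "testhost" is a placeholder host name.
    print(" ".join(ssh_argv("testhost", "systemctl reboot")))
```

The trade-off is that each command pays the full connection setup cost, which is presumably why multiplexing is normally left on.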

jlebon · 2018-04-19T20:24:44Z

@rh-atomic-bot r+ 4cd789f

rh-atomic-bot · 2018-04-19T20:25:15Z

⌛ Testing commit 4cd789f with merge 760d285...

This took a whole lot of experimentation. I hit upon the idea of doing a `systemctl stop sshd` to avoid the situation where we might ssh back into the system while it's in the process of shutting down. Ultimately the other fix is disabling `ControlMaster`; see for example: ansible/ansible#17935 Closes: #1548 Approved by: jlebon

rh-atomic-bot · 2018-04-19T21:09:45Z

💔 Test failed - status-atomicjenkins

It seems like 240 retries is just not long enough for all the non-destructive tests running in parallel to finish. Let's crank that up to 500 retries.

jlebon · 2018-04-20T12:30:42Z

Ahh OK, that's an instance of #1549.
@rh-atomic-bot r+ bafa31c

rh-atomic-bot · 2018-04-20T12:30:43Z

🙀 bafa31c is not a valid commit SHA. Please try again with 9414bea.

cgwalters · 2018-04-20T12:49:22Z

@rh-atomic-bot r+ 9414bea

rh-atomic-bot · 2018-04-20T12:49:27Z

⌛ Testing commit 9414bea with merge 0013a6b...

This took a whole lot of experimentation. I hit upon the idea of doing a `systemctl stop sshd` to avoid the situation where we might ssh back into the system while it's in the process of shutting down. Ultimately the other fix is disabling `ControlMaster`; see for example: ansible/ansible#17935 Closes: #1548 Approved by: cgwalters

It seems like 240 retries is just not long enough for all the non-destructive tests running in parallel to finish. Let's crank that up to 500 retries. Closes: #1548 Approved by: cgwalters

jlebon · 2018-04-20T13:01:19Z

Whoops! I have this filter that marks all @rh-atomic-bot notifications as read unless it's "Test timed out" or "Test failed". I tweaked it to also leave the "try again" emails unread.

rh-atomic-bot · 2018-04-20T14:16:56Z

💔 Test failed - status-atomicjenkins

jlebon · 2018-04-20T14:19:56Z

@rh-atomic-bot retry

rh-atomic-bot · 2018-04-20T14:20:01Z

⌛ Testing commit 9414bea with merge eca179b...

rh-atomic-bot · 2018-04-20T14:37:44Z

💔 Test failed - status-atomicjenkins

cgwalters · 2018-04-20T15:16:59Z

@rh-atomic-bot retry

rh-atomic-bot · 2018-04-20T15:17:07Z

⌛ Testing commit 9414bea with merge bb90ca8...

rh-atomic-bot · 2018-04-20T15:37:30Z

💔 Test failed - status-atomicjenkins

jlebon · 2018-04-20T16:08:15Z

@rh-atomic-bot retry

rh-atomic-bot · 2018-04-20T16:08:20Z

⌛ Testing commit 9414bea with merge f46ead0...

rh-atomic-bot · 2018-04-20T18:41:54Z

💔 Test failed - status-atomicjenkins

cgwalters · 2018-04-20T18:42:46Z

@rh-atomic-bot retry :cry:

rh-atomic-bot · 2018-04-20T18:42:51Z

⌛ Testing commit 1463077 with merge 81705ce...

Let's only print if the commit isn't already partial; this addresses a spam of "marking commit partial" from fsck. Closes: #1548 Approved by: cgwalters

No need to poll every second, there's going to be some latency here and we want to avoid the overhead of polling. Closes: #1548 Approved by: cgwalters

Hopefully we'll fix this soon. Closes: #1548 Approved by: cgwalters

jlebon · 2018-04-20T18:45:51Z

@rh-atomic-bot, stop being such a jerk!

rh-atomic-bot · 2018-04-20T19:04:59Z

💔 Test failed - status-atomicjenkins

cgwalters · 2018-04-20T21:37:22Z

OK, I thought I had it; it seemed fairly robust locally, but apparently no dice. I briefly looked at doing a custom Ansible module, but (again, just from initial investigation) it seems we'd really need the engine to understand this; we basically want an atomic combination of `- shell: reboot` + `- wait_for_connection`.
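
As a rough sketch of what an "atomic" reboot-and-wait could look like outside Ansible, one option is to compare the kernel boot ID before and after, so that reconnecting to the old, still-shutting-down system is not mistaken for success. This is an assumption about one possible approach, not what this PR implements. `run` is a hypothetical callable that executes a command on the remote host, returns its stdout, and raises `OSError` while the host is unreachable.

```python
import time

BOOT_ID_CMD = "cat /proc/sys/kernel/random/boot_id"  # Linux per-boot UUID


def reboot_and_wait(run, timeout=600.0, interval=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Reboot via `run` and return the new boot ID once the host is back."""
    before = run(BOOT_ID_CMD).strip()
    try:
        run("systemctl reboot")
    except OSError:
        pass  # the connection may drop before the command returns
    deadline = clock() + timeout
    while clock() < deadline:
        try:
            after = run(BOOT_ID_CMD).strip()
            if after != before:
                return after  # a different boot ID means a fresh boot
            # Same boot ID: we reached the old system mid-shutdown; keep waiting.
        except OSError:
            pass  # host is down or sshd is not up yet
        sleep(interval)
    raise TimeoutError("host did not come back with a new boot ID")
```

The timeout and interval defaults are placeholders, not values from the test suite.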

cgwalters · 2018-04-23T16:50:11Z

@rh-atomic-bot r+ 951052c

rh-atomic-bot · 2018-04-23T16:50:18Z

⌛ Testing commit 951052c with merge d168de5...

cgwalters · 2018-04-23T16:50:50Z

Going back to longer timeouts seems to be OK in some manual testing in the CI env. We'll just live with races temporarily... I'll investigate other approaches.

rh-atomic-bot · 2018-04-23T17:14:17Z

💔 Test failed - status-atomicjenkins

jlebon · 2018-04-23T17:23:37Z

Well, the PR test definitely passed, so 🍾 🎊.
Let's give it another try, it should ⚡.
@rh-atomic-bot retry

rh-atomic-bot · 2018-04-23T17:23:43Z

⚡ Test exempted: pull fully rebased and already tested.

cgwalters mentioned this pull request Apr 19, 2018

tests/installed: bump reboot timeout to 180s #1545

Closed

jlebon added the homu/approved label Apr 19, 2018

cgwalters and others added 2 commits April 19, 2018 17:24

tests/installed: increase async retries to 500

9414bea

It seems like 240 retries is just not long enough for all the non-destructive tests running in parallel to finish. Let's crank that up to 500 retries.
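
To make the retry budget concrete, here is a minimal Python sketch of a bounded polling loop. It is illustrative only and not the Ansible code touched by this commit; `poll_until` and `wait_budget` are hypothetical names, and the delay between attempts is an assumed parameter.

```python
import time


def poll_until(check, retries, delay, sleep=time.sleep):
    """Call check() up to `retries` times, sleeping `delay` seconds between tries.

    Returns the 1-based attempt number on which check() first succeeded.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        if attempt < retries:
            sleep(delay)
    raise TimeoutError(f"gave up after {retries} attempts")


def wait_budget(retries, delay):
    """Rough upper bound, in seconds, on how long the loop above can wait."""
    return retries * delay
```

The total wait is roughly the number of retries times the delay, so raising the retry count only helps as long as the job finishes inside that product.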

cgwalters force-pushed the reboot-ansible branch from 12dac6a to 9414bea Compare April 19, 2018 21:24

cgwalters mentioned this pull request Apr 20, 2018

fsck: Only print "marking commit partial" once #1544

Closed

rh-atomic-bot pushed a commit that referenced this pull request Apr 20, 2018

fsck: Only print "marking commit partial" once

f581edd

Let's only print if the commit isn't already partial; this addresses a spam of "marking commit partial" from fsck. Closes: #1548 Approved by: cgwalters
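
The "print once" idea can be sketched in a few lines of Python. This is only an illustration of the logic described in the commit message, not the actual fsck code; `mark_partial`, the set of checksums, and the log text are hypothetical.

```python
def mark_partial(partial_commits, checksum, log=print):
    """Mark `checksum` as partial, logging only on the first transition.

    Returns True if the commit was newly marked, False if it already was.
    """
    if checksum in partial_commits:
        return False  # already partial: stay quiet to avoid log spam
    partial_commits.add(checksum)
    log(f"marking commit partial: {checksum}")
    return True
```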

rh-atomic-bot pushed a commit that referenced this pull request Apr 20, 2018

tests: Lower retry timeout to 5s

7b64826

No need to poll every second, there's going to be some latency here and we want to avoid the overhead of polling. Closes: #1548 Approved by: cgwalters
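
A quick way to see the overhead argument is to count status checks for a given wait. This sketch is illustrative; `polls_needed` is a hypothetical helper and the 600-second wait in the example is an assumed figure, not a measurement from CI.

```python
import math


def polls_needed(wait_seconds, interval_seconds):
    """Number of status checks issued while waiting `wait_seconds`."""
    return math.ceil(wait_seconds / interval_seconds)


if __name__ == "__main__":
    for interval in (1, 5):
        print(interval, polls_needed(600, interval))
```

For an assumed 600-second wait, a 1-second interval issues 600 checks while a 5-second interval issues 120, at the cost of noticing completion up to 5 seconds later.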

rh-atomic-bot pushed a commit that referenced this pull request Apr 20, 2018

tests: Disable itest-pull.sh since it is too slow

81705ce

Hopefully we'll fix this soon. Closes: #1548 Approved by: cgwalters

fixup! tests/installed: Make reboot task less racy

951052c

rh-atomic-bot pushed a commit that referenced this pull request Apr 23, 2018

fsck: Only print "marking commit partial" once

cbddefe

Let's only print if the commit isn't already partial; this addresses a spam of "marking commit partial" from fsck. Closes: #1548 Approved by: cgwalters

rh-atomic-bot pushed a commit that referenced this pull request Apr 23, 2018

tests: Lower retry timeout to 5s

d9001fe

No need to poll every second, there's going to be some latency here and we want to avoid the overhead of polling. Closes: #1548 Approved by: cgwalters

rh-atomic-bot pushed a commit that referenced this pull request Apr 23, 2018

tests: Disable itest-pull.sh since it is too slow

d168de5

Hopefully we'll fix this soon. Closes: #1548 Approved by: cgwalters

rh-atomic-bot closed this in e5f6c9d Apr 23, 2018

rh-atomic-bot pushed a commit that referenced this pull request Apr 23, 2018

fsck: Only print "marking commit partial" once

41b97e9

Let's only print if the commit isn't already partial; this addresses a spam of "marking commit partial" from fsck. Closes: #1548 Approved by: cgwalters

rh-atomic-bot pushed a commit that referenced this pull request Apr 23, 2018

tests: Lower retry timeout to 5s

76f3e60

No need to poll every second, there's going to be some latency here and we want to avoid the overhead of polling. Closes: #1548 Approved by: cgwalters

rh-atomic-bot pushed a commit that referenced this pull request Apr 23, 2018

tests: Disable itest-pull.sh since it is too slow

d428272

Hopefully we'll fix this soon. Closes: #1548 Approved by: cgwalters

FatmanUK mentioned this pull request Mar 8, 2024

Failed to set execute bit on remote files FatmanUK/k3s_playground#146

Open
