[DO NOT MERGE] Swarm tests debug #38080
Since commit 17173ef, checkSwarmLockedToUnlocked() no longer requires its third argument, so remove it. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Using the MNT_FORCE flag does not make sense for nsfs. Using MNT_DETACH, though, might help.
2. When -check.vv is added to TESTFLAGS, there are a lot of messages like this one:
> unmount of /tmp/dxr/d847fd103a4ba/netns failed: invalid argument
and some like
> unmount of /tmp/dxr/dd245af642d94/netns failed: no such file or directory
The first one means the directory is not a mount point, the second one means it's gone. Ignore both of these.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
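A minimal sketch of the unmount logic described in this commit message, assuming Linux and the standard syscall package. The helper names (ignorableUnmountErr, cleanupNetNS) and the path in main are hypothetical and not taken from the actual patch.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// Errors that only mean there was nothing left to unmount.
var (
	errNotMountPoint = syscall.EINVAL // "invalid argument": not a mount point
	errGone          = syscall.ENOENT // "no such file or directory": already removed
)

// ignorableUnmountErr reports whether err is one of the two harmless cases.
func ignorableUnmountErr(err error) bool {
	return errors.Is(err, errNotMountPoint) || errors.Is(err, errGone)
}

// cleanupNetNS lazily detaches the mount at path (MNT_DETACH instead of
// MNT_FORCE) and swallows the two harmless errors.
func cleanupNetNS(path string) error {
	err := syscall.Unmount(path, syscall.MNT_DETACH)
	if err == nil || ignorableUnmountErr(err) {
		return nil
	}
	return fmt.Errorf("unmount of %s failed: %w", path, err)
}

func main() {
	// Hypothetical path; the result depends on the kernel and on privileges.
	fmt.Println(cleanupNetNS("/tmp/no-such-dir/netns"))
}
```

Any other error (EBUSY, EPERM, and so on) is still reported, so real cleanup problems stay visible in the verbose test log.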
A timer is leaking on every daemon start and stop. Probably nothing major, but given the amount of daemon starts/stops during tests, it's better to be accurate about it. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
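The leaking code is not shown in this conversation, so the following is only a generic sketch of the usual pattern, assuming the timer guards a wait with a timeout. The names waitWithTimeout, errTimeout and defaultWait are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errTimeout = errors.New("timed out waiting for daemon")

const defaultWait = 50 * time.Millisecond

// waitWithTimeout waits for a result on done, giving up after timeout.
// The timer is stopped on every return path, so it does not linger
// until it fires when the result arrives early.
func waitWithTimeout(done <-chan error, timeout time.Duration) error {
	t := time.NewTimer(timeout)
	defer t.Stop()

	select {
	case err := <-done:
		return err
	case <-t.C:
		return errTimeout
	}
}

func main() {
	done := make(chan error, 1)
	done <- nil // simulate a daemon that has already exited cleanly
	fmt.Println(waitWithTimeout(done, defaultWait))
}
```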
OK the failure in
happens (occasionally) because d1 (which is the leader) loses both d2 and d3, and thus loses quorum and steps down from leader to follower:
What exactly causes that is not yet clear to me...
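To make the quorum arithmetic concrete, here is a small sketch, assuming d1, d2 and d3 are all managers and that quorum is a strict majority of managers. The function names are illustrative and not part of swarmkit.

```go
package main

import "fmt"

// quorum returns the number of managers needed for a strict majority.
func quorum(managers int) int {
	return managers/2 + 1
}

// hasQuorum reports whether a leader that can still reach reachablePeers
// other managers (counting itself as one more) holds a majority.
func hasQuorum(managers, reachablePeers int) bool {
	return reachablePeers+1 >= quorum(managers)
}

func main() {
	fmt.Println(hasQuorum(3, 2)) // true: d1 reaches d2 and d3
	fmt.Println(hasQuorum(3, 1)) // true: d1 reaches one peer
	fmt.Println(hasQuorum(3, 0)) // false: d1 lost both d2 and d3
}
```

With three managers the leader needs at least one reachable peer, so losing both at once is enough to force it to step down.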
Found a possible cause. Testing a fix (which makes sense for other tests, too, and will be applied if it proves usable).
Out of 5 runs, got a TestAPISwarmLeaderElection failure on z (I haven't patched this test yet). I guess it makes sense to patch all the swarm tests.
Patched all the tests, reset the counter, running repeated CI again. Runs 1-5, no failures. Run 6, failure on experimental in
Runs 7-9: no failures. Run 10/60 [▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] Failure in
This is repeated 6 times in different tests, with minor variations. Let's factor it out for clarity. While at it, simplify the code: instead of the more complex parsing of "docker swarm init|update --autolock" output (1) and checking that the key is also present in "docker swarm unlock-key" output (2), get the key from (2) and check that it is present in (1). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
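The simplified check can be sketched as follows. This is an illustration only: keyInOutput is a hypothetical helper, the sample key is made up, and the sketch assumes output (2) comes from "docker swarm unlock-key -q", which prints only the key.

```go
package main

import (
	"fmt"
	"strings"
)

// keyInOutput takes the key from the "docker swarm unlock-key -q" output (2)
// and reports whether it also appears in the
// "docker swarm init|update --autolock" output (1).
func keyInOutput(autolockOut, unlockKeyOut string) (string, bool) {
	key := strings.TrimSpace(unlockKeyOut)
	if key == "" {
		return "", false
	}
	return key, strings.Contains(autolockOut, key)
}

func main() {
	// Made-up outputs for illustration.
	autolockOut := "Swarm updated.\nTo unlock a swarm manager after it restarts, run:\n\n    SWMKEY-1-example\n"
	unlockKeyOut := "SWMKEY-1-example\n"

	key, ok := keyInOutput(autolockOut, unlockKeyOut)
	fmt.Println(key, ok)
}
```

Going from (2) to (1) avoids parsing the multi-line human-readable message: the quiet output is a single token, and a substring check is enough.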
.. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Run 18/60 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] Failure on power, 00:11:55.420 |
Run 19/60 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] Two failures, one on power in 00:19:42.491 and a new failure on z, 00:35:47.832
...... Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Some previous findings on
Run 20 -- no failures
Run 21/60 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] Failure on power in 00:33:55.218
When starting docker daemons for swarm testing, we disable iptables and use lo for communication (in order to avoid network conflicts). The problem is that these options are lost on restart, which can lead to all sorts of network conflicts and thus connectivity issues between swarm nodes. Fix this.

29/60 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░]

Failures so far:
* TestSwarmLockUnlockCluster ▓▓
* TestAPISwarmLeaderElection ▓▓▓▓
* TestSwarmClusterRotateUnlockKey ▓▓▓
* TestSwarmPublishAdd ▓

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
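One way to keep such options across restarts is to have the test daemon wrapper remember the arguments it was started with. The sketch below is hypothetical: the daemon type and its methods are not the real test helper API, and the two flag values are assumed to be the ones the swarm tests pass.

```go
package main

import "fmt"

// daemon is a stand-in for a test daemon wrapper; it only tracks the
// command-line arguments instead of launching a real process.
type daemon struct {
	startArgs []string
}

// Start records the arguments so a later Restart can reuse them.
func (d *daemon) Start(args ...string) []string {
	d.startArgs = append([]string(nil), args...)
	return d.startArgs
}

// Restart reuses the recorded start arguments and appends any extras,
// so options such as the iptables setting are not silently dropped.
func (d *daemon) Restart(extra ...string) []string {
	return append(append([]string(nil), d.startArgs...), extra...)
}

func main() {
	d := &daemon{}
	// Assumed swarm-test flags: disable iptables, advertise on lo.
	d.Start("--iptables=false", "--swarm-default-advertise-addr=lo")
	fmt.Println(d.Restart())
}
```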
@kolyshkin given the "status" and the time without activity on this PR, I'm gonna close it 👼
Right, thanks. The good part of it is already merged in #38127, the rest is just [futile] attempts to figure out what's going on with flaky swarm tests.
Perhaps we could use
Just playing with TestSwarmClusterRotateUnlockKey a bit