kola/tests/tang: Ignore interfaces with no IPs #509

Merged: 4 commits merged on Mar 15, 2024


jepio
Member

@jepio jepio commented Mar 15, 2024

We use Equinix Metal machines in our CI that have bonded network interfaces, which means we end up with NICs that have no IP addresses. The tang setup code needs to handle this. We will also want to rework the code to provide tang in the same way as we do for etcd.

@jepio jepio requested a review from a team March 15, 2024 08:43
We use Equinix Metal machines in our CI that have bonded network interfaces,
which means we end up with nics with no ips. The tang setup code needs to
handle this. We will also want to rework the code to provide tang in the same
way as we do for etcd.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
log.Fatalf executes os.Exit. We don't want the whole kola process to exit if
the port is busy, only the test case should fail.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
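
As a rough illustration of the log.Fatalf change described in the commit above, the sketch below returns the bind error to the caller instead of exiting the process. `listenTang` is a hypothetical helper name, not a function from kola, and the actual change in this PR may look different.

```go
package main

import (
	"fmt"
	"net"
)

// listenTang is a hypothetical helper. It tries to bind addr and returns
// the error to the caller instead of calling log.Fatalf, which would run
// os.Exit and take the whole process down with it.
func listenTang(addr string) (net.Listener, error) {
	l, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, fmt.Errorf("listening on %s: %w", addr, err)
	}
	return l, nil
}
```

A test case calling such a helper can then report the error through its own failure mechanism, so that only that test case fails when the port is busy.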
To prevent conflicts between test cases and jobs.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
These don't work in nspawn/docker and spam the console with "not permitted"
errors.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
@pothos
Member

pothos commented Mar 15, 2024

I think we can also remove the systemd.mask=systemd-cryptsetup@rootencrypted.service parameters.

The run on first boot seems to fail due to a race but the system boots ok, and the second boot has no error messages:

Mar 15 11:34:08 localhost systemd[1]: Starting systemd-cryptsetup@rootencrypted.service - Cryptography Setup for rootencrypted...
Mar 15 11:34:15 localhost systemd-cryptsetup[1675]: Set cipher aes, mode xts-plain64, key size 512 bits for device /dev/disk/by-uuid/12345678-9abc-def0-1234-56789abcdef0.
Mar 15 11:34:17 localhost systemd[1]: systemd-cryptsetup@rootencrypted.service: Main process exited, code=killed, status=9/KILL
Mar 15 11:34:17 localhost systemd[1]: systemd-cryptsetup@rootencrypted.service: Failed with result 'signal'.
Mar 15 11:34:17 localhost systemd[1]: Failed to start systemd-cryptsetup@rootencrypted.service - Cryptography Setup for rootencrypted.
Mar 15 11:34:18 localhost systemd[1]: Starting systemd-cryptsetup@rootencrypted.service - Cryptography Setup for rootencrypted...
Mar 15 11:34:18 localhost systemd-cryptsetup[2499]: Volume rootencrypted already active.
Mar 15 11:34:18 localhost systemd[1]: Finished systemd-cryptsetup@rootencrypted.service - Cryptography Setup for rootencrypted.
Mar 15 11:35:11 localhost systemd[1]: Stopping systemd-cryptsetup@rootencrypted.service - Cryptography Setup for rootencrypted...
Mar 15 11:35:11 localhost systemd-cryptsetup[3354]: Device rootencrypted is still in use.
Mar 15 11:35:11 localhost systemd-cryptsetup[3354]: Failed to deactivate 'rootencrypted': Device or resource busy
Mar 15 11:35:11 localhost systemd[1]: systemd-cryptsetup@rootencrypted.service: Control process exited, code=exited, status=1/FAILURE
Mar 15 11:35:11 localhost systemd[1]: systemd-cryptsetup@rootencrypted.service: Failed with result 'exit-code'.
Mar 15 11:35:11 localhost systemd[1]: Stopped systemd-cryptsetup@rootencrypted.service - Cryptography Setup for rootencrypted.
-- Boot 30d233489ce04d65802ce0704942d135 --
Mar 15 11:35:16 localhost systemd[1]: Starting systemd-cryptsetup@rootencrypted.service - Cryptography Setup for rootencrypted...
Mar 15 11:35:17 localhost systemd-cryptsetup[557]: Set cipher aes, mode xts-plain64, key size 512 bits for device /dev/disk/by-uuid/12345678-9abc-def0-1234-56789abcdef0.
Mar 15 11:35:18 localhost systemd[1]: Finished systemd-cryptsetup@rootencrypted.service - Cryptography Setup for rootencrypted.

@jepio jepio merged commit ac5c4b5 into flatcar-master Mar 15, 2024
2 checks passed
@jepio jepio deleted the fix-tang-test branch March 15, 2024 13:45
@pothos
Member

pothos commented Mar 15, 2024

For arm64 I prepared a PR: flatcar/scripts#1755

@simoncampion

Thanks for fixing this, @jepio !

Regarding the failing service: I should've run the tests on arm64 locally, then I would have caught the issue before the merge. Sorry about that, I completely forgot about arm64.

I think the issue is that we need the rd.luks.name=... parameter on the second boot but not on the first boot. We need it on the second (and all subsequent) boots because it triggers disk unlocking in the initrd for the root disk. But, as you say, it looks like it creates a race condition on the first boot, with the systemd-cryptsetup unlocking (sometimes?) failing. There is no reason why systemd-cryptsetup should be running during first boot, since Ignition takes care of opening the newly created LUKS device.

We could fix this issue by changing the Ignition config in the tests so that the rd.luks.name=... parameter is passed only on the second boot. I'm not sure what the best way to achieve this would be; maybe adding a systemd unit that modifies the grub.cfg. Alternatively, we can remove the rd.luks.name=... parameters and implement a generator in bootengine that doesn't do anything on first boot, but on subsequent boots runs systemd-cryptsetup-generator to generate a systemd-cryptsetup unit for unlocking the root disk. We had discussed this as a follow-up improvement, but it might be more urgent now than initially thought. The generator solution would probably be a lot cleaner and also has other advantages. I'll try to implement and test it soon (see this draft PR).
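
The decision such a generator would make can be sketched as follows. This is only an illustration of the logic in Go, not the bootengine implementation. The `flatcar.first_boot` kernel command-line token used as the first-boot marker is an assumption, and both function names are hypothetical.

```go
package main

import "strings"

// firstBootMarker is an ASSUMED kernel command-line key that signals first
// boot; the real detection mechanism may differ.
const firstBootMarker = "flatcar.first_boot"

// isFirstBoot reports whether the kernel command line carries the marker,
// either as a bare token or as key=value.
func isFirstBoot(cmdline string) bool {
	for _, arg := range strings.Fields(cmdline) {
		key, _, _ := strings.Cut(arg, "=")
		if key == firstBootMarker {
			return true
		}
	}
	return false
}

// shouldGenerateCryptsetupUnit mirrors the proposal: do nothing on first
// boot (Ignition opens the new LUKS device itself), and generate the
// unlock unit on every later boot.
func shouldGenerateCryptsetupUnit(cmdline string) bool {
	return !isFirstBoot(cmdline)
}
```

In a real generator the command line would be read from /proc/cmdline, and a true result would lead to invoking systemd-cryptsetup-generator.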

Regarding the masking: Yep, the masking might be unnecessary now. I am relatively sure it doesn't make the tests fail at the moment though, so I'd suggest we get the arm64 tests to pass first and then I'll check whether tests still pass without the masking and open a PR if they do.
