Off Network Timeout #289

Open
superteece opened this issue Feb 3, 2021 · 3 comments


@superteece

I have a build where Clevis is auto unlocking the root partition when Tang can be reached, as expected.

Additionally I've configured a Yubikey to unlock the root partition if the system is off network and cannot reach Tang. This also works in testing when I boot with the network cable disconnected and the Yubikey inserted.

However, if the device is on a different network (where Tang is unreachable) and the Yubikey is inserted, the root partition never unlocks. It looks as if Clevis is stuck trying to reach Tang.

What logs should I look into for possible causes?

@superteece
Author

I've narrowed down more specific detail.

First, I've moved from RHEL 8.3 to 8.2 because of an issue with Dracut's network-manager and Clevis. The issue I still face is that if the computer boots on a network without DHCP, Dracut/Clevis fails to give up and move on to the YubiKey unlock. The system drops into an emergency Dracut shell.

If I exit the Dracut shell, the boot continues and uses the Yubikey unlock.

Any idea why Dracut/Clevis is behaving this way? The original provisioning of the system, including the binding of Clevis to Tang, occurs on a network with DHCP. Operations happen on a different network that may or may not have DHCP.

@ksieluzycki

Hello,

I just ran into a similar issue. I am not sure this will be helpful in your case (especially after such a long time), but I thought I'd share anyway.

When clevis tries to connect to the Tang server, it uses curl with the default connection timeout, which is 300 seconds. When I started checking this, I did not expect a timeout that long, so after waiting two minutes or so I thought the problem was somewhere else and that there was no timeout at all.

In your case, if the cable is disconnected, curl will exit immediately. But if the machine is on a different network, curl will wait until the connection timeout expires.

I use a wifi connection (so once wpa_supplicant is started, the interface is up), so curl always waits for its timeout when wifi is not connected.
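A rough way to observe this behaviour by hand is to time a single probe against the Tang URL with a bounded connect timeout. The sketch below is only an illustration: `probe_tang` is a hypothetical helper, not part of clevis, and it assumes the server exposes Tang's `/adv` advertisement endpoint under the URL you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of clevis): probe a Tang URL with a bounded
# connect timeout and report the curl exit code and the elapsed time.
probe_tang() {
    local url="$1" timeout="${2:-5}"
    local start end rc
    start=$(date +%s)
    # -s: silent, -f: treat HTTP errors as failures, -o /dev/null: drop the body.
    # "$url/adv" assumes Tang's advertisement endpoint lives under this URL.
    curl --connect-timeout "$timeout" -sf -o /dev/null "$url/adv"
    rc=$?
    end=$(date +%s)
    echo "curl exit code ${rc} after $((end - start))s"
    return "$rc"
}
```

For example, `probe_tang http://tang.example.com 5` (the host name is a placeholder). A curl exit code of 28 means the operation timed out, while 7 means the connection failed outright, which is typically the quick failure case.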

To change the default timeout, I just added --connect-timeout 5 in /usr/bin/clevis-decrypt-tang [line 104]:

if ! rep="$(curl --connect-timeout 5 -sfg -X POST -H "$ct" --data-binary @- "$url" <<< "$xfr")"; then

5 seconds is enough to establish a wifi connection; with ethernet it should be much quicker.
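If you would rather script the edit than make it by hand, something along these lines might work. It is a sketch under assumptions: `add_connect_timeout` is a hypothetical helper, and it assumes the unpatched line has the same form as the one quoted above, that is, a `$(curl ...)` command substitution with no timeout flag yet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add "--connect-timeout N" to curl command substitutions
# in a script, unless the script already contains that flag.
add_connect_timeout() {
    local file="$1" seconds="${2:-5}"
    # Accept only a plain non-negative integer for the timeout.
    case "$seconds" in
        ''|*[!0-9]*) echo "timeout must be a whole number of seconds" >&2; return 2 ;;
    esac
    # Leave the file alone if the flag is already there, so reruns are harmless.
    if grep -q -- '--connect-timeout' "$file"; then
        return 0
    fi
    # Insert the flag right after "$(curl " and keep a backup as FILE.orig.
    sed -i.orig 's/\$(curl /$(curl --connect-timeout '"$seconds"' /' "$file"
}
```

Usage would be `add_connect_timeout /usr/bin/clevis-decrypt-tang 5`, run as root. Keep in mind that a package update will probably overwrite the file, and for unlocking at early boot the initramfs most likely has to be regenerated (for example `dracut -f` on RHEL or `update-initramfs -u` on Ubuntu) so that the patched script is the one actually used.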

@luckylinux

luckylinux commented Apr 29, 2024

Somehow I am suffering from this old issue on Ubuntu 23.04 (IIRC) / 23.10 (Confirmed) / 24.04 (Confirmed).

It takes 300 seconds to achieve unlocking and a successful boot happens afterwards.

If I recall correctly there are also some "Bad file descriptor" or "write descriptor" messages during boot, but everything appears to work after that ...

It is difficult to know what exactly the issue is, as after boot there are no logs to inspect (dmesg doesn't show anything). Possibly the situation was made worse once I configured PXE Boot on the DHCP Server (but I'm pretty sure it also happened before). Or it could be a VLAN issue due to multiple (tagged) VLANs on that interface?

Would the netconsole kernel module record such issues?

I don't have a timeout setting anymore in my version of /usr/bin/clevis-decrypt-tang.

EDIT 1: just noticed @ksieluzycki mentioning "Added", not "Editing" / "Changing". Alright, will try it now 👍.
