Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zero seconds timeout after sending DHCP REQUEST #1154

Closed
harridu opened this issue Jul 20, 2022 · 6 comments
Closed

zero seconds timeout after sending DHCP REQUEST #1154

harridu opened this issue Jul 20, 2022 · 6 comments
Labels
Milestone

Comments

@harridu
Copy link

harridu commented Jul 20, 2022

System (please complete the following information):

  • OS: Debian 11 amd64
  • Kernel version (if applicable): 5.10.120
  • strongSwan version(s): 5.9.6
  • Tested/confirmed with the latest version: yes

Describe the bug
Sometimes the dhcp plugin runs into a timeout without waiting, as it seems. Sample from my charon.log:

Jul 19 16:21:37 13[IKE1] <IPSec-IKEv2|28124> peer requested virtual IP %any
Jul 19 16:21:37 13[CFG1] <IPSec-IKEv2|28124> sending DHCP DISCOVER for 7a:a7:67:4c:aa:bb to 10.0.122.9
Jul 19 16:21:38 13[CFG1] <IPSec-IKEv2|28124> sending DHCP DISCOVER for 7a:a7:67:4c:aa:bb to 10.0.122.9
Jul 19 16:21:40 13[CFG1] <IPSec-IKEv2|28124> sending DHCP DISCOVER for 7a:a7:67:4c:aa:bb to 10.0.122.9
Jul 19 16:21:40 13[CFG1] <IPSec-IKEv2|28124> sending DHCP DISCOVER for 7a:a7:67:4c:aa:bb to 10.0.122.9
Jul 19 16:21:44 13[CFG1] <IPSec-IKEv2|28124> received DHCP OFFER 10.0.122.36 from 10.0.122.9
Jul 19 16:21:44 13[CFG1] <IPSec-IKEv2|28124> sending DHCP REQUEST for 10.0.122.36 to 10.0.122.9
Jul 19 16:21:44 13[CFG1] <IPSec-IKEv2|28124> sending DHCP REQUEST for 10.0.122.36 to 10.0.122.9
Jul 19 16:21:44 13[CFG1] <IPSec-IKEv2|28124> sending DHCP REQUEST for 10.0.122.36 to 10.0.122.9
Jul 19 16:21:44 13[CFG1] <IPSec-IKEv2|28124> sending DHCP REQUEST for 10.0.122.36 to 10.0.122.9
Jul 19 16:21:44 13[CFG1] <IPSec-IKEv2|28124> sending DHCP REQUEST for 10.0.122.36 to 10.0.122.9
Jul 19 16:21:44 13[CFG1] <IPSec-IKEv2|28124> DHCP REQUEST timed out

Look at the clock entries. Apparently the dhcp plugin did not wait for the DHCP_REQUEST to succeed. Maybe dnsmasq could be granted a few seconds to process the request?

Additional context
dns and dhcp are provided by dnsmasq 2.85 running on localhost.

dhcp.conf:

dhcp {
    force_client_port = yes
    force_server_address = yes
    identity_lease = yes
    load = yes
    server = 10.0.122.9
}
@harridu harridu changed the title maybe dhcp plugin could be a little bit more patient? zero seconds timeout after sending DHCP REQUEST Jul 20, 2022
@harridu
Copy link
Author

harridu commented Jul 20, 2022

If I got the code correctly, the whole DHCP negotiation has a common timeout of 5 seconds. Thats not enough for dnsmasq. It runs a pretty time-consuming neighbor detection to make sure the IP address it offers is not in use yet. Maybe the DHCP REQUEST could get its own timeout?

@tobiasbrunner
Copy link
Member

If I got the code correctly, the whole DHCP negotiation has a common timeout of 5 seconds.

As far as I understand the code, the intended retransmission scheme for every request is 5 tries with a delay of 1s * try after each, that is, the client should wait a total of 15 seconds for each response (which is quite a long time, considering that the half-open SA timeout is only 30 seconds, by default).

However, the current implementation is definitely flawed. After each sent message the code simply blocks on a condvar with a timeout of 1s * try. The problem is that the condvar is not only signaled for the response we're waiting for (or timeouts), but for every message received on the DHCP client socket (those might be responses to previous retransmits or messages completely unrelated to the current transaction, e.g. for other clients). For unrelated wakeups, the code just increases the counter, sends a retransmit and waits again on the condvar. That's why you see those REQUESTs getting sent at the same time. There must be some unrelated inbound messages that signal the condvar concurrently (there could even be spurious wakeups).

I've pushed a possible fix for this to the 1154-dhcp-retransmits branch. Please let me know if that works for you.

@tobiasbrunner tobiasbrunner removed the new label Jul 20, 2022
@harridu
Copy link
Author

harridu commented Jul 21, 2022

Sure, stay tuned.

FTR, increasing DHCP_TRIES to 10 seems to work. Since the reboot last night (including this change) DHCP did not run into a timeout anymore. Before I had about 90 to 100 incidents per day.

@harridu
Copy link
Author

harridu commented Jul 21, 2022

Your new patch looks much better. I have the impression that the whole DHCP negotiation is much faster now, but of course load is still low after the restart. This is what I see in charon.log:

Jul 21 08:15:31 23[IKE1] <IPSec-IKEv2|14> peer requested virtual IP %any
Jul 21 08:15:31 23[CFG1] <IPSec-IKEv2|14> sending DHCP DISCOVER for 7a:a7:84:9c:aa:bb to 10.0.122.9
Jul 21 08:15:32 23[CFG1] <IPSec-IKEv2|14> sending DHCP DISCOVER for 7a:a7:84:9c:aa:bb to 10.0.122.9
Jul 21 08:15:34 23[CFG1] <IPSec-IKEv2|14> sending DHCP DISCOVER for 7a:a7:84:9c:aa:bb to 10.0.122.9
Jul 21 08:15:34 23[CFG1] <IPSec-IKEv2|14> received DHCP OFFER 10.0.122.245 from 10.0.122.9
Jul 21 08:15:34 23[CFG1] <IPSec-IKEv2|14> sending DHCP REQUEST for 10.0.122.245 to 10.0.122.9
Jul 21 08:15:34 23[CFG1] <IPSec-IKEv2|14> received DHCP ACK for 10.0.122.245

I will keep it running. Tomorrow I can say if DHCP ran into some timeouts. Stay tuned.

@tobiasbrunner
Copy link
Member

FTR, increasing DHCP_TRIES to 10 seems to work.

I guess it increases the chances that the original code actually waits long enough to receive a response, but it definitely doesn't fix the issue and might only cause additional messages to get sent to the DHCP server. And with the new code it would increase the maximum total timeout to 55 seconds, which is too long in relation to the default half-open SA timeout.

I have the impression that the whole DHCP negotiation is much faster now

Maybe because there are now less requests sent.

I will keep it running. Tomorrow I can say if DHCP ran into some timeouts. Stay tuned.

OK, if you hurry, we might be able to squeeze this fix into the upcoming release.

@tobiasbrunner tobiasbrunner added this to the 5.9.7 milestone Jul 21, 2022
@harridu
Copy link
Author

harridu commented Jul 21, 2022

Of course DHCP_TRIES is back to the default for the patched version I am running now. It will be enrolled on our major VPN gateway this night.

Its great that this change made it into 5.9.7. Thank you very much

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

2 participants