dhcrelay: can get stuck with 100% CPU usage in new implementation #7471
Comments
We are seeing the same issue, even after updating to OPNsense 24.1.7_4-amd64. We need to restart the dhcrelay service every 2 or 3 days to get DHCP relay functionality working again. |
We are currently debugging the issue but the problem is elusive. It seems to hit an error condition in the BPF packet capture that the daemon can't recover from. We will publish updates as we encounter them. |
We are having the same issue since 24.1.6. Will the fix only be available in 24.7 or can we hope for a hotfix in any of the 24.1.x releases? |
A fix will be made available for all supported versions as soon as it is found. |
…als 0 and the length of the packet off the wire (bh_datalen) doesn't equal 0, we will loop forever in receive_packet() should fix opnsense/core#7471
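The failure mode described in the commit message above can be illustrated with a small simulation. This is only a sketch, not the dhcrelay code: the record layout, `ALIGN`, `make_record` and `read_packets` are hypothetical stand-ins for a BPF read buffer, in which each record carries a captured length (like `bh_caplen`), an on-the-wire length (like `bh_datalen`) and a header length. The point it shows is that a reader must advance its offset even when it drops a truncated record, otherwise it keeps re-reading the same header.

```python
import struct

# Simplified, hypothetical record header: caplen, datalen, hdrlen (+2 pad bytes).
# The real struct bpf_hdr also carries a timestamp; this is only an illustration.
HDR = struct.Struct("<IIH2x")
ALIGN = 4  # assumed word alignment for this sketch


def word_align(n):
    """Round n up to the next multiple of ALIGN."""
    return (n + ALIGN - 1) & ~(ALIGN - 1)


def make_record(payload, datalen=None):
    """Build one padded record; datalen defaults to the captured length."""
    datalen = len(payload) if datalen is None else datalen
    rec = HDR.pack(len(payload), datalen, HDR.size) + payload
    return rec + b"\x00" * (word_align(len(rec)) - len(rec))


def read_packets(buf):
    """Walk the buffer and return all fully captured payloads."""
    packets = []
    offset = 0
    while offset + HDR.size <= len(buf):
        caplen, datalen, hdrlen = HDR.unpack_from(buf, offset)
        next_offset = word_align(offset + hdrlen + caplen)
        if next_offset <= offset:
            # Corrupt header that would not move us forward: discard the rest.
            break
        if caplen != datalen or next_offset > len(buf):
            # Truncated capture: drop it, but still advance past the record.
            # Using `continue` here WITHOUT updating offset would spin forever.
            offset = next_offset
            continue
        start = offset + hdrlen
        packets.append(bytes(buf[start:start + caplen]))
        offset = next_offset
    return packets


buf = make_record(b"abc") + make_record(b"", datalen=60) + make_record(b"hello")
print(read_packets(buf))
```

The middle record has a captured length of 0 but a non-zero wire length, which is the combination named above; the sketch skips it and still reaches the record that follows.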
To test opnsense/dhcrelay#1, install using the command below and re-apply the config via the gui.
|
While debugging and writing this, we found that FreeBSD has three fixes from back in 2005/2006 in its tree for this particular code. It is derived from dhclient, which in turn originates from common ISC code, and the fixes fit the problem perfectly: freebsd/freebsd-src@4eae015 Here is a test package with the FreeBSD changes instead of the previous PR state by @AdSchellevis
All feedback on both binaries is welcome. |
Installed |
@TheHellSite woohoo! tentatively at least :) |
@fichtner We are also successfully running the patch for about 24h now without noticing any issues. DHCRelay is working fine again. I think this can be closed. |
@browne-net thanks we will ship in 24.1.8 tomorrow |
dhcrelay seems to be dropping BOOTREPLY messages if the source IP of the REPLY does not match the destination IP specified in the UI. Previously, I was able to specify the VIP address of my DHCP servers in the DHCP relay config. BOOTREPLY from a different source IP (e.g. physical NIC of the active server) would still be forwarded to the client. |
@mileyceberus feel free to open a new ticket, but I don't quite understand what "VIP address of my DHCP servers" means. It just takes an address. It can be any address. |
when it's about source address, source nat is likely the place to look :) |
@fichtner, no problem. Happy to open a new ticket as required. I was referring to the virtual ip (VIP/CARP) of my dhcp servers. In the past, I could point dhcrelay to a VIP/CARP address. dhcrelay would simply pass the OFFER messages to the clients regardless of the source addresses (as these could change depending on which server is active). However, this behaviour seems to have changed. |
@AdSchellevis Thanks for the suggestion. I have made the change on my side and it seems to have resolved the issue. For the benefit of those who may be experiencing similar issues, this is what I did on my DHCP servers: iptables -t nat -A POSTROUTING -o <OUTBOUND_INTERFACE> -p udp --sport 67 -j SNAT --to <VIRTUAL_IP> |
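To make the reasoning behind that SNAT rule concrete, here is a minimal sketch. It assumes, as the report above suggests but the thread does not confirm, that the relay only forwards replies whose source address matches a configured destination. The function name `accept_reply` and the addresses (taken from the documentation range 192.0.2.0/24) are hypothetical.

```python
import ipaddress


def accept_reply(src_ip, configured_servers):
    """Return True if the reply's source address is a configured destination.

    Hypothetical model of the observed behaviour, not dhcrelay's actual code.
    """
    src = ipaddress.ip_address(src_ip)
    return any(src == ipaddress.ip_address(s) for s in configured_servers)


servers = ["192.0.2.10"]  # assumed VIP/CARP address entered in the relay config

# Reply sent from the active server's physical NIC: dropped under this model.
print(accept_reply("192.0.2.11", servers))

# Same reply after SNAT rewrites the source to the VIP: forwarded.
print(accept_reply("192.0.2.10", servers))
```

Under this model, rewriting the source of outgoing UDP port 67 traffic to the virtual IP makes the reply match the configured destination regardless of which server is active.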
Describe the bug
Since the upgrade to 24.1.6, OPNsense goes to 100% CPU usage caused by one of the DHCP relay processes.
This happens randomly after a few hours or days of uptime.
To Reproduce
Steps to reproduce the behavior:
*In my case the new relay configuration was created automatically upon upgrading to 24.1.6 (destination list is called "Migrated IPv4 server entry")
Expected behavior
No heavy CPU usage should come from DHCP relay service.
Describe alternatives you considered
I tried disabling the DHCP relay on my management VLAN, where my destination DHCP servers reside.
I tried creating a cron job to restart the DHCP relay service every hour, but it's not working.
Now I have a cron job rebooting the VM every morning.
Upgraded to 24.1.7 today (waiting for the issue to reappear)
Relevant log files
I don't know where to find logs for the new DHCP relay service.
Additional context
Apparently I'm not the only one facing the issue:
https://forum.opnsense.org/index.php?topic=40126.0
https://forum.opnsense.org/index.php?topic=40284.0
Environment
Software version used and hardware type if relevant, e.g.:
OPNsense 24.1.6-amd64
My setup :
Edge sites (x2) :
Central site :
Site-to-site Wireguard VPN
No DHCP guarding whatsoever on Unifi side.
OPNsense VMs (router and helper) all have an interface in each VLAN.
Target DHCP servers on edge sites are both the local and the central Windows DHCP server.
This setup worked flawlessly for months (if not years) before 24.1.6.