Timeout / Full Load problems with HAproxy 1.8 #588
Comments
|
I made a revert via |
|
Are you sure? "os-haproxy" is the plugin itself, not HAProxy 1.8 / 1.7 so if its in the plugin @fraenki would be able to figure it out. :) |
|
Yes, os-haproxy 2.5 installs haproxy-devel, which seems to have this bug (it is HAProxy 1.8.4 as far as I could see). Reverting haproxy-devel itself is not possible, because 18.1.2 had no haproxy-devel package, only haproxy-1.7. I just validated again: I upgraded to os-haproxy 2.5 via the GUI and had to "Apply" the settings again in HAProxy, and then the problems were back: high load and no response for up to 30 seconds on sufficiently complex page reloads (as with Ctrl+F5). I also saw some Layer 4 health check timeouts again. |
|
Oh you are right... FreeBSD still has haproxy and haproxy-devel split. :) |
|
Just for information: 18.1.5 does not solve the problem. I had to revert the os-haproxy package to 18.1.2 again. Is there a way to freeze this package, or do I have to revert every time I update OPNsense? |
|
go to system: firmware: packages: find "os-haproxy" and click "lock" |
|
omg, so easy... :X I just did not look there... thank you! |
|
sure thing :) does upstream / FreeBSD know this is happening for 1.8 ? I don't believe this is OPNsense-specific... further amplified because FreeBSD keeps this in "devel" mode instead of shipping the latest release... |
|
1.8.5 is out now, not sure if it addresses your issue: https://www.mail-archive.com/haproxy@formilux.org/msg29401.html |
|
I am not sure whether this is an OPNsense-specific issue. I would have to set up an additional HAProxy to test this and have not had a chance to do so yet. Maybe 1.8.5 helps? Some bugs with 100% CPU usage have been reported, but mostly with multithreading (multiprocess?), and as far as I could see, even 1.8.4 on OPNsense uses only 1 process. I am not sure whether it uses multiple threads. |
|
We are experiencing the same symptoms. Reverting to os-haproxy 2.4 as @addy90 suggested fixed the issue. Make sure to restart haproxy/press "Apply" in the GUI. |
|
Thanks all for the reports and sorry for the long period of silence! This is an upstream bug in HAProxy 1.8. There is no fix available yet. It does not occur with all configurations, so the only workaround is to stay on HAProxy 1.7 until a fix is available. That being said, I encourage everyone to help get this bug fixed by contributing debug information and example configurations (to help reproduce this bug). I'm aware of the following upstream threads regarding this (or similar) issues, feel free to contribute: https://discourse.haproxy.org/t/haproxy-1-8-4-at-100-cpu-right-after-startup/2218 Please read these threads thoroughly, they contain further details on how to debug this issue. |
|
I can confirm that running "opnsense-revert -r 18.1.2 os-haproxy" in a shell over SSH makes HAProxy work as expected. |
|
Just upgraded to 18.1.6 and the issue is even stranger. Then I also downgraded the haproxy-devel package. In this version of HAProxy, the high CPU load now shows even under heavy network load. |
|
HAProxy 1.8.7 does not have a fix for this issue, please do not upgrade OPNsense if you're affected by this bug until a fix becomes available. |
|
I'm also affected by this bug. As a workaround, I manually replaced the binary with the one from 1.7.10 and removed tune.lua.maxmem from the haproxy.conf template (/usr/local/opnsense/service/templates/OPNsense/HAProxy/haproxy.conf). |
|
Has anyone tested HAProxy > 1.8.7? |
|
HAProxy 1.8.12 is included in OPNsense 18.1.11. |
|
HAProxy 1.8.12 looks indeed very promising. The announcement (for 1.8.10) specifically mentions that a 100% CPU issue is fixed. I have yet to test this new release myself. |
|
Updates have been smooth so far; I have heard no complaints about the recent HAProxy version bumps. Very good engineering on their part if true. Like it. :) |
|
I will test the new version over the next few days and if it does what it claims, I will roll it out in our production environment next week. Sorry for the off-topic, but: I love you guys for helping and keeping us updated! :) We are already running five instances of OPNsense in different environments, and it looks more and more like this was the right decision! Thank you! |
|
Be sure to let us know how the testing goes. And no need to be sorry, thank you. ❤️ |
|
Just upgraded one box to OPNsense 18.1.11 and HAProxy 1.8.12 and the 100% CPU issue is gone. |
|
Me too, upgraded two instances to HAProxy 1.8.12 now and it seems to be working great so far :) |
I am having strange timeout problems with HAproxy since 1.8. (So since OPNsense 18.1.3, with 18.1.2 it worked!)
The network is using jumbo frames and validated TLS 1.2 sessions between HAproxy and Backend Server, except in one case.
With the previous HAproxy version 1.7, no problems were happening!
Now, when I request a website via the frontend, HAproxy sometimes hangs at 100% CPU load for 30 seconds until the timeout breaks the connection and the client reconnects. This seems to "always" happen when I hit Ctrl+F5 for a "full reload" of the website (apparently because of the amount of data). Different backends (Apache, nginx) show the same problem, so it does not depend on the backend server.
Sometimes I see TLS alerts in a packet capture, but I am not sure whether these are the cause or whether it is something else, maybe the large MTU. The thing is, when I call the backend website directly (with jumbo frames), everything works. It also works when I call via VPN, so MTU conversion is not the problem.
I also have health check timeouts, sometimes Layer 4, Layer 6 or Layer 7, at random times...
Moreover, I sometimes get "Timeout during SSL handshake" errors from HAproxy once the 30-second timeout expires.
There are no known packet drops within the system, and pings with the don't-fragment bit set and the corresponding jumbo MTU work in both directions.
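For anyone who wants to repeat the don't-fragment ping check, the ICMP payload size has to be derived from the MTU. A minimal sketch, assuming an IPv4 path with a 20-byte IP header (no options) and an 8-byte ICMP echo header; the helper name `max_ping_payload` and the 9000-byte MTU are only illustrative and not taken from this setup:

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits into one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small for IPv4 + ICMP headers")
    return payload


if __name__ == "__main__":
    # 1500 is the standard Ethernet MTU; 9000 is a common jumbo value (example only).
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: ping payload {max_ping_payload(mtu)} bytes")
```

On FreeBSD the resulting value would typically be passed as `ping -D -s <payload> <host>`, where `-D` sets the don't-fragment bit. If the ping only succeeds with a smaller payload, some hop on the path has a lower MTU.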
I have no idea what happened, but this is a big problem: somehow HAproxy hangs with the parameters of my setup, and I have not found any way to stop this behavior.
By the way: when I change the timeout parameters globally (nothing is set manually on the servers), no changes are made in /usr/local/etc/haproxy.conf.
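One way to check whether a GUI change actually reached the generated file is to list the timeout directives it contains before and after pressing "Apply" and compare the two outputs. A rough sketch, assuming the file uses the usual HAProxy `timeout <name> <value>` syntax; the helper name `find_timeouts` is made up for illustration, and the comment handling is simplified (it does not handle `#` inside quoted strings):

```python
from pathlib import Path
from typing import List, Tuple

CONFIG_PATH = Path("/usr/local/etc/haproxy.conf")  # path mentioned in this report
SECTION_KEYWORDS = {"global", "defaults", "frontend", "backend", "listen"}


def find_timeouts(config_text: str) -> List[Tuple[str, str, str]]:
    """Return (section, timeout name, value) for every 'timeout' directive."""
    results = []
    section = ""
    for raw_line in config_text.splitlines():
        # Drop trailing comments and surrounding whitespace (simplified).
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        words = line.split()
        if words[0] in SECTION_KEYWORDS:
            # e.g. "defaults" or "backend web"
            section = " ".join(words[:2])
        elif words[0] == "timeout" and len(words) >= 3:
            results.append((section, words[1], words[2]))
    return results


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        for section, name, value in find_timeouts(CONFIG_PATH.read_text()):
            print(f"[{section}] timeout {name} = {value}")
    else:
        print(f"{CONFIG_PATH} not found")
```

If the printed values are identical before and after the change, the setting was not written to the file, which would match the behavior described above.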
EDIT: It looks like HTTP backends without TLS have the same problem!
So I get the 30s timeout and 100% CPU load even with plain unencrypted backends / servers!
I have seen "TCPWindowFull" at times while HAproxy is at 100%. It really looks like a bug with jumbo frames or some other performance bottleneck in how HAproxy processes frames. Sometimes HAproxy also just resets (RST) the connection with a high window.
So while HAproxy is at 100% load, it is not able to process the incoming packets, which seems to result in broken connections. The TCP window sometimes runs up to nearly 90,000, and HAproxy then blocks. Maybe some deeper problem?