HAProxy 2.6.14 - former worker SIGABRT after reload #2222
You can just use the tarball of the 2.6 snapshot I linked to in the forum, so you don't have to apply any patches or do anything in git (just download the tarball and recompile):
There seems to be an ABBA locking problem here. Looking at your GDB trace, Thread 3 is waiting for the listener's lock while it already holds the proxy's lock. However, the pattern is inverted in Thread 5. To sum up: Thread 5 owns the listener's lock and is waiting for the proxy's lock, while Thread 3 owns the proxy's lock and is waiting for the listener's lock. Several other threads are blocked too, waiting for the listener's lock.

I must investigate a bit to figure out how to fix the bug. It was introduced with the commit bcad7e6, but that commit is part of a long series, so I must review the changes first.
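To make the reasoning above concrete, here is a minimal sketch that models the blocked threads as a wait-for graph and checks whether a given thread is part of a lock cycle. Everything in it (`struct thread_state`, `enum lock_id`, `in_wait_cycle()`) is hypothetical and written for illustration only; it is not HAProxy code and it assumes each thread owns at most one lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock identifiers, for illustration only. */
enum lock_id { LOCK_NONE = -1, LOCK_PROXY = 0, LOCK_LISTENER = 1 };

/* Hypothetical snapshot of one thread, as one might read it off a backtrace. */
struct thread_state {
    int id;                  /* thread number from the trace */
    enum lock_id owns;       /* lock currently held, or LOCK_NONE */
    enum lock_id waits_for;  /* lock being waited for, or LOCK_NONE */
};

/* Follow "waits for a lock -> owner of that lock" edges starting from
 * t[start]. Return true if the walk comes back to t[start], meaning that
 * thread is part of a deadlock cycle. */
bool in_wait_cycle(const struct thread_state *t, size_t n, size_t start)
{
    size_t cur = start;

    for (size_t step = 0; step < n; step++) {
        enum lock_id wanted = t[cur].waits_for;
        size_t owner = n;

        if (wanted == LOCK_NONE)
            return false;            /* not blocked at all */

        for (size_t i = 0; i < n; i++) {
            if (t[i].owns == wanted) {
                owner = i;
                break;
            }
        }
        if (owner == n)
            return false;            /* wanted lock is free */
        if (owner == start)
            return true;             /* walked back to the start */
        cur = owner;
    }
    return false;                    /* blocked behind a cycle, not in it */
}
```

With Thread 3 modeled as `{3, LOCK_PROXY, LOCK_LISTENER}` and Thread 5 as `{5, LOCK_LISTENER, LOCK_PROXY}`, both are reported as part of the cycle. A third thread that only waits for the listener's lock is blocked behind the cycle but is not part of it, which matches the other stuck threads in the trace.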
Here is a patch: @chipflake, do you have any way to test it?
@capflam thank you. I can test the patch starting next week. We will need to wait at least a couple of weeks to confirm that the issue has been solved though, since so far we've seen 3 crashes in ~20 days.
In any case I think the patch is correct and fixes a real issue, so it would be fine to merge it (which could possibly make your tests even easier). If for any reason the crash was still there or another one surfaced, we'd know that this one is addressed at least, and it would simplify the analysis.
I agree, I will merge it |
…essary

Listener functions must follow a common locking pattern:

1. Get the proxy's lock if necessary
2. Get the protocol's lock if necessary
3. Get the listener's lock if necessary

We must take care to respect this order to avoid any ABBA issue. However, an issue was introduced in the commit bcad7e6 ("MINOR: listener: add relax_listener() function"). relax_listener() gets the listener's lock and if resume_listener() is called, the proxy's lock is then acquired.

So to fix the issue, the proxy's lock is first acquired in relax_listener(), if necessary.

This patch should fix the issue #2222. It must be backported as far as 2.4 because the above commit is marked to be backported there.
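The locking order described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: `struct sketch_proxy`, `struct sketch_listener`, `sketch_resume_locked()` and `sketch_relax()` are hypothetical names, plain pthread mutexes stand in for HAProxy's own lock primitives, the protocol's lock (step 2) is left out, and the proxy's lock is always taken here rather than only "if necessary".

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a proxy and one of its listeners. */
struct sketch_proxy {
    pthread_mutex_t lock;
    int resumed;             /* number of listeners resumed so far */
};

struct sketch_listener {
    pthread_mutex_t lock;
    struct sketch_proxy *px; /* owning proxy */
    bool paused;
};

/* Hypothetical helper: the caller must already hold the proxy's lock
 * and the listener's lock, in that order. */
static void sketch_resume_locked(struct sketch_listener *l)
{
    l->paused = false;
    l->px->resumed++;
}

/* Take the proxy's lock first, then the listener's lock, so that the
 * resume path never has to acquire the proxy's lock while the listener's
 * lock is already held. Returns 1 if the listener was resumed. */
int sketch_relax(struct sketch_listener *l)
{
    int did_resume = 0;

    pthread_mutex_lock(&l->px->lock);   /* 1. proxy's lock */
    pthread_mutex_lock(&l->lock);       /* 3. listener's lock */

    if (l->paused) {
        sketch_resume_locked(l);
        did_resume = 1;
    }

    /* Release in the reverse order of acquisition. */
    pthread_mutex_unlock(&l->lock);
    pthread_mutex_unlock(&l->px->lock);
    return did_resume;
}
```

As long as every code path acquires these locks in the same order, two threads cannot each hold one lock while waiting for the other.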
I marked the issue as fixed for now, waiting for your feedback. I'll try to backport the fix to ease your tests. Thanks!
The fix was backported to all versions. I'm closing the issue, but of course reopen it if the problem is still there. Thanks!
Detailed Description of the Problem
Originally asked in the forum.
We run HAProxy in master-worker mode. On reloads, the former worker sometimes crashes with `SIGABRT` instead of exiting cleanly. This started after we updated from 2.4.22 to 2.6.14. In the past 20 days we have seen this 3 times, each time on a different host.
Backtrace from the latest crash is here: gdb.txt (attachment because of length)
If needed, I can provide the backtrace and logs from the two crashes prior to this one.
Expected Behavior
The former worker should exit cleanly.
Steps to Reproduce the Behavior
Unfortunately we don't know how to reproduce it, other than that it happens rarely on reloads.
Do you have any idea what may have caused this?
In the backtrace, some threads with `stuck=1` seem to be waiting on the listener and the proxy locks. There have been some changes to listener functions, like:

fed93d3 BUG/MEDIUM: listener: read-lock the listener during accept()
9bfa34f MINOR: listener: add relax_listener() function

It might also be related to the concurrency bug fixes introduced in 2.6.10: https://www.mail-archive.com/haproxy@formilux.org/msg43304.html
I also see these bug fixes for 2.6, after the 2.6.14 release:
however I'm not sure they apply here.
Do you have an idea how to solve the issue?
No response
What is your configuration?
Output of `haproxy -vv`
Last Outputs and Backtraces
Additional Information
No response