crash: segfault with threading on arm since HA_ATOMIC_DWCAS #105
Labels
severity: medium
This issue is of MEDIUM severity.
status: fixed
This issue is a now-fixed bug.
subsystem: core
The issue is within the core of haproxy.
type: bug
This issue describes a bug.
Comments
lukastribus added the labels type: bug, 1.9, dev, severity: medium, status: reviewed, subsystem: core on May 23, 2019
On Thu, May 23, 2019 at 03:19:57PM -0700, Lukas Tribus wrote:
John Smith reported on discourse that haproxy 1.9 and dev crash on Raspberry Pis (CentOS 7):
https://discourse.haproxy.org/t/segmentation-fault-on-raspberry-pi/3850
Thank you Lukas, I'm investigating. I do have quite some ARM boards
here, I can try :-) I'm also shocked by the initial issue that happens
during the config parsing.
Willy
So I can reproduce it on my ARMv7 board. Looking at the disassembled code,
it crashes in __ha_cas_dw() when dereferencing "set" here:
: "=&r" (previous), "=&r" (tmp)
: "r" (*(uint64_t *)compare), "r" (*(uint64_t *)set), "r" (target)
: "memory", "cc");
This "set" argument is passed like this by fd_rm_from_fd_list() :
unlikely(!_HA_ATOMIC_DWCAS(((void **)(void *)&_GET_NEXT(fd, off)), ((void **)(void *)&cur_list), (*(void **)(void *)&next_list))))
It's *(void **)(void *)&next_list and is expected to be a pointer, whose
value is initialized to -2 at the beginning of the function, and which is
still -2 at this point :
(gdb) p *(void **)(void *)&next_list
$11 = (void *) 0xfffffffe
Thus obviously it crashes when dereferencing it again in __ha_cas_dw()
before making it arg %3 :
0x000c69f0 <+72>: ldr r1, [sp, #32]
0x000c69f2 <+74>: ldr r3, [r6, #0]
0x000c69f4 <+76>: ldrd r10, r11, [sp, #24]
=> 0x000c69f8 <+80>: ldrd r4, r5, [r1]
0x000c69fc <+84>: add r3, r8
0x000c69fe <+86>: strd r4, r5, [sp, #8]
0x000c6a02 <+90>: ldrexd r0, r1, [r3]
0x000c6a06 <+94>: cmp r0, r10
0x000c6a08 <+96>: ittt eq
0x000c6a0a <+98>: cmpeq r1, r11
0x000c6a0c <+100>: strexdeq r9, r4, r5, [r3]
0x000c6a10 <+104>: cmpeq.w r9, #1
0x000c6a14 <+108>: beq.n 0xc6a02 <fd_rm_from_fd_list+90>
And indeed, r1 is still -2 :
(gdb) p /x $r1
$13 = 0xfffffffe
But the code is constructed the same way for other platforms, and I
can't find why they don't fail as well! So I must be overlooking
something, otherwise x86_64 wouldn't work either, which doesn't match
what I'm seeing.
CCing Olivier who authored this loop in case he has an idea.
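To make the analysis above concrete, here is a minimal sketch of the argument contract, assuming a hypothetical two-word entry type (entry_sketch) and hypothetical helper names that are not taken from the haproxy sources. It only mirrors the "r" (*(uint64_t *)set) operand: whatever is passed as "set" gets dereferenced, so it has to be the address of the new pair and not the pair's first word (here -2, i.e. 0xfffffffe on a 32-bit target).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a two-word list entry (names are assumptions). */
struct entry_sketch {
    int32_t next;
    int32_t prev;
};

/*
 * Mirrors the "r" (*(uint64_t *)set) asm operand: both words are loaded
 * THROUGH "set", so "set" must be a valid address. Passing the first word
 * of the entry (e.g. -2) instead would make this load fault.
 */
static uint64_t load_set_operand(const void *set)
{
    uint64_t v;

    assert(set != NULL);
    memcpy(&v, set, sizeof(v)); /* memcpy avoids strict-aliasing issues */
    return v;
}

/* Correct call shape: hand over the ADDRESS of the new entry. */
static uint64_t set_operand_for(const struct entry_sketch *next_list)
{
    return load_set_operand(next_list);
}
```

With an entry initialized to { -2, -2 }, the loaded operand has 0xfffffffe in both halves, which is the value the exclusive store is expected to write.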
haproxy-mirror pushed a commit that referenced this issue on May 27, 2019
…forms

On armv7 haproxy doesn't work because of the fixes on the double-word CAS. There are two issues. The first one is that the last argument in case of dwcas is a pointer to the set of values and not a value; the second is that it's not enough to cast the data as (void*) since it will be a single word. Let's fix this by using the pointers as an array of long. This was tested on i386, armv7, x86_64 and aarch64 and it is now fine. An alternate approach using a struct was attempted as well but it used to produce less optimal code.

This fix must be backported to 1.9. This fixes github issue #105.

Cc: Olivier Houchard <ohouchard@haproxy.com>
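The "array of long" idea from the commit message can be sketched as follows. This is a single-threaded model only, not an atomic operation and not the actual haproxy code: the function name dwcas_words_model is hypothetical, and writing the current value back into "compare" on failure is an assumption of the model. It relies on long being as wide as a pointer, which holds on the ILP32 and LP64 targets listed in the commit (i386, armv7, x86_64, aarch64).

```c
#include <assert.h>
#include <stddef.h>

/*
 * NOT atomic: a plain C model of a double-word CAS whose three arguments are
 * each viewed as an array of two longs, so both words are always covered
 * regardless of the pointer size of the platform.
 * Returns 1 if *target matched *compare and was replaced by *set; otherwise
 * returns 0 and copies the current value into *compare (model assumption).
 */
static int dwcas_words_model(void *target, void *compare, const void *set)
{
    unsigned long *t = target;
    unsigned long *c = compare;
    const unsigned long *s = set;

    assert(t != NULL && c != NULL && s != NULL);

    if (t[0] != c[0] || t[1] != c[1]) {
        c[0] = t[0];
        c[1] = t[1];
        return 0;
    }
    t[0] = s[0];
    t[1] = s[1];
    return 1;
}
```

A caller would pass the addresses of three two-word objects, for example dwcas_words_model(slot, expected, wanted) with each declared as unsigned long[2]. A real implementation needs the platform's exclusive load/store or cmpxchg instructions in place of the plain loads and stores used here.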
TimWolla added the label status: fixed and removed the labels dev, status: reviewed on May 27, 2019
OK, now merged into dev and backported. Tested on my MIQI boards which used to fail and they're fine now. Closing the issue, thanks!
FireBurn pushed a commit to FireBurn/haproxy that referenced this issue on Jan 29, 2020
…forms

On armv7 haproxy doesn't work because of the fixes on the double-word CAS. There are two issues. The first one is that the last argument in case of dwcas is a pointer to the set of values and not a value; the second is that it's not enough to cast the data as (void*) since it will be a single word. Let's fix this by using the pointers as an array of long. This was tested on i386, armv7, x86_64 and aarch64 and it is now fine. An alternate approach using a struct was attempted as well but it used to produce less optimal code.

This fix must be backported to 1.9. This fixes github issue haproxy#105.

Cc: Olivier Houchard <ohouchard@haproxy.com>
(cherry picked from commit c3b5958)
[wt: adjust context, s/_HA/HA/]
Signed-off-by: Willy Tarreau <w@1wt.eu>
John Smith reported on discourse that haproxy 1.9 and dev crash on Raspberry Pis (CentOS 7):
https://discourse.haproxy.org/t/segmentation-fault-on-raspberry-pi/3850
kernel, gcc, haproxy version and
/proc/cpuinfo
GDB backtrace
What's the configuration?
Steps to reproduce the behavior
Start haproxy with threading enabled on a Raspberry Pi; it crashes either immediately or after a few seconds.
Workaround
Disable threading with nbthread 1.
Do you have any idea what may have caused this?
This is a regression caused by commit 6a38b32 (BUILD: threads: fix again the __ha_cas_dw() definition) and also affects 1.9.8.
@wtarreau Note there is also a different threading issue reported on the ML in 1.9.8 (though probably unrelated):
https://www.mail-archive.com/haproxy@formilux.org/msg33854.html