haproxy 2.0.0 snapshot 3f0b1d-9 quits (crashes?) on SIGBUS (signal 10) #137

Closed
Yuubari opened this issue Jun 24, 2019 · 15 comments
Labels
severity: major This issue is of MAJOR severity. status: fixed This issue is a now-fixed bug. subsystem: h1 The issue is within the HTTP/1 subsystem. type: bug This issue describes a bug.


Yuubari commented Jun 24, 2019

Output of haproxy -vv and uname -a

HA-Proxy version 2.0.0-3f0b1d-9 2019/06/19 - https://haproxy.org/
Build options :
  TARGET  = freebsd
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-ignored-qualifiers -Wno-missing-field-initializers -Wno-implicit-fallthrough -Wtype-limits -Wshift-negative-value -Wnull-dereference
  OPTIONS = USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1

Feature list : -EPOLL +KQUEUE -MY_EPOLL -MY_SPLICE -NETFILTER +PCRE -PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED -REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY -LINUX_TPROXY -LINUX_SPLICE +LIBCRYPT -CRYPT_H -VSYSCALL -GETADDRINFO +OPENSSL -LUA -FUTEX -ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY -TFO -NS -DL -RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER -PRCTL -THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=8).
Built with OpenSSL version : OpenSSL 1.1.1a-freebsd  20 Nov 2018
Running on OpenSSL version : OpenSSL 1.1.1a-freebsd  20 Nov 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 8.43 2019-02-23
Running on PCRE version : 8.43 2019-02-23
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes

Available polling systems :
     kqueue : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTTP       side=FE        mux=H2
              h2 : mode=HTX        side=FE|BE     mux=H2
       <default> : mode=HTX        side=FE|BE     mux=H1
       <default> : mode=TCP|HTTP   side=FE|BE     mux=PASS

Available services : none

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace

What's the configuration?

global
        master-worker
        maxconn         2048

        user            nobody
        group           nogroup

        ca-base         /etc/ssl/cert.pem
        crt-base        /usr/local/etc/letsencrypt/live
        ssl-default-bind-options        no-sslv3 no-tlsv10
        tune.ssl.default-dh-param       2048

        log                     /var/run/log local7

defaults
        timeout         client 30s
        timeout         server 30s
        timeout         connect 5s
        mode            http
        option          http-use-htx

        log                     global
        #option         httplog

frontend tls_termination
        bind            ipv4:80
        bind            ipv4:443       ssl crt <...cut...> alpn h2,http/1.1
        bind            [ipv6]:80
        bind            [ipv6]:443     ssl crt <...cut...> alpn h2,http/1.1

        http-request    set-var(txn.hostname)   req.hdr(host)
        http-request    set-var(txn.hsts)               ssl_fc

        acl                     blacklisted_ua          hdr_sub(user-agent) -i -f /usr/local/etc/blacklist.agent
        http-request deny if blacklisted_ua
        acl                     blacklisted_ref         hdr_sub(referer) -i -f /usr/local/etc/blacklist.referer
        http-request deny if blacklisted_ref

        acl                     acme_challenge          path_beg /.well-known/acme-challenge/

        acl                     notls_oxij                      hdr_dom(host) -i domain.tld
        acl                     notls_stor              hdr_dom(host) -i storage.domain.tld
        acl                     notls_kcvapi            hdr_dom(host) -i kcvapi.domain.tld
        redirect        scheme  https if !{ ssl_fc } !notls_stor !notls_kcvapi !acme_challenge

        acl                     secure  dst_port eq 443
        http-response   replace-value   Set-Cookie (.*) \1;\ Secure if secure

        http-response   del-header      Server
        http-response   del-header      X-Powered-By

        http-response   set-header      X-XSS-Protection        "1;mode=block"
        http-response   set-header      Referrer-Policy         "strict-origin-when-cross-origin"
        http-response   set-header      Feature-Policy          "camera 'self'; microphone 'self'"

        use_backend     certbot if acme_challenge !{ ssl_fc }

        acl             azurlane                hdr_dom(host) -i azurlane.domain.tld
        acl             domainism               hdr(host) -i domainism.tld

        acl             al_imgstatic    path_beg -i /w/images/

        use_backend     varnish if azurlane !al_imgstatic
        use_backend     varnish if domainism

        use_backend     varnish if azurlane al_imgstatic

        default_backend be_primary

backend be_primary
        server          s_primary 10.0.0.2:8080 check send-proxy

        server          s_p_maint 10.0.0.1:8089 backup send-proxy maxconn 128 weight 0

        acl                     hsts                            var(txn.hsts) -m bool
        acl                     notls_stor              var(txn.hostname) -i -m beg storage.domain.tld

        http-response   set-header      Strict-Transport-Security "max-age=15552000; includeSubDomains" if hsts !notls_stor

backend certbot
        server          s_certbot 127.0.0.1:8081

backend varnish
        fullconn        2048
        server          s_varnish 10.0.0.10:80 check send-proxy maxconn 2048
        server          s_p_backup 10.0.0.2:8080 backup send-proxy maxconn 128 weight 250

        server          s_p_maint 10.0.0.1:8089 backup send-proxy maxconn 128 weight 0

## ACLs for virtual hosts and SSL

        acl                     hsts                            var(txn.hsts) -m bool

        acl                     notls_stor              var(txn.hostname) -i -m beg storage.domain.tld
        acl                     azurlane        var(txn.hostname) -i -m beg azurlane.domain.tld

## Set content security policy for virtual hosts

        http-response   set-header      Content-Security-Policy "frame-src self; script-src 'self' 'unsafe-inline' 'unsafe-eval' *.domain.tld cdnjs.cloudflare.com" if azurlane

## Enable HSTS for everything except storage

        http-response   set-header      Strict-Transport-Security "max-age=15552000; includeSubDomains" if hsts !notls_stor
## Frame options
        http-response   set-header      X-Frame-Options         "SAMEORIGIN"

backend static
        fullconn        2048
        server          s_static 10.0.0.2:8081 check send-proxy maxconn 2048
        server          s_s_backup 10.0.0.2:8080 backup send-proxy maxconn 128 weight 200

        server          s_p_maint 10.0.0.1:8089 backup send-proxy maxconn 128 weight 0

        acl                     hsts                            var(txn.hsts) -m bool
        acl                     notls_stor              var(txn.hostname) -i -m beg storage.domain.tld

        http-response   set-header      Strict-Transport-Security "max-age=15552000; includeSubDomains" if hsts !notls_stor

listen stats
        bind            10.0.0.1:81
        mode            http
        stats           enable
        stats           uri /
        stats           realm "HAProxy Statistics"
        stats           hide-version

Steps to reproduce the behavior

  1. Run haproxy for a day or two on a medium load web server

Actual behavior

haproxy exits (crashes?) without any action on the administrator's part:

Jun 24 06:59:23 lincle kernel: pid 69059 (haproxy), uid 65534: exited on signal 10

Please note that this is FreeBSD and signal 10 is SIGBUS bus error, not SIGUSR1 as on Linux.

Expected behavior

haproxy does not exit.

Do you have any idea what may have caused this?

Started after the upgrade to 2.0.0-3f0b1d-9. Did not run the 2.0.0 stable release for long enough to see if the bug manifests, as something in the stable release prevented the most visited website from working in Chrome (likely due to #127).

Prior to the upgrade haproxy 1.9.x was running for months without issues.

Do you have an idea how to solve the issue?

No ideas yet.

@Yuubari Yuubari added status: needs-triage This issue needs to be triaged. type: bug This issue describes a bug. labels Jun 24, 2019

capflam commented Jun 24, 2019

@Yuubari, 2.0.0-dev9 does not exist. Do you mean dev7?
BTW, issue #127 is fixed now, so you can upgrade haproxy using the daily snapshots. Otherwise, we will try to do the 2.0.1 release this week.


Yuubari commented Jun 24, 2019

I mean the daily snapshot; when built outside a git repository it tags itself as dev9.

@lukastribus

You meant 2.0.0-3f0b1d-9 from your haproxy -vv output? That's not dev9.

@Yuubari Yuubari changed the title haproxy 2.0.0-dev9 quits (crashes?) on SIGBUS (signal 10) haproxy 2.0.0 snapshot 3f0b1d-9 quits (crashes?) on SIGBUS (signal 10) Jun 24, 2019

capflam commented Jun 24, 2019

So, if you are running haproxy 2.0 at commit 3f0b1de, the bug described in issue #127 should be fixed. For your issue, I have no idea yet. If possible, could you try disabling HTX to see if the bug also appears in the legacy HTTP mode?


Yuubari commented Jun 24, 2019

I will try that and will update if anything happens. Sadly, it takes a while to manifest, and the core file is somehow not generated (even though it should be).
The log does not help, either:

Jun 24 06:59:22 lincle haproxy[69059]: Connect from client-ipv6:57995 to ipv6:443 (tls_termination/HTTP)
Jun 24 06:59:23 lincle syslogd: last message repeated 15 times
Jun 24 07:11:12 lincle haproxy[30099]: Proxy tls_termination started.


capflam commented Jun 24, 2019

Thanks for your help. About the core file, the global option set-dumpable may help.
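(For reference: set-dumpable is a global-section keyword, so with the configuration above it would sit in the global block, roughly:

global
        master-worker
        set-dumpable
        maxconn         2048
        ...

It lifts resource limits and file-mode restrictions that would otherwise prevent the workers from producing a core file.)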


Yuubari commented Jun 24, 2019

Seems like set-dumpable does not help here (FreeBSD): I can only get the parent process (UID 0) to dump core; the child process does not produce one. (I am not using chroot in the configuration file, the core dump location has 777 permissions, and other programs running as the same user, nobody, successfully dump there on kill -SIGBUS.) I will try running haproxy in debug mode inside lldb and see if I can find anything meaningful there.

Thank you for the help.


wtarreau commented Jun 24, 2019 via email


Yuubari commented Jun 24, 2019

@wtarreau, thank you, your suggestion worked, with the child process running as root (but inside a chroot) I can get it to dump core. Built haproxy with debug info (-DDEBUG_FULL) and going to run it with HTX enabled for a bit.


Yuubari commented Jun 25, 2019

Got a core dump after the crash:

(lldb) bt all
* thread #1, name = 'haproxy', stop reason = signal SIGBUS
    frame #0: 0x0000000000309d20 haproxy`b_alloc_margin [inlined] __pool_get_from_cache(pool=0x00000000004ba640) at memory.h:199
    frame #1: 0x0000000000309c8f haproxy`b_alloc_margin [inlined] __pool_get_first(pool=0x00000000004ba640) at memory.h:216
    frame #2: 0x0000000000309c8f haproxy`b_alloc_margin(buf=<unavailable>, margin=0) at buffer.h:165
    frame #3: 0x0000000000308a90 haproxy`h1_io_cb [inlined] h1_get_buf(h1c=<unavailable>, bptr=0x0000000802ea5078) at mux_h1.c:189
    frame #4: 0x0000000000308a8b haproxy`h1_io_cb [inlined] h1_recv(h1c=<unavailable>) at mux_h1.c:1819
  * frame #5: 0x0000000000308a8b haproxy`h1_io_cb(t=<unavailable>, ctx=0x0000000802ea5060, status=<unavailable>) at mux_h1.c:2010
    frame #6: 0x000000000036c08f haproxy`process_runnable_tasks at task.c:414
    frame #7: 0x00000000002e6e96 haproxy`run_thread_poll_loop [inlined] run_poll_loop at haproxy.c:2513
    frame #8: 0x00000000002e6e62 haproxy`run_thread_poll_loop(data=<unavailable>) at haproxy.c:2634
    frame #9: 0x000000080042279b libthr.so.3`thread_start(curthread=0x0000000800cafe00) at thr_create.c:292
  thread #2, name = 'haproxy', stop reason = signal SIGBUS
    frame #0: 0x0000000800a12d4a libc.so.7`_kevent at _kevent.S:3
    frame #1: 0x0000000800425862 libthr.so.3`__thr_kevent(kq=3, changelist=0x0000000000000000, nchanges=0, eventlist=0x0000000802e00640, nevents=200, timeout=0x00007fffffffe800) at thr_syscalls.c:400
    frame #2: 0x0000000000260a93 haproxy`_do_poll(p=<unavailable>, exp=-1949597429, wake=0) at ev_kqueue.c:146
    frame #3: 0x00000000002e6fc2 haproxy`run_thread_poll_loop [inlined] run_poll_loop at haproxy.c:2550
    frame #4: 0x00000000002e6e62 haproxy`run_thread_poll_loop(data=<unavailable>) at haproxy.c:2634
    frame #5: 0x00000000002e62eb haproxy`main(argc=<unavailable>, argv=<unavailable>) at haproxy.c:3308
    frame #6: 0x000000000025f120 haproxy`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1.c:76
  thread #3, name = 'haproxy', stop reason = signal SIGBUS
    frame #0: 0x0000000800a12d4a libc.so.7`_kevent at _kevent.S:3
    frame #1: 0x0000000800425862 libthr.so.3`__thr_kevent(kq=27, changelist=0x0000000000000000, nchanges=0, eventlist=0x00000008028a1280, nevents=200, timeout=0x00007fffdfdfcf30) at thr_syscalls.c:400
    frame #2: 0x0000000000260a93 haproxy`_do_poll(p=<unavailable>, exp=-1949596809, wake=0) at ev_kqueue.c:146
    frame #3: 0x00000000002e6fc2 haproxy`run_thread_poll_loop [inlined] run_poll_loop at haproxy.c:2550
    frame #4: 0x00000000002e6e62 haproxy`run_thread_poll_loop(data=<unavailable>) at haproxy.c:2634
    frame #5: 0x000000080042279b libthr.so.3`thread_start(curthread=0x0000000800cb0300) at thr_create.c:292
  thread #4, name = 'haproxy', stop reason = signal SIGBUS
    frame #0: 0x0000000800a12d4a libc.so.7`_kevent at _kevent.S:3
    frame #1: 0x0000000800425862 libthr.so.3`__thr_kevent(kq=15, changelist=0x0000000000000000, nchanges=0, eventlist=0x0000000801b225c0, nevents=200, timeout=0x00007fffdfbfbf30) at thr_syscalls.c:400
    frame #2: 0x0000000000260a93 haproxy`_do_poll(p=<unavailable>, exp=-1949597478, wake=0) at ev_kqueue.c:146
    frame #3: 0x00000000002e6fc2 haproxy`run_thread_poll_loop [inlined] run_poll_loop at haproxy.c:2550
    frame #4: 0x00000000002e6e62 haproxy`run_thread_poll_loop(data=<unavailable>) at haproxy.c:2634
    frame #5: 0x000000080042279b libthr.so.3`thread_start(curthread=0x0000000800cb0800) at thr_create.c:292
  thread #5, name = 'haproxy', stop reason = signal SIGBUS
    frame #0: 0x0000000800a12d4a libc.so.7`_kevent at _kevent.S:3
    frame #1: 0x0000000800425862 libthr.so.3`__thr_kevent(kq=18, changelist=0x0000000000000000, nchanges=0, eventlist=0x00000008022006c0, nevents=200, timeout=0x00007fffdf9faf30) at thr_syscalls.c:400
    frame #2: 0x0000000000260a93 haproxy`_do_poll(p=<unavailable>, exp=-1949596592, wake=0) at ev_kqueue.c:146
    frame #3: 0x00000000002e6fc2 haproxy`run_thread_poll_loop [inlined] run_poll_loop at haproxy.c:2550
    frame #4: 0x00000000002e6e62 haproxy`run_thread_poll_loop(data=<unavailable>) at haproxy.c:2634
    frame #5: 0x000000080042279b libthr.so.3`thread_start(curthread=0x0000000800cb0d00) at thr_create.c:292
  thread #6, name = 'haproxy', stop reason = signal SIGBUS
    frame #0: 0x0000000800a12d4a libc.so.7`_kevent at _kevent.S:3
    frame #1: 0x0000000800425862 libthr.so.3`__thr_kevent(kq=21, changelist=0x0000000000000000, nchanges=0, eventlist=0x00000008022a67c0, nevents=200, timeout=0x00007fffdf7f9f30) at thr_syscalls.c:400
    frame #2: 0x0000000000260a93 haproxy`_do_poll(p=<unavailable>, exp=-1949594301, wake=0) at ev_kqueue.c:146
    frame #3: 0x00000000002e6fc2 haproxy`run_thread_poll_loop [inlined] run_poll_loop at haproxy.c:2550
    frame #4: 0x00000000002e6e62 haproxy`run_thread_poll_loop(data=<unavailable>) at haproxy.c:2634
    frame #5: 0x000000080042279b libthr.so.3`thread_start(curthread=0x0000000800cb1200) at thr_create.c:292
  thread #7, name = 'haproxy', stop reason = signal SIGBUS
    frame #0: 0x0000000800a12d4a libc.so.7`_kevent at _kevent.S:3
    frame #1: 0x0000000800425862 libthr.so.3`__thr_kevent(kq=30, changelist=0x0000000000000000, nchanges=0, eventlist=0x00000008029428c0, nevents=200, timeout=0x00007fffdf5f8f30) at thr_syscalls.c:400
    frame #2: 0x0000000000260a93 haproxy`_do_poll(p=<unavailable>, exp=-1949598103, wake=0) at ev_kqueue.c:146
    frame #3: 0x00000000002e6fc2 haproxy`run_thread_poll_loop [inlined] run_poll_loop at haproxy.c:2550
    frame #4: 0x00000000002e6e62 haproxy`run_thread_poll_loop(data=<unavailable>) at haproxy.c:2634
    frame #5: 0x000000080042279b libthr.so.3`thread_start(curthread=0x0000000800cb1700) at thr_create.c:292
  thread #8, name = 'haproxy', stop reason = signal SIGBUS
    frame #0: 0x0000000800a12d4a libc.so.7`_kevent at _kevent.S:3
    frame #1: 0x0000000800425862 libthr.so.3`__thr_kevent(kq=24, changelist=0x0000000000000000, nchanges=0, eventlist=0x0000000802800a00, nevents=200, timeout=0x00007fffdf3f7f30) at thr_syscalls.c:400
    frame #2: 0x0000000000260a93 haproxy`_do_poll(p=<unavailable>, exp=-1949595946, wake=0) at ev_kqueue.c:146
    frame #3: 0x00000000002e6fc2 haproxy`run_thread_poll_loop [inlined] run_poll_loop at haproxy.c:2550
    frame #4: 0x00000000002e6e62 haproxy`run_thread_poll_loop(data=<unavailable>) at haproxy.c:2634
    frame #5: 0x000000080042279b libthr.so.3`thread_start(curthread=0x0000000800cb1c00) at thr_create.c:292

It looks like the offending thread is the first one in this backtrace. Which frames, and what within them, should I analyse and post here? I would like to avoid sharing the core file itself as it likely contains confidential data.


capflam commented Jun 26, 2019

@Yuubari, a fix was pushed upstream and backported to 2.0; see commit c2518a5 for details. It should fix your bug. There is no snapshot with this commit yet, but I should do the 2.0.1 release today. Of course, feedback before the release would be cool :)

@capflam capflam added 1.9 This issue affects the HAProxy 1.9 stable branch. 2.0 This issue affects the HAProxy 2.0 stable branch. dev This issue affects the HAProxy development branch. and removed status: needs-triage This issue needs to be triaged. labels Jun 26, 2019

Yuubari commented Jun 26, 2019

@capflam, it seems to be working, but the crash usually took a few hours (often at least 10) to trigger, so I cannot really say I can properly test it before the release. Thank you for the swift fix; closing this issue now.

@Yuubari Yuubari closed this as completed Jun 26, 2019

capflam commented Jun 26, 2019

Ok, thanks. The Reopen button is still there if it crashes again :)

@capflam capflam added severity: major This issue is of MAJOR severity. status: fixed This issue is a now-fixed bug. subsystem: h1 The issue is within the HTTP/1 subsystem. and removed 1.9 This issue affects the HAProxy 1.9 stable branch. 2.0 This issue affects the HAProxy 2.0 stable branch. dev This issue affects the HAProxy development branch. labels Jun 26, 2019

Yuubari commented Jun 27, 2019

Now approaching 23 hours of haproxy uptime with the debug snapshot from yesterday, so it seems the crash was indeed fixed.


capflam commented Jun 27, 2019

Nice. Thanks for your feedback !

4 participants