
Haproxy 2.0 stuck thread #181

Closed
japeldoorn opened this issue Jul 22, 2019 · 11 comments
Labels
severity: critical This issue is of CRITICAL severity. status: fixed This issue is a now-fixed bug. type: bug This issue describes a bug.

Comments

@japeldoorn

Output of haproxy -vv and uname -a

HA-Proxy version 2.0.2 2019/07/16 - https://haproxy.org/
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-format-truncation -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wno-implicit-fallthrough -Wno-stringop-overflow -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_PCRE2=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER -PCRE -PCRE_JIT +PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED -REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=4).
Built with OpenSSL version : OpenSSL 1.1.1  11 Sep 2018
Running on OpenSSL version : OpenSSL 1.1.1  11 Sep 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE2 version : 10.31 2018-02-12
PCRE2 library supports JIT : no (USE_PCRE2_JIT not set)
Encrypted password support via crypt(3): yes

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE     mux=H2
              h2 : mode=HTTP       side=FE        mux=H2
       <default> : mode=HTX        side=FE|BE     mux=H1
       <default> : mode=TCP|HTTP   side=FE|BE     mux=PASS

Available services : none

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace


Linux proxy01 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

What's the configuration?

This is part of the configuration; it contains numerous frontends and backends similar to the one below.

global
        log 127.0.0.1 local3
        log 127.0.0.1 local4
        daemon
        stats socket /var/run/haproxy.sock mode 777 level admin expose-fd listeners
        stats socket ipv4@10.201.0.20:9999 level admin interface eth0
        stats socket ipv4@10.101.0.2:9999 level admin interface eth3
        stats timeout 30s
        maxconn 50000
        maxpipes 25000
        ulimit-n 200000
        spread-checks 5
        tune.ssl.default-dh-param 2048
        chroot /usr/local/var/lib/haproxy
        user haproxy
        group haproxy

        ssl-default-bind-ciphers TLS13-AES-256-GCM-SHA384:TLS13-AES-128-GCM-SHA256:TLS13-CHACHA20-POLY1305-SHA256:ECDH+AESGCM:ECDH+CHACHA20:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
        ssl-default-bind-options no-sslv3

        ca-base /opt/haproxy/certs
        crt-base /opt/haproxy/certs

        nbthread 1

defaults
        log global
        option forwardfor
        mode http
        option redispatch
        option httplog
#       balance roundrobin
        balance random
        hash-balance-factor 150

        default_backend ZiberWebsite_DS3
        option splice-auto
        option tcp-smart-connect
        option tcp-smart-accept
        option http-server-close
        option dontlognull
        option log-health-checks
        option log-separate-errors
        option contstats

        timeout server 90s
        timeout connect 20s
        timeout client 40s
        timeout http-request 30s
        timeout tunnel 1h
        default-server inter 10s fastinter 2s downinter 30s

# Do not output the default error-file for 408 errors.
# The default error file causes newer (Chrome) browsers to cancel the
# connection and show the 408 error to the client
        errorfile 408 /dev/null


frontend standaardsite
        acl invalid_clients hdr_sub(referer) -f /opt/haproxy/haproxy_blocked_referers.lst
        acl invalid_clients hdr_sub(user-agent) -f /opt/haproxy/haproxy_blocked_useragents.lst
        acl invalid_clients src -f /opt/haproxy/haproxy_blocked_ips.lst
        use_backend TARPIT if invalid_clients

        bind 91.233.52.202:80
        maxconn 10000
        acl is_content path_beg -i /content
        use_backend Content if is_content



backend ZiberWebsite_DS3
        option httpchk HEAD / HTTP/1.1\r\nHost:\ livecheck_ziberwebsite_ds3
        http-check disable-on-404
        server web01 10.101.0.20:80 check port 8080 weight 100 cookie web01
        server web02 10.101.0.21:80 check port 8080 weight 100 cookie web02
        server web03 10.101.0.22:80 check port 8080 weight 45 cookie web03
        server web04 10.101.0.23:80 check port 8080 weight 30 cookie web04
        server web05 10.101.0.24:80 check port 8080 weight 100 cookie web05
        server web06 10.101.0.25:80 check port 8080 weight 20 cookie web06
        server web07 10.101.0.26:80 check port 8080 weight 5  cookie web07 disabled

        http-request add-header X-Forwarded-Proto https if { ssl_fc }
        stick on src
        stick-table type ip size 200k expire 30m
        cookie SRV insert indirect nocache
        retry-on all-retryable-errors
        http-request disable-l7-retry if METH_POST

Steps to reproduce the behavior

  1. During normal operation, HAProxy 2.0 (we have tried 2.0.0, 2.0.1 and 2.0.2) seems to suffer from a stuck thread, which causes a reload of HAProxy and results in dropped connections.
  2. The behaviour occurs numerous times a day (but at random intervals).
  3. We changed the default balance method from roundrobin to random, without effect.
  4. We changed the number of threads to 1 (from the default of 4), without effect. (In case of 4 threads, the stuck thread can be any of the 4).

Actual behavior

HAProxy reloads, with the following logged to syslog:

haproxy[39744]: Thread 1 is about to kill the process.
haproxy[39744]: *>Thread 1 : act=1 glob=0 wq=1 rq=0 tl=1 tlsz=0 rqsz=0
haproxy[39744]:              stuck=1 fdcache=1 prof=0 harmless=0 wantrdv=0
haproxy[39744]:              cpu_ns: poll=16608864685 now=19068153922 diff=2459289237
haproxy[39744]:              curr_task=0x55e4154ce6b0 (task) calls=2 last=0
haproxy[39744]:                fct=0x55e40a998710 (process_stream) ctx=0x55e418445420
haproxy[39744]:              strm=0x55e418445420 src=188.166.31.210 fe=standaardsite be=ZiberWebsite_DS3 dst=unknown
haproxy[39744]:              rqf=44808002 rqa=2800 rpf=80000000 rpa=0 sif=EST,200028 sib=INI,30
haproxy[39744]:              af=(nil),0 csf=0x55e4132b0230,4000
haproxy[39744]:              ab=(nil),0 csb=(nil),0
haproxy[39744]:              cof=0x55e4188a17d0,80201300:H1(0x55e4188fdbd0)/RAW((nil))/tcpv4(347)
haproxy[39744]:              cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0)
haproxy[39744]: [ALERT] 202/120446 (39744) : Current worker #1 (39758) exited with code 134 (Aborted)
haproxy[39744]: [ALERT] 202/120446 (39744) : exit-on-failure: killing every processes with SIGTERM
haproxy[39744]: [WARNING] 202/120446 (39744) : All workers exited. Exiting... (134)
systemd[1]: haproxy.service: Main process exited, code=exited, status=134/n/a
systemd[1]: haproxy.service: Failed with result 'exit-code'.
systemd[1]: haproxy.service: Service hold-off time over, scheduling restart.
systemd[1]: haproxy.service: Scheduled restart job, restart counter is at 159.

On our fallback HAProxy instances (running 1.8.14 and earlier) we have not seen this behaviour.

@japeldoorn japeldoorn added status: needs-triage This issue needs to be triaged. type: bug This issue describes a bug. labels Jul 22, 2019
@wtarreau
Member

There is nothing abnormal or unusual in your configuration that would likely explain this.
From the trace it looks like something prevents the state from converging in process_stream() before performing the load balancing. It would help a lot if you were able to take a core dump and to pass it through gdb to decode the backtrace ("thread apply all bt full") next time it happens. In any case, once you get a core, please don't delete it; keep it safe along with a copy of the haproxy executable in case we need to ask you to look at it.

In order to get a core dumped, you will need either to disable the user/group settings or to add the "set-dumpable" directive in the global section. Also, depending on how core dumps are collected on your machine, you may need to disable the chroot or to configure a service to be able to capture them. You can check /proc/sys/kernel/core_pattern to see whether a particular service is involved or whether a relative or absolute path is used.
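For the configuration above, the changes described amount to something like the following in the global section (a sketch only; whether chroot must be relaxed depends on what the local core_pattern contains):

```
global
        # allow the worker to dump core despite the user/group drop
        set-dumpable
        # if core_pattern uses a relative path, the core would be written
        # inside the chroot, so the chroot may need to be commented out:
        # chroot /usr/local/var/lib/haproxy
```

Before restarting the service, it may also be necessary to raise the core size limit (e.g. `ulimit -c unlimited`, or `LimitCORE=infinity` in the systemd unit) and to check `/proc/sys/kernel/core_pattern` as mentioned above.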

@wtarreau wtarreau added status: feedback required The developers are waiting for a reply from the reporter. and removed status: needs-triage This issue needs to be triaged. labels Jul 22, 2019
@japeldoorn
Author

Thank you for your quick response.

We've managed to get a core dump (.crash); passing it through gdb yields the following:


Thread 1 (Thread 0x7fdc2c43b0c0 (LWP 48710)):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
        set = {__val = {8192, 0 <repeats 13 times>, 1179670597, 0}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#1  0x00007fdc2a764801 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x1, sa_sigaction = 0x1}, sa_mask = {__val = {140734641298274, 18446744067267100671, 18446744073709551615 <repeats 14 times>}}, sa_flags = -1, sa_restorer = 0xfffffffffffdfed8}
        sigs = {__val = {32, 0 <repeats 15 times>}}
        __cnt = <optimized out>
        __set = <optimized out>
        __cnt = <optimized out>
        __set = <optimized out>
#2  0x000055d77cc11c4e in ha_panic () at src/debug.c:164
No locals.
#3  0x000055d77cc11ec8 in wdt_handler (sig=14, si=<optimized out>, arg=<optimized out>) at src/wdt.c:123
        n = <optimized out>
        p = <optimized out>
        thr = 0
#4  <signal handler called>
No locals.
#5  0x000055d77cb02b98 in htx_manage_client_side_cookies (s=s@entry=0x55d78a6caa70, req=req@entry=0x55d78a6caa80) at src/proto_htx.c:4058
        sess = 0x55d783643750
        txn = 0x55d78a6caec0
        htx = <optimized out>
        ctx = {blk = 0x55d78a5dc0f0, value = {
            ptr = 0x55d78a5d817c ";user-agentMozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36accepttext/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8ac"...,
            len = 1}, lws_before = 0, lws_after = 0}
        hdr_beg = 0x55d78a5d817c ";user-agentMozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36accepttext/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8ac"...
        hdr_end = <optimized out>
        del_from = 0x0
        prev = 0x55d78a5d817c ";user-agentMozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36accepttext/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8ac"...
        att_beg = <optimized out>
        att_end = <optimized out>
        equal = <optimized out>
        val_beg = 0x55d78a5d817c ";user-agentMozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36accepttext/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8ac"...
        val_end = 0x55d78a5d817c ";user-agentMozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36accepttext/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8ac"...
        next = 0x55d78a5d817c ";user-agentMozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36accepttext/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8ac"...
        preserve_hdr = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
#6  0x000055d77cb0a96d in htx_process_request (s=s@entry=0x55d78a6caa70, req=req@entry=0x55d78a6caa80, an_bit=an_bit@entry=2048) at src/proto_htx.c:817
        sess = 0x55d783643750
        txn = 0x55d78a6caec0
        msg = 0x55d78a6caf20
        htx = <optimized out>
#7  0x000055d77caefcaf in http_process_request (s=0x55d78a6caa70, req=0x55d78a6caa80, an_bit=2048) at src/proto_http.c:2806
        sess = 0x55d783643750
        txn = <optimized out>
        msg = <optimized out>
        cli_conn = <optimized out>
#8  0x000055d77cb1a715 in process_stream (t=t@entry=0x55d783a6fad0, context=0x55d78a6caa70, state=<optimized out>) at src/stream.c:2096
        max_loops = <optimized out>
        ana_list = 10240
        ana_back = 10240
        flags = <optimized out>
        srv = <optimized out>
        s = 0x55d78a6caa70
        sess = <optimized out>
        rqf_last = <optimized out>
        rpf_last = 2147483648
        rq_prod_last = 8
        rq_cons_last = 0
        rp_cons_last = 8
        rp_prod_last = 0
        req_ana_back = <optimized out>
        req = 0x55d78a6caa80
        res = 0x55d78a6caae0
        si_f = 0x55d78a6cad18
        si_b = 0x55d78a6cad70
        rate = <optimized out>
#9  0x000055d77cbef085 in process_runnable_tasks () at src/task.c:412
        t = 0x55d783a6fad0
        state = <optimized out>
        ctx = <optimized out>
        process = <optimized out>
        lrq = <optimized out>
        grq = <optimized out>
        t = <optimized out>
        max_processed = 198
#10 0x000055d77cb59b54 in run_poll_loop () at src/haproxy.c:2516
        next = <optimized out>
        wake = <optimized out>
        next = <optimized out>
        wake = <optimized out>
#11 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2637
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}
        init_cond = pthread_cond_t = {Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No}
#12 0x000055d77cab69e2 in main (argc=<optimized out>, argv=0x7fff564b2f18) at src/haproxy.c:3314
        blocked_sig = {__val = {18446744067199990583, 18446744073709551615 <repeats 15 times>}}
        old_sig = {__val = {0, 10333040384448973056, 140734641155920, 140734641155424, 8, 32, 12884901894, 7, 48, 94384019685400, 80, 18446744073709409816, 0, 206158430211, 0, 0}}
        i = <optimized out>
        err = <optimized out>
        retry = <optimized out>
        limit = {rlim_cur = 200000, rlim_max = 200000}
        errmsg = "\000/KV\377\177\000\000J\277\265|\327U\000\000n\000\000\000[\000\000\000`-KV\377\177\000\000\b\000\000\000\000\000\000\000bu\253|\327U\000\000\030\326\375\377\377\377\377\377\260\033\371}\327U\000\000\017", '\000' <repeats 34 times>
        pidfd = <optimized out>

I hope this is usable output.

@cognet
Contributor

cognet commented Jul 22, 2019

Hi,

Thanks a lot! It's been very useful and we can now reproduce it, so we should be able to fix it soon.

@wtarreau wtarreau added status: reviewed This issue was reviewed. A fix is required. and removed status: feedback required The developers are waiting for a reply from the reporter. labels Jul 22, 2019
haproxy-mirror pushed a commit that referenced this issue Jul 23, 2019
…by a delimiter

When client-side or server-side cookies are parsed, HAProxy enters an
infinite loop if a Cookie/Set-Cookie header value starts with a delimiter (a colon
or a semicolon). Depending on the operating system, the service may become
degraded, unresponsive, or may trigger haproxy's watchdog, causing a service stop
or automatic restart.

To fix this bug, in the loop parsing the attributes, we must be sure to always
skip delimiters once the first attribute-value pair has been parsed, empty or
not. The credit for the fix goes to Olivier.

CVE-2019-14241 was assigned to this bug. This patch fixes the Github issue #181.

This patch must be backported to 2.0 and 1.9. However, the patch will have to be
adapted.
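The delimiter-skipping logic described in the commit message can be illustrated with a minimal, self-contained sketch. This is not HAProxy's actual htx_manage_client_side_cookies() code; count_cookie_pairs is a hypothetical helper written only to show why the cursor must always advance past delimiters, even when the current attribute-value pair is empty:

```c
#include <assert.h>
#include <string.h>

/* Count attribute-value pairs in a simplified cookie-like string.
 * The key point of the fix: delimiters (';' here) are consumed
 * unconditionally after each pair, empty or not. Without that step,
 * a value starting with ';' would leave the cursor stalled and the
 * loop would never terminate. */
static int count_cookie_pairs(const char *hdr)
{
    const char *p = hdr;
    const char *end = hdr + strlen(hdr);
    int pairs = 0;

    while (p < end) {
        const char *att_beg = p;

        /* scan one attribute[=value] up to the next delimiter */
        while (p < end && *p != ';')
            p++;

        if (p > att_beg)
            pairs++;            /* non-empty pair */

        /* the fix: always skip delimiter(s), even after an empty
         * pair, so the cursor is guaranteed to advance */
        while (p < end && *p == ';')
            p++;
    }
    return pairs;
}
```

With this structure, an input such as ";foo=bar" (a value starting with a delimiter, as in the crashing request) is handled in one pass instead of looping forever.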
@wtarreau wtarreau added status: fixed This issue is a now-fixed bug. and removed status: reviewed This issue was reviewed. A fix is required. labels Jul 23, 2019
@wtarreau
Member

A fix for this was issued in 1.9.9 and 2.0.3. Please upgrade to any of these versions as this bug can be triggered on purpose, making your server potentially vulnerable to an attack.

@wtarreau
Member

Closing the issue now.

@TimWolla TimWolla added the severity: critical This issue is of CRITICAL severity. label Jul 23, 2019
@carlwgeorge

The CVE indicates that this affects versions 1.4 through 1.9.8 and 2.0.0 through 2.0.2. The home page does say that 1.4 is unmaintained, but will this fix be backported to 1.5, 1.6, 1.7, and 1.8?

@wtarreau
Member

The CVE says crap unfortunately, as almost always for CVEs. I took great care to explain that only 2.0.0 to 2.0.2 and 1.9.0 to 1.9.8 were vulnerable, and they translated this to "all haproxy up to 2.0.2", then somehow reformulated it as 1.4 to 1.9.8. I responded to the CVE form to get this mess fixed and, as expected, there was neither a response nor a fix. It's confusing for users. I really don't know why I still try to get CVEs assigned; I don't think I've ever seen one done correctly, whatever the project that had them...

@carlwgeorge

Thanks for the clarification @wtarreau.

@max2k1

max2k1 commented Jul 29, 2019

Looks like it's still not fixed. HAProxy 2.0.3:

backtrace:

Core was generated by `/usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fd665728428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fd66701f180 (LWP 29741))]
(gdb) t a a bt

Thread 4 (Thread 0x7fd662526700 (LWP 29745)):
#0  0x0000564c376805c6 in process_srv_queue (s=0x564c3ab726e0) at src/queue.c:316
#1  0x0000564c375afe2a in stream_free (s=0x7fd65c542670) at src/stream.c:382
#2  process_stream (t=<optimized out>, context=0x7fd65c542670, state=<optimized out>) at src/stream.c:2739
#3  0x0000564c37684169 in process_runnable_tasks () at src/task.c:412
#4  0x0000564c375ed7a8 in run_poll_loop () at src/haproxy.c:2516
#5  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2637
#6  0x00007fd66682a6ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7  0x00007fd6657fa41d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 3 (Thread 0x7fd663528700 (LWP 29743)):
#0  0x0000564c37680ca7 in pendconn_grab_from_px (s=s@entry=0x564c3ab726e0) at src/queue.c:446
#1  0x0000564c375e233e in server_warmup (t=0x564c3cc81d20, context=0x564c3ab726e0, state=<optimized out>) at src/checks.c:1505
#2  0x0000564c376844c5 in process_runnable_tasks () at src/task.c:414
#3  0x0000564c375ed7a8 in run_poll_loop () at src/haproxy.c:2516
#4  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2637
#5  0x00007fd66682a6ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#6  0x00007fd6657fa41d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 2 (Thread 0x7fd662d27700 (LWP 29744)):
#0  0x0000564c376809df in pendconn_queue_lock (p=<optimized out>, p=<optimized out>) at src/queue.c:157
#1  pendconn_add (strm=strm@entry=0x7fd654194dc0) at src/queue.c:369
#2  0x0000564c3763dd38 in assign_server_and_queue (s=s@entry=0x7fd654194dc0) at src/backend.c:1037
#3  0x0000564c37640f28 in assign_server_and_queue (s=0x7fd654194dc0) at include/proto/backend.h:104
#4  srv_redispatch_connect (s=s@entry=0x7fd654194dc0) at src/backend.c:1700
#5  0x0000564c375abad5 in sess_prepare_conn_req (s=0x7fd654194dc0) at src/stream.c:1237
#6  process_stream (t=<optimized out>, context=0x7fd654194dc0, state=<optimized out>) at src/stream.c:2400
#7  0x0000564c37684169 in process_runnable_tasks () at src/task.c:412
#8  0x0000564c375ed7a8 in run_poll_loop () at src/haproxy.c:2516
#9  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2637
#10 0x00007fd66682a6ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007fd6657fa41d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 1 (Thread 0x7fd66701f180 (LWP 29741)):
#0  0x00007fd665728428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fd66572a02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x0000564c376a764e in ha_panic () at src/debug.c:164
#3  0x0000564c376a78c8 in wdt_handler (sig=14, si=<optimized out>, arg=<optimized out>) at src/wdt.c:123
#4  <signal handler called>
#5  0x0000564c37680a5a in pendconn_queue_lock (p=<optimized out>, p=<optimized out>) at src/queue.c:157
#6  pendconn_add (strm=strm@entry=0x564c3d1259e0) at src/queue.c:369
#7  0x0000564c3763dd38 in assign_server_and_queue (s=s@entry=0x564c3d1259e0) at src/backend.c:1037
#8  0x0000564c37640f28 in assign_server_and_queue (s=0x564c3d1259e0) at include/proto/backend.h:104
#9  srv_redispatch_connect (s=s@entry=0x564c3d1259e0) at src/backend.c:1700
#10 0x0000564c375abad5 in sess_prepare_conn_req (s=0x564c3d1259e0) at src/stream.c:1237
#11 process_stream (t=<optimized out>, context=0x564c3d1259e0, state=<optimized out>) at src/stream.c:2400
#12 0x0000564c37684169 in process_runnable_tasks () at src/task.c:412
#13 0x0000564c375ed7a8 in run_poll_loop () at src/haproxy.c:2516
#14 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2637
#15 0x0000564c3754aa27 in main (argc=<optimized out>, argv=0x7ffe193247a8) at src/haproxy.c:3314
haproxy -vv
HA-Proxy version 2.0.3-1 2019/07/24 - https://haproxy.org/
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_REGPARM=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_TFO=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=56).
Built with OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
Running on OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.1
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE2 version : 10.21 2016-01-12
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with the Prometheus exporter as a service

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE     mux=H2
              h2 : mode=HTTP       side=FE        mux=H2
       <default> : mode=HTX        side=FE|BE     mux=H1
       <default> : mode=TCP|HTTP   side=FE|BE     mux=PASS

Available services :
	prometheus-exporter

Available filters :
	[SPOE] spoe
	[COMP] compression
	[CACHE] cache
	[TRACE] trace

@wtarreau
Member

Ah that's bad, it's a clear AB/BA locking issue :-(

server_warmup() calls pendconn_grab_from_px() under the server's lock, but this one then takes the proxy's lock. In parallel, process_srv_queue() does exactly the opposite.

I'm checking if we can just safely switch the two locks in process_srv_queue() to validate the impacts. Thank you!
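The standard remedy for an AB/BA inversion like the one described here is to impose a single global acquisition order on the two locks, so that no thread ever holds B while waiting for A. A minimal pthread sketch, with hypothetical proxy_lock/server_lock mutexes standing in for HAProxy's actual locks (path_a, path_b and run_both are illustrative names, not HAProxy functions):

```c
#include <pthread.h>

/* Hypothetical stand-ins for the proxy and server locks discussed above. */
static pthread_mutex_t proxy_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t server_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter;

/* Both code paths acquire the locks in the same fixed order
 * (proxy first, then server), which rules out the AB/BA deadlock. */
static void *path_a(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&proxy_lock);
        pthread_mutex_lock(&server_lock);
        shared_counter++;
        pthread_mutex_unlock(&server_lock);
        pthread_mutex_unlock(&proxy_lock);
    }
    return NULL;
}

static void *path_b(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&proxy_lock);   /* same order, never server first */
        pthread_mutex_lock(&server_lock);
        shared_counter++;
        pthread_mutex_unlock(&server_lock);
        pthread_mutex_unlock(&proxy_lock);
    }
    return NULL;
}

/* Run both paths concurrently; with consistent ordering this always
 * completes instead of deadlocking. */
static int run_both(void)
{
    pthread_t t1, t2;

    shared_counter = 0;
    pthread_create(&t1, NULL, path_a, NULL);
    pthread_create(&t2, NULL, path_b, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_counter;
}
```

The alternative mentioned in the comment, switching the lock order in process_srv_queue() so it matches server_warmup()'s order, is exactly this principle applied to one of the two offending paths.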

@wtarreau wtarreau reopened this Jul 29, 2019
@wtarreau
Member

Let's create a new bug for this one which is completely different or we'll totally mess up backports.

jkoelker pushed a commit to jkoelker/haproxy that referenced this issue Sep 27, 2019
…by a delimiter

(cherry picked from commit f0f4238)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
FireBurn pushed a commit to FireBurn/haproxy that referenced this issue Jan 29, 2020
…by a delimiter

(cherry picked from commit f0f4238)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit fc7f52e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>