Random high cpu usage with QUIC/H3 #1903

Closed
gabrieltz opened this issue Oct 20, 2022 · 229 comments
Labels
status: fixed This issue is a now-fixed bug. type: bug This issue describes a bug.

Comments

@gabrieltz

Detailed Description of the Problem

2.7-dev7 and dev8 both appear to consume all available CPU resources on the system: usage goes from 30-40% CPU to 1400% and stays there for hours until we restart it.
While haproxy is in that state, all CPU cores are in the C0 state 100% of the time, whereas normal usage is 5-10% in C0.
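For reference, per-core busy time can be watched with something like the following (a sketch; it assumes the sysstat package's mpstat is available, which is not stated above):

# print per-core utilization every second; %idle close to 0 on every core
# matches the "all cores in C0" observation above
mpstat -P ALL 1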

This is the output of show info using socat
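(For completeness, a command of this form was presumably used; the socket path is the admin socket defined in the configuration below, and the same approach applies to the show threads output further down.)

echo "show info"    | socat /run/haproxy/admin-balancer.sock stdio
echo "show threads" | socat /run/haproxy/admin-balancer.sock stdio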

Name: HAProxy
Version: 2.7-dev8-ea8aebe
Release_date: 2022/10/14
Nbthread: 16
Nbproc: 1
Process_num: 1
Pid: 539549
Uptime: 0d 22h18m18s
Uptime_sec: 80298
Memmax_MB: 0
PoolAlloc_MB: 641
PoolUsed_MB: 641
PoolFailed: 0
Ulimit-n: 200200
Maxsock: 200200
Maxconn: 100000
Hard_maxconn: 100000
CurrConns: 148
CumConns: 6792149
CumReq: 7641394
MaxSslConns: 0
CurrSslConns: 140
CumSslConns: 3620642
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 60
ConnRateLimit: 0
MaxConnRate: 315
SessRate: 60
SessRateLimit: 0
MaxSessRate: 315
SslRate: 57
SslRateLimit: 0
MaxSslRate: 193
SslFrontendKeyRate: 37
SslFrontendMaxKeyRate: 100
SslFrontendSessionReuse_pct: 34
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 0
SslCacheLookups: 1356608
SslCacheMisses: 33095
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
ZlibMemUsage: 0
MaxZlibMemUsage: 0
Tasks: 429
Run_queue: 6
Idle_pct: 18
node: thor
Stopping: 0
Jobs: 201
Unstoppable Jobs: 1
Listeners: 54
ActivePeers: 0
ConnectedPeers: 0
DroppedLogs: 0
BusyPolling: 0
FailedResolutions: 0
TotalBytesOut: 110920837010
TotalSplicdedBytesOut: 0
BytesOutRate: 1631776
DebugCommandsIssued: 0
CumRecvLogs: 0
Build info: 2.7-dev8-ea8aebe
Memmax_bytes: 0
PoolAlloc_bytes: 672458896
PoolUsed_bytes: 672458896
Start_time_sec: 1666168649
Tainted: 0

Output of show threads:

  Thread 1 : id=0x7fc5d1ccd980 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/1    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=57000299520906 now=57000299525755 diff=4849
  Thread 2 : id=0x7fc5d1cca700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/2    stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=54178651790329 now=54178651798857 diff=8528
  Thread 3 : id=0x7fc5c314c700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=0 rqsz=5
      1/3    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=353159608409 now=54678576125366 diff=54325416516957
  Thread 4 : id=0x7fc5c294b700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/4    stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=56651185302360 now=56651185309114 diff=6754
  Thread 5 : id=0x7fc5c214a700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/5    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=57523634015460 now=57523634017621 diff=2161
  Thread 6 : id=0x7fc5c1949700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/6    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=53458315729557 now=53458315733967 diff=4410
  Thread 7 : id=0x7fc5c1148700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/7    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=57288119069930 now=57288119072740 diff=2810
  Thread 8 : id=0x7fc5c0947700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/8    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=57187809175472 now=57187809186982 diff=11510
* Thread 9 : id=0x7fc5c0146700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=1 rqsz=1
      1/9    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=54341108868004 now=54341108938803 diff=70799
             curr_task=0x7fc5900d6310 (task) calls=2 last=0
               fct=0x55c906109760(task_run_applet) ctx=0x7fc5902b8130(<CLI>)
             strm=0x7fc5908c0490,8 src=unix fe=GLOBAL be=GLOBAL dst=<CLI>
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=c0c023 rqa=0 rpf=80008000 rpa=0
             scf=0x7fc59054fbd0,EST,0 scb=0x7fc59027d450,EST,1
             af=(nil),0 sab=0x7fc5902b8130,9
             cof=0x7fc5908a5c60,40300:PASS(0x7fc5901c1210)/RAW((nil))/unix_stream(121)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 10: id=0x7fc5bf945700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/10   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=57131730710375 now=57131730713184 diff=2809
  Thread 11: id=0x7fc5bf144700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/11   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=57109074489307 now=57109074496683 diff=7376
  Thread 12: id=0x7fc5be943700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/12   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=53686682253767 now=53686682261219 diff=7452
  Thread 13: id=0x7fc5be142700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/13   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=57774917877018 now=57774917878421 diff=1403
  Thread 14: id=0x7fc5bd941700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/14   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=54471852912156 now=54471852915629 diff=3473
  Thread 15: id=0x7fc5bd140700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/15   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=53871773544025 now=53871773549479 diff=5454
  Thread 16: id=0x7fc5bc93f700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
      1/16   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=57535516781435 now=57535516788259 diff=6824

But when I ran show tasks, haproxy hung and never returned anything; it also stopped servicing requests and I had to restart it.

I also captured a perf record -g:

Screenshot from 2022-10-20 10-11-36

Expected Behavior

Normal CPU usage.

Steps to Reproduce the Behavior

None, sorry.

Do you have any idea what may have caused this?

I suspect it has to do with QUIC / H3.

Do you have an idea how to solve the issue?

No response

What is your configuration?

I trimmed the config down to the pieces that include QUIC/H3, which may be where the bug is;
otherwise it's 39 more backends and 5 more frontends.


global
	daemon
	maxconn 100000
	log /dev/log local3
	pidfile /run/haproxy.pid
	chroot /var/lib/haproxy
	stats socket /run/haproxy/admin-balancer.sock mode 660 level admin expose-fd listeners
	stats socket ipv4@balancer_lan_ip:9999 level admin
	stats timeout 30s
	### multithreading
	nbthread 16
	cpu-map auto:1/1-16 0-15
	### end multithreading
	user haproxy
	group haproxy
	server-state-base /etc/haproxy/state/
	server-state-file /etc/haproxy/state/current
	ca-base /etc/ssl/certs/
	crt-base /etc/ssl/private/
	ssl-default-bind-ciphers kEECDH+aRSA+AES:kRSA+AES:+AES256:!RC4-SHA:!kEDH:!LOW:!EXP:!MD5:!aNULL:!eNULL
	tune.bufsize 32768

	ssl-default-bind-options no-sslv3 no-tls-tickets
	ssl-dh-param-file /etc/haproxy/dhparams.pem
	tune.ssl.cachesize 500000
	tune.ssl.default-dh-param 2048
        tune.ssl.lifetime 360
	tune.ssl.maxrecord 1460
        tune.ssl.ssl-ctx-cache-size 1000
	hard-stop-after 180s

resolvers consul
        nameserver consul1    consulip1:8600
        nameserver consul2    consulip2:8600
        nameserver consul3    consulip3:8600
        accepted_payload_size 8192
	hold valid 	      10s
        resolve_retries       10
        timeout resolve       2s
        timeout retry         2s

defaults
        mode http
        timeout connect 3s
        timeout http-request 3s
        timeout client 10s
        timeout server 40s
	timeout http-keep-alive 2s
	timeout client-fin 15s
        backlog 262144
        load-server-state-from-file global

	option forwardfor
	option http-server-close
        option httplog
        option dontlognull
	option dontlog-normal
	option log-separate-errors

        errorfile 400 /etc/haproxy/errors/400.http
        errorfile 403 /etc/haproxy/errors/403.http
        errorfile 408 /dev/null
        errorfile 500 /etc/haproxy/errors/500.http
        errorfile 502 /etc/haproxy/errors/502.http
        errorfile 503 /etc/haproxy/errors/503.http
        errorfile 504 /etc/haproxy/errors/504.http



frontend domain-ssl
	mode http
        option dontlog-normal
        option forwardfor except balancer_lan_ip
        option http-buffer-request
	maxconn 10000
        backlog 262144
	tcp-request inspect-delay 10s
        tcp-request content track-sc0 ssl_fc_protocol table st_ssl_stats
	log /dev/log local3
	timeout http-request 6s
        timeout client 60s
        timeout client-fin 15s
        no option http-keep-alive
        no option http-server-close
        no option httpclose

	bind vlanX_ip:443 tfo ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/5
	bind vlanX_ip:443 tfo ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/6
	bind vlanX_ip:443 tfo ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/7
	bind vlanX_ip:443 tfo ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/8
	bind vlanX_ip:443 tfo ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/9
	bind vlanX_ip:443 tfo ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/10
	bind vlanX_ip:443 tfo ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/11
	bind vlanX_ip:443 tfo ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/12
	bind vlanX_ip:443 tfo ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/13
	bind vlanX_ip:443 tfo ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/14
	bind vlanX_ip:443 tfo ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/15
	bind vlanX_ip:443 tfo ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/16
	# ====================
	bind vlanY_ip:443 ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/5
	bind vlanY_ip:443 ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/6
	bind vlanY_ip:443 ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/7
	bind vlanY_ip:443 ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/8
	bind vlanY_ip:443 ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/9
	bind vlanY_ip:443 ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/10
	bind vlanY_ip:443 ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/11
	bind vlanY_ip:443 ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/12
	bind vlanY_ip:443 ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/13
	bind vlanY_ip:443 ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/14
	bind vlanY_ip:443 ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/15
	bind vlanY_ip:443 ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h2,http/1.1 thread 1/16
	### quic / http3 / h3 ###
	bind quic4@vlanX_ip:443 ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h3
	bind quic4@vlanY_ip:443 ssl crt /etc/ssl/private/cert1.haproxy.pem crt /etc/ssl/private/certbot/cert2.pem alpn h3

	bind balancer_lan_ip:344
	capture request header Host len 32
	capture request header X-Forwarded-For len 50
        capture request header Referer len 32

	acl our-nets src <some_ips>
	acl is-dev hdr_sub(Host) -i .dev. # contains .dev. 
	acl is-dev hdr_beg(Host) -i dev.  # begins with dev
        http-request add-header X-CLIENT-IP %[src]
	http-request tarpit if is-dev !our-nets
	http-request set-header X-HTTPS on
        http-request deny if METH_TRACE
	http-request deny if { hdr(Host) -m ip 0.0.0.0/0 }
        http-request silent-drop if { hdr(proxy-authorization) -m found }

	### catch old version , no refer header, and /item/<id> with no title to catch it.
        acl old_chrome hdr(user-agent) -m sub Chrome/4 Chrome/5 Chrome/6 Chrome/7
        acl has_referer hdr(referer) -m found
        acl get_no_title path_reg ^/item/\d{10}$
        http-request deny deny_status 410 if get_no_title !has_referer old_chrome

        # use agent filter 
	acl bad_ua hdr_sub(user-agent) -f /etc/haproxy/blocked-user-agents.txt
        acl bad_ua hdr(user-agent) -- r
        http-request deny if bad_ua

        ##### Throttle clients
        acl exclude_tracking hdr_dom(Host) -m end a_few_hostnames_here
	acl exclude_tracking path -i -m end .jpg .png .ico .js .svg .css
        stick-table  type ip size 50k  expire 30s  store http_req_rate(10s)
        http-request track-sc1 src unless exclude_tracking || our-nets
        http-request deny deny_status 429 if { sc_http_req_rate(1) gt 100 }

        acl is-robots path_beg /robots.txt
        acl no-robots hdr_dom(Host) a_few_hostnames_here
	http-request replace-header Host (.*) robot-server if is-robots no-robots
        http-request set-path /robots.disallow.txt if is-robots no-robots

        acl is-go-to-main hdr_end(Host) -i main_hostname main_hostname:443
        acl is-valid hdr_beg(Host) -i -f /etc/haproxy/are-valid-hosts.txt 	# hosts listed here are valid and will not be redirected.
        http-request redirect location https://valid_hostname%[capture.req.uri] code 301 if is-go-to-main !is-valid

	### acl below may be removed
        acl is-meh hdr_dom(Host) -i meh.domain

        use_backend pubsub if { hdr_dom(Host) -i pubsub.hostname }
	use_backend dist if { hdr_dom(Host) -i dist.hostname }
	use_backend %[str(active),map(/etc/haproxy/rpc-proxy.map,rpc-proxy-blue)] if { hdr(host) rpc.hostname }

	use_backend %[str(active),map(/etc/haproxy/meh-content.map,meh-content-green)] if { hdr_dom(host) meh.hostname }
	use_backend active-web if is-go-to-main
	default_backend main-cluster

### end domain-ssl frontend.

####################################### BACKENDS #####################################

backend main-cluster
	mode http
	log /dev/log local3 
	balance roundrobin
	option forwardfor

        server-template php74srv 4 _ph74-web._tcp.service.hello.consul maxconn 90 weight 50 resolvers consul check observe layer4 error-limit 4 on-error mark-down

backend pubsub
        mode http
	no log
        timeout tunnel          1200s
        fullconn                3000
        server pubsub:staging   <ip_here>:4001 check weight 10
	server-template pubsubsrv 1 _pubsub._tcp.service.hello.consul resolvers consul check observe layer4 backup

############################ end rpc-proxy blue/green deploy using haproxy maps ###########################
backend rpc-proxy-green
        mode http
        fullconn 5000
        timeout server 300s
	log /dev/log local3
        option httpclose
        option httpchk GET /status HTTP/1.0
        http-response add-header  X-Color green
	http-response set-header alt-svc "h3=\":443\";ma=180;"
        server-template rpcproxygreensrv 5 _rpc-proxy-green._tcp.service.hello.consul resolvers consul check observe layer4 maxconn 50

backend rpc-proxy-blue
        mode http
        fullconn 5000
	log /dev/log local3
        timeout server 300s
        option httpclose
        option httpchk GET /status HTTP/1.0
        http-response add-header X-Color blue
	http-response set-header alt-svc "h3=\":443\";ma=180;"
        server-template rpcproxybluesrv 5 _rpc-proxy-blue._tcp.service.hello.consul resolvers consul check observe layer4 maxconn 50

############################ end rpc-proxy-bicolor setup ###########################


backend meh-content-green
        mode http
	fullconn 1000
        option httpclose
        log global
        http-response set-header  X-Color green
        server-template green-content 4  _meh-content-green._tcp.service.hello.consul maxconn 50 weight 50 resolvers consul check observe layer4 error-limit 4 on-error mark-down

backend meh-content-blue
        mode http
	fullconn 1000
        option httpclose
        log global
        http-response set-header  X-Color blue
        server-template blue-content 4 _meh-content-blue._tcp.service.hello.consul maxconn 50 weight 50 resolvers consul check observe layer4 error-limit 4 on-error mark-down

############################ end meh-bicolor setup ###########################

backend activeweb
	mode http
	timeout server 60s
	http-response set-header alt-svc "h3=\":443\";ma=180;"

	server-template websrv 8 _activeweb._tcp.service.hello.consul maxconn 200 weight 100 resolvers consul check observe layer4 error-limit 4 on-error mark-down

listen stats 
	bind balancer_lan_ip:8080
	log global
	mode http
        http-request use-service prometheus-exporter if { path /metrics }
	stats enable
	#stats hide-version
	stats realm Haproxy\ Statistics
	stats uri /
	stats auth yes:no


Output of `haproxy -vv`

HAProxy version 2.7-dev8-ea8aebe 2022/10/14 - https://haproxy.org/
Status: development branch - not safe for use in production.
Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open
Running on: Linux 5.4.0-110-generic #124-Ubuntu SMP Thu Apr 14 19:46:19 UTC 2022 x86_64
Build options :
  TARGET  = generic
  CPU     = native
  CC      = cc
  CFLAGS  = -O2 -march=native -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_EPOLL=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_THREAD=1 USE_LINUX_SPLICE=1 USE_OPENSSL=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 USE_SYSTEMD=1 USE_QUIC=1 USE_PROMEX=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : +EPOLL -KQUEUE -NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL +THREAD -PTHREAD_EMULATION -BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY -LINUX_TPROXY +LINUX_SPLICE -LIBCRYPT -CRYPT_H -ENGINE -GETADDRINFO +OPENSSL -LUA -ACCEPT4 -CLOSEFROM +ZLIB -SLZ +CPU_AFFINITY -TFO -NS -DL -RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER -PRCTL -PROCCTL -THREAD_DUMP -EVPORTS -OT +QUIC +PROMEX -MEMORY_PROFILING -SHM_OPEN

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=16).
Built with OpenSSL version : OpenSSL 3.0.6+quic 11 Oct 2022
Running on OpenSSL version : OpenSSL 3.0.6+quic 11 Oct 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
OpenSSL providers loaded : default
Built with the Prometheus exporter as a service
Support for malloc_trim() is enabled.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.34 2019-11-21
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): no
Built with gcc compiler version 9.4.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
       quic : mode=HTTP  side=FE     mux=QUIC  flags=HTX|NO_UPG|FRAMED
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter
Available filters :
	[BWLIM] bwlim-in
	[BWLIM] bwlim-out
	[CACHE] cache
	[COMP] compression
	[FCGI] fcgi-app
	[SPOE] spoe
	[TRACE] trace
Linux thor 5.4.0-110-generic #124-Ubuntu SMP Thu Apr 14 19:46:19 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux


Last Outputs and Backtraces

There are no disturbing log entries.

Additional Information

  1. There is an increase in incoming UDP errors when haproxy is in this state (see the monitoring commands sketched after this list).
  2. Context switches/sec drop from a 16-17k average to about 3k when stuck.
  3. Sometimes, if we leave haproxy in this state for more than 12 hours, TCP TIME_WAIT connections start to pile up until we restart haproxy.
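For reference, these can be tracked with standard tools (assuming net-tools and iproute2 are installed), e.g.:

# UDP error counters (observation 1)
netstat -su | grep -i error

# number of connections currently in TIME_WAIT (observation 3)
ss -tan state time-wait | tail -n +2 | wc -l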
@gabrieltz gabrieltz added status: needs-triage This issue needs to be triaged. type: bug This issue describes a bug. labels Oct 20, 2022
@Tristan971
Member

I experienced pretty similar behaviour and had to disable QUIC advertisement as a result, but didn't know exactly what data to collect and share for an issue about it.

Happy to share any relevant command output if it helps too.

@a-denoyelle
Contributor

Thanks for the report. Can you confirm that the issue is not present in 2.7-dev6?

@Tristan971
Member

Tristan971 commented Oct 20, 2022

It is present in ae1e14d, which is part of 2.7-dev6:

HAProxy version 2.7-dev6-ae1e14d+mangadex-c85c6fd 2022-09-20T15:45+00:00 - https://haproxy.org/
Status: development branch - not safe for use in production.
Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open
Running on: Linux 5.4.195-1-pve #1 SMP PVE 5.4.195-1 (Wed, 13 Jul 2022 13:19:46 +0200) x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -ggdb3 -gdwarf-4 -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wnull-dereference -fwrapv -Wno-unknown-warning-option -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment -DMAX_SESS_STKCTR=5
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_STATIC_PCRE2=1 USE_LIBCRYPT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_TFO=1 USE_NS=1 USE_SYSTEMD=1 USE_QUIC=1 USE_PROMEX=1
  DEBUG   = -DDEBUG_MEMORY_POOLS -DDEBUG_STRICT

Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL +THREAD -PTHREAD_EMULATION +BACKTRACE -STATIC_PCRE +STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -ENGINE +GETADDRINFO +OPENSSL +LUA +ACCEPT4 -CLOSEFROM -ZLIB +SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT +QUIC +PROMEX -MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=32).
Built with OpenSSL version : OpenSSL 3.0.5+quic-mangadex-c85c6fd 20 Sep 2022
Running on OpenSSL version : OpenSSL 3.0.5+quic-mangadex-c85c6fd 20 Sep 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
OpenSSL providers loaded : default
Built with Lua version : Lua 5.3.6
Built with the Prometheus exporter as a service
Built with network namespace support.
Support for malloc_trim() is enabled.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.40 2022-04-14
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with clang compiler version 14.0.6

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
       quic : mode=HTTP  side=FE     mux=QUIC  flags=HTX|NO_UPG|FRAMED
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=

Available services : prometheus-exporter
Available filters :
        [BWLIM] bwlim-in
        [BWLIM] bwlim-out
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

which was my "golden" 2.7-dev commit to trial running QUIC in wider production deployment

@gabrieltz
Author

@a-denoyelle I have no experience with previous versions on this particular server (or its standby instance), just dev7 and dev8.

I have another instance that has run 2.7-dev3, dev4, dev7 and dev8 with no CPU usage problems; however, those balancers' use case is very different.

@gabrieltz
Author

I don't know whether the exported Prometheus metrics help, but here you can see that haproxy's active tasks dropped drastically
at 18:00 (when the high CPU usage began) and stayed there until 10:00, when haproxy was restarted. However, show tasks on the admin socket didn't return anything.

Screenshot from 2022-10-20 13-33-26

I can add more graphs if you think they may help.

@gabrieltz
Author

gabrieltz commented Oct 21, 2022

I switched one of the servers to run an older 2.7-dev4 binary I had left lying around, and created a symlink to 2.7-dev8 in case it restarted.

Οκτ 20 23:16:21 naboo haproxy[2781866]:   call trace(11):
Οκτ 20 23:16:21 naboo haproxy[2781866]:   | 0x55af27ba2609 [c6 04 25 01 00 00 00 00]: main-0x4447
Οκτ 20 23:16:21 naboo haproxy[2781866]:   | 0x55af27be66e2 [85 c0 75 1a 48 8b 74 24]: qc_send_ppkts+0x3b2/0x5b0
Οκτ 20 23:16:21 naboo haproxy[2781866]:   | 0x55af27bec904 [85 c0 74 78 48 85 ed 74]: quic_conn_io_cb+0x6a4/0x11f9
Οκτ 20 23:16:21 naboo haproxy[2781866]:   | 0x55af27d3d762 [64 49 8b 06 8b 80 e0 00]: run_tasks_from_lists+0x232/0x951
Οκτ 20 23:16:21 naboo haproxy[2781866]:   | 0x55af27d3e20c [29 44 24 1c 8b 4c 24 1c]: process_runnable_tasks+0x37c/0x6f1
Οκτ 20 23:16:21 naboo haproxy[2781866]:   | 0x55af27d0cf5d [83 3d 3c 88 3c 00 01 0f]: run_poll_loop+0x12d/0x4f7
Οκτ 20 23:16:21 naboo haproxy[2781866]:   | 0x55af27d0d509 [48 8b 1d b0 c8 16 00 4c]: main+0x166ab9
Οκτ 20 23:16:21 naboo haproxy[2781866]:   | 0x7f842bcaa609 [64 48 89 04 25 30 06 00]: libpthread:+0x8609
Οκτ 20 23:16:21 naboo haproxy[2781866]:   | 0x7f842b55f133 [48 89 c7 b8 3c 00 00 00]: libc:clone+0x43/0x5e
Οκτ 20 23:16:22 naboo haproxy[2404977]: [NOTICE]   (2404977) : haproxy version is 2.7-dev4-f532019
Οκτ 20 23:16:22 naboo haproxy[2404977]: [NOTICE]   (2404977) : path to executable is /usr/sbin/haproxy
Οκτ 20 23:16:22 naboo haproxy[2404977]: [ALERT]    (2404977) : Current worker (2781866) exited with code 139 (Segmentation fault)
Οκτ 20 23:16:22 naboo haproxy[2404977]: [ALERT]    (2404977) : exit-on-failure: killing every processes with SIGTERM
Οκτ 20 23:16:22 naboo haproxy[2404977]: [WARNING]  (2404977) : Former worker (2404979) exited with code 143 (Terminated)
Οκτ 20 23:16:22 naboo haproxy[2404977]: [WARNING]  (2404977) : All workers exited. Exiting... (139)
Οκτ 20 23:16:22 naboo systemd[1]: haproxy.service: Main process exited, code=exited, status=139/n/a

Upon restart after the crash, systemd started dev8, which got stuck after a couple of hours and had to be restarted manually.

After examining the logs I discovered that 2.7-dev3 also crashed periodically back in August, but I didn't notice because I wasn't monitoring the systemd restart counter for haproxy...

Αυγ 30 01:21:57 naboo systemd[1]: Started HAProxy Load Balancer.
Αυγ 30 01:21:58 naboo haproxy[2644061]:   call trace(11):
Αυγ 30 01:21:58 naboo haproxy[2644061]:   | 0x55c2c24f95c9 [c6 04 25 01 00 00 00 00]: main-0x4377
Αυγ 30 01:21:58 naboo haproxy[2644061]:   | 0x55c2c25383d2 [85 c0 74 22 8b 35 4c 62]: qc_send_ppkts+0x432/0x651
Αυγ 30 01:21:58 naboo haproxy[2644061]:   | 0x55c2c253cb72 [85 c0 0f 84 86 fb ff ff]: quic_conn_io_cb+0x6c2/0x10fc
Αυγ 30 01:21:58 naboo haproxy[2644061]:   | 0x55c2c268a8a2 [64 49 8b 06 8b 80 e0 00]: run_tasks_from_lists+0x232/0x939
Αυγ 30 01:21:58 naboo haproxy[2644061]:   | 0x55c2c268b32c [29 44 24 1c 8b 4c 24 1c]: process_runnable_tasks+0x37c/0x6f1
Αυγ 30 01:21:58 naboo haproxy[2644061]:   | 0x55c2c265a69d [83 3d fc 0f 3c 00 01 0f]: run_poll_loop+0x12d/0x4f7
Αυγ 30 01:21:58 naboo haproxy[2644061]:   | 0x55c2c265ac49 [48 8b 1d b0 50 16 00 4c]: main+0x15d309
Αυγ 30 01:21:58 naboo haproxy[2644061]:   | 0x7f3b478b2609 [64 48 89 04 25 30 06 00]: libpthread:+0x8609
Αυγ 30 01:21:58 naboo haproxy[2644061]:   | 0x7f3b47168133 [48 89 c7 b8 3c 00 00 00]: libc:clone+0x43/0x5e
Αυγ 30 01:21:58 naboo haproxy[2644059]: [NOTICE]   (2644059) : haproxy version is 2.7-dev3-87e95d3
Αυγ 30 01:21:58 naboo haproxy[2644059]: [NOTICE]   (2644059) : path to executable is /usr/sbin/haproxy
Αυγ 30 01:21:58 naboo haproxy[2644059]: [ALERT]    (2644059) : Current worker (2644061) exited with code 139 (Segmentation fault)
Αυγ 30 01:21:58 naboo haproxy[2644059]: [ALERT]    (2644059) : exit-on-failure: killing every processes with SIGTERM
Αυγ 30 01:21:58 naboo haproxy[2644059]: [WARNING]  (2644059) : All workers exited. Exiting... (139)

My guess is that the fix for the crash has some side effect that causes haproxy to get stuck.

I hope this helps your investigation.

@a-denoyelle
Contributor

My guess is that the fix for the crash has some kind of side effect causing haproxy to get stuck

This may not be so simple, as the QUIC implementation still evolves frequently in haproxy. Also, we have noticed recently that QUIC traffic seems to be increasing on haproxy.org: maybe recent browsers are more willing to use it, which may expose our QUIC stack to new traffic levels.

Can you please activate and report haproxy task profiling? This will help us detect whether there is a task that is woken up too many times:

set profiling tasks on
show profiling

Also, it might be useful to record QUIC traces for a few minutes when you notice a CPU spike. Thanks for your help.
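For reference, one way to drive this from the admin socket (a sketch, reusing the admin-balancer.sock path from the configuration above):

echo "set profiling tasks on" | socat /run/haproxy/admin-balancer.sock stdio
# ... wait for the CPU spike, then dump the per-task statistics:
echo "show profiling"         | socat /run/haproxy/admin-balancer.sock stdio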

@gabrieltz
Author

OK, so as soon as I enabled profiling it crashed, but I managed to query the profiling twice a few seconds before that.

> show profiling
Per-task CPU profiling              : on            # set profiling tasks {on|auto|off}
Memory usage profiling              : off           # set profiling memory {on|off}
Tasks activity:
  function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
  ssl_sock_io_cb                89846   44.39s    494.1us   2.531s    28.17us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  quic_lstnr_dghdlr             79634   2.372s    29.79us   10.14s    127.4us <- quic_lstnr_dgram_dispatch@src/quic_sock.c:278 tasklet_wakeup
  h1_io_cb                      64003   1.214s    18.97us   24.80ms   387.0ns <- ssl_sock_io_cb@src/ssl_sock.c:6504 tasklet_wakeup
  process_stream                41210   1.375s    33.38us   955.3ms   23.18us <- sc_notify@src/stconn.c:1211 task_wakeup
  sc_conn_io_cb                 40717   459.3ms   11.28us   88.64ms   2.176us <- h1_wake_stream_for_recv@src/mux_h1.c:2537 tasklet_wakeup
  quic_conn_io_cb               40185   13.42s    333.9us   363.1ms   9.035us <- qc_lstnr_pkt_rcv@src/quic_conn.c:6292 tasklet_wakeup_after
  h1_io_cb                      39148   408.2ms   10.43us   54.87ms   1.401us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  quic_conn_app_io_cb           39069   1.110s    28.40us   252.2ms   6.455us <- qc_lstnr_pkt_rcv@src/quic_conn.c:6292 tasklet_wakeup_after
  sc_conn_io_cb                 38749   79.41ms   2.049us   1.394s    35.97us <- sc_app_chk_rcv_conn@src/stconn.c:764 tasklet_wakeup
  process_stream                38688   2.396s    61.93us   1.530s    39.55us <- stream_new@src/stream.c:564 task_wakeup
  ssl_sock_io_cb                31550   9.729s    308.4us   1.676s    53.13us <- ssl_sock_start@src/ssl_sock.c:5838 tasklet_wakeup
  ssl_sock_io_cb                31334   13.73ms   438.0ns   302.2ms   9.645us <- conn_subscribe@src/connection.c:732 tasklet_wakeup
  h1_timeout_task               29336   1.584s    53.99us   901.2ms   30.72us <- wake_expired_tasks@src/task.c:344 task_wakeup
  qc_process_timer              11852   16.81ms   1.418us   398.3ms   33.61us <- wake_expired_tasks@src/task.c:344 task_wakeup
  quic_conn_app_io_cb           11397   88.86ms   7.796us   1.505s    132.0us <- qc_process_timer@src/quic_conn.c:4655 tasklet_wakeup
  qc_idle_timer_task            10388   507.8ms   48.88us   312.9ms   30.12us <- wake_expired_tasks@src/task.c:344 task_wakeup
  qc_io_cb                       9496   271.0ms   28.54us   98.88ms   10.41us <- qc_snd_buf@src/mux_quic.c:2145 tasklet_wakeup
  quic_conn_app_io_cb            9438   273.9ms   29.02us   94.96ms   10.06us <- qc_xprt_start@src/xprt_quic.c:134 tasklet_wakeup
  qc_io_cb                       9438   21.38ms   2.265us   68.28ms   7.234us <- qc_init@src/mux_quic.c:2009 tasklet_wakeup
  main-0x274bd4                  9381   307.4ms   32.77us   300.1ms   31.99us <- wake_expired_tasks@src/task.c:344 task_wakeup
  quic_accept_run                9370   55.94ms   5.970us   1.525s    162.8us <- quic_accept_push_qc@src/quic_sock.c:605 tasklet_wakeup
  quic_conn_app_io_cb            9335   6.251ms   669.0ns   354.7ms   38.00us <- qc_check_close_on_released_mux@src/quic_conn.c:1593 tasklet_wakeup
  qc_io_cb                       5997   128.8ms   21.48us   365.0ms   60.86us <- qc_rcv_buf@src/mux_quic.c:2114 tasklet_wakeup
  h1_timeout_task                2875   3.047ms   1.059us   210.4ms   73.17us <- h1_release@src/mux_h1.c:1024 task_wakeup
  h1_io_cb                        866   2.523ms   2.913us   1.179ms   1.361us <- sock_conn_iocb@src/sock.c:849 tasklet_wakeup
  sc_conn_io_cb                   618   16.86ms   27.29us   1.889ms   3.056us <- h1_wake_stream_for_send@src/mux_h1.c:2547 tasklet_wakeup
  srv_cleanup_toremove_conns      507   18.31ms   36.11us   78.62ms   155.1us <- srv_cleanup_idle_conns@src/server.c:5951 task_wakeup
  process_chk                     430   33.59ms   78.11us   11.14ms   25.90us <- wake_expired_tasks@src/task.c:344 task_wakeup
  process_chk                     398   12.11ms   30.43us   2.525ms   6.343us <- wake_srv_chk@src/check.c:1053 task_wakeup
  quic_conn_io_cb                 375   12.25ms   32.66us   43.89ms   117.0us <- qc_process_timer@src/quic_conn.c:4655 tasklet_wakeup
  srv_chk_io_cb                   248   2.263ms   9.125us   476.2us   1.920us <- h1_wake_stream_for_recv@src/mux_h1.c:2537 tasklet_wakeup
  srv_chk_io_cb                   248   10.96ms   44.20us   646.3us   2.605us <- h1_wake_stream_for_send@src/mux_h1.c:2547 tasklet_wakeup
  sc_conn_io_cb                    96   2.127ms   22.16us   2.021ms   21.05us <- qcs_notify_recv@src/mux_quic.c:435 tasklet_wakeup
  srv_cleanup_idle_conns           93   1.971ms   21.20us   1.821ms   19.57us <- wake_expired_tasks@src/task.c:429 task_drop_running
  session_expire_embryonic         69   2.954ms   42.81us   1.953ms   28.30us <- wake_expired_tasks@src/task.c:344 task_wakeup
  process_resolvers                50   129.9us   2.598us   595.6us   11.91us <- wake_expired_tasks@src/task.c:429 task_drop_running
  srv_cleanup_toremove_conns       40   13.13us   328.0ns   275.7us   6.892us <- run_tasks_from_lists@src/task.c:652 task_drop_running
  process_chk                      32   77.27us   2.414us   3.990ms   124.7us <- run_tasks_from_lists@src/task.c:652 task_drop_running
  h1_io_cb                         20   14.69us   734.0ns   1.236ms   61.79us <- h1_takeover@src/mux_h1.c:4035 tasklet_wakeup
  quic_conn_app_io_cb              15   130.5us   8.702us   331.0us   22.06us <- qc_process_timer@src/quic_conn.c:4609 tasklet_wakeup
  qc_process_timer                 14   14.73us   1.052us   3.057ms   218.4us <- qc_set_timer@src/quic_conn.c:764 task_wakeup
  task_run_applet                  10   2.118ms   211.8us   21.93us   2.193us <- run_tasks_from_lists@src/task.c:652 task_drop_running
  h1_io_cb                          5   29.93us   5.985us   257.5us   51.50us <- conn_subscribe@src/connection.c:732 tasklet_wakeup
  qc_io_cb                          2   41.78us   20.89us   31.75us   15.87us <- qcc_reset_stream@src/mux_quic.c:796 tasklet_wakeup
  task_run_applet                   2   1.679ms   839.3us   7.243us   3.621us <- sc_app_chk_snd_applet@src/stconn.c:998 appctx_wakeup
  sc_conn_io_cb                     2   24.27us   12.14us   2.576us   1.288us <- sock_conn_iocb@src/sock.c:869 tasklet_wakeup
  accept_queue_process              1   15.03us   15.03us   2.037ms   2.037ms <- listener_accept@src/listener.c:1123 tasklet_wakeup
  task_run_applet                   1   269.3us   269.3us   2.013us   2.013us <- sc_app_chk_rcv_applet@src/stconn.c:971 appctx_wakeup
  other                             1      -         -      15.12us   15.12us <- qc_set_timer@src/quic_conn.c:764 task_wakeup
  task_run_applet                   1   148.5us   148.5us   8.997us   8.997us <- sc_applet_create@src/stconn.c:491 appctx_wakeup
  process_stream                    1   6.001us   6.001us   14.05us   14.05us <- wake_expired_tasks@src/task.c:344 task_wakeup
  other                             1      -         -      38.71us   38.71us <- wake_expired_tasks@src/task.c:344 task_wakeup

The thing is that haproxy doesn't dump a core file, although our apps do dump cores on that machine.
I tried recompiling with -O0, verified that -g is present in CFLAGS, and core_pattern is normal:

root@naboo:~# cat /proc/sys/kernel/core_pattern 
/var/crash/core.%e

The systemd unit has LimitCORE=infinity, and /var/crash and /var/lib/haproxy/var/crash (the haproxy chroot) both exist and are writable by the haproxy user.
Do you have any other suggestions?

@Tristan971
Member

Tristan971 commented Oct 21, 2022

@gabrieltz https://docs.haproxy.org/2.6/configuration.html#set-dumpable might help

It is also worth checking the permissions of the chroot folder if you use one, to ensure it allows the user the haproxy workers run as to write the file out.
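A quick checklist for that (a sketch; paths and user are taken from the configuration posted above):

# core size limit of the running worker should be "unlimited"
grep "Max core file size" /proc/$(pidof haproxy | awk '{print $1}')/limits

# where the kernel writes cores; for a chrooted worker the path is typically resolved inside the chroot
cat /proc/sys/kernel/core_pattern

# both the real and the in-chroot dump directories must be writable by the haproxy user
ls -ld /var/crash /var/lib/haproxy/var/crash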

@gabrieltz
Author

@Tristan971 Permissions were the first thing I checked, thanks for the heads-up. I have enabled set-dumpable in the global section.

@gabrieltz
Author

Here is a gdb backtrace for the latest crash:
backtrace.txt

haproxy-mirror pushed a commit that referenced this issue Oct 24, 2022
Remove ABORT_NOW() statement on unhandled sendto error. Instead use a
dedicated counter sendto_err_unknown to report these cases.

If we detect increment of this counter, strace can be used to detect
errno value :
  $ strace -p $(pidof haproxy) -f -e trace=sendto -Z

This should be backported up to 2.6.

This should help to debug github issue #1903.
@a-denoyelle
Contributor

a-denoyelle commented Oct 24, 2022

This crash happens because sendto() returned an errno value that we did not anticipate. I just pushed a patch that removes the ABORT_NOW statement and replaces it with a dedicated counter.

Can you please update your haproxy to the latest master tip, run strace in parallel, and report the errors you encounter?
strace -p $(pidof haproxy) -f -e trace=sendto -Z

@gabrieltz
Author

OK, deployed and running strace; what should I look for?

haproxy-mirror pushed a commit that referenced this issue Oct 24, 2022
This patch complete the previous incomplete commit. The new counter
sendto_err_unknown is now displayed on stats page/CLI show stats.

This is related to github issue #1903.

This should be backported up to 2.6.
@a-denoyelle
Contributor

Whoops, I made a typo in the strace command. Replace -z with -Z and it should be good (I edited my comment for the record). This will filter to only the syscalls that failed, so you should not get too much output (at least I hope).

Besides, my previous commit was incomplete, so I pushed a new patch. You can now see a new counter, quic_sendto_err_unknwn, on the stats page / CLI show stats. If you see this counter being incremented, please report the strace output to us.

@a-denoyelle
Contributor

Please note that if you use the HTML stats page, you need stats show-modules on the stats frontend to see QUIC stats reported on it.

@gabrieltz
Author

The following are from the stats page.

Total number of dropped packets	16986
Total number of dropped packets because of buffer overrun	0
Total number of dropped packets upon parsing error	69004
Total number of EAGAIN error on sendto() calls	0
Total number of error on sendto() calls, EAGAIN excepted	0
Total number of error on sendto() calls not explicitely listed	0
Total number of lost sent packets	10465
Total number of too short dgrams with Initial packets	4287
Total number of Retry sent	0
Total number of validated Retry tokens	0
Total number of Retry tokens errors	0
Total number of half open connections	151
Total number of handshake failures	0
Total number of stateless reset packet sent	0
Total number of NO_ERROR errors received	5318
Total number of INTERNAL_ERROR errors received	1
Total number of CONNECTION_REFUSED errors received	0
Total number of FLOW_CONTROL_ERROR errors received	0
Total number of STREAM_LIMIT_ERROR errors received	0
Total number of STREAM_STATE_ERROR errors received	0
Total number of FINAL_SIZE_ERROR errors received	0
Total number of FRAME_ENCODING_ERROR errors received	0
Total number of TRANSPORT_PARAMETER_ERROR errors received	0
Total number of CONNECTION_ID_LIMIT_ERROR errors received	0
Total number of PROTOCOL_VIOLATION errors received	0
Total number of INVALID_TOKEN errors received	0
Total number of APPLICATION_ERROR errors received	0
Total number of CRYPTO_BUFFER_EXCEEDED errors received	0
Total number of KEY_UPDATE_ERROR errors received	0
Total number of AEAD_LIMIT_REACHED errors received	0
Total number of NO_VIABLE_PATH errors received	0
Total number of CRYPTO_ERROR errors received	0
Total number of UNKNOWN_ERROR errors received	0
Total number of received DATA_BLOCKED frames	0
Total number of received STREAMS_BLOCKED frames	0
Total number of received STREAM_DATA_BLOCKED_BIDI frames	395
Total number of received STREAM_DATA_BLOCKED_UNI frames	0

And the compressed strace output; I already filtered out the SIGALRM signals:
sendto-strace.log.gz

I hope those help (I'm keeping strace running in case it captures something else apart from ECONNRESET and EPIPE).

@a-denoyelle
Contributor

This is strange, as both "Total number of error on sendto() calls, EAGAIN excepted" and "Total number of error on sendto() calls not explicitly listed" are 0. I do not see any sendto invocation outside of the QUIC code. Maybe you did not look at the correct frontend line?

@gabrieltz
Author

This is indeed strange.
The only frontends with QUIC metrics are:

  1. the main one I sent you the metrics from
  2. a rarely used API listen block, which listens on an unrelated port, should not have QUIC, and has all its QUIC metrics at 0
  3. the stats listen block, which also has 0 in all QUIC metrics

@gabrieltz
Author

It crashed again; this is the last strace output:

[pid 2423939] sendto(1177, "\27\3\3\0R\n\304\364<\222\226\300/r\303H\372\356S\32R\331\f}J^\251H\364\216\364o"..., 87, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNRESET (Connection reset by peer)
[pid 2423936] --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=1, ptr=0x1}} ---
[pid 2423936] --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0x1, si_overrun=0, si_value={int=2, ptr=0x2}} ---
[pid 2423938] sendto(140, "\27\3\3\0R.\212(u\351\333\35\246h\7\346@\27\206F\223;\263\325O\220\4\240!\306\310{"..., 87, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNRESET (Connection reset by peer)
[pid 2423939] --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0x3, si_overrun=0, si_value={int=6, ptr=0x6}} ---
[pid 2423944] --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0x5, si_overrun=0, si_value={int=4, ptr=0x4}} ---
[pid 2423936] --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0x6, si_overrun=0, si_value={int=0, ptr=NULL}} ---
[pid 2423942] --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0x7, si_overrun=0, si_value={int=5, ptr=0x5}} ---
[pid 2423944] --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0x2, si_overrun=0, si_value={int=3, ptr=0x3}} ---
[pid 2423936] --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0x4, si_overrun=0, si_value={int=7, ptr=0x7}} ---
[pid 2423943] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
[pid 2423936] ????( <unfinished ...>
[pid 2423944] +++ killed by SIGSEGV +++
[pid 2423943] +++ killed by SIGSEGV +++
[pid 2423942] +++ killed by SIGSEGV +++
[pid 2423941] +++ killed by SIGSEGV +++
[pid 2423940] +++ killed by SIGSEGV +++
[pid 2423939] +++ killed by SIGSEGV +++
[pid 2423938] +++ killed by SIGSEGV +++
+++ killed by SIGSEGV +++

I'm adding set-dumpable and rerunning it.

@gabrieltz
Author

I think I may have found what you are looking for: it's EINVAL.

Stats:

# echo "show stat" | socat /run/haproxy/admin.sock stdio | grep -P '(FRONTEND|svname)' | awk -F ',' '{print $110,$111}'
quic_sendto_err quic_sendto_err_unknwn
0 9
0 0
0 0

strace output:

# grep -v -P '(SIGALRM|ECONNRESET|EPIPE|EAGAIN)' sendto-strace.log
strace: Process 2524626 attached with 8 threads
[pid 2524632] sendto(9, "\311\0\0\0\1\0\10\320\5\342\304\34267\262\0@t\275t\350eP\231\271\301\370\1\214\367\203\212"..., 1252, MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("obscured_user_ip")}, 16) = -1 EINVAL (Invalid argument)
[pid 2524632] sendto(9, "\301\0\0\0\1\0\10\320\5\342\304\34267\262\0@o\2630\324f\320\324\177\355\345\231\243\270=\2"..., 1247, MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("obscured_user_ip")}, 16) = -1 EINVAL (Invalid argument)
[pid 2524632] sendto(9, "\303\0\0\0\1\0\10\320\5\342\304\34267\262\0@o\37\345YS\22q\23[\330\271\2301\353\25"..., 1247, MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("obscured_user_ip")}, 16) = -1 EINVAL (Invalid argument)
[pid 2524632] sendto(9, "\352\0\0\0\1\0\10\320\5\342\304\34267\262D\323bx\tv#;^\253\336\16\335\231\367X\303"..., 1252, MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("obscured_user_ip")}, 16) = -1 EINVAL (Invalid argument)
[pid 2524632] sendto(9, "\313\0\0\0\1\0\10\320\5\342\304\34267\262\0\33@\265\225\335\344\310\10\230\236O\273\34\337\265,"..., 44, MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("obscured_user_ip")}, 16) = -1 EINVAL (Invalid argument)
[pid 2524632] sendto(9, "\315\0\0\0\1\0\10\320\5\342\304\34267\262\0\32\312\226\376\376{\23\252\v\0\261\237\2244\21\263"..., 43, MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("obscured_user_ip")}, 16) = -1 EINVAL (Invalid argument)
[pid 2524632] sendto(9, "\307\0\0\0\1\0\10\320\5\342\304\34267\262\0@o\350\336\34\177\5U|\255\332>\262q\256h"..., 1247, MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("obscured_user_ip")}, 16) = -1 EINVAL (Invalid argument)
[pid 2524632] sendto(9, "\347\0\0\0\1\0\10\320\5\342\304\34267\262D\323i\347K\311n\246T\35l\317\35\242\220\270\300"..., 1252, MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("obscured_user_ip")}, 16) = -1 EINVAL (Invalid argument)
[pid 2524632] sendto(9, "\317\0\0\0\1\0\10\320\5\342\304\34267\262\0\36#\265\312\263\353\363\234\342\335\310\34\217\17\232\302"..., 47, MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("obscured_user_ip")}, 16) = -1 EINVAL (Invalid argument)

File descriptor 9 is (lsof output):

haproxy 2524626 haproxy    9u     IPv4         2778338644      0t0        UDP *:443 

@a-denoyelle
Copy link
Contributor

It seems dst_port is 0, which is the reason for the EINVAL on sendto. Good catch, thanks!

@gabrieltz
Author

you're welcome :)

@a-denoyelle
Contributor

@gabrieltz I have inspected the code but did not find a good explanation for a sendto invocation with a null port. One possibility, though, is that we previously received a datagram with a null port.

To confirm this hypothesis, can you please try my branch ade-quic-g1903 from the following repository:
https://github.com/haproxytech/quic-dev/tree/ade-quic-g1903

I fixed some issues I noticed in datagram reception. Its last commit simply ignores such datagrams on reception. Also, the ctr0 counter will be incremented if this case happens. You can display its value using show activity on the CLI. Please report back if you see that its value has changed.

If this scenario is confirmed, we can then move on from this problem and refocus our attention on the CPU consumption you first mentioned. Thanks.
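A simple way to keep an eye on the counter (a sketch; the socket path comes from the configuration above, and the ctr0 field only appears when haproxy is built with -DDEBUG_DEV, as noted further down):

# poll the per-thread ctr0 debug counter every 10 seconds
while sleep 10; do
  echo "show activity" | socat /run/haproxy/admin-balancer.sock stdio | grep ctr0
done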

@gabrieltz
Author

@a-denoyelle It fails to compile:

cc -Iinclude  -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv  -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment     -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS  -DUSE_EPOLL  -DUSE_NETFILTER   -DUSE_PCRE2 -DUSE_PCRE2_JIT -DUSE_POLL -DUSE_THREAD  -DUSE_BACKTRACE   -DUSE_TPROXY -DUSE_LINUX_TPROXY -DUSE_LINUX_SPLICE -DUSE_LIBCRYPT -DUSE_CRYPT_H  -DUSE_GETADDRINFO -DUSE_OPENSSL  -DUSE_ACCEPT4  -DUSE_ZLIB  -DUSE_CPU_AFFINITY -DUSE_TFO -DUSE_NS -DUSE_DL -DUSE_RT    -DUSE_SYSTEMD  -DUSE_PRCTL  -DUSE_THREAD_DUMP   -DUSE_QUIC -DUSE_PROMEX  -DUSE_SHM_OPEN  -I/opt/quictls/include -DUSE_PCRE2 -DPCRE2_CODE_UNIT_WIDTH=8  -I/usr/include  -DCONFIG_HAPROXY_VERSION=\"2.7-dev8\" -DCONFIG_HAPROXY_DATE=\"2022/10/14\" -c -o src/xprt_quic.o src/xprt_quic.c
src/quic_sock.c: In function ‘quic_recv’:
src/quic_sock.c:381:19: error: ‘struct activity’ has no member named ‘ctr0’
  381 |    ++activity[tid].ctr0;
      |                   ^
compilation terminated due to -Wfatal-errors.
make: *** [Makefile:1016: src/quic_sock.o] Error 1
make: *** Waiting for unfinished jobs....

@gabrieltz
Author

Never mind the above comment; it just needed -DDEBUG_DEV. It's up and running.

@gabrieltz
Author

I have the latest core dump backtrace, compiled from branch ade-quic-g1903
at commit 33f0061a53836d09b14106733f135d8518ed5b25:

backtrace-latest.txt

I was polling ctr0 with show activity every 10 seconds, but I didn't get any non-zero values on any thread.

@gabrieltz
Author

> Regarding the memory drop at 4 GB you observed on your graph, it was because show pools reported memory usage as a 32-bit integer. Willy has just pushed a fix.

The 4 GB wraparound is happening again :)

Screenshot from 2022-12-17 09-52-55

@wtarreau
Member

Hmmm, can you please share the output of "show pools"?

@gabrieltz
Author

I didn't record the show pools output this morning when I reported the issue, but I remember it being 9 GB due to the buffer pool leak I mentioned in #1964. I have restarted two instances to the latest version just now; I'll keep the third running with the old leaky version until tomorrow and post the output. It's currently at 3.5 GB.
Keep in mind that I'm talking about the values of the Prometheus metrics
haproxy_process_allocated_bytes and haproxy_process_pool_used_bytes; both went to zero without a restart, as you can see in the graph (the orange line on the right Y axis is uptime).
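For reference, the two metrics can be pulled directly from the exporter configured on the stats listener above (balancer_lan_ip is the placeholder used in the config):

curl -s http://balancer_lan_ip:8080/metrics | grep -E '^haproxy_process_(allocated|pool_used)_bytes'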

@wtarreau
Member

Ah, I didn't understand that it was in Prometheus. I don't know about it; I only fixed the "show pools" output, which was using 32-bit ints. It's possible that Prometheus does the same; we'll have to check that with @capflam and @wdauchy, as I'm still totally ignorant of this area.

@gabrieltz
Author

Yes, the problem is only in the Prometheus stats. It's helpful to be able to set alerts on those metrics before they get too high on memory-constrained installations.

@gabrieltz
Author

The latest 2.8-dev0-46bea1-99 crashes frequently (about 4 times per hour).

From syslog I get:

Dec 19 08:07:28 thor haproxy[190576]: corrupted double-linked list
Dec 19 08:07:29 thor haproxy[190574]: [NOTICE]   (190574) : haproxy version is 2.8-dev0-46bea1-99
Dec 19 08:07:29 thor haproxy[190574]: [NOTICE]   (190574) : path to executable is /usr/sbin/haproxy
Dec 19 08:07:29 thor haproxy[190574]: [ALERT]    (190574) : Current worker (190576) exited with code 134 (Aborted)
Dec 19 08:07:29 thor haproxy[190574]: [ALERT]    (190574) : exit-on-failure: killing every processes with SIGTERM
Dec 19 08:07:29 thor haproxy[190574]: [WARNING]  (190574) : All workers exited. Exiting... (134)
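For reference, backtraces like the ones below can be extracted from the core with something like this (a sketch; the core file name follows the core_pattern shown earlier and is assumed to land inside the chroot):

gdb /usr/sbin/haproxy /var/lib/haproxy/var/crash/core.haproxy -batch -ex bt -ex 'thread apply all bt full'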
Short backtrace:
Core was generated by `/usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f8acf1ba700 (LWP 190582))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f8ae0922859 in __GI_abort () at abort.c:79
#2  0x00007f8ae098d26e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f8ae0ab7298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007f8ae09952fc in malloc_printerr (str=str@entry=0x7f8ae0ab546a "corrupted double-linked list") at malloc.c:5347
#4  0x00007f8ae099594c in unlink_chunk (p=p@entry=0x7f8ab1249870, av=0x7f8ab0000020) at malloc.c:1460
#5  0x00007f8ae0995a7c in malloc_consolidate (av=av@entry=0x7f8ab0000020) at malloc.c:4494
#6  0x00007f8ae0997c83 in _int_malloc (av=av@entry=0x7f8ab0000020, bytes=bytes@entry=16392) at malloc.c:3699
#7  0x00007f8ae099a299 in __GI___libc_malloc (bytes=16392) at malloc.c:3066
#8  0x000055630ffc5ffe in pool_alloc_area (size=<optimized out>) at include/haproxy/pool-os.h:37
#9  pool_get_from_os (pool=0x55631169d7c0) at src/pool.c:347
#10 0x000055630ffc616d in pool_alloc_nocache (pool=pool@entry=0x55631169d7c0) at src/pool.c:379
#11 0x000055630ffc6b64 in __pool_alloc (pool=0x55631169d7c0, flags=flags@entry=1) at src/pool.c:749
#12 0x000055630feae792 in h1_get_buf (bptr=0x7f8ab27b8650, h1c=0x7f8ab27b8630) at src/mux_h1.c:499
#13 h1_recv (h1c=0x7f8ab27b8630) at src/mux_h1.c:2783
#14 h1_io_cb (t=0x7f8ab1705500, ctx=0x7f8ab27b8630, state=<optimized out>) at src/mux_h1.c:3174
#15 0x000055630ffafb4e in run_tasks_from_lists (budgets=<optimized out>) at src/task.c:596
#16 0x000055630ffb0616 in process_runnable_tasks () at src/task.c:861
#17 0x000055630ff7f1aa in run_poll_loop () at src/haproxy.c:2913
#18 0x000055630ff7f789 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
#19 0x00007f8ae0fc4609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#20 0x00007f8ae0a1f133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Full backtrace:
(gdb) thread apply all backtrace full

Thread 8 (Thread 0x7f8ae0763700 (LWP 190577)):
#0  0x00007f8ae0a1f46e in epoll_wait (epfd=53, events=0x7f8ad8026a70, maxevents=200, timeout=timeout@entry=49) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x000055630fe17f8e in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
        timeout = 49
        status = <optimized out>
        fd = <optimized out>
        count = <optimized out>
        updt_idx = <optimized out>
        wait_time = 49
        old_fd = <optimized out>
#2  0x000055630ff7f16a in run_poll_loop () at src/haproxy.c:2984
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#3  0x000055630ff7f789 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}
        init_cond = pthread_cond_t = {Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No}
#4  0x00007f8ae0fc4609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140234448058112, 5892746499953637785, 140723691638014, 140723691638015, 140723691638016, 140234448014720, -5849334431455004263, -5849335610127697511}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#5  0x00007f8ae0a1f133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 7 (Thread 0x7f8adce5a700 (LWP 190578)):
#0  0x00007f8ae0a1f46e in epoll_wait (epfd=56, events=0x7f8ac8026380, maxevents=200, timeout=timeout@entry=47) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 1
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x000055630fe17f8e in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
        timeout = 47
        status = <optimized out>
        fd = <optimized out>
        count = <optimized out>
        updt_idx = <optimized out>
        wait_time = 47
        old_fd = <optimized out>
#2  0x000055630ff7f16a in run_poll_loop () at src/haproxy.c:2984
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#3  0x000055630ff7f789 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}
        init_cond = pthread_cond_t = {Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No}
#4  0x00007f8ae0fc4609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140234388252416, 5892746499953637785, 140723691638014, 140723691638015, 140723691638016, 140234388209024, -5849467342976075367, -5849335610127697511}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#5  0x00007f8ae0a1f133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 6 (Thread 0x7f8ace9b9700 (LWP 190583)):
#0  0x00007f8ae0a1f46e in epoll_wait (epfd=65, events=0x7f8ab8026380, maxevents=200, timeout=timeout@entry=49) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x000055630fe17f8e in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
        timeout = 49
        status = <optimized out>
        fd = <optimized out>
        count = <optimized out>
        updt_idx = <optimized out>
        wait_time = 49
        old_fd = <optimized out>
#2  0x000055630ff7f16a in run_poll_loop () at src/haproxy.c:2984
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#3  0x000055630ff7f789 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}
        init_cond = pthread_cond_t = {Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No}
#4  0x00007f8ae0fc4609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140234148517632, 5892746499953637785, 140723691638014, 140723691638015, 140723691638016, 140234148474240, -5849436435854542439, -5849335610127697511}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#5  0x00007f8ae0a1f133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 5 (Thread 0x7f8ad09bd700 (LWP 190579)):
#0  0x00007f8ae0a1f46e in epoll_wait (epfd=59, events=0x7f8ac0026380, maxevents=200, timeout=timeout@entry=51) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x000055630fe17f8e in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
        timeout = 51
        status = <optimized out>
        fd = <optimized out>
        count = <optimized out>
        updt_idx = <optimized out>
        wait_time = 51
        old_fd = <optimized out>
#2  0x000055630ff7f16a in run_poll_loop () at src/haproxy.c:2984
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#3  0x000055630ff7f789 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}
        init_cond = pthread_cond_t = {Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No}
#4  0x00007f8ae0fc4609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140234182088448, 5892746499953637785, 140723691638014, 140723691638015, 140723691638016, 140234182045056, -5849440831753569895, -5849335610127697511}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#5  0x00007f8ae0a1f133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 4 (Thread 0x7f8ad01bc700 (LWP 190580)):
#0  0x00007f8ae0a1f46e in epoll_wait (epfd=68, events=0x7f8abc026380, maxevents=200, timeout=timeout@entry=53) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x000055630fe17f8e in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
        timeout = 53
        status = <optimized out>
        fd = <optimized out>
        count = <optimized out>
        updt_idx = <optimized out>
        wait_time = 53
        old_fd = <optimized out>
#2  0x000055630ff7f16a in run_poll_loop () at src/haproxy.c:2984
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#3  0x000055630ff7f789 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}
        init_cond = pthread_cond_t = {Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No}
#4  0x00007f8ae0fc4609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140234173695744, 5892746499953637785, 140723691638014, 140723691638015, 140723691638016, 140234173652352, -5849439731705071207, -5849335610127697511}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#5  0x00007f8ae0a1f133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 3 (Thread 0x7f8ae076e980 (LWP 190576)):
#0  0x00007f8ae0a1f46e in epoll_wait (epfd=6, events=0x5563114c56d0, maxevents=200, timeout=timeout@entry=47) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x000055630fe17f8e in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
        timeout = 47
        status = <optimized out>
        fd = <optimized out>
        count = <optimized out>
        updt_idx = <optimized out>
        wait_time = 47
        old_fd = <optimized out>
#2  0x000055630ff7f16a in run_poll_loop () at src/haproxy.c:2984
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#3  0x000055630ff7f789 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}
        init_cond = pthread_cond_t = {Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No}
#4  0x000055630fe15675 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3758
        err = <optimized out>
        retry = <optimized out>
        limit = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
        pidfd = <optimized out>
        intovf = <optimized out>
        msg = <optimized out>

Thread 2 (Thread 0x7f8acf9bb700 (LWP 190581)):
#0  0x00007f8ae0a1f46e in epoll_wait (epfd=62, events=0x7f8ac4026380, maxevents=200, timeout=timeout@entry=41) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x000055630fe17f8e in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
        timeout = 41
        status = <optimized out>
        fd = <optimized out>
        count = <optimized out>
        updt_idx = <optimized out>
        wait_time = 41
        old_fd = <optimized out>
#2  0x000055630ff7f16a in run_poll_loop () at src/haproxy.c:2984
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#3  0x000055630ff7f789 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}
        init_cond = pthread_cond_t = {Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No}
#4  0x00007f8ae0fc4609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140234165303040, 5892746499953637785, 140723691638014, 140723691638015, 140723691638016, 140234165259648, -5849434233610061415, -5849335610127697511}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#5  0x00007f8ae0a1f133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 1 (Thread 0x7f8acf1ba700 (LWP 190582)):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
        set = {__val = {18446744067195525655, 0, 0, 5, 140233670123072, 140234156865107, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#1  0x00007f8ae0922859 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x7f8ab02d0450, sa_sigaction = 0x7f8ab02d0450}, sa_mask = {__val = {140234156865788, 895, 140234454631633, 140233637954640, 140234456888289, 140233641532128, 18446744073709551615, 140234156865360, 1024, 0, 93883958437055, 140233641532003, 140233641532128, 140233641548508, 0, 0}}, sa_flags = -1882724864, sa_restorer = 0x7f8ab08ac840}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007f8ae098d26e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f8ae0ab7298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
        ap = {{gp_offset = 24, fp_offset = 32650, overflow_arg_area = 0x7f8acf1af8b0, reg_save_area = 0x7f8acf1af840}}
        fd = <optimized out>
        list = <optimized out>
        nlist = <optimized out>
        cp = <optimized out>
#3  0x00007f8ae09952fc in malloc_printerr (str=str@entry=0x7f8ae0ab546a "corrupted double-linked list") at malloc.c:5347
No locals.
#4  0x00007f8ae099594c in unlink_chunk (p=p@entry=0x7f8ab1249870, av=0x7f8ab0000020) at malloc.c:1460
        fd = <optimized out>
        bk = <optimized out>
#5  0x00007f8ae0995a7c in malloc_consolidate (av=av@entry=0x7f8ab0000020) at malloc.c:4494
        fb = 0x7f8ab0000048
        maxfb = 0x7f8ab0000078
        p = 0x7f8ab1249870
        nextp = 0x7f8ab01a2890
        unsorted_bin = 0x7f8ab0000080
        first_unsorted = <optimized out>
        nextchunk = 0x7f8ab12499d0
        size = 352
        nextsize = 144
        prevsize = <optimized out>
        nextinuse = <optimized out>
#6  0x00007f8ae0997c83 in _int_malloc (av=av@entry=0x7f8ab0000020, bytes=bytes@entry=16392) at malloc.c:3699
        nb = <optimized out>
        idx = 114
        bin = <optimized out>
        victim = <optimized out>
        size = <optimized out>
        victim_index = <optimized out>
        remainder = <optimized out>
        remainder_size = <optimized out>
        block = <optimized out>
        bit = <optimized out>
        map = <optimized out>
        fwd = <optimized out>
        bck = <optimized out>
        tcache_unsorted_count = <optimized out>
        tcache_nb = <optimized out>
        tc_idx = <optimized out>
        return_cached = <optimized out>
        __PRETTY_FUNCTION__ = "_int_malloc"
#7  0x00007f8ae099a299 in __GI___libc_malloc (bytes=16392) at malloc.c:3066
        ar_ptr = 0x7f8ab0000020
        victim = <optimized out>
        hook = <optimized out>
        tbytes = <optimized out>
        tc_idx = <optimized out>
        __PRETTY_FUNCTION__ = "__libc_malloc"
#8  0x000055630ffc5ffe in pool_alloc_area (size=<optimized out>) at include/haproxy/pool-os.h:37
No locals.
#9  pool_get_from_os (pool=0x55631169d7c0) at src/pool.c:347
        ptr = <optimized out>
#10 0x000055630ffc616d in pool_alloc_nocache (pool=pool@entry=0x55631169d7c0) at src/pool.c:379
        ptr = 0x0
#11 0x000055630ffc6b64 in __pool_alloc (pool=0x55631169d7c0, flags=flags@entry=1) at src/pool.c:749
        p = <optimized out>
        caller = 0x55630feae792 <h1_io_cb+2578>
#12 0x000055630feae792 in h1_get_buf (bptr=0x7f8ab27b8650, h1c=0x7f8ab27b8630) at src/mux_h1.c:499
        _area = <optimized out>
        _retbuf = 0x7f8ab27b8650
        buf = 0x0
        buf = <optimized out>
        _area = <optimized out>
        _retbuf = <optimized out>
#13 h1_recv (h1c=0x7f8ab27b8630) at src/mux_h1.c:2783
        conn = <optimized out>
        ret = 0
        max = <optimized out>
        flags = 0
        conn = <optimized out>
        ret = <optimized out>
        max = <optimized out>
        flags = <optimized out>
        __FUNCTION__ = "h1_recv"
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
        __x = <optimized out>
        __l = <optimized out>
#14 h1_io_cb (t=0x7f8ab1705500, ctx=0x7f8ab27b8630, state=<optimized out>) at src/mux_h1.c:3174
        conn = 0x5563118eae60
        tl = 0x7f8ab1705500
        conn_in_list = 0
        h1c = 0x7f8ab27b8630
        ret = <optimized out>
        __FUNCTION__ = "h1_io_cb"
        __lk_r = <optimized out>
        __set_r = <optimized out>
        __msk_r = <optimized out>
        ret = <optimized out>
#15 0x000055630ffafb4e in run_tasks_from_lists (budgets=<optimized out>) at src/task.c:596
        process = <optimized out>
        tl_queues = <optimized out>
        t = 0x7f8ab1705500
        budget_mask = <optimized out>
        profile_entry = 0x0
        done = <optimized out>
        queue = <optimized out>
        state = <optimized out>
        ctx = <optimized out>
        __func__ = <optimized out>
#16 0x000055630ffb0616 in process_runnable_tasks () at src/task.c:861
        tt = 0x556310314380 <ha_thread_ctx+2304>
        lrq = <optimized out>
        grq = <optimized out>
        t = <optimized out>
        max = {91, 0, 0, 0}
        max_total = <optimized out>
        tmp_list = <optimized out>
        queue = <optimized out>
        max_processed = <optimized out>
        lpicked = <optimized out>
        gpicked = <optimized out>
        heavy_queued = 1
        budget = <optimized out>
        __lk_r = <optimized out>
        __set_r = <optimized out>
        __msk_r = <optimized out>
        ret = <optimized out>
#17 0x000055630ff7f1aa in run_poll_loop () at src/haproxy.c:2913
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#18 0x000055630ff7f789 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}
        init_cond = pthread_cond_t = {Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No}
#19 0x00007f8ae0fc4609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140234156910336, 5892746499953637785, 140723691638014, 140723691638015, 140723691638016, 140234156866944, -5849433137856530023, -5849335610127697511}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#20 0x00007f8ae0a1f133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

@gabrieltz
Author

I got another crash with a BUG_ON() on another server running the same haproxy version, but the core dump was unusable.

Dec 19 09:48:17 sheldon haproxy[30866]: FATAL: bug condition "b_data(b) + len > b_size(b)" matched at include/haproxy/buf.h:547
Dec 19 09:48:17 sheldon haproxy[30866]:   call trace(12):
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b27a686c [c6 04 25 01 00 00 00 00]: main-0x5424
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b27a6a7e [ba 03 00 00 00 48 8d 35]: main-0x5212
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b27f3b1a [41 01 c5 49 8b 44 24 18]: main+0x47e8a
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b27f55ad [44 8b 15 14 bf 28 00 45]: qc_io_cb+0x2d/0x77a
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b29475be [64 49 8b 06 48 c7 40 20]: run_tasks_from_lists+0x14e/0x89e
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b294808c [29 44 24 1c 8b 4c 24 1c]: process_runnable_tasks+0x37c/0x6f1
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b291679a [83 3d 3f 81 3d 00 01 0f]: run_poll_loop+0x13a/0x537
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b2916d79 [48 8b 1d e0 be 17 00 4c]: main+0x16b0e9
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x7fc1f43c1609 [64 48 89 04 25 30 06 00]: libpthread:+0x8609
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x7fc1f3e1c133 [48 89 c7 b8 3c 00 00 00]: libc:clone+0x43/0x5e
Dec 19 09:48:17 sheldon haproxy[30864]: [NOTICE]   (30864) : haproxy version is 2.8-dev0-46bea1-99
Dec 19 09:48:17 sheldon haproxy[30864]: [NOTICE]   (30864) : path to executable is /usr/sbin/haproxy
Dec 19 09:48:17 sheldon haproxy[30864]: [ALERT]    (30864) : Current worker (30866) exited with code 139 (Segmentation fault)
Dec 19 09:48:17 sheldon haproxy[30864]: [ALERT]    (30864) : exit-on-failure: killing every processes with SIGTERM
Dec 19 09:48:17 sheldon haproxy[30864]: [WARNING]  (30864) : All workers exited. Exiting... (139)
Dec 19 09:48:17 sheldon systemd[1]: haproxy.service: Main process exited, code=exited, status=139/n/a
Dec 19 09:48:17 sheldon systemd[1]: haproxy.service: Failed with result 'exit-code'.
Dec 19 09:48:17 sheldon systemd[1]: haproxy.service: Scheduled restart job, restart counter is at 2.
Dec 19 09:48:17 sheldon systemd[1]: Stopped HAProxy Load Balancer.
Dec 19 09:48:17 sheldon systemd[1]: Starting HAProxy Load Balancer...
Dec 19 09:48:17 sheldon haproxy[31441]: [NOTICE]   (31441) : haproxy version is 2.8-dev0-46bea1-99

@a-denoyelle
Contributor

About the last crash, I'm clueless for the moment. If you encounter it frequently, can you activate qmux traces, please?
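
For reference, a minimal sketch of one way to enable them at runtime on the stats socket (the sink and level shown here are assumptions, adjust them to what you can afford):

trace qmux sink buf0
trace qmux level developer
trace qmux start now

The collected events could then be dumped with something like "show events buf0" once the problem shows up again.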

In the meantime, we're looking into the previous crash you reported. You mention that it occurs 4 times per hour. Do you always have the same backtrace (with h1_get_buf in it)?

@capflam
Member

capflam commented Dec 20, 2022

Ah, I didn't understand it was in Prometheus. I don't know about it; I only fixed the "show pools" output, which was using 32-bit ints. It's possible that Prometheus does the same; we'll have to check that with @capflam and @wdauchy, as I'm still totally ignorant of this area.

Indeed, the Prometheus exporter and the stats applet both use a 32-bit integer.
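
As a minimal illustration (assumed code, not HAProxy's actual implementation): a 64-bit byte counter stored into a 32-bit integer wraps every 4 GiB, which is why the exported values can look bogus once pool usage grows large enough.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* hypothetical pool byte counter that has grown past 4 GiB */
    uint64_t allocated = 5ULL * 1024 * 1024 * 1024;   /* 5 GiB */
    /* what a metric stored as a 32-bit integer would report */
    uint32_t exported = (uint32_t)allocated;

    printf("real: %llu bytes, exported: %u bytes\n",
           (unsigned long long)allocated, exported);
    /* prints: real: 5368709120 bytes, exported: 1073741824 bytes */
    return 0;
}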

@gabrieltz
Author

About the last crash, I'm clueless for the moment. If you encounter it frequently, can you activate qmux traces, please?

In the meantime, we're looking into the previous crash you reported. You mention that it occurs 4 times per hour. Do you always have the same backtrace (with h1_get_buf in it)?

I had rolled back to a previous version; I'll set it back to 46bea1 to get a few backtraces and compare.

@gabrieltz
Author

Syslog holds all kinds of errors from when I was running 46bea1. They look kind of random, unless someone was playing against this particular server. I tried to remove duplicates; also note that there were more crashes in between without any message. I hope these help until I get a few core dumps.

Dec 18 01:25:37 sheldon haproxy[3581114]: double free or corruption (!prev)
Dec 18 09:29:15 sheldon haproxy[3640982]: free(): invalid next size (fast)
Dec 18 11:52:52 sheldon haproxy[3786258]: free(): invalid size
Dec 18 14:29:36 sheldon haproxy[3829645]: double free or corruption (out)
Dec 18 20:35:27 sheldon haproxy[3977980]: free(): invalid next size (fast)

Dec 19 08:27:46 sheldon haproxy[3986656]: FATAL: bug condition "qcs->tx.sent_offset < base_off" matched at src/mux_quic.c:1267
Dec 19 08:27:46 sheldon haproxy[3986656]:   call trace(11):
Dec 19 08:27:46 sheldon haproxy[3986656]:   | 0x55e1915f8af6 [c6 04 25 01 00 00 00 00]: main-0x519a
Dec 19 08:27:46 sheldon haproxy[3986656]:   | 0x55e191645b1a [41 01 c5 49 8b 44 24 18]: main+0x47e8a
Dec 19 08:27:46 sheldon haproxy[3986656]:   | 0x55e1916475ad [44 8b 15 14 bf 28 00 45]: qc_io_cb+0x2d/0x77a
Dec 19 08:27:46 sheldon haproxy[3986656]:   | 0x55e1917995be [64 49 8b 06 48 c7 40 20]: run_tasks_from_lists+0x14e/0x89e
Dec 19 08:27:46 sheldon haproxy[3986656]:   | 0x55e19179a08c [29 44 24 1c 8b 4c 24 1c]: process_runnable_tasks+0x37c/0x6f1
Dec 19 08:27:46 sheldon haproxy[3986656]:   | 0x55e19176879a [83 3d 3f 81 3d 00 01 0f]: run_poll_loop+0x13a/0x537
Dec 19 08:27:46 sheldon haproxy[3986656]:   | 0x55e191768d79 [48 8b 1d e0 be 17 00 4c]: main+0x16b0e9
Dec 19 08:27:46 sheldon haproxy[3986656]:   | 0x7fdda53c8609 [64 48 89 04 25 30 06 00]: libpthread:+0x8609
Dec 19 08:27:46 sheldon haproxy[3986656]:   | 0x7fdda4e23133 [48 89 c7 b8 3c 00 00 00]: libc:clone+0x43/0x5e

Dec 19 09:08:57 sheldon haproxy[16272]: malloc(): unsorted double linked list corrupted

Dec 19 09:48:17 sheldon haproxy[30866]: FATAL: bug condition "b_data(b) + len > b_size(b)" matched at include/haproxy/buf.h:547
Dec 19 09:48:17 sheldon haproxy[30866]:   call trace(12):
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b27a686c [c6 04 25 01 00 00 00 00]: main-0x5424
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b27a6a7e [ba 03 00 00 00 48 8d 35]: main-0x5212
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b27f3b1a [41 01 c5 49 8b 44 24 18]: main+0x47e8a
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b27f55ad [44 8b 15 14 bf 28 00 45]: qc_io_cb+0x2d/0x77a
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b29475be [64 49 8b 06 48 c7 40 20]: run_tasks_from_lists+0x14e/0x89e
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b294808c [29 44 24 1c 8b 4c 24 1c]: process_runnable_tasks+0x37c/0x6f1
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b291679a [83 3d 3f 81 3d 00 01 0f]: run_poll_loop+0x13a/0x537
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x5588b2916d79 [48 8b 1d e0 be 17 00 4c]: main+0x16b0e9
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x7fc1f43c1609 [64 48 89 04 25 30 06 00]: libpthread:+0x8609
Dec 19 09:48:17 sheldon haproxy[30866]:   | 0x7fc1f3e1c133 [48 89 c7 b8 3c 00 00 00]: libc:clone+0x43/0x5e

Dec 19 10:20:18 sheldon haproxy[41244]: corrupted double-linked list
Dec 19 11:31:51 sheldon haproxy[60237]: malloc(): unsorted double linked list corrupted

Dec 19 13:46:49 sheldon haproxy[98999]: FATAL: bug condition "stream->release" matched at src/quic_stream.c:67
Dec 19 13:46:49 sheldon haproxy[98999]:   call trace(12):
Dec 19 13:46:49 sheldon haproxy[98999]:   | 0x55e8a30e92f8 [c6 04 25 01 00 00 00 00]: main-0x4998
Dec 19 13:46:49 sheldon haproxy[98999]:   | 0x55e8a3123456 [48 8b 3d 1b 3e 2b 00 48]: quic_cstream_free+0x26/0x41
Dec 19 13:46:49 sheldon haproxy[98999]:   | 0x55e8a3123e0e [8b 0d 14 ee 29 00 85 c9]: quic_conn_release+0x77e/0xf68
Dec 19 13:46:49 sheldon haproxy[98999]:   | 0x55e8a3132629 [81 e5 00 08 00 00 8b 05]: qc_idle_timer_task+0x49/0x173
Dec 19 13:46:49 sheldon haproxy[98999]:   | 0x55e8a328987a [49 89 c7 eb 0f 90 4c 89]: run_tasks_from_lists+0x40a/0x89e
Dec 19 13:46:49 sheldon haproxy[98999]:   | 0x55e8a328a08c [29 44 24 1c 8b 4c 24 1c]: process_runnable_tasks+0x37c/0x6f1
Dec 19 13:46:49 sheldon haproxy[98999]:   | 0x55e8a325879a [83 3d 3f 81 3d 00 01 0f]: run_poll_loop+0x13a/0x537
Dec 19 13:46:49 sheldon haproxy[98999]:   | 0x55e8a3258d79 [48 8b 1d e0 be 17 00 4c]: main+0x16b0e9
Dec 19 13:46:49 sheldon haproxy[98999]:   | 0x7fd3a10c7609 [64 48 89 04 25 30 06 00]: libpthread:+0x8609
Dec 19 13:46:49 sheldon haproxy[98999]:   | 0x7fd3a0b22133 [48 89 c7 b8 3c 00 00 00]: libc:clone+0x43/0x5e

@gabrieltz
Author

Here is the first crash. In this case there were no syslog entries like the ones above. This was the most frequent crash.

short backtrace
Core was generated by `/usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f735c414395 in OPENSSL_cleanse () from /opt/quictls1/lib/libcrypto.so.81.1.1
[Current thread is 1 (Thread 0x7f734a1b8700 (LWP 339476))]
(gdb) bt
#0  0x00007f735c414395 in OPENSSL_cleanse () from /opt/quictls1/lib/libcrypto.so.81.1.1
#1  0x00007f735c3713a3 in CRYPTO_secure_clear_free () from /opt/quictls1/lib/libcrypto.so.81.1.1
#2  0x00007f735c33ad29 in ecx_free () from /opt/quictls1/lib/libcrypto.so.81.1.1
#3  0x00007f735c3630ba in EVP_PKEY_free () from /opt/quictls1/lib/libcrypto.so.81.1.1
#4  0x00007f735c505055 in ssl3_free () from /opt/quictls1/lib/libssl.so.81.1.1
#5  0x00007f735c517726 in SSL_free () from /opt/quictls1/lib/libssl.so.81.1.1
#6  0x00005604f14dab11 in quic_conn_release (qc=<optimized out>) at src/quic_conn.c:5018
#7  0x00005604f14e9629 in qc_idle_timer_task (t=<optimized out>, ctx=0x7f733458d330, state=<optimized out>) at src/quic_conn.c:5116
#8  0x00005604f164087a in run_tasks_from_lists (budgets=<optimized out>) at src/task.c:634
#9  0x00005604f164108c in process_runnable_tasks () at src/task.c:861
#10 0x00005604f160f79a in run_poll_loop () at src/haproxy.c:2913
#11 0x00005604f160fd79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
#12 0x00007f735c579609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007f735bfd4133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
backtrace full
Core was generated by `/usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f735c414395 in OPENSSL_cleanse () from /opt/quictls1/lib/libcrypto.so.81.1.1
[Current thread is 1 (Thread 0x7f734a1b8700 (LWP 339476))]
(gdb) thread apply all bt full

Thread 8 (Thread 0x7f734b1ba700 (LWP 339474)):
#0  0x00007f735bfd446e in epoll_wait (epfd=64, events=0x7f7338029b80, maxevents=200, timeout=timeout@entry=20) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00005604f14a8f8d in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
        timeout = 20
        status = <optimized out>
        fd = <optimized out>
        count = <optimized out>
        updt_idx = <optimized out>
        wait_time = 20
        old_fd = <optimized out>
#2  0x00005604f160f75a in run_poll_loop () at src/haproxy.c:2984
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#3  0x00005604f160fd79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {{__wseq = 45, __wseq32 = {__low = 45, __high = 0}}, {__g1_start = 33, __g1_start32 = {__low = 33, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 24, __wrefs = 0, __g_signals = {0, 0}}, __size = "-\000\000\000\000\000\000\000!", '\000' <repeats 23 times>, "\030", '\000' <repeats 14 times>, __align = 45}
#4  0x00007f735c579609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140133158070016, -4806719819869155237, 140723357669694, 140723357669695, 140723357669696, 140133158026624, 4876712299045228635, 4876731428063973467}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#5  0x00007f735bfd4133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 7 (Thread 0x7f734a9b9700 (LWP 339475)):
#0  0x00007f735bfd446e in epoll_wait (epfd=58, events=0x7f7340029b80, maxevents=200, timeout=timeout@entry=20) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00005604f14a8f8d in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
        timeout = 20
        status = <optimized out>
        fd = <optimized out>
        count = <optimized out>
        updt_idx = <optimized out>
        wait_time = 20
        old_fd = <optimized out>
#2  0x00005604f160f75a in run_poll_loop () at src/haproxy.c:2984
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#3  0x00005604f160fd79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {{__wseq = 45, __wseq32 = {__low = 45, __high = 0}}, {__g1_start = 33, __g1_start32 = {__low = 33, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 24, __wrefs = 0, __g_signals = {0, 0}}, __size = "-\000\000\000\000\000\000\000!", '\000' <repeats 23 times>, "\030", '\000' <repeats 14 times>, __align = 45}
#4  0x00007f735c579609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140133149677312, -4806719819869155237, 140723357669694, 140723357669695, 140723357669696, 140133149633920, 4876709001047216219, 4876731428063973467}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#5  0x00007f735bfd4133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 6 (Thread 0x7f735bd18700 (LWP 339470)):
#0  0x00007f735bfd446e in epoll_wait (epfd=49, events=0x7f735402a270, maxevents=200, timeout=timeout@entry=24) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00005604f14a8f8d in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
        timeout = 24
        status = <optimized out>
        fd = <optimized out>
        count = <optimized out>
        updt_idx = <optimized out>
        wait_time = 24
        old_fd = <optimized out>
#2  0x00005604f160f75a in run_poll_loop () at src/haproxy.c:2984
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#3  0x00005604f160fd79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {{__wseq = 45, __wseq32 = {__low = 45, __high = 0}}, {__g1_start = 33, __g1_start32 = {__low = 33, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 24, __wrefs = 0, __g_signals = {0, 0}}, __size = "-\000\000\000\000\000\000\000!", '\000' <repeats 23 times>, "\030", '\000' <repeats 14 times>, __align = 45}
#4  0x00007f735c579609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140133438424832, -4806719819869155237, 140723357669694, 140723357669695, 140723357669696, 140133438381440, 4876745749324271707, 4876731428063973467}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#5  0x00007f735bfd4133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 5 (Thread 0x7f734b9bb700 (LWP 339473)):
#0  0x00007f735bfd446e in epoll_wait (epfd=55, events=0x7f733c029b80, maxevents=200, timeout=timeout@entry=15) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00005604f14a8f8d in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
        timeout = 15
        status = <optimized out>
        fd = <optimized out>
        count = <optimized out>
        updt_idx = <optimized out>
        wait_time = 15
        old_fd = <optimized out>
#2  0x00005604f160f75a in run_poll_loop () at src/haproxy.c:2984
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#3  0x00005604f160fd79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {{__wseq = 45, __wseq32 = {__low = 45, __high = 0}}, {__g1_start = 33, __g1_start32 = {__low = 33, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 24, __wrefs = 0, __g_signals = {0, 0}}, __size = "-\000\000\000\000\000\000\000!", '\000' <repeats 23 times>, "\030", '\000' <repeats 14 times>, __align = 45}
#4  0x00007f735c579609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140133166462720, -4806719819869155237, 140723357669694, 140723357669695, 140723357669696, 140133166419328, 4876711186111828059, 4876731428063973467}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#5  0x00007f735bfd4133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 4 (Thread 0x7f734c1bc700 (LWP 339472)):
#0  0x00007f735bfd446e in epoll_wait (epfd=52, events=0x7f7344029b80, maxevents=200, timeout=timeout@entry=21) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00005604f14a8f8d in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
        timeout = 21
        status = <optimized out>
        fd = <optimized out>
        count = <optimized out>
        updt_idx = <optimized out>
        wait_time = 21
        old_fd = <optimized out>
#2  0x00005604f160f75a in run_poll_loop () at src/haproxy.c:2984
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#3  0x00005604f160fd79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {{__wseq = 45, __wseq32 = {__low = 45, __high = 0}}, {__g1_start = 33, __g1_start32 = {__low = 33, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 24, __wrefs = 0, __g_signals = {0, 0}}, __size = "-\000\000\000\000\000\000\000!", '\000' <repeats 23 times>, "\030", '\000' <repeats 14 times>, __align = 45}
#4  0x00007f735c579609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140133174855424, -4806719819869155237, 140723357669694, 140723357669695, 140723357669696, 140133174812032, 4876696891923796059, 4876731428063973467}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#5  0x00007f735bfd4133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 3 (Thread 0x7f734c9bd700 (LWP 339471)):
#0  0x00007f735bfd446e in epoll_wait (epfd=69, events=0x7f732c029b80, maxevents=200, timeout=timeout@entry=21) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00005604f14a8f8d in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
        timeout = 21
        status = <optimized out>
        fd = <optimized out>
        count = <optimized out>
        updt_idx = <optimized out>
        wait_time = 21
        old_fd = <optimized out>
#2  0x00005604f160f75a in run_poll_loop () at src/haproxy.c:2984
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#3  0x00005604f160fd79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {{__wseq = 45, __wseq32 = {__low = 45, __high = 0}}, {__g1_start = 33, __g1_start32 = {__low = 33, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 24, __wrefs = 0, __g_signals = {0, 0}}, __size = "-\000\000\000\000\000\000\000!", '\000' <repeats 23 times>, "\030", '\000' <repeats 14 times>, __align = 45}
#4  0x00007f735c579609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140133183248128, -4806719819869155237, 140723357669694, 140723357669695, 140723357669696, 140133183204736, 4876695791875297371, 4876731428063973467}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#5  0x00007f735bfd4133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 2 (Thread 0x7f735bd23980 (LWP 339469)):
#0  0x00007f735bfd446e in epoll_wait (epfd=6, events=0x5604f1f2a5b0, maxevents=200, timeout=timeout@entry=20) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00005604f14a8f8d in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
        timeout = 20
        status = <optimized out>
        fd = <optimized out>
        count = <optimized out>
        updt_idx = <optimized out>
        wait_time = 20
        old_fd = <optimized out>
#2  0x00005604f160f75a in run_poll_loop () at src/haproxy.c:2984
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#3  0x00005604f160fd79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {{__wseq = 45, __wseq32 = {__low = 45, __high = 0}}, {__g1_start = 33, __g1_start32 = {__low = 33, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 24, __wrefs = 0, __g_signals = {0, 0}}, __size = "-\000\000\000\000\000\000\000!", '\000' <repeats 23 times>, "\030", '\000' <repeats 14 times>, __align = 45}
#4  0x00005604f14a6675 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3758
        err = <optimized out>
        retry = <optimized out>
        limit = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
        pidfd = <optimized out>
        intovf = <optimized out>
        msg = <optimized out>


Thread 1 (Thread 0x7f734a1b8700 (LWP 339476)):
#0  0x00007f735c414395 in OPENSSL_cleanse () from /opt/quictls1/lib/libcrypto.so.81.1.1
No symbol table info available.
#1  0x00007f735c3713a3 in CRYPTO_secure_clear_free () from /opt/quictls1/lib/libcrypto.so.81.1.1
No symbol table info available.
#2  0x00007f735c33ad29 in ecx_free () from /opt/quictls1/lib/libcrypto.so.81.1.1
No symbol table info available.
#3  0x00007f735c3630ba in EVP_PKEY_free () from /opt/quictls1/lib/libcrypto.so.81.1.1
No symbol table info available.
#4  0x00007f735c505055 in ssl3_free () from /opt/quictls1/lib/libssl.so.81.1.1
No symbol table info available.
#5  0x00007f735c517726 in SSL_free () from /opt/quictls1/lib/libssl.so.81.1.1
No symbol table info available.
#6  0x00005604f14dab11 in quic_conn_release (qc=<optimized out>) at src/quic_conn.c:5018
        i = <optimized out>
        conn_ctx = 0x7f7334efec90
        node = <optimized out>
        app_tls_ctx = <optimized out>
        pkt = <optimized out>
        pktback = <optimized out>
        __FUNCTION__ = "quic_conn_release"
#7  0x00005604f14e9629 in qc_idle_timer_task (t=<optimized out>, ctx=0x7f733458d330, state=<optimized out>) at src/quic_conn.c:5116
        qc = 0x7f733458d330
        prx_counters = 0x5604f1e960c8
        qc_flags = 671090716
        __FUNCTION__ = "qc_idle_timer_task"
#8  0x00005604f164087a in run_tasks_from_lists (budgets=<optimized out>) at src/task.c:634
        process = <optimized out>
        tl_queues = <optimized out>
        t = 0x5604f21c2210
        budget_mask = <optimized out>
        profile_entry = 0x0
        done = <optimized out>
        queue = <optimized out>
        state = <optimized out>
        ctx = <optimized out>
        __func__ = <optimized out>
#9  0x00005604f164108c in process_runnable_tasks () at src/task.c:861
        tt = 0x5604f19a5500 <ha_thread_ctx+2688>
        lrq = <optimized out>
        grq = <optimized out>
        t = <optimized out>
        max = {52, 39, 0, 0}
        max_total = <optimized out>
        tmp_list = <optimized out>
        queue = <optimized out>
        max_processed = <optimized out>
        lpicked = <optimized out>
        gpicked = <optimized out>
        heavy_queued = 1
        budget = <optimized out>
        __lk_r = <optimized out>
        __set_r = <optimized out>
        __msk_r = <optimized out>
        ret = <optimized out>
#10 0x00005604f160f79a in run_poll_loop () at src/haproxy.c:2913
        next = <optimized out>
        wake = <optimized out>
        __func__ = <optimized out>
#11 0x00005604f160fd79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
        ptaf = <optimized out>
        ptif = <optimized out>
        ptdf = <optimized out>
        ptff = <optimized out>
        init_left = 0
        init_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}
        init_cond = {__data = {{__wseq = 45, __wseq32 = {__low = 45, __high = 0}}, {__g1_start = 33, __g1_start32 = {__low = 33, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0}, __g1_orig_size = 24, __wrefs = 0, __g_signals = {0, 0}}, __size = "-\000\000\000\000\000\000\000!", '\000' <repeats 23 times>, "\030", '\000' <repeats 14 times>, __align = 45}
#12 0x00007f735c579609 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140133141284608, -4806719819869155237, 140723357669694, 140723357669695, 140723357669696, 140133141241216, 4876710101095714907, 4876731428063973467}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#13 0x00007f735bfd4133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

@a-denoyelle
Contributor

a-denoyelle commented Dec 20, 2022

All these kinds of errors indicate memory corruption. Can you run with the -dMno-merge,tag argument, please? This should highlight whether there is a double-free somewhere.
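
For example, based on the command line visible in the core dumps above (a sketch, adapt it to your own service file), that would be:

/usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -dMno-merge,tag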

@gabrieltz
Author

So, it did crash while running with this argument. Should I see something special?

There was the following in syslog:
Dec 20 15:33:26 sheldon haproxy[348872]: corrupted size vs. prev_size

short backtrace
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f8fe4c6a859 in __GI_abort () at abort.c:79
#2  0x00007f8fe4cd526e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f8fe4dff298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007f8fe4cdd2fc in malloc_printerr (str=str@entry=0x7f8fe4dfd44d "corrupted size vs. prev_size") at malloc.c:5347
#4  0x00007f8fe4cdd96b in unlink_chunk (p=p@entry=0x7f8fbde81ab0, av=0x7f8fbc000020) at malloc.c:1454
#5  0x00007f8fe4cdee8b in _int_free (av=0x7f8fbc000020, p=0x7f8fbde81a00, have_lock=<optimized out>) at malloc.c:4342
#6  0x000055b76904dc2d in pool_free_area (size=<optimized out>, area=<optimized out>) at include/haproxy/pool-os.h:47
#7  pool_put_to_os (pool=0x55b76b37e640, ptr=<optimized out>) at src/pool.c:367
#8  0x000055b76904df9f in pool_evict_last_items (pool=0x55b76b37e640, ph=0x55b76b37e7c0, count=8) at src/pool.c:490
#9  0x000055b76904e121 in pool_evict_from_local_caches () at src/pool.c:548
#10 0x000055b768ed2196 in quic_conn_release (qc=<optimized out>) at src/quic_conn.c:5038
#11 0x000055b768ee0629 in qc_idle_timer_task (t=<optimized out>, ctx=0x7f8fbe6de300, state=<optimized out>) at src/quic_conn.c:5116
#12 0x000055b76903787a in run_tasks_from_lists (budgets=<optimized out>) at src/task.c:634
#13 0x000055b76903808c in process_runnable_tasks () at src/task.c:861
#14 0x000055b76900679a in run_poll_loop () at src/haproxy.c:2913
#15 0x000055b769006d79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
#16 0x00007f8fe530c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#17 0x00007f8fe4d67133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
full backtrace
[Current thread is 1 (Thread 0x7f8fd4b72700 (LWP 348876))]
(gdb) thread apply all bt

Thread 8 (Thread 0x7f8fd5b74700 (LWP 348874)):
#0  0x00007f8fe4d6746e in epoll_wait (epfd=49, events=0x7f8fe002a270, maxevents=200, timeout=timeout@entry=21) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x000055b768e9ff8d in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
#2  0x000055b76900675a in run_poll_loop () at src/haproxy.c:2984
#3  0x000055b769006d79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
#4  0x00007f8fe530c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x00007f8fe4d67133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 7 (Thread 0x7f8fd32ab700 (LWP 348879)):
#0  0x00007f8fe4d6746e in epoll_wait (epfd=66, events=0x7f8fc0029b80, maxevents=200, timeout=timeout@entry=30) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x000055b768e9ff8d in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
#2  0x000055b76900675a in run_poll_loop () at src/haproxy.c:2984
#3  0x000055b769006d79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
#4  0x00007f8fe530c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x00007f8fe4d67133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7f8fd4371700 (LWP 348877)):
#0  0x00007f8fe4d6746e in epoll_wait (epfd=58, events=0x7f8fc8029b80, maxevents=200, timeout=timeout@entry=9) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x000055b768e9ff8d in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
#2  0x000055b76900675a in run_poll_loop () at src/haproxy.c:2984
#3  0x000055b769006d79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
#4  0x00007f8fe530c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x00007f8fe4d67133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7f8fe4aab700 (LWP 348873)):
#0  0x00007f8fe4d6746e in epoll_wait (epfd=52, events=0x7f8fcc029b80, maxevents=200, timeout=timeout@entry=25) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x000055b768e9ff8d in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
#2  0x000055b76900675a in run_poll_loop () at src/haproxy.c:2984
#3  0x000055b769006d79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
#4  0x00007f8fe530c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x00007f8fe4d67133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7f8fd3aac700 (LWP 348878)):
#0  0x00007f8fe4d6746e in epoll_wait (epfd=69, events=0x7f8fb4029b80, maxevents=200, timeout=timeout@entry=12) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x000055b768e9ff8d in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
#2  0x000055b76900675a in run_poll_loop () at src/haproxy.c:2984
#3  0x000055b769006d79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
#4  0x00007f8fe530c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x00007f8fe4d67133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f8fd5373700 (LWP 348875)):
#0  0x00007f8fe4d6746e in epoll_wait (epfd=55, events=0x7f8fc4029b80, maxevents=200, timeout=timeout@entry=26) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x000055b768e9ff8d in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
#2  0x000055b76900675a in run_poll_loop () at src/haproxy.c:2984
#3  0x000055b769006d79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
#4  0x00007f8fe530c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x00007f8fe4d67133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f8fe4ab6980 (LWP 348872)):
#0  0x00007f8fe4d6746e in epoll_wait (epfd=6, events=0x55b76b7184d0, maxevents=200, timeout=timeout@entry=23) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x000055b768e9ff8d in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
#2  0x000055b76900675a in run_poll_loop () at src/haproxy.c:2984
#3  0x000055b769006d79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
#4  0x000055b768e9d675 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3758

Thread 1 (Thread 0x7f8fd4b72700 (LWP 348876)):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f8fe4c6a859 in __GI_abort () at abort.c:79
#2  0x00007f8fe4cd526e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f8fe4dff298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007f8fe4cdd2fc in malloc_printerr (str=str@entry=0x7f8fe4dfd44d "corrupted size vs. prev_size") at malloc.c:5347
#4  0x00007f8fe4cdd96b in unlink_chunk (p=p@entry=0x7f8fbde81ab0, av=0x7f8fbc000020) at malloc.c:1454
#5  0x00007f8fe4cdee8b in _int_free (av=0x7f8fbc000020, p=0x7f8fbde81a00, have_lock=<optimized out>) at malloc.c:4342
#6  0x000055b76904dc2d in pool_free_area (size=<optimized out>, area=<optimized out>) at include/haproxy/pool-os.h:47
#7  pool_put_to_os (pool=0x55b76b37e640, ptr=<optimized out>) at src/pool.c:367
#8  0x000055b76904df9f in pool_evict_last_items (pool=0x55b76b37e640, ph=0x55b76b37e7c0, count=8) at src/pool.c:490
#9  0x000055b76904e121 in pool_evict_from_local_caches () at src/pool.c:548
#10 0x000055b768ed2196 in quic_conn_release (qc=<optimized out>) at src/quic_conn.c:5038
#11 0x000055b768ee0629 in qc_idle_timer_task (t=<optimized out>, ctx=0x7f8fbe6de300, state=<optimized out>) at src/quic_conn.c:5116
#12 0x000055b76903787a in run_tasks_from_lists (budgets=<optimized out>) at src/task.c:634
#13 0x000055b76903808c in process_runnable_tasks () at src/task.c:861
#14 0x000055b76900679a in run_poll_loop () at src/haproxy.c:2913
#15 0x000055b769006d79 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3102
#16 0x00007f8fe530c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#17 0x00007f8fe4d67133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@wtarreau
Member

Then it means it cannot be a double free of an element allocated from a pool, but most likely that some memory got corrupted after being freed. This is what damaged the allocation chains that free() detected. We do have special detection for this using -dMno-merge,tag,cold-first,caller,integrity. To explain a bit:

  • no-merge indicates that pools of similar sizes must not be merged
  • tag detects incorrect frees to pools as well as out-of-bounds writes
  • cold-first prefers to reuse the oldest objects first, so that a use-after-free has a better chance of being detected on a cold object
  • caller stores the location of the caller in freed memory areas
  • integrity checks, when a previously freed memory area is either reallocated or released to the OS, that it was not tampered with since it was freed (the general idea is sketched after this comment)

This will consume slightly more CPU (due to the integrity checks) but should remain pretty much OK. However, it increases the chances that an overwritten area will be detected before being freed to the OS, and the dumps of the memory area contents may even tell us which line of code freed it last and help figure out what is invalid (hence which field of the structure was modified).

We may then have to guide you through some commands if this triggers.
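
To give an idea of what the integrity option does, here is a minimal, generic sketch of the poison-and-verify approach it is based on. This is illustrative code only, with a made-up fill pattern and helper names; it is not haproxy's actual pool implementation:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal sketch of a poison-and-verify check (illustrative only, not
 * haproxy's pool code). The pattern value and function names are made up. */
#define POISON_BYTE 0xA5

/* Called when an object is returned to a cache: poison its contents. */
static void poison_on_free(void *area, size_t size)
{
    memset(area, POISON_BYTE, size);
}

/* Called when the object is reused or released to the OS: any byte that no
 * longer matches the poison means something wrote to it after it was freed. */
static int integrity_ok(const void *area, size_t size)
{
    const unsigned char *p = area;
    size_t i;

    for (i = 0; i < size; i++)
        if (p[i] != POISON_BYTE)
            return 0;
    return 1;
}

int main(void)
{
    size_t size = 64;
    unsigned char *obj = malloc(size);

    poison_on_free(obj, size);   /* object sits in the cache, poisoned */
    obj[10] = 42;                /* simulated use-after-free write */

    if (!integrity_ok(obj, size))
        fprintf(stderr, "freed object was modified after being released\n");
    free(obj);
    return 0;
}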

@a-denoyelle
Contributor

I found a memory issue related to the mux-quic code. I don't know if this will solve your crashes. Can you try the attached patch and report back if you see any improvement, please?

0001-BUG-MEDIUM-mux-quic-fix-double-delete-from-qcc.openi.patch.txt

@gabrieltz
Author

Should I remove the extra arguments from the haproxy command line before restarting with the patch?

@wtarreau
Member

It doesn't really matter; these arguments will trigger if the bug is elsewhere. They're only here to perform self-checks, they will not hide a bug. So if it works fine with the patch and the args, we know the patch is the fix. And if it doesn't work better, there's a good chance the core can be used to figure out where the problem is.

@gabrieltz
Author

Reached 1 hour of uptime; in most cases with this specific version it wouldn't have exceeded 27 minutes of uptime without the patch.
I'm about to declare the patch successful :)

Thank you

@a-denoyelle
Contributor

Thanks for the feedback. I will merge the patch soon.

@gabrieltz
Author

gabrieltz commented Dec 21, 2022

It's still very stable, uptime 17 hours and no crash.

haproxy-mirror pushed a commit that referenced this issue Dec 21, 2022
qcs instances for bidirectional streams are inserted in
<qcc.opening_list>. They are removed from the list once a full HTTP
request has been parsed. This is required to implement the http-request
timeout.

In case a stream is deleted before a full HTTP request has been
received, it must also be removed from <qcc.opening_list>. This was not
the case in the first implementation, but it has been fixed by the
following patch:
  641a65f
  BUG/MINOR: mux-quic: remove qcs from opening-list on free

This means that a stream can now be deleted from the list in two
different functions. Sadly, as LIST_DELETE was used in both cases,
nothing prevented a double deletion from the list, even though
LIST_INLIST was used. Both calls are replaced with LIST_DEL_INIT, which
is idempotent (see the sketch after this commit message).

This bug causes memory corruption which in most cases results in a
segfault, most of the time outside of the mux-quic code itself. It was
first found by gabrieltz, who reported it in GitHub issue #1903. Big
thanks to him for his testing.

This bug also causes failures on several 'M' transfer testcases of the
QUIC interop-runner. The s2n-quic client is particularly useful in this
case, as the segfault was most of the time triggered on the LIST_DELETE
operation itself. This is probably due to its encapsulation of the
HEADERS frame with the FIN bit delayed in a following empty STREAM
frame.

This must be backported wherever the above patch is, up to 2.6.
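
To make the commit's reasoning concrete, here is a minimal sketch of a circular intrusive list with a plain-delete macro and a delete-and-reinit macro. The macros below are simplified stand-ins, not haproxy's actual include/haproxy/list.h, but they show why running a plain delete twice rewrites memory through stale neighbour pointers while delete-and-reinit is idempotent:

#include <stdio.h>

/* Simplified circular intrusive list (stand-in for haproxy's list.h). */
struct list {
    struct list *n;  /* next */
    struct list *p;  /* prev */
};

#define LIST_INIT(l)   do { (l)->n = (l); (l)->p = (l); } while (0)

/* Plain delete: unlinks the element but leaves its own pointers aimed at the
 * old neighbours. Calling it a second time writes through those stale
 * pointers, corrupting whatever memory they now reference. */
#define LIST_DELETE(l) do { (l)->n->p = (l)->p; (l)->p->n = (l)->n; } while (0)

/* Delete and re-initialize: after the first call the element only points to
 * itself, so a second call is a harmless no-op (idempotent). */
#define LIST_DEL_INIT(l) do { \
    (l)->n->p = (l)->p; (l)->p->n = (l)->n; \
    (l)->n = (l); (l)->p = (l); \
} while (0)

int main(void)
{
    struct list head, a;

    LIST_INIT(&head);
    /* insert a right after head */
    a.n = head.n; a.p = &head; head.n->p = &a; head.n = &a;

    LIST_DEL_INIT(&a);  /* first removal: element unlinked and reset */
    LIST_DEL_INIT(&a);  /* second removal: no-op, the list stays sane */

    printf("head is %s\n",
           (head.n == &head && head.p == &head) ? "intact" : "corrupted");
    return 0;
}
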
@capflam
Member

capflam commented Dec 22, 2022

Ah, I didn't understand it was in Prometheus. I don't know about it; I only fixed the "show pools" output, which was using 32-bit ints. It's possible that Prometheus does the same; we'll have to check that with @capflam and @wdauchy, as I'm totally ignorant of this area yet.

Indeed, the Prometheus exporter and the stats applet both use a 32-bit integer (see the sketch below).

FYI, it should be fixed in 2.8-DEV.
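
For illustration only (this is not the exporter's actual code, just a sketch of the truncation), a byte counter that has crossed 4 GiB simply wraps when squeezed into a 32-bit field:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical counter that crossed the 4 GiB boundary. */
    uint64_t allocated_bytes = 5ULL * 1024 * 1024 * 1024;  /* 5 GiB */
    uint32_t exported = (uint32_t)allocated_bytes;         /* 32-bit metric field */

    /* Prints: real=5368709120 exported=1073741824 (wrapped down to 1 GiB) */
    printf("real=%llu exported=%u\n",
           (unsigned long long)allocated_bytes, exported);
    return 0;
}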

FireBurn pushed a commit to FireBurn/haproxy that referenced this issue Jan 19, 2023
(cherry picked from commit 15337fd)
Signed-off-by: Willy Tarreau <w@1wt.eu>
FireBurn pushed a commit to FireBurn/haproxy that referenced this issue Jan 21, 2023
(cherry picked from commit 15337fd)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 151737f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
@wtarreau wtarreau added status: fixed This issue is a now-fixed bug. and removed status: needs-triage This issue needs to be triaged. labels Apr 15, 2023
@wtarreau
Member

This one was backported to 2.6.8 but we forgot to close it.
