
CPU load high when backend server with healthcheck down #1770

Closed

phihos opened this issue Jul 4, 2022 · 36 comments
Labels: status: fixed, type: doc

Comments

phihos (Contributor) commented Jul 4, 2022

Detailed Description of the Problem

Hello everyone,

I have a high-load (~10k reqs/s) HAProxy 2.6.1 instance with HTTP-mode backends and L4 health checks. Everything works fine until a minor disturbance causes a backend to become unavailable for a few seconds. Then CPU usage suddenly skyrockets, to the point that not even the health checks themselves are performed and all backends are declared dead, worsening the issue.
I also found out that disabling the L4 check by removing the "check" parameter on each server resolves the issue.

I also tried the latest 2.4 and 2.5 versions with the same result.

Expected Behavior

I expected the health check impact on CPU usage to be negligible when a backend is unavailable.

Steps to Reproduce the Behavior

I created a minimal example:

global
  log /dev/log    local0
  log /dev/log    local1 notice
  chroot /var/lib/haproxy
  user haproxy
  group haproxy
  daemon

defaults
  mode http

frontend some_frontend 
  bind *:8080
  stats enable
  stats uri /stats
  default_backend some_backend

backend some_backend
  server some_server 8.8.8.8:8000 check

Blocking the IP via iptables -A OUTPUT -p tcp --dst 8.8.8.8 --dport 8000 -j DROP simulates an outage. Bombarding HAProxy with any load-testing tool like wrk reveals high CPU usage. After removing check from the config, reloading, and load-testing with the same workload again, the usage is far lower, almost non-existent.
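
For reference, a rough sketch of the reproduction commands (the wrk parameters here are purely illustrative; see the Vagrant example mentioned below for the full setup):

  # simulate the outage by dropping traffic to the backend server
  iptables -A OUTPUT -p tcp --dst 8.8.8.8 --dport 8000 -j DROP
  # put load on the frontend; thread/connection/duration values are arbitrary
  wrk -t4 -c200 -d30s http://127.0.0.1:8080/
  # undo the block afterwards
  iptables -D OUTPUT -p tcp --dst 8.8.8.8 --dport 8000 -j DROP
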
I put a full Vagrant example in a repo.

Do you have any idea what may have caused this?

No response

Do you have an idea how to solve the issue?

No response

What is your configuration?

See example above.

Output of haproxy -vv

HAProxy version 2.6.1-1ppa1~focal 2022/06/22 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2027.
Known bugs: http://www.haproxy.org/bugs/bugs-2.6.1.html
Running on: Linux 5.4.0-121-generic #137-Ubuntu SMP Wed Jun 15 13:33:07 UTC 2022 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -O2 -fdebug-prefix-map=/build/haproxy-fiMVM6/haproxy-2.6.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_SYSTEMD=1 USE_PROMEX=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL +THREAD +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -ENGINE +GETADDRINFO +OPENSSL +LUA +ACCEPT4 -CLOSEFROM -ZLIB +SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT -QUIC +PROMEX -MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=2).
Built with OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
Running on OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with the Prometheus exporter as a service
Built with network namespace support.
Support for malloc_trim() is enabled.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.34 2019-11-21
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 9.4.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter
Available filters :
	[CACHE] cache
	[COMP] compression
	[FCGI] fcgi-app
	[SPOE] spoe
	[TRACE] trace

Last Outputs and Backtraces

No response

Additional Information

No response

phihos added the status: needs-triage and type: bug labels on Jul 4, 2022
capflam (Member) commented Jul 4, 2022

With such a configuration, if the server is detected as down, an error is immediately returned for every request. With a load-testing tool, this results in high CPU usage because of the high request rate. But I guess it recovers if you stop wrk, right?

Without health checks, the server is not detected as down. Thus, for each request there is a connection attempt. There is no connection timeout, and by default there are 3 connection retries. This drastically slows down the request rate.
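
For illustration, a minimal defaults section that bounds this cost (the values are only examples, not a recommendation):

defaults
  mode http
  # abort each connection attempt after this delay instead of letting it hang
  timeout connect 5s
  # 3 is the default, i.e. up to 4 connection attempts per request before an error is returned
  retries 3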

phihos (Contributor, Author) commented Jul 4, 2022

Apologies if this is not a bug. I was redirected here from the forum.

But I guess it recovers if you stop wrk, right?

Yes, it does. In the production instance, though, the CPU load was so high that all backends were marked as down even though they were still available. Given the almost 100% utilization of 48 virtual cores, my guess is that HAProxy was not able to perform the health checks in time, which in turn caused HAProxy to lock up for several minutes.

Is there an intended mechanism to prevent high CPU load when a backend is down?

capflam (Member) commented Jul 5, 2022

No problem. It is not obvious whether it is a bug or not; I may be wrong. I'm surprised your load is so high if you have several servers to load balance the traffic. Could you share your real configuration?

phihos (Contributor, Author) commented Jul 5, 2022

I'm surprised your load is so high if you have several servers to load balance the traffic.

Actually, it is only one server per backend. The IP is a floating IP that fails over to the next server in case of failure.

I extracted and redacted the haproxy.cfg. Redacted info is marked with a ?. I also changed the names of paths and domains.

global
  chroot  /var/lib/haproxy
  daemon
  group  haproxy
  log  stdout format raw local0 notice
  maxconn  400000
  nbthread  48
  pidfile  /var/run/haproxy.pid
  ssl-default-bind-options  no-sslv3 no-tlsv10 no-tlsv11
  ssl-dh-param-file  /etc/haproxy/dhparams.pem
  stats  socket /var/lib/haproxy/stats
  tune.bufsize  16384000
  tune.h2.header-table-size  14000
  tune.maxrewrite  131072
  tune.ssl.default-dh-param  2048
  user  haproxy

defaults
  default-server  ca-file /etc/ssl/certs/ca-certificates.crt
  http-reuse  always
  log  global
  option  redispatch
  option  contstats
  option  log-health-checks
  option  forwardfor
  option  dontlog-normal
  option  dontlognull
  option  splice-auto
  retries  3
  stats  enable
  timeout  http-request 10s
  timeout  queue 1m
  timeout  connect 10s
  timeout  client 1m
  timeout  server 1m
  timeout  check 10s
  timeout  http-keep-alive 1m
  timeout  connect 5s
  timeout  client 30s
  timeout  server 30s
  timeout  check 5s
  timeout  queue 5s
  timeout  client-fin 1s
  timeout  server-fin 1s

frontend origin
  bind *:8081 accept-proxy
  bind *:8444 ssl crt /etc/haproxy/ssl accept-proxy alpn h2,http/1.1 ciphersuites ???
  bind *:8445 ssl crt /etc/haproxy/ssl accept-proxy alpn h2,http/1.1 ciphersuites ???
  mode http
  
  acl company_domain                    hdr(host) -i -m beg company.com
  acl company_www_domain                hdr(host) -i -m beg www.company.com
  acl company_sub1_domain               hdr(host) -i -m beg sub1.company.com
  acl company_sub2_domain               hdr(host) -i -m beg sub2.company.com
  acl company_sub3_domain               hdr(host) -i -m beg sub3.company.com
  acl company_sub4_domain               hdr(host) -i -m beg sub4.company.com
  acl company_sub5_domain               hdr(host) -i -m beg sub5.company.com
  acl company_sub6_domain               hdr(host) -i -m beg sub6.company.com
  acl company_subdomain                 hdr_sub(host) -i .company.com
  acl company_www_subdomain             hdr_sub(host) -i .www.company.com
  acl company_sub7_subdomain            hdr_sub(host) -i .sub7.company.com
  acl company_sub8_subdomain            hdr_beg(host) -i sub8.company.com
  acl company_sub9_subdomain            hdr_beg(host) -i sub9.company.com
  acl company_path1_path                path_beg /path1
  acl company_slash_path                path_beg /
  acl company_path2_path                path_beg /path2
  acl company_path3_path                path_beg /path3
  acl company_path4_path                path_beg /path4
  acl company_path5_path                path_reg ^/.*/path5
  acl company_path6_path                path_beg /path6
  acl company_path7_path                path_beg /path7

  acl uri_too_long  url_len gt 4096
  acl greylisted src -f /etc/haproxy/maps/greylisted_subnets.txt
  acl whitelisted src -f /etc/haproxy/maps/whitelisted_subnets.txt
  acl is_json hdr_sub(Content-Type) -i application/json
  acl low_prio_queue src_http_req_rate(http-request-rate-short) ge ?
  acl low_prio_queue src_http_req_rate(http-request-rate-long) ge ?
  acl low_prio_queue_if_greylisted src_http_req_rate(http-request-rate-short) ge ?
  acl low_prio_queue_if_greylisted src_http_req_rate(http-request-rate-long) ge ?
  acl abuser src_http_req_rate(http-request-rate-short) ge ?
  acl abuser src_http_req_rate(http-request-rate-long) ge ?
  acl abuser src_http_err_rate(http-request-rate-short) ge ?
  acl abuser src_http_err_rate(http-request-rate-long) ge ?
  acl abuser src_bytes_out_rate(http-request-rate-short) ge ?
  acl abuser src_bytes_out_rate(http-request-rate-long) ge ?
  acl heavy_abuser src_http_req_rate(http-request-rate-short) ge ?
  acl heavy_abuser src_http_req_rate(http-request-rate-long) ge ?
  acl heavy_abuser src_http_err_rate(http-request-rate-short) ge ?
  acl heavy_abuser src_http_err_rate(http-request-rate-long) ge ?
  acl heavy_abuser src_bytes_out_rate(http-request-rate-short) ge ?
  acl heavy_abuser src_bytes_out_rate(http-request-rate-long) ge ?
  acl abuser_if_greylisted src_http_req_rate(http-request-rate-short) ge ?
  acl abuser_if_greylisted src_http_req_rate(http-request-rate-long) ge ?
  acl abuser_if_greylisted src_http_err_rate(http-request-rate-short) ge ?
  acl abuser_if_greylisted src_http_err_rate(http-request-rate-long) ge ?
  acl abuser_if_greylisted src_bytes_out_rate(http-request-rate-short) ge ?
  acl abuser_if_greylisted src_bytes_out_rate(http-request-rate-long) ge ?
  acl heavy_abuser_if_greylisted src_http_req_rate(http-request-rate-short) ge ?
  acl heavy_abuser_if_greylisted src_http_req_rate(http-request-rate-long) ge ?
  acl heavy_abuser_if_greylisted src_http_err_rate(http-request-rate-short) ge ?
  acl heavy_abuser_if_greylisted src_http_err_rate(http-request-rate-long) ge ?
  acl heavy_abuser_if_greylisted src_bytes_out_rate(http-request-rate-short) ge ?
  acl heavy_abuser_if_greylisted src_bytes_out_rate(http-request-rate-long) ge ?
  acl inc_abuse_cnt src_inc_gpc0(abuse) gt 0
  acl inc_abuse_heavy_cnt src_inc_gpc0(abuse-heavy) gt 0
  acl inc_abuse_but_whitelisted_cnt src_inc_gpc0(abuse-but-whitelisted) gt 0
  acl inc_abuse_heavy_but_whitelisted_cnt src_inc_gpc0(abuse-heavy-but-whitelisted) gt 0
  acl abuse_cnt src_get_gpc0(abuse) gt 0
  acl abuse_heavy_cnt src_get_gpc0(abuse-heavy) gt 0
  
  capture request header Host len 60
  capture request header User-Agent len 60
  capture request header Authorization len 60
  capture cookie session len 40
  capture response header X-Trace-ID len 60
  
  http-error status 429 content-type "text/html; charset=utf-8" lf-file /etc/haproxy/errors/429.html
  http-error status 502 content-type "text/html; charset=utf-8" lf-file /etc/haproxy/errors/5xx.html
  http-error status 503 content-type "text/html; charset=utf-8" lf-file /etc/haproxy/errors/5xx.html
  http-error status 504 content-type "text/html; charset=utf-8" lf-file /etc/haproxy/errors/5xx.html
  
  http-request track-sc0 src table http-request-rate-long
  http-request track-sc1 src table http-request-rate-short
  
  http-request capture url_param(auth_key) len 60 if { req.hdr_cnt(Authorization) eq 0 }

  http-request set-log-level notice

  http-request set-var(txn.ratelimitreason) str("?")
  http-request set-var(txn.special_treatment) int(0)
  http-request set-var(txn.special_treatment) int(1) if greylisted
  http-request set-var(txn.special_treatment) int(2) if whitelisted
  http-request set-var(txn.priority_class) int(0)
  http-request deny status 414 content-type "application/json" lf-file /etc/haproxy/errors/414.json if is_json uri_too_long
  http-request deny status 414 content-type "text/html; charset=utf-8" lf-file /etc/haproxy/errors/414.html if uri_too_long
  http-request set-var(txn.ratelimitreason) str("?")
  http-request silent-drop if abuse_heavy_cnt !whitelisted
  http-request set-var(txn.ratelimitreason) str("?")
  http-request silent-drop if heavy_abuser !whitelisted inc_abuse_heavy_cnt
  http-request silent-drop if greylisted !whitelisted heavy_abuser_if_greylisted inc_abuse_heavy_cnt
  http-request set-var(txn.ratelimitreason) str("?")
  http-request deny status 429 content-type "application/json" lf-file /etc/haproxy/errors/429.json if is_json abuse_cnt
  http-request deny status 429 content-type "text/html; charset=utf-8" lf-file /etc/haproxy/errors/429.html if abuse_cnt
  http-request set-var(txn.ratelimitreason) str("?")
  http-request deny status 429 content-type "application/json" lf-file /etc/haproxy/errors/429.json if is_json abuser !whitelisted inc_abuse_cnt
  http-request deny status 429 content-type "text/html; charset=utf-8" lf-file /etc/haproxy/errors/429.html if abuser !whitelisted inc_abuse_cnt
  http-request deny status 429 content-type "application/json" lf-file /etc/haproxy/errors/429.json if is_json greylisted !whitelisted abuser_if_greylisted inc_abuse_cnt
  http-request deny status 429 content-type "text/html; charset=utf-8" lf-file /etc/haproxy/errors/429.html if greylisted !whitelisted abuser_if_greylisted inc_abuse_cnt
  http-request set-var(txn.ratelimitreason) str("?") if abuser whitelisted inc_abuse_but_whitelisted_cnt
  http-request set-var(txn.ratelimitreason) str("?") if heavy_abuser whitelisted inc_abuse_heavy_but_whitelisted_cnt

  http-request set-log-level info

  http-request set-priority-class int(2) if low_prio_queue
  http-request set-priority-class int(2) if greylisted low_prio_queue_if_greylisted
  http-request set-priority-class int(1) if !low_prio_queue !greylisted || !low_prio_queue_if_greylisted
  http-request set-var(txn.priority_class) prio_class

  http-request redirect scheme https code 301 unless { ssl_fc }
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  
  http-response set-header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload;"
  http-response set-log-level notice if { status ge 400 }
  
  log-format "%ci:%cp via %fi:%fp: %b/%s Status: %ST Cookies: %CC/%CS Termination: %ts Time: %Ta Retries: %rc Headers: %hr/%hs Request: %HV/%HM%HP Unique-ID: %[unique-id]"
  
  maxconn 150000
  unique-id-format %{+X}o\ 01_%ci:%cp_%fi:%fp_%ST_%Ts_%[var(txn.special_treatment)]_%[var(txn.priority_class)]_%[var(txn.ratelimitreason)]
  unique-id-header X-Unique-ID
  use_backend company-www if company_www_subdomain
  use_backend company-back1 if company_sub9_subdomain
  use_backend company-back2 if company_sub1_domain
  use_backend company-back3 if company_sub4_domain company_path6_path
  use_backend company-back4 if company_sub4_domain company_path7_path
  use_backend company-back5 if company_sub4_domain
  use_backend company-back6 if company_sub5_domain company_path6_path
  use_backend company-back7 if company_sub5_domain
  use_backend company-back8 if company_sub7_subdomain
  use_backend company-back9 if company_sub8_subdomain
  use_backend company-back10 if company_domain company_path1_path
  use_backend company-back11 if company_sub6_domain
  use_backend company-back12 if company_sub2_domain company_path2_path
  use_backend company-back13 if company_sub2_domain company_path3_path
  use_backend company-back14 if company_sub2_domain company_path4_path
  use_backend company-back15 if company_sub2_domain company_path5_path
  use_backend company-back16 if company_sub3_domain
  use_backend company-root if company_domain || company_subdomain

frontend stats
  bind 10.0.1.2:9000
  mode http
  http-request use-service prometheus-exporter if { path /metrics }
  http-request set-log-level silent if TRUE
  option dontlog-normal
  stats enable
  stats uri /stats
  stats refresh 10s

backend abuse
  stick-table type ip size 100K peers company_peers expire 5s store gpc0

backend abuse-but-whitelisted
  stick-table type ip size 100K peers company_peers expire 5s store gpc0

backend abuse-heavy
  stick-table type ip size 100K peers company_peers expire 30s store gpc0

backend abuse-heavy-but-whitelisted
  stick-table type ip size 100K peers company_peers expire 30s store gpc0

backend company-back7
  mode http
  http-error status 502 content-type "application/json; charset=utf-8" lf-file /etc/haproxy/errors/502.json
  http-error status 503 content-type "application/json; charset=utf-8" lf-file /etc/haproxy/errors/503.json
  http-error status 504 content-type "application/json; charset=utf-8" lf-file /etc/haproxy/errors/504.json
  server server1:444 10.0.0.1:444 alpn h2,http/1.1 check  maxconn 30 send-proxy-v2  ssl  verify none

backend company-back6
  mode http
  http-error status 502 content-type "application/json; charset=utf-8" lf-file /etc/haproxy/errors/502.json
  http-error status 503 content-type "application/json; charset=utf-8" lf-file /etc/haproxy/errors/503.json
  http-error status 504 content-type "application/json; charset=utf-8" lf-file /etc/haproxy/errors/504.json
  server server1:444 10.0.0.1:444 alpn h2,http/1.1 check  maxconn 10 send-proxy-v2  ssl  verify none

backend company-back5
  mode http
  http-error status 502 content-type "application/json; charset=utf-8" lf-file /etc/haproxy/errors/502.json
  http-error status 503 content-type "application/json; charset=utf-8" lf-file /etc/haproxy/errors/503.json
  http-error status 504 content-type "application/json; charset=utf-8" lf-file /etc/haproxy/errors/504.json
  server server1:444 10.0.0.1:444 alpn h2,http/1.1 check  maxconn 100 send-proxy-v2  ssl  verify none

backend company-back3
  mode http
  http-error status 502 content-type "application/json; charset=utf-8" lf-file /etc/haproxy/errors/502.json
  http-error status 503 content-type "application/json; charset=utf-8" lf-file /etc/haproxy/errors/503.json
  http-error status 504 content-type "application/json; charset=utf-8" lf-file /etc/haproxy/errors/504.json
  server server1:444 10.0.0.1:444 alpn h2,http/1.1 check  maxconn 50 send-proxy-v2  ssl  verify none

backend company-back4
  mode http
  http-error status 502 content-type "application/json; charset=utf-8" lf-file /etc/haproxy/errors/502.json
  http-error status 503 content-type "application/json; charset=utf-8" lf-file /etc/haproxy/errors/503.json
  http-error status 504 content-type "application/json; charset=utf-8" lf-file /etc/haproxy/errors/504.json
  server server1:444 10.0.0.1:444 alpn h2,http/1.1 check  maxconn 200 send-proxy-v2  ssl  verify none

backend company-back12
  mode http
  acl company_path2_path path_beg /path2
  http-request replace-header path ^/path2(/.*)?$ /path8\1 if company_path2_path
  http-request set-header Host hostname1.company.com
  server server2:443 hostname1.company.com:443 alpn h2,http/1.1 check  check-sni hostname1.company.com maxconn 100 sni str(hostname1.company.com) ssl  verify required

backend company-back13
  mode http
  http-request set-header Host hostname2.company.com
  server server2:443 hostname2.company.com:443 alpn h2,http/1.1 check  check-sni hostname2.company.com maxconn 30 sni str(hostname2.company.com) ssl  verify required

backend company-back14
  mode http
  http-request set-header Host hostname3.company.com
  server server2:443 hostname3.company.com:443 alpn h2,http/1.1 check  check-sni hostname3.company.com maxconn 30 sni str(hostname3.company.com) ssl  verify required

backend company-back15
  mode http
  http-request set-header Host hostname4.company.com
  server server2:443 hostname4.company.com:443 alpn h2,http/1.1 check  check-sni hostname4.company.com maxconn 90 sni str(hostname4.company.com) ssl  verify required

backend company-back2
  mode http
  http-reuse always
  server server3:80 10.0.0.3:80 check  maxconn 50

backend company-back10
  mode http
  http-reuse always
  server server1:444 10.0.0.1:444 alpn h2,http/1.1 check  maxconn 100 send-proxy-v2  ssl  verify none

backend company-root
  mode http
  http-reuse always
  server server1:444 10.0.0.1:444 alpn h2,http/1.1 check  maxconn 100 send-proxy-v2  ssl  verify none

backend company-back8
  mode http
  http-reuse always
  server server1:444 10.0.0.1:444 alpn h2,http/1.1 check  maxconn 100 send-proxy-v2  ssl  verify none

backend company-back1
  mode http
  http-reuse always
  server server1:444 10.0.0.1:444 alpn h2,http/1.1 check  maxconn 100 send-proxy-v2  ssl  verify none

backend company-back9
  mode http
  http-reuse always
  server server1:444 10.0.0.1:444 alpn h2,http/1.1 check  maxconn 100 send-proxy-v2  ssl  verify none

backend company-www
  mode http
  http-reuse always
  server server1:444 10.0.0.1:444 alpn h2,http/1.1 check  maxconn 100 send-proxy-v2  ssl  verify none

backend company-back11
  mode http
  http-reuse always
  server server2:443 hostname5.company.com:443 alpn h2,http/1.1 check  check-sni hostname5.company.com maxconn 100 sni str(hostname5.company.com) ssl  verify required

backend http-request-rate-long
  stick-table type ip size 100K expire 30s peers company_peers store http_req_rate(30s),http_err_rate(30s),bytes_out_rate(30s)

backend http-request-rate-short
  stick-table type ip size 100K expire 1s peers company_peers store http_req_rate(1s),http_err_rate(1s),bytes_out_rate(1s)

backend company-back16
  mode http
  http-request set-header Host hostname6.company.com
  http-reuse always
  server server2:443 hostname6.company.com:443 alpn h2,http/1.1 check  check-sni hostname6.company.com maxconn 100 sni str(hostname6.company.com) ssl  verify required

peers company_peers
  peer peer1.company.com 10.0.1.1:9999
  peer peer2.company.com 10.0.1.2:9999

You can see that many backends point to the same servers. This is either for monitoring reasons or because I already know that these backends will get their own servers in the future.

I already tried to contain the damage by setting maxconn and timeout queue, with limited success. If any of these backends becomes unavailable, CPU usage skyrockets from roughly 25% to nearly 100% utilization. All other backends (not only the ones pointing to the same server) become unhealthy as well.

Edit: Since I did not want to reproduce this in production, I blocked off one of the servers via iptables on a test instance and wrk'd it from my home internet connection. That alone was enough to significantly raise the user CPU. When two others joined in, it was enough to saturate 48 cores of an otherwise completely idle instance. That did not seem normal, which is why I created this issue.

capflam (Member) commented Jul 6, 2022

With a load-testing tool and only one server in the backends, I'm not surprised the load increases: because HAProxy immediately returns an error, the request rate is much higher. However, this should not inhibit the health checks. This point is very strange.

With production traffic, I'm surprised it changes anything, because the request rate should be more or less unchanged. Maybe clients are tempted to retry their requests because they receive 503 errors, but those retries should happen with a delay. It may be good to check whether the request rate stays the same when this happens. And again, this should not inhibit the health checks. Maybe I missed something; this point is very strange.

Regarding a "solution", I guess we can add some delay before sending the 503 Service Unavailable responses. For instance, you can add a tarpit rule to your backends:

  timeout tarpit 1s
  http-request tarpit deny_status 503 if { nbsrv eq 0 }

But it is really important to understand why the load is so high when a server is detected as down.

phihos (Contributor, Author) commented Jul 6, 2022

Hey thanks for the answer.

There is an update on this situation: I thought disabling the health checks would keep HAProxy from hogging all CPUs when a backend is down. That turned out to be false just 30 minutes ago. The maxconn of the frontend was reached (I underestimated the traffic) and HAProxy started queueing up connections. All CPU cores were fully utilized again.
After adjusting the maxconn values and reloading, the CPU usage was still high. Restarting HAProxy did not help either. I then lowered nbthread from 48 to 40 in the hope that HAProxy would use fewer cores. That seemed to help: the core utilization quickly dropped to normal levels. But I am not entirely sure, because I had already pulled some of the traffic to another load balancer.

I have full Prometheus metrics for this incident. Is there anything specific you are interested in?

capflam (Member) commented Jul 6, 2022

The request rate is a good metric to know whether clients are retrying too quickly. But if the problem exists even when there are no health checks, that is probably not the issue. You can also look at the number of connections in the queue, because it may be a contention issue. A perf top on HAProxy is probably a good way to validate this hypothesis (perf top -g -p PID).
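
If you want to capture it for sharing rather than watching it live, one possible way (replace <PID> with the PID of the haproxy worker process; the 30-second window is arbitrary):

  perf record -g -p <PID> -- sleep 30
  perf report --stdio | head -n 50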

capflam (Member) commented Jul 6, 2022

At this stage, you can also get the show activity output. The best is to execute the command both when everything works fine and when the problem happens, two times separated by 5 seconds for instance.
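
For example, via the stats socket from your configuration (socat is just one way to talk to it; the file names are arbitrary):

  echo "show activity" | socat stdio unix-connect:/var/lib/haproxy/stats > show-activity-1.txt
  sleep 5
  echo "show activity" | socat stdio unix-connect:/var/lib/haproxy/stats > show-activity-2.txt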

phihos (Contributor, Author) commented Jul 6, 2022

OK, the request rate on the frontend did not rise but fell as soon as the maxconn limit was reached:
[screenshot: frontend request rate graph]
This is the show activity output:

thread_id: 1 (1..40)
date_now: 1657096993.953662
ctxsw: 128603842 [ 20399618 2771100 2780382 2885226 2794455 2759235 2765311 2798559 2756165 2789558 2755326 2764349 2832112 2760695 2766750 2745881 2774320 2761750 2801053 2763561 2760347 2740582 2759355 2786607 2819037 2759646 2828417 2749634 2769435 2733569 2801948 2769645 2743986 2779701 2794050 2751798 2754814 2741183 2771887 2762795 ]
tasksw: 33839192 [ 10585396 592878 596953 637978 596816 592256 594162 599975 590298 594671 590248 596539 596866 594391 590450 588499 593823 590540 596737 591908 591844 586789 591241 596060 601107 592320 638785 587836 594476 588621 616237 593581 589043 604287 601788 591277 590567 585090 594191 592668 ]
empty_rq: 9708109 [ 373981 205944 235527 404667 208893 224218 216854 206838 217189 210587 204938 282841 213221 224889 209315 215264 216507 211728 223261 217597 224363 209804 196435 207501 219237 234250 434352 209394 216115 224847 388972 209021 208102 354180 312536 272885 205746 212678 214453 228979 ]
long_rq: 148 [ 8 6 2 3 4 3 5 4 3 3 4 1 2 2 2 10 6 4 9 4 2 4 8 5 1 3 2 2 4 5 2 3 1 4 5 2 1 2 4 3 ]
loops: 48889944 [ 8344079 963750 1022223 1519678 977146 987169 979553 976210 974845 976951 963324 1113612 992554 989817 968644 975408 981369 970554 992563 979053 984296 970290 953979 974435 990487 1006105 1625806 966645 981285 980979 1404974 968476 961623 1299406 1198253 1072653 964457 968264 975415 993614 ]
wake_tasks: 2252408 [ 68978 55803 56119 56630 56325 56626 56061 55786 55887 55688 55116 55590 56627 56007 55802 56055 55514 56522 56220 55047 55771 56412 55295 56052 56344 55766 58048 55674 56181 55663 56599 55472 55346 56766 56850 55593 55556 55276 55589 55752 ]
wake_signal: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
poll_io: 35387290 [ 8271052 642623 679691 1093986 651024 645835 646724 653001 641573 650524 642453 741166 664343 649310 643705 642060 648073 643468 653360 644714 643952 644070 642121 650605 655973 660609 1185196 641425 650008 639024 974771 644559 637901 884048 806833 699350 642502 638952 645437 651269 ]
poll_exp: 9210063 [ 52661 205518 232325 366734 208812 223949 216369 206472 216945 210202 204466 274051 212833 224152 209001 214930 216143 211261 223086 217381 224133 209255 195986 207269 218974 232204 388551 208998 215849 224387 360634 208533 207768 333862 299118 267825 205129 212424 214003 227870 ]
poll_drop_fd: 34097 [ 827 816 872 848 812 881 881 909 839 818 795 851 856 772 864 887 903 828 898 808 801 910 816 880 965 881 834 839 859 842 813 851 861 830 874 884 895 811 808 878 ]
poll_skip_fd: 938829 [ 23005 23370 24309 25206 23432 23789 23811 23218 24194 24258 23250 23407 22545 24252 23902 23231 23367 23235 23077 23009 23314 23187 23313 23947 23235 23734 24182 23114 22623 23626 23068 23428 22541 23130 23135 23361 23348 24150 23482 23044 ]
conn_dead: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
stream_calls: 15315844 [ 387779 381729 384269 394725 385321 381565 382722 388390 381206 383270 381285 384100 381827 383817 381200 379189 382946 380309 385210 382157 381081 377195 381700 385318 387865 382267 385740 378717 381406 378906 386387 383467 379763 383769 385081 380266 381067 376610 383073 383150 ]
pool_fail: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
buf_wait: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
cpust_ms_tot: 21585 [ 548 573 506 550 504 473 527 531 555 525 542 566 529 517 571 509 523 472 473 508 540 535 587 526 557 553 562 548 526 542 571 568 567 589 585 519 559 539 557 553 ]
cpust_ms_1s: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
cpust_ms_15s: 22 [ 0 0 0 1 0 0 0 2 5 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 1 0 5 0 0 1 1 0 0 1 0 ]
avg_loop_us: 38 [ 58 78 66 19 35 26 24 26 56 41 35 25 50 30 34 43 37 26 35 39 27 48 43 35 63 25 25 36 66 27 16 33 21 42 46 28 39 32 54 28 ]
accepted: 995004 [ 528027 2444 2505 99813 2559 2545 2213 2646 2556 2645 2593 6097 2333 2062 2587 2609 2649 2525 2413 2607 2334 2794 2624 2604 2597 2289 190763 2436 2660 2411 49929 2699 2570 25072 11929 3612 2033 2662 2491 2067 ]
accq_pushed: 995004 [ 24794 25068 24934 24457 25175 25060 24934 24729 25006 24917 24803 24605 25145 25050 24857 24993 24507 25210 25099 24631 24902 24990 24779 24881 24898 24794 25018 24898 24964 24892 24857 25079 24743 24654 24689 24705 24802 24747 24765 24973 ]
accq_full: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
accq_ring: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
fd_takeover: 954360 [ 23922 23740 24706 25591 23832 24100 24174 23570 24621 24603 23685 23808 22860 24619 24253 23597 23822 23526 23459 23404 23717 23539 23634 24337 23592 24087 24562 23512 22957 23963 23435 23803 22888 23590 23584 23778 23687 24526 23857 23420 ]

These are the first lines of the perf top output:

Samples: 3M of event 'cycles', 4000 Hz, Event count (approx.): 559794136032 lost: 0/0 drop: 135247/1550359
  Children      Self  Shared Object       Symbol
+   27.15%     0.07%  [kernel]            [k] entry_SYSCALL_64_after_hwframe                                                                                                            ◆
+   26.88%     0.20%  [kernel]            [k] do_syscall_64                                                                                                                             ▒
+   12.32%    12.32%  libc-2.31.so        [.] 0x0000000000098568                                                                                                                        ▒
+    8.53%     0.06%  libc-2.31.so        [.] epoll_wait                                                                                                                                ▒
+    8.10%     0.02%  [kernel]            [k] __x64_sys_epoll_wait                                                                                                                      ▒
+    8.08%     0.03%  [kernel]            [k] do_epoll_wait                                                                                                                             ▒
+    7.90%     0.20%  [kernel]            [k] ep_poll                                                                                                                                   ▒
+    6.51%     0.03%  [kernel]            [k] schedule                                                                                                                                  ▒
+    6.41%     0.14%  [kernel]            [k] __schedule                                                                                                                                ▒
+    6.31%     0.02%  libpthread-2.31.so  [.] __libc_send                                                                                                                               ▒
+    6.20%     0.01%  [kernel]            [k] schedule_hrtimeout_range                                                                                                                  ▒
+    6.18%     0.02%  [kernel]            [k] schedule_hrtimeout_range_clock                                                                                                            ▒
+    6.05%     0.02%  [kernel]            [k] __x64_sys_sendto                                                                                                                          ▒
+    6.00%     0.05%  [kernel]            [k] __sys_sendto                                                                                                                              ▒
+    5.99%     0.18%  [kernel]            [k] __tcp_transmit_skb                                                                                                                        ▒
+    5.80%     0.02%  [kernel]            [k] sock_sendmsg                                                                                                                              ▒
+    5.70%     0.02%  [kernel]            [k] inet_sendmsg                                                                                                                              ▒
+    5.67%     0.01%  [kernel]            [k] ip_queue_xmit                                                                                                                             ▒
+    5.67%     0.01%  [kernel]            [k] tcp_sendmsg                                                                                                                               ▒
+    5.66%     0.06%  [kernel]            [k] __ip_queue_xmit                                                                                                                           ▒
+    5.59%     0.01%  [kernel]            [k] ip_local_out                                                                                                                              ▒
+    5.49%     0.14%  [kernel]            [k] tcp_sendmsg_locked                                                                                                                        ▒
+    4.58%     0.11%  [kernel]            [k] tcp_write_xmit                                                                                                                            ▒
+    4.54%     0.01%  [kernel]            [k] __tcp_push_pending_frames                                                                                                                 ▒
+    4.50%     0.03%  [kernel]            [k] __softirqentry_text_start                                                                                                                 ▒
+    4.44%     0.01%  [kernel]            [k] ret_from_intr                                                                                                                             ▒
+    4.43%     0.02%  [kernel]            [k] do_IRQ                                                                                                                                    ▒
+    4.32%     0.01%  [kernel]            [k] irq_exit                                                                                                                                  ▒
+    4.22%     0.05%  [kernel]            [k] net_rx_action                                                                                                                             ▒
+    4.16%     0.02%  [kernel]            [k] tcp_push                                                                                                                                  ▒
+    3.62%     0.03%  [kernel]            [k] ip_output                                                                                                                                 ▒
+    3.42%     0.10%  [kernel]            [k] __perf_event_task_sched_out                                                                                                               ▒
+    3.39%     0.06%  libpthread-2.31.so  [.] __libc_recv                                                                                                                               ▒
+    3.32%     0.00%  [kernel]            [k] task_ctx_sched_out                                                                                                                        ▒
+    3.32%     0.06%  [kernel]            [k] ctx_sched_out                                                                                                                             ▒
+    3.28%     0.02%  [kernel]            [k] ip_finish_output                                                                                                                          ▒
+    3.23%     0.02%  [kernel]            [k] x86_pmu_disable                                                                                                                           ▒
+    3.21%     0.05%  [kernel]            [k] amd_pmu_disable_all                                                                                                                       ▒
     3.21%     0.00%  [kernel]            [k] perf_pmu_disable.part.0                                                                                                                   ▒
+    3.19%     0.03%  [kernel]            [k] __ip_finish_output                                                                                                                        ▒
+    3.17%     0.02%  [kernel]            [k] napi_complete_done                                                                                                                        ▒
+    3.15%     0.01%  [kernel]            [k] gro_normal_list.part.0                                                                                                                    ▒
+    3.15%     0.22%  [kernel]            [k] ip_finish_output2                                                                                                                         ▒
+    3.14%     0.03%  [kernel]            [k] netif_receive_skb_list_internal                                                                                                           ▒
+    3.12%     0.04%  [kernel]            [k] i40e_napi_poll                                                                                                                            ▒
+    3.09%     0.01%  [kernel]            [k] __netif_receive_skb_list_core                                                                                                             ▒
+    2.98%     0.04%  [kernel]            [k] __x64_sys_recvfrom                                                                                                                        ▒
+    2.96%     0.01%  [kernel]            [k] ip_list_rcv                                                                                                                               ▒
+    2.92%     0.02%  [kernel]            [k] ip_sublist_rcv                                                                                                                            ▒
+    2.91%     0.05%  [kernel]            [k] __sys_recvfrom                                                                                                                            ▒
+    2.82%     0.05%  [kernel]            [k] nf_hook_slow                                                                                                                              ▒
+    2.80%     0.00%  [kernel]            [k] dev_queue_xmit                                                                                                                            ▒
+    2.78%     0.13%  [kernel]            [k] __dev_queue_xmit                                                                                                                          ▒
+    2.67%     0.01%  [kernel]            [k] sock_recvmsg                                                                                                                              ▒
+    2.52%     0.02%  [kernel]            [k] inet_recvmsg                                                                                                                              ▒
+    2.47%     0.37%  [kernel]            [k] tcp_recvmsg                                                                                                                               ▒
+    2.33%     0.09%  [kernel]            [k] __qdisc_run                             

Currently everything is fine, but I hope it helps anyway.

phihos (Contributor, Author) commented Jul 6, 2022

This is the current queue metric:
[screenshot: queue metric graph]

The gaps are HAProxy restarts.

phihos (Contributor, Author) commented Jul 6, 2022

Some questions about nbthread: do you think lowering it actually helped? Is it even recommended to set it to the number of virtual cores?

phihos (Contributor, Author) commented Jul 6, 2022

I dug into the metrics and found something interesting. The timeline is as follows:

  1. The connection limit on the frontend is reached. Some requests are no longer answered. CPU usage is normal.
  2. I raise the limit and reload HAProxy. CPU usage skyrockets.
  3. The limit change seems to require a restart. CPU utilization is still very high after the restart.
  4. I pull some traffic away via a DNS record change, lower nbthread from 48 to 40, and restart. CPU usage normalizes.

Two things are interesting:

  1. Why did HAProxy hyperventilate after the reload?
  2. Did lowering nbthread actually help?

phihos (Contributor, Author) commented Jul 6, 2022

Looking at the server session utilization, we can see that with the reload the maxconn of each server was quickly saturated, and this is where the CPU utilization rose to 100%:
[screenshots: server session utilization graphs]

capflam (Member) commented Jul 6, 2022

The CPU usage seems to be related to the queue size. It really looks like a contention issue on the queues. Some improvements were brought in 2.6. However, you are playing with queue priority, so it may be related. A perf top may help on this point.

capflam (Member) commented Jul 6, 2022

Note that I don't understand how it could be related to health-checks.

phihos (Contributor, Author) commented Jul 6, 2022

Oh, very interesting. You probably mean I should do another perf top when it happens again, right?
So putting clients into two different priority classes is very costly on large queues? I was not aware of that. It was just meant to be the cherry on top of the rate-limiting logic, to ensure some fairness when the queuing starts. I will happily get rid of it if that fixes the issue.

Note that I don't understand how it could be related to health-checks.

During today's incident I could not see any health check failing, because the checks were already disabled, I'm afraid.

I will get back to you tomorrow once I have done some more tests.

capflam (Member) commented Jul 6, 2022

Copy that, thanks. About the queue priority, I would say it should be fairly lightweight. But under high load and with contention issues on the queues, it may make things worse. I honestly don't know.

phihos (Contributor, Author) commented Jul 6, 2022

Hm, is preventing the contention issue altogether via a short timeout queue or a low maxqueue the real solution? Or wouldn't you recommend that?
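
Just to illustrate what I mean, a minimal sketch with made-up names and values (not my actual config):

backend some_backend
  mode http
  # fail queued requests quickly instead of letting the queue build up
  timeout queue 1s
  # cap how many requests may wait for this server, on top of its maxconn
  server some_server 10.0.0.1:444 check maxconn 30 maxqueue 100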

capflam (Member) commented Jul 6, 2022

For now, we must figure out what the problem is. It may be a bug. If we change the settings, it may hide it and make us think that everything is working properly. In your case, setting a short timeout queue or a low maxqueue value will result in an increase in 503 errors.

phihos (Contributor, Author) commented Jul 7, 2022

Hi, I have not yet done the tests, but I found some sort of stack trace/thread dump in the logs. When I restarted HAProxy the first time after raising maxconn, apparently HAProxy did not terminate quickly enough, so systemd killed the process. This seems to have caused HAProxy to dump the following:

Thread 1 is about to kill the process.
*>Thread 1 : id=0x7f3e03db2f80 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-117 rqsz=3842
      1/1    stuck=1 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=15981768083 now=18044038368 diff=2062270285
             curr_task=0x7f3ce1f40180 (task) calls=7019 last=0
               fct=0x55bcd50435c0(task_run_applet) ctx=0x7f3ce1f40050(<PEER>)
             strm=0x55bce7b62be0,2 src=<PEER> fe=company_peers be=company_peers dst=unknown
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=848202 rqa=0 rpf=80448000 rpa=0
             scf=0x55bcd5c532c0,EST,40 scb=0x55bcd5c53670,EST,49
             af=0x7f3ce1f40050,7 sab=(nil),0
             cof=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
             cob=0x55bce7b62850,10000300:PASS(0x55bce7b5fa70)/RAW((nil))/tcpv4(295)
             call trace(15):
             | 0x7f3e0477e420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
             | 0x55bcd4f9433b [48 8b 80 88 02 00 00 4c]: main+0x10cf7b
             | 0x55bcd50436ec [48 8b 43 28 f6 43 04 01]: task_run_applet+0x12c/0x7dc
  Thread 2 : id=0x7f3e03da7700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-135 rqsz=3304
      1/2    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17930475658 now=17937070770 diff=6595112
             curr_task=0x7f3b4004f2e0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3b4004eef0
             strm=0x7f3b4004eef0,800 src=172.68.226.175 fe=origin be=origin dst=unknown
             txn=0x7f3a77cf64b0,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f33a24200b0,EST,20 scb=0x7f33a2420120,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3a3dd5b100,a0000300:H2(0x7f3b40dee960)/SSL(0x7f3a565775f0)/tcpv4(87435)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 3 : id=0x7f3de4bd7700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-156 rqsz=4121
      1/3    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=18091001169 now=18097388401 diff=6387232
             curr_task=0x7f3c1fdc3b00 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f319bc0fc20
             strm=0x7f319bc0fc20,c00 src=162.158.114.156 fe=origin be=origin dst=unknown
             txn=0x7f3cb2866f90,3000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7e96a7e80db0,EST,20 scb=0x7f33a88f9690,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3302967160,a0000300:H1(0x7f3cf20cd100)/SSL(0x7f3c434cb540)/tcpv4(129148)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 4 : id=0x7f3de43d6700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-118 rqsz=3957
      1/4    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17995694497 now=18000278888 diff=4584391
             curr_task=0x7f3cbb628970 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f38c351b9f0
             strm=0x7f38c351b9f0,c00 src=162.158.118.41 fe=origin be=origin dst=unknown
             txn=0x7f38c351bf50,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f38c02bc760,EST,20 scb=0x7f3a95bb5760,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3ccb2e1000,a0000300:H1(0x7f3516ba1aa0)/SSL(0x7f3cba1980e0)/tcpv4(19204)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 5 : id=0x7f3de3bd5700 act=1 glob=1 wq=1 rq=0 tl=0 tlsz=-104 rqsz=3731
      1/5    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17875903399 now=17881966811 diff=6063412
             curr_task=0x7f35539b2980 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3c595f2e00
             strm=0x7f3c595f2e00,800 src=172.68.106.126 fe=origin be=origin dst=unknown
             txn=0x7f3c595f83f0,40000 txn.req=MSG_BODY,c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f3c595f9db0,EST,20 scb=0x7f3d0a4a5a30,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3cd2e8a1d0,a0000300:H2(0x7f3a9b0e9d20)/SSL(0x7f3d0af2ed90)/tcpv4(16041)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 6 : id=0x7f3de33d4700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-145 rqsz=3870
      1/6    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17957929749 now=17965276271 diff=7346522
             curr_task=0x7f35afd3cfa0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f35afd3cbb0
             strm=0x7f35afd3cbb0,800 src=172.70.251.158 fe=origin be=origin dst=unknown
             txn=0x7f35afd3d1d0,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f0e0fe80670,EST,20 scb=0x7f3302774040,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3cb251e960,a0000300:H2(0x7f34e9cc3f10)/SSL(0x7f3cc23f7200)/tcpv4(22929)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 7 : id=0x7f3de2bd3700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-126 rqsz=3776
      1/7    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=18011089769 now=18026533185 diff=15443416
             curr_task=0x7f37054d0270 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7eead2867450
             strm=0x7eead2867450,800 src=162.158.5.197 fe=origin be=origin dst=unknown
             txn=0x7eead2867a70,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f37054cfc30,EST,20 scb=0x7f351683a570,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3bcab41bf0,a0000300:H2(0x7f3c30f54bc0)/SSL(0x7f3adc435e30)/tcpv4(53042)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 8 : id=0x7f3de23d2700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-75 rqsz=3929
      1/8    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=18096773269 now=18119611247 diff=22837978
             curr_task=0x7f37a3d09840 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3439f881b0
             strm=0x7f3439f881b0,800 src=162.158.166.69 fe=origin be=origin dst=unknown
             txn=0x7f3008e6bae0,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f343a25e570,EST,20 scb=0x7f0e67e803f0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3d020198c0,a0000300:H2(0x7f3354310d00)/SSL(0x7f3c4572b130)/tcpv4(39564)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 9 : id=0x7f3de1bd1700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-106 rqsz=3378
      1/9    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=18064458136 now=18068448027 diff=3989891
             curr_task=0x7f3caaa6faa0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f356a5cdc10
             strm=0x7f356a5cdc10,c00 src=198.41.242.241 fe=origin be=origin dst=unknown
             txn=0x7f3a00d41080,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f27f9b84030,EST,20 scb=0x7f3ccafc85e0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3a56130b00,a0000300:H2(0x7f27f9e4dc30)/SSL(0x7f3a00e7d460)/tcpv4(83348)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 10: id=0x7f3de13d0700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-151 rqsz=3934
      1/10   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=18035755604 now=18061794223 diff=26038619
             curr_task=0x7f3414ed4270 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3c3d389210
             strm=0x7f3c3d389210,800 src=162.158.118.196 fe=origin be=origin dst=unknown
             txn=0x7f341515c320,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f2f2b45f640,EST,20 scb=0x7ed6e3ac1db0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3c435ef4f0,a0000300:H2(0x7f34154fceb0)/SSL(0x7f3c3d697310)/tcpv4(39904)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 11: id=0x7f3de0bcf700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-190 rqsz=3798
      1/11   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17977044717 now=17984491150 diff=7446433
             curr_task=0x7f3302d16c40 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3b831b5a30
             strm=0x7f3b831b5a30,c00 src=172.70.61.184 fe=origin be=origin dst=unknown
             txn=0x7f3bc09ef0e0,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f3b831b59c0,EST,20 scb=0x7f3b831b5f40,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3caa6a51b0,a0000300:H2(0x7f370b42d1c0)/SSL(0x7f3cc3d58310)/tcpv4(7019)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 12: id=0x7f3ddbfff700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-126 rqsz=3612
      1/12   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17978685969 now=17983931441 diff=5245472
             curr_task=0x7f26d0c3a800 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3b3b182d00
             strm=0x7f3b3b182d00,c00 src=162.158.178.27 fe=origin be=origin dst=unknown
             txn=0x7f3d0aa922c0,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f30254e2d50,EST,20 scb=0x7f3b3b183210,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3c91cf4d30,a0000300:H2(0x7f3c598902c0)/SSL(0x7f3c9f9d9100)/tcpv4(44070)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 13: id=0x7f3ddb7fe700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-185 rqsz=3788
      1/13   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17915118981 now=17930647446 diff=15528465
             curr_task=0x7f2e66f40690 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3cd26cf130
             strm=0x7f3cd26cf130,800 src=162.158.148.170 fe=origin be=origin dst=unknown
             txn=0x7f3603a80620,40000 txn.req=MSG_BODY,c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7ef303e80590,EST,20 scb=0x7f2d4c6fa270,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3ceb453490,a0000300:H2(0x7f36a34f09e0)/SSL(0x7f3cd3338570)/tcpv4(20718)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 14: id=0x7f3ddaffd700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-183 rqsz=3983
      1/14   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17879121396 now=17886236188 diff=7114792
             curr_task=0x7f350b9cf8b0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f39537fb5d0
             strm=0x7f39537fb5d0,c00 src=172.68.118.59 fe=origin be=origin dst=unknown
             txn=0x7f3bd71bdaa0,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f31a0982d80,EST,20 scb=0x7f2f6bcbf3c0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3cba41dc80,a0000300:H2(0x7f3d02143f90)/SSL(0x7f3c9512b3c0)/tcpv4(21795)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 15: id=0x7f3dda7fc700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-275 rqsz=3758
      1/15   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=18244268612 now=18247710263 diff=3441651
             curr_task=0x7f0da7e807e0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f350b537450
             strm=0x7f350b537450,800 src=172.70.222.201 fe=origin be=origin dst=unknown
             txn=0x7f3b9d042b60,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f350bd7e450,EST,20 scb=0x7f31a0875f00,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3bad520730,a0000300:H2(0x7f3b9d336500)/SSL(0x7f3b9d24c290)/tcpv4(55163)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 16: id=0x7f3dd9ffb700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-152 rqsz=3960
      1/16   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17901756689 now=17906278690 diff=4522001
             curr_task=0x7f318a2650c0 (tasklet) calls=1
               fct=0x55bcd4e96c80(ssl_sock_io_cb) ctx=0x7f318a91f3e0
  Thread 17: id=0x7f3dd97fa700 act=1 glob=1 wq=1 rq=0 tl=0 tlsz=-186 rqsz=3694
      1/17   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=18084271485 now=18135608112 diff=51336627
             curr_task=0x7f39f6675110 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f182c7f02f0
             strm=0x7f182c7f02f0,c00 src=172.70.57.170 fe=origin be=origin dst=unknown
             txn=0x7f364f385520,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f2ef4d0cbb0,EST,20 scb=0x7f39f6675230,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3cd288b9e0,a0000300:H1(0x7f32a8c5e640)/SSL(0x7f3cfa92dea0)/tcpv4(9559)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 18: id=0x7f3dd8ff9700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-130 rqsz=3778
      1/18   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17901486985 now=17926630614 diff=25143629
             curr_task=0x7f3a008dc120 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f332a4a5ef0
             strm=0x7f332a4a5ef0,c00 src=141.101.69.129 fe=origin be=origin dst=unknown
             txn=0x7f27f8fe6b20,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f356a439d90,EST,20 scb=0x7f27f8fe6a10,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3cbb640580,a0000300:H1(0x7f3cca7934b0)/SSL(0x7f3cca64db50)/tcpv4(7627)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
 >Thread 19: id=0x7f3dd87f8700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=-188 rqsz=3731
      1/19   stuck=1 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=15951278268 now=18005836824 diff=2054558556
             curr_task=0x55bcd5d9e6c0 (task) calls=46182 last=0
               fct=0x55bcd4f95ad0(process_peer_sync) ctx=0x55bcd5ca5ea0
             call trace(10):
             | 0x55bcd4ff9c3f [89 44 24 04 85 c0 75 29]: ha_dump_backtrace+0x3f/0x2fd
             | 0x55bcd4ffa6ae [48 8b 05 83 6b 1d 00 48]: debug_handler+0x6e/0x10b
             | 0x7f3e0477e420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
             | 0x55bcd4f95b18 [48 8b 41 08 48 85 f0 0f]: process_peer_sync+0x48/0x8e1
  Thread 20: id=0x7f3dd7ff7700 act=1 glob=1 wq=1 rq=0 tl=0 tlsz=-190 rqsz=3720
      1/20   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17908953095 now=17914691428 diff=5738333
             curr_task=0x7ea23be80bc0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3705e4b860
             strm=0x7f3705e4b860,c00 src=172.70.92.167 fe=origin be=origin dst=unknown
             txn=0x7f3c304354f0,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f3539d5a980,EST,20 scb=0x7f3283e80390,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3cd21d2c60,a0000300:H1(0x7f3cba799e40)/SSL(0x7f3cba972c40)/tcpv4(26638)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 21: id=0x7f3dd77f6700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-115 rqsz=3918
      1/21   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17932016389 now=17938982142 diff=6965753
             curr_task=0x7f3c4d48b4c0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f2e8e293390
             strm=0x7f2e8e293390,800 src=162.158.166.247 fe=origin be=origin dst=unknown
             txn=0x7f3c4d676c20,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=908002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f33a248b190,EST,20 scb=0x7f33a248b120,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3cda16c1a0,a01c0300:H2(0x7f3c4db55990)/SSL(0x7f3c4d715cb0)/tcpv4(39996)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 22: id=0x7f3dd6ff5700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-229 rqsz=3546
      1/22   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17980046311 now=17985496262 diff=5449951
             curr_task=0x7f301908dbf0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3532f049a0
             strm=0x7f3532f049a0,800 src=162.158.92.196 fe=origin be=origin dst=unknown
             txn=0x7f373b71e690,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f373b9f5180,EST,20 scb=0x7f373ba61ac0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3cc26d0080,a0000300:H2(0x7f2f014786f0)/SSL(0x7f3afb9b2c90)/tcpv4(69941)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 23: id=0x7f3dd67f4700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-198 rqsz=3687
      1/23   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=18039556423 now=18042149335 diff=2592912
             curr_task=0x7f35b42fd670 (task) calls=2 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f31b8644ce0
             strm=0x7f31b8644ce0,64808 src=198.41.242.95 fe=origin be=??? dst=unknown
             txn=0x7f31b8645300,0 txn.req=MSG_DONE,d txn.rsp=MSG_RPBEFORE,0
             rqf=4cc8e460 rqa=0 rpf=c004a060 rpa=0
             scf=0x7f2f55e94e10,CLO,200 scb=0x7f2f55e94da0,CLO,11
             af=(nil),0 sab=(nil),0
             cof=0x7f35b4f2e3a0,a0000300:H1(0x7f3c1f239c30)/SSL(0x7f352aee77a0)/tcpv4(122684)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 24: id=0x7f3dd5ff3700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-107 rqsz=3848
      1/24   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17997454675 now=18004279647 diff=6824972
             curr_task=0x7f3aea919cf0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3aea915ca0
             strm=0x7f3aea915ca0,800 src=172.69.69.40 fe=origin be=origin dst=unknown
             txn=0x7f3aea9161d0,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f3aea915c30,EST,20 scb=0x7f3aea916160,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3bca2b8460,a0000300:H2(0x7f319b68d410)/SSL(0x7f3c43b495a0)/tcpv4(45155)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 25: id=0x7f3dd57f2700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-217 rqsz=3955
      1/25   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17940895051 now=17981983513 diff=41088462
             curr_task=0x7f352bd7e940 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f35b4a38c70
             strm=0x7f35b4a38c70,800 src=172.71.98.169 fe=origin be=origin dst=unknown
             txn=0x7f3c1fbdd740,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7e960fe808c0,EST,20 scb=0x7f35b458f4f0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3cf20980a0,a0000300:H2(0x7f3cf2463860)/SSL(0x7f3cf23f8d60)/tcpv4(4595)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 26: id=0x7f3dd4ff1700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-162 rqsz=3755
      1/26   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17885343986 now=17890484377 diff=5140391
             curr_task=0x7f3c89ed3730 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3bdece4d20
             strm=0x7f3bdece4d20,800 src=162.158.203.41 fe=origin be=origin dst=unknown
             txn=0x7f3c89eca0d0,40000 txn.req=MSG_BODY,c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f357db4ed90,EST,20 scb=0x7f357db4f090,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3b01f539b0,a0000300:H2(0x7f32a8e949f0)/SSL(0x7f3cfa6941a0)/tcpv4(53134)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 27: id=0x7f3dd47f0700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-141 rqsz=3684
      1/27   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17960679600 now=17966386571 diff=5706971
             curr_task=0x55bd261a8240 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f0e97e80180
             strm=0x7f0e97e80180,c00 src=162.158.148.189 fe=origin be=origin dst=unknown
             txn=0x7f18e3bc8420,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f3c1fbc8080,EST,20 scb=0x7f0e8fe80ea0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3bef383380,a0000300:H2(0x7f352b6a5c80)/SSL(0x7f3c1f9e9e60)/tcpv4(36678)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 28: id=0x7f3dd3fef700 act=1 glob=1 wq=1 rq=0 tl=0 tlsz=-136 rqsz=3815
      1/28   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17934947700 now=17937276270 diff=2328570
             curr_task=0x7f357d204680 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f364fdc44c0
             strm=0x7f364fdc44c0,800 src=162.158.56.136 fe=origin be=origin dst=unknown
             txn=0x7e976c02a2b0,0 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f182cd30020,EST,20 scb=0x7f3cfa7bc5b0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3bacfd3060,a0000300:H1(0x7f3c897a8830)/SSL(0x7f3bde341f40)/tcpv4(137270)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 29: id=0x7f3dd37ee700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=-132 rqsz=4013
      1/29   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17898454057 now=17900488408 diff=2034351
             curr_task=0x7f33a8566f60 (tasklet) calls=1
               fct=0x55bcd4e96c80(ssl_sock_io_cb) ctx=0x7f33a8151000
  Thread 30: id=0x7f3dd2fed700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-213 rqsz=3798
      1/30   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17968128850 now=17975747552 diff=7618702
             curr_task=0x7f33a29403b0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3ca9fdc240
             strm=0x7f3ca9fdc240,800 src=172.68.118.157 fe=origin be=origin dst=unknown
             txn=0x7f35bd912590,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f3caba4a7b0,EST,20 scb=0x7f33a2227cc0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3bd2476270,a0000300:H2(0x7f3b40063860)/SSL(0x7f3caaabe240)/tcpv4(49410)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 31: id=0x7f3dd27ec700 act=1 glob=1 wq=1 rq=0 tl=0 tlsz=-127 rqsz=3928
      1/31   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17845322364 now=17852221227 diff=6898863
             curr_task=0x7f3c959d1490 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7ed37c616440
             strm=0x7ed37c616440,800 src=172.70.222.2 fe=origin be=origin dst=unknown
             txn=0x7ed37c616a40,0 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f3bd741c510,EST,20 scb=0x7f31a096e2b0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f333a6134b0,a0000300:H1(0x7f350bccd990)/SSL(0x7f31a008b8d0)/tcpv4(128293)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 32: id=0x7f3dd1feb700 act=1 glob=1 wq=1 rq=0 tl=0 tlsz=-173 rqsz=3733
      1/32   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17920870315 now=17936869010 diff=15998695
             curr_task=0x7f3cea3e5fc0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f158eaae010
             strm=0x7f158eaae010,800 src=141.101.69.107 fe=origin be=origin dst=unknown
             txn=0x7f3c818f66e0,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f2a5e0b00e0,EST,20 scb=0x7f2a5e0b0150,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3cc2631240,a0000300:H2(0x7f353336bcd0)/SSL(0x7f3cea5ff6b0)/tcpv4(6522)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 33: id=0x7f3dd17ea700 act=1 glob=1 wq=1 rq=0 tl=0 tlsz=-118 rqsz=4069
      1/33   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17853373384 now=17860950485 diff=7577101
             curr_task=0x7f339b942d80 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f339bb1b2d0
             strm=0x7f339bb1b2d0,c00 src=172.69.70.201 fe=origin be=origin dst=unknown
             txn=0x7f353365f790,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f3c810c3db0,EST,20 scb=0x7f3c819fc050,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3cea2d8d80,a0000300:H2(0x7f3afb91b6c0)/SSL(0x7f3ceab87c10)/tcpv4(3360)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 34: id=0x7f3dd0fe9700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-129 rqsz=3873
      1/34   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=18003602995 now=18006516217 diff=2913222
             curr_task=0x7f34633cea60 (task) calls=2 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f35536eb7d0
             strm=0x7f35536eb7d0,64808 src=172.70.222.98 fe=origin be=??? dst=unknown
             txn=0x7f3c595baa80,43000 txn.req=MSG_DONE,4d txn.rsp=MSG_RPBEFORE,0
             rqf=4cc8e460 rqa=0 rpf=c004a060 rpa=0
             scf=0x7f3c5953fc60,CLO,280 scb=0x7f3c595bab70,CLO,11
             af=(nil),0 sab=(nil),0
             cof=0x7f3d139cc240,a0000300:H2(0x7f38139dc3f0)/SSL(0x7f3d0b9a1ed0)/tcpv4(27342)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 35: id=0x7f3dd07e8700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-135 rqsz=3724
      1/35   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17926723373 now=17932570135 diff=5846762
             curr_task=0x7f3c919c0fa0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f0fe3e80390
             strm=0x7f0fe3e80390,c00 src=172.70.246.245 fe=origin be=origin dst=unknown
             txn=0x7f1f293ec520,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f3395292610,EST,20 scb=0x7f3157e80630,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3adc5fcc70,a0000300:H2(0x7f2f63bfb330)/SSL(0x7f3bffc62bf0)/tcpv4(54886)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 36: id=0x7f3dcffe7700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=-233 rqsz=3554
      1/36   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=18009154598 now=18011407199 diff=2252601
             curr_task=0x55bd715afb10 (tasklet) calls=1
               fct=0x55bcd4e96c80(ssl_sock_io_cb) ctx=0x55bd73a31910
  Thread 37: id=0x7f3dcf7e6700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-187 rqsz=3671
      1/37   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17909778205 now=17917208047 diff=7429842
             curr_task=0x7f3d128fca20 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f37a39a58a0
             strm=0x7f37a39a58a0,c00 src=198.41.242.234 fe=origin be=origin dst=unknown
             txn=0x7f3c4538c620,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7e9d7be808b0,EST,20 scb=0x7f35a7c870f0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3cc2e3a8b0,a0000300:H2(0x7f2b65283330)/SSL(0x7f3ad0fb8b50)/tcpv4(62659)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 38: id=0x7f3dcefe5700 act=1 glob=1 wq=1 rq=0 tl=0 tlsz=-198 rqsz=3973
      1/38   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=18009027259 now=18018436453 diff=9409194
             curr_task=0x7f281b0e0b70 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f33eacdfbf0
             strm=0x7f33eacdfbf0,c00 src=162.158.219.36 fe=origin be=origin dst=unknown
             txn=0x7f39387d9500,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f3a2c69e870,EST,20 scb=0x7f2a6f86e0c0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3bc0064450,a0000300:H1(0x7f1f29ee8d20)/SSL(0x7f3c9176eca0)/tcpv4(32086)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 39: id=0x7f3dce7e4700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-144 rqsz=3863
      1/39   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17902433503 now=17911858515 diff=9425012
             curr_task=0x7f3cf23444e0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f14afe80bc0
             strm=0x7f14afe80bc0,800 src=141.101.68.123 fe=origin be=origin dst=unknown
             txn=0x7f3bef588040,40000 txn.req=MSG_BODY,c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f3690b367e0,EST,20 scb=0x7f3690bcc9e0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3bd771ad70,a0000300:H2(0x7f3141594260)/SSL(0x7f3befb14200)/tcpv4(45005)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 40: id=0x7f3dcdfe3700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-207 rqsz=3682
      1/40   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17974151036 now=18009631478 diff=35480442
             curr_task=0x7f3cdb574cb0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3cdb5748c0
             strm=0x7f3cdb5748c0,800 src=162.158.129.41 fe=origin be=origin dst=unknown
             txn=0x7f3cdb574ee0,40000 txn.req=MSG_BODY,c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f2d921a7b20,EST,20 scb=0x7f3cdb574dd0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3b662b0860,a0000300:H2(0x7f318ac170f0)/SSL(0x7f3ab2fa3170)/tcpv4(60271)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 41: id=0x7f3dcd7e2700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-190 rqsz=3823
      1/41   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17966485508 now=18010264924 diff=43779416
             curr_task=0x7f321328e010 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3bef6af120
             strm=0x7f3bef6af120,800 src=172.70.233.42 fe=origin be=origin dst=unknown
             txn=0x7f3bef291fb0,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7ed909b04b70,EST,20 scb=0x7f333a14c840,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3bde512810,a0000300:H2(0x7f15d8f8e890)/SSL(0x7f3bbaf23160)/tcpv4(48682)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 42: id=0x7f3dccfe1700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-236 rqsz=3705
      1/42   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17989049270 now=17997408014 diff=8358744
             curr_task=0x7f3b3afca990 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f2ebcd3f5d0
             strm=0x7f2ebcd3f5d0,800 src=162.158.50.175 fe=origin be=origin dst=unknown
             txn=0x7f3c6dedbc00,0 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f318a0b5ca0,EST,20 scb=0x7f3c6deb8d80,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3d0a83e7d0,a0000300:H1(0x7f3bd27ddd00)/SSL(0x7f3ab2f6e7b0)/tcpv4(141210)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 43: id=0x7f3dcc7e0700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-207 rqsz=3834
      1/43   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=18002002193 now=18013293637 diff=11291444
             curr_task=0x7f3bc04be5c0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3c3d3a45a0
             strm=0x7f3c3d3a45a0,800 src=172.70.251.7 fe=origin be=origin dst=unknown
             txn=0x7f3c3d932ea0,0 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f3c3d932ba0,EST,20 scb=0x7f2e67725010,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f309a589480,a0000300:H1(0x7f3083072b10)/SSL(0x7f3cd29d95e0)/tcpv4(135328)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 44: id=0x7f3dcbfdf700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-138 rqsz=3932
      1/44   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17928732970 now=18034809559 diff=106076589
             curr_task=0x55be1ac26850 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x55be1ac26460
             strm=0x55be1ac26460,800 src=162.158.206.136 fe=origin be=origin dst=unknown
             txn=0x55be1ac26a80,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x55bce99e3370,EST,20 scb=0x55be1ac26970,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3cc9fed690,a0000300:H2(0x55bd533ab0f0)/SSL(0x55bcf2b924b0)/tcpv4(40289)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 45: id=0x7f3dcb7de700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-164 rqsz=4068
      1/45   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=18064519549 now=18067598091 diff=3078542
             curr_task=0x7f3d1a4d4940 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f1793e80530
             strm=0x7f1793e80530,800 src=172.70.122.39 fe=origin be=origin dst=unknown
             txn=0x7f3c457542a0,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f343a02cce0,EST,20 scb=0x7f343a15c140,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3c306c6db0,a0000300:H2(0x7f3c4574e100)/SSL(0x7f3c456df250)/tcpv4(39470)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 46: id=0x7f3dcafdd700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-185 rqsz=3828
      1/46   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=17999942895 now=18006213199 diff=6270304
             curr_task=0x7f33026b63d0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7eee4790bbb0
             strm=0x7eee4790bbb0,c00 src=108.162.229.42 fe=origin be=origin dst=unknown
             txn=0x7eee478e6390,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f0e0fe80140,EST,20 scb=0x7f330225aa90,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3bc09541e0,a0000300:H1(0x7f3bc047dff0)/SSL(0x7f3bc07d8850)/tcpv4(41272)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 47: id=0x7f3dca7dc700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-357 rqsz=3709
      1/47   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=18101687879 now=18107509331 diff=5821452
             curr_task=0x7f3b896fbe10 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3ca367ffd0
             strm=0x7f3ca367ffd0,800 src=162.158.114.193 fe=origin be=origin dst=unknown
             txn=0x7f3ca3680560,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f3ca367de60,EST,20 scb=0x7f3ca36803c0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3b5bb3c010,a0000300:H2(0x7f2e9e361b70)/SSL(0x7f3b025a0880)/tcpv4(59469)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 48: id=0x7f3dc9fdb700 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-384 rqsz=3560
      1/48   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=18484982109 now=18493346882 diff=8364773
             curr_task=0x7f333ac332b0 (task) calls=1 last=0
               fct=0x55bcd4f28f00(process_stream) ctx=0x7f3bbb8ad4d0
             strm=0x7f3bbb8ad4d0,800 src=172.69.134.5 fe=origin be=origin dst=unknown
             txn=0x7f3bbba15690,40000 txn.req=MSG_BODY,c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f15d87c5eb0,EST,20 scb=0x7f15d87c6000,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f3c810fd6e0,a0000300:H2(0x7f3861d94ff0)/SSL(0x7f3d1b951df0)/tcpv4(26053)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)

I do not know whether this helps, but I thought it was worth a shot.

Edit: I grepped for this and found two other dumps like it from a week earlier, when the health checks were still enabled and the problem happened for the first time. Should I attach them too?
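
For reference, such dumps can usually be located in the system journal with something along these lines (only a sketch: it assumes HAProxy runs as a systemd unit named haproxy and that its stderr, where the watchdog writes the dump, ends up in the journal):

# assumption: unit name "haproxy", stderr captured by the journal
journalctl -u haproxy --since "2 weeks ago" | grep -n 'is about to kill the process'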

@capflam
Copy link
Member

capflam commented Jul 8, 2022

Your trace is not due to a kill by systemd but to the internal watchdog. HAProxy killed itself because it detected some stuck threads. Here there are 2 stuck threads, both in the peers part. It would be good to check your other traces, but it may only be a side effect of the contention problem we suspect.
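
For reference, the stuck threads are the entries flagged with stuck=1 in the dump. A quick way to pull them out of a saved dump is something like the following (the file name is just a placeholder):

# print the threads the watchdog flagged as stuck, with one line of
# context before (thread header) and two after (cpu_ns and curr_task)
grep -B1 -A2 'stuck=1' haproxy-watchdog-dump.txt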

@phihos
Copy link
Contributor Author

phihos commented Jul 11, 2022

Hi, I was not able to continue on Friday, but today I am back on the case.

Here are the two traces:

First trace:
Thread 1 is about to kill the process.
*>Thread 1 : id=0x7f65684d8f80 act=1 glob=1 wq=1 rq=0 tl=1 tlsz=-35137 rqsz=5
      1/1    stuck=1 prof=0 harmless=0 wantrdv=1
             cpu_ns: poll=2207731498287 now=2210378584246 diff=2647085959
             curr_task=0
             call trace(15):
             | 0x7f6568ea4420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
             | 0x7f65688d1a7b [48 3d 01 f0 ff ff 73 01]: libc:madvise+0xb/0x25
             | 0x7f6568854fd6 [ba 01 00 00 00 eb a3 0f]: libc:malloc_trim+0x126/0x2f3
             | 0x565539229f3c [eb 98 e8 cd 66 e6 ff 66]: pool_gc+0x1fc/0x203
             | 0x565539266e83 [48 8b 03 48 89 df 4c 39]: __signal_process_queue+0xb3/0x172
  Thread 2 : id=0x7f65684cd700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-39375 rqsz=259
      1/2    stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1247125981506 now=1249232362900 diff=2106381394
             curr_task=0
  Thread 3 : id=0x7f654472a700 act=1 glob=1 wq=1 rq=1 tl=0 tlsz=-40614 rqsz=1
      1/3    stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1233719243093 now=1236178085888 diff=2458842795
             curr_task=0
  Thread 4 : id=0x7f6543f29700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-41033 rqsz=269
      1/4    stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1224568929051 now=1225685455471 diff=1116526420
             curr_task=0
  Thread 5 : id=0x7f6543728700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-41355 rqsz=236
      1/5    stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1226236376449 now=1228205752742 diff=1969376293
             curr_task=0
  Thread 6 : id=0x7f6542f27700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-43111 rqsz=269
      1/6    stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1232823525957 now=1234292825239 diff=1469299282
             curr_task=0
  Thread 7 : id=0x7f6542726700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-40812 rqsz=265
      1/7    stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1219635350920 now=1221641854973 diff=2006504053
             curr_task=0
  Thread 8 : id=0x7f6541f25700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-39980 rqsz=288
      1/8    stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1250326094826 now=1251930760471 diff=1604665645
             curr_task=0
  Thread 9 : id=0x7f6541724700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-40827 rqsz=239
      1/9    stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1232806526873 now=1234157782059 diff=1351255186
             curr_task=0
  Thread 10: id=0x7f6540f23700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-39875 rqsz=256
      1/10   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1230320842655 now=1231640816205 diff=1319973550
             curr_task=0
  Thread 11: id=0x7f653bfff700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-41894 rqsz=258
      1/11   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1239662398411 now=1241383478238 diff=1721079827
             curr_task=0
  Thread 12: id=0x7f653b7fe700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-41960 rqsz=248
      1/12   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1233187998459 now=1235070044499 diff=1882046040
             curr_task=0
  Thread 13: id=0x7f653affd700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-40484 rqsz=250
      1/13   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1221153202881 now=1222231324313 diff=1078121432
             curr_task=0
  Thread 14: id=0x7f653a7fc700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-43445 rqsz=286
      1/14   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1223714696445 now=1225819554020 diff=2104857575
             curr_task=0
  Thread 15: id=0x7f6539ffb700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-43392 rqsz=264
      1/15   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1222921963994 now=1224080110861 diff=1158146867
             curr_task=0
  Thread 16: id=0x7f65397fa700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=-40912 rqsz=1
      1/16   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1264039370593 now=1265702372393 diff=1663001800
             curr_task=0
  Thread 17: id=0x7f6538ff9700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-40900 rqsz=231
      1/17   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1244357956611 now=1246351460896 diff=1993504285
             curr_task=0
  Thread 18: id=0x7f65387f8700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-40042 rqsz=268
      1/18   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1227737056522 now=1229172097524 diff=1435041002
             curr_task=0
  Thread 19: id=0x7f6537ff7700 act=1 glob=1 wq=1 rq=0 tl=0 tlsz=-40329 rqsz=7
      1/19   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1225327322791 now=1227705824075 diff=2378501284
             curr_task=0
  Thread 20: id=0x7f65377f6700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-42671 rqsz=276
      1/20   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1227292803356 now=1229500784438 diff=2207981082
             curr_task=0
  Thread 21: id=0x7f6536ff5700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=-40868 rqsz=1
      1/21   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1234596260171 now=1237097774917 diff=2501514746
             curr_task=0
  Thread 22: id=0x7f65367f4700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-42357 rqsz=276
      1/22   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1266771650356 now=1267706470381 diff=934820025
             curr_task=0
  Thread 23: id=0x7f6535ff3700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-39607 rqsz=277
      1/23   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1239243459440 now=1240371280861 diff=1127821421
             curr_task=0
  Thread 24: id=0x7f65357f2700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-40196 rqsz=269
      1/24   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1237151728129 now=1238306839007 diff=1155110878
             curr_task=0
  Thread 25: id=0x7f6534ff1700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-40688 rqsz=257
      1/25   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1217791704735 now=1219687069428 diff=1895364693
             curr_task=0
  Thread 26: id=0x7f65347f0700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-39381 rqsz=254
      1/26   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1262822051695 now=1264704883180 diff=1882831485
             curr_task=0
  Thread 27: id=0x7f6533fef700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-40370 rqsz=242
      1/27   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1306929086683 now=1307936660380 diff=1007573697
             curr_task=0
  Thread 28: id=0x7f65337ee700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-40147 rqsz=255
      1/28   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1284177739308 now=1286259771681 diff=2082032373
             curr_task=0
  Thread 29: id=0x7f6532fed700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-48772 rqsz=256
      1/29   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1126523693631 now=1128687631314 diff=2163937683
             curr_task=0
  Thread 30: id=0x7f65327ec700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-40633 rqsz=284
      1/30   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1209890808499 now=1211854684418 diff=1963875919
             curr_task=0
  Thread 31: id=0x7f6531feb700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-40811 rqsz=252
      1/31   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1232306031579 now=1234709875714 diff=2403844135
             curr_task=0
  Thread 32: id=0x7f65317ea700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-49646 rqsz=257
      1/32   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1123897088577 now=1125841266767 diff=1944178190
             curr_task=0
  Thread 33: id=0x7f6530fe9700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-46094 rqsz=246
      1/33   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1203442284927 now=1205295073289 diff=1852788362
             curr_task=0
  Thread 34: id=0x7f65307e8700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-41864 rqsz=280
      1/34   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1236008849944 now=1238181649280 diff=2172799336
             curr_task=0
  Thread 35: id=0x7f652ffe7700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-39437 rqsz=260
      1/35   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1247489788115 now=1249194700929 diff=1704912814
             curr_task=0
  Thread 36: id=0x7f652f7e6700 act=1 glob=1 wq=1 rq=1 tl=0 tlsz=-40852 rqsz=1
      1/36   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1226896398237 now=1229505300859 diff=2608902622
             curr_task=0
  Thread 37: id=0x7f652efe5700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-44144 rqsz=254
      1/37   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1226322225897 now=1228546008433 diff=2223782536
             curr_task=0
  Thread 38: id=0x7f652e7e4700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-42620 rqsz=293
      1/38   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1241423190929 now=1243006283624 diff=1583092695
             curr_task=0
  Thread 39: id=0x7f652dfe3700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-41201 rqsz=258
      1/39   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1242296027588 now=1244403231250 diff=2107203662
             curr_task=0
  Thread 40: id=0x7f652d7e2700 act=1 glob=1 wq=1 rq=0 tl=0 tlsz=-41527 rqsz=0
      1/40   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1226743458167 now=1229238023681 diff=2494565514
             curr_task=0
  Thread 41: id=0x7f652cfe1700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-40368 rqsz=279
      1/41   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1367591165001 now=1369301900015 diff=1710735014
             curr_task=0
  Thread 42: id=0x7f652c7e0700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-40439 rqsz=263
      1/42   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1234615308530 now=1236575669977 diff=1960361447
             curr_task=0
  Thread 43: id=0x7f652bfdf700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=-42405 rqsz=2
      1/43   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1223841816979 now=1226361488178 diff=2519671199
             curr_task=0
  Thread 44: id=0x7f652b7de700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-39782 rqsz=270
      1/44   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1269659380709 now=1270941126102 diff=1281745393
             curr_task=0
  Thread 45: id=0x7f652afdd700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-42490 rqsz=251
      1/45   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1224928538059 now=1226335939062 diff=1407401003
             curr_task=0
  Thread 46: id=0x7f652a7dc700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-41542 rqsz=245
      1/46   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1235480747158 now=1236767427806 diff=1286680648
             curr_task=0
  Thread 47: id=0x7f6529fdb700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-39004 rqsz=271
      1/47   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1249804410669 now=1251067687107 diff=1263276438
             curr_task=0
  Thread 48: id=0x7f65297da700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-40866 rqsz=286
      1/48   stuck=0 prof=0 harmless=1 wantrdv=0
             cpu_ns: poll=1224200964251 now=1225971772589 diff=1770808338
             curr_task=0
Second trace:
  Thread 1 is about to kill the process.
*>Thread 1 : id=0x7fce81209f80 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-4 rqsz=1819
      1/1    stuck=1 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=34119751570 now=37090268720 diff=2970517150
             curr_task=0x7fcd15592050 (task) calls=51 last=0
               fct=0x559a9c7865c0(task_run_applet) ctx=0x559aba55ee90(<PEER>)
             strm=0x55b0632d76e0,2 src=<PEER> fe=company_peers be=company_peers dst=unknown
             txn=(nil),0 txn.req=-,0 txn.rsp=-,0
             rqf=848002 rqa=0 rpf=80048000 rpa=0
             scf=0x7fcd82de8f50,EST,40 scb=0x7fcd82b27360,EST,249
             af=0x559aba55ee90,7 sab=(nil),0
             cof=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
             cob=0x7edb73e80ca0,10000300:PASS(0x7fcd15391770)/RAW((nil))/tcpv4(97934)
             call trace(15):
             | 0x7fce81bd5420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
             | 0x559a9c6d734d [eb e9 90 f0 48 29 98 88]: main+0x10cf8d
             | 0x559a9c7866ec [48 8b 43 28 f6 43 04 01]: task_run_applet+0x12c/0x7dc
  Thread 2 : id=0x7fce811fe700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=6 rqsz=2819
      1/2    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30588523339 now=31499260712 diff=910737373
             curr_task=0x7d69d365e9c0 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7ddbc3e80070
             strm=0x7ddbc3e80070,7800 src=172.70.214.114 fe=origin be=origin dst=unknown
             txn=0x7d69d36bea20,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7e2fb68db630,EST,20 scb=0x7fcd22153930,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fcd926c3010,a0000300:H2(0x7fccd50ffd40)/SSL(0x7fcd226a2b50)/tcpv4(7206)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 3 : id=0x7fce5d847700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=2680
      1/3    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30561666348 now=30904840131 diff=343173783
             curr_task=0x7e0b82fe94f0 (tasklet) calls=12
               fct=0x559a9c6314a0(h2_io_cb) ctx=0x7e0ed75f48b0
  Thread 4 : id=0x7fce5d046700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=2524
      1/4    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31464633648 now=31469453722 diff=4820074
             curr_task=0
  Thread 5 : id=0x7fce5c845700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=2807
      1/5    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31234902183 now=31240064859 diff=5162676
             curr_task=0
  Thread 6 : id=0x7fce57fff700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=9 rqsz=2590
      1/6    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31412310937 now=32688752057 diff=1276441120
             curr_task=0x7fcc03cc7510 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7fcc03a783f0
             strm=0x7fcc03a783f0,800 src=172.70.188.115 fe=origin be=origin dst=unknown
             txn=0x7f0417e80d00,40000 txn.req=MSG_BODY,c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f802be80c60,EST,20 scb=0x7fcd92c11980,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fcc68cef370,a0000300:H2(0x7e231fc54a50)/SSL(0x7fcd0510d6a0)/tcpv4(58087)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 7 : id=0x7fce577fe700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=0 rqsz=2344
      1/7    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30848273637 now=31434680459 diff=586406822
             curr_task=0x7de606f7b880 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7de6076a26d0
             strm=0x7de6076a26d0,7800 src=172.68.238.140 fe=origin be=origin dst=unknown
             txn=0x7d35abe80370,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7fcd3b72f650,EST,20 scb=0x7fcd0f7e2000,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fcc0be80410,a0000300:H2(0x7fcd3ab07ea0)/SSL(0x7fcd3a2c7fe0)/tcpv4(23370)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 8 : id=0x7fce56ffd700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=14 rqsz=2647
      1/8    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30656317910 now=30720493392 diff=64175482
             curr_task=0x7ed2e7e80460 (task) calls=8 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7ed2e7e80070
             strm=0x7ed2e7e80070,67808 src=162.158.148.208 fe=origin be=? dst=unknown
             txn=0x7ed2e7e80690,0 txn.req=MSG_DONE,d txn.rsp=MSG_RPBEFORE,0
             rqf=4cc80020 rqa=8000 rpf=80002000 rpa=0
             scf=0x7fcca65a2b30,EST,220 scb=0x7ed2e7e80580,QUE,31
             af=(nil),0 sab=(nil),0
             cof=0x7fcd63273330,a0000300:H1(0x7fccbb4ce870)/SSL(0x7fccbb4d1b60)/tcpv4(33656)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 9 : id=0x7fce567fc700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=6 rqsz=2297
      1/9    stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30659844925 now=31455535560 diff=795690635
             curr_task=0x7f930fe80780 (task) calls=2 last=0
               fct=0x559a9c66bf00(process_stream) ctx=(nil)
  Thread 10: id=0x7fce55ffb700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=9 rqsz=2467
      1/10   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30966405735 now=31784129670 diff=817723935
             curr_task=0x7f48c3e80c30 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7fcd4372a970
             strm=0x7fcd4372a970,7800 src=172.70.233.57 fe=origin be=origin dst=unknown
             txn=0x7df26be806e0,40000 txn.req=MSG_BODY,c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7fcd4372b0f0,EST,20 scb=0x7fcd4372b080,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f72af442250,a0000300:H2(0x7fa8c3e567c0)/SSL(0x7dc833473810)/tcpv4(68751)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 11: id=0x7fce557fa700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=6 rqsz=2648
      1/11   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30900231605 now=31592441107 diff=692209502
             curr_task=0x7f9ab783e3c0 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7fcc6875f2e0
             strm=0x7fcc6875f2e0,800 src=172.70.49.199 fe=origin be=origin dst=unknown
             txn=0x7f1297e80710,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f9ab77b1dd0,EST,20 scb=0x7f9ab77b1eb0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fccc586c240,a0000300:H2(0x7f4413539ee0)/SSL(0x7f9ab7462620)/tcpv4(31849)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 12: id=0x7fce54ff9700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=0 rqsz=2448
      1/12   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30189226700 now=31842807457 diff=1653580757
             curr_task=0x7fcd7ad14c00 (tasklet) calls=21
               fct=0x559a9c6314a0(h2_io_cb) ctx=0x7fcd184b43c0
  Thread 13: id=0x7fce547f8700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=1491
      1/13   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=32176332166 now=32213417501 diff=37085335
             curr_task=0
  Thread 14: id=0x7fce53ff7700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=8 rqsz=2308
      1/14   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30637984258 now=31807812675 diff=1169828417
             curr_task=0x7f8b1be80070 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7fcd233d46e0
             strm=0x7fcd233d46e0,800 src=172.70.254.116 fe=origin be=origin dst=unknown
             txn=0x7fcc7fa45700,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7fc82b91af50,EST,20 scb=0x7fc82b91b150,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x559ab881f010,a0000300:H2(0x7fccd53d0830)/SSL(0x7fcd230fbdc0)/tcpv4(7855)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
 >Thread 15: id=0x7fce537f6700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=2388
      1/15   stuck=1 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30770985770 now=32107651653 diff=1336665883
             curr_task=0x7fbb73446e60 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7fcd63872000
             strm=0x7fcd63872000,7800 src=172.70.38.42 fe=origin be=origin dst=unknown
             txn=0x7fbb73cc4760,43000 txn.req=MSG_BODY,4c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7fbb73cada90,EST,20 scb=0x7ed5f5da3c70,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7ec1d31d8600,a0000300:H1(0x7ed5f43dc370)/SSL(0x7ed5f5a7eea0)/tcpv4(20143)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
             call trace(15):
             | 0x559a9c73cc3f [89 44 24 04 85 c0 75 29]: ha_dump_backtrace+0x3f/0x2fd
             | 0x559a9c73d6ae [48 8b 05 83 6b 1d 00 48]: debug_handler+0x6e/0x10b
             | 0x7fce81bd5420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
             | 0x7fce816759ea [c5 fd e7 4f 20 c5 fd e7]: libc:+0x18b9ea
             | 0x559a9c6827e2 [e9 55 fd ff ff 0f b6 0c]: http_reply_to_htx+0x832/0x857
             | 0x559a9c682899 [83 f8 ff 74 72 83 7d 0c]: http_reply_message+0x89/0x25e
             | 0x559a9c682a89 [83 f8 ff 0f 84 06 01 00]: http_reply_and_close+0x19/0x26d
             | 0x559a9c6842a6 [8b 03 f6 c4 f0 75 05 80]: http_process_req_common+0x2d6/0x172a
             | 0x559a9c66df78 [85 c0 0f 85 d2 f6 ff ff]: process_stream+0x2078/0x352b
  Thread 16: id=0x7fce52ff5700 act=1 glob=0 wq=1 rq=0 tl=1 tlsz=-50 rqsz=2218
      1/16   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31702980294 now=32228425938 diff=525445644
             curr_task=0x7e93afe80140 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x55b06363f990
             strm=0x55b06363f990,7800 src=108.162.210.165 fe=origin be=origin dst=unknown
             txn=0x7e93b3e80530,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7e22437ed110,EST,20 scb=0x7fcd8a1134c0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fcd6a910480,a0000300:H2(0x559ab805d110)/SSL(0x559aaf164ba0)/tcpv4(9443)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 17: id=0x7fce527f4700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=2573
      1/17   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30046791518 now=31044877669 diff=998086151
             curr_task=0x7dd283e80140 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7e7c57e80390
             strm=0x7e7c57e80390,800 src=172.68.222.131 fe=origin be=origin dst=unknown
             txn=0x7fab5d64fe50,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7da93a57b920,EST,20 scb=0x7da923e80ad0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fcd526e6fa0,a0000300:H2(0x7fcd52bc4130)/SSL(0x7fcd526e7370)/tcpv4(7658)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 18: id=0x7fce51ff3700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=9 rqsz=2442
      1/18   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30720435165 now=32039440258 diff=1319005093
             curr_task=0x7e8087a74420 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7f39fbe80070
             strm=0x7f39fbe80070,7800 src=172.71.10.138 fe=origin be=origin dst=unknown
             txn=0x7fcce54b6130,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7fcce5babe40,EST,20 scb=0x7fcce526b040,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fcd42b03060,a0000300:H2(0x7fcd5b6f0e60)/SSL(0x7fcd5ab71090)/tcpv4(11732)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 19: id=0x7fce517f2700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=2787
      1/19   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31110186237 now=31352327442 diff=242141205
             curr_task=0x7fcbc3be2c70 (tasklet) calls=23
               fct=0x559a9c6314a0(h2_io_cb) ctx=0x7fcbc3cd2f00
  Thread 20: id=0x7fce50ff1700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=7 rqsz=2489
      1/20   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30650231386 now=31459833306 diff=809601920
             curr_task=0x7fcc039d1700 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7dea4fe80070
             strm=0x7dea4fe80070,7800 src=172.70.247.55 fe=origin be=origin dst=unknown
             txn=0x7faedf1278e0,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7fcd059f1a20,EST,20 scb=0x7dc237e80e50,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7e9313aebaf0,a0000300:H2(0x7e231f0f9e30)/SSL(0x7fcc034b71e0)/tcpv4(11765)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 21: id=0x7fce507f0700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=6 rqsz=2685
      1/21   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30682976399 now=31175290092 diff=492313693
             curr_task=0x7f2efb6d5010 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7f2efb9e7f30
             strm=0x7f2efb9e7f30,800 src=172.68.186.150 fe=origin be=origin dst=unknown
             txn=0x7f2efb8b4e50,40000 txn.req=MSG_RQBEFORE,0 txn.rsp=MSG_RPBEFORE,0
             rqf=500002 rqa=34 rpf=80000000 rpa=0
             scf=0x7fc77363c520,EST,0 scb=0x7fcd6afc0960,INI,1
             af=(nil),0 sab=(nil),0
             cof=0x7e3debe80650,a0000300:H2(0x7e1e06bb8b00)/SSL(0x7fcd6a9e8070)/tcpv4(42268)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 22: id=0x7fce4ffef700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=13 rqsz=2551
      1/22   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31154684386 now=31249295956 diff=94611570
             curr_task=0x7fcd5a8de360 (task) calls=9 last=0
               fct=0x559a9c66bf00(process_stream) ctx=(nil)
  Thread 23: id=0x7fce4f7ee700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=3 rqsz=1950
      1/23   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30125424201 now=31566005660 diff=1440581459
             curr_task=0x7fc1a7aa3f30 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7fc1a7aa3b40
             strm=0x7fc1a7aa3b40,7800 src=172.70.254.90 fe=origin be=origin dst=unknown
             txn=0x7df8a3469f40,40000 txn.req=MSG_BODY,c txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7dbe2f7acac0,EST,20 scb=0x7dbe2f7ac970,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fcd43b14060,a0000300:H2(0x7fccfd960560)/SSL(0x7fcd4b4a8f00)/tcpv4(33988)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 24: id=0x7fce4efed700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=3 rqsz=2351
      1/24   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30694748622 now=31523082600 diff=828333978
             curr_task=0x7fb4abc47610 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7d380be80510
             strm=0x7d380be80510,7800 src=172.70.250.98 fe=origin be=origin dst=unknown
             txn=0x7d37c7e80690,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7e9312ffd760,EST,20 scb=0x7e9312f45cb0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fcd322f73c0,a0000300:H2(0x7fcd23020ce0)/SSL(0x7fcd222f07c0)/tcpv4(3838)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 25: id=0x7fce4e7ec700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=2515
      1/25   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30644674270 now=31281550053 diff=636875783
             curr_task=0x7fcba5865ff0 (tasklet) calls=6
               fct=0x559a9c6314a0(h2_io_cb) ctx=0x7fcd535ccd60
  Thread 26: id=0x7fce4dfeb700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=0 rqsz=2816
      1/26   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31482389733 now=31985947018 diff=503557285
             curr_task=0x7dbe2f5413f0 (tasklet) calls=18
               fct=0x559a9c6314a0(h2_io_cb) ctx=0x7e28621de0b0
  Thread 27: id=0x7fce4d7ea700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=0 rqsz=2500
      1/27   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30892530838 now=31549876133 diff=657345295
             curr_task=0x7f44f72b2bd0 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7f44f72b27e0
             strm=0x7f44f72b27e0,7800 src=141.101.68.218 fe=origin be=origin dst=unknown
             txn=0x7f65c25c1400,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7dc2ea428020,EST,20 scb=0x7fcceb0f39b0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fc4db022e40,a0000300:H2(0x7f65c21516f0)/SSL(0x7e552a0f4920)/tcpv4(58992)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 28: id=0x7fce4cfe9700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=1 rqsz=2372
      1/28   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30068163333 now=31639700187 diff=1571536854
             curr_task=0x7d3043e80460 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7d3043e80780
             strm=0x7d3043e80780,800 src=172.70.247.35 fe=origin be=origin dst=unknown
             txn=0x7d3043e80da0,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7e0156c4b7a0,EST,20 scb=0x7e01567aea40,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f65c2015080,a0000300:H2(0x7df317b42af0)/SSL(0x7df317298bc0)/tcpv4(64473)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 29: id=0x7fce4c7e8700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=2320
      1/29   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31733141966 now=31806841035 diff=73699069
             curr_task=0
  Thread 30: id=0x7fce4bfe7700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=2595
      1/30   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=32398722939 now=32555101320 diff=156378381
             curr_task=0x7fcd0d616d70 (tasklet) calls=26
               fct=0x559a9c6314a0(h2_io_cb) ctx=0x7fcd0fd6c960
  Thread 31: id=0x7fce4b7e6700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=2428
      1/31   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31466442026 now=31524885890 diff=58443864
             curr_task=0
  Thread 32: id=0x7fce4afe5700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=3 rqsz=2278
      1/32   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30934562186 now=31484721608 diff=550159422
             curr_task=0x7e1e06c778c0 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7f2efb68a4e0
             strm=0x7f2efb68a4e0,7800 src=172.70.126.175 fe=origin be=origin dst=unknown
             txn=0x7e1e06c77af0,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7fc773753960,EST,20 scb=0x7fc7737538f0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fcd52bb3c10,a0000300:H1(0x7fcd92b77490)/SSL(0x7fcd6abf4420)/tcpv4(12441)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 33: id=0x7fce4a7e4700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=0 rqsz=2067
      1/33   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=29843400906 now=31525542262 diff=1682141356
             curr_task=0x7fcce556de80 (tasklet) calls=12
               fct=0x559a9c6314a0(h2_io_cb) ctx=0x7fcce5cf1330
  Thread 34: id=0x7fce49fe3700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=0 rqsz=2272
      1/34   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=32195748198 now=32487784201 diff=292036003
             curr_task=0x559afe8b17d0 (tasklet) calls=41
               fct=0x559a9c6314a0(h2_io_cb) ctx=0x559f9fa8a630
  Thread 35: id=0x7fce497e2700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=2 rqsz=2304
      1/35   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30901897658 now=31690301363 diff=788403705
             curr_task=0x7fcbbb8a0280 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7f80e049fad0
             strm=0x7f80e049fad0,7800 src=162.158.90.37 fe=origin be=origin dst=unknown
             txn=0x7f80e04a0080,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7fcd73850f40,EST,20 scb=0x7fcd73851090,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7ec4abe80460,a0000300:H2(0x7d4c03e80a50)/SSL(0x7df9cdf209d0)/tcpv4(63133)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 36: id=0x7fce48fe1700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=2453
      1/36   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31421904178 now=31840608946 diff=418704768
             curr_task=0x7e9ee6de5510 (tasklet) calls=1
               fct=0x559a9c7215d0(sc_conn_io_cb) ctx=0x7fcd1b3c6940
             strm=0x7fcd7a0dd100,17800 src=162.158.114.170 fe=origin be=origin dst=unknown
             txn=0x7f0b22129a40,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40c0e060 rqa=0 rpf=c000c060 rpa=0
             scf=0x7fcd1b3c6940,EST,0 scb=0x7fcd1b3c69b0,CLO,1
             af=(nil),0 sab=(nil),0
             cof=0x7debbb940610,a0000300:H2(0x7f0b224c28f0)/SSL(0x7e9ee6e4b970)/tcpv4(64746)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 37: id=0x7fce487e0700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=4 rqsz=2499
      1/37   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30368565660 now=32597086958 diff=2228521298
             curr_task=0x7f85077be4e0 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7fcb676cb050
             strm=0x7fcb676cb050,7800 src=162.158.114.158 fe=origin be=origin dst=unknown
             txn=0x7f85077bece0,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7fcc54ec4190,EST,20 scb=0x7fcc54ec4050,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fccc5859840,a0000300:H2(0x7fcc54caa6d0)/SSL(0x7fcd437a4580)/tcpv4(31825)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
 >Thread 38: id=0x7fce47fdf700 act=1 glob=1 wq=1 rq=1 tl=1 tlsz=2 rqsz=2145
      1/38   stuck=1 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30481898194 now=32144360559 diff=1662462365
             curr_task=0x7fcd059c4e90 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7f6907c74830
             strm=0x7f6907c74830,7808 src=141.101.68.87 fe=origin be=? dst=unknown
             txn=0x7d399be802d0,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=44d08002 rqa=900 rpf=80000000 rpa=0
             scf=0x7fcd92b71170,EST,20 scb=0x7fcc0382fcc0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fcd930ef0e0,a0000300:H2(0x7fcd925165f0)/SSL(0x7fcd930ef1b0)/tcpv4(17300)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
             call trace(15):
             | 0x559a9c73cc3f [89 44 24 04 85 c0 75 29]: ha_dump_backtrace+0x3f/0x2fd
             | 0x559a9c73d6ae [48 8b 05 83 6b 1d 00 48]: debug_handler+0x6e/0x10b
             | 0x7fce81bd5420 [48 c7 c0 0f 00 00 00 0f]: libpthread:+0x14420
             | 0x7fce816759ef [c5 fd e7 57 40 c5 fd e7]: libc:+0x18b9ef
             | 0x559a9c6827e2 [e9 55 fd ff ff 0f b6 0c]: http_reply_to_htx+0x832/0x857
             | 0x559a9c682899 [83 f8 ff 74 72 83 7d 0c]: http_reply_message+0x89/0x25e
             | 0x559a9c682a89 [83 f8 ff 0f 84 06 01 00]: http_reply_and_close+0x19/0x26d
             | 0x559a9c6842a6 [8b 03 f6 c4 f0 75 05 80]: http_process_req_common+0x2d6/0x172a
             | 0x559a9c66e14f [85 c0 0f 85 ba f6 ff ff]: process_stream+0x224f/0x352b
  Thread 39: id=0x7fce477de700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=6 rqsz=2543
      1/39   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31296804545 now=32119651154 diff=822846609
             curr_task=0x7e9ee6b50da0 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7fcd7aebd000
             strm=0x7fcd7aebd000,7800 src=172.69.227.137 fe=origin be=origin dst=unknown
             txn=0x7e9ee6aae270,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f0b225be5d0,EST,20 scb=0x7e9ee67608c0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7f93ff30b9c0,a0000300:H2(0x7f0b22cbfa80)/SSL(0x7fccc5607dd0)/tcpv4(9715)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 40: id=0x7fce46fdd700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=2266
      1/40   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31211488892 now=31322120821 diff=110631929
             curr_task=0
  Thread 41: id=0x7fce467dc700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=2677
      1/41   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31008039277 now=31088445398 diff=80406121
             curr_task=0
  Thread 42: id=0x7fce45fdb700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=2574
      1/42   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30960194247 now=31147153451 diff=186959204
             curr_task=0x7fcd0d7b7b70 (tasklet) calls=11
               fct=0x559a9c6314a0(h2_io_cb) ctx=0x7fc716834e60
  Thread 43: id=0x7fce457da700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=9 rqsz=2572
      1/43   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31161662224 now=31893195274 diff=731533050
             curr_task=0x7faae873f630 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7faae873f240
             strm=0x7faae873f240,800 src=141.101.68.191 fe=origin be=origin dst=unknown
             txn=0x7faae873b050,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f262fe80760,EST,20 scb=0x7f262fe801e0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fcc0eb1fdf0,a0000300:H2(0x7fcaf7a0eb00)/SSL(0x7fcaf50db890)/tcpv4(35366)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 44: id=0x7fce44fd9700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=8 rqsz=2359
      1/44   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30619791405 now=31403569129 diff=783777724
             curr_task=0x7f49c3758b20 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7df665fa6000
             strm=0x7df665fa6000,800 src=162.158.91.131 fe=origin be=origin dst=unknown
             txn=0x7fcd16ca9f80,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7df66679e280,EST,20 scb=0x7df66623fe70,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fcb6734d010,a0000300:H2(0x7fcd16e5c450)/SSL(0x7faae89af930)/tcpv4(46973)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 45: id=0x7fce447d8700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=12 rqsz=2718
      1/45   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30673554108 now=31119252510 diff=445698402
             curr_task=0x7fbb5fe80530 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7ddcdbe80960
             strm=0x7ddcdbe80960,7800 src=198.41.242.29 fe=origin be=origin dst=unknown
             txn=0x7f988744b8e0,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7f9887430bc0,EST,20 scb=0x7f9887430b50,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7fcd3aeb2610,a0000300:H2(0x7fab5d9baa50)/SSL(0x7fab5cfa6360)/tcpv4(52259)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 46: id=0x7fce43fd7700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=10 rqsz=2454
      1/46   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=30690467121 now=31181956819 diff=491489698
             curr_task=0x7fcc0d695800 (task) calls=1 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7ea92b393250
             strm=0x7ea92b393250,7800 src=172.68.112.160 fe=origin be=origin dst=unknown
             txn=0x7fb183e80870,40000 txn.req=MSG_BODY,d txn.rsp=MSG_RPBEFORE,0
             rqf=40d08002 rqa=30 rpf=80000000 rpa=0
             scf=0x7ea92b39ec70,EST,20 scb=0x7ea92b39ebe0,INI,21
             af=(nil),0 sab=(nil),0
             cof=0x7e231f807030,a0000300:H2(0x7da073e80070)/SSL(0x7df315f62930)/tcpv4(60815)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)
  Thread 47: id=0x7fce437d6700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=0 rqsz=2750
      1/47   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31862613583 now=31910080272 diff=47466689
             curr_task=0
  Thread 48: id=0x7fce42fd5700 act=1 glob=0 wq=1 rq=1 tl=1 tlsz=9 rqsz=2514
      1/48   stuck=0 prof=0 harmless=0 wantrdv=0
             cpu_ns: poll=31967688153 now=32137454324 diff=169766171
             curr_task=0x7fccf907b890 (task) calls=9 last=0
               fct=0x559a9c66bf00(process_stream) ctx=0x7fcc9f306b80
             strm=0x7fcc9f306b80,67808 src=162.158.166.141 fe=origin be=? dst=unknown
             txn=0x7fccf90644f0,40000 txn.req=MSG_DATA,d txn.rsp=MSG_RPBEFORE,0
             rqf=4c80020 rqa=8000 rpf=80002000 rpa=0
             scf=0x7fcc9fa24360,EST,220 scb=0x7fcd6b4d5550,QUE,21
             af=(nil),0 sab=(nil),0
             cof=0x7fcd8b47c870,a0000300:H2(0x7fcc9f4b1df0)/SSL(0x7fcd6b4fee00)/tcpv4(21287)
             cob=(nil),0:NONE((nil))/NONE((nil))/NONE(-1)

I think the second one contains more information.

There is also another piece of information I found out about the incident last Wednesday. The number of tasks in the run-queue correlated with the CPU usage:
image

I guess this is related to the number of queued connections but I wanted to let you know in case it means something different.

I have not yet been able to reproduce this on a test instance, but since disabling the priority-queue settings last Thursday there was only a small spike on Friday when a backend failed for three minutes.
image
image
We can see that the actual CPU spike is shifted by ~30s compared to the queue spike.

Also, there are no queued tasks this time, maybe because we were not at 100% CPU utilization.
image

Luckily I found a comparable event that happened on Thursday a few hours before I turned off the queue manipulation:
image
image
image

In both cases the backend queue reaches ~30k entries, but the CPU load is considerably lower after turning off queue manipulation. Of course these are just two data points, so I do not want to jump to any conclusions here, but it looks promising. I am still wondering whether the smaller CPU spike on Friday is out of the ordinary or to be expected.
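
(For reference, the "queue manipulation" mentioned above refers to set-priority-class rules roughly like the sketch below; the frontend name, class value and threshold are only placeholders, not my exact configuration.)

frontend fe_main
  # push clients with a high error rate into a lower-priority class;
  # everyone else keeps the default class 0
  http-request set-priority-class int(10) if { src_http_err_rate(http-request-rate-short) gt 100 }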

@phihos
Copy link
Contributor Author

phihos commented Jul 12, 2022

Hi, one of our backend services just failed and I was able to gather the needed information during a real incident:

show activity 1

thread_id: 26 (1..40)
date_now: 1657632463.256912
ctxsw: 1446450053 [ 1193177774 3198202008 1175329591 1311914023 1475435981 1151027506 1156796940 1150123683 1162134369 1147865670 1151165353 1157408620 1139695245 1164633804 1155446366 1150810017 1153827015 1138678450 1142989996 1157165498 1147202445 1147677880 1153018762 1149091104 1167423655 1149016310 1153062062 1155912433 1163334320 1142381857 1157636675 1149576358 1148053829 1151925345 1165233245 1159437384 1147336012 1151563884 1149587098 1148791742 ]
tasksw: 3768835246 [ 282906027 1721801769 273934086 356107779 445262601 263745855 265160855 264089467 265931027 263335016 263876917 265022487 261809737 266347778 265021056 264102958 264115621 261536014 262472130 275925098 262920024 263346713 269965306 263717176 267599519 263876822 264156875 265328355 275992387 262441566 264672636 263697860 263756892 264043518 266484291 265891371 263230564 264146763 265083257 265913665 ]
empty_rq: 518257681 [ 9165448 58390037 9128060 9651648 7691006 8591192 8694167 8614039 8747261 8641251 8823330 8664985 6977428 10613667 8955342 8775304 9291883 8955913 8499436 40911221 11396071 8976704 31249301 9046314 12863520 8907093 9577774 7551695 37957098 7384568 9210254 8515138 7438216 9933173 8567666 8321676 13064556 6974374 19149251 24390621 ]
long_rq: 304874 [ 6849 8529 6964 7432 3786 9034 8491 7976 7335 7655 7331 7592 8267 7318 7137 7528 7250 7386 7945 7333 7661 8187 7432 7225 8565 7551 7336 9398 6784 8573 6274 7928 8943 7818 7729 8648 7212 7766 7679 7027 ]
loops: 493139056 [ 427983306 599795650 419552277 476885729 539105752 411808558 411995149 410987591 413898094 410849865 412143630 413169882 407632307 419021877 412138877 411259612 412908865 409497192 409911694 639448253 421590742 410884078 561354826 411099949 445995733 410919843 413520340 410496236 610329573 408540020 413731490 410948931 409218225 414627348 414128550 412159700 431049345 409726467 472291021 510401663 ]
wake_tasks: 823969148 [ 20645189 25089053 20438671 20491246 20704483 20430266 20309128 20408411 20508217 20355793 20439257 20501046 20299547 20409702 20321041 20354637 20400664 20296844 20339350 21468321 20466003 20325970 20806523 20298868 20493657 20322721 20429819 20305170 21084969 20447841 20461495 20456690 20310898 20443080 20529898 20463349 20491161 20444808 20546713 20628649 ]
wake_signal: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
poll_io: 3154161573 [ 270477571 572246974 261754028 320373540 386418630 253972411 254969584 253893675 256363346 253359923 254264334 255545679 251569500 261601002 254835175 253897449 254574313 251525934 252418532 546577343 264901964 253181440 449509746 253352036 300359104 253500725 255047993 254915971 509819993 252308537 255568536 253766569 253244098 256146894 256884417 255566732 277041539 254113139 331856303 382371486 ]
poll_exp: 305928744 [ 7194025 11101951 7289250 7063245 4129283 6876922 6910257 6826387 6961035 6909484 7082263 6898149 5270362 8327711 7170546 7029102 7542623 7252769 6789336 13054790 8570993 7246345 11929327 7329861 7043024 7147269 7768734 5732716 12975742 5645723 7443969 6756014 5699050 8021858 6781725 6529711 9142116 5206966 10177888 11100223 ]
poll_drop_fd: 13031661 [ 333948 322139 330649 322864 331477 330623 324050 324439 331338 321920 327925 331133 320322 331078 324812 321921 327877 318087 320797 326224 328133 322335 321639 322538 329589 323668 330594 323715 329961 319761 333626 322634 320748 328130 331138 328079 326413 323109 320646 321582 ]
poll_skip_fd: 451340888 [ 11345854 11196823 11321714 11249247 11476689 11238588 11388755 11359440 11402597 11228309 11265222 11319887 11177393 11426951 11354930 11283575 11293744 11133779 11154136 11163743 11187830 11195714 11195664 11213691 11517556 11304612 11274288 11441399 11203665 11188004 11360417 11291182 11247593 11182312 11409006 11422717 11183954 11292978 11246795 11200135 ]
conn_dead: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
stream_calls: 1413871274 [ 143832560 142062761 143657751 142290670 144011092 142530159 143320956 142560184 143759875 142234778 142655212 143243787 141307642 144095403 143143429 142711820 142794368 141181062 141733168 141623718 142096184 142264315 142071996 142508149 144492933 142437656 142738027 143252943 142503074 141687163 143209772 142471836 142414263 142773193 144093955 143706871 142121596 142732094 142359464 142152691 ]
pool_fail: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
buf_wait: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
cpust_ms_tot: 836815 [ 20962 15808 16021 20087 13114 15439 18304 20844 16448 19301 15235 15186 42037 16151 20632 20039 15465 20174 19070 15562 15750 18365 19484 19227 42876 19448 15231 36291 20297 35738 20938 19775 35807 15851 15214 19363 14411 42189 15639 19042 ]
cpust_ms_1s: 1454 [ 43 30 22 52 9 22 67 36 11 31 11 24 96 20 52 16 31 51 23 19 24 76 36 31 83 30 19 31 22 29 45 18 25 42 32 23 23 139 46 14 ]
cpust_ms_15s: 28029 [ 766 522 504 629 296 402 625 592 482 647 502 404 1606 438 679 446 351 574 520 579 462 661 625 621 1793 768 526 1494 725 1432 697 541 1553 451 416 605 307 1691 438 659 ]
avg_loop_us: 6306 [ 7061 5799 4414 7149 11412 3459 5497 7143 4857 1035 6251 5715 9091 7709 7050 6375 7509 5929 7612 954 6618 7833 6140 8792 8980 5409 5329 8675 6813 7740 8507 6970 7415 1683 7212 4817 5905 8036 30 7314 ]
accepted: 365084083 [ 429453 569741 511348 527881 482394 432035 504991 502137 485249 525722 501285 471639 453194 398865 514204 510532 413333 499484 510768 208182533 497213 387857 34124181 456448 1871261 478453 376924 504402 86609543 427236 511388 425666 490983 369462 450651 445960 901726 447679 5023744 12856518 ]
accq_pushed: 365084083 [ 9127529 9128790 9124295 9133468 9132933 9133179 9118489 9128200 9130242 9134450 9126407 9123459 9119403 9121304 9128256 9133203 9128681 9129803 9134905 9131200 9132326 9127239 9129752 9122139 9130022 9137329 9126565 9120297 9120465 9126354 9122845 9125497 9120858 9127786 9116595 9118805 9132999 9128776 9127798 9121440 ]
accq_full: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
accq_ring: 17 [ 0 0 0 2 0 0 0 0 0 0 1 2 4 1 0 0 2 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
fd_takeover: 452430966 [ 11374220 11237147 11348856 11275976 11504258 11265050 11414812 11387051 11428057 11254504 11291791 11347064 11204695 11453381 11381490 11309497 11319294 11160332 11180919 11192803 11214057 11222713 11222309 11239622 11545534 11331917 11301601 11469572 11232337 11215722 11389240 11317949 11274602 11207801 11435351 11449977 11209833 11318800 11274665 11226167 ]

show activity 2

thread_id: 27 (1..40)
date_now: 1657632473.596697
ctxsw: 1447336609 [ 1193201134 3198225036 1175351402 1311936594 1475451006 1151050384 1156819278 1150145399 1162157065 1147887909 1151189215 1157432332 1139714822 1164656265 1155469454 1150832910 1153849465 1138701516 1143013424 1157188849 1147225832 1147701106 1153040384 1149114360 1167442194 1149038209 1153085683 1155933207 1163357601 1142402983 1157659828 1149598806 1148074009 1151948197 1165255735 1159459244 1147358844 1151583578 1149609329 1148814277 ]
tasksw: 3769081414 [ 282912492 1721808292 273940059 356113962 445269260 263752089 265167078 264095406 265937085 263341268 263883405 265028923 261815182 266353871 265027256 264109209 264121799 261542397 262478528 275931437 262926537 263353084 269971262 263723522 267604632 263882832 264163291 265334058 275998818 262447377 264679042 263703994 263762561 264049803 266490407 265897416 263236841 264152254 265089388 265919889 ]
empty_rq: 519094783 [ 9187771 58442079 9155726 9666069 7691006 8598537 8708591 8644856 8774759 8698255 8840631 8680963 6977428 10620024 8965195 8805260 9306861 9013235 8507957 40944347 11408166 8989833 31293636 9053956 12863520 8930406 9610574 7551695 37974690 7384568 9243086 8534101 7438216 9959931 8574741 8356181 13077984 6974374 19229646 24415929 ]
long_rq: 314688 [ 7074 8751 7200 7745 3823 9397 8786 8220 7574 7840 7659 7927 8456 7620 7437 7788 7546 7608 8275 7609 7987 8445 7666 7511 8719 7874 7557 9529 7038 8729 6487 8147 9085 8101 8021 8891 7499 7947 7827 7293 ]
loops: 494046891 [ 428007672 599849760 419581666 476901851 539106834 411817437 412011057 411020091 413927578 410909023 412162569 413187449 407633671 419029917 412150430 411291428 412925475 409556644 409921767 639483355 421604451 410898887 561401234 411109362 445997121 410944723 413555150 410498004 610348997 408541723 413766528 410969708 409219935 414655880 414137328 412195994 431064463 409728053 472373919 510428941 ]
wake_tasks: 824022964 [ 20646767 25090162 20439988 20492659 20705565 20431470 20310308 20409592 20509559 20356858 20440532 20502231 20300911 20411200 20322480 20355897 20401922 20298169 20340685 21469667 20467235 20327321 20807819 20300445 20495045 20323945 20431163 20306938 21086473 20449544 20463063 20457955 20312608 20444258 20531237 20464577 20492453 20446394 20547880 20629989 ]
wake_signal: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
poll_io: 3154215005 [ 270479091 572248666 261755327 320374683 386419637 253973592 254970706 253895004 256364890 253361760 254265535 255546915 251570382 261602091 254836390 253898861 254575550 251527622 252419659 546578937 264903206 253182726 449511500 253353247 300360013 253501947 255049664 254917024 509821399 252309631 255570205 253767944 253245128 256148318 256885657 255568136 277042775 254114117 331858623 382373039 ]
poll_exp: 306767859 [ 7216361 11154183 7316943 7077737 4129283 6884294 6924723 6857244 6988607 6966696 7099636 6914189 5270362 8334135 7180421 7059166 7557644 7310251 6797873 13087989 8583088 7259473 11973701 7337519 7043024 7170611 7801572 5732716 12993358 5645723 7476865 6775062 5699050 8048680 6788851 6564314 9155597 5206966 10258371 11125581 ]
poll_drop_fd: 13032989 [ 333979 322163 330679 322896 331477 330675 324102 324469 331361 321949 327979 331174 320365 331121 324854 321948 327914 318107 320837 326264 328175 322371 321673 322563 329612 323719 330628 323721 330003 319781 333648 322662 320770 328169 331171 328110 326457 323153 320664 321626 ]
poll_skip_fd: 451343715 [ 11345931 11196888 11321798 11249320 11476734 11238662 11388817 11359518 11402668 11228389 11265294 11319957 11177468 11427026 11354997 11283651 11293817 11133861 11154207 11163810 11187899 11195777 11195742 11213762 11517624 11304685 11274372 11441445 11203733 11188055 11360502 11291252 11247646 11182388 11409075 11422797 11184020 11293036 11246876 11200216 ]
conn_dead: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
stream_calls: 1414069842 [ 143837720 142067814 143662581 142295806 144012786 142535429 143326223 142565055 143764816 142239685 142660771 143249270 141312172 144100437 143148638 142716898 142799561 141186157 141738638 141628916 142101605 142269666 142076772 142513393 144497224 142442711 142743256 143257621 142508323 141691965 143214807 142476848 142418935 142778371 144099003 143711784 142126864 142736702 142364165 142157750 ]
pool_fail: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
buf_wait: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
cpust_ms_tot: 855322 [ 21376 16076 16446 20496 13325 15727 18697 21370 16634 19694 15659 15439 43266 16558 21088 20509 15749 20670 19408 15914 16016 18781 19896 19505 44109 19869 15479 37203 20690 36618 21408 20159 36771 16087 15522 19841 14635 43240 15941 19451 ]
cpust_ms_1s: 1462 [ 29 34 17 14 7 29 42 39 6 43 34 15 106 52 28 13 32 29 21 18 26 27 16 44 112 40 8 78 42 53 55 27 46 20 25 33 13 144 17 28 ]
cpust_ms_15s: 26819 [ 649 412 551 628 281 394 608 721 316 569 569 383 1678 535 667 592 392 675 482 494 405 624 622 490 1790 640 410 1307 630 1309 684 524 1346 370 464 630 325 1619 449 585 ]
avg_loop_us: 5016 [ 4680 528 5430 7286 10851 3768 6219 5956 4417 2984 3657 5913 9989 6189 6298 2069 5911 1138 5423 1825 5708 2959 2840 5328 8648 7033 1111 6918 5249 6104 4483 4697 5599 1681 6053 6148 6008 7962 170 5414 ]
accepted: 365104430 [ 430016 570486 511831 528250 482720 432448 505307 502665 485802 526456 501657 472004 453492 399266 514657 511079 413769 500166 511196 208183157 497689 388349 34124961 456870 1871610 478835 377576 504851 86610134 427689 512101 426127 491477 370010 451041 446535 902135 448059 5024842 12857115 ]
accq_pushed: 365104430 [ 9128046 9129235 9124783 9134003 9133533 9133665 9118991 9128696 9130746 9134896 9126876 9123951 9119973 9121856 9128775 9133678 9129184 9130284 9135378 9131722 9132800 9127719 9130250 9122686 9130563 9137786 9127060 9120887 9120983 9126936 9123364 9125947 9121470 9128250 9117078 9119292 9133524 9129354 9128268 9121942 ]
accq_full: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
accq_ring: 3 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 ]
fd_takeover: 452433960 [ 11374297 11237219 11348936 11276054 11504308 11265129 11414882 11387130 11428134 11254589 11291872 11347135 11204778 11453458 11381559 11309579 11319380 11160415 11181001 11192868 11214130 11222783 11222395 11239696 11545606 11332000 11301686 11469616 11232408 11215775 11389330 11318027 11274661 11207876 11435420 11450063 11209902 11318863 11274748 11226252 ]

show activity 3

thread_id: 28 (1..40)
date_now: 1657632576.288458
ctxsw: 1455783224 [ 1193420367 3198448184 1175575031 1312153090 1475583720 1151270678 1157040484 1150363696 1162379999 1148105140 1151409441 1157655084 1139883431 1164878691 1155686685 1151044345 1154072100 1138917153 1143231047 1157410199 1147449643 1147924233 1153262616 1149332910 1167608582 1149258321 1153305500 1156114516 1163576983 1142585028 1157877506 1149813718 1148258846 1152172648 1165483342 1159678198 1147580837 1151751140 1149831137 1149029211 ]
tasksw: 3771438124 [ 282972224 1721870183 274001817 356173414 445334673 263812608 265228079 264154998 265997741 263401001 263943399 265090101 261862545 266415462 265087303 264167523 264182801 261601840 262538281 275992223 262988305 263413964 270032246 263783507 267651502 263943586 264223954 265385071 276059140 262498102 264739084 263763180 263814479 264111252 266552722 265957759 263297903 264199421 265150182 265979141 ]
empty_rq: 521755372 [ 9280858 58477036 9294849 9753045 7691006 8677373 8781518 8704327 8861390 8783422 8948556 8779897 6977428 10682706 9030750 8935927 9397045 9072313 8549572 40998302 11493612 9074234 31423594 9108019 12863520 8959732 9688505 7551695 38028721 7384568 9272485 8591839 7438216 10024512 8696090 8419749 13229660 6974374 19352656 24502271 ]
long_rq: 414787 [ 9681 11741 10039 10479 3945 12180 11590 10889 10473 10398 10424 10693 9731 10452 10028 10278 10446 10311 10687 10620 11107 11289 10457 10252 9929 10762 10334 11139 9711 10297 9256 10950 10729 11013 11008 11629 10187 9203 10702 9748 ]
loops: 497310407 [ 428116052 599899154 419736863 477004008 539118147 411912040 412099286 411094881 414029571 411009843 412286448 413302433 407647387 419107905 412230958 411437722 413031402 409630972 409978913 639552668 421705596 410998746 561547385 411178110 446011136 410988544 413648685 410511267 610417704 408555538 413810149 411041966 409233762 414736424 414274358 412274599 431232554 409741912 472513326 510531177 ]
wake_tasks: 824562101 [ 20660195 25103439 20453270 20505996 20716878 20444781 20323863 20423493 20523017 20370318 20453860 20515928 20314627 20424668 20336001 20369314 20415591 20312169 20354553 21483210 20480339 20341048 20821132 20313834 20509060 20337605 20444612 20320201 21099803 20463359 20476565 20471325 20326435 20457964 20544425 20478378 20505945 20460253 20561088 20643559 ]
wake_signal: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
poll_io: 3154646979 [ 270490461 572258755 261767444 320385539 386430157 253985055 254981696 253905762 256375834 253373020 254277187 255558408 251579227 261613022 254847022 253910300 254586940 251538096 252430669 546589982 264914889 253193970 449523821 253363655 300369187 253511963 255061020 254925701 509832157 252318652 255580028 253777972 253254134 256160021 256897154 255578785 277055102 254123000 331871047 382384737 ]
poll_exp: 309430271 [ 7309498 11189234 7456157 7164831 4129283 6963227 6997695 6916743 7075315 7051945 7207627 7013218 5270362 8396832 7246043 7190002 7647906 7369423 6839498 13141925 8668545 7343854 12103685 7391635 7043024 7199981 7879528 5732716 13047371 5645723 7506349 6832893 5699050 8113173 6910352 6627995 9307372 5206966 10381385 11211910 ]
poll_drop_fd: 13045343 [ 334263 322516 330977 323191 331479 331023 324470 324746 331761 322276 328259 331533 320546 331483 325149 322286 328268 318479 321147 326646 328538 322734 322035 322848 329799 324097 330963 323952 330344 319987 334007 323014 321004 328528 331533 328447 326754 323285 321043 321933 ]
poll_skip_fd: 451365721 [ 11346500 11197455 11322364 11249866 11476898 11239223 11389391 11360105 11403278 11228936 11265877 11320542 11177938 11427580 11355532 11284224 11294397 11134422 11154797 11164399 11188493 11196368 11196353 11214316 11518067 11305249 11274949 11441938 11204335 11188555 11361045 11291842 11248159 11182966 11409672 11423341 11184583 11293533 11247464 11200769 ]
conn_dead: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
stream_calls: 1415985619 [ 143887453 142119209 143713746 142345448 144020738 142585960 143377364 142614699 143815652 142289508 142710784 143300273 141350474 144152323 143198965 142764987 142850270 141235715 141788301 141679753 142153258 142320736 142127307 142563887 144535033 142493813 142793880 143300169 142559006 141733655 143265349 142526494 142461888 142829692 144151179 143762188 142177586 142775000 142414538 142206635 ]
pool_fail: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
buf_wait: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
cpust_ms_tot: 1112103 [ 28430 20221 20731 25648 16723 19910 24275 26856 20814 24872 20305 20164 58083 21511 26531 26417 20359 26623 24550 20687 20610 24468 25780 25070 59346 25087 20223 49844 27869 49553 28396 25660 49496 20305 19578 25164 18433 58546 20199 24766 ]
cpust_ms_1s: 2375 [ 53 27 58 24 11 47 89 67 14 19 49 46 149 46 22 37 53 56 31 29 33 56 69 25 223 65 31 108 39 100 53 68 93 47 52 38 17 247 57 27 ]
cpust_ms_15s: 39402 [ 952 838 824 814 650 593 934 859 645 773 731 703 2133 630 824 830 663 858 795 633 709 852 993 938 2458 911 713 1946 1089 1804 966 888 1903 639 699 847 638 2177 776 774 ]
avg_loop_us: 5316 [ 5908 1137 5486 2366 8103 6039 7746 1018 5102 5215 1085 4219 7164 6259 7426 5932 5333 2628 5446 2185 7145 3970 4821 6480 6149 2238 5681 7066 7165 7010 7897 6157 7820 6110 6102 5334 1596 7880 5191 5049 ]
accepted: 365319028 [ 435674 575306 517728 533384 488175 437766 510490 507763 491128 532006 507162 477442 458760 404472 519758 517008 419140 505344 516450 208188291 503294 393700 34131457 461822 1877220 483616 383036 509723 86615430 432954 516664 431033 496751 375551 456498 452077 907972 453347 5030877 12862759 ]
accq_pushed: 365319028 [ 9133205 9134304 9129904 9139070 9141388 9138761 9124151 9133866 9135914 9140080 9131952 9129206 9126114 9126990 9133954 9138893 9134357 9135634 9140727 9136910 9137826 9132996 9135314 9127817 9136736 9142949 9132186 9126655 9126153 9132974 9128604 9131125 9127472 9133499 9122161 9124543 9138617 9135523 9133307 9127191 ]
accq_full: 0 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
accq_ring: 11 [ 0 0 0 0 0 2 0 0 0 1 0 0 1 0 0 3 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 ]
fd_takeover: 452457379 [ 11374899 11237811 11349545 11276630 11504512 11265720 11415498 11387715 11428782 11255168 11292484 11347736 11205319 11454054 11382121 11310176 11320002 11160996 11181612 11193491 11214749 11223432 11223015 11240286 11546134 11332584 11302296 11470169 11233043 11216323 11389909 11318646 11275234 11208490 11436049 11450646 11210484 11319422 11275365 11226832 ]

perf top 1

haproxy_1

perf top 2

haproxy_2

CPU usage

grafik

HaProxy metrics

grafik

There were no health checks and no set-priority-class. Even after the backend service itself recovered, HaProxy continued to use a lot of CPU and send a lot of 5xx responses. I had to restart it to get back to normal.

@wtarreau
Copy link
Member

Thanks for the new details.

There's a huge amount of time spent in stktable_lookup_key(). This one is used for looking up entries in stick tables, for example on track-sc rules. So you very likely have some hot paths in your config that result in extra costs for high connection rates like the one that was provoked by the loss of the server. Maybe you end up storing a lot of information (counters etc) in your stick-tables for each new connection and you're hitting a performance wall.

@phihos
Copy link
Contributor Author

phihos commented Jul 13, 2022

Thank you for looking at the details. I also posted my config further up. Do you have a hunch about which stick-table usage causes this? The ones that are always used are these:

backend http-request-rate-long
  stick-table type ip size 100K expire 30s peers company_peers store http_req_rate(30s),http_err_rate(30s),bytes_out_rate(30s)

backend http-request-rate-short
  stick-table type ip size 100K expire 1s peers company_peers store http_req_rate(1s),http_err_rate(1s),bytes_out_rate(1s)

There are a lot of ACLs using these frequently. But they work fine for 10k requests/sec as long as all backends are up. I do not know what ACLs or other statements cause a spike in lookups when a backend is down.

Do you know how to debug this?

@wtarreau
Copy link
Member

If you have many ACLs referencing the entries, it will depend on how your ACLs are written. For example, if you perform a lookup for each rule, the cost can start to accumulate. Maybe you could have a few initial rules that retrieve the values you need and place them into variables, then have your ACLs only use these variables? Or maybe you should use track-sc and make sure to always refer to the values using sc-* only. It's very hard to say without seeing a more detailed config example. As a general rule, keep in mind that any lookup in a table from a value comes with a cost, small but real. If you use track-sc, a single lookup is performed, then a reference to the table entry is held, so the sc* rules simply follow that reference without having to perform another lookup. I don't know if my explanation is clear :-)
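
Roughly along these lines (just a sketch of the idea, with made-up frontend/variable names, actions and thresholds):

frontend fe_main
  # option 1: perform the lookup once, store the result in a variable...
  http-request set-var(txn.err_rate_short) src_http_err_rate(http-request-rate-short)
  # ...and have all subsequent ACLs test the variable instead of the table
  http-request deny if { var(txn.err_rate_short) -m int gt 100 }

  # option 2: track once, then only ever use sc_* fetches (no further lookups)
  http-request track-sc1 src table http-request-rate-short
  http-request deny if { sc_http_err_rate(1) gt 100 }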

@phihos
Copy link
Contributor Author

phihos commented Jul 13, 2022

@wtarreau Oh, there is a misunderstanding: I already posted a full example. I left nothing out except a few values. But the ACLs and the way I retrieve values are all there. Could you have another look and confirm whether my config is indeed doing it in a very inefficient way?

So you say that for example src_http_err_rate and sc_http_err_rate do the same thing, but the latter is more performant for future lookups? If so I will replace all fetches with the sc_ variants.

@wtarreau
Copy link
Member

Sorry, I hadn't seen that part (the dumps are quite long and it's easy to miss some info while scrolling).

The dump you found in your logs is instructive. It's not systemd that killed the process, it's the process's watchdog that found that one thread was spinning for too long without making any progress. I was able to resolve the address; it points to this stick-table lock in the peers code:

1603                    HA_SPIN_LOCK(STK_TABLE_LOCK, &st->table->lock);

We know that stick-table accesses are still heavy (no R/W locks, only spin locks even for pure lookups, which certainly needs to be improved). Peers synchronization definitely takes quite a share of the lock bandwidth there. And your config example shows that there are indeed a significant number of rules dereferencing the stick-tables.

What I suspect, in fact, is that during a reload the old process connects to the new one to teach it its stick-table contents, and that the CPU peak you're seeing corresponds to such a reload: the peers that are flooding the stick-tables are competing with many other accesses, and some threads take way too much time to make forward progress. In addition, the reload is the moment where the old and new processes' threads are also competing with each other, often making the situation worse.

This doesn't necessarily explain the wave of 5xx that you're noticing, but we should be prudent before being certain which ones they are (ideally, finding some in the logs would help so that we know whether they are, say, 503s, and can confirm from the termination flags, times, and number of retries what happened). For example, the process could be hindered by the other one, or by the time wasted dealing with a rush of updates, and some short timeouts could be hit, e.g. timeout queue, if the locking slows everything down.

Thus I think you're not facing a bug in the strict sense of the term but a design limitation that's in the process of being addressed for 2.7 (and admittedly, for the user it's a bug in that it doesn't do what you'd rightfully expect it to). However, the good news is that there are ways to significantly improve this.

First, half of your ACLs are performing stick-table lookups, 25% for table "long" and 25% for table "short" (14 each per request, to be precise). Given that you're always using these two tables with the same key (src), and you are already tracking them for the same key in sc0 and sc1 respectively, everywhere you use src_http_req_rate(http-request-rate-short) you should replace it with sc_http_req_rate(1), etc. In short, replace src_xxxx(short) with sc_xxxx(1), and src_xxxx(long) with sc_xxxx(0). This will turn your total of 30 stick-table lookups per request into only 2 and will significantly relieve the process!
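
Concretely, with your table names, the substitution looks like this (the action and threshold are only examples):

  # before: a fresh stick-table lookup on every evaluation
  http-request deny if { src_http_err_rate(http-request-rate-short) gt 100 }

  # after: reuse the entry already tracked in sc1 (short) / sc0 (long), no extra lookup
  http-request deny if { sc_http_err_rate(1) gt 100 }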

Second, the function peers_send_teachmsgs() is a mess because it tries to send lots of messages, taking a lock for each of them. It will proceed until it either reaches the end or fills a whole buffer. And with your extremely large 16MB buffers (1000 times the standard ones!) it can definitely loop for a long while! So what we need to do is to limit the number of messages processed at once in this function, and also use a trylock instead of a mandatory lock once we've got at least one update, so that if there is contention, the operation is only postponed. I'd indeed prefer to see the peers synchronize a bit slower under load than to see huge latencies added because of this!

Third, and related to the second point above, do you really need 16 MB buffers? I mean, it's by a very large margin the largest value we've seen in 20 years, and even for the CPU's cache efficiency it's far from optimal. Those who use extremely large buffers for body analysis sometimes go as far as 1 MB. And for fast transfers, 64kB is already optimal.

Thus there are two parts for you and one for us :-) I'm marking the issue as "waiting feedback" to help us find it, please let us know once you've performed the possible updates, and check the effect with perf top to compare. I'm having a look at the ugly function now.

@wtarreau wtarreau added status: feedback required The developers are waiting for a reply from the reporter. and removed status: needs-triage This issue needs to be triaged. labels Jul 19, 2022
@phihos
Copy link
Contributor Author

phihos commented Jul 19, 2022

Hi, thank you very much for this extensive answer!

First, half of your ACLs are performing stick-table lookups, 25% for table "long" and 25% for table "short" (14 each per request, to be precise). Given that you're always using these two tables with the same key (src), and you are already tracking them for the same key in sc0 and sc1 respectively, everywhere you use src_http_req_rate(http-request-rate-short) you should replace it with sc_http_req_rate(1), etc. In short, replace src_xxxx(short) with sc_xxxx(1), and src_xxxx(long) with sc_xxxx(0). This will turn your total of 30 stick-table lookups per request into only 2 and will significantly relieve the process!

I thought as much from your previous answer and already implemented that. It certainly helped with the robustness. I had another smaller incident today with high CPU load when the backend's maxconn was reached and did another perf top:
19-07-2022-haproxy-perf-top

This time I think the incident with high CPU load came from the rather high timeout server 30s. I think setting this to 5s helped but I could not really confirm this yet.

Third, and related to the second point above, do you really need 16 MB buffers? I mean, it's by a very large margin the largest value we've seen in 20 years, and even for the CPU's cache efficiency it's far from optimal. Those who use extremely large buffers for body analysis sometimes go as far as 1 MB. And for fast transfers, 64kB is already optimal.

I probably misunderstood the docs at this point. Some requests might become that large and I thought I would have to increase this value to prevent them from being rejected. Your answer suggests that this is not the case. I was also not aware that this contributes to congestion. Should I set it back to the default, or maybe default + tune.maxrewrite?

Edit: Just out of curiosity: What is the purpose of the src_xxx lookup when sc_xxx is more performant? I think that is not really clarified in the docs or did I miss something there?

@wtarreau
Copy link
Member

I had another smaller incident today with high CPU load when the backend's maxconn

Ah, very interesting, then it starts to get clearer in my head. Dequeuing connections to a saturated server also involves some extra CPU cost with threads. It got way better in 2.6 than before, but it still comes with a cost (the queue is ordered by priority, so there's a bit of serialization there). I suspect that you've been facing a combination of heavy stick-table lookups, peers messages, and dequeuing cost.

This time I think the incident with high CPU load came from the rather high timeout server 30s. I think setting this to 5s helped but I could not really confirm this yet.

It will not be related to that, though it might be due to a sudden rush on the queue that reached a point of no return (why is another story).

In your "perf top" output it now shows that the CPU is spent in a syscall, very likely send(), which is somewhat a good sign as it indicates that your efforts on the configuration have paid since it's no longer the haproxy process that's eating CPU cycles. However I'm concerned by the huge cost for a send() and it reminds me something I observed a while ago, which forces me to ask an embarrassing question... Is this an AMD EPYC ? I'm asking this because that's exactly the type of things I've observed there, due to the core complexes really being distinct CPUs with interconnects between their respective caches, and making all of them work together costs a lot. I've had more performance by using all the CPUs and threads of a single core complex than by using them all together.

Note that another possible explanation for the high cost in a send() syscall is the large buffers. A send() would need to allocate socket buffers and to copy all that data, which can take quite some time. Most likely smaller buffers will cause less overhead (how much less is difficult to say).

Regarding bufsize:

I probably misunderstood the docs at this point. Some requests might become that large and I though I would have to increase this value to prevent them from being rejected. Your answer suggests that this is not the case.

Indeed, that's absolutely not the case. Haproxy takes care of buffering as little as possible to keep latency as low as possible, so a request (or a response) flows directly between the client and the server through it, and only the minimum possible amount is kept in buffers. The reason why the buffer is configurable is that the whole headers from a request or response message must fit into a buffer, and some people receive very large headers (e.g. kerberos cookies) which can be up to 32 or even 64 kB sometimes. In this case they have to increase it so that the whole headers can be received at once. And finally the rare ones doing some content analysis (e.g. using some of the experimental WAF plugins) may need to receive enough of a request to give a verdict, thus again they'll need to have larger buffers.

I also was not aware that this also contributes to congestion.

Just to give you a hint, a CPU's L2 cache is still fast, the L3 is shared and very slow. Thus the more operations you can perform within your L2 cache (typically 512 kB to 2 MB) the better. Operations you perform that cannot fit there will trash all that was there and push it to the slower L3, causing lots of reloads (cache misses). That's why it's important to remain stingy with memory usage ;-)

Should I set that back to the default or maybe default + tune.maxrewrite?

Just comment it out and comment tune.maxrewrite as well. The buffer size will be 16kB and maxrewrite will be 1kB, which means that there will always be room to receive up to 15 kB of headers (that's roughly 8 full screens of HTTP headers), and add up to 1 kB of extra header from your application-specific rules on top of that. It's been a very long time since anyone complained about buffer sizes, so I really think you shouldn't worry about this for regular HTTP traffic.
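
In other words, something like this in the global section (the commented values are the built-in defaults):

global
  # rely on the defaults: 16kB buffers with 1kB reserved for header rewrites
  # tune.bufsize    16384
  # tune.maxrewrite 1024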

Just out of curiosity: What is the purpose of the src_xxx lookup when sc_xxx is more performant? I think that is not really clarified in the docs or did I miss something there?

src doesn't require tracking, so you can perform table lookups before tracking for example. You could imagine looking up very early in tcp-request connection and only track in HTTP rules, just to give one example. One trick that is less known is that when you're tracking, with sc you can look up the same key that you're already tracking, but in another table. Sometimes it's convenient with the gpt/gpc stuff where you can set/get tags/counters for a specific key. It will cost a lookup again but that's convenient when you've first computed a complex key that you don't want to recompute.
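
To illustrate (only a sketch; the frontend and table names here are invented):

frontend fe_main
  # src_* works without any tracking, so it can be used very early...
  tcp-request connection reject if { src_http_err_rate(abuse) gt 1000 }
  # ...while tracking can start later, in the HTTP rules
  http-request track-sc0 src table abuse
  # and an sc fetch can look up the tracked key in another table of the same key type
  http-request deny if { sc_get_gpt0(0,blocklist) eq 1 }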

@phihos
Copy link
Contributor Author

phihos commented Jul 19, 2022

However, I'm concerned by the huge cost of a send(), and it reminds me of something I observed a while ago, which forces me to ask an embarrassing question... Is this an AMD EPYC?

OMG, it is! It's an AMD EPYC 7443. I need to get new hardware, I guess 😓

The reason why the buffer is configurable is that the whole headers from a request or response message must fit into a buffer, and some people receive very large headers (e.g. kerberos cookies) which can be up to 32 or even 64 kB sometimes.

Sadly this is the case. Due to an overly long URL parameter the header can become as big as 128 KiB. This is why I edited these parameters in the first place.

Just to give you a hint, a CPU's L2 cache is still fast, the L3 is shared and very slow. Thus the more operations you can perform within your L2 cache (typically 512 kB to 2 MB) the better. Operations you perform that cannot fit there will trash all that was there and push it to the slower L3, causing lots of reloads (cache misses). That's why it's important to remain stingy with memory usage ;-)

That totally makes sense the way you put it. Thank you for taking the time to explain that to me :-)

src doesn't require tracking, so you can perform table lookups before tracking for example. You could imagine looking up very early in tcp-request connection and only track in HTTP rules, just to give one example. One trick that is less known is that when you're tracking, with sc you can look up the same key that you're already tracking, but in another table. Sometimes it's convenient with the gpt/gpc stuff where you can set/get tags/counters for a specific key. It will cost a lookup again but that's convenient when you've first computed a complex key that you don't want to recompute.

Come to think of it: I got that from a non-official blogpost and did not think too much about what the difference might be. Maybe a hint in the docs about the performance implications might prevent others from making the same mistake.

@wtarreau
Copy link
Member

It's an AMD EPYC 7443. I need to get new hardware, I guess

No, I don't think you need to change your hardware. You may find that your workload runs perfectly fine on a single core complex. You'll have to run "lscpu -e" and pick only the CPUs that share a single L3 value, or maybe two L3 values if you really need more capacity.
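
For example, if "lscpu -e" showed CPUs 0-11 sharing one L3 (the exact ranges depend on your machine), you could pin all threads to that single core complex along these lines:

global
  # run 12 threads, bound to the 12 CPUs that share one L3 cache
  nbthread 12
  cpu-map auto:1/1-12 0-11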

Also, one of the goals of the thread-group work I'm currently doing is precisely to perform much better on such machines. On ours (a 74F3, possibly very close to yours), I already increased performance by 45% in 2.7. So it's a matter of finding the right tuning to let you wait for 2.7 :-)

Sadly this is the case. Due to an overly long URL parameter the header can become as big as 128 KiB. This is why I edited these parameters in the first place.

OK, no problem, then just set it to 128kB. That's obviously ugly, but if both the client and the server agree to process it, why not! It will always be much better than 16 MB! By the way, I've made progress on limiting the number of messages processed at once in the peers (making it configurable), but figured that if I set it too low we can trigger the loop protection, indicating that something is still imperfect there, so I'd first want to figure out how to deal with this better. But I could deliver you a patch if you see more crashes in the peers code.
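
For the 128kB buffers mentioned above, that is simply:

global
  # 128kB buffers to fit the occasional very large request headers
  tune.bufsize 131072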

Come to think of it: I got that from a non-official blogpost and did not think too much about what the difference might be. Maybe a hint in the docs about the performance implications might prevent others from making the same mistake

That's a good idea. It's just that it's terribly difficult to document hints in a configuration doc, especially when you think you know what you're doing. Feel free to recheck the areas you think you visited and suggest a proposal to improve that. We could also add some entries in the wiki about how to optimize rules, but again it's very long and tedious work that needs to be collaborative to be efficient.

@phihos
Copy link
Contributor Author

phihos commented Aug 3, 2022

Hi @wtarreau,

I did my homework (in fact I did it the very next day, but I wanted to make sure it really works) and I just wanted to say thank you and it works very well now! CPU usage is rock solid on the same level now.

You'll have to run "lscpu -e" and pick only the CPUs that share a single L3 value, or maybe two L3 values if you really need more capacity.

I took two L3 caches and 24/48 cores. Per-core utilization went from ~20% on 48 to ~35-40%. Not sure if that helped but it did not hurt either.

By the way, I've made progress on limiting the number of messages processed at once in the peers (making it configurable), but figured that if I set it too low we can trigger the loop protection, indicating that something is still imperfect there, so I'd first want to figure out how to deal with this better. But I could deliver you a patch if you see more crashes in the peers code.

That is great news! An intermediate patch will not be necessary though. For now I disabled syncing to large tables and wait for the next release to turn it on again.

Feel free to recheck the areas you think you visited and suggest a proposal to improve that.

What about we turn this

src_bytes_in_rate([<table>]) : integer
  Returns the average bytes rate from the incoming connection's source address
  in the current proxy's stick-table or in the designated stick-table, measured
  in amount of bytes over the period configured in the table. If the address is
  not found, zero is returned. See also sc/sc0/sc1/sc2_bytes_in_rate.

into this

src_bytes_in_rate([<table>]) : integer
  Returns the average bytes rate from the incoming connection's source address
  in the current proxy's stick-table or in the designated stick-table, measured
  in amount of bytes over the period configured in the table. If the address is
  not found, zero is returned. See also sc/sc0/sc1/sc2_bytes_in_rate (more performant when using sticky counters).

I just changed the last sentence. I definitely read these kinds of sections when trying to understand how to get data for a src IP. I think I settled on the src_ variant because it sounded more like the legitimate way to look up by src IP. A hint like the one in the parentheses would have helped. I think each of these sections has a "see also" sentence at the end that could be extended that way. I know the wording is not perfect, but I hope you get the idea.

Again, thank you very much for all your help. You are all building a great product!

@wtarreau
Copy link
Member

wtarreau commented Aug 3, 2022

I did my homework (in fact I did it the very next day, but I wanted to make sure it really works) and I just wanted to say thank you and it works very well now! CPU usage is rock solid on the same level now.
I took two L3 caches and 24/48 cores. Per-core utilization went from ~20% on 48 to ~35-40%. Not sure if that helped but it did not hurt either.

Oh, that's excellent news! Thanks for the feedback! Regarding the peers stuff, there has been too much context-switching since last time and it completely went out of my head :-( I'll have to get back to it.

Your suggestion for the doc makes sense; it could indeed be sufficient like this. We'll handle it, thank you!

@wtarreau wtarreau added status: reviewed This issue was reviewed. A fix is required. and removed status: feedback required The developers are waiting for a reply from the reporter. labels Aug 3, 2022
@wtarreau
Copy link
Member

wtarreau commented Aug 3, 2022

So I'm tagging as reviewed and summarizing what's left to be done for simplicity:

  • update doc according to proposal above
  • rework patch to apply limit to number of messages processed by peers at once so that we don't trigger the loop protection anymore with small limits.

haproxy-mirror pushed a commit that referenced this issue Aug 23, 2022
As seen in GH issue #1770, peers synchronization do not cope well with
very large buffers because by default the only two reasons for stopping
the processing of updates is either that the end was reached or that
the buffer is full. This can cause high latencies, and even rightfully
trigger the watchdog when the operations are numerous and slowed down
by competition on the stick-table lock.

This patch introduces a limit to the number of messages one may send
at once, which now defaults to 200, regardless of the buffer size. This
means taking and releasing the lock up to 400 times in a row, which is
costly enough to let some other parts work.

After some observation this could be backported to 2.6. If so, however,
previous commits "BUG/MEDIUM: applet: fix incorrect check for abnormal
return condition from handler" and "BUG/MINOR: applet: make the call_rate
only count the no-progress calls" must be backported otherwise the call
rate might trigger the looping protection.
FireBurn pushed a commit to FireBurn/haproxy that referenced this issue Sep 21, 2022
As seen in GH issue haproxy#1770, peers synchronization do not cope well with
very large buffers because by default the only two reasons for stopping
the processing of updates is either that the end was reached or that
the buffer is full. This can cause high latencies, and even rightfully
trigger the watchdog when the operations are numerous and slowed down
by competition on the stick-table lock.

This patch introduces a limit to the number of messages one may send
at once, which now defaults to 200, regardless of the buffer size. This
means taking and releasing the lock up to 400 times in a row, which is
costly enough to let some other parts work.

After some observation this could be backported to 2.6. If so, however,
previous commits "BUG/MEDIUM: applet: fix incorrect check for abnormal
return condition from handler" and "BUG/MINOR: applet: make the call_rate
only count the no-progress calls" must be backported otherwise the call
rate might trigger the looping protection.

(cherry picked from commit 8bd146d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
@capflam capflam added type: doc This issue is about the documentation. and removed type: bug This issue describes a bug. labels Sep 21, 2022
@wtarreau wtarreau added status: fixed This issue is a now-fixed bug. 2.6 This issue affects the HAProxy 2.6 stable branch. and removed status: reviewed This issue was reviewed. A fix is required. labels Sep 27, 2022
@wtarreau wtarreau removed the 2.6 This issue affects the HAProxy 2.6 stable branch. label Nov 25, 2022
@wtarreau
Copy link
Member

This was already backported to 2.6.6 as commit dd9b366 so there's no need to keep it open anymore.

Labels
status: fixed This issue is a now-fixed bug. type: doc This issue is about the documentation.