Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSL Session ID resumption broken in 2.4 #1297

Closed
lukastribus opened this issue Jun 20, 2021 · 25 comments
Closed

SSL Session ID resumption broken in 2.4 #1297

lukastribus opened this issue Jun 20, 2021 · 25 comments
Labels
severity: medium This issue is of MEDIUM severity. status: fixed This issue is a now-fixed bug. subsystem: ssl This issue is within the SSL / TLS subsystem. type: bug This issue describes a bug.

Comments

@lukastribus
Copy link
Member

lukastribus commented Jun 20, 2021

Detailed description of the problem

SSL Session ID resumption (<= TLSv1.2) is broken in haproxy 2.4 and later since c4bfa59 @capflam

Reported by Shawn Heisey on the ML:
https://www.mail-archive.com/haproxy@formilux.org/msg40737.html

Expected behavior

SSL Session ID resumption working fine.

Steps to reproduce the behavior

  1. use post c4bfa59 code (any 2.4 release) and TLSv1.2
  2. test SSL session resumption either by:

Do you have any idea what may have caused this?

c4bfa59 but unsure why.

Do you have an idea how to solve the issue?

No.

What is your configuration?

frontend a
 mode http
 bind :443 ssl crt /root/hapcert alpn h2,http/1.1

Output of haproxy -vv and uname -a

root@ubuntu-16gb-hel1-1:~/haproxy-2.4# ./haproxy -vv
HAProxy version 2.4.1 2021/06/17 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2026.
Known bugs: http://www.haproxy.org/bugs/bugs-2.4.1.html
Running on: Linux 4.15.0-142-generic #146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_OPENSSL=1
  DEBUG   =

Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 -CLOSEFROM -ZLIB +SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS -OT -QUIC -PROMEX -MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=4).
Built with OpenSSL version : OpenSSL 1.1.1  11 Sep 2018
Running on OpenSSL version : OpenSSL 1.1.1  11 Sep 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built without PCRE or PCRE2 support (using libc's regex instead)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 7.5.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTTP       side=FE|BE     mux=H2       flags=HTX|CLEAN_ABRT|HOL_RISK|NO_UPG
            fcgi : mode=HTTP       side=BE        mux=FCGI     flags=HTX|HOL_RISK|NO_UPG
       <default> : mode=HTTP       side=FE|BE     mux=H1       flags=HTX
              h1 : mode=HTTP       side=FE|BE     mux=H1       flags=HTX|NO_UPG
       <default> : mode=TCP        side=FE|BE     mux=PASS     flags=
            none : mode=TCP        side=FE|BE     mux=PASS     flags=NO_UPG

Available services : none

Available filters :
        [SPOE] spoe
        [CACHE] cache
        [FCGI] fcgi-app
        [COMP] compression
        [TRACE] trace

root@ubuntu-16gb-hel1-1:~/haproxy-2.4# uname -a
Linux ubuntu-16gb-hel1-1 4.15.0-142-generic #146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu-16gb-hel1-1:~/haproxy-2.4#

If HAProxy crashed: Last outputs and backtraces

no crash

Additional information (if helpful)

root@ubuntu-16gb-hel1-1:~/haproxy-2.4# git bisect log
git bisect start
# good: [3a00c915fd241fc398a080a11ccac9c5c46791ce] [RELEASE] Released version 2.2.0
git bisect good 3a00c915fd241fc398a080a11ccac9c5c46791ce
# bad: [6cbbecf09734aeb5fa8bb88f36f06a6f6d35e813] [RELEASE] Released version 2.4.0
git bisect bad 6cbbecf09734aeb5fa8bb88f36f06a6f6d35e813
# bad: [d808f1759dcb9fcf28b68f835b56e9c0caddddcc] BUG/MINOR: stats: Continue to fill frontend stats on unimplemented metric
git bisect bad d808f1759dcb9fcf28b68f835b56e9c0caddddcc
# good: [2fbe6940f4cce550928fd665df4a48b9d0df99b5] MINOR: stats: indicate the number of servers in a backend's status
git bisect good 2fbe6940f4cce550928fd665df4a48b9d0df99b5
# good: [c1c66a4759f34665d532deac9efa2571bc5e63ca] MINOR: mux-h1: rework the h1_timeout_task() function
git bisect good c1c66a4759f34665d532deac9efa2571bc5e63ca
# bad: [26c49d9eb07fceba22ac2410a5196ff5d99c9c6e] MINOR: quic: Add traces to congestion avoidance NewReno callback.
git bisect bad 26c49d9eb07fceba22ac2410a5196ff5d99c9c6e
# bad: [2ded48dd27c2c57128921ae94d0b2a68162719e8] MINOR: connection: make conn_sock_drain() use the control layer's ->drain()
git bisect bad 2ded48dd27c2c57128921ae94d0b2a68162719e8
# bad: [6884742c651e65d065be9fc10cfaf258dae353f0] DOC: spoa/python: Fixing typo in IP related error messages
git bisect bad 6884742c651e65d065be9fc10cfaf258dae353f0
# bad: [da831fa068a1cc115066d597fc94a3a1bfad0c9d] CLEANUP: http-ana: Remove TX_WAIT_NEXT_RQ unsued flag
git bisect bad da831fa068a1cc115066d597fc94a3a1bfad0c9d
# good: [4c8ad8423268a059e112cad5080d3061bb8e72f2] MINOR: mux: Add a ctl parameter to get the exit status of the multiplexers
git bisect good 4c8ad8423268a059e112cad5080d3061bb8e72f2
# bad: [d5ac6de74aa4d0bbf06cbb0b4e1bbeee09d02038] DOC: config: Add notes about errors emitted by H1 mux
git bisect bad d5ac6de74aa4d0bbf06cbb0b4e1bbeee09d02038
# good: [c18fc234d975775e3d3e0b69a692c222ae4172bb] MINOR: mux-h1: Add functions to send HTTP errors from the mux
git bisect good c18fc234d975775e3d3e0b69a692c222ae4172bb
# bad: [c4bfa59f1d534f226d309c9932c0d35b279cffe7] MAJOR: mux-h1: Create the client stream as later as possible
git bisect bad c4bfa59f1d534f226d309c9932c0d35b279cffe7
# first bad commit: [c4bfa59f1d534f226d309c9932c0d35b279cffe7] MAJOR: mux-h1: Create the client stream as later as possible
root@ubuntu-16gb-hel1-1:~/haproxy-2.4#

This is good:

root@ubuntu-16gb-hel1-1:~# openssl s_client -connect ubuntu-16gb-hel1-1.ltri.eu:443 -reconnect -no_ticket -servername ubuntu-16gb-hel1-1.ltri.eu -tls1_2 2>/dev/null | grep "Cipher is"
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Reused, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Reused, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Reused, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Reused, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Reused, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
^C
root@ubuntu-16gb-hel1-1:~#

This is bad:

root@ubuntu-16gb-hel1-1:~# openssl s_client -connect ubuntu-16gb-hel1-1.ltri.eu:443 -reconnect -no_ticket -servername ubuntu-16gb-hel1-1.ltri.eu -tls1_2 2>/dev/null | grep "Cipher is"
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
^C
@lukastribus lukastribus added type: bug This issue describes a bug. dev This issue affects the HAProxy development branch. severity: medium This issue is of MEDIUM severity. subsystem: ssl This issue is within the SSL / TLS subsystem. status: reviewed This issue was reviewed. A fix is required. 2.4 This issue affects the HAProxy 2.4 stable branch. labels Jun 20, 2021
@wtarreau
Copy link
Member

Now I think I've found why it worked for me:

OpenSSL 1.0.2:

New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Reused, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Reused, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Reused, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Reused, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Reused, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384

OpenSSL 1.1.1:

New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384

OpenSSL 1.0.2 above doesn't use TLSv1.2, so based on your bisect it would indicate that it's resumption in TLSv1.2 only that change after the bisected patch. Something probably depends on timing :-/

@lukastribus
Copy link
Member Author

The bisected commit in itself doesn't really touch SSL code, so I guess a timing issue makes sense (unfortunately).

@chipitsine
Copy link
Member

Using testssl.sh seem to be good thing to create automated tests

@kirankumar1991
Copy link

kirankumar1991 commented Jul 13, 2021

I've noticed this issue with HAProxy versions > 2.2.0 built with OpenSSL 1.1.1k. However, if I rebuild the same HAProxy versions with OpenSSL 1.0.2k-fips, then session resumption works fine.

HAProxy 2.2.8 built with OpenSSL 1.0.2k-fips

HA-Proxy version 2.2.8-7bf78d7 2021/01/13 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2025.
Known bugs: http://www.haproxy.org/bugs/bugs-2.2.8.html
Running on: Linux 3.10.0-1160.25.1.el7.x86_64 #1 SMP Wed Apr 28 21:49:45 UTC 2021 x86_64
Build options :
  TARGET  = linux-glibc-legacy
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_PCRE=1 USE_PCRE_JIT=1 USE_STATIC_PCRE=1 USE_OPENSSL=1 USE_SLZ=1
  DEBUG   =

Feature list : +EPOLL -KQUEUE +NETFILTER +PCRE +PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED -BACKTRACE +STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 -CLOSEFROM -ZLIB +SLZ +CPU_AFFINITY -TFO -NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=3).
Built with OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.43 2019-02-23
Running on PCRE version : 8.43 2019-02-23
PCRE library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 4.8.5 20150623 (Red Hat 4.8.5-44)

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
            fcgi : mode=HTTP       side=BE        mux=FCGI
       <default> : mode=HTTP       side=FE|BE     mux=H1
              h2 : mode=HTTP       side=FE|BE     mux=H2
       <default> : mode=TCP        side=FE|BE     mux=PASS

Available services : none

Available filters :
	[SPOE] spoe
	[COMP] compression
	[TRACE] trace
	[CACHE] cache
	[FCGI] fcgi-app

HAProxy 2.2.8 built with OpenSSL 1.1.1k

Status: long-term supported branch - will stop receiving fixes around Q2 2025.
Known bugs: http://www.haproxy.org/bugs/bugs-2.2.8.html
Running on: Linux 3.10.0-1160.25.1.el7.x86_64 #1 SMP Wed Apr 28 21:49:45 UTC 2021 x86_64
Build options :
  TARGET  = linux-glibc-legacy
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_PCRE=1 USE_PCRE_JIT=1 USE_STATIC_PCRE=1 USE_OPENSSL=1 USE_SLZ=1
  DEBUG   =

Feature list : +EPOLL -KQUEUE +NETFILTER +PCRE +PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED -BACKTRACE +STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 -CLOSEFROM -ZLIB +SLZ +CPU_AFFINITY -TFO -NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=3).
Built with OpenSSL version : OpenSSL 1.1.1k  25 Mar 2021
Running on OpenSSL version : OpenSSL 1.1.1k  25 Mar 2021
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.43 2019-02-23
Running on PCRE version : 8.43 2019-02-23
PCRE library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 4.8.5 20150623 (Red Hat 4.8.5-44)

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
            fcgi : mode=HTTP       side=BE        mux=FCGI
       <default> : mode=HTTP       side=FE|BE     mux=H1
              h2 : mode=HTTP       side=FE|BE     mux=H2
       <default> : mode=TCP        side=FE|BE     mux=PASS

Available services : none

Available filters :
	[SPOE] spoe
	[COMP] compression
	[TRACE] trace
	[CACHE] cache
	[FCGI] fcgi-app

I did git bisect and narrowed it down to the following commit, which is causing the sessions not to resume.

git bisect start
# bad: [7350448be5ee509ed1ef94250e36338b234d1320] BUG/MAJOR: pools: second fix for incomplete backport of lockless pool fix
git bisect bad 7350448be5ee509ed1ef94250e36338b234d1320
# good: [e54b43af1ec9dc656c04b09118583cb3c2ce56fa] [RELEASE] Released version 2.1.0
git bisect good e54b43af1ec9dc656c04b09118583cb3c2ce56fa
# bad: [62f79fe68a87c66e44bc172959f03f810572e4fa] MEDIUM: checks: Make post-41 the default mode for mysql checks
git bisect bad 62f79fe68a87c66e44bc172959f03f810572e4fa
# bad: [0643b0e7e60278fecaed1ab05fe9ba1a7d6567a5] MINOR: proxy: Make `header_unique_id` a `struct ist`
git bisect bad 0643b0e7e60278fecaed1ab05fe9ba1a7d6567a5
# good: [4450b587dd3065d22678d537d1082ec2567e98a8] MINOR: connection: remove CO_FL_SSL_WAIT_HS from CO_FL_HANDSHAKE
git bisect good 4450b587dd3065d22678d537d1082ec2567e98a8
# good: [6328b73d2884896c48566a375ee83a2155a7f839] BUILD: enable ERR=1 in github cygwin builds
git bisect good 6328b73d2884896c48566a375ee83a2155a7f839
# good: [6f507c7c5d51d589347dfcb3a50721e095a1b2ba] MINOR: ssl: resolve ocsp_issuer later
git bisect good 6f507c7c5d51d589347dfcb3a50721e095a1b2ba
# good: [0214b45a61cd7cfd59d729b23f497a687192cde6] MINOR: debug: call backtrace() once upon startup
git bisect good 0214b45a61cd7cfd59d729b23f497a687192cde6
# good: [8de5c4fa15d612bd70456881f8fef13b8fe37e2a] MEDIUM: connection: only call ->wake() for connect() without I/O
git bisect good 8de5c4fa15d612bd70456881f8fef13b8fe37e2a
# bad: [cfca1422c7de05e34b72d2d57bc057b215aa9f8f] MINOR: ssl: reach a ckch_store from a sni_ctx
git bisect bad cfca1422c7de05e34b72d2d57bc057b215aa9f8f
# bad: [127a74dd48035b22c2273eddfae3713a5a65d053] MINOR: stream: Add stream_generate_unique_id function
git bisect bad 127a74dd48035b22c2273eddfae3713a5a65d053
# bad: [2c1f37d3537c725e87c728bd2c5e7d3b380cdd37] OPTIM: mux-h1: subscribe rather than waking up at a few other places
git bisect bad 2c1f37d3537c725e87c728bd2c5e7d3b380cdd37
# good: [6f95f6e111304c40fd18eee78a7e3372596d599f] OPTIM: connection: disable receiving on disabled events when the run queue is too high
git bisect good 6f95f6e111304c40fd18eee78a7e3372596d599f
# first bad commit: [2c1f37d3537c725e87c728bd2c5e7d3b380cdd37] OPTIM: mux-h1: subscribe rather than waking up at a few other places

http://git.haproxy.org/?p=haproxy-2.2.git;a=commitdiff;h=2c1f37d3537c725e87c728bd2c5e7d3b380cdd37

I tried reverting the changes in the above commit and noticed that sessions were resumed again. I'm still figuring out my way around the HAProxy code and I have no idea how this change in the connection multiplexer logic is affecting the SSL session resumption in HAProxy versions > 2.2.0 built using OpenSSL 1.1.1k. Any idea or pointers?

We are using 1.8.20 and considering 2.2.8 now. I'd like to know if it's safe to have the above change reverted and use it in production. Thanks for the help!

@elyograg
Copy link

elyograg commented Aug 5, 2021

Looks like the commit Kiran found (on 2.2) is much simpler than the one Lukas found (on 2.4).

I notice that both commits modified mux_h1.c ... my config is using http2 on the front end and apache logs confirm that it's also using http2 on the backend. Is that source file used for http2 as well?

@elyograg
Copy link

elyograg commented Aug 5, 2021

Looks like both qualys and testssl.sh use http1.1 for their tests. So the fact that I've got http2 configured on the frontend may not mean anything.

@elyograg
Copy link

I do wonder whether this same problem would occur with http/2, given that the commit that caused the problem seems to be related to 1.x.

@elyograg
Copy link

elyograg commented Oct 9, 2021

Looks like both qualys and testssl.sh use http1.1 for their tests. So the fact that I've got http2 configured on the frontend may not mean anything.

I was watching the haproxy log during the tests, which is where I saw HTTP 1.1. Looking at qualys results most recently, I do see that the test results talk about h2 for some of the client negotiations. No idea whether the specific tests for SSL resumption are h2 or h1.

@cmason3
Copy link

cmason3 commented Oct 14, 2021

I think I might be experiencing the same issue - I have tied this on both 2.4.7 and 2.5-dev9, but whenever I do a Qualys scan it is reporting that Session resumption (caching): No (IDs assigned but not accepted). I have also noticed that 0-RTT for HTTP/2 isn't being detected even though I have allow-0rtt in my configuration.

Using the same configuration I have also tried this with 2.3.14 and everything works fine - both session resumption as well as 0-RTT for H2. Are both of these related to the above issue?

If it helps I am using the official HAProxy Docker images for all my tests - configuration is at https://github.com/cmason3/jinjafx/blob/main/jinjafx_server/docker/haproxy.cfg

@wtarreau
Copy link
Member

Thanks all for your feedback. I know that @EmericBr and @wlallemand have found some issues regarding the closing code that ought to call the transport-layer shutdown first to store the session, so there is definitely an issue there that is still under scrutiny. I'm afraid there will be no more progress on this before next week.

@wtarreau wtarreau added status: fixed This issue is a now-fixed bug. and removed dev This issue affects the HAProxy development branch. status: reviewed This issue was reviewed. A fix is required. 2.4 This issue affects the HAProxy 2.4 stable branch. labels Nov 3, 2021
@wtarreau
Copy link
Member

wtarreau commented Nov 3, 2021

With Christopher we found some issues in the HTTP/1 and HTTP/2 muxes that caused the TLS shutdown not to be performed on keep-alive connections, often resulting in the TLS session not being committed. The one about H2 was fixed by the work on making sure the GOAWAY frame wouldn't be destroyed (as that was ruined for the same reasons, unclean shutdown). The one on H1 was fixed separately, and both were merged into 2.5-dev12 and backported to 2.4.8. As such, I'm marking this "fixed".

Anyone facing these issues is encouraged to retry with either version.

Let's keep this issue open a few more days to collect any possible feedback, after what we could probably close it.

@elyograg
Copy link

elyograg commented Nov 3, 2021

2.4.8 tested with testssl.sh ... and now it passes session resumption with both tickets and ids.

@lukastribus
Copy link
Member Author

The fix is e76b4f0 (which requires a85c522).

@wtarreau
Copy link
Member

wtarreau commented Nov 3, 2021

Ah yes, thanks for this, Lukas, indeed there's no direct mention there.

Thanks for the quick feedback @elyograg, much appreciated!

@cmason3
Copy link

cmason3 commented Nov 4, 2021

Nice one - both session resumption and 0-rtt issues have been resolved with 2.4.8 👍

I know someone has already confirmed, but I have just performed a Qualys scan against 2.4.8 - I would have responded earlier but my build scripts pull container images from Docker Hub and I was waiting for it to be updated.

Session resumption (caching) | Yes
Session resumption (tickets) | No
0-RTT enabled | Yes

@chipitsine
Copy link
Member

@cmason3 , can you share steps how you run testssl.sh?

@cmason3
Copy link

cmason3 commented Nov 4, 2021

can you share steps how you run testssl.sh?

@chipitsine, It was @elyograg who was using testssl.sh, I have been using https://www.ssllabs.com/ssltest/ to verify behaviour.

@wtarreau
Copy link
Member

wtarreau commented Nov 4, 2021

Thanks. I've just relaunched it on haproxy.org which runs on 2.5-dev12, and for us session resumption is marked as "Yes" with both tickets and caching. [edit: removed the link since it doesn't seem to cache for a long time and re-triggers a test when clicking on it]

Out of curiosity, do you know if in your case the tests were conducted over HTTP/1 or HTTP/2 ? I'm asking because all the relevant fixes from 2.5-dev12 were backported to 2.4.8, but the H2 close improvement that solved the problem in H2 was an accidental byproduct and was postponed to next version (so that we have a bit of exposure in 2.5 before risking to break anything in 2.4).

Would you by any chance be interested in testing 2.5-dev12, or testing a patch on top of your 2.4.8 to see if it improves the situation ? (I would then send you a backport of the one we intend to backport later anyway).

@cmason3
Copy link

cmason3 commented Nov 4, 2021

Just pulled the docker image for 2.5-dev12 and rerun the same test via Qualys and I get the same results as 2.4.8 for both session resumption and 0-rtt. From what you have said, do I assume 0-rtt should still be broken in 2.4.8 and only fixed in 2.5-dev12?

I am not sure if I can tell whether Qualys is using h1 or h2 for what checks as it makes a lot of connection requests to verify client support using both h1 and h2. Let me try and enable some logging in haproxy and get back to you.

@wtarreau
Copy link
Member

wtarreau commented Nov 4, 2021

Thanks for the test. No it should not be broken. I had a doubt regarding one possibility in H2 regarding one of the patches tha was not yet backported (but not related to 0-rtt either).

If for you it doesn't work with 2.5-dev12 while for me it does, there's something more subtle.

Regarding the Qualys test, I've only found HTTP/1 requests in my logs here, thus the only missing fix in 2.4 is not relevant to this either.

@mikecarlton
Copy link

Are there plans to backport this to 2.2 line? The comment on e76b4f0 recommends backporting as far back as 2.0

@wtarreau
Copy link
Member

wtarreau commented Nov 4, 2021

Then it will be done as indicated in the commit message. We just try not to mess up with backports, so most often we apply sensitive patches to the latest branches first and progressively propagate to other ones. Those on older branches expect less movement, less often changes and less risks of regressions. That's the only way to aim in that direction. But do not worry, 2.2 should arrive soon.

@chipitsine
Copy link
Member

Thanks. I've just relaunched it on haproxy.org which runs on 2.5-dev12, and for us session resumption is marked as "Yes" with both tickets and caching. [edit: removed the link since it doesn't seem to cache for a long time and re-triggers a test when clicking on it]

Out of curiosity, do you know if in your case the tests were conducted over HTTP/1 or HTTP/2 ? I'm asking because all the relevant fixes from 2.5-dev12 were backported to 2.4.8, but the H2 close improvement that solved the problem in H2 was an accidental byproduct and was postponed to next version (so that we have a bit of exposure in 2.5 before risking to break anything in 2.4).

Would you by any chance be interested in testing 2.5-dev12, or testing a patch on top of your 2.4.8 to see if it improves the situation ? (I would then send you a backport of the one we intend to backport later anyway).

I noticed that TLS1.3 is not enabled on www.haproxy.org, is it done in purpose ?

@wtarreau
Copy link
Member

wtarreau commented Nov 5, 2021

It's not on purpose, but it's running an aging centos 7 so it doesn't have it :-)

FireBurn pushed a commit to FireBurn/haproxy that referenced this issue Nov 16, 2021
While in H1 we can usually close quickly, in H2 a client might be sending
window updates or anything while we're sending a GOAWAY and the pending
data in the socket buffers at the moment the close() is performed on the
socket results in the output data being lost and an RST being emitted.

One example where this happens easily is with h2spec, which randomly
reports connection resets when waiting for a GOAWAY while haproxy sends
it, as seen in issue haproxy#1422. With h2spec it's not window updates that are
causing this but the fact that h2spec has to upload the payload that
comes with invalid frames to accommodate various implementations, and
does that in two different segments. When haproxy aborts on the invalid
frame header, the payload was not yet received and causes an RST to
be sent.

Here we're dealing with this two ways:
  - we perform a shutdown(WR) on the connection to forcefully push pending
    data on a front connection after the xprt is shut and closed ;
  - we drain pending data
  - then we close

This totally solves the issue with h2spec, and the extra cost is very
low, especially if we consider that H2 connections are not set up and
torn down often. This issue was never observed with regular clients,
most likely because this pattern does not happen in regular traffic.

After more testing it could make sense to backport this, at least to
avoid reporting errors on h2spec tests.

(cherry picked from commit 0b22247)
[cf: depends on "MINOR: connection: add a new CO_FL_WANT_DRAIN flag to force
     drain on close". This patch is in fact a bug fix, related to the issue
     haproxy#1297.]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
FireBurn pushed a commit to FireBurn/haproxy that referenced this issue Nov 24, 2021
While in H1 we can usually close quickly, in H2 a client might be sending
window updates or anything while we're sending a GOAWAY and the pending
data in the socket buffers at the moment the close() is performed on the
socket results in the output data being lost and an RST being emitted.

One example where this happens easily is with h2spec, which randomly
reports connection resets when waiting for a GOAWAY while haproxy sends
it, as seen in issue haproxy#1422. With h2spec it's not window updates that are
causing this but the fact that h2spec has to upload the payload that
comes with invalid frames to accommodate various implementations, and
does that in two different segments. When haproxy aborts on the invalid
frame header, the payload was not yet received and causes an RST to
be sent.

Here we're dealing with this two ways:
  - we perform a shutdown(WR) on the connection to forcefully push pending
    data on a front connection after the xprt is shut and closed ;
  - we drain pending data
  - then we close

This totally solves the issue with h2spec, and the extra cost is very
low, especially if we consider that H2 connections are not set up and
torn down often. This issue was never observed with regular clients,
most likely because this pattern does not happen in regular traffic.

After more testing it could make sense to backport this, at least to
avoid reporting errors on h2spec tests.

(cherry picked from commit 0b22247)
[cf: depends on "MINOR: connection: add a new CO_FL_WANT_DRAIN flag to force
     drain on close". This patch is in fact a bug fix, related to the issue
     haproxy#1297.]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit daa0e5f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
FireBurn pushed a commit to FireBurn/haproxy that referenced this issue Nov 25, 2021
While in H1 we can usually close quickly, in H2 a client might be sending
window updates or anything while we're sending a GOAWAY and the pending
data in the socket buffers at the moment the close() is performed on the
socket results in the output data being lost and an RST being emitted.

One example where this happens easily is with h2spec, which randomly
reports connection resets when waiting for a GOAWAY while haproxy sends
it, as seen in issue haproxy#1422. With h2spec it's not window updates that are
causing this but the fact that h2spec has to upload the payload that
comes with invalid frames to accommodate various implementations, and
does that in two different segments. When haproxy aborts on the invalid
frame header, the payload was not yet received and causes an RST to
be sent.

Here we're dealing with this two ways:
  - we perform a shutdown(WR) on the connection to forcefully push pending
    data on a front connection after the xprt is shut and closed ;
  - we drain pending data
  - then we close

This totally solves the issue with h2spec, and the extra cost is very
low, especially if we consider that H2 connections are not set up and
torn down often. This issue was never observed with regular clients,
most likely because this pattern does not happen in regular traffic.

After more testing it could make sense to backport this, at least to
avoid reporting errors on h2spec tests.

(cherry picked from commit 0b22247)
[cf: depends on "MINOR: connection: add a new CO_FL_WANT_DRAIN flag to force
     drain on close". This patch is in fact a bug fix, related to the issue
     haproxy#1297.]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit daa0e5f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 47cc04a8123dd203abc2cdceaeb86845276bb150)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
@capflam capflam closed this as completed Dec 3, 2021
FireBurn pushed a commit to FireBurn/haproxy that referenced this issue Dec 4, 2021
While in H1 we can usually close quickly, in H2 a client might be sending
window updates or anything while we're sending a GOAWAY and the pending
data in the socket buffers at the moment the close() is performed on the
socket results in the output data being lost and an RST being emitted.

One example where this happens easily is with h2spec, which randomly
reports connection resets when waiting for a GOAWAY while haproxy sends
it, as seen in issue haproxy#1422. With h2spec it's not window updates that are
causing this but the fact that h2spec has to upload the payload that
comes with invalid frames to accommodate various implementations, and
does that in two different segments. When haproxy aborts on the invalid
frame header, the payload was not yet received and causes an RST to
be sent.

Here we're dealing with this two ways:
  - we perform a shutdown(WR) on the connection to forcefully push pending
    data on a front connection after the xprt is shut and closed ;
  - we drain pending data
  - then we close

This totally solves the issue with h2spec, and the extra cost is very
low, especially if we consider that H2 connections are not set up and
torn down often. This issue was never observed with regular clients,
most likely because this pattern does not happen in regular traffic.

After more testing it could make sense to backport this, at least to
avoid reporting errors on h2spec tests.

(cherry picked from commit 0b22247)
[cf: depends on "MINOR: connection: add a new CO_FL_WANT_DRAIN flag to force
     drain on close". This patch is in fact a bug fix, related to the issue
     haproxy#1297.]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit daa0e5f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 47cc04a8123dd203abc2cdceaeb86845276bb150)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 5c6249f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
@elyograg
Copy link

@chipitsine I just cloned the git repo, changed into the new directory, and did "./testssl.sh https://server.domain.tld" and it began the test. The output looks very similar to the SSL Labs server test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
severity: medium This issue is of MEDIUM severity. status: fixed This issue is a now-fixed bug. subsystem: ssl This issue is within the SSL / TLS subsystem. type: bug This issue describes a bug.
Projects
None yet
Development

No branches or pull requests

8 participants