server resets connection after client hello (4 extra timestamp bytes) (v1.1.1f,g,k; but not on debian 11; really weird) #17140

Closed
tgoeg opened this issue Nov 25, 2021 · 9 comments
Labels
triaged: question The issue contains a question

Comments

@tgoeg

tgoeg commented Nov 25, 2021

This issue is so strange I don't know where to look next.

On Debian 10 (1.1.1g-1+0~20200421.17+debian9~1.gbpf6902f) and Ubuntu 20.04.3 LTS (1.1.1f-1ubuntu2.9), if I run

openssl s_client -connect bmbwf.gv.at:443

I always get a connection reset directly after the client hello (write:errno=104). So does curl (curl: (35) Unknown SSL protocol error in connection to www.bmbwf.gv.at:443 or curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to www.bmbwf.gv.at:443).
Browsers are happy with the site, however, and so is Debian 11 with 1.1.1k-1+deb11u1.

The weird part is this: if I install the exact same version of openssl (Debian 11's 1.1.1k) from the exact same .deb on Ubuntu 20.04 as a test as well, along with libssl1.1, I still get the error on Ubuntu!

I cannot wrap my head around this.

2021-11-25_210614_screenshot
The only difference I see in packet captures is an invalid GMT Unix Timestamp, and the request on every OS but Debian 11 has these 4 extra bytes!
What's also strange is the foldout label, TLSv1 vs. TLSv1.3 (highlighted; I think it is not an actual field but something Wireshark deduces), although the actual Version fields below it do not differ. Maybe that's a consequence of these 4 extra bytes?
Can sending the timestamp be configured system-wide?
I only found SSL_MODE_SEND_CLIENTHELLO_TIME in older sources, but I am not sure how it could be (un)set system-wide.

Can anyone who is not on Debian 11 (or a distro based on it) reproduce this?

It must be something server-specific because I don't see this with any other server, but on the other hand, browsers seem happy so the server can't be configured in a wrong way. ssllabs.com output is also good.
On the other hand it's clearly a client problem as well. It seems the server is the only one being stricter when presented with a(n invalid) timestamp field (and rightly so).

Thanks in advance!

@tgoeg tgoeg added the issue: question The issue was opened to ask a question label Nov 25, 2021
@t8m t8m added triaged: question The issue contains a question and removed issue: question The issue was opened to ask a question labels Nov 25, 2021
@mattcaswell
Member

In TLSv1.3 the Random field is defined as 32 bytes of random data. In TLSv1.2 and below it is defined as a 4-byte time value followed by 28 bytes of random data. However, the standard is explicit that the 4-byte time value does not have to be correct!

From RFC 5246:

The current time and date in standard UNIX 32-bit format
(seconds since the midnight starting Jan 1, 1970, UTC, ignoring
leap seconds) according to the sender's internal clock. Clocks
are not required to be set correctly by the basic TLS protocol;

So it would be incorrect for an implementation to reject a handshake on the basis of an unexpected time value in the random. In fact, I tried hacking s_client to set the SSL_MODE_SEND_CLIENTHELLO_TIME mode and connections to this server still fail. So I think it is something else... but what, I do not know.
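
To make the layout described above concrete, here is a minimal Python sketch that splits a 32-byte hello Random the way TLS 1.2 lays it out (4-byte time value, then 28 random bytes). The function name `split_hello_random` is made up for illustration and is not part of OpenSSL or any other library. It only shows why a dissector that reads the first 4 bytes of a fully random value as a time reports an implausible "timestamp".

```python
import datetime
import os
import struct

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def split_hello_random(random_bytes: bytes):
    """Split a 32-byte Random into (time value, UTC datetime, remaining 28 bytes).

    Hypothetical helper: it mirrors the TLS 1.2 layout, where the first
    4 bytes are a big-endian uint32 nominally holding seconds since the epoch.
    """
    if len(random_bytes) != 32:
        raise ValueError("Random must be exactly 32 bytes")
    (gmt_unix_time,) = struct.unpack(">I", random_bytes[:4])
    as_utc = EPOCH + datetime.timedelta(seconds=gmt_unix_time)
    return gmt_unix_time, as_utc, random_bytes[4:]


# A client that fills all 32 bytes with random data: the first 4 bytes
# still decode to *some* date, usually nowhere near the current time.
ts, when, rest = split_hello_random(os.urandom(32))
print(ts, when.isoformat(), len(rest))
```

Either way the field is 32 bytes on the wire, so the ClientHello length is the same whether or not the first 4 bytes carry a real time.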

@t8m
Member

t8m commented Nov 26, 2021

I believe that the browsers are not able to connect to bmbwf.gv.at:443 either. But they automatically try www.bmbwf.gv.at:443 and that works and it works with s_client as well.

@tgoeg
Author

tgoeg commented Nov 29, 2021

I can confirm I can connect to www.bmbwf.gv.at on all hosts that do not work without the www subdomain.
(But it still does not solve my initial problem of using linkchecker to verify if all links to this server are valid ;-) )
However, this does not explain why my Debian 11 does connect successfully without the www subdomain.
Just did a capture. SNI is clearly set to bmbwf.gv.at and it still does work. With the exact same debian package installed.

2021-11-29_160743_screenshot
So I think this boils down to three questions:

  • Why does every OS but Debian 11 send a timestamp with the same package installed? Is this configurable?
  • Is the server at bmbwf.gv.at (not the subdomain!) configured improperly as it seems to choke on invalid timestamps?
  • Wouldn't it be sensible to send a correct timestamp? It reduces the entropy, I know, but on the other hand I don't quite understand why one would specify a timestamp field that does not actually carry one. Compatibility with older implementations, I guess. Still, filling it with an invalid timestamp just for higher entropy feels a little hacky. It's specified like this, I know, so it should work. Still, it seems to produce potential (common?) problems, as I don't think they wrote their own TLS implementation. It must be some (configuration of some) standard product that produces this behavior, and this will most likely again be openssl, right? (Or are there other common TLS implementations for Apache, which serves the bmbwf.gv.at domain?)

@tgoeg
Author

tgoeg commented Nov 29, 2021

The actual error in linkchecker somehow seems to indicate it does use the www subdomain and still it does not work:

URL        `https://www.bmbwf.gv.at/Themen/HS-Uni/Aktuelles/corona/'
Name       `Website des Bildungsministeriums (BMBWF)'
Parent URL https://my.web.site/path/, line 275, col 76
Real URL   https://www.bmbwf.gv.at/Themen/HS-Uni/Aktuelles/corona/
Check time 2.806 seconds
Result     Error: SSLError: HTTPSConnectionPool(host='www.bmbwf.gv.at', port=443): Max retries exceeded with url: /Themen/HS-Uni/Aktuelles/corona/ (Caused by SSLError(SSLError("bad handshake: SysCallError(104, 'ECONNRESET')")))

(Linkchecker - again, using the same version - does work on my Debian 11 host)

Update:
Tried again now, I see the same issue with the www subdomain, just using s_client:

# openssl s_client -connect bmbwf.gv.at:443
CONNECTED(00000003)
write:errno=104
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 0 bytes and written 303 bytes
Verification: OK
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
---
# openssl s_client -connect www.bmbwf.gv.at:443
CONNECTED(00000003)
write:errno=104
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 0 bytes and written 307 bytes
Verification: OK
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
---

@t8m: Which OS do you use to successfully connect to the www subdomain? And do you get the error when connecting without the www subdomain on the same host? I only have two kinds of host:

  • All but Debian 11: Connection reset with or without www
  • Debian 11: Connection successful with and without www

So that does not make a difference in my case.

I can officially pronounce I'm thoroughly confused now.

@mattcaswell
Member

Could it be that that server has a server farm with a load balancer servicing it? If one server in the farm has a different configuration to the others, then it works if you happen to connect to the "correct" one and fails otherwise. Are you sure Debian 11 always works? You might like to try it again and see if it still works. I have seen strange scenarios where different people get different results with the same configuration due to this kind of thing.
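
One way to test the load balancer idea is to repeat the handshake many times from the same host and count the outcomes. The following is only a rough sketch using Python's standard `ssl` and `socket` modules; the helper names `try_handshake` and `probe` are made up for illustration, and Python's TLS stack may send a different ClientHello than `s_client` does, so results are not guaranteed to match.

```python
import socket
import ssl
from collections import Counter


def try_handshake(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt one TLS handshake and return a short outcome label."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname sets SNI and enables hostname verification.
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return f"ok:{tls.version()}"
    except ConnectionResetError:
        return "reset"  # the errno 104 case seen with s_client
    except ssl.SSLError as exc:
        return f"ssl-error:{exc.reason}"
    except OSError as exc:
        return f"os-error:{type(exc).__name__}"


def probe(host: str, attempts: int = 20, handshake=try_handshake) -> Counter:
    """Run several handshakes against one host and tally the outcomes."""
    outcomes = Counter()
    for _ in range(attempts):
        outcomes[handshake(host)] += 1
    return outcomes
```

Calling `probe("www.bmbwf.gv.at")` and `probe("bmbwf.gv.at")` on each machine would show whether a host sees a mix of resets and successes (which points at differing backends) or a consistent result.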

@tgoeg
Author

tgoeg commented Nov 29, 2021

Yes, this might cause such issues, I know.
My Debian 11 host however has checked the site over and over again and has never had a single failure.
(And it never sends 32 random bytes but always 28. That seems to be the more plausible, client side explanation why it works, I think)

@mattcaswell
Member

(And it never sends 32 random bytes but always 28. That seems to be the more plausible, client side explanation why it works, I think)

As I mentioned above:

In fact I tried hacking s_client to set the SSL_MODE_SEND_CLIENTHELLO_TIME mode and connections to this server still fail.

So, I'm confident that it is not related to the time value in the random.

@tgoeg
Author

tgoeg commented Nov 29, 2021

I just copied over the slightly differing openssl.cnf from Debian 11, but it does not make a difference.
And regarding the load balancer issue:
Yes, you're absolutely right. There seems to be an issue on the server side (as well), as I can sometimes get a successful session setup on Ubuntu too if I retry a few times, regardless of which domain I use.
Still, I never get unsuccessful connections on Debian 11 (and I don't see any other differences in the captures).
Capturing a successful connection from Ubuntu shows no timestamp and even 230 bytes for the client hello length.
This is due to Wireshark's parsing, which stops interpreting the first 4 bytes as a timestamp once the session is established (cf. curl/curl#6466 (reply in thread)), so yes, the timestamp field is not to blame.

The only explanation that comes to mind is that I somehow always get loadbalanced to the same (group of correctly configured) server(s) from my Debian 11 host, probably because the allocation algorithm takes the source IP or something else specific to this host into account and always directs my traffic to the working host(s), though this is just a wild guess.
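
To illustrate that guess (and it is only a guess: nothing here is known about how this server's load balancer actually works), a source-IP-hash scheme can be sketched in a few lines of Python. The function `pick_backend`, the backend names, and the IP addresses are all hypothetical.

```python
import hashlib
from typing import Sequence


def pick_backend(source_ip: str, backends: Sequence[str]) -> str:
    """Map a client IP to one backend deterministically (toy example)."""
    digest = hashlib.sha256(source_ip.encode("ascii")).digest()
    # Take the first 8 bytes of the hash as an integer and reduce it
    # modulo the pool size; the same IP therefore always lands on the
    # same backend as long as the pool does not change.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical pool and documentation-range client addresses.
pool = ["backend-a", "backend-b", "backend-c"]
for ip in ("192.0.2.10", "192.0.2.11", "198.51.100.7"):
    print(ip, "->", pick_backend(ip, pool))
```

With a scheme like this, a client whose address happens to map to a correctly configured backend would never see a failure, while other clients would fail consistently. It would not by itself explain the intermittent successes seen on Ubuntu, so the real allocation may well differ.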

I'll try to contact the relevant persons in charge to try to get a fix.
Still I wonder why no browser user ever noticed this (or maybe they do, or they have the same lucky ID that makes them hit the "good" servers ;-) )
Or it is just my hosts that get a special treatment!

I'll get back to this issue if there's a fix on server side, thanks for your great support guys!

@tgoeg
Author

tgoeg commented Jan 25, 2023

Seems to be fixed on server side, though I never got an answer.

@tgoeg tgoeg closed this as completed Jan 25, 2023
3 participants