Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for issue #910: Bad packet received by server when hearbeat is enabled #911

Merged
merged 3 commits into from
Apr 18, 2024

Conversation

rasantel
Copy link
Contributor

@rasantel rasantel commented Nov 10, 2023

Problem

As mentioned in #910, when enabling the default keep-alive (the heartbeater) and then connecting, the ssh server receives a bad packet immediately after the key exchange and closes the connection.

Diagnosis

The sshd server logs show:

Nov  9 16:19:28 raul-dev sshd[399917]: debug1: expecting SSH2_MSG_NEWKEYS [preauth]
Nov  9 16:19:28 raul-dev sshd[399917]: debug3: receive packet: type 21 [preauth]
Nov  9 16:19:28 raul-dev sshd[399917]: debug1: SSH2_MSG_NEWKEYS received [preauth]
Nov  9 16:19:28 raul-dev sshd[399917]: debug2: set_newkeys: mode 0 [preauth]
Nov  9 16:19:28 raul-dev sshd[399917]: debug1: rekey in after 134217728 blocks [preauth]
Nov  9 16:19:28 raul-dev sshd[399917]: debug1: KEX done [preauth]
Nov  9 16:19:28 raul-dev sshd[399917]: Bad packet length 2412619996. [preauth]
Nov  9 16:19:28 raul-dev sshd[399917]: debug3: send packet: type 1 [preauth]
Nov  9 16:19:28 raul-dev sshd[399917]: ssh_dispatch_run_fatal: Connection from 127.0.0.1 port 57466: Connection corrupted [preauth]

and the SSHJ logs show:

[2023-11-09T16:19:28,937][DEBUG][net.schmizz.sshj.transport.KeyExchanger][sshj-Reader-localhost/127.0.0.1:22-1699546768919][KeyExchanger.sendNewKeys] Sending SSH_MSG_NEWKEYS
[2023-11-09T16:19:28,937][TRACE][net.schmizz.sshj.transport.Encoder][sshj-Reader-localhost/127.0.0.1:22-1699546768919][Encoder.encode] Encoding packet #2: 15
[2023-11-09T16:19:28,937][TRACE][net.schmizz.sshj.transport.Decoder][sshj-Reader-localhost/127.0.0.1:22-1699546768919][Decoder.decode] Received packet #2: 15
[2023-11-09T16:19:28,937][TRACE][net.schmizz.sshj.transport.TransportImpl][sshj-Reader-localhost/127.0.0.1:22-1699546768919][TransportImpl.handle] Received packet NEWKEYS
[2023-11-09T16:19:28,937][DEBUG][net.schmizz.sshj.transport.KeyExchanger][sshj-Reader-localhost/127.0.0.1:22-1699546768919][KeyExchanger.handle] Received SSH_MSG_NEWKEYS
[2023-11-09T16:19:28,938][DEBUG][net.schmizz.keepalive.Heartbeater][sshj-Heartbeater-localhost/127.0.0.1:22-1699546768919][KeepAlive.run] Heartbeater Started with interval [60 seconds]
[2023-11-09T16:19:28,938][DEBUG][net.schmizz.keepalive.Heartbeater][sshj-Heartbeater-localhost/127.0.0.1:22-1699546768919][KeepAlive.run] Heartbeater Sending after interval [60 seconds]
[2023-11-09T16:19:28,938][TRACE][net.schmizz.sshj.transport.Encoder][sshj-Heartbeater-localhost/127.0.0.1:22-1699546768919][Encoder.encode] Encoding packet #3: 02
[2023-11-09T16:19:28,943][DEBUG][net.schmizz.concurrent.Promise][sshj-Reader-localhost/127.0.0.1:22-1699546768919][Promise.deliver] Setting <<kexinit sent>> to `null`
[2023-11-09T16:19:28,943][DEBUG][net.schmizz.concurrent.Promise][sshj-Reader-localhost/127.0.0.1:22-1699546768919][Promise.deliver] Setting <<kex done>> to `SOME`
[2023-11-09T16:19:28,943][DEBUG][net.schmizz.sshj.SSHClient][Test worker][SSHClient.doKex] Key exchange took 0.011 seconds
[2023-11-09T16:19:28,943][DEBUG][com.delphix.appliance.server.dhm.impl.SshManagerImpl$SshClientImpl][Test worker][SshManagerImpl$SshClientImpl.connectWithAuthMethod] Connected with client: net.schmizz.sshj.SSHClient@743fad70 - LA: /127.0.0.1 LP: 57466 RA: localhost/127.0.0.1
[2023-11-09T16:19:28,944][DEBUG][net.schmizz.concurrent.Promise][Test worker][Promise.deliver] Setting <<service accept>> to `null`
[2023-11-09T16:19:28,944][DEBUG][net.schmizz.sshj.transport.TransportImpl][Test worker][TransportImpl.sendServiceRequest] Sending SSH_MSG_SERVICE_REQUEST for ssh-userauth
[2023-11-09T16:19:28,944][DEBUG][net.schmizz.concurrent.Promise][Test worker][Promise.tryRetrieve] Awaiting <<service accept>>
[2023-11-09T16:19:28,945][DEBUG][net.schmizz.sshj.transport.TransportImpl][sshj-Reader-localhost/127.0.0.1:22-1699546768919][TransportImpl.handle] Received SSH_MSG_EXT_INFO
[2023-11-09T16:19:28,982][ERROR][net.schmizz.sshj.transport.TransportImpl][sshj-Reader-localhost/127.0.0.1:22-1699546768919][TransportImpl.die] Dying because - Broken transport; encountered EOF
net.schmizz.sshj.transport.TransportException: Broken transport; encountered EOF
        at net.schmizz.sshj.transport.Reader.run(Reader.java:58)

Based on the timings of these messages, I conjectured that the hearbeat (SSH_MSG_IGNORE=2) is sent by SSHJ to the server right after the server has sent and received SSG_MSG_NEWKEYS and thus switched to the new keys (according to https://www.ietf.org/rfc/rfc4253.txt) but right before SSHJ itself has updated its keys, which happens at KeyExchanger.gotNewKeys, when handling the SSG_MSG_NEWKEYS from the server. As a consequence, the heartbeat is encoded with the old keys and the server tries to decode it with the new keys, which results in garbage (e.g. Bad packet length 2412619996 in the server log).

I am able to reproduce the issue consistently by:

  1. Attaching a debugger to my java program that uses SSHJ.
  2. Setting a breakpoint at HeartBeater.doKeepAlive.
  3. Setting another breakpoint at the entrance of KeyExchanger.gotNewKeys. Once this breakpoint is hit, the server has switched to the new keys but the SSHJ client hasn't yet.
  4. Once both breakpoints are hit, resume the hearbeat thread so that it sends the heartbeat to the server using the old keys.
  5. Resume the reader thread that was stopped at KeyExchanger.gotNewKeys.

After 4, the server logs will show Bad packet length <garbage> and after 5 the SSHJ client's reader thread or the hearbeater thread will detect a broken transport and close the connection.

Solution

When the server and client are switching from the old to the new keys at the end of the key exchange, there shouldn't be any messages being sent other than final key exchange messages and new service requests (e.g. user auth). Therefore, I propose to start of the heartbeater thread after, rather than before, the key exchange completes.

Commit 2: Updated the PR to address the possibility of this same bug happening during re-keying. I generalized the solution to add IGNORE to the cases in which TransportImpl.write waits for an ongoing key exchange to complete before writing. This covers both the initial key exchange and any subsequent re-keying.

Note that the other keep-alive provider, KeepAliveRunner, uses the message GLOBAL_REQUEST (80) which is already covered in TransportImpl.write by the check !m.in(1, 49).

Testing

Using the same breakpoints approach I used to reproduce the issue, I verified that now the heartbeater thread will block in Transport.write at kexer.waitForDone() and will unblock only after the key exchange has completed. The server now always recognizes this message and no longer closes the connection.

Note that, in theory, while the hearbeater is in Transport.write, another thread could potentially start a key exchange after the if (kexer.isKexOngoing()) check but before the actual write of IGNORE happens, and switch to new keys right after that write (so the server could still try to decode with new keys an IGNORE encoded with old keys). However, the writeLock acquired at the beginning of Transport.write should guarantee that a key exchange cannot happen during that time because the exchange needs to write its own messages -- which it can't do until the lock is released.

@rasantel
Copy link
Contributor Author

It would be nice if @exceptionfactory also looks into this PR, who has dealt with hearbeat issues in the past too.

@exceptionfactory
Copy link
Contributor

Thanks for summarizing this issue @rasantel.

Unfortunately just moving the doKex() invocation is unlikely to provide a reliable solution. The heartbeat process runs in a separate thread, so I believe there is still a potential race condition with either approach.

@rasantel
Copy link
Contributor Author

rasantel commented Nov 10, 2023

Thanks for summarizing this issue @rasantel.

Unfortunately just moving the doKex() invocation is unlikely to provide a reliable solution. The heartbeat process runs in a separate thread, so I believe there is still a potential race condition with either approach.

Thanks for checking, @exceptionfactory . Yes, the heartbeater still runs in its own thread, but by starting that thread after the key exchange completes, this change guarantees that it will always send hearbeats with the new keys. The bug happens when it's sent with the old keys.

Maybe you mean that this could still be a problem if there is a re-keying later? I see now that the same bug could happen during re-keying. I'll look into providing a better solution.

@exceptionfactory
Copy link
Contributor

Thanks for checking, @exceptionfactory . Yes, the heartbeater still runs in its own thread, but by starting that thread after the key exchange completes, this change guarantees that it will always send hearbeats with the new keys. The bug happens when it's sent with the old keys.

Maybe you mean that this could still be a problem if there is a re-keying later? I see now that the same bug could happen during re-keying. I'll look into providing a better solution.

I may have been thinking of a different issue, so on further review, it looks like TransportImpl.doDex() calls KeyExchanger.startKex(true), indicating that it should wait for completion. Based on that, it does seem like waiting to start the heartbeat thread until after the key exchange should work.

@rasantel
Copy link
Contributor Author

@exceptionfactory I updated the PR to address the possibility of this same bug happening during re-keying. I generalized the solution to add IGNORE to the cases in which TransportImpl.write waits for an ongoing key exchange to complete before writing. This covers both the initial key exchange and any subsequent re-keying.

@exceptionfactory
Copy link
Contributor

@exceptionfactory I updated the PR to address the possibility of this same bug happening during re-keying. I generalized the solution to add IGNORE to the cases in which TransportImpl.write waits for an ongoing key exchange to complete before writing. This covers both the initial key exchange and any subsequent re-keying.

Thanks for implementing the generalized solution @rasantel. Hopefully @hierynomus can take a closer look soon.

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 0% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 68.45%. Comparing base (03f8b22) to head (9343b80).
Report is 7 commits behind head on master.

❗ Current head 9343b80 differs from pull request most recent head 6c26a68. Consider uploading reports for the commit 6c26a68 to get more accurate results

Files Patch % Lines
...java/net/schmizz/sshj/transport/TransportImpl.java 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #911      +/-   ##
============================================
- Coverage     68.96%   68.45%   -0.52%     
+ Complexity     1448     1407      -41     
============================================
  Files           208      207       -1     
  Lines          7602     7504      -98     
  Branches        658      642      -16     
============================================
- Hits           5243     5137     -106     
- Misses         2009     2023      +14     
+ Partials        350      344       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hierynomus hierynomus merged commit 607e805 into hierynomus:master Apr 18, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants