Ssh connection occasionally aborted by dropbear while establishing the connection (regression) #173

m5jt · 2022-06-13T19:35:39Z

We use dropbear on the server side, to create an ssh session with a command-line application.
An ssh client connects to it. Sometimes the connection fails.
We repeat the connection sequence a large number of times in an automated test.

The problem started after upgrading from 2018-76 to 2022-82. We believe that the regression appeared in 2022-82 because it seems related to issue #85 (https://github.com/mkj/dropbear/pull/85/files)

We found the failure mechanism, as well as two potential fixes

Failure mechanism :

In function session_loop() (common-session.c) :
- The problem starts when select() returns 2 while chansess.pid is not yet initialized. When this happens :
  - ses.channel_signal_pending is set to 1 (because there is something in ses.signal_pipe[0])
  - process_packet() is also called (because there is something in ses.sock_in) and initializes ses.channels[0]
- From this point, the problem happens everytime.
Then, in function channelio() (common-channel.c)
- The « for » loop is executed once because sess.channel[0] is initialized
- Because ses.channel_signal_pending is set, do_check_close is set to 1, causing check_close() to be called
Then, sesscheckclose() (in srv-chansession.c) is called by check_close()
- At that moment, chansess->pid is not initialized yet.
- This causes sesscheckclose() to return 1, and the session is closed .

Potential fixes :

We found 2 « hacks » that make our test pass. But we do not know if they address the root cause of the problem or if they cause other issues.
The first possible hack is in session_loop(). Add condition « if (ses.channel_signal_pending == 0) » around the code block « /* process session socket's incoming data */ ».
The second possible hack is to modify the return statement in sesscheckclose() to remove the « chansess->pid == 0 » condition

mkj · 2022-06-14T02:09:23Z

Thanks for the report. I think the problem might occur when SIGCHLD arrives after loophandler() gets called, but before channelio()'s channel_signal_pending test. (loophandler() is the callback svr_chansess_checksignal())

If it's easy for you to currently reproduce, would you be able to try putting a call svr_chansess_checksignal(); at the start of sesscheckclose() ? I'll try add a reproducible test case for myself and see if there's a less bodgy fix.

m5jt · 2022-06-14T15:09:37Z

Hello mkj! Thanks for the response.

Our automated test can reproduce the problem rather quickly.
I tried the patch that you proposed in the previous comment. It did not fix the problem.
To make sure the setup is consistent with past observations, I also ran the same test :

with no mod at all – the failure is present
with our hack in session_loop() – the failure is absent

Note that for the tests today, I removed the debug traces we previously added, so they don’t interfere. These traces tend to increase the occurrence of the issue.

mkj · 2022-06-14T15:41:57Z

Ah yes, my theory doesn't make sense - I'd forgotten that the signal handler is no longer asynchronous (noticed some other things to clean up there too).

I'm a bit puzzled why ses.channel_signal_pending is set if chansess.pid isn't initialized. A sigchld should only occur after the pid has been filled out, unless another child process (unrelated to that channel) has exited? Could you give me some more details of your test setup so I can try reproduce it (or if you want to paste/email me debug traces I have a look at them)

m5jt · 2022-06-14T17:19:53Z

Deleted.

m5jt · 2022-06-16T19:59:42Z

Hello Matt. Thanks for the support.

Your last comment helped us understand how the problem happens. We previously made a modification in srv_auth.c (function recv_msg_userauth_request()) where we fork a child process to perform user/password validation.

This fork causes a call to sesssigchild_handler() (in srv-chansession.c) , which writes into ses.signal_pipe. This then causes session_loop() to set ses.channel_signal_pending to 1 before there is a pid in ChanSess.pid.

The reason why this worked in the past is probably because the older versions of dropbear did not nave the « chansess->pid==0 » condition in the return() statement of sesscheckclose().

And the reason why it doesn’t fail all the time is because most of the time, there is nothing to read from ses.sock_in in the same loop iteration.

I’m trying to find a way to deal with that. Do you have suggestions?
I can imagine a few potential ideas, but nothing solid yet.

Find a way to make sesssigchild_handler() ignore the fork made for password validation (don’t know if this is possible)
Avoid calling read_packet() in the same loop iteration when ses.channel_signal_pending == 1 (have not enough insight to foresee side effects)

m5jt · 2022-06-21T19:53:14Z

I close this issue. Thanks for the hints, this was very helpful
The problem was caused by a modification that we made that forks a child process, causing the SIGCHLD handler to be called and thus setting ses.channel_signal_pending to 1. It was solved by exiting the sigchld handler immediately at the begining in this situation.

m5jt closed this as completed Jun 21, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ssh connection occasionally aborted by dropbear while establishing the connection (regression) #173

Ssh connection occasionally aborted by dropbear while establishing the connection (regression) #173

m5jt commented Jun 13, 2022 •

edited

Loading

mkj commented Jun 14, 2022 •

edited

Loading

m5jt commented Jun 14, 2022

mkj commented Jun 14, 2022

m5jt commented Jun 14, 2022

m5jt commented Jun 16, 2022

m5jt commented Jun 21, 2022

Ssh connection occasionally aborted by dropbear while establishing the connection (regression) #173

Ssh connection occasionally aborted by dropbear while establishing the connection (regression) #173

Comments

m5jt commented Jun 13, 2022 • edited Loading

mkj commented Jun 14, 2022 • edited Loading

m5jt commented Jun 14, 2022

mkj commented Jun 14, 2022

m5jt commented Jun 14, 2022

m5jt commented Jun 16, 2022

m5jt commented Jun 21, 2022

m5jt commented Jun 13, 2022 •

edited

Loading

mkj commented Jun 14, 2022 •

edited

Loading