Waiting to close ssh connection until socket is done sending data by KaushikMalapati · Pull Request #6 · pcdshub/switchtool

KaushikMalapati · 2025-11-23T05:32:53Z

https://jira.slac.stanford.edu/browse/ECS-6845

If you open the switchtool GUI or press refresh, you often get SSHExceptions like this:

Exception (client): Error reading SSH protocol banner
Traceback (most recent call last):
  File "/cds/group/pcds/pyps/conda/py39/envs/pcds-5.4.2/lib/python3.9/site-packages/paramiko/transport.py", line 2271, in _check_banner
    buf = self.packetizer.readline(timeout)
  File "/cds/group/pcds/pyps/conda/py39/envs/pcds-5.4.2/lib/python3.9/site-packages/paramiko/packet.py", line 380, in readline
    buf += self._read_timeout(timeout)
  File "/cds/group/pcds/pyps/conda/py39/envs/pcds-5.4.2/lib/python3.9/site-packages/paramiko/packet.py", line 609, in _read_timeout
    raise EOFError()
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/cds/group/pcds/pyps/conda/py39/envs/pcds-5.4.2/lib/python3.9/site-packages/paramiko/transport.py", line 2094, in run
    self._check_banner()
  File "/cds/group/pcds/pyps/conda/py39/envs/pcds-5.4.2/lib/python3.9/site-packages/paramiko/transport.py", line 2275, in _check_banner
    raise SSHException(
paramiko.ssh_exception.SSHException: Error reading SSH protocol banner

The exception happens when calling SSHClient.connect

switchtool/psnet/survey/command.py

Line 220 in 160f22b

self.ssh.connect(host, self.port, self.user, self.pw,

and is caught, but I believe it's still printed because paramiko raises an SSHException while handling the initial EOFError, and I don't know whether that can be silenced without changing the library. Why do we get this error in the first place? If you don't send the exit command to a switch but still call ssh.close, you no longer get the error. Presumably the switch continues to do some cleanup of the ssh session after the exit, and trying to create a new connection during that time fails. I saw this on all switch types that switchtool supports except cisco, although the only cisco switch I know of is switch-fee-far, which I can't ping. I found some forum discussions about ssh bugs on ruckus models that were fixed by newer firmware, but I'm not sure it would help in this case, and I also think it'd be unreasonable to try to update all of our switches purely for qol.

I could have just not called exit, but this felt unsatisfactory and would also have forced the removal of recv_exit_status (it would block forever since we would no longer be exiting). I tried polling until the socket data was empty, which worked, but I thought using the file descriptor looked a bit cleaner: the read call blocks until EOF (the socket has no more data to read), until we time out (so we can never block indefinitely), or until we get an OSError (if the socket closes before or during the read).

I ran this on every switch that pings, and on two switches a few dozen times, and it seems to work. I'm not 100% sure, since the error might just be a lot less likely because of the additional time spent before attempting a new connection, but I'm also not sure how to debug this at a lower level since it all seems to be happening switch-side (the original error happens when paramiko expects to read the ssh banner but gets a zero-byte message instead). Happy to discuss alternative ways of preventing this issue or to find out what causes it in more detail.
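As a rough illustration of the wait-for-EOF idea described above (not the code in this PR): the sketch below uses a plain stdlib socket instead of paramiko's channel, and `drain_until_eof` is a hypothetical helper name. It reads and discards data until the peer closes, an overall deadline passes, or the socket errors out.

```python
import socket
import time


def drain_until_eof(sock: socket.socket, timeout: float = 2.0) -> bool:
    """Discard incoming data until the peer closes its side.

    Returns True if EOF was seen (or the socket was already gone),
    False if the overall deadline passed first.
    """
    deadline = time.monotonic() + timeout
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False  # gave up; never block indefinitely
            sock.settimeout(remaining)
            if not sock.recv(4096):
                return True  # zero-length read means EOF
    except socket.timeout:
        # Must be caught before OSError, since it is a subclass of it.
        return False
    except OSError:
        # Socket was closed before or during the read; nothing left to wait for.
        return True
```

Only after this returns would the caller close the connection and attempt the next one. Whether EOF on the socket really means the switch has finished its session cleanup is still an assumption.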

tangkong · 2025-11-24T17:49:31Z

I haven't worked much with the switchtool/switches in general, but some preliminary poking around suggests that paramiko might be hamstringing us by being too low level. I wonder if we should eventually consider moving to an abstraction library like netmiko.

I'll put away the showerthoughts and take a look at the actual problem/solution here

KaushikMalapati · 2025-11-24T19:54:14Z

netmiko is a good idea, but it uses paramiko under the hood and throws the same SSHExceptions when connecting in a loop:

from netmiko import ConnectHandler

# All connection values below are placeholders.
for i in range(10):
    # Open a fresh connection on every iteration to reproduce the error.
    with ConnectHandler(
        secret='secret',
        device_type='device_type',
        host='switch',
        username='username',
        password='password',
    ) as net_connect:
        net_connect.enable()
        a = net_connect.send_command("show logging")
        print(i)
        net_connect.exit_enable_mode("exit")

although maybe it has an easier way to block until cleanup is complete?

tangkong · 2025-11-24T20:15:02Z

My brief reading suggested that netmiko might wrap paramiko and handle some of the nuts and bolts of ssh connections for us.

I'm not going to suggest this PR becomes an investigation and migration to another ssh communication library. From what I can see, this seems to work well.

tangkong

This seems to fix the problem for me, as strange as the solution is (after we exit, we read from the socket again?)

Perhaps improving this requires a more concerted effort

Waiting to close ssh connection until socket is done sending data

daca46e

KaushikMalapati requested a review from a team as a code owner November 23, 2025 05:32

tangkong approved these changes Nov 24, 2025


KaushikMalapati merged commit a1fbe8d into pcdshub:master Dec 1, 2025

KaushikMalapati deleted the sshspam branch December 1, 2025 20:39
