New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Common up break out of Transport.run() loop #1041

Merged
merged 1 commit into from Aug 23, 2017

Conversation

Projects
None yet
2 participants
@radssh
Contributor

radssh commented Aug 19, 2017

Can’t seem to reason out any advantage of clearing self.active and
calling self.packetizer.close() in these situations instead of simply
breaking out of loop and allowing the additional conditional cleanups
to be done. Currently looking into tackling some needed cleanup in
auth_handler, and not having the auth_handler.abort() called on server
disconnect feels like a bug - who knows?

Common up break out of Transport.run() loop
Can’t seem to reason out any advantage of clearing self.active and
calling self.packetizer.close() in these situations instead of simply
breaking out of loop and allowing the additional conditional cleanups
to be done. Currently looking into tackling some needed cleanup in
auth_handler, and not having the auth_handler.abort() called on server
disconnect feels like a bug - who knows?
@bitprophet

This comment has been minimized.

Member

bitprophet commented Aug 21, 2017

Thanks for poking at this!

Have you looked at the git blame history for these lines to see when/why they were added? Packetizer close calls are often aligned with hard-to-troubleshoot hang issues so I get really twitchy when anyone touches them 😁

If these lines are in the class of "existed since time immemorial" (like much of the codebase, dating from e.g. 2004-ish) and we can eg run some simple tests in a loop for a long time w/o encountering issues, probably fine; but if anyone's added or moved these lines in the last few years, probably requires more thought.

@radssh

This comment has been minimized.

Contributor

radssh commented Aug 21, 2017

I did not dig into the code history initially (but should have). Jumping in the way-back machine to

elif ptype == MSG_DISCONNECT:
self._parse_disconnect(m)
self.active = False
self.packetizer.close()
break
elif ptype == MSG_DEBUG:
self._parse_debug(m)
continue
if self.expected_packet != 0:
if ptype != self.expected_packet:
raise SSHException('Expecting packet %d, got %d' % (self.expected_packet, ptype))
self.expected_packet = 0
if (ptype >= 30) and (ptype <= 39):
self.kex_engine.parse_next(ptype, m)
continue
if self._handler_table.has_key(ptype):
self._handler_table[ptype](self, m)
elif self._channel_handler_table.has_key(ptype):
chanid = m.get_int()
if self.channels.has_key(chanid):
self._channel_handler_table[ptype](self.channels[chanid], m)
else:
self._log(ERROR, 'Channel request for unknown channel %d' % chanid)
self.active = False
self.packetizer.close()
else:
the pertinent part of that section looks surprisingly untouched in the last 12 years!

There are additions to the end of loop cleanup code since then, which appear to be reasonable enough additions, with conditional ifs and try:, but (shock) not matching the steps listed in Transport.close() or Transport.stop_thread(), which may be leading to the thread hanging joy.

Looking at the code base then, and seeing that Transport was a base class that auth_transport.py was building on to bolt-on authentication, goes a long way toward explaining why the authentication logic is currently so convoluted 😈

@bitprophet

This comment has been minimized.

Member

bitprophet commented Aug 23, 2017

Color me surprised that the last few years' worth of periodic hang/shutdown bugfixes didn't actually catch everything. So this falls under my 1st category of "not touched since prehistory", meaning, sure, let's try it. For funsies I'll run the test suite in your branch in an error-counting loop thing I have, just in case.

@bitprophet

This comment has been minimized.

Member

bitprophet commented Aug 23, 2017

Ran test suite 100 times in a row, zero errors of any kind (unless you count my recently-tweaked error counter dividing by zero...heh). worksforme.

@bitprophet bitprophet added the Support label Aug 23, 2017

@bitprophet bitprophet merged commit cb7fb16 into paramiko:master Aug 23, 2017

2 of 3 checks passed

codecov/patch 0% of diff hit (target 75.17%)
Details
codecov/project 75.26% (+0.08%) compared to 08f5037
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

bitprophet added a commit that referenced this pull request Aug 23, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment