New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix SSHClient/Transport leak, remove unneeded references #952
Conversation
94a7061
to
b39ab50
Compare
4 similar comments
side note: is every test variant (with different python version) resulting in a separate coverage report ... |
b39ab50
to
f18a2cf
Compare
f18a2cf
to
d7dc296
Compare
I've twiddled the commits a bit, and force-pushed over the previous state. |
2 similar comments
Just turned off coveralls comments because yes, this is completely ridiculous. (CodeCov seems better anyhow and I mean to switch to that sometime...) Don't have time to specifically review & confirm/test this at the moment but I like how much red is in the PR and at a high level (and re: the related ticket) I'm +1 on the idea that the resource manager concept is cruft that needs axing. Thanks for this! |
@agronick could you run your test case on this branch, to confirm the leak is fixed for you? @omeranson could you run your use case on this branch, and confirm that the connection streams keep working for you? (since I removed the back-reference which is theoretically no longer needed) |
Sure, I will when I get to work tomorrow. Thanks for getting this merged so fast. |
I'm happy to be the author. I would have made a PR but there were a number of ways to fix it. I didn't know which one would be the most beneficial or if the ResourceManager did anything special I wasn't aware of. |
It's not merged yet ;) I've tested this branch a bit, and functionality-wise stuff still works, for my use cases (mostly fabric). But I've also run some minimal-case experiments, and Transport objects do not seem to be GCed unless they are explicitly close()ed. I've tested both python 2.7.13 and 3.6.1 on macOS. SSHClient objects are easily GCed and cause no problems once gone (now). But even if no channels are created on them, Transport objects remain, even after a sleep and a |
It's because of threads. From # for thread cleanup
_active_threads = []
def _join_lingering_threads():
for thr in _active_threads:
thr.stop_thread()
import atexit
atexit.register(_join_lingering_threads) from def run(self):
# (use the exposed "run" method, because if we specify a thread target
# of a private method, threading.Thread will keep a reference to it
# indefinitely, creating a GC cycle and not letting Transport ever be
# GC'd. it's a bug in Thread.)
# Hold reference to 'sys' so we can test sys.modules to detect
# interpreter shutdown.
self.sys = sys
# active=True occurs before the thread is launched, to avoid a race
_active_threads.append(self)
if self.server_mode:
self._log(DEBUG, 'starting thread (server mode): %s' % hex(long(id(self)) & xffffffff))
else:
self._log(DEBUG, 'starting thread (client mode): %s' % hex(long(id(self)) & xffffffff))
... very long run() method ...
except Exception as e:
self._log(ERROR, 'Unknown exception: ' + str(e))
self._log(ERROR, util.tb_strings())
self.saved_exception = e
_active_threads.remove(self) So this Long story short, it looks like So that's why this ResourceManager was created to tie the lifetime of the Transport to the SSHClient (but could still probably be replaced with a So it seems the choice is between:
|
Python documentation says:
So I'm not sure how much the "atexit thread cleanup" is needed. But |
I can imagine a possible fix, but it's a really huge refactor: Make the Transport not a Thread itself, but instead have a reference to a TransportThread. Make Channel a wrapper for a RawChannel (or whatever). The point is that while Channel and Transport refer to each other, and to the inner TransportThread and RawChannel, the inner TransportThread and RawChannel do not have references back out. So if the Transport and all Channel objs are no longer referred to by user code, they can be detected as a cycle and garbage collected, and the Transport can stop the TransportThread on its way out (doing what the SSHClient destructor did, but not while the Transport or a Channel or a ChannelFile are still being used by the user). I don't think any of us currently have the time for a change of that magnitude though ... |
e74121d
to
374cd37
Compare
New conclusion: This branch is an improvement over the current state. #949 was really about the fact that it was possible for the SSHClient and Transport to leak, even if There's a bunch of factors that contributed, but the smallest and most recent one is: the I think the best fix is (still) to get rid of both the back-reference and the ResourceManager. I'm going to remove the addition of the destructor because I think it's useless (Transport will never be destructed until after I've experimented and tested enough now - I think this change is good. |
@ploxiln My scenario works on with this change. Thanks. |
That's not sufficient because of the Channels. Consider calling So, a wrapper transport needs to override all the methods which return channels, and return wrapper channels (which have references to the wrapper transport). |
965b9ea
to
a6e6fe7
Compare
I have removed the last commit I added, the one adding a destructor to SSHClient. Now this pull request contains strictly only small improvement for each preference / use-case. This is not the GC-able Transport you wanted, but it means you need to monkey-patch only one thing (SSHClient destructor) to get that. This is a bit of progress. I encourage you to add the full wrapper Transport solution (or propose re-adding the simple SSHClient destructor) in a separate follow-up pull request. |
I'll try it. We should iron out specifics first and whats sufficient. Is it enough to implement what I posted above with changes for the channel to reference the transport? |
I think that wrapper classes for Transport, Channel, and ChannelFile, should be sufficient. Wrapper ChannelFile should have a reference to wrapper Channel should have a reference to wrapper Transport. Only when the user no longer references any of the wrappers, the wrapper transport can be garbage collected and close all the real stuff. EDIT: ChannelFile doesn't need two separate wrapper/real classes, it just needs to reference the wrapper Channel. |
I don't get why you need to wrap channel. As long as everything is accessing Transport and only Transport is accessing TransportThread it should be fine. Why would anything need direct access for TransportThread? If channel is GCed, and Transport has no references, the transport thread gets stopped. Now what is sufficient for making a pass-throughs. It is just a class like the one I posted above or do the methods need to be explicitly there? This would be the easiest way. 2nd easiest is to prefix every method on TransportThread with an underscore and have the public methods on transport pass though to TransportThread. That shows up in debuggers. 3rd Easiest is to start by moving the run method to TransportThread and the functions it calls. Try to split up each function between the two in a way that makes logical sense, while keeping all the public methods on Transport and adding pass-throughs where appropriate. |
TransportThread needs references to (real) Channels. So (real) Channels will never be GCed. (But, the TransportThread does not need references to ChannelFiles.) The longer this thread gets, the smaller the chance of any change being merged ;) EDIT: just clarifying that I am not one who can merge, so someone else will need to digest this thread. Thus the recommendation to break out the refactor for GC-enablement to a new pull-request, for easier digestion. |
Give me two weeks and I'll have something. I'd really like some clarification on how this should be done though. Is it important that the methods are explicitly declared on the helper classes? |
The back-reference from Transport to SSHClient was added because the SSHClient had a destructor that would close the Transport, and some users did not want the Transport closed when the SSHClient was garbage collected. The SSHClient destructor was not a normal destructor, it was implemented with the ResourceManager singleton. Together with the back-reference, this prevented the GC cycle detector from freeing the SSHClient and Transport.
It was only recently added, and it's not really needed after the ResourceManager removal. Removing it allows the SSHClient to be garbage-collected if only the Transport (and Channels) are still in use.
dc7d0f6
to
6f057de
Compare
6f057de
to
91d644a
Compare
rebased to resolve conflicts. also tweaked a related test I happened to notice so it could be re-enabled for python3 |
Fix to #949 - thanks @agronick for discovering and debugging the problem.
Caused by a fix in #891. But, it's a matter of choice and policy.
This should also be back-ported to
1.17.z
(This issue description has been updated.)
The back-reference from Transport to SSHClient was added because
the SSHClient had a destructor that would close the Transport,
and some users did not want the Transport closed when the SSHClient
was garbage collected.
The SSHClient destructor was not a normal destructor, it was
implemented with the ResourceManager singleton. This sometimes
prevented the GC cycle detector from freeing the SSHClient and
Transport even after the Transport Thread stopped running.
We can simplify these problems by just getting rid of the
ResourceManager, and the back-reference. Transports cannot be
garbage-collected while their Thread is running,
.close()
mustbe called (on the SSHClient or the Transport).