Fix Channel.close() race condition in pipe.py #2274
`self._closed = True` was moved to the top. Built-in assignment in Python is atomic, so there is no need to protect it with a lock. If a `Bad file descriptor` error (errno == 9) is raised, we can safely disregard it when `._closed` is set.
Fixes this BUG
Couple of questions about the implementation. Thanks for the PR!
paramiko/pipe.py
Outdated
try:
    os.write(self._wfd, b"*")
except OSError as e:
    if e.errno == 9 and self._closed:
        # The pipe was closed, no need to do anything
        return
    raise e
Do we not also need something analogous to protect the `os.read(self._rfd, 1)` call above in `clear()`?
(We might not; again, just raising the question.)
From the `pipe`'s perspective, yes. And it would be wise to protect it. However, the way the pipe is used, this does not happen. The same thread
- opens,
- clears and
- closes
the pipe. So there is no race condition between `.clear()` and `.close()`.
A second thread sets it.
Same request, please add a comment here explaining why a similar race catch is not (currently) required here.
done
paramiko/pipe.py
Outdated
os.close(self._rfd)
os.close(self._wfd)
@pettitpeon, is there any chance of either of these two `.close()` calls failing, making `self._closed == True` a possibly inaccurate representation of the pipe state?
(I would think that a failure of either of these two calls would signal an error state for the pipe such that `self._closed == True` is a better state than `== False`... but I thought I'd raise the question.)
is there any chance of either of these two .close() calls failing?
Yes, it could happen. If either `.close()` fails, it would raise an `OSError`. In that case some FD might stay open, but the Pipe is not usable anyway. There is no use case for using it after `.close()`. Worst case, we leak resources until the application exits. The most probable scenario is that the `close()` exception is not handled and your program terminates anyway.
A small improvement would be to wrap each `.close()` in its own `try`. Then, if the first fails and leaks, we might still close the second one correctly.
In any case, I see `.close()` errors as unhandleable: we `close()` file descriptors as a "best-effort" attempt to release the resource, but cannot really do anything if it fails.
making self._closed == True a possibly inaccurate representation of the pipe state?
Not really. After `close()` has been called, the object is disposed of and no one should use it anyway. The "broken state" is meaningless.
Last, the `pipe` is most probably a child of `Channel`, and when the channel gets destroyed it tries to re-close, which fails if the pipe was previously closed, but the failure is disregarded by a catch-all `try`. This is OK, and aligns with the idea that it is an unhandleable error.
Line 136 in f4d7fec
def __del__(self):
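The "one `try` per `.close()`" idea from this comment can be sketched as below. This is an illustration only: `close_both` is a hypothetical helper, not code from the PR, and returning the collected failures is an assumption made here so the behaviour is visible.

```python
import errno
import os


def close_both(rfd, wfd):
    """Best-effort close of both descriptors (hypothetical helper).

    Each os.close() gets its own try, so a failure on the first
    descriptor does not prevent the attempt on the second.
    Returns a list of (fd, errno name) pairs for the closes that failed.
    """
    failed = []
    for fd in (rfd, wfd):
        try:
            os.close(fd)
        except OSError as e:
            failed.append((fd, errno.errorcode.get(e.errno, str(e.errno))))
    return failed
```

Calling it a second time on the same pair reports a failure for both descriptors instead of raising, which mirrors the re-close on destruction described above.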
paramiko/pipe.py
Outdated
try:
    os.write(self._wfd, b"*")
except OSError as e:
    if e.errno == 9 and self._closed:
Is it possible for this `if` condition to ever be `True`?
We've already trapped for `self._closed == True` with the first `if` in the method, so it seems like we'd only ever hit this point with `self._closed == False`.
Yes, the following race condition can happen:
- The `write` thread sees `._closed == False` and continues
- The `close` thread closes the descriptors before the `write` thread writes on line 67
- The `write` thread fails to `write()` because the FD has been closed.
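The three steps above can be replayed in a single thread, which makes the ordering explicit. This is a sketch, not code from the PR: `replay_race` and the local `closed` flag are hypothetical stand-ins for the two threads and `._closed`.

```python
import errno
import os


def replay_race():
    """Replay the interleaving in order; return the errno hit by the late write."""
    rfd, wfd = os.pipe()
    closed = False

    # Step 1: the "write" thread checks the flag, sees False, and continues.
    writer_continues = not closed

    # Step 2: the "close" thread runs to completion in between.
    closed = True
    os.close(rfd)
    os.close(wfd)

    # Step 3: the "write" thread resumes and writes to a descriptor
    # that no longer exists.
    if writer_continues:
        try:
            os.write(wfd, b"*")
        except OSError as e:
            return e.errno
    return None
```

On a POSIX system the late write fails with `errno.EBADF`, the `Bad file descriptor` error the `except` branch checks for. The sketch ignores one real-world complication: in a multi-threaded program another thread could open a file in between and be handed the same descriptor number.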
Aha, this is the crux of the race condition, got it.
For the benefit of future readers of the code, please add a comment here describing the race condition as you just described it.
Otherwise we risk someone thinking the same thing I did and removing this.
done
I would think it would be challenging to do, but: is there any good way to write a test for this? I would figure that any such test would require setting up a test harness with a server to connect to, and then run the open/close cycle you describe in the issue for some period of time long enough to trigger the race condition 90%+ of the time... but that would make for a flaky test, and that'd be not great. Interested in any thoughts you have on it, though.
Oh, also -- I added
I will look into this. It might be possible by tunneling/forwarding an ssh connection and sending ssh echos in a loop.
Unfortunately, I could not think of a way of easily reproducing the bug. I tried tunneling SSH connections through a local forward (demo/forward.py), but SSH is too smart and rejects connections from the server side if it is being spammed (DoS attack mitigation). So I cannot open and close connections fast enough to run into the race condition.
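A possible alternative to reproducing the race over a live connection is to force the interleaving with a mock. The sketch below is only an illustration under stated assumptions: `SketchPipe` is a hypothetical stand-in that follows the logic discussed in this PR, not paramiko's actual class, and the test name is made up.

```python
import errno
import os
from unittest import mock


class SketchPipe:
    """Hypothetical stand-in for the POSIX pipe class, not paramiko's code."""

    def __init__(self):
        self._rfd, self._wfd = os.pipe()
        self._closed = False

    def close(self):
        self._closed = True  # flag first, then release the descriptors
        for fd in (self._rfd, self._wfd):
            try:
                os.close(fd)
            except OSError:
                pass

    def set(self):
        if self._closed:
            return
        try:
            os.write(self._wfd, b"*")
        except OSError as e:
            if e.errno == errno.EBADF and self._closed:
                return  # lost the race against close(); nothing to signal
            raise


def test_set_survives_concurrent_close():
    pipe = SketchPipe()
    real_write = os.write

    def write_after_close(fd, data):
        # Force the interleaving: close() lands after the _closed check
        # in set() but before the real write happens.
        pipe.close()
        return real_write(fd, data)

    with mock.patch("os.write", side_effect=write_after_close):
        pipe.set()  # must not raise
    assert pipe._closed
```

Because the close is triggered from inside the patched `os.write`, the ordering is deterministic and no timing loop is needed. Whether this is close enough to the real threading behaviour is a judgment call.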
I fixed the review issues. I do not know what to do about the changelog.
Minor tweak to the PR, and then I think it's ready to flag for review by bitprophet. It still needs a CHANGELOG entry - you said you're not sure what to do about that... take a look at
Co-authored-by: Brian Skinn <brian.skinn@gmail.com>
Gonna give it a try today
Dear @bskinn @jun66j5 @bitprophet, what else is needed for the merge? Thanks!
Hi all again! Please review the PR when possible
# "best effort" approach
try:
    os.close(self._rfd)
except Exception:
Why is `OSError` not used rather than `Exception`?
In this case we want to be as broad as possible in the try/except. If any exception is raised during the closing of the rd file descriptor, we still want to guarantee the closing of the wr file descriptor. Catching only one exception type could be too narrow and leave the pipe in an invalid state (one descriptor open and the other one closed). But I won't be able to test my changes on Windows
In this case we want to be as broad as possible in the try/except.
I'm unsure about your comment. What I'm saying is:
try:
    os.close(self._rfd)
except Exception:
    pass
try:
    os.close(self._wfd)
except Exception:
    pass
should be
try:
    os.close(self._rfd)
except OSError:
    pass
try:
    os.close(self._wfd)
except OSError:
    pass
Yes, the problem with catching `OSError` is that if `os.close()` raises a different exception for whatever unknown reason, the pipe could go into an invalid state and potentially leak open file descriptors.
try:
    os.close(self._rfd)  # imagine this throws RuntimeError
except OSError:
    pass
try:
    os.close(self._wfd)  # in that case, the _wfd would not be closed
except OSError:
    pass
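The difference between the two `except` clauses can be made concrete with an injected close function. This is a sketch under assumptions: `make_flaky_close` and `close_pair` are hypothetical helpers, no real descriptors are touched, and the `RuntimeError` is purely simulated (`os.close()` is documented to raise `OSError`).

```python
def make_flaky_close(log):
    """Return a fake close() that records each fd and fails on its first call."""

    def flaky_close(fd):
        log.append(fd)
        if len(log) == 1:
            raise RuntimeError("simulated non-OSError failure")

    return flaky_close


def close_pair(close_fn, rfd, wfd, catch):
    """Close both fds, each in its own try, swallowing only `catch`."""
    try:
        close_fn(rfd)
    except catch:
        pass
    try:
        close_fn(wfd)
    except catch:
        pass


# With `except OSError`, the RuntimeError escapes and wfd is never attempted.
narrow_log = []
try:
    close_pair(make_flaky_close(narrow_log), 10, 11, OSError)
except RuntimeError:
    pass

# With `except Exception`, the failure is swallowed and wfd is still attempted.
broad_log = []
close_pair(make_flaky_close(broad_log), 10, 11, Exception)
```

Here `narrow_log` ends up as `[10]` and `broad_log` as `[10, 11]`. The sketch only shows what each clause would do if a non-`OSError` were raised; it does not settle whether that can happen in practice.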
I'll leave these `except`s as a decision for bitprophet.
This PR is to fix issues in
TBH, I did not look into the Windows pipe since I use paramiko only on Linux. If desired, I can look into it and replicate the changes in the WindowsPipe. Still, I'd like to merge this fix asap to remove my local patches
I figure this is up to bitprophet whether he wants to merge with just the
Formatting nit.
It looks like the two failing CircleCI checks were due to an error on CircleCI's side. Likely all is ok with the test suite here.
Co-authored-by: Brian Skinn <brian.skinn@gmail.com>
Thanks, I applied your formatting suggestion. I'll await the final review. Cheers
Great, thanks! And, thanks for your patience here. bitprophet is in the middle of a dry spell in terms of his open source bandwidth, which has slowed everything down across his projects.