SFTP uploads randomly fail #1851
Comments
Could you do |
Thanks for that! We've just released a change to log the result of this method, but unfortunately it only shows one error, that was also in the SSH logs. This is the same for both successful and failed uploads:
|
So I feel like this code snippet (SFTP::partial_init_sftp_connection) is prob being hit:
$this->channel_status[self::CHANNEL] = NET_SSH2_MSG_CHANNEL_OPEN;
$response = $this->get_channel_packet(self::CHANNEL, true);
if ($response === true && $this->isTimeout()) {
    return false;
}
In 3.0 every public function calls precheck():
private function precheck()
{
    if (!($this->bitmap & SSH2::MASK_LOGIN)) {
        return false;
    }
    if ($this->pwd === false) {
        return $this->init_sftp_connection();
    }
    return true;
}
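To make the interaction between those two snippets concrete, here is a toy model. It is only a sketch: `ToySftp`, `initSftpConnection`, `channelOpensSent` and the queue of simulated timeouts are all hypothetical names and are not phpseclib's actual classes or internals. It assumes that a timed-out init leaves `$pwd` as `false`, so the next public call runs the init again and sends another channel open.

```php
<?php
// Toy model only: ToySftp is hypothetical and is NOT phpseclib's SFTP class.
final class ToySftp
{
    /** @var string|false Working directory; stays false until an init succeeds. */
    private $pwd = false;

    /** @var int Number of simulated NET_SSH2_MSG_CHANNEL_OPEN packets sent. */
    public $channelOpensSent = 0;

    /** @var bool[] Queue of simulated "did this init attempt time out?" outcomes. */
    private $timeouts;

    public function __construct(array $timeouts)
    {
        $this->timeouts = $timeouts;
    }

    private function initSftpConnection(): bool
    {
        $this->channelOpensSent++;                  // the channel open goes out first
        $timedOut = array_shift($this->timeouts) ?? false;
        if ($timedOut) {
            return false;                           // $pwd is left as false
        }
        $this->pwd = '/';
        return true;
    }

    private function precheck(): bool
    {
        if ($this->pwd === false) {
            return $this->initSftpConnection();     // re-runs after every failed init
        }
        return true;
    }

    public function mkdir(string $dir): bool
    {
        return $this->precheck();                   // the real work is omitted
    }
}

// The first two init attempts are assumed to time out, the third succeeds.
$sftp = new ToySftp([true, true, false]);
$results = [$sftp->mkdir('a'), $sftp->mkdir('a'), $sftp->mkdir('a'), $sftp->mkdir('a')];
```

In this model four calls produce three channel opens but only one successful init, which is the same shape as more opens than confirmations in a log.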
So that'd explain why you're seeing multiple NET_SSH2_MSG_CHANNEL_OPENs / NET_SSH2_MSG_CHANNEL_OPEN_CONFIRMATIONs in your log. Now, the question is... why would the timeout flag be set? A few possibilities exist (since the log that you posted doesn't show any overwhelmingly slow packets):
Anyway, to confirm this, I guess the first thing I'd do is Assuming that that outputs yes I'd prob modify |
Thanks for your input, this is some very valuable information! Since this only happened after the 3.0 update, the precheck could be interesting indeed. Regarding the two possibilities you mentioned:
I trust that Flysystem correctly applies the configuration we give it, so the timeout should be 20 seconds. Given the short times in the logs, I think that should be plenty
This is an interesting one and could definitely be the case, as the Laravel service container apparently shares its bound instances across all jobs running in the same queue through Horizon. I'm not sure if the storage connection is actually stored in that service container as well, but I'll look into that right away as well as logging the |
Back to say that you are right! It does say that there's a timeout whenever this fails |
Try https://github.com/terrafrost/phpseclib/tree/timeout-debug and lmk what you get as the output. Here's the specific commit that's new to that branch: |
Got this log from a failed upload:
And this from a successful one, 30 seconds later:
Other than the latter obviously having a lot more |
I apologize for the delay. Anyway, from what you posted, it doesn't seem like there should be a timeout. Maybe I added another commit to that same branch with more debug code. Thanks! |
Ah it does seem like it. We are now indeed getting the stream_select error messages as described in the code comment. For example:
It is now also no longer saying there's a timeout. These are the full logs again, though they still seem the same: Failed upload
Successful upload
Thanks again! |
Hi @daanhaitsma, the same issue is happening to me. Could you let me know if you found a solution to this problem? |
@mozex - did you try the same debugging branch that the original poster tried?: https://github.com/terrafrost/phpseclib/tree/timeout-debug If not then if you could try it and lmk what the output is that'd be great. If you are getting the same error as the OP was (stream_select(): Unable to select [4]: Interrupted system call (max_fd=29)) then that would suggest that you're getting a bunch of interrupted system calls. It'd be interesting to know what was causing all those interruptions to happen. I could look into implementing a workaround, as well, but (1) the timeout functionality prob wouldn't work, (2) all the interrupted system calls could have other hard to predict consequences as well and (3) as I've never been able to reproduce this problem it'd be hard for me to actually test anything. Because of (3), SSH access to a server that reproduces the problem would help (if the issue is indeed with stream_select, as it was for the OP) |
Assuming the issue is due to
So 4 is the errno, which in turn, I guess, corresponds to EINTR. Quoting https://stackoverflow.com/a/14262151/569976 :
One thing that's unclear to me is if the time elapsed will still be updated. Also, how frequently are you getting these interrupts? Either way, it'd be cool to have access to a system that reproduces this. And I don't mean SFTP access - I mean SSH2 access. Like I need to be able to run phpseclib from that machine. Thanks! |
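For what it's worth, a workaround along these lines could look roughly like the sketch below. It is not necessarily what any phpseclib branch does; `waitReadable` and `$maxInterrupts` are hypothetical names. It assumes `stream_select()` returns `false` when the underlying call is interrupted, and it tracks its own deadline so it doesn't depend on whether the elapsed time gets updated after an interrupt.

```php
<?php
/**
 * Hypothetical helper (not part of phpseclib): wait until $stream is readable,
 * retrying when stream_select() returns false. Note that false is also
 * returned for other errors, hence the cap on retries.
 *
 * @param resource $stream
 * @return bool true if the stream became readable, false if the deadline passed
 */
function waitReadable($stream, float $timeout, int $maxInterrupts = 100): bool
{
    $deadline = microtime(true) + $timeout;
    $interrupts = 0;

    while (true) {
        // Recompute the remaining time ourselves on every attempt.
        $remaining = $deadline - microtime(true);
        if ($remaining <= 0) {
            return false;
        }
        $sec = (int) floor($remaining);
        $usec = (int) (($remaining - $sec) * 1000000);

        $read = [$stream];
        $write = $except = null;
        // @ suppresses the "Interrupted system call" warning on failure.
        $ready = @stream_select($read, $write, $except, $sec, $usec);

        if ($ready === false) {
            if (++$interrupts > $maxInterrupts) {
                throw new RuntimeException('stream_select() kept failing');
            }
            continue; // retry with whatever time is left
        }
        return $ready > 0; // 0 means the timeout elapsed
    }
}
```

The downside is the one already mentioned: if the interrupts are very frequent, each retry only hides the underlying cause.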
Try this branch and see if it helps: https://github.com/terrafrost/phpseclib/tree/3.0-stream-select-interrupt |
Thank you for your quick response; I've been trying to reproduce the error since then, but it has been fixed, and I don't know why! I haven't done any debugging on it, and I don't have any more information about it. I waited a couple of days to reproduce it but no luck. |
Hi @terrafrost, hope you are doing well. One of my clients also faced this issue. We tried the latest versions of V2 and V3 of phpseclib. V2 doesn't report any error and fails, while V3 reports this error: Then I applied the fix from this branch: https://github.com/terrafrost/phpseclib/tree/3.0-stream-select-interrupt in both V2 and V3 and tested those versions on the client site. Both versions with the fix worked! |
Thanks for that! I went ahead and merged it: |
We're using phpseclib in Laravel through the thephpleague/flysystem-sftp-v3 package, and after updating we have encountered a lot of errors. We have previously created an issue at Flysystem as well. Quick overview of the versions:
From the flysystem package, we are randomly getting an UnableToCreateDirectory exception, even though the directory has already existed for years. Now, after storing the logs from phpseclib, we believe we've found some interesting leads.
NET_SSH2_MSG_CHANNEL_OPEN 1 time, receives NET_SSH2_MSG_CHANNEL_OPEN_CONFIRMATION 1 time
NET_SSH2_MSG_CHANNEL_OPEN 5 times, but receives NET_SSH2_MSG_CHANNEL_OPEN_CONFIRMATION only 4 times
These are the parts of the logs that I think are most interesting and show a clear difference:
Now our next question is: What would cause NET_SSH2_MSG_CHANNEL_OPEN to be sent twice in a row (in the beginning, before NET_SSH2_MSG_GLOBAL_REQUEST), without receiving either NET_SSH2_MSG_CHANNEL_OPEN_CONFIRMATION or NET_SSH2_MSG_CHANNEL_OPEN_FAILURE in between?
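For comparing logs like these, a small counting helper can make the mismatch between opens and confirmations easier to spot. This is only a sketch: `countMessages` is a hypothetical name, and the sample log is invented for illustration (it is not taken from the logs in this issue and assumes the message names appear as plain text in the log).

```php
<?php
/**
 * Count how often each of the given SSH message names appears in a log.
 * The word-boundary match keeps NET_SSH2_MSG_CHANNEL_OPEN from also
 * matching NET_SSH2_MSG_CHANNEL_OPEN_CONFIRMATION, because "_" counts
 * as a word character.
 *
 * @param string[] $names
 * @return array<string,int>
 */
function countMessages(string $log, array $names): array
{
    $counts = [];
    foreach ($names as $name) {
        $pattern = '/\b' . preg_quote($name, '/') . '\b/';
        $counts[$name] = preg_match_all($pattern, $log);
    }
    return $counts;
}

// Invented sample, not taken from the logs in this issue.
$sample = "-> NET_SSH2_MSG_CHANNEL_OPEN\n"
        . "-> NET_SSH2_MSG_CHANNEL_OPEN\n"
        . "<- NET_SSH2_MSG_CHANNEL_OPEN_CONFIRMATION\n";

$counts = countMessages($sample, [
    'NET_SSH2_MSG_CHANNEL_OPEN',
    'NET_SSH2_MSG_CHANNEL_OPEN_CONFIRMATION',
]);
```

A log where the first count exceeds the second has at least one channel open that was never confirmed.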