
Implement parallel download #945

Merged: 5 commits, Mar 14, 2016

Conversation

@raxbg commented Feb 11, 2016

This may be a completely wrong implementation, but it worked in my tests. It sped up the download from ~400 Kb/s to ~12 Mb/s.

@terrafrost (Member)

So, basically, you're doing to put() what's already been done to get(). i.e. with put(), 50 requests are sent at a time, and then 50 responses are read back.
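To make the pipelining idea concrete, here is a minimal sketch of the windowed request/response pattern being described. The function and callback names are hypothetical, not phpseclib's actual API: a batch of read requests goes out first, and responses are drained afterwards, so round trips overlap instead of alternating one request with one response.

```php
<?php
// Hypothetical sketch of pipelined reads (not phpseclib's real code):
// keep up to $window SSH_FXP_READ-style requests in flight at once.
function pipelined_read(callable $sendRead, callable $recvResponse, int $size, int $packetSize, int $window): string
{
    $content = '';
    $offset = 0;
    $outstanding = 0;

    while ($offset < $size || $outstanding > 0) {
        // Fill the window with read requests before blocking on responses.
        while ($outstanding < $window && $offset < $size) {
            $chunk = min($packetSize, $size - $offset);
            $sendRead($offset, $chunk);
            $offset += $chunk;
            $outstanding++;
        }
        // Drain one response; the window is topped up on the next pass.
        $content .= $recvResponse();
        $outstanding--;
    }

    return $content;
}
```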

Overall, I like the idea but I think the implementation is a little wonky.

For example,

                $possible_packet_sizes = array($this->max_sftp_packet, $size);
                if ($limit - $read > 0) {
                    $possible_packet_sizes[] = $limit - $read;
                }
                $packet_size = min($possible_packet_sizes);

I think this would be better:

                $packet_size = $length > 0 ? min($this->max_sftp_packet, $length - $read) : $this->max_sftp_packet;

With that $limit could be deleted as well. But I think we can do a little better than that. So there's this part:

        if ($length > 0 && $length <= $offset - $start) {
            if ($local_file === false) {
                $content = substr($content, 0, $length);
            } else {
                ftruncate($fp, $length + $res_offset);
            }
        }

With your changes the $length > 0 check is no longer necessary. I think we can probably eliminate that whole section if the $length <= $offset - $start thing is incorporated into the $packet_size assignment.
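One way to see why the truncation block becomes redundant: if each request is capped at $length - $read, the loop can never fetch more than $length bytes in total, so there is nothing left to trim with substr()/ftruncate(). A simplified, hypothetical sketch of the capped loop (not the actual get() implementation):

```php
<?php
// Simplified sketch (hypothetical, not phpseclib's actual get()):
// capping each request at $length - $read bounds the total download,
// so no substr()/ftruncate() cleanup is needed afterwards.
function capped_download(callable $readChunk, int $length, int $maxSftpPacket): string
{
    $content = '';
    $read = 0;

    while ($length < 0 || $read < $length) {
        $packet_size = $length > 0
            ? min($maxSftpPacket, $length - $read)
            : $maxSftpPacket;

        $chunk = $readChunk($read, $packet_size);
        if ($chunk === '' || $chunk === false) {
            break; // EOF
        }
        $content .= $chunk;
        $read += strlen($chunk);
    }

    return $content;
}
```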

Give me a few days to review this in more depth.

Thanks!

@raxbg (Author) commented Feb 14, 2016

So, basically, you're doing to put() what's already been done to get()

Well, it is the other way around: the modified function is get(), using the technique already present in put(). But I guess that is what you meant to say.

With that $limit could be deleted as well.

$limit is also used to prevent unnecessary requests once enough requests for the desired amount of data have already been sent.

Also, do you think it would be a good idea to add one more argument to get() for a progress callback, like in put()? It would be very convenient.

Thanks :)

@raxbg (Author) commented Feb 17, 2016

Hi, I just wanted to let you know that I made some changes, because I had issues with the previous implementation.

I changed this section a little

                $possible_packet_sizes = array($this->max_sftp_packet, $size);
                if ($limit - $read > 0) {
                    $possible_packet_sizes[] = $limit - $read;
                }
                $packet_size = min($possible_packet_sizes);

but I fear that by using $packet_size = $length > 0 ? min($this->max_sftp_packet, $length - $read) : $this->max_sftp_packet; the value of $length - $read may become negative if there is a bug somewhere :D

@terrafrost (Member)

but I fear that by using $packet_size = $length > 0 ? min($this->max_sftp_packet, $length - $read) : $this->max_sftp_packet; $length - $read may become negative if there is a bug somewhere :D

Well, $this->max_sftp_packet can't be negative because it's hard-coded. $length can't be negative either because of the $length > 0 check that my proposal has you doing. Thus the only way for $packet_size to be negative is if $read > $length, but that can't happen because your while loop condition is $length < 0 || $read < $length. So if $length < 0, then $packet_size is $this->max_sftp_packet. If $length > 0, then $read < $length, at which point $length - $read is never negative.
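That invariant can be checked mechanically. A small standalone sketch (hypothetical helper, not library code) with the same loop guard, confirming the expression never goes negative:

```php
<?php
// Standalone check of the non-negativity argument (hypothetical helper):
// under the loop guard $length < 0 || $read < $length, the packet size
// expression can never be negative.
function packet_size(int $maxSftpPacket, int $length, int $read): int
{
    // Same guard as the proposed while loop; outside it the loop
    // would already have exited.
    if (!($length < 0 || $read < $length)) {
        throw new \LogicException('loop would already have exited');
    }

    return $length > 0 ? min($maxSftpPacket, $length - $read) : $maxSftpPacket;
}
```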

            $request_id_offset = 5;// just like that, a random number

idk why you're doing this. I mean, sure, I could see some benefit if you were worried about getting responses back in a non-sequential order, but you're not checking for that. As is, it seems all you're doing is using it to keep track of how many NET_SFTP_READ packets you've sent, and if that's all you're using it for... you don't need $this->request_id to keep track of that. And if you're not using $this->request_id you can do away with the frequent resetting of it (eg. $this->request_id = 1;) and just have it start from 0 instead of 5.

        while (!$break_loop) {

Why not do while (true) {? Then instead of $break_loop = true;, later, you can do break;.
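For illustration, the two equivalent loop shapes side by side (schematic toy functions, not the actual diff):

```php
<?php
// Schematic comparison (not phpseclib's actual code): a control
// variable versus a plain break. Both drain a response queue.

// With a flag:
function drain_with_flag(array $responses): int
{
    $count = 0;
    $break_loop = false;
    while (!$break_loop) {
        $response = array_shift($responses);
        if ($response === null) {
            $break_loop = true; // exit on next condition check
        } else {
            $count++;
        }
    }
    return $count;
}

// With break; same behavior, one less variable:
function drain_with_break(array $responses): int
{
    $count = 0;
    while (true) {
        $response = array_shift($responses);
        if ($response === null) {
            break; // exit immediately
        }
        $count++;
    }
    return $count;
}
```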

@raxbg (Author) commented Feb 23, 2016

$read > $length is exactly what I was afraid may happen due to a bug in my implementation. I will change it and give it a try today.

Regarding the $request_id_offset - I am not very familiar with the SSH and SFTP protocols. I scanned the RFC to get a basic idea of how the library works and what was implemented where, but I did not find information about when different request_ids should be sent. Initially I planned to check the request_id on each response and buffer it if it was not sequential, but then it turned out the library does not care about the request_id in the responses. I was still not sure everything would work well if I got rid of the $request_id_offset, so I left it there. I will remove it and test today. However, do you think it would be a good idea to set the request_id properly when reading the responses? It looks like this may come in very handy in HHVM. I imagine downloading multiple files that way 😄

Why not do while (true) {? Then instead of $break_loop = true;, later, you can do break;.

No particular reason; both approaches look the same to me. If I do while (true) { I will have to add an if statement after the while loop that reads the responses, in order to check whether an error happened. I just decided to go the other way, with a control variable.

I'll make the suggested changes today. Thanks for the help!

@raxbg (Author) commented Feb 24, 2016

Hi, I just wanted to let you know that there is a memory leak when transferring thousands of small files. I think I fixed it and I am now testing. If everything goes well I will push one more commit later today or tomorrow.

@raxbg (Author) commented Feb 25, 2016

I pushed the changes. The memory leak seems fixed now.

@terrafrost terrafrost merged commit 75d4c3b into phpseclib:master Mar 14, 2016
@terrafrost (Member)

I apologize for the tardiness of this response - I've been busy moving.

Anyway, this has been merged. I did make a few tweaks.

  • I renamed $subtemp to $tempoffset, which I think is a more accurate name
  • I renamed $break_loop to $clear_responses (same reason as above)
  • I took the while loop out of the if ($i > 0) check (which I deleted) and added an if (!$i) break; check before the while loop.

Thanks!

terrafrost added a commit to terrafrost/phpseclib that referenced this pull request Aug 28, 2023
the code that's being removed has its origins in 65193d9. in that commit the packet length is set outside of the while loop.

this would continue to be the case until phpseclib#945.