Fix a bug where downloading files is sometimes very slow #44
Conversation
Note that if you prefer as small a patch as possible, the following patch is enough to fix this issue:
diff --git a/lib/Net/HTTP/Methods.pm b/lib/Net/HTTP/Methods.pm
index 7b9aee8..e890b6e 100644
--- a/lib/Net/HTTP/Methods.pm
+++ b/lib/Net/HTTP/Methods.pm
@@ -273,9 +273,7 @@ sub my_readline {
my $bytes_read = $self->sysread($_, 1024, length);
if(defined $bytes_read) {
$new_bytes += $bytes_read;
- last if $bytes_read < 1024;
- # We got exactly 1024 bytes, so we need to select() to know if there is more data
- last unless $self->can_read(0);
+ last;
}
elsif($!{EINTR} || $!{EAGAIN} || $!{EWOULDBLOCK}) {
redo READ;
This looks good to me.
@oalders Thank you for reviewing this.
Looks good to me. I also asked for some other eyes and they agreed everything looks good.
I'll probably release a new Net::HTTP on Monday morning.
I have run into the length modulo 1024 problem in Net::HTTP too. I don't know if it's in this code path or one of the others that have already been fixed, but I would love to see this fix in a released version. Thanks for finding and fixing it!
@jonjensen this was pushed out about a week ago. Thanks for confirming about the bug. |
@oalders Sorry I missed that fact. Thanks! |
I'm using LWP::UserAgent (and Net::HTTP) for downloading large files.
Then I noticed that downloading a file sometimes took a long time and used a lot of memory.
How to reproduce this issue
I assume you use Linux.
(1) Prepare a 100MB file, and start an HTTP server:
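The original commands were not preserved in this page; a minimal reconstruction (the file name file.txt, port 8000, and the use of Python's built-in server are assumptions — any static file server will do):

```shell
# Create a 100MB file to serve (name assumed to match the download script)
dd if=/dev/zero of=file.txt bs=1M count=100

# Serve the current directory over HTTP on port 8000 (assumed port)
python3 -m http.server 8000
```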
(2) Prepare the following script, which downloads file.txt to out.txt:
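The script itself was not captured in this page; a minimal sketch of what download.pl plausibly looked like (the URL, port, and the use of ':content_file' to stream to disk are assumptions — the issue only states that the script uses LWP::UserAgent to save file.txt as out.txt):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch file.txt from the local test server and write it to out.txt.
# The ':content_file' option makes LWP stream the response body to disk
# instead of holding it in memory.
my $ua  = LWP::UserAgent->new;
my $res = $ua->get(
    'http://127.0.0.1:8000/file.txt',
    ':content_file' => 'out.txt',
);
die "download failed: " . $res->status_line unless $res->is_success;
```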
(3) Execute download.pl under the time and strace commands several times.
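For example (the exact strace invocation is an assumption; filtering to read(2) keeps the trace readable):

```shell
# Run a few times; compare the wall-clock time and the size of the read()s
time strace -f -e trace=read -o trace.log perl download.pl
```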
Then you will see one of two cases.
In case (a), you'll see
read(3, ...4096) = 4096
repeatedly, which is expected, and the download takes 6~7 seconds.
In case (b), you'll see
read(3, ...1024) = 1024
repeatedly, which is unexpected; the download takes more than 20 seconds and uses a lot of memory.
What is the problem?
Net::HTTP::Methods uses my_readline() to read data from the socket when it expects "line" data, such as the HTTP status line and HTTP header lines, and it keeps that data in memory.
In the current implementation of my_readline(), if
$sock->sysread($buf, 1024)
returns exactly 1024, it retries the sysread() again. I don't see any reason to retry sysread() here, and if we retry, the entire response is accumulated in memory.
Note that we check whether we have "line" data or not at https://github.com/libwww-perl/Net-HTTP/blob/master/lib/Net/HTTP/Methods.pm#L258-L259
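To make the effect concrete, here is a small self-contained simulation (not the real Net::HTTP code; an in-memory string stands in for the socket): a read loop that retries whenever it gets a full 1024-byte chunk ends up buffering the entire response before the caller's line check ever runs.

```perl
use strict;
use warnings;

# Simulated socket: returns up to $n bytes from an in-memory stream.
my $stream = "HTTP/1.1 200 OK\r\n\r\n" . ("x" x 100_000);  # headers + body
my $pos = 0;
sub fake_sysread {
    my ($buf_ref, $n) = @_;
    my $chunk = substr($stream, $pos, $n);
    $pos += length $chunk;
    $$buf_ref .= $chunk;
    return length $chunk;
}

# Buggy strategy: retry whenever we read a full 1024 bytes.
my $buf = "";
while (1) {
    my $got = fake_sysread(\$buf, 1024);
    last if $got < 1024;   # only a short read stops the loop
}
printf "buffered %d bytes before checking for a line\n", length $buf;
# prints: buffered 100019 bytes before checking for a line
```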
How to fix the problem
Do not retry sysread() when $sock->sysread($buf, 1024) returns exactly 1024.
See also
Commits related to my_readline()
8311cd1
01f7937