Partial chunks passed to content_cb #260

Open
buttafoo opened this issue Jul 26, 2017 · 0 comments

buttafoo commented Jul 26, 2017

Using libwww-perl 6.26 and Net-HTTP 6.16 with the following code:

```perl
#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(keep_alive => 1);
### This doesn't work either:
### $ua->add_handler('response_data' => \&stream_catcher_handler);

$ua->get('http://127.0.0.1:1081/', ':content_cb' => \&stream_catcher);

### This doesn't work either:
### $ua->get('https://jigsaw.w3.org/HTTP/ChunkedScript', ':content_cb' => \&stream_catcher);

# :content_cb is called with (data, response, protocol) for each piece of the body
sub stream_catcher
{
  my ($chunk, $response, $protocol) = @_;
  my $len = length($chunk);
  print "Stream CB Caught [$len]: $chunk\n";
  return 1;
}
```

...against the following fake HTTP server:

```sh
perl -e '$|=1; print "HTTP/1.0 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n" ;foreach $i (0 .. 5) { $s = "[$i " . ("x" x 8000) . " $i]"; printf("%x", length($s)); print "\r\n$s\r\n"; sleep 2}; print "0\r\n\r\n"' | nc -l 1081
```

Or against this site: https://jigsaw.w3.org/HTTP/ChunkedScript
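
For reference, the fake server frames each payload as a hex length line, the payload and a CRLF, and ends with a zero-length chunk. The sketch below rebuilds the same body in memory (without the sleeps) and splits it at the chunk boundaries. `build_chunked_body` and `decode_chunks` are hypothetical helpers written only for this illustration; the decoder discards chunk extensions and does not parse trailers. It shows that every chunk the server sends is 8006 bytes (hex 1f46), so a callback invoked once per chunk would see six 8006-byte pieces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rebuild the body the fake nc server writes (without the sleeps):
# six chunks of "[$i " . ("x" x 8000) . " $i]", then the zero-size last chunk.
sub build_chunked_body {
    my $body = '';
    for my $i (0 .. 5) {
        my $s = "[$i " . ("x" x 8000) . " $i]";
        $body .= sprintf("%x\r\n%s\r\n", length($s), $s);
    }
    return $body . "0\r\n\r\n";
}

# Minimal chunked decoder for this illustration: returns the chunk payloads.
# It discards chunk extensions and does not parse trailers.
sub decode_chunks {
    my ($wire) = @_;
    my @chunks;
    while ($wire =~ s/\A([0-9a-fA-F]+)[^\r\n]*\r\n//) {
        my $size = hex($1);
        last if $size == 0;                        # last-chunk marker
        push @chunks, substr($wire, 0, $size, '');
        $wire =~ s/\A\r\n// or die "missing CRLF after chunk data\n";
    }
    return @chunks;
}

my @chunks = decode_chunks(build_chunked_body());
printf "%d chunks, sizes: %s\n", scalar(@chunks), join(',', map { length($_) } @chunks);
# prints "6 chunks, sizes: 8006,8006,8006,8006,8006,8006"
```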

My callback receives partial chunks. If I `use LWP::Protocol::Net::Curl`, it works flawlessly. Here is an example of the chunks being split:

Stream CB Caught [810]: [0 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Stream CB Caught [7195]: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ......

I believe this is an interaction between LWP::Protocol::http and Net::HTTP::Methods. It comes down to three functions in Net/HTTP/Methods.pm: my_read(), my_readline() and read_entity_body(). LWP::Protocol::collect() calls read_entity_body(), which for a chunked body first calls my_readline() to read the chunk-size line and then calls my_read() to read the chunk data.

First, my_readline() reads from the socket up to 1024 bytes at a time and leaves whatever follows the returned line in ${*$self}{'http_buf'}. my_read(), however, reads either from http_buf (if it is non-empty) or from the socket, but never both in the same call. LWP::Protocol::collect() keeps calling read_entity_body() until it reaches the end of the chunk, and every call to read_entity_body() results in a call to my stream_catcher() callback.

The upshot is that my callback is called several times for each chunk. If I set :read_size_hint to a large value (64K) and have the REST route return 16K lines, my callback is called twice: once with ~1024 bytes read from the 1024-byte buffer, and again with the remaining ~15K read from the socket. With a smaller :read_size_hint, the callback is called even more times.
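
To make the either/or behaviour concrete, here is a self-contained model with no sockets. `sock_read`, `read_either`, `read_both` and `callback_pieces` are hypothetical stand-ins rather than Net::HTTP code, and the 1000 bytes of read-ahead are an assumed figure for "~1024 bytes minus the chunk-size line". It replays the 16K-chunk, 64K-hint case: the either/or reader needs two reads (so two callbacks) for one chunk, while draining the buffer and then topping up from the socket delivers it in one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical model of the read path (not the real Net::HTTP code):
#   $buf  stands in for ${*$self}{'http_buf'}, bytes my_readline() read ahead
#   $sock stands in for the socket; a read returns up to $n of its bytes
sub sock_read {
    my ($sock, $n) = @_;
    return substr($$sock, 0, $n, '');
}

# Either/or: serve from the buffer if it is non-empty, otherwise the socket.
sub read_either {
    my ($buf, $sock, $len) = @_;
    return length($$buf) ? substr($$buf, 0, $len, '') : sock_read($sock, $len);
}

# Buffer then socket: drain the buffer, then top up from the socket.
sub read_both {
    my ($buf, $sock, $len) = @_;
    my $data = substr($$buf, 0, $len, '');
    $data .= sock_read($sock, $len - length($data)) if length($data) < $len;
    return $data;
}

# Read one chunk the way read_entity_body() is called in a loop: each read
# asks for min(size hint, bytes left in the chunk), and each read would
# trigger one :content_cb call. Returns the size of every piece.
sub callback_pieces {
    my ($reader, $chunk_len, $ahead, $hint) = @_;
    my $sock = 'x' x $chunk_len;
    my $buf  = substr($sock, 0, $ahead, '');   # already in the read-ahead buffer
    my @pieces;
    my $left = $chunk_len;
    while ($left > 0) {
        my $n = $left < $hint ? $left : $hint;
        my $piece = $reader->(\$buf, \$sock, $n);
        last unless length($piece);            # no more data: stop
        push @pieces, length($piece);
        $left -= length($piece);
    }
    return @pieces;
}

# 16K chunk, 1000 bytes assumed already buffered, 64K :read_size_hint
my @either = callback_pieces(\&read_either, 16384, 1000, 65536);
my @both   = callback_pieces(\&read_both,   16384, 1000, 65536);
print "either/or: @either\n";    # prints "either/or: 1000 15384"
print "both:      @both\n";      # prints "both:      16384"
```

In a real connection the top-up read returns only the bytes that have already arrived, so a slow sender can still split a chunk even with the buffer-then-socket approach.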

I also suspect that this line in LWP::Protocol::http, which reads the response headers, is what makes the first piece of the initial chunk short:

```perl
my $n = $socket->sysread($buf, 1024, length($buf));
```

This replacement for my_read() in Net::HTTP::Methods seems to fix it, though I don't know whether it breaks other cases:

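```perl
sub my_read
{
  die if @_ > 3;
  my $self = shift;
  my $len  = $_[1];              # maximum number of bytes the caller wants
  my $rlen = $len;               # bytes still wanted after draining http_buf
  my $bn   = 0;                  # bytes taken from http_buf
  for (${*$self}{'http_buf'}) {  # alias $_ to the read-ahead buffer
    # Start with whatever my_readline() already pulled off the socket
    # (overwriting $_[0], as the original my_read() and sysread() do).
    $_[0] = substr($_, 0, $len, '');
    $bn = length($_[0]);
    $rlen -= $bn;
    if ($rlen > 0) {
      # Top up from the socket so one call can return buffer + socket data.
      my $results = '';
      my $sn = $self->sysread($results, $rlen);
      if (!defined $sn) {
        # Read error: still report the bytes already copied from the buffer,
        # otherwise their length would be lost.
        return $bn ? $bn : undef;
      }
      $_[0] .= $results;
      return $bn + $sn;
    }
    return $bn;
  }
}
```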
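
Independently of any change to Net::HTTP, a caller that needs whole records can buffer inside the callback instead of relying on how the body is split. This is a hedged sketch, not part of LWP: it adapts `stream_catcher` to accumulate pieces, assumes each record ends with `]` (as the fake server's `[$i ... $i]` payloads do), and calls the callback directly with two pieces instead of going through `$ua->get`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical caller-side workaround, not part of LWP: accumulate whatever
# pieces :content_cb receives and act only on complete records. A record is
# ASSUMED to end with "]", as the fake server's "[$i ... $i]" payloads do.
my $pending = '';
my @records;

sub stream_catcher
{
  my ($piece, $response, $protocol) = @_;
  $pending .= $piece;
  # Peel off every complete record; keep a trailing partial one for later.
  while ($pending =~ s/\A(\[[^\]]*\])//) {
    push @records, $1;
  }
  return 1;
}

# Feed one record split at the same point as the first piece in the log (810 bytes).
my $record = "[0 " . ("x" x 8000) . " 0]";
stream_catcher(substr($record, 0, 810));
stream_catcher(substr($record, 810));
printf "%d record(s), length %d\n", scalar(@records), length($records[0]);
# prints "1 record(s), length 8006"
```

The same buffering gives identical results whether the library delivers a chunk in one call or in several.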