Copied over from subversion

commit 3ac3e9485424c4c8acad65b8312428cce68282b2 0 parents
@evdb authored
2  .gitignore
@@ -0,0 +1,2 @@
+HTTP-Async-*
+Makefile
67 CHANGES
@@ -0,0 +1,67 @@
+CHANGES to HTTP::Async
+
+0.09 - Thu Sep 13 18:58:13 BST 2007
+
+ * added requirement for Pod::Coverage >= 0.19 if perl >= 5.9.0
+
+ * moved polite.t test into t/ so that it gets run by the makefile.
+
+0.08 - Wed Sep 12 22:35:33 BST 2007
+
+ * Deleted Module::Build
+
+ * Removed test in bad-hosts.t that was unreliable. I think that it was failing
+ under certain proxy configs.
+
+0.07 - Sun Feb 18 15:00:46 GMT 2007
+
+ * Added proper handling of 304 responses based on code patch and test by
+ Tomohiro Ikebe from livedoor.jp
+
+0.06 - Tue Feb 6 10:48:15 GMT 2007
+
+ * Changed the request uri that is used so that it has the host in for proxy
+ requests and does not otherwise. This is to comply with the RFC for HTTP
+ ( http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2 ).
+
+0.05 - Fri Nov 17 08:42:49 GMT 2006
+
+ * Added ability to pass arguments to new to configure the async object.
+
+0.04 - Thu Sep 28 13:42:25 BST 2006
+
+ * Fixed stupid bug that caused the polite module to crash if the numbers of
+ requests per domain were not the same.
+
+0.03 - Wed Sep 27 11:00:14 BST 2006
+
+ * Created HTTP::Async::Polite that adds limits to the scraping to avoid over
+ stretching the domain being scraped.
+
+ * Increased the delay in poll-interval tests to stop them failing on slow
+ machines.
+
+ * Added pod tests, README and Makefile.PL in an attempt to achieve kwalitee.
+
+0.02 - Wed Sep 6 09:36:01 BST 2006 - svn r30
+
+ * Changed the timeout to be an inactivity timeout and added a
+ max_request_length to limit the amount of time that a request can be
+ running for.
+
+ * Added more diagnostics to the tests to try to find the bug that is causing
+ issues for MIYAGAWA.
+
+ * Created TODO and CHANGES docs.
+
+ * Added error checking to catch connections that fail before the headers are
+ sent. (patch submitted by Egor Egorov)
+
+ * Added ability to specify proxy to use. (based on patch from Egor Egorov)
+
+ * Added 'add_with_opts' method that lets you override the default options
+ for this request.
+
+0.01 - XXXXXXXXXX - svn r24
+
+ * Initial release onto CPAN.
26 MANIFEST
@@ -0,0 +1,26 @@
+CHANGES
+diffs/connect.timeout.diff
+lib/HTTP/Async.pm
+lib/HTTP/Async/Polite.pm
+Makefile.PL
+MANIFEST This list of files
+README
+t/bad-connections.t
+t/bad-headers.t
+t/bad-hosts.t
+t/make-url-absolute.t
+t/not_modified.t
+t/pod-coverage.t
+t/pod.t
+t/polite.t
+t/poll-interval.t
+t/proxy.t
+t/real-servers.t
+t/redirects.t
+t/setup.t
+t/strip_host_from_uri.t
+t/template.t
+t/test_utils.pl
+t/TestServer.pm
+t/timeout.t
+TODO
27 Makefile.PL
@@ -0,0 +1,27 @@
+use strict;
+use warnings;
+
+use ExtUtils::MakeMaker;
+
+WriteMakefile(
+ 'NAME' => 'HTTP::Async',
+ 'VERSION_FROM' => 'lib/HTTP/Async.pm',
+ LICENSE => 'perl',
+ 'PREREQ_PM' => {
+ 'Carp' => 0,
+ 'Data::Dumper' => 0,
+ 'HTTP::Request' => 0,
+ 'HTTP::Response' => 0,
+ 'HTTP::Server::Simple::CGI' => 0,
+ 'HTTP::Status' => 0,
+ 'IO::Select' => 0,
+ 'LWP::UserAgent' => 0,
+ 'Net::HTTP' => 0,
+ 'Net::HTTP::NB' => 0,
+ 'Test::HTTP::Server::Simple' => 0,
+ 'Test::More' => 0,
+ 'Time::HiRes' => 0,
+ 'URI' => 0,
+ 'URI::Escape' => 0,
+ },
+);
21 README
@@ -0,0 +1,21 @@
+HTTP::Async
+
+This module lets you process several HTTP connections at once, in parallel and
+without blocking.
+
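+A minimal example of typical use (the URL below is only a placeholder):
+
+ use HTTP::Async;
+ use HTTP::Request;
+
+ my $async = HTTP::Async->new;
+ $async->add( HTTP::Request->new( GET => 'http://www.example.com/' ) );
+
+ # Collect the responses as they arrive.
+ while ( my $response = $async->wait_for_next_response ) {
+     print $response->request->uri, " => ", $response->code, "\n";
+ }
+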
+INSTALLATION
+
+To install you can use the following commands:
+
+ perl Makefile.PL
+ make
+ make test
+ make install
+
+
+COPYRIGHT AND LICENCE
+
+Copyright (C) 2006, Edmund von der Burg
+
+This library is free software; you can redistribute it and/or modify
+it under the same terms as Perl itself.
32 TODO
@@ -0,0 +1,32 @@
+TODOs for HTTP::Async
+
+ * Add ability to pass in a file handle that will be used to store the
+ content in. Aimed at people downloading large files that would otherwise
+ fill up the memory. Could be done so that downloads greater than a certain
+ size get sent to file rather than to memory. It is not clear how this would
+ be returned in the HTTP::Response though.
+
+ * Do what is needed to get CPANTS tests to pass / run.
+
+ * Make sending non-blocking - both the data and the initial DNS lookup.
+
+ * Integrate the changes from Egor - set alarm to catch connection timeout.
+
+ * Change max_redirects to max_redirect to be consistent with LWP::UserAgent
+
+ * Add a max_content_size that will break connections if the content is too
+ big - default is no limit.
+
+ * Switch to using Test::Class so that the tests are faster as they don't
+ spend so much time starting and stopping the test server(s).
+
+ * Change tests so that the port used is chosen so as not to conflict with a
+ port that is already in use.
+
+ * Change the user agent so that if it is not set in the request passed in then
+ it defaults to 'HTTP::Async vx.xx' or some such. Should also be possible to
+ set it in the opts or in the HTTP::Request.
+
+ * Add 'info' sub that can be linked to SIGINFO to provide a summary of what is
+ going on eg "print $async->info( 'terse' )".
+
813 lib/HTTP/Async.pm
@@ -0,0 +1,813 @@
+use strict;
+use warnings;
+
+package HTTP::Async;
+
+our $VERSION = '0.09';
+
+use Carp;
+use Data::Dumper;
+use HTTP::Response;
+use IO::Select;
+use Net::HTTP::NB;
+use Net::HTTP;
+use URI;
+use Time::HiRes qw( time sleep );
+
+=head1 NAME
+
+HTTP::Async - process multiple HTTP requests in parallel without blocking.
+
+=head1 SYNOPSIS
+
+Create an object and add some requests to it:
+
+ use HTTP::Async;
+ my $async = HTTP::Async->new;
+
+ # create some requests and add them to the queue.
+ $async->add( HTTP::Request->new( GET => 'http://www.perl.org/' ) );
+ $async->add( HTTP::Request->new( GET => 'http://www.ecclestoad.co.uk/' ) );
+
+and then EITHER process the responses as they come back:
+
+ while ( my $response = $async->wait_for_next_response ) {
+ # Do some processing with $response
+ }
+
+OR do something else if there is no response ready:
+
+ while ( $async->not_empty ) {
+ if ( my $response = $async->next_response ) {
+ # deal with $response
+ } else {
+ # do something else
+ }
+ }
+
+OR just use the async object to fetch stuff in the background and deal with
+the responses at the end.
+
+ # Do some long code...
+ for ( 1 .. 100 ) {
+ some_function();
+ $async->poke; # lets it check for incoming data.
+ }
+
+ while ( my $response = $async->wait_for_next_response ) {
+ # Do some processing with $response
+ }
+
+=head1 DESCRIPTION
+
+Although using the conventional C<LWP::UserAgent> is fast and easy it does
+have some drawbacks - the code execution blocks until the request has been
+completed and it is only possible to process one request at a time.
+C<HTTP::Async> attempts to address these limitations.
+
+It gives you an 'Async' object that you can add requests to, and then get the
+requests off as they finish. The actual sending and receiving of the requests
+is abstracted. As soon as you add a request it is transmitted; if there are
+too many requests in progress at the moment it is queued. There is no
+concept of starting or stopping - it runs continuously.
+
+Whilst it is waiting to receive data it returns control to the code that
+called it meaning that you can carry out processing whilst fetching data from
+the network. All without forking or threading - it is actually done using
+C<select> lists.
+
+=head1 Default settings:
+
+There are a number of default settings that should be suitable for most uses.
+However in some circumstances you might wish to change these.
+
+ slots: 20
+ timeout: 180 (seconds)
+ max_request_time: 300 (seconds)
+ max_redirects: 7
+ poll_interval: 0.05 (seconds)
+ proxy_host: ''
+ proxy_port: ''
+
+=head1 METHODS
+
+=head2 new
+
+ my $async = HTTP::Async->new( %args );
+
+Creates a new HTTP::Async object and sets it up. Variations from the default
+can be set by passing them in as C<%args>.
+
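+For example, to allow more simultaneous requests and use a shorter inactivity
+timeout (the values here are purely illustrative):
+
+ my $async = HTTP::Async->new(
+     slots   => 50,    # maximum number of parallel requests
+     timeout => 10,    # inactivity timeout in seconds
+ );
+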
+=cut
+
+sub new {
+ my $class = shift;
+ my $self = bless {
+
+ opts => {
+ slots => 20,
+ max_redirects => 7,
+ timeout => 180,
+ max_request_time => 300,
+ poll_interval => 0.05,
+ },
+
+ id_opts => {},
+
+ to_send => [],
+ in_progress => {},
+ to_return => [],
+
+ current_id => 0,
+ fileno_to_id => {},
+ }, $class;
+
+ $self->_init(@_);
+
+ return $self;
+}
+
+sub _init {
+ my $self = shift;
+ my %args = @_;
+ $self->_set_opt( $_ => $args{$_} ) for sort keys %args;
+ return $self;
+}
+
+sub _next_id { return ++$_[0]->{current_id} }
+
+=head2 slots, timeout, max_request_time, poll_interval, max_redirects, proxy_host and proxy_port
+
+ $old_value = $async->slots;
+ $new_value = $async->slots( $new_value );
+
+Getters/setters for the C<$async> object's config settings. The timeout is
+for inactivity and is in seconds.
+
+Slots is the maximum number of parallel requests to make.
+
+=cut
+
+my %GET_SET_KEYS = map { $_ => 1 } qw( slots poll_interval
+ timeout max_request_time max_redirects
+ proxy_host proxy_port );
+
+sub _add_get_set_key {
+ my $class = shift;
+ my $key = shift;
+ $GET_SET_KEYS{$key} = 1;
+}
+
+sub _get_opt {
+ my $self = shift;
+ my $key = shift;
+ my $id = shift;
+ die "$key not valid for _get_opt" unless $GET_SET_KEYS{$key};
+
+ # If there is an option set for this id then use that, otherwise fall back
+ # to the defaults.
+ return $self->{id_opts}{$id}{$key}
+ if $id && defined $self->{id_opts}{$id}{$key};
+
+ return $self->{opts}{$key};
+
+}
+
+sub _set_opt {
+ my $self = shift;
+ my $key = shift;
+ die "$key not valid for _set_opt" unless $GET_SET_KEYS{$key};
+ $self->{opts}{$key} = shift if @_;
+ return $self->{opts}{$key};
+}
+
+foreach my $key ( keys %GET_SET_KEYS ) {
+ eval "
+ sub $key {
+ my \$self = shift;
+ return scalar \@_
+ ? \$self->_set_opt( '$key', \@_ )
+ : \$self->_get_opt( '$key' );
+ }
+ ";
+}
+
+=head2 add
+
+ my @ids = $async->add(@requests);
+ my $first_id = $async->add(@requests);
+
+Adds requests to the queues. Each request is given a unique integer id (for
+this C<$async>) that can be used to track the requests if needed. If called in
+list context an array of ids is returned; in scalar context the id of the
+first request added is returned.
+
+=cut
+
+sub add {
+ my $self = shift;
+ my @returns = ();
+
+ foreach my $req (@_) {
+ push @returns, $self->add_with_opts( $req, {} );
+ }
+
+ return wantarray ? @returns : $returns[0];
+}
+
+=head2 add_with_opts
+
+ my $id = $async->add_with_opts( $request, \%opts );
+
+This method lets you add a single request to the queue with options that
+differ from the defaults. For example you might wish to set a longer timeout
+or to use a specific proxy. Returns the id of the request.
+
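+For example (an illustrative call - the proxy address and timeout values are
+arbitrary):
+
+ my $id = $async->add_with_opts(
+     HTTP::Request->new( GET => 'http://www.example.com/' ),
+     {
+         timeout    => 300,           # longer inactivity timeout
+         proxy_host => '127.0.0.1',   # send this request via a proxy
+         proxy_port => 8080,
+     },
+ );
+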
+=cut
+
+sub add_with_opts {
+ my $self = shift;
+ my $req = shift;
+ my $opts = shift;
+ my $id = $self->_next_id;
+
+ push @{ $$self{to_send} }, [ $req, $id ];
+ $self->{id_opts}{$id} = $opts;
+ $self->poke;
+
+ return $id;
+}
+
+=head2 poke
+
+ $async->poke;
+
+At fairly frequent intervals some housekeeping needs to be performed - such as
+reading received data and starting new requests. Calling C<poke> lets the
+object do this and then return quickly. Usually you will not need to use this
+as most other methods do it for you.
+
+You should use C<poke> if your code is spending time elsewhere (ie not using
+the async object) to allow it to keep the data flowing over the network. If it
+is not used then the buffers may fill up and completed responses will not be
+replaced with new requests.
+
+=cut
+
+sub poke {
+ my $self = shift;
+
+ $self->_process_in_progress;
+ $self->_process_to_send;
+
+ return 1;
+}
+
+=head2 next_response
+
+ my $response = $async->next_response;
+ my ( $response, $id ) = $async->next_response;
+
+Returns the next response (as a L<HTTP::Response> object) that is waiting, or
+returns undef if there is none. In list context it returns a (response, id)
+pair, or an empty list if none. Does not wait for a response so returns very
+quickly.
+
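+For example, a sketch of a non-blocking polling loop that also uses the id:
+
+ while ( $async->not_empty ) {
+     if ( my ( $response, $id ) = $async->next_response ) {
+         # handle $response for the request with this $id
+     }
+     else {
+         # nothing ready yet - do some other work here
+     }
+ }
+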
+=cut
+
+sub next_response {
+ my $self = shift;
+ return $self->_next_response(0);
+}
+
+=head2 wait_for_next_response
+
+ my $response = $async->wait_for_next_response( 3.5 );
+ my ( $response, $id ) = $async->wait_for_next_response( 3.5 );
+
+As C<next_response> but only returns if there is a next response or the time
+in seconds passed in has elapsed. If no time is given then it waits for up to
+C<max_request_time> seconds. Whilst waiting it checks the queues every
+C<poll_interval> seconds. The times can be fractional seconds.
+
+=cut
+
+sub wait_for_next_response {
+ my $self = shift;
+ my $wait_for = shift;
+
+ $wait_for = $self->max_request_time
+ if !defined $wait_for;
+
+ return $self->_next_response($wait_for);
+}
+
+sub _next_response {
+ my $self = shift;
+ my $wait_for = shift || 0;
+ my $end_time = time + $wait_for;
+ my $resp_and_id = undef;
+
+ while ( !$self->empty ) {
+ $resp_and_id = shift @{ $$self{to_return} };
+
+ # last if we have a response or we have run out of time.
+ last
+ if $resp_and_id
+ || time > $end_time;
+
+ # sleep for the default sleep time.
+ # warn "sleeping for " . $self->poll_interval;
+ sleep $self->poll_interval;
+ }
+
+ # If there is no result return false.
+ return unless $resp_and_id;
+
+ # We have a response - delete the options for it from the store.
+ delete $self->{id_opts}{ $resp_and_id->[1] };
+
+ # If we have a result return list or response depending on
+ # context.
+ return wantarray
+ ? @$resp_and_id
+ : $resp_and_id->[0];
+}
+
+=head2 to_send_count, to_return_count, in_progress_count and total_count
+
+ my $pending = $async->to_send_count;
+
+Returns the number of items in the various stages of processing.
+
+=cut
+
+sub to_send_count { my $s = shift; $s->poke; scalar @{ $$s{to_send} }; }
+sub to_return_count { my $s = shift; $s->poke; scalar @{ $$s{to_return} }; }
+
+sub in_progress_count {
+ my $s = shift;
+ $s->poke;
+ scalar keys %{ $$s{in_progress} };
+}
+
+sub total_count {
+ my $self = shift;
+
+ my $count = 0 #
+ + $self->to_send_count #
+ + $self->in_progress_count #
+ + $self->to_return_count;
+
+ return $count;
+}
+
+=head2 info
+
+ print $async->info;
+
+Returns a line of text describing the current state.
+
+=cut
+
+sub info {
+ my $self = shift;
+
+ return sprintf(
+ "HTTP::Async status: %4u,%4u,%4u (send, progress, return)\n",
+ $self->to_send_count, #
+ $self->in_progress_count, #
+ $self->to_return_count
+ );
+}
+
+=head2 empty, not_empty
+
+ while ( $async->not_empty ) { ...; }
+ while (1) { ...; last if $async->empty; }
+
+Returns true or false depending on whether there are requests or responses
+still on the object.
+
+=cut
+
+sub empty {
+ my $self = shift;
+ return $self->total_count ? 0 : 1;
+}
+
+sub not_empty {
+ my $self = shift;
+ return !$self->empty;
+}
+
+=head2 DESTROY
+
+The destroy method carps (warns) if an object is destroyed but is not empty.
+This is to help with debugging.
+
+=cut
+
+sub DESTROY {
+ my $self = shift;
+ my $class = ref $self;
+
+ carp "$class object destroyed but still in use"
+ if $self->total_count;
+
+ carp "$class INTERNAL ERROR: 'id_opts' not empty"
+ if scalar keys %{ $self->{id_opts} };
+
+ return;
+}
+
+# Go through all the values on the select list and check to see if
+# they have been fully received yet.
+
+sub _process_in_progress {
+ my $self = shift;
+ my %seen_ids = ();
+
+ HANDLE:
+ foreach my $s ( $self->_io_select->can_read(0) ) {
+
+ # Get the id and add it to the hash of seen ids so we don't check it
+ # later for errors.
+ my $id = $self->{fileno_to_id}{ $s->fileno }
+ || die "INTERNAL ERROR: could not got id for fileno";
+ $seen_ids{$id}++;
+
+ my $hashref = $$self{in_progress}{$id};
+ my $tmp = $hashref->{tmp} ||= {};
+
+ # warn Dumper $hashref;
+
+ # Check that we have not timed-out.
+ if ( time > $hashref->{timeout_at}
+ || time > $hashref->{finish_by} )
+ {
+
+ # warn sprintf "Timeout: %.3f > %.3f", #
+ # time, $hashref->{timeout_at};
+
+ $self->_add_error_response_to_return(
+ id => $id,
+ code => 504,
+ request => $hashref->{request},
+ previous => $hashref->{previous},
+ content => 'Timed out',
+ );
+
+ $self->_io_select->remove($s);
+ delete $$self{fileno_to_id}{ $s->fileno };
+ next HANDLE;
+ }
+
+ # If there is a code then read the body.
+ if ( $$tmp{code} ) {
+ my $buf;
+ my $n = $s->read_entity_body( $buf, 1024 * 16 ); # 16kB
+ $$tmp{is_complete} = 1 unless $n;
+ $$tmp{content} .= $buf;
+
+ # warn "Received " . length( $buf ) ;
+
+ # warn $buf;
+ }
+
+ # If no code try to read the headers.
+ else {
+ $s->flush;
+
+ my ( $code, $message, %headers );
+
+ eval {
+ ( $code, $message, %headers ) =
+ $s->read_response_headers( laxed => 1, junk_out => [] );
+ };
+
+ if ($@) {
+ $self->_add_error_response_to_return(
+ 'code' => 504,
+ 'content' => $@,
+ 'id' => $id,
+ 'request' => $hashref->{request},
+ 'previous' => $hashref->{previous}
+ );
+ $self->_io_select->remove($s);
+ delete $$self{fileno_to_id}{ $s->fileno };
+ next HANDLE;
+ }
+
+ if ($code) {
+
+ # warn "Got headers: $code $message " . time;
+
+ $$tmp{code} = $code;
+ $$tmp{message} = $message;
+ my @headers_array = map { $_, $headers{$_} } keys %headers;
+ $$tmp{headers} = \@headers_array;
+
+ }
+ }
+
+ # Reset the timeout.
+ $hashref->{timeout_at} = time + $self->_get_opt( 'timeout', $id );
+ # warn "recieved - timeout set to '$hashref->{timeout_at}'";
+
+ # If the message is complete then create a response and add it
+ # to 'to_return'.
+ if ( $$tmp{is_complete} ) {
+ delete $$self{fileno_to_id}{ $s->fileno };
+ $self->_io_select->remove($s);
+
+ # warn Dumper $$hashref{content};
+
+ my $response = HTTP::Response->new(
+ @$tmp{ 'code', 'message', 'headers', 'content' } );
+
+ $response->request( $hashref->{request} );
+ $response->previous( $hashref->{previous} ) if $hashref->{previous};
+
+ # If it was a redirect and there are still redirects left
+ # create a new request for the new location and send it
+ # straight away.
+ if (
+ $response->is_redirect # is a redirect
+ && $hashref->{redirects_left} > 0 # and we still want to follow
+ && $response->code != 304 # not a 'not modified' reponse
+ )
+ {
+
+ $hashref->{redirects_left}--;
+
+ my $loc = $response->header('Location');
+ my $uri = $response->request->uri;
+
+ warn "Problem: " . Dumper( { loc => $loc, uri => $uri } )
+ unless $uri && ref $uri && $loc && !ref $loc;
+
+ my $url = _make_url_absolute( url => $loc, ref => $uri );
+
+ my $request = HTTP::Request->new( 'GET', $url );
+
+ $self->_send_request( [ $request, $id ] );
+ $hashref->{previous} = $response;
+ }
+ else {
+ $self->_add_to_return_queue( [ $response, $id ] );
+ delete $$self{in_progress}{$id};
+ }
+
+ delete $hashref->{tmp};
+ }
+ }
+
+ # warn Dumper(
+ # {
+ # in_progress => $self->{in_progress},
+ # seen_ids => \%seen_ids,
+ # }
+ # );
+
+ foreach my $id ( keys %{ $self->{in_progress} } ) {
+
+ # skip this one if it was processed above.
+ next if $seen_ids{$id};
+
+ my $hashref = $self->{in_progress}{$id};
+
+ if ( time > $hashref->{timeout_at}
+ || time > $hashref->{finish_by} )
+ {
+
+ # warn Dumper( { hashref => $hashref, now => time } );
+
+ # we have a request that has timed out - handle it
+ $self->_add_error_response_to_return(
+ id => $id,
+ code => 504,
+ request => $hashref->{request},
+ previous => $hashref->{previous},
+ content => 'Timed out',
+ );
+
+ my $s = $hashref->{handle};
+ $self->_io_select->remove($s);
+ delete $$self{fileno_to_id}{ $s->fileno };
+ }
+ }
+
+ return 1;
+}
+
+sub _add_to_return_queue {
+ my $self = shift;
+ my $req_and_id = shift;
+ push @{ $$self{to_return} }, $req_and_id;
+ return 1;
+}
+
+# Send the requests waiting on 'to_send', up to the 'slots' limit of
+# requests in progress.
+
+sub _process_to_send {
+ my $self = shift;
+
+ while ( scalar @{ $$self{to_send} }
+ && $self->slots > scalar keys %{ $$self{in_progress} } )
+ {
+ $self->_send_request( shift @{ $$self{to_send} } );
+ }
+
+ return 1;
+}
+
+sub _send_request {
+ my $self = shift;
+ my $r_and_id = shift;
+ my ( $request, $id ) = @$r_and_id;
+
+ my $uri = URI->new( $request->uri );
+
+ my %args = ();
+
+ # We need to use a different request_uri for proxied requests. Treat this
+ # as a proxy request if a proxy host or port is set.
+ #
+ # http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2
+ $args{Host} = $uri->host;
+ $args{PeerAddr} = $self->_get_opt( 'proxy_host', $id );
+ $args{PeerPort} = $self->_get_opt( 'proxy_port', $id );
+
+ my $request_is_to_proxy =
+ ( $args{PeerAddr} || $args{PeerPort} ) # if either are set...
+ ? 1 # ...then we are a proxy request
+ : 0; # ...otherwise not
+
+ # If we did not get a setting from the proxy then use the uri values.
+ $args{PeerAddr} ||= $uri->host;
+ $args{PeerPort} ||= $uri->port;
+
+ my $s = eval { Net::HTTP::NB->new(%args) };
+
+ # We could not create a connection - fake up a 503 response with the
+ # error as content.
+ if ( !$s ) {
+
+ $self->_add_error_response_to_return(
+ id => $id,
+ code => 503,
+ request => $request,
+ previous => $$self{in_progress}{$id}{previous},
+ content => $@,
+ );
+
+ return 1;
+ }
+
+ my %headers = %{ $request->{_headers} };
+
+ # Decide what to use as the request_uri
+ my $request_uri = $request_is_to_proxy # is this a proxy request....
+ ? $uri->as_string # ... if so use full url
+ : _strip_host_from_uri($uri); # ...else strip off scheme, host and port
+
+ croak "Could not write request to $uri '$!'"
+ unless $s->write_request( $request->method, $request_uri, %headers,
+ $request->content );
+
+ $self->_io_select->add($s);
+
+ my $time = time;
+ my $entry = $$self{in_progress}{$id} ||= {};
+
+ $$self{fileno_to_id}{ $s->fileno } = $id;
+
+ $entry->{request} = $request;
+ $entry->{started_at} = $time;
+
+
+ $entry->{timeout_at} = $time + $self->_get_opt( 'timeout', $id );
+ # warn "sent - timeout set to '$entry->{timeout_at}'";
+
+ $entry->{finish_by} = $time + $self->_get_opt( 'max_request_time', $id );
+ $entry->{handle} = $s;
+
+ $entry->{redirects_left} = $self->_get_opt( 'max_redirects', $id )
+ unless exists $entry->{redirects_left};
+
+ return 1;
+}
+
+sub _strip_host_from_uri {
+ my $uri = shift;
+
+ my $scheme_and_auth = quotemeta( $uri->scheme . '://' . $uri->authority );
+ my $url = $uri->as_string;
+
+ $url =~ s/^$scheme_and_auth//;
+ $url = "/$url" unless $url =~ m{^/};
+
+ return $url;
+}
+
+sub _io_select {
+ my $self = shift;
+ return $$self{io_select} ||= IO::Select->new();
+}
+
+sub _make_url_absolute {
+ my %args = @_;
+
+ my $in = $args{url};
+ my $ref = $args{ref};
+
+ return $in if $in =~ m{ \A http:// }xms;
+
+ my $ret = $ref->scheme . '://' . $ref->authority;
+ return $ret . $in if $in =~ m{ \A / }xms;
+
+ $ret .= $ref->path;
+ return $ret . $in if $in =~ m{ \A [\?\#\;] }xms;
+
+ $ret =~ s{ [^/]+ \z }{}xms;
+ return $ret . $in;
+}
+
+sub _add_error_response_to_return {
+ my $self = shift;
+ my %args = @_;
+
+ use HTTP::Status;
+
+ my $response =
+ HTTP::Response->new( $args{code}, status_message( $args{code} ),
+ undef, $args{content} );
+
+ $response->request( $args{request} );
+ $response->previous( $args{previous} ) if $args{previous};
+
+ $self->_add_to_return_queue( [ $response, $args{id} ] );
+ delete $$self{in_progress}{ $args{id} };
+
+ return $response;
+
+}
+
+=head1 SEE ALSO
+
+L<HTTP::Async::Polite> - a polite form of this module. Slows the scraping down
+by domain so that the remote server is not overloaded.
+
+=head1 GOTCHAS
+
+The responses may not come back in the same order as the requests were made.
+
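+If you need to match responses to requests, keep the ids that C<add> returns
+and ask for responses in list context. A sketch, assuming C<@requests> holds
+your HTTP::Request objects:
+
+ my %request_for;
+ foreach my $req (@requests) {
+     my $id = $async->add($req);
+     $request_for{$id} = $req;
+ }
+
+ while ( my ( $response, $id ) = $async->wait_for_next_response ) {
+     my $req = $request_for{$id};
+     # process $response knowing which $req it belongs to
+ }
+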
+=head1 THANKS
+
+Egor Egorov contributed patches for proxies, catching connections that die
+before headers sent and more.
+
+Tomohiro Ikebe from livedoor.jp submitted patches (and a test) to properly
+handle 304 responses.
+
+=head1 AUTHOR
+
+Edmund von der Burg C<< <evdb@ecclestoad.co.uk> >>.
+
+L<http://www.ecclestoad.co.uk/>
+
+=head1 LICENCE AND COPYRIGHT
+
+Copyright (c) 2006, Edmund von der Burg C<< <evdb@ecclestoad.co.uk> >>.
+All rights reserved.
+
+This module is free software; you can redistribute it and/or modify it under
+the same terms as Perl itself.
+
+=head1 DISCLAIMER OF WARRANTY
+
+BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE
+SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE
+STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE
+SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
+INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
+FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND
+PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE,
+YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
+
+IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY
+COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE
+SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO
+LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR
+THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER
+SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+=cut
+
+1;
+
208 lib/HTTP/Async/Polite.pm
@@ -0,0 +1,208 @@
+use strict;
+use warnings;
+
+package HTTP::Async::Polite;
+use base 'HTTP::Async';
+
+our $VERSION = '0.05';
+
+use Carp;
+use Data::Dumper;
+use Time::HiRes qw( time sleep );
+use URI;
+
+=head1 NAME
+
+HTTP::Async::Polite - politely process multiple HTTP requests
+
+=head1 SYNOPSIS
+
+See L<HTTP::Async> - the usage is unchanged.
+
+=head1 DESCRIPTION
+
+This L<HTTP::Async> module allows you to have many requests going on at once.
+This can be very rude if you are fetching several pages from the same domain.
+This module adds limits to the number of simultaneous requests to a given
+domain and adds an interval between the requests.
+
+In all other ways it is identical in use to the original L<HTTP::Async>.
+
+=head1 NEW METHODS
+
+=head2 send_interval
+
+Getter and setter for the C<send_interval> - the time in seconds to leave
+between each request for a given domain. By default this is set to 5 seconds.
+
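+For example (the intervals here are only illustrative):
+
+ my $async = HTTP::Async::Polite->new( send_interval => 10 );
+
+ # or change it on an existing object
+ $async->send_interval(2);
+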
+=cut
+
+sub send_interval {
+ my $self = shift;
+ return scalar @_
+ ? $self->_set_opt( 'send_interval', @_ )
+ : $self->_get_opt('send_interval');
+}
+
+=head1 OVERLOADED METHODS
+
+These methods are overloaded but otherwise work exactly as the original
+methods did. The docs here just describe what they do differently.
+
+=head2 new
+
+Sets the C<send_interval> value to the default of 5 seconds.
+
+=cut
+
+sub new {
+ my $class = shift;
+
+ my $self = $class->SUPER::new;
+
+ # Set the interval between sends.
+ $self->{opts}{send_interval} = 5; # seconds
+ $class->_add_get_set_key('send_interval');
+
+ $self->_init(@_);
+
+ return $self;
+}
+
+=head2 add_with_opts
+
+Adds the request to the correct queue depending on the domain.
+
+=cut
+
+sub add_with_opts {
+ my $self = shift;
+ my $req = shift;
+ my $opts = shift;
+ my $id = $self->_next_id;
+
+ # Instead of putting this request and its opts directly onto the to_send
+ # array, get the domain and add the request to that domain's queue. Store
+ # the domain with the opts so that it is easy to get at.
+ my $uri = URI->new( $req->uri );
+ my $host = $uri->host;
+ my $port = $uri->port;
+ my $domain = "$host:$port";
+ $opts->{_domain} = $domain;
+
+ # Get the domain array - create it if needed.
+ my $domain_arrayref = $self->{domain_stats}{$domain}{to_send} ||= [];
+
+ push @{$domain_arrayref}, [ $req, $id ];
+ $self->{id_opts}{$id} = $opts;
+
+ $self->poke;
+
+ return $id;
+}
+
+=head2 to_send_count
+
+Returns the number of requests waiting to be sent. This is the number in the
+actual queue plus the number in each domain specific queue.
+
+=cut
+
+sub to_send_count {
+ my $self = shift;
+ $self->poke;
+
+ my $count = scalar @{ $$self{to_send} };
+
+ $count += scalar @{ $self->{domain_stats}{$_}{to_send} }
+ for keys %{ $self->{domain_stats} };
+
+ return $count;
+}
+
+sub _process_to_send {
+ my $self = shift;
+
+ # Go through the domain specific queues and add all requests that we can
+ # to the real queue.
+ foreach my $domain ( keys %{ $self->{domain_stats} } ) {
+
+ my $domain_stats = $self->{domain_stats}{$domain};
+ next unless scalar @{ $domain_stats->{to_send} };
+
+ # warn "TRYING TO ADD REQUEST FOR $domain";
+ # warn sleep 5;
+
+ # Check that this request is good to go.
+ next if $domain_stats->{count};
+ next unless time > ( $domain_stats->{next_send} || 0 );
+
+ # We can add this request.
+ $domain_stats->{count}++;
+ push @{ $self->{to_send} }, shift @{ $domain_stats->{to_send} };
+ }
+
+ # Use the original to send the requests on the queue.
+ return $self->SUPER::_process_to_send;
+}
+
+sub _add_to_return_queue {
+ my $self = shift;
+ my $req_and_id = shift;
+
+ # decrement the count for this domain so that another request can start.
+ # Also set the interval so that we don't scrape too fast.
+ my $id = $req_and_id->[1];
+ my $domain = $self->{id_opts}{$id}{_domain};
+ my $domain_stat = $self->{domain_stats}{$domain};
+ my $interval = $self->_get_opt( 'send_interval', $id );
+
+ $domain_stat->{count}--;
+ $domain_stat->{next_send} = time + $interval;
+
+ return $self->SUPER::_add_to_return_queue($req_and_id);
+}
+
+=head1 SEE ALSO
+
+L<HTTP::Async> - the module that this one is based on.
+
+=head1 AUTHOR
+
+Edmund von der Burg C<< <evdb@ecclestoad.co.uk> >>.
+
+L<http://www.ecclestoad.co.uk/>
+
+=head1 LICENCE AND COPYRIGHT
+
+Copyright (c) 2006, Edmund von der Burg C<< <evdb@ecclestoad.co.uk> >>.
+All rights reserved.
+
+This module is free software; you can redistribute it and/or modify it under
+the same terms as Perl itself.
+
+=head1 DISCLAIMER OF WARRANTY
+
+BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE
+SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE
+STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE
+SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
+INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
+FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND
+PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE,
+YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
+
+IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY
+COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE
+SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO
+LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR
+THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER
+SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+=cut
+
+1;
+
144 t/TestServer.pm
@@ -0,0 +1,144 @@
+use strict;
+use warnings;
+
+# Provide a simple server that can be used to test the various bits.
+package TestServer;
+use base qw/Test::HTTP::Server::Simple HTTP::Server::Simple::CGI/;
+
+use Time::HiRes qw(sleep time);
+use Data::Dumper;
+use LWP::UserAgent;
+
+sub handle_request {
+ my ( $self, $cgi ) = @_;
+ my $params = $cgi->Vars;
+
+ # If we are on port 8081 then we are a proxy - we should forward the
+ # requests.
+ return act_as_proxy(@_) if $self->port == 8081;
+
+ # We should act as a final destination server and so expect an absolute URL.
+ my $request_uri = $ENV{REQUEST_URI};
+ if ( $request_uri !~ m!^/! ) {
+ warn "ERROR - not absolute request_uri '$request_uri'";
+ return;
+ }
+
+ # Flush the output so that it goes straight away. Needed for the timeout
+ # trickle tests.
+ $self->stdout_handle->autoflush(1);
+
+ # warn "START REQUEST - " . time;
+ # warn Dumper($params);
+
+ # Do the right thing depending on what is asked of us.
+ if ( exists $params->{redirect} ) {
+ my $num = $params->{redirect} || 0;
+ $num--;
+
+ if ( $num > 0 ) {
+ print $cgi->redirect( -uri => "?redirect=$num", -nph => 1, );
+ print "You are being redirected...";
+ }
+ else {
+ print $cgi->header( -nph => 1 );
+ print "No longer redirecting";
+ }
+ }
+
+ elsif ( exists $params->{delay} ) {
+ sleep( $params->{delay} );
+ print $cgi->header( -nph => 1 );
+ print "Delayed for '$params->{delay}'.\n";
+ }
+
+ elsif ( exists $params->{trickle} ) {
+
+ print $cgi->header( -nph => 1 );
+
+ my $trickle_for = $params->{trickle};
+ my $finish_at = time + $trickle_for;
+
+ local $| = 1;
+
+ while ( time <= $finish_at ) {
+ print time . " trickle $$\n";
+ sleep 0.1;
+ }
+
+ print "Trickled for '$trickle_for'.\n";
+ }
+
+ elsif ( exists $params->{bad_header} ) {
+ my $headers = $cgi->header( -nph => 1, );
+
+ # trim trailing whitespace to a single newline.
+ $headers =~ s{ \s* \z }{\n}xms;
+
+ # Add a bad header:
+ $headers .= "Bad header: BANG!\n";
+
+ print $headers . "\n\n";
+ print "Produced some bad headers.";
+ }
+
+ elsif ( my $when = $params->{break_connection} ) {
+
+ for (1) {
+ last if $when eq 'before_headers';
+ print $cgi->header( -nph => 1 );
+
+ last if $when eq 'before_content';
+ print "content\n";
+ }
+ }
+
+ elsif ( my $id = $params->{set_time} ) {
+ my $now = time;
+ print $cgi->header( -nph => 1 );
+ print "$id\n$now\n";
+ }
+
+ elsif ( exists $params->{not_modified} ) {
+ my $last_modified = HTTP::Date::time2str( time - 60 * 60 * 24 );
+ print $cgi->header(
+ -status => '304',
+ -nph => 1,
+ 'Last-Modified' => $last_modified,
+ );
+ print "content\n";
+ }
+
+ else {
+ warn "DON'T KNOW WHAT TO DO: " . Dumper $params;
+ }
+
+ # warn "STOP REQUEST - " . time;
+
+}
+
+sub act_as_proxy {
+ my ( $self, $cgi ) = @_;
+
+ my $request_uri = $ENV{REQUEST_URI};
+
+ # According to the RFC the request_uri must be fully qualified if the
+ # request is to a proxy and absolute if it is to a destination server. Check
+ # that this is the case.
+ #
+ # http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2
+ if ( $request_uri !~ m!^http://! ) {
+ warn "ERROR - not fully qualified request_uri '$request_uri'";
+ return;
+ }
+
+ my $response = LWP::UserAgent->new( max_redirect => 0 )->get($request_uri);
+
+ # Add a header so that we know that this was proxied.
+ $response->header( WasProxied => 'yes' );
+
+ print $response->as_string;
+ return 1;
+}
+
+1;
25 t/bad-connections.t
@@ -0,0 +1,25 @@
+
+use strict;
+use warnings;
+
+use Test::More tests => 5;
+use HTTP::Request;
+
+require 't/TestServer.pm';
+my $s = TestServer->new;
+my $url_root = $s->started_ok("starting a test server");
+
+use HTTP::Async;
+my $q = HTTP::Async->new;
+
+my %tests = (
+ "$url_root/foo/bar?break_connection=before_headers" => 504,
+ "$url_root/foo/bar?break_connection=before_content" => 200,
+);
+
+while ( my ( $url, $code ) = each %tests ) {
+ my $req = HTTP::Request->new( 'GET', $url );
+ ok $q->add($req), "Added request to the queue - $url";
+ my $res = $q->wait_for_next_response;
+ is $res->code, $code, "Got a '$code' response";
+}
26 t/bad-headers.t
@@ -0,0 +1,26 @@
+
+use strict;
+use warnings;
+
+use Test::More tests => 3;
+use HTTP::Request;
+
+require 't/TestServer.pm';
+my $s = TestServer->new;
+my $url_root = $s->started_ok("starting a test server");
+
+use HTTP::Async;
+my $q = HTTP::Async->new;
+
+# Check that a response with a bad header is still returned.
+my $url = "$url_root/foo/bar?bad_header=1";
+
+# warn $url;
+# getc;
+
+my $req = HTTP::Request->new( 'GET', $url );
+ok $q->add($req), "Added request to the queue";
+$q->poke while !$q->to_return_count;
+
+my $res = $q->next_response;
+is $res->code, 200, "Got a response";
28 t/bad-hosts.t
@@ -0,0 +1,28 @@
+
+use strict;
+use warnings;
+
+use Test::More tests => 9;
+
+use HTTP::Request;
+
+use HTTP::Async;
+my $q = HTTP::Async->new;
+
+# Try to add some requests for bad hosts. HTTP::Async should not fail
+# but should return HTTP::Responses with the correct status code etc.
+
+my @bad_requests =
+ map { HTTP::Request->new( GET => $_ ) }
+ ( 'http://i.dont.exist/foo/bar', 'ftp://wrong.protocol.com/foo/bar' );
+
+ok $q->add(@bad_requests), "Added bad requests";
+
+while ( $q->not_empty ) {
+ my $res = $q->next_response || next;
+
+ isa_ok $res, 'HTTP::Response', "Got a proper response";
+ ok !$res->is_success, "Response was not a success";
+ ok $res->is_error, "Response was an error";
+ ok $res->request, "response has a request attached.";
+}
66 t/dead-connection.t
@@ -0,0 +1,66 @@
+# Hello Edmund,
+#
+# Thanks for HTTP::Async! I have a question about it, that I cannot figure out
+# myself. I'm playing with HTTP::Async in various corner cases, and there's one
+# particular error I'm getting:
+#
+# HTTP::Async object destroyed but still in use at a.pl line 0
+# HTTP::Async INTERNAL ERROR: 'id_opts' not empty at a.pl line 0
+#
+# and the code is
+
+use strict;
+use warnings;
+use HTTP::Async;
+use HTTP::Request;
+use IO::Socket::INET;
+use Time::HiRes;
+
+use Test::More tests => 10;
+
+my $port = 9999;
+my $abort_period = 3;
+
+foreach my $arg_key (qw(timeout max_request_time)) {
+
+ # open a socket that will accept connections but never respond
+ my $sock = IO::Socket::INET->new(
+ Listen => 5,
+ LocalAddr => 'localhost',
+ LocalPort => $port,
+ Proto => 'tcp'
+ ) || die "Could not open a socket on port '$port' - maybe in use?";
+ ok $sock, "opened socket on port '$port'";
+
+ my $async = HTTP::Async->new( $arg_key => $abort_period );
+ ok $async, "creating async using $arg_key => $abort_period";
+
+ my $req = HTTP::Request->new( GET => "http://localhost:$port/" );
+ my $id = $async->add($req);
+ ok $id, "Added request, given id '$id'";
+
+ # set up time started and when it should end. Add one second to be generous.
+ my $added_time = time;
+ my $should_end_time = $added_time + $abort_period + 1;
+
+ my $res = undef;
+
+ while (!$res) {
+ $res = $async->wait_for_next_response(1);
+
+ # Check that we have not been waiting too long.
+ last if time > $should_end_time;
+ }
+
+ ok $res, "got a response";
+ is $res->code, 504, "got faked up timeout response";
+}
+
+# I expected that $response should be defined and contain a fake 504 error.
+# It's either I'm doing something wrong or ..?
+#
+#
+# --
+# Sincerely,
+#
+# Dmitry Karasik
24 t/make-url-absolute.t
@@ -0,0 +1,24 @@
+
+use strict;
+use warnings;
+
+use Test::More tests => 4;
+
+use HTTP::Async;
+use URI;
+
+my $full_url = URI->new('http://www.test.com:8080/foo/bar?baz=bundy');
+
+my @tests = (
+ 'http://www.test.com:8080/foo/bar?baz=bundy', '/foo/bar?baz=bundy',
+ 'bar?baz=bundy', '?baz=bundy',
+);
+
+foreach my $test (@tests) {
+ my $url = HTTP::Async::_make_url_absolute(
+ url => $test,
+ ref => $full_url,
+ );
+
+ is "$url", "$full_url", "$test -> $full_url";
+}
29 t/not_modified.t
@@ -0,0 +1,29 @@
+use strict;
+use warnings;
+
+use Test::More tests => 4;
+use HTTP::Request;
+use HTTP::Async;
+
+require 't/TestServer.pm';
+
+my $s = TestServer->new;
+my $url_root = $s->started_ok("starting a test server");
+
+my $q = HTTP::Async->new;
+
+{
+ my $url = "$url_root/?not_modified=1";
+
+ my $req = HTTP::Request->new( 'GET', $url );
+ ok $q->add($req), "Added request to the queue";
+ my $res = $q->wait_for_next_response;
+
+ # use Data::Dumper;
+ # warn Dumper $res;
+
+ is $res->code, 304, "304 Not modified";
+ ok !$res->previous, "does not have a previous reponse";
+}
+
+1;
14 t/pod-coverage.t
@@ -0,0 +1,14 @@
+use strict;
+use warnings;
+
+use Test::More;
+
+eval "use Test::Pod::Coverage 1.00;";
+plan skip_all => "Test::Pod::Coverage > 1.00 required" if $@;
+
+if ( $] >= 5.009 ) {
+ eval "use Pod::Coverage 0.19;";
+ plan skip_all => "Pod::Coverage >= 0.19 required for perls >= 5.9" if $@;
+}
+
+all_pod_coverage_ok();
7 t/pod.t
@@ -0,0 +1,7 @@
+use strict;
+use warnings;
+
+use Test::More;
+eval "use Test::Pod 1.00";
+plan skip_all => "Test::Pod 1.00 required for testing POD" if $@;
+all_pod_files_ok();
87 t/polite.t
@@ -0,0 +1,87 @@
+use strict;
+use warnings;
+
+use Test::More tests => 14;
+use HTTP::Request;
+use Data::Dumper;
+
+use HTTP::Async::Polite;
+my $q = HTTP::Async::Polite->new;
+
+# Check that we can set and get the interval.
+is $q->send_interval, 5, "default interval is 5 seconds";
+ok $q->send_interval(3), "change interval to 3 seconds";
+is $q->send_interval, 3, "new interval is 3 seconds";
+
+require 't/TestServer.pm';
+
+my @servers = map { TestServer->new($_) } 80800 .. 80801;
+my @url_roots = ();
+
+foreach my $s (@servers) {
+ push @url_roots, $s->started_ok("starting a test server");
+}
+
+# Fire off three requests to two different servers. Check that the correct
+# interval is observed between each request and that the two different servers
+# were scraped in parallel. Also add another request so that the lists are not
+# balanced.
+my @urls =
+ map {
+ my $url_root = $_;
+ my ($port) = $url_root =~ m/\d+$/g;
+ my $number = $_ eq $url_roots[0] ? 3 : 4;
+ my @ret = map { "$url_root/?set_time=$port-$_" } 1 .. $number;
+ @ret;
+ } @url_roots;
+
+my @requests = map { HTTP::Request->new( GET => $_ ) } @urls;
+ok $q->add(@requests), "Add the requests";
+
+is $q->to_send_count, 5, "Got correct to_send count";
+is $q->total_count, 7, "Got correct total count";
+
+# Get all the responses.
+my @responses = ();
+while ( my $res = $q->wait_for_next_response ) {
+ push @responses, $res;
+}
+
+is scalar(@responses), 7, "got six responses back";
+
+# Extract the url and the timestamp from the responses;
+my %data = ();
+foreach my $res (@responses) {
+ my ( $id, $timestamp ) = split /\n/, $res->content, 2;
+ my ( $port, $number ) = split /-/, $id, 2;
+
+ # Skip if the number is greater than 3 - extra req to test unbalanced list
+ next if $number > 3;
+
+ s/\s+//g for $port, $number, $timestamp;
+ $data{$port}{$number} = $timestamp;
+}
+
+# diag Dumper \%data;
+
+# Check that the requests did not come too close together.
+my @first_times = ();
+foreach my $port ( sort keys %data ) {
+
+ my @times = sort { $a <=> $b } values %{ $data{$port} };
+
+ my $last_time = shift @times;
+ push @first_times, $last_time;
+
+ foreach my $time (@times) {
+
+ cmp_ok $time - $last_time, ">", 3,
+ "at least three seconds between requests to same domain";
+
+ $last_time = $time;
+ }
+}
+
+# check that the first two requests were near each other.
+cmp_ok abs( $first_times[0] - $first_times[1] ), "<", 1,
+ "at most 1 second between first two requests";
85 t/poll-interval.t
@@ -0,0 +1,85 @@
+
+use strict;
+use warnings;
+
+use Test::More tests => 24;
+use HTTP::Request;
+use Time::HiRes 'time';
+
+BEGIN {
+ require 't/test_utils.pl';
+}
+
+require 't/TestServer.pm';
+my $s = TestServer->new;
+my $url_root = $s->started_ok("starting a test server");
+
+use HTTP::Async;
+my $q = HTTP::Async->new;
+
+# Send off a long request - check that next_response returns at once
+# but that wait_for_next_response returns only when the response has arrived.
+
+# Check that the poll interval is at a sensible default.
+is $q->poll_interval, 0.05, "\$q->poll_interval == 0.05";
+
+# Check that the poll interval is changeable.
+is $q->poll_interval(0.1), 0.1, "set poll_interval to 0.1";
+is $q->poll_interval, 0.1, "\$q->poll_interval == 0.1";
+
+{
+
+ # Get the time since the request was made.
+ reset_timer();
+
+ my $url = "$url_root?delay=3";
+ my $req = HTTP::Request->new( 'GET', $url );
+ ok $q->add($req), "Added request to the queue - $url";
+
+ # Does next_response return immediately
+ ok !$q->next_response, "next_response returns at once";
+ delay_lt_ok 0.4, "Returned quickly (less than 0.4 secs)";
+
+ ok !$q->wait_for_next_response(0),
+ "wait_for_next_response(0) returns at once";
+ delay_lt_ok 0.4, "Returned quickly (less than 0.4 secs)";
+
+ ok !$q->wait_for_next_response(1),
+ "wait_for_next_response(1) returns after 1 sec without a response";
+
+ delay_ge_ok 1, "Returned after 1 sec delay";
+ delay_lt_ok 1.4, "Returned before 1.4 sec delay";
+
+ my $response = $q->wait_for_next_response();
+ ok $response, "wait_for_next_response got the response";
+ delay_gt_ok 3, "Returned after 3 sec delay";
+
+ is $response->code, 200, "good response (200)";
+ ok $response->is_success, "is a success";
+}
+
+{
+ reset_timer();
+
+ my $url = "$url_root?delay=1";
+ my $req = HTTP::Request->new( 'GET', $url );
+ ok $q->add($req), "Added request to the queue - $url";
+
+ my $response = $q->wait_for_next_response;
+
+ ok $response, "wait_for_next_response got the response";
+
+ delay_gt_ok 1, "Returned after 1 sec delay";
+ delay_lt_ok 2, "Returned before 2 sec delay";
+
+ is $response->code, 200, "good response (200)";
+ ok $response->is_success, "is a success";
+}
+
+{ # Check that wait_for_next_response does not hang if there is nothing
+ # to wait for.
+ reset_timer();
+ ok !$q->wait_for_next_response, "Did not get a response";
+ delay_lt_ok 1, "Returned in less than 1 sec";
+}
+
50 t/proxy.t
@@ -0,0 +1,50 @@
+
+use strict;
+use warnings;
+use URI::Escape;
+
+use Test::More tests => 16;
+use HTTP::Request;
+
+require 't/TestServer.pm';
+my $s1 = TestServer->new(8080);
+my $s1_url_root = $s1->started_ok("starting a test server");
+
+my $s2 = TestServer->new(8081);
+my $s2_url_root = $s2->started_ok("starting a test server");
+
+ok( $_, "got $_" ) for $s1_url_root, $s2_url_root;
+
+my %tests = (
+ "$s1_url_root/foo/bar?redirect=2" => 200,
+ "$s1_url_root/foo/bar?delay=1" => 200,
+);
+
+use HTTP::Async;
+my $q = HTTP::Async->new;
+
+foreach my $via_proxy ( 0, 1 ) {
+
+ while ( my ( $url, $code ) = each %tests ) {
+
+ my $req = HTTP::Request->new( 'GET', $url );
+
+ my %opts = ( proxy_host => '127.0.0.1', proxy_port => 8081, );
+
+ my $id =
+ $via_proxy
+ ? $q->add_with_opts( $req, \%opts )
+ : $q->add($req);
+
+ ok $id, "Added request to the queue - $url";
+
+ my $res = $q->wait_for_next_response;
+ is( $res->code, $code, "Got a '$code' response" )
+ || warn $res->as_string;
+
+ # check that the proxy header was found if this was a proxy request.
+ my $proxy_header = $res->header('WasProxied') || '';
+ my $expected = $via_proxy ? 'yes' : '';
+ is $proxy_header, $expected, "check for proxy header '$expected'";
+ }
+}
51 t/real-servers.t
@@ -0,0 +1,51 @@
+use strict;
+use warnings;
+
+use Test::More;
+
+plan skip_all => "enable these tests by setting REAL_SERVERS"
+ unless $ENV{REAL_SERVERS};
+
+use HTTP::Request;
+use Time::HiRes 'usleep';
+
+# Create requests for a few well known sites.
+my @requests =
+ map { HTTP::Request->new( GET => "http://www.$_" ) }
+ sort qw( google.com yahoo.com ecclestoad.co.uk );
+
+my $tests_per_request = 4;
+plan tests => 3 + $tests_per_request * scalar @requests;
+
+use_ok 'HTTP::Async';
+
+my $q = HTTP::Async->new;
+isa_ok $q, 'HTTP::Async';
+
+# Put all of these onto the queue.
+ok( $q->add($_), "Added request for " . $_->uri ) for @requests;
+
+# Process the queue until they all complete.
+my @responses = ();
+
+while ( $q->not_empty ) {
+
+ my $res = $q->next_response;
+ if ($res) {
+ pass "Got the response from " . $res->request->uri;
+ push @responses, $res;
+ }
+ else {
+ usleep( 1_000_000 * 0.1 ); # 0.1 seconds
+ next;
+ }
+
+ ok $res->is_success, "is success";
+}
+
+# Check that we got the number needed and that all the responses are
+# HTTP::Response objects.
+is scalar @responses, scalar @requests, "Got the expected number of responses";
+isa_ok( $_, 'HTTP::Response', "Got a HTTP::Response object" ) for @responses;
+
+# print $_->content for @responses;
81 t/redirects.t
@@ -0,0 +1,81 @@
+
+use strict;
+use warnings;
+
+use Test::More tests => 21;
+use HTTP::Request;
+
+require 't/TestServer.pm';
+my $s = TestServer->new;
+my $url_root = $s->started_ok("starting a test server");
+
+use HTTP::Async;
+my $q = HTTP::Async->new;
+
+# Check that the max_redirects is at a sensible level.
+is $q->max_redirects, 7, "max_redirects == 7";
+
+# Send a request to somewhere that will redirect a certain number of
+# times:
+#
+# ?redirect=$num - if $num is > 0 then it redirects to $num - 1;
+
+{ # Check that a couple of redirects work.
+ my $url = "$url_root/foo/bar?redirect=3";
+
+ # warn $url;
+ # getc;
+
+ my $req = HTTP::Request->new( 'GET', $url );
+ ok $q->add($req), "Added request to the queue";
+ $q->poke while !$q->to_return_count;
+
+ my $res = $q->next_response;
+ is $res->code, 200, "No longer a redirect";
+ ok $res->previous, "Has a previous reponse";
+ is $res->previous->code, 302, "previous request was a redirect";
+}
+
+{ # check that 20 redirects stop after the expected number.
+ my $url = "$url_root?redirect=20";
+ my $req = HTTP::Request->new( 'GET', $url );
+ ok $q->add($req), "Added request to the queue";
+ $q->poke while !$q->to_return_count;
+
+ my $res = $q->next_response;
+ is $res->code, 302, "Still a redirect";
+ ok $res->previous, "Has a previous reponse";
+ is $res->previous->code, 302, "previous request was a redirect";
+ is $res->request->uri->as_string, 'http://localhost:8080?redirect=13',
+ "last request url correct";
+}
+
+{ # Set the max_redirect higher and try again.
+
+ ok $q->max_redirects(30), "Set the max_redirects higher.";
+
+ my $url = "$url_root?redirect=20";
+ my $req = HTTP::Request->new( 'GET', $url );
+ ok $q->add($req), "Added request to the queue";
+ $q->poke while !$q->to_return_count;
+
+ my $res = $q->next_response;
+ is $res->code, 200, "No longer a redirect";
+ ok $res->previous, "Has a previous reponse";
+ is $res->previous->code, 302, "previous request was a redirect";
+}
+
+{ # Set the max_redirect to zero and check that none happen.
+
+ is $q->max_redirects(0), 0, "Set the max_redirects to zero.";
+ is $q->max_redirects, 0, "max_redirects is set to zero.";
+
+ my $url = "$url_root?redirect=20";
+ my $req = HTTP::Request->new( 'GET', $url );
+ ok $q->add($req), "Added request to the queue";
+ $q->poke while !$q->to_return_count;
+
+ my $res = $q->next_response;
+ is $res->code, 302, "No longer a redirect";
+ ok !$res->previous, "Have no previous reponse";
+}
21 t/setup.t
@@ -0,0 +1,21 @@
+
+use strict;
+use warnings;
+
+use Test::More tests => 16;
+
+use HTTP::Async;
+use HTTP::Async::Polite;
+
+foreach my $class ( 'HTTP::Async', 'HTTP::Async::Polite' ) {
+ foreach my $number ( 0, 3 ) {
+
+ my $q1 = $class->new;
+ is $q1->max_redirects($number), $number, "set to $number";
+ is $q1->max_redirects, $number, "got $number";
+
+ my $q2 = $class->new( max_redirects => $number );
+ ok $q2, "created object";
+ is $q2->max_redirects, $number, "got $number";
+ }
+}
28 t/strip_host_from_uri.t
@@ -0,0 +1,28 @@
+use strict;
+use warnings;
+
+use Test::More;
+use HTTP::Async;
+use URI;
+
+my %tests = (
+'http://www.w3.org:8080/Protocols/rfc2616/rfc2616-sec5.html?foo=bar#sec5.1.2'
+ => '/Protocols/rfc2616/rfc2616-sec5.html?foo=bar#sec5.1.2',
+
+ 'http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html?foo=bar#sec5.1.2' =>
+ '/Protocols/rfc2616/rfc2616-sec5.html?foo=bar#sec5.1.2',
+
+ 'https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html?foo=bar#sec5.1.2' =>
+ '/Protocols/rfc2616/rfc2616-sec5.html?foo=bar#sec5.1.2',
+
+ 'https://www.w3.org:80/Protocols' => '/Protocols',
+
+ 'http://localhost:8080?delay=3' => '/?delay=3'
+);
+
+plan tests => scalar keys %tests;
+
+while ( my ( $in, $expected ) = each %tests ) {
+ my $out = HTTP::Async::_strip_host_from_uri( URI->new($in) );
+ is $out, $expected, "correctly stripped $in to $out";
+}
20 t/template.t
@@ -0,0 +1,20 @@
+
+use strict;
+use warnings;
+
+use Test::More skip_all => 'just a template to base other tests on';
+
+use Test::More tests => 5;
+
+use HTTP::Async;
+my $q = HTTP::Async->new;
+
+require 't/TestServer.pm';
+
+# my $s = TestServer->new;
+# my $url_root = $s->started_ok("starting a test server");
+
+my @servers = map { TestServer->new($_) } 80800 .. 80804;
+foreach my $s (@servers) {
+ my $url_root = $s->started_ok("starting a test server");
+}
26 t/test_utils.pl
@@ -0,0 +1,26 @@
+use strict;
+use warnings;
+
+use Time::HiRes qw(time);
+
+{
+ my $start_time = undef;
+
+ sub reset_timer { return $start_time = time; }
+
+ sub delay_lt_ok ($$) { return delay_ok( '<', @_ ); }
+ sub delay_le_ok ($$) { return delay_ok( '<=', @_ ); }
+ sub delay_ge_ok ($$) { return delay_ok( '>=', @_ ); }
+ sub delay_gt_ok ($$) { return delay_ok( '>', @_ ); }
+
+ sub delay_ok ($$$) {
+ my ( $cmp, $delay, $message ) = @_;
+
+ my $timer = time - $start_time;
+
+ my $display_test = sprintf '%.2f %s %.2f', $timer, $cmp, $delay;
+ return cmp_ok $timer, $cmp, $delay, "$message ($display_test)";
+ }
+}
+
+1;
67 t/timeout.t
@@ -0,0 +1,67 @@
+
+use strict;
+use warnings;
+
+use Test::More tests => 20;
+use HTTP::Request;
+
+require 't/TestServer.pm';
+my $s = TestServer->new;
+my $url_root = $s->started_ok("starting a test server");
+
+use HTTP::Async;
+my $q = HTTP::Async->new;
+
+# Check that the timeout is at a sensible default.
+is $q->timeout, 180, "\$q->timeout == 180";
+
+{ # Send a request that should return quickly
+ my $url = "$url_root?delay=0";
+ my $req = HTTP::Request->new( 'GET', $url );
+ ok $q->add($req), "Added request to the queue - $url";
+ my $res = $q->wait_for_next_response;
+ is $res->code, 200, "Not timed out (200)";
+}
+
+is $q->timeout(2), 2, "Set the timeout really low";
+
+{ # Send a request that should timeout
+ my $url = "$url_root?delay=3";
+ my $req = HTTP::Request->new( 'GET', $url );
+ ok $q->add($req), "Added delayed request to the queue - $url";
+ my $res = $q->wait_for_next_response;
+ is $res->code, 504, "timed out (504)";
+ ok $res->is_error, "is an error";
+}
+
+{ # Send a request that should not timeout as it is trickling back data.
+ my $url = "$url_root?trickle=4";
+ my $req = HTTP::Request->new( 'GET', $url );
+ ok $q->add($req), "Added trickle request to the queue - $url";
+ my $res = $q->wait_for_next_response;
+ is $res->code, 200, "response ok (200)";
+ ok !$res->is_error, "is not an error";
+}
+
+is $q->timeout(1), 1, "Set the timeout really low";
+is $q->max_request_time(1), 1, "Set the max_request_time really low";
+
+{ # Send a request that should timeout despite trickling back data.
+ my $url = "$url_root?trickle=3";
+ my $req = HTTP::Request->new( 'GET', $url );
+ ok $q->add($req), "Added trickle request to the queue - $url";
+ my $res = $q->wait_for_next_response;
+ is $res->code, 504, "timed out (504)";
+ ok $res->is_error, "is an error";
+}
+
+is $q->timeout(10), 10, "Lengthen the timeout";
+is $q->max_request_time(300), 300, "Lengthen the max_request_time";
+
+{ # Send same request that should now be ok
+ my $url = "$url_root?delay=3";
+ my $req = HTTP::Request->new( 'GET', $url );
+ ok $q->add($req), "Added delayed request to the queue - $url";
+ my $res = $q->wait_for_next_response;
+ is $res->code, 200, "Not timed out (200)";
+}