Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow size limit of data read #147

Merged
merged 3 commits into from
Oct 2, 2015
Merged

Allow size limit of data read #147

merged 3 commits into from
Oct 2, 2015

Conversation

ozh
Copy link
Collaborator

@ozh ozh commented Apr 17, 2015

This PR allows to fetch the XXX first bytes of a remote URL

Changes:

  • New option: response_byte_limit to limit response body (integer | false)
  • New constant: CHUNK, used to define the buffer chunk size, that was hardcoded in the socket transport class as 1160 and is now used also in the cURL class
  • New test that passes with both transport

Key concept with cURL:

<?php
$url  = 'http://httpbin.org/';
$html ='';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);

// These 3 settings
curl_setopt($ch, CURLOPT_BUFFERSIZE, 10); // smaller chunks = more progress checks
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'write_function');

// callback : returns either strlen of data read (and continues) or 0 and halts with an error
function write_function($handle, $data) {
    global $html;
    $html .= $data;
    if (strlen($html) > 100) { // max size we want + one buffer size
        return 0;
    }
    else {
        return strlen($data);
    }
}


$res = curl_exec($ch);
curl_close($ch);
var_dump( $res, $html );
// $res = false, because we caused an error
// $html = the 100 first byte of the URL response

Key concept with sockets is obvious, instead of reading till feof($fp) you check the size of downloaded stuff so far.

@ozh
Copy link
Collaborator Author

ozh commented Apr 17, 2015

For the record and posterity : curl has CURLOPT_RANGE, or could use CURLOPT_HTTPHEADER, array("Range: bytes=0-100")to read from a specified range, but it's not 100% compatible with every hosts (see there and comments below that)

@coveralls
Copy link

Coverage Status

Coverage decreased (-2.37%) to 85.24% when pulling 429f6c8 on ozh:maxlen into ea37f1e on rmccue:master.

@ozh
Copy link
Collaborator Author

ozh commented Apr 17, 2015

Yeah unit test time out.

ozh added 2 commits April 18, 2015 15:02
This makes more sense and will help understand the code in the future
Conflicts:
	library/Requests/Transport/cURL.php
*
* @var integer
*/
const CHUNK = 1160;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably call this BUFFER_SIZE or similar; chunk might get confusing when dealing with HTTP/1.1 chunked responses.

@rmccue
Copy link
Collaborator

rmccue commented Apr 27, 2015

New constant: CHUNK, used to define the buffer chunk size, that was hardcoded in the socket transport class as 1160 and is now used also in the cURL class

👍

New option: response_byte_limit to limit response body (integer | false)
New test that passes with both transport

I wonder if this is the right way to do it. Would we be better off supporting range requests instead? What's the original motivation for your change here? :)

(In addition, if we properly support range requests, we'd allow resuming requests properly.)

@rmccue rmccue added this to the 1.7 milestone Apr 27, 2015
@ozh
Copy link
Collaborator Author

ozh commented Apr 27, 2015

It seems that range requests aren't 100% supported by servers and you cannot test beforehand if they are -- see for instance http://stackoverflow.com/questions/985455/curl-how-to-limit-size-of-get/985471#985471 and comments below

@ozh
Copy link
Collaborator Author

ozh commented Apr 29, 2015

@rmccue Can you trigger the unit tests to restart on Travis for this PR? They timed out after indicating one failure but I'm not sure it's related to that test

Also: thoughts on using range requests vs my more compatible approach?

@rmccue
Copy link
Collaborator

rmccue commented Oct 2, 2015

We need this for rmccue/Requests-WPHTTP#1 - thanks so much for making a start here 💃

I'm going to fork this so I can merge master into it.

@rmccue rmccue mentioned this pull request Oct 2, 2015
@rmccue rmccue merged commit 87734fd into WordPress:master Oct 2, 2015
@rmccue rmccue mentioned this pull request Oct 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants