Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upAdd "Make a partial download with HTTP range headers" example #291 #309
Conversation
budziq
requested changes
Oct 4, 2017
| let url = Url::parse("https://httpbin.org/range/1024?duration=2")?; | ||
| let response = client.head(url.to_owned())?.send()?; | ||
| if let Some(range) = response.headers().get::<ContentRange>() { |
This comment has been minimized.
This comment has been minimized.
budziq
Oct 4, 2017
Collaborator
instead of if let i would suggest using ok_or combinator to convert into a Result and immediately ?
| @@ -19,6 +19,7 @@ | |||
| | [Extract all links from a webpage HTML][ex-extract-links-webpage] | [![reqwest-badge]][reqwest] [![select-badge]][select] | [![cat-net-badge]][cat-net] | | |||
| | [Check webpage for broken links][ex-check-broken-links] | [![reqwest-badge]][reqwest] [![select-badge]][select] [![url-badge]][url] | [![cat-net-badge]][cat-net] | | |||
| | [Extract all unique links from a MediaWiki markup][ex-extract-mediawiki-links] | [![reqwest-badge]][reqwest] [![regex-badge]][regex] | [![cat-net-badge]][cat-net] | | |||
| | [Make a partial download with HTTP range headers][ex-progress-with-range] | [![reqwest-badge]][reqwest] [![regex-badge]][regex] | [![cat-net-badge]][cat-net] | | |||
This comment has been minimized.
This comment has been minimized.
| # } | ||
| # } | ||
| const TOTAL_CHUNKS: u64 = 20; |
This comment has been minimized.
This comment has been minimized.
budziq
Oct 4, 2017
Collaborator
i would suggest using fixed buffer size instead of chunk number. Otherwise things are going to get weird with small content sizes and there is no reason to request exactly n times.
| use reqwest::header::{Range, ContentRange, ContentRangeSpec}; | ||
| use url::Url; | ||
| use std::io::{Read, Write}; |
This comment has been minimized.
This comment has been minimized.
| if let Some(range) = response.headers().get::<ContentRange>() { | ||
| if let ContentRangeSpec::Bytes { instance_length, .. } = **range { | ||
| let content_length = instance_length.unwrap_or(0); |
This comment has been minimized.
This comment has been minimized.
budziq
Oct 4, 2017
Collaborator
we can drop the if let and unwrap_or here
with something like:
let content_length = match **range {
ContentRangeSpec::Bytes { instance_length: Some(len), .. } => len,
_ => bail!("response doesn't include the expected ranges"),
};| } | ||
| assert_eq!(content_length as usize, content.len()); | ||
| println!("finished with success!"); |
This comment has been minimized.
This comment has been minimized.
| const TOTAL_CHUNKS: u64 = 20; | ||
| fn run() -> Result<()> { | ||
| let client = reqwest::Client::new()?; |
This comment has been minimized.
This comment has been minimized.
budziq
Oct 4, 2017
Collaborator
we have just bumped to reqwest 0.8 which uses Result more sparingly so this will not compile now/ the same for head and get calls below
| let client = reqwest::Client::new()?; | ||
| let url = Url::parse("https://httpbin.org/range/1024?duration=2")?; | ||
| let response = client.head(url.to_owned())?.send()?; |
This comment has been minimized.
This comment has been minimized.
budziq
Oct 4, 2017
Collaborator
reqwest uses Url::parse internally so basically below will work
let url = "https://httpbin.org/range/1024?duration=2";
let response = client.head(url).send()?;without the need for all the additional to_owned calls below
| @@ -1070,6 +1152,7 @@ fn run() -> Result<()> { | |||
| [`reqwest::RequestBuilder`]: https://docs.rs/reqwest/0.6.2/reqwest/struct.RequestBuilder.html | |||
| [`reqwest::Response`]: https://docs.rs/reqwest/*/reqwest/struct.Response.html | |||
| [`reqwest::get`]: https://docs.rs/reqwest/*/reqwest/fn.get.html | |||
| [`reqwest::header::Range`]: https://docs.rs/reqwest/0.7.3/reqwest/header/enum.Range.html | |||
This comment has been minimized.
This comment has been minimized.
budziq
Oct 4, 2017
Collaborator
please use wildcard version * in docs.rs links (please also update any other links with hardcoded version that you find in this file)
| [![reqwest-badge]][reqwest] [![cat-net-badge]][cat-net] | ||
| Download content using [`reqwest::get`] and setting the [`reqwest::header::Range`] to do partial |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Thank you @budziq for your comments, realy appreciated. I'm no sure about the rebase I did in my fork to get the latest changes and the update of the |
This comment has been minimized.
This comment has been minimized.
|
@jbolila It seams that there was a problem with the rebase. You have pulled and rebased some of the already merged commits making this PR not mergable. git remote add upstream git@github.com:rust-lang-nursery/rust-cookbook.git
git remote update # get latest changes
git checkout f46683b -b jbolila_pr_fix # get your original commit
git rebase upstream/master # place it on top of current master
git cherry-pick b1bf49dec8 # add your latest fixesthen in the branch "jbolila_pr_fix" you'll have the state that you will want to push to your origin/master (with --force) to update the PR into a proper condition. Then we will continue with the review. |
This comment has been minimized.
This comment has been minimized.
|
I hope is right, now.
|
This comment has been minimized.
This comment has been minimized.
|
unfortunately not yet. instead of merging jbolila_fix_pr you should overwrite master with it. git checkout jbolila_fix_pr
git branch -D master
git checkout -b master
git push -u origin master --force |
jbolila
force-pushed the
jbolila:master
branch
from
a0370ed
to
fcacb9f
Oct 4, 2017
budziq
requested changes
Oct 7, 2017
| println!("content-lenght: {} download in progress..", content_length); | ||
| let mut chunk_starts: u64 = 0; | ||
| let mut chunk_ends = if 100 > content_length { |
This comment has been minimized.
This comment has been minimized.
budziq
Oct 7, 2017
Collaborator
Although I would suggest not to hardcode "100" here but have some value/constant indicating the buffer size
Also are you upto converting the offset calculation code into an iterator returning Range::bytes? That would be awesome!
This comment has been minimized.
This comment has been minimized.
jbolila
Oct 7, 2017
Author
Contributor
I can try that one, although looks like in the std Range::step_by() is still in the nightly build only #27741. I'll do it for now manually inside the for loop.
This comment has been minimized.
This comment has been minimized.
budziq
Oct 7, 2017
Collaborator
the step_by would not help here much as it would just return every N'th value of the range (treated as iterator here) so
for s in (0..1024).step_by(100) {
println!("{:?}", s);
}would give
0
100
200
300
400
500
600
700
800
900
1000
Which would help with the implementation but would not give a full solution.
I would propose to make own iterator returning
hyper::header::Range::bytes with each iteration containing the next range we want to obtain
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
jbolila
Oct 8, 2017
Author
Contributor
@budziq please let me know if it's ok, and if you wanna hide some of these extra lines. Thanks!
| _ => bail!("response doesn't include the expected ranges"), | ||
| }; | ||
| let mut content: Vec<u8> = Vec::with_capacity(content_length as usize); |
This comment has been minimized.
This comment has been minimized.
budziq
Oct 7, 2017
Collaborator
most of the time when we use partial downloading it's due to the fact that we either want to resume the download later or the whole content is to larg to hold in a contiguous memory buffer.
I would suggest using content buffer length as the each partial content request length (upto the last possibly shorter one). and each time writing it into a file with given offset.
This comment has been minimized.
This comment has been minimized.
jbolila
Oct 7, 2017
Author
Contributor
I'll use the value of the constant to set the size of the buffer.
What will be the filename? wouldn't that be to much for this example, I'm just concerned about the amount of code for the reqwest/ranges compared to the file handle.
This comment has been minimized.
This comment has been minimized.
budziq
Oct 7, 2017
Collaborator
opening and writing to a file would be only two additional lines which we could hide with # anyway. the filename can be anything like "download.bin"
jbolila
force-pushed the
jbolila:master
branch
from
b00fcc4
to
82dadcd
Oct 8, 2017
budziq
requested changes
Oct 9, 2017
| bail!("Unexpected server response: {}", response.status()) | ||
| } | ||
| let mut content: Vec<u8> = Vec::with_capacity(BUFFER_SIZE as usize); |
This comment has been minimized.
This comment has been minimized.
budziq
Oct 9, 2017
Collaborator
no need to allocate the buffer on each iteration. Lets allocate it before the loop and use Vec::clear after each iteration
| let mut response = client.get(url).header(range).send()?; | ||
| if response.status() != reqwest::StatusCode::PartialContent && |
This comment has been minimized.
This comment has been minimized.
budziq
Oct 9, 2017
Collaborator
Id suggest using equivalent boolean expression that might be easier to read
(!a && !b) => !(a || b)
This comment has been minimized.
This comment has been minimized.
jbolila
Oct 9, 2017
Author
Contributor
I'll do it as you wish, although for me it's not easier to read this way.
if !(response.status() == reqwest::StatusCode::PartialContent ||
response.status() == reqwest::StatusCode::Ok)
{
bail!("Unexpected server response: {}", response.status())
}| # } | ||
| # } | ||
| const BUFFER_SIZE: u64 = 100; |
This comment has been minimized.
This comment has been minimized.
budziq
Oct 9, 2017
Collaborator
the example is no_run anyway. we can go wild with the buffer and request size
the biggest request handled by httpbin seams to be around 102400 but I did not check precisely. The buffer can be set to something like 10KB to still show partial content (this is still unreasonably small. we might have to add a comment that normally it would be bigger )
| const BUFFER_SIZE: u64 = 100; | ||
| struct PartialRange { |
This comment has been minimized.
This comment has been minimized.
budziq
Oct 9, 2017
Collaborator
I propose to merge the PartialRange and PartialRangeIter into one which will be much more concise.
All we will need will be:
- the struct
struct PartialRangeIter {
start: u64, // content range start
end: u64, // content range end
buffer_size: usize, // external buffer size
}- The constructor
newfunction below - iter impl which will be much simplified
We should not need anything more complicated than
impl Iterator for PartialRangeIter {
type Item = Range;
fn next(&mut self) -> Option<Self::Item> {
if self.start >= self.end {
None
} else {
let old_start = self.start;
self.start += cmp::min(self.buffer_size as u64, self.end - self.start +1);
Some(Range::bytes(old_start, self.start -1))
}
}
}
This comment has been minimized.
This comment has been minimized.
jbolila
Oct 9, 2017
•
Author
Contributor
I have tried that approach during this end up with the following error, and now is back again with this code you're posting here:
error[E0599]: no method named `iter` found for type `PartialRangeIter` in the current scope
--> src/main.rs:77:25
|
77 | for range in ranges.iter() {
| ^^^^
here is the complete struct + impl:
#[derive(Debug)]
struct PartialRangeIter {
start: u64, // content range start
end: u64, // content range end
buffer_size: usize, // external buffer size
}
impl PartialRangeIter {
pub fn new(content_range: &ContentRangeSpec, buffer_size: usize) -> Result<PartialRangeIter> {
if buffer_size == 0 {
Err("invalid buffer_size, give a value greater than zero.")?;
}
match *content_range {
ContentRangeSpec::Bytes { range: Some(range), .. } => Ok(PartialRangeIter {
start: range.0,
end: range.1,
buffer_size,
}),
_ => Err("invalid range specification")?,
}
}
}
impl Iterator for PartialRangeIter {
type Item = Range;
fn next(&mut self) -> Option<Self::Item> {
if self.start >= self.end {
None
} else {
let old_start = self.start;
self.start += std::cmp::min(self.buffer_size as u64, self.end - self.start + 1);
Some(Range::bytes(old_start, self.start - 1))
}
}
}can you explain?
This comment has been minimized.
This comment has been minimized.
budziq
Oct 9, 2017
Collaborator
for range in ranges.iter() {
There is nothing to provide the iter method nor is it required.
Som types provide iter, into_iter or both to return a separate helper iterator type.
In this situation we already have an iterator type instance as it can be constructed with PartialRangeIter::new (not being tied to any other type).
So following would simply work
for range in ranges {I hope that helpes
| } => (content_length, range.0, range.1), | ||
| _ => (0, 0, 0), | ||
| }; | ||
This comment has been minimized.
This comment has been minimized.
| range: Some(range), | ||
| instance_length: Some(content_length), | ||
| } => (content_length, range.0, range.1), | ||
| _ => (0, 0, 0), |
This comment has been minimized.
This comment has been minimized.
budziq
Oct 9, 2017
Collaborator
the iterator with (0,0,0) does not make sense/ would be invalid so I would suggest returning Err("invalid range speccification")? in this match leg,
| } | ||
| impl PartialRange { | ||
| pub fn new(content_range: &ContentRangeSpec) -> PartialRange { |
This comment has been minimized.
This comment has been minimized.
budziq
Oct 9, 2017
Collaborator
this looks like a failable operation I would suggest returning Result<PartialRangeIter> from this function
also we will need the buffer_size as an argument. I would strongly advise against using global constants throughout the code. This leads to bad refactoring experience.
we will also need to check the buffer_size for being > 0 and return Err if it is not
| let (content_length, range_starts, range_ends) = match *content_range { | ||
| ContentRangeSpec::Bytes { | ||
| range: Some(range), | ||
| instance_length: Some(content_length), |
This comment has been minimized.
This comment has been minimized.
budziq
Oct 9, 2017
Collaborator
how about directly constructing the Ok(PartialRangeIter{....}) here and leaving the match as the last statement so it is naturally returned from the function. If it will be to dense, we can always return to the parameter extraction in match and returning after it.
jbolila
added some commits
Oct 1, 2017
jbolila
force-pushed the
jbolila:master
branch
from
f9a4e52
to
c2b9fbe
Oct 9, 2017
budziq
merged commit 9084c13
into
rust-lang-nursery:master
Oct 9, 2017
This comment has been minimized.
This comment has been minimized.
|
Merged, Excellent work @jbolila ! |
This comment has been minimized.
This comment has been minimized.
|
Thank you @budziq !! I have learned a few things during this days, thanks for your patience and guidance. |
This comment has been minimized.
This comment has been minimized.
|
No problem! That is exactly what this effort is for! |
jbolila commentedOct 1, 2017
This is my first PR in a Rust project, probably not the most idiomatic Rust but I'm here to learn.