Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "Make a partial download with HTTP range headers" example #291 #309

Merged
merged 5 commits into from Oct 9, 2017

Conversation

Projects
None yet
2 participants
@jbolila
Copy link
Contributor

jbolila commented Oct 1, 2017

This is my first PR in a Rust project, probably not the most idiomatic Rust but I'm here to learn.

@budziq
Copy link
Collaborator

budziq left a comment

@jbolila Thanks, the example looks almost perfect! Please find some minor suggestion below

src/net.md Outdated
let url = Url::parse("https://httpbin.org/range/1024?duration=2")?;
let response = client.head(url.to_owned())?.send()?;
if let Some(range) = response.headers().get::<ContentRange>() {

This comment has been minimized.

@budziq

budziq Oct 4, 2017

Collaborator

instead of if let i would suggest using ok_or combinator to convert into a Result and immediately ?

src/net.md Outdated
@@ -19,6 +19,7 @@
| [Extract all links from a webpage HTML][ex-extract-links-webpage] | [![reqwest-badge]][reqwest] [![select-badge]][select] | [![cat-net-badge]][cat-net] |
| [Check webpage for broken links][ex-check-broken-links] | [![reqwest-badge]][reqwest] [![select-badge]][select] [![url-badge]][url] | [![cat-net-badge]][cat-net] |
| [Extract all unique links from a MediaWiki markup][ex-extract-mediawiki-links] | [![reqwest-badge]][reqwest] [![regex-badge]][regex] | [![cat-net-badge]][cat-net] |
| [Make a partial download with HTTP range headers][ex-progress-with-range] | [![reqwest-badge]][reqwest] [![regex-badge]][regex] | [![cat-net-badge]][cat-net] |

This comment has been minimized.

@budziq

budziq Oct 4, 2017

Collaborator

please also update the intro.md

src/net.md Outdated
# }
# }
const TOTAL_CHUNKS: u64 = 20;

This comment has been minimized.

@budziq

budziq Oct 4, 2017

Collaborator

i would suggest using fixed buffer size instead of chunk number. Otherwise things are going to get weird with small content sizes and there is no reason to request exactly n times.

src/net.md Outdated
use reqwest::header::{Range, ContentRange, ContentRangeSpec};
use url::Url;
use std::io::{Read, Write};

This comment has been minimized.

@budziq

budziq Oct 4, 2017

Collaborator

please use std before tbe crate uses

src/net.md Outdated
if let Some(range) = response.headers().get::<ContentRange>() {
if let ContentRangeSpec::Bytes { instance_length, .. } = **range {
let content_length = instance_length.unwrap_or(0);

This comment has been minimized.

@budziq

budziq Oct 4, 2017

Collaborator

we can drop the if let and unwrap_or here
with something like:

        let content_length = match **range {
            ContentRangeSpec::Bytes { instance_length: Some(len), .. } => len,
            _ => bail!("response doesn't include the expected ranges"),
        };
src/net.md Outdated
}
assert_eq!(content_length as usize, content.len());
println!("finished with success!");

This comment has been minimized.

@budziq

budziq Oct 4, 2017

Collaborator

please drop the assert_eq here.

src/net.md Outdated
const TOTAL_CHUNKS: u64 = 20;
fn run() -> Result<()> {
let client = reqwest::Client::new()?;

This comment has been minimized.

@budziq

budziq Oct 4, 2017

Collaborator

we have just bumped to reqwest 0.8 which uses Result more sparingly so this will not compile now/ the same for head and get calls below

src/net.md Outdated
let client = reqwest::Client::new()?;
let url = Url::parse("https://httpbin.org/range/1024?duration=2")?;
let response = client.head(url.to_owned())?.send()?;

This comment has been minimized.

@budziq

budziq Oct 4, 2017

Collaborator

reqwest uses Url::parse internally so basically below will work

    let url = "https://httpbin.org/range/1024?duration=2";
    let response = client.head(url).send()?;

without the need for all the additional to_owned calls below

src/net.md Outdated
@@ -1070,6 +1152,7 @@ fn run() -> Result<()> {
[`reqwest::RequestBuilder`]: https://docs.rs/reqwest/0.6.2/reqwest/struct.RequestBuilder.html
[`reqwest::Response`]: https://docs.rs/reqwest/*/reqwest/struct.Response.html
[`reqwest::get`]: https://docs.rs/reqwest/*/reqwest/fn.get.html
[`reqwest::header::Range`]: https://docs.rs/reqwest/0.7.3/reqwest/header/enum.Range.html

This comment has been minimized.

@budziq

budziq Oct 4, 2017

Collaborator

please use wildcard version * in docs.rs links (please also update any other links with hardcoded version that you find in this file)

src/net.md Outdated
[![reqwest-badge]][reqwest] [![cat-net-badge]][cat-net]
Download content using [`reqwest::get`] and setting the [`reqwest::header::Range`] to do partial

This comment has been minimized.

@budziq

budziq Oct 4, 2017

Collaborator

I would mention about getting the ContentRange with head request.

@jbolila

This comment has been minimized.

Copy link
Contributor Author

jbolila commented Oct 4, 2017

Thank you @budziq for your comments, realy appreciated. I'm no sure about the rebase I did in my fork to get the latest changes and the update of the Cargo.toml .. (i don't have much experience with PR)

@budziq

This comment has been minimized.

Copy link
Collaborator

budziq commented Oct 4, 2017

@jbolila It seams that there was a problem with the rebase. You have pulled and rebased some of the already merged commits making this PR not mergable.
I would suggest backtracking a little.

git remote add upstream git@github.com:rust-lang-nursery/rust-cookbook.git
git remote update                       # get latest changes
git checkout f46683b -b jbolila_pr_fix  # get your original commit
git rebase upstream/master              # place it on top of current master
git cherry-pick b1bf49dec8              # add your latest fixes

then in the branch "jbolila_pr_fix" you'll have the state that you will want to push to your origin/master (with --force) to update the PR into a proper condition.

Then we will continue with the review.
If you'll have any further problems, do not hesitate to ask.

@jbolila

This comment has been minimized.

Copy link
Contributor Author

jbolila commented Oct 4, 2017

I hope is right, now.

*   a0370ed (HEAD -> master) Merge branch 'jbolila_pr_fix'
|\  
| * fcacb9f (jbolila_pr_fix) Changes requested in the PR 309
| * 4e675fb Add "Make a partial download with HTTP range headers" example #291
| * b7ab129 (upstream/master) Parse a complex version string.
| * 6ffdf8e Minor fixes
..
@budziq

This comment has been minimized.

Copy link
Collaborator

budziq commented Oct 4, 2017

unfortunately not yet. instead of merging jbolila_fix_pr you should overwrite master with it.

git checkout jbolila_fix_pr
git branch -D master
git checkout -b master
git push -u origin master --force

@jbolila jbolila force-pushed the jbolila:master branch from a0370ed to fcacb9f Oct 4, 2017

@budziq
Copy link
Collaborator

budziq left a comment

@jbolila nice! Just few suggestions and we are ready to merge!

src/net.md Outdated
println!("content-lenght: {} download in progress..", content_length);
let mut chunk_starts: u64 = 0;
let mut chunk_ends = if 100 > content_length {

This comment has been minimized.

@budziq

budziq Oct 7, 2017

Collaborator

Although I would suggest not to hardcode "100" here but have some value/constant indicating the buffer size

Also are you upto converting the offset calculation code into an iterator returning Range::bytes? That would be awesome!

This comment has been minimized.

@jbolila

jbolila Oct 7, 2017

Author Contributor

I can try that one, although looks like in the std Range::step_by() is still in the nightly build only #27741. I'll do it for now manually inside the for loop.

This comment has been minimized.

@budziq

budziq Oct 7, 2017

Collaborator

the step_by would not help here much as it would just return every N'th value of the range (treated as iterator here) so

 for s in (0..1024).step_by(100) {
    println!("{:?}", s);
 }

would give

0
100
200
300
400
500
600
700
800
900
1000

Which would help with the implementation but would not give a full solution.
I would propose to make own iterator returning
hyper::header::Range::bytes with each iteration containing the next range we want to obtain

This comment has been minimized.

@budziq

budziq Oct 7, 2017

Collaborator

Anyhow, if you'll have any problems or concerns I'll gladly help!

This comment has been minimized.

@jbolila

jbolila Oct 8, 2017

Author Contributor

@budziq please let me know if it's ok, and if you wanna hide some of these extra lines. Thanks!

src/net.md Outdated
_ => bail!("response doesn't include the expected ranges"),
};
let mut content: Vec<u8> = Vec::with_capacity(content_length as usize);

This comment has been minimized.

@budziq

budziq Oct 7, 2017

Collaborator

most of the time when we use partial downloading it's due to the fact that we either want to resume the download later or the whole content is to larg to hold in a contiguous memory buffer.

I would suggest using content buffer length as the each partial content request length (upto the last possibly shorter one). and each time writing it into a file with given offset.

This comment has been minimized.

@jbolila

jbolila Oct 7, 2017

Author Contributor

I'll use the value of the constant to set the size of the buffer.

What will be the filename? wouldn't that be to much for this example, I'm just concerned about the amount of code for the reqwest/ranges compared to the file handle.

This comment has been minimized.

@budziq

budziq Oct 7, 2017

Collaborator

opening and writing to a file would be only two additional lines which we could hide with # anyway. the filename can be anything like "download.bin"

@jbolila jbolila force-pushed the jbolila:master branch from b00fcc4 to 82dadcd Oct 8, 2017

@budziq
Copy link
Collaborator

budziq left a comment

@jbolila Nice!. I've left few suggestions to make the example more concise. This will probably be the last round of review before merge!

src/net.md Outdated
bail!("Unexpected server response: {}", response.status())
}
let mut content: Vec<u8> = Vec::with_capacity(BUFFER_SIZE as usize);

This comment has been minimized.

@budziq

budziq Oct 9, 2017

Collaborator

no need to allocate the buffer on each iteration. Lets allocate it before the loop and use Vec::clear after each iteration

src/net.md Outdated
let mut response = client.get(url).header(range).send()?;
if response.status() != reqwest::StatusCode::PartialContent &&

This comment has been minimized.

@budziq

budziq Oct 9, 2017

Collaborator

Id suggest using equivalent boolean expression that might be easier to read
(!a && !b) => !(a || b)

This comment has been minimized.

@jbolila

jbolila Oct 9, 2017

Author Contributor

I'll do it as you wish, although for me it's not easier to read this way.

        if !(response.status() == reqwest::StatusCode::PartialContent ||
                 response.status() == reqwest::StatusCode::Ok)
        {
            bail!("Unexpected server response: {}", response.status())
        }
src/net.md Outdated
# }
# }
const BUFFER_SIZE: u64 = 100;

This comment has been minimized.

@budziq

budziq Oct 9, 2017

Collaborator

the example is no_run anyway. we can go wild with the buffer and request size
the biggest request handled by httpbin seams to be around 102400 but I did not check precisely. The buffer can be set to something like 10KB to still show partial content (this is still unreasonably small. we might have to add a comment that normally it would be bigger )

src/net.md Outdated
const BUFFER_SIZE: u64 = 100;
struct PartialRange {

This comment has been minimized.

@budziq

budziq Oct 9, 2017

Collaborator

I propose to merge the PartialRange and PartialRangeIter into one which will be much more concise.

All we will need will be:

  • the struct
struct PartialRangeIter {
    start: u64,  // content range start
    end: u64,   // content range end
    buffer_size: usize, // external buffer size 
}
  • The constructor new function below
  • iter impl which will be much simplified

We should not need anything more complicated than

impl Iterator for PartialRangeIter {
    type Item = Range;

    fn next(&mut self) -> Option<Self::Item> {
        if self.start >= self.end {
            None
        } else {
            let old_start = self.start;
            self.start += cmp::min(self.buffer_size as u64, self.end - self.start +1);
            Some(Range::bytes(old_start, self.start -1))
        }
    }
}

This comment has been minimized.

@jbolila

jbolila Oct 9, 2017

Author Contributor

I have tried that approach during this end up with the following error, and now is back again with this code you're posting here:

error[E0599]: no method named `iter` found for type `PartialRangeIter` in the current scope
  --> src/main.rs:77:25
   |
77 |     for range in ranges.iter() {
   |                         ^^^^

here is the complete struct + impl:

#[derive(Debug)]
struct PartialRangeIter {
    start: u64, // content range start
    end: u64, // content range end
    buffer_size: usize, // external buffer size
}

impl PartialRangeIter {
    pub fn new(content_range: &ContentRangeSpec, buffer_size: usize) -> Result<PartialRangeIter> {
        if buffer_size == 0 {
            Err("invalid buffer_size, give a value greater than zero.")?;
        }

        match *content_range {
            ContentRangeSpec::Bytes { range: Some(range), .. } => Ok(PartialRangeIter {
                start: range.0,
                end: range.1,
                buffer_size,
            }),
            _ => Err("invalid range specification")?,
        }
    }
}

impl Iterator for PartialRangeIter {
    type Item = Range;

    fn next(&mut self) -> Option<Self::Item> {
        if self.start >= self.end {
            None
        } else {
            let old_start = self.start;
            self.start += std::cmp::min(self.buffer_size as u64, self.end - self.start + 1);
            Some(Range::bytes(old_start, self.start - 1))
        }
    }
}

can you explain?

This comment has been minimized.

@budziq

budziq Oct 9, 2017

Collaborator

for range in ranges.iter() {

There is nothing to provide the iter method nor is it required.
Som types provide iter, into_iter or both to return a separate helper iterator type.
In this situation we already have an iterator type instance as it can be constructed with PartialRangeIter::new (not being tied to any other type).

So following would simply work

for range in ranges {

I hope that helpes 👍

src/net.md Outdated
} => (content_length, range.0, range.1),
_ => (0, 0, 0),
};

This comment has been minimized.

@budziq

budziq Oct 9, 2017

Collaborator

anything below here upto the run function will not be neaded

src/net.md Outdated
range: Some(range),
instance_length: Some(content_length),
} => (content_length, range.0, range.1),
_ => (0, 0, 0),

This comment has been minimized.

@budziq

budziq Oct 9, 2017

Collaborator

the iterator with (0,0,0) does not make sense/ would be invalid so I would suggest returning Err("invalid range speccification")? in this match leg,

src/net.md Outdated
}
impl PartialRange {
pub fn new(content_range: &ContentRangeSpec) -> PartialRange {

This comment has been minimized.

@budziq

budziq Oct 9, 2017

Collaborator

this looks like a failable operation I would suggest returning Result<PartialRangeIter> from this function

also we will need the buffer_size as an argument. I would strongly advise against using global constants throughout the code. This leads to bad refactoring experience.

we will also need to check the buffer_size for being > 0 and return Err if it is not

src/net.md Outdated
let (content_length, range_starts, range_ends) = match *content_range {
ContentRangeSpec::Bytes {
range: Some(range),
instance_length: Some(content_length),

This comment has been minimized.

@budziq

budziq Oct 9, 2017

Collaborator

how about directly constructing the Ok(PartialRangeIter{....}) here and leaving the match as the last statement so it is naturally returned from the function. If it will be to dense, we can always return to the parameter extraction in match and returning after it.

@jbolila jbolila force-pushed the jbolila:master branch from f9a4e52 to c2b9fbe Oct 9, 2017

@budziq budziq merged commit 9084c13 into rust-lang-nursery:master Oct 9, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@budziq

This comment has been minimized.

Copy link
Collaborator

budziq commented Oct 9, 2017

Merged, Excellent work @jbolila !
I've squashed the commits and I'll post minor cleanup shortly.

@jbolila

This comment has been minimized.

Copy link
Contributor Author

jbolila commented Oct 10, 2017

Thank you @budziq !! I have learned a few things during this days, thanks for your patience and guidance.

@budziq

This comment has been minimized.

Copy link
Collaborator

budziq commented Oct 10, 2017

No problem! That is exactly what this effort is for!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.