Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a threadpool to calculate SHA1 in all *.iso files in a folder. #274

Merged
merged 1 commit into from Oct 2, 2017

Conversation

Projects
None yet
2 participants
@ericho
Copy link
Contributor

ericho commented Aug 31, 2017

No description provided.

@budziq
Copy link
Collaborator

budziq left a comment

Very nice :) although I would suggest some changes below:

@@ -1,11 +1,12 @@
# Concurrency

| Recipe | Crates | Categories |

This comment has been minimized.

@budziq

budziq Aug 31, 2017

Collaborator

no need to rewrite the table here. please update it in the same manner as in the case of intro.md

This comment has been minimized.

@ericho

ericho Sep 9, 2017

Author Contributor

It is fixed.. I mess with something but I'm not sure what was 😄


```rust,no_run
extern crate walkdir;
extern crate md5;

This comment has been minimized.

@budziq

budziq Aug 31, 2017

Collaborator

CI fails due to usage of new crate without adding it to cargo.toml
also I would prefer not to add new crate at this point. I would suggest using already present ring crate (it does not support md5 but you will have support for sha1 and sha256 algorithms )

This comment has been minimized.

@ericho

ericho Sep 9, 2017

Author Contributor

I changed the example to use ring and SHA1

.filter(|e| !e.file_type().is_dir()){
let entry = entry.clone();
pool.execute(move || {
let file = File::open(entry.path()).unwrap();

This comment has been minimized.

let mut content: Vec<u8> = Vec::new();
buf_reader.read_to_end(&mut content);
let digest = md5::compute(content);
println!("{} {:x}", entry.path().display(), digest);

This comment has been minimized.

@budziq

budziq Aug 31, 2017

Collaborator

instead of printing these out here I would suggest using mpsc to pass the computation results to the main thread (please see the fractal threadpool example)

let cpus = num_cpus::get();
let pool = ThreadPool::new(cpus);
for entry in WalkDir::new(".")

This comment has been minimized.

@budziq

budziq Aug 31, 2017

Collaborator

lets go wild and compute the sha1 of all *.iso files in given directory or all files listed in
or the ones listed in
https://cloud-images.ubuntu.com/trusty/current/SHA1SUMS
https://cloud-images.ubuntu.com/trusty/current/

In this way we could have an actual useful example. something along these lines - "Verify in parallel downloaded files consistency with SHA1 algorithm"

@budziq budziq referenced this pull request Aug 31, 2017

Closed

Cookbook ideas for rayon #272

6 of 6 tasks complete
@ericho

This comment has been minimized.

Copy link
Contributor Author

ericho commented Sep 1, 2017

Thank you @budziq for the feedback, I'll work on the fixes 😄

@ericho ericho force-pushed the ericho:threadpool branch from 985b189 to b0cd956 Sep 9, 2017

@ericho ericho changed the title Use a threadpool to calculate md5 in all files in a folder. Use a threadpool to calculate SHA1 in all *.iso files in a folder. Sep 9, 2017

@ericho ericho force-pushed the ericho:threadpool branch from b0cd956 to b6c2d26 Sep 9, 2017

@budziq
Copy link
Collaborator

budziq left a comment

Hi @ericho, nice work, I would suggest few modifications to make it slightly more idiomatic.

@@ -6,6 +6,7 @@
| [Sort a vector in parallel][ex-rayon-parallel-sort] | [![rayon-badge]][rayon] [![rand-badge]][rand] | [![cat-concurrency-badge]][cat-concurrency] |
| [Spawn a short-lived thread][ex-crossbeam-spawn] | [![crossbeam-badge]][crossbeam] | [![cat-concurrency-badge]][cat-concurrency] |
| [Draw fractal dispatching work to a thread pool][ex-threadpool-fractal] | [![threadpool-badge]][threadpool] [![num-badge]][num] [![num_cpus-badge]][num_cpus] [![image-badge]][image] | [![cat-concurrency-badge]][cat-concurrency][![cat-science-badge]][cat-science][![cat-rendering-badge]][cat-rendering] |
| [![threadpool-badge]][threadpool] [![walkdir-badge]][walkdir] [![num_cpus-badge]][num_cpus] | [Use a threadpool to calculate SHA1 to all files in a folder][threadpool-walk] | [![cat-concurrency-badge]][cat-concurrency][![cat-filesystem-badge]][cat-filesystem] |

This comment has been minimized.

@budziq

budziq Sep 11, 2017

Collaborator

The title goes into first column and badges into the second. Please see the rendered version for comparison ;)

Also I would probably change the title to something along the lines of "Calculate SHA1 sum of *.iso files in parallel"

@@ -63,6 +63,8 @@ community. It needs and welcomes help. For details see
| [Sort a vector in parallel][ex-rayon-parallel-sort] | [![rayon-badge]][rayon] [![rand-badge]][rand] | [![cat-concurrency-badge]][cat-concurrency] |
| [Spawn a short-lived thread][ex-crossbeam-spawn] | [![crossbeam-badge]][crossbeam] | [![cat-concurrency-badge]][cat-concurrency] |
| [Draw fractal dispatching work to a thread pool][ex-threadpool-fractal] | [![threadpool-badge]][threadpool] [![num-badge]][num] [![num_cpus-badge]][num_cpus] [![image-badge]][image] | [![cat-concurrency-badge]][cat-concurrency][![cat-science-badge]][cat-science][![cat-rendering-badge]][cat-rendering] |
| [![threadpool-badge]][threadpool] [![walkdir-badge]][walkdir] [![num_cpus-badge]][num_cpus] | [Use a threadpool to calculate SHA1 to all files in a folder][threadpool-walk] | [![cat-concurrency-badge]][cat-concurrency][![cat-filesystem-badge]][cat-filesystem] |

This comment has been minimized.

@budziq

budziq Sep 11, 2017

Collaborator

The title goes into first column and badges into the second.
Also the title should be hyperlinked.

# }

fn run() -> Result<()> {
let cpus = num_cpus::get();

This comment has been minimized.

@budziq

budziq Sep 11, 2017

Collaborator

no need for stand alone variable

ThreadPool::new(num_cpus:get()) is short ans sweet

fn run() -> Result<()> {
let cpus = num_cpus::get();
let pool = ThreadPool::new(cpus);
let mut file_counter: usize = 0;

This comment has been minimized.

@budziq

budziq Sep 11, 2017

Collaborator

no need for the file_counter if you do drop(tx) after the for loop the rx.iter() will not hang.

return;
},
};
let mut buf_reader = BufReader::new(file);

This comment has been minimized.

@budziq

budziq Sep 11, 2017

Collaborator

Reading the whole file into memory is not really the best way to go. Ring provides a nice incremental api.
for its usage please see the
calculate the sha-256 digest of a file example

Moreover it would be much nicer to extract the file reading and sha1 calculation into a separate function taking a AsRef<Path> and returning a Result. Then you would have only one tx.send() without the verbose matches all over the place.

}
});
}

This comment has been minimized.

@budziq

budziq Sep 11, 2017

Collaborator

no need to use take as long as you drop(tx) prior to this loop.

use walkdir::{WalkDir, DirEntry};
use std::fs::File;
use std::io::BufReader;
use std::io::prelude::*;

This comment has been minimized.

@budziq

budziq Sep 11, 2017

Collaborator

no need to bring the whole prelude here. please import only the required traits.

.follow_links(true)
.into_iter()
.filter_map(|e| e.ok())
.filter(|e| !e.file_type().is_dir() && is_iso(e)) {

This comment has been minimized.

@budziq

budziq Sep 11, 2017

Collaborator
  • I would operate on Pathor PathBuf instead of DirEntry here It is lighter, more idiomatic and has all the required interfaces to boot :).

# // Verify the iso extension
# fn is_iso(entry: &DirEntry) -> bool {
# entry.file_name()

This comment has been minimized.

@budziq

budziq Sep 11, 2017

Collaborator

If you use Path or PathBuf (or better AsRef<Path>) instead of DirEntry you'll be able to
use use Path::extension() and compare directly to "iso"

.filter(|e| !e.file_type().is_dir() && is_iso(e)) {
// Need to know how many entry were sent to the pool.
file_counter = file_counter + 1;
let entry = entry.clone();

This comment has been minimized.

@budziq

budziq Sep 11, 2017

Collaborator

this will not be needed if you use PathBuf instead as the owned value will just get moved into the closure

let (tx, rx) = channel();

// Look in the current directory.
for entry in WalkDir::new(".")

This comment has been minimized.

@budziq

budziq Sep 11, 2017

Collaborator

I would parametrize the call with a variable event if its value is hardcoded earlier to something like "/home/user/downloads".

This comment has been minimized.

@budziq

budziq Sep 19, 2017

Collaborator

Hi @ericho do you need any help in finalizing this example?

This comment has been minimized.

@ericho

ericho Sep 24, 2017

Author Contributor

Hi, sorry, I was busy these last days. I think I solved all the comments.

BTW, thanks for the review, I learned a lot doing this example 😄

@ericho ericho force-pushed the ericho:threadpool branch 3 times, most recently from b6dc32c to f837745 Sep 24, 2017

@budziq
Copy link
Collaborator

budziq left a comment

Nice work @ericho!
Thanks for sticking with it. Just few minor touches and we are ready to merge!

BTW, thanks for the review, I learned a lot doing this example 😄

Ha! I'm very glad to read that! This is exactly the point of this effort 🥇

use std::path::Path;
use threadpool::ThreadPool;
use std::sync::mpsc::channel;
use ring::digest::{Context, Digest, SHA256};

This comment has been minimized.

@budziq

budziq Sep 24, 2017

Collaborator

we probably wanted SHA1 instead of SHA256

use walkdir::WalkDir;
use std::fs::File;
use std::io::BufReader;
use std::io::prelude::Read;

This comment has been minimized.

@budziq

budziq Sep 24, 2017

Collaborator

nearly everything in std::io::prelude is reexported to std::io
so we can write it as
use std::io::{BufReader, Read}

# Io(std::io::Error);
# }
# }

This comment has been minimized.

@budziq

budziq Sep 24, 2017

Collaborator

please add additional # here so we avoid 2 blank spaces

}
Ok(())
}

This comment has been minimized.

@budziq

budziq Sep 24, 2017

Collaborator

please remove the single blank space

# fn is_iso(entry: &Path) -> bool {
# match entry.extension() {
# Some(e) if e == "iso"=> true,
# Some(_) => false,

This comment has been minimized.

@budziq

budziq Sep 24, 2017

Collaborator

this match can be shortened even further into just two branches

    match entry.extension() {
        Some(e) if e.to_string_lossy().to_lowercase() == "iso" => true,
        _ => false,
    }

## Calculate SHA1 sum of *.iso files concurrently
[![threadpool-badge]][threadpool] [![num_cpus-badge]][num_cpus] [![walkdir-badge]][walkdir] [![cat-concurrency-badge]][cat-concurrency][![cat-filesystem-badge]][cat-filesystem]

This comment has been minimized.

@budziq

budziq Sep 24, 2017

Collaborator

i would add ring crate badge here

# }
# }
fn compute_digest<P: AsRef<Path>>(filepath: P) -> Result<(Digest, String)> {

This comment has been minimized.

@budziq

budziq Sep 24, 2017

Collaborator

if you return Result<(Digest, P)> you can completely drop the filename variable and just return Ok((context.finish(), filepath))

fn compute_digest<P: AsRef<Path>>(filepath: P) -> Result<(Digest, String)> {
let filename = String::from(filepath.as_ref().to_string_lossy());
let file = File::open(filepath)?;
let mut buf_reader = BufReader::new(file);

This comment has been minimized.

@budziq

budziq Sep 24, 2017

Collaborator

we don't need a separate variable for file. we can write it as BufReader::new(File::open(filepath)?); (the file will be moved into the BufReaderanyway)

.into_iter()
.filter_map(|e| e.ok())
.filter(|e| !e.path().is_dir() && is_iso(e.path())) {
let path = entry.path().to_path_buf();

This comment has been minimized.

@budziq

budziq Sep 24, 2017

Collaborator

instead of to_path_buf we can use to_owned here.

[![threadpool-badge]][threadpool] [![num_cpus-badge]][num_cpus] [![walkdir-badge]][walkdir] [![cat-concurrency-badge]][cat-concurrency][![cat-filesystem-badge]][cat-filesystem]
This example uses `threadpool`, `walkdir` and `ring` crates to calculate the SHA1 for every file present in the current directory. A threadpool `pool` is created using the number of cpus present in the system. Then every `entry` returned by `walkdir` is passed into this pool to perform the operations of reading and compute SHA1. At the end the program waits for all jobs to finish. To get better results, compile this program in release mode.

This comment has been minimized.

@budziq

budziq Sep 24, 2017

Collaborator
  1. It would replace

"This example uses threadpool, walkdir and ring crates to calculate ..."

with "Calculates ...". The reader already knows which crates are used form the heading.

  1. Actually this example recursively calculates SHA256 of every iso file present in "/home/user/Downloads"

So please update the code or the description ;)

  1. typo - "reading and compute" => "reading and computing"

  2. There are several identifiers mentioned pool, entry etc. Please link these to docs

@ericho ericho referenced this pull request Sep 26, 2017

Open

Cookbook ideas for threadpool crate #225

2 of 3 tasks complete

@ericho ericho force-pushed the ericho:threadpool branch 3 times, most recently from b33ac39 to 9ea7434 Sep 30, 2017

@ericho ericho force-pushed the ericho:threadpool branch from 9ea7434 to cdbb634 Sep 30, 2017

@budziq budziq merged commit 859953c into rust-lang-nursery:master Oct 2, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@budziq

This comment has been minimized.

Copy link
Collaborator

budziq commented Oct 2, 2017

Thanks @ericho great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.