Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upUse a threadpool to calculate SHA1 in all *.iso files in a folder. #274
Conversation
budziq
requested changes
Aug 31, 2017
|
Very nice :) although I would suggest some changes below: |
| @@ -1,11 +1,12 @@ | |||
| # Concurrency | |||
|
|
|||
| | Recipe | Crates | Categories | | |||
This comment has been minimized.
This comment has been minimized.
budziq
Aug 31, 2017
Collaborator
no need to rewrite the table here. please update it in the same manner as in the case of intro.md
This comment has been minimized.
This comment has been minimized.
ericho
Sep 9, 2017
Author
Contributor
It is fixed.. I mess with something but I'm not sure what was
|
|
||
| ```rust,no_run | ||
| extern crate walkdir; | ||
| extern crate md5; |
This comment has been minimized.
This comment has been minimized.
budziq
Aug 31, 2017
Collaborator
CI fails due to usage of new crate without adding it to cargo.toml
also I would prefer not to add new crate at this point. I would suggest using already present ring crate (it does not support md5 but you will have support for sha1 and sha256 algorithms )
This comment has been minimized.
This comment has been minimized.
| .filter(|e| !e.file_type().is_dir()){ | ||
| let entry = entry.clone(); | ||
| pool.execute(move || { | ||
| let file = File::open(entry.path()).unwrap(); |
This comment has been minimized.
This comment has been minimized.
budziq
Aug 31, 2017
Collaborator
no unwraps please ;) please check https://rust-lang-nursery.github.io/rust-cookbook/about.html#a-note-about-error-handling
| let mut content: Vec<u8> = Vec::new(); | ||
| buf_reader.read_to_end(&mut content); | ||
| let digest = md5::compute(content); | ||
| println!("{} {:x}", entry.path().display(), digest); |
This comment has been minimized.
This comment has been minimized.
budziq
Aug 31, 2017
Collaborator
instead of printing these out here I would suggest using mpsc to pass the computation results to the main thread (please see the fractal threadpool example)
| let cpus = num_cpus::get(); | ||
| let pool = ThreadPool::new(cpus); | ||
| for entry in WalkDir::new(".") |
This comment has been minimized.
This comment has been minimized.
budziq
Aug 31, 2017
Collaborator
lets go wild and compute the sha1 of all *.iso files in given directory or all files listed in
or the ones listed in
https://cloud-images.ubuntu.com/trusty/current/SHA1SUMS
https://cloud-images.ubuntu.com/trusty/current/
In this way we could have an actual useful example. something along these lines - "Verify in parallel downloaded files consistency with SHA1 algorithm"
budziq
added
claimed
easy
example
labels
Aug 31, 2017
This comment has been minimized.
This comment has been minimized.
|
Thank you @budziq for the feedback, I'll work on the fixes |
ericho
force-pushed the
ericho:threadpool
branch
from
985b189
to
b0cd956
Sep 9, 2017
ericho
changed the title
Use a threadpool to calculate md5 in all files in a folder.
Use a threadpool to calculate SHA1 in all *.iso files in a folder.
Sep 9, 2017
ericho
force-pushed the
ericho:threadpool
branch
from
b0cd956
to
b6c2d26
Sep 9, 2017
budziq
requested changes
Sep 11, 2017
| @@ -6,6 +6,7 @@ | |||
| | [Sort a vector in parallel][ex-rayon-parallel-sort] | [![rayon-badge]][rayon] [![rand-badge]][rand] | [![cat-concurrency-badge]][cat-concurrency] | | |||
| | [Spawn a short-lived thread][ex-crossbeam-spawn] | [![crossbeam-badge]][crossbeam] | [![cat-concurrency-badge]][cat-concurrency] | | |||
| | [Draw fractal dispatching work to a thread pool][ex-threadpool-fractal] | [![threadpool-badge]][threadpool] [![num-badge]][num] [![num_cpus-badge]][num_cpus] [![image-badge]][image] | [![cat-concurrency-badge]][cat-concurrency][![cat-science-badge]][cat-science][![cat-rendering-badge]][cat-rendering] | | |||
| | [![threadpool-badge]][threadpool] [![walkdir-badge]][walkdir] [![num_cpus-badge]][num_cpus] | [Use a threadpool to calculate SHA1 to all files in a folder][threadpool-walk] | [![cat-concurrency-badge]][cat-concurrency][![cat-filesystem-badge]][cat-filesystem] | | |||
This comment has been minimized.
This comment has been minimized.
budziq
Sep 11, 2017
•
Collaborator
The title goes into first column and badges into the second. Please see the rendered version for comparison ;)
Also I would probably change the title to something along the lines of "Calculate SHA1 sum of *.iso files in parallel"
| @@ -63,6 +63,8 @@ community. It needs and welcomes help. For details see | |||
| | [Sort a vector in parallel][ex-rayon-parallel-sort] | [![rayon-badge]][rayon] [![rand-badge]][rand] | [![cat-concurrency-badge]][cat-concurrency] | | |||
| | [Spawn a short-lived thread][ex-crossbeam-spawn] | [![crossbeam-badge]][crossbeam] | [![cat-concurrency-badge]][cat-concurrency] | | |||
| | [Draw fractal dispatching work to a thread pool][ex-threadpool-fractal] | [![threadpool-badge]][threadpool] [![num-badge]][num] [![num_cpus-badge]][num_cpus] [![image-badge]][image] | [![cat-concurrency-badge]][cat-concurrency][![cat-science-badge]][cat-science][![cat-rendering-badge]][cat-rendering] | | |||
| | [![threadpool-badge]][threadpool] [![walkdir-badge]][walkdir] [![num_cpus-badge]][num_cpus] | [Use a threadpool to calculate SHA1 to all files in a folder][threadpool-walk] | [![cat-concurrency-badge]][cat-concurrency][![cat-filesystem-badge]][cat-filesystem] | | |||
This comment has been minimized.
This comment has been minimized.
budziq
Sep 11, 2017
•
Collaborator
The title goes into first column and badges into the second.
Also the title should be hyperlinked.
| # } | ||
|
|
||
| fn run() -> Result<()> { | ||
| let cpus = num_cpus::get(); |
This comment has been minimized.
This comment has been minimized.
budziq
Sep 11, 2017
Collaborator
no need for stand alone variable
ThreadPool::new(num_cpus:get()) is short ans sweet
| fn run() -> Result<()> { | ||
| let cpus = num_cpus::get(); | ||
| let pool = ThreadPool::new(cpus); | ||
| let mut file_counter: usize = 0; |
This comment has been minimized.
This comment has been minimized.
budziq
Sep 11, 2017
Collaborator
no need for the file_counter if you do drop(tx) after the for loop the rx.iter() will not hang.
| return; | ||
| }, | ||
| }; | ||
| let mut buf_reader = BufReader::new(file); |
This comment has been minimized.
This comment has been minimized.
budziq
Sep 11, 2017
Collaborator
Reading the whole file into memory is not really the best way to go. Ring provides a nice incremental api.
for its usage please see the
calculate the sha-256 digest of a file example
Moreover it would be much nicer to extract the file reading and sha1 calculation into a separate function taking a AsRef<Path> and returning a Result. Then you would have only one tx.send() without the verbose matches all over the place.
| use walkdir::{WalkDir, DirEntry}; | ||
| use std::fs::File; | ||
| use std::io::BufReader; | ||
| use std::io::prelude::*; |
This comment has been minimized.
This comment has been minimized.
budziq
Sep 11, 2017
Collaborator
no need to bring the whole prelude here. please import only the required traits.
| .follow_links(true) | ||
| .into_iter() | ||
| .filter_map(|e| e.ok()) | ||
| .filter(|e| !e.file_type().is_dir() && is_iso(e)) { |
This comment has been minimized.
This comment has been minimized.
budziq
Sep 11, 2017
•
Collaborator
- I would operate on
PathorPathBufinstead ofDirEntryhere It is lighter, more idiomatic and has all the required interfaces to boot :).
|
|
||
| # // Verify the iso extension | ||
| # fn is_iso(entry: &DirEntry) -> bool { | ||
| # entry.file_name() |
This comment has been minimized.
This comment has been minimized.
budziq
Sep 11, 2017
Collaborator
If you use Path or PathBuf (or better AsRef<Path>) instead of DirEntry you'll be able to
use use Path::extension() and compare directly to "iso"
| .filter(|e| !e.file_type().is_dir() && is_iso(e)) { | ||
| // Need to know how many entry were sent to the pool. | ||
| file_counter = file_counter + 1; | ||
| let entry = entry.clone(); |
This comment has been minimized.
This comment has been minimized.
budziq
Sep 11, 2017
Collaborator
this will not be needed if you use PathBuf instead as the owned value will just get moved into the closure
budziq
reviewed
Sep 11, 2017
| let (tx, rx) = channel(); | ||
|
|
||
| // Look in the current directory. | ||
| for entry in WalkDir::new(".") |
This comment has been minimized.
This comment has been minimized.
budziq
Sep 11, 2017
Collaborator
I would parametrize the call with a variable event if its value is hardcoded earlier to something like "/home/user/downloads".
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
ericho
Sep 24, 2017
Author
Contributor
Hi, sorry, I was busy these last days. I think I solved all the comments.
BTW, thanks for the review, I learned a lot doing this example
ericho
force-pushed the
ericho:threadpool
branch
3 times, most recently
from
b6dc32c
to
f837745
Sep 24, 2017
budziq
requested changes
Sep 24, 2017
| use std::path::Path; | ||
| use threadpool::ThreadPool; | ||
| use std::sync::mpsc::channel; | ||
| use ring::digest::{Context, Digest, SHA256}; |
This comment has been minimized.
This comment has been minimized.
| use walkdir::WalkDir; | ||
| use std::fs::File; | ||
| use std::io::BufReader; | ||
| use std::io::prelude::Read; |
This comment has been minimized.
This comment has been minimized.
budziq
Sep 24, 2017
Collaborator
nearly everything in std::io::prelude is reexported to std::io
so we can write it as
use std::io::{BufReader, Read}
| # fn is_iso(entry: &Path) -> bool { | ||
| # match entry.extension() { | ||
| # Some(e) if e == "iso"=> true, | ||
| # Some(_) => false, |
This comment has been minimized.
This comment has been minimized.
budziq
Sep 24, 2017
Collaborator
this match can be shortened even further into just two branches
match entry.extension() {
Some(e) if e.to_string_lossy().to_lowercase() == "iso" => true,
_ => false,
}|
|
||
| ## Calculate SHA1 sum of *.iso files concurrently | ||
| [![threadpool-badge]][threadpool] [![num_cpus-badge]][num_cpus] [![walkdir-badge]][walkdir] [![cat-concurrency-badge]][cat-concurrency][![cat-filesystem-badge]][cat-filesystem] |
This comment has been minimized.
This comment has been minimized.
| # } | ||
| # } | ||
| fn compute_digest<P: AsRef<Path>>(filepath: P) -> Result<(Digest, String)> { |
This comment has been minimized.
This comment has been minimized.
budziq
Sep 24, 2017
Collaborator
if you return Result<(Digest, P)> you can completely drop the filename variable and just return Ok((context.finish(), filepath))
| fn compute_digest<P: AsRef<Path>>(filepath: P) -> Result<(Digest, String)> { | ||
| let filename = String::from(filepath.as_ref().to_string_lossy()); | ||
| let file = File::open(filepath)?; | ||
| let mut buf_reader = BufReader::new(file); |
This comment has been minimized.
This comment has been minimized.
budziq
Sep 24, 2017
Collaborator
we don't need a separate variable for file. we can write it as BufReader::new(File::open(filepath)?); (the file will be moved into the BufReaderanyway)
| .into_iter() | ||
| .filter_map(|e| e.ok()) | ||
| .filter(|e| !e.path().is_dir() && is_iso(e.path())) { | ||
| let path = entry.path().to_path_buf(); |
This comment has been minimized.
This comment has been minimized.
| [![threadpool-badge]][threadpool] [![num_cpus-badge]][num_cpus] [![walkdir-badge]][walkdir] [![cat-concurrency-badge]][cat-concurrency][![cat-filesystem-badge]][cat-filesystem] | ||
| This example uses `threadpool`, `walkdir` and `ring` crates to calculate the SHA1 for every file present in the current directory. A threadpool `pool` is created using the number of cpus present in the system. Then every `entry` returned by `walkdir` is passed into this pool to perform the operations of reading and compute SHA1. At the end the program waits for all jobs to finish. To get better results, compile this program in release mode. |
This comment has been minimized.
This comment has been minimized.
budziq
Sep 24, 2017
Collaborator
- It would replace
"This example uses
threadpool,walkdirandringcrates to calculate ..."
with "Calculates ...". The reader already knows which crates are used form the heading.
- Actually this example recursively calculates SHA256 of every iso file present in "/home/user/Downloads"
So please update the code or the description ;)
-
typo - "reading and compute" => "reading and computing"
-
There are several identifiers mentioned
pool,entryetc. Please link these to docs
ericho
force-pushed the
ericho:threadpool
branch
3 times, most recently
from
b33ac39
to
9ea7434
Sep 30, 2017
ericho
force-pushed the
ericho:threadpool
branch
from
9ea7434
to
cdbb634
Sep 30, 2017
budziq
merged commit 859953c
into
rust-lang-nursery:master
Oct 2, 2017
This comment has been minimized.
This comment has been minimized.
|
Thanks @ericho great work! |
ericho commentedAug 31, 2017
No description provided.