Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I/O: Read::bytes() is very slow, even on a BufReader #69

Closed
Shnatsel opened this issue Oct 7, 2023 · 6 comments
Closed

I/O: Read::bytes() is very slow, even on a BufReader #69

Shnatsel opened this issue Oct 7, 2023 · 6 comments

Comments

@Shnatsel
Copy link
Contributor

Shnatsel commented Oct 7, 2023

The performance of Read::bytes() iterator is atrocious, even over a BufReader.

Here's a sample program calculating a sum of all bytes in the file with various approaches:

In this test Read::bytes() 10x slower than reading the whole file into memory up front and 20x slower than manually reading into a smallish buffer over and over. The performance gap is so large that even reading line-by-line with UTF-8 validation is still multiple times faster!

Benchmarked using hyperfine on this public domain book in plain text format repeated 50 times.

@nnethercote
Copy link
Owner

I found there's a bug in your third program, which causes it to give different answers to the first two programs for any file with more than 64 KiB chars. The problem is that on the final iteration chunk is only partially filled with new data, but the program sums all the elements within it, which includes some old ones from the second-last iteration.

Here's a version that uses take to fix the problem, and also avoids having two test_file.read() calls. It seems to give similar performance to the original version.

use std::io::Read;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let test_file_path = std::env::args_os().nth(1).expect("No input file specified!");
    
    let mut test_file = std::fs::File::open(test_file_path)?;

    // Instead of reading the whole file in memory, read it in small-ish chunks that fit into CPU
    // cache. This also dramatically reduces memory usage because we never keep more than 64 KiB in
    // memory.
    let mut chunk = vec![0; 65536];
    let mut sum: u64 = 0;
    loop {
        let bytes_read = test_file.read(&mut chunk)?;
        if bytes_read == 0 {
            break;
        }
        sum += chunk.iter().take(bytes_read).map(|i| *i as u64).sum::<u64>();
    }
    
    println!("{sum}"); 
    Ok(()) 
}

@Shnatsel
Copy link
Contributor Author

Good catch! Thanks!

There may be some bugs like these lurking still, I didn't do any in-depth testing yet. The fully manual line reader is definitely buggy, but probably not in ways that matter for performance.

I've adopted your version of the manual read loop, only changing chunk.iter().take(bytes_read) to chunk[..bytes_read].iter() so that it definitely doesn't introduce an extra branch for every element. It would be a pretty bad optimization gaffe if it ever did in this example, but for long iterator chains - who knows!

@nnethercote
Copy link
Owner

I filed rust-lang/rust#116651 which speeds up Read::bytes quite a bit, though it's still a lot slower than the alternatives. And I added some explanation of this to its docs.

@Shnatsel
Copy link
Contributor Author

Oh, that's amazing! I didn't expect my observations to lead to actual speedups!

I cannot wrap my head about why that transformation makes the code faster. How does reading into a struct member instead of a local variable speed things up?

@nnethercote
Copy link
Owner

nnethercote commented Oct 12, 2023

I cannot wrap my head about why that transformation makes the code faster. How does reading into a struct member instead of a local variable speed things up?

As per the commit log, there's no need to initialize it to zero on every iteration. But that was only a small improvement, inlining was most of the initial improvement. And I just added another improvement using read_exact. Now the Bytes::read version is only 2x and 4x slower than the other versions on my machine, down from 5x and 10x.

@nnethercote
Copy link
Owner

Now that rust-lang/rust#116775 and rust-lang/rust#116785 have landed, Read::bytes is competitive with alternative approaches, so I don't think we need to document anything in the perf book. Thanks for the suggestion, that was a productive exercise :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants