
Question: read_to_end suffers poor performance when reading huge object #2658

Closed
YuJuncen opened this issue Oct 25, 2022 · 2 comments
YuJuncen commented Oct 25, 2022

background

Hi,
I'm trying to read a blob file from my disk, like:

use futures::io::AsyncReadExt;
use tokio::fs::File;
use tokio_util::compat::TokioAsyncReadCompatExt;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // .compat() adapts the tokio File into a futures::io::AsyncRead.
    let mut f = File::open("128M.blob").await?.compat();
    let mut data = Vec::with_capacity(128 * 1024 * 1024);
    AsyncReadExt::read_to_end(&mut f, &mut data).await?;
    Ok(())
}

On my machine, this code snippet took about 2 minutes to return.

> time ./read_huge_file

________________________________________________________
Executed in  130.45 secs    fish           external
   usr time  129.77 secs  670.00 micros  129.77 secs
   sys time    1.05 secs  539.00 micros    1.05 secs

investigation

I found this a little strange, so I did some CPU profiling and found that most of the CPU time is spent filling the buffer with zeros.

[CPU profile screenshot]

I have read the code of read_to_end_internal, and it seems that this function fills the unused capacity of the vector with zeros:

let mut g = Guard { len: buf.len(), buf };
loop {
    if g.len == g.buf.len() {
        unsafe {
            g.buf.reserve(32);
            let capacity = g.buf.capacity();
            g.buf.set_len(capacity);
            super::initialize(&rd, &mut g.buf[g.len..]);
        }
    }

I guess the problem is this: once the inner reader does not completely fill the remaining buffer and then returns Pending, the whole remaining capacity has to be zeroed again on the next poll.

With perf trace, we can see that tokio's File implementation reads a 16 KiB chunk per read syscall and fires a new system call on the next poll, returning Pending in the meantime; (I guess) it looks like:

File::poll() [emitting result from last task] (returns) Ready(16384)
File::poll() [spawn the read task] (returns) Pending
File::poll() [emitting result from last task] (returns) Ready(16384)
File::poll() [spawn the read task] (returns) Pending
...

Because the buffer only fills up slowly, every time the ReadToEnd future is re-polled we zero the remaining part of the buffer again, perhaps like:

ReadToEnd::poll() (calls) mem::fill(buf[0..128M]) (returns) Pending
ReadToEnd::poll() [runtime re-polls the future] (calls) mem::fill(buf[16384..128M]) (returns) Pending
ReadToEnd::poll() [runtime re-polls the future] (calls) mem::fill(buf[32768..128M]) (returns) Pending
...
ReadToEnd::poll() [runtime re-polls the future] (calls) mem::fill(buf[128M-16384..128M]) (returns) Ready (128M)

As a result, we end up zeroing O(m * n) bytes, where n is the capacity and m is the number of times the inner reader returns Pending, which probably results in poor performance when reading huge files.
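
To make that concrete, here is a back-of-the-envelope sketch (my own numbers, assuming the 128 MiB capacity and ~16 KiB per Ready observed above; not library code):

fn main() {
    let capacity: u64 = 128 * 1024 * 1024; // n: reserved capacity in bytes
    let chunk: u64 = 16 * 1024;            // bytes delivered per Ready
    let polls = capacity / chunk;          // m: number of re-polls, 8192 here
    // Every re-poll zeroes the not-yet-filled tail: capacity - i * chunk bytes.
    let zeroed: u64 = (0..polls).map(|i| capacity - i * chunk).sum();
    println!("total bytes zeroed ≈ {} GiB", zeroed / (1024 * 1024 * 1024));
}

That is roughly m * n / 2 ≈ 512 GiB of memset for a single 128 MiB file, which would plausibly account for the ~130 seconds of user CPU time above.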

my questions

I'm wondering what the purpose of initializing the remaining capacity is.

I guess if the purpose is to avoid accessing uninitialized memory, perhaps we could record an initialized watermark in the ReadToEnd future and zero memory from there (instead of from buf.len()), which would relieve this problem?
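
For illustration, a minimal sketch of what I mean (hypothetical names and structure, not the actual futures-rs code):

use std::ptr;

// Hypothetical sketch: the future remembers how far the spare capacity has
// already been zeroed, so each re-poll only touches bytes above that watermark.
struct InitTracker {
    initialized: usize, // buf[..initialized] is known to be initialized
}

impl InitTracker {
    fn new(buf: &Vec<u8>) -> Self {
        // Everything up to the current length already holds valid data.
        InitTracker { initialized: buf.len() }
    }

    fn zero_spare(&mut self, buf: &mut Vec<u8>) {
        let cap = buf.capacity();
        if self.initialized < cap {
            unsafe {
                // Zero only the never-touched tail of the allocation.
                ptr::write_bytes(buf.as_mut_ptr().add(self.initialized), 0, cap - self.initialized);
            }
            self.initialized = cap;
        }
        // The whole capacity is initialized now, so exposing it is sound.
        unsafe { buf.set_len(cap) };
    }
}

With something like this, the total zeroing cost becomes O(n) instead of O(m * n).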

Any suggestions about that?

YuJuncen commented Oct 31, 2022

Well, I have now noticed the comment above that function... So I guess this is a known issue (or just by design)? If we are not going to do anything about it, I'm going to close this.

YuJuncen commented

I have made a simple workaround; for people who run into this problem, something like this may work (using futures::io::copy):

use futures::io::{self as async_io, AsyncRead};
use std::io::Cursor;

async fn read_to_end<R: AsyncRead>(r: R, v: &mut Vec<u8>) -> std::io::Result<u64> {
    // Cursor<&mut Vec<u8>> implements AsyncWrite, so copy never zero-fills v's spare capacity.
    let mut c = Cursor::new(v);
    async_io::copy(r, &mut c).await
}
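
For completeness, it can be called roughly like this, reusing the compat wrapper from the reproducer above (the file name is just my local test blob):

use tokio_util::compat::TokioAsyncReadCompatExt;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let f = tokio::fs::File::open("128M.blob").await?.compat();
    let mut data = Vec::with_capacity(128 * 1024 * 1024);
    // copy writes through the Cursor and never zero-fills the Vec's spare
    // capacity, so the quadratic memset from read_to_end goes away.
    read_to_end(f, &mut data).await?;
    Ok(())
}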
