
Avoid blocking on CloseHandle #1850

Merged: merged 2 commits into rust-lang:master from new-tar-rs on May 18, 2019
Conversation

rbtcollins
Contributor

@rbtcollins rbtcollins commented May 12, 2019

On Windows closing a file involves CloseHandle, which can be quite
slow (6+ms) for various reasons, including circumstances outside our
control such as virus scanners, content indexing, device driver cache
write-back synchronisation and so forth. Rather than having a super
long list of all the things users need to do to optimise the performance
of CloseHandle, use a small threadpool to avoid blocking package
extraction when closing file handles.

This does run a risk of resource exhaustion, but as we only have 20k
files in the largest package today that should not be a problem in
practice.

https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-January/078404.html
provided inspiration for this.

My benchmark system went from 21/22s to 11s with this change, with either
4 or 8 threads (for rust-docs installation).

My Surface running the rustup release takes 44s for rust-docs, and 22s with this branch.
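
For illustration only, here is a minimal sketch of the idea (the names and exact structure are invented, not the code in this PR): written files are handed to a small pool of worker threads whose only job is to drop them, so CloseHandle runs off the extraction path.

```rust
use std::fs::File;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// A pool of worker threads whose only job is to drop `File`s, so the
/// potentially slow CloseHandle call happens off the extraction thread.
/// All names here are illustrative; this is not rustup's actual code.
struct CloseHandlePool {
    tx: Option<mpsc::Sender<File>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl CloseHandlePool {
    fn new(n_workers: usize) -> Self {
        let (tx, rx) = mpsc::channel::<File>();
        // mpsc::Receiver is not Clone, so the workers share it behind a
        // Mutex; the lock is only held while waiting to receive, and the
        // actual drop (CloseHandle) runs outside it, in parallel.
        let rx = Arc::new(Mutex::new(rx));
        let workers = (0..n_workers)
            .map(|_| {
                let rx = Arc::clone(&rx);
                thread::spawn(move || loop {
                    // recv() errors once the Sender is dropped: time to exit.
                    let file = match rx.lock().unwrap().recv() {
                        Ok(f) => f,
                        Err(_) => break,
                    };
                    drop(file); // CloseHandle runs here, off the hot path
                })
            })
            .collect();
        CloseHandlePool { tx: Some(tx), workers }
    }

    /// Queue a file for closing instead of blocking on it inline.
    fn defer_close(&self, file: File) {
        self.tx
            .as_ref()
            .expect("pool already shut down")
            .send(file)
            .expect("close worker died");
    }
}

impl Drop for CloseHandlePool {
    fn drop(&mut self) {
        self.tx.take(); // drop the Sender so workers drain and stop
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}
```

Extraction would then call pool.defer_close(file) after the final write to each file, instead of letting the File drop (and block) inline.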

@rbtcollins
Contributor Author

This is a draft at this point; it depends on tar accepting a change I have in preparation.

rbtcollins added a commit to rbtcollins/tar-rs that referenced this pull request May 12, 2019
This permits optimisation of unpacking on Windows - which is perhaps
too special-cased to include in tar itself? Doing this for rustup,
which already doesn't use the Archive wrapper :/.

With this patch, I reduced rustup's rust-docs install from 21/22s to
11s on my bench machine - details at
rust-lang/rustup#1850

I haven't filled in all the possibilities for the Unpacked enum, but
I'm happy to do so if you'd like; it seemed like it wouldn't
be useful at this stage, as there are no live objects to return.
@kinnison kinnison changed the title from Avoid blocking on CloseHandle to WIP: Avoid blocking on CloseHandle on May 12, 2019
@rbtcollins
Contributor Author

This may be enough to permit closing #1540

@kinnison
Contributor

For the purpose of seeing the CI complete, could you update your Cargo.toml to use a git repo instead of a path for the patch to tar-rs?

@lnicola
Member

lnicola commented May 13, 2019

(writing here to spam fewer people than in #1540)

@rbtcollins your PRs are awesome, thanks!

Do you think it's worth implementing the streaming unpack I mentioned in #1540 (comment)?

@rbtcollins
Contributor Author

(writing here to spam fewer people than in #1540)

@rbtcollins your PRs are awesome, thanks!

Thank you :).

Do you think it's worth implementing the streaming unpack I mentioned in #1540 (comment)?

I think it's worth doing the experiment. It may be a bit awkward, as the downloader is in futures and tar-rs is sync code, but perhaps just slamming them together as a learning test would answer the question about impact, and we can make it nice if it has a big benefit.

From what you've said, though, you have a slow package server / a slow link to the package server, and you want to do the unpacking while the download progresses, because the download is so slow that you can mask all the local IO overheads during the download period.

I think that will definitely work for you; I don't think it will be a negative for anyone, though it will have an impact on folk who depend on the partial-download recovery/retry mechanism today unless care is taken to preserve that.

We don't have any stats on how many folk have download times that are roughly the same as or greater than the unpacking time (and who would thus benefit, as you do, from making the unpacking concurrent).

@lnicola
Member

lnicola commented May 13, 2019

From what you've said, though, you have a slow package server / a slow link to the package server, and you want to do the unpacking while the download progresses, because the download is so slow that you can mask all the local IO overheads during the download period.

Even if they're the same speed, downloading and extracting files in parallel could almost halve the installation time.

It may be a bit awkward, as the downloader is in futures and tar-rs is sync code

That impedance mismatch can be solved by using https://docs.rs/futures/0.1.27/futures/stream/struct.Wait.html, which adapts a Stream into an Iterator. But you're right, retries might throw a wrench in this (though I'm not sure how they work in rustup). EDIT: unfortunately, an Iterator is not Read, so the archive would apparently need to be buffered somewhere. Damn. Into some sort of VecDeque, perhaps?
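
For illustration, a minimal sketch of such an adapter (names are invented, and it assumes the chunks arrive as plain Vec<u8>s with errors already handled upstream):

```rust
use std::collections::VecDeque;
use std::io::{self, Read};

/// Adapts an iterator of byte chunks (e.g. the output of futures 0.1
/// `Stream::wait`, errors already unwrapped) into a `Read`, buffering
/// leftover bytes in a `VecDeque` as suggested above. Sketch only.
struct ChunkReader<I: Iterator<Item = Vec<u8>>> {
    chunks: I,
    buf: VecDeque<u8>,
}

impl<I: Iterator<Item = Vec<u8>>> ChunkReader<I> {
    fn new(chunks: I) -> Self {
        ChunkReader { chunks, buf: VecDeque::new() }
    }
}

impl<I: Iterator<Item = Vec<u8>>> Read for ChunkReader<I> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        // Refill from the iterator until we have bytes or the stream ends.
        while self.buf.is_empty() {
            match self.chunks.next() {
                Some(chunk) => self.buf.extend(chunk),
                None => return Ok(0), // end of stream
            }
        }
        let n = out.len().min(self.buf.len());
        for (dst, byte) in out.iter_mut().zip(self.buf.drain(..n)) {
            *dst = byte;
        }
        Ok(n)
    }
}
```

In principle tar's Archive could then read straight from such an adapter while the download is still in flight.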

@rbtcollins
Contributor Author

From what you've said, though, you have a slow package server / a slow link to the package server, and you want to do the unpacking while the download progresses, because the download is so slow that you can mask all the local IO overheads during the download period.

Even if they're the same speed, downloading and extracting files in parallel could almost halve the installation time.

Maybe! Lets find out.

It may be a bit awkward, as the downloader is in futures and tar-rs is sync code

That impedance mismatch can be solved by using https://docs.rs/futures/0.1.27/futures/stream/struct.Wait.html, which adapts a Stream into an Iterator. But you're right, retries might throw a wrench in this (though I'm not sure how they work in rustup). EDIT: unfortunately, an Iterator is not Read, so the archive would apparently need to be buffered somewhere. Damn. Into some sort of VecDeque, perhaps?

I don't think the in-process retry patch has merged yet, so you can ignore that. However, partial downloads are saved to a .partial file, which can be resumed by running rustup again if the download fails. I think that is a useful capability, so a mergeable version of what you want to achieve would still want to stream the archive to disk.

In terms of joining the two bits together, a few possibilities:

  • use a socketpair/pipe: write received bytes to both disk and the socketpair, and have the extraction code read from the socketpair/pipe
  • implement Read on a struct which can read from the file that's been written to disk, knows the full and received lengths from the download tracker metadata, and blocks as needed to permit more data to arrive (rough sketch below). It would need a sync mechanism for error propagation/notification, but that's not very hard.
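
A rough sketch of that second option (every name here is hypothetical and error propagation is elided; the download side would update the shared state and call notify_all after each flush to disk):

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::sync::{Arc, Condvar, Mutex};

/// Progress shared between the download side and the unpacking side.
struct TailState {
    written: u64,       // bytes flushed to the .partial file so far
    total: Option<u64>, // set once the download completes
}

/// Reads from the partially written file, blocking until the downloader
/// signals that more bytes (or the final length) have arrived.
struct TailReader {
    file: File,
    pos: u64,
    state: Arc<(Mutex<TailState>, Condvar)>,
}

impl Read for TailReader {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        let (lock, cvar) = &*self.state;
        let mut st = lock.lock().unwrap();
        // Block until bytes beyond our position exist, or we hit the end.
        while st.written <= self.pos {
            if st.total == Some(self.pos) {
                return Ok(0); // download finished and fully consumed
            }
            st = cvar.wait(st).unwrap();
        }
        let avail = ((st.written - self.pos) as usize).min(out.len());
        drop(st); // don't hold the lock while doing file IO
        self.file.seek(SeekFrom::Start(self.pos))?;
        let n = self.file.read(&mut out[..avail])?;
        self.pos += n as u64;
        Ok(n)
    }
}
```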

Back on the design though, one more consideration is validation - right now we make some assumptions about the safety of the archive we unpack based on validating its hash - and, in future, its GPG signature; mirrors of the packages may make trusting partly downloaded content more complex.

rbtcollins added a commit to rbtcollins/tar-rs that referenced this pull request May 13, 2019
rbtcollins added a commit to rbtcollins/tar-rs that referenced this pull request May 16, 2019
rbtcollins added a commit to rbtcollins/tar-rs that referenced this pull request May 16, 2019
rbtcollins added a commit to rbtcollins/tar-rs that referenced this pull request May 17, 2019
@rbtcollins rbtcollins force-pushed the new-tar-rs branch 2 times, most recently from eecc128 to 8accbf1, on May 18, 2019 01:00
@rbtcollins
Contributor Author

I'm working on sane reporting of the time taken after the tar is fully read; this is just a teaser to show that it works :P

info: downloading component 'rustc'
 60.5 MiB /  60.5 MiB (100 %)  42.0 MiB/s in  1s ETA:  0s
info: downloading component 'rust-std'
 55.3 MiB /  55.3 MiB (100 %)  13.4 MiB/s in  5s ETA:  0s
info: downloading component 'cargo'
  3.0 MiB /   3.0 MiB (100 %)   1.8 MiB/s in  1s ETA:  0s
info: downloading component 'rust-docs'
info: installing component 'rustc'
 60.5 MiB /  60.5 MiB (100 %)  13.9 MiB/s in  4s ETA:  0s
pending close of 7 files
info: installing component 'rust-std'
 55.3 MiB /  55.3 MiB (100 %)  14.4 MiB/s in  3s ETA:  0s
pending close of 3 files
info: installing component 'cargo'
pending close of 7 files
info: installing component 'rust-docs'
 11.1 MiB /  11.1 MiB (100 %) 718.4 KiB/s in 11s ETA:  0s
pending close of 15840 files

  nightly-x86_64-pc-windows-msvc installed - rustc 1.36.0-nightly (73a3a90d2 2019-05-17)

info: checking for self-updates

@rbtcollins
Contributor Author

Nearly there:

info: installing component 'rust-docs'
 11.1 MiB /  11.1 MiB (100 %) 540.8 KiB/s in 12s ETA:  0s
Closing 16009 deferred file handles
 15.6 KiB /  15.6 KiB (100 %) 373 B/s in 40s ETA:  0s

(that's with 4 threads, defender on).

4 threads, defender off gets this output:

info: installing component 'rust-docs'
 11.1 MiB /  11.1 MiB (100 %) 436.8 KiB/s in 12s ETA:  0s

64 threads, defender off:

info: installing component 'rust-docs'
 11.1 MiB /  11.1 MiB (100 %) 387.2 KiB/s in 12s ETA:  0s

64 threads, defender on:

info: installing component 'rust-docs'
 11.1 MiB /  11.1 MiB (100 %) 288.0 KiB/s in 17s ETA:  0s

-> so running with threads==core-count is much better.
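
For example, the pool size could be derived from the core count; a sketch, assuming the num_cpus crate (whether this PR actually uses it is not shown here):

```rust
// Sketch only: pick the worker count for the deferred-close pool to
// match the benchmark result above (threads == core count wins).
// `num_cpus` is an external crate; its use here is an assumption.
fn close_pool_size() -> usize {
    num_cpus::get()
}
```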

Files are in bytes, but other things we may track are not.
@rbtcollins
Contributor Author

rbtcollins commented May 18, 2019

Final version, looks like this on a constrained machine (e.g. a surface):

Closing 16017 deferred file handles
 15.6 Kihandles /  15.6 Kihandles (100 %) 354 handles/s in 41s ETA:  0s

@rbtcollins rbtcollins changed the title from WIP: Avoid blocking on CloseHandle to Avoid blocking on CloseHandle on May 18, 2019
Contributor

@kinnison kinnison left a comment


There are a couple of typos, but honestly I don't want to hold things up for those and we can fix them later if we re-touch the code. This looks excellent. Thank you so much for working through all of this with Alex.

@kinnison kinnison merged commit 5580890 into rust-lang:master May 18, 2019
@rbtcollins rbtcollins deleted the new-tar-rs branch May 18, 2019 18:49
@rbtcollins
Contributor Author

Thank you; please let me know on Discord about any follow-up typo fixes etc. you'd like me to do.
