
Avoid blocking on CloseHandle #1850

Merged
merged 2 commits into from May 18, 2019

Conversation

@rbtcollins
Collaborator

commented May 12, 2019

On Windows closing a file involves CloseHandle, which can be quite
slow (6+ms) for various reasons, including circumstances outside our
control such as virus scanners, content indexing, device driver cache
write-back synchronisation and so forth. Rather than having a super
long list of all the things users need to do to optimise the performance
of CloseHandle, use a small threadpool to avoid blocking package
extraction when closing file handles.

This does run a risk of resource exhaustion, but as we only have 20k
files in the largest package today that should not be a problem in
practice.

https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-January/078404.html
provided inspiration for this.

My benchmark system went from 21-22s to 11s with this change, with both
4 and 8 threads (for rust-docs installation).

My Surface running the released rustup takes 44s for rust-docs, and 22s with this branch.
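A minimal sketch of the deferred-close idea described above (this is illustrative, not the actual rustup code; `spawn_closers` and the channel layout are invented for the example): worker threads drain a channel of open `File`s and drop them, so the potentially slow CloseHandle runs off the extraction thread.

```rust
use std::fs::File;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Spawn `n` closer threads that drain a channel of open `File`s and drop
/// (close) them, keeping CloseHandle latency off the extraction hot path.
fn spawn_closers(n: usize) -> (mpsc::Sender<File>, Vec<thread::JoinHandle<()>>) {
    let (tx, rx) = mpsc::channel::<File>();
    // mpsc::Receiver is not Clone, so share it among workers behind a Mutex.
    let rx = Arc::new(Mutex::new(rx));
    let workers = (0..n)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                // Take one file at a time; the guard is dropped before the
                // close itself, so closes can proceed concurrently.
                let next = rx.lock().unwrap().recv();
                match next {
                    Ok(file) => drop(file), // the potentially slow close
                    Err(_) => break,        // sender gone: queue fully drained
                }
            })
        })
        .collect();
    (tx, workers)
}
```

Dropping the sender signals shutdown: each worker's `recv` returns `Err` once the queue is empty, so joining the handles guarantees every file has been closed.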

@rbtcollins

Collaborator Author

commented May 12, 2019

This is a draft at this point - it depends on tar accepting a change I have in preparation

@rbtcollins rbtcollins force-pushed the rbtcollins:new-tar-rs branch from 0141020 to e12ced6 May 12, 2019

rbtcollins added a commit to rbtcollins/tar-rs that referenced this pull request May 12, 2019

Return the File instance from Entry::unpack
This permits optimisation of unpacking on Windows - which is perhaps
overly special cased to include in tar itself? Doing this for rustup
which already doesn't use the Archive wrapper :/.

With this patch, I reduced rustup's rust-docs install from 21/22s to
11s on my bench machine - details at
rust-lang/rustup.rs#1850

I haven't filled in all the possibilities for the Unpacked enum, but
happy to do so if you'd like that done - it seemed like it wouldn't
be useful at this stage as there are no live objects to return.

@kinnison kinnison changed the title Avoid blocking on CloseHandle WIP: Avoid blocking on CloseHandle May 12, 2019

@rbtcollins

Collaborator Author

commented May 12, 2019

This may be enough to permit closing #1540

@kinnison

Collaborator

commented May 13, 2019

For the purpose of seeing the CI complete, could you update your Cargo.toml to use a git repo instead of a path for the patch to tar-rs?

@lnicola


commented May 13, 2019

(writing here to spam fewer people than in #1540)

@rbtcollins your PRs are awesome, thanks!

Do you think it's worth implementing the streaming unpack I mentioned in #1540 (comment)?

@rbtcollins

Collaborator Author

commented May 13, 2019

(writing here to spam fewer people than in #1540)

@rbtcollins your PRs are awesome, thanks!

Thank you :).

Do you think it's worth implementing the streaming unpack I mentioned in #1540 (comment)?

I think it's worth doing the experiment. It may be a bit awkward, as the downloader is in futures and tar-rs is sync code, but perhaps just slamming them together as a learning test would answer the question about impact, and we can make it nice if it has a big benefit.

From what you've said though, you have a slow package server // slow link to the package server, and you want to do the unpacking while the download progresses, because the download is so slow that you can mask all the local IO overheads during the download period.

I think that will definitely work for you; I don't think it will be a negative for anyone, though it will have an impact on folks who depend on the partial download recovery/retry mechanism today unless care is taken to preserve that.

We don't have any stats on how many folks have download times that are roughly the same as or greater than the unpacking time (and who would thus benefit, as you do, from making the unpacking concurrent).

@lnicola


commented May 13, 2019

From what you've said though, you have a slow package server // slow link to the package server, and you want to do the unpacking while the download progresses, because the download is so slow that you can mask all the local IO overheads during the download period.

Even if they're the same speed, downloading and extracting files in parallel could almost halve the installation time.

It may be a bit awkward as the downloader is in futures and tar-rs is sync code

That impedance mismatch can be solved by using https://docs.rs/futures/0.1.27/futures/stream/struct.Wait.html, which adapts a Stream into an Iterator. But you're right, retries might throw a wrench in this (though I'm not sure how they work in rustup). EDIT: unfortunately, an Iterator is not Read, so the archive would apparently need to be buffered somewhere. Damn. Into some sort of VecDeque, perhaps?
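For illustration, the VecDeque buffering idea could look something like this (a sketch, not anything in rustup or tar-rs; `ChunkReader` is a hypothetical name, and the chunk iterator stands in for what `Stream::wait` would yield, simplified here to infallible `Vec<u8>` items):

```rust
use std::collections::VecDeque;
use std::io::Read;

/// Hypothetical adapter: wraps an iterator of byte chunks plus a VecDeque
/// buffer so that sync code like tar-rs can consume it through `Read`.
struct ChunkReader<I: Iterator<Item = Vec<u8>>> {
    chunks: I,
    buf: VecDeque<u8>,
}

impl<I: Iterator<Item = Vec<u8>>> ChunkReader<I> {
    fn new(chunks: I) -> Self {
        ChunkReader { chunks, buf: VecDeque::new() }
    }
}

impl<I: Iterator<Item = Vec<u8>>> Read for ChunkReader<I> {
    fn read(&mut self, out: &mut [u8]) -> std::io::Result<usize> {
        // Refill the buffer from the chunk iterator until we have data
        // (skipping empty chunks) or the stream is exhausted.
        while self.buf.is_empty() {
            match self.chunks.next() {
                Some(chunk) => self.buf.extend(chunk),
                None => return Ok(0), // EOF
            }
        }
        let n = out.len().min(self.buf.len());
        for (i, b) in self.buf.drain(..n).enumerate() {
            out[i] = b;
        }
        Ok(n)
    }
}
```

A real version would carry the stream's error type through to `io::Error` rather than assuming infallible chunks.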

@rbtcollins

Collaborator Author

commented May 13, 2019

From what you've said though, you have a slow package server // slow link to the package server, and you want to do the unpacking while the download progresses, because the download is so slow that you can mask all the local IO overheads during the download period.

Even if they're the same speed, downloading and extracting files in parallel could almost halve the installation time.

Maybe! Let's find out.

It may be a bit awkward as the downloader is in futures and tar-rs is sync code

That impedance mismatch can be solved by using https://docs.rs/futures/0.1.27/futures/stream/struct.Wait.html, which adapts a Stream into an Iterator. But you're right, retries might throw a wrench in this (though I'm not sure how they work in rustup). EDIT: unfortunately, an Iterator is not Read, so the archive would apparently need to be buffered somewhere. Damn. Into some sort of VecDeque, perhaps?

I don't think the in-process retry patch has merged yet, so you can ignore that. However, partial downloads are saved to a .partial file, and that can be resumed if the download fails by running rustup again. I think that is a useful capability, so a mergeable version of what you want to achieve would still want to stream the archive to disk.

In terms of joining the two bits together, a few possibilities:

  • use a socketpair/pipe - write received bytes to both disk and the socketpair, and have the extraction code read from the other end
  • implement Read on a struct which can read from the file that's been written to disk, knows the full and received lengths from the download tracker metadata, and blocks as needed to permit more data to arrive. It would need a sync mechanism for error propagation/notification, but that's not very hard.
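The second option might be sketched roughly like this (all names hypothetical, error handling simplified): a reader over the partially-written file that blocks on a condvar until the downloader reports more bytes written, and surfaces a downloader failure flag as an IO error.

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical "tail reader" over a file still being downloaded.
/// `progress` holds (bytes_written, failed) updated by the downloader;
/// `total` is the expected length from the download tracker metadata.
struct TailReader {
    file: File,
    pos: u64,
    total: u64,
    progress: Arc<(Mutex<(u64, bool)>, Condvar)>,
}

impl Read for TailReader {
    fn read(&mut self, out: &mut [u8]) -> std::io::Result<usize> {
        if self.pos == self.total {
            return Ok(0); // the whole archive has been consumed
        }
        let (lock, cvar) = &*self.progress;
        let mut state = lock.lock().unwrap();
        // Block until the downloader has written past our position or failed.
        while state.0 <= self.pos && !state.1 {
            state = cvar.wait(state).unwrap();
        }
        if state.1 {
            // This is the error-propagation mechanism mentioned above.
            return Err(std::io::Error::new(
                std::io::ErrorKind::Other,
                "download failed",
            ));
        }
        let avail = (state.0 - self.pos).min(out.len() as u64) as usize;
        drop(state); // don't hold the lock during file IO
        self.file.seek(SeekFrom::Start(self.pos))?;
        let n = self.file.read(&mut out[..avail])?;
        self.pos += n as u64;
        Ok(n)
    }
}
```

The downloader side would update `bytes_written` and call `notify_all` after each chunk lands on disk, preserving the on-disk .partial file for resume.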

Back on the design though, one more consideration is validation - right now we make some assumptions about the safety of the archive we unpack based on validating its hash - and, in future, a GPG signature; mirrors of the packages may make trusting partly downloaded content more complex.

rbtcollins added further commits to rbtcollins/tar-rs that referenced this pull request on May 13, May 16, and May 17, 2019, each with the same "Return the File instance from Entry::unpack" message as above.

@rbtcollins rbtcollins force-pushed the rbtcollins:new-tar-rs branch 2 times, most recently from eecc128 to 8accbf1 May 17, 2019

@rbtcollins

Collaborator Author

commented May 18, 2019

I'm working on sane reporting of the time taken after the tar is fully read; this is just a teaser to show that it works :P

info: downloading component 'rustc'
 60.5 MiB /  60.5 MiB (100 %)  42.0 MiB/s in  1s ETA:  0s
info: downloading component 'rust-std'
 55.3 MiB /  55.3 MiB (100 %)  13.4 MiB/s in  5s ETA:  0s
info: downloading component 'cargo'
  3.0 MiB /   3.0 MiB (100 %)   1.8 MiB/s in  1s ETA:  0s
info: downloading component 'rust-docs'
info: installing component 'rustc'
 60.5 MiB /  60.5 MiB (100 %)  13.9 MiB/s in  4s ETA:  0s
pending close of 7 files
info: installing component 'rust-std'
 55.3 MiB /  55.3 MiB (100 %)  14.4 MiB/s in  3s ETA:  0s
pending close of 3 files
info: installing component 'cargo'
pending close of 7 files
info: installing component 'rust-docs'
 11.1 MiB /  11.1 MiB (100 %) 718.4 KiB/s in 11s ETA:  0s
pending close of 15840 files

  nightly-x86_64-pc-windows-msvc installed - rustc 1.36.0-nightly (73a3a90d2 2019-05-17)

info: checking for self-updates

@rbtcollins rbtcollins force-pushed the rbtcollins:new-tar-rs branch from 8accbf1 to 0bb14ce May 18, 2019

@rbtcollins

Collaborator Author

commented May 18, 2019

Nearly there:

info: installing component 'rust-docs'
 11.1 MiB /  11.1 MiB (100 %) 540.8 KiB/s in 12s ETA:  0s
Closing 16009 deferred file handles
 15.6 KiB /  15.6 KiB (100 %) 373 B/s in 40s ETA:  0s

(that's with 4 threads, Defender on).

4 threads, Defender off gets this output:

info: installing component 'rust-docs'
 11.1 MiB /  11.1 MiB (100 %) 436.8 KiB/s in 12s ETA:  0s

64 threads, Defender off:

info: installing component 'rust-docs'
 11.1 MiB /  11.1 MiB (100 %) 387.2 KiB/s in 12s ETA:  0s

64 threads, Defender on:

info: installing component 'rust-docs'
 11.1 MiB /  11.1 MiB (100 %) 288.0 KiB/s in 17s ETA:  0s

So running with threads == core count is much better.
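As an aside, a pool sized to the core count can be derived at runtime. This sketch uses `std::thread::available_parallelism`, the modern (Rust 1.59+) standard-library API, so it illustrates the idea rather than what the 2019 code did; the function name and fallback value are invented here.

```rust
use std::thread;

/// Size the closer pool from the machine's core count, matching the
/// benchmark result that threads == core count performs best.
/// Falls back to a small fixed pool if the count cannot be queried.
fn closer_pool_size() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(4)
}
```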

rbtcollins added some commits May 18, 2019

Customisable tracker units
Files are in bytes, but other things we may track are not.
Avoid blocking on CloseHandle
On Windows closing a file involves CloseHandle, which can be quite
slow (6+ms) for various reasons, including circumstances outside our
control such as virus scanners, content indexing, device driver cache
write-back synchronisation and so forth. Rather than having a super
long list of all the things users need to do to optimise the performance
of CloseHandle, use a small threadpool to avoid blocking package
extraction when closing file handles.

This does run a risk of resource exhaustion, but as we only have 20k
files in the largest package today that should not be a problem in
practice.

https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-January/078404.html
provided inspiration for this.

My benchmark system went from 21/22s to 11s with this change with both
4 or 8 threads.
@rbtcollins

Collaborator Author

commented May 18, 2019

The final version looks like this on a constrained machine (e.g. a Surface):

Closing 16017 deferred file handles
 15.6 Kihandles /  15.6 Kihandles (100 %) 354 handles/s in 41s ETA:  0s

@rbtcollins rbtcollins force-pushed the rbtcollins:new-tar-rs branch from 0bb14ce to ff0dc0d May 18, 2019

@rbtcollins rbtcollins changed the title WIP: Avoid blocking on CloseHandle Avoid blocking on CloseHandle May 18, 2019

@kinnison
Collaborator

left a comment

There are a couple of typos, but honestly I don't want to hold things up for those and we can fix them later if we re-touch the code. This looks excellent. Thank you so much for working through all of this with Alex.

@kinnison kinnison merged commit 5580890 into rust-lang:master May 18, 2019

2 checks passed

Travis CI - Pull Request Build Passed
continuous-integration/appveyor/pr AppVeyor build succeeded

@rbtcollins rbtcollins deleted the rbtcollins:new-tar-rs branch May 18, 2019

@rbtcollins

Collaborator Author

commented May 18, 2019

Thank you; please let me know on Discord any follow-up typo fixes etc. you'd like me to do.
