Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Mmap to open the rmeta file. #55556

Merged
merged 1 commit into from
Nov 15, 2018
Merged

Conversation

nnethercote
Copy link
Contributor

Because those files are quite large, contribute significantly to peak
memory usage, but only a small fraction of the data is ever read.

r? @eddyb

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 31, 2018
@nnethercote
Copy link
Contributor Author

Here's what DHAT said about this, for tokio-webpush-simple, which was the most extreme case among the benchmarks:

  │     Total:     17,813,640 bytes (8.23%) in 66 blocks (0.01%), avg size 269,903.64 bytes, avg lifetime 1,210,149,468.11 instrs (75.11% of program duration)
  │     At t-gmax: 12,861,916 bytes (30.3%) in 58 blocks (0.05%), avg size 221,757.17 bytes
  │     At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
  │     Reads:     4,774,894 bytes (1.11%), avg 0.27 per byte
  │     Writes:    17,813,574 bytes (6.87%), avg 1 per byte
  │     Location: {
  │       #1: 0x6E271D4: alloc (alloc.rs:78)
  │       #2: 0x6E271D4: alloc (alloc.rs:154)
  │       #3: 0x6E271D4: allocate_in<u8,alloc::alloc::Global> (raw_vec.rs:106)
  │       #4: 0x6E271D4: with_capacity<u8> (raw_vec.rs:150)
  │       #5: 0x6E271D4: with_capacity<u8> (vec.rs:368)
  │       #6: 0x6E271D4: read<&std::path::Path> (fs.rs:267)
  │       #7: 0x6E271D4: get_metadata_section_imp (locator.rs:895)
  │       #8: 0x6E271D4: rustc_metadata::locator::get_metadata_section (locator.rs:852)
  │       #9: 0x6E257ED: rustc_metadata::locator::Context::extract_one (locator.rs:615)
  │     }

In other words, the allocations for these files accounted for 8.23% of all heap-allocated bytes, and 30.3% of the peak heap memory usage, but on average each byte was only read 0.27 times (i.e. most of the data was never even read).

I did a local benchmarking run and the results weren't as good as the above profiling suggested they would be. faults improved the most:

tokio-webpush-simple-check
        avg: -18.0%     min: -24.0%     max: -14.5%
sentry-cli-check
        avg: -13.0%     min: -16.4%     max: -10.6%
ripgrep-check
        avg: -10.7%     min: -13.3%     max: -8.1%
crates.io-check
        avg: -8.3%      min: -10.7%     max: -6.7%
encoding-check
        avg: -8.0%      min: -9.9%      max: -6.5%
regression-31157-check
        avg: -8.4%      min: -9.8%      max: -7.0%
webrender-check
        avg: -7.1%      min: -9.0%      max: -5.4%
piston-image-check
        avg: -3.8%      min: -5.9%      max: 0.9%
cargo-check
        avg: -4.2%      min: -5.3%      max: -3.3%
issue-46449-check
        avg: -3.3%      min: -4.4%      max: -2.9%
clap-rs-check
        avg: 0.1%       min: -3.7%      max: 3.0%
html5ever-check
        avg: -3.3%      min: -3.7%      max: -3.0%
regex-check
        avg: -2.4%      min: -3.0%      max: -2.0%

But max-rss didn't change that much, surprisingly:

sentry-cli-check
        avg: -4.1%      min: -5.0%      max: -3.6%
crates.io-check
        avg: -3.5%      min: -4.5%      max: -2.9%
issue-46449-check
        avg: 1.4%       min: 0.6%       max: 3.1%
deeply-nested-check
        avg: 1.9%       min: -0.2%      max: 3.1%
ripgrep-check
        avg: -2.3%      min: -3.1%      max: -1.8%
tokio-webpush-simple-check
        avg: -2.0%      min: -3.0%      max: -1.3%
webrender-check
        avg: -1.7%      min: -2.3%      max: -1.2%
ucd-check
        avg: 0.6%       min: 0.1%       max: 1.9%
helloworld-check
        avg: 1.4%       min: 1.2%       max: 1.9%
encoding-check
        avg: -1.4%      min: -1.9%      max: -1.0%
futures-check
        avg: 0.9%       min: 0.6%       max: 1.8%
html5ever-check
        avg: -1.2%      min: -1.6%      max: -1.0%
syn-check
        avg: 0.8%       min: 0.4%       max: 1.6%
cargo-check
        avg: -1.1%      min: -1.5%      max: -0.8%

cycles had some moderate improvements:

        avg: -3.6%      min: -5.3%      max: -1.5%
issue-46449-check
        avg: -1.9%      min: -4.9%      max: -0.3%
deeply-nested-check
        avg: -2.4%      min: -3.6%      max: -1.9%
ctfe-stress-check
        avg: 0.2%?      min: -2.3%?     max: 3.1%?
crates.io-check
        avg: -1.5%      min: -2.9%      max: -0.4%
unify-linearly-check
        avg: -1.5%      min: -2.5%      max: -0.9%
regex-check
        avg: -1.1%      min: -2.4%      max: -0.1%
regression-31157-check
        avg: -1.1%      min: -2.2%      max: -0.3%
encoding-check
        avg: -1.2%      min: -1.9%      max: 0.1%
inflate-check
        avg: -0.2%      min: -1.7%      max: 1.3%
tuple-stress-check
        avg: 0.4%       min: -1.0%      max: 1.7%
coercions-check
        avg: -0.5%?     min: -1.6%?     max: 1.0%?
piston-image-check
        avg: -0.9%      min: -1.5%      max: -0.3%
deep-vector-check
        avg: -0.7%      min: -1.5%      max: 0.2%

cpu-clock, instructions:u, wall-time, and task-clock all didn't change much.

A bit disappointing overall, but still certainly worth landing, IMO.

@nnethercote
Copy link
Contributor Author

@bors try

@bors
Copy link
Contributor

bors commented Oct 31, 2018

🔒 Merge conflict

This pull request and the master branch diverged in a way that cannot be automatically merged. Please rebase on top of the latest master branch, and let the reviewer approve again.

How do I rebase?

Assuming self is your fork and upstream is this repository, you can resolve the conflict following these steps:

  1. git checkout MmapMeta (switch to your branch)
  2. git fetch upstream master (retrieve the latest master)
  3. git rebase upstream/master -p (rebase on top of it)
  4. Follow the on-screen instruction to resolve conflicts (check git status if you got lost).
  5. git push self MmapMeta --force-with-lease (update this PR)

You may also read Git Rebasing to Resolve Conflicts by Drew Blessing for a short tutorial.

Please avoid the "Resolve conflicts" button on GitHub. It uses git merge instead of git rebase which makes the PR commit history more difficult to read.

Sometimes step 4 will complete without asking for resolution. This is usually due to difference between how Cargo.lock conflict is handled during merge and rebase. This is normal, and you should still perform step 5 to update this PR.

Error message
warning: Cannot merge binary files: src/Cargo.lock (HEAD vs. heads/homu-tmp)
Auto-merging src/Cargo.lock
CONFLICT (content): Merge conflict in src/Cargo.lock
Automatic merge failed; fix conflicts and then commit the result.

@nnethercote
Copy link
Contributor Author

@bors try

@bors
Copy link
Contributor

bors commented Oct 31, 2018

⌛ Trying commit 8fa874b329057d51e58ecfc4532b170def218d91 with merge 7325ca0693ea3ec184cca9bdf5b6d48f345d6beb...

@rust-highfive

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Oct 31, 2018

💔 Test failed - status-travis

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 31, 2018
@rust-highfive

This comment has been minimized.

@nnethercote
Copy link
Contributor Author

@bors try

@bors
Copy link
Contributor

bors commented Oct 31, 2018

⌛ Trying commit 39c5a8a with merge 6554f34...

bors added a commit that referenced this pull request Oct 31, 2018
Use `Mmap` to open the rmeta file.

Because those files are quite large, contribute significantly to peak
memory usage, but only a small fraction of the data is ever read.

r? @eddyb
// mmap the file, because only a small fraction of it is read.
let file = std::fs::File::open(filename).map_err(|_|
format!("failed to open rmeta metadata: '{}'", filename.display()))?;
let mmap = unsafe { memmap::Mmap::map(&file) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obligatory "not actually safe to map this because of concurrent modification"

See danburkert/memmap-rs#25.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do some simple locking here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the inherent unsafety be clarified? I'm not super familiar with the relation between mmap and concurrent modification, but rustc does sometimes load (and later reject) files which are being concurrently modified. This can happen when cargo fires of a number of rustc builds, and as one is loading dependency information another is writing that information. The one loading may read a halfway-written file, but it'll be guaranteed to reject it at some point later because it's not actually needed (or otherwise Cargo would sequence the compilations differently).

In light of this, though, is this something we actually need to fix in rustc or owrry about from an unsafety point of view?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the problem: we mmap the file's contents into a Vec at T1, and that Vec's contents are theoretically set. However, in practice they aren't actually set until they are read, at T2, whereupon the contents will be page-faulted in. But it's possible that the file's contents have changed between T1 and T2. (Even worse, there is actually a different T2 for every separate page in the Vec.)

File locking won't help unless we lock the file for the entire period that its contents may be read, which doesn't seem like a good idea.

Also, even if you use the normal file-reading operations, there's still a chance you'll end up with inconsistent data if the file is overwritten in the middle of that operation. But the window of opportunity is potentially much smaller. (I think this is what @alexcrichton is referring to.)

Given that the gains turned out to be minor, I would be ok with abandoning this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the gains turned out to be minor, I would be ok with abandoning this.

They did? In both time and memory? That's a shame, then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be legal for LLVM to optimize 4 relaxed atomic 1-byte loads into one relaxed atomic 4-byte load. (The same optimization would not be legal for volatile.) Merging loads is okay with atomics, just splitting them is not. I am not sure if LLVM does anything like that though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping it would do something like that, but then my code was 10x slower when I used relaxed atomics :( I didn't spend much time wrestling with it, though, and I'm no optimization expert.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I expect that there is lots of unused potential around optimizations for atomics. But of course that doesn't help right now, that would take someone to improve LLVM's analyses.

As far as this PR is concerned, the good news is that this is UB only if mutation actually happens. The fact that this is shared is invisible to the program until something else writes to it. That is unlikely to happen. So the end result is that the compiler may fail in arbitrarily unexpected ways if someone mutates the mmap'ed file, but absent that nothing strange can happen.

There's no way to ask the kernel to do copy-on-write for an mmap'ed file so that memory, once actually mapped (on the first access), is frozen and not affected by others mutating the file -- is there?

Copy link
Member

@eddyb eddyb Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that I think we have to read things byte by byte anyway, so I have a doubt that AtomicU8 will pessimize anything.

EDIT: other than maybe getting a &[u8] to read a String.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the whole point of this PR is that typically only 10--20% of the mapped memory in question is actually read. There are no loops here, so the overhead of atomics are probably not high.

@bors
Copy link
Contributor

bors commented Nov 1, 2018

☀️ Test successful - status-travis
State: approved= try=True

@nnethercote
Copy link
Contributor Author

@rust-timer build 6554f34

@rust-timer
Copy link
Collaborator

Success: Queued 6554f34 with parent de9666f, comparison URL.

@nikic
Copy link
Contributor

nikic commented Nov 3, 2018

It looks like perf got stuck? It seems to be restarting benchmarks on the same commit again and again.

/cc @Mark-Simulacrum

type Target = [u8];

fn deref(&self) -> &[u8] {
self.0.deref()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preferred way to write this is &*self.0, but YMMV.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I agree, but I think this way is clearer within a deref method :)

@eddyb
Copy link
Member

eddyb commented Nov 3, 2018

@rust-highfive rust-highfive assigned alexcrichton and unassigned eddyb Nov 3, 2018
@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 6554f34

@nnethercote nnethercote closed this Nov 5, 2018
@eddyb
Copy link
Member

eddyb commented Nov 5, 2018

I still want to land this, so I'll keep it open so we don't forget about it.
I expect more wins in cases with larger dependencies (see #55556 (comment)).

@eddyb eddyb reopened this Nov 5, 2018
@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-5.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:0c2bf39b:start=1541412837802943333,finish=1541412895857652631,duration=58054709298
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#Pull-Requests-and-Security-Restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-5.0
---
travis_time:end:235af39d:start=1541413034538871033,finish=1541413034543809730,duration=4938697
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:0ff24bba
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:0ae6e918
travis_time:start:0ae6e918
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:0280d1d2
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@alexcrichton
Copy link
Member

FWIW I don't intend to draw attention to possible unsafety here and try to make sure this has really strong motivation to land. @eddyb is right in that LLVM is already mmaping rlibs and we've basically never had problems with that (afaik). Also the compiler's predominant use case is not one that's exploiting UB, it just runs the risk of accidentally reading halfway-written files, nothing is being overwritten and truncated, it may just not be as long as we thought it was. (this is already the case today, we can read a file that was halfway written).

I am personally very unfamiliar with safety and mmap, but given how we've basically done this forever (via rlibs and LLVM) and it's not hitting the "serious cases" like overwriting data and truncation, this seems fine to land by me. This clearly doesn't have any perf regressions and should be a clear win for larger crates and dependency graphs.

@eddyb
Copy link
Member

eddyb commented Nov 5, 2018

@nnethercote Can you perhaps file an issue about attempting the replacement of [u8] with [AtomicU8] here (across the board, including for rlibs, where LLVM does the mmap-ing)?
If it happens to not regress perf (with relaxed loads), we could do that as a "peace of mind" thing.

Otherwise, I think we should land this.

@TimNN
Copy link
Contributor

TimNN commented Nov 13, 2018

Ping from triage! What is the status of this PR?

@nnethercote
Copy link
Contributor Author

Ping from triage! What is the status of this PR?

I closed it, @eddyb reopened it. I'm not inclined to pursue it further, because the perf wins were very small and the semantic are complicated. It's fine by me if someone else wants to take it over.

Because those files are quite large, contribute significantly to peak
memory usage, but only a small fraction of the data is ever read.
@eddyb
Copy link
Member

eddyb commented Nov 14, 2018

@bors r+

@bors
Copy link
Contributor

bors commented Nov 14, 2018

📌 Commit 818257e has been approved by eddyb

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 14, 2018
@eddyb
Copy link
Member

eddyb commented Nov 14, 2018

@nnethercote Like I said, I think that's just because you weren't testing on workspaces large enough. The semantics aren't anything new, because we already mmap via LLVM for rlibs.

kennytm added a commit to kennytm/rust that referenced this pull request Nov 15, 2018
Use `Mmap` to open the rmeta file.

Because those files are quite large, contribute significantly to peak
memory usage, but only a small fraction of the data is ever read.
bors added a commit that referenced this pull request Nov 15, 2018
Rollup of 16 pull requests

Successful merges:

 - #54906 (Reattach all grandchildren when constructing specialization graph.)
 - #55182 (Redox: Update to new changes)
 - #55211 (Add BufWriter::buffer method)
 - #55507 (Add link to std::mem::size_of to size_of intrinsic documentation)
 - #55530 (Speed up String::from_utf16)
 - #55556 (Use `Mmap` to open the rmeta file.)
 - #55622 (NetBSD: link libstd with librt in addition to libpthread)
 - #55827 (A few tweaks to iterations/collecting)
 - #55901 (fix various typos in doc comments)
 - #55926 (Change sidebar selector to fix compatibility with docs.rs)
 - #55930 (A handful of hir tweaks)
 - #55932 (core/char: Speed up `to_digit()` for `radix <= 10`)
 - #55935 (appveyor: Use VS2017 for all our images)
 - #55936 (save-analysis: be even more aggressive about ignorning macro-generated defs)
 - #55948 (submodules: update clippy from d8b4269 to 7e0ddef)
 - #55956 (add tests for some fixed ICEs)
pietroalbini added a commit to pietroalbini/rust that referenced this pull request Nov 15, 2018
Use `Mmap` to open the rmeta file.

Because those files are quite large, contribute significantly to peak
memory usage, but only a small fraction of the data is ever read.

r? @eddyb
bors added a commit that referenced this pull request Nov 15, 2018
Rollup of 17 pull requests

Successful merges:

 - #55182 (Redox: Update to new changes)
 - #55211 (Add BufWriter::buffer method)
 - #55507 (Add link to std::mem::size_of to size_of intrinsic documentation)
 - #55530 (Speed up String::from_utf16)
 - #55556 (Use `Mmap` to open the rmeta file.)
 - #55622 (NetBSD: link libstd with librt in addition to libpthread)
 - #55750 (Make `NodeId` and `HirLocalId` `newtype_index`)
 - #55778 (Wrap some query results in `Lrc`.)
 - #55781 (More precise spans for temps and their drops)
 - #55785 (Add mem::forget_unsized() for forgetting unsized values)
 - #55852 (Rewrite `...` as `..=` as a `MachineApplicable` 2018 idiom lint)
 - #55865 (Unix RwLock: avoid racy access to write_locked)
 - #55901 (fix various typos in doc comments)
 - #55926 (Change sidebar selector to fix compatibility with docs.rs)
 - #55930 (A handful of hir tweaks)
 - #55932 (core/char: Speed up `to_digit()` for `radix <= 10`)
 - #55956 (add tests for some fixed ICEs)

Failed merges:

r? @ghost
@bors bors merged commit 818257e into rust-lang:master Nov 15, 2018
@nnethercote nnethercote deleted the MmapMeta branch November 15, 2018 21:12
@SimonSapin
Copy link
Contributor

Per @eddyb’s request I’ve tried this in Servo. Comparing the first Nightly that contains this change, nightly-2018-11-16, with the one before that, nightly-2018-11-15.

I run cargo check a first time, then touch components/style/lib.rs && cargo check four times. The first run includes building multiple large C++ libraries (cargo build scripts run the same under cargo check as with cargo build), so the subsequent runs are perhaps more representative of normal workflow.

Times in seconds reported by Cargo before: 42.67, 41.73, 42.40, 42.42. Average: 42.305 seconds.

Times after: 40.80, 39.81, 39.87, 39.71. Average: 40.0475 seconds.

This is a ~5% time reduction.


$ git log --merges --pretty=%s 6f93e93..6b9b97b
Auto merge of #55948 - matthiaskrgr:clippy, r=oli-obk
Auto merge of #55974 - pietroalbini:rollup, r=pietroalbini
Rollup merge of #55956 - euclio:issue-55587, r=estebank
Rollup merge of #55932 - Turbo87:to_digit, r=alexcrichton
Rollup merge of #55930 - ljedrz:hir_bonuses, r=cramertj
Rollup merge of #55926 - cynecx:fix-rustdoc-mobile-css, r=GuillaumeGomez
Rollup merge of #55901 - euclio:speling, r=petrochenkov
Rollup merge of #55865 - RalfJung:unix-rwlock, r=alexcrichton
Rollup merge of #55852 - varkor:dotdotequals-lint, r=zackmdavis
Rollup merge of #55785 - stjepang:unsized-drop-forget, r=alexcrichton
Rollup merge of #55781 - pnkfelix:issue-54382-more-precise-spans-for-temps-and-their-drops, r=davidtwco
Rollup merge of #55778 - nnethercote:LrcPreds, r=eddyb
Rollup merge of #55750 - oli-obk:node_id_x, r=michaelwoerister
Rollup merge of #55622 - jakllsch:netbsd-librt, r=alexcrichton
Rollup merge of #55556 - nnethercote:MmapMeta, r=eddyb
Rollup merge of #55530 - ljedrz:speed_up_String_from_utf16, r=SimonSapin
Rollup merge of #55507 - fhartwig:size_of_intrinsic_docs, r=frewsxcv
Rollup merge of #55211 - fintelia:bufwriter-buffer, r=shepmaster
Rollup merge of #55182 - jD91mZM2:rebased, r=alexcrichton
Auto merge of #54906 - qnighy:fix-issue-50452, r=nikomatsakis
Auto merge of #55716 - RalfJung:escape-to-raw, r=oli-obk
Auto merge of #55939 - alexcrichton:path-regression-again, r=sfackler

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.