Use `Mmap` to open the rmeta file. #55556

nnethercote · 2018-10-31T23:27:04Z

Because those files are quite large, contribute significantly to peak
memory usage, but only a small fraction of the data is ever read.

r? @eddyb

nnethercote · 2018-10-31T23:36:34Z

Here's what DHAT said about this, for tokio-webpush-simple, which was the most extreme case among the benchmarks:

  │     Total:     17,813,640 bytes (8.23%) in 66 blocks (0.01%), avg size 269,903.64 bytes, avg lifetime 1,210,149,468.11 instrs (75.11% of program duration)
  │     At t-gmax: 12,861,916 bytes (30.3%) in 58 blocks (0.05%), avg size 221,757.17 bytes
  │     At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
  │     Reads:     4,774,894 bytes (1.11%), avg 0.27 per byte
  │     Writes:    17,813,574 bytes (6.87%), avg 1 per byte
  │     Location: {
  │       #1: 0x6E271D4: alloc (alloc.rs:78)
  │       #2: 0x6E271D4: alloc (alloc.rs:154)
  │       #3: 0x6E271D4: allocate_in<u8,alloc::alloc::Global> (raw_vec.rs:106)
  │       #4: 0x6E271D4: with_capacity<u8> (raw_vec.rs:150)
  │       #5: 0x6E271D4: with_capacity<u8> (vec.rs:368)
  │       #6: 0x6E271D4: read<&std::path::Path> (fs.rs:267)
  │       #7: 0x6E271D4: get_metadata_section_imp (locator.rs:895)
  │       #8: 0x6E271D4: rustc_metadata::locator::get_metadata_section (locator.rs:852)
  │       #9: 0x6E257ED: rustc_metadata::locator::Context::extract_one (locator.rs:615)
  │     }

In other words, the allocations for these files accounted for 8.23% of all heap-allocated bytes, and 30.3% of the peak heap memory usage, but on average each byte was only read 0.27 times (i.e. most of the data was never even read).

I did a local benchmarking run and the results weren't as good as the above profiling suggested they would be. faults improved the most:

tokio-webpush-simple-check
        avg: -18.0%     min: -24.0%     max: -14.5%
sentry-cli-check
        avg: -13.0%     min: -16.4%     max: -10.6%
ripgrep-check
        avg: -10.7%     min: -13.3%     max: -8.1%
crates.io-check
        avg: -8.3%      min: -10.7%     max: -6.7%
encoding-check
        avg: -8.0%      min: -9.9%      max: -6.5%
regression-31157-check
        avg: -8.4%      min: -9.8%      max: -7.0%
webrender-check
        avg: -7.1%      min: -9.0%      max: -5.4%
piston-image-check
        avg: -3.8%      min: -5.9%      max: 0.9%
cargo-check
        avg: -4.2%      min: -5.3%      max: -3.3%
issue-46449-check
        avg: -3.3%      min: -4.4%      max: -2.9%
clap-rs-check
        avg: 0.1%       min: -3.7%      max: 3.0%
html5ever-check
        avg: -3.3%      min: -3.7%      max: -3.0%
regex-check
        avg: -2.4%      min: -3.0%      max: -2.0%

But max-rss didn't change that much, surprisingly:

sentry-cli-check
        avg: -4.1%      min: -5.0%      max: -3.6%
crates.io-check
        avg: -3.5%      min: -4.5%      max: -2.9%
issue-46449-check
        avg: 1.4%       min: 0.6%       max: 3.1%
deeply-nested-check
        avg: 1.9%       min: -0.2%      max: 3.1%
ripgrep-check
        avg: -2.3%      min: -3.1%      max: -1.8%
tokio-webpush-simple-check
        avg: -2.0%      min: -3.0%      max: -1.3%
webrender-check
        avg: -1.7%      min: -2.3%      max: -1.2%
ucd-check
        avg: 0.6%       min: 0.1%       max: 1.9%
helloworld-check
        avg: 1.4%       min: 1.2%       max: 1.9%
encoding-check
        avg: -1.4%      min: -1.9%      max: -1.0%
futures-check
        avg: 0.9%       min: 0.6%       max: 1.8%
html5ever-check
        avg: -1.2%      min: -1.6%      max: -1.0%
syn-check
        avg: 0.8%       min: 0.4%       max: 1.6%
cargo-check
        avg: -1.1%      min: -1.5%      max: -0.8%

cycles had some moderate improvements:

        avg: -3.6%      min: -5.3%      max: -1.5%
issue-46449-check
        avg: -1.9%      min: -4.9%      max: -0.3%
deeply-nested-check
        avg: -2.4%      min: -3.6%      max: -1.9%
ctfe-stress-check
        avg: 0.2%?      min: -2.3%?     max: 3.1%?
crates.io-check
        avg: -1.5%      min: -2.9%      max: -0.4%
unify-linearly-check
        avg: -1.5%      min: -2.5%      max: -0.9%
regex-check
        avg: -1.1%      min: -2.4%      max: -0.1%
regression-31157-check
        avg: -1.1%      min: -2.2%      max: -0.3%
encoding-check
        avg: -1.2%      min: -1.9%      max: 0.1%
inflate-check
        avg: -0.2%      min: -1.7%      max: 1.3%
tuple-stress-check
        avg: 0.4%       min: -1.0%      max: 1.7%
coercions-check
        avg: -0.5%?     min: -1.6%?     max: 1.0%?
piston-image-check
        avg: -0.9%      min: -1.5%      max: -0.3%
deep-vector-check
        avg: -0.7%      min: -1.5%      max: 0.2%

cpu-clock, instructions:u, wall-time, and task-clock all didn't change much.

A bit disappointing overall, but still certainly worth landing, IMO.

nnethercote · 2018-10-31T23:36:56Z

@bors try

bors · 2018-10-31T23:37:00Z

🔒 Merge conflict

This pull request and the master branch diverged in a way that cannot be automatically merged. Please rebase on top of the latest master branch, and let the reviewer approve again.

How do I rebase?

Assuming self is your fork and upstream is this repository, you can resolve the conflict following these steps:

git checkout MmapMeta (switch to your branch)
git fetch upstream master (retrieve the latest master)
git rebase upstream/master -p (rebase on top of it)
Follow the on-screen instruction to resolve conflicts (check git status if you got lost).
git push self MmapMeta --force-with-lease (update this PR)

You may also read Git Rebasing to Resolve Conflicts by Drew Blessing for a short tutorial.

Please avoid the "Resolve conflicts" button on GitHub. It uses git merge instead of git rebase which makes the PR commit history more difficult to read.

Sometimes step 4 will complete without asking for resolution. This is usually due to difference between how Cargo.lock conflict is handled during merge and rebase. This is normal, and you should still perform step 5 to update this PR.

Error message

warning: Cannot merge binary files: src/Cargo.lock (HEAD vs. heads/homu-tmp)
Auto-merging src/Cargo.lock
CONFLICT (content): Merge conflict in src/Cargo.lock
Automatic merge failed; fix conflicts and then commit the result.

nnethercote · 2018-10-31T23:41:26Z

@bors try

bors · 2018-10-31T23:41:37Z

⌛ Trying commit 8fa874b329057d51e58ecfc4532b170def218d91 with merge 7325ca0693ea3ec184cca9bdf5b6d48f345d6beb...

bors · 2018-10-31T23:49:51Z

💔 Test failed - status-travis

nnethercote · 2018-10-31T23:50:40Z

@bors try

bors · 2018-10-31T23:50:51Z

⌛ Trying commit 39c5a8a with merge 6554f34...

@eddyb

Use `Mmap` to open the rmeta file. Because those files are quite large, contribute significantly to peak memory usage, but only a small fraction of the data is ever read. r? @eddyb

cramertj · 2018-11-01T01:20:45Z

src/librustc_metadata/locator.rs

+            // mmap the file, because only a small fraction of it is read.
+            let file = std::fs::File::open(filename).map_err(|_|
+                format!("failed to open rmeta metadata: '{}'", filename.display()))?;
+            let mmap = unsafe { memmap::Mmap::map(&file) };


obligatory "not actually safe to map this because of concurrent modification"

See danburkert/memmap-rs#25.

Can we do some simple locking here?

Can the inherent unsafety be clarified? I'm not super familiar with the relation between mmap and concurrent modification, but rustc does sometimes load (and later reject) files which are being concurrently modified. This can happen when cargo fires of a number of rustc builds, and as one is loading dependency information another is writing that information. The one loading may read a halfway-written file, but it'll be guaranteed to reject it at some point later because it's not actually needed (or otherwise Cargo would sequence the compilations differently).

In light of this, though, is this something we actually need to fix in rustc or owrry about from an unsafety point of view?

My understanding of the problem: we mmap the file's contents into a Vec at T1, and that Vec's contents are theoretically set. However, in practice they aren't actually set until they are read, at T2, whereupon the contents will be page-faulted in. But it's possible that the file's contents have changed between T1 and T2. (Even worse, there is actually a different T2 for every separate page in the Vec.)

File locking won't help unless we lock the file for the entire period that its contents may be read, which doesn't seem like a good idea.

Also, even if you use the normal file-reading operations, there's still a chance you'll end up with inconsistent data if the file is overwritten in the middle of that operation. But the window of opportunity is potentially much smaller. (I think this is what @alexcrichton is referring to.)

Given that the gains turned out to be minor, I would be ok with abandoning this.

Given that the gains turned out to be minor, I would be ok with abandoning this.

They did? In both time and memory? That's a shame, then.

It would be legal for LLVM to optimize 4 relaxed atomic 1-byte loads into one relaxed atomic 4-byte load. (The same optimization would not be legal for volatile.) Merging loads is okay with atomics, just splitting them is not. I am not sure if LLVM does anything like that though.

I was hoping it would do something like that, but then my code was 10x slower when I used relaxed atomics :( I didn't spend much time wrestling with it, though, and I'm no optimization expert.

Yeah, I expect that there is lots of unused potential around optimizations for atomics. But of course that doesn't help right now, that would take someone to improve LLVM's analyses.

As far as this PR is concerned, the good news is that this is UB only if mutation actually happens. The fact that this is shared is invisible to the program until something else writes to it. That is unlikely to happen. So the end result is that the compiler may fail in arbitrarily unexpected ways if someone mutates the mmap'ed file, but absent that nothing strange can happen.

There's no way to ask the kernel to do copy-on-write for an mmap'ed file so that memory, once actually mapped (on the first access), is frozen and not affected by others mutating the file -- is there?

Also note that I think we have to read things byte by byte anyway, so I have a doubt that AtomicU8 will pessimize anything.

EDIT: other than maybe getting a &[u8] to read a String.

Well, the whole point of this PR is that typically only 10--20% of the mapped memory in question is actually read. There are no loops here, so the overhead of atomics are probably not high.

bors · 2018-11-01T02:19:31Z

☀️ Test successful - status-travis
State: approved= try=True

nnethercote · 2018-11-01T03:03:56Z

@rust-timer build 6554f34

rust-timer · 2018-11-01T03:03:57Z

Success: Queued 6554f34 with parent de9666f, comparison URL.

nikic · 2018-11-03T14:58:33Z

It looks like perf got stuck? It seems to be restarting benchmarks on the same commit again and again.

/cc @Mark-Simulacrum

eddyb · 2018-11-03T17:44:47Z

src/librustc_metadata/locator.rs

+    type Target = [u8];
+
+    fn deref(&self) -> &[u8] {
+        self.0.deref()


My preferred way to write this is &*self.0, but YMMV.

In general, I agree, but I think this way is clearer within a deref method :)

eddyb · 2018-11-03T17:45:17Z

r? @alexcrichton cc @nikomatsakis @michaelwoerister

rust-timer · 2018-11-03T23:11:12Z

Finished benchmarking try commit 6554f34

eddyb · 2018-11-05T10:12:21Z

I still want to land this, so I'll keep it open so we don't forget about it.
I expect more wins in cases with larger dependencies (see #55556 (comment)).

rust-highfive · 2018-11-05T10:18:24Z

The job x86_64-gnu-llvm-5.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

travis_time:end:0c2bf39b:start=1541412837802943333,finish=1541412895857652631,duration=58054709298
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#Pull-Requests-and-Security-Restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-5.0
---
travis_time:end:235af39d:start=1541413034538871033,finish=1541413034543809730,duration=4938697
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:0ff24bba
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:0ae6e918
travis_time:start:0ae6e918
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:0280d1d2
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

alexcrichton · 2018-11-05T16:27:28Z

FWIW I don't intend to draw attention to possible unsafety here and try to make sure this has really strong motivation to land. @eddyb is right in that LLVM is already mmaping rlibs and we've basically never had problems with that (afaik). Also the compiler's predominant use case is not one that's exploiting UB, it just runs the risk of accidentally reading halfway-written files, nothing is being overwritten and truncated, it may just not be as long as we thought it was. (this is already the case today, we can read a file that was halfway written).

I am personally very unfamiliar with safety and mmap, but given how we've basically done this forever (via rlibs and LLVM) and it's not hitting the "serious cases" like overwriting data and truncation, this seems fine to land by me. This clearly doesn't have any perf regressions and should be a clear win for larger crates and dependency graphs.

eddyb · 2018-11-05T18:29:11Z

@nnethercote Can you perhaps file an issue about attempting the replacement of [u8] with [AtomicU8] here (across the board, including for rlibs, where LLVM does the mmap-ing)?
If it happens to not regress perf (with relaxed loads), we could do that as a "peace of mind" thing.

Otherwise, I think we should land this.

TimNN · 2018-11-13T18:11:46Z

Ping from triage! What is the status of this PR?

nnethercote · 2018-11-13T20:41:56Z

Ping from triage! What is the status of this PR?

I closed it, @eddyb reopened it. I'm not inclined to pursue it further, because the perf wins were very small and the semantic are complicated. It's fine by me if someone else wants to take it over.

Because those files are quite large, contribute significantly to peak memory usage, but only a small fraction of the data is ever read.

eddyb · 2018-11-14T03:56:01Z

@bors r+

bors · 2018-11-14T03:56:02Z

📌 Commit 818257e has been approved by eddyb

eddyb · 2018-11-14T03:57:25Z

@nnethercote Like I said, I think that's just because you weren't testing on workspaces large enough. The semantics aren't anything new, because we already mmap via LLVM for rlibs.

Use `Mmap` to open the rmeta file. Because those files are quite large, contribute significantly to peak memory usage, but only a small fraction of the data is ever read.

Rollup of 16 pull requests Successful merges: - #54906 (Reattach all grandchildren when constructing specialization graph.) - #55182 (Redox: Update to new changes) - #55211 (Add BufWriter::buffer method) - #55507 (Add link to std::mem::size_of to size_of intrinsic documentation) - #55530 (Speed up String::from_utf16) - #55556 (Use `Mmap` to open the rmeta file.) - #55622 (NetBSD: link libstd with librt in addition to libpthread) - #55827 (A few tweaks to iterations/collecting) - #55901 (fix various typos in doc comments) - #55926 (Change sidebar selector to fix compatibility with docs.rs) - #55930 (A handful of hir tweaks) - #55932 (core/char: Speed up `to_digit()` for `radix <= 10`) - #55935 (appveyor: Use VS2017 for all our images) - #55936 (save-analysis: be even more aggressive about ignorning macro-generated defs) - #55948 (submodules: update clippy from d8b4269 to 7e0ddef) - #55956 (add tests for some fixed ICEs)

@eddyb

Use `Mmap` to open the rmeta file. Because those files are quite large, contribute significantly to peak memory usage, but only a small fraction of the data is ever read. r? @eddyb

@ghost

Rollup of 17 pull requests Successful merges: - #55182 (Redox: Update to new changes) - #55211 (Add BufWriter::buffer method) - #55507 (Add link to std::mem::size_of to size_of intrinsic documentation) - #55530 (Speed up String::from_utf16) - #55556 (Use `Mmap` to open the rmeta file.) - #55622 (NetBSD: link libstd with librt in addition to libpthread) - #55750 (Make `NodeId` and `HirLocalId` `newtype_index`) - #55778 (Wrap some query results in `Lrc`.) - #55781 (More precise spans for temps and their drops) - #55785 (Add mem::forget_unsized() for forgetting unsized values) - #55852 (Rewrite `...` as `..=` as a `MachineApplicable` 2018 idiom lint) - #55865 (Unix RwLock: avoid racy access to write_locked) - #55901 (fix various typos in doc comments) - #55926 (Change sidebar selector to fix compatibility with docs.rs) - #55930 (A handful of hir tweaks) - #55932 (core/char: Speed up `to_digit()` for `radix <= 10`) - #55956 (add tests for some fixed ICEs) Failed merges: r? @ghost

SimonSapin · 2018-11-19T18:18:48Z

Per @eddyb’s request I’ve tried this in Servo. Comparing the first Nightly that contains this change, nightly-2018-11-16, with the one before that, nightly-2018-11-15.

I run cargo check a first time, then touch components/style/lib.rs && cargo check four times. The first run includes building multiple large C++ libraries (cargo build scripts run the same under cargo check as with cargo build), so the subsequent runs are perhaps more representative of normal workflow.

Times in seconds reported by Cargo before: 42.67, 41.73, 42.40, 42.42. Average: 42.305 seconds.

Times after: 40.80, 39.81, 39.87, 39.71. Average: 40.0475 seconds.

This is a ~5% time reduction.

$ git log --merges --pretty=%s 6f93e93..6b9b97b
Auto merge of #55948 - matthiaskrgr:clippy, r=oli-obk
Auto merge of #55974 - pietroalbini:rollup, r=pietroalbini
Rollup merge of #55956 - euclio:issue-55587, r=estebank
Rollup merge of #55932 - Turbo87:to_digit, r=alexcrichton
Rollup merge of #55930 - ljedrz:hir_bonuses, r=cramertj
Rollup merge of #55926 - cynecx:fix-rustdoc-mobile-css, r=GuillaumeGomez
Rollup merge of #55901 - euclio:speling, r=petrochenkov
Rollup merge of #55865 - RalfJung:unix-rwlock, r=alexcrichton
Rollup merge of #55852 - varkor:dotdotequals-lint, r=zackmdavis
Rollup merge of #55785 - stjepang:unsized-drop-forget, r=alexcrichton
Rollup merge of #55781 - pnkfelix:issue-54382-more-precise-spans-for-temps-and-their-drops, r=davidtwco
Rollup merge of #55778 - nnethercote:LrcPreds, r=eddyb
Rollup merge of #55750 - oli-obk:node_id_x, r=michaelwoerister
Rollup merge of #55622 - jakllsch:netbsd-librt, r=alexcrichton
Rollup merge of #55556 - nnethercote:MmapMeta, r=eddyb
Rollup merge of #55530 - ljedrz:speed_up_String_from_utf16, r=SimonSapin
Rollup merge of #55507 - fhartwig:size_of_intrinsic_docs, r=frewsxcv
Rollup merge of #55211 - fintelia:bufwriter-buffer, r=shepmaster
Rollup merge of #55182 - jD91mZM2:rebased, r=alexcrichton
Auto merge of #54906 - qnighy:fix-issue-50452, r=nikomatsakis
Auto merge of #55716 - RalfJung:escape-to-raw, r=oli-obk
Auto merge of #55939 - alexcrichton:path-regression-again, r=sfackler

rust-highfive assigned eddyb Oct 31, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 31, 2018

nnethercote force-pushed the MmapMeta branch from c2d9e33 to 8fa874b Compare October 31, 2018 23:41

This comment has been minimized.

Sign in to view

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 31, 2018

This comment has been minimized.

Sign in to view

nnethercote force-pushed the MmapMeta branch from 8fa874b to 39c5a8a Compare October 31, 2018 23:49

cramertj reviewed Nov 1, 2018

View reviewed changes

eddyb reviewed Nov 3, 2018

View reviewed changes

rust-highfive assigned alexcrichton and unassigned eddyb Nov 3, 2018

nnethercote closed this Nov 5, 2018

eddyb reopened this Nov 5, 2018

Use Mmap to open the rmeta file.

818257e

Because those files are quite large, contribute significantly to peak memory usage, but only a small fraction of the data is ever read.

eddyb force-pushed the MmapMeta branch from 39c5a8a to 818257e Compare November 14, 2018 03:55

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 14, 2018

kennytm mentioned this pull request Nov 14, 2018

Rollup of 16 pull requests #55943

Closed

pietroalbini mentioned this pull request Nov 15, 2018

Rollup of 17 pull requests #55974

Merged

bors merged commit 818257e into rust-lang:master Nov 15, 2018

nnethercote deleted the MmapMeta branch November 15, 2018 21:12

Use Mmap to open the rmeta file. #55556

Use Mmap to open the rmeta file. #55556

Conversation

nnethercote commented Oct 31, 2018

nnethercote commented Oct 31, 2018

nnethercote commented Oct 31, 2018

bors commented Oct 31, 2018

nnethercote commented Oct 31, 2018

bors commented Oct 31, 2018

This comment has been minimized.

bors commented Oct 31, 2018

This comment has been minimized.

nnethercote commented Oct 31, 2018

bors commented Oct 31, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eddyb Nov 7, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bors commented Nov 1, 2018

nnethercote commented Nov 1, 2018

rust-timer commented Nov 1, 2018

nikic commented Nov 3, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eddyb commented Nov 3, 2018

rust-timer commented Nov 3, 2018

eddyb commented Nov 5, 2018

rust-highfive commented Nov 5, 2018

alexcrichton commented Nov 5, 2018

eddyb commented Nov 5, 2018

TimNN commented Nov 13, 2018

nnethercote commented Nov 13, 2018

eddyb commented Nov 14, 2018

bors commented Nov 14, 2018

eddyb commented Nov 14, 2018 • edited Loading

SimonSapin commented Nov 19, 2018

Use `Mmap` to open the rmeta file. #55556

Use `Mmap` to open the rmeta file. #55556

eddyb Nov 7, 2018 •

edited

Loading

nikic commented Nov 3, 2018 •

edited

Loading

eddyb commented Nov 14, 2018 •

edited

Loading