Remove Stacked Borrows GC heuristics #3194
Conversation
ade0c58 to 325b95d
- //@compile-flags: -Zmiri-permissive-provenance
+ //@compile-flags: -Zmiri-permissive-provenance -Zmiri-provenance-gc=0
Previously, the GC didn't run on the relevant stacks during the -Zmiri-provenance-gc=1 test. Now it does, so adding this flag gives us consistent behavior.
And it's a borrow-stack-printing test; those generally shouldn't run the GC, since that changes the stacks. But please add a comment explaining that.
Diagnostics should only trigger when there is an error? Or is this about the tracking SB does to give nicer errors if an error happens?
I am still wondering how good our benchmarks are at actually checking GC performance... maybe the benchmarks should be run with a GC interval of 1k blocks (10x the normal frequency) to better emulate long-running programs? After all, if Miri already takes <10s then it's not so critical, but we don't actually want benchmarks to take 100s to run -- but we can at least emulate the amount of GC work that such a 100s run would do.
This is about the tracking.
I don't think increasing the GC interval emulates long-running programs. The GC becomes a performance increase when the work it does to look for memory to reclaim is outweighed by the work decrease caused by shrinking the size of our data structures (keeping O(n) searches from becoming too slow) and also increasing memory locality on the heap. If we just run the GC more often, it does all its work but provides less value to the program because it finds less garbage. If we want to add some macro-benchmarking, I can sift through crates.io and extract some tests that run for a while into benchmark programs. |
Hm, fair. Maybe we should have one specific GC benchmark then that uses the lower GC interval and aims to artificially create a lot more garbage than normal programs would?
I'd prefer to keep benchmarks below 10s.
With the GC enabled, the peak memory usage is 106 MB for the frequent GC and 119 MB for the default interval. With the GC off, the peak memory is 6554 MB. So I think it's fair to say this benchmark produces a lot of garbage. Would just decreasing the GC interval for this benchmark in …
Thanks for investigating!
I'm just throwing ideas in the room to hopefully get better bench coverage. Ultimately you have a better idea of where we are in terms of perf than I do. If you feel, after doing this investigation, that the benchmark as-is is already covering our GC sufficiently, I'm okay keeping it as-is. If you think changing the GC interval would make the benchmark suite better, then let's change it.
I'd rather keep it as-is, so that the benchmark suite rewards us if we figure out how to make the GC interval tune itself. At some point, maybe I'll collect interpreter profiles for all the crates I'm running tests for. I bet there are odd cases hidden in there somewhere. The trick is finding a good way to sift through all the data.
Okay, then r=me after adding a comment in that test file as noted above.
325b95d to d163a44
@bors r=RalfJung
☀️ Test successful - checks-actions
Removing these has no impact on our benchmarks. I think I initially added these heuristics because they have a significant impact on runtime at shorter GC intervals. But both heuristics result in undesirable memory growth in real programs, especially `modified_since_last_gc`. I didn't realize at the time that required state becomes garbage as a result of changes to other allocations. I think this nets out even primarily because we get better heap reuse. With this change I see almost all the mmap calls coming from our diagnostics infrastructure go away. Not that there were many to start with, but it's an indicator that our memory locality has improved.
Before:
After: