Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Stacked Borrows GC heuristics #3194

Merged
merged 1 commit into from Nov 28, 2023

Conversation

saethlin
Copy link
Member

Removing these has no impact on our benchmarks. I think I initially added these because they have a significant impact on runtime at shorter GC intervals. But both these heuristics result in undesirable memory growth in real programs, especially modified_since_last_gc. I didn't realize at the time that required state becomes garbage as a result of changes to other allocations.

I think this nets even primarily because we get better heap reuse. With this change I see almost all the mmap calls coming from our diagnostics infrastructure go away. Not that there were many to start with, but it's an indicator that our memory locality has improved.

Before:

Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/backtraces/Cargo.toml"
  Time (mean ± σ):      4.046 s ±  0.087 s    [User: 3.952 s, System: 0.085 s]
  Range (min … max):    3.952 s …  4.139 s    5 runs
 
Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/invalidate/Cargo.toml"
  Time (mean ± σ):      6.271 s ±  0.073 s    [User: 6.206 s, System: 0.054 s]
  Range (min … max):    6.195 s …  6.365 s    5 runs
 
Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/mse/Cargo.toml"
  Time (mean ± σ):     570.3 ms ±   6.7 ms    [User: 505.5 ms, System: 61.8 ms]
  Range (min … max):   559.6 ms … 576.0 ms    5 runs
 
Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/serde1/Cargo.toml"
  Time (mean ± σ):      2.013 s ±  0.012 s    [User: 1.938 s, System: 0.069 s]
  Range (min … max):    1.994 s …  2.024 s    5 runs
 
Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/serde2/Cargo.toml"
  Time (mean ± σ):      4.155 s ±  0.082 s    [User: 4.078 s, System: 0.067 s]
  Range (min … max):    4.011 s …  4.211 s    5 runs
 
Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/slice-get-unchecked/Cargo.toml"
  Time (mean ± σ):     541.5 ms ±   3.9 ms    [User: 477.3 ms, System: 60.0 ms]
  Range (min … max):   536.1 ms … 545.1 ms    5 runs
 
Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/unicode/Cargo.toml"
  Time (mean ± σ):      1.496 s ±  0.002 s    [User: 1.442 s, System: 0.050 s]
  Range (min … max):    1.494 s …  1.500 s    5 runs
 
Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/zip-equal/Cargo.toml"
  Time (mean ± σ):      2.190 s ±  0.043 s    [User: 2.109 s, System: 0.075 s]
  Range (min … max):    2.114 s …  2.215 s    5 runs

After:

Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/backtraces/Cargo.toml"
  Time (mean ± σ):      3.954 s ±  0.057 s    [User: 3.871 s, System: 0.075 s]
  Range (min … max):    3.912 s …  4.052 s    5 runs
 
Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/invalidate/Cargo.toml"
  Time (mean ± σ):      6.200 s ±  0.108 s    [User: 6.129 s, System: 0.060 s]
  Range (min … max):    6.072 s …  6.295 s    5 runs
 
Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/mse/Cargo.toml"
  Time (mean ± σ):     575.3 ms ±   9.3 ms    [User: 511.5 ms, System: 60.2 ms]
  Range (min … max):   558.9 ms … 580.8 ms    5 runs
 
Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/serde1/Cargo.toml"
  Time (mean ± σ):      1.966 s ±  0.007 s    [User: 1.903 s, System: 0.058 s]
  Range (min … max):    1.959 s …  1.974 s    5 runs
 
Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/serde2/Cargo.toml"
  Time (mean ± σ):      4.138 s ±  0.040 s    [User: 4.057 s, System: 0.072 s]
  Range (min … max):    4.110 s …  4.207 s    5 runs
 
Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/slice-get-unchecked/Cargo.toml"
  Time (mean ± σ):     541.8 ms ±   5.6 ms    [User: 470.7 ms, System: 66.9 ms]
  Range (min … max):   535.6 ms … 549.1 ms    5 runs
 
Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/unicode/Cargo.toml"
  Time (mean ± σ):      1.490 s ±  0.021 s    [User: 1.424 s, System: 0.060 s]
  Range (min … max):    1.454 s …  1.505 s    5 runs
 
Benchmark 1: cargo miri run  --manifest-path "/home/ben/miri/bench-cargo-miri/zip-equal/Cargo.toml"
  Time (mean ± σ):      2.215 s ±  0.048 s    [User: 2.113 s, System: 0.072 s]
  Range (min … max):    2.183 s …  2.299 s    5 runs

Comment on lines 1 to 3
//@compile-flags: -Zmiri-permissive-provenance
//@compile-flags: -Zmiri-permissive-provenance -Zmiri-provenance-gc=0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, the GC didn't run on the relevant stacks during the -Zmiri-provenance-gc=1 test. Now it does. So adding this flag gives us consistent behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it's a borrow-stack-printing test, those generally shouldn't GC as that changes the stacks. But please add a comment explaining that.

@RalfJung
Copy link
Member

I think this nets even primarily because we get better heap reuse. With this change I see almost all the mmap calls coming from our diagnostics infrastructure go away. Not that there were many to start with, but it's an indicator that our memory locality has improved.

Diagnostics should only trigger when there is an error? Or is this about the tracking SB does to give nicer errors if an error happens?

@RalfJung
Copy link
Member

RalfJung commented Nov 28, 2023

I am still wondering how good our benchmarks are at actually checking GC performance... maybe the benchmarks should be run with a GC interval of 1k blocks (10x the normal frequency) to better emulate long-running programs? After all, if Miri already takes <10s then it's not so critical, but we don't actually want benchmarks to take 100s to run -- but we can at least emulate the amount of GC work that such a 100s run would do.

@saethlin
Copy link
Member Author

Diagnostics should only trigger when there is an error? Or is this about the tracking SB does to give nicer errors if an error happens?

This is about the tracking.

I am still wondering how good our benchmarks are at actually checking GC performance... maybe the benchmarks should be run with a GC interval of 1k blocks (10x the normal frequency) to better emulate long-running programs?

I don't think increasing the GC interval emulates long-running programs. The GC becomes a performance increase when the work it does to look for memory to reclaim is outweighed by the work decrease caused by shrinking the size of our data structures (keeping O(n) searches from becoming too slow) and also increasing memory locality on the heap. If we just run the GC more often, it does all its work but provides less value to the program because it finds less garbage.

If we want to add some macro-benchmarking, I can sift through crates.io and extract some tests that run for a while into benchmark programs.

@RalfJung
Copy link
Member

I don't think increasing the GC interval emulates long-running programs. The GC becomes a performance increase when the work it does to look for memory to reclaim is outweighed by the work decrease caused by shrinking the size of our data structures (keeping O(n) searches from becoming too slow) and also increasing memory locality on the heap. If we just run the GC more often, it does all its work but provides less value to the program because it finds less garbage.

Hm, fair.

Maybe we should have one specific GC benchmark then that uses the lower GC interval and aims to artificially create a lot more garbage than normal programs would?

If we want to add some macro-benchmarking, I can sift through crates.io and extract some tests that run for a while into benchmark programs.

I'd prefer to keep benchmarks below 10s.

@saethlin
Copy link
Member Author

Maybe we should have one specific GC benchmark then that uses the lower GC interval and aims to artificially create a lot more garbage than normal programs would?

slice-get-unchecked is kind of this already. Based on some quick experimentation, the optimal GC interval for that program's runtime is about 100. With the GC off, the benchmark takes 6.4 seconds. With the default GC interval, it runs in 0.55 seconds, and with -Zmiri-provenance-gc=100, it drops to 0.37 seconds.

With the GC enabled, the peak memory usage is 106 MB for the frequent GC and 119 MB for the default interval. With the GC off, the peak memory is 6554 MB. So I think it's fair to say this benchmark produces a lot of garbage.

Would just decreasing the GC interval for this benchmark in miri-script meet your request here?

@RalfJung
Copy link
Member

Thanks for investigating!

Would just decreasing the GC interval for this benchmark in miri-script meet your request here?

I'm just throwing ideas in the room to hopefully get better bench coverage. Ultimately you have a better idea of where we are in terms of perf than I do. If you feel, after doing this investigation, that the benchmark as-is is already covering our GC sufficiently, I'm okay keeping it as-is. If you think changing the GC interval would make the benchmark suite better, then let's change it.

@saethlin
Copy link
Member Author

I'd rather keep it as-is, so that the benchmark suite rewards us if we figure out how make the GC interval tune itself.

At some point, maybe I'll collect interpreter profiles for all the crates I'm running tests for. I bet there are odd cases hidden in there somewhere. The trick is finding a good way to sift through all the data.

@RalfJung
Copy link
Member

Okay, then r=me after adding a comment in that test file as noted above.

@saethlin
Copy link
Member Author

@bors r=RalfJung

@bors
Copy link
Collaborator

bors commented Nov 28, 2023

📌 Commit d163a44 has been approved by RalfJung

It is now in the queue for this repository.

@bors
Copy link
Collaborator

bors commented Nov 28, 2023

⌛ Testing commit d163a44 with merge 68fee26...

@bors
Copy link
Collaborator

bors commented Nov 28, 2023

☀️ Test successful - checks-actions
Approved by: RalfJung
Pushing 68fee26 to master...

@bors bors merged commit 68fee26 into rust-lang:master Nov 28, 2023
8 checks passed
@saethlin saethlin deleted the remove-gc-heuristics branch November 28, 2023 23:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants