Remove Stacked Borrows GC heuristics #3194
Conversation
ade0c58 to 325b95d
- //@compile-flags: -Zmiri-permissive-provenance
+ //@compile-flags: -Zmiri-permissive-provenance -Zmiri-provenance-gc=0
Previously, the GC didn't run on the relevant stacks during the -Zmiri-provenance-gc=1 test. Now it does, so adding this flag gives us consistent behavior.
And it's a borrow-stack-printing test; those generally shouldn't run the GC, since that changes the stacks. But please add a comment explaining that.
Diagnostics should only trigger when there is an error? Or is this about the tracking SB does to give nicer errors if an error happens?
I am still wondering how good our benchmarks are at actually checking GC performance... maybe the benchmarks should be run with a GC interval of 1k blocks (10x the normal frequency) to better emulate long-running programs? After all, if Miri already takes <10s then it's not so critical, but we don't actually want benchmarks to take 100s to run -- but we can at least emulate the amount of GC work that such a 100s run would do.
This is about the tracking.
I don't think increasing the GC interval emulates long-running programs. The GC becomes a performance increase when the work it does to look for memory to reclaim is outweighed by the work decrease caused by shrinking the size of our data structures (keeping O(n) searches from becoming too slow) and also increasing memory locality on the heap. If we just run the GC more often, it does all its work but provides less value to the program because it finds less garbage. If we want to add some macro-benchmarking, I can sift through crates.io and extract some tests that run for a while into benchmark programs. |
Hm, fair. Maybe we should have one specific GC benchmark then that uses the lower GC interval and aims to artificially create a lot more garbage than normal programs would?
I'd prefer to keep benchmarks below 10s.
With the GC enabled, the peak memory usage is 106 MB for the frequent GC and 119 MB for the default interval. With the GC off, the peak memory is 6554 MB. So I think it's fair to say this benchmark produces a lot of garbage. Would just decreasing the GC interval for this benchmark in …
Thanks for investigating!
I'm just throwing ideas in the room to hopefully get better bench coverage. Ultimately you have a better idea of where we are in terms of perf than I do. If you feel, after doing this investigation, that the benchmark as-is is already covering our GC sufficiently, I'm okay keeping it as-is. If you think changing the GC interval would make the benchmark suite better, then let's change it.
I'd rather keep it as-is, so that the benchmark suite rewards us if we figure out how to make the GC interval tune itself. At some point, maybe I'll collect interpreter profiles for all the crates I'm running tests for. I bet there are odd cases hidden in there somewhere. The trick is finding a good way to sift through all the data.
Okay, then r=me after adding a comment in that test file as noted above.
325b95d to d163a44
@bors r=RalfJung
☀️ Test successful - checks-actions
Removing these has no impact on our benchmarks. I think I initially added these heuristics because they have a significant impact on runtime at shorter GC intervals. But both heuristics result in undesirable memory growth in real programs, especially `modified_since_last_gc`. I didn't realize at the time that required state becomes garbage as a result of changes to other allocations. I think this nets out even primarily because we get better heap reuse. With this change I see almost all the mmap calls coming from our diagnostics infrastructure go away. Not that there were many to start with, but it's an indicator that our memory locality has improved.
Before:
After: