Move Rc and Arc to 32-bit reference counts #41005
Conversation
r? @brson (rust_highfive has picked a reviewer for you, use r? to override)
Would be good to have some heavy Rc/Arc users try this out. (Servo?)
Seems nifty to me! My main comment would be the switch from […]. If we add a platform without […]
Yes, I wasn't certain about platform availability or performance quirks; my primary concern was 16-bit, but I was assured libstd didn't (and never would?) support those targets.
Hm, yeah, so the main "tiny target" I know of is AVR, although I don't know if that has the standard library compiling (e.g. anything other than libcore). As before, though, it's certainly possible to implement […]
There's also the MSP430 series of 16-bit microcontrollers, which is somewhat supported by Rust. I don't think any of them has any part of libstd compiling.
I'm really quite suspicious of Arc on 16-bit (dynamic heap allocations and multithreading on hardware that weak?), but I'll be honest that I don't actually know what people are using these platforms for.
We discussed this PR at libs triage the other day, and we concluded that the next steps here are likely to gather benchmarks, both about memory usage and performance, and see how it fares. After that we can evaluate the change for going into libstd.
@alexcrichton do you have an idea in mind for who would do these benchmarks? For now, I'll assume you want @gankro to do it. Please feel free to update accordingly.
@shepmaster I'm pretty busy and don't have any good idea of what a "representative" workload would be. Someone who actually uses this API really needs to chime in.

@alexcrichton I'm still pretty confident that even if you have 1TB of RAM, when you fill 17GB of it with pointers to a single value, you should be considering some serious architectural changes. My understanding is that these behemoth servers are either caching their whole disk in RAM or manipulating some huge database (hadoop?). Neither of which should involve intensive Rc usage. Either way, I agree it's not something we should support.
I'm not sure. The consequence (undesirable […])
Abort on 32-bit overflow can be avoided. Here's one idea: of the 64 bits for the reference counts, 32 are reserved for the strong count, 31 for the weak count, and 1 for a flag that signifies that the counts got really large. If the flag is not set, treat the strong and weak counts as usual. If they start overflowing, set the flag and allocate on the heap a separate tuple […]. I've heard that ARC in Swift/Obj-C uses tricks like this one. Perhaps @gankro knows more about this.
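To make the layout concrete, here's a minimal sketch of the bit packing described above. All names here are invented for illustration, and the separately heap-allocated overflow tuple is not shown:

```rust
// Sketch of the proposed packing: 32 bits of strong count, 31 bits of
// weak count, and 1 overflow flag, all in a single u64. Once the flag is
// set, the real counts would live in a heap-allocated pair instead.
const STRONG_MASK: u64 = 0xFFFF_FFFF;
const WEAK_SHIFT: u64 = 32;
const WEAK_MASK: u64 = 0x7FFF_FFFF << WEAK_SHIFT;
const OVERFLOW_FLAG: u64 = 1 << 63;

fn pack(strong: u32, weak: u32) -> u64 {
    strong as u64 | ((weak as u64) << WEAK_SHIFT)
}

fn unpack(word: u64) -> (u32, u32, bool) {
    (
        (word & STRONG_MASK) as u32,
        ((word & WEAK_MASK) >> WEAK_SHIFT) as u32,
        word & OVERFLOW_FLAG != 0,
    )
}

fn main() {
    let word = pack(5, 2);
    assert_eq!(unpack(word), (5, 2, false));
    // Setting the flag doesn't disturb the inline counts:
    assert_eq!(unpack(word | OVERFLOW_FLAG), (5, 2, true));
    println!("{:?}", unpack(word));
}
```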
I've never heard of that, sorry. The refcounting impl was rewritten recently, too. Another option is to just have `!0` be a sentinel that never increments/decrements, so that the memory just leaks forever. IIRC Firefox has used this failure mode?
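For illustration, the sentinel idea might look something like this (a non-atomic sketch with invented names, not the actual Rc internals):

```rust
// Sketch of the sentinel idea: once the count reaches !0 (u32::MAX) it
// sticks there forever, so instead of overflowing, the allocation is
// simply never freed.
fn inc_strong(count: &mut u32) {
    if *count != u32::MAX {
        *count += 1;
    }
}

/// Returns true when the allocation should actually be freed.
fn dec_strong(count: &mut u32) -> bool {
    if *count == u32::MAX {
        return false; // saturated: leak forever
    }
    *count -= 1;
    *count == 0
}

fn main() {
    let mut c = u32::MAX - 1;
    inc_strong(&mut c);
    assert_eq!(c, u32::MAX); // pinned at the sentinel
    inc_strong(&mut c);
    assert_eq!(c, u32::MAX); // no overflow
    assert!(!dec_strong(&mut c)); // and never freed
    println!("count = {:#x}", c);
}
```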
It's possible to set MAX_REFCOUNT higher, like 3.2 billion; that's 25+GB worth of pointers!
One idea we had in mind was Servo (which uses refcounting like this, right?). In triage, though, we didn't have a number of other concrete ideas; we just figured there'd be some already to motivate the PR.
Yeah so we didn't rule out this change at all, to be clear. We just wanted to get some numbers both ways to evaluate it. I could imagine such a scenario being useful to someone, but if it's only useful to one person then it may not be worth keeping in libstd.
An interesting idea! This may get somewhat tricky in the […]. It's also still unclear (to me at least) that we'd need such an implementation. If we choose to go to 32-bit refcounting, I'd expect libstd to just stick with that. If you know your use case needs 64-bit reference counting, then we'd have a crate on crates.io for you.
If we’re optimising for size of […]
OK, the libs team discussed this again, and our conclusion was that we basically just need some concrete number of some form to land this. @gankro would you be willing to write an arbitrary benchmark, measure it, show the improvement, and then update the PR to land it? (One update we agreed was needed was to update the docs to reflect this fact.)
We're looking at memory saving here, right? That depends heavily on the ref count (as you need space to hold pointers to the Arc). For the sake of example, if you have 4 refs on average you get a saving of 6.25% for T's of 24 bytes (https://www.desmos.com/calculator/4yvi1px768).
Thinking more about this, 4 references on average might be a high number for real applications. So the actual savings might be larger. |
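As a rough sanity check on the arithmetic being discussed, here's a back-of-envelope model. The numbers are illustrative, not measured: it assumes a 64-bit target, no allocator rounding, a header of two `usize` counts (16 bytes) before this change and two `u32` counts (8 bytes) after, and one 8-byte pointer per live reference:

```rust
// Total footprint = inner allocation (counts header + payload) plus one
// 8-byte pointer per live reference. Purely illustrative model.
fn footprint(header: usize, payload: usize, refs: usize) -> usize {
    header + payload + refs * 8
}

fn main() {
    let payload = 24; // the T size used in the example above
    let refs = 4;     // the average ref count used in the example above
    let old = footprint(16, payload, refs); // two usize counts
    let new = footprint(8, payload, refs);  // two u32 counts
    println!("old = {}B, new = {}B, saved = {}B", old, new, old - new);
}
```

With fewer references on average, the fixed 8-byte header saving becomes a larger fraction of the total, which is the point being made above.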
We definitely do acknowledge that mathematically this is a win. We just want some concrete piece of evidence showing as much.
If I understand correctly, it does not.
@SimonSapin yes, and memory savings on ArcBox leads to 1:Count memory savings on Arc. E.g. 1:1 savings if all your counts are always 1.
Ah, I think I misunderstood what we were calculating, sorry :)
Friendly ping @gankro, just want to keep this on your radar!
I'm stretched too thin to do the necessary legwork on this, sorry.
If there's interest from the libs team, I'm sure others can pick this up; otherwise I can do it. Personally I think we can increase the max ref limit to 3 billion (25GB worth of pointers) and call it a day by either aborting or leaking. Anyone really expecting anything close to a billion Arcs is probably writing its own ref counting (-weak for example).
@bholley How do you think this would impact Stylo? (Assuming we hadn’t moved yet to something that doesn’t support […])
The main impact of this change would be a marginal decrease in memory footprint, as well as likely a small (but probably hard-to-measure) increase in cache hits due to better data packing. I doubt it would be worthwhile to try to show concrete reproducible increases in stylo. Measurements affected by cache performance tend to be pretty noisy.
Why this is profitable: This represents a potential memory savings on 64-bit platforms, although the allocator may happily overallocate, yielding no practical reduction in memory footprint for any given type. This can also reduce the pressure to vend a version of Rc/Arc that doesn't have weak counts: if you're storing a 64-bit aligned type (anything with a pointer/index), removing the weak count represents no potential space savings, as it would just be replaced with padding.
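To illustrate where the savings come from, here's a quick size check using hypothetical stand-ins for the (private) RcBox/ArcInner layouts, assuming a 64-bit target:

```rust
use std::mem::size_of;

// Invented stand-ins for the inner allocation, before and after the change.
#[allow(dead_code)]
struct Inner64<T> { strong: usize, weak: usize, data: T }
#[allow(dead_code)]
struct Inner32<T> { strong: u32, weak: u32, data: T }

fn main() {
    // With a pointer-sized payload, the header shrinks from 16 to 8 bytes:
    assert_eq!(size_of::<Inner64<u64>>(), 24);
    assert_eq!(size_of::<Inner32<u64>>(), 16);
    // With a u32 payload, the smaller counts also avoid padding the whole
    // allocation up to 8-byte alignment:
    assert_eq!(size_of::<Inner64<u32>>(), 24);
    assert_eq!(size_of::<Inner32<u32>>(), 12);
    println!("ok");
}
```

As noted above, the allocator may round these sizes back up, so the on-paper saving doesn't always translate into a smaller footprint.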
Why this is fine: We already pay the cost of overflow checks on our reference counts for esoteric safety reasons. So the only concern is that this will break applications that would exceed a 31-bit counter (1 bit is reserved). Any application that overflows a 31-bit counter on 64-bit is either doing `mem::forget` nonsense (and is broken) or has ~17GB committed to pointing to a single Rc allocation 2 billion times (and is almost certainly broken).

Prior art: both Swift and Firefox already use 32-bit counts (Swift having weak references as well).
This implementation is currently a lazy minimal implementation. It's possible that fusing the weak/strong counts into a single u64 would enable "fused" operations which check/manipulate both counts at once. This was more work than I was willing to invest into this change at this point in time, but is interesting future work.
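One way the fused idea might look (a sketch with invented names and layout, not a proposal for the actual implementation): strong count in the low 32 bits and weak count in the high 32 bits of one `AtomicU64`, so a single load can check both counts at once, and dropping the last strong reference can give up both the strong count and the implicit collectively-held weak in one atomic operation:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const STRONG_ONE: u64 = 1;
const WEAK_ONE: u64 = 1 << 32;

struct Fused(AtomicU64);

impl Fused {
    fn new() -> Self {
        // Like today's Arc: strong = 1, plus the implicit weak held
        // collectively by all the strong pointers.
        Fused(AtomicU64::new(STRONG_ONE + WEAK_ONE))
    }
    fn strong(&self) -> u32 {
        self.0.load(Ordering::SeqCst) as u32
    }
    fn weak(&self) -> u32 {
        (self.0.load(Ordering::SeqCst) >> 32) as u32
    }
    // Fused check: one load inspects both counts at once.
    fn is_unique(&self) -> bool {
        self.0.load(Ordering::SeqCst) == STRONG_ONE + WEAK_ONE
    }
    fn inc_strong(&self) {
        self.0.fetch_add(STRONG_ONE, Ordering::Relaxed);
    }
    /// Returns true when this was the last strong reference. The last drop
    /// releases the implicit weak in the same atomic operation.
    fn dec_strong(&self) -> bool {
        let mut cur = self.0.load(Ordering::Relaxed);
        loop {
            let last = cur as u32 == 1;
            let delta = if last { STRONG_ONE + WEAK_ONE } else { STRONG_ONE };
            match self.0.compare_exchange_weak(
                cur, cur - delta, Ordering::Release, Ordering::Relaxed,
            ) {
                Ok(_) => return last,
                Err(observed) => cur = observed,
            }
        }
    }
}

fn main() {
    let c = Fused::new();
    c.inc_strong();
    assert!(!c.is_unique());
    assert!(!c.dec_strong()); // not the last strong
    assert!(c.is_unique());
    assert!(c.dec_strong()); // last strong: both counts hit zero together
    assert_eq!((c.strong(), c.weak()), (0, 0));
    println!("ok");
}
```

The CAS loop in `dec_strong` is the cost of the fusion here; whether that beats two separate `fetch_sub`s would need measurement.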
Someone should probably review the Arc changes carefully, in case I ruined something important.