Move Rc and Arc to 32-bit reference counts #41005

Closed
wants to merge 2 commits

Conversation

@Gankra (Contributor) commented Apr 1, 2017

Why this is profitable: This represents a potential memory savings on 64-bit platforms, although the allocator may happily overallocate, yielding no practical reduction in memory footprint for any given type. This can also reduce the pressure to vend a version of Rc/Arc that doesn't have weak counts: if you're storing a 64-bit aligned type (anything with a pointer/index), removing the weak count represents no potential space savings, as it would just be replaced with padding.
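
To make the padding argument concrete, here is a minimal sketch (the structs below are illustrative stand-ins for the heap side of Rc, not the real RcBox, and the sizes assume a typical 64-bit target):

```rust
#![allow(dead_code)]
use std::mem::size_of;

// Illustrative stand-ins for the heap side of Rc; the u64 payload models
// "anything with a pointer/index", i.e. an 8-byte-aligned value.
struct CountsUsize { strong: usize, weak: usize, value: u64 }
struct CountsU32 { strong: u32, weak: u32, value: u64 }
struct CountsU32NoWeak { strong: u32, value: u64 }

fn main() {
    // On a typical 64-bit target:
    assert_eq!(size_of::<CountsUsize>(), 24); // 8 + 8 + 8
    assert_eq!(size_of::<CountsU32>(), 16);   // 4 + 4 + 8: the 8-byte win
    // With u32 counts, also dropping the weak count buys nothing for an
    // 8-byte-aligned payload; the space comes back as padding.
    assert_eq!(size_of::<CountsU32NoWeak>(), 16);
}
```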

Why this is fine: We already pay the cost of overflow checks on our reference counts for esoteric safety reasons. So the only concern is that this will break applications that would exceed a 31-bit counter (1 bit is reserved). Any application that overflows a 31-bit counter on 64-bit is either doing mem::forget nonsense (and is broken) or has ~17GB committed to pointing to a single Rc allocation 2 billion times (and is almost certainly broken).
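
For the record, the arithmetic behind the ~17GB figure, assuming each outstanding reference is held as at least one 8-byte pointer:

```rust
fn main() {
    // Overflowing a 31-bit counter needs 2^31 strong references, and each live
    // reference occupies at least one 8-byte pointer on a 64-bit target.
    let refs: u64 = 1 << 31;
    let bytes = refs * 8;
    // Prints: 17179869184 bytes ≈ 17.2 GB (i.e. 16 GiB)
    println!("{} bytes ≈ {:.1} GB", bytes, bytes as f64 / 1e9);
}
```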

Prior art: both Swift and Firefox already use 32-bit counts (Swift having weak references as well).

This implementation is currently a lazy minimal implementation. It's possible that fusing the weak/strong counts into a single u64 would enable "fused" operations which check/manipulate both counts at once. This was more work than I was willing to invest into this change at this point in time, but is interesting future work.
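
For illustration, a fused layout might look roughly like this (the names, bit layout, and orderings are assumptions for the sketch, not something in this PR; overflow handling is elided):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Both counts packed into one word: strong in the low 32 bits, weak in the
// high 32. All names here are made up for illustration.
struct FusedCounts(AtomicU64);

const STRONG_ONE: u64 = 1;
const WEAK_ONE: u64 = 1 << 32;

impl FusedCounts {
    // Clone becomes a single fetch_add touching one cache line; the overflow
    // check on the low half is elided to keep the sketch short.
    fn inc_strong(&self) -> u64 {
        self.0.fetch_add(STRONG_ONE, Ordering::Relaxed)
    }

    // A "fused" read: one load observes both counts at once, which is what an
    // is_unique-style check (strong == 1 && weak == 1) wants.
    fn is_unique(&self) -> bool {
        self.0.load(Ordering::Acquire) == STRONG_ONE + WEAK_ONE
    }
}
```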

Someone should probably review the Arc changes carefully, in case I ruined something important.

@rust-highfive (Collaborator)

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

@Gankra (Contributor, Author) commented Apr 1, 2017

Would be good to have some heavy Rc/Arc users try this out. (Servo?)

@alexcrichton added the T-libs-api label (Relevant to the library API team, which will review and decide on the PR/issue.) on Apr 1, 2017
@alexcrichton (Member)

Seems nifty to me!

My main comment would be the switch from AtomicUsize to AtomicU32. I believe we guarantee that AtomicUsize is available on all our platforms currently, but we don't guarantee that AtomicU32 is (although it may currently be available on all platforms, I'm not sure).

If we add a platform without AtomicU32 that just means that someone will have to munge around and get AtomicUsize working with special casing for 64-bit, so it's mostly just a matter of whether that happens now or later. I'd personally be ok with either, just want to make sure we're on board with either decision.
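
For what it's worth, the special casing could be as small as a cfg'd type alias. This is a sketch assuming the target_has_atomic cfg, not something in this PR:

```rust
// Use the 32-bit atomic where it exists...
#[cfg(target_has_atomic = "32")]
type Count = core::sync::atomic::AtomicU32;

// ...and fall back to the pointer-sized atomic (giving up the size win)
// on targets that only guarantee AtomicUsize.
#[cfg(not(target_has_atomic = "32"))]
type Count = core::sync::atomic::AtomicUsize;
```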

@Gankra (Contributor, Author) commented Apr 1, 2017

Yes, I wasn't certain about platform availability or performance quirks; my primary concern was 16-bit, but I was assured libstd didn't (and never would?) support those targets.

@alexcrichton (Member)

Hm, yeah, so the main "tiny target" I know of is AVR, although I don't know whether it has the standard library compiling (i.e. anything other than libcore). As before, though, it's certainly possible to implement Arc on these platforms; it just requires extra work, and we need to decide whether that should happen now or later on demand.

@jonas-schievink (Contributor)

There's also the MSP430 series of 16-bit microcontrollers which is somewhat supported by Rust. I don't think any of them has any part of libstd compiling.

@Gankra (Contributor, Author) commented Apr 2, 2017

I'm really quite suspicious of Arc on 16-bit (dynamic heap allocations and multithreading on hardware that weak?), but I'll be honest that I don't actually know what people are using these platforms for.

@alexcrichton (Member)

We discussed this PR at libs triage the other day, and some conclusions were:

  • It's somewhat reasonable that a super beefy server may actually blow the limit here. For example, if you have 1TB of RAM (not unheard of these days) and store a bunch of refcounted things, it may be possible to blow the 32-bit limit.
  • On the other hand, we may not wish to tailor the standard library for niche use cases. It may be reasonable for a "usize" reference count to move to crates.io while a "u32" reference count stays in the standard library.

We concluded that the next steps here are likely to gather benchmarks both about memory usage and performance and see how it fares. After that we can evaluate the change for going into libstd.

@shepmaster (Member)

the next steps here are likely to gather benchmarks both about memory usage and performance and see how it fares

@alexcrichton do you have an idea in mind for who would do these benchmarks? For now, I'll assume you want @gankro to do it. Please feel free to update accordingly.

@shepmaster added the S-waiting-on-author label (Status: This is awaiting some action (such as code changes or more information) from the author.) on Apr 14, 2017
@Gankra (Contributor, Author) commented Apr 14, 2017

@shepmaster I'm pretty busy and don't have any good idea of what a "representative" workload would be. Someone who actually uses this API really needs to chime in.

@alexcrichton I'm still pretty confident that even if you have 1TB of RAM, when you fill 17GB of it with pointers to a single value, you should be considering some serious architectural changes. My understanding is that these behemoth servers are either caching their whole disk in RAM or manipulating some huge database (hadoop?). Neither of which should involve intensive Rc usage.

Either way I agree it's not something we should support.

@arielb1 (Contributor) commented Apr 16, 2017

@gankro

I'm not sure. The consequence (undesirable abort on overflow) could be quite painful & confusing if hit, especially on an edge case.

@ghost commented Apr 16, 2017

I'm not sure. The consequence (undesirable abort on overflow) could be quite painful & confusing if hit, especially on an edge case.

Abort on 32-bit overflow can be avoided. Here's one idea:

Of 64 bits for the reference counts, 32 are reserved for strong count, 31 for weak count, and 1 for a flag that signifies that the counts got really large. If the flag is not set, treat strong and weak counts as usual.

If they start overflowing, set the flag and allocate on the heap a separate tuple (u64, u64) with wider strong and weak counts. Now the 63 bits that held narrow counts can instead hold the address of the allocated pair of wide counts.
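
A non-atomic sketch of that scheme, to make the layout concrete (SpilledCounts, SPILL_FLAG, and the exact bit positions are all illustrative; the Arc version would additionally need to perform the spill with atomics):

```rust
const SPILL_FLAG: u64 = 1 << 63;
const STRONG_MASK: u64 = 0xFFFF_FFFF;

// Wide counts, heap-allocated only once the inline ones overflow.
struct SpilledCounts { strong: u64, weak: u64 }

fn inc_strong(word: &mut u64) {
    if *word & SPILL_FLAG == 0 {
        // Inline mode: strong count in bits 0..32, weak count in bits 32..63.
        match (*word as u32).checked_add(1) {
            Some(s) => *word = (*word & !STRONG_MASK) | s as u64,
            None => {
                // Strong count overflowed: move both counts to the heap and
                // stash the pointer (top bit assumed clear on mainstream
                // 64-bit targets) plus the flag in the old slot.
                let wide = Box::new(SpilledCounts {
                    strong: (u32::MAX as u64) + 1,
                    weak: (*word >> 32) & 0x7FFF_FFFF,
                });
                *word = Box::into_raw(wide) as u64 | SPILL_FLAG;
            }
        }
    } else {
        // Spilled mode: follow the pointer to the wide counts.
        let wide = (*word & !SPILL_FLAG) as *mut SpilledCounts;
        unsafe { (*wide).strong += 1 };
    }
}
```

Freeing the spilled counts when the last reference goes away, and doing the transition atomically for Arc, is where the real complexity (and the extra branch on every clone) would come from.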

I've heard that ARC in Swift/Obj-C uses tricks like this one. Perhaps @gankro knows more about this.

@Gankra (Contributor, Author) commented Apr 16, 2017

I've never heard of that, sorry. The refcounting impl was rewritten recently, too.

Another option is to just have '!0' be a sentinel that never increments/decrements, so that the memory just leaks forever. IIRC Firefox has used this failure mode?
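
Roughly, the sticky-sentinel behaviour on the strong count would be (a sketch of the idea, not something in this PR):

```rust
// Once the count saturates it never moves again, so instead of aborting,
// the allocation is simply leaked.
fn inc_strong(count: &mut u32) {
    if *count != u32::MAX {
        *count += 1;
    }
}

fn dec_strong(count: &mut u32) -> bool {
    if *count == u32::MAX {
        return false; // saturated: never reaches zero, never freed
    }
    *count -= 1;
    *count == 0 // caller drops the value and frees the allocation when true
}
```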

@arthurprs (Contributor)

It's possible to set MAX_REFCOUNT higher, say 3.2 billion; that's 25+GB worth of pointers!

@arielb1 added the S-waiting-on-team label (Status: Awaiting decision from the relevant subteam (see the T-<team> label).) and removed the S-waiting-on-author label on Apr 18, 2017
@alexcrichton (Member)

@gankro

Someone who actually uses this API really needs to chime in.

One idea we had in mind was Servo (which uses refcounting like this, right?). In triage, though, we didn't have many other concrete ideas; we just figured there'd already be some to motivate the PR.

I'm still pretty confident that even if you have 1TB of RAM, when you fill 17GB of it with pointers to a single value, you should be considering some serious architectural changes.

Yeah so we didn't rule out this change at all, to be clear. We just wanted to get some numbers both ways to evaluate it. I could imagine such a scenario being useful to someone, but if it's only useful to one person then it may not be worth keeping in libstd.


@stjepang

Abort on 32-bit overflow can be avoided. Here's one idea:

An interesting idea! This may get somewhat tricky in the Arc case due to concurrency, but could be a plausible solution. I'd be curious to evaluate the performance of such a change though because it'd be adding a conditional branch to all calls to clone, which are currently quite cheap.

It's also still unclear (to me at least) that we'd need such an implementation. If we choose to go to 32-bit refcounting I'd expect libstd to just stick with that. If you know your use case needs 64-bit reference counting then we'd have a crate on crates.io for you.

@nagisa (Member) commented Apr 24, 2017

If we’re optimising for the size of Rc/Arc and doing a sort-of-silently-breaking-but-not-likely change, why not just add generic parameters for the size of both the strong and weak counts? i.e. Rc<T, Strong=u32, Weak=u32>. I realise the suggestion is silly, though, because writing code generic over the type of the refcounts is going to be a pain.
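
For reference, a minimal sketch of what the count abstraction would have to look like (everything here is hypothetical), which already hints at the pain:

```rust
// Hypothetical trait a count-generic Rc<T, C: Count = u32> would need; every
// refcount operation in Rc would have to be written against it.
trait Count: Copy {
    const ONE: Self;
    fn checked_inc(self) -> Option<Self>;
    fn dec(self) -> Self;
    fn is_zero(self) -> bool;
}

impl Count for u32 {
    const ONE: Self = 1;
    fn checked_inc(self) -> Option<Self> { self.checked_add(1) }
    fn dec(self) -> Self { self - 1 }
    fn is_zero(self) -> bool { self == 0 }
}
```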

@brson added the relnotes label (Marks issues that should be documented in the release notes of the next release.) on Apr 25, 2017
@alexcrichton (Member)

Ok the libs team discussed this again, and our conclusion was that we basically just need some concrete number of some form to land this. @gankro would you be willing to write an arbitrary benchmark, measure it, show the improvement, and then update the PR to land it?

(one update we agreed was needed was to update the docs to reflect this fact)

@arthurprs (Contributor)

We're looking at memory savings here, right? That depends heavily on the ref count (as you need space to hold the pointers to the Arc). For the sake of example, if you have 4 refs on average you get a saving of 6.25% for T's of 24 bytes (https://www.desmos.com/calculator/4yvi1px768).

@arthurprs (Contributor) commented Apr 26, 2017

Thinking more about this, 4 references on average might be a high number for real applications. So the actual savings might be larger.

@alexcrichton (Member)

We definitely acknowledge that mathematically this is a win. We just want some concrete piece of evidence showing it as well.

@SimonSapin (Contributor)

We're looking at memory saving here right? That depends heavily on the ref count

If I understand correctly, it does not. Arc<T> itself (of which there is one per reference) is a single pointer to ArcInner<T>. The reference counts are inside ArcInner. This PR affects the size of ArcInner, not Arc. Only the latter is repeated for each reference.
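
To make that concrete, a quick check (a sketch; the exact ArcInner layout is an implementation detail, but Arc<T> being pointer-sized holds for any sized T):

```rust
use std::mem::size_of;
use std::sync::Arc;

fn main() {
    // Each reference is just one pointer, however the counts are sized.
    assert_eq!(size_of::<Arc<[u8; 100]>>(), size_of::<usize>());
    // The counts live once, in the shared heap allocation next to the value;
    // that allocation is the only thing this PR shrinks.
}
```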

@Gankra (Contributor, Author) commented Apr 30, 2017

@SimonSapin yes, and memory savings on ArcBox leads to 1:Count memory savings on Arc. e.g. 1:1 savings if all your counts are always 1.

@SimonSapin (Contributor)

Ah, I think I misunderstood what we were calculating, sorry :)

@alexcrichton added the S-waiting-on-author label and removed the S-waiting-on-team label on May 9, 2017
@carols10cents (Member) commented May 15, 2017

friendly ping @gankro, just want to keep this on your radar!

@Gankra (Contributor, Author) commented May 16, 2017

I'm stretched too thin to do the necessary legwork on this, sorry.

@Gankra closed this on May 16, 2017
@arthurprs (Contributor)

If there's interest from the libs team, I'm sure others can pick this up; otherwise I can do it.

Personally I think we can increase the max ref limit to 3 billion (25GB worth of pointers) and call it a day by either aborting or leaking. Anyone really expecting anything close to a billion Arcs is probably writing its own ref counting (without Weak, for example).

@SimonSapin (Contributor)

@bholley How do you think this would impact Stylo? (Assuming we hadn’t moved yet to something that doesn’t support Weak.) What would be interesting to measure?

@bholley (Contributor) commented May 23, 2017

The main impact of this change would be a marginal decrease in memory footprint, as well as likely a small (but probably hard-to-measure) increase in cache hits due to better data packing. I doubt it would be worthwhile to try to show concrete reproducible increases in stylo. Measurements affected by cache performance tend to be pretty noisy.
