Move Rc and Arc to 32-bit reference counts #41005

Closed
wants to merge 2 commits

Conversation

@Gankra (Contributor) commented Apr 1, 2017

Why this is profitable: This represents a potential memory savings on 64-bit platforms, although the allocator may happily overallocate, yielding no practical reduction in memory footprint for any given type. This can also reduce the pressure to vend a version of Rc/Arc that doesn't have weak counts: if you're storing a 64-bit aligned type (anything with a pointer/index), removing the weak count represents no potential space savings, as it would just be replaced with padding.
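
To make the padding argument concrete, here is a minimal sketch (the structs below are illustrative stand-ins for the heap side of Rc, not the real RcBox, and the sizes assume a typical 64-bit target):

```rust
#![allow(dead_code)]
use std::mem::size_of;

// Illustrative stand-ins for the heap side of Rc; the u64 payload models
// "anything with a pointer/index", i.e. an 8-byte-aligned value.
struct CountsUsize { strong: usize, weak: usize, value: u64 }
struct CountsU32 { strong: u32, weak: u32, value: u64 }
struct CountsU32NoWeak { strong: u32, value: u64 }

fn main() {
    // On a typical 64-bit target:
    assert_eq!(size_of::<CountsUsize>(), 24); // 8 + 8 + 8
    assert_eq!(size_of::<CountsU32>(), 16);   // 4 + 4 + 8: the 8-byte win
    // With u32 counts, also dropping the weak count buys nothing for an
    // 8-byte-aligned payload; the space comes back as padding.
    assert_eq!(size_of::<CountsU32NoWeak>(), 16);
}
```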

Why this is fine: We already pay the cost of overflow checks on our reference counts for esoteric safety reasons. So the only concern is that this will break applications that would exceed a 31-bit counter (1 bit is reserved). Any application that overflows a 31-bit counter on 64-bit is either doing mem::forget nonsense (and is broken) or has ~17GB committed to pointing to a single Rc allocation 2 billion times (and is almost certainly broken).
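
For the record, the arithmetic behind the ~17GB figure, assuming each outstanding reference is held as at least one 8-byte pointer:

```rust
fn main() {
    // Overflowing a 31-bit counter needs 2^31 strong references, and each live
    // reference occupies at least one 8-byte pointer on a 64-bit target.
    let refs: u64 = 1 << 31;
    let bytes = refs * 8;
    // Prints: 17179869184 bytes ≈ 17.2 GB (i.e. 16 GiB)
    println!("{} bytes ≈ {:.1} GB", bytes, bytes as f64 / 1e9);
}
```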

Prior art: both Swift and Firefox already use 32-bit counts (Swift having weak references as well).

This implementation is currently a lazy minimal implementation. It's possible that fusing the weak/strong counts into a single u64 would enable "fused" operations which check/manipulate both counts at once. This was more work than I was willing to invest into this change at this point in time, but is interesting future work.
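
For illustration, a fused layout might look roughly like this (the names, bit layout, and orderings are assumptions for the sketch, not something in this PR; overflow handling is elided):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Both counts packed into one word: strong in the low 32 bits, weak in the
// high 32. All names here are made up for illustration.
struct FusedCounts(AtomicU64);

const STRONG_ONE: u64 = 1;
const WEAK_ONE: u64 = 1 << 32;

impl FusedCounts {
    // Clone becomes a single fetch_add touching one cache line; the overflow
    // check on the low half is elided to keep the sketch short.
    fn inc_strong(&self) -> u64 {
        self.0.fetch_add(STRONG_ONE, Ordering::Relaxed)
    }

    // A "fused" read: one load observes both counts at once, which is what an
    // is_unique-style check (strong == 1 && weak == 1) wants.
    fn is_unique(&self) -> bool {
        self.0.load(Ordering::Acquire) == STRONG_ONE + WEAK_ONE
    }
}
```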

Someone should probably review the Arc changes carefully, in case I ruined something important.

@rust-highfive (Collaborator)

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

@Gankra (Contributor, Author) commented Apr 1, 2017

Would be good to have some heavy Rc/Arc users try this out. (Servo?)

@alexcrichton added the T-libs-api label (Relevant to the library API team, which will review and decide on the PR/issue.) on Apr 1, 2017
@alexcrichton (Member)

Seems nifty to me!

My main comment would be the switch from AtomicUsize to AtomicU32. I believe we guarantee that AtomicUsize is available on all our platforms currently, but we don't guarantee that AtomicU32 is (although it may currently be available on all platforms, I'm not sure).

If we add a platform without AtomicU32 that just means that someone will have to munge around and get AtomicUsize working with special casing for 64-bit, so it's mostly just a matter of whether that happens now or later. I'd personally be ok with either, just want to make sure we're on board with either decision.
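
For what it's worth, the special casing could be as small as a cfg'd type alias. This is a sketch assuming the target_has_atomic cfg, not something in this PR:

```rust
// Use the 32-bit atomic where it exists...
#[cfg(target_has_atomic = "32")]
type Count = core::sync::atomic::AtomicU32;

// ...and fall back to the pointer-sized atomic (giving up the size win)
// on targets that only guarantee AtomicUsize.
#[cfg(not(target_has_atomic = "32"))]
type Count = core::sync::atomic::AtomicUsize;
```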

@Gankra (Contributor, Author) commented Apr 1, 2017

Yes, I wasn't certain about platform availability or performance quirks; my primary concern was 16-bit, but I was assured libstd didn't (and never would?) support those targets.

@alexcrichton (Member)

Hm, yeah, so the main "tiny target" I know of is AVR, although I don't know whether it has the standard library compiling (i.e. anything other than libcore). As before, though, it's certainly possible to implement Arc on these platforms; it just requires extra work, and we need to decide whether that should happen now or later on demand.

@jonas-schievink (Contributor)

There's also the MSP430 series of 16-bit microcontrollers which is somewhat supported by Rust. I don't think any of them has any part of libstd compiling.

@Gankra (Contributor, Author) commented Apr 2, 2017

I'm really quite suspicious of Arc on 16-bit (dynamic heap allocations and multithreading on hardware that weak?), but I'll be honest that I don't actually know what people are using these platforms for.

@alexcrichton (Member)

We discussed this PR at libs triage the other day, and some conclusions were:

  • It's somewhat reasonable that a super beefy server may actually blow the limit here. For example, if you have 1TB of RAM (not unheard of these days) and store a bunch of refcounted things, it may be possible to blow the 32-bit limit.
  • On the other hand, we may not wish to tailor the standard library for niche use cases. It may be reasonable for a "usize" reference count to move to crates.io while a "u32" reference count stays in the standard library.

We concluded that the next steps here are likely to gather benchmarks both about memory usage and performance and see how it fares. After that we can evaluate the change for going into libstd.

@shepmaster (Member)

the next steps here are likely to gather benchmarks both about memory usage and performance and see how it fares

@alexcrichton do you have an idea in mind for who would do these benchmarks? For now, I'll assume you want @gankro to do it. Please feel free to update accordingly.

@shepmaster added the S-waiting-on-author label (Status: This is awaiting some action (such as code changes or more information) from the author.) on Apr 14, 2017
@Gankra (Contributor, Author) commented Apr 14, 2017

@shepmaster I'm pretty busy and don't have any good idea of what a "representative" workload would be. Someone who actually uses this API really needs to chime in.

@alexcrichton I'm still pretty confident that even if you have 1TB of RAM, when you fill 17GB of it with pointers to a single value, you should be considering some serious architectural changes. My understanding is that these behemoth servers are either caching their whole disk in RAM or manipulating some huge database (hadoop?). Neither of which should involve intensive Rc usage.

Either way I agree it's not something we should support.

@arielb1 (Contributor) commented Apr 16, 2017

@gankro

I'm not sure. The consequence (undesirable abort on overflow) could be quite painful & confusing if hit, especially on an edge case.

@ghost commented Apr 16, 2017

I'm not sure. The consequence (undesirable abort on overflow) could be quite painful & confusing if hit, especially on an edge case.

Abort on 32-bit overflow can be avoided. Here's one idea:

Of 64 bits for the reference counts, 32 are reserved for strong count, 31 for weak count, and 1 for a flag that signifies that the counts got really large. If the flag is not set, treat strong and weak counts as usual.

If they start overflowing, set the flag and allocate on the heap a separate tuple (u64, u64) with wider strong and weak counts. Now the 63 bits that held narrow counts can instead hold the address of the allocated pair of wide counts.
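
A non-atomic sketch of that scheme, to make the layout concrete (SpilledCounts, SPILL_FLAG, and the exact bit positions are all illustrative; the Arc version would additionally need to perform the spill with atomics):

```rust
const SPILL_FLAG: u64 = 1 << 63;
const STRONG_MASK: u64 = 0xFFFF_FFFF;

// Wide counts, heap-allocated only once the inline ones overflow.
struct SpilledCounts { strong: u64, weak: u64 }

fn inc_strong(word: &mut u64) {
    if *word & SPILL_FLAG == 0 {
        // Inline mode: strong count in bits 0..32, weak count in bits 32..63.
        match (*word as u32).checked_add(1) {
            Some(s) => *word = (*word & !STRONG_MASK) | s as u64,
            None => {
                // Strong count overflowed: move both counts to the heap and
                // stash the pointer (top bit assumed clear on mainstream
                // 64-bit targets) plus the flag in the old slot.
                let wide = Box::new(SpilledCounts {
                    strong: (u32::MAX as u64) + 1,
                    weak: (*word >> 32) & 0x7FFF_FFFF,
                });
                *word = Box::into_raw(wide) as u64 | SPILL_FLAG;
            }
        }
    } else {
        // Spilled mode: follow the pointer to the wide counts.
        let wide = (*word & !SPILL_FLAG) as *mut SpilledCounts;
        unsafe { (*wide).strong += 1 };
    }
}
```

Freeing the spilled counts when the last reference goes away, and doing the transition atomically for Arc, is where the real complexity (and the extra branch on every clone) would come from.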

I've heard that ARC in Swift/Obj-C uses tricks like this one. Perhaps @gankro knows more about this.

@Gankra (Contributor, Author) commented Apr 16, 2017

I've never heard of that, sorry. The refcounting impl was rewritten recently, too.

Another option is to just have '!0' be a sentinel that never increments/decrements, so that the memory just leaks forever. IIRC Firefox has used this failure mode?
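
Roughly, the sticky-sentinel behaviour on the strong count would be (a sketch of the idea, not something in this PR):

```rust
// Once the count saturates it never moves again, so instead of aborting,
// the allocation is simply leaked.
fn inc_strong(count: &mut u32) {
    if *count != u32::MAX {
        *count += 1;
    }
}

fn dec_strong(count: &mut u32) -> bool {
    if *count == u32::MAX {
        return false; // saturated: never reaches zero, never freed
    }
    *count -= 1;
    *count == 0 // caller drops the value and frees the allocation when true
}
```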

@arthurprs (Contributor)

It's possible to set MAX_REFCOUNT higher, say 3.2 billion; that's 25+GB worth of pointers!

@arielb1 added the S-waiting-on-team label (Status: Awaiting decision from the relevant subteam (see the T-<team> label).) and removed the S-waiting-on-author label on Apr 18, 2017
@alexcrichton (Member)

@gankro

Someone who actually uses this API really needs to chime in.

One idea we had in mind was Servo (which uses refcounting like this, right?). In triage, though, we didn't have many other concrete ideas; we just figured there'd already be some to motivate the PR.

I'm still pretty confident that even if you have 1TB of RAM, when you fill 17GB of it with pointers to a single value, you should be considering some serious architectural changes.

Yeah so we didn't rule out this change at all, to be clear. We just wanted to get some numbers both ways to evaluate it. I could imagine such a scenario being useful to someone, but if it's only useful to one person then it may not be worth keeping in libstd.


@stjepang

Abort on 32-bit overflow can be avoided. Here's one idea:

An interesting idea! This may get somewhat tricky in the Arc case due to concurrency, but could be a plausible solution. I'd be curious to evaluate the performance of such a change though because it'd be adding a conditional branch to all calls to clone, which are currently quite cheap.

It's also still unclear (to me at least) that we'd need such an implementation. If we choose to go to 32-bit refcounting I'd expect libstd to just stick with that. If you know your use case needs 64-bit reference counting then we'd have a crate on crates.io for you.

@nagisa (Member) commented Apr 24, 2017

If we’re optimising for the size of Rc/Arc and doing a sort-of-silently-breaking-but-not-likely change, why not just add generic parameters for the size of both the strong and weak counts? i.e. Rc<T, Strong=u32, Weak=u32>. I realise the suggestion is silly, though, because writing code generic over the type of the refcounts is going to be a pain.
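
For reference, a minimal sketch of what the count abstraction would have to look like (everything here is hypothetical), which already hints at the pain:

```rust
// Hypothetical trait a count-generic Rc<T, C: Count = u32> would need; every
// refcount operation in Rc would have to be written against it.
trait Count: Copy {
    const ONE: Self;
    fn checked_inc(self) -> Option<Self>;
    fn dec(self) -> Self;
    fn is_zero(self) -> bool;
}

impl Count for u32 {
    const ONE: Self = 1;
    fn checked_inc(self) -> Option<Self> { self.checked_add(1) }
    fn dec(self) -> Self { self - 1 }
    fn is_zero(self) -> bool { self == 0 }
}
```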

@brson added the relnotes label (Marks issues that should be documented in the release notes of the next release.) on Apr 25, 2017
@alexcrichton (Member)

Ok the libs team discussed this again, and our conclusion was that we basically just need some concrete number of some form to land this. @gankro would you be willing to write an arbitrary benchmark, measure it, show the improvement, and then update the PR to land it?

(one update we agreed was needed was to update the docs to reflect this fact)

@arthurprs (Contributor)

We're looking at memory savings here, right? That depends heavily on the ref count (as you need space to hold the pointers to the Arc). For the sake of example, if you have 4 refs on average you get a saving of 6.25% for T's of 24 bytes (https://www.desmos.com/calculator/4yvi1px768).

@arthurprs (Contributor) commented Apr 26, 2017

Thinking more about this, 4 references on average might be a high number for real applications. So the actual savings might be larger.

@alexcrichton (Member)

We definitely acknowledge that mathematically this is a win. We just want some concrete piece of evidence showing it as well.

@SimonSapin (Contributor)

We're looking at memory saving here right? That depends heavily on the ref count

If I understand correctly, it does not. Arc<T> itself (of which there is one per reference) is a single pointer to ArcInner<T>. The reference counts are inside ArcInner. This PR affects the size of ArcInner, not Arc. Only the latter is repeated for each reference.
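
To make that concrete, a quick check (a sketch; the exact ArcInner layout is an implementation detail, but Arc<T> being pointer-sized holds for any sized T):

```rust
use std::mem::size_of;
use std::sync::Arc;

fn main() {
    // Each reference is just one pointer, however the counts are sized.
    assert_eq!(size_of::<Arc<[u8; 100]>>(), size_of::<usize>());
    // The counts live once, in the shared heap allocation next to the value;
    // that allocation is the only thing this PR shrinks.
}
```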

@Gankra (Contributor, Author) commented Apr 30, 2017

@SimonSapin yes, and memory savings on ArcBox leads to 1:Count memory savings on Arc. e.g. 1:1 savings if all your counts are always 1.

@SimonSapin (Contributor)

Ah, I think I misunderstood what we were calculating, sorry :)

@alexcrichton added the S-waiting-on-author label and removed the S-waiting-on-team label on May 9, 2017
@carols10cents (Member) commented May 15, 2017

friendly ping @gankro, just want to keep this on your radar!

@Gankra (Contributor, Author) commented May 16, 2017

I'm stretched too thin to do the necessary legwork on this, sorry.

@Gankra closed this on May 16, 2017
@arthurprs (Contributor)

If there's interest from the libs team, I'm sure others can pick this up; otherwise I can do it.

Personally I think we can increase the max ref limit to 3 billion (25GB worth of pointers) and call it a day by either aborting or leaking. Anyone really expecting anything close to a billion Arcs is probably writing its own ref counting (without Weak, for example).

@SimonSapin (Contributor)

@bholley How do you think this would impact Stylo? (Assuming we hadn’t moved yet to something that doesn’t support Weak.) What would be interesting to measure?

@bholley (Contributor) commented May 23, 2017

The main impact of this change would be a marginal decrease in memory footprint, as well as likely a small (but probably hard-to-measure) increase in cache hits due to better data packing. I doubt it would be worthwhile to try to show concrete reproducible increases in stylo. Measurements affected by cache performance tend to be pretty noisy.
