
Make Rc and Arc mem::forget safe #27174

Merged
merged 2 commits into rust-lang:master from Gankro:rc-sat on Jul 30, 2015

Conversation

Gankro (Contributor, Author) commented Jul 21, 2015

rust-highfive (Collaborator) commented Jul 21, 2015

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@Gankro Gankro force-pushed the Gankro:rc-sat branch from 4eaf357 to 41d1ab5 Jul 21, 2015

alexcrichton (Member) commented Jul 21, 2015

Is it worth evaluating the performance impact and only doing these checks on 32-bit?

brson (Contributor) commented Jul 21, 2015

I'm also not clear on the impact on 64-bit platforms, the original report says it's a problem on both.

brson (Contributor) commented Jul 21, 2015

Can you add test cases to prove that these do what they say?

@brson brson added the relnotes label Jul 21, 2015

Gankro (Contributor, Author) commented Jul 21, 2015

You can't do this only on 32-bit because LLVM will optimize mem::forget loops into a single add.
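
A minimal sketch of the attack under discussion: cloning an Rc and forgetting the clone leaks an increment with no matching decrement, and a loop of such increments is exactly what LLVM can collapse into one add.

```rust
use std::mem;
use std::rc::Rc;

fn main() {
    let rc = Rc::new(0u8);
    // Each clone bumps the strong count; mem::forget skips the Drop
    // that would decrement it, so the count only ever goes up. A
    // sufficiently smart compiler can turn this loop into a single
    // add of 1_000 -- which is why overflow is reachable in practice.
    for _ in 0..1_000 {
        mem::forget(rc.clone());
    }
    assert_eq!(Rc::strong_count(&rc), 1_001);
}
```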

Gankro (Contributor, Author) commented Jul 21, 2015

I was going to say that it seems weird to have a test that counts to usize::MAX (it would take far too long), but maybe LLVM can take us straight to the hoop.

alexcrichton (Member) commented Jul 21, 2015

You can't do this only on 32-bit because LLVM will optimize mem::forget loops into a single add.

I can see how the previous "bump refcount" code might optimize to this, but with the saturating_add instead of just += 1 this may be enough to throw off LLVM's optimizations (especially in the Arc case).
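
For context, a rough sketch of the saturating strategy being weighed here (the type and field names are illustrative, not the actual liballoc internals):

```rust
use std::cell::Cell;

// Illustrative stand-in for the strong-count field inside Rc's heap box.
struct StrongCount {
    count: Cell<usize>,
}

impl StrongCount {
    fn inc(&self) {
        // saturating_add clamps at usize::MAX instead of wrapping, so
        // the count can never roll back over to zero -- a saturated Rc
        // simply leaks forever rather than being freed while alive.
        self.count.set(self.count.get().saturating_add(1));
    }
}
```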

Gankro (Contributor, Author) commented Jul 21, 2015

I don't understand what you're proposing. That this change prevents LLVM from fast-forwarding the exploit to a practical level, and therefore it's fine not to make this change on some platforms...?

alexcrichton (Member) commented Jul 21, 2015

Oh, right! Duh.

May not optimize to one addition on the Arc case though?

Gankro (Contributor, Author) commented Jul 21, 2015

Yes I'll do some tests.

That said, I'm not comfortable with leaving memory safety holes in Rust on the premise that they're impractical (edit: particularly if they rely on a "sufficiently stupid compiler" argument) and that preventing them has overhead.

I would be interested in including a cfg(allow_preposterous_unsafety) for those who Just Don't Care.

Gankro (Contributor, Author) commented Jul 21, 2015

Arc does indeed fail to be optimized to a single add today. Some back-of-the-envelope math says that on 64-bit, with a single thread just incrementing (probably the best case -- contention from multiple threads incrementing is probably worse) on a 4GHz processor, it would take about 150 years to overflow.

Assuming a sufficiently stupid compiler.
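
The arithmetic behind that figure checks out, assuming one increment per clock cycle on a single 4 GHz core:

```rust
fn main() {
    // One atomic increment per cycle on a 4 GHz core.
    let per_second = 4.0e9_f64;
    let per_year = per_second * 60.0 * 60.0 * 24.0 * 365.0;
    // How long until a u64 counter wraps at this rate?
    let years = u64::MAX as f64 / per_year;
    println!("{years:.0} years to overflow a 64-bit count"); // ~146 years
}
```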

Gankro (Contributor, Author) commented Jul 21, 2015

That said, do we want to include a test for this if shifts in compiler optimizations and platform-specific details could turn it into a 150-year test?

nikomatsakis (Contributor) commented Jul 21, 2015

I'd like to see the performance measured, but I'm not sure what a reasonable benchmark is. Perhaps something in Servo uses Arc? Microbenchmarks certainly showed small (but not negligible) impact. All things considered, I'm in favor of landing this; I can imagine this being a problem in some code somewhere, particularly on 32-bit. But it always seems better to go in with data.

(That said, I'm not sure I find the arguments about "sufficiently smart compilers" all that persuasive: it assumes a mem-forget in a loop, right? That doesn't seem like a very realistic scenario, as compared to say some kind of bug that causes a leak for every web connection. But I don't know, it could happen.)

Gankro (Contributor, Author) commented Jul 21, 2015

@pcwalton has suggested we test the perf impact by pointing Servo at Wikipedia and profiling with -p 1.

Gankro (Contributor, Author) commented Jul 23, 2015

The results on Servo:

Wikipedia home: https://gist.github.com/Gankro/bed5fee96667c25a4f1c
Wikipedia Guardians of the Galaxy: https://gist.github.com/Gankro/61b2141c205b3ceb4d9d

Talking it over on #servo, this seemed to be "in the noise".

Micro benchmarks from the original thread by niko (checking instead if x < 0 which presumably shouldn't make a difference?) showed noticeable but fairly low impact (10-20% in the Arc case):

https://gist.github.com/nikomatsakis/8ad11f16fd2d24b51b25

I can run any benchmarks people want to see but I must admit I am not the best at creating "good" performance evaluations.


alexcrichton (Member) commented Jul 23, 2015

Thanks for doing the analysis @Gankro! I think this is fine to merge, but I'd like to briefly discuss it in triage next week.

larsberg commented Jul 24, 2015

I think you've got the wrong lars berg, maybe you want @larsbergstrom?

Gankro (Contributor, Author) commented Jul 24, 2015

Oops, sorry @larsberg!

dgrunwald (Contributor) commented Jul 26, 2015

As discussed in the internals thread, the saturating solution means that wider-than-address-space stores accepting arbitrary types cannot offer a safe API. Is this really a good trade-off?

For example, I could imagine a CompressingVec<T> that offers a similar public API to Vec<T> but uses gzip compression on the bytes stored in it. Do we want to forbid people from writing types like CompressingVec<T> and offering a safe API?

If we don't create and document a rule forbidding such wider-than-address-space stores in safe Rust, the assumption that a saturated refcount can never again reach zero is invalid, and Rc stays unsafe.

Gankro (Contributor, Author) commented Jul 26, 2015

@dgrunwald I don't really see why anyone would assume they could do such a thing.

dgrunwald (Contributor) commented Jul 27, 2015

Why not? If I have ownership of a value and can move it, why shouldn't I be able to move its bits to disk, and back later?
A larger-than-memory store seems like a reasonable thing to write in a systems programming language; I don't think it's obvious that we want to forbid those just so that Rc can make an assumption about the number of live values.

Gankro (Contributor, Author) commented Jul 27, 2015

Is this something many (any) languages support at all? Is this something anyone actually does in practice? It sounds like a big headache for not much gain.

Note that I don't believe this prevents you from mem-mapping files and doing magic compression with a custom FS or whatever. You just have to have everything uniquely addressable by a pointer, and the address space is 2^64 bytes (18 exabytes) on modern platforms!

pcwalton (Contributor) commented Jul 28, 2015

Do you have a citation on this? Has anybody ever written a std::vector that transparently compresses its contents? A quick Google search brings up no such example of C++ code in existence.

I'm really wary of introducing an extra test-and-branch for every single Rc manipulation for a theoretical use case. My inclination is to tell anybody who wants to write such a container to just use trait bounds to whitelist the kinds of values that can be stored within it.

dgrunwald (Contributor) commented Jul 28, 2015

I don't think the exact example of a zipping vector exists anywhere, but that's just one example from a large number of possible larger-than-memory stores (a more likely use case would be a large memory-mapped file where only sections of it are mapped into memory at a time).

I have actually written a generic vector with run-length compression in C#. That one used object.Equals() and cloned C# references, and would be doable in safe Rust using Eq + Clone -- but in cases without Clone, the equivalent type that works by comparing the underlying bits (and thus needs no bounds) might be nice to have.

Edit: Also, I don't think test-and-predictable-branch is going to be any more expensive than test-and-conditional-move (which is what a saturating add compiles to on x64).
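
A sketch of the run-length-compressed vector dgrunwald describes, using the Eq + Clone bounds mentioned above (RleVec is a hypothetical name, not an existing type):

```rust
// Hypothetical run-length-compressed vector: equal adjacent values
// are stored once with a count, so n logical elements can occupy
// far fewer than n slots of memory.
struct RleVec<T: Eq + Clone> {
    runs: Vec<(T, usize)>, // (value, run length)
}

impl<T: Eq + Clone> RleVec<T> {
    fn new() -> Self {
        RleVec { runs: Vec::new() }
    }

    fn push(&mut self, value: T) {
        match self.runs.last_mut() {
            // Eq lets us detect that the new value extends a run.
            Some((v, n)) if *v == value => *n += 1,
            _ => self.runs.push((value, 1)),
        }
    }

    fn len(&self) -> usize {
        self.runs.iter().map(|(_, n)| n).sum()
    }

    // Clone lets us hand back a value while keeping the single
    // stored copy -- this bounded version stays entirely safe.
    fn get(&self, mut index: usize) -> Option<T> {
        for (v, n) in &self.runs {
            if index < *n {
                return Some(v.clone());
            }
            index -= n;
        }
        None
    }
}
```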

Gankro (Contributor, Author) commented Jul 29, 2015

Discussing it today: while we concluded that no one cares about the external-memory thing, we do think it makes sense to be able to identify two values as identical and compress them by storing the value once and maintaining a count of how many times it's stored. As such, I will update this to abort instead -- if you get to this point your code is tragically broken anyway. We can later decide to permanently saturate instead (adding branches to Drop) if this proves onerous. In practice this is all irrelevant, since triggering this code will basically never happen: if you have a legit leak you'll OOM first. You need to be calling mem::forget.
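
A sketch of the abort-on-overflow increment that was settled on (illustrative names, not the exact liballoc code):

```rust
use std::cell::Cell;
use std::process;

// Illustrative stand-in for Rc's strong-count field.
struct StrongCount {
    count: Cell<usize>,
}

impl StrongCount {
    fn inc(&self) {
        let n = self.count.get();
        // Reaching usize::MAX requires on the order of 2^64
        // mem::forget calls (or a compiler collapsing them); the
        // program is already irreparably broken at that point, so
        // abort rather than wrap the count back to zero.
        if n == usize::MAX {
            process::abort();
        }
        self.count.set(n + 1);
    }
}
```

Compared with permanent saturation, aborting keeps Drop branch-free: the decrement path never has to ask whether the count is stuck at the ceiling.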

alexcrichton (Member) commented Jul 29, 2015

We talked about this at the libs triage meeting today, and the consensus was that we should merge with the abort strategy (as explained by @Gankro).

bluss (Contributor) commented Jul 29, 2015

Sounds good

@Gankro Gankro force-pushed the Gankro:rc-sat branch from 41d1ab5 to 8fda0b0 Jul 29, 2015

Gankro (Contributor, Author) commented Jul 29, 2015

@bors r=alexcrichton

bors (Contributor) commented Jul 29, 2015

📌 Commit 8fda0b0 has been approved by alexcrichton

@Gankro Gankro force-pushed the Gankro:rc-sat branch 2 times, most recently from 1d7c787 to c9effc8 Jul 29, 2015

@Gankro Gankro force-pushed the Gankro:rc-sat branch from c9effc8 to 22e2100 Jul 30, 2015

Gankro (Contributor, Author) commented Jul 30, 2015

@bors r=alexcrichton

bors (Contributor) commented Jul 30, 2015

📌 Commit 22e2100 has been approved by alexcrichton

bors (Contributor) commented Jul 30, 2015

⌛️ Testing commit 22e2100 with merge db2af71...

bors added a commit that referenced this pull request Jul 30, 2015

@bors bors merged commit 22e2100 into rust-lang:master Jul 30, 2015

2 checks passed:

continuous-integration/travis-ci/pr: The Travis CI build passed
homu: Test successful
comex (Contributor) commented Aug 4, 2015

@Gankro Any plans to refine this to (safely) avoid the check on 64-bit ABIs?

Gankro (Contributor, Author) commented Aug 4, 2015

@comex there is no safe way to avoid the check on Rc; LLVM is too smart. It's possible to ditch it on Arc if anyone can provide me concrete documentation on the C11 atomics stating that the compiler isn't allowed to combine several fetch_adds into one.
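
For reference, Arc's increment is an atomic read-modify-write along these lines (a simplified sketch; the open question above is whether the compiler is permitted to fuse several such fetch_adds into one):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Relaxed ordering suffices for an increment: creating a new
// reference needs no synchronization with other memory operations
// (the Acquire fence matters only on the final decrement).
fn inc_strong(count: &AtomicUsize) -> usize {
    count.fetch_add(1, Ordering::Relaxed)
}
```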

comex (Contributor) commented Aug 4, 2015

I believe it's possible to do it with the same trick as test::black_box (or an inlined version of that function): an empty asm! block. For instance, if you passed the pointer through there, LLVM would have no way of knowing if multiple increases went to the same Rc object, so it couldn't combine them.
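
The trick comex describes is exposed in modern Rust as std::hint::black_box (in 2015 it was an empty asm! block in the test crate). A sketch of how laundering the pointer hides it from the optimizer:

```rust
use std::cell::Cell;
use std::hint::black_box;

fn inc_opaque(count: &Cell<usize>) {
    // black_box launders the reference, so LLVM cannot prove that
    // two successive calls touch the same object and therefore
    // cannot fuse the increments into a single add.
    let c = black_box(count);
    c.set(c.get() + 1);
}
```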

Gankro (Contributor, Author) commented Aug 4, 2015

@comex I might be happy with this as long as we had a test that verifies that llvm never starts optimizing away empty asm... the black-box thing is a tenuous hack to my knowledge.

bluss (Contributor) commented Aug 4, 2015

@comex it sounds like that might cause more damage than it does good (preventing optimizations).

comex (Contributor) commented Aug 4, 2015

@bluss It shouldn't; as long as the function containing asm is inlined, the only constraint given to LLVM is that the pointer should be in a register, which it's almost certain to be anyway. Without any other extended effects (e.g. "memory"), clobber registers, or outputs, it should be able to safely assume that nothing other than that register is changed.

glaebhoerl (Contributor) commented Aug 8, 2015

Incidentally, I recall a discussion way back when about the possibility of optimizing space usage by using (u32, u32) instead of (usize, usize) for the reference counts. Back then, this idea was rejected on the grounds that it would be memory-unsafe if the counts were to overflow. Now that it turns out we need to explicitly check for overflow anyway, though, this reasoning no longer applies. The question is now only whether we consider it important to allow a reference-counted box to have over 4 billion references. (It's quite possible that we do; I just wanted to point out the possibility.)

This is tangentially related to rust-lang/rfcs#1232 in that both changes have the effect of halving the space overhead on 64-bit platforms. It also occurs to me that if we were to consider optimizing this overhead important enough to make Rc generic over whether it supports weak references, then perhaps we should also consider making it generic over the type used for the reference counts - so you could, for example, use Rc<SomeTypeWithAlignof1, NoWeak, u8> with an overhead of only a single byte per box if you didn't need weak references and were completely sure that there would never be more than 255 references to a given box at runtime.

(My personal sense, as I also commented on the other thread, is that these would all be premature optimizations until and unless there were empirical evidence showing concrete gains to be had. But the possibility is there.)
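
A sketch of the generic-count idea glaebhoerl floats here (the trait and its names are hypothetical; std's Rc is not parameterized this way):

```rust
// The count type only needs a starting value and a checked bump;
// returning None on overflow is where the abort/saturate policy
// would hook in.
trait RefCount: Copy {
    const ONE: Self;
    fn checked_inc(self) -> Option<Self>;
}

// A single byte of overhead, as in the Rc<_, NoWeak, u8> example.
impl RefCount for u8 {
    const ONE: Self = 1;
    fn checked_inc(self) -> Option<Self> {
        self.checked_add(1)
    }
}

// Half the usize overhead on 64-bit, capped at ~4 billion refs.
impl RefCount for u32 {
    const ONE: Self = 1;
    fn checked_inc(self) -> Option<Self> {
        self.checked_add(1)
    }
}
```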

Gankro (Contributor, Author) commented Aug 8, 2015

@glaebhoerl I agree with your analysis -- YAGNI
