Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canvas unsafe code in the wild #18

Open
nikomatsakis opened this Issue Sep 1, 2016 · 38 comments

Comments

Projects
None yet
@nikomatsakis
Copy link
Owner

nikomatsakis commented Sep 1, 2016

I think we should organize a kind of "canvas" to find examples of how unsafe code is used in the wild. To start, it'd be great to enumerate a list of interesting places to look.

Here is my start at a list. Further nominations welcome. I'll try to keep the list up to date. Moreover, if you feel you've examined the source, open any relate issues and we can check it off.

Other thoughts for packages? Is this a fruitful thing to examine?

@nikomatsakis nikomatsakis changed the title Unsafe code in the wild Canvas unsafe code in the wild Sep 1, 2016

@nikomatsakis nikomatsakis added the K-Task label Sep 2, 2016

@ubsan

This comment has been minimized.

Copy link
Collaborator

ubsan commented Sep 2, 2016

Well, I think pcb would be a nice place to start, it's very simple and has some unsafe code that would be seen as UB by the "correct lifetime" style. (example)

@nikomatsakis

This comment has been minimized.

Copy link
Owner Author

nikomatsakis commented Sep 3, 2016

@ubsan do you think that the "interesting patterns" in pcb are thus summarized by #12? If not, maybe you can write-up an issue that covers them?

@ubsan

This comment has been minimized.

Copy link
Collaborator

ubsan commented Sep 3, 2016

I think they're well enough summarized. (see here) referencing pcb

@eternaleye

This comment has been minimized.

Copy link
Collaborator

eternaleye commented Sep 8, 2016

Another use of unsafe code, this one more type-focused than lifetime-focused: https://github.com/Diggsey/query_interface

@Ericson2314

This comment has been minimized.

Copy link

Ericson2314 commented Sep 8, 2016

@nikomatsakis

This comment has been minimized.

Copy link
Owner Author

nikomatsakis commented Sep 8, 2016

This discussion of how coroutines and scoped threads interact seems relevant, probably more as a negative case (i.e., enumerate where the coroutine abstractions goes wrong):

https://www.reddit.com/r/rust/comments/508pkb/unleakable_crate_safetysanityrefocus/d72703d

@arielb1

This comment has been minimized.

Copy link
Collaborator

arielb1 commented Sep 8, 2016

This discussion of how coroutines and scoped threads interact seems relevant, probably more as a negative case (i.e., enumerate where the coroutine abstractions goes wrong):

That is more of a soundness issue than a memory model issue. scoped assumes that stack frames never leak - that a lifetime parameter passed to a function must live until the function exits (through a return or resume), while coroutines flat out violates that assumption.

@RalfJung

This comment has been minimized.

Copy link
Collaborator

RalfJung commented Sep 12, 2016

I don't entirely understand what is the criterion for listing pieces of the standard library. Certainly RefCell is a very interesting bit of unsafe code, showing how far interior mutability can go and playing games with lifetimes that I have not yet seen in any other piece of unsafe code.

@mystor

This comment has been minimized.

Copy link

mystor commented Sep 12, 2016

The problem with RefCell is that we aren't sure that its implementation won't invoke Undefined Behavior in a future model for what defines undefined behavior in unsafe rust. The fact that it holds a &'a reference which isn't actually valid for all of &'a, but only for either the lifetime of the Ref or RefMut, or 'a, whichever is shorter. This could be considered UB under some models as the lifetimes are "incorrect".

We aren't suggesting removing RefCell, it's just a question of whether its current implementation should be UB.

@RalfJung

This comment has been minimized.

Copy link
Collaborator

RalfJung commented Sep 12, 2016

What I am asking is why RefCell is not listed above as "how unsafe code is used in the wild".

@nikomatsakis

This comment has been minimized.

Copy link
Owner Author

nikomatsakis commented Sep 12, 2016

@mystor @RalfJung I think the only reason I did not list RefCell is that we have #5 already. My goal with this issue is not for it to exist permanently, just to serve as a clearing house of work to be done in terms of researching and then opening issues.

@RalfJung

This comment has been minimized.

Copy link
Collaborator

RalfJung commented Sep 12, 2016

I see, that makes sense.

Notice that the trouble I am having with RefCell in my semantic model is entirely unrelated to #5. However, I am not sure if the memory model would be affected by that...

@arielb1

This comment has been minimized.

Copy link
Collaborator

arielb1 commented Sep 12, 2016

Notice that the trouble I am having with RefCell in my semantic model is entirely unrelated to #5.

What's the trouble again?

@nikomatsakis

This comment has been minimized.

Copy link
Owner Author

nikomatsakis commented Sep 12, 2016

@RalfJung if you see some other aspect of ref-cell that's not documented, please do open another issue...

@RalfJung

This comment has been minimized.

Copy link
Collaborator

RalfJung commented Sep 12, 2016

I think the long version of this needs a blogpost, and I don't know when I will get around to write it -- and I'm still in the process of figuring out the details. I don't necessarily think it's an "issue" in the sense of something that needs to be fixed, it's more of an observation that RefCell does something which nothing else does (nothing else I saw, that is).

I will try a short summary, but beware -- I am thinking in terms of "how to prove the code correct" here, not in terms of a static type system or a memory model.
Fist, let me try to define the "time that a value is shared". This starts when when &x for some not-yet-shared x is executed, and it ends when the lifetime of that "original borrow" ends. For example, in

fn foo(x: &mut i32) {
*x = 13;
{ let y = &*x;
  { let z = &*y; }
}
}

the sharing of the data pointed to by x (and its aliases y and z) lasts for the scope of y. I found that in the base type system, this sharing is always equal to a lifetime, so I like to think of a type to be "shared for a lifetime". The type gets to perform some "logical transitions" when sharing starts. For example, when a Cell<T> is shared, it arranges for the data to be available to anyone running code in the current thread, so that everyone holding a &Cell<T> can access the data. This is un-done when the lifetime of the sharing ends.
Now, for RefCell: When the RefCell<T> is shared, its "content" (the T) is not yet shared. This is because someone might call borrow_mut. So, when and fir which lifetime is the content shared? Sharing starts when borrow is first called successfully, and it ends when either (a) the count reaches 0 again, or (b) the sharing of the containing RefCell ends (whatever happens first). This does not correspond to any lifetime that is actually considered during type-checking the program, and I am having a hard time making it possible to "synthesize" a lifetime with the right proeprties (i.e., the right lifetime inclusions). Of course the same happens in RwLock, but other than that, I am not aware of any type exercising a similar amount of control over the lifetime something is shared for.

@arielb1

This comment has been minimized.

Copy link
Collaborator

arielb1 commented Sep 12, 2016

So, when and fir which lifetime is the content shared? Sharing starts when borrow is first called successfully, and it ends when either (a) the count reaches 0 again, or (b) the sharing of the containing RefCell ends (whatever happens first).

That is the union of the active lifetimes of the guards. This is also the lifetime of the sharing of an Rc<T>.

@RalfJung

This comment has been minimized.

Copy link
Collaborator

RalfJung commented Sep 12, 2016

That is the union of the active lifetimes of the guards.

What exactly do you mean by "active lifetime of the guards"? It's certainly not the lifetime parameter in Ref<'a, T>, as that one can easily outlive the sharing e.g. by calling drop on the guard. But it's also not the time until the guard is dropped, because the guard could be never dropped and still the sharing ends.

This is also the lifetime of the sharing of an Rc.

I think Rc is easer than RefCell though, because the "synthetic lifetime" here does not have to be a sublifetime of something larger, it just has all the guards lifetimes' as sublifetimes. For RefCell, there is the lifetime of the sharing of RefCell itself which upper-bounds the lifetime of the sharing of the content.

@arielb1

This comment has been minimized.

Copy link
Collaborator

arielb1 commented Sep 12, 2016

Active lifetime = intersection of data lifetime and lifetimes of all type parameters. Note that Rc with an allocator behaves just like RefCell.

@Manishearth

This comment has been minimized.

Copy link

Manishearth commented Oct 4, 2016

The Servo-Gecko interface in stylo uses tons of unsafe code (which I'm trying to reduce and refactor). We make some (iffy) assumptions about unsafe code including:

  • &T can be used in an extern C function
  • Option<&T> and Option<Arc> are pointer-sized because of nonnull (and can be used in extern C functions in place of possibly-null pointer args
  • Cell<T> has the same repr as T for a repr(C) T. See rust-lang/rfcs#1758
  • &(void enum) isn't optimized away. This is a trick we probably will get rid of.
@ubsan

This comment has been minimized.

Copy link
Collaborator

ubsan commented Oct 4, 2016

@Manishearth I think, eventually, the fourth will be undefined behavior, but the other three seem like correct assumptions. (although ArcInner is not equivalent to #[repr(C)] { *const usize, *const usize, *const T })

@Manishearth

This comment has been minimized.

Copy link

Manishearth commented Oct 4, 2016

Oh, we're not casting Option<&T> to Option<Arc<T>>, we're casting Option<&T> to Option<Arc<U>>, where the T <-> U relationship (T is opaque) is specified by an unsafe trait that enables this conversion 😄

@mystor

This comment has been minimized.

Copy link

mystor commented Oct 4, 2016

I should clarify that this T @Manishearth is talking about in the casting is the enum Foo {}. in question, namely it looks like:

enum FooVoid {}
pub struct Foo(FooVoid);
@Manishearth

This comment has been minimized.

Copy link

Manishearth commented Oct 4, 2016

(I'm thinking of replacing Foo with a ZST -- the ctypes lint doesn't like it, but &ZST should be safe through FFI)

@arielb1

This comment has been minimized.

Copy link
Collaborator

arielb1 commented Oct 4, 2016

Replace Foo with a ZST and use #[allow] to make the ctypes lint shut up.

@Manishearth

This comment has been minimized.

Copy link

Manishearth commented Oct 4, 2016

Right, that's the plan

@ubsan

This comment has been minimized.

Copy link
Collaborator

ubsan commented Oct 4, 2016

@Manishearth lol, I was just pointing that out :)

The ctypes lint is currently broken, imho. &() should be allowed (but &(#[repr(i8)] enum { __foo; }) works too).

@mystor

This comment has been minimized.

Copy link

mystor commented Oct 5, 2016

@ubsan That enum unfortunately is not zero sized, so mem::swap can screw it up with &mut references. I think our best option right now is &[i8; 0] which is both zero sized and doesn't trigger the lint.

@ubsan

This comment has been minimized.

Copy link
Collaborator

ubsan commented Oct 5, 2016

@mystor ... eww.

I'd rather do

#[repr(C)]
struct Foo {
    __: [i8; 0]
}

in order to get rid of the chance of conversion to &[i8]. But it's all still a hack to get around () not being C compatible.

@mystor

This comment has been minimized.

Copy link

mystor commented Oct 5, 2016

@ubsan Yup, that's actually almost exactly what I'm doing.

And yes, I agree that it's a hack to get around () not being C compatible. I think we should probably fix that.

@burdges

This comment has been minimized.

Copy link

burdges commented Oct 5, 2016

Just stating the obvious : There are going to be many situations where the only unsafe code consists of calls to std::mem::transmute<X,Y> where X is a packed C struct and Y is [u8; std::mem::size_of<X>] or vice versa. It sounds unfortunate to disable optimizations for that if X does not contain pointers.

@arielb1

This comment has been minimized.

Copy link
Collaborator

arielb1 commented Oct 5, 2016

The get_unchecked examples at rust-lang/rust#36976 are convincing me that we do not want to UB on just creating a reference to an invalid value, or even returning such a reference from a function. This would naturally make references to Void be well-defined too (through horribly unsafe).

@arielb1

This comment has been minimized.

Copy link
Collaborator

arielb1 commented Oct 5, 2016

BTW, even if invalid values are ignored, my aliasing memory models could end up with the result of an out-of-range [T]::get_unchecked not having a capability to access the out-of-range memory (because they are limited to the capability of the &[T] - i.e. only to the length of it).

@bluss

This comment has been minimized.

Copy link

bluss commented Jan 21, 2017

I implement struct RevSlice<T>([T]) which is a reversed view of a slice. I'm unsure what's needed to put it on sound ground, specifically the conversions:

  • &[T] -> &RevSlice<T>
  • &mut [T] -> &mut RevSlice<T>
  • Box<[T]> -> Box<RevSlice<T>>
@Amanieu

This comment has been minimized.

Copy link

Amanieu commented Jan 21, 2017

@bluss I think this is a good case for #[repr(transparent)]

@bluss

This comment has been minimized.

Copy link

bluss commented Jan 22, 2017

Another crate:

@llogiq

This comment has been minimized.

Copy link

llogiq commented Mar 16, 2017

bytecount (used by ripgrep&xi) has one line of unsafe to transmute aligned byte slices to usize/SIMD slices. This pattern might perhaps be factored out into its own crate, or perhaps even find a place in libstd.

@Manishearth

This comment has been minimized.

Copy link

Manishearth commented Mar 16, 2017

Can't byteorder do that?

@bluss

This comment has been minimized.

Copy link

bluss commented Mar 17, 2017

byteorder does not deal with slice conversions.

I imagine that's something like split_aligned_for but even that formulation leaves much to be desired, slice iteratorish formulation seems to codegen much better (hence the other experiments there in the same junkyard and in rawslice).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.