Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

References into repr(packed) structs should be unsafe. #1240

Merged
merged 1 commit into from
Sep 18, 2015

Conversation

huonw
Copy link
Member

@huonw huonw commented Aug 7, 2015

@huonw huonw added the T-lang Relevant to the language team, which will review and decide on the RFC. label Aug 7, 2015
@huonw huonw self-assigned this Aug 7, 2015
@arcnmx
Copy link

arcnmx commented Aug 7, 2015

Taking a reference to a member of a packed struct is a very wrong thing to do, unless...

  1. You first verify that the alignment is correct.
    • Subpoint: Your type has an alignment of 1, and it is therefore always correct.
  2. You actually just want a pointer to it, to transmute, or use in asm, etc.
  3. You want to call a method by &self/&mut self that doesn't actually dereference self..?
  4. We modify the language to have typed references with tagged alignments, so the compiler can emit a proper load/store for everything and suddenly it's all safe and good.

Now going to step back and consider what the actual uses of packed are...

  1. Mapping packed data structures in memory from external sources, such as C libraries or I/O.
    • This can be done safely with unaligned types and #[repr(C)] (I have implementations of this in packed and pod). Slight syntax overhead required.
  2. You want to save space in your type and have very peculiar requirements. This also can be done safely as above.

Given the above, this sounds like the correct approach to take. I just have a few points to bring up:

  1. Can we define 1-aligned types as a safe exception for referencing within packed types (and therefore other packed types would be as well)?
  2. I don't really feel like the generic restriction is necessary. As long as all fields can be asserted to maintain the no-Drop requirement, they should be harmless.
  3. Same for indexing and referencing fields as long as you're not taking references. Depending on how the compiler is structured this may be additional work to support, though?
  4. The inability to use struct<T>([u8; ::std::mem::size_of::<T>()]) hurts a lot for exposing safe abstractions around packed types :/

@huonw
Copy link
Member Author

huonw commented Aug 7, 2015

@arcnmx

this sounds like the correct approach to take

I'm confused, I don't really understand your comment in the light of rust-lang/rust#27060 (comment) . Is "this" the design the RFC has? (i.e. are you broadly in favour?)

I don't really feel like the generic restriction is necessary. As long as all fields can be asserted to maintain the no-Drop requirement, they should be harmless.

NB. I do cover this in the Alternatives section. However, I think it's relatively low-benefit and can be introduced later if necessary: I don't recall crater bringing up a single instance of repr(packed) on a generic struct that was useful.

Same for indexing and referencing fields as long as you're not taking references. Depending on how the compiler is structured this may be additional work to support, though?

Yes, also covered in Alternatives. Like with generics, it should be backwards compatible to allow, and the main reason it is explicitly not supported is to make the implementation easier.

The inability to use struct([u8; ::std::mem::size_of::()]) hurts a lot for exposing safe abstractions around packed types :/

This seems unrelated to the changes this RFC is proposing?

We modify the language to have typed references with tagged alignments, so the compiler can emit a proper load/store for everything and suddenly it's all safe and good.

(Incidentally we can probably do this in libraries if we had some way to specify the desired alignment of a struct, via something similar to "Unaligned".)

@arcnmx
Copy link

arcnmx commented Aug 7, 2015

I do cover this in the Alternatives section

Right, just wanted to cast my vote in.

I'm confused, I don't really understand your comment in the light of rust-lang/rust#27060 (comment) . Is "this" the design the RFC has? (i.e. are you broadly in favour?)

I changed my mind! As I went through the use cases, my thought process drifted from "packed is nice for mapping data" to "packed is almost completely useless what we actually need is ergonomic safe abstractions instead". So yes, in favour of this RFC - though I worry people will ignore the warning and take unsafe references without understanding the concequences. It's rather subtle and dangerous, though I'm not a fan of denying the references outright either.

This seems unrelated to the changes this RFC is proposing?

Just another pain point of dealing with packed types. Tangential, yes, but I'm very concerned about the ergonomics surrounding packed data!

@arcnmx
Copy link

arcnmx commented Aug 7, 2015

(Incidentally we can probably do this in libraries if we had some way to specify the desired alignment of a struct, via something similar to "Unaligned".)

Also to expand slightly on the constexpr point, if that were working, it would be possible to write the generic Unalign type you linked without #[repr(packed)] at all, and expose safe unaligned accessors. packed would be unnecessary, modulo a bit uglier syntax compared to built-in copy/move/assign. Currently, you have to manually implement it for specific types.

@bluss
Copy link
Member

bluss commented Aug 7, 2015

Really great RFC work! Preparation, motivation, all of it!

It's a good plan, but I'd like to do it without allowing references at all — it's an additional rule for unsafe blocks, and do we really need that?

@nikomatsakis
Copy link
Contributor

@bluss

It's a good plan, but I'd like to do it without allowing references at all — it's an additional rule for unsafe blocks, and do we really need that?

We debated that, but there are legitimate use-cases for creating a reference, and if we don't permit & notation, then one is reduced to doing manual memory arithmetic. As a simple example of when it could be useful, you may know that some fields are aligned (even when packed) even if all fields are not. For example, you may wish to obtain a pointer to a [u8;512] field embedded in your struct (such a field is always aligned).

@arcnmx
Copy link

arcnmx commented Aug 7, 2015

@nikomatsakis

For example, you may wish to obtain a pointer to a [u8;512] field embedded in your struct (such a field is always aligned).

Would it then make sense to only disallow references to improperly aligned fields, but still allow those the compiler can statically determine are safe? In the context of packed, anything with an alignment of 1, such as u8 arrays, would still be safe to reference.

I dislike removing functionality completely, but an unaligned reference is pretty much always the wrong thing to do. If anything, you want a pointer, not a reference. Which could still be done:

#[repr(C, packed)]
struct Something {
    data: [u8; 0x1f],
    unaligned_ptr: (),
    unaligned: u16,
}

let something = Something { .. };
let x = &something.data; // safe and valid
let x = &something.unaligned; // error: cannot take references to improperly aligned fields
let x = &something.unaligned_ptr as *const () as *const u16; // if you *really* want to access the field somehow... Go for it but don't dereference this!

@bluss
Copy link
Member

bluss commented Aug 7, 2015

Agree, I think creating an unaligned reference breaks against Behavior considered undefined which I think unsafe code blocks shouldn't violate either.

@nikomatsakis
Copy link
Contributor

On Fri, Aug 07, 2015 at 06:54:24AM -0700, arcnmx wrote:

Would it then make sense to only disallow references to improperly aligned fields (field_alignment > containing_struct_alignment)? In the context of packed, anything with an alignment of 1, such as u8 arrays, would still be safe to reference.

This would be a more specific rule, yes. It is harder to enforce
because (currently) the type-checking code is platform independent.

I dislike removing functionality completely, but an unaligned
reference is pretty much always the wrong thing to do. If anything,
you want a pointer, not a reference. Which could still be done:

This is an interesting pattern, but it seems unfortunate to be forced
to insert a dummy field of () type. Also, the value of a reference
of type &() is undefined (well, it can't be NULL, but other than
that...), so I don't know that we would guarantee any particular value
for the pointer you are creating there. I guess we could say that we
guarantee it, but it may require extra logic that reasons about the
origin of a reference, which in general is not knowable, rather than
simply its type.

@arcnmx
Copy link

arcnmx commented Aug 7, 2015

@nikomatsakis

This would be a more specific rule, yes. It is harder to enforce because (currently) the type-checking code is platform independent.

A limited implementation would be to whitelist the types that are guaranteed to be 1-aligned on all architectures, which I believe is anything of size 1 or 0? Which again, may not be guaranteed for many types, but at least u8, (), [_; 0]... bool? Not sure. Could probably be done as an OIBIT marker trait. We can also use OIBIT markers with #[config(target = "etc")] to make it arch-dependent without introducing target-specific code into the lint / type check itself?

EDIT: I like the OIBIT approach because it makes it possible to use packed structs within other packed structs properly. Any reason why this wouldn't work?

This is an interesting pattern, but it seems unfortunate to be forced to insert a dummy field of () type

Unfortunate, sure, but there are very few legitimate reasons for doing this if 1-aligned type references are safe and allowed. It was more an example of how to do it if you really needed to, without manual pointer offset counting.

Also, the value of a reference of type &() is undefined (well, it can't be NULL, but other than that...), so I don't know that we would guarantee any particular value for the pointer you are creating there.

Hm, even with repr(C)? How about other zero-sized types such as [_; 0], struct X, enum Void { }..? Unfortunate, but makes sense :/

@nikomatsakis
Copy link
Contributor

@arcnmx

Hm, even with repr(C)? How about other zero-sized types such as [_; 0], struct X, enum Void { }..? Unfortunate, but makes sense :/

Well, I don't know that we've ever thought about it that hard before. But I wouldn't be surprised if there were paths where the compiler would say "ah, that's a &() pointer, which could never be dereferenced, I'll just replace it with a constant 0x1" and strip out some address arithmetic. Or if no such paths exist today, I wouldn't be surprised if we tried to add some in the future. :)

@arielb1
Copy link
Contributor

arielb1 commented Aug 9, 2015

@nikomatsakis

&() is pretty much equivalent to NonZero<*const ()>. trans randomly replacing it with &*1 would be a bug.

@nikomatsakis
Copy link
Contributor

@arielb1

&() is pretty much equivalent to NonZero<_const ()>. trans randomly replacing it with &_1 would be a bug.

Well, that is the question, isn't it? I don't think this is entirely clear. That said, it's not clear how we would be able to really optimize by doing such changes, and it's certainly counterintuitive.

@Aatch
Copy link
Contributor

Aatch commented Aug 11, 2015

To address some comments so far:

Being able to determine which fields are definitely aligned is difficult to do currently. However, even if it was possible, I don't think it should be done. A simple rule like "Cannot take references to fields of packed structures in safe code" will be easier to explain and maintain compared to "Cannot take references to fields of packed structures, unless we're sure they're aligned properly". The idea that taking a reference to one field is safe, but another is not seems strange. i'd much rather that the behaviour was consistent across all the fields of the struct.

@Aatch
Copy link
Contributor

Aatch commented Aug 11, 2015

I think that an intrinsic for doing unaligned loads would be useful given the issue this is solving. The linked issue is related to SIMD where (on x86) there are separate instructions for aligned and unaligned loads and the alignment information we pass to LLVM causes it use the former based on the assumption that the location is aligned.

Some platforms don't allow unaligned loads at all, so being able to still load from an unaligned (for example) *const i64 would still be useful.

@bluss
Copy link
Member

bluss commented Aug 11, 2015

The usual way I know to write an unaligned load is to use memcpy / ptr::copy_nonoverlapping

@arcnmx
Copy link

arcnmx commented Aug 14, 2015

Hm, I suppose it could be seen as a bit of an inconsistent rule. Is it that difficult to implement using OIBIT? My view as someone trying to use it would basically be "this is perfectly safe why is the compiler getting in my way here?!"

In any case I'm more and more convinced that using packed is simply the wrong thing to use in today's Rust. repr(C) works alright as long as you can avoid pitfalls, and type-safe abstractions for unaligned accesses are just fine.

@retep998
Copy link
Member

retep998 commented Sep 1, 2015

I'm okay with making references into packed structs be unsafe to create, as well as the restrictions on no Drop fields and no generics. Just please do not feature gate #[repr(packed)], even temporarily, since the Windows SDK uses packed structs quite a bit so it is essential for winapi. FFI is already hard enough as is without untagged unions or bitfields or ident concatenation, so please don't make it any harder by removing #[repr(packed)].

@nikomatsakis
Copy link
Contributor

Hear ye, hear ye. This RFC is entering final comment period.

@nikomatsakis nikomatsakis added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Sep 4, 2015
@glaebhoerl
Copy link
Contributor

Would this be forwards compatible with later making it safe again by means of first copying the value out onto the stack (and then back afterwards for &mut), as discussed in rust-lang/rust#27060?

@arcnmx
Copy link

arcnmx commented Sep 4, 2015

I feel like this is the easiest approach to take for now, but it certainly merits some further discussion in a followup RFC on how to handle it properly. A packed(alignment) syntax as well as safe ways to reference packed data would be nice at some point.

Would this be forwards compatible with later making it safe again by means of first copying the value out onto the stack (and then back afterwards for &mut), as discussed in rust-lang/rust#27060?

This seems strange to me, especially if the lifetime of the value outlives the current scope. I might be misinterpreting though: copying to stack then referencing will definitely be safe with RFC, and safe+compatible with whatever comes later.

@huonw
Copy link
Member Author

huonw commented Sep 4, 2015

@glaebhoerl I'm not sure. It's certainly unlikely to be backwards incompatible due to causing some unsafe blocks to become unneeded (the warning about that is just a lint), but I suspect the lifetime shortening discussed in rust-lang/rust#27060 (comment) may cause issues: given x: &'a PackedFoo, &x.y wouldn't be able to have lifetime 'a and changing semantics/lifetimes due to having an unsafe block vs. not seems sad. That said, I suspect that the lifetime problems mean we probably don't ever want & to do that magically anyway.

@huonw
Copy link
Member Author

huonw commented Sep 4, 2015

(The repr(packed(n)) extension should be just that, a back-compat extension.)

generic type) in a `repr(packed)` type, since the destructor of `T` is
passed a reference to that `T`. The crater run (see appendix) found no
crate that needs to use `repr(packed)` to store a `Drop` type (or a
generic type). The generic type rule is conservatively approximated by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to point out that we already have generic repr(packed) structs in rustc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh the irony.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, rustc isn't covered by crater.

It looks like that case could be handled by a trait that uses a monomorphic #[repr(packed)] internally to do an unaligned load, since it is just used to load a small set of types ({u,i}{8,16,32,64,size}?).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case - yes (though it'd be still annoying with 10 permutations).
But imagine you wanted to write a generic (de)serializer function... If we go with "no generics" rule, IMHO we should consider adding unaligned_load/store intrinsics.

@glaebhoerl
Copy link
Contributor

I suspect the lifetime shortening discussed in rust-lang/rust#27060 (comment) may cause issues: given x: &'a PackedFoo, &x.y wouldn't be able to have lifetime 'a and changing semantics/lifetimes due to having an unsafe block vs. not seems sad.

Yeah, that's the kind of thing I was concerned about as well.

That said, I suspect that the lifetime problems mean we probably don't ever want & to do that magically anyway.

I'm not certain. I think we'd pretty likely want it for #[repr(compact)] (because there's no other way to get references into those, aligned or not), and given that reprs packed, simd, and compact (and potentially other nonstandard reprs, say bitfields) all have in common a restriction on interior references, making this kind of thing potentially useful for them, we may at some point want them to work consistently in this respect.

@rkjnsn
Copy link
Contributor

rkjnsn commented Sep 8, 2015

Is the idea that the syntax for taking a reference into a packed struct would be the same, it would just be required to be within an unsafe block or function?

I'm concerned that if the syntax is the same as the common, safe operation of taking a reference, it would be easy to take a reference to a member of a packed struct without realizing it, because one is already writing an unsafe function, or inside an unsafe block for other reasons. It would also be harder to catch during code reviews, as it would be easy to look over. Knowing it was unsafe would require realizing the struct definition (which may be in a different part of the code) specified repr(packed).

I agree that obtaining a reference to a member of a packed struct should be unsafe, but I also think doing so should have a different syntax from the normal, safe operation to make it more apparent that special care is required. Additionally, I would expect such an operation to return a raw pointer.

@sourcejedi
Copy link

Has anyone looked at function arguments which aren't references, but might be optimized to a pointer? Maybe f(a.b) where a.b : [u32: 512].

@retep998
Copy link
Member

retep998 commented Sep 9, 2015

In those cases the correct thing to do would be to copy it out onto the stack to get an aligned value and then pass a pointer. Whether we actually do that, I don't know.

@sourcejedi
Copy link

@retep998 Agreed, that sounds reasonable given how _un_reasonable my example code is. If there's ever a real (performance) problem from that, you'd just have to find it by profiling.

I guess the answer for correctness is to add a second test, for that case. (Just checking the alignment, no need to try forcing a crash with simd).

@nikomatsakis
Copy link
Contributor

@rkjnsn

I agree that obtaining a reference to a member of a packed struct should be unsafe, but I also think doing so should have a different syntax from the normal, safe operation to make it more apparent that special care is required. Additionally, I would expect such an operation to return a raw pointer.

I find this appealing. I wonder what the syntax would be, though?

@eddyb
Copy link
Member

eddyb commented Sep 9, 2015

@nikomatsakis Another usecase for getting a raw pointer could be for uninitialized memory (potentially going through some sort of safe uninitialized reference?).

@retep998
Copy link
Member

retep998 commented Sep 9, 2015

I like the idea of having alternate syntax which gets a raw pointer directly and can be used when getting a safe pointer wouldn't be safe to do.

@arcnmx
Copy link

arcnmx commented Sep 9, 2015

Also in favour of an alternative that retrieves a pointer (and doesn't require unsafe), as I mentioned previously. Then references can simply be completely disallowed for now. The problem is that we suddenly need to introduce some new syntax for it...

We could introduce an offset_of!(type, field_ident) macro maybe?

@retep998
Copy link
Member

retep998 commented Sep 9, 2015

And if we do introduce new syntax we'd need to make sure that while it is unstable, the old method of just taking a reference would still work in stable Rust, albeit with a warning.

@arcnmx
Copy link

arcnmx commented Sep 9, 2015

Wasn't the point of this RFC to be an intentional breaking change and remove packed references in stable though, because they're completely wrong and unsafe?

@retep998
Copy link
Member

retep998 commented Sep 9, 2015

Oh yeah, and on second thought, I don't need to be able to get pointers to those fields regardless, I just need to be able to have those packed structs, and be able to access their fields at all. So as long as that stays in, I don't mind.

@arcnmx
Copy link

arcnmx commented Sep 9, 2015

Mm, the uses for packed are small enough as they are, the uses for taking references into packed values are even smaller. At this point I wouldn't mind simply disallowing them entirely until we figure out a better alternative. I do also agree that the "no generics" restriction seems a bit unnecessary.

@nikomatsakis
Copy link
Contributor

@huonw

Crater was run on 2015/07/23 with a patch that feature gated repr(packed).

So, I'm slow I guess, but for some reason I thought that this patch actually detected cases that took the address of a packed structure. But re-reading this, I realize that this was just a way to detect where repr(packed) was being used period. Do you have any idea how many people actually try to take addr of packed fields?

@arcnmx
Copy link

arcnmx commented Sep 9, 2015

@nikomatsakis I was, but I've moved on to using repr(C) instead.

I can see it being somewhat useful (mostly for referencing other inner packed members, or calling methods by &self for wrapper types), but at this point I don't think it would hurt anyone too much if we just made it flat-out invalid until a proper solution comes up.

@bill-myers
Copy link

This seems too limiting since for instance on x86 unaligned accesses work fine on scalars and there are instructions for unaligned SIMD access (and of course it's always possible to emulate them since atomicity doesn't matter because & has no mutable aliasing)

A better solution would seem to be adding an "unaligned" modifier to pointers, so that taking such an unaligned reference would produce an "&unaligned T".

Obviously this would not be compatible with normal references, so for instance methods that take "&self" and "&mut self" would not be callable; on the other hand, it should be possible to implicitly convert "&T" to "&unaligned T" and explicitly converting &unaligned T to something like Either<&T, &unaligned T> with a runtime check.

Also this way there is no need for unsafe code.

@nikomatsakis
Copy link
Contributor

@bill-myers introducing &unaligned T does not seem justified vs unsafe
pointers. Unaligned pointers just aren't that important.

On Thu, Sep 10, 2015 at 8:37 AM, bill-myers notifications@github.com
wrote:

This seems too limiting since for instance on x86 unaligned accesses work
fine on scalars and there are instructions for unaligned SIMD access.

A more general solution would seem to be adding an "unaligned" modifier to
pointers, so that taking such an unaligned reference would produce an
"&unaligned T".

Obviously this would not be compatible with normal references, so for
instance methods that take "&self" and "&mut self" would not be callable;
on the other hand, it should be possible to implicitly convert "&T" to
"&unaligned T" and explicitly converting &unaligned T to something like
Either<&T, &unaligned T> with a runtime check.


Reply to this email directly or view it on GitHub
#1240 (comment).

@nikomatsakis
Copy link
Contributor

Huzzah! The language design subteam has decided to accept this RFC.

Regarding the proposal to remove such references altogether, or introduce a new operator, the general feeling was that (as discussed earlier on the thread) references have value, but an entirely new operator did not feel warranted.

@nikomatsakis nikomatsakis merged commit 3fdbb60 into rust-lang:master Sep 18, 2015
nikomatsakis added a commit that referenced this pull request Sep 18, 2015
@huonw huonw deleted the repr-packed-unsafe-ref branch September 22, 2015 05:19
@Aatch
Copy link
Contributor

Aatch commented Oct 3, 2015

I just noticed a hole in this RFC, it doesn't cover by-ref bindings in patterns. So technically let Foo(_, ref x) = foo; isn't considered unsafe by this RFC while let x = &foo.1; is. The RFC probably should be amended to include by-ref bindings, as it seems like a simple oversight.

@nikomatsakis
Copy link
Contributor

Indeed. I consider that to be implied. It's worth noting that in the MIR,
by-ref bindings and explicit & expressions are the same thing, so if we
moved effect checking to the MIR (as I've been considering) this would be
another example where things get simpler.

On Fri, Oct 2, 2015 at 5:23 PM, James Miller notifications@github.com
wrote:

I just noticed a hole in this RFC, it doesn't cover by-ref bindings in
patterns. So technically let Foo(_, ref x) = foo; isn't considered
unsafe by this RFC while let x = &foo.1; is. The RFC probably should be
amended to include by-ref bindings, as it seems like a simple oversight.


Reply to this email directly or view it on GitHub
#1240 (comment).

@Aatch
Copy link
Contributor

Aatch commented Oct 5, 2015

@nikomatsakis still, it seems like it should probably be mentioned in the RFC text. At least for the sake of being thorough.

@nikomatsakis
Copy link
Contributor

@Aatch

still, it seems like it should probably be mentioned in the RFC text. At
least for the sake of being thorough.

agreed

On Sun, Oct 4, 2015 at 10:22 PM, James Miller notifications@github.com
wrote:

@nikomatsakis https://github.com/nikomatsakis still, it seems like it
should probably be mentioned in the RFC text. At least for the sake of
being thorough.


Reply to this email directly or view it on GitHub
#1240 (comment).

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 29, 2018

Huzzah! The language design subteam has decided to accept this RFC.

This is a message from the future: does anyone know where I can find logs of the rationale for this decision ? Meeting minutes, IRC logs, etc ? I'm writing a post about what we can learn about what happened with repr(packed), and what could we do to perform better in the future.

Also if anyone that was around here back then wants to proof-read it, I would appreciate it.

@huonw
Copy link
Member Author

huonw commented Dec 11, 2018

@gnzlbg You may already know of these, but there's https://github.com/rust-lang/meeting-minutes and https://github.com/rust-lang/subteams with historical info... however neither of them have much info about repr(packed).

(I'd be happy to proof-read and/or answer questions about the history. Feel free email me at the email in my profile.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet