Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic Pointer to Field #2708

Open
wants to merge 24 commits into
base: master
from

Conversation

Projects
None yet
@KrishnaSannasi
Copy link

commented Jun 5, 2019

Rendered

work in progress proof of concept crate

This RFC aims to provide a generic way to talk about the fields on types! Then go one step further and allow smart pointers to project to those fields!

struct Foo { bar: Bar }
let x: Pin<&Foo> = ...;
let y = x.project(Foo.bar);
@KrishnaSannasi

This comment has been minimized.

Copy link
Author

commented Jun 5, 2019

This RFC had some previous discussion here on internals

I would like to focus the discussion first on if we actually need this feature in the first place.
I think we do, as it make smart pointers first-class in a way that only references and Box<_> currently are, i.e. only references and Box<_> are allowed to project to fields, everything else must go through them first. This RFC is the first step towards changing that.

@Diggsey

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2019

I wrote this crate which seems to do pretty much the same thing: https://crates.io/crates/field-offset

@KrishnaSannasi

This comment has been minimized.

Copy link
Author

commented Jun 6, 2019

Cool, unfortunately according to the current unsafe guidelines your crate is unsound to use on any type that contains references or NonZero* types because of the, std::mem::zeroed(), which is insta-UB. Otherwise it looks like we came up with similar formulations of the solution.

I hadn't thought about chaining field offsets, but that doesn't seem to be mission critical, so we can decide on that later.

What do you think of this RFC?

@Diggsey

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2019

Cool, unfortunately according to the current unsafe guidelines your crate is unsound to use on any type that contains references or NonZero*

The crate was written before the existence of the unsafe code guidelines when this was considered a valid pattern (the variable is never accessed). In practice, this does not cause any problems, but it will be nice to be able to stop relying on implementation details of the compiler once MaybeUninit is stabilised.

What do you think of this RFC?

One of the considerations for whether something should be added as a language feature, particularly a complex and niche one such as this, is whether it can be implemented as a library (either as part of the standard library or a separate crate).

It would be nice to highlight in the RFC what the benefits are of having this be a language feature as opposed to a library, and why they justify the additional complexity in the language. The motivation section currently says:

This feature cannot be implemented as a library effectively because it depends on the layouts of types, so it requires integration with the Rust compiler until Rust gets a stable layout (which may never happen).

But this is greatly overstating the impossibility of implementing this in a library.

@KrishnaSannasi

This comment has been minimized.

Copy link
Author

commented Jun 6, 2019

The crate was written before the existence of the unsafe code guidelines when this was considered a valid pattern

Ok, I just noticed that this crate was last updated 3 years ago.

but it will be nice to be able to stop relying on implementation details of the compiler once MaybeUninit is stabilised.

MaybeUninit won't solve this because the only qay to get a member of a type is to have a value, reference or box, all of which need to be a valid instance of the type to work.

particularly a complex

This feature isn't particularly complex to implement, although it does have some far reaching implications. Some of these implications were analyzed on internals, but I suspect that there is more that we may have missed.

The only parts that absolutely have to be implemented in the compiler is the Field trait. Everything else is a library built on top of that. In loght of this, we could shave this proposal down to just the pointer projections as associated functions on raw pointers, and get rid of the Project trait entirely.

But this is greatly overstating the impossibility of implementing this in a library

How would you get the necessary offsets of fields soundly in light of the current unsafe guidelines? I don't see how.

Keep in mind that the compiler already has all the information available, this is a way of exposing that information safely.

@comex

This comment has been minimized.

Copy link

commented Jun 6, 2019

MaybeUninit won't solve this because the only qay to get a member of a type is to have a value, reference or box, all of which need to be a valid instance of the type to work.

You can do it through a raw pointer.

@KrishnaSannasi

This comment has been minimized.

Copy link
Author

commented Jun 6, 2019

@comex How? You can't access the field through a raw pointer without already knowing its offset. If you can we can close this and implement the whole proposal as a library. But seeing as all libraries that try to provide this use UB via null pointer dereference, or mem::zeroed/undefined, or requires marking types at the definition. The last is unacceptable because it doesn't scale at all, everyone must think about it even if they don't want to.
Also note that C++, a language woth less restrictions than Rust, built it into the language. This seems to be compelling evidence that it is not possible to do safely.

References:

using null pointer: crate and link to offending code
using mem::zeroed crate and link to offending code
marking types at definition: crate

I searched through hundreds of crates on crates.io to find these examples.

@camlorn

This comment has been minimized.

Copy link

commented Jun 6, 2019

If the core issue here is that you need the offset of a field in order to implement something like this as a library, then maybe this should be a offsetof RFC.

As someone who has used more than a little C++, the need for pointers to members comes up rarely if ever,. Most of the time it can be done another way (i.e. there's a trick for some trees where instead of having left and right you put them in an array instead, then you can mirror algorithms by passing an additional index. I don't have a reference for this handy). I see why it might be more valuable to Rust, but I feel like in C++ it was added for the sake of completeness and possibly to mollify C programmers who use offsetof tricks in anger. In general I think a lot of early C++ just happened, and that drawing conclusions from it about what is and isn't necessary is a mistake, especially since one of Rust's selling points is not being C++ (even if the complexity is getting up there these days).

If there's going to be a syntax to refer to a field descriptor, the next obvious question from the perspective of someone new to Rust finding this seems to me to be "how do I refer to a field's type?" That might be quite useful in macro land, and something worth doing; reserving Type.field for that might be beneficial.

I feel like motivation and guide-level explanation need to be fleshed out more in the direction of someone wanting to learn what the feature is. Guide-level explanation seems to be reference-level explanation, and motivation is written such that you have to already know why you want the feature in order to understand it (that's not quite right; finding better wording is failing me).

But I do like what this is getting at.

@KrishnaSannasi

This comment has been minimized.

Copy link
Author

commented Jun 6, 2019

"how do I refer to a field's type?"

Using the Field trait which has the associated type Type that tells what the type of the field is.

I feel like motivation and guide-level explanation need to be fleshed out more in the direction of someone wanting to learn what the feature is.

Yes, they need to be fleshed out more and reworded in general.

@spunit262

This comment has been minimized.

Copy link

commented Jun 6, 2019

An idea just came to me. We already have projection though references via patterns so we could extend that to pointers.
I'm not actually sure if this is a good idea, but it is no new syntax, just building on what already exists.
As pointers are not required to point anywhere, any pattern that require inspecting the pointee would not be allowed.

struct Foo { a: u32, b: i16, c: f64, }

// works today
fn by_ref(Foo { b, .. }: &Foo) -> &i16 { b }

fn by_ptr(Foo { b, .. }: *mut Foo) -> *mut i16 { b }

A safe offset_of as the pointer is explicitly allow to not point at a valid object.

macro_rules! offset_of {
    ($t:path => $f:tt) => {
        match core::ptr::null<$t>() {
            $t { $f: a, .. } => a as usize
        }
    }
}
@petrochenkov

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2019

Related issue - #1287 (about the Object::field syntax).

It conflicts with associated values indeed, but object.field also conflicts with method calls and we've been living with it mostly successfully so far.
(Some disambiguated syntax for the conflicting cases is probably still needed, similarly to <Type [as Trait]>::Assoc for associated items.)

@Pauan

This comment has been minimized.

Copy link
Member

commented Jun 6, 2019

It conflicts with associated values indeed, but object.field also conflicts with method calls and we've been living with it mostly successfully so far.

In this proposal it's being used on a type, not a value. So it's not object.field, it's Type.field. So what situations would that cause conflicts?

@petrochenkov

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2019

@Pauan
That sentence is not talking about this proposal, but about the stable object.field syntax applied to values, which is usually disambiguated successfully, but occasionally you have to write things like (object.field)() to get the desired meaning.

@Pauan

This comment has been minimized.

Copy link
Member

commented Jun 6, 2019

@petrochenkov Ah, sorry, I misunderstood. Thanks for the correction.

@KrishnaSannasi

This comment has been minimized.

Copy link
Author

commented Jun 6, 2019

@petrochenkov @spunit262 syntax discussions are off topic for now, I would like to focus on if this feature is needed, and is it sound. If either answer to no, then this proposal may end up scraped, so syntax discussions now are not productive yet. I used the dot syntax as a placeholder for whatever syntax we end up with.

@eaglgenes101

This comment has been minimized.

Copy link

commented Jun 6, 2019

If the core issue here is that you need the offset of a field in order to implement something like this as a library, then maybe this should be a offsetof RFC.

Rust has been moving in the direction of making unsafe code more ergonomic so that there's less tedium associated with it, and therefore less chance for mistakes. Being able to get a properly offset *mut U field pointer from *mut T directly through projection would be less tedious than having to perform manual pointer offsetting and type cast juggling for the same result.

@camlorn

This comment has been minimized.

Copy link

commented Jun 6, 2019

@eaglgenes101
I don't disagree but the core objection raised by @Diggsey is whether this can be implemented as a library. Giving them the ability to implement their crate properly does solve the ergonomics issue for the unsafe code case. There would be slight ergonomics gains if it was in the language, I suppose.

I think the more interesting thing about unsafe code here is that having unsafe code in std, or a compiler intrinsic, or whatever means that it's by trusted people by most definitions of trusted, and enables the ecosystem to be 100% safe code for some of these use cases.

@KrishnaSannasi

This comment has been minimized.

Copy link
Author

commented Jun 6, 2019

The bare minimum proposal that we can strip this RFC down to is this

/// Opaque type that can only be constructed by the compiler.
struct FieldDescriptor<F: Field> { ... }

/// Represents a field on some type, must be implemented by the compiler, and only the compiler
unsafe trait Field {
    /// The type that the fields belongs to
    type Parent: ?Sized;
    /// The type of the field itself
    type Type: ?Sized;
    /// Describes the data needed to convert from *[const|mut] Self::Parent to *[const|mut] Self::Type
    const FIELD_DESCRIPTOR: FieldDescriptor<Self>;
}

impl<T: ?Sized> *const T {
    fn project<F: Field<Parent = T>>(self, field: F) -> *const F::Type {
        // safe projection, where invalid pointers are handled in a safe, but implementation defined way
    }
    
    unsafe fn project_unchecked<F: Field<Parent = T>>(self, field: F) -> *const F::Type {
        // unsafe projection, where using invalid pointers are UB
    }
}

impl<T: ?Sized> *mut T {
    fn project<F: Field<Parent = T>>(self, field: F) -> *mut F::Type {
        // safe projection, where invalid pointers are handled in a safe, but implementation defined way
    }
    
    unsafe fn project_unchecked<F: Field<Parent = T>>(self, field: F) -> *mut F::Type {
        // unsafe projection, where using invalid pointers are UB
    }
}

Everything else could be punted to a separate library, i.e. everything related to Project. These functions could be implemented as intrinsics or otherwise. These functions will live inside of std::ptr, and the Field trait and FieldDescriptor type can live in either a new module or in std::marker.

I find this to be just the right size for such a niche feature, only a handful of associated functions, a type, and a trait.

KrishnaSannasi added some commits Jun 6, 2019

added unsafe to `Field` trait
as it controls unsafe code, `Field` must be an unsafe trait
Show resolved Hide resolved text/0000-ptr-to-field.md Outdated
Show resolved Hide resolved text/0000-ptr-to-field.md Outdated
@skippy10110

This comment has been minimized.

Copy link

commented on text/0000-ptr-to-field.md in 14fc49f Jun 7, 2019

Should Foo.name be Bar.name here?

This comment has been minimized.

Copy link
Owner Author

replied Jun 7, 2019

Yes, thanks for the catch

fixed typo
Thanks @skippy10110 for finding this
@camlorn

This comment has been minimized.

Copy link

commented Jun 8, 2019

@KrishnaSannasi Yes, this is much better.

@KrishnaSannasi

This comment has been minimized.

Copy link
Author

commented Jun 16, 2019

Given that it has been over a week with no movement, are there any Rust team members that we can ping to get an opinion on this? I'm not sure if this fits on the roadmap.

@ckaran

This comment has been minimized.

Copy link

commented Jun 21, 2019

@KrishnaSannasi Would you consider adding inverse information as well? I suspect that FieldDescriptor could have additional information added to it so that if you have a pointer to a field, you can extract a pointer to the parent safely. I.e., the API can be extended to the following (please forgive any syntax errors, I'm still learning rust):

impl<T: ?Sized> *const T {
    fn inverse_project<F: Field<Type = T>>(self, field: F) -> *const F::Parent {
        // safe projection, where invalid pointers are handled in a safe, but implementation defined way
    }
   
    unsafe fn inverse_project_unchecked<F: Field<Type = T>>(self, field: F) -> *const F::Parent {
        // unsafe projection, where using invalid pointers are UB
    }
}

impl<T: ?Sized> *mut T {
    fn inverse_project<F: Field<Type = T>>(self, field: F) -> *mut F::Parent {
        // safe projection, where invalid pointers are handled in a safe, but implementation defined way
    }
    
    unsafe fn inverse_project_unchecked<F: Field<Type = T>>(self, field: F) -> *mut F::Parent {
        // unsafe projection, where using invalid pointers are UB
    }
}

I think this would allow the development of safe intrusive collections, and would remove much of the need for offset_of macros that rely on internal compiler details.

@KrishnaSannasi

This comment has been minimized.

Copy link
Author

commented Jun 21, 2019

@ckaran There is no way to safely go from a child to the parent because there is no way to check if the field is actually part of the parent or if it is just a free variable.

For example, how would you check if this is safe

struct Point { x: i32, y: i32 }

fn to_point(x: &i32) -> &Point {
    // magic
}

How can we check that the input x: &i32 is referencing the x inside of Point? The fact is we can't. This means that there is no way to safely do inverse projections. Now we could add inverse projections, but they wouldn't be safe and they would increase the scope of this RFC. I would like to keep this RFC as lean as possible as this is already a niche feature.

@ckaran

This comment has been minimized.

Copy link

commented Jun 21, 2019

@KrishnaSannasi I agree with you that you can't use &i32 and expect to get anywhere safely, but my thought is to have the compiler generate new types for fields, like so (CAVEAT I'm still learning rust, I probably got the syntax wrong):

// Compiler generated code:
struct FieldDescriptor<F: Field> { ... }

/// Represents a field on some type, must be implemented by the compiler, 
/// and only the compiler
unsafe trait Field {
    /// The type that the fields belongs to
    type Parent: ?Sized;
    /// The type of the field itself
    type Type: ?Sized;
    /// Describes the data needed to convert from 
    /// *[const|mut] Self::Parent to *[const|mut] Self::Type
    const FIELD_DESCRIPTOR: FieldDescriptor<Self>;
}

// User code:
struct Point { x: i32, y: i32 }

impl Field for Point::i32 {     /// Newish syntax here, maybe?
    type Parent = Point;
    type Type = i32;
    const FIELD_DESCRIPTOR: FieldDescriptor<Self>;
}

fn to_point(x: &Point::i32) -> &Point {
    x.inverse_project() /// OK, I'm not sure how the field value gets used here.
}

*Point::i32 acts like an ordinary *i32, but isn't directly interchangeable for one; while inverse_project() is defined on a *Point::i32, it isn't defined on an *i32. (*i32)::from(*Point::i32) is defined, but not (*Point::i32)::from(*i32). I'm sure that there are additional rules that would need to be fleshed out, but the gist of this is that rust has strongly defined types, and we can take advantage of that (assuming that it's possible to extend the language as I've shown). I think that this may also make your proposal safer as the lifetimes of projected pointers cannot be longer than the lifetimes of the fields that they point to. This can be part of the type of the Point::i32, which gives the compiler a chance to check the code at compile time.

@KrishnaSannasi

This comment has been minimized.

Copy link
Author

commented Jun 21, 2019

Oh, I see, you are thinking that the projected pointers retain some meta-data (either compile-time metadata or runtime, doesn't matter) about the projection. This will not happen in the current proposal, but I can see how it would be useful. Currently, after a pointer is projected it indistinguishable from a different pointer of the same type.

Just to be consistent with the RFC right now, the syntax would be

type RefToX<'a> = <&'a Point as Project<Point.x>>::Output;

impl Field for Point.x {     /// Newish syntax here, maybe?
    type Parent = Point;
    type Type = i32;
    const FIELD_DESCRIPTOR: FieldDescriptor<Self>;
}

fn to_point(x: RefToX<'_>) -> &'_ Point {
    x.inverse_project() /// OK, I'm not sure how the field value gets used here.
}
@ckaran

This comment has been minimized.

Copy link

commented Jun 21, 2019

@KrishnaSannasi Yes, that is exactly what I was assuming. If that isn't so, then you're right, there is no way to do a safe inversion.

@KrishnaSannasi

This comment has been minimized.

Copy link
Author

commented Jun 21, 2019

That's a cool idea, I put it in the alternatives section and think about it some more!

@ckaran

This comment has been minimized.

Copy link

commented Jun 21, 2019

@KrishnaSannasi Thank you! Like I said, it may make your proposal safe as well, simply by giving the compiler more information about what is being pointed to. I would have to think about it a lot more before I could guarantee that though...

@CAD97

This comment has been minimized.

Copy link

commented Jun 22, 2019

The other option is to just make reverse projection work on pointers but not references. Then it's "just" sugar around the raw "offsetof" version, but with the compiler doing the offsetting for you.

Just as an utterly silly suggestion: field_ptr.project(-Struct.field). More realistically, have some unproject or similar to offset in the opposite direction.

The point of providing the opposite direction is just to provide the tools for raw pointer manipulation to use the information from the compiler.


Here's an alternative take that just thinks about raw pointers and not types (as the generalization can be done outside std theoretically): make Struct.field special "offset-like" types that can be given to ptr::offset. This gives you the inbounds/wrapping selection for free with syntax of struct_ptr.offset(Struct.field). (As changing the declaration of offset would be breaking, this basically comes back to .project anyway.)


I think I'm loosely in favor of the minimal API that just adds the manipulation of raw pointers based on types rather than offsets, and allows the ecosystem to build from there. And an option for reverse projection adds a reason (other than unfamiliarity and subtlety) for that projection to not "just" be ptr.field.

@KrishnaSannasi

This comment has been minimized.

Copy link
Author

commented Jun 22, 2019

@CAD97 I was thinking about trimming this RFC down to just raw pointers, I think I will do that and shove all of the Project stuff into another external crate that could be folded into std in another RFC if there is sufficient motivation for it.

@ckaran

This comment has been minimized.

Copy link

commented Jun 24, 2019

@CAD97 @KrishnaSannasi is there any reason why any of this couldn't be put into the core library? As near as I can tell, most of what we're talking about here would be accomplished by the compiler, without any need for runtime support. That would allow embedded systems to use this feature as well.

Also, while raw pointers seem like a good first step, I don't think that we need to start with them. With references, you can guarantee that a projection won't outlive what it is referencing, and an inverse projection can't be created without a projection in the first place, right? The only issue I can think of is projecting out of Arc<_> or some other smart pointer type, but I suspect that the type information would include the Arc as a part of it.

@KrishnaSannasi

This comment has been minimized.

Copy link
Author

commented Jun 24, 2019

Yes, this should be in core. I still think that we should start with raw pointers because we can build an api for references on top of the raw pointer api, but not the other way around.

KrishnaSannasi added some commits Jun 24, 2019

removed `trait Project` and `trait PinProjectable`
They increased the scope of the RFC too much and were not critical for
for this RFC to be useful, so they were cut from the RFC.
@KrishnaSannasi

This comment has been minimized.

Copy link
Author

commented Jun 24, 2019

I removed Project and PinProjectable from the RFC and reduced to just field types and projections over raw pointers. I also added @ckaran's idea for inverse projections, reason being that project_unchecked is like add, wrapping_project is like wrapping_add, inverse_project_unchecked is like sub and inverse_wrapping_project is like wrapping_sub. So it made sense to add the inverse projections.

The PinProjectable trait was rather ill-formed and didn't work that well with a blanket impl for Unpin types.

@ckaran

This comment has been minimized.

Copy link

commented Jun 24, 2019

@KrishnaSannasi Do we really need the ability to project from raw pointers? I'm having a really hard time thinking up any situation where that would be necessary.

@CAD97
Copy link

left a comment

(I might have some more to say later but I wanted to go ahead and give you these notes:)

Show resolved Hide resolved text/0000-ptr-to-field.md Outdated
Show resolved Hide resolved text/0000-ptr-to-field.md Outdated
Show resolved Hide resolved text/0000-ptr-to-field.md Outdated
Show resolved Hide resolved text/0000-ptr-to-field.md Outdated
Show resolved Hide resolved text/0000-ptr-to-field.md Outdated

We need both `project_unchecked` and `wrapping_project` because there are some important optimization available inside of LLVM related to aliasing and escape analysis. In particular the LLVM `inbounds` assertion tells LLVM that a pointer offset stays within the same allocation and if the pointer is invalid or the offset does not stay within the same allocation it is considered UB. This behaviour is exposed via `project_unchecked`. This can be used by implementations of the `Project` trait for smart pointers that are always valid, like `&T` to enable better codegen. `wrapping_project` on the other hand will not assert `inbounds`, and will just wrap around if the pointer offset is larger than `usize::max_value()`. This safe defined behaviour, even if it is almost always a bug, unlike `project_unchecked` which is UB on invalid pointers or offsets.

This corresponds with `core::ptr::add` and `core::ptr::wrapping_add` in safety and behaviour. `project_unchecked` and `project` need to be implemented as intrinsics because there is no way to assert that the pointer metadata for fat pointers of `Field::Parent` and `Field::Type` will always match in general without some other compiler support. This is necessary to allow unsized types to be used transparently with this scheme.

This comment has been minimized.

Copy link
@CAD97

CAD97 Jun 24, 2019

So wait, how does this interact with unsized types exactly? This needs to be spelled out.

Since this is specifically to give vocabulary to the direct manipulation of the pointers, I suspect that this should require that the field pointer's metadata is derivable from the parent metadata. Note that this doesn't require equivalent metadata: a theoretical pointer with two metadatas could be split into two child fat pointers with one or the other metadata.

The intrinsics live in core::intrinsics, of course.

This comment has been minimized.

Copy link
@KrishnaSannasi

KrishnaSannasi Jun 25, 2019

Author

Note that this doesn't require equivalent metadata: a theoretical pointer with two metadatas could be split into two child fat pointers with one or the other metadata.

Actually, you can only have 1 unsized field on a type, and it must be the last field, so the metadata will be exactly the same.

Also, I haven't considered how custom DST will interact with this, but that still hasn't been accepted. I will add that as an unanswered question.

The intrinsics live in core::intrinsics, of course.

Yes, I will make it clear where everything lives in the next revision. I see now that it wasn't clear enough

Show resolved Hide resolved text/0000-ptr-to-field.md Outdated
@KrishnaSannasi

This comment has been minimized.

Copy link
Author

commented Jun 25, 2019

@ckaran it's not about the raw pointers themselves, but a foundation to build other apis. If we just provided references for example, we couldn't build an api for pretty must any other smart pointer. For references, this api is pretty useless because references already have projections. However, raw pointers don't already have projections and every other smart pointer can boil down to them, so it is the perfect foundation.

KrishnaSannasi and others added some commits Jun 25, 2019

Update text/0000-ptr-to-field.md
Co-Authored-By: Christopher Durham <cad97@cad97.com>
Update text/0000-ptr-to-field.md
Co-Authored-By: Christopher Durham <cad97@cad97.com>
updated wording based off of @CAD97's suggestions
updated `FiedDescriptor` type to make it fool-proof.

added example of inverse projection UB and invalid pointers
to complement projections.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.