RFC: RArrow Dereference for Pointer Ergonomics #3577

Open
wants to merge 10 commits into
base: master


@EEliisaa EEliisaa commented Feb 20, 2024

This RFC improves ergonomics for pointers in unsafe Rust. It adds the RArrow token as a single-dereference member access operator. x->field desugars to (*x).field, and x->method() desugars to (*x).method().

Before:

(*(*(*pointer.add(5)).some_field).method_returning_pointer()).other_method()

After:

pointer.add(5)->some_field->method_returning_pointer()->other_method()
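For readers who want to try the "Before" form, here is a minimal sketch that compiles on current Rust. The types `Element`, `Inner` and `Leaf` and the function names are hypothetical, invented only so that the chained expression type-checks; the RFC text does not define them. The proposed `->` spelling is not valid Rust today, so it appears only in a comment.

```rust
// Hypothetical types, invented only to make the chained expression type-check.
struct Leaf {
    value: u32,
}

impl Leaf {
    fn other_method(&self) -> u32 {
        self.value + 1
    }
}

struct Inner {
    leaf: *const Leaf,
}

impl Inner {
    fn method_returning_pointer(&self) -> *const Leaf {
        self.leaf
    }
}

struct Element {
    some_field: *const Inner,
}

/// Evaluates the "Before" expression with prefix dereferences.
/// Proposed spelling (not valid today):
///   pointer.add(5)->some_field->method_returning_pointer()->other_method()
///
/// # Safety
/// `pointer` must point to at least 6 live `Element`s whose nested pointers are valid.
unsafe fn before_form(pointer: *const Element) -> u32 {
    unsafe { (*(*(*pointer.add(5)).some_field).method_returning_pointer()).other_method() }
}

fn before_form_demo() -> u32 {
    let leaf = Leaf { value: 41 };
    let inner = Inner { leaf: &leaf };
    let elements: Vec<Element> = (0..6).map(|_| Element { some_field: &inner }).collect();
    // SAFETY: `elements` holds 6 live items and every nested pointer refers to a live local.
    unsafe { before_form(elements.as_ptr()) }
}
```

Calling `before_form_demo()` from `main` exercises the whole chain.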

Rendered

@EEliisaa EEliisaa changed the title RFC: Ergonomic Pointers with RArrow Dereference RFC: RArrow Dereference for Pointer Ergonomics Feb 20, 2024
@ChayimFriedman2

An alternative is to make a postfix dereference operator (v.*.field or something alike).

@traviscross traviscross added the T-lang Relevant to the language team, which will review and decide on the RFC. label Feb 21, 2024
@EEliisaa
Author

An alternative is to make a postfix dereference operator (v.*.field or something alike).

This is a good alternative. With .* even more expressions become ergonomic than with RArrow, in particular long ones that end with a dereference without a field or method access. As far as I can see, there are no grammar ambiguities that prevent this either. Most important is that some kind of non-prefix dereference operator exists. React with either:

  1. ❤️ - If you want .*
  2. 🎉 - If you want ->
  3. Both ❤️ and 🎉 if you want both.

@Lokathor
Contributor

Lokathor commented Feb 21, 2024

Clarification Question: Are you suggesting p.* = 3; for postfix "whole thing" assignment and p.*.field for postfix field accessing?

@EEliisaa
Author

EEliisaa commented Feb 21, 2024

Clarification Question: Are you suggesting p.* = 3; for postfix "whole thing" assignment and p.*.field for postfix field accessing?

Yes, p.* would be equivalent to (*p). Meaning:

p.* = 3; is equivalent to (*p) = 3;
p.*.field is equivalent to (*p).field
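As a rough illustration of the two equivalences, the sketch below writes them in today's prefix syntax. `Pair` and the function names are assumptions made up for this example; the proposed postfix forms appear only in comments because they do not parse today.

```rust
// Hypothetical struct used only for the field-access case.
struct Pair {
    field: i32,
}

/// Today's spelling of the proposed `p.* = 3;`.
///
/// # Safety
/// `p` must be valid for writes and properly aligned.
unsafe fn assign_whole(p: *mut i32) {
    unsafe { *p = 3 };
}

/// Today's spelling of the proposed `p.*.field`.
///
/// # Safety
/// `p` must be valid for reads and properly aligned.
unsafe fn read_field(p: *const Pair) -> i32 {
    unsafe { (*p).field }
}

fn postfix_equivalents_demo() -> (i32, i32) {
    let mut n = 0;
    let pair = Pair { field: 7 };
    // SAFETY: both pointers come from live, aligned locals.
    unsafe {
        assign_whole(&mut n);
        (n, read_field(&pair))
    }
}
```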

@Lokathor
Contributor

Then maybe p.*field (one less .) for field accessing, which is what I typoed while posting my question above. Easy to miss the little dot with all the rest of the punctuation.

@EEliisaa
Author

Then maybe p.*field (one less .) for field accessing, which is what I typoed while posting my question above. Easy to miss the little dot with all the rest of the punctuation.

With one less ., there would be an ambiguity between postfix .* and infix .*. The last dot is not a part of the operator.

@Lokathor
Contributor

When would you have a place followed by another place in an expression or statement?

@EEliisaa
Author

EEliisaa commented Feb 21, 2024

When would you have a place followed by another place in an expression or statement?

Never, it would not be a grammar ambiguity. It would be a readability ambiguity. It could also be confused with a.*b in C++.

@Lokathor
Contributor

Well, now we're into the realm of opinions I suppose.

p.*.field.*.field just seems like kinda too much.

Also I don't know C++ so I've got no idea what that a.*b thing would do in C++.

@EEliisaa
Author

EEliisaa commented Feb 21, 2024

Well, now we're into the realm of opinions I suppose.

p.*.field.*.field just seems like kinda too much.

Also I don't know C++ so I've got no idea what that a.*b thing would do in C++.

The RArrow operator doesn't have this problem. Perhaps this is a reason to have both postfix .* and infix ->?

@kennytm
Member

kennytm commented Feb 21, 2024

@Lokathor If you think there are too many dots I'd prefer a*.b rather than a.*b, the latter looks like it's dereferencing b.

(indeed as C++ is mentioned, a.*b is the pointer-to-member access operator where b is a pointer-to-member variable.)


Also, while I think it's not a concern in practice, the following is valid Rust today:

fn main() {
    dbg!(5.*-6.0);
}
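To spell out why that snippet is accepted today: the lexer reads `5.` as a float literal, so the expression is a multiplication of `5.0` by `-6.0`. A small sketch (the function name is an assumption for illustration):

```rust
/// Same token sequence as the snippet above, without `dbg!`.
fn float_times_negative() -> f64 {
    // Lexed as `5.` `*` `-6.0`: the float literal 5.0 multiplied by negative 6.0.
    5.*-6.0
}
```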

@Lokathor
Contributor

That wouldn't actually break, since 5 isn't an identifier

@kennytm
Member

kennytm commented Feb 21, 2024

in a.* the expression a doesn't need to be an identifier, a[i].*, a(x,y).*, a.5.* are all valid.

it certainly can't break in practice, just that I think the parser needs more special rules to distinguish a.5.* from 5.*, not really a big deal.

@clarfonthey
Contributor

p.*.field.*.field just seems like kinda too much.

I thought about this a lot earlier today and honestly, I disagree. For a while I was almost convinced of this as a reason why we should adopt the -> operator, but ultimately, the issue is not that -> would be nice, but that it's not enough.

If you have a long expression you want to dereference, it seems counter-intuitive that the dereferencing happens from left-to-right except the last one, which is placed at the very beginning. So, you'd probably want to add a .* at the very end to accomplish that. Alternatively, you could end with a postfix arrow, which just feels wrong.

Ultimately, .*. isn't that bad of an operator; you can type it pretty easily by doing periods with your right hand and shift+8 with your left hand, or by using a numpad. It looks a bit weird, but it makes the dereferencing abundantly clear in the middle of the expression (which is where the unsafety happens), whereas with arrows your brain kind of tends to gloss over them. (At least, mine does.)

So, I'm more in favour of postfix dereference than right-arrows, but I do think that it's important to explore why. It makes a lot of sense why C had them and still does, but I don't think that Rust should, especially with its focus on memory safety, since we want the dereferences to stick out in the middle of the code as places where bad things can happen.

@Lokathor
Contributor

If you have a long expression you want to dereference, it seems counter-intuitive that the dereferencing happens from left-to-right except the last one, which is placed at the very beginning.

If a->b is (*a).b then it's already in "dereferenced form".

@clarfonthey
Contributor

If a->b is (*a).b then it's already in "dereferenced form".

The point here is that (*a).b is possible with arrows, but not *(*a).b. In other words, you can do a.*.b.*.c.*.d.* but only *a->b->c->d.
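For comparison, here is how that final-dereference chain looks in current Rust. The structs `A`, `B` and `C` are hypothetical stand-ins for the `a`, `b`, `c`, `d` in the comment; the two proposed spellings are shown only as comments.

```rust
// Hypothetical chain of raw pointers: a -> b -> c -> d -> u8.
struct C {
    d: *const u8,
}
struct B {
    c: *const C,
}
struct A {
    b: *const B,
}

/// Today:          *(*(*(*a).b).c).d
/// With arrows:    *a->b->c->d        (last deref still prefix)
/// With postfix:   a.*.b.*.c.*.d.*    (reads left to right)
///
/// # Safety
/// Every pointer along the chain must be valid for reads and aligned.
unsafe fn read_through_chain(a: *const A) -> u8 {
    unsafe { *(*(*(*a).b).c).d }
}

fn chain_demo() -> u8 {
    let byte = 9u8;
    let c = C { d: &byte };
    let b = B { c: &c };
    let a = A { b: &b };
    // SAFETY: all pointers refer to live locals declared above.
    unsafe { read_through_chain(&a) }
}
```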

@kennytm
Member

kennytm commented Feb 21, 2024

For the original RArrow proposal, are these supported or not?

let a: *const [u8; 256];
(*a)[3];
// a.*[3];
// a->[3]; //?

let f: *const fn(u32) -> u32;
(*f)(5);
// f.*(5);
// f->(5); // ?

let o: *const Option<NonNull<u64>>;
(*o)?.as_ref().checked_add(7)?;
// o.*?.as_ref().checked_add(7)?;
// o->?->checked_add(7)?;
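The prefix forms in that snippet can be made runnable with a little scaffolding. This is only a sketch: the function names and the `double` helper are assumptions, and the last case returns the `Option` directly instead of applying a trailing `?`.

```rust
use std::ptr::NonNull;

// Hypothetical function to point at for the fn-pointer case.
fn double(x: u32) -> u32 {
    x * 2
}

/// Safety (all three): the pointer must be valid for reads and aligned.
unsafe fn index_through(a: *const [u8; 256]) -> u8 {
    unsafe { (*a)[3] } // proposed: a.*[3]
}

unsafe fn call_through(f: *const fn(u32) -> u32) -> u32 {
    unsafe { (*f)(5) } // proposed: f.*(5)
}

unsafe fn add_seven_through(o: *const Option<NonNull<u64>>) -> Option<u64> {
    // `(*o)?` copies the Option out and returns early on None;
    // `NonNull::as_ref` is itself unsafe and requires a valid pointee.
    unsafe { (*o)?.as_ref().checked_add(7) } // proposed: o.*?.as_ref().checked_add(7)
}

fn prefix_forms_demo() -> (u8, u32, Option<u64>) {
    let mut bytes = [0u8; 256];
    bytes[3] = 42;
    let f: fn(u32) -> u32 = double;
    let n = 10u64;
    let o = Some(NonNull::from(&n));
    // SAFETY: every pointer is derived from a live local in this function.
    unsafe { (index_through(&bytes), call_through(&f), add_seven_through(&o)) }
}
```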

@CraftSpider

I really like the idea of postfix dereference via .*, especially with the examples given by @kennytm - while trailing arrows could be allowed, at least to me the postfix star syntax feels cleaner, especially as the last operator in a sequence. a-> = 1 feels very odd, while a.* = 1 looks better. I'll also note I'm not generally in favor of adding new operators or syntax without good reason, but .* feels much more like just allowing an existing operator in a new way (think postfix match or similar - it's really just allowing *a to be written postfix)

@tgross35
Contributor

Existing similar things: with std::ops::Deref in scope, foo.deref() is an existing postfix equivalent to *foo for safe code. ptr.read() is postfix but copies the value. ptr.as_ref().unwrap_unchecked() is &*ptr. Of course a library solution can't provide place-ness.
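A short sketch of those existing library spellings, using only standard-library calls (`Deref::deref`, `<*const T>::read`, `<*const T>::as_ref`, `Option::unwrap_unchecked`). The function name is an assumption for illustration. Note that each call produces a reference or a copy, never a place that can be assigned to.

```rust
use std::ops::Deref;

fn library_postfix_forms() -> (i32, i32, i32) {
    let boxed = Box::new(1);
    // `foo.deref()` yields `&T`, the safe postfix counterpart of `&*foo`.
    let via_deref: &i32 = boxed.deref();

    let value = 2;
    let ptr: *const i32 = &value;
    // SAFETY: `ptr` comes from a live, aligned local that is not mutated.
    let (via_read, via_as_ref) = unsafe {
        // `ptr.read()` copies the pointee out; `ptr.as_ref()` returns Option<&T>,
        // so `as_ref().unwrap_unchecked()` plays the role of `&*ptr`.
        (ptr.read(), ptr.as_ref().unwrap_unchecked())
    };
    (*via_deref, via_read, *via_as_ref)
}
```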

A keyword like ptr.deref.foo looks nicer than ptr.*.foo IMO, but that is a new bag of worms (and more characters).

@joshtriplett
Member

joshtriplett commented Feb 29, 2024

👍 for the idea of postfix dereference.

.* is a reasonable choice:

  • Advantage: self-explanatory.
  • Advantage and disadvantage: uses familiar symbols (but that means it's also easy to mentally mis-lex in a few ways).
  • Minor disadvantage: it's two characters.

Another I've seen proposed for postfix dereference is ^: ptr^.method().

  • Advantage: one character
  • Advantage: won't be mistaken for anything else
  • Disadvantage: not self-explanatory
  • Disadvantage: unfamiliar to most people

@Lokathor
Contributor

I would strongly favor postfix ^

It's unfamiliar in the instant you first see it, but it feels like you learn it once and then you don't forget it.

@RalfJung
Member

RalfJung commented Feb 29, 2024

It does look a lot less noisy, yes.

A point worth considering: we don't have many ASCII characters left, is this a good enough use case to burn one of them? It might well be.

Are there parsing issues? ^ is also XOR. So ptr ^ .5 could be mistaken as XOR of ptr and a float value. Now what if ptr is a custom type that implements both Deref and BitXor<f32>? That seems nonsensical but then both parsings would even yield well-typed results I think?

@Lokathor
Contributor

Lokathor commented Feb 29, 2024

Case 1: The compiler will tell you that "float literals must have an integer part". You currently have to write it as ptr ^ 0.5 if you wanted to "xor with an f32", which seems a lot more difficult to misread (though still possible).

Case 2: Just playing around with it a bit, ops used with punctuation (eg: a^b instead of a.bitxor(b)) don't seem to trigger "deref and try again" logic when the impl is missing. You just immediately get the error.

@kennytm
Member

kennytm commented Feb 29, 2024

^ was used in Pascal and its derivatives because they do use this character to indicate pointer type (var p : ^Integer ; p := @v; p^ := 123). This is not the case for Rust though, which IMO would be quite confusing if used.

and again because ^ is already bitxor you have the same #3577 (comment) issue around prefix vs binary -.

fn main() {
    let p = &10;
    dbg!(p^ - 5);
}
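To make the two readings concrete, the sketch below evaluates both on current Rust. The function names are assumptions; the second function writes the hypothetical postfix reading `(p^) - 5` with today's prefix `*`.

```rust
/// What `p^ - 5` means today: bitwise XOR of `*p` with negative five
/// (std implements `BitXor<i32>` for `&i32`).
fn xor_reading() -> i32 {
    let p: &i32 = &10;
    p ^ -5
}

/// What a postfix-deref reading would mean: dereference, then subtract.
fn postfix_deref_reading() -> i32 {
    let p: &i32 = &10;
    *p - 5
}
```

Both readings type-check for `&i32`, so the choice between them would have to be made by the grammar rather than by the type checker.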

@matthieu-m

I feel like one extremely important point that is not being discussed here is the very desugaring.

Is desugaring x->field to (*x).field really a good idea in the first place?

The problem of *x is that it creates a reference to x, with all that entails:

  • x better not be null.
  • x better be well-aligned.
  • x better point to a sufficiently sized memory block.
  • x better refer to a live value.

And quite importantly... creating the reference to x better not step on another live borrow.

I can only speak from my own experience, but in general, if I could have a reference instead of a pointer, I would have a reference instead of a pointer. Instead, if I've got a pointer in my hands, it's because there's something special about it, and borrowing is quite often what's special.

Accidentally borrowing is terrible: it introduces UB. This goes against the very goals of this RFC: there's nothing ergonomic about introducing UB.

Which, at this point, makes me question the very motivating example:

pointer.add(5)->some_field->method_returning_pointer()->other_method()

Where is the // SAFETY comment here?

  • The add is not justified to be sound.
  • The ->some_field is not justified to be sound.
  • The ->method_returning_pointer() is not justified to be sound.
  • The ->other_method() is not justified to be sound.

And since you need to justify each and every step -- yes, really, that's the burden you took on when you decided to write unsafe code -- then you may as well break them down so it's clearer which justification refers to which step:

//  SAFETY:
//  - `pointer` points to a sequence of at least 6 elements since <...>.
let element = pointer.add(5);

//  SAFETY:
//  - `element` is not null and well aligned since `pointer` was.
//  - `element` points to a sufficiently sized memory block since `pointer` pointed to a sufficiently sized sequence.
//  - `element` points to a live value since <...>.
//  - `element` can be borrowed immutably since <...>.
let element = &*element;

//  SAFETY:
//  - `element.some_field` is not null and well aligned since <...>.
//  - `element.some_field` points to a sufficiently sized memory block since <...>.
//  - `element.some_field` points to a live value since <...>.
//  - `element.some_field` can be borrowed immutably since <...>.
let some_field = &*element.some_field;

let pointer = some_field.method_returning_pointer();

//  SAFETY:
//  - `pointer` is not null and well aligned since <...>.
//  - `pointer` points to a sufficiently sized memory block since <...>.
//  - `pointer` points to a live value since <...>.
//  - `pointer` can be borrowed immutably since <...>.
let thing = &*pointer;

thing.other_method()

And I think we can argue that once due diligence is made, &* vs -> is the least of our worries.


I note that there's value in projection because it enables navigating the fields without forming intermediate references which could potentially blow up in our faces.

@RalfJung
Member

The problem of *x is that it creates a reference to x, with all that entails:

I am not sure what you mean, but it doesn't create a reference. It creates a place. The requirements you state only apply if the place is later turned into a reference, but that may or may not happen.
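One way to see the place/reference distinction in current Rust is `std::ptr::addr_of!`, which takes the address of a place without ever forming a reference. The sketch below uses a hypothetical packed struct, where the field is unaligned and a reference to it would not be permitted, yet the place `(*p).value` can still be addressed and read.

```rust
use std::ptr::addr_of;

// Hypothetical layout: `value` sits at offset 1, so it is not 4-byte aligned.
#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32,
}

fn read_unaligned_field() -> u32 {
    let data = Packed { tag: 1, value: 0xDEAD_BEEF };
    let p: *const Packed = &data;
    // `(*p).value` is a place expression. `addr_of!` turns that place into a raw
    // pointer directly, so the alignment rules for references never come into play.
    let field_ptr: *const u32 = unsafe { addr_of!((*p).value) };
    // SAFETY: `field_ptr` points into the live local `data`; the read is
    // explicitly unaligned.
    unsafe { field_ptr.read_unaligned() }
}
```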

I note that there's value in projection because it enables navigating the fields without forming intermediate references which could potentially blow up in our faces.

Again, this should be "intermediate places".
Other than that I think this is basically rephrasing this earlier argument. It hasn't been picked up in follow-on discussion much.

I agree that the ~ operator is valuable even if this RFC gets accepted, but postfix deref seems valuable and aligned with modern Rust even if ~ is a thing. (Note that the discussion moved away from -> and towards postfix deref.)

@Lokathor
Contributor

Also, I don't believe that anyone is suggesting that p-> or p.* or p^ or any other syntax would be a safe operation. So, you'd still have it within an unsafe block and you can still put every single safety comment you want on that block or within that block or wherever you like.

Personally, I think you're overdoing it quite a bit with a list of comments on every single access.

@EEliisaa
Author

EEliisaa commented Mar 1, 2024

The problem of *x is that it creates a reference to x

No, it does not. It creates a place.

If I could have a reference instead of a pointer, I would have a reference instead of a pointer

Hence the term irreducible encapsulation.

Accidentally borrowing is terrible: it introduces UB. This goes against the very goals of this RFC: there's nothing ergonomic about introducing UB.

You got it backwards. Since it does not create a reference, this RFC reduces UB.

@steffahn
Member

steffahn commented Mar 1, 2024

I agree that the ~ operator is valuable even if this RFC gets accepted, but postfix deref seems valuable and aligned with modern Rust even if ~ is a thing. (Note that the discussion moved away from -> and towards postfix deref.)

Interesting idea! Combining the two, one could go as far as to lint against any use-case of deref on pointers that does not claim access to the whole pointed-to value. Assuming all those cases could then use ~ instead.

That way .* on a raw pointer always means about as much as taking a reference to the whole pointed-to value.[1] The only remaining implicitness then would be whether that by-reference access is immutable or mutable.

Footnotes

  1. Making a copy (powered by Copy trait) of the value falls under access-by-immutable reference; AFAICT the safety conditions should be the same. Similarly, assigning to the value falls under access-by-mutable reference. Anything else you could do to a place?

@matthieu-m

The problem of *x is that it creates a reference to x, with all that entails:

I am not sure what you mean, but it doesn't create a reference. It creates a place. The requirements you state only apply if the place is later turned into a reference, but that may or may not happen.

Thanks for the correction. I knew of places but I typically just immediately turn them into references so didn't think of the distinction.

I tried searching, but could not find, the safety requirements for turning a pointer into a place. Are those the requirements of dereferencing a pointer? (So everything I listed but borrowing)

You got it backwards. Since it does not create a reference, this RFC reduces UB.

Unless, of course, -> (or whatever) is used to call a method, right?

Not creating a reference is nice. Though I do note there's likely still quite a laundry list of pre-conditions which need to be validated, regardless.

@Lokathor
Contributor

Lokathor commented Mar 1, 2024

A place isn't quite an operation of its own. Making a place is one step in reading or writing, in which case either the reading or writing rules apply, for example.

EDIT: also, yes, calling a method can create a reference depending on the method used. However, even using self methods on a value behind a pointer would need to read the pointer to get the self value so there's not a way for methods to fully safely be used with pointers or anything like that.
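A small sketch of the two method-call cases described above, in today's syntax. `Counter` and its methods are hypothetical names chosen for this example.

```rust
// Hypothetical Copy type with one `&self` method and one by-value `self` method.
#[derive(Clone, Copy)]
struct Counter {
    n: u32,
}

impl Counter {
    fn by_ref(&self) -> u32 {
        self.n
    }
    fn by_value(self) -> u32 {
        self.n
    }
}

/// # Safety
/// `p` must be valid for reads, aligned, and not mutably aliased during the call.
unsafe fn call_both(p: *const Counter) -> (u32, u32) {
    unsafe {
        (
            // Auto-ref: the place `*p` is implicitly borrowed as `&Counter`,
            // so all the requirements for a shared reference apply.
            (*p).by_ref(),
            // By-value `self`: the whole pointee is read (copied) out of the place.
            (*p).by_value(),
        )
    }
}

fn method_call_demo() -> (u32, u32) {
    let c = Counter { n: 4 };
    // SAFETY: `&c` is a live, aligned local.
    unsafe { call_both(&c) }
}
```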

@matthieu-m

Is there any way to apply -> (or .* or whatever) to a user-defined type?

I tend not to use raw pointers a lot, because I like to leverage types to enforce invariants. At the very least, this means using NonNull<T>, and signalling potential nullity via Option<NonNull<T>>.

I would expect the ability to define -> (or .*, ...) on such user-defined types.

Is there a way to represent places in the type system so that writing the function is possible?


Otherwise, as mentioned by @steffahn, we may be better off having two operators:

  • A projection operator -- for which -> would make sense -- which goes from *const T to *const U or NonNull<T> to NonNull<U>, etc... no dereference occurs.
  • The regular Deref and DerefMut operators, which form a reference, and can simply be invoked either via prefix or postfix syntax (or regular method calls).

This way, custom types can benefit from the syntax sugar instead of being second-class, and it's clear to the reader whether a reference is formed, or not.
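As a rough idea of what such a projection looks like when written by hand today, here is a sketch for `NonNull`. `Pair` and `project_y` are hypothetical names, not an existing or proposed API. The expression `(*raw).y` appears syntactically, but `addr_of_mut!` only computes the field address: no memory is read and no reference is formed.

```rust
use std::ptr::{addr_of_mut, NonNull};

// Hypothetical struct to project into.
struct Pair {
    x: u32,
    y: u32,
}

/// Projects `NonNull<Pair>` to `NonNull<u32>` for the field `y`.
///
/// # Safety
/// `p` must point to an allocation large enough to hold a `Pair`.
unsafe fn project_y(p: NonNull<Pair>) -> NonNull<u32> {
    // The field address of a non-null, in-bounds base pointer is itself non-null.
    unsafe { NonNull::new_unchecked(addr_of_mut!((*p.as_ptr()).y)) }
}

fn projection_demo() -> u32 {
    let mut pair = Pair { x: 1, y: 2 };
    let p = NonNull::from(&mut pair);
    // SAFETY: `p` points to the live local `pair`, and nothing else accesses
    // `pair` until the write below has finished.
    unsafe {
        let y = project_y(p);
        y.as_ptr().write(5);
    }
    pair.x + pair.y
}
```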

@RalfJung
Member

RalfJung commented Mar 2, 2024

Is there any way to apply -> (or .* or whatever) to a user-defined type?

.* is exactly the same as prefix *. So, it calls Deref/DerefMut as usual.
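For a user-defined type, that means implementing the existing traits is all that would be needed. A minimal sketch with a hypothetical `Handle` wrapper, written with prefix `*` since the postfix form does not parse today:

```rust
use std::ops::{Deref, DerefMut};

struct Point {
    x: i32,
}

// Hypothetical smart pointer; any type implementing Deref/DerefMut behaves the same.
struct Handle(Box<Point>);

impl Deref for Handle {
    type Target = Point;
    fn deref(&self) -> &Point {
        &self.0
    }
}

impl DerefMut for Handle {
    fn deref_mut(&mut self) -> &mut Point {
        &mut self.0
    }
}

fn user_defined_deref_demo() -> i32 {
    let mut h = Handle(Box::new(Point { x: 1 }));
    // Writing through `*h` goes through `DerefMut::deref_mut`.
    // Proposed postfix spelling: h.*.x = 3;
    (*h).x = 3;
    // Reading through `*h` goes through `Deref::deref`.
    (*h).x + 1
}
```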

Something like DerefRaw would be a completely separate RFC, that has basically nothing to do with this RFC.

@ibkevg

ibkevg commented Mar 4, 2024

Unsafe code is frequently used to interface to C and adopting -> reduces the impedance mismatch with C syntax.

People that work on low level Rust code will typically have codebases that include significant amounts of C code. I think the two will have to coexist for a very long time yet. Making syntax similar where it is easy to do so will really help people in this situation.

@RalfJung
Member

RalfJung commented Mar 4, 2024

One of the things that makes Rust an interesting option is not being like C in a bunch of dimensions, including syntactic ones (e.g, not copying ? :). Furthermore, one of the goals of Rust is to bring more people into systems programming, people not already familiar with C/C++. So this is really not a very convincing argument to me. The syntax should be able to stand on its own, without arguments like "it's how other languages do it" -- and IMO -> does not pass that test. It is strictly less expressive than postfix deref.

In contrast, making more things postfix has been one of the things that have consistently worked out well for Rust (postfix ?, postfix await, adding postfix methods to raw pointers to replace other operations, and soon probably postfix match as well). Postfix deref fits into that story very well.

It's not always a good idea to just copy things from other languages, even if they work well in those languages. Unsafe code is also frequently used to do things that have nothing to do with C, after all. C programmers should have no problem learning to write .*. or ^. instead of ->.

@RalfJung
Member

RalfJung commented Mar 4, 2024

When we encounter design choices that could go either way,

This is not one of these though IMO, postfix deref has numerous advantages over -> as has been stated above by various people.

@Lokathor
Contributor

Lokathor commented Mar 4, 2024

At the risk of excessive restatement: the specific biggest advantage of a general postfix deref is that it improves things not just for pointers.

References and types implementing Deref could benefit from a general postfix deref improvement, but they wouldn't be able to use ->, or at least so far it's only been proposed as a pointer operator.

@RalfJung
Member

RalfJung commented Mar 4, 2024

To me the biggest point is compositionality. -> fuses deref and field access / method calls into a single syntax, which violates the idea that an operation should be simple and do one thing and then compose with others to give rise to complex behavior. Given that compositional alternatives have been proposed, this needs strong motivation. "C does it" is far from being strong motivation.

Rust has made a number of design decisions, such as adopting curly braces, precisely for the purpose of creating something that doesn't seem too foreign to users of existing languages and in recognition I think that perfection can be the enemy of good.

Rust has never shied away from diverging from prior language when improvements were possible. I mentioned the : ? operator above, you ignored that argument. So familiarity on its own is not an argument, one also needs to argue why the operator is actually a good one -- we don't want to copy mistakes from C just because people are familiar with them. -> does not pass that bar IMO. You haven't even tried to argue for the merits of ->, you argued solely based on familiarity, which does not increase my confidence that -> has merit.

Curly braces were picked not just because they are familiar, but also because the alternatives (significant whitespace, "begin ... end", not sure what else was considered) were considered worse. Familiarity can never stand on its own as an argument -- that would just lead to us repeating past mistakes rather than learning from other languages' mistakes.


This discussion quickly reminds me why I never participate in RFCs that need new syntax. It's much less exhausting to make deep changes to the operational semantics of Rust with far-reaching consequences for all unsafe code, than to add a single new piece of syntax. Don't expect an RFC for the ~ operator from me, someone else will have to write that if it's ever supposed to happen. As for this RFC, I made my points and will leave the remaining bikeshedding to others. :)

@SOF3

SOF3 commented Mar 5, 2024

If we want postfix operators, shouldn't the same be done for other unary operators too, like .!. and .-.? (ok I agree that .-. makes everyone literally .-., but the negation part could be considered together if that's the case)

@kennytm
Member

kennytm commented Mar 5, 2024

those are spelled .not() and .neg().

@SOF3

SOF3 commented Mar 5, 2024

There is .deref() as well in that case

@Lokathor
Contributor

Lokathor commented Mar 5, 2024

Not quite, because deref cannot return "a place", because that's not a real full thing in rust, it's only one step in something else.
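A quick sketch of the difference, using only the standard operator traits. The function name is an assumption. `.not()` and `.neg()` return plain values, so the method form is a full replacement for the operator; `.deref()` and `.deref_mut()` return references, so assigning to the pointee still needs a prefix `*` in front.

```rust
use std::ops::{Deref, DerefMut, Neg, Not};

fn method_spellings_demo() -> (bool, i32, i32) {
    // Value-producing unary operators have complete postfix method spellings.
    let flipped = true.not(); // same as `!true`
    let negated = 5i32.neg(); // same as `-5i32`

    let mut boxed = Box::new(1);
    // `deref_mut()` returns `&mut i32`, not the place itself,
    // so the assignment has to re-introduce prefix `*`.
    *boxed.deref_mut() = 3;
    let copied = *boxed.deref();

    (flipped, negated, copied)
}
```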

@RalfJung
Member

TIL that Herb Sutter's cpp2 experiment uses postfix *: variable*.field.

@bjorn3
Member

bjorn3 commented Mar 10, 2024

-> fuses deref and field access / method calls into a single syntax

For pin projections that is exactly what you want as you must not expose the unpinned place to the user to avoid unsoundness.

@ibkevg

ibkevg commented Mar 10, 2024

TIL that Herb Sutter's cpp2 experiment uses postfix *: variable*.field.

Interesting. He briefly justifies this in his GitHub design note on postfix operators:

"When you have postfix *, there's no need for a separate -> operator, because that is naturally spelled *.. And in fact this just embraces what has already been true since the 1970s in C for built-in types, a->b already means (*a).b, and now we can write it without the parens as simply a*.b."

Seems like postfix deref is pretty widely used, and I've learned Zig and Ada both have it as well.

In the case of Rust and the expression (*ptr).field, is there really any ambiguity if the compiler were to accept ptr.field as meaning the same thing? Basically implicit deref instead of introducing a new ptr/field operator such as -> or *.? This is similar to what is done in safe code, and I believe both Zig and Ada take this approach as well.

@kennytm
Member

kennytm commented Mar 11, 2024

TIL that Herb Sutter's cpp2 experiment uses postfix *: variable*.field.

According to https://github.com/hsutter/cppfront/wiki/Design-note:-Postfix-unary-operators-vs-binary-operators the new language currently disambiguates a*-b with significant whitespace and lookahead:

  • a *-b means (a) * (-b) because there is a space before the *
  • a*-b means (a*) - (b) because there is no space before the *, however
  • a*(-b) means (a) * (-b) because the * is followed by a (

the last rule seems to make it impossible to deref and call a pointer to function without an extra pair of parentheses i.e. (a*)(-b).

@matthieu-m

I had not realized Zig had postfix dereference already, as the .* operator.

For reference: https://ziglang.org/documentation/master/#Pointers (see code samples).

@LunarLambda

+1 for either *. or ^.. Postfix syntaxes have generally worked out great for Rust, I agree.
