Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC for a match based surface syntax to get pointer-to-field #2666

Open
wants to merge 10 commits into
base: master
from

Conversation

Projects
None yet
6 participants
@HeroicKatora
Copy link

commented Mar 22, 2019

Extend match syntax (*) and patterns (raw const, raw mut) by support for a limited set of operations for pointers, which involve only address calculation and not actually reading through the pointer value. Make it possible to use these matches to calculate addresses of fields even for repr(packed) structs and possibly unaligned pointers where an intermediate reference must not be created.

Rendered

This should be read as complementary surface syntax to the MIR changes proposed in #2582 . Now, somewhat coincidentally, the author @RalfJung decided to rewrite the other PR to also include a surface syntax. I was unaware of this parallel process . If desired, one could read the match/pattern syntax as complementary to the place expressions proposed there but since the idea was floating around for a while, I address the downsides in comparison to match in Rationale and Alternatives already.

/// * be properly aligned.
unsafe fn get_if_init<'a>(w: *const Weird) -> Option<&'a Weird> {
let Weird { raw const a, ..} = w;
match core::ptr::read(a) {

This comment has been minimized.

Copy link
@oli-obk

oli-obk Mar 22, 2019

Contributor

why would this read a integer? Isn't a of type *const bool?

This comment has been minimized.

Copy link
@HeroicKatora

HeroicKatora Mar 22, 2019

Author

You're right this needs a cast first, core::ptr::read(a as *const u8)

This comment has been minimized.

Copy link
@HeroicKatora

HeroicKatora Mar 22, 2019

Author

Thank you for spotting, fixed it.

}
```

Note that pointer binding mode and pointer pattern requires `unsafe`, even when

This comment has been minimized.

Copy link
@oli-obk

oli-obk Mar 22, 2019

Contributor

There have been multiple references to these unsafe pointer patterns, but no examples or further explanations. What are they and why are they necessary? Aren't raw mut or raw const patterns sufficient?

This comment has been minimized.

Copy link
@HeroicKatora

HeroicKatora Mar 22, 2019

Author

Pointer patterns are the counter part of reference patterns, necessary for disambiguation in some special cases. They use * <subpattern>, paralleling & <subpattern> and pointer binding mode automatically adds the top-level pointer pattern, the same as the reference pattern implied by reference binding mode. I'll add an example showing the necessity but it boils down to being able to match a pointer by-value and its content with raw pattern, the former being necessary for backwards compatibility.

This comment has been minimized.

* `raw (const|mut) identifier`; allowed for field bindings and identifier bindings.
These are allowed in the grammar where `ref? mut? identifier` is allowed
currently. For this purpose `raw` is a contextual keyword.
* `* <subpattern>`; to match a pointer not by value but to additionally use

This comment has been minimized.

Copy link
@Nemo157

Nemo157 Mar 28, 2019

Contributor

Is there a reason to have this be a generic * pattern instead of separate *const, *mut patterns?

This comment has been minimized.

Copy link
@HeroicKatora

HeroicKatora Mar 28, 2019

Author

Not really, and it does seem better to stay consistent with &/&mut.

This comment has been minimized.

Copy link
@rpjohnst

rpjohnst Mar 29, 2019

Using *const/*mut isn't really consistent with &/&mut in a meaningful way, AFAICS. Patterns follow their corresponding expression syntax, and while the way to construct a &T/&mut T is &x/&mut x, the way to construct a *const T is not *const x.

This comment has been minimized.

Copy link
@Nemo157

Nemo157 Mar 29, 2019

Contributor

So.....

match (0 as *mut usize) {
    &mut raw const z as *mut usize => (),
}

(definitely not serious).


It occurs to me that I mostly think about patterns following the type syntax rather than expression syntax, which is likely why I made this comment. It just works out that those are the same most of the time because of struct expressions following the type syntax.

This comment has been minimized.

Copy link
@HeroicKatora

HeroicKatora Mar 31, 2019

Author

Agree with @Nemo157 here, I had always assumed it followed type syntax due to struct-patterns and enum-patterns. (Although that is kind of a weak argument, I really can't say for sure why that felt more natural. Maybe because it doesn't support any operators?). Also, enum variants:

// current nightly makes this possible, with
// #![feature(type_alias_enum_variants)]
enum Foo {
    A,
}

impl Foo {
    fn test(self) {
        // gives various errors on different other rust versions.
        let Self::A = self;
    }
}

Edit: But that's kind of besides the point.

Make pointer pattern const/mut qualification necessary
It's more consistent with reference patterns that way and there is no
inherent upside to not having to specify it that I know of.
/// * point to memory valid for the chosen lifetime `'a`
/// * be properly aligned.
unsafe fn get_if_init<'a>(w: *const Weird) -> Option<&'a Weird> {
let Weird { raw const a, ..} = w;

This comment has been minimized.

Copy link
@RalfJung

RalfJung Mar 30, 2019

Member

Does this auto-deref, or how does this typecheck?

Your example below with match (0 as *mut usize) does not look like auto-deref happens, but then shouldn't this be let *const Weird { raw const a, .. } = w?

This comment has been minimized.

Copy link
@HeroicKatora

HeroicKatora Mar 31, 2019

Author

match doesn't auto-deref in the ops::Deref sense of expressions. I'd rather view it as only adding reference patterns automatically where required, the implementation kind of agrees. I only suggest automatically wrapping into pointer patterns if required as well (and maybe only in pointer binding mode). Which pattern to add should never be ambiguous–in a top-down pass of the pattern after the rhs of the match is typed we know the required pattern kind–but probably deserves its own section anyways.

This comment has been minimized.

Copy link
@HeroicKatora

HeroicKatora Mar 31, 2019

Author

Your example below with match (0 as *mut usize) does not look like auto-deref happens

Automatically adding reference for pointers has the slightly unfortunate side-effect of colliding with the ability to match pointers by const value (i.e. core::ptr::null()), which does not exist for references afaik. The example was intended to show that disambiguation should be backwards compatible but didn't show automatically adding the pattern.

This comment has been minimized.

Copy link
@RalfJung

RalfJung Apr 15, 2019

Member

match doesn't auto-deref in the ops::Deref sense of expressions. I'd rather view it as only adding reference patterns automatically where required

I do not see how that makes any fundamental difference. The fact remains taht you can write a pointer deref and incur a memory access without ever typing *, such as in

fn foo(x: &bool) -> bool {
    match x {
        true => true,
        false => false,
    }
}

For raw pointers, we want to avoid this because accessing memory through them is unsafe and should only be done explicitly. In this sense, auto-deref on raw pointers and auto-&-pattern in match have the same concern.

This is not about ambiguity, this is about calling out to whoeever reads this code that a raw pointer is being dereferenced. I don't think this should compile:

fn foo(x: *const bool) -> bool {
    unsafe { match x {
        true => true,
        false => false,
    } }
}

This comment has been minimized.

Copy link
@HeroicKatora

HeroicKatora Apr 15, 2019

Author

That code would not work. While the pointer pattern is added automatically, it still does not allow any of the value-reading patterns to occur within it. So the value pattern true within in that example would desugar to *const true and then be denied on the grounds that a value pattern occurred inside a pointer pattern. I believe that the wording correctly captures this part of the mechanic but I could have misphrased it. Hence the distinction of auto-deref vs. adding a pattern, because the latter does not circumvent the pattern structure based checks.

This comment has been minimized.

Copy link
@RalfJung

RalfJung Apr 15, 2019

Member

That would indeed fix this concern. It wasn't clear to me from the RFC; there should be examples for this kind of rule.

@RalfJung

This comment has been minimized.

Copy link
Member

commented Mar 30, 2019

I can see good arguments for having pattern syntax on top of explicit pointer-taking syntax, just like we do for references. However, it would be rather strange IMO to have only pattern syntax and not also a direct way to express this -- that would be very inconsistent with the handling of references. I also think that would hurt teachability, as you'd have no more direct way to express what happens in these patterns.

Pattern matching is already hugely complicated with guards and their evaluation order concerns, implicit derefs, handling of empty types, and so on. It is arguably the most complex language construct, and in fact it is the main reason why I pretty much refuse to do anything formal about surface Rust or the HIR -- I prefer working on the MIR where pattern matching has been almost entirely lowered away. At least most of the things it does can be explained as being equivalent to other language constructs: everything except for actually determining the discriminant of an enum. This RFC would add "create raw pointers" as the second such match-only operation. I do not think that is a good idea.

In that light I find myself disagreeing with the arguments in the "Rationale and alternatives" section:

Place expressions are overloaded with auto-deref, custom indexing (core::ops::Index/core::ops::IndexMut), invoking arbitrary user code. A solution with place syntax needs to explicitely forbid these forms of place statements, both to disallow user code and avoid accidental reference intermediates. The new statements thus resembles a very different other statement.

There is nothing new about this, we already do not do auto-deref for raw pointers. The custom code entry points you mention all have to do with using, not creating pointers. Adding a new way to create (raw) pointers (not subject to auto-deref) does not interact with them in any interesting way, from what I can tell.

Also, we do not auto-deref for raw pointers in match either. Do you suggest we do? If yes, it will have to be restricted to raw patterns, and at that point you have the same almost-but-not-quite-the-same-as-references situation as place expressions do.

Do you have examples for what you are worried about here?

The initial dereferencing of the pointer necessary for a place expression (struct.field is implicitely (*struct).field for a reference argument struct) will not work with pointer arguments, which do no automatically dereference even in unsafe code (and arguably should not, outside &raw).

This is an argument for a ->-like operator, or maybe doing auto-deref in more situations (like when no actual access happens). I do not see how "limited raw pointer auto-deref" in match is any easier to understand than in place expressions.

Also, &raw mut (*x).field is still shorter and IMO more readable than match x { Struct { raw mut field, .. } => field }, even though the pattern uses auto-deref and the place expression does not. I expect in many cases these pointers will be used only once, so they should be ergonomic to use even without a let-binding.

It provides a clear pattern that extends to enum fields in packed structs, which are not absolutely not expressible in place syntax.

You can totally use &raw to take the address of a field of enum type. I assume you are talking about taking pointers into enums, but there a pattern matching syntax does not help either: you can only match on an enum after you initialized it, at which point you might as well create a reference and not just a raw pointer.

Or maybe I am misunderstanding what your point is here. An example would help.

@HeroicKatora

This comment has been minimized.

Copy link
Author

commented Mar 31, 2019

Also, &raw mut (*x).field is still shorter and IMO more readable than match x { Struct { raw mut field, .. } => field }, even though the pattern uses auto-deref and the place expression does not. I expect in many cases these pointers will be used only once, so they should be ergonomic to use even without a let-binding.

Maybe it should be pointed out specifically that it works for irrefutable bindings perfectly fine, and both pattern let and match take a place-expression as their input, not a value expression. Although it does not retain the auto-deref clarity for the operand, it is fine to use a pattern binding let ref _ = <place> or a match expression as an alternative to & <place> for getting a reference as well.

// Effectively same as `let r = &x.field`.
let ref r = x.field;

// Proposed pointer variant, no reference to potentially packed-struct-field `field`
let raw const ptr_to_field = x.field;

Edit: I've updated the introductory example to make use of this and explained its validity in the reference level section.

@HeroicKatora

This comment has been minimized.

Copy link
Author

commented Apr 1, 2019

I can see good arguments for having pattern syntax on top of explicit pointer-taking syntax, just like we do for references. However, it would be rather strange IMO to have only pattern syntax and not also a direct way to express this -- that would be very inconsistent with the handling of references.

Also in light of let ptr const field = , I generally agree with this and would also like an expression. However, I expect syntax questions to be much less controversial here¹. It is pretty clear which existing patterns to mirror in the design, mostly the keyword may be different. And, since you note that match and expr syntax should be complementary, I see no reason not to adapt the more straightforward one first when they achieve very similar things with the same basic means. Previously, tuples also had their expression extractors (with integer fields) added after pattern matching was possible.


¹ Short summary for the problems with pointer-from-place in expressions: In match, references are matched by reference patterns, and optionally inferred, for which they use the type sigils. The operation of creating one has a single purpose pattern with a contextual keyword. In expressions however, the type sigil for pointers is already in use for dereferencing but the reference sigil is reuse for reference taking. Also, since the reference wraps the place expression, safety and ergonomic concerns must be balanced when considering if the place expression must be valid regardless of such a wrapping, as taking out the place-subexpression results in stricter requirements on the validity of the involved memory than when it occurs inside an imagined raw-ptr-expression.

@dtolnay dtolnay added the T-lang label Apr 4, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.