RFC: new lifetime elision rules #141

Merged
merged 4 commits into from Jul 9, 2014
@aturon
aturon commented Jun 26, 2014

Rendered (draft)

text/

tracking issue: rust-lang/rust#15552

Note: the core idea for this RFC and the initial survey both came from @wycats.

@nrc nrc commented on the diff Jun 26, 2014
active/0000-lifetime-elision.md
+ input position and two lifetimes in output position.
+
+* For `impl` headers, input refers to the lifetimes appears in the type
+ receiving the `impl`, while output refers to the trait, if any. So `impl<'a>
+ Foo<'a>` has `'a` in input position, while `impl<'a> SomeTrait<'a> Foo<'a>`
+ has `'a` in both input and output positions.
+
+### The rules
+
+* Each elided lifetime in input position becomes a distinct lifetime
+ parameter. This is the current behavior for `fn` definitions.
+
+* If there is exactly one input lifetime position (elided or not), that lifetime
+ is assigned to _all_ elided output lifetimes.
+
+* If there are multiple input lifetime positions, but one of them is `&self` or
@nrc
nrc added a note Jun 26, 2014

I find this rule a bit surprising (the others make perfect sense). I can intuitively see the motivation that self ought to be privileged, but I can't really justify why that is so. Looking at the examples below, the ones using this rule took me a lot longer to grok.

@wycats
wycats added a note Jun 26, 2014

The rationale is that in several usage surveys, this was essentially the only pattern we saw when &self was involved.

I believe that the reason for this is that when you're borrowing something out of self, it makes sense to involve another ref for computation. In contrast, it's a very unusual pattern to borrow something out of a value as a method of some other object. It's just not really how people think about using methods and objects in general, so it doesn't happen (almost at all).

I suspect that in cases where this pattern could occur, people use standalone functions instead of methods.

@bstrie
bstrie added a note Jun 26, 2014

@wycats, what proportion of the cited 87% would be lost if this rule were not accepted? I don't personally object to it, but I can see how it's a bit more flimsy than the others, and I would be willing to live without it if the statistics bore it out.

@bill-myers
bill-myers added a note Jun 26, 2014

I think that it should use the lifetime of the first input parameter, regardless of whether it is self or not, and only if it is an elided lifetime.

This avoids issues with UFC and makes method and non-method functions work the same.

Supporting elision of lifetimes only in the return value when they are explicit on self seems a bad idea, since it is counterintuitive. Also, it doesn't work for multiple explicit lifetimes (e.g. &'a Block<'b>).

@nrc nrc commented on the diff Jun 26, 2014
active/0000-lifetime-elision.md
+To asses the value of the proposed rules, we conducted a survey of the code
+defined _in_ `libstd` (as opposed to the code it reexports). This corpus is
+large and central enough to be representative, but small enough to easily
+analyze.
+
+We found that of the 169 lifetimes that currently require annotation for
+`libstd`, 147 would be elidable under the new rules, or 87%.
+
+_Note: this percentage does not include the large number of lifetimes that are
+already elided with today's rules._
+
+The detailed data is available at:
+https://gist.github.com/aturon/da49a6d00099fdb0e861
+
+# Drawbacks
+
@nrc
nrc added a note Jun 26, 2014

Another drawback: I find full specification of lifetime parameters makes it easier to understand what is going on. Even today, I often write the lifetimes where they could be elided because I think it makes code easier to reason about if you can name things. If I have a lifetime error, the first thing I do is add explicit lifetimes wherever they are missing.

I get the impression I'm in the minority with this though.

To me, these extra rules trade off easier reading (and writing) when you don't need to think about lifetimes too much against greater cognitive overhead when you do have to think about them. I guess that since reading code is more common than debugging lifetime errors, this trade off is worthwhile. I certainly like the idea of reducing lifetime noise.

@huonw
The Rust Programming Language member
huonw added a note Jun 26, 2014

👍 to full specs making things clearer.

@wycats
wycats added a note Jun 26, 2014

@nick29581 It might be worth considering having the compiler optionally show you all of the inferred lifetimes when there are error messages that involve lifetimes: rustc --errors=expanded or something.

That said, I think the error message improvements in this proposal go a long way to making it obvious what has happened when you inappropriately elided a lifetime. Similar error message work around other lifetime errors would go a long way to improving the general ergonomics of explicit lifetimes as well, and we should work on that!

@bstrie
bstrie added a note Jun 26, 2014

@nick29581, that same argument can be made for type inference. Just like type inference, nothing is stopping you from being fully explicit with lifetimes if you deem it's better for readability.

@chris-morgan chris-morgan commented on an outdated diff Jun 26, 2014
active/0000-lifetime-elision.md
+
+Rust currently supports eliding lifetimes in functions, so that
+
+```rust
+fn print(s: &str);
+fn get_str() -> &str;
+```
+
+become
+
+```rust
+fn print<'a>(s: &'a str);
+fn get_str<'a>() -> &'a str;
+```
+
+The ellision rules work well for functions that consume references, but not for
@chris-morgan
The Rust Programming Language member
chris-morgan added a note Jun 26, 2014

s/ellision/elision/

@kballard

👍

@chris-morgan chris-morgan commented on the diff Jun 26, 2014
active/0000-lifetime-elision.md
+```
+
+As with today's Rust, the proposed elision rules do _not_ distinguish between
+different lifetime positions. For example, both `&str` and `Ref<uint>` have
+elided a single lifetime.
+
+Lifetime positions can appear as either "input" or "output":
+
+* For `fn` definitions, input refers to argument types while output refers to
+ result types. So `fn foo(s: &str) -> (&str, &str)` has elided one lifetime in
+ input position and two lifetimes in output position.
+
+* For `impl` headers, input refers to the lifetimes appears in the type
+ receiving the `impl`, while output refers to the trait, if any. So `impl<'a>
+ Foo<'a>` has `'a` in input position, while `impl<'a> SomeTrait<'a> Foo<'a>`
+ has `'a` in both input and output positions.
@chris-morgan
The Rust Programming Language member
chris-morgan added a note Jun 26, 2014

I think the word "for" is missing from the second example. It's not an obvious example of where the lifetimes are, either; it could be rewritten as the probably-fairly-nonsensical "impl<'a, 'b> SomeTrait<'a> for Foo<'b> has 'a in [the] output position and 'b in [the] input position".

(As for the “the”, I think that should be there in all these cases, or “an” as the case may be in some places. This affects much of the document.)

@huonw
The Rust Programming Language member
huonw commented Jun 26, 2014

I'm nervous about adding elision for output parameters since I'm slightly concerned that may make things less clear (a minor adjustment to a signature that otherwise compiles would make the compiler spew weird errors), but I am in favour of elision in input position in impl, that is:

impl BufReader { ... }
impl Reader for BufReader { ... }
impl Reader for (&str, &str) { ... }
@chris-morgan
The Rust Programming Language member

@huonw Do you mean output parameters in general, or just in impls?

@glaebhoerl
  • If there are multiple input lifetime positions, but one of them is &self or &mut self, the lifetime of self is assigned to all elided output lifetimes.

I don't like this rule. The other rules have the property that there's no other way the signature could possibly make sense: i.e., the desugaring is unambiguous. Here we're making an arbitrary choice. I don't think we should do that.

Subtlety for non-& types

There's an additional subtlety: lifetime parameters of & types are covariant. For other types, they may not be. For instance:

struct Callback<'s> {
    callback: fn(&'s str) -> int,
}

fn some_fn(cb: Callback) -> &str;

// Under proposed rules desugars to:
fn some_fn<'s>(cb: Callback<'s>) -> &'s str;

Here Callback has a contravariant lifetime parameter. And the desugaring doesn't make sense, because there's no way you can get something with a lifetime of 's out of a Callback<'s>; you can only "put one in". In other words, Callback's lifetime parameter is in an output position.

If you just take that into account when applying the rules, then I think they would keep working. But I'm not sure what the situation is with invariant or bivariant lifetime parameters, because I haven't thought about it yet.
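
The contravariance point can be checked in present-day Rust. The following is a minimal sketch, assuming `i32` in place of the 2014 `int`; `str_len` and `widen` are hypothetical names introduced only for illustration.

```rust
// `Callback<'s>` only ever consumes a `&'s str`, so `'s` is contravariant.
struct Callback<'s> {
    callback: fn(&'s str) -> i32,
}

// Hypothetical helper: any function over `&str` can serve as the callback.
fn str_len(s: &str) -> i32 {
    s.len() as i32
}

// Contravariance in action: a callback that accepts short-lived strings
// also accepts `'static` ones, so the lifetime may be lengthened here.
// For a covariant type such as `&'s str`, this direction would be rejected.
fn widen<'s>(cb: Callback<'s>) -> Callback<'static> {
    cb
}
```

Because nothing of lifetime `'s` can be read back out of a `Callback<'s>`, a desugaring that returns `&'s str` from it can only be satisfied by data that comes from elsewhere.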

@glaebhoerl

OK, so in plain English, I think the rule should be: If there's exactly one readable lifetime and N writable ones, all the writable lifetimes are assumed to be the same as the readable one. Lifetime parameters in covariant position are readable, in contravariant writable, invariant both, bivariant neither.

@wycats
wycats commented Jun 26, 2014

@huonw I think the proposed error messages will go a long way to avoid the "compiler spewing weird error messages", no?

@pcwalton

I was originally a bit nervous about this sort of thing, but now I have no objections.

I'm slightly more nervous about the self thing, but I'm fine with trying it and seeing how it goes. I think that the "suggest-a-lifetime" error messages that we now have make this sort of thing easier to deal with.

@steveklabnik steveklabnik commented on the diff Jun 26, 2014
active/0000-lifetime-elision.md
+```
+
+and for `impl` blocks:
+
+```rust
+impl<'a> Reader for BufReader<'a> { ... }
+```
+
+In the vast majority of cases, however, the lifetimes follow a very simple
+pattern.
+
+By codifying this pattern into simple rules for filling in elided lifetimes, we
+can avoid writing any lifetimes in ~87% of the cases where they are currently
+required.
+
+Doing so is a clear ergonomic win.
@steveklabnik
steveklabnik added a note Jun 26, 2014

This is the biggest part of this proposal for me. (well, combined with the data that shows that it is)

@steveklabnik steveklabnik commented on an outdated diff Jun 26, 2014
active/0000-lifetime-elision.md
+required.
+
+Doing so is a clear ergonomic win.
+
+# Detailed design
+
+## Today's lifetime elision rules
+
+Rust currently supports eliding lifetimes in functions, so that
+
+```rust
+fn print(s: &str);
+fn get_str() -> &str;
+```
+
+become
@steveklabnik
steveklabnik added a note Jun 26, 2014

"becomes". And isn't this backwards? To elide is to remove, so the ones with the rules become the ones without the rules.

@steveklabnik steveklabnik commented on the diff Jun 26, 2014
active/0000-lifetime-elision.md
+rarely used, newcomers may encounter error messages about lifetimes long before
+encountering lifetimes in signatures, which may be confusing. Counterpoints:
+
+* This is already the case, to some extent, with the current elision rules.
+
+* Most existing error messages are geared to talk about specific borrows not
+ living long enough, pinpointing their _locations_ in the source, rather than
+ talking in terms of lifetime annotations. When the errors do mention
+ annotations, it is usually to suggest specific ones.
+
+* The proposed error messages above will help programmers transition out of the
+ fully elided regime when they first encounter a signature requiring it.
+
+* When combined with a good tutorial on the borrow/lifetime system (which should
+ be introduced early in the documentation), the above should provide a
+ reasonably gentle path toward using and understanding explicit lifetimes.
@steveklabnik
steveklabnik added a note Jun 26, 2014

Yup, I care about this.

@steveklabnik

A big 👍 from me. If the vast majority of code is doing something a certain way, then it's a good basis for making a rule. This should eliminate a lot of what is effectively boilerplate, and a good lifetimes tutorial / better errors will assist in the pedagogy sense.

Also, if you like the lifetimes, you can keep writing them.

@aturon
aturon commented Jun 26, 2014

@glaebhoerl Great point about contravariance, which I hadn't thought about. I agree that a contravariant argument should not be considered as an input position.

Just to be clear, is the suggestion that contravariant positions swap the input/output distinction? (Which would be the typical type-theoretical thing to do.) Concretely, are you proposing that

fn some_fn(&self, cb: Callback) -> int;
fn other_fn(n: int) -> (&T, cb: Callback);

expands to

fn some_fn<'a>(&'a self, cb: Callback<'a>) -> int;
fn other_fn<'a>(n: int) -> (&'a T, cb: Callback<'a>);

The first case makes some sense, but the latter case is pretty surprising -- it would happen because the Callback's lifetime is considered an input position, and thus can establish the (sole) output position for &T.

We could also simply disallow eliding contravariant lifetimes, since it may be preferable to be explicit in those (rare) cases.

Finally, see @wycats's comment above re: the &self rule. It's not arbitrary: the &self parameter definitely plays a special role for methods, and the proposed rules are based on the most common patterns in the libstd corpus.

@bachm
bachm commented Jun 26, 2014

Just posting to express my support for this well-written RFC. With the proposed error messages there should be little confusion when a user first encounters unelidable lifetimes.

@glaebhoerl

the latter case is pretty surprising -- it would happen because the Callback's lifetime is considered an input position, and thus can establish the (sole) output position for &T.

Even thinking about this example makes my head hurt... I think the "logic" of it, as it were, is that when the caller of other_fn invokes the second component of the returned tuple, which is the Callback, with something of lifetime 'a, other_fn can then use that to "produce" the first component of the tuple, also of lifetime 'a? Obviously that couldn't physically work without a time machine.

One distinction that I noticed, and I'm not sure if it has significance, is that while the return type of a function f, and an argument of a function g which is f's parameter, are both output positions, f is required to return a value, but it's not required to call the callback g. Again, I'm not sure whether this has implications for how inference should work.

I basically agree with you that it seems reasonable-but-not-imperative to desugar your first example, but not so much the second one. I don't have any concrete rules in mind which might accomplish this.

Finally, see @wycats's comment above re: the &self rule. It's not arbitrary

To avoid getting caught up in debating the meaning of the word "arbitrary" (I wasn't assuming that you flipped a coin): For the first and second rules, there's only one way it can make sense. If the user were to explicitly annotate lifetimes, they would annotate the same ones we infer 100% of the time. For the third rule, there's more than one way it can make sense, and we'd be choosing to favor one of them. Even if our favoring rests on a stronger basis than a coin flip, I don't think this kind of "probably what you meant" inference is something we should be doing.

@aturon
aturon commented Jun 26, 2014

@glaebhoerl Thanks for the thoughtful comments.

My feeling about &self is that these rules are not inference, but rather shorthand: they are a systematic way of filling in what's been left off of a signature without looking at the body.

The rules are simple enough that it's easy to know, given the signature in your head, whether you can elide or not.

Put another way, the debate is whether

fn foo(&self, t: &T) -> &U;

is simply not allowed/usable as a signature, or whether it has a useful meaning based on the most common lifetime patterns. Once you know the rules, you know immediately that the above would expand into

fn foo<'a,'b>(&'a self, t: &'b T) -> &'a U;

and would only write the elided signature if that's what you wanted.

FWIW, I disagree that the other rules give the only sensible expansion. Not even today's rules do. If you write

fn bar(t: &T, u: &U);

you get distinct lifetimes for the two parameters. But it can also make sense for them to share the same lifetime, and some uses would require it. In that situation, you know you can't leave off the lifetimes, and you write an explicit signature. I think the same would be true with the &self rule.
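
To make the shorthand reading concrete, here is a small sketch in present-day Rust, where the `&self` rule behaves as described. `Table`, `find`, and `find_explicit` are hypothetical names; the two methods have the same signature, written once elided and once in full.

```rust
struct Table {
    rows: Vec<String>,
}

impl Table {
    // Elided form: under the `&self` rule the result borrows from `self`,
    // not from `key`.
    fn find(&self, key: &str) -> &str {
        self.rows
            .iter()
            .find(|row| row.starts_with(key))
            .map(|row| row.as_str())
            .unwrap_or("")
    }

    // The same signature with every lifetime written out.
    fn find_explicit<'a, 'b>(&'a self, key: &'b str) -> &'a str {
        self.find(key)
    }
}
```

Since the result is tied only to `self`, a caller can drop the `key` argument and keep using the returned slice.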

@bstrie bstrie commented on the diff Jun 26, 2014
active/0000-lifetime-elision.md
+talking about borrowed values:
+
+> This function's return type contains a borrowed value, but the signature does
+> not say which parameter it is borrowed from. It could be one of a, b, or
+> c. Mark the input parameter it borrows from using lifetimes,
+> e.g. [generated example]. See [url] for an introduction to lifetimes.
+
+This message is slightly inaccurate, since the presence of a lifetime parameter
+does not necessarily imply the presence of a borrowed value, but there are no
+known use-cases of phantom lifetime parameters.
+
+### For `impl`
+
+The error case on `impl` is exceedingly rare: it requires (1) that the `impl` is
+for a trait with a lifetime argument, which is uncommon, and (2) that the `Self`
+type has multiple lifetime arguments.
@bstrie
bstrie added a note Jun 26, 2014

Does this example arise today in any known Rust codebase?

@aturon
aturon added a note Jun 26, 2014

@bstrie I don't know of any cases offhand, which is why the error message here is probably not so important.

@bstrie bstrie commented on the diff Jun 26, 2014
active/0000-lifetime-elision.md
+
+## The impact
+
+To asses the value of the proposed rules, we conducted a survey of the code
+defined _in_ `libstd` (as opposed to the code it reexports). This corpus is
+large and central enough to be representative, but small enough to easily
+analyze.
+
+We found that of the 169 lifetimes that currently require annotation for
+`libstd`, 147 would be elidable under the new rules, or 87%.
+
+_Note: this percentage does not include the large number of lifetimes that are
+already elided with today's rules._
+
+The detailed data is available at:
+https://gist.github.com/aturon/da49a6d00099fdb0e861
@bstrie
bstrie added a note Jun 26, 2014

Of the 13% of functions which still require explicit lifetimes, do any seem particularly notable for their nonconformity to the usual patterns? It would also be really great if you could select one of these real-world functions and use it in the example error message above.

@aturon
aturon added a note Jun 26, 2014

Almost all of the remaining cases are situations like:

impl<'a> AsciiCast<&'a[Ascii]> for &'a [u8] {
    unsafe fn to_ascii_nocheck(&self) -> &'a[Ascii] { ... }
    ...
}

where the impl involves types with lifetimes, and the fns within refer to those lifetimes directly. That counts against us in two ways:
1. The impl header has to be annotated so that you can name the lifetime, even though it would otherwise follow the standard pattern, and
2. The fn definitions have to be annotated to use the outer lifetime.

Note that this kind of example does not require an annotation according to the rules (so you wouldn't get an annotation error if you elided the lifetime). Rather, the annotation is needed to go beyond the patterns provided by the rule.
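
A reduced sketch of that situation in present-day Rust, using a hypothetical `Bytes` type rather than the `AsciiCast` impl: both methods compile, but only the annotated one lets the result outlive the receiver.

```rust
struct Bytes<'a> {
    data: &'a [u8],
}

impl<'a> Bytes<'a> {
    // Elided output: the result is tied to the borrow of `self`.
    fn view(&self) -> &[u8] {
        self.data
    }

    // Output names the outer `'a`: the result may outlive this `Bytes` value.
    fn view_outer(&self) -> &'a [u8] {
        self.data
    }
}
```

Eliding is not an error here; it simply yields the shorter of the two possible lifetimes, which is why the annotation is needed to go beyond the rule.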

@aturon
aturon added a note Jun 26, 2014

@bstrie The other predominant case is:

fn difference<'a>(&'a self, other: &'a HashSet<T, H>) -> SetAlgebraItems<'a, T, H>;

where the two input lifetimes are required to match.

@glaebhoerl Take note -- this is a case where even the rules for input positions don't give you what you want.
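
A smaller stand-in for the `difference` signature, sketched in present-day Rust with a hypothetical `Tags` type: the result may come from either input, so both inputs must share one named lifetime.

```rust
struct Tags {
    items: Vec<String>,
}

impl Tags {
    // Eliding here would apply the `&self` rule and tie the result to `self`
    // alone, so the branch returning `other.items` would be rejected.
    fn larger<'a>(&'a self, other: &'a Tags) -> &'a [String] {
        if self.items.len() >= other.items.len() {
            self.items.as_slice()
        } else {
            other.items.as_slice()
        }
    }
}
```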

@bstrie bstrie commented on the diff Jun 26, 2014
active/0000-lifetime-elision.md
+
+Another pattern that sometimes arises is types like `&'a Foo<'a>`. We could
+consider an additional elision rule that expands `&Foo` to `&'a Foo<'a>`.
+
+However, such a rule could be easily added later, and it is unclear how common
+the pattern is, so it seems best to leave that for a later RFC.
+
+## Lifetime elision in `struct`s
+
+We may want to allow lifetime elision in `struct`s, but the cost/benefit
+analysis is much less clear. In particular, it could require chasing an
+arbitrary number of (potentially private) `struct` fields to discover the source
+of a lifetime parameter for a `struct`. There are also some good reasons to
+treat elided lifetimes in `struct`s as `'static`.
+
+Again, since shorthand can be added backwards-compatibly, it seems best to wait.
@bstrie
bstrie added a note Jun 26, 2014

Agreed, I'm fine with leaving structs as they are.

@bstrie
bstrie commented Jun 26, 2014

Above I drew a comparison between lifetime elision and type inference, and noted that people who choose to be explicit are still welcome to manually annotate lifetimes. However, there is one thing that would support the people who make such a decision and improve teachability for newcomers: make the --pretty typed compiler flag annotate the elided lifetimes just as it annotates the inferred types (or you could make it an entirely separate flag, I suppose).

@rkjnsn
rkjnsn commented Jun 26, 2014

Quick question:

Would it be feasible to handle the multiple-input case by having something like

fn frob(s: &str, t: &str) -> &'t str;

expand to

fn frob<'a, 'b>(s: &'a str, t: &'b str) -> &'b str;

While such a shorthand would be mostly orthogonal to the elision rules of this RFC, I bring it up because it seems like it could impact whether we want to treat self specially (the third rule of the RFC), since one would be able to write

fn args<T:ToCStr>(&mut self, args: &[T]) -> &'self mut Command

Also, I realize the lookup rules would take some consideration if this were to be implemented, since lifetimes and parameter names are currently in different namespaces.

@glaebhoerl

@aturon You're right.

Now we have the interesting situation that you've shown that my stated arguments against "the self rule" are invalid, yet, for some reason, this hasn't convinced me to like it. Apparently, my stated arguments were not the real reason why it bothers me. When your subconscious is telling you something is wrong, it doesn't necessarily go into great detail about why, or which part...

I think a large part of it is because of the fact that I don't think we should semantically/syntactically distinguish the self argument in the first place, or even necessarily have a self keyword at all. (I have a proposal to this effect which I might hopefully have time to write down at some point in the next 5,000 years.) And here we're proposing to distinguish it in an additional way. When you write, "If there are multiple input lifetime positions, but one of them is &self", I read, "If there are multiple input lifetime positions, but one of them is the first argument, or is called "self""... I mean, maybe it holds up statistically, but statistically speaking, there are two popes per square kilometer in Vatican City. (Or currently, I suppose, four.)

@krdln
krdln commented Jun 27, 2014

@bstrie
I think that not only should --pretty typed reveal all lifetimes, but each compiler error involving lifetimes should also print the fully annotated function signature. This is a problem even now, when the compiler reports errors involving unnamed lifetimes.

@aturon
aturon commented Jun 27, 2014

@rkjnsn Some proposals along these lines have been made in the comments on #134 and it might be a reasonable design. However, I'd like to separate the question of when you need to write lifetimes from how you write the lifetimes. (I think we can make improvements on both.)

As I mentioned above, I think of the debate around the &self rule as being whether

fn foo(&self, t: &T) -> &U;

should be an error, or work usefully as shorthand based on the most common patterns.

@steveklabnik

I'd like to leave a 👍 from this Reddit thread, from a Rust newbie: http://www.reddit.com/r/rust/comments/298j3y/question_about_lifetime_parameters/

@aturon
aturon commented Jun 27, 2014

@glaebhoerl The counterpoint is that, like it or not, self is special in today's Rust, in conjunction with the special treatment of . for autoborrowing and the like. I think the proposed design fits current Rust idioms well. We could certainly revisit the rules if our general treatment of self changes.

@erickt
erickt commented Jun 27, 2014

What about the case where a struct has one lifetime, and a method has another? For example:

struct Foo<'a> {
    x: &'a int,
}

impl<'a> Foo<'a> {
    fn bar<'b>(&'b mut self) -> &'b int {
        self.x
    }
}

fn main() {}

While I think that 'a should unify with 'b, I swear I've seen the case where they aren't identical. Unfortunately I can't think up a good example demonstrating that.

@aturon
aturon commented Jun 27, 2014

@erickt Using the proposed rules, you could elide all the lifetimes in that example impl and fn. The rules for methods do not take into account any enclosing impl lifetimes.

But there are examples where the methods in an impl do talk about the enclosing lifetime, e.g. https://github.com/rust-lang/rust/blob/master/src/libstd/ascii.rs#L203-L207 and these are the primary cases where annotation is required under the proposed rules.
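
For reference, here is @erickt's example with every lifetime elided, as a sketch in present-day Rust. Two assumptions: `i32` stands in for the 2014 `int`, and the `impl` header uses the `'_` placeholder, which is how current Rust spells an elided lifetime in that position.

```rust
struct Foo<'a> {
    x: &'a i32,
}

impl Foo<'_> {
    // Expands to `fn bar<'b>(&'b mut self) -> &'b i32`; the enclosing
    // impl lifetime plays no part in the method's elision.
    fn bar(&mut self) -> &i32 {
        self.x
    }
}
```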

@jfager jfager commented on the diff Jun 30, 2014
active/0000-lifetime-elision.md
+fn args<T:ToCStr>(&mut self, args: &[T]) -> &mut Command // elided
+fn args<'a, 'b, T:ToCStr>(&'a mut self, args: &'b [T]) -> &'a mut Command // expanded
+
+fn new(buf: &mut [u8]) -> BufWriter; // elided
+fn new<'a>(buf: &'a mut [u8]) -> BufWriter<'a> // expanded
+
+impl Reader for BufReader { ... } // elided
+impl<'a> Reader for BufReader<'a> { .. } // expanded
+
+impl Reader for (&str, &str) { ... } // elided
+impl<'a, 'b> Reader for (&'a str, &'b str) { ... } // expanded
+
+impl StrSlice for &str { ... } // elided
+impl<'a> StrSlice<'a> for &'a str { ... } // expanded
+```
+
@jfager
jfager added a note Jun 30, 2014

A by-value arg is not a lifetime position, so the following is legal?

fn foo(a: &str, b: int) -> &str

That is, it would be possible to use multiple args and still have the lifetimes elided, right? I think the answer is yes but it's not shown in these examples.

@aturon
aturon added a note Jun 30, 2014

@jfager Yes, that's right, and good point about the examples. Will update.
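
A minimal sketch of that case in present-day Rust, with `usize` standing in for `int`; `take_prefix` is a hypothetical function, not one from the RFC.

```rust
// `count` is passed by value and has no lifetime position, so `text` is the
// only input lifetime and the elided output borrows from it.
fn take_prefix(text: &str, count: usize) -> &str {
    // Byte offset where the first `count` characters end, or the whole string
    // if it has fewer characters than that.
    let end = text
        .char_indices()
        .nth(count)
        .map_or(text.len(), |(index, _)| index);
    &text[..end]
}
```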

@pnkfelix
The Rust Programming Language member
pnkfelix added a note Jul 15, 2014

Please also add examples of methods within impls where the implementing trait and/or type has lifetime parameters itself, just to underline the scenarios I brought up in my comment here

@schmee schmee referenced this pull request Jul 1, 2014
Closed

Switch `<>` back to `[]` #148

@zwarich
zwarich commented Jul 3, 2014

One thing that I don't think has been mentioned here is unsafe code. Since unsafe code depends on properties that are not enforced by the type checker, it would be good to require explicit lifetime parameters for functions using unsafe code.

@bstrie
bstrie commented Jul 3, 2014

@zwarich, it isn't obvious to me how having explicit lifetime names will make unsafe blocks any safer, since the lifetime names cannot be used for any purpose within the unsafe block itself. The function signature will still make it obvious that the arguments are references, which is the important thing.

@huonw
The Rust Programming Language member
huonw commented Jul 3, 2014

The function signature will still make it obvious that the arguments are references, which is the important thing.

I would think the lifetimes of the references are the important things when handling unsafe: you really don't want to be returning a reference that doesn't match the lifetime.

@kballard
kballard commented Jul 3, 2014

Eliding the lifetime doesn't change the semantics of the function. That's kind of the whole point. So I agree with @bstrie, I don't see how this affects the safety of an unsafe function.

@huonw
The Rust Programming Language member
huonw commented Jul 3, 2014

It doesn't affect the safety directly; it just makes it easier for the author to see that they're doing something wrong, because the lifetimes are explicitly right in front of them. That avoids mistakes when, e.g., the author forgets to annotate them. (Normally the compiler will catch bad lifetimes, so eliding them is fine, but it won't necessarily when there's unsafe involved: it's up to humans to make sure everything matches, and elision gets in the way of this IMO.)

@bstrie
bstrie commented Jul 3, 2014

Would that rule apply only to unsafe fn, or to any function that uses an unsafe block? After thinking about it, I'd be fine even if it were the latter. Tiny amounts of friction to using unsafe are a great deterrent to casual overuse.

@zwarich
zwarich commented Jul 3, 2014

@bstrie I was thinking any function that uses an unsafe block, since that unsafe block could be transmuting lifetimes.

@bstrie
bstrie commented Jul 4, 2014

Though I'm in favor of it, the only weird thing about requiring explicit lifetimes on unsafe-containing functions is that it might be a little weird to be forced to annotate lifetimes on a trait method whose signature has elided the lifetimes. Though I think it's probably okay because if you can't puzzle out the lifetimes yourself then you're probably unqualified to be writing the unsafe code (and we should probably have a lint pass that would explicitly annotate the function for you anyway).

@alexcrichton
The Rust Programming Language member

This was discussed in yesterday's meeting and it was decided to be merged.

@alexcrichton alexcrichton merged commit 7a459c9 into rust-lang:master Jul 9, 2014
@glaebhoerl

@aturon To turn this around a bit:

I have the distinct impression that Rule 3 is qualitatively different from Rules 1 and 2 in some meaningful sense, even if I can't seem to put my finger on what, exactly, that is. Do you not have the same feeling? Perhaps you might have a better idea of what the difference could be?

@aturon
aturon commented Jul 14, 2014

@zwarich @bstrie My apologies for not responding to your comments about unsafe earlier. We discussed this in the meeting where the RFC was accepted, and the consensus was to not treat unsafe code specially.

This is seen as an improvement over our current situation, in which elision is allowed for signatures on unsafe code yet, when used on output lifetimes, almost always gives the wrong lifetime annotation! In other words, the new rules provide safer defaults.

@aturon
aturon commented Jul 14, 2014

@glaebhoerl It's a good question, and others have voiced a similar feeling about the rules.

As we discussed earlier, rules 1 and 2 are not the "most general" or "only possible" ways to make sense of elided lifetimes. You sometimes need signatures like:

fn foo<'a>(arg1: &'a T, arg2: &'a U) -> &'a V

or

impl<'a> MyType<'a> {
    fn method(&self, arg: U) -> &'a V
}

I think a lot of people sensed a "qualitative difference" by initially assuming that rules 1-2 were fully general. But in fact, none of the rules are fully general; they all just cover the (vastly) common case/intuition.
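
As a rough illustration of the two signatures above, here is a minimal sketch in current Rust syntax. The names `longest`, `Holder`, and `data` are hypothetical stand-ins for `foo`, `MyType`, and `method`; they do not come from the RFC. Neither signature can be produced by elision, so the lifetimes are written out.

```rust
// Both inputs share the single lifetime 'a, and the output may borrow from
// either one. With two input lifetime positions and no `&self`, no elision
// rule assigns the output lifetime, so it has to be spelled out.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

// Hypothetical type that holds borrowed data.
struct Holder<'a> {
    data: &'a str,
}

impl<'a> Holder<'a> {
    // The output is tied to the data the type holds ('a), not to the
    // (possibly shorter) borrow of `self`. Elision would pick the latter.
    fn data(&self) -> &'a str {
        self.data
    }
}
```

With the explicit `'a` on `data`, the returned reference can outlive the `Holder` value itself, which the elided form would not allow.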

That said, in the end I think it comes down to whether you see a difference between

fn frob(f: &Foo, b: &Bar) -> &Baz { ... }

and

impl Foo {
    fn frob(&self, b: &Bar) -> &Baz { ... }
}

Semantically, these two are identical. But Rust treats the method version as special, for example for auto-borrowing. Why?

Methods are a part of expressing OO idioms in Rust, and perhaps the most basic aspect of those idioms is the notion of a receiver. By making frob a method on Foo, we are saying that the Foo parameter plays a special conceptual role: it is the thing being acted on by the method, while the other parameters fill in the details of what the requested action is.

If you see things in those terms then rule 3 is as natural as the other rules: the method receiver generally provides the "ambient lifetime" that you care about.
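
To make the contrast concrete, here is a small sketch in current Rust, assuming placeholder definitions for `Foo`, `Bar`, and `Baz` (the thread does not define them). The free function has two input lifetime positions and no receiver, so its output lifetime must be written out; the method can elide it because rule 3 picks the lifetime of `&self`.

```rust
// Placeholder types; their contents are assumptions for illustration.
struct Baz(u32);
struct Bar;
struct Foo {
    baz: Baz,
}

// Free function: no rule chooses between `f` and `b`, so the output
// lifetime is annotated by hand.
fn frob<'a>(f: &'a Foo, _b: &Bar) -> &'a Baz {
    &f.baz
}

impl Foo {
    // Method: rule 3 assigns the lifetime of `&self` to the elided output.
    // Equivalent to: fn frob<'a, 'b>(&'a self, _b: &'b Bar) -> &'a Baz
    fn frob(&self, _b: &Bar) -> &Baz {
        &self.baz
    }
}
```

Because the output is tied only to the receiver, the `Bar` argument can be a short-lived value that is dropped before the result is used.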

@wycats did a further survey of places where rule 3 applies: https://gist.github.com/wycats/2957ea3090349640b417

The most common cases are indexing, or otherwise extracting some information from the method receiver, which then lives as long as the receiver does. This is the simplest and most common OO idiom, as it plays out in Rust.

All that said, if you disagree with the basic idea of self/receivers/methods, then rule 3 would certainly seem arbitrary. But the rule is designed for today's Rust and the OOish idioms it employs.

@glaebhoerl

@aturon Thank you. That's a good defense of the rule, and the survey is especially interesting. At this point, however, I really am just trying to figure out:

> if you disagree with the basic idea of self/receivers/methods, then rule 3 would certainly seem arbitrary

In what sense is it "arbitrary" in which the other two rules are not, given that none of them are the "most general" / "only possible" / etc. desugarings? Apparently we agree that, without the OO intuitions about self, there's some kind of difference here, and that Rule 3 is then arbitrary in some sense in which the others aren't. But not in the sense which any of us assumed at first! Which makes it an interesting question. I'd like to try to ferret out the source of this intuition, without any particular purpose in mind, and capture it as something more concrete and precise.

A first stab into the dark is that maybe Rules 1 & 2 are "parametric" in a way that 3 is not, in that Rule 3 singles out a specific argument of the function for special treatment, while the other two treat them all equally. This appears to be true as far as it goes, but it's still an awfully rudimentary theory, and doesn't feel like it would be the whole story.

@aturon
aturon commented Jul 14, 2014

@glaebhoerl A more focused version of my comment: rule 3 is arbitrary if, and only if, the distinction between functions and methods is arbitrary.

Put another way, if you buy into methods as having a distinct role from functions, then rule 3 has the same standing as the others.

Put yet another way, it's not rule 3 that's singling out a special argument: it's methods that do that.

@glaebhoerl

I understand all of that completely. That's not what I'm trying to figure out. My purpose right now is not to try to discredit Rule 3, and hasn't been for a while. My purpose is to try to gain a deeper understanding.

What I want to understand is

> if you buy into methods as having a distinct role from functions, then rule 3 has the same standing as the others

what that standing is.

As you've pointed out, Rules 1 & 2 are not the most general or only possible desugarings, either. Given that, should we just say that all three rules are completely arbitrary? As far as I can tell, both of us feel that they're not. But in what way are they not? What logic do they follow, which we sense, but so far, cannot name?

@steveklabnik

> Given that, should we just say that all three rules are completely arbitrary?

No, we cannot say that they are. "Doesn't cover every possible case" != "arbitrary". These rules were chosen with specific thought behind them, making them the opposite of arbitrary.

@glaebhoerl

@steveklabnik Excellent. A third person who doesn't think they're arbitrary. :)

But why these rules, then, and not others? What underlying logic do they spring from?

@aturon
aturon commented Jul 14, 2014

@glaebhoerl I didn't mean to turn this discussion into a defense of rule 3; I'm also trying to understand better the relationships between the rules. I'm sorry I didn't make that more clear. (Text is hard.)

Let me try again. The initial question was whether I see a qualitative difference between rule 3 and the others. I do not, myself. But I can see how someone with a different perspective on methods (which I think you have?) would feel differently.

My general perspective on the rules is that they are simply shorthand, providing carefully-chosen defaults. Defaults are always heuristic and connected to common patterns of thought and code.

As with any defaults, in a purely semantic sense the rules are arbitrary, because there are other valid (and sometimes useful) lifetime assignments that the language allows.

As heuristics, the rules have a clear quantitative basis.

I think what's up for grabs is the qualitative basis -- how do they "feel", how well do they match our intuitions?

The intuitions that @wycats and I were most interested in come from borrowing/ownership, as opposed to lifetimes. If you write

fn foo(x: &Foo) -> &Bar

you know the function takes in borrowed data and produces borrowed data. The simplest intuition is that the output borrow takes its ownership from the input borrow. It's then not a hard conceptual leap to say that the borrowed ownership of the output is only good for as long as the input's was -- we hope that the elided form can build intuitions about borrowing that lead naturally into the mechanics of lifetimes.
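
The single-input case can be sketched as follows, with placeholder definitions for `Foo` and `Bar` (assumptions, since the thread leaves them abstract). The elided signature and the explicit one describe the same function type: the output borrow lasts only as long as the input borrow.

```rust
// Placeholder types for illustration only.
struct Bar {
    id: u32,
}
struct Foo {
    bar: Bar,
}

// Elided form: one input lifetime position, so rule 2 assigns it to the output.
fn foo(x: &Foo) -> &Bar {
    &x.bar
}

// The same signature with the lifetime written out.
fn foo_explicit<'a>(x: &'a Foo) -> &'a Bar {
    &x.bar
}
```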

I feel similarly about methods. I'm using a method to access or otherwise manipulate the receiver, so all things being equal I expect any output borrows to flow from my borrow of the receiver.

Does that help?

@jfager
jfager commented Jul 14, 2014

What was the argument against @bill-myers's suggestion of using the first input lifetime? That covers more cases for regular functions and rule 3 falls out for free. It's not a particularly deep or profound unifying principle, but it's simple and seems less ad-hoc.

@kballard

First input lifetime seems a bit more ad-hoc, as strange as it sounds.

Methods are special, and self is special in these methods. Rule 3 seems perfectly natural to me given that perspective. But the first input lifetime is not special. It's actually rather arbitrary. There's no reason to believe that in fn foo(a: &str, b: &str) -> &str the output is necessarily more likely to be derived from a than from b.

"First input lifetime" will also cause some possibly surprising behavior in fn foo(self, x: &str) -> &str, where the output is derived from the second parameter x instead of from self. Of course, it usually can't be derived from self (the only way that makes sense is if the type of self contains a lifetime parameter), but that's not a good reason to arbitrarily select the second parameter as the inferred lifetime source.

Overall, "lifetime of self" is a more constrained rule than "first input lifetime", based as it is on the special nature of methods and self, and I believe is much more likely to be a correct heuristic than "first input lifetime".

@jfager
jfager commented Jul 14, 2014

Under the currently proposed rules, fn foo(self, x: &str) -> &str's output lifetime would also be derived from x via rule 2, wouldn't it? Rule 3 only states it kicks in for &self or &mut self.

@kballard

@jfager Hrm, you're right. I hadn't considered the fn foo(self, x: &str) -> &str case until my previous comment, and there I only considered it in light of rule 3.
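
A minimal sketch of the by-value `self` case, using a hypothetical `Trimmer` type that carries no lifetime parameter. In current Rust this compiles with the output lifetime elided, because `x` is the only input lifetime position and rule 2 applies.

```rust
// Hypothetical receiver type with no lifetime parameters.
#[derive(Clone, Copy)]
struct Trimmer;

impl Trimmer {
    // `self` is taken by value, so rule 3 (which needs `&self` or
    // `&mut self`) does not apply. The only input lifetime is the one on
    // `x`, and rule 2 assigns it to the output.
    // Equivalent to: fn apply<'a>(self, x: &'a str) -> &'a str
    fn apply(self, x: &str) -> &str {
        x.trim()
    }
}
```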

I think that, due to rule 3, it may be reasonable to adjust the rules such that fn foo(self, x: &str) -> &str cannot elide the lifetime. This would be a consequence of the fact that self is special, and therefore any method on self reasonably assumes an elided output lifetime is derived from self. My belief is this should be true even for by-value self methods.

That said, this particular case is I think something of an edge case, and I would not consider it a serious problem if the rules are left unchanged.

@jfager
jfager commented Jul 15, 2014

It's an edge case but now that it's come up I think it gets right at the discomfort of the current set of rules. The justification for rule 3 is 'methods are special', but this interaction with rule 2 says 'but maybe not that special'. They should either be uniform, or they should be different; it's straddling the fence that feels odd.

"You may elide lifetimes; output lifetimes are assigned the first input lifetime" is arbitrary and there's not a great intuitive reason it should be true, but it's uniform between fns and methods, and despite its arbitrariness it's simple and easy to understand, and it ends up giving you the same code and behavior in all but one of the examples given in this RFC, frob being the exception.

"Elided output lifetimes take the lifetime of self for methods, or the lifetime of a sole input lifetime for functions" is similarly straightforward and simple, but treats methods and fns clearly differently.

I could get behind either.

*Edit: sorry, posted early.

@glaebhoerl

@aturon Yes, that's closer to what I was trying to get at. (Though I was also wondering if there might be some drier, more formal formulation of our intuitions.) How does rule 1 fit into these intuitions about borrowing, i.e. why is it more intuitive for each input lifetime to be different rather than tied together?

@pnkfelix pnkfelix commented on the diff Jul 15, 2014
active/0000-lifetime-elision.md
+&'a T
+&'a mut T
+T<'a>
+```
+
+As with today's Rust, the proposed elision rules do _not_ distinguish between
+different lifetime positions. For example, both `&str` and `Ref<uint>` have
+elided a single lifetime.
+
+Lifetime positions can appear as either "input" or "output":
+
+* For `fn` definitions, input refers to argument types while output refers to
+ result types. So `fn foo(s: &str) -> (&str, &str)` has elided one lifetime in
+ input position and two lifetimes in output position.
+
+* For `impl` headers, input refers to the lifetimes appears in the type
@pnkfelix
The Rust Programming Language member
pnkfelix added a note Jul 15, 2014

Trait definitions themselves are also a form that offers lifetime positions. That may or may not be relevant (I'll be posting a question about that soon -- see a few lines up), but should probably be addressed explicitly.

@pnkfelix pnkfelix commented on the diff Jul 15, 2014
active/0000-lifetime-elision.md
+
+```rust
+&'a T
+&'a mut T
+T<'a>
+```
+
+As with today's Rust, the proposed elision rules do _not_ distinguish between
+different lifetime positions. For example, both `&str` and `Ref<uint>` have
+elided a single lifetime.
+
+Lifetime positions can appear as either "input" or "output":
+
+* For `fn` definitions, input refers to argument types while output refers to
+ result types. So `fn foo(s: &str) -> (&str, &str)` has elided one lifetime in
+ input position and two lifetimes in output position.
@pnkfelix
The Rust Programming Language member
pnkfelix added a note Jul 15, 2014

For an fn method definition, i.e. one that occurs in the scope of an impl block or as the default method in a trait item, are the lifetimes that occur in the implementing type (in the former case) or the trait (in the latter case) also considered to be input positions? (Or perhaps all of the lifetimes bound by impl<'a,'b,...> are part of the input positions? Or perhaps none of them are?)

In other words, is a method considered to be in the scope of its impl header for the purposes of lifetime elision?

(I will follow up to this comment with a concrete set of examples elaborating my question in a moment.)

@pnkfelix
The Rust Programming Language member
pnkfelix added a note Jul 15, 2014

Okay, here is a gist with my attempt to survey the space here: https://gist.github.com/pnkfelix/a4054e51400152c63714

It could well be that the intent is (and has always been) to not consider an impl header in scope for lifetime elision on methods. But if so, this needs to be spelled out explicitly in the RFC itself.

@pnkfelix
The Rust Programming Language member
pnkfelix added a note Jul 15, 2014

Hmm I guess since this was already merged I should instead open an issue against it.

@pnkfelix
The Rust Programming Language member
pnkfelix added a note Jul 15, 2014

Ah, and now I just saw @aturon's comment here, which explicitly confirms that the intent has been to not consider an impl header in scope for lifetime elision on methods.
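
One way to picture that reading, as a sketch in current Rust with a hypothetical `View` type: the lifetime `'a` bound by the `impl` header is not treated as an input position for the method, so the elided output follows `&self` under rule 3.

```rust
// Hypothetical type holding borrowed text.
struct View<'a> {
    text: &'a str,
}

impl<'a> View<'a> {
    // Elided form: only `&self` counts as an input position here.
    fn text(&self) -> &str {
        self.text
    }

    // What the elided form expands to: the output borrows from the
    // receiver ('s), not from the impl header's 'a.
    fn text_expanded<'s>(&'s self) -> &'s str {
        self.text
    }
}
```

A method that wants its result tied to `'a` instead has to say so explicitly, for example with a return type of `&'a str`.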

@pnkfelix pnkfelix added a commit to pnkfelix/rfcs that referenced this pull request Jul 15, 2014
@pnkfelix pnkfelix Clarify definition of "input positions" in lifetime elision RFC.
Explicitly note that lifetimes from the `impl` (and `trait`/`struct`)
are not considered "input positions" for the purposes of expanded `fn`
definitions.

Added a collection of examples illustrating this.

Drive-by: Addressed a review comment from @chris-morgan
[here](rust-lang#141 (comment)).
f766d18
@zwarich
zwarich commented Jul 17, 2014

@glaebhoerl In the absence of any lifetime variables in return types, the assignment of distinct lifetime parameters is the most general type that can be given. That is probably the intuition that is at play here. Of course, once return types with lifetime variables get involved, then this no longer applies, but everyone agreed that this case was broken today anyways.
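
A small sketch of rule 1 for a function with no lifetimes in its return type; `total_len` is a hypothetical name. Each elided input gets its own lifetime parameter, so callers can pass borrows with unrelated scopes.

```rust
// Elided form: each `&str` gets a distinct, fresh lifetime parameter.
fn total_len(a: &str, b: &str) -> usize {
    a.len() + b.len()
}

// The same signature with rule 1 applied by hand.
fn total_len_explicit<'a, 'b>(a: &'a str, b: &'b str) -> usize {
    a.len() + b.len()
}
```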

@glaebhoerl

I was thinking that maybe elided lifetimes in arguments of higher-order function parameters should be desugared to higher-rank lifetimes, because that's usually what you want:

// not legal, I believe?
fn print_with(text: &str, printer: |&str|) { ... }

=>

// you pretty much always want this, I think?
fn print_with<'a>(text: &'a str, printer: <'b> |&'b str|) { ... }

The question is, given that closures are going to be merely trait objects, how could we properly generalize this? (There may or may not be an easy answer; I've spent approximately two minutes thinking about it.)

@zwarich
zwarich commented Jul 26, 2014

@glaebhoerl Why does it matter in that particular case? The only lifetimes that can ever be passed to printer are 'a and 'static, so you can always choose 'a.

I assume you were thinking of another case where it does matter?

@glaebhoerl

Maybe the example was bad. But:

> The only lifetimes that can ever be passed to printer are 'a and 'static, so you can always choose 'a.

This is not true, because print_with could easily have local variables of type &'x str. (Which, in this case, might be weird, which in turn is why this might've been a bad example; then again, maybe print_with might want to prefix text with a timestamp or something.)

But to amend, imagine this:

fn print_two_with(text1: &str, text2: &str, printer: |&str|) { ... }

Now there are two &str arguments with different lifetimes and we want printer to work for both.

But the point is really that in general, do you ever want the lifetimes of the arguments of an argument function to be pre-determined by lifetime parameters on the outer HOF, instead of the (strictly-)more-general formulation where the argument function itself is parameterized over them?

@zwarich
zwarich commented Jul 26, 2014

@glaebhoerl My (potentially mistaken) assumption is that the legacy closures actually have higher-rank lifetimes, even though it isn't a feature exposed independently in the type system, and rust-lang/rust#15067 is tracking exposing that to the new unboxed closures. This code type-checks:

fn print_two_with<'a, 'b>(text1: &'a str, text2: &'b str, printer: |&str|) {
    if true {
        printer(text1);
    } else {
        printer(text2);
    }
}

whereas this code does not:

fn print_two_with<'a, 'b>(text1: &'a str, text2: &'b str, printer: |&'a str|) {
    if true {
        printer(text1);
    } else {
        printer(text2);
    }
}
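
For readers on current Rust, the same contrast can be restated with the `Fn*` traits. This is a translation under the assumption that `|&str|` corresponds to a bound like `FnMut(&str)`; the thread itself uses the 2014 closure-type syntax. An elided lifetime inside the closure bound is higher-ranked, so the closure must accept a `&str` of any lifetime.

```rust
// Elided form: `FnMut(&str)` is shorthand for `for<'c> FnMut(&'c str)`.
fn print_two_with(text1: &str, text2: &str, mut printer: impl FnMut(&str)) {
    printer(text1);
    printer(text2);
}

// The same bound with the higher-ranked lifetime written out.
fn print_two_with_explicit<'a, 'b, F>(text1: &'a str, text2: &'b str, mut printer: F)
where
    F: for<'c> FnMut(&'c str),
{
    printer(text1);
    printer(text2);
}

// By contrast, a bound of `F: FnMut(&'a str)` would reject `printer(text2)`,
// because `text2` is only known to live for 'b.
```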
@aturon
aturon commented Jul 27, 2014

@glaebhoerl The current plan is that the elision rules apply recursively for the sugared form of unboxed closure types (i.e., the |x: T| -> U notation), as they have in the past. There's not currently a plan to generalize this to uses of traits directly, although that might actually be the right answer to covariant lifetime positions (see rust-lang/rust#15699 about covariance being the odd case, and rust-lang/rust#15907 about its relation to lifetime elision).

@glaebhoerl glaebhoerl added a commit to glaebhoerl/rfcs that referenced this pull request Aug 8, 2014
@pnkfelix pnkfelix Clarify definition of "input positions" in lifetime elision RFC.
Explicitly note that lifetimes from the `impl` (and `trait`/`struct`)
are not considered "input positions" for the purposes of expanded `fn`
definitions.

Added a collection of examples illustrating this.

Drive-by: Addressed a review comment from @chris-morgan
[here](rust-lang#141 (comment)).
de6f577
@ticki
ticki commented Aug 7, 2015

👍
