Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More flexible input lifetime #177

Merged
merged 8 commits into from Apr 29, 2020

Conversation

ltratt
Copy link
Member

@ltratt ltratt commented Apr 26, 2020

This is based on #174: it tries to decouple the lifetime of a lexer from its input. That PR is a work of near genius: I simply would not have imagined that the outcome it achieves is possible without the PR as a proof-of-existence. The only minor problem was that I couldn't work out how it achieved its effect. I therefore tried to simplify the PR, but didn't get very far. I then tried reimplementing it, and didn't get very far with that either.

This PR is the result of me taking a different approach. First I simplified and unified the existing lifetimes in grmtools, because there were several inconsistencies, which I thought might be responsible for some of the pain in #174. Once that was done I could then add an explicit 'input lifetime in 91dab3f which I think achieves the same effect as #174. Certainly it is enough to allow this program to now compile:

      fn main() {
          let lexerdef = t_l::lexerdef();
          let input = "a";
          let t = {
              let lexer = lexerdef.lexer(&input);
              let lx = lexer.iter().next().unwrap().unwrap();
              lexer.span_str(lx.span())
          };
      }

where previously rustc would complain:

error[E0597]: `lexer` does not live long enough
  --> src/main.rs:12:9
   |
9  |     let t = {
   |         - borrow later stored here
...
12 |         lexer.span_str(lx.span())
   |         ^^^^^ borrowed value does not live long enough
13 |     };
   |     - `lexer` dropped here while still borrowed

error: aborting due to previous error

However, I am not sure if this PR is able to handle all the same cases as #174. I'm hoping that @valarauca will be able to let me know if this solves the problem that led him to create #174.

@valarauca
Copy link
Contributor

valarauca commented Apr 27, 2020

It does not.

The minimal test case you want to use is something like:

lrlex_mod!("my_lexer_module");
lrpar_mod!("my_parser_module");

pub fn parse_my_data<'a>(input: &'a str) -> Option<crate::my_parser_module::ParseResult<'a>> {
    let lexer_def = crate::my_lexer_module::lexerdef();
    let l = lexer_def.lexer(input);
    match crate::my_parser_module::parser(&l) {
        (Option::Some(Ok(x)),_) => Some(x),
        _ => None
    }
}

I roughly implemented that here. It gives me: error[E0597]. That's same case I used for testing the change I pushed to this project.

The critical issue I believe is here. The resulting trait needs to track 2 separate lifetimes, the input source, and the lexer itself need to have separate lifetimes. When they're identical, this is when problems happen.

This is why the lifetimes tend to "explode" when you get to internal types as you're borrowing 2 lifetimes explicitly, so you need a 3rd lifetime. Then later in Parser you borrow the same 2 lifetimes, soyou still need a 3rd. But since everything doesn't [live the same amount of time (borrowing in multiple places)]https://github.com/valarauca/grmtools/blob/master/lrpar/src/lib/parser.rs#L791), you need a 4th lifetime to ensure the compiler is aware each borrow in Parser isn't identical.

@ltratt
Copy link
Member Author

ltratt commented Apr 27, 2020

OK, I can see what your example is doing now. Clearly I haven't yet fully understood #174 yet! I'll keep at it -- this is fascinating stuff!

@ltratt
Copy link
Member Author

ltratt commented Apr 28, 2020

@valarauca c815524 now allows your example to compile. I did engage in one cheat, changing lrlex::LexerDef from:

    pub fn lexer(&self, s: &str) -> impl Lexer<StorageT> {
        LRLexer::new(self, s)
    }

to:

    pub fn lexer<'lexer, 'input: 'lexer>(
        &'lexer self,
        s: &'input str
    ) -> LRLexer<'lexer, 'input, StorageT> {
        LRLexer::new(self, s)

AFAICS the only reason to do this is so that Lexer only needs an 'input lifetime (and not a 'lexer lifetime as well) -- I don't think it saves us from adding extra lifetimes internally. Warning: I might be wrong.

Do you think this commit is flexible enough or are there other things that it doesn't allow to compile?

@valarauca
Copy link
Contributor

valarauca commented Apr 28, 2020

I depended on 4bc7ee3 and it worked :D


I guess I was wrong about the internal lifetimes.

@ltratt
Copy link
Member Author

ltratt commented Apr 28, 2020

@valarauca Cool, thanks so much for trying this -- I really can't emphasise enough how much of an eye-opener your PR has been for me!

I hate doing this (because I would dislike it being done to me), but I think this PR, as it's simpler, is probably the better candidate for merger relative to #174. Are you OK with that? And do you have any comments on any of the code? If we do end up merging this PR, I'd like to make sure it's as good as it can be.

@valarauca
Copy link
Contributor

valarauca commented Apr 28, 2020

Are you OK with that?

Of course. I can close #174 it is nbd.

And do you have any comments on any of the code?

I took a look across it; LGTM. Props on getting the change smaller, and more readable.

Thanks for your hard work and help getting this ready to go 👍

@ltratt
Copy link
Member Author

ltratt commented Apr 28, 2020

Thanks for being a good sport @valarauca ! I'm going to ask @ptersilie to do a lightweight review of the PR, but since you're happy with it, I expect that will be a simple task for him.

@ptersilie I will need to squash at least one commit away before merging.

@ptersilie
Copy link
Member

Typo in commit 8d41382? probably be easier to change it [in] one go later.

@ltratt
Copy link
Member Author

ltratt commented Apr 29, 2020

Agreed.

@ptersilie
Copy link
Member

ptersilie commented Apr 29, 2020

Just one minor typo which you can fix during squashing. Another thought I had was, since this PR is based on @valarauca's work, maybe we could add him as a co-author to some of the commits where that makes sense. Thoughts?

@ltratt
Copy link
Member Author

ltratt commented Apr 29, 2020

Excellent idea! Let me squash and do that at the same time.

@ptersilie
Copy link
Member

Ok, please go ahead.

ltratt and others added 6 commits April 29, 2020 12:15
…type.

Co-authored-by: valarauca <codylaeder@gmail.com>
This made clear some subtle inconsistencies in lifetimes between the various
uses of the explicit version of this type.

Co-authored-by: valarauca <codylaeder@gmail.com>
The actual 'input lifetime is going to be related to, but not the same as, 'b.

Note that ctbuilder.rs remains unrenamed, because in that file there is a mix of
"wrong" and "correct" uses of 'input, and it will probably be easier to change
it one go later.
I found it a bit confusing that sometimes we had <'a, 'b: 'a> and sometimes <'a,
'b>; since the former is more restrictive I've settled on that.
With this the following program now successfully compiles:

  fn main() {
      let lexerdef = t_l::lexerdef();
      let input = "a";
      let t = {
          let lexer = lexerdef.lexer(&input);
          let lx = lexer.iter().next().unwrap().unwrap();
          lexer.span_str(lx.span())
      };
  }

A mildly modified version of this is now included as a test in
lrpar/cttests/src/lexer_lifetime.test.
@ltratt ltratt force-pushed the more_flexible_input_lifetime branch from ab0d4c0 to d28d511 Compare April 29, 2020 11:16
@ltratt
Copy link
Member Author

ltratt commented Apr 29, 2020

OK, done: I've added @valarauca to the first two commits as those are clearly based off his work (I feel I should shoulder the blame for the latter commits!) and squashed.

@ptersilie
Copy link
Member

bors r+

bors bot added a commit that referenced this pull request Apr 29, 2020
177: More flexible input lifetime r=ptersilie a=ltratt

This is based on #174: it tries to decouple the lifetime of a lexer from its input. That PR is a work of near genius: I simply would not have imagined that the outcome it achieves is possible without the PR as a proof-of-existence. The only minor problem was that I couldn't work out how it achieved its effect. I therefore tried to simplify the PR, but didn't get very far. I then tried reimplementing it, and didn't get very far with that either.

This PR is the result of me taking a different approach. First I simplified and unified the existing lifetimes in grmtools, because there were several inconsistencies, which I thought might be responsible for some of the pain in #174. Once that was done I could then add an explicit `'input` lifetime in 91dab3f which I *think* achieves the same effect as #174. Certainly it is enough to allow this program to now compile:

```rust
      fn main() {
          let lexerdef = t_l::lexerdef();
          let input = "a";
          let t = {
              let lexer = lexerdef.lexer(&input);
              let lx = lexer.iter().next().unwrap().unwrap();
              lexer.span_str(lx.span())
          };
      }
```

where previously rustc would complain:

```
error[E0597]: `lexer` does not live long enough
  --> src/main.rs:12:9
   |
9  |     let t = {
   |         - borrow later stored here
...
12 |         lexer.span_str(lx.span())
   |         ^^^^^ borrowed value does not live long enough
13 |     };
   |     - `lexer` dropped here while still borrowed

error: aborting due to previous error
```

However, I am not sure if this PR is able to handle all the same cases as #174. I'm hoping that @valarauca will be able to let me know if this solves the problem that led him to create #174. 

Co-authored-by: Laurence Tratt <laurie@tratt.net>
@bors
Copy link
Contributor

bors bot commented Apr 29, 2020

Build failed:

@ltratt
Copy link
Member Author

ltratt commented Apr 29, 2020

Oops! That last commit will need squashing.

bors try

bors bot added a commit that referenced this pull request Apr 29, 2020
@bors
Copy link
Contributor

bors bot commented Apr 29, 2020

try

Build succeeded:

For the program:

  pub fn parse_my_data<'a>(input: &'a str) -> Option<&'a str> {
      let lexer_def = crate::t_l::lexerdef();
      let l = lexer_def.lexer(input);
      match crate::t_y::parse(&l) {
          (Option::Some(x),_) => Some(x),
          _ => None
      }
  }

this previously failed to compile with:

  error[E0515]: cannot return value referencing local variable `l`
    --> src/main.rs:26:32
     |
  25 |     match crate::t_y::parse(&l) {
     |                             -- `l` is borrowed here
  26 |         (Option::Some(x),_) => Some(x),
     |                                ^^^^^^^ returns a value referencing data owned by the current function

  error[E0515]: cannot return value referencing local variable `lexer_def`
    --> src/main.rs:26:32
     |
  24 |     let l = lexer_def.lexer(input);
     |             --------- `lexer_def` is borrowed here
  25 |     match crate::t_y::parse(&l) {
  26 |         (Option::Some(x),_) => Some(x),
     |                                ^^^^^^^ returns a value referencing data owned by the current function

  error: aborting due to 2 previous errors; 1 warning emitted

This commit fixes that problem, mostly through introducing a new lifetime
'lexer. There is one very tiny API change in lrlex::LexerDef from:

    pub fn lexer(&self, s: &str) -> impl Lexer<StorageT> {
        LRLexer::new(self, s)
    }

to:

    pub fn lexer<'lexer, 'input: 'lexer>(
        &'lexer self,
        s: &'input str
    ) -> LRLexer<'lexer, 'input, StorageT> {
        LRLexer::new(self, s)
    }

This is a safe change because, by definition, lrlex can only return LRLexer
instances. The alternative to this would have been to add a second lifetime
'lexer to the Lexer trait, which seems overkill.
@ptersilie
Copy link
Member

Please squash.

@ltratt ltratt force-pushed the more_flexible_input_lifetime branch from 64a04e3 to e577a52 Compare April 29, 2020 18:12
@ltratt
Copy link
Member Author

ltratt commented Apr 29, 2020

Squashed.

@ptersilie
Copy link
Member

bors r+

@bors
Copy link
Contributor

bors bot commented Apr 29, 2020

Build succeeded:

@bors bors bot merged commit 3c9dd39 into softdevteam:master Apr 29, 2020
@ltratt ltratt deleted the more_flexible_input_lifetime branch April 30, 2020 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants