Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lookAhead consumes input on failure #73

Closed
chtenb opened this issue Jun 15, 2021 · 15 comments
Closed

lookAhead consumes input on failure #73

chtenb opened this issue Jun 15, 2021 · 15 comments
Labels
type: bug Something that should function correctly isn't.

Comments

@chtenb
Copy link
Member

chtenb commented Jun 15, 2021

Describe the bug

-- | Read ahead without consuming input.
lookAhead :: forall a. Parser a -> Parser a
lookAhead (Parser p) = Parser \s ->
  case p s of
    Right { result } -> Right { result, suffix: s }
    left -> left

As you can see from the definition the suffix is "reset" to s in case of success, but in case of error the error is passed on as is, meaning the PosString at the failure is propagated.

Expected behavior
I would expect the function to do what its documentation suggests: to not consume any input. Or if this is really not intended I would expect the documentation to be correct and not misleading.

Additional context
This cost me quite some debugging time :) I'm currently working around it by using try $ lookAhead everywhere.

@chtenb chtenb added the type: bug Something that should function correctly isn't. label Jun 15, 2021
@chtenb
Copy link
Member Author

chtenb commented Dec 31, 2021

I would like to provide a pull request to fix this. Should I adapt the documentation or adapt the code?

@chtenb chtenb mentioned this issue Feb 8, 2022
4 tasks
@jamesdbrock
Copy link
Member

The semantics of lookAhead in Megaparsec are:

If p in lookAhead p succeeds (either consuming input or not)
the whole parser behaves like p succeeded without consuming anything
(parser state is not updated as well). If p fails, 'lookAhead' has no
effect, i.e. it will fail consuming input if p fails consuming input.
Combine with 'try' if this is undesirable.

There's probably a reason why Megaparsec does it this way. https://hackage.haskell.org/package/megaparsec-9.2.0/docs/Text-Megaparsec.html#v:lookAhead

Parsec does the same thing. https://hackage.haskell.org/package/parsec-3.1.15.0/docs/Text-Parsec.html#v:lookAhead

I would like to understand the reason why Parsec and Megaparsec lookAhead work this way.

@jamesdbrock
Copy link
Member

purescript-parsing lookAhead, on the other hand, seems to work the way you expect it to, and differently from Parsec and Megaparsec... https://pursuit.purescript.org/packages/purescript-parsing/8.2.0/docs/Text.Parsing.Parser.Combinators#v:lookAhead

@chtenb
Copy link
Member Author

chtenb commented Feb 15, 2022

My suspicion is that they defined it this way for reasons of orthogonality. As they mention, one can combine it with try to obtain the semantics I would expect. But you can't have it the other way around.

However, while this is valid, I would argue that this is not what users would expect under the name "lookAhead".
I can't think of a scenario where one would want the parser to fail and get stuck halfway of a lookAhead. If there such a scenario does actually exist I would suggest to have this added as a separate function, with a clear name as to not confuse the user.

@chtenb
Copy link
Member Author

chtenb commented Feb 15, 2022

To elaborate a bit more on the name "lookAhead", I think users would intuitively expect that lookAhead p means

Is what comes next of the form p?

Failing to execute p to success effectively means that the answer to that question is "no".

The way that mega parsec has defined it is more about really running the parser but keeping the previous positional context in case of success. Maybe a name like parseAhead would be more appropriate for this case.

@jamesdbrock
Copy link
Member

jamesdbrock commented Feb 15, 2022

Everything you say sounds convincing, but the current situation is that all the major Haskell libraries which are similar to purescript-string-parsers think that lookAhead should work this way. And I'm always reluctant to contradict the semantics of those Haskell libraries because

  1. they probably have good reasons for what they do and
  2. so as not to surprise users.

In fact I'm starting to think that purescript-parsing is the one package that's doing this wrong.

Package lookAhead consumes input on failure?
parsec yes
megaparsec yes
attoparsec yes
purescript-string-parsers yes
purescript-parsing no

@chtenb
Copy link
Member Author

chtenb commented Feb 16, 2022

I agree that compatibility with existing parsec-like libraries is something that improves user experience.
In that case I would like to propose a new function to add to our API, which does not consume input in both success and failure. The reasons for this would be that

  1. I think this is what you would want in most cases, and it is user friendly if we provide a function that has this behavior out of the box instead of letting each user define an alias for try <<< lookAhead.
  2. Having the function explicitly defined makes the user extra aware of the difference in behavior when reading the API documentation.

I'm not a 100% sure as to what to call this function though. Do you have any suggestions?

tryAhead, dry, ...

@JordanMartinez
Copy link
Member

Why not update lookAhead to match what others do and then introduce a new function with a different name that does what the current lookAhead does? (Or is that the approach that's currently being proposed?) I know this would be a breaking change, but it would match people's expectations if they're coming from other languages.

@chtenb
Copy link
Member Author

chtenb commented Feb 16, 2022

Why not update lookAhead to match what others do and then introduce a new function with a different name that does what the current lookAhead does? (Or is that the approach that's currently being proposed?) I know this would be a breaking change, but it would match people's expectations if they're coming from other languages.

Yes that I what I proposed in my last post. But not clear enough apparently :) Also, note that currently lookAhead already matches what others do (except purescript-parsing), and this issue was about changing that. So it wouldn't be breaking for purescript-string-parsers.

@jamesdbrock
Copy link
Member

( I actually had the yeses and noes incorrectly reversed on the table above so I edited the table and now it's correct. )

@jamesdbrock
Copy link
Member

I would like to propose a new function to add to our API, which does not consume input in both success and failure ... instead of letting each user define an alias for try <<< lookAhead.

I personally think the best solution here is lots of good documentation. All of the Haskell libraries explain very directly that the lookAhead should be combined with try if the user wants to backtrack on failure.

@chtenb
Copy link
Member Author

chtenb commented Feb 16, 2022

I personally think the best solution here is lots of good documentation. All of the Haskell libraries explain very directly that the lookAhead should be combined with try if the user wants to backtrack on failure.

Yes that too. The documentation is incorrect at the moment. Still I would also have the composition as a separate function. Do you have objections to that?

@paulyoung
Copy link

I got tripped up by this. I'm still not sure how to get the behavior I'm looking for.

What's the recommended way to not consume input when using lookAhead?

@paulyoung
Copy link

I think the answer is tryAhead.

@chtenb
Copy link
Member Author

chtenb commented Jun 4, 2022

Yes, tryAhead is the way. If you find the documentation not clear enough in this regard, feel free to improve it and provide a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Something that should function correctly isn't.
Development

Successfully merging a pull request may close this issue.

4 participants