Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can we have more typesafety w.r.t. many-like combinators and infinite loops? #120

Closed
chtenb opened this issue Sep 24, 2021 · 7 comments · Fixed by #193
Closed

Can we have more typesafety w.r.t. many-like combinators and infinite loops? #120

chtenb opened this issue Sep 24, 2021 · 7 comments · Fixed by #193

Comments

@chtenb
Copy link
Member

chtenb commented Sep 24, 2021

Is your change request related to a problem? Please describe.
many p will get stuck in a loop if p possibly doesn't consume any input but still succeeds.

One could argue that the parser is bogus in that case. However, it would be nice to have some compile time safety guarantees for this, because a hanging application isn't fun to deal with, even if it occurs while developing.

Some solution directions that cross my mind

  • Utilizing the typesystem in some way that makes it impossible to pass a parser that possibly consumes no input but still succeeds.
  • Changing the semantics of many such that it stops (or fails?) whenever the position of the parser state hasn't moved after applying p.
@jamesdbrock
Copy link
Member

I like your thinking about this and it would be nice to solve this problem. Some further notes:

  • Utilizing the typesystem in some way that makes it impossible to pass a parser that possibly consumes no input but still succeeds.

I can think of one useful parser that consumes no input and succeeds: eof.

  • Changing the semantics of many such that it stops (or fails?) whenever the position of the parser state hasn't moved after applying p.

Because ParserT is a monad transformer, it's possible to imagine a ParserT s State a which consumes no input of the first parse but does consume input on the second parse, due to some mechanics of the internal state.

@chtenb
Copy link
Member Author

chtenb commented Oct 5, 2021

I've given a little bit of thought. I like the direction of the second solution (stopping the loop when no input is consumed) the best for the following reasons.

  • It does not impose new complexity on the construction of parsers. Parsers can be constructed like before, except for the fact that no infinite loop will occur for non-consuming parsers.
  • eof is indeed a useful parser and it would be nice if many could deal with it. You wouldn't pass it directly to many of course, but it can certainly be used in a more complex parser case. Forbidding such a parser construction feels like an unnecessary constraint.

Because ParserT is a monad transformer, it's possible to imagine a ParserT s State a which consumes no input of the first parse but does consume input on the second parse, due to some mechanics of the internal state.

I can't imagine a practical situation for this, but you are correct. Perhaps we can expose two flavors of many, a terminating one and one that will just keep looping until failure.

@jamesdbrock
Copy link
Member

jamesdbrock commented Oct 6, 2021

Perhaps we can expose two flavors of many, a terminating one and one that will just keep looping until failure.

I like the idea of having a flavor of many which would guarantee that it either makes forward progress or fails.

We would have to think about which combinators we want to supply this flavor for. Here are some candidates:

  • Data.Array.many
  • Data.List.many
  • Data.Array.some
  • Data.List.some
  • Data.List.manyRec
  • Data.List.someRec
  • Data.Array.NonEmpty.some
  • Text.Parsing.Parser.Combinators.many1
  • Text.Parsing.Parser.Combinators.skipMany
  • Text.Parsing.Parser.Combinators.skipMany1
  • Text.Parsing.Parser.Combinators.manyTill
  • Text.Parsing.Parser.Combinators.many1Till

That's a pretty long list. But we could start with just a few of those, and then add more if there was popular demand.

Here's another idea which just occurred to me: a combinator consume :: ParserT s m a -> ParserT s m a which forces a parser to fail if the parser would succeed while consuming no input. (Is this parser published somewhere else with a different name? I can't find it with two minutes of Googling.)

Here's an implementation for String:

consume :: forall m a. Monad m => ParserT String m a -> ParserT String m a
consume p = do
  ParseState input1 _ _ <- get
  x <- p
  ParseState input2 _ _ <- get
  if CodeUnits.length input2 < CodeUnits.length input1 then pure x else fail "Consumed no input."

So if we have a parserWhichMayNotConsume then instead of calling many parserWhichMayNotConsume we can call many (consume parserWhichMayNotConsume) and that will never hang.

@chtenb
Copy link
Member Author

chtenb commented Oct 6, 2021

I like your idea of a consume combinator. It makes writing terminating flavours of many-like combinators very concise, but I'm not even sure that it would be necessary when having that combinator available. Perhaps extending the documentation of many and cousins to mention the termination pitfalls and the consume function is enough. It's name conflicts with an existing function though.

I see you included some non-parser functions in your list, like Data.Array.many. I'm not familiar with those. Do they suffer from the same problem? The documentation states that the Lazy constraint ensures termination, but I'm not sure I understand that fully, nor am I in a position to verify that claim.

@jamesdbrock
Copy link
Member

(The consume implementation I gave will probably not work because it doesn't take into account the parser state's consumed flag.)

@jamesdbrock
Copy link
Member

jamesdbrock commented Jan 18, 2022

(The consume implementation I gave will probably not work because it doesn't take into account the parser state's consumed flag.)

I changed my mind, I think that the consume implementation which I wrote above is good.

The internal consumed flag in purescript-parsing really means, “are we committed to this parsing branch, yes or no?” So the consumed flag resulting from the consume combinator will be whatever the consumed flag resulting from the argument parser was. Which is right.

What we want to assert with the consume combinator is that we’re making progress in the input. Which is a different concern than whether we’re committed to this parsing branch.

@jamesdbrock jamesdbrock linked a pull request Apr 26, 2022 that will close this issue
4 tasks
@chtenb
Copy link
Member Author

chtenb commented Apr 28, 2022

Nice @jamesdbrock . Perhaps the existence of the advance function could be mentioned in the docs of all relevant many-like parser combinators? So that users will know that the many combinator will hang when there is a chance that their parser does not consume, and how to solve that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

2 participants