What's the purpose of the set of unexpected items? #227

Closed
feuerbach opened this Issue Jun 29, 2017 · 5 comments

@feuerbach
Contributor

feuerbach commented Jun 29, 2017

I understand how multiple possible tokens could be expected, but I struggle to understand what a set of unexpected tokens means.

In my own code, this set is always empty or singleton. If this covers all the cases, then perhaps it should be Maybe instead of Set?

Or, if the intention is that multiple consecutive tokens are unexpected, shouldn't it be a list?

@mrkkrp mrkkrp added the question label Jun 29, 2017

@mrkkrp

Owner

mrkkrp commented Jun 29, 2017

This is a good question, thanks for asking it now while the (re-)design in this area is still in flux.

I think I know of only one case when there may be several unexpected tokens:

module Main (main) where

import Data.Void
import Text.Megaparsec -- current master
import Text.Megaparsec.Char

type Parser = Parsec Void String

pScheme :: Parser String
pScheme = choice
  [ string "data"
  , string "file"
  , string "ftp"
  , string "https"
  , string "http"
  , string "irc"
  , string "mailto" ]

main :: IO ()
main = parseTest pScheme "dat"

This currently prints:

1:1:
unexpected "dat" or 'd'
expecting "data", "file", "ftp", "http", "https", "irc", or "mailto"

This is because "dat" is tried against every string in turn. For the first alternative the comparison yields something like: unexpected "dat", expecting "data". For all the others it yields just: unexpected 'd', expecting ..., because the first character already fails to match and the comparison stops there. The errors from all alternatives are then merged, which is how both items end up in the set.

Logically, it sort of makes sense: "dat" is unexpected as a whole, but since there are other options that do not even start with 'd', the 'd' character by itself is also unexpected. Still, I wouldn't be surprised if reporting just "dat" were more straightforward for most people in this case.
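The merging described above can be modelled without Megaparsec at all. The following is a minimal sketch that assumes errors at the same position are combined by taking the union of their sets; the names Item, Err, mergeErr, stringErr and choiceErr are hypothetical and are not part of the library's API. It only illustrates why two unexpected items can appear for the input "dat".

```haskell
module Main (main) where

import           Data.Set (Set)
import qualified Data.Set as Set

-- Simplified stand-in for an error item (hypothetical, not the library type).
data Item
  = Tokens String
  | EndOfInput
  deriving (Eq, Ord, Show)

-- Simplified stand-in for a parse error at one fixed position.
data Err = Err
  { errUnexpected :: Set Item
  , errExpected   :: Set Item
  } deriving (Eq, Show)

-- Assumption: errors at the same position are merged by set union.
mergeErr :: Err -> Err -> Err
mergeErr (Err u1 e1) (Err u2 e2) = Err (Set.union u1 u2) (Set.union e1 e2)

-- Model of matching a literal against the input.
-- 'Nothing' means the literal matched; 'Just' carries the failure.
stringErr :: String -> String -> Maybe Err
stringErr lit input
  | take n input == lit = Nothing
  | otherwise =
      Just (Err (Set.singleton unexpected) (Set.singleton (Tokens lit)))
  where
    n = length lit
    unexpected = case (lit, input) of
      (_, [])             -> EndOfInput
      -- First character differs: report only that character.
      (l:_, i:_) | l /= i -> Tokens [i]
      -- Otherwise report the whole compared chunk of input.
      _                   -> Tokens (take n input)

-- Try every literal; if all of them fail, merge the failures.
-- 'Nothing' means that some alternative matched (or the list was empty).
choiceErr :: [String] -> String -> Maybe Err
choiceErr lits input = do
  errs <- traverse (`stringErr` input) lits
  case errs of
    []     -> Nothing
    (e:es) -> Just (foldl mergeErr e es)

schemes :: [String]
schemes = ["data", "file", "ftp", "https", "http", "irc", "mailto"]

main :: IO ()
main = print (choiceErr schemes "dat")
```

In this model the merged unexpected set for "dat" holds both Tokens "dat" (from the "data" alternative) and Tokens "d" (from every other alternative), which mirrors the two items in the message above.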

It looks like an unexpected item can only be a token from the input stream or EndOfInput, so indeed we could switch to Maybe (ErrorItem t) instead of Set for unexpected tokens. But then we need a merging strategy for Maybe (ErrorItem t), where the item can be:

  • collection of tokens
  • label (not with vanilla mechanics, but it could be used by custom user code)
  • end of input

So what takes precedence over what in that case? Should labels win over collection of tokens, what about end of input? It looks like it's not possible to get unexpected end of input and unexpected token(s) at the same position, but I doubt it's encodable in the types in a satisfactory fashion.
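To make the precedence question concrete, here is a minimal sketch of one possible merging strategy. It is an assumption for illustration only, not a decision taken in this thread: it leans on a derived Ord instance and keeps the greater of the two items. The names Item and mergeUnexpected are hypothetical.

```haskell
module Main (main) where

-- Hypothetical stand-in for an error item. With a derived Ord instance,
-- constructors listed later compare as greater:
-- Tokens < Label < EndOfInput.
data Item
  = Tokens String
  | Label String
  | EndOfInput
  deriving (Eq, Ord, Show)

-- Keep whichever item is greater; a missing item loses to a present one.
-- Two 'Tokens' values are compared lexicographically, so "dat" beats "d",
-- but this is not a comparison by length.
mergeUnexpected :: Maybe Item -> Maybe Item -> Maybe Item
mergeUnexpected Nothing  y        = y
mergeUnexpected x        Nothing  = x
mergeUnexpected (Just x) (Just y) = Just (max x y)

main :: IO ()
main = do
  print (mergeUnexpected (Just (Tokens "dat")) (Just (Tokens "d")))
  print (mergeUnexpected (Just (Tokens "x")) (Just EndOfInput))
  print (mergeUnexpected Nothing (Just (Label "scheme")))
```

The appeal of this choice is that max is associative and commutative, so the result does not depend on the order in which alternatives are merged. The ordering of the constructors is arbitrary here; any other ranking of labels, tokens and end of input would work the same way.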

@feuerbach

Contributor

feuerbach commented Jun 29, 2017

Ah, I see. This is a consequence of not using a lexer. Being a proponent of lexers, I don't sympathize with this use case, but I guess it's a valid one.

Perhaps you could include this example in the documentation too?

@mrkkrp

Owner

mrkkrp commented Jun 29, 2017

Where do you propose to put it?

@feuerbach

Contributor

feuerbach commented Jun 29, 2017

I'd put it in the docs for ParseError and then maybe reference it in the docs for token and other functions that have this tuple of sets in their signature. ("For more details, see the documentation for 'ParseError'.")

Or make a separate haddock section explaining the two sets and reference it from other places. (This would subsume #226.) I can't find the docs for that markup at the moment, but it was something like -- $.

@mrkkrp

Owner

mrkkrp commented Jun 29, 2017

There is a description of the sets in the docs for ParseError, and I'll correct the docs for token. I think that's enough; including an additional example just to clarify why Set is preferred to Maybe looks like overkill.

@mrkkrp mrkkrp closed this Jun 29, 2017
