New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(CharIn.rep vs CharsWhile) + rep(min = 1) -- a puzzler #164

Closed
wookietreiber opened this Issue Aug 13, 2017 · 4 comments

Comments

Projects
None yet
2 participants
@wookietreiber

wookietreiber commented Aug 13, 2017

code (0.4.3):

package foo

import fastparse.all._

object Foo extends App {
  case class NamedFunction[A,B](f: A => B, name: String) extends (A => B) {
    override def apply(a: A) = f(a)
    override def toString = name
  }

  { // using CharsWhile(..)
    val alpha = NamedFunction((c: Char) => c.isLetter, "alpha")

    val word: Parser[String] = P( CharsWhile(alpha).! )

    val list: Parser[Seq[String]] = P( "(" ~ word.rep(min = 1, sep = ",") ~ ")" )

    println("-- CharsWhile")
    println(list.parse("()"))
    println(list.parse("(a,b,c)"))
  }

  { // using CharIn(..).rep
    val word: Parser[String] = P( CharIn('a' to 'z' ).rep.! )

    val list: Parser[Seq[String]] = P( "(" ~ word.rep(min = 1, sep = ",") ~ ")" )

    println("-- CharIn")
    println(list.parse("()"))
    println(list.parse("(a,b,c)"))
  }
}

output:

-- CharsWhile
Failure(CharsWhile(alpha):1:2 ...")")
Success(ArrayBuffer(a, b, c),7)
-- CharIn
Success(ArrayBuffer(),2)
Success(ArrayBuffer(a, b, c),7)

Why are these different? Both should expect at least one element in the list, however, the CharIn case allows an empty one. Is this a bug or did I miss something in the distinction between the two definitions of word?

@lihaoyi

This comment has been minimized.

Show comment
Hide comment
@lihaoyi

lihaoyi Aug 14, 2017

Owner

The basic confusion is that while both CharsWhile and .rep have configurable mins, CharsWhile defaults to min=1 while .rep defaults to min=0.

I don't really have a good reason why this is the case; it seemed like a good idea at the time based on what I was using the two things for. Really, both of them should have the same default min, whether it be min=0 or min=1.

However, I suspect if we change it now it'll be hell for people to upgrade though =/ You won't get any compiler error when you upgrade a version, or even any runtime error: your parser would just start behaving differently in subtle ways.

I honestly don't know what the right thing to do here is

Owner

lihaoyi commented Aug 14, 2017

The basic confusion is that while both CharsWhile and .rep have configurable mins, CharsWhile defaults to min=1 while .rep defaults to min=0.

I don't really have a good reason why this is the case; it seemed like a good idea at the time based on what I was using the two things for. Really, both of them should have the same default min, whether it be min=0 or min=1.

However, I suspect if we change it now it'll be hell for people to upgrade though =/ You won't get any compiler error when you upgrade a version, or even any runtime error: your parser would just start behaving differently in subtle ways.

I honestly don't know what the right thing to do here is

@wookietreiber

This comment has been minimized.

Show comment
Hide comment
@wookietreiber

wookietreiber Aug 15, 2017

I see. So, this change here to the example from above:

@@ -21,7 +21,7 @@ object Foo extends App {
   }

   { // using CharIn(..).rep
-    val word: Parser[String] = P( CharIn('a' to 'z').rep.! )
+    val word: Parser[String] = P( CharIn('a' to 'z').rep(min = 1).! )

     val list: Parser[Seq[String]] = P( "(" ~ word.rep(min = 1, sep = ",") ~ ")" )

... produces the expected output:

-- CharsWhile
Failure(CharsWhile(alpha):1:2 ...")")
Success(ArrayBuffer(a, b, c),7)
-- CharIn
Failure(CharIn("abcdefghijklmnopqrstuvwxyz"):1:2 ...")")
Success(ArrayBuffer(a, b, c),7)

wookietreiber commented Aug 15, 2017

I see. So, this change here to the example from above:

@@ -21,7 +21,7 @@ object Foo extends App {
   }

   { // using CharIn(..).rep
-    val word: Parser[String] = P( CharIn('a' to 'z').rep.! )
+    val word: Parser[String] = P( CharIn('a' to 'z').rep(min = 1).! )

     val list: Parser[Seq[String]] = P( "(" ~ word.rep(min = 1, sep = ",") ~ ")" )

... produces the expected output:

-- CharsWhile
Failure(CharsWhile(alpha):1:2 ...")")
Success(ArrayBuffer(a, b, c),7)
-- CharIn
Failure(CharIn("abcdefghijklmnopqrstuvwxyz"):1:2 ...")")
Success(ArrayBuffer(a, b, c),7)
@wookietreiber

This comment has been minimized.

Show comment
Hide comment
@wookietreiber

wookietreiber Aug 15, 2017

In regards to changing it: If you really can't think of a reason why these should behave differently, I would suggest the following:

  1. For now, explain the difference in the documentation, maybe even use a note box with a colored background, so there is enough emphasis on the distinction to have the puzzler documented.

  2. Well, there is semantic versioning: you are allowed to introduce breaking changes when you bump versions either to 0.5.0 or 1.0.0. Just document the change well enough in the changelog and the GitHub releases page and you're fine. Especially while you are still on 0.X.Y you shouldn't feel too burdened by introducing breaking changes.

    Also, because there are probably not that many uses of both functions in projects, I don't think people will have that a hard time when updating.

wookietreiber commented Aug 15, 2017

In regards to changing it: If you really can't think of a reason why these should behave differently, I would suggest the following:

  1. For now, explain the difference in the documentation, maybe even use a note box with a colored background, so there is enough emphasis on the distinction to have the puzzler documented.

  2. Well, there is semantic versioning: you are allowed to introduce breaking changes when you bump versions either to 0.5.0 or 1.0.0. Just document the change well enough in the changelog and the GitHub releases page and you're fine. Especially while you are still on 0.X.Y you shouldn't feel too burdened by introducing breaking changes.

    Also, because there are probably not that many uses of both functions in projects, I don't think people will have that a hard time when updating.

@lihaoyi

This comment has been minimized.

Show comment
Hide comment
@lihaoyi

lihaoyi Oct 18, 2018

Owner

I'm probably going to call this a wontfix for now. I admit it's a wart, but I don't think it's worth the cost of changing

Owner

lihaoyi commented Oct 18, 2018

I'm probably going to call this a wontfix for now. I admit it's a wart, but I don't think it's worth the cost of changing

@lihaoyi lihaoyi closed this Oct 18, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment