Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser performance #4942

Open
eatkins opened this issue Aug 12, 2019 · 0 comments
Open

Parser performance #4942

eatkins opened this issue Aug 12, 2019 · 0 comments

Comments

@eatkins
Copy link
Contributor

eatkins commented Aug 12, 2019

This is more of an observation than an issue:

Parsers can be very slow when the | is involved. I tried to write a parser for paths in the following form: /?foo/bar/baz/(**)?/filename that would extract to (Option[Path], Boolean, String). It looked like this:

private[impl] val globParser: Parser[(Option[Path], Boolean, String)] = {
   val sep = ('/': Parser[Char]) | ('\\': Parser[Char])
   val star: Parser[Char] = '*'
   val notSpecial = charClass(c => c != '/' && c != '\\' && c != '*', "not meta")
   val pathParts = matched(notSpecial.* ~ sep.? ~ (notSpecial.+ <~ sep).*).?
   val recursivePart = sep.? ~> (star.+ <~ sep).?
   (OptSpace ~> pathParts ~ recursivePart ~ matched(not(sep, "").+)).map {
     case ((prefix, Some(Seq('*', '*'))), filename) => (prefix.map(Paths.get(_)), true, filename)
     case ((prefix, _), filename)                   => (prefix.map(Paths.get(_)), false, filename)
   }
}

The parser worked fine for short paths, but the performance was really terrible as the number of path prefixes grew.

One thing that I noticed is that the map is applied eagerly. Naively, I assumed that map would only be applied after all of the bytes had been parsed. With this assumption, I tried splitting the parser into something like pathParts ~ matched(.*).flatMap { (prefix, rest) => ... } but it behaved more or less exactly the same because the flatMap was immediately applied.

I think the Parsers are more or less working as designed, but the design makes it somewhat difficult to write even modestly complicated parsers with reasonable performance (see #4890).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants