Motivation

Jean-Remi Desjardins edited this page Jul 9, 2014 · 6 revisions

parboiled was born out of frustration over existing parser building tools for the JVM.

The huge increase in popularity that dynamic languages (like Ruby or Groovy) have seen in recent years stems in good part from the ease with which they lend themselves to modeling Domain Specific Languages (DSLs). While these languages (and even some statically typed ones, like Scala) sport a concise and flexible syntax that often serves directly as the basis for internal DSLs, Java's rather unwieldy syntax makes internal DSLs quite unattractive.

Still, for many projects a small DSL can make an elegant “user interface”, providing rich expressiveness and flexibility without the need for a complex GUI. In Java, with internal DSLs practically out of the picture, you have to build a parser for an external DSL in order to reap these benefits.

Even though (external) DSLs are certainly not the only use case for parsers, they are one area where traditional parsing tools for languages like Java don’t really shine. Often, supporting a DSL is not the centerpiece of a project (as it is, for example, in a compiler) but rather an elegant way of solving one of many problems. Therefore you might not want to dedicate too much time to the theory of context-free grammars, lexing and the intricacies of external parser generators. You just want to specify what your parsing grammar looks like and get it working quickly and easily. parboiled tries to deliver just that.

Here are some disadvantages of “old-school” parser generators (like ANTLR), as I see them:

  • Special, non-Java grammar syntax in separate project files (i.e., an external DSL)
  • No built-in IDE support for grammar files (no syntax highlighting, no inline validation, no refactoring, no code navigation, etc.)
  • Special build steps required to run the external parser generator
  • “Untouchable” generated Java source files in your project, which need to be kept in sync with the grammar specification
  • More complicated design and maintenance, because the parsing process is split into a lexing (token generation) phase and a token parsing phase
  • Larger overall footprint (the ANTLR distribution, generator plus runtime, is for example more than 1.8 MB in size)