RegEx vs. parboiled vs. Parser Generators

Mathias edited this page Sep 14, 2010 · 2 revisions

If you think of parsing requirements as a spectrum, with “quick and dirty” regular expressions at one end and full-blown parser generators like ANTLR at the other, parboiled aims to fill the large space in between. For very simple use cases a regex might be an adequate solution with minimal overhead. However, regular expressions can quickly grow into ugly messes that are hard to read and understand, and therefore hard to maintain. In many cases they also lack the expressive power to parse things like nested constructs, which require recursive rule definitions. Nor do they generate proper error messages or recover from errors in the input, capabilities that can be a huge time saver, and not only during development.
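To make the point about nested constructs concrete, here is a minimal sketch in plain Java. It does not use the parboiled API; the class `NestedParens` and its methods are hypothetical names chosen for illustration. It compares a simple regex, which only accepts one level of parentheses, with a hand-written recursive rule that accepts arbitrary nesting.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not parboiled code.
public class NestedParens {
    // Accepts exactly one level: '(' followed by lowercase letters and ')'.
    static final Pattern FLAT = Pattern.compile("\\([a-z]*\\)");

    private final String input;
    private int pos = 0;

    NestedParens(String input) {
        this.input = input;
    }

    // Recursive rule: Group <- '(' (Group / Letter)* ')'
    private boolean group() {
        if (pos >= input.length() || input.charAt(pos) != '(') return false;
        pos++; // consume '('
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == '(') {
                if (!group()) return false; // the rule refers to itself
            } else if (c >= 'a' && c <= 'z') {
                pos++; // consume a letter
            } else {
                break;
            }
        }
        if (pos >= input.length() || input.charAt(pos) != ')') return false;
        pos++; // consume ')'
        return true;
    }

    // True if the whole string is one well-formed, arbitrarily nested group.
    static boolean matchesRecursive(String s) {
        NestedParens p = new NestedParens(s);
        return p.group() && p.pos == s.length();
    }

    // True if the whole string matches the single-level regex.
    static boolean matchesFlat(String s) {
        return FLAT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesFlat("(ab)"));            // true
        System.out.println(matchesFlat("(a(b))"));          // false
        System.out.println(matchesRecursive("(a(b(c))d)")); // true
        System.out.println(matchesRecursive("(a(b)"));      // false
    }
}
```

The regex could be extended by hand to cover two or three levels, but each additional level has to be spelled out explicitly, whereas the recursive rule handles any depth because it simply refers to itself.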

At the other end of the spectrum, powerful tools like ANTLR and Rats! certainly have their applications as well. Whenever large amounts of source code written in complicated languages have to be parsed, a parser generator might be the right tool for the task. You might, for example, want to make use of existing grammars, or need the full feature set of an enterprise tool that has grown over many years.

However, whenever you are thinking about defining your own grammar, or if you have no prior experience with a parser generator, parboiled might get you to your goal faster and with much less overhead. parboiled is being used for “small” tasks, like parsing arbitrary date and time constructs, as well as for complicated ones, like parsing Java source code or markup such as Markdown. Its small footprint and flexible architecture let it fit into many applications and also offer a good base for customizations of many kinds.