Handwritten parser vs. yecc-generated parser #2

Closed
whatyouhide opened this issue Mar 16, 2015 · 5 comments

@whatyouhide

@josevalim and I have been discussing whether to use a handwritten parser or a yecc-generated parser for .po files. Initially, I was in favour of using yecc in order to have a very declarative and easy-to-understand grammar. After implementing a first version of the parser using yecc, I gave implementing a handwritten parser a try and it was practically just as easy.

Since I couldn't decide which way was best, I pushed both implementations to my fork so that we can decide together.

Please note that both parsers require (a lot of) polishing, in particular the yecc-based one, since I'm not sure that stuffing the .yrl file in a src directory is the right way to go.

I'm looking forward to your opinions!

@josevalim

Good work! To me, the yecc-based one is much cleaner.

A couple notes on the .yrl one:

  1. The parser should indeed stay in the src directory, with the generated .erl file being .gitignored.
  2. You can match on the error information here to generate proper syntax errors (with the line number and whatnot).
  3. If you want, you can generate a map inside the .yrl file (instead of the translation tuple) and, on the Elixir side of things, convert it to a struct by doing Map.put(map, :__struct__, Translation).
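
Points 2 and 3 could be sketched in Elixir roughly as follows. This is only an illustration: the `Translation` fields and the `PoResult` module are hypothetical names, not code from either branch. The one thing taken as given is the usual shape of a yecc failure, `{:error, {line, module, message}}`, where `module.format_error(message)` returns a printable description.

```elixir
defmodule Translation do
  # Hypothetical fields; the real struct may well differ.
  defstruct msgid: nil, msgstr: nil
end

defmodule PoResult do
  # Point 3: turn a plain map built by the grammar into a struct.
  def to_struct(map) when is_map(map) do
    Map.put(map, :__struct__, Translation)
  end

  # Successful parse: assume the grammar returned a list of maps.
  def normalize({:ok, maps}) when is_list(maps) do
    {:ok, Enum.map(maps, &to_struct/1)}
  end

  # Point 2: match on the error tuple to build an error that carries
  # the line number and a readable reason.
  def normalize({:error, {line, module, message}}) do
    reason = IO.chardata_to_string(module.format_error(message))
    {:error, {line, reason}}
  end
end
```

Note that `Map.put/3` does not check the map's keys against the struct definition, so the grammar has to emit exactly the fields that `Translation` declares.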

@ericmj

ericmj commented Mar 17, 2015

I agree that yecc is cleaner, but one of the advantages of the Elixir version is that anyone can understand and contribute to it, because you don't have to learn yet another parser language.

@whatyouhide

@ericmj that was the main reason I wrote the Elixir version too. It's also true that yecc has a very simple grammar and is very easy to understand (you may need to know a little bit about context-free grammars, but it's nothing complicated).

I think the yecc version is cleaner but also "safer" in the sense that it's less prone to cause subtle bugs since the grammar definition is very concise and straightforward. If that means giving up some possible contributors, maybe it's worth it.

Also keep in mind that the current versions of both parsers only handle msgid/msgstr pairs, with no pluralisation or anything else; this may be the main reason why the Elixir code is so simple. I'm confident that when the grammar becomes a little bit more complicated, yecc's clarity will be even more valuable.
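
To make that scope concrete, here is a minimal sketch of what a hand-written parser limited to msgid/msgstr pairs might look like. It is not the code from the fork: the module name `HandwrittenPo` and the token shapes (`{:msgid, line}`, `{:msgstr, line}`, `{:str, line, binary}`) are assumptions made for illustration.

```elixir
defmodule HandwrittenPo do
  # Assumed token shapes: {:msgid, line}, {:msgstr, line}, {:str, line, binary}.
  def parse(tokens) when is_list(tokens), do: parse(tokens, [])

  # No tokens left: return the collected entries in source order.
  defp parse([], acc), do: {:ok, Enum.reverse(acc)}

  # One complete entry: msgid "..." followed by msgstr "...".
  defp parse([{:msgid, _}, {:str, _, id}, {:msgstr, _}, {:str, _, str} | rest], acc) do
    parse(rest, [%{msgid: id, msgstr: str} | acc])
  end

  # Anything else is reported against the line where the bad entry starts.
  defp parse([token | _], _acc) do
    line = elem(token, 1)
    {:error, {line, "syntax error in entry starting on line #{line}"}}
  end
end
```

With a single entry shape, one function clause is enough. Each new construct (plural forms, for instance) would need further clauses of its own, which is where a declarative grammar may start to pay off.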

@josevalim

@ericmj plus, the grammar will be so much simpler that it will be easier to contribute to a grammar syntax you don't know than to a complex hand-written parser in Elixir. :)

@whatyouhide

I'm closing this since I merged #3, introducing a .po parser generated using yecc. This resulted in a very straightforward grammar and I also managed to move some tedious logic directly to the yecc parser (e.g., concatenation of a series of string tokens), so yay!
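
For context, a .po string can be split across several adjacent quoted literals, which then have to be joined into one. The sketch below shows what that concatenation step might look like if written by hand in Elixir; the merged parser does this inside the grammar instead. The `StringTokens` module and the `{:str, line, binary}` token shape are assumptions, not code from #3.

```elixir
defmodule StringTokens do
  # Two adjacent string tokens: merge them, keep the first token's line,
  # and look again in case more string tokens follow.
  def concat([{:str, line, a}, {:str, _, b} | rest]) do
    concat([{:str, line, a <> b} | rest])
  end

  # Any other token is passed through unchanged.
  def concat([token | rest]), do: [token | concat(rest)]

  def concat([]), do: []
end
```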
