Question: status of Go version of Textmapper? #6

Closed
mewmew opened this issue Oct 11, 2018 · 8 comments

@mewmew
Contributor

mewmew commented Oct 11, 2018

Hi Evgeny,

I just came across Textmapper, and having read the Language Reference and the motivation behind the project, it seems to be exactly what I was looking for: essentially an LR version of ANTLR for Go. I can tell that you have a lot of experience in this domain, as the architecture is well thought out. I still have to dive deep and examine the minute details of the implementation, but my initial reaction to Textmapper is very positive!

Now, of course, I'd like to take tm out for a spin! However, looking at the implementation of tm-go/cmd/textmapper/generate.go, I noticed a TODO in the generate function.

I also noticed that you recently ported Tarjan's algorithm for detecting strongly connected components (in rev 78fc54e). My question is, how far is the Go version of Textmapper from being ready for use?

I'd love to try it out!

Cheerful regards,
Robin

@inspirer
Owner

The Go version is very far from being complete. I think it will take me two more months to finish porting the lexer generator from Java, and then another two quarters for the parser generator. It is not that much work per se; the constraint is my lack of time between work and family. I'm committed though. The main thing I want to get from this rewrite is support for declarative (and transparent) nonterminal inlining, which should become the main tool for resolving grammar ambiguities. I'm also looking into better compression for generated tables. The compression scheme Textmapper currently uses is the same as in Bison, and it does not scale well to large templated grammars. The problem of generating performant static hash maps seems very interesting to me but I don't want to do this in Java.

Meanwhile, use the Java version. It is stable and generates very performant code. On real-world languages, generated parsers in Go gave me ~100-230MB/sec of lexing throughput and 20-60MB/sec of parsing throughput. It gets slightly better with each Go release, mostly because of improved register allocation within the Go compiler.

I will refresh the documentation in the upcoming weeks to better cover Textmapper's advanced features, such as templates, grammar lookaheads, token sets, error recovery best practices, and the arrow notation for producing ASTs.

@mewmew
Contributor Author

mewmew commented Oct 12, 2018

Thanks a lot for the writeup! It's good to know roughly what stage the Go port is at, what your plans are for future releases, and in particular that you are committed to it!

Performance was actually why I started looking at Textmapper. The intention is to evaluate using Textmapper for parsing LLVM IR assembly, and thus switch from using Gocc to Textmapper in the upcoming release of https://github.com/llir/llvm.

There is still quite a bit to do, but I'd say about 80% of the grammar has been ported from Gocc to Textmapper https://github.com/mewmew/l-tm/blob/master/parser/ll.tm

There are still production actions to write, and that will take the other 80% of the project :)

Once more, thanks for releasing Textmapper to the public!

Cheers,
Robin

@mewmew
Contributor Author

mewmew commented Oct 13, 2018

> There is still quite a bit to do, but I'd say about 80% of the grammar has been ported from Gocc to Textmapper https://github.com/mewmew/l-tm/blob/master/parser/ll.tm

The port is now done. And the performance looks very promising.

> On real-world languages, generated parsers in Go gave me ~100-230MB/sec of lexing throughput and 20-60MB/sec of parsing throughput.

I can validate this claim, as I get a parsing throughput of roughly 45 MB/s. Have not yet done the semantic actions for constructing the AST though, so hope that won't bring the performance down too much.

Extract from mewspring/mewmew-l#6 (comment):

Parsing 1,733,842 lines and 135 MB of LLVM IR assembly, as contained in the 107 source files at decomp/testdata took ~3 seconds; thus ~30ms was used per file, or ~45 MB/s.

@mewmew
Contributor Author

mewmew commented Oct 14, 2018

Just a note: the more I use Textmapper, the more remarkable I think it is. Evgeny, what you have managed to do is quite an achievement! I've never before come across a parser generator where the grammar ends up being as readable as the one in Textmapper. I'm quite amazed at how well the LLVM IR grammar seems to turn out.

Simply wanted to extend a thank you!

Hats off and with respect.
Robin

@inspirer
Owner

inspirer commented Feb 9, 2019

Thanks for the kind words, Robin!

A quick update from me: the Go version reached feature parity with its Java counterpart in lexer generation. It produces byte-for-byte identical output for most grammars, and I'm now working on porting the parser generator. I believe I'm past the midpoint of the rewrite.

@mewmew
Contributor Author

mewmew commented Feb 9, 2019

> A quick update from me: the Go version reached feature parity with its Java counterpart in lexer generation. It produces byte-for-byte identical output for most grammars, and I'm now working on porting the parser generator. I believe I'm past the midpoint of the rewrite.

That is really wonderful to hear! Thanks for the update.

Wish you the best of springs and happy coding ahead :)

@tmm1

tmm1 commented Aug 2, 2021

I'm trying to start using the golang textmapper, and I'm not sure if there's a feature missing or I'm doing something wrong.

I started by simply trying to regenerate the simple parser, but the parser.go and listener.go files are not being generated:

$ cd tm-go/parsers/simple
$ rm *.go
$ ../../cmd/textmapper/textmapper generate simple.tm
$ git status
Changes not staged for commit:
	deleted:    listener.go
	deleted:    parser.go

What am I missing?

EDIT: I found an example with the correct commands here: https://github.com/llir/grammar/blob/5291534192d972964c2745b7c18ac47208dc6be5/Makefile#L5-L7

@inspirer
Owner

Textmapper is fully rewritten in Go.

Run go install github.com/inspirer/textmapper/cmd/textmapper@latest to install it locally.

In most cases the rewrite is a drop-in replacement for the Java version, but there are a few places where the new tool produces slightly different output (mostly in identifiers) or is stricter about grammar errors. Expect the following changes:

  • similar names in the grammar (capitalization, camel vs snake case, etc.) cause a grammar compilation error to avoid confusion and actual compilation errors down the road
  • declarative lookaheads are properly checked to be mutually exclusive (the previous implementation was too lenient)
  • unused patterns get reported
  • syntax sugar is processed in a slightly different order, which in some cases produces a different output
  • (label? -> Foo) is now correctly reported as an empty node when 'label' is missing. Rewrite it as (label -> Foo)?.

There is a new flag --compat which tries to reduce the variation in the generated code between the versions.

Important: the new version uses https://pkg.go.dev/text/template as the templating language. If you override any templates in your grammar, you'll have to update them. Under the --compat flag Textmapper tries to translate previous templates into new templates, but this breaks pretty quickly on advanced grammars.
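
For readers who have not used text/template before, here is a minimal, self-contained sketch of what the syntax looks like. The template text, the tokenData type, and the render function are all invented for illustration; they are not Textmapper's actual templates or data model, so treat this only as a taste of the language that overridden templates need to be rewritten in.

```go
package main

import (
	"os"
	"strings"
	"text/template"
)

// tokenData is a hypothetical data model, not the one Textmapper passes
// to its templates.
type tokenData struct {
	Package string
	Tokens  []string
}

// tokenTmpl is an illustrative template, not one shipped with Textmapper.
// "{{-" trims the whitespace before an action, which keeps the generated
// code free of blank lines.
const tokenTmpl = `package {{.Package}}

const (
{{- range $i, $name := .Tokens}}
	{{$name}} = {{$i}}
{{- end}}
)
`

// render executes tokenTmpl against d and returns the generated text.
func render(d tokenData) (string, error) {
	t, err := template.New("tokens").Parse(tokenTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render(tokenData{
		Package: "simple",
		Tokens:  []string{"EOI", "IDENT", "NUMBER"},
	})
	if err != nil {
		panic(err)
	}
	os.Stdout.WriteString(out)
}
```

Running it prints a small Go source file with one constant per token, numbered from 0.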

As the first step during the migration, run textmapper generate --diff --compat to see any new errors and the difference in generated code (compares generated code vs the on-disk state).

Bonus: a new grammar option optimizeTables = true speeds up large grammars by 30-80%.

I've successfully migrated dozens of grammars recently and the new version is handling them well. Please let me know if you run into any issues.
