Project aim and design discussions #1
@reedkotler Perhaps you have some input with regards to the open question of how production actions may be stored. You mentioned you had previous experience working with tools that separated the production actions from the otherwise language-agnostic input grammar. How did these tools work? Did you write two files, one for the grammar and one for the production actions (i.e. what happens on reduce)? Are there any sources you could point us to for inspiration? I'll play around with a few different approaches over the coming months, and would love additional input on approaches that are known to work.

Cheers /u
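To make the two-file question concrete, here is one shape the separation could take, sketched in Go. The grammar fragment, the production-name keys, and the `Node`/`Action` types are all invented for illustration; none of them come from speak or gocc.

```go
// Hypothetical Go-side production action file. The EBNF grammar itself
// stays language-agnostic; only this file knows about Go types.
package actions

// Grammar file (expr.ebnf), shared by every target language:
//
//	Expr = Expr "+" Term | Term .
//	Term = int_lit .

// Node is whatever value a reduce action produces (e.g. an AST fragment).
type Node interface{}

// BinaryExpr is one possible AST node the Go action file could build.
type BinaryExpr struct {
	Op   string
	X, Y Node
}

// Action runs on reduce; rhs holds the values produced for the symbols on
// the production's right-hand side.
type Action func(rhs []Node) (Node, error)

// Actions maps production names to Go reduce callbacks. A Python target
// would ship its own action file with the same keys but Python callbacks.
var Actions = map[string]Action{
	`Expr : Expr "+" Term`: func(rhs []Node) (Node, error) {
		return &BinaryExpr{Op: "+", X: rhs[0], Y: rhs[2]}, nil
	},
	`Term : int_lit`: func(rhs []Node) (Node, error) {
		return rhs[0], nil
	},
}
```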
To give an idea of what the input and output of these tools may look like, a proof-of-concept tool for step 1 has been created. The `terms` command extracts the regular expressions of the terminals of a given input grammar, and outputs them as JSON. Example output from parsing the uC input grammar expressed in EBNF:

```json
{
"names": [
{
"id": "ident",
"reg": "[A-Z_a-z][0-9A-Z_a-z]*"
},
{
"id": "int_lit",
"reg": "[0-9][0-9]*"
}
],
"tokens": [
"!",
"!=",
"\u0026\u0026",
"(",
")",
"*",
"+",
",",
"-",
"/",
";",
"\u003c",
"\u003c=",
"=",
"==",
"\u003e",
"\u003e=",
"[",
"]",
"else",
"if",
"return",
"while",
"{",
"}"
]
}
```
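As a rough sketch of how a generated lexer (or any other consumer) might load this output in Go: only the `names`, `tokens`, `id` and `reg` keys come from the JSON above; the `Terms` and `Name` types and the `terms.json` filename are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Name pairs a named terminal with its regular expression.
type Name struct {
	ID  string `json:"id"`
	Reg string `json:"reg"`
}

// Terms mirrors the JSON output of the terms command.
type Terms struct {
	Names  []Name   `json:"names"`  // named terminals (ident, int_lit, ...)
	Tokens []string `json:"tokens"` // literal tokens ("+", "if", ...)
}

func main() {
	data, err := os.ReadFile("terms.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	var t Terms
	if err := json.Unmarshal(data, &t); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, n := range t.Names {
		fmt.Printf("%s = /%s/\n", n.ID, n.Reg)
	}
}
```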
I think you might be able to push a lot of these ideas into gocc :)

**gocc code generation speed**

It would be really cool to speed this up, but I don't know if it's possible.

**exporting regexes**

I think exporting the regular expressions is easy, but I think gocc does a small bit more than simply checking these regexes when conflicting regexes exist. Maybe it's again better to export a table? (See the sketch after this comment.) But I am now simply partially recalling conversations from years ago, so I might be wrong again.

**language specific ast creation**

You mentioned:

> programming languages often have mature environments for creating ASTs in their own language

Do you have some examples?

**language agnostic ast creation**

Some of my ideas on this topic have been towards specifying a BNF with production rules that are target-language agnostic.

**dream**

My personal dream would be to be able to spec a language once in BNF (or whatever works) and have a parser be produced in any* language of my choice.
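One rough guess at what such an exported lexer table could look like, sketched in Go as a dense DFA with an accept table; the names are purely illustrative and do not exist in gocc. Conflicts between overlapping regexes (e.g. `if` vs `ident`) would be resolved when the table is built, since every DFA state maps to at most one token id.

```go
package lexer

// DFA is a dense, language-agnostic transition table over input bytes.
type DFA struct {
	Trans  []int // Trans[state*256+int(b)] -> next state, or -1
	Accept []int // Accept[state] -> token id, or -1 if not accepting
}

// Next scans the longest match starting at pos and returns the token id
// and the end of the lexeme; tok is -1 if no terminal matches.
func (d *DFA) Next(src []byte, pos int) (tok, end int) {
	state := 0
	tok, end = -1, pos
	for i := pos; i < len(src); i++ {
		state = d.Trans[state*256+int(src[i])]
		if state < 0 {
			break
		}
		if id := d.Accept[state]; id >= 0 {
			tok, end = id, i+1 // remember the last accepting position
		}
	}
	return tok, end
}
```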
…g regexp of terminators. Note: this is a naive implementation and should be considered a proof of concept. The generated lexer is roughly two orders of magnitude slower than the one generated by Gocc. This represents the tool of step 2b. Updates #1.
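For a feel of where the slowdown comes from, here is a sketch of the naive approach: re-running every terminal's anchored regexp at each input position and keeping the longest match, instead of walking a single DFA once per byte. This is illustrative code, not the actual generated lexer.

```go
package main

import (
	"fmt"
	"regexp"
)

type terminal struct {
	id string
	re *regexp.Regexp // anchored so it only matches at the current position
}

func main() {
	// Regexps taken from the terms output above.
	terms := []terminal{
		{"int_lit", regexp.MustCompile(`^[0-9][0-9]*`)},
		{"ident", regexp.MustCompile(`^[A-Z_a-z][0-9A-Z_a-z]*`)},
	}
	src := "x1 42"
	for pos := 0; pos < len(src); pos++ {
		if src[pos] == ' ' {
			continue
		}
		best, lexeme := "", ""
		for _, t := range terms { // O(#terminals) regexp runs per position
			if m := t.re.FindString(src[pos:]); len(m) > len(lexeme) {
				best, lexeme = t.id, m
			}
		}
		if best == "" {
			break // no terminal matches; a real lexer would report an error
		}
		fmt.Printf("%s(%q)\n", best, lexeme)
		pos += len(lexeme) - 1
	}
}
```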
Hi @awalterschulze,

Thanks for getting back to me! I think I share your dream. I would also very much like to specify the grammar of a language in one BNF (or whatever works..) and generate lexers, parsers and such for a set of languages. The intention of this project is definitely to experiment with various designs that may bring about such a workflow. Using Protobuf or something similar may be interesting to look into, or some form of LISP dialect to represent ASTs in a language-agnostic fashion. For the time being, I'll focus my efforts on trying to generate language-agnostic tables for lexers (DFA transition and action tables) and parsers (shift/reduce tables), and then look into how one may tie into and make use of the shift/reduce tables in a good fashion. Experimenting is fun!

Regarding your question on parsing X in X (Go in Go, Python in Python, etc.):

> Do you have some examples?

- Go
  - https://golang.org/pkg/go/ast/
  - https://golang.org/pkg/go/parser/
- Haskell
  - http://hackage.haskell.org/package/haskell-src
  - https://hackage.haskell.org/package/haskell-tools-ast-fromghc
- Python
  - https://docs.python.org/3/library/ast.html
- Ruby
  - https://github.com/whitequark/parser
  - https://github.com/seattlerb/ruby_parser

> I think exporting the regular expressions is easy, but I think gocc does a small bit more than simply checking these regexes when conflicting regexes exist. Maybe it's again better to export a table?

Definitely, I'm playing around with different ways to export this table right now. Should hopefully have something up and running within the next few days.

Cheers /u
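Picking up on the shift/reduce tables mentioned above: one way a language-agnostic export could be consumed is via a small generic LR loop in each target language. The following Go skeleton is a sketch under that assumption; the `Tables` layout is invented, not an actual speak format.

```go
package parser

type ActionKind int

const (
	Shift ActionKind = iota
	Reduce
	Accept
	Error
)

// Action is one cell of the LR action table.
type Action struct {
	Kind ActionKind
	N    int // next state for Shift, production index for Reduce
}

// Tables is what the generator could export (e.g. as JSON) for any target.
type Tables struct {
	Action [][]Action // Action[state][terminal id]
	Goto   [][]int    // Goto[state][nonterminal id]
	RHSLen []int      // number of symbols popped per production
	LHS    []int      // nonterminal pushed per production
}

// Parse runs the generic shift/reduce loop over a stream of terminal ids,
// terminated by an end-of-input terminal.
func (t *Tables) Parse(tokens []int) bool {
	stack := []int{0} // state stack
	for i := 0; i < len(tokens); {
		act := t.Action[stack[len(stack)-1]][tokens[i]]
		switch act.Kind {
		case Shift:
			stack = append(stack, act.N)
			i++
		case Reduce:
			stack = stack[:len(stack)-t.RHSLen[act.N]]
			stack = append(stack, t.Goto[stack[len(stack)-1]][t.LHS[act.N]])
		case Accept:
			return true
		default:
			return false
		}
	}
	return false
}
```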
On Wed, 15 Feb 2017 at 02:48, Robin Eklind wrote:

> For the time being, I'll focus my efforts on trying to generate language-agnostic tables for lexers (DFA transition and action tables) and parsers (shift/reduce tables).

For me the AST is an integral part of this dream. Actually, it doesn't have to be an AST, but it has to be a parse tree or some structure that is the same independent of target language. Maybe a ParsedTree struct (class) can be generated from the EBNF as well?

> Regarding your question on parsing X in X (Go in Go, Python in Python, etc.)

Here we were talking past each other. I was asking what types of tools target languages (like Go, Ruby, etc.) have that can transform a parsed tree into an AST.

> Definitely, I'm playing around with different ways to export this table right now. Should hopefully have something up and running within the next few days.

Cool.
**Proposal: generate a ParsedTree instead of an AST and Protobufs**

Given a grammar, we could generate a parsed tree struct. Obviously there might be some harder examples, and we have to get this right in every language.
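A minimal sketch in Go of what such a generated parsed-tree struct could look like, for a hypothetical `IfStmt` production; every name below is made up for illustration. The generator would emit one struct per production, with one field per symbol on the right-hand side, and a Python or Ruby target would emit an equivalent class with the same shape.

```go
package tree

// For a production such as:
//
//	IfStmt = "if" Expr Block .

// Token is a lexed terminal.
type Token struct {
	ID     string // terminal id, e.g. "if"
	Lexeme string // matched text
	Pos    int    // byte offset in the input
}

// IfStmt mirrors the production above, one field per right-hand side symbol.
type IfStmt struct {
	If   Token  // the "if" keyword token
	Cond *Expr  // parsed tree of the Expr nonterminal
	Body *Block // parsed tree of the Block nonterminal
}

type Expr struct{ /* fields derived from the Expr production */ }

type Block struct{ /* fields derived from the Block production */ }
```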
This is a meta issue to discuss various design decisions related to the speak tool.
NOTE: This section is under construction and will be revised based on ongoing discussions, so feel free to join.
(The intention is to explore a large set of ideas that may or may not be suited for implementation in Gocc. Implementing them from the ground up opens up the design space, and makes it possible to try wild ideas that would be cumbersome to implement in the current code base of Gocc. If the workflow of speak ends up substantially deviating from that of Gocc, it may make sense to keep both tools around.)
Aim and Objectives
The aim of this project is to create an experimental playground for compiler construction.
Objectives:
Open questions:
Where should production actions be stored, and what may they look like? The main issue is that the EBNF grammar should remain language-agnostic, and may therefore only contain information about the source/input programming language, and not about the library/target programming languages.
The idea is to create one EBNF grammar per source/input programming language, and a set of associated production action files, one per library/target programming language.
Any ideas on how to do this in a clean fashion, one which makes the mapping between the EBNF input grammar and the production actions clear, and which facilitates generating lexers and parsers for the source language in several different programming languages?
As a side note, this effort is also very much intended to help solve the slow compilation and generation times associated with Gocc as the input grammar becomes more complex. The llir/llvm project is currently experiencing lexer and parser generation times of 1.5 minutes, which sometimes makes coding less enjoyable.
The key aim of this compiler construction playground is to make language analysis more fun again.
CC: @awalterschulze Any thoughts, input or ideas? I think you are currently hitting the same walls as I am with Gocc. Would be great to flesh out ideas for how we can improve the situation : )