Document the process of adding new languages. (#126) #131
Took a first stab at this. I described the addition process in terms of how it is now, rather than how it will be when we compile tree-sitter ASTs directly into Core. I anticipate that this will impress upon the public the amount of work required to add a new language, and hopefully encourage people to wait until the new generation of AST parsing lands before they invest a ton of time into what will be a deprecated code path.
robrix
left a comment
Thanks so much for this! I’ve left a few notes, but I think this is a great start.
docs/adding-new-languages.md
Outdated
## The procedure

1. **Find or write a [tree-sitter](https://tree-sitter.github.io) parser for your language.** The tree-sitter [organization page](https://github.com/tree-sitter) has a number of parsers beyond those we currently support in Semantic; look there first to make sure you're not duplicating work. The tree-sitter [documentation on creating parsers](http://tree-sitter.github.io/tree-sitter/creating-parsers) provides an exhaustive look at the process of developing and debugging tree-sitter parsers. Though we do not support grammars written with other toolkits such as [ANTLR](https://www.antlr.org), translating an ANTLR or other BNF-style grammar into a tree-sitter grammar is usually straightforward. The parser needs to be as complete as possible: `semantic` cannot yet handle syntax trees containing unrecognized/erroneous nodes ([https://github.com/github/semantic/issues/13]).
This last bit is incorrect: we can and do represent errors in syntax just fine. These can represent errors in the parser or assignment, as well as legit syntax errors in the parsed source.
We don’t yet have a way to do it with the new-style generated ASTs, however; cf tree-sitter/haskell-tree-sitter#114.
Also, cc @maxbrunsfeld for anything we should change or explicitly call out in discussions of tree-sitter.
I removed this whole line, but am open to any more suggestions Max might have.
cc @aymannadeem
@@ -0,0 +1,20 @@
# Adding new languages to Semantic

This document exists to outline the process associated with adding a new language to Semantic. Though the Semantic authors have architected the library such that adding new languages and syntax [requires no changes to existing code](https://en.wikipedia.org/wiki/Expression_problem), adding support for a new language is a nontrivial amount of work. Those willing to take the plunge will probably need a degree of Haskell experience.
I'd add another (short) paragraph here calling out that this is temporary. I like the detail you have below about TH/Core as a replacement (and that can stay down there), but it's worth mentioning that here at the beginning too. Just a sentence or two, plus maybe a link to the more detailed discussion down in the FAQ.
docs/adding-new-languages.md
Outdated
2. **Create a Haskell library providing an interface to that C source.** The [`haskell-tree-sitter`](https://github.com/tree-sitter/haskell-tree-sitter/tree/master/languages) repository provides a Cabal package for each supported language. You can find an example of a pull request to add such a package here. Each package needs to provide two API surfaces:
   * a bridged (via the FFI) reference to the toplevel parser in the generated file ([example](https://github.com/tree-sitter/haskell-tree-sitter/blob/master/languages/json/internal/TreeSitter/JSON/Internal.hs))
   * symbol datatypes for each syntax node in the parser, generated with the `mkSymbolDatatype` Template Haskell splice ([example](https://github.com/tree-sitter/haskell-tree-sitter/blob/master/languages/json/TreeSitter/JSON.hs))
3. **Identify the new syntax nodes required to represent your language.** While we provide an extensive library of reusable AST nodes for [literals](https://github.com/github/semantic/blob/master/src/Data/Syntax/Literal.hs), [expressions](https://github.com/github/semantic/blob/master/src/Data/Syntax/Expression.hs), [statements](https://github.com/github/semantic/blob/master/src/Data/Syntax/Statement.hs), and [types](https://github.com/github/semantic/blob/master/src/Data/Syntax/Type.hs), most languages will require some syntax nodes not found in other languages. You'll need to create a new module providing those data types, and those data types must be written as an open union.
Link to an existing example language-specific module
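For readers unfamiliar with the "open union" requirement in step 3, here is a minimal, self-contained sketch of the general idea. It is not Semantic's actual API: the `:+:` union type, the `Describe` class, and the `IntegerLit` and `Unless` nodes are all hypothetical names invented for illustration, and Semantic's real union machinery is more general. The sketch only shows how a shared node and a language-specific node can be combined without modifying either definition.

```haskell
{-# LANGUAGE DeriveFunctor, TypeOperators #-}
module Main (main) where

-- Hypothetical, simplified open union of two syntax functors.
-- (An assumption for illustration; not the union type Semantic uses.)
data (f :+: g) a = InL (f a) | InR (g a)
  deriving (Functor)
infixr 6 :+:

-- Stand-in for a reusable node, in the spirit of Data.Syntax.Literal.
newtype IntegerLit a = IntegerLit Integer
  deriving (Functor)

-- Hypothetical language-specific node with a condition and a body.
data Unless a = Unless a a
  deriving (Functor)

-- Fixed point that ties the recursive knot over a syntax functor.
newtype Term f = Term (f (Term f))

-- The syntax of the hypothetical language: shared nodes plus its own.
type Syntax = IntegerLit :+: Unless

-- One class per operation; each node type supplies its own instance.
class Functor f => Describe f where
  describe :: f String -> String

instance Describe IntegerLit where
  describe (IntegerLit n) = show n

instance Describe Unless where
  describe (Unless cond body) = "unless " ++ cond ++ " then " ++ body

-- The union dispatches to whichever node is actually present.
instance (Describe f, Describe g) => Describe (f :+: g) where
  describe (InL x) = describe x
  describe (InR y) = describe y

-- Bottom-up fold: render the children first, then the node itself.
render :: Describe f => Term f -> String
render (Term t) = describe (fmap render t)

example :: Term Syntax
example =
  Term (InR (Unless (Term (InL (IntegerLit 0)))
                    (Term (InL (IntegerLit 1)))))

main :: IO ()
main = putStrLn (render example)
```

Adding another node type in this style means writing a new data type and its instances, then extending the `Syntax` synonym; the existing node definitions stay untouched, which is the property the expression-problem link in the introduction refers to.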
## The procedure
Maybe at the end of each bullet point describe which semantic subcommands will work at that point?
aymannadeem
left a comment
LGTM! Can't wait to open a PR once my changes are more end-to-end. 🙌
dcreager
left a comment