This repository was archived by the owner on Apr 1, 2025. It is now read-only.

@patrickt

Took a first stab at this. I described the addition process in terms
of how it is now, rather than how it will be when we compile
tree-sitter ASTs directly into Core. I anticipate that this will
impress upon the public the amount of work required to add a new
language, and hopefully encourage people to wait until the new
generation of AST parsing lands before they invest a ton of time into
what will be a deprecated code path.

@patrickt patrickt requested a review from a team June 12, 2019 20:07
@patrickt patrickt mentioned this pull request Jun 12, 2019

@robrix left a comment

Thanks so much for this! I’ve left a few notes, but I think this is a great start.


## The procedure

1. **Find or write a [tree-sitter](https://tree-sitter.github.io) parser for your language.** The tree-sitter [organization page](https://github.com/tree-sitter) has a number of parsers beyond those we currently support in Semantic; look there first to make sure you're not duplicating work. The tree-sitter [documentation on creating parsers](http://tree-sitter.github.io/tree-sitter/creating-parsers) provides an exhaustive look at the process of developing and debugging tree-sitter parsers. Though we do not support grammars written with other toolkits such as [ANTLR](https://www.antlr.org), translating an ANTLR or other BNF-style grammar into a tree-sitter grammar is usually straightforward. The parser needs to be as complete as possible: `semantic` cannot yet handle syntax trees containing unrecognized/erroneous nodes ([#13](https://github.com/github/semantic/issues/13)).
Contributor

This last bit is incorrect: we can and do represent errors in syntax just fine. These can represent errors in the parser or assignment, as well as legit syntax errors in the parsed source.

We don’t yet have a way to represent them with the new-style generated ASTs, however; cf. tree-sitter/haskell-tree-sitter#114.

Contributor

Also, cc @maxbrunsfeld for anything we should change or explicitly call out in discussions of tree-sitter.

Contributor Author

I removed this whole line, but am open to any more suggestions Max might have.

@robrix
Contributor

robrix commented Jun 13, 2019

cc @aymannadeem

This was referenced Jun 14, 2019
@@ -0,0 +1,20 @@
# Adding new languages to Semantic

This document outlines the process of adding a new language to Semantic. Though the Semantic authors have architected the library such that adding new languages and syntax [requires no changes to existing code](https://en.wikipedia.org/wiki/Expression_problem), adding support for a new language is a nontrivial amount of work. Those willing to take the plunge will probably need some Haskell experience.
Contributor

I'd add another (short) paragraph here calling out that this is temporary. I like the detail you have below about TH/Core as a replacement (and that can stay down there), but it's worth mentioning that here at the beginning too. Just a sentence or two, plus maybe a link to the more detailed discussion down in the FAQ.

2. **Create a Haskell library providing an interface to the parser's generated C source.** The [`haskell-tree-sitter`](https://github.com/tree-sitter/haskell-tree-sitter/tree/master/languages) repository provides a Cabal package for each supported language. You can find an example of a pull request to add such a package here. Each package needs to provide two API surfaces:
   * a bridged (via the FFI) reference to the top-level parser in the generated file ([example](https://github.com/tree-sitter/haskell-tree-sitter/blob/master/languages/json/internal/TreeSitter/JSON/Internal.hs))
   * symbol datatypes for each syntax node in the parser, generated with the `mkSymbolDatatype` Template Haskell splice ([example](https://github.com/tree-sitter/haskell-tree-sitter/blob/master/languages/json/TreeSitter/JSON.hs))
3. **Identify the new syntax nodes required to represent your language.** While we provide an extensive library of reusable AST nodes for [literals](https://github.com/github/semantic/blob/master/src/Data/Syntax/Literal.hs), [expressions](https://github.com/github/semantic/blob/master/src/Data/Syntax/Expression.hs), [statements](https://github.com/github/semantic/blob/master/src/Data/Syntax/Statement.hs), and [types](https://github.com/github/semantic/blob/master/src/Data/Syntax/Type.hs), most languages will require some syntax nodes not found in other languages. You'll need to create a new module providing those data types, which are then combined with the shared nodes into an open union representing your language's syntax.
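To make "an open union" of syntax nodes concrete, here is a minimal, self-contained toy sketch. Every name in it (`Sum`, `Element`, `inject`, `project`, and the node types `IntLit`, `Plus`, and `Pipe`) is hypothetical and chosen only for this illustration; Semantic's real open union, and the type-class instances its syntax nodes need for diffing and evaluation, are considerably more involved. The idea is that each node is a small functor, and a language's syntax is a type-level list of those functors.

```haskell
{-# LANGUAGE DataKinds, DeriveFunctor, FlexibleContexts, FlexibleInstances,
             GADTs, KindSignatures, MultiParamTypeClasses, TypeOperators #-}
module Main (main) where

import Data.Kind (Type)

-- A toy open union: a `Sum fs a` holds exactly one syntax node whose
-- functor is a member of the type-level list `fs`.
data Sum (fs :: [Type -> Type]) (a :: Type) where
  Here  :: f a -> Sum (f ': fs) a
  There :: Sum fs a -> Sum (f ': fs) a

-- `Element f fs` witnesses that node type `f` is a member of `fs`, so it
-- can be injected into (and projected back out of) the union.
class Element f fs where
  inject  :: f a -> Sum fs a
  project :: Sum fs a -> Maybe (f a)

-- Found it: `f` is at the head of the list.
instance {-# OVERLAPPING #-} Element f (f ': fs) where
  inject = Here
  project (Here x)  = Just x
  project (There _) = Nothing

-- Not at the head: keep searching the tail of the list.
instance Element f fs => Element f (g ': fs) where
  inject = There . inject
  project (Here _)  = Nothing
  project (There s) = project s

-- Hypothetical stand-ins for shared nodes like those in Data.Syntax.*.
newtype IntLit a = IntLit Integer deriving (Functor)
data Plus a = Plus a a deriving (Functor)

-- A hypothetical language-specific node the shared modules lack,
-- e.g. a pipeline operator `l |> r`.
data Pipe a = Pipe a a deriving (Functor)

-- The language's syntax: shared and language-specific nodes in one union.
type Syntax = '[IntLit, Plus, Pipe]

-- Tie the knot: every child of a node is itself a Term.
newtype Term = Term (Sum Syntax Term)

int :: Integer -> Term
int = Term . inject . IntLit

plus, pipe :: Term -> Term -> Term
plus l r = Term (inject (Plus l r))
pipe l r = Term (inject (Pipe l r))

-- Dispatch on whichever node the union currently holds.
render :: Term -> String
render (Term s)
  | Just (IntLit n) <- project s = show n
  | Just (Plus l r) <- project s = "(" ++ render l ++ " + " ++ render r ++ ")"
  | Just (Pipe l r) <- project s = render l ++ " |> " ++ render r
  | otherwise                    = "<unknown node>"

main :: IO ()
main = putStrLn (render (pipe (plus (int 1) (int 2)) (int 3)))
```

Running `main` prints `(1 + 2) |> 3`. Adding `Pipe` required no edits to the `IntLit` or `Plus` definitions; the language only lists it in `Syntax`. This toy `render` is a single closed function for brevity, whereas Semantic generally expresses such operations as type classes with one instance per node type, which is what lets existing code stay untouched.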
Contributor

Link to an existing example language-specific module

## The procedure

Contributor

Maybe at the end of each bullet point describe which semantic subcommands will work at that point?

@aymannadeem left a comment

LGTM! Can't wait to open a PR once my changes are more end-to-end. 🙌

@dcreager left a comment

:shipit:

@patrickt patrickt merged commit 1227403 into master Jun 14, 2019
@patrickt patrickt deleted the document-adding-new-langs branch June 14, 2019 20:51