Join GitHub today
Lossless Syntax Tree Pattern
- RedBaron: Full Syntax Tree
- Go: AST
- Roslyn: Syntax Tree -- makes the explicit claim of being round-trippable
- Red-Green Trees? This term is more about how to handle insertions in an IDE, which change the position of every node after the insertion. Red tree is a wrapper around the green tree.
Roslyn -- Microsoft's C# compiler platform
JetBrains Implementing Parser and PSI -- The AST nodes have a direct mapping to text ranges in the underlying document. The bottom-most nodes of the AST match individual tokens returned by the lexer, and higher level nodes match multiple-token fragments. Operations performed on nodes of the AST tree, such as inserting, removing, reordering nodes and so on, are immediately reflected as changes to the text of the underlying document.
- There is currently no ready way to reuse existing language grammars, for example, from ANTLR, for creating custom language parsers. The parsers need to be coded manually. -- this should go in my meta-language rant
- Custom language parser and PSI classes can be generated from grammars using Grammar-Kit plugin. Besides code generation it provides various features for editing grammar files: syntax highlighting, quick navigation, refactorings and more. The Grammar-Kit plugin is built using its own engine and its source code can be found on GitHub.
Making Tools Kythe-Compatible -- I would have thought that Kythe had the LST concept, because it's basically an IDE, but I don't see it. I guess that is because it's read-only?
JRuby Parser -- JRuby once had a parser which kept track of all sorts of extra information when it built it's Abstract Syntax Tree (AST). Stuff like character offsets where a particular element started or ended. The impact of this extra information was a more than noticeable amount of memory and a bit of a perf impact. At the time we decided to discontinue having this sort of parser in JRuby we created JRubyParser.* -- related to Parsing Is Difficult
Currently, lib/AST structures don't make a very clear distinction between syntactic and semantic information. Long term, we hope to achieve the following based on work here: In no particular order, here is a summary of the design and implementation points for this library:
- Represent Swift source with "full fidelity" - parsing a source file and printing the syntax tree should result in the same file.
- Provide good structured editing APIs at all granularities.
- Accommodate "bad syntax" - humans are imperfect and source code is constantly in a state of flux in an editor.
- For configuration files:
- This is the second comment-preserving configuration parser I've written, after an /etc/hosts parser. Eventually, I will write one for every Linux file format.
Style-Preserving Source Translators
2to3 -- for Python 2 to Python 3
RedBaron -- RedBaron, by relying on Baron, uses a Full Syntax Tree (FST). It’s like an AST except it keeps every information, included formatting, and is then a lossless representation of the source code.
Facebook's jscodeshift -- uses the recast library
Facebook's pfff -- Tools for code analysis, visualizations, or style-preserving source transformation
- Facebook's Spatch for PHP, based on the pfff toolkit
External Clang Examples -- has clang-mutate, but it's dormant
TODO: What data structures do these use?
- clang-format -- Clang AST, which is huge
- http://clang.llvm.org/docs/LibFormat.html -- design is undocumented
- gofmt -- uses the Go AST package, which is reputed to be messy
- dartfmt -- The Hardest Program I've Ever Written
- yapf for Python
- rustfmt Design -- Blog Post
- dfmt First uses a parsers to generate lists of locations with special meanings. Then goes through a lossless token stream to format.
FemtoCleaner - A bot to automatically upgrade your Julia syntax (August 2017)
- Unfortunately, while julia’s parser is accessible from within the language and can be used to find these instances of deprecated syntax, it cannot be used for our purposes. This is because it does not support our second goal - style preservation. -- Parsing Is Difficult
- There are several names of this concept, “round-tripable representation”, “Concrete Syntax Tree (CST)” or “Lossless Syntax Tree” being perhaps the most common. Luckily, in the Julia ecosystem we have not one, but two choices for such a parser
Other Docs / Designs / Discussions
Haskell GHC AST Annotations -- Simplify the roundtripping and modification of source by explicitly capturing the missing location information for the syntactic markers
What to do about comments? on Lambda the Ultimate
https://news.ycombinator.com/item?id=13914218 -- From C# compiler dev -- Resilient parsing. This is the big one! If you give our parser a string that is illegal according to the grammar, our parser will still give you a syntax tree! (We'll also spit errors out). But getting a syntax tree regardless of the actual validity of the program being passed in means that the IDE can give autocomplete and report type-checking error messages. As an example, the code "var x = velocity." is invalid C#. However, in order to give autocomplete on "velocity", that code needs to be parsed into an AST, and then typechecked, and then we can extract the members on the type in order to provide a good user experience.
These don't need to use the Lossless Syntax Tree because the resulting code won't be edited by a human.
- CoffeeScript, etc.