Skip to content

haxscramper/htsparse

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

69 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

readme

This package provides auto-generated wrappers for multiple tree-sitter grammars, - cpp, java, lua, js, latex, clojure etc (for the full list see htsparse module list). Each wrapper has two versions - core_only (does not depend on most of the hmisc features and only imports treesitter_core from hmisc) and more feature-full one.

“core_only” wrapper provides a simple interfaces to the tree-sitter - node is wrapped in the distinct type and helper procedures for .kind are provided. In order to read and analyze the nodes you can use procedures defined in the treesitter_core, such as [] (get idx-th named subnode), {} (get any idx-th subnode) and items iterator.

“full” wrapper proviedes an additional convenience layer on top of the bare tree-sitter, by rewriting distinct tree into much more useable structure that supports [] for named fields (node["type"]), stores pointer to the base string (and has proper .strVal() call implemented, without the need to pass original string everywhere), and has more sophisticated .treeRepr() implementation that is very useuful for understanding the parsed AST.

Both “full” and “core only” versions include seveal helper types, consts and procs:

node kinds
enum with all possible node kinds. Tree-sitter has two different node types - named and unnamed, with named nodes further subdivided into regular and hidden. All of these types are listed in the single enum. Fields that end with Tok correspond to the token nodes, ones that have Hid in name are “hidden” (regular parser won’t show these nodes in the resulting AST).
  • Full list of token nodes can be accessed via <lang>HiddenKinds constant
  • List of tokens kinds is stored in <lang>TokenKinds
  • Most of the operations hide token nodes by deafult (.len(), [], items() etc). In order to access those, unnamed = true argument can be used. node[<idx>, true] has a shorthand version - {} operator. In most cases this is not necessary, except for maybe various operators that used literal tokens - to properly iterate subnodes of 1 + 2 you probably need to use for sub in items(node, true)
allowed subnodes
Short list of the allowed node kinds for each subnode is stored in the <lang>AllowedSubnodes const
original grammar
Most of the original grammar production rules are copied into the <lang>Grammar variable. These rules correspond to the original productions in the grammar.js and might be used to generate new source code from the AST.

In addition to the shared features, “full” vesion also includes helpers to

parse text
parse<lang>String - helper proc to convert string into rewritten node. Node the unnamed: bool = false argument - by default only named nodes are rewritten, and resulting tokens are discareded. To get full tree rewrite with tokens intact set this argument to true, or use .getTs() for the rewritten node to access the original tree-sitter node.
  • agda
  • bash
  • c
  • clojure
  • common.nim
  • cpp
  • csharp
  • css
  • dart
  • elisp
  • embeddedTemplate
  • eno
  • fennel
  • go
  • graphql
  • html
  • java
  • js
  • julia
  • kotlin
  • latex
  • lua
  • make
  • nix
  • php
  • python
  • regex
  • ruby
  • rust
  • scala
  • systemrdl
  • systemVerilog
  • toml
  • vhdl
  • zig

Installation and setup

nimble install htsparse

Links

Usage

import htsparse/cpp/cpp

let str = """
int main () {
  std::cout << "Hello world";
}
"""

echo parseCppString(str).treeRepr()
 TranslationUnit 0:0-3:0
[0] FunctionDefinition 0:0-2:1
  [0] PrimitiveType <type(0)> 0:0..3 int
  [1] FunctionDeclarator <declarator(1)> 0:4..11
    [0] Identifier <declarator(0)> 0:4..8 main
    [1] ParameterList <parameters(1)> 0:9..11 ()
  [2] CompoundStatement <body(2)> 0:12-2:1
    [0] ExpressionStatement 1:2..29
      [0] BinaryExpression 1:2..28
        [0] QualifiedIdentifier <left(0)> 1:2..11
          [0] NamespaceIdentifier <scope(0)> 1:2..5 std
          [1] Identifier <name(1)> 1:7..11 cout
        [1] StringLiteral <right(1)> 1:15..28 "Hello world"

Tree-sitter library

You need to have tree-sitter runtime library installed. For arch linux it can be done by installing tree-sitter, otherwise you can install it manually:

wget https://github.com/tree-sitter/tree-sitter/archive/0.20.0.tar.gz
tar -xvf 0.20.0.tar.gz && cd tree-sitter-0.20.0
sudo make install PREFIX=/usr

Packages

No packages published

Languages