tree-sitter-postgres

A tree-sitter grammar for PostgreSQL, generated directly from PostgreSQL's Bison grammar (gram.y) and keyword list (kwlist.h).

Features

  • Current as of PostgreSQL 18 (generated from REL_18_3)
  • 727 grammar rules covering the full PostgreSQL SQL syntax
  • 494 case-insensitive keywords across all four PG keyword categories
  • Correct operator precedence1 + 2 * 3 parses as 1 + (2 * 3)
  • PL/pgSQL support via a separate grammar with language injection
  • Generated, not hand-written — regenerate for any PostgreSQL version

Quick start

npm install
cd postgres && npx tree-sitter generate && npx tree-sitter test

Regenerating from PostgreSQL source

The grammar is generated from a local PostgreSQL checkout:

# Default: ~/Source/gmr/postgres
node script/generate-grammar.js

# Or specify the path
node script/generate-grammar.js /path/to/postgres

# Then build the parser
cd postgres && npx tree-sitter generate

Input files

File                          Source
src/backend/parser/gram.y     Bison grammar (733 rules, 3236 alternatives)
src/include/parser/kwlist.h   Keyword definitions (494 keywords)
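
To make the keyword input concrete, here is a minimal sketch of reading kwlist.h entries into categories. It is not the code in script/parse-kwlist.js. The function names parseKwlist and caseInsensitive are hypothetical, the sample is three hand-copied entries, and the per-letter character class is only one common way to get case-insensitive keywords in tree-sitter, assumed here for illustration.

```javascript
// Hypothetical sketch, not the actual script/parse-kwlist.js.
// Assumes kwlist.h entries look like:
//   PG_KEYWORD("abort", ABORT_P, UNRESERVED_KEYWORD, BARE_LABEL)
const KEYWORD_RE =
  /PG_KEYWORD\(\s*"([^"]+)"\s*,\s*(\w+)\s*,\s*(\w+)\s*,\s*(\w+)\s*\)/g;

// Group keywords by their category (third macro argument).
function parseKwlist(text) {
  const byCategory = {};
  for (const [, name, token, category] of text.matchAll(KEYWORD_RE)) {
    (byCategory[category] ??= []).push({ name, token });
  }
  return byCategory;
}

// Assumption: build a case-insensitive pattern letter by letter,
// e.g. "as" becomes [aA][sS]. Non-letters such as "_" are kept as-is.
function caseInsensitive(word) {
  const source = [...word]
    .map((ch) =>
      /[a-z]/i.test(ch) ? `[${ch.toLowerCase()}${ch.toUpperCase()}]` : ch
    )
    .join("");
  return new RegExp(source);
}

const sample = `
PG_KEYWORD("abort", ABORT_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("between", BETWEEN, COL_NAME_KEYWORD, BARE_LABEL)
PG_KEYWORD("select", SELECT, RESERVED_KEYWORD, BARE_LABEL)
`;

const categories = parseKwlist(sample);
console.log(Object.keys(categories));
console.log(caseInsensitive("select").source);
```

Grouping by category matters because PostgreSQL allows each keyword category as an identifier in different positions, so a generator needs the category as well as the keyword itself.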

Generator scripts

Script                          Purpose
script/generate-grammar.js      Orchestrator: reads PG source, writes postgres/grammar.js
script/parse-gram-y.js          Parses Bison grammar: rules, terminals, precedence, %prec annotations
script/parse-kwlist.js          Parses keyword list into categories
script/codegen.js               Generates tree-sitter grammar with precedence and optional-rule handling
postgres/harvest-conflicts.sh   Iteratively discovers GLR conflicts needed by tree-sitter
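
As an illustration of the precedence step, the sketch below collects Bison %left, %right, and %nonassoc declarations into a lookup table. It is not the code in script/parse-gram-y.js. The function name parsePrecedence and the shape of the returned table are hypothetical, and the input is a simplified excerpt in the style of gram.y. It relies on the Bison convention that later declarations bind more tightly.

```javascript
// Hypothetical sketch, not the actual script/parse-gram-y.js.
// Collects Bison precedence declarations; later declarations bind tighter.
function parsePrecedence(gramText) {
  const table = new Map();
  let level = 0;
  for (const rawLine of gramText.split("\n")) {
    // Strip inline C comments before matching the declaration.
    const line = rawLine.replace(/\/\*.*?\*\//g, "").trim();
    const m = line.match(/^%(left|right|nonassoc)\s+(.+)$/);
    if (!m) continue;
    level += 1; // every declaration line opens a new, higher level
    for (const terminal of m[2].split(/\s+/)) {
      // Quoted character tokens like '+' are stored without the quotes.
      table.set(terminal.replace(/^'(.*)'$/, "$1"), { assoc: m[1], level });
    }
  }
  return table;
}

// Simplified excerpt in the style of gram.y (assumption: not a full copy).
const declarations = `
%left      OR
%left      AND
%right     NOT
%left      '+' '-'
%left      '*' '/' '%'
%left      '^'
`;

const precedence = parsePrecedence(declarations);
console.log(precedence.get("+"), precedence.get("*"));
```

With this table, '*' ends up at a higher level than '+', which is the ordering a code generator needs in order to emit static precedence for binary operators.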

Repository structure

postgres/               PostgreSQL SQL grammar
  grammar.js            Generated tree-sitter grammar
  src/                  Generated parser (C)
  test/corpus/          Test cases (35 tests)
  known-conflicts.json  GLR conflict pairs

plpgsql/                PL/pgSQL grammar
  grammar.js            Hand-written tree-sitter grammar
  src/scanner.c         External scanner for dollar-quoting and keywords
  test/corpus/          Test cases
  queries/              Highlights and injection queries

script/                 Shared generator code
  generate-grammar.js   SQL grammar orchestrator
  parse-gram-y.js       Bison parser
  parse-kwlist.js       Keyword parser
  codegen.js            Grammar code generator

bindings/               Language bindings (Node, Rust, Python, Go, Swift, C)

Design notes

Empty rule handling

Bison's /* EMPTY */ alternatives cannot be directly translated — tree-sitter forbids non-start rules that match the empty string. The generator propagates optionality upward via a fixpoint loop and wraps references with optional() at call sites.
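
The fixpoint idea can be sketched as follows. This is an illustration under stated assumptions, not the generator's actual code. The names findNullable and renderRule are hypothetical, the grammar is a heavily simplified fragment modelled on gram.y, and the sketch does not handle alternatives made up of several nullable symbols in a row, which a real generator has to expand.

```javascript
// Hypothetical sketch; rule shapes are simplified for illustration.
// Each rule maps to a list of alternatives; [] stands for /* EMPTY */.
const grammar = {
  simple_select: [["SELECT", "opt_distinct_clause", "target_list"]],
  opt_distinct_clause: [["distinct_clause"], ["opt_all_clause"]],
  distinct_clause: [["DISTINCT"]],
  opt_all_clause: [["ALL"], []],
  target_list: [["*"]],
};

// Fixpoint loop: a rule is nullable if some alternative consists only of
// nullable symbols. Repeat until a full pass adds nothing new.
function findNullable(rules) {
  const nullable = new Set();
  let changed = true;
  while (changed) {
    changed = false;
    for (const [name, alternatives] of Object.entries(rules)) {
      if (nullable.has(name)) continue;
      if (alternatives.some((alt) => alt.every((sym) => nullable.has(sym)))) {
        nullable.add(name);
        changed = true;
      }
    }
  }
  return nullable;
}

// Render a rule body as tree-sitter DSL text. Empty alternatives are dropped
// and nullable references are wrapped in optional() at the call site.
// A lone nullable reference is left bare so the body cannot match empty.
function renderRule(name, rules, nullable) {
  const alts = rules[name]
    .filter((alt) => alt.length > 0)
    .map((alt) => {
      const parts = alt.map((sym) => {
        if (!(sym in rules)) return JSON.stringify(sym); // terminal
        const wrap = nullable.has(sym) && alt.length > 1;
        return wrap ? `optional($.${sym})` : `$.${sym}`;
      });
      return parts.length === 1 ? parts[0] : `seq(${parts.join(", ")})`;
    });
  return alts.length === 1 ? alts[0] : `choice(${alts.join(", ")})`;
}

const nullable = findNullable(grammar);
console.log([...nullable].sort());
console.log(renderRule("simple_select", grammar, nullable));
```

In this fragment opt_all_clause is nullable directly, while opt_distinct_clause only becomes nullable on the second pass, which is why a single scan over the rules is not enough.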

Operator precedence

Binary operators are split into a separate a_expr_prec rule resolved by static precedence (no GLR), while complex patterns (IS, IN, BETWEEN, LIKE, subquery operators) stay in a_expr with GLR conflict resolution. Both prec.left/prec.right (generation-time) and prec.dynamic (runtime) are emitted.
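
As a rough picture of what static precedence output could look like, the sketch below turns an operator table into tree-sitter DSL text using prec.left and prec.right. It is not the output of script/codegen.js. The function name emitBinaryRule, the numeric levels, and the choice of a_expr as the operand rule are assumptions made for illustration.

```javascript
// Hypothetical sketch; levels are illustrative, only their order matters.
const OPERATORS = [
  { op: "+", assoc: "left", level: 1 },
  { op: "-", assoc: "left", level: 1 },
  { op: "*", assoc: "left", level: 2 },
  { op: "/", assoc: "left", level: 2 },
  { op: "^", assoc: "left", level: 3 },
];

// Emit one prec.left/prec.right alternative per binary operator.
// operandRule is an assumption: the real grammar may use a different operand.
function emitBinaryRule(ruleName, operandRule, operators) {
  const alts = operators.map(
    ({ op, assoc, level }) =>
      `prec.${assoc}(${level}, ` +
      `seq($.${operandRule}, ${JSON.stringify(op)}, $.${operandRule}))`
  );
  return `${ruleName}: $ => choice(\n  ${alts.join(",\n  ")}\n)`;
}

const ruleText = emitBinaryRule("a_expr_prec", "a_expr", OPERATORS);
console.log(ruleText);
```

Because "*" carries a higher level than "+", a parser generated from such a rule groups 1 + 2 * 3 as 1 + (2 * 3) when the tables are built, with no runtime conflict resolution for these operators.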

PL/pgSQL

PL/pgSQL is implemented as a separate hand-written grammar in plpgsql/ with an external scanner for dollar-quoting and context-sensitive keywords. SQL expressions and statements within PL/pgSQL blocks are delegated to the postgres grammar via tree-sitter language injection (plpgsql/queries/injections.scm).

License

BSD 3-Clause
