Skip to content
PostgreSQL Languages AST and statements prettifier
Python TSQL Makefile PLpgSQL
Branch: master
Clone or download
Latest commit 1f8f706 Dec 20, 2019
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
docs Fix prettification of CASE statements, adding a space after THEN and … Dec 7, 2019
libpg_query @ d710cb0
pglast Release 1.9 Dec 20, 2019
tests Emit INSERT's target columns using print_list() Dec 18, 2019
tools
.gitignore Ignore pytest's cache directory Jun 16, 2018
.gitmodules Add libpg_query as a submodule Aug 3, 2017
.travis.yml Use cibuildwheel 1.0.0, producing Python 3.8 wheels Dec 1, 2019
CHANGES.rst Release 1.9 Dec 20, 2019
MANIFEST.in Explicitly include new *.sql test cases Mar 17, 2019
Makefile Extract enums and defines from pg_trigger.h and lockdefs.h Mar 15, 2019
Makefile.release Use a single git command to push commits and tags Sep 4, 2019
Makefile.virtualenv Rename the package from "pg_query" to "pglast" Jun 16, 2018
README.rst Fix prettification of CASE statements, adding a space after THEN and … Dec 7, 2019
requirements-test.txt Upgrade pytest and coverage Sep 4, 2019
requirements.txt Use Cython 0.29.13 Aug 9, 2019
setup.cfg Rename the package from "pg_query" to "pglast" Jun 16, 2018
setup.py Use cibuildwheel 1.0.0, producing Python 3.8 wheels Dec 1, 2019
version.txt Release 1.9 Dec 20, 2019

README.rst

pglast

PostgreSQL Languages AST and statements prettifier

Author: Lele Gaifax
Contact: lele@metapensiero.it
License:GNU General Public License version 3 or later
Status: Build status Documentation status

This is a Python 3 module that exposes the parse tree of a PostgreSQL statement (extracted by the almost standard PG parser repackaged as a standalone static library by libpg_query) as set of interconnected nodes, usually called an abstract syntax tree.

I needed a better SQL reformatter than the one implemented by sqlparse, and was annoyed by a few glitches (subselects in particular) that ruins the otherwise excellent job it does, considering that it is a generic library that tries to swallow many different SQL dialects.

When I found psqlparse I decided to try implementing a PostgreSQL focused tool: at the beginning it's been easier than I feared, but I quickly hit some shortcomings in that implementation, so I opted for writing my own solution restarting from scratch, with the following goals:

  • target only Python 3.4+
  • target PostgreSQL 10
  • use a more dynamic approach to represent the parse tree, with a twofold advantage:
    1. it is much less boring to code, because there's no need to write one Python class for each PostgreSQL node tag
    2. the representation is version agnostic, it can be adapted to newer/older Elephants in a snap
  • allow exploration of parse tree in both directions, because I realized that some kinds of nodes require that knowledge to determine their textual representation
  • avoid introducing arbitrary renames of tags and attributes, so what you read in PostgreSQL documentation/sources[*] is available without the hassle of guessing how a symbol has been mapped
  • use a zero copy approach, keeping the original parse tree returned from the underlying libpg_query functions and have each node just borrow a reference to its own subtree
[*]

Currently what you can find in the following headers:

Introduction

At the lower level the module exposes two libpg_query functions, parse_sql() and parse_plpgsql(), that take respectively an SQL statement and a PLpgSQL statement and return a parse tree as a hierarchy of Python dictionaries, lists and scalar values. In some cases these scalars correspond to some C typedef enums, that are automatically extracted from the PostgreSQL headers mentioned above and are available as pglast.enums.

At a higher level that tree is represented by three Python classes, a Node that represents a single node, a List that wraps a sequence of nodes and a Scalar for plain values such a strings, integers, booleans or none.

Every node is identified by a tag, a string label that characterizes its content that is exposed as a set of attributes as well as with a dictionary-like interface (technically they implements both a __getattr__ method and a __getitem__ method). When asked for an attribute, the node returns an instance of the base classes, i.e. another Node, or a List or a Scalar, depending on the data type of that item. When the node does not contain the requested attribute it returns a singleton Missing marker instance.

A List wraps a plain Python list and may contains a sequence of Node instances, or in some cases other sub-lists, that can be accessed with the usual syntax, or iterated.

Finally, a Scalar carries a single value of some type, accessible through its value attribute.

On top of that, the module implements two serializations, one that transforms a Node into a raw textual representation and another that returns a prettified representation. The latter is exposed by the pgpp CLI tool, see below for an example.

Installation

As usual, the easiest way is with pip:

$ pip install pglast

Alternatively you can clone the repository:

$ git clone https://github.com/lelit/pglast.git --recursive

and install from there:

$ pip install ./pglast

Development

There is a set of makefiles implementing the most common operations, a make help will show a brief table of contents. A comprehensive test suite, based on pytest, covers 98% of the source lines.

Examples of usage

  • Parse an SQL statement and get its AST root node:

    >>> from pglast import Node, parse_sql
    >>> root = Node(parse_sql('SELECT foo FROM bar'))
    >>> print(root)
    None=[1*{RawStmt}]
    
  • Recursively traverse the parse tree:

    >>> for node in root.traverse():
    ...   print(node)
    ...
    None[0]={RawStmt}
    stmt={SelectStmt}
    fromClause[0]={RangeVar}
    inh=<True>
    location=<16>
    relname=<'bar'>
    relpersistence=<'p'>
    op=<0>
    targetList[0]={ResTarget}
    location=<7>
    val={ColumnRef}
    fields[0]={String}
    str=<'foo'>
    location=<7>
    

    As you can see, the representation of each value is mnemonic: {some_tag} means a Node with tag some_tag, [X*{some_tag}] is a List containing X nodes of that particular kind[†] and <value> is a Scalar.

  • Get a particular node:

    >>> from_clause = root[0].stmt.fromClause
    >>> print(from_clause)
    fromClause=[1*{RangeVar}]
    
  • Obtain some information about a node:

    >>> range_var = from_clause[0]
    >>> print(range_var.node_tag)
    RangeVar
    >>> print(range_var.attribute_names)
    dict_keys(['relname', 'inh', 'relpersistence', 'location'])
    >>> print(range_var.parent_node)
    stmt={SelectStmt}
    
  • Iterate over nodes:

    >>> for a in from_clause:
    ...     print(a)
    ...     for b in a:
    ...         print(b)
    ...
    fromClause[0]={RangeVar}
    inh=<True>
    location=<16>
    relname=<'bar'>
    relpersistence=<'p'>
    
  • Reformat a SQL statement[‡] from the command line:

    $ echo "select a,b,c from sometable" | pgpp
    SELECT a
         , b
         , c
    FROM sometable
    
    $ echo "select a,b,c from sometable" | pgpp -c
    SELECT a,
           b,
           c
    FROM sometable
    
    $ echo "select a, case when a=1 then 'singular' else 'plural' end from test" > /tmp/q.sql
    $ pgpp /tmp/q.sql
    SELECT a
         , CASE
             WHEN (a = 1)
               THEN 'singular'
             ELSE 'plural'
           END
    FROM test
    
    $ echo 'update "table" set value=123 where value is null' | pgpp
    UPDATE "table"
    SET value = 123
    WHERE value IS NULL
    
    $ echo "
    insert into t (id, description)
    values (1, 'this is short enough'),
           (2, 'this is too long, and will be splitted')" | pgpp -s 20
    INSERT INTO t (id, description)
    VALUES (1, 'this is short enough')
         , (2, 'this is too long, an'
               'd will be splitted')
    
  • Programmatically reformat a SQL statement:

    >>> from pglast import prettify
    >>> print(prettify('delete from sometable where value is null'))
    DELETE FROM sometable
    WHERE value IS NULL
    

Documentation

Latest documentation is hosted by Read the Docs at http://pglast.readthedocs.io/en/latest/

[†]This is an approximation, because in principle a list could contain different kinds of nodes, or even sub-lists in some cases: the List representation arbitrarily shows the tag of the first object.
[‡]Currently this covers most DML statements such as SELECTs, INSERTs, DELETEs and UPDATEs, fulfilling my needs, but I'd like to extend it to handle also DDL statements and, why not, PLpgSQL instructions too.
You can’t perform that action at this time.