Skip to content
Generates scannerless generalised LL (GLL) parsers
Go Makefile
Branch: master
Clone or download

Readme.md

Copyright 2019 Marius Ackerman. See Apache license.

News

2019-09-13

  1. Documentation added on how to walk the parse forest. See

Gogll

Gogll generates scannerless, clustered nonterminal parsers (CNP) following Scott et al 2019. CNP is a version of generalised LL parsing (GLL)Scott & Johnstone 2016. GLL parsers can parse all context free (CF) languages.

Walking the parse forest

Gogll produces a binary subtree representation (BSR) set of the parse forest of a successful parse. See Walking the BSR Parse Forest

Benefits and disadvantages

The following table compares GLL parsers with LL-k/LR-k parsers and PEGs

GLL LL-k/LR-k PEG
General CF grammars Yes No No
Composable CF grammars Yes No No
Handle ambiguity Yes No No
Indirect left recursion No problem Bad Bad
Speed (time to compile gogll.md) 0.244 s 0.040 s -
  • General CF grammars allow the parser developer to write grammars that match the language most naturally.
  • Composability allows pre-existing grammar modules to be imported.
  • GLL produces a forest of all valid parses of a string. This provides a more systematic basis for disambiguation than k>1 lookahead and solves the problem of PEGs that hide ambiguity by selecting the first valid parse.
  • Operator precedence can be implemented very easily by disambiguating the parse forest [Afroozeh et al 2013, Basten & Vinju 2012].

But

  • Most non-trivial context free grammars will generate ambiguous parsers, requiring explicit disambiguation.
  • A scannerless GLL parser is slower than the equivalent LR1 parser. See the performance figures for compiling gogll.md in the table above. I expect to improve the performance of gogll but GLL parsers are likely to remain significantly slower than equivalent LL-1/LR-1 parsers.
  • GLL parsers are worst-case cubic in time and space complexity. The LL-1 parts of the grammar have linear complexity.

Input Symbols, Markdown Files

Gogll accepts UTF-8 input strings. A gogll parser has two parse functions:

  • Parse(I []byte) []*ParseError
  • func ParseFile(fname string) []*ParseError
    If fname ends with .md the parser ignores all text outside the markdown code blocks delimited by triple backticks. See gogll.md for an example.

Gogll Grammar

Gogll v1 has a BNF grammar. See gogll.md

Status

  • gogll v0 was a bootstrap compiler implemented by a gocc lexer and parser.
  • gogll v0 was used to compile gogll v1.
  • gogll v0 is currently used to compile a proprietary a query language.
  • gogll v1 compiles itself
  • The query language mentioned above is being migrated to gogll v1.
  • gogll v1 is currently being used to implement a proprietary GUI definition language.

gogll v1 is actively being developed.

Features considered for future implementation

  1. EBNF grammar support Scott&Johnstone 2018.
  2. Error reporting for normal people.
  3. Better documentation, including how to traverse the binary subtree reprentation (BSR) of the parse forest.

Documentation

At the moment this document and the gogll grammar are the only documentation. Have a look at gogll/examples/ambiguous for a simple example and also for simple disambiguation.

Alternatively look at gogll.md which is the input grammar and also the grammar from which the parser for this version of gogll was generated. gogll/da disambiguates the parse forest for an input string.

Changelog

see

Bibliography

  • Elizabeth Scott, Adrian Johnstone and L. Thomas van Binsbergen.
    Derivation representation using binary subtree sets.
    In: Science of Computer Programming (175) 2019

You can’t perform that action at this time.