Skip to content

Latest commit

 

History

History
201 lines (170 loc) · 6.42 KB

Readme.md

File metadata and controls

201 lines (170 loc) · 6.42 KB

Generting LR(1) Parsers

Gogll uses the same input grammar for GLL and LR(1). GLL is the default output - to generate an LR(1) parser the option -pager or -knuth must be used. The Knuth option is provided only for reference. In general the Pager machine should be used as it produces the smallest LR(1) tables.

Note: LR(1) grammars should be left recursive.

Usage

The following gogll usage options are relevant to generating LR(1) parsers. Get the full help by running gogll -h.

use: gogll [-a][-v] [-o <out dir>] [-go] [-rust] [-pager] [-knuth] [-resolve_conflicts] <source file>
    to generate a lexer and parser.

    <source file>: Mandatory. Name of the source file to be processed. 
        If the file extension is ".md" the bnf is extracted from markdown code 
        segments enclosed in triple backticks.
    
    -a: Optional. Regenerate all files.
        WARNING: This may destroy user editing in the LR(1) AST.
        Default: false
         
    -v: Optional. Produce verbose output, including first and follow sets,
        LR(1) sets and lexer FSA sets.
    
    -o <out dir>: Optional. The directory to which code will be generated.
                  Default: the same directory as <source file>.
                  
    -go: Optional. Generate Go code.
          Default: true, but false if -rust is selected

    -rust: Optional. Generate Rust code.
           Default: false
           
    -knuth: Optional. Generate a Knuth LR(1) parser
            Default false

    -pager: Optional. Generate a Pager PGM LR(1) parser.
            Default false

    -resolve_conflicts: Optional. Automatically resolve LR(1) conflicts.
            Default: false. Only used when generating LR(1) parsers.

Generated Modules

Gogll generates the following modules when LR(1) is used:

  1. ast: A module containing the action routines called by the LR(1) parser when reducing a production. Gogll generates the skeleton of the action routines, which must be implemented by the user. See Implementing the AST below.
  2. lexer: A linear time FSA lexer. This should not be edited by the user.
  3. parser: The files containing a table-driven LR(1) parser. These should not be edited by the user.
  4. token: The token definitions used by the lexer and parser. This should not be edited by the user.

Implementing the AST

This section uses examples/lr1/g1/g1.md as example. It has the following grammar:

package "g1"

E1 : E1 "+" T1 | T1 ;

T1 : "a" ;

The following extract from the makefile shows how to generate Go and Rust lexers and parsers from g1.md:

go: $(GODIR)/go.mod
	gogll -pager -o $(GODIR) g1.md

The command above generates a lexer and LR(1) parser (Pager) with matching lexer in Go.

rust: $(RUSTDIR)/Cargo.toml
	gogll -rust -pager -o $(RUSTDIR) g1.md

The command above generates a lexer and LR(1) parser (Pager) with matching lexer in rust.

Every action routine returns a tuple: (interface{}, error). A non-zero error is returned if the parameters of the action routine contain a semantic error. Otherwise the routine constructs an AST node, which it returns with a null error.

The parameters of an action routing match the symbols of its production. A non-terminal symbol will have been returned by another action routine.

A terminal symbol with be a token from the lexer.

If an action routine returns a non-null error the parse will be terminated with the error.

Skeleton of Go AST generated by gogll

Goggl generates the following AST skeleton to be completed by the user. It contains an action routing called by the parser for every production alternate of the grammar.

// Generated by GoGLL.
package ast

import(
    "fmt"
)

// G0 : E1 ;
func G00(p0 interface{})(interface{}, error){
    fmt.Println("ast.G00 is unimplemented")
    return nil, nil
}

func G00 is called when the parser accepts the start production of the grammar, E1. p0 will have the type returned by the action routine: func E10. G0 is an extra start production generated by gogll only for LR(1) parsers.

The value returned by func G00 will be returned by the parser when the parse is complete.

// E1 : E1 + T1 ;
func E10(p0, p1, p2 interface{})(interface{}, error){
    fmt.Println("ast.E10 is unimplemented")
    return nil, nil
}

func E10 is the action routine called by the parser when the production E1 : E1 + T1 has been parsed. It has three parameters:
p0 will have the type returned by func E10 or func E11.
p1 will have type *token.Token.
p2 will have the type returned by func T1.

// E1 : T1 ;
func E11(p0 interface{})(interface{}, error){
    fmt.Println("ast.E11 is unimplemented")
    return nil, nil
}

func E11 is called by the parser when production E1 : T1 has been parsed.
p0 will have the type returned by func T10.

// T1 : a ;
func T10(p0 interface{})(interface{}, error){
    fmt.Println("ast.T10 is unimplemented")
    return nil, nil
}

func T10 is called when the production T1 : a.
p0 will have the type *token.Token

Go AST completed by user

The following is the completed AST:

// Generated by GoGLL. Edited by user
package ast

import (
	"g1/token"
)

// G0 : E1 ;
func G00(p0 interface{}) ([]string, error) {
	return p0.([]string), nil
}

func G00 adds nothing the value returned by the alternates of E1, which is []string.

// E1 : E1 + T1 ;
func E10(p0, p1, p2 interface{}) ([]string, error) {
	e := append(
		p0.([]string),
		p1.(*token.Token).LiteralString(),
		p2.(string))
	return e, nil
}

func E10 appends the literal string of p1 (*token.Token) and the string value of p2 (returned by func T10) to the string value of p0 (returned by func E10 or func E11).

// E1 : T1 ;
func E11(p0 interface{}) ([]string, error) {
	return []string{p0.(string)}, nil
}

func E11 constructs an []string from the value of p0

// T1 : a ;
func T10(p0 interface{}) (string, error) {
	return p0.(*token.Token).LiteralString(), nil
}

func T10 returns the literal string value of p0 (*token.Token).

Rust AST

The generated Rust AST has the same structure as the Go. For g1.md, please see: