Skip to content
A universal LL top-down parser written in Go
Go
Branch: master
Clone or download
romshark Update example README: dicklang
Add quick playground demo
Latest commit fc3c2db Aug 12, 2019
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.github/ISSUE_TEMPLATE Update issue template: feature-request Aug 12, 2019
examples/dicklang Update example README: dicklang Aug 12, 2019
misc
vendor Add vendored dependencies Aug 4, 2019
.gitignore Initial commit Aug 4, 2019
.travis.yml
CONTRIBUTING.md Add a contribution guidelines README Aug 12, 2019
LICENSE Initial commit Aug 4, 2019
README.md Add documentation badge to README Aug 12, 2019
construct.go
cursor.go Remove irrelevant code Aug 4, 2019
error.go Fix bug Aug 11, 2019
fragment.go Fix code formatting Aug 4, 2019
go.mod Implement optionals Aug 6, 2019
go.sum Implement optionals Aug 6, 2019
parser.go Fix bug Aug 11, 2019
parser_test.go Fix test Aug 11, 2019
pattern.go Make lexer support exact read instructions Aug 10, 2019
printFragment.go
rule.go Implement action callbacks Aug 9, 2019
scanner.go Make lexer support exact read instructions Aug 10, 2019
scanner_test.go Make lexer support exact read instructions Aug 10, 2019
token.go

README.md

Travis CI: build status Coverage Status GoReportCard GoDoc

A Universal Dynamic LL(*) Parser
written in and for the Go programming language.

romshark/llparser is a dynamic recursive-descent top-down parser which parses any given input stream by trying to recursively match the root rule. It's universal in that it supports any LL(*) grammar.

This library allows building parsers in Go with relatively good error messages and flexible, even dynamic LL(*) grammars which may mutate at runtime. It parses the input stream into a typed parse-tree and allows action hooks to be executed when a particular rule is matched.

Getting Started

Rules

A grammar always begins with a root rule. A rule is a non-terminal symbol. Non-terminals are nodes of the parse-tree that consist of other non-terminals or terminals while terminals are leaf-nodes. A rule consists of a Designation, a Pattern, a Kind and an Action:

mainRule := &llparser.Rule{
	Designation: "name of the rule",
	Kind: 100,
	Pattern: llparser.TermExact{
		Kind:        101,
		Expectation: "string",
	},
	Action: func(f llparser..Fragment) error {
		log.Print("the rule was successfuly matched!")
		return nil
	},
}
  • Designation defines the optional logical name of the rule and is used for debugging and error reporting purposes.
  • Kind defines the type identifier of the rule. If this field isn't set then zero (untyped) is used by default.
  • Pattern defines the expected pattern of the rule. This field is required.
  • Action defines the optional callback which is executed when this rule is matched. The action callback may return an error which will make the parser stop and fail immediately.

Rules can be nested:

ruleTwo := &llparser.Rule{
	Pattern: llparser.TermExact{
		Kind:        101,
		Expectation: "string",
	},
}

ruleOne := &llparser.Rule{
	Pattern: ruleTwo,
}

Rules can also recurse:

rule := &llparser.Rule{Kind: 1}
rule.Pattern = llparser.Sequence{
	llparser.TermExact{Expectation: "="},
	llparser.Optional{Pattern: rule}, // potential recursion
}

Terminals

Pattern: Term

Term expects a particular fragment kind to be lexed:

Pattern: llparser.Term(SomeKindConstant),

Pattern: TermExact

TermExact expects a particular sequence of characters to be lexed:

Pattern: llparser.TermExact{
	Kind:        SomeKindConstant,
	Expectation: "some string",
},

Pattern: Checked

Checked expects the lexed fragment to pass an arbitrary user-defined validation function:

Pattern: llparser.Checked{
	Designation: "some checked terminal",
	Fn: func(str string) bool { return len(str) > 5 },
},

Combinators

Pattern: Optional

Optional tries to match a specific pattern but doesn't expect it to be matched:

Pattern: llparser.Optional{
	Pattern: somePattern,
},

Pattern: Sequence

Sequence expects a specific sequence of patterns:

Pattern: llparser.Sequence{
	somePattern,
	llparser.Term(SomeKindConstant),
	llparser.Optional{
		Pattern: llparser.TermExact{
			Kind:        SomeKindConstant,
			Expectation: "foo",
		},
	},
},

Pattern: ZeroOrMore

ZeroOrMore tries to match one or many instances of a specific pattern but doesn't expect it to be matched:

Pattern: llparser.ZeroOrMore{
	Pattern: somePattern,
},

Pattern: OneOrMore

OneOrMore expects at least one or many instances of a specific pattern:

Pattern: llparser.OneOrMore{
	Pattern: somePattern,
},

Pattern: Either

Either expects either of the given patterns selecting the first match:

Pattern: llparser.Either{
	somePattern,
	anotherPattern,
},

Lexer

The lexer is an abstract part of the parser which tokenizes the input stream:

type Lexer interface {
	Read() (*Token, error)
	ReadExact(
		expectation string,
		kind FragmentKind,
	) (
		token *Token,
		matched bool,
		err error,
	)
	Position() Cursor
	Set(Cursor)
}

A default lexer implementation is available out-of-the-box but sometimes implementing your own parser makes more sense. Some examples for when a custom lexer might be useful:

  • Sometimes you don't care how many white-spaces, tabs and line-breaks there are between the patterns you care about and thus it doesn't make any sense to make each individual space character a terminal leaf-node, instead the lexer would read a sequence of whitespaces, tabs and line-breaks as a single typed terminal node (in fact, this is the behavior of the default lexer implementation linked aboved, it will treat these kinds of sequences as misc.FrSpace) reducing the complexity of the resulting parse-tree.
  • If you want to disallow certain kinds of runes in the source code you can make the custom lexer implementation return an ErrUnexpectedToken error when approaching one.

The Parse-Tree

A parse-tree defines the serialized representation of the parsed input stream and consists of Fragment interfaces represented by the main fragment returned by llparser.Parse. A fragment is a typed chunk of the source code pointing to a start and end position in the source file, defining the kind of the chunk and referring to its child-fragments.

You can’t perform that action at this time.