MetaphysicsIndustries.Giza

A parser

Intro

Giza is a parser, but not the kind you may be used to. For one, it's not a parser generator. Rather than turning a grammar into a bunch of source code, it converts the grammar into a ready-to-use form and can start parsing immediately. No compiling necessary!

One of the cool benefits of this is that you can try out and edit your grammar in real time. Take a look at the included giza program. It's actually a REPL for editing grammars and parsing text. How cool is that!?

Giza would probably be classified as a kind of GLR parser. It goes through all possible paths without the need for backtracking. It can also handle ambiguities by producing multiple parse trees if needed. The format of its grammars is not quite BNF; instead, the syntax is inspired more by programming languages and regexes. The syntax of grammars is described entirely by the "Supergrammar", which is written in its own syntax. Take a look at Supergrammar.txt.
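To make "multiple parse trees" concrete, here is a minimal Python sketch. It is not Giza's algorithm and does not use Giza's API: it enumerates trees by brute force with memoization rather than GLR, and the grammar `E -> E '+' E | NUMBER` and the function name `all_parses` are made up for illustration.

```python
from functools import lru_cache


def all_parses(tokens):
    """Return every parse tree for the ambiguous grammar E -> E '+' E | NUMBER.

    Illustration only: a memoized enumeration over token ranges,
    not the GLR-style approach Giza is described as using.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def parse(start, end):
        # Collect all trees that cover exactly tokens[start:end].
        trees = []
        if end - start == 1 and tokens[start].isdigit():
            trees.append(tokens[start])
        # Try every '+' in the range as the top-level operator.
        for i in range(start + 1, end - 1):
            if tokens[i] == "+":
                for left in parse(start, i):
                    for right in parse(i + 1, end):
                        trees.append((left, "+", right))
        return tuple(trees)

    return list(parse(0, len(tokens)))


for tree in all_parses(["1", "+", "2", "+", "3"]):
    print(tree)
# ('1', '+', ('2', '+', '3'))
# (('1', '+', '2'), '+', '3')
```

The input `1 + 2 + 3` can be grouped two ways under this grammar, so two trees come back. A parser that reports all valid parses leaves the choice between them to the caller.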

Giza maintains a distinction between "tokenized" and "non-tokenized" processing. The former is what parsers usually do, constructing a parse tree from a stream of input tokens. The latter is more akin to what regexes do, matching against a stream of input characters instead. Herein, non-tokenized processing often goes by the name "spanning" and is done by a "spanner".
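The difference between the two modes is mostly about what counts as one input element. The Python sketch below shows the two views of the same text. It is only an illustration: the token rules, the whitespace skipping, and the names `tokenize` and `characters` are assumptions, not Giza's tokenizer or API.

```python
import re

# Hypothetical token rules: integers, words, and single-character operators.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z]+|[-+*/%()])")


def tokenize(text):
    """Tokenized view: split text into tokens, skipping whitespace."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input near offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def characters(text):
    """Non-tokenized view: every character, spaces included, is an element."""
    return list(text)


text = "12 + ab"
print(tokenize(text))    # ['12', '+', 'ab']
print(characters(text))  # ['1', '2', ' ', '+', ' ', 'a', 'b']
```

A grammar used for tokenized parsing matches against the first list; a grammar used for spanning matches against the second.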

Example

Here's how you would work with a grammar in the REPL:

>>> expr = ( mult-expr | add-expr | sub-expr );
>>> mult-expr = sub-expr ( [*/%] sub-expr )+;
>>> add-expr = ( mult-expr | sub-expr ) ( [-+] ( mult-expr | sub-expr ) )+;
>>> sub-expr = ( number | var | paren );
>>> <token> number = [\d]+;
>>> <token> var = [\l]+;

Then you can check the definitions for errors like this:

>>> check --tokenized
There are errors in the grammar:
  Definition 'sub-expr' references a definition 'paren' which is not defined.

Oops, we forgot to define paren:

>>> paren = '(' expr ')';
>>> check --tokenized
There are no errors or warnings.
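For a rough idea of what a check like the missing-definition one involves, here is a small Python sketch. It is not Giza's DefinitionChecker: the dictionary representation and the function name `find_undefined_references` are hypothetical, and only the error message wording is borrowed from the output above.

```python
def find_undefined_references(definitions):
    """Report references to names that have no definition.

    `definitions` maps each definition name to the set of names it references
    (a made-up representation, for illustration only).
    """
    errors = []
    for name, refs in definitions.items():
        for ref in sorted(refs):
            if ref not in definitions:
                errors.append(
                    f"Definition '{name}' references a definition "
                    f"'{ref}' which is not defined."
                )
    return errors


grammar = {
    "expr": {"mult-expr", "add-expr", "sub-expr"},
    "mult-expr": {"sub-expr"},
    "add-expr": {"mult-expr", "sub-expr"},
    "sub-expr": {"number", "var", "paren"},
    "number": set(),
    "var": set(),
}

print(find_undefined_references(grammar))   # one error, about 'paren'

grammar["paren"] = {"expr"}                 # add the missing definition
print(find_undefined_references(grammar))   # []
```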

Once that's settled, we can parse some text:

>>> parse expr '1 + 2 * 3 - four / 5 % 6 + seven'
There is 1 valid parse of the input.

To get more info, use the --verbose flag to print out the parse tree:

>>> parse expr --verbose '1 + 2 * 3 - four / 5 % 6 + seven'
There is 1 valid parse of the input.
          expr
            add-expr
              sub-expr
1               number
+             $implicit char class +-
              mult-expr
                sub-expr
2                 number
*               $implicit char class %*/
                sub-expr
3                 number
-             $implicit char class +-
              mult-expr
                sub-expr
four              var
/               $implicit char class %*/
                sub-expr
5                 number
%               $implicit char class %*/
                sub-expr
6                 number
+             $implicit char class +-
              sub-expr
seven           var

In the left column, you see the sequence of tokens in the input. On the right is the parse tree with indentation to show hierarchy, with each matching node on the same line of text as the token that matched it.

But that's not all. There's tons of stuff the REPL can do. It has a built-in help system to explain everything:

>>> help
Usage:
    >>> [options]
    >>> help [command_or_topic]
    >>> command [args...]

Commands:

    help       Display general help, or help on a specific topic.
    list       List all of the definitions currently defined.
    print      Print definitions as text in giza grammar format
    delete     Delete the specified definitions.
    save       Save definitions to a file as text in giza grammar format
    load       Load definitions from a file
    check      Check definitions for errors
    parse      Parse one or more inputs with a tokenized grammar, starting with a given definition, and print how many valid parse trees are found
    span       Span one or more inputs with a non-tokenized grammar, starting with a given definition, and print how many valid span trees are found
    render     Convert definitions to state machine format and render the state machines to a C# class.

Example in Code

Once you've worked out your language's grammar, you typically want to use it in some other application. To do that, follow these steps:

1. Save all relevant definitions to a file. By convention, the file extension is .giza, but you can use anything. See an example here.
2. Use the render command. This takes your grammar, converts all of the definitions into state machine format, and then emits the C# code for a class that creates the same state machine representation¹. See example here.
3. Create a class that takes your *Grammar class and plugs it into a Parser object. Whenever you want to parse some input text, pass it to the Parser.Parse method to get a list of parse trees. See example here.
4. [Optional] If there's more than one parse tree, then there's some ambiguity in your grammar with that particular input. You can do some kind of semantic analysis to choose which of the parse trees is the 'right' one. Or you can just pick the first one, if you're lazy.
5. Convert the generalized parse tree(s) into whatever domain objects you need. See example here.
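The last step, turning a generic parse tree into domain objects, can be sketched roughly as follows. This is Python, not C#, and it does not use Giza's types: `Leaf`, `Branch`, `Number`, `Variable`, `BinaryOp`, and `to_domain` are all hypothetical stand-ins, shaped after the tree printed by `parse --verbose` for the arithmetic grammar above.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical generic parse tree, shaped like the --verbose output.
@dataclass
class Leaf:
    name: str  # definition that matched, e.g. "number"
    text: str  # text of the matched token


@dataclass
class Branch:
    name: str
    children: List[Union["Branch", Leaf]]


# Hypothetical domain objects for the arithmetic grammar.
@dataclass
class Number:
    value: int


@dataclass
class Variable:
    name: str


@dataclass
class BinaryOp:
    op: str
    left: object
    right: object


def to_domain(node):
    """Convert a generic parse tree node into a domain object."""
    if isinstance(node, Leaf):
        if node.name == "number":
            return Number(int(node.text))
        if node.name == "var":
            return Variable(node.text)
        raise ValueError(f"unexpected leaf: {node.name}")
    if node.name in ("add-expr", "mult-expr"):
        # Children alternate operand, operator, operand, ...; fold left to right.
        result = to_domain(node.children[0])
        for i in range(1, len(node.children), 2):
            op = node.children[i].text
            result = BinaryOp(op, result, to_domain(node.children[i + 1]))
        return result
    if node.name == "paren":
        return to_domain(node.children[1])  # skip '(' and ')'
    if len(node.children) == 1:
        return to_domain(node.children[0])  # wrappers such as expr, sub-expr
    raise ValueError(f"unexpected node: {node.name}")


# Tree for "1 + 2 * 3", written out by hand for the example.
tree = Branch("expr", [Branch("add-expr", [
    Branch("sub-expr", [Leaf("number", "1")]),
    Leaf("$implicit char class +-", "+"),
    Branch("mult-expr", [
        Branch("sub-expr", [Leaf("number", "2")]),
        Leaf("$implicit char class %*/", "*"),
        Branch("sub-expr", [Leaf("number", "3")]),
    ]),
])])

print(to_domain(tree))
```

If parsing returns several trees, the lazy option from step 4 is simply to convert the first one; anything smarter would compare the converted objects and pick by whatever rule fits your language.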

¹ This kinda violates the "not turning a grammar into a bunch of source code" assertion above, but not entirely. The code so generated is not capable of parsing anything. It's basically a data structure serialized as C#. We could just as well store the raw text of the grammar file and run it through SupergrammarSpanner to generate the state machine representation, which is what the render command does for you. Whatever. We're working on making it prettier.

State of the project

Unfortunately, you caught us right in the middle of a major architecture overhaul. Much of the system's internals is being completely reworked. In particular, we're trying to treat tokenized and non-tokenized processing as special cases of a generalized pattern-matching system. Hence, the hideous Spanner1/Spanner2 dichotomy. It's a work in progress. Code and API may change at any time, although the command-line tool will be pretty stable. There's a lot of cosmetic stuff to be fixed up as well, such as the fact that Supergrammar.txt should really be named Supergrammar.giza. Stay tuned!

