Skip to content

i-e-b/Gool

Repository files navigation

Gool is a lexer/parser for C#

A parser-combinator library for C#, with a fluent BNF-like interface for building parsers.

See Sample Parsers for fully functional examples.

Basic example

// Basic infix arithmetic expressions
BNF 
    number     = BNF.Regex("\-?[0-9]+(\.[0-9]+)?"),  // Signed numbers

    factor     = number | ('(' > _expression > ')'), // Number or parenthesised expression
    power      = factor > !('^' > factor),           // Factor, with optional '^' + exponent
    term       = power % ('*' | '/'),                // Powers, optionally joined with '*' or '/'
    expression = term % ('+' | '-');                 // Terms, optionally joined will '+' or '-'
var result = expression.ParseString(     // Run the parser
    "(6.5 + 3) * (5.5 - -2)"
    );

var tree = TreeNode.FromParserMatch(result, true);        // Interpret the raw parse tree
var final = TreeNode.TransformTree(tree, ApplyOperation); // Apply functions to reduce tree to value

Console.WriteLine(final); // 71.25

Some details removed for clarity -- see bottom of document for full implementation)

BNF Syntax

Terminal parsers:

  • '…'Character parser that matches a single literal character in the input
  • "…"String parser that matches a literal string in the input
  • "#…" or BNF.Regex("…")Regex parser that matches a string based on a regex pattern. The # prefix is not included in the pattern.
  • BNF.OneOf(…) → Match a single character from the set provided
  • BNF.NoneOf(…) → Match any single character that is not in the set provided
  • BNF.AnyChar → Parser that matches any single character.
  • BNF.Empty → Parser that matches an empty string (useful in unions)
  • BNF.EndOfInput → Parser that matches the end of input (parsers will normally accept partial matches)
  • BNF.LineEnd → Parser that matches a line end (either \r, or \n, or \r\n)
  • BNF.WhiteSpace → Parser that matches a single character of white-space

Combining parsers:

  • a | b → Create a union parser that matches the longest result from either a or b. Parser will match if only one of a and b match, and if both a and b match.
    • Example: "hello" | "world" matches hello or world
    • Example: "on" | "one" matches on and one. +( "on" | "one" ) will match oneone as {one, one}
  • a > b → Create a sequence parser that matches a then b
    • Example: 'x' > 'y' matches xy but not x or y
  • a < b → Create a terminated list parser that matches a list of a, each being terminated by b. The last item a must be terminated.
    • Example: 'x' < ';' matches x;x;x; and x;, but not x or x;x
  • a % b → Create a delimited list parser that matches a list of a, delimited by b. A trailing delimiter is not matched.
    • Example: 'x'%',' matches x and x,x, but not x,x,
  • -a → Create an optional repeat parser that matches zero or more a
    • Example: -"xy" matches xyxy, xy, and empty
  • +a → Create a repeat parser that matches one or more a
    • Example: +"xy" matches xy and xyxy, but not empty
  • !a → Create an option parser that matches zero or one a
    • Example: !"xy" matches xy and empty, but not xyxy
    • Can also be expressed BNF.Optional(a)
  • a & b → Create an intersection parser that matches (a then b) or (b then a)
    • Example: 'x'&'y' matches xy and yx, but not xx or yy
  • a ^ b → Create an exclusion parser that matches a or b but not both
    • Example: 'x'^'y' matches x and y, but not xy or yx
  • a / b → Create a difference parser that matches a but not b
    • Example: "on" / "one" matches on but not one. +( 'x' / '.' ) will match xx. as {x, x}

Parsers generated by BNF can be used repeatedly.

Scanners

Parsers operate over a 'scanner', which is an input string plus transforms and contextual data. For common cases, you won't need to create one directly -- just use BNF.ParseString or BnfPackage.ParseString.

Scanners handle case-conversion and white-space skipping if you use those options.

Because scanners hold context for a parse, they cannot be reused, or shared between parse attempts.

Tags, scopes, and trees

The basic output from a parser is a ParserMatch, which gives a complete tree of all matches, including those from combined parsers. ParserMatch also gives access to the parser that made the match, and the scanner that was used for input.

The ParserMatch tree contains all the information from a result, but often this is too much.

Any parser can be tagged with a string value, and this can be used to extract salient information from the tree.

TaggedTokensDepthFirst / TaggedTokensBreadthFirst

You can request a sequence of ParserMatches from the result tree, only returning tagged results. Tags are not inherited by parent matches.

ScopeNode.FromMatch

The scoped node tree ignores the ParserMatch hierarchy, and uses .OpenScope() and .CloseScope() to build a different structure. This is useful for structured data (like JSON and XML) and otherwise scoped data that use open and close markers -- like ({,}) or (begin,end) in many programming languages.

TreeNode.FromParserMatch

General tree nodes match the ParserMatch hierarchy, but only including nodes with a tag or scope set.

The Pivot scope has a specific effect on general trees, 'lifting' them to a level above non-pivot peers. This is useful for chains of operators:

Given the parser sum from:

BNF number = BNF.Regex("[0-9]+").Tag("num");
BNF addSub = BNF.OneOf('+', '-').Tag("op");
BNF sum = number % addSub;

and the input:

var result = sum.ParseString(
                             "1+2",
                             );
var tree = TreeNode.FromParserMatch(result, false);

outputs tree as:

┌───── 1   
│      +   
└──  2     

but changing addSub to BNF.OneOf('+', '-').Tag("op").PivotScope(); results in

  ┌──1  
 +│     
  └──2  

Detailed examples

See Sample Parsers for fully functional examples.

Basic infix arithmetic calculator

public double EvaluateExpression(string expression)
{
    var result = Arithmetic().ParseString(expression);          // Step 1: parse input
    var tree = TreeNode.FromParserMatch(result, prune: true);   // Step 2: build expression tree
    var final = TreeNode.TransformTree(tree, ApplyOperation);   // Step 3: reduce the tree to a value
    
    return final;
}

public static BNF.Package Arithmetic()
{
    var _expression = BNF.Forward();      // Forward reference, to allow circular/recursive matching

    BNF add_sub = (BNF)'+' | '-';         // You may need to cast one of a set to `BNF`
    BNF mul_div = BNF.OneOf('*', '/');    // Helper, equivalent to: `(BNF)'*' | '/'`
    BNF exp = '^';                        // Individual strings or characters are auto cast if possible

    BNF number = @"#\-?[0-9]+(\.[0-9]+)?";// Regex for a signed number. Regexes bind as a single terminal
    BNF factor =
          number 
        | ('(' > _expression > ')');      // Note the reference to `_expression` forward ref
    BNF power = factor > !(exp > factor); // `factor`, then optional '^' + `factor`
    BNF term = power % mul_div;           // delimited list, to handle patterns like "1 * 2 / 3 * 4"
    BNF expression = term % add_sub;      //    ...again to handle "1 + 2 - 3 + 4"

    _expression.Is(expression);           // Complete the forward reference

    add_sub.Tag(Operation).PivotScope();  // Tag '+' and '-' as operations, and mark them as pivoting
    mul_div.Tag(Operation).PivotScope();  //    ... same for '*' and '/'
    exp.Tag(Operation).PivotScope();      //    ... and '^'
    number.Tag(Value);                    // Tag numbers separately from operations

    return expression
        .WithOptions(                     // Package this BNF with the intended scanner settings
            BNF.Options.SkipWhitespace
        );
}

public const string Operation = "operation";
public const string Value = "value";

private static TreeNode ApplyOperation(TreeNode node)
{
    if (node.Source.Tag is null) return node.Children[0]; // pull child up through joining nodes

    if (node.Source.Tag != ArithmeticExample.Operation) return node; // only look at operation nodes
    var operation = node.Source.Value;

    if (node.Children.Count < 2) throw new Exception("Invalid expression");
    var left = node.Children[0].Source;
    var right = node.Children[1].Source;

    if (!double.TryParse(left.Value, out var a) || !double.TryParse(right.Value, out var b)) return node; // one of our children is not a number

    // Both children are values: perform the operation
    var result = operation switch
    {
        "+" => a + b,
        "-" => a - b,
        "*" => a * b,
        "/" => a / b,
        "^" => Math.Pow(a, b),
        _ => throw new NotImplementedException($"Operation not implemented: '{operation}'")
    };

    // Return a new node with the calculated value
    return TreeNode.FromString(result.ToString(CultureInfo.InvariantCulture), ArithmeticExample.Value);
}

Basic XML Parser

BNF text       = BNF.Regex("[^<>]+");
BNF identifier = BNF.Regex("[_a-zA-Z][_a-zA-Z0-9]*");
BNF whitespace = BNF.Regex(@"\s+");

BNF quoted_string = '"' > identifier > '"';
BNF attribute = whitespace > identifier > '=' > quoted_string;

BNF tag_id = identifier.Tagged(TagId);
BNF open_tag = '<' > tag_id > -attribute > '>';

BNF close_tag = "</" > tag_id > '>';

return BNF
    .Recursive(tree => -(open_tag > -(tree | text) > close_tag))
    .WithOptions(BNF.Options.None);

Full spec JSON parser

From https://www.json.org/json-en.html

var _value = BNF.Forward();

BNF ws = BNF.Regex(@"\s*");
BNF neg = '-';
BNF digit = BNF.Regex("[0-9]");
BNF exp = BNF.Regex("[eE]");
BNF sign = BNF.OneOf('+', '-');

BNF escape = BNF.OneOf('"', '\\', '/', 'b', 'f', 'n', 'r', 't') | BNF.Regex("u[0-9a-fA-F]{4}");
BNF character = BNF.Regex("""[^"\\]""") | ('\\' > escape);
BNF characters = -character;
BNF quoted_string = '"' > characters > '"';

BNF element = ws > _value > ws;
BNF elements = element % ',';

BNF member_key = quoted_string.Copy();
BNF member = ws > member_key > ws > ':' > element;
BNF members = member % ',';

BNF object_enter = '{';
BNF object_leave = '}';
BNF object_block = object_enter > (ws | members) > object_leave;

BNF array_enter = '[';
BNF array_leave = ']';
BNF array_block = array_enter > elements > array_leave;

BNF digits = +digit;
BNF exponent = !(exp > sign > digits);
BNF fraction = !('.' > digits);
BNF integer = (!neg) > (+digit); // this is slightly out of spec, as it allows "01234"
BNF number = integer > fraction > exponent;

BNF primitive = quoted_string | number | "true" | "false" | "null";
BNF value = object_block | array_block | primitive;

_value.Is(value);

return element.WithOptions(BNF.Options.SkipWhitespace);

Languages