A parser-combinator library for C#, with a fluent BNF-like interface for building parsers.
See Sample Parsers for fully functional examples.
// Basic infix arithmetic expressions
BNF
number = BNF.Regex("\-?[0-9]+(\.[0-9]+)?"), // Signed numbers
factor = number | ('(' > _expression > ')'), // Number or parenthesised expression
power = factor > !('^' > factor), // Factor, with optional '^' + exponent
term = power % ('*' | '/'), // Powers, optionally joined with '*' or '/'
expression = term % ('+' | '-'); // Terms, optionally joined will '+' or '-'
var result = expression.ParseString( // Run the parser
"(6.5 + 3) * (5.5 - -2)"
);
var tree = TreeNode.FromParserMatch(result, true); // Interpret the raw parse tree
var final = TreeNode.TransformTree(tree, ApplyOperation); // Apply functions to reduce tree to value
Console.WriteLine(final); // 71.25
Some details removed for clarity -- see bottom of document for full implementation)
'…'
→ Character parser that matches a single literal character in the input"…"
→ String parser that matches a literal string in the input"#…"
orBNF.Regex("…")
→ Regex parser that matches a string based on a regex pattern. The#
prefix is not included in the pattern.BNF.OneOf(…)
→ Match a single character from the set providedBNF.NoneOf(…)
→ Match any single character that is not in the set providedBNF.AnyChar
→ Parser that matches any single character.BNF.Empty
→ Parser that matches an empty string (useful in unions)BNF.EndOfInput
→ Parser that matches the end of input (parsers will normally accept partial matches)BNF.LineEnd
→ Parser that matches a line end (either\r
, or\n
, or\r\n
)BNF.WhiteSpace
→ Parser that matches a single character of white-space
- a
|
b → Create a union parser that matches the longest result from either a or b. Parser will match if only one of a and b match, and if both a and b match.- Example:
"hello" | "world"
matcheshello
orworld
- Example:
"on" | "one"
matcheson
andone
.+( "on" | "one" )
will matchoneone
as {one
,one
}
- Example:
- a
>
b → Create a sequence parser that matches a then b- Example:
'x' > 'y'
matchesxy
but notx
ory
- Example:
- a
<
b → Create a terminated list parser that matches a list of a, each being terminated by b. The last item a must be terminated.- Example:
'x' < ';'
matchesx;x;x;
andx;
, but notx
orx;x
- Example:
- a
%
b → Create a delimited list parser that matches a list of a, delimited by b. A trailing delimiter is not matched.- Example:
'x'%','
matchesx
andx,x
, but notx,x,
- Example:
-
a → Create an optional repeat parser that matches zero or more a- Example:
-"xy"
matchesxyxy
,xy
, and empty
- Example:
+
a → Create a repeat parser that matches one or more a- Example:
+"xy"
matchesxy
andxyxy
, but not empty
- Example:
!
a → Create an option parser that matches zero or one a- Example:
!"xy"
matchesxy
and empty, but notxyxy
- Can also be expressed
BNF.Optional(
a)
- Example:
- a
&
b → Create an intersection parser that matches (a then b) or (b then a)- Example:
'x'&'y'
matchesxy
andyx
, but notxx
oryy
- Example:
- a
^
b → Create an exclusion parser that matches a or b but not both- Example:
'x'^'y'
matchesx
andy
, but notxy
oryx
- Example:
- a
/
b → Create a difference parser that matches a but not b- Example:
"on" / "one"
matcheson
but notone
.+( 'x' / '.' )
will matchxx.
as {x
,x
}
- Example:
Parsers generated by BNF can be used repeatedly.
Parsers operate over a 'scanner', which is an input string plus transforms and contextual data.
For common cases, you won't need to create one directly -- just use BNF.ParseString
or BnfPackage.ParseString
.
Scanners handle case-conversion and white-space skipping if you use those options.
Because scanners hold context for a parse, they cannot be reused, or shared between parse attempts.
The basic output from a parser is a ParserMatch
, which gives a complete tree of all matches, including those from combined
parsers. ParserMatch
also gives access to the parser that made the match, and the scanner that was used for input.
The ParserMatch
tree contains all the information from a result, but often this is too much.
Any parser can be tagged with a string value, and this can be used to extract salient information from the tree.
You can request a sequence of ParserMatch
es from the result tree, only returning tagged results.
Tags are not inherited by parent matches.
The scoped node tree ignores the ParserMatch
hierarchy, and uses .OpenScope()
and .CloseScope()
to build a different structure. This is useful for structured data (like JSON and XML) and otherwise scoped
data that use open and close markers -- like ({
,}
) or (begin
,end
) in many programming languages.
General tree nodes match the ParserMatch
hierarchy, but only including nodes with a tag or scope set.
The Pivot
scope has a specific effect on general trees, 'lifting' them to a level above non-pivot peers.
This is useful for chains of operators:
Given the parser sum
from:
BNF number = BNF.Regex("[0-9]+").Tag("num");
BNF addSub = BNF.OneOf('+', '-').Tag("op");
BNF sum = number % addSub;
and the input:
var result = sum.ParseString(
"1+2",
);
var tree = TreeNode.FromParserMatch(result, false);
outputs tree
as:
┌───── 1
│ +
└── 2
but changing addSub
to BNF.OneOf('+', '-').Tag("op").PivotScope();
results in
┌──1
+│
└──2
See Sample Parsers for fully functional examples.
public double EvaluateExpression(string expression)
{
var result = Arithmetic().ParseString(expression); // Step 1: parse input
var tree = TreeNode.FromParserMatch(result, prune: true); // Step 2: build expression tree
var final = TreeNode.TransformTree(tree, ApplyOperation); // Step 3: reduce the tree to a value
return final;
}
public static BNF.Package Arithmetic()
{
var _expression = BNF.Forward(); // Forward reference, to allow circular/recursive matching
BNF add_sub = (BNF)'+' | '-'; // You may need to cast one of a set to `BNF`
BNF mul_div = BNF.OneOf('*', '/'); // Helper, equivalent to: `(BNF)'*' | '/'`
BNF exp = '^'; // Individual strings or characters are auto cast if possible
BNF number = @"#\-?[0-9]+(\.[0-9]+)?";// Regex for a signed number. Regexes bind as a single terminal
BNF factor =
number
| ('(' > _expression > ')'); // Note the reference to `_expression` forward ref
BNF power = factor > !(exp > factor); // `factor`, then optional '^' + `factor`
BNF term = power % mul_div; // delimited list, to handle patterns like "1 * 2 / 3 * 4"
BNF expression = term % add_sub; // ...again to handle "1 + 2 - 3 + 4"
_expression.Is(expression); // Complete the forward reference
add_sub.Tag(Operation).PivotScope(); // Tag '+' and '-' as operations, and mark them as pivoting
mul_div.Tag(Operation).PivotScope(); // ... same for '*' and '/'
exp.Tag(Operation).PivotScope(); // ... and '^'
number.Tag(Value); // Tag numbers separately from operations
return expression
.WithOptions( // Package this BNF with the intended scanner settings
BNF.Options.SkipWhitespace
);
}
public const string Operation = "operation";
public const string Value = "value";
private static TreeNode ApplyOperation(TreeNode node)
{
if (node.Source.Tag is null) return node.Children[0]; // pull child up through joining nodes
if (node.Source.Tag != ArithmeticExample.Operation) return node; // only look at operation nodes
var operation = node.Source.Value;
if (node.Children.Count < 2) throw new Exception("Invalid expression");
var left = node.Children[0].Source;
var right = node.Children[1].Source;
if (!double.TryParse(left.Value, out var a) || !double.TryParse(right.Value, out var b)) return node; // one of our children is not a number
// Both children are values: perform the operation
var result = operation switch
{
"+" => a + b,
"-" => a - b,
"*" => a * b,
"/" => a / b,
"^" => Math.Pow(a, b),
_ => throw new NotImplementedException($"Operation not implemented: '{operation}'")
};
// Return a new node with the calculated value
return TreeNode.FromString(result.ToString(CultureInfo.InvariantCulture), ArithmeticExample.Value);
}
BNF text = BNF.Regex("[^<>]+");
BNF identifier = BNF.Regex("[_a-zA-Z][_a-zA-Z0-9]*");
BNF whitespace = BNF.Regex(@"\s+");
BNF quoted_string = '"' > identifier > '"';
BNF attribute = whitespace > identifier > '=' > quoted_string;
BNF tag_id = identifier.Tagged(TagId);
BNF open_tag = '<' > tag_id > -attribute > '>';
BNF close_tag = "</" > tag_id > '>';
return BNF
.Recursive(tree => -(open_tag > -(tree | text) > close_tag))
.WithOptions(BNF.Options.None);
From https://www.json.org/json-en.html
var _value = BNF.Forward();
BNF ws = BNF.Regex(@"\s*");
BNF neg = '-';
BNF digit = BNF.Regex("[0-9]");
BNF exp = BNF.Regex("[eE]");
BNF sign = BNF.OneOf('+', '-');
BNF escape = BNF.OneOf('"', '\\', '/', 'b', 'f', 'n', 'r', 't') | BNF.Regex("u[0-9a-fA-F]{4}");
BNF character = BNF.Regex("""[^"\\]""") | ('\\' > escape);
BNF characters = -character;
BNF quoted_string = '"' > characters > '"';
BNF element = ws > _value > ws;
BNF elements = element % ',';
BNF member_key = quoted_string.Copy();
BNF member = ws > member_key > ws > ':' > element;
BNF members = member % ',';
BNF object_enter = '{';
BNF object_leave = '}';
BNF object_block = object_enter > (ws | members) > object_leave;
BNF array_enter = '[';
BNF array_leave = ']';
BNF array_block = array_enter > elements > array_leave;
BNF digits = +digit;
BNF exponent = !(exp > sign > digits);
BNF fraction = !('.' > digits);
BNF integer = (!neg) > (+digit); // this is slightly out of spec, as it allows "01234"
BNF number = integer > fraction > exponent;
BNF primitive = quoted_string | number | "true" | "false" | "null";
BNF value = object_block | array_block | primitive;
_value.Is(value);
return element.WithOptions(BNF.Options.SkipWhitespace);