Mark Whitaker edited this page Jan 6, 2021 · 6 revisions

Overview

The core class of RegexToolbox.NET is RegexBuilder. RegexBuilder makes it easy to build complicated regular expressions in a way that's more readable to C# developers.

For example, say we want to use a regular expression to match people's names in a text file. We'll define a person's name as two words next to each other, both beginning with a capital letter. Here's how we might do it without using RegexBuilder:

var regex = new Regex(@"\b[A-Z][a-z]+\s[A-Z][a-z]+\b");

That's a pretty simple regular expression, but unless you're familiar with the syntax it can still look confusing and be difficult to read, understand and maintain. Here it is again with RegexBuilder:

var regex = new RegexBuilder()
    .WordBoundary()
    .UppercaseLetter()
    .LowercaseLetter(RegexQuantifier.OneOrMore)
    .Whitespace()
    .UppercaseLetter()
    .LowercaseLetter(RegexQuantifier.OneOrMore)
    .WordBoundary()
    .BuildRegex();

Some things you'll notice right off the bat:

  1. There's no regex syntax on display here at all - just simple, clearly-named method calls in a fluent chain.
  2. We defined the regex by adding building blocks such as LowercaseLetter(), and finished by calling BuildRegex(). BuildRegex() returns a standard C# System.Text.RegularExpressions.Regex object, so you can treat it just the same as an object built with the regular syntax.
  3. Matching an element repeatedly is done by passing in a RegexQuantifier object: more about those in Quantifiers.
  4. The code got a lot longer. That's unavoidable, but a worthwhile trade-off for cleaner, more maintainable code. We're not trying to win a round of code golf here. If you are, then bare regex syntax is definitely the way to go. 😉
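Because BuildRegex() returns an ordinary Regex, the result can be used with the usual members such as Matches(). Here is a minimal sketch of that, assuming the RegexToolbox NuGet package is referenced and that its types live in the RegexToolbox namespace. The sample text is made up for illustration.

```csharp
using System;
using RegexToolbox; // assumption: namespace exposed by the RegexToolbox NuGet package

// Same builder chain as in the example above.
var regex = new RegexBuilder()
    .WordBoundary()
    .UppercaseLetter()
    .LowercaseLetter(RegexQuantifier.OneOrMore)
    .Whitespace()
    .UppercaseLetter()
    .LowercaseLetter(RegexQuantifier.OneOrMore)
    .WordBoundary()
    .BuildRegex();

// Hypothetical sample text, made up for this illustration.
var text = "yesterday Alice Smith met Bob Jones in london.";

// The result is a standard Regex, so Matches() works as it would
// for a regex built from a pattern string.
foreach (System.Text.RegularExpressions.Match match in regex.Matches(text))
{
    Console.WriteLine(match.Value);
}
```

Provided the builder emits a pattern equivalent to the hand-written one shown earlier, this should print "Alice Smith" and "Bob Jones", and skip the lowercase words.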

Building the regex

The last step in building your regex is always calling BuildRegex(). The building-block methods (Text(), Digit(), etc.) all return a RegexBuilder object so they can be chained, but BuildRegex() returns a regular C# System.Text.RegularExpressions.Regex object. It also resets the RegexBuilder object, so, if you really want to, you can reuse it to build a new regex.
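To illustrate the reset behaviour, here is a small sketch that reuses one builder for two unrelated regexes. It assumes the RegexToolbox namespace, that Digit() accepts a quantifier in the same way LowercaseLetter() does, and that Text() takes the literal string to match. The variable names and sample inputs are made up.

```csharp
using System;
using RegexToolbox; // assumption: namespace exposed by the RegexToolbox NuGet package

var builder = new RegexBuilder();

// First regex: one or more digits.
var digits = builder
    .Digit(RegexQuantifier.OneOrMore) // assumption: Digit() accepts a quantifier
    .BuildRegex();

// Per the description above, BuildRegex() has reset the builder,
// so this second chain should start from an empty pattern.
var greeting = builder
    .Text("hello") // assumption: Text() takes the literal text to match
    .BuildRegex();

Console.WriteLine(digits.IsMatch("order 66"));
Console.WriteLine(greeting.IsMatch("hello world"));
Console.WriteLine(greeting.IsMatch("12345"));
```

If the reset works as described, the second regex matches "hello world" but not "12345", which shows that the digit element from the first chain was not carried over. Creating a fresh RegexBuilder for each regex is usually easier to read.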

BuildRegex() accepts zero or more RegexOptions values as parameters. The available values are as follows:

| Value | Description |
| --- | --- |
| RegexOptions.IgnoreCase | Makes the regex case-insensitive. Note that this causes element methods like UppercaseLetter() to lose their case sensitivity. |
| RegexOptions.Multiline | Causes StartOfString() and EndOfString() to also match line breaks within a multi-line string. |

Note: we deliberately use our own enum instead of System.Text.RegularExpressions.RegexOptions. Firstly, this keeps things simple while covering the most common use cases. Secondly, not all of the system values are applicable to RegexBuilder (e.g. IgnorePatternWhitespace).
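As a sketch of how the options might be passed, the example below builds the same regex with and without IgnoreCase. It assumes the library's enum lives in the RegexToolbox namespace and that UppercaseLetter() accepts a quantifier like LowercaseLetter() does. The enum is written fully qualified to make clear that it is not the System.Text.RegularExpressions one, which also avoids an ambiguous reference if both namespaces are imported.

```csharp
using System;
using RegexToolbox; // assumption: namespace exposed by the RegexToolbox NuGet package

// No options: UppercaseLetter() should only match A-Z.
var caseSensitive = new RegexBuilder()
    .UppercaseLetter(RegexQuantifier.OneOrMore) // assumption: accepts a quantifier
    .BuildRegex();

// IgnoreCase: per the table above, UppercaseLetter() loses its case sensitivity.
var caseInsensitive = new RegexBuilder()
    .UppercaseLetter(RegexQuantifier.OneOrMore)
    .BuildRegex(RegexToolbox.RegexOptions.IgnoreCase); // assumption: enum namespace

Console.WriteLine(caseSensitive.IsMatch("abc"));
Console.WriteLine(caseInsensitive.IsMatch("abc"));
```

If the option behaves as described in the table, only the second regex matches the lowercase input "abc".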

Features

RegexBuilder supports the most commonly used features of regular expressions. Some advanced features that are rarely used are omitted for the sake of simplicity, but they may be added in future if there's enough demand. (Or of course you can fork this repo and add all you like. 😃)

The current features of RegexBuilder are described in the following pages:

  • Elements. These are the building blocks that we make regexes from: things like letters, numbers, whitespace and so on.
  • Quantifiers. These are used to match multiple occurrences of an element in a regex.
  • Groups. These are used to bunch together a set of elements so you can apply quantifiers to the whole lot, to "remember" part of a regex so you can extract it from the match later, or both.
