Mark Whitaker edited this page Aug 5, 2022 · 8 revisions


Overview

The core class of RegexToolbox.js is RegexBuilder. RegexBuilder makes it easy to build complicated regular expressions in a way that's more readable to JavaScript developers.

For example, say we want to use a regular expression to match people's names in a text file. We'll define a person's name as two words next to each other, both beginning with a capital letter. Here's how we might do it without using RegexBuilder:

const regex = /\b[A-Z][a-z]+\s+[A-Z][a-z]+\b/;
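Before going further, it helps to see what that pattern actually matches. The sketch below uses only the built-in RegExp API and a made-up sample sentence. It adds the g flag, which the pattern above doesn't have, so that String.prototype.match() returns every name instead of just the first.

```javascript
// The name pattern from above, with the g flag added so match() returns all hits.
const namePattern = /\b[A-Z][a-z]+\s+[A-Z][a-z]+\b/g;

// Hypothetical sample text.
const text = "Ada Lovelace wrote to Charles Babbage about the engine.";

const names = text.match(namePattern);
console.log(names); // [ 'Ada Lovelace', 'Charles Babbage' ]

// Lowercase names don't qualify: the pattern requires an initial capital.
console.log(/\b[A-Z][a-z]+\s+[A-Z][a-z]+\b/.test("ada lovelace")); // false
```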

That's a pretty simple regular expression, but unless you're familiar with the syntax it can still look confusing and be difficult to read, understand and maintain. Here it is again with RegexBuilder:

const regex = new RegexBuilder()
    .wordBoundary()
    .uppercaseLetter()
    .lowercaseLetter(RegexQuantifier.oneOrMore)
    .whitespace(RegexQuantifier.oneOrMore)
    .uppercaseLetter()
    .lowercaseLetter(RegexQuantifier.oneOrMore)
    .wordBoundary()
    .buildRegex();

Some things you'll notice straight off the bat:

  1. There's no regex syntax on display here at all - just simple, clearly-named method calls in a fluent chain.
  2. We defined the regex by adding building blocks such as lowercaseLetter(), and finished by calling buildRegex(). buildRegex() returns a standard JavaScript RegExp object, so you can treat it just the same as an object built with the regular syntax.
  3. Matching an element repeatedly is done by passing in a RegexQuantifier object: more about those in Quantifiers.
  4. The code got a lot longer. That's unavoidable, but a worthwhile trade-off for cleaner, more maintainable code. We're not trying to win a round of code golf here. If you are, then bare regex syntax is definitely the way to go. 😉
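Because buildRegex() hands back an ordinary RegExp, the result can be used anywhere a regex literal can. To illustrate that without depending on the library, the sketch below assembles the same name pattern from string fragments using the RegExp constructor and checks that it is indistinguishable from the literal. The comments pairing each fragment with a RegexBuilder call are our assumption about what those methods emit, not something taken from RegexBuilder's source.

```javascript
// Each fragment is labelled with the RegexBuilder call it presumably corresponds to.
// (Assumption for illustration: RegexBuilder may emit equivalent but different syntax.)
const fragments = [
  "\\b",    // wordBoundary()
  "[A-Z]",  // uppercaseLetter()
  "[a-z]+", // lowercaseLetter(RegexQuantifier.oneOrMore)
  "\\s+",   // whitespace(RegexQuantifier.oneOrMore)
  "[A-Z]",  // uppercaseLetter()
  "[a-z]+", // lowercaseLetter(RegexQuantifier.oneOrMore)
  "\\b",    // wordBoundary()
];

const built = new RegExp(fragments.join(""));
const literal = /\b[A-Z][a-z]+\s+[A-Z][a-z]+\b/;

console.log(built instanceof RegExp);         // true
console.log(built.source === literal.source); // true
console.log(built.test("Grace Hopper"));      // true
```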

Building the regex

The last step in building your regex is always calling buildRegex(). The building-block methods (text(), digit(), etc.) all return a RegexBuilder object so they can be chained, but buildRegex() returns a regular JavaScript RegExp object. It also resets the RegexBuilder object, so if you really want to, you can reuse it to build a new regex.
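The chaining and reset behaviour described above follows the usual fluent-builder pattern: each building-block method appends to some internal state and returns `this`, and the terminal method produces the result and clears that state. The class below is a hypothetical, stripped-down sketch of the pattern. MiniBuilder and its methods are invented for illustration and are not RegexBuilder's real implementation; in particular, its buildRegex() takes a plain flags string rather than RegexOptions, to keep the sketch self-contained.

```javascript
// Hypothetical minimal fluent builder, illustrating the pattern only.
class MiniBuilder {
  constructor() {
    this.parts = [];
  }

  // Each building-block method appends a fragment and returns `this`,
  // which is what makes method chaining possible.
  digit() {
    this.parts.push("\\d");
    return this;
  }

  // Escape regex metacharacters so the text is matched literally.
  text(s) {
    this.parts.push(s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
    return this;
  }

  // Terminal method: returns a plain RegExp and resets the builder,
  // so the same instance can be reused for a new regex.
  buildRegex(flags = "") {
    const regex = new RegExp(this.parts.join(""), flags);
    this.parts = [];
    return regex;
  }
}

const builder = new MiniBuilder();
const price = builder.text("$").digit().text(".").digit().digit().buildRegex();
console.log(price.test("Cost: $5.99")); // true

// The builder was reset, so it starts from scratch here.
const twoDigits = builder.digit().digit().buildRegex();
console.log(twoDigits.source); // \d\d
```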

buildRegex() takes a single optional parameter, which is either a RegexOptions value or an array of RegexOptions values. The available values are as follows:

Value                      Description
RegexOptions.IGNORE_CASE   Makes the regex case-insensitive. Note that this causes element methods like uppercaseLetter() to lose their case sensitivity.
RegexOptions.MATCH_ALL     Makes the regex match all occurrences in a string rather than just the first. This is especially useful when your regex is going to be passed to String.replace(), which otherwise replaces only the first match.
RegexOptions.MULTI_LINE    Causes startOfString() and endOfString() to also match at the start and end of each line within a multi-line string, not just at the start and end of the whole string.
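These three options line up with JavaScript's built-in i, g and m flags. Assuming buildRegex() simply sets the corresponding flag for each option, and that startOfString() and endOfString() emit ^ and $ (both assumptions; this page doesn't say how they're implemented), their effects can be seen with plain RegExp objects:

```javascript
// IGNORE_CASE presumably corresponds to the i flag: [A-Z] then also matches lowercase.
console.log(/^[A-Z]+$/.test("abc"));  // false
console.log(/^[A-Z]+$/i.test("abc")); // true

// MATCH_ALL presumably corresponds to the g flag: replace() rewrites every match.
console.log("a-b-c".replace(/-/, "+"));  // a+b-c
console.log("a-b-c".replace(/-/g, "+")); // a+b+c

// MULTI_LINE presumably corresponds to the m flag: ^ and $ match at each line.
const lines = "first\nsecond";
console.log(lines.match(/^\w+$/g));  // null
console.log(lines.match(/^\w+$/gm)); // [ 'first', 'second' ]
```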

Features

RegexBuilder supports the most commonly used features of regular expressions. Some advanced features that are rarely used are omitted for the sake of simplicity, but they may be added in future if there's enough demand. (Or of course you can fork this repo and add all you like. 😃)

The current features of RegexBuilder are described in the following pages:

  • Elements. These are the building blocks that we make regexes from: things like letters, numbers, whitespace and so on.
  • Quantifiers. These are used to match multiple occurrences of an element in a regex.
  • Groups. These are used either a) to bunch together a set of elements so you can apply quantifiers to the whole lot, or b) to "remember" part of a regex so you can extract it from the match later. Or both of the above.
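Both uses of groups exist in plain JavaScript regex syntax, which gives a feel for what grouping does before you reach RegexBuilder's own group methods (described on the Groups page and not used here). This sketch relies only on built-in RegExp features: a non-capturing group (?:...) for applying a quantifier to several elements, and named capturing groups for extracting parts of a match. The sample strings are made up.

```javascript
// a) Grouping to apply a quantifier to several elements at once:
//    (?:ha)+ repeats the two-letter sequence "ha", not just the final "a".
const laugh = /^(?:ha)+$/;
console.log(laugh.test("hahaha")); // true
console.log(laugh.test("haaaa"));  // false

// b) Capturing groups "remember" part of the match for later extraction.
const fullName = /\b(?<first>[A-Z][a-z]+)\s+(?<last>[A-Z][a-z]+)\b/;
const match = fullName.exec("welcome, Ada Lovelace!");
console.log(match.groups.first); // Ada
console.log(match.groups.last);  // Lovelace
```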