Mark Whitaker edited this page Feb 18, 2021 · 9 revisions

Overview

The core of RegexToolbox.kt is the regex builder function and the RegexBuilder class. They make it easy to build complicated regular expressions in a way that's far more readable to Kotlin developers. regex is a super-Kotliny type-safe builder that leads to fluent, readable syntax.

Let's see an example. Say we want to use a regular expression to match people's names in a text file. We'll define a person's name as two words next to each other, both beginning with a capital letter. Here's how we might do it without using RegexToolbox.kt:

val regex = Regex("\\b[A-Z][a-z]+\\s+[A-Z][a-z]+\\b")

Or using a raw string:

val regex = Regex("""\b[A-Z][a-z]+\s+[A-Z][a-z]+\b""")

That's a pretty simple regular expression, but unless you're familiar with the syntax it can still look confusing and be difficult to read, understand and maintain. Here it is again with regex:

val regex = regex {
    wordBoundary()
    uppercaseLetter()
    lowercaseLetter(OneOrMore)
    whitespace(OneOrMore)
    uppercaseLetter()
    lowercaseLetter(OneOrMore)
    wordBoundary()
}

Some things you'll notice straight off the bat:

  1. There's no regex syntax on display here at all - just simple, clearly-named building blocks such as lowercaseLetter().
  2. regex returns a standard kotlin.text.Regex object, so you can treat it just the same as an object built with the regular syntax. (There's also pattern for legacy cases where you need to build a java.util.regex.Pattern.)
  3. Matching an element conditionally or repeatedly is done by passing in a RegexQuantifier: more about those in Quantifiers.
  4. The code got longer. That's unavoidable, but a worthwhile trade-off for cleaner, more maintainable code. We're not trying to win a round of code golf here. If you are, then bare regex syntax is definitely the way to go. 😉
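Because the result is a plain kotlin.text.Regex, everything you already know about using one still applies. Here's a minimal sketch of that, using only the Kotlin standard library: it uses the hand-written pattern from earlier as a stand-in for the builder's output (assumed to be equivalent), and findNames is a made-up helper name for illustration.

```kotlin
// Hypothetical helper: returns every "Firstname Lastname" pair found in the text.
fun findNames(text: String): List<String> {
    // Stand-in for the regex { ... } result: the same pattern, written by hand.
    val nameRegex = Regex("""\b[A-Z][a-z]+\s+[A-Z][a-z]+\b""")

    // findAll is a standard kotlin.text.Regex method; it yields matches lazily.
    return nameRegex.findAll(text).map { it.value }.toList()
}
```

Calling findNames("yesterday Ada Lovelace met Charles Babbage for tea.") returns the two names, Ada Lovelace and Charles Babbage. Note that a capitalised word at the start of a sentence would pair up with the name after it, which is a limit of our simple definition of a name rather than of the builder.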

Regex options

regex takes an optional, variable-length list of RegexOptions values as its first parameters. Supported values are:

Value        Description
IGNORE_CASE  Makes the regex case-insensitive. Note that this causes element methods like uppercaseLetter() to lose their case sensitivity.
MULTILINE    Causes startOfString() and endOfString() to also match at the start and end of each line within a multi-line string.

You use them like this:

val regex = regex(IGNORE_CASE, MULTILINE) {
    letter()
    digit()
}
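To get a feel for what the two options change, here's a rough sketch using only the Kotlin standard library. It assumes the RegexOptions values behave like the standard kotlin.text.RegexOption flags of the same names; the function names are made up for illustration.

```kotlin
// Counts lowercase words that sit at a "start of string" position.
// Without MULTILINE, ^ only matches at the very start of the input;
// with it, ^ also matches at the start of every line.
fun countLineStarts(text: String, multiline: Boolean): Int {
    val options = if (multiline) setOf(RegexOption.MULTILINE) else emptySet()
    return Regex("^[a-z]+", options).findAll(text).count()
}

// With IGNORE_CASE, an "uppercase letters only" pattern also accepts lowercase.
fun matchesUppercase(input: String, ignoreCase: Boolean): Boolean {
    val options = if (ignoreCase) setOf(RegexOption.IGNORE_CASE) else emptySet()
    return Regex("[A-Z]+", options).matches(input)
}
```

For the three-line string "one\ntwo\nthree", countLineStarts gives 1 without MULTILINE and 3 with it. Likewise matchesUppercase("hello", ignoreCase = false) is false, but it becomes true once IGNORE_CASE is switched on.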

Features

RegexToolbox supports the most commonly used features of regular expressions. Some advanced features that are rarely used are omitted for the sake of simplicity, but they may be added in future if there's enough demand. (Or of course you can fork this repo and add all you like. 😃)

The current features of RegexToolbox are described in the following pages:

  • Elements. These are the building blocks that we make regexes from: things like letters, numbers, whitespace and so on.
  • Quantifiers. These are used to match multiple occurrences of an element in a regex.
  • Groups. These are used either a) to bunch together a set of elements so you can apply quantifiers to the whole lot, or b) to "remember" part of a regex so you can extract it from the match later. Or both of the above.
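If the two uses of groups sound abstract, here's a small sketch in bare regex syntax, using only the Kotlin standard library rather than the builder. The pattern and the extractUser name are illustrative assumptions, not part of RegexToolbox.kt.

```kotlin
// Hypothetical helper: pulls the part before the @ out of a simple email-like token.
fun extractUser(text: String): String? {
    // (\w+)       a capturing group: "remembers" the user part for later
    // (?:\w+\.)+  a non-capturing group: bunches "word + dot" together so the
    //             OneOrMore-style quantifier (+) applies to the whole lot
    val emailLike = Regex("""(\w+)@(?:\w+\.)+\w+""")

    // groupValues[0] is the whole match; groupValues[1] is the first captured group.
    return emailLike.find(text)?.groupValues?.get(1)
}
```

For the input "Contact alice@mail.example.com today" this returns alice, and it returns null when nothing matches.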

Download from JitPack