Regular expressions are power tools for strings. Actually, it may be more accurate to say that regular expressions are the bits for the power tools, as the tools themselves are functions. Many people are put off, and rightly so, by the presentation of dense, obscure code. However, generous use of comments can make even the most twisted regular-expression code quite comprehensible.
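Here is a small illustration of how comments help. This minimal sketch assumes a current version of stringr, whose regex() modifier accepts comments = TRUE. In that mode, whitespace inside the pattern is ignored and anything after a # on a line is treated as a comment. The date format and sample sentence are hypothetical and were chosen only for the example.

```r
library(stringr)

# A commented pattern for ISO-style dates such as "2015-03-14" (hypothetical example).
# With comments = TRUE, whitespace in the pattern is ignored and text after # is a comment.
date_pattern <- regex("
  (\\d{4})  # year: four digits
  -         # a literal hyphen
  (\\d{2})  # month: two digits
  -         # a literal hyphen
  (\\d{2})  # day: two digits
", comments = TRUE)

# str_match() returns the full match in column 1, then one column per group.
date_parts <- str_match("The report was filed on 2015-03-14.", date_pattern)
date_parts
```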
Regular expressions were popularized through the grep utility in UNIX. For this author, this is one of the few things he learned in the mid-1990s that he still uses regularly (pardon the pun). Any time you need to manipulate text, proper use of regular expressions will almost assuredly make the job easier and more robust.
At its most basic, a regular expression is used with functions to identify a pattern within a string, and sometimes to divide the matched text into groups. Functions can also use regular expressions to extract, manipulate, and/or substitute text back into the original string.
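The operations described above map onto a handful of stringr functions. This is a minimal sketch that assumes stringr is installed. The sample strings are invented for illustration.

```r
library(stringr)

# Hypothetical sample strings, invented for this illustration.
labels <- c("cat: 3", "hat: 12", "bat: 7")

# Identify: which strings contain "hat" or "cat"?
str_detect(labels, "[hc]at")          # TRUE TRUE FALSE

# Divide into groups: column 2 holds the word, column 3 the number.
parts <- str_match(labels, "([a-z]+): (\\d+)")
parts[, 2]                            # "cat" "hat" "bat"
parts[, 3]                            # "3"   "12"  "7"

# Extract: pull out just the digits.
str_extract(labels, "\\d+")           # "3" "12" "7"

# Substitute: replace each number with a placeholder.
str_replace(labels, "\\d+", "N")      # "cat: N" "hat: N" "bat: N"
```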
Here are some example regular expressions, "borrowed" from the excellent Wikipedia page:
- hat matches "hat"
- [hc]at matches "hat" and "cat"
- .at matches any three-character string ending with "at", including "hat", "cat", and "bat"
- [^b]at matches all strings matched by .at except "bat"
- [^hc]at matches all strings matched by .at other than "hat" and "cat"
- [hc]at$ matches "hat" and "cat", but only at the end of the string or line
- ^[hc]at matches "hat" and "cat", but only at the beginning of the string or line
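You can try the patterns above directly with stringr's str_subset(), which keeps only the elements of a vector that match. The word list in this sketch is made up for illustration. str_subset() looks for the pattern anywhere inside each string, so longer words such as "that" and "hats" also match some patterns unless the ^ and $ anchors rule them out.

```r
library(stringr)

# A hypothetical word list, invented for this example.
candidates <- c("hat", "cat", "bat", "mat", "that", "hats")

str_subset(candidates, "hat")      # "hat" "that" "hats"
str_subset(candidates, "[hc]at")   # "hat" "cat" "that" "hats"
str_subset(candidates, ".at")      # all six words
str_subset(candidates, "[^b]at")   # "hat" "cat" "mat" "that" "hats"
str_subset(candidates, "[^hc]at")  # "bat" "mat"
str_subset(candidates, "[hc]at$")  # "hat" "cat" "that"
str_subset(candidates, "^[hc]at")  # "hat" "cat" "hats"
```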
This repository is meant to be only the briefest of introductions to the richness and capability of regular expressions. The goal is to allow the user to get started with regular expressions and with the functions in the R package stringr that use them.
This repository demonstrates the use of regular expressions in R. Toward this end, three activities are proposed to the user:
- Install -- Download the requisite R packages.
- Learn -- Watch a series of YouTube videos and follow along with the R code.
- Practice -- Perform a series of exercises with regular expressions in R.
This repository is based on a number of packages written by Hadley Wickham. The format of the exercise documentation is inspired by the problem sets for Andrew Ng's course on Machine Learning, offered on Coursera. Paul Buda and Sylvain Marié also provided valuable feedback on the presentation of this wiki.
Tony Gray asked about lookaround regular expressions. As he noted, these can be useful because they do not "consume" characters. A few notes on lookarounds using stringr:
- In early versions of stringr, lookarounds required Perl-style regular expressions. You had to wrap the pattern in the perl() function.
- In those versions, str_match() and str_match_all() could not be used with lookarounds, because the base-R function they wrapped did not support Perl-style expressions.
- Since stringr 1.0.0, patterns are handled by the ICU regular-expression engine (through the stringi package), which supports lookarounds directly. perl() was deprecated in favor of regex().

A quick example (the remaining lookahead and lookbehind forms are left to the reader):
> str_replace_all("baseball base", "base(?=ball)", "dodge")
[1] "dodgeball base"
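For readers who want a head start on those remaining forms, here is a minimal sketch. It assumes a current version of stringr, where lookarounds work without perl() and str_match() accepts them. The replacement words are arbitrary.

```r
library(stringr)

x <- "baseball base"

# Negative lookahead: "base" only when it is NOT followed by "ball".
str_replace_all(x, "base(?!ball)", "camp")    # "baseball camp"

# Positive lookbehind: "ball" only when it is preceded by "base".
str_replace_all(x, "(?<=base)ball", "line")   # "baseline base"

# Negative lookbehind: "base" only when it is NOT preceded by "ball ".
str_replace_all(x, "(?<!ball )base", "foot")  # "football base"

# The lookbehind is not part of the match, so only "ball" is extracted.
str_extract(x, "(?<=base)ball")               # "ball"

# str_match() with a lookahead: the match stops just before " base".
str_match(x, "(\\w+)(?= base)")               # full match and group are both "baseball"
```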