# Regular expressions

Regular expressions are a mini language for specifying search patterns and are an indispensable
tool for handling text data in any programming language. Here we will briefly touch on some use cases in Python.
For more details, please visit the Python [documentation](https://docs.python.org/3.7/library/re.html). This [short introduction](https://realpython.com/regex-python/) can also be helpful.


In [1]:
import re

Finding a match in a string

In [4]:
## Test if the string contains the substring "be"

has_match = re.search(r"be", "Let it be")

if has_match:
    print("Success!")

Success!
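
On success, `re.search` returns a match object rather than a plain boolean, and `None` otherwise. As a minimal sketch (the variable names here are only illustrative), the matched text and its position can be read back from that object:

```python
import re

# re.search returns a match object on success and None otherwise
found = re.search(r"be", "Let it be")

if found:
    matched_text = found.group()  # the text that matched
    start, end = found.span()     # start and end index of the match
    print(matched_text, start, end)
```

Testing the result with `if found:` before calling `.group()` avoids an `AttributeError` when there is no match.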


Sometimes you want to ignore case when matching. You can specify this behaviour by passing the `re.IGNORECASE` flag.

In [10]:
# Note: re.match only matches at the beginning of the string
has_match = re.match(r"be", "Be it as it may", re.IGNORECASE)

print("Success" if has_match else "No match")


Success
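
The example above works because the pattern occurs at the very start of the string. A small sketch of the difference between `re.match` and `re.search`, using an illustrative string where the pattern only appears at the end:

```python
import re

text = "Let it be"

# re.match only tries the pattern at position 0, so this finds nothing
at_start = re.match(r"be", text, re.IGNORECASE)

# re.search scans the whole string and returns the first match
anywhere = re.search(r"BE", text, re.IGNORECASE)

print(at_start)          # None
print(anywhere.group())  # be
```

If the pattern may appear anywhere in the string, `re.search` is usually the safer choice.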


Find all words of length 4 that start with g. The character class `\w` matches
word characters, and `\b` matches a word boundary, so the pattern cannot match inside a longer word.

In [12]:
matches = re.findall(r"\bg\w{3}\b", "The goal was to catch the goat.")
print(matches)

['goal', 'goat']
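
A pattern such as `g\w{3}` without word boundaries also matches the first four characters of a longer word. The following sketch, on a made-up sentence, compares it with a version anchored by `\b`:

```python
import re

text = "The goalkeeper fed the goat."

# Without boundaries the pattern also matches the start of "goalkeeper"
loose = re.findall(r"g\w{3}", text)

# \b anchors the pattern to the start and end of a word
strict = re.findall(r"\bg\w{3}\b", text)

print(loose)   # ['goal', 'goat']
print(strict)  # ['goat']
```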


The character class `\d` matches decimal digits.

In [16]:
matches = re.findall(r"\d+", "Find all integers like 2 and 301 here.")
print(matches)

['2', '301']
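
Note that `re.findall` returns strings, not numbers. A minimal sketch of converting the matches to integers before doing arithmetic on them:

```python
import re

text = "Find all integers like 2 and 301 here."

# findall returns strings, so convert them explicitly
numbers = [int(token) for token in re.findall(r"\d+", text)]

print(numbers)       # [2, 301]
print(sum(numbers))  # 303
```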


Find the shortest substrings starting with g and ending with either t or l. The non-greedy quantifier `*?` matches as few characters as possible.


In [22]:
matches = re.findall(r"g.*?[tl]", "The goal was to catch the goat.")
print(matches)

['goal', 'goat']
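
The `?` after `*` makes the quantifier non-greedy. As a short sketch of what that changes, here are the greedy and non-greedy forms of a pattern that ends in t or l, run on the same sentence:

```python
import re

text = "The goal was to catch the goat."

# Greedy: .* consumes as much as possible, then backtracks to the last t or l
greedy = re.findall(r"g.*[tl]", text)

# Non-greedy: .*? stops at the first t or l it can reach
lazy = re.findall(r"g.*?[tl]", text)

print(greedy)  # ['goal was to catch the goat']
print(lazy)    # ['goal', 'goat']
```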


Use `re.search` with a named group to find the first substring starting with @ and return it as a dictionary.


In [7]:
# re.search returns a single match object for the first occurrence only
match = re.search(r"(?P<mention>@\w+)", "Hi, @Ann23, @Patrick")
print(match.groupdict())

{'mention': '@Ann23'}
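
Only the first mention is returned, because `re.search` stops at the first match. One possible way to collect every mention is `re.finditer`, which yields a match object per occurrence; this is a sketch, not the only approach:

```python
import re

text = "Hi, @Ann23, @Patrick"

# finditer yields one match object per non-overlapping occurrence
mentions = [m.groupdict() for m in re.finditer(r"(?P<mention>@\w+)", text)]

print(mentions)  # [{'mention': '@Ann23'}, {'mention': '@Patrick'}]
```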


Use a regular expression to compress runs of whitespace within a string into single spaces.


In [8]:
cleaned_string = re.sub(r"\s+", " ", "A     string                       with lots  of white space.")
print(cleaned_string)

A string with lots of white space.
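
Because `\s` also matches tabs and newlines, the same substitution handles mixed whitespace. The sketch below uses an illustrative string with leading and trailing whitespace, which `re.sub` reduces to a single space and `strip()` then removes; the `split`/`join` idiom is shown as a regex-free alternative:

```python
import re

messy = "  A   string\twith\n mixed   white space.  "

# Collapse every run of whitespace, then trim the ends
collapsed = re.sub(r"\s+", " ", messy).strip()

# Equivalent result without a regular expression
split_join = " ".join(messy.split())

print(collapsed)  # A string with mixed white space.
```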


Use a regular expression with backreferences to swap the first and last names in the following string:


In [9]:
cleaned_string = re.sub(r"(\w+) (\w+)", r"\2, \1", "Mike Santori")
cleaned_string

'Santori, Mike'
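
Numbered backreferences such as `\1` and `\2` become hard to read as patterns grow. As a sketch of an alternative, named groups can be referenced in the replacement with `\g<name>`; the group names `first` and `last` are chosen here for illustration:

```python
import re

# Compile once so the pattern can be reused on several strings
name_pattern = re.compile(r"(?P<first>\w+) (?P<last>\w+)")

# \g<name> refers to a named group in the replacement string
swapped = name_pattern.sub(r"\g<last>, \g<first>", "Mike Santori")

print(swapped)  # Santori, Mike
```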