# Introduction to Regular Expressions
##### By Avi Rabin

### Also known as Regex

Regex is a powerful tool for pulling the pieces you need out of a given text.  Python has a built-in module called "re" that you can import to use regex.  I'm going to use its "findall" function to illustrate how regex works.

In [8]:
import re

Now we can apply "findall".  Basically, re.findall is a function that takes two arguments: the pattern (template) you are using to find matches, and the string you are applying the pattern to.  It returns a list of every match it finds.

Say we are given the string "Hello, my name is Avi!" and we want to find every instance of the letter "m".  We can do this by typing re.findall('m', 'Hello, my name is Avi!').  Note that it will match this instance literally, so uppercase "M" won't be caught.

In [4]:
re.findall('m', 'Hello, my name is Avi!')

['m', 'm']
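To see the case sensitivity in action, here is a small sketch.  The sample string `text` is made up for illustration, and `re.IGNORECASE` is a standard flag in the `re` module that tells the pattern to ignore case.

```python
import re

# Made-up sample string with one uppercase "M" and three lowercase "m"s
text = "Mom, my name is Avi!"

# A plain literal pattern is case sensitive: only lowercase "m" is found
lower_only = re.findall('m', text)
print(lower_only)  # ['m', 'm', 'm']

# With the re.IGNORECASE flag, "M" and "m" are both matched
any_case = re.findall('m', text, flags=re.IGNORECASE)
print(any_case)    # ['M', 'm', 'm', 'm']
```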

You can mess around with this on your own to see what you come up with.  Tools like regexr.com and regex101.com are also helpful for seeing what your regex pattern will match.  I really recommend you do this so that you understand the full effect of regex.  Regex has special characters that make it so helpful and powerful, and I'm going to go over some of the main ones.

# .

The "." (period) character matches any single character except a newline.

In [4]:
re.findall('number is: .', 'My favorite single digit number is: 5')

['number is: 5']
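Here is a quick sketch of the newline exception.  The two-line string `text` is just an assumption for the demo, and `re.DOTALL` is a standard `re` flag that lets "." match newlines too.

```python
import re

# Made-up string: "a", then a newline, then "b"
text = "a\nb"

# By default "." matches "a" and "b" but skips the newline between them
no_newline = re.findall('.', text)
print(no_newline)    # ['a', 'b']

# With the re.DOTALL flag, "." matches the newline as well
with_newline = re.findall('.', text, flags=re.DOTALL)
print(with_newline)  # ['a', '\n', 'b']
```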

# ^

The "^" character matches the position at the start of the string.

In [5]:
re.findall('^favorite', 'My favorite single digit number is: 5') # favorite isn't the first word, so it doesn't match

[]

In [6]:
re.findall('^My', 'My favorite single digit number is: 5')

['My']

# $

The "$" character matches the position at the end of the string.

In [7]:
re.findall('last$', 'Matches a word that is last')

['last']
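As a small sketch of how the two anchors behave, the made-up strings below show "$" failing when the word isn't at the end, and "^" and "$" used together to require that the whole string is the word.

```python
import re

# "last" is at the start here, not the end, so "last$" finds nothing
not_at_end = re.findall('last$', 'last is not the final word')
print(not_at_end)    # []

# Using both anchors means the entire string has to be exactly "last"
whole_string = re.findall('^last$', 'last')
print(whole_string)  # ['last']

# Anything extra after "last" breaks the match
partial = re.findall('^last$', 'last word')
print(partial)       # []
```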

# ?

The "?" character matches the preceding character 0 or 1 times.

In [6]:
re.findall('Goodly?', 'Goodly Goodl Good beegl')

['Goodly', 'Goodl']

# *

The "*" character matches the preceding character 0 or more times.

In [9]:
re.findall("Hello*", "Helloooooo, my name is Avi")

['Helloooooo']
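The "0 times" part is easy to miss, so here is a short sketch with a made-up string.  Since "*" only applies to the final "o", plain "Hell" with no o's at all still counts as a match.

```python
import re

# Made-up string with zero, one, and three o's after "Hell"
text = "Hell, Hello, Hellooo"

# "o*" accepts zero o's, so all three variations match
matches = re.findall("Hello*", text)
print(matches)  # ['Hell', 'Hello', 'Hellooo']
```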

# +

The "+" character matches the preceding character 1 or more times.

In [10]:
re.findall("Hello+", "Hell, my name is Avi") # doesn't match because we need 1 or more o's after Hell

[]

In [11]:
re.findall("Hello+", "Helloooooo, my name is Avi")

['Helloooooo']

# {#}

The "{" and "}" characters set how many times the preceding character must repeat.  You can give a range (e.g. {2,5}, written without a space) or a single exact count (e.g. {4}).

In [12]:
re.findall("Hello{4,8}", "This matches at least 4 and up to 8 o's after Hellooooo")

['Hellooooo']
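Here is a rough sketch comparing an exact count with a range.  The string of o's is made up for the demo.  Notice that a range grabs as many characters as it is allowed to before moving on.

```python
import re

# Made-up string with runs of 2, 3, and 6 o's
text = "oo ooo oooooo"

# {3} needs exactly three o's in a row; the run of 6 yields two matches
exactly_three = re.findall("o{3}", text)
print(exactly_three)  # ['ooo', 'ooo', 'ooo']

# {2,4} takes between 2 and 4 o's, as many as it can each time
two_to_four = re.findall("o{2,4}", text)
print(two_to_four)    # ['oo', 'ooo', 'oooo', 'oo']
```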

# (#)

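The "(" and ")" characters group a sequence of characters and capture it.  When the pattern contains a group, re.findall returns only the captured text rather than the whole match.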

In [13]:
re.findall(": (10)", "What's my favorite number? It's: 10. I also like 10") # You can see there's only 1 match here

['10']
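To make the effect of the parentheses clearer, here is a sketch that runs the same string with no group, one group, and two groups.  The two-group pattern is my own addition for illustration.

```python
import re

text = "What's my favorite number? It's: 10. I also like 10"

# No group: findall returns the whole match, including the ": "
whole = re.findall(": 10", text)
print(whole)       # [': 10']

# One group: findall returns only what the group captured
one_group = re.findall(": (10)", text)
print(one_group)   # ['10']

# Two groups: findall returns one tuple per match
two_groups = re.findall("(like) (10)", text)
print(two_groups)  # [('like', '10')]
```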

# [#]

The "[" and "]" characters define a set of characters, where any one character inside the set can be matched.

In [14]:
re.findall("My favorite number is [123]", "My favorite number is 4") # 4 isn't in the set, so it doesn't match

[]

In [15]:
re.findall("My favorite number is [123]", "My favorite number is 2")

['My favorite number is 2']
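A set only ever matches one character at a time.  This sketch, using made-up strings, shows that on its own and then combined with "+" to match a run of characters from the set.

```python
import re

# On its own, [123] matches one character at a time; 4 is not in the set
single = re.findall("[123]", "My favorite numbers are 1, 2, and 4")
print(single)  # ['1', '2']

# Adding "+" matches a run of one or more characters from the set
runs = re.findall("[123]+", "321 and 45")
print(runs)    # ['321']
```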

# \d \w \s \n

<b> \d </b> matches a single character that is a digit.  <b>\D</b> matches anything that isn't a digit. <br>
<b> \w </b> matches a single character that is a word character.  <b>\W</b> matches anything that is not a word character.<br>
<b> \s </b> matches a whitespace character. <b>\S</b> matches anything that is not a whitespace character.<br>
<b> \n </b> matches a newline character.

In [16]:
re.findall(r"My favorite number is: \d", "My favorite number is: 522") # the r prefix makes this a raw string, so Python leaves the backslash alone

['My favorite number is: 5']

In [17]:
re.findall(r"My favorite number is: [\d]*", "My favorite number is: 522")

['My favorite number is: 522']

In [18]:
re.findall(r"My name is \D\D\D", "My name is Avi")

['My name is Avi']

In [19]:
re.findall(r"My name is [\w]*", "My name is Avi")

['My name is Avi']

In [20]:
re.findall(r"He\w\wo \world", "Hello world") # the \w in "\world" matches the "w" in "world"

['Hello world']

In [21]:
re.findall(r"Hello\s\Shere!", "Hello there!")

['Hello there!']
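Here is one more sketch that exercises \w, \W, \d, and \n on a single made-up string.  I'm writing the patterns as raw strings (the r prefix) so Python passes the backslashes through to regex untouched; newer Python versions emit a warning for escapes like "\d" in a regular string.

```python
import re

# Made-up two-line string
text = "Avi: 522\nBye!"

# \w+ grabs runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", text)
print(words)     # ['Avi', '522', 'Bye']

# \W grabs every single character that is not a word character
non_words = re.findall(r"\W", text)
print(non_words) # [':', ' ', '\n', '!']

# \d+ grabs runs of digits
digits = re.findall(r"\d+", text)
print(digits)    # ['522']

# \n matches the newline itself
newlines = re.findall(r"\n", text)
print(newlines)  # ['\n']
```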

Try to understand all of the above examples.  Regex is incredibly powerful and versatile so it's a valuable skill to have mastered!

# Your turn!

Try to grab my favorite number, my favorite word, and every instance of my name with exactly 3 i's following it, regardless of capitalization.

In [9]:
s = "My favorite number is 14.  My friend's favorite number is 13.  I do not like the number 12.  I \
dislike the word 'dislike' but I like the word 'like'.  My name is Avi.  Avii avii AvII aviii avIII AvIiii."

In [23]:
# your turn

In [None]:
re.findall("", s) # fill in your pattern between the quotes