# Meet Regular Expressions

## What?
- Regular expressions (also called regexes or regex patterns) are a small language for describing patterns in text.
- With regex patterns we can:
    - Check whether a string matches a pattern
    - Find a match for the pattern anywhere in a string
    - Modify and split strings in various ways
    
    
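Python `re` library functions:
- `re.search` scans through a string and returns the first location where the pattern matches.
- `re.findall` finds all substrings where the pattern matches and returns them as a list.
- `re.split` splits a string wherever the pattern matches and returns the pieces as a list.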
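
The other two capabilities listed above, checking whether a whole string matches and modifying text, can be sketched with `re.fullmatch` and `re.sub`. The sample strings below are made up for illustration and are not part of the lesson data.

```python
import re

# re.fullmatch succeeds only if the entire string matches the pattern.
zip_ok = re.fullmatch(r"\d{5}", "78205")
zip_bad = re.fullmatch(r"\d{5}", "78205-1234")
print(zip_ok is not None)   # True
print(zip_bad is None)      # True

# re.sub replaces every match of the pattern with the replacement text.
cleaned = re.sub(r"\s+", " ", "too    many   spaces")
print(cleaned)              # too many spaces
```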


## So What?
- Power and precision
    - The cost is learning something new and potentially unfamiliar.
    - The payoff is a pattern language that carries over to most programming languages and text tools.
- Regular expressions are cross-platform and available in many popular programming languages.


## Now What?


In [1]:
import re

### Patterns to Match Literals 

In [2]:
string = "Two households, both alike in dignity, In fair Verona, where we lay our scene, From ancient grudge break to new mutiny, Where civil blood makes civil hands unclean."
string

'Two households, both alike in dignity, In fair Verona, where we lay our scene, From ancient grudge break to new mutiny, Where civil blood makes civil hands unclean.'

In [3]:
# We can search for a literal match of the string Verona
re.search(r"Verona", string)

<re.Match object; span=(47, 53), match='Verona'>

In [4]:
# The span holds the start and end indices of the match.
# Slicing the string with the span bounds returns the matched text.
string[47:53]

'Verona'
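
Rather than typing the indices by hand, the bounds can be read from the match object itself. A minimal sketch, using a shortened copy of the string (the variable names `match`, `start`, and `end` are chosen here for illustration):

```python
import re

string = "Two households, both alike in dignity, In fair Verona"
match = re.search(r"Verona", string)

# span() returns (start, end); the same values come from start() and end().
start, end = match.span()
print(start, end)           # 47 53
print(string[start:end])    # Verona

# group() returns the matched text directly, with no slicing needed.
print(match.group())        # Verona
```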

In [5]:
re.search(r"In fair Verona", string)

<re.Match object; span=(39, 53), match='In fair Verona'>

In [6]:
# The string "Leonardo DiCaprio" is not here, so re.search returns None
re.search(r"Leonardo DiCaprio", string)
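
Because a failed search returns `None`, calling a method such as `.group()` on the result would raise an `AttributeError`. One common way to guard against that is sketched below, with a shortened copy of the string and a hypothetical `result` variable:

```python
import re

string = "In fair Verona, where we lay our scene"
match = re.search(r"Leonardo DiCaprio", string)

# Check for None before touching the match object.
if match is None:
    result = "no match"
else:
    result = match.group()
print(result)  # no match
```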

In [7]:
# re.search returns the first match
re.search(r"civil", string)

<re.Match object; span=(126, 131), match='civil'>

In [8]:
# .findall returns all matches
re.findall(r"civil", string)

['civil', 'civil']

In [9]:
# re.findall returns an empty list when there are no matches
re.findall(r"Claire Danes", string)

[]

In [10]:
re.search(r"Two", string)

<re.Match object; span=(0, 3), match='Two'>

In [11]:
# Are computers particular? Matching is case-sensitive by default, so this returns None
re.search(r"two", string)

In [12]:
# The re.IGNORECASE flag does exactly that
re.search(r"two", string, re.IGNORECASE)

<re.Match object; span=(0, 3), match='Two'>
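
The same flag can be applied in two other ways that may be handy when a pattern is reused: compiling it once with `re.compile`, or embedding the inline flag `(?i)` at the start of the pattern. A short sketch on a shortened copy of the string (the names `pattern`, `first`, and `second` are illustrative):

```python
import re

string = "Two households, both alike in dignity"

# Compile once with the flag, then reuse the pattern object.
pattern = re.compile(r"two", re.IGNORECASE)
first = pattern.search(string).group()
print(first)   # Two

# The inline flag (?i) at the start of the pattern has the same effect.
second = re.search(r"(?i)two", string).group()
print(second)  # Two
```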

In [13]:
# \s matches one whitespace character; .{5} matches any five characters
re.findall(r"civil\s.{5}", string)


['civil blood', 'civil hands']
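
The `.{5}` pattern works here only because both following words happen to be five letters long. A hedged alternative is a capture group with `\w+`, which grabs a following word of any length; when the pattern has exactly one group, `re.findall` returns just the captured text. The shortened string below is for illustration:

```python
import re

string = "Where civil blood makes civil hands unclean."

# With one capture group, re.findall returns only what the group matched.
words = re.findall(r"civil\s(\w+)", string)
print(words)  # ['blood', 'hands']
```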

In [14]:
# The | operator means OR: match either alternative
re.findall(r"gray|grey", "I can't remember if you spell grey gray or gray like grey!")

['grey', 'gray', 'gray', 'grey']
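
When the alternatives differ by a single character, a character class is a more compact way to write the same thing. A small sketch, with `text`, `with_class`, and `with_or` as illustrative names:

```python
import re

text = "I can't remember if you spell grey gray or gray like grey!"

# [ae] matches exactly one character that is either 'a' or 'e'.
with_class = re.findall(r"gr[ae]y", text)
with_or = re.findall(r"gray|grey", text)
print(with_class)             # ['grey', 'gray', 'gray', 'grey']
print(with_class == with_or)  # True
```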

In [23]:
re.split(r"\s", "this that and the other")

['this', 'that', 'and', 'the', 'other']

In [17]:
re.split(r"-", "210-226-3232")

['210', '226', '3232']
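
Where `re.split` goes beyond `str.split` is splitting on several delimiters at once. The sketch below assumes phone numbers might be separated by dashes, dots, or spaces; the extra sample numbers are made up for illustration:

```python
import re

numbers = ["210-226-3232", "210.226.3232", "210 226 3232"]

# Inside a character class, a leading '-' and the '.' are literal characters.
parts = [re.split(r"[-.\s]", n) for n in numbers]
for p in parts:
    print(p)  # ['210', '226', '3232'] each time
```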

In [22]:
# Parse these songs into a dataframe containing 2 columns: artist_name and song_name
# Hint: break the string into an array of strings that hold each song/artist record
songs = "Harry_Belafonte_-_Jump_In_the_Line.mp3,Willie_Mae_'Big_Mama'_Thornton_-_Hound_Dog.mp3,Tina_Turner_-_Proud_Mary.mp3,Prince_-_Purple_Rain.mp3"
songs

"Harry_Belafonte_-_Jump_In_the_Line.mp3,Willie_Mae_'Big_Mama'_Thornton_-_Hound_Dog.mp3,Tina_Turner_-_Proud_Mary.mp3,Prince_-_Purple_Rain.mp3"
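
One possible approach to the exercise, offered as a sketch rather than the intended solution: split on commas to get one record per file, strip the `.mp3` extension, then split each record on the `_-_` separator. The `records` list and loop variable names are hypothetical. The final conversion to a dataframe is left as a comment and assumes pandas is installed.

```python
import re

songs = "Harry_Belafonte_-_Jump_In_the_Line.mp3,Willie_Mae_'Big_Mama'_Thornton_-_Hound_Dog.mp3,Tina_Turner_-_Proud_Mary.mp3,Prince_-_Purple_Rain.mp3"

records = []
for filename in re.split(r",", songs):
    # Drop the file extension at the end of the name.
    name = re.sub(r"\.mp3$", "", filename)
    # "_-_" separates the artist from the song title.
    artist, song = re.split(r"_-_", name)
    records.append({
        "artist_name": artist.replace("_", " "),
        "song_name": song.replace("_", " "),
    })

for record in records:
    print(record)

# Assumed last step, if pandas is available:
# import pandas as pd
# df = pd.DataFrame(records)
```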