# Advanced Regular Expressions Lab

Complete the following set of exercises to solidify your knowledge of regular expressions.

In [1]:
import re

### 1. Use a regular expression to find and extract all vowels in the following text.

In [2]:
text = "This is going to be a sentence with a good number of vowels in it."

In [3]:
pattern = '[aeiou]'
print(re.findall(pattern, text))

['i', 'i', 'o', 'i', 'o', 'e', 'a', 'e', 'e', 'e', 'i', 'a', 'o', 'o', 'u', 'e', 'o', 'o', 'e', 'i', 'i']


### 2. Use a regular expression to find and extract all occurrences of both forms (singular and plural) of the word "puppy" in the text below.

In [4]:
text = "The puppy saw all the rest of the puppies playing and wanted to join them. I saw this and wanted a puppy of my own!"

In [5]:
pattern = r'\bpuppy\b|\bpuppies\b'
print(re.findall(pattern, text))

['puppy', 'puppies', 'puppy']
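
Because both alternatives share the stem "pupp", the same match can be written with a non-capturing group. The following is a minimal sketch rather than the lab's reference answer; `sample` and `grouped_pattern` are names made up for this illustration.

```python
import re

# Hypothetical sample text, not the lab text
sample = "The puppy saw the puppies. Puppy love!"

# (?:...) groups the alternatives without capturing, so findall still
# returns the whole match rather than only the group contents.
grouped_pattern = r'\bpupp(?:y|ies)\b'

singular_and_plural = re.findall(grouped_pattern, sample)
any_case = re.findall(grouped_pattern, sample, flags=re.IGNORECASE)

print(singular_and_plural)
print(any_case)
```

The second call shows that the capitalized "Puppy" is only picked up when `re.IGNORECASE` is passed.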


### 3. Use a regular expression to find and extract all tenses (present and past) of the word "run" in the text below.

In [6]:
text = "I ran the relay race the only way I knew how to run it."

In [7]:
pattern = r'\brun\b|\bran\b'
print(re.findall(pattern, text))

['ran', 'run']


### 4. Use a regular expression to find and extract all words that begin with the letter "r" from the previous text.

In [8]:
pattern = r'\b[rR]\w+'

print(re.findall(pattern, text))

['ran', 'relay', 'race', 'run']


### 5. Use a regular expression to find and substitute the letter "i" for the exclamation marks in the text below.

In [9]:
text = "Th!s !s a sentence w!th spec!al characters !n !t."

In [10]:
# Match each exclamation mark on its own so that a run such as "!!" would become "ii"
pattern = r'!'
replacement = 'i'

print(re.sub(pattern, replacement, text))

This is a sentence with special characters in it.
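
Whether a quantifier belongs in this pattern depends on what should happen to consecutive exclamation marks. The sketch below compares the two choices on a made-up string (`sample` is not part of the lab text).

```python
import re

# Hypothetical text in which two marks appear side by side
sample = "Sk!!ng !s fun"

one_per_mark = re.sub(r'!', 'i', sample)   # every "!" becomes one "i"
one_per_run = re.sub(r'!+', 'i', sample)   # every run of "!" becomes one "i"

print(one_per_mark)
print(one_per_run)
```

On the lab text the two patterns give the same result, because no two exclamation marks are adjacent there.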


### 6. Use a regular expression to find and extract words longer than 4 characters in the text below.

In [11]:
text = "This sentence has words of varying lengths."

In [12]:
# "Longer than 4 characters" means at least 5 word characters
pattern = r'\b\w{5,}\b'

print(re.findall(pattern, text))

['sentence', 'words', 'varying', 'lengths']


### 7. Use a regular expression to find and extract all occurrences of the letter "b", some letter(s), and then the letter "t" in the sentence below.

In [13]:
text = "I bet the robot couldn't beat the other bot with a bat, but instead it bit me."

In [14]:
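# Matches anywhere inside a word, so the "bot" in "robot" is included
pattern = r'b\w+t'
# To match whole words only, anchor with word boundaries: r'\b[bB]\w+t\b'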

print(re.findall(pattern, text))

['bet', 'bot', 'beat', 'bot', 'bat', 'but', 'bit']
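
The second "bot" in the output comes from inside "robot". A minimal sketch of how word boundaries change the result, using a shortened, made-up version of the sentence (`sample`, `anywhere`, and `whole_words` are illustrative names):

```python
import re

sample = "I bet the robot couldn't beat the other bot with a bat."

# Without boundaries the match may start in the middle of a word
anywhere = re.findall(r'b\w+t', sample)

# \b on both sides restricts matches to whole words
whole_words = re.findall(r'\bb\w+t\b', sample)

print(anywhere)
print(whole_words)
```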


### 8. Use a regular expression to find and extract all words that contain either "ea" or "eo" in them.

In [15]:
text = "During many of the peaks and troughs of history, the people living it didn't fully realize what was unfolding. But we all know we're navigating breathtaking history: Nearly every day could be — maybe will be — a book."

In [16]:
# A set covers both cases: "e" followed by either "a" or "o"
pattern = r'\w*e[ao]\w*'

print(re.findall(pattern, text))

['peaks', 'people', 'realize', 'breathtaking', 'Nearly']


### 9. Use a regular expression to find and extract all the capitalized words in the text below individually.

In [17]:
text = "Teddy Roosevelt and Abraham Lincoln walk into a bar."

In [18]:
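# A capital letter at a word boundary, followed by the rest of the word
pattern = r'\b[A-Z]\w*'

print(re.findall(pattern, text))

['Teddy', 'Roosevelt', 'Abraham', 'Lincoln']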


### 10. Use a regular expression to find and extract all the sets of consecutive capitalized words in the text above.

In [19]:
pattern = r'\b[A-Z][A-Za-z]*\s[A-Z][A-Za-z]*\b'

print(re.findall(pattern, text))

['Teddy Roosevelt', 'Abraham Lincoln']
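
The pattern above pairs exactly two capitalized words. If runs of three or more should stay together, one option is to repeat the "whitespace plus capitalized word" part with a quantifier. This is a sketch under that assumption, on a made-up sentence:

```python
import re

# Hypothetical sentence containing a three-word name
sample = "Franklin Delano Roosevelt met Abraham Lincoln in Paris."

# Exactly two capitalized words
pairs = re.findall(r'\b[A-Z][A-Za-z]*\s[A-Z][A-Za-z]*\b', sample)

# One capitalized word followed by one or more further capitalized words
runs = re.findall(r'\b[A-Z][A-Za-z]*(?:\s[A-Z][A-Za-z]*)+', sample)

print(pairs)
print(runs)
```

"Paris" is skipped by both patterns because it stands alone.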


### 11. Use a regular expression to find and extract all the quotes from the text below.

*Hint: This one is a little more complex than the single quote example in the lesson because there are multiple quotes in the text.*

In [20]:
text = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'

In [21]:
pattern = r'"([^"]+)"'

print(re.findall(pattern, text))

['I will bet you $50 I can get the bartender to give me a free drink.', 'I am in!']
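
The negated set `[^"]` is what keeps each match inside a single pair of quotation marks. The sketch below contrasts it with a greedy and a lazy wildcard on a short, made-up sentence (`sample` is an illustrative name):

```python
import re

sample = 'She said "yes" and he said "no".'

greedy = re.findall(r'"(.+)"', sample)       # runs on to the last quotation mark
lazy = re.findall(r'"(.+?)"', sample)        # stops at the next quotation mark
negated = re.findall(r'"([^"]+)"', sample)   # can never cross a quotation mark

print(greedy)
print(lazy)
print(negated)
```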


### 12. Use a regular expression to find and extract all the numbers from the text below.

In [22]:
text = "There were 30 students in the class. Of the 30 students, 14 were male and 16 were female. Only 10 students got A's on the exam."

In [23]:
# Raw string so that \d reaches the regex engine unchanged
pattern = r'\d+'

print(re.findall(pattern, text))

['30', '30', '14', '16', '10']


### 13. Use a regular expression to find and extract all the social security numbers from the text below.

In [24]:
text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

In [25]:
pattern = r'\d{3}-\d{2}-\d{4}'

print(re.findall(pattern, text))

['876-93-2289', '098-32-5295']


### 14. Use a regular expression to find and extract all the phone numbers from the text above.

In [26]:
pattern = r'\(\d{3}\)\d{3}-\d{4}'

print(re.findall(pattern, text))

['(847)789-0984', '(987)222-0901']


### 15. Use a regular expression to find and extract all the formatted numbers (both social security and phone) from the text above.

In [27]:
pattern = r'(\(\d{3}\)\d{3}[-]\d{4}|\d{3}[-]\d{2}[-]\d{4})'

print(re.findall(pattern, text))

['876-93-2289', '(847)789-0984', '098-32-5295', '(987)222-0901']
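
The combined result does not say which kind of number each match is. One possible way to keep that information is to give each alternative a named group and read `lastgroup` from each match object. This is a sketch, not part of the original lab; the group names `phone` and `ssn` and the `sample` string are assumptions made for illustration.

```python
import re

sample = "SSN 876-93-2289, phone (847)789-0984."

# Each alternative gets its own named group
labeled = re.compile(
    r'(?P<phone>\(\d{3}\)\d{3}-\d{4})|(?P<ssn>\d{3}-\d{2}-\d{4})'
)

# lastgroup holds the name of the group that matched
found = [(m.lastgroup, m.group()) for m in labeled.finditer(sample)]

print(found)
```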


In [28]:
# Metacharacters
# []  : Match any one character from a set
# .   : Match any character except a newline (\n)
# ^   : 1) Negate the set when it is the first character inside []
#       2) Otherwise, match the beginning of the string
# $   : Match the end of the string
# |   : Functions as an "OR" operator (alternation)

# Character classes
# \w : Any word character (letters, digits, underscore). Equivalent to [A-Za-z0-9_]
# \W : Any non-word character. Equivalent to [^A-Za-z0-9_]
# \d : Any digit. Equivalent to [0-9]
# \D : Any non-digit. Equivalent to [^0-9]
# \s : Any whitespace character. Equivalent to [ \t\n\r\f\v]
# \S : Any non-whitespace character. Equivalent to [^ \t\n\r\f\v]

# Quantifiers (each applies to the preceding character, set, or group)
# *     : 0 or more times
# +     : 1 or more times
# ?     : 0 or 1 times (optional)
# {n}   : Exactly n times
# {n,}  : At least n times
# {n,m} : Between n and m times (inclusive)
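
The notes above can be checked with a few one-line experiments. The patterns and subject strings below are made up for this sketch; each comment names the rule being exercised.

```python
import re

# Each entry is (pattern, subject); both are illustrative
examples = [
    (r'^\w+', "hello world"),            # ^ anchors at the start of the string
    (r'\w+$', "hello world"),            # $ anchors at the end of the string
    (r'[^aeiou\s]', "a bee"),            # ^ inside [] negates the set
    (r'b.t', "bat b\nt b t"),            # . matches anything except a newline
    (r'colou?r', "color colour"),        # ? makes the "u" optional
    (r'ab*', "a ab abbb"),               # * allows zero or more "b"
    (r'go{2,3}d', "gd god good goood gooood"),  # {n,m} bounds the repeat count
]

results = {pattern: re.findall(pattern, subject) for pattern, subject in examples}

for pattern, matches in results.items():
    print(f"{pattern!r:14} -> {matches}")
```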

In [29]:
text = """
Aeromexico 800-237-6639
Air Canada 888-247-2262
Air Canada Rouge 888-247-2262
Air Creebec 800-567-6567
Air Inuit 800-361-2965
Air North 800-661-0407
Air Tindi 888-545-6794
Air Transat 866-847-1112
Alaska Airlines 800-426-0333,866-516-1685
Allegiant Air 702-505-8888
American Airlines 800-433-7300
Bearskin Airlines 807-577-1141
Buffalo Airways 867-874-3333
Calm Air 800-839-2256
Cape Air 800-227-3247
Delta Air Lines 800-455-2720
First Air 800-267-1247
Flair Airlines 204-888-2665
Frontier Airlines 801-401-9000
Harb-or-Air 800-665-0212
Hawaiian Airlines 877-426-4537
Horizon Air 800-547-9308
InterJet 866-285-8307
Island Air 800-388-1105
JetBlue 800-538-2583
Porter Airlines 888-619-8622
Silver Airways 801-401-9100
Southwest Airlines 800-435-9792
Spirit Airlines 801-401-2222
Sun Country Airlines 800-359-6786
Sunwing 877-SUN-WING
Thunder Airlines 800-803-9943
United Airlines 800-864-8331
Virgin America 877-359-8474
VivaAerobus 888-935-988 
Volaris 855-865-2747
WestJet Airlines 888-937-8538
"""


In [30]:
# Fully numeric numbers only: "877-SUN-WING" and the incomplete "888-935-988" are not matched
pattern = r'\d{3}-\d{3}-\d{4}'

print(re.findall(pattern, text))

['800-237-6639', '888-247-2262', '888-247-2262', '800-567-6567', '800-361-2965', '800-661-0407', '888-545-6794', '866-847-1112', '800-426-0333', '866-516-1685', '702-505-8888', '800-433-7300', '807-577-1141', '867-874-3333', '800-839-2256', '800-227-3247', '800-455-2720', '800-267-1247', '204-888-2665', '801-401-9000', '800-665-0212', '877-426-4537', '800-547-9308', '866-285-8307', '800-388-1105', '800-538-2583', '888-619-8622', '801-401-9100', '800-435-9792', '801-401-2222', '800-359-6786', '800-803-9943', '800-864-8331', '877-359-8474', '855-865-2747', '888-937-8538']
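
The flat list above loses the airline names and silently skips the two entries that do not fit the pattern. A possible extension, sketched here on a four-line excerpt of the list, pairs each name with its numbers and collects the lines that produced no match. The names `directory` and `unmatched` are made up for this illustration.

```python
import re

# Excerpt of the airline list, including the two entries that do not match
sample = """
Air Canada 888-247-2262
Alaska Airlines 800-426-0333,866-516-1685
Sunwing 877-SUN-WING
VivaAerobus 888-935-988
"""

phone = re.compile(r'\b\d{3}-\d{3}-\d{4}\b')

directory = {}   # airline name -> list of phone numbers
unmatched = []   # lines with no fully numeric phone number

for line in sample.strip().splitlines():
    numbers = phone.findall(line)
    if numbers:
        # Everything before the first number is treated as the airline name
        name = line[:phone.search(line).start()].strip()
        directory[name] = numbers
    else:
        unmatched.append(line.strip())

print(directory)
print(unmatched)
```

Reporting the unmatched lines makes it easier to decide whether entries such as vanity numbers need a separate pattern.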
