# Advanced Regular Expressions Lab

Complete the following set of exercises to solidify your knowledge of regular expressions.

In [1]:
import re

### 1. Use a regular expression to find and extract all vowels in the following text.

In [2]:
text = "This is going to be a sentence with a good number of vowels in it."

In [4]:
all_vowels = re.findall('[aeiou]', text)
print(all_vowels)

['i', 'i', 'o', 'i', 'o', 'e', 'a', 'e', 'e', 'e', 'i', 'a', 'o', 'o', 'u', 'e', 'o', 'o', 'e', 'i', 'i']
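Because `[aeiou]` is case-sensitive, an uppercase vowel (for example the "I" in a sentence starting with "It") would be skipped. A minimal sketch, assuming both cases should count, passes the `re.IGNORECASE` flag. The `sample` string is made up for illustration.

```python
import re

# Hypothetical sample text containing uppercase vowels
sample = "It is An Example."

# Without a flag, the class only matches lowercase vowels
lower_only = re.findall(r'[aeiou]', sample)

# re.IGNORECASE lets [aeiou] match A, E, I, O and U as well
both_cases = re.findall(r'[aeiou]', sample, flags=re.IGNORECASE)

print(lower_only)  # ['i', 'a', 'e']
print(both_cases)  # ['I', 'i', 'A', 'E', 'a', 'e']
```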


### 2. Use a regular expression to find and extract all occurrences and forms (singular and plural) of the word "puppy" in the text below.

In [13]:
text = "The puppy saw all the rest of the puppies playing and wanted to join them. I saw this and wanted a puppy of my own!"

In [14]:
all_puppies = re.findall('puppy|puppies', text)
print(all_puppies)
#  don't forget to RUN the cell above so that the variable text is reassigned to the string for this problem, not the previous one


['puppy', 'puppies', 'puppy']
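The two spellings can also be folded into a single pattern with a group. A short sketch of a common pitfall: when the pattern contains a capturing group, `re.findall` returns only the captured text, so the non-capturing form `(?:...)` is the one that returns whole words here.

```python
import re

text = "The puppy saw all the rest of the puppies playing and wanted to join them. I saw this and wanted a puppy of my own!"

# With a capturing group, findall returns only the captured part
captured = re.findall(r'pupp(y|ies)', text)

# A non-capturing group (?:...) keeps findall returning the whole match
whole = re.findall(r'pupp(?:y|ies)', text)

print(captured)  # ['y', 'ies', 'y']
print(whole)     # ['puppy', 'puppies', 'puppy']
```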


### 3. Use a regular expression to find and extract all tenses (present and past) of the word "run" in the text below.

In [15]:
text = "I ran the relay race the only way I knew how to run it."

In [16]:
all_tenses = re.findall('r[au]n', text)
print(all_tenses)


['ran', 'run']
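`r[au]n` has no anchors, so it also matches inside longer words. A sketch on a made-up sentence shows how the word-boundary token `\b` limits the matches to the standalone words:

```python
import re

# Hypothetical text where "ran"/"run" also appear inside other words
sample = "They ran to the brand new store to run errands."

loose = re.findall(r'r[au]n', sample)       # also hits "brand" and "errands"
strict = re.findall(r'\br[au]n\b', sample)  # \b on both sides: whole words only

print(loose)   # ['ran', 'ran', 'run', 'ran']
print(strict)  # ['ran', 'run']
```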


### 4. Use a regular expression to find and extract all words that begin with the letter "r" from the previous text.

In [17]:
all_r_starters = re.findall(r'\br\w*', text)  # \b anchors the match to the start of a word
print(all_r_starters)


['ran', 'relay', 'race', 'run']
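When you also need to know *where* each word occurs, `re.finditer` yields match objects instead of plain strings. A minimal sketch, assuming the word-boundary pattern `\br\w*` (an alternative to `r\w+` that cannot start mid-word and also accepts a one-letter "r"):

```python
import re

text = "I ran the relay race the only way I knew how to run it."

# Each Match object carries the matched text and its start index
positions = [(m.group(), m.start()) for m in re.finditer(r'\br\w*', text)]
print(positions)  # [('ran', 2), ('relay', 10), ('race', 16), ('run', 48)]
```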


### 5. Use a regular expression to find and substitute the letter "i" for the exclamation marks in the text below.

In [18]:
text = "Th!s !s a sentence w!th spec!al characters !n !t."

In [21]:
letter_i = re.sub('!', 'i', text)
print(letter_i)
#  (pattern, substitution, string)


This is a sentence with special characters in it.
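`re.sub` has two close relatives that are handy for this kind of cleanup. A sketch: `re.subn` returns the new string together with the number of substitutions made, and the `count` argument caps how many replacements `re.sub` performs, working left to right.

```python
import re

text = "Th!s !s a sentence w!th spec!al characters !n !t."

# re.subn returns (new_string, number_of_substitutions)
fixed, n_replaced = re.subn(r'!', 'i', text)

# count=2 replaces only the first two exclamation marks
first_two = re.sub(r'!', 'i', text, count=2)

print(fixed)       # This is a sentence with special characters in it.
print(n_replaced)  # 6
print(first_two)   # This is a sentence w!th spec!al characters !n !t.
```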


### 6. Use a regular expression to find and extract words longer than 4 characters in the text below.

In [22]:
text = "This sentence has words of varying lengths."

In [25]:
over_4 = re.findall(r'\w{5,}', text)
print(over_4)


['sentence', 'words', 'varying', 'lengths']
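`{5,}` leaves the upper bound open. A sketch, assuming you want words within a length range instead (4 to 5 characters here, chosen arbitrarily), adds an upper bound and wraps the pattern in `\b`. Without the boundaries, the first characters of longer words are returned as fragments:

```python
import re

text = "This sentence has words of varying lengths."

# \b on both sides: only whole words of 4 or 5 characters
mid_length = re.findall(r'\b\w{4,5}\b', text)

# No boundaries: longer words are chopped into 5-character pieces
no_boundary = re.findall(r'\w{4,5}', text)

print(mid_length)   # ['This', 'words']
print(no_boundary)  # ['This', 'sente', 'words', 'varyi', 'lengt']
```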


### 7. Use a regular expression to find and extract all occurrences of the letter "b", some letter(s), and then the letter "t" in the sentence below.

In [26]:
text = "I bet the robot couldn't beat the other bot with a bat, but instead it bit me."

In [27]:
b_word = re.findall(r'b\w+t', text)
print(b_word)


['bet', 'bot', 'beat', 'bot', 'bat', 'but', 'bit']
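`\w+` is greedy, which is harmless on this sentence because every word that starts with "b" and ends with "t" is short. On a hypothetical word like "bitterest", though, the greedy pattern runs to the last "t", while the lazy `\w+?` stops at the first one:

```python
import re

# Hypothetical text with a long b...t word
sample = "the bitterest bit"

greedy = re.findall(r'b\w+t', sample)  # \w+ grabs as much as it can
lazy = re.findall(r'b\w+?t', sample)   # \w+? stops at the first possible "t"

print(greedy)  # ['bitterest', 'bit']
print(lazy)    # ['bit', 'bit']
```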


### 8. Use a regular expression to find and extract all words that contain either "ea" or "eo" in them.

In [28]:
text = "During many of the peaks and troughs of history, the people living it didn't fully realize what was unfolding. But we all know we're navigating breathtaking history: Nearly every day could be — maybe will be — a book."


In [31]:
eaeo_words = re.findall(r'\w*e[ao]\w*', text)
print(eaeo_words)
#  any number of word characters + e + either a or o + any number of word characters

['peaks', 'people', 'realize', 'breathtaking', 'Nearly']
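If the same pattern is applied to many strings, it can be compiled once with `re.compile` and the resulting pattern object reused. A sketch with made-up sentences:

```python
import re

# Compile once; the pattern object has its own findall method
ea_eo = re.compile(r'\w*e[ao]\w*')

# Hypothetical sentences to scan
sentences = ["A great leopard", "No matches here", "Read the theory"]
results = [ea_eo.findall(s) for s in sentences]

print(results)  # [['great', 'leopard'], [], ['Read', 'theory']]
```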


### 9. Use a regular expression to find and extract all the capitalized words in the text below individually.

In [32]:
text = "Teddy Roosevelt and Abraham Lincoln walk into a bar."

In [33]:
capitals = re.findall('[A-Z][a-z]*', text)
print(capitals)


['Teddy', 'Roosevelt', 'Abraham', 'Lincoln']
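The `*` quantifier allows zero lowercase letters, so `[A-Z][a-z]*` would also pick up a lone capital such as the pronoun "I". A sketch on a hypothetical sentence contrasts it with `+`, which requires at least one lowercase letter:

```python
import re

# Hypothetical text containing the single-letter word "I"
sample = "I think Teddy and I met Abraham."

star = re.findall(r'[A-Z][a-z]*', sample)  # zero or more lowercase letters
plus = re.findall(r'[A-Z][a-z]+', sample)  # one or more lowercase letters

print(star)  # ['I', 'Teddy', 'I', 'Abraham']
print(plus)  # ['Teddy', 'Abraham']
```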


### 10. Use a regular expression to find and extract all the sets of consecutive capitalized words in the text above.

In [36]:
#  capital word, then one or more groups of (space + capital word), so runs of 3+ names also stay together
cap_sequence = re.findall(r'[A-Z][a-z]*(?: [A-Z][a-z]*)+', text)
print(cap_sequence)


['Teddy Roosevelt', 'Abraham Lincoln']
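When each part of a match matters (first and last name here), named groups make the pieces easy to pull out. A sketch, assuming every name is exactly two capitalized words; the group names `first` and `last` are illustrative choices:

```python
import re

text = "Teddy Roosevelt and Abraham Lincoln walk into a bar."

# (?P<name>...) labels a group so it can be read back by name
name_pattern = re.compile(r'(?P<first>[A-Z][a-z]+) (?P<last>[A-Z][a-z]+)')

names = [m.groupdict() for m in name_pattern.finditer(text)]
print(names)
# [{'first': 'Teddy', 'last': 'Roosevelt'}, {'first': 'Abraham', 'last': 'Lincoln'}]
```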


### 11. Use a regular expression to find and extract all the quotes from the text below.

*Hint: This one is a little more complex than the single quote example in the lesson because there are multiple quotes in the text.*

In [37]:
text = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'


In [43]:
# the non-greedy .*? stops at the FIRST closing quote, so each quote is matched separately
# (the greedy ".*" ran from the first quote all the way to the last one)
quote_mcgoat = re.findall(r'".*?"', text)
print(quote_mcgoat)

['"I will bet you $50 I can get the bartender to give me a free drink."', '"I am in!"']
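To keep only the words inside the quotation marks, a capturing group around the negated class `[^"]*` works as well: `[^"]*` can never run past a quotation mark, and `re.findall` returns just the group. A sketch:

```python
import re

text = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'

# The group captures the text between a pair of quotes, without the quotes themselves
quote_bodies = re.findall(r'"([^"]*)"', text)
print(quote_bodies)
# ['I will bet you $50 I can get the bartender to give me a free drink.', 'I am in!']
```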


### 12. Use a regular expression to find and extract all the numbers from the text below.

In [44]:
text = "There were 30 students in the class. Of the 30 students, 14 were male and 16 were female. Only 10 students got A's on the exam."


In [45]:
numbers = re.findall(r'\d+', text)  # \d+ grabs whole runs of digits, not one digit at a time
print(numbers)


['30', '30', '14', '16', '10']
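`\d+` covers whole numbers only. A sketch, assuming the text might also contain decimals (the sample sentence is made up), adds an optional fractional part and converts the results to `float`:

```python
import re

# Hypothetical text with a decimal value
sample = "The class average was 87.5, up from 82 last term."

# (?:\.\d+)? optionally matches a dot followed by digits
values = [float(n) for n in re.findall(r'\d+(?:\.\d+)?', sample)]
print(values)  # [87.5, 82.0]
```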


### 13. Use a regular expression to find and extract all the social security numbers from the text below.

In [46]:
text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

In [47]:
ss_stealing = re.findall(r'\d{3}-\d{2}-\d{4}', text)
print(ss_stealing)
# this pattern only fits the SSN layout (3-2-4 digits), so it would skip the phone numbers even if they were formatted xxx-xxx-xxxx

['876-93-2289', '098-32-5295']
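`re.findall` looks for the pattern anywhere in the text, which suits extraction but is too lenient for validating a single value. A sketch using `re.fullmatch`, where `looks_like_ssn` is a hypothetical helper name:

```python
import re

SSN_PATTERN = r'\d{3}-\d{2}-\d{4}'

def looks_like_ssn(s):
    # Hypothetical helper: fullmatch succeeds only if the WHOLE string fits the pattern
    return re.fullmatch(SSN_PATTERN, s) is not None

checks = {s: looks_like_ssn(s) for s in ["876-93-2289", "876-93-22890", "(847)789-0984"]}
print(checks)  # {'876-93-2289': True, '876-93-22890': False, '(847)789-0984': False}

# re.search, by contrast, accepts the too-long value because a valid SSN appears inside it
partial = re.search(SSN_PATTERN, "876-93-22890") is not None
print(partial)  # True
```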


### 14. Use a regular expression to find and extract all the phone numbers from the text below.

In [54]:
phone_grab = re.findall(r'\(?\d{3}\)?-?\d{3}-\d{4}', text)
print(phone_grab)

#  what if the phone were formatted xxx-xxx-xxxx?
# text99 = "His phone number is 847-789-0984."
# print(re.findall(r'\(?\d{3}\)?-?\d{3}-\d{4}', text99))
# the optional \(? \)? and -? pieces cover that layout too, and unlike a bare "." they never grab a stray leading character

['(847)789-0984', '(987)222-0901']
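Once the pieces of a phone number are captured in groups, `re.sub` can rebuild them in another layout with backreferences such as `\1`. A sketch, assuming the target format is xxx-xxx-xxxx:

```python
import re

text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

# Capture area code, exchange and line number, then rejoin them with dashes
normalized = re.sub(r'\((\d{3})\)(\d{3})-(\d{4})', r'\1-\2-\3', text)

# The SSNs (3-2-4 digits) do not fit the 3-3-4 layout, so only phones come back
phones = re.findall(r'\d{3}-\d{3}-\d{4}', normalized)
print(phones)  # ['847-789-0984', '987-222-0901']
```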


### 15. Use a regular expression to find and extract all the formatted numbers (both social security and phone) from the text below.

In [61]:
all_numbers = re.findall(r'\(?\d+[)-]\d+-\d+', text)  # \(? keeps the opening parenthesis of the phone numbers
print(all_numbers)


['876-93-2289', '(847)789-0984', '098-32-5295', '(987)222-0901']
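Combined patterns get hard to read quickly. A sketch using the `re.VERBOSE` flag, which ignores whitespace inside the pattern and allows `#` comments, spells out the two formats as explicit alternatives:

```python
import re

text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

formatted = re.compile(r"""
    \d{3}-\d{2}-\d{4}        # social security number: 123-45-6789
    |                        # or
    \(\d{3}\)\d{3}-\d{4}     # phone number: (123)456-7890
""", re.VERBOSE)

# findall returns matches in the order they appear in the text
matches = formatted.findall(text)
print(matches)  # ['876-93-2289', '(847)789-0984', '098-32-5295', '(987)222-0901']
```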
