# Advanced Regular Expressions Lab

Complete the following set of exercises to solidify your knowledge of regular expressions.

In [3]:
import re

### 1. Use a regular expression to find and extract all vowels in the following text.

In [4]:
text = "This is going to be a sentence with a good number of vowels in it."

In [6]:
pattern = r"[aeiou]"  # no commas: inside [...] a comma is a literal character
re.findall(pattern, text)

['i',
 'i',
 'o',
 'i',
 'o',
 'e',
 'a',
 'e',
 'e',
 'e',
 'i',
 'a',
 'o',
 'o',
 'u',
 'e',
 'o',
 'o',
 'e',
 'i',
 'i']
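Two details of character classes are easy to miss: punctuation inside `[...]` is matched literally, and matching is case-sensitive by default. The sketch below illustrates both on a made-up `sample` string (an assumption for illustration, not part of the lab text).

```python
import re
from collections import Counter

# Hypothetical sample text, chosen to include a comma and uppercase vowels
sample = "An Owl, up in it."

# Commas inside a character class are ordinary members of the class
with_commas = re.findall(r"[a,e,i,o,u]", sample)

# Without commas, and ignoring case, only vowels are returned
vowels = re.findall(r"[aeiou]", sample, flags=re.IGNORECASE)

# Tally the vowels regardless of case
counts = Counter(v.lower() for v in vowels)

print(with_commas)
print(vowels)
print(counts.most_common(1))
```

Whether uppercase vowels should count depends on how the exercise is read; `re.IGNORECASE` is one way to include them.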

### 2. Use a regular expression to find and extract all occurrences and forms (singular and plural) of the word "puppy" in the text below.

In [19]:
text2 = "The puppy saw all the rest of the puppies playing and wanted to join them. I saw this and wanted a puppy of my own!"

In [43]:
pattern = r"pupp(?:y|ies)"  # spaces around | would be matched literally
re.findall(pattern, text2)

['puppy', 'puppies', 'puppy']
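Whitespace inside a pattern is significant unless `re.VERBOSE` is used, so spaces placed around `|` become part of each alternative. A minimal sketch on a hypothetical `sample` sentence (not from the lab) shows how that can silently drop matches next to punctuation:

```python
import re

# Hypothetical sample: the first "puppy" is followed by a comma, not a space
sample = "One puppy, two puppies."

# The spaces are literal: this looks for "puppy " or " puppies"
loose = re.findall(r"puppy | puppies", sample)

# Word boundaries are zero-width, so nothing extra is captured
tight = re.findall(r"\bpupp(?:y|ies)\b", sample)

print(loose)
print(tight)
```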

### 3. Use a regular expression to find and extract all tenses (present and past) of the word "run" in the text below.

In [44]:
text = "I ran the relay race the only way I knew how to run it."

In [45]:
pattern = r"\br[au]n\b"  # word boundaries instead of literal spaces
re.findall(pattern, text)

['ran', 'run']

### 4. Use a regular expression to find and extract all words that begin with the letter "r" from the previous text.

In [60]:
pattern = r"\br\w+"  # \b is already zero-width, so no lookbehind is needed
re.findall(pattern, text)

['ran', 'relay', 'race', 'run']
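The word boundary is what separates "begins with r" from "contains r". The following sketch compares an anchored and an unanchored pattern on a hypothetical `sample` sentence; the `[rR]` class is an added assumption so that capitalized words are caught as well.

```python
import re

# Hypothetical sample with "r" at the start, middle, and end of words
sample = "Rabbits rarely run near a river bar."

# \b forces the match to start at the beginning of a word
anchored = re.findall(r"\b[rR]\w*", sample)

# Without \b, a trailing "r" (as in "near" or "bar") also matches
unanchored = re.findall(r"[rR]\w*", sample)

print(anchored)
print(unanchored)
```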

### 5. Use a regular expression to replace the exclamation marks with the letter "i" in the text below.

In [66]:
text = "Th!s !s a sentence w!th spec!al characters !n !t."

In [75]:
pattern = '(!)'
display(re.findall(pattern, text))
re.sub(pattern, "i", text)

['!', '!', '!', '!', '!', '!']

'This is a sentence with special characters in it.'
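If the number of replacements is also of interest, `re.subn` returns it alongside the new string, and the `count` argument of `re.sub` limits how many substitutions are made. A short sketch using the same sentence:

```python
import re

text = "Th!s !s a sentence w!th spec!al characters !n !t."

# subn returns a (new_string, number_of_substitutions) tuple
fixed, n_replaced = re.subn(r"!", "i", text)

# count=2 replaces only the first two occurrences
first_two = re.sub(r"!", "i", text, count=2)

print(fixed, n_replaced)
print(first_two)
```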

### 6. Use a regular expression to find and extract words longer than 4 characters in the text below.

In [76]:
text = "This sentence has words of varying lengths."

In [78]:
pattern = r"\b\w{5,}\b"  # "longer than 4" means at least 5 characters
display(re.findall(pattern, text))

['sentence', 'words', 'varying', 'lengths']
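Length quantifiers interact with word boundaries. With a lower bound only, greedy matching already consumes whole words, but with an upper bound the boundaries are needed to avoid slicing long words into pieces. A sketch of both cases, reusing the sentence from this exercise:

```python
import re

text = "This sentence has words of varying lengths."

# At least 5 characters: strictly longer than 4
long_words = re.findall(r"\b\w{5,}\b", text)

# At most 3 characters: the boundaries keep this from matching inside longer words
short_words = re.findall(r"\b\w{1,3}\b", text)

# Without boundaries, an upper bound chops a long word into chunks
chunks = re.findall(r"\w{1,3}", "sentence")

print(long_words)
print(short_words)
print(chunks)
```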

### 7. Use a regular expression to find and extract all occurrences of the letter "b", some letter(s), and then the letter "t" in the sentence below.

In [89]:
text = "I bet the robot couldn't beat the other bot with a bat, but instead it bit me."

In [94]:
pattern = r'b[a-z]+t'  # + requires at least one letter between "b" and "t"
display(re.findall(pattern, text))

['bet', 'bot', 'beat', 'bot', 'bat', 'but', 'bit']

### 8. Use a regular expression to find and extract all words that contain either "ea" or "eo" in them.

In [156]:
text = "During many of the peaks and troughs of history, the people living it didn't fully realize what was unfolding. But we all know we're navigating breathtaking history: Nearly every day could be — maybe will be — a book."


In [157]:
pattern = r'\w*e[ao]\w*'  # raw string avoids invalid-escape warnings for \w
re.findall(pattern, text)

['peaks', 'people', 'realize', 'breathtaking', 'Nearly']

In [160]:
# Alternative: spell out both cases; \w* (not \w+) also allows "ea"/"eo" at a word's start or end
pattern = r'\w*ea\w*|\w*eo\w*'
re.findall(pattern, text)

['peaks', 'people', 'realize', 'breathtaking', 'Nearly']

### 9. Use a regular expression to find and extract all the capitalized words in the text below individually.

In [203]:
text = "Teddy Roosevelt and Abraham Lincoln walk into a bar."

In [204]:
pattern = '([A-Z][a-z]+)'

print(re.findall(pattern, text))

['Teddy', 'Roosevelt', 'Abraham', 'Lincoln']


### 10. Use a regular expression to find and extract all the sets of consecutive capitalized words in the text above.

In [206]:
pattern = r'[A-Z][a-z]+(?: [A-Z][a-z]+)+'  # one capitalized word followed by one or more others

print(re.findall(pattern, text))

['Teddy Roosevelt', 'Abraham Lincoln']
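Runs of capitalized words are not always exactly two words long. One way to allow any run of two or more is to repeat a non-capturing group; the sketch below tries this on a hypothetical `sample` sentence that is not part of the lab.

```python
import re

# Hypothetical sample with a three-word name, a two-word name, and a lone capitalized word
sample = "Martin Luther King met Rosa Parks in Montgomery."

# (?: ...) groups without capturing, so findall still returns the whole match
pattern = r"[A-Z][a-z]+(?: [A-Z][a-z]+)+"
runs = re.findall(pattern, sample)

print(runs)
```

Because the group must repeat at least once, a single capitalized word such as "Montgomery" is left out.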


### 11. Use a regular expression to find and extract all the quotes from the text below.

*Hint: This one is a little more complex than the single quote example in the lesson because there are multiple quotes in the text.*

In [297]:
text = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'


In [298]:

# First attempt: without quote characters in the pattern, .* matches the whole line
pattern = r"(.*)"
re.findall(pattern, text)


['Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"',
 '']

In [299]:
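# This works because .*? is lazy: each match stops at the next closing quote,
# and findall returns only the captured group. The outer (?:...) is redundant.
pattern = r'(?:"(.*?)")'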
re.findall(pattern, text)

['I will bet you $50 I can get the bartender to give me a free drink.',
 'I am in!']

In [302]:
pattern = r'"(.*?)"'
re.findall(pattern, text)

['I will bet you $50 I can get the bartender to give me a free drink.',
 'I am in!']
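The difference between greedy and lazy quantifiers is what makes the multi-quote case work. The sketch below compares three variants on a short hypothetical `sample` string; the negated character class is an alternative that is assumed here, not something the lesson prescribes.

```python
import re

# Hypothetical sample containing two separate quotes
sample = 'He said "yes" and she said "no".'

# Greedy: .* runs to the last quote, merging both quotes into one match
greedy = re.findall(r'"(.*)"', sample)

# Lazy: .*? stops at the first closing quote it can
lazy = re.findall(r'"(.*?)"', sample)

# Negated class: match anything except a quote, so no backtracking is needed
negated = re.findall(r'"([^"]*)"', sample)

print(greedy)
print(lazy)
print(negated)
```

In all three cases `findall` returns the contents of the single capture group rather than the full match, which is why the quotation marks themselves do not appear in the results.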

### 12. Use a regular expression to find and extract all the numbers from the text below.

In [174]:
text = "There were 30 students in the class. Of the 30 students, 14 were male and 16 were female. Only 10 students got A's on the exam."


In [177]:
pattern = r"\d+"
re.findall(pattern, text)

['30', '30', '14', '16', '10']
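`re.findall` returns strings, so the matches need converting before any arithmetic. The sketch below does that for a fragment of the exercise text, and also shows one possible pattern for decimals on a hypothetical `sample` string (the lab text itself contains only integers).

```python
import re

# Fragment of the exercise text
fragment = "14 were male and 16 were female"
counts = [int(n) for n in re.findall(r"\d+", fragment)]

# Hypothetical sample with decimal numbers
sample = "Pi is 3.14 and e is about 2.718; 42 is an integer."

# Digits, optionally followed by a decimal point and more digits
decimals = re.findall(r"\d+(?:\.\d+)?", sample)

print(counts, sum(counts))
print(decimals)
```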

### 13. Use a regular expression to find and extract all the social security numbers from the text below.

In [289]:
text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

In [290]:

pattern = r'\d{3}-\d{2}-\d{4}'
re.findall(pattern, text)

['876-93-2289', '098-32-5295']
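A bare `\d{3}-\d{2}-\d{4}` also matches when the digits sit inside a longer number. If that matters, digit lookarounds are one way to guard against it. The `sample` string below is hypothetical and only serves to show the difference.

```python
import re

# Hypothetical sample: the second number has an extra digit on each side
sample = "Expected: 876-93-2289. Embedded: 1876-93-22891."

# Plain pattern: also matches inside the longer digit run
loose = re.findall(r"\d{3}-\d{2}-\d{4}", sample)

# Lookarounds reject matches that touch another digit on either side
strict = re.findall(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)", sample)

print(loose)
print(strict)
```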

### 14. Use a regular expression to find and extract all the phone numbers from the text above.

In [291]:
pattern = r'\(\d{3}\)\d{3}-\d{4}'  # match literal parentheses rather than any non-digit
re.findall(pattern, text)

['(847)789-0984', '(987)222-0901']

### 15. Use a regular expression to find and extract all the formatted numbers (both social security and phone) from the text above.

In [292]:
# Alternation matches either format in a single pass
pattern = r'\d{3}-\d{2}-\d{4}|\(\d{3}\)\d{3}-\d{4}'
re.findall(pattern, text)

['876-93-2289', '(847)789-0984', '098-32-5295', '(987)222-0901']

In [293]:
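# Returns []: this requires an SSN followed directly by a space and a phone number,
# which never occurs in the text
pattern = r'\d{3}-\d{2}-\d{4}? \D\d{3}\D\d{3}-\d{4}'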
re.findall(pattern, text)

[]

In [295]:
# Run each pattern separately and concatenate the results (phones first, then SSNs)
re_list = [r'\(\d{3}\)\d{3}-\d{4}', r'\d{3}-\d{2}-\d{4}']

matches = []
for r in re_list:
    matches += re.findall(r, text)


In [296]:
matches 

['(847)789-0984', '(987)222-0901', '876-93-2289', '098-32-5295']
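When both formats are extracted together, it can help to know which kind each match is. One possible approach is named groups combined with `re.finditer`; the group names `ssn` and `phone` are chosen here for illustration only.

```python
import re

text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

# Each alternative gets its own named group
pattern = re.compile(
    r"(?P<ssn>\d{3}-\d{2}-\d{4})|(?P<phone>\(\d{3}\)\d{3}-\d{4})"
)

# lastgroup holds the name of the group that matched
labeled = [(m.lastgroup, m.group()) for m in pattern.finditer(text)]

for kind, value in labeled:
    print(kind, value)
```

Unlike looping over separate patterns, this keeps the matches in the order they appear in the text.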