# Advanced Regular Expressions Lab

Complete the following set of exercises to solidify your knowledge of regular expressions.

In [1]:
import re

### 1. Use a regular expression to find and extract all vowels in the following text.

In [2]:
text = "This is going to be a sentence with a good number of vowels in it."
pattern = '[aeiou]'
re.findall(pattern, text)

['i',
 'i',
 'o',
 'i',
 'o',
 'e',
 'a',
 'e',
 'e',
 'e',
 'i',
 'a',
 'o',
 'o',
 'u',
 'e',
 'o',
 'o',
 'e',
 'i',
 'i']

In [None]:
#findall() returns a list of every non-overlapping match of the pattern in the text, in the order found

### 2. Use a regular expression to find and extract all occurrences and forms (singular and plural) of the word "puppy" in the text below.

In [68]:
text = "The puppy saw all the rest of the puppies playing and wanted to join them. I saw this and wanted a puppy of my own!"
pattern = 'puppy|puppies'
re.findall(pattern, text)
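#pattern = 'pupp[y|ies]' does not work here: square brackets match ONE character from the set (y, |, i, e or s), so "puppies" is cut off at "puppi"
#a group scopes the alternation to the word ending instead: pattern = 'pupp(?:y|ies)'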

['puppy', 'puppies', 'puppy']
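
The difference between a character class and a grouped alternation can be checked directly. The following is a minimal sketch on the same sentence; the names `class_pattern` and `group_pattern` are illustrative and not part of the lab.

```python
import re

text = ("The puppy saw all the rest of the puppies playing and wanted to "
        "join them. I saw this and wanted a puppy of my own!")

# Square brackets define a set of single characters, so [y|ies] matches
# exactly one of: y, |, i, e, s. "puppies" is therefore cut off at "puppi".
class_pattern = r'pupp[y|ies]'
class_matches = re.findall(class_pattern, text)

# A non-capturing group (?:...) limits the alternation to the word ending.
group_pattern = r'pupp(?:y|ies)'
group_matches = re.findall(group_pattern, text)

print(class_matches)  # ['puppy', 'puppi', 'puppy']
print(group_matches)  # ['puppy', 'puppies', 'puppy']
```

The group is written as non-capturing because `re.findall` returns only the group contents when a pattern contains a capturing group, which here would give `['y', 'ies', 'y']` instead of the full words.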

### 3. Use a regular expression to find and extract all tenses (present and past) of the word "run" in the text below.

In [10]:
text = "I ran the relay race the only way I knew how to run it."
pattern = 'r[au]n'
re.findall(pattern, text)

['ran', 'run']

### 4. Use a regular expression to find and extract all words that begin with the letter "r" from the previous text.

In [88]:
text = "I ran the relay race the only way I knew how to run it."
text1 = re.findall(r'\br\w*', text) #\b is a word boundary, so only an "r" at the start of a word can begin a match
print(text1)

['ran', 'relay', 'race', 'run']
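
A pattern without a word boundary, such as `r\w+`, gives the right answer on the lab sentence only because no word there has an "r" in the middle. The sketch below uses a made-up sentence (an assumption for illustration, not part of the lab) to show where the two patterns diverge.

```python
import re

# Hypothetical sentence in which several words contain "r" after the first letter.
sample = "Three brave runners raced around the track."

# Without a boundary, the match can start at any "r", even in mid-word.
unanchored = re.findall(r'r\w+', sample)

# \b only matches between a non-word and a word character,
# so the "r" has to be the first letter of a word.
anchored = re.findall(r'\br\w*', sample)

print(unanchored)  # ['ree', 'rave', 'runners', 'raced', 'round', 'rack']
print(anchored)    # ['runners', 'raced']
```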


### 5. Use a regular expression to find and substitute the letter "i" for the exclamation marks in the text below.

In [90]:
text = "Th!s !s a sentence w!th spec!al characters !n !t."
#pattern = '[!]'
#re.findall(pattern, text)
print(re.sub('[!]', 'i', text))

This is a sentence with special characters in it.


### 6. Use a regular expression to find and extract words longer than 4 characters in the text below.

In [3]:
text = "This sentence has words of varying lengths."
pattern = r'\w{5,}' #{5,} means five or more word characters, i.e. words longer than 4
re.findall(pattern, text)
#paolo: good approach, longer than four meaning minimum 5
# so \w{5,}

['sentence', 'words', 'varying', 'lengths']

### 7. Use a regular expression to find and extract all occurrences of the letter "b", some letter(s), and then the letter "t" in the sentence below.

In [134]:
text = "I bet the robot couldn't beat the other bot with a bat, but instead it bit me."
pattern = 'b.t|b..t' #. matches any single character, so b.t allows one character between b and t and b..t allows two
re.findall(pattern, text)
#paolo:yes!

['bet', 'bot', 'beat', 'bot', 'bat', 'but', 'bit']
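
The exercise asks for "some letter(s)" without a fixed count, so a quantified letter class is one way to avoid listing each length separately. This is a sketch under the assumption that only lowercase letters may sit between "b" and "t"; the word "bright" is a hypothetical extra test case.

```python
import re

text = "I bet the robot couldn't beat the other bot with a bat, but instead it bit me."

# [a-z]+ requires at least one lowercase letter between "b" and "t",
# with no upper limit on how many.
pattern = r'b[a-z]+t'
matches = re.findall(pattern, text)
print(matches)  # ['bet', 'bot', 'beat', 'bot', 'bat', 'but', 'bit']

# "bright" has four letters between "b" and "t", which the fixed-length
# alternation cannot cover.
fixed_length = re.findall(r'b.t|b..t', "bright")
any_length = re.findall(pattern, "bright")
print(fixed_length)  # []
print(any_length)    # ['bright']
```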

### 8. Use a regular expression to find and extract all words that contain either "ea" or "eo" in them.

In [124]:
text = "During many of the peaks and troughs of history, the people living it didn't fully realize what was unfolding. But we all know we're navigating breathtaking history: Nearly every day could be — maybe will be — a book."
#pattern = '[ea|eo]'
#re.findall(pattern, text)
text1 = re.findall(r'\w*ea\w*', text) #raw strings keep \w from being read as a string escape
text2 = re.findall(r'\w*eo\w*', text)
print(text1+text2)
#paolo: yes - and could you try in a single line?

['peaks', 'realize', 'breathtaking', 'Nearly', 'people']
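
One possible single-line answer to the reviewer's question uses a character class for the second vowel. This is a sketch rather than the official solution; note that the matches now come back in the order they appear in the text, so "people" moves to second place.

```python
import re

text = ("During many of the peaks and troughs of history, the people living it "
        "didn't fully realize what was unfolding. But we all know we're "
        "navigating breathtaking history: Nearly every day could be — maybe "
        "will be — a book.")

# e[ao] matches "ea" or "eo"; \w* on either side extends the match
# to the rest of the word.
matches = re.findall(r'\w*e[ao]\w*', text)
print(matches)  # ['peaks', 'people', 'realize', 'breathtaking', 'Nearly']
```

The earlier attempt `'[ea|eo]'` fails for the same reason as any alternation inside square brackets: it matches a single character out of e, a, |, o.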


### 9. Use a regular expression to find and extract all the capitalized words in the text below individually.

In [45]:
text = "Teddy Roosevelt and Abraham Lincoln walk into a bar."
pattern = '[A-Z][a-z]+' #the first [A-Z] ensures the word starts with a capital letter, then the [a-z]+ returns the rest of the word
re.findall(pattern, text)

['Teddy', 'Roosevelt', 'Abraham', 'Lincoln']

### 10. Use a regular expression to find and extract all the sets of consecutive capitalized words in the text above.

In [50]:
text = "Teddy Roosevelt and Abraham Lincoln walk into a bar."
pattern = r'[A-Z][a-z]+(?: [A-Z][a-z]+)+' #one capitalized word followed by one or more space-separated capitalized words; change the final + to * to also keep single capitalized words
re.findall(pattern, text)

['Teddy Roosevelt', 'Abraham Lincoln']

### 11. Use a regular expression to find and extract all the quotes from the text below.

*Hint: This one is a little more complex than the single quote example in the lesson because there are multiple quotes in the text.*

In [54]:
text = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'
pattern = '".*?"'
print(re.findall(pattern, text)) #with a greedy .* the match ran from the first quotation mark to the last one, so the part not in quotes (Lincoln says,) was pulled in too; .*? is lazy and stops at the nearest closing quotation mark
#paolo: I refer you to the solution for this one-
# hint: a question mark could help here

['"I will bet you $50 I can get the bartender to give me a free drink."', '"I am in!"']
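
Greedy and lazy quantifiers can be compared side by side on this text, together with a negated character class, which is another common way to stop at the nearest closing quotation mark. This is a minimal sketch; the variable names are illustrative.

```python
import re

text = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'

# Greedy: .* grabs as much as possible, so the match ends at the LAST quotation mark.
greedy = re.findall(r'".*"', text)

# Lazy: .*? grabs as little as possible, so each match ends at the NEXT quotation mark.
lazy = re.findall(r'".*?"', text)

# Negated class: [^"]* cannot cross a quotation mark at all.
negated = re.findall(r'"[^"]*"', text)

print(len(greedy))  # 1
print(lazy)         # ['"I will bet you $50 I can get the bartender to give me a free drink."', '"I am in!"']
print(lazy == negated)  # True
```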


### 12. Use a regular expression to find and extract all the numbers from the text below.

In [35]:
text = "There were 30 students in the class. Of the 30 students, 14 were male and 16 were female. Only 10 students got A's on the exam."
pattern = r'\d+' #\d+ matches one or more consecutive digits
print(re.findall(pattern, text))

['30', '30', '14', '16', '10']


### 13. Use a regular expression to find and extract all the social security numbers from the text below.

In [56]:
text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""
pattern = r'\d+-\d+-\d+' #three groups of digits separated by hyphens
print(re.findall(pattern, text))

['876-93-2289', '098-32-5295']
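
The pattern `\d+-\d+-\d+` accepts any three hyphen-separated digit groups, which is enough for this text but would also pick up other hyphenated numbers. The sketch below fixes the group lengths at 3-2-4; the sentence in `sample` is made up for illustration and is not part of the lab.

```python
import re

text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

loose = r'\d+-\d+-\d+'
# {3}, {2} and {4} fix the length of each group; \b stops the match
# from starting or ending inside a longer run of digits.
strict = r'\b\d{3}-\d{2}-\d{4}\b'

strict_matches = re.findall(strict, text)
print(strict_matches)  # ['876-93-2289', '098-32-5295']

# Hypothetical sentence containing a date in addition to the number.
sample = "Order 2020-01-15 shipped; SSN on file: 876-93-2289."
loose_on_sample = re.findall(loose, sample)
strict_on_sample = re.findall(strict, sample)
print(loose_on_sample)   # ['2020-01-15', '876-93-2289']
print(strict_on_sample)  # ['876-93-2289']
```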


### 14. Use a regular expression to find and extract all the phone numbers from the text below.

In [145]:
text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""
pattern = r'\d{3}[-\.\s]??\d{3}[-\.\s]??\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}|\d{3}[-\.\s]??\d{4}' #this is code found online for capturing phone numbers in several formats; the middle alternative is the one that matches the (xxx)xxx-xxxx format used here
print(re.findall(pattern, text))

['(847)789-0984', '(987)222-0901']


### 15. Use a regular expression to find and extract all the formatted numbers (both social security and phone) from the text below.

In [146]:
text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""
pattern = r'\d+-\d+-\d+|\d{3}[-\.\s]??\d{3}[-\.\s]??\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}|\d{3}[-\.\s]??\d{4}' #the | joins the social security pattern and the phone patterns so that either type of number is matched
print(re.findall(pattern, text))

['876-93-2289', '(847)789-0984', '098-32-5295', '(987)222-0901']
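
A long alternation is easier to read when each part is named and then joined. The following sketch assumes the numbers appear only in the two formats seen in this text (`ddd-dd-dddd` and `(ddd)ddd-dddd`); the names `ssn`, `phone` and `combined` are illustrative.

```python
import re

text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

ssn = r'\d{3}-\d{2}-\d{4}'       # 3-2-4 digits separated by hyphens
phone = r'\(\d{3}\)\d{3}-\d{4}'  # area code in escaped parentheses, then 3-4 digits

# Joining with | produces one pattern that matches either format.
combined = '|'.join([ssn, phone])
matches = re.findall(combined, text)
print(matches)  # ['876-93-2289', '(847)789-0984', '098-32-5295', '(987)222-0901']
```

Because `re.findall` scans left to right, the results come back in document order regardless of the order of the alternatives.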
