# Advanced Regular Expressions Lab

Complete the following set of exercises to solidify your knowledge of regular expressions.

In [1]:
import re

### 1. Use a regular expression to find and extract all vowels in the following text.

In [66]:
text = "This is going to be a sentence with a good number of vowels in it."

In [67]:
print(re.findall(r'[aeiou]',text))

['i', 'i', 'o', 'i', 'o', 'e', 'a', 'e', 'e', 'e', 'i', 'a', 'o', 'o', 'u', 'e', 'o', 'o', 'e', 'i', 'i']


### 2. Use a regular expression to find and extract all occurrences and forms (singular and plural) of the word "puppy" in the text below.

In [68]:
text = "The puppy saw all the rest of the puppies playing and wanted to join them. I saw this and wanted a puppy of my own!"

In [69]:

word = 'puppy'
word2 = 'puppies'
# findall matches "puppy" and "puppies" separately; the two result lists are then concatenated
re.findall(word, text, flags=re.IGNORECASE) + re.findall(word2, text, flags=re.IGNORECASE)

['puppy', 'puppy', 'puppies']
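As an alternative sketch, both forms can be captured with a single pattern that uses a non-capturing group, which also keeps the matches in the order they appear in the text. The variable name `puppy_forms` is only illustrative.

```python
import re

# Sample sentence copied from the exercise above.
text = ("The puppy saw all the rest of the puppies playing and wanted to join them. "
        "I saw this and wanted a puppy of my own!")

# (?:y|ies) is a non-capturing group, so findall returns the whole match
# rather than just the suffix; \b keeps the match on word boundaries.
puppy_forms = re.findall(r'\bpupp(?:y|ies)\b', text, flags=re.IGNORECASE)
print(puppy_forms)
```

Unlike the concatenated version, the result here follows text order: the plural appears between the two singular matches.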

### 3. Use a regular expression to find and extract all tenses (present and past) of the word "run" in the text below.

In [70]:
text = "I ran the relay race the only way I knew how to run it."

In [71]:
pattern = 'r[au]n'
re.findall(pattern, text)

['ran', 'run']

### 4. Use a regular expression to find and extract all words that begin with the letter "r" from the previous text.

In [72]:
# \b anchors the match at a word boundary so an "r" inside a word is not picked up
pattern = r'\br\w*'
re.findall(pattern, text)

['ran', 'relay', 'race', 'run']
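The exercise text happens to contain no "r" in the middle of a word, so an unanchored pattern still gives the right answer there. A minimal sketch with a hypothetical sentence (`sample` is not part of the lab) shows why a word boundary `\b` is the safer choice:

```python
import re

# Hypothetical sentence: "carrot" and "near" contain an "r" that does not start the word.
sample = "A rabbit ate the carrot near the river."

# Without an anchor, the pattern also matches from the "r" inside "carrot".
unanchored = re.findall(r'r\w+', sample)

# With \b, only words that actually begin with "r" are returned.
anchored = re.findall(r'\br\w*', sample)

print(unanchored)
print(anchored)
```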

### 5. Use a regular expression to find and substitute the letter "i" for the exclamation marks in the text below.

In [73]:
text = "Th!s !s a sentence w!th spec!al characters !n !t."

In [74]:
# re.sub replaces every exclamation mark in text with the letter "i"

print(re.sub('!', 'i', text))

This is a sentence with special characters in it.


### 6. Use a regular expression to find and extract words longer than 4 characters in the text below.

In [75]:
text = "This sentence has words of varying lengths."

In [82]:
# \w{5,} matches runs of at least 5 word characters, i.e. words longer than 4 characters
pattern = r'\w{5,}'
print(re.findall(pattern, text))

['sentence', 'words', 'varying', 'lengths']


### 7. Use a regular expression to find and extract all occurrences of the letter "b", some letter(s), and then the letter "t" in the sentence below.

In [214]:
text = "I bet the robot couldn't beat the other bot with a bat, but instead it bit me."

In [221]:
b_occurences = "b.t"
re.findall(b_occurences,text)


['bet', 'bot', 'bot', 'bat', 'but', 'bit']

In [225]:
# Corrected solution: the pattern above misses "beat" because "." matches exactly one character
b_occurences = 'b[a-zA-Z]+t'
re.findall(b_occurences,text)


['bet', 'bot', 'beat', 'bot', 'bat', 'but', 'bit']

### 8. Use a regular expression to find and extract all words that contain either "ea" or "eo" in them.

In [191]:
text = "During many of the peaks and troughs of history, the people living it didn't fully realize what was unfolding. But we all know we're navigating breathtaking history: Nearly every day could be — maybe will be — a book."


In [195]:
ea_occurences = "[a-zA-Z]*ea[a-zA-Z]*"
eo_occurences = "[a-zA-Z]*eo[a-zA-Z]*"
re.findall(ea_occurences, text) + re.findall(eo_occurences, text)

['peaks', 'realize', 'breathtaking', 'Nearly', 'people']
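The two patterns differ only in one letter, so they could plausibly be merged with a character class. The sketch below uses a shortened, hypothetical sentence (`sample`) so it runs on its own; note that a single pattern returns matches in text order rather than grouping all the "ea" words first.

```python
import re

# Shortened, hypothetical sample so the snippet stands alone.
sample = "The people did not realize how breathtaking the peaks were."

# e[ao] matches "ea" or "eo"; \w* on each side extends the match to the whole word.
matches = re.findall(r'\w*e[ao]\w*', sample)
print(matches)
```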

### 9. Use a regular expression to find and extract all the capitalized words in the text below individually.

In [88]:
text = "Teddy Roosevelt and Abraham Lincoln walk into a bar."

In [89]:
pattern = '[A-Z][a-z]+'
print(re.findall(pattern, text))

['Teddy', 'Roosevelt', 'Abraham', 'Lincoln']


### 10. Use a regular expression to find and extract all the sets of consecutive capitalized words in the text above.

In [93]:
pattern = '([A-Z][a-z]+ ?[A-Z][a-z]+)|([A-Z][a-z]+)'
print(re.findall(pattern, text))


[('Teddy Roosevelt', ''), ('Abraham Lincoln', '')]


In [94]:
results = [i for j in re.findall(pattern, text) for i in j if i != '']
print(results)

['Teddy Roosevelt', 'Abraham Lincoln']
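A possible simplification, assuming only runs of two or more capitalized words are wanted: a non-capturing group `(?:...)` lets `findall` return plain strings, so no flattening step is needed. The name `names` is illustrative.

```python
import re

# Sentence copied from exercise 9.
text = "Teddy Roosevelt and Abraham Lincoln walk into a bar."

# One capitalized word followed by one or more " Capitalized" words.
# (?: ... ) groups without capturing, so findall returns the full match.
pattern = r'[A-Z][a-z]+(?: [A-Z][a-z]+)+'
names = re.findall(pattern, text)
print(names)
```

Unlike the alternation used above, this version does not return a capitalized word that stands alone.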


### 11. Use a regular expression to find and extract all the quotes from the text below.

*Hint: This one is a little more complex than the single quote example in the lesson because there are multiple quotes in the text.*

In [150]:
text = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'


In [151]:
# [^"]* stops at the next double quote, so each quotation is matched separately
pattern = r'"[^"]*"'
re.findall(pattern, text)

['"I will bet you $50 I can get the bartender to give me a free drink."', '"I am in!"']
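The hint about multiple quotes comes down to greedy versus lazy matching. As a rough sketch, the greedy `.*` runs to the last quotation mark in the string, while the lazy `.*?` stops at the first closing mark it can. The names `greedy` and `lazy` are illustrative.

```python
import re

# Text copied from the exercise above.
text = ('Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender '
        'to give me a free drink." Lincoln says, "I am in!"')

# Greedy: .* consumes as much as possible, so both quotes merge into one match.
greedy = re.findall(r'".*"', text)

# Lazy: .*? consumes as little as possible, so each quote is matched on its own.
lazy = re.findall(r'".*?"', text)

print(len(greedy))
print(lazy)
```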

### 12. Use a regular expression to find and extract all the numbers from the text below.

In [104]:
text = "There were 30 students in the class. Of the 30 students, 14 were male and 16 were female. Only 10 students got A's on the exam."


In [105]:
pattern = '[0-9]+'
re.findall(pattern, text)

['30', '30', '14', '16', '10']

### 13. Use a regular expression to find and extract all the social security numbers from the text below.

In [114]:
text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

In [115]:
# \d{3}-\d{2}-\d{4} matches three digits, a hyphen, two digits, a hyphen, and four digits
pattern = r'\d{3}-\d{2}-\d{4}'
re.findall(pattern, text)

['876-93-2289', '098-32-5295']

### 14. Use a regular expression to find and extract all the phone numbers from the text above.

In [142]:
# a phone number here is a 3-digit area code in parentheses, 3 digits, a hyphen, and 4 digits
pattern = r'\(\d{3}\)\d{3}-\d{4}'
re.findall(pattern, text)

['(847)789-0984', '(987)222-0901']

### 15. Use a regular expression to find and extract all the formatted numbers (both social security and phone) from the text above.

In [148]:
pattern = r'\(\d{3}\)\d{3}-\d{4}'
pattern2 = r'\d{3}-\d{2}-\d{4}'
re.findall(pattern, text) + re.findall(pattern2, text)

['(847)789-0984', '(987)222-0901', '876-93-2289', '098-32-5295']
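Another option, sketched here under the assumption that the numbers should come back in the order they appear, is to join the two formats with alternation (`|`) in a single pattern. The name `numbers` is illustrative.

```python
import re

# Text copied from exercise 13.
text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

# Left alternative: (ddd)ddd-dddd phone number.
# Right alternative: ddd-dd-dddd social security number.
pattern = r'\(\d{3}\)\d{3}-\d{4}|\d{3}-\d{2}-\d{4}'
numbers = re.findall(pattern, text)
print(numbers)
```

Because the pattern contains no capturing groups, `findall` returns each full match as a string.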