# Advanced Regular Expressions

Complete the following set of exercises to solidify your knowledge of regular expressions.

In [1]:
import re

### 1. Use a regular expression to find and extract all vowels in the following text.

In [2]:
text = "This is going to be a sentence with a good number of vowels in it."

In [4]:
vowels = re.findall('[aeiou]',text)
print(vowels)

['i', 'i', 'o', 'i', 'o', 'e', 'a', 'e', 'e', 'e', 'i', 'a', 'o', 'o', 'u', 'e', 'o', 'o', 'e', 'i', 'i']
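The character class above lists only lowercase vowels. That is enough here because the sentence contains no uppercase vowel. As a minimal sketch, assuming a made-up sentence `sample` that is not part of the exercise, the `re.IGNORECASE` flag would pick up uppercase vowels too:

```python
import re

# Hypothetical sentence (not from the exercise) that contains uppercase vowels.
sample = "An Owl is up early."

lowercase_only = re.findall(r'[aeiou]', sample)
any_case = re.findall(r'[aeiou]', sample, flags=re.IGNORECASE)

print(lowercase_only)
print(any_case)
```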


### 2. Use a regular expression to find and extract all occurrences and forms (singular and plural) of the word "puppy" in the text below.

In [9]:
text = "The puppy saw all the rest of the puppies playing and wanted to join them. I saw this and wanted a puppy of my own!"

In [14]:
re.findall(r'pupp\w*', text)  # Match "pupp" followed by zero or more word characters, so trailing punctuation is never included.

['puppy', 'puppies', 'puppy']

### 3. Use a regular expression to find and extract all tenses (present and past) of the word "run" in the text below.

In [16]:
text = "I ran the relay race the only way I knew how to run it."

In [18]:
re.findall(r'\br[au]n\b', text)  # "run" and "ran" differ only in the middle vowel, so match r, then a or u, then n.
# The \b word boundaries restrict the match to whole words and keep surrounding whitespace out of the result.

['ran', 'run']
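A pattern that relies on trailing whitespace (such as `r[ua]\S\s`) misses a word that is followed by punctuation, whereas `\b` matches the zero-width position between a word character and a non-word character. A rough sketch of the difference, using a made-up sentence `sample` that is not part of the exercise:

```python
import re

# Hypothetical sentence (not from the exercise) with punctuation after each verb.
sample = "She runs, he ran; they run."

with_whitespace = re.findall(r'r[ua]\S\s', sample)  # needs a whitespace character after the word
with_boundary = re.findall(r'\br[au]n\b', sample)   # needs only a word boundary

print(with_whitespace)
print(with_boundary)
```

Note that `\b` also excludes "runs", because the "n" is followed by another word character.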

### 4. Use a regular expression to find and extract all words that begin with the letter "r" from the previous text.

In [20]:
re.findall(r'\br\w*', text)  # Match every word that starts with "r": a word boundary, the letter r, then zero or more word characters.

['ran', 'relay', 'race', 'run']

### 5. Use a regular expression to find and substitute the letter "i" for the exclamation marks in the text below.

In [21]:
text = "Th!s !s a sentence w!th spec!al characters !n !t."

In [22]:
re.sub('!','i',text)

'This is a sentence with special characters in it.'

### 6. Use a regular expression to find and extract words longer than 4 characters in the text below.

In [25]:
text = "This sentence has words of varying lengths."

In [42]:
re.findall(r'\b\w{5,}\b', text)  # Match whole words of five or more word characters, i.e. words longer than 4 characters.

['sentence', 'words', 'varying', 'lengths']
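The "longer than 4 characters" condition maps onto the `{m,}` quantifier, which means "at least m repetitions". A small sketch of that idea with the minimum length as a parameter follows. The helper name `words_longer_than` is invented for illustration and is not part of the exercise:

```python
import re

def words_longer_than(text, n):
    """Return the whole words in `text` that have more than `n` word characters."""
    # Doubled braces are literal braces in an f-string; for n=4 the pattern is \b\w{5,}\b
    pattern = rf'\b\w{{{n + 1},}}\b'
    return re.findall(pattern, text)

text = "This sentence has words of varying lengths."

longer_than_4 = words_longer_than(text, 4)
longer_than_6 = words_longer_than(text, 6)

print(longer_than_4)
print(longer_than_6)
```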

### 7. Use a regular expression to find and extract all occurrences of the letter "b", some letter(s), and then the letter "t" in the sentence below. 
Ex. beat, bat & bot.

In [43]:
text = "I bet the robot couldn't beat the other bot with a bat, but instead it bit me."

In [47]:
re.findall(r'\bb[a-zA-Z]+t\b', text)  # Match whole words that start with "b", have one or more letters in between, and end with "t".

['bet', 'beat', 'bot', 'bat', 'but', 'bit']

### 8. Use a regular expression to find and extract all words that contain either "ea" or "eo" in them.

In [53]:
text = "During many of the peaks and troughs of history, the people living it didn't fully realize what was unfolding. But we all know we're navigating breathtaking history: Nearly every day could be — maybe will be — a book."


In [61]:
re.findall(r'\w+ea\w+|\w+eo\w+', text)  # Match words with "ea" or "eo" between at least one word character on each side.

['peaks', 'people', 'realize', 'breathtaking', 'Nearly']
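Because the pattern puts `\w+` on both sides of "ea" and "eo", it only finds words where the pair sits strictly inside the word. That happens to cover every case in this text. As a sketch, assuming a made-up sentence `sample`, a character class with `\w*` would also catch words that begin or end with the pair:

```python
import re

# Hypothetical sentence (not from the exercise): "idea" ends with "ea", "eon" starts with "eo".
sample = "The idea of an eon is real."

inside_only = re.findall(r'\w+ea\w+|\w+eo\w+', sample)  # needs characters on both sides
anywhere = re.findall(r'\w*e[ao]\w*', sample)           # the pair may sit at either edge

print(inside_only)
print(anywhere)
```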

### 9. Use a regular expression to find and extract all the capitalized words in the text below individually.

In [62]:
text = "Teddy Roosevelt and Abraham Lincoln walk into a bar."

In [63]:
re.findall(r'[A-Z]\w*', text)  # Match an uppercase letter followed by the rest of the word.

['Teddy', 'Roosevelt', 'Abraham', 'Lincoln']

### 10. Use a regular expression to find and extract all the sets of consecutive capitalized words in the text above.

In [66]:
re.findall(r'[A-Z]\w+(?:\s[A-Z]\w+)+', text)
# Match a capitalized word followed by one or more further capitalized words, each preceded by a whitespace character.

['Teddy Roosevelt', 'Abraham Lincoln']
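For "consecutive capitalized words", every word in the run has to start with an uppercase letter, not just the first. A sketch of why that matters, using a made-up sentence `sample` that is not part of the exercise: a pattern that accepts any second word pairs a name with whatever follows it, while a repeated capitalized group keeps only true runs, including runs of three or more words.

```python
import re

# Hypothetical sentence (not from the exercise).
sample = "later, Lincoln walks out with John Quincy Adams."

loose = re.findall(r'[A-Z]\w+\s\w+', sample)              # second word may be lowercase
strict = re.findall(r'[A-Z]\w+(?:\s[A-Z]\w+)+', sample)   # every word must be capitalized

print(loose)
print(strict)
```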

### 11. Use a regular expression to find and extract all the quotes from the text below.

*Hint: This one is a little more complex than the single quote example in the lesson because there are multiple quotes in the text.*

In [77]:
text = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'
re.findall(r'"(.*?)"', text)
# The lazy quantifier .*? stops at the nearest closing quote, so each quotation is captured separately.

['I will bet you $50 I can get the bartender to give me a free drink.',
 'I am in!']
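The hint about multiple quotes comes down to greedy versus lazy matching. A short sketch on the same text compares the two, along with a negated character class as one possible alternative:

```python
import re

text = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'

greedy = re.findall(r'"(.*)"', text)       # .* runs on to the last quote in the string
lazy = re.findall(r'"(.*?)"', text)        # .*? stops at the nearest closing quote
negated = re.findall(r'"([^"]*)"', text)   # anything except a quote character

print(len(greedy))
print(lazy)
print(negated == lazy)
```

The greedy version returns a single item that spans both quotations, which is why the lazy form (or the negated class) is needed here.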

### 12. Use a regular expression to find and extract all the numbers from the text below.

In [80]:
text = "There were 30 students in the class. Of the 30 students, 14 were male and 16 were female. Only 10 students got A's on the exam."


In [83]:
re.findall(r'\d+', text)  # Match every run of one or more digits.

['30', '30', '14', '16', '10']
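`re.findall` returns the numbers as strings. If they are needed for arithmetic, one possible follow-up (a sketch, not something the exercise requires) is to convert them with `int`:

```python
import re

text = "There were 30 students in the class. Of the 30 students, 14 were male and 16 were female. Only 10 students got A's on the exam."

# re.findall returns strings, so convert before doing arithmetic.
numbers = [int(n) for n in re.findall(r'\d+', text)]

print(numbers)
print(numbers[2] + numbers[3])  # male count plus female count
```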

### 13. Use a regular expression to find and extract all the social security numbers from the text below.

In [100]:
text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

In [101]:
re.findall(r'[0-9]{3}-[0-9]{2}-[0-9]{4}', text)
# Match three digits, a hyphen, two digits, a hyphen, and four digits.

['876-93-2289', '098-32-5295']

### 14. Use a regular expression to find and extract all the phone numbers from the text below.

In [104]:
re.findall(r'\(?[0-9]{3}\)?-?[0-9]{3}-[0-9]{4}', text)
# Match an optional "(", three digits, an optional ")", an optional hyphen, three digits, a hyphen, and four digits.


['(847)789-0984', '(987)222-0901']

### 15. Use a regular expression to find and extract all the formatted numbers (both social security and phone) from the text below.

In [106]:
re.findall(r'\(?[0-9]{3}\)?-?[0-9]{3}-[0-9]{4}|[0-9]{3}-[0-9]{2}-[0-9]{4}', text)
# Same patterns as in the last two exercises, combined with the alternation operator | ("or").

['876-93-2289', '(847)789-0984', '098-32-5295', '(987)222-0901']
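When two formats are combined with `|`, the result list no longer says which alternative produced each match. One way to keep that information, sketched here with named groups, is to read `lastgroup` from each match object. The group names `ssn` and `phone` are labels chosen for this illustration:

```python
import re

text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

# Each alternative sits in its own named group; only one group takes part in a given match.
pattern = re.compile(
    r'(?P<ssn>\d{3}-\d{2}-\d{4})'
    r'|(?P<phone>\(?\d{3}\)?-?\d{3}-\d{4})'
)

# m.lastgroup is the name of the group that matched; m.group() is the full matched text.
labelled = [(m.lastgroup, m.group()) for m in pattern.finditer(text)]

for kind, value in labelled:
    print(kind, value)
```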