![Ironhack logo](https://i.imgur.com/1QgrNNw.png)

# Lab | Advanced Regular Expressions

## Introduction

In the Advanced Regular Expressions lesson, you learned about the various components of regular expressions as well as how to use them both in isolation and together with other components.

In this lab, you will practice putting together your own regular expressions from scratch. Some of the examples are similar to the ones we went over in the lesson, while others have slight modifications included to test your knowledge and ensure that you grasp the concepts covered.

## Getting Started

This lab contains 15 exercises. If you get stuck on one, you can skip to the next. Read each instruction carefully and provide your answer beneath it.

## Resources

- [Regular Expression Operations | Python Documentation](https://docs.python.org/3/library/re.html)
- [Regular Expression How To | Python Documentation](https://docs.python.org/3/howto/regex.html)
- [Python - Regular Expressions | TutorialsPoint](https://www.tutorialspoint.com/python/python_reg_expressions.htm)

# Advanced Regular Expressions Lab

Complete the following set of exercises to solidify your knowledge of regular expressions.

In [1]:
import re

### 1. Use a regular expression to find and extract all vowels in the following text.

In [3]:
text = "This is going to be a sentence with a good number of vowels in it."

In [12]:
pattern='[aeiou]'
re.findall(pattern,text)

['i',
 'i',
 'o',
 'i',
 'o',
 'e',
 'a',
 'e',
 'e',
 'e',
 'i',
 'a',
 'o',
 'o',
 'u',
 'e',
 'o',
 'o',
 'e',
 'i',
 'i']
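The character class `[aeiou]` only matches lowercase vowels. As a minimal sketch, assuming uppercase vowels should also count, the `re.IGNORECASE` flag can be passed to `re.findall`. The `sample` sentence here is hypothetical and not part of the lab data.

```python
import re

# Hypothetical sample sentence, chosen because it contains uppercase vowels.
sample = "An Owl ate an Egg."

# Without a flag, only lowercase vowels are matched.
lowercase_only = re.findall(r"[aeiou]", sample)

# With re.IGNORECASE, the same class also matches A, E, I, O, U.
any_case = re.findall(r"[aeiou]", sample, flags=re.IGNORECASE)

print(lowercase_only)  # ['a', 'e', 'a']
print(any_case)        # ['A', 'O', 'a', 'e', 'a', 'E']
```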

### 2. Use a regular expression to find and extract all occurrences and forms (singular and plural) of the word "puppy" in the text below.

In [10]:
text = "The puppy saw all the rest of the puppies playing and wanted to join them. I saw this and wanted a puppy of my own!"

In [15]:
pattern='pupp[yi]e?s?'
re.findall(pattern,text)

['puppy', 'puppies', 'puppy']
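The pattern `pupp[yi]e?s?` works on this text, but it is permissive: it would also accept strings such as "puppys". One possible stricter variant, sketched here on a hypothetical `sample` sentence, spells out the two valid endings with a non-capturing group and adds word boundaries.

```python
import re

# Hypothetical sentence containing a misspelling, not part of the lab data.
sample = "One puppy, two puppies, and a misspelled puppys."

# Permissive pattern: optional pieces can combine into invalid spellings.
loose = re.findall(r"pupp[yi]e?s?", sample)

# Stricter sketch: exactly "puppy" or "puppies", delimited by word boundaries.
strict = re.findall(r"\bpupp(?:y|ies)\b", sample)

print(loose)   # ['puppy', 'puppies', 'puppys']
print(strict)  # ['puppy', 'puppies']
```

The group is written as `(?:...)` so that `re.findall` returns the whole match rather than only the group contents.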

### 3. Use a regular expression to find and extract all tenses (present and past) of the word "run" in the text below.

In [70]:
text = "I ran the relay race the only way I knew how to run it."

In [71]:
pattern = 'r[ua]n'
re.findall(pattern,text)

['ran', 'run']

### 4. Use a regular expression to find and extract all words that begin with the letter "r" from the previous text.

In [72]:
#Using for:
lista=[]
a=text.split()
for i in a:
    if i.startswith('r'):
        lista.append(i)
lista

['ran', 'relay', 'race', 'run']

In [78]:
# Using a regular expression (\b anchors the match at the start of a word):
pattern = r'\br\w*'
re.findall(pattern,text)

['ran', 'relay', 'race', 'run']
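For "words that begin with r", a word boundary matters: a pattern such as `r\w+` can also start matching at an "r" in the middle of a word. The lab text happens to have no such words, so the difference is illustrated below on a hypothetical `sample` sentence.

```python
import re

# Hypothetical sentence with "r" both at the start and inside words.
sample = "The rare bird ran far."

# Without a boundary, the match can begin at any "r", e.g. inside "bird".
unanchored = re.findall(r"r\w+", sample)

# With \b, the "r" must sit at the start of a word.
anchored = re.findall(r"\br\w*", sample)

print(unanchored)  # ['rare', 'rd', 'ran']
print(anchored)    # ['rare', 'ran']
```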

### 5. Use a regular expression to find and substitute the letter "i" for the exclamation marks in the text below.

In [51]:
text = "Th!s !s a sentence w!th spec!al characters !n !t."
re.sub('!','i',text)

'This is a sentence with special characters in it.'
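As a small optional check, `re.subn` performs the same substitution as `re.sub` and also returns how many replacements were made. The variable names `fixed` and `n_replaced` are just illustrative.

```python
import re

text = "Th!s !s a sentence w!th spec!al characters !n !t."

# re.subn returns a tuple: (new_string, number_of_substitutions).
fixed, n_replaced = re.subn(r"!", "i", text)

print(fixed)       # This is a sentence with special characters in it.
print(n_replaced)  # 6
```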

### 6. Use a regular expression to find and extract words longer than 4 characters in the text below.

In [85]:
text = "This sentence has words of varying lengths."

In [89]:
re.findall(r'\w{5,}', text)

['sentence', 'words', 'varying', 'lengths']

### 7. Use a regular expression to find and extract all occurrences of the letter "b", some letter(s), and then the letter "t" in the sentence below.

In [99]:
text = "I bet the robot couldn't beat the other bot with a bat, but instead it bit me."

In [102]:
re.findall(r'b[a-z]+t', text)

['bet', 'bot', 'beat', 'bot', 'bat', 'but', 'bit']

### 8. Use a regular expression to find and extract all words that contain either "ea" or "eo" in them.

In [104]:
text = "During many of the peaks and troughs of history, the people living it didn't fully realize what was unfolding. But we all know we're navigating breathtaking history: Nearly every day could be — maybe will be — a book."

In [127]:
re.findall(r'\w*e[ao]\w*', text)

['peaks', 'people', 'realize', 'breathtaking', 'Nearly']

### 9. Use a regular expression to find and extract all the capitalized words in the text below individually.

In [132]:
text = "Teddy Roosevelt and Abraham Lincoln walk into a bar."

In [146]:
pattern='[A-Z][a-z]+'
re.findall(pattern,text)

['Teddy', 'Roosevelt', 'Abraham', 'Lincoln']

### 10. Use a regular expression to find and extract all the sets of consecutive capitalized words in the text above.

In [152]:
pattern='[A-Z][a-z]+ [A-Z][a-z]+'
re.findall(pattern,text)

['Teddy Roosevelt', 'Abraham Lincoln']
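The two-word pattern fits this text because each name has exactly two parts. As a sketch of one way to generalize it, a repeated non-capturing group can match runs of two or more capitalized words. The `sample` sentence is hypothetical.

```python
import re

# Hypothetical sentence with a three-word name and a lone capitalized word.
sample = "Franklin Delano Roosevelt met Abraham Lincoln in Paris."

# Fixed two-word pattern: truncates the three-word name.
pairs = re.findall(r"[A-Z][a-z]+ [A-Z][a-z]+", sample)

# One capitalized word followed by one or more " Capitalized" repetitions.
runs = re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)+", sample)

print(pairs)  # ['Franklin Delano', 'Abraham Lincoln']
print(runs)   # ['Franklin Delano Roosevelt', 'Abraham Lincoln']
```

"Paris" is skipped by both patterns because it is not followed by another capitalized word.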

### 11. Use a regular expression to find and extract all the quotes from the text below.

*Hint: This one is a little more complex than the single quote example in the lesson because there are multiple quotes in the text.*

In [218]:
text = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'

In [221]:
pattern = r'"(.*?)"'
re.findall(pattern,text)

['I will bet you $50 I can get the bartender to give me a free drink.',
 'I am in!']
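The `?` after `.*` is what keeps each quote separate. The sketch below, on a hypothetical `sample` string, contrasts the greedy and lazy quantifiers and shows a negated character class as an alternative that behaves the same way here.

```python
import re

# Hypothetical string with two separate quotes.
sample = 'She said "yes" and he said "no".'

# Greedy: .* runs to the LAST double quote, merging both quotes into one match.
greedy = re.findall(r'"(.*)"', sample)

# Lazy: .*? stops at the NEXT double quote.
lazy = re.findall(r'"(.*?)"', sample)

# Alternative: match any run of characters that are not a double quote.
negated = re.findall(r'"([^"]*)"', sample)

print(greedy)   # ['yes" and he said "no']
print(lazy)     # ['yes', 'no']
print(negated)  # ['yes', 'no']
```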

### 12. Use a regular expression to find and extract all the numbers from the text below.

In [168]:
text = "There were 30 students in the class. Of the 30 students, 14 were male and 16 were female. Only 10 students got A's on the exam."
pattern = r'\d+'
print(re.findall(pattern, text))

['30', '30', '14', '16', '10']


### 13. Use a regular expression to find and extract all the social security numbers from the text below.

In [169]:
text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

In [171]:
pattern = r'\d{3}-\d{2}-\d{4}'
print(re.findall(pattern, text))

['876-93-2289', '098-32-5295']


### 14. Use a regular expression to find and extract all the phone numbers from the same text.

In [173]:
pattern = r'\(\d{3}\)\d{3}-\d{4}'
print(re.findall(pattern, text))

['(847)789-0984', '(987)222-0901']


### 15. Use a regular expression to find and extract all the formatted numbers (both social security and phone) from the text.

In [175]:
pattern = r'\d{3}-\d{2}-\d{4}|\(\d{3}\)\d{3}-\d{4}'
print(re.findall(pattern, text))

['876-93-2289', '(847)789-0984', '098-32-5295', '(987)222-0901']
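When two formats are combined with `|`, it can be useful to know which alternative matched. One possible approach, sketched below, uses named groups with `re.finditer` and reads `lastgroup` from each match. The group names `ssn` and `phone` and the variable `labelled` are illustrative choices, and `re.VERBOSE` is used only to allow comments inside the pattern.

```python
import re

text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

# re.VERBOSE ignores whitespace in the pattern and allows # comments.
pattern = re.compile(
    r"""
    (?P<ssn>\d{3}-\d{2}-\d{4})         # SSN shape: 3-2-4 digits
    |
    (?P<phone>\(\d{3}\)\d{3}-\d{4})    # phone shape: area code in parentheses
    """,
    re.VERBOSE,
)

# lastgroup holds the name of the group that produced the match.
labelled = [(m.lastgroup, m.group()) for m in pattern.finditer(text)]

for kind, value in labelled:
    print(kind, value)
```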
