# Advanced Regular Expressions Lab

Complete the following set of exercises to solidify your knowledge of regular expressions.

In [3]:
import re

* https://www.programiz.com/python-programming/regex

##### Test regex
* https://www.regexpal.com/97161

#### search or match?
* https://docs.python.org/3/library/re.html#matching-vs-searching
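
As a quick illustration of the difference, here is a minimal sketch; the `sample` string is a made-up example, not part of the lab data. `re.match` only succeeds when the pattern matches at the very start of the string, while `re.search` scans the whole string.

```python
import re

sample = "regex lab"  # hypothetical sample string

# re.match only succeeds if the pattern matches at position 0.
match_result = re.match(r"lab", sample)

# re.search scans the string and returns the first match found anywhere.
search_result = re.search(r"lab", sample)

print(match_result)           # None
print(search_result.group())  # lab
```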

#### Regex cheatsheet
* https://www.debuggex.com/cheatsheet/regex/python

In [6]:
# Define a helper that returns every match of a pattern in a given text.
# Note: the default `text=text` is bound once, when this cell runs, so `text` must already exist.
def find_pattern(regex_pattern, text=text):
    return re.findall(regex_pattern, text)
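
Python evaluates default argument values once, when the `def` statement runs, so a default such as `text=text` keeps pointing at whatever string existed at that moment. The sketch below shows the effect; `find_with_default` and the two sample strings are hypothetical names used only for illustration.

```python
import re

text = "first text"

def find_with_default(regex_pattern, text=text):  # hypothetical helper
    # The default was captured when `def` ran, so it is still "first text".
    return re.findall(regex_pattern, text)

text = "second text"  # rebinding the global does not change the default

default_result = find_with_default(r"\w+")         # searches "first text"
explicit_result = find_with_default(r"\w+", text)  # searches "second text"

print(default_result)   # ['first', 'text']
print(explicit_result)  # ['second', 'text']
```

Passing the text explicitly, as most later cells do, avoids relying on the captured default.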

### 1. Use a regular expression to find and extract all vowels in the following text.

In [5]:
text = "This is going to be a sentence with a good number of vowels in it."

In [7]:
vowels = '[aeiouyAEIOUY]'

In [8]:
print(find_pattern(vowels))

['i', 'i', 'o', 'i', 'o', 'e', 'a', 'e', 'e', 'e', 'i', 'a', 'o', 'o', 'u', 'e', 'o', 'o', 'e', 'i', 'i']


In [9]:
# You can also call re.findall directly instead of going through the find_pattern helper.
print(re.findall('[aeiouyAEIOUY]', text))

['i', 'i', 'o', 'i', 'o', 'e', 'a', 'e', 'e', 'e', 'i', 'a', 'o', 'o', 'u', 'e', 'o', 'o', 'e', 'i', 'i']


### 2. Use a regular expression to find and extract all occurrences and forms (singular and plural) of the word "puppy" in the text below.

In [10]:
text = "The puppy saw all the rest of the puppies playing and wanted to join them. I saw this and wanted a puppy of my own!"

In [11]:
# There are several ways of writing this pattern. Use whichever you find most comfortable.
# A group (?:y|ies) expresses the two endings; [y|ies] would match only one character from the set.
pattern = r'pupp(?:y|ies)'

In [361]:
print(find_pattern(pattern, text))

['puppy', 'puppies', 'puppy']


In [362]:
print(re.findall(pattern, text))

['puppy', 'puppies', 'puppy']
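
Square brackets and parentheses are easy to confuse here. A character class matches exactly one character from the listed set (and `|` inside it is a literal pipe), while a group with `|` chooses between whole alternatives. A minimal sketch, using a made-up `sample` string:

```python
import re

sample = "pupps puppy puppies pupp|"  # hypothetical sample string

# Character class: "pupp" followed by exactly ONE of the characters y, |, i, e, s.
class_matches = re.findall(r"pupp[y|ies]", sample)

# Non-capturing group: "pupp" followed by either "y" or "ies".
group_matches = re.findall(r"pupp(?:y|ies)", sample)

print(class_matches)  # ['pupps', 'puppy', 'puppi', 'pupp|']
print(group_matches)  # ['puppy', 'puppies']
```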


### 3. Use a regular expression to find and extract all tenses (present and past) of the word "run" in the text below.

In [363]:
text = "I ran the relay race the only way I knew how to run it."

In [364]:
pattern = r'r[au]n'

In [365]:
print(find_pattern(pattern, text))

['ran', 'run']


In [366]:
print(re.findall(pattern, text))

['ran', 'run']


### 4. Use a regular expression to find and extract all words that begin with the letter "r" from the previous text.

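###### To find a word that begins with 'r', anchor the pattern at a word boundary with `\b`, match the letter 'r', and then match word characters up to the end of the word with `\w*`.

* \b = a word boundary (the position between a word character and a non-word character)
* \w* = zero or more word characters (letters, digits, underscore)
* \S* = zero or more non-whitespace characters; unlike \w*, it also picks up trailing punctuation
* .* = zero or more of anything but newline; it would run past the end of the word

In [367]:
print(re.findall(r'\br\w*', text))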

['ran', 'relay', 'race', 'run']
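
The sentence in this exercise happens to contain no word with an "r" in the middle, so a pattern without a word boundary gives the same answer. The sketch below uses a made-up `sample` sentence to show what changes when such words are present.

```python
import re

sample = "The car ran far from the river"  # hypothetical sample sentence

# Without \b the pattern starts matching at any "r", even mid-word.
without_boundary = re.findall(r"r\w*", sample)

# With \b the "r" must be the first character of a word.
with_boundary = re.findall(r"\br\w*", sample)

print(without_boundary)  # ['r', 'ran', 'r', 'rom', 'river']
print(with_boundary)     # ['ran', 'river']
```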


### 5. Use a regular expression to find and substitute the letter "i" for the exclamation marks in the text below.

In [368]:
text = "Th!s !s a sentence w!th spec!al characters !n !t."

In [369]:
print(re.sub('!', 'i', text))

This is a sentence with special characters in it.


### 6. Use a regular expression to find and extract words longer than 4 characters in the text below.

In [370]:
text = "This sentence has words of varying lengths."

In [371]:
# [a-zA-Z] matches upper- and lower-case letters.
# {min,max} sets how many times the preceding pattern may repeat.
# {5,} matches 5 or more consecutive letters (no non-letter character in between).

pattern = '[a-zA-Z]{5,}'

In [372]:
print(re.findall(pattern, text))

['sentence', 'words', 'varying', 'lengths']
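
The same quantifier syntax also accepts an upper bound. As a sketch, the pattern below finds the short words (2 to 4 letters) in the same sentence; the `\b` anchors are needed so that the first four letters of a longer word are not reported as a match. The variable names are illustrative only.

```python
import re

sentence = "This sentence has words of varying lengths."

# {2,4} allows between 2 and 4 repetitions; \b on both sides forces
# the match to cover a whole word rather than a slice of a longer one.
short_words = re.findall(r"\b[a-zA-Z]{2,4}\b", sentence)

print(short_words)  # ['This', 'has', 'of']
```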


### 7. Use a regular expression to find and extract all occurrences of the letter "b", some letter(s), and then the letter "t" in the sentence below.

In [373]:
text = "I bet the robot couldn't beat the other bot with a bat, but instead it bit me."

In [374]:
# b, then one or more word characters, then t. The first 'bot' comes from inside "robot".
print(re.findall(r'b\w+t', text))

['bet', 'bot', 'beat', 'bot', 'bat', 'but', 'bit']


### 8. Use a regular expression to find and extract all words that contain either "ea" or "eo" in them.

In [14]:
text = "During many of the peaks and troughs of history, the people living it didn't fully realize what was unfolding. But we all know we're navigating breathtaking history: Nearly every day could be — maybe will be — a book."


In [15]:
pattern = r'\w*e[ao]\w*'

In [16]:
re.findall(pattern, text)

['peaks', 'people', 'realize', 'breathtaking', 'Nearly']

### 9. Use a regular expression to find and extract all the capitalized words in the text below individually.

In [378]:
text = "Teddy Roosevelt and Abraham Lincoln walk into a bar."

In [379]:
pattern = r'[A-Z]\w*'

In [380]:
re.findall(pattern, text)

['Teddy', 'Roosevelt', 'Abraham', 'Lincoln']

### 10. Use a regular expression to find and extract all the sets of consecutive capitalized words in the text above.

In [382]:
# https://stackoverflow.com/questions/9525993/get-consecutive-capitalized-words-using-regex

# https://stackoverflow.com/questions/31570699/regex-to-get-consecutive-capitalized-words-with-one-or-more-words-doesnt-work/31570772

pattern = r'[A-Z][a-z]+(?:\s[A-Z][a-z]+)+'

In [383]:
re.findall(pattern, text)

['Teddy Roosevelt', 'Abraham Lincoln']
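
A pattern like this is easier to read when it is split into commented pieces. One possible way, sketched below, is the `re.VERBOSE` flag, which ignores whitespace inside the pattern and allows `#` comments; the name `consecutive_caps` is introduced here only for illustration.

```python
import re

# re.VERBOSE lets the pattern span several lines with inline comments.
consecutive_caps = re.compile(
    r"""
    [A-Z][a-z]+          # first capitalized word
    (?:\s[A-Z][a-z]+)+   # one or more further capitalized words, each after whitespace
    """,
    re.VERBOSE,
)

sentence = "Teddy Roosevelt and Abraham Lincoln walk into a bar."
names = consecutive_caps.findall(sentence)

print(names)  # ['Teddy Roosevelt', 'Abraham Lincoln']
```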

### 11. Use a regular expression to find and extract all the quotes from the text below.

*Hint: This one is a little more complex than the single quote example in the lesson because there are multiple quotes in the text.*

In [398]:
text = 'Roosevelt says to Lincoln, "I will bet you $50 I can get the bartender to give me a free drink." Lincoln says, "I am in!"'


* https://www.regular-expressions.info/brackets.html

In [397]:
print(re.findall(r'"(.*?)"', text))

['I will bet you $50 I can get the bartender to give me a free drink.', 'I am in!']
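
The `?` after `.*` is what keeps the two quotes separate: it makes the quantifier lazy, so it stops at the nearest closing quotation mark instead of the last one. A minimal sketch with a made-up `sample` string:

```python
import re

sample = 'He said "yes" and then "no".'  # hypothetical sample string

# Greedy: .* runs to the LAST quotation mark, swallowing both quotes.
greedy = re.findall(r'"(.*)"', sample)

# Lazy: .*? stops at the NEXT quotation mark, giving one match per quote.
lazy = re.findall(r'"(.*?)"', sample)

print(greedy)  # ['yes" and then "no']
print(lazy)    # ['yes', 'no']
```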


### 12. Use a regular expression to find and extract all the numbers from the text below.

In [386]:
text = "There were 30 students in the class. Of the 30 students, 14 were male and 16 were female. Only 10 students got A's on the exam."


In [387]:
print(re.findall(r'\d+', text))

['30', '30', '14', '16', '10']


### 13. Use a regular expression to find and extract all the social security numbers from the text below.

In [388]:
text = """
Henry's social security number is 876-93-2289 and his phone number is (847)789-0984.
Darlene's social security number is 098-32-5295 and her phone number is (987)222-0901.
"""

In [389]:
print(re.findall(r'\d{3}-\d{2}-\d{4}', text))

['876-93-2289', '098-32-5295']


### 14. Use a regular expression to find and extract all the phone numbers from the text above.

In [390]:
print(re.findall(r'\(\d{3}\)\d{3}-\d{4}', text))

['(847)789-0984', '(987)222-0901']


### 15. Use a regular expression to find and extract all the formatted numbers (both social security and phone) from the text above.

In [391]:
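# Alternation of the two formats, so the opening parenthesis of each phone number is kept.
print(re.findall(r'\d{3}-\d{2}-\d{4}|\(\d{3}\)\d{3}-\d{4}', text))

['876-93-2289', '(847)789-0984', '098-32-5295', '(987)222-0901']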
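
When two formats are extracted in one pass, it can help to know which format each match belongs to. One possible approach, sketched below, uses named groups with `re.finditer` and the match object's `lastgroup` attribute; the group names `ssn` and `phone` and the shortened `sample` string are assumptions made for this illustration.

```python
import re

sample = "SSN 876-93-2289, phone (847)789-0984."  # shortened sample

# Each alternative sits in its own named group.
number_pattern = re.compile(
    r"(?P<ssn>\d{3}-\d{2}-\d{4})|(?P<phone>\(\d{3}\)\d{3}-\d{4})"
)

# m.lastgroup is the name of the group that matched; m.group() is the full match.
labelled = [(m.lastgroup, m.group()) for m in number_pattern.finditer(sample)]

print(labelled)  # [('ssn', '876-93-2289'), ('phone', '(847)789-0984')]
```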
