# Quantifiers

A quantifier metacharacter immediately follows a portion of a regular expression and specifies how many times that portion must occur for the match to succeed.

## *

Matches zero or more repetitions of the preceding regex.
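A minimal sketch of *, using a made-up test string (not from the original notebook). Because b* also accepts zero b's, a bare "a" still counts as a match:

```python
import re

# b* allows zero or more b's after the a, so a lone "a" also matches
star_matches = re.findall(r"ab*", "a ab abbb")
print(star_matches)  # ['a', 'ab', 'abbb']
```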

## +

Matches one or more repetitions of the preceding regex.
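The same made-up string with + instead of * shows the difference: at least one b is now required, so the lone "a" drops out.

```python
import re

# b+ requires one or more b's after the a
plus_matches = re.findall(r"ab+", "a ab abbb")
print(plus_matches)  # ['ab', 'abbb']
```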

## ?

Matches zero or one repetitions of the preceding regex.
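A short sketch of ?, again with an illustrative string of our own: u? makes the u optional, but two u's are more than ? allows.

```python
import re

# u? matches the u zero times or once, never twice
optional_matches = re.findall(r"colou?r", "color colour colouur")
print(optional_matches)  # ['color', 'colour']
```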

In [2]:
import re

test_string = "What stores are open (Labor Day)? (Lowe's, Walmart, Target, \
Kohl's and JCPenney) are, but (Costco) is closed Monday."

In [8]:
re.findall(r'\w+[t]', test_string)

['What', 'st', 'Walmart', 'Target', 'but', 'Cost']

In [14]:
re.findall(r'\w+t', test_string)

['What', 'st', 'Walmart', 'Target', 'but', 'Cost']
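The 'st' and 'Cost' results above are fragments of "stores" and "Costco": \w+ gives characters back until a t can follow it. A possible refinement, sketched on the same test string, adds the word-boundary assertion \b so that only whole words ending in t are kept. The raw string (r"...") matters here, since in an ordinary Python string \b is a backspace character.

```python
import re

test_string = "What stores are open (Labor Day)? (Lowe's, Walmart, Target, \
Kohl's and JCPenney) are, but (Costco) is closed Monday."

# \b requires the t to be the last character of a word,
# so 'st' (from "stores") and 'Cost' (from "Costco") no longer match
whole_words = re.findall(r"\w+t\b", test_string)
print(whole_words)  # ['What', 'Walmart', 'Target', 'but']
```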

# Anchors

# Escape Character

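The backslash \ removes the special meaning of a metacharacter, so the character that follows it is matched literally.

The special characters are:

. (dot), ^ (caret), $ (dollar sign), * (asterisk), + (plus sign), ? (question mark), { } (curly braces), [ ] (square brackets), \ (backslash), | (pipe), and ( ) (parentheses).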
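A small sketch of escaping, using an invented sample string. An unescaped dot is a wildcard, while an escaped dot, dollar sign, or parenthesis matches only that literal character. The standard-library function re.escape inserts the backslashes automatically.

```python
import re

text = "Code 5x00 costs $5.00 at (Costco)."

# Unescaped, the dot is a wildcard, so it also matches the 'x'
wildcard = re.findall(r"5.00", text)
# Escaped, \. matches only a literal period
literal = re.findall(r"5\.00", text)
# Escaped parentheses and $ are treated as ordinary characters
parens = re.findall(r"\(\w+\)", text)
price = re.findall(r"\$\d+\.\d+", text)
# re.escape backslash-escapes the special characters in a string
escaped = re.escape("$5.00")

print(wildcard, literal, parens, price, escaped)
```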

# Look-behind and Look-ahead

To capture the parenthetical text, we’ll use look-around assertions, so the first step is to describe the pattern we are searching for:

1. text after an opening paren

2. text before a closing paren

3. “text” can be any character

Let’s tackle the opening paren first. A look-around assertion checks the text around the current position without including that text in the match. To describe content that begins after an opening paren, we’ll use a look-behind.

In [1]:
# A look-behind is written (?<= + pattern + ): it matches only where pattern appears just before the current position

look_behind = r"(?<=\()"
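A minimal sketch of the look-behind used on its own, with a short sample string of our own. The "(" must be present for a match, yet it does not appear in the result. In Python's re module the pattern inside a look-behind must have a fixed width, which a single escaped paren does.

```python
import re

look_behind = r"(?<=\()"
s = "Stores are open (Labor Day)."

# Match a run of word characters that starts right after "("
after_paren = re.findall(look_behind + r"\w+", s)
print(after_paren)  # ['Labor']
```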

Let’s take a similar approach for the closing paren. We'll use a look-ahead to describe content that ends before a closing paren.

In [None]:
# A look-ahead is written (?= + pattern + ): it matches only where pattern appears just after the current position

look_ahead = r"(?=\))"
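The mirror-image sketch for the look-ahead, on the same illustrative string: the match must be followed by ")", but the paren itself is not consumed.

```python
import re

look_ahead = r"(?=\))"
s = "Stores are open (Labor Day)."

# Match a run of word characters that ends right before ")"
before_paren = re.findall(r"\w+" + look_ahead, s)
print(before_paren)  # ['Day']
```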

In [15]:
test_string = "What are the 5 most popular places in Ann Arbor, Michigan? They are (Michigan Stadium), \
(University of Michigan), (Matthaei Botanical Gardens), (Law Quadrangle) and (Gallup Park)."

In [18]:
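# We now need a way to describe the “text” between the two assertions.
# In regular expressions, the period ( . , also called "dot") is the wildcard that
# matches any single character except a newline. The + repeats it one or more times,
# and the ? after it makes the repetition lazy, so each match stops at the first
# closing paren rather than running on to the last one.

pattern = r"(?<=\()(.+?)(?=\))"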

for item in re.finditer(pattern, test_string):
    print(item.group())

Michigan Stadium
University of Michigan
Matthaei Botanical Gardens
Law Quadrangle
Gallup Park


In [19]:
re.findall(pattern, test_string)

['Michigan Stadium',
 'University of Michigan',
 'Matthaei Botanical Gardens',
 'Law Quadrangle',
 'Gallup Park']
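To see why the lazy +? matters, here is a sketch comparing it with the greedy + on a shortened version of the test string. The greedy version runs from the first "(" to the last ")" and swallows everything in between.

```python
import re

s = "(Michigan Stadium), (Gallup Park)."

# Greedy: .+ grabs as much as possible, then backs off to the last ")"
greedy = re.findall(r"(?<=\().+(?=\))", s)
# Lazy: .+? grabs as little as possible, stopping at the first ")"
lazy = re.findall(r"(?<=\().+?(?=\))", s)

print(greedy)  # ['Michigan Stadium), (Gallup Park']
print(lazy)    # ['Michigan Stadium', 'Gallup Park']
```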

# Pandas

Series and DataFrame

In [3]:
import numpy as np
import pandas as pd

# Build a list of dictionaries, one per student
students = [{'Name': 'Alice',
             'Class': 'Physics',
             'Score': 85},
            {'Name': 'Jack',
             'Class': 'Chemistry',
             'Score': 82},
            {'Name': 'Helen',
             'Class': 'Biology',
             'Score': 90}]

# Pass the list of dictionaries to the DataFrame constructor;
# note that the index label 'school1' is used twice
df = pd.DataFrame(students, index=['school1', 'school2', 'school1'])
# Display the first rows
df.head()

|         | Name  | Class     | Score |
|---------|-------|-----------|-------|
| school1 | Alice | Physics   | 85    |
| school2 | Jack  | Chemistry | 82    |
| school1 | Helen | Biology   | 90    |
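Because the index reuses the label 'school1', a lookup by that label is not a single row. A minimal sketch, rebuilding the same DataFrame so the block runs on its own, shows that .loc returns every row carrying the label, as a DataFrame.

```python
import pandas as pd

students = [{'Name': 'Alice', 'Class': 'Physics', 'Score': 85},
            {'Name': 'Jack', 'Class': 'Chemistry', 'Score': 82},
            {'Name': 'Helen', 'Class': 'Biology', 'Score': 90}]
df = pd.DataFrame(students, index=['school1', 'school2', 'school1'])

# Index labels do not have to be unique
print(df.index.is_unique)  # False

# A repeated label selects all matching rows
school1 = df.loc['school1']
print(school1)
```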


In [4]:
# Then we can call .loc on the transpose to get the student names only
df.T.loc['Name']

school1    Alice
school2     Jack
school1    Helen
Name: Name, dtype: object
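The transpose works, but selecting the column directly with df['Name'] gives the same values without building a transposed copy. A sketch on the same data; it also shows a side effect worth knowing: transposing a DataFrame whose columns have mixed types produces object dtype throughout, so the scores read back through .T are no longer stored as integers.

```python
import pandas as pd

students = [{'Name': 'Alice', 'Class': 'Physics', 'Score': 85},
            {'Name': 'Jack', 'Class': 'Chemistry', 'Score': 82},
            {'Name': 'Helen', 'Class': 'Biology', 'Score': 90}]
df = pd.DataFrame(students, index=['school1', 'school2', 'school1'])

names_via_transpose = df.T.loc['Name']
names_direct = df['Name']  # select the column without transposing
print(list(names_direct))  # ['Alice', 'Jack', 'Helen']

# Mixed column types become object dtype after transposing
print(df.T.loc['Score'].dtype)  # object
```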

In [5]:
df.loc[:,['Name', 'Score']]

|         | Name  | Score |
|---------|-------|-------|
| school1 | Alice | 85    |
| school2 | Jack  | 82    |
| school1 | Helen | 90    |
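The : in df.loc[:, [...]] means "all rows". A sketch on the same data showing that plain [] indexing with a list of column names gives an equal result, and that .loc can narrow the rows and columns in one call.

```python
import pandas as pd

students = [{'Name': 'Alice', 'Class': 'Physics', 'Score': 85},
            {'Name': 'Jack', 'Class': 'Chemistry', 'Score': 82},
            {'Name': 'Helen', 'Class': 'Biology', 'Score': 90}]
df = pd.DataFrame(students, index=['school1', 'school2', 'school1'])

# Column selection with [] matches .loc with all rows
cols = df[['Name', 'Score']]
print(cols.equals(df.loc[:, ['Name', 'Score']]))  # True

# .loc with a row label and a column list at once
jack = df.loc['school2', ['Name', 'Score']]
print(jack['Name'], jack['Score'])  # Jack 82
```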


In [6]:
df.drop('school1')

|         | Name | Class     | Score |
|---------|------|-----------|-------|
| school2 | Jack | Chemistry | 82    |
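Both rows labelled 'school1' disappear because drop removes every row with that label. A sketch on the same data confirming that drop returns a new DataFrame and leaves df itself untouched, and that the columns= keyword drops columns by name instead of rows.

```python
import pandas as pd

students = [{'Name': 'Alice', 'Class': 'Physics', 'Score': 85},
            {'Name': 'Jack', 'Class': 'Chemistry', 'Score': 82},
            {'Name': 'Helen', 'Class': 'Biology', 'Score': 90}]
df = pd.DataFrame(students, index=['school1', 'school2', 'school1'])

# drop returns a new DataFrame with every 'school1' row removed
remaining = df.drop('school1')
print(list(remaining['Name']))  # ['Jack']

# The original DataFrame still has all three rows
print(len(df))  # 3

# Drop a column by name instead of a row
no_class = df.drop(columns='Class')
print(list(no_class.columns))  # ['Name', 'Score']
```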
