# Quantifiers

This chapter continues coverage of the regex special characters, focusing on those that are used to match repeated characters. These metacharacters are called **quantifiers**, because they quantify the amount of repetition. The four quantifiers are `*`, `+`, `?`, and `{}`.

## The asterisk metacharacter `*`

The **asterisk** or **star** metacharacter matches the previous character **zero** or more times. For instance, the regex `'Ah* No'` looks for strings that contain an uppercase `'A'` followed by zero or more lowercase `'h'` characters followed by `' No'`. Let's see how this works on a Series of sample data:

In [None]:
import re
import pandas as pd

def find_pattern(s, pattern, **kwargs):
    filt = s.str.contains(pattern, **kwargs)
    return s[filt]

s = pd.Series(['Ouch', 'Ah No', 'Ahh', 'Nooo', 'Ahhhhhhh No', 'A No', 'A'])
s

Let's use the `find_pattern` function to filter for the matching values.

In [None]:
find_pattern(s, r'Ah* No')

Without the `' No'` at the end, two more values (`'Ahh'` and `'A'`) are matched.

In [None]:
pattern = r'Ah*'
find_pattern(s, pattern)
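
To see which part of each string the pattern actually consumes, here is a minimal sketch that applies the standard library's `re.search` to the same values. The names `values` and `matched` are illustrative and not part of the chapter's code.

```python
import re

values = ['Ouch', 'Ah No', 'Ahh', 'Nooo', 'Ahhhhhhh No', 'A No', 'A']

# Map each value to the substring matched by 'Ah*', or None when nothing matches
matched = {}
for value in values:
    m = re.search(r'Ah*', value)
    matched[value] = m.group() if m else None

for value, text in matched.items():
    print(repr(value), '->', repr(text))
```

A lone `'A'` still matches because `h*` is allowed to match zero `'h'` characters, while `'Ahhhhhhh No'` shows that the asterisk takes as many `'h'` characters as are available.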

### Using `.*` to match any number of characters

A common use for the asterisk metacharacter is to place it immediately after the dot to match any character any number of times. This is useful when matching a known start and end of text. Here, we match all movie titles beginning with `B` and ending with `d`. We use `.*` to match any characters, any number of times, between the first and last letters of the title.

In [None]:
movie = pd.read_csv('../data/movie.csv')
title = movie['title']
find_pattern(title, r'^B.*d$').head()
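
The same pattern can be tried without the movie dataset. The sketch below uses a short list of made-up titles (an assumption for illustration, not values taken from `movie.csv`) and plain `re.search`.

```python
import re

# Hypothetical titles for illustration; not taken from movie.csv
sample = ['Bad', 'Braveheart', 'Big Bird', 'A Bird', 'Bd', 'Bird Box']

# '^B' anchors the start, 'd$' anchors the end, '.*' spans whatever lies between
result = [t for t in sample if re.search(r'^B.*d$', t)]
print(result)
```

Note that `'Bd'` qualifies as well, since `.*` is free to match zero characters.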

## The plus sign metacharacter `+`

The **plus sign** metacharacter is very similar to the asterisk, except that it matches **one** or more of the previous character. For the regex `'Ah+ No'`, the `'h'` must appear at least once.

In [None]:
find_pattern(s, r'Ah+ No')
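
The difference between `*` and `+` shows up only when the repeated character is absent. A small sketch comparing the two on the sample values, using `re.search` directly (the list names are illustrative):

```python
import re

values = ['Ouch', 'Ah No', 'Ahh', 'Nooo', 'Ahhhhhhh No', 'A No', 'A']

star = [v for v in values if re.search(r'Ah* No', v)]
plus = [v for v in values if re.search(r'Ah+ No', v)]

# Values accepted by '*' but rejected by '+'
only_star = [v for v in star if v not in plus]
print(only_star)
```

Only `'A No'` separates the two patterns, because it is the one value where `'h'` appears zero times before `' No'`.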

## The question mark metacharacter `?`

The **question mark** metacharacter is similar to both the asterisk and the plus sign, except that it matches the previous character exactly **zero or one** time.

In [None]:
find_pattern(s, r'Ah? No')

The following regex pattern matches movie titles containing `Card` or `Cad` somewhere in them. The `?` metacharacter makes the character before it **optional**.

In [None]:
find_pattern(title, r'Car?d')
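
Because `?` allows at most one occurrence, a doubled letter does not match. The following sketch checks this on a handful of made-up strings (hypothetical values, not drawn from the movie data):

```python
import re

# Hypothetical strings for illustration
words = ['Card', 'Cad', 'Carrd', 'Cd', 'Wild Card', 'Cadillac']

# 'r?' makes a single 'r' optional between 'Ca' and 'd'
hits = [w for w in words if re.search(r'Car?d', w)]
print(hits)
```

`'Carrd'` is rejected because two `'r'` characters exceed the limit of one, and `'Cd'` is rejected because the `?` applies only to the `'r'`, not to the `'a'` before it.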

## The curly braces metacharacter `{m,n}`

The curly braces metacharacter allows you to control the exact number of repetitions of the previous character. There are four different ways to use the curly braces:

* A single integer `a{3}` - matches exactly three `'a'` characters in a row
* A single integer followed by a comma `a{3,}` - matches three or more `'a'` characters in a row
* A comma followed by a single integer `a{,3}` - matches zero to three `'a'` characters in a row
* Two integers separated by a comma `a{3,5}` - matches between 3 and 5 `'a'` characters in a row
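
The four forms above can be compared side by side. This sketch uses `re.fullmatch`, which requires the whole string to match, to record how many consecutive `'a'` characters (from 0 through 6) each form accepts. The names `forms` and `table` are illustrative.

```python
import re

forms = [r'a{3}', r'a{3,}', r'a{,3}', r'a{3,5}']

table = {}
for pattern in forms:
    # Try strings of 0 through 6 'a' characters and keep the lengths that match
    table[pattern] = [n for n in range(7) if re.fullmatch(pattern, 'a' * n)]

for pattern, lengths in table.items():
    print(pattern, lengths)
```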

Let's create another Series by hand and match all the values that contain an `'A'`, followed by the letter `'h'` repeated between 2 and 5 times, followed by `' No'`.

In [None]:
s = pd.Series(['Ouch', 'Ahhh No', 'Ahh No', 'Nooo', 'Ahhhhhhh No', 
               'A No', 'A', 'Ahhh'])
s

In [None]:
pattern = r'Ah{2,5} No'
find_pattern(s, pattern)
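
The value `'Ahhhhhhh No'` is excluded only because the pattern starts with `'A'`, which pins down where the run of `'h'` characters must begin. Since `str.contains` searches anywhere in the string, dropping the `'A'` changes the result. A sketch with `re.search`, which searches the same way (the list names are illustrative):

```python
import re

values = ['Ouch', 'Ahhh No', 'Ahh No', 'Nooo', 'Ahhhhhhh No',
          'A No', 'A', 'Ahhh']

# With the leading 'A', the run of h's must start right after it
with_a = [v for v in values if re.search(r'Ah{2,5} No', v)]

# Without it, the last five h's of a longer run are enough to match
without_a = [v for v in values if re.search(r'h{2,5} No', v)]

print(with_a)
print(without_a)
```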

## Exercises
Use the `title` column of the `movie` DataFrame for these exercises.

### Exercise 1

<span style="color:green; font-size:16px">Find all movies that have `'z'` as their 15th character.</span>

### Exercise 2

<span style="color:green; font-size:16px">Find all movies that have the word `'Boy'` or `'Boys'` in them followed by a space.</span>

### Exercise 3

<span style="color:green; font-size:16px">Find all movies that have between 40 and 43 characters in them. Can you verify the results with another `str` accessor method?</span>

### Exercise 4

<span style="color:green; font-size:16px">Find all movies that begin with 'The' and end in 'Movie'.</span>

### Exercise 5

<span style="color:green; font-size:16px">Find all movies that begin with 'The' and end in 'Movie' and have no more than 10 characters between these two words.</span>

### Exercise 6

<span style="color:green; font-size:16px">Find all movies that begin with 'The' and end in 'Movie' and have at least 30 characters between these two words.</span>

### Exercise 7

<span style="color:green; font-size:16px">Find all movies that begin with capital `G` followed by at least one `o`, followed by a `d`.</span>

### Exercise 8

<span style="color:green; font-size:16px">Find all movies that have either `Free` or `Fee` in them.</span>

### Exercise 9

<span style="color:green; font-size:16px">Find all movies that begin with any five characters followed by a space, followed by a `'t'` in either upper or lower case.</span>