# Study Questions for Regex Quiz 

https://regexone.com/references/python

## The basics:

Completed assigned and supplemental questions in RegexOne 
https://regexone.com/problem/matching_phone_numbers?

## Regex in Python

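Raw strings = r' ' (the r prefix stops Python from processing backslashes, so sequences like \d and \b reach the regex engine unchanged)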
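
A small sketch of why the r prefix matters. The sample strings and variable names here are made up for illustration; the point is that "\b" in a normal string is a backspace character, while r"\b" is the regex word boundary.

```python
import re

# In a normal string, "\b" is one backspace character.
plain = "\bcat\b"
# In a raw string, "\b" stays as backslash + b, which the regex engine
# reads as a word boundary.
raw = r"\bcat\b"

print(len(plain), len(raw))  # 5 7

plain_match = re.search(plain, "a cat sat")  # looks for literal backspaces
raw_match = re.search(raw, "a cat sat")      # looks for the word "cat"

print(plain_match)        # None
print(raw_match.group())  # cat
```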

## Matching a String

re.search()
* if no match, will return None
* if match, will return a re.Match object with info about the part of the string that matched
* NOTE: stops after the first match

Sample: matchObject = re.search(pattern, input_str, flags=0)
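
A quick sketch of the two return cases and the "first match only" behavior. The input strings are invented for this example.

```python
import re

# Three numbers in the string, but re.search() stops at the first one.
first = re.search(r"\d+", "rooms 12, 34 and 56")
print(first.group())  # 12

# No digits at all, so re.search() returns None.
missing = re.search(r"\d+", "no digits here")
print(missing)  # None
```

Because the no-match result is None, check the result before calling methods such as group() on it.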

#### start() and end() method

In [5]:
import re

regex = r"(\w+) (\d+)"
match = re.search(regex, "June 24")
if match:
    # The match object's start() and end() methods --> show where it matches
    # prints 0, 7 (the span [0, 7)) b/c the pattern matches the whole string
    print("Match at index %s, %s" % (match.start(), match.end()))
    print("Match from", match.start(), "and", match.end())
else:
    # If re.search() does not match, then None is returned
    print("The regex pattern does not match. :(")

Match at index 0, 7
Match from 0 and 7


#### group() method

* match.group(0) always returns the fully matched string
* match.group(1), match.group(2), ... return the capture groups, numbered from left to right by their opening parenthesis in the pattern
* match.group() is equivalent to match.group(0)

In [7]:
regex = r"(\w+) (\d+)"
match = re.search(regex, "June 24")
if match:
    # this will print the entire string: "June 24"
    print("Full match: %s" % (match.group(0)))
    # again:
    print("Full match:", match.group())
    # this will print the first capture group: "June"
    print("Month: %s" % (match.group(1)))
    # this will print the second capture group: "24"
    print("Day: %s" % (match.group(2)))

Full match: June 24
Full match: June 24
Month: June
Day: 24


## Capturing groups

re.findall()

* global search over ENTIRE string
* if there is one capture group in pattern: returns a list of the captured strings
* if there are several capture groups: returns a list of tuples, one tuple per match
* if no capture groups: returns a list of matches
* if no match: returns empty list

Sample: matchList = re.findall(pattern, input_str, flags=0)   

In [8]:
import re

# no capture group, just match
regex = r"\w+ \d+"
matches = re.findall(regex, "June 24, August 9, Dec 12")
for match in matches:
    # This will print:
    #   June 24
    #   August 9
    #   Dec 12
    print("Full match:", match)

# capture the specific months of each date
regex = r"(\w+) \d+"
matches = re.findall(regex, "June 24, August 9, Dec 12")
for match in matches:
    # This will now print:
    #   June
    #   August
    #   Dec
    print("Match month:", match)

Full match: June 24
Full match: August 9
Full match: Dec 12
Match month: June
Match month: August
Match month: Dec
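
The examples above use zero or one capture group. As a sketch of the multi-group case, the same date string with two groups gives a list of tuples (variable names here are just for illustration):

```python
import re

# Two capture groups: each match becomes a (month, day) tuple.
regex = r"(\w+) (\d+)"
pairs = re.findall(regex, "June 24, August 9, Dec 12")
print(pairs)  # [('June', '24'), ('August', '9'), ('Dec', '12')]

# Tuples unpack directly in the loop.
for month, day in pairs:
    print("Month:", month, "Day:", day)
```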


re.finditer()

* returns an iterator of re.Match objects to walk through

Sample: matchIter = re.finditer(pattern, input_str, flags=0)

In [12]:
# If we need the exact positions of each match
regex = r"([a-zA-Z]+) \d+"
matches = re.finditer(regex, "Ju9ne 24, August 9, Dec 12")
for match in matches:
    # This will now print:
    #   3, 8
    #   10, 18
    #   20, 26
    # which corresponds with the start and end of each match in the input string
    # (the first match starts at 3 because the 9 in "Ju9ne" breaks the run of letters)
    print("Match at index: %s, %s" % (match.start(), match.end()))

Match at index: 3, 8
Match at index: 10, 18
Match at index: 20, 26


## Finding and replacing strings

re.sub()

* optional count argument --> maximum # of replacements
* if count is 0 (the default) --> every match in string is replaced!

Sample: replacedString = re.sub(pattern, replacement_pattern, input_str, count=0, flags=0)

In [11]:
import re

# Reverse the order of the day and month in a date string
regex = r"(\w+) (\d+)"

# This will reorder the string and print:
#   24 of June, 9 of August, 12 of Dec
print(re.sub(regex, r"\2 of \1", "June 24, August 9, Dec 12"))

24 of June, 9 of August, 12 of Dec
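
A short sketch of the count argument, reusing the same date string. Passing count as a keyword keeps it from being confused with flags.

```python
import re

regex = r"(\w+) (\d+)"
text = "June 24, August 9, Dec 12"

# count=1 --> only the first match is replaced
first_only = re.sub(regex, r"\2 of \1", text, count=1)
print(first_only)  # 24 of June, August 9, Dec 12

# count=0 (the default) --> every match is replaced
all_dates = re.sub(regex, r"\2 of \1", text, count=0)
print(all_dates)  # 24 of June, 9 of August, 12 of Dec
```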


## re Flags

The regex functions on this page accept optional flags. Combine several flags with | (for example, re.IGNORECASE | re.MULTILINE).

* re.IGNORECASE --> makes the pattern case insensitive
* re.MULTILINE --> use it when your input string has newlines (\n) and you want to anchor on each line
    * start and end (^ and $) then match at the beginning and end of each line, NOT only at the beginning and end of the whole input string
* re.DOTALL --> dot (.) can match all characters, including the newline character (\n)
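
A sketch showing each flag switched off and on. The two-line sample text is made up for this example.

```python
import re

text = "first line\nSecond line"

# re.IGNORECASE: "second" also matches "Second"
no_flag_case = re.findall(r"second", text)
ignore_case = re.findall(r"second", text, flags=re.IGNORECASE)
print(no_flag_case, ignore_case)  # [] ['Second']

# re.MULTILINE: ^ matches at the start of every line
no_flag_lines = re.findall(r"^\w+", text)
multi_lines = re.findall(r"^\w+", text, flags=re.MULTILINE)
print(no_flag_lines, multi_lines)  # ['first'] ['first', 'Second']

# re.DOTALL: the dot can match the newline between the two lines
no_flag_dot = re.search(r"line.Second", text)
dot_all = re.search(r"line.Second", text, flags=re.DOTALL)
print(no_flag_dot)           # None
print(repr(dot_all.group())) # 'line\nSecond'
```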

## Compiling a pattern for performance

* Compile a regular expression pattern once if you will apply it to a lot of input strings
* The returned pattern object has the same methods as the functions above (search, findall, finditer, sub), BUT you don't pass the pattern/flags on each call

Sample: regexObject = re.compile(pattern, flags=0)

In [13]:
import re

regex = re.compile(r"(\w+) World")
result = regex.search("Hello World is the easiest")
if result:
    # This will print:
    #   0 11
    print(result.start(), result.end())
    # This will print:
    #   Hello
    print(result.group(1))
    print(result.group(0))
    

# This will print:
#   Hello
#   Bonjour
# for each of the captured groups that matched
for result in regex.findall("Hello World, Bonjour World"):
    print(result)

# This will substitute "World" with "Earth" and print:
#   Hello Earth
print(regex.sub(r"\1 Earth", "Hello World"))

0 11
Hello
Hello World
Hello
Bonjour
Hello Earth
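
A sketch of the "lot of input strings" case: the pattern and flag are given once to re.compile() and then reused in a loop. The list of lines and the variable names are invented for this example.

```python
import re

# The flag is passed once, at compile time, not on every call.
date_regex = re.compile(r"([a-z]+) (\d+)", flags=re.IGNORECASE)

lines = ["June 24", "no date", "Dec 12"]
months = []
for line in lines:
    match = date_regex.search(line)
    if match:  # search() returns None for "no date"
        months.append(match.group(1))

print(months)  # ['June', 'Dec']
```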
