<a href="https://colab.research.google.com/github/UIHackyHour/AutomateTheBoringSweigart/blob/main/07-pattern-matching-with-regular-expressions/ABS_Chap_7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Summary

While a computer can search for text quickly, it must be told precisely what to look for. Regular expressions allow you to specify the pattern of characters you are looking for, rather than the exact text itself. In fact, some word processing and spreadsheet applications provide find-and-replace features that allow you to search using regular expressions.

The `re` module that comes with Python lets you compile `Regex` objects. These objects have several methods: `search()` to find a single match, `findall()` to find all matching instances, and `sub()` to do a find-and-replace substitution of text.
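
A minimal sketch of those three methods side by side. The phone-number pattern and the sample text are illustrative choices, not part of the summary above:

```python
import re

# Illustrative pattern: three digits, hyphen, three digits, hyphen, four digits.
phone_regex = re.compile(r'\d{3}-\d{3}-\d{4}')
text = 'Cell: 415-555-9999 Work: 212-555-0000'

# search() returns a Match object for the first hit (or None if nothing matches).
first_match = phone_regex.search(text)
print(first_match.group())   # 415-555-9999

# findall() returns every non-overlapping match as a list of strings.
all_matches = phone_regex.findall(text)
print(all_matches)           # ['415-555-9999', '212-555-0000']

# sub() returns a new string with each match replaced.
redacted = phone_regex.sub('REDACTED', text)
print(redacted)              # Cell: REDACTED Work: REDACTED
```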

# Definitions

* __Regular expressions__: Regular expressions, called regexes for short, describe a pattern of text to search for.

* __Pipe__: The `|` character. It lets a regex match any one of several alternative expressions.

* __Greedy__: By default, Python's regular expressions are greedy: in ambiguous situations they match the longest possible string.
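
The pipe and greedy definitions can be seen in a short sketch. The sample strings here are assumptions picked only to make the behavior visible:

```python
import re

# Pipe: either alternative may match; search() returns whichever occurs first.
hero_regex = re.compile(r'Batman|Tina Fey')
pipe_match = hero_regex.search('Tina Fey and Batman').group()
print(pipe_match)   # Tina Fey

# Greedy (default): {3,5} takes as many repetitions as it can.
greedy = re.compile(r'(Ha){3,5}').search('HaHaHaHaHa').group()
print(greedy)       # HaHaHaHaHa

# Non-greedy: a trailing ? makes {3,5} take as few repetitions as it can.
lazy = re.compile(r'(Ha){3,5}?').search('HaHaHaHaHa').group()
print(lazy)         # HaHaHa
```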



# New functions covered in this chapter

*   `re.compile()` requires `import re`
*   `re.DOTALL` requires `import re`
*   `re.IGNORECASE` requires `import re`
*   `re.VERBOSE` requires `import re`
*   `search()`
*   `group()`
*   `groups()`
*   `findall()`
*   `sub()`



### Try using these functions, then explain what you think these functions are doing. 
#### Google them to learn more! 
(Googling is a very important skill when programming)
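
As one possible starting point for experimenting, the sketch below exercises `group()`, `groups()`, and the three flags. The patterns and sample strings are assumptions chosen for illustration; try changing them and predicting the output before you run the cell.

```python
import re

# Parentheses create groups that group() and groups() can pull apart.
phone_regex = re.compile(r'(\d{3})-(\d{3}-\d{4})')
mo = phone_regex.search('My number is 415-555-4242.')
whole = mo.group()     # the entire match: '415-555-4242'
area = mo.group(1)     # the first group: '415'
parts = mo.groups()    # all groups as a tuple: ('415', '555-4242')
print(whole, area, parts)

# re.IGNORECASE: letters match regardless of case.
robocop = re.compile(r'robocop', re.IGNORECASE).search('ROBOCOP protects').group()
print(robocop)         # ROBOCOP

# re.DOTALL: the dot also matches newline characters.
two_lines = 'Serve the public trust.\nProtect the innocent.'
no_dotall = re.compile(r'.*').search(two_lines).group()
with_dotall = re.compile(r'.*', re.DOTALL).search(two_lines).group()
print(no_dotall)       # Serve the public trust.
print(with_dotall == two_lines)   # True

# re.VERBOSE: whitespace and # comments inside the pattern are ignored.
verbose_regex = re.compile(r'''
    (\d{3})          # area code
    -                # separator
    (\d{3}-\d{4})    # main number
''', re.VERBOSE)
verbose_parts = verbose_regex.search('415-555-4242').groups()
print(verbose_parts)   # ('415', '555-4242')
```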

# Regex cheat sheet

*   `r''` raw string (backslashes are not treated as escape characters)
*   `\d` == `(0|1|2|3|4|5|6|7|8|9)`
*   `\D` != `(0|1|2|3|4|5|6|7|8|9)`
*   `\w` == `[a-zA-Z0-9_]`
*   `\W` != `[a-zA-Z0-9_]`
*   `\s` == `[ \t\n\r\f\v]`
*   `\S` != `[ \t\n\r\f\v]`
*   `\d{3}` == `\d\d\d`
*   `()?` match zero or one (optional)
*   `()*` match zero or more
*   `()+` match one or more
*   `(){3}` match 3 times
*   `(){3,5}` match 3, 4, or 5 (greedy)
*   `(){3,5}?` match 3, 4, or 5 (non-greedy)
*   `[a-zA-Z]` any single uppercase or lowercase letter
*   `.` any single character except a newline
*   `.*` zero or more of any character except a newline (greedy)
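
A quick way to check cheat-sheet entries is to run each pattern through `re.findall()`. This is a sketch only: the sample strings are made up for illustration, and the comment beside each pair shows the list it should produce.

```python
import re

# (pattern, sample text) pairs; the comment shows the findall() result.
samples = [
    (r'\d{3}', 'abc 12345'),          # ['123']
    (r'\D+', '12ab34'),               # ['ab']
    (r'\w+', 'snake_case 42!'),       # ['snake_case', '42']
    (r'\S+', 'two  words'),           # ['two', 'words']
    (r'colou?r', 'color colour'),     # ['color', 'colour']  (? = optional)
    (r'a*b', 'b ab aab'),             # ['b', 'ab', 'aab']   (* = zero or more)
    (r'a+b', 'b ab aab'),             # ['ab', 'aab']        (+ = one or more)
    (r'\d{3,5}', '1234567'),          # ['12345']            (greedy)
    (r'\d{3,5}?', '1234567'),         # ['123', '456']       (non-greedy)
    (r'.+', 'one\ntwo'),              # ['one', 'two']       (dot stops at newline)
]

# Map each pattern to the list of matches it finds in its sample text.
results = {pattern: re.findall(pattern, text) for pattern, text in samples}
for pattern, found in results.items():
    print(pattern, '->', found)
```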

# Code Snippets

In [None]:
# isPhoneNumber.py

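def isPhoneNumber(text):
    # A number like 415-555-4242 is exactly 12 characters long.
    if len(text) != 12:
        return False
    # Characters 0-2: the three-digit area code.
    for i in range(0, 3):
        if not text[i].isdecimal():
            return False
    # Character 3: the first hyphen.
    if text[3] != '-':
        return False
    # Characters 4-6: the next three digits.
    for i in range(4, 7):
        if not text[i].isdecimal():
            return False
    # Character 7: the second hyphen.
    if text[7] != '-':
        return False
    # Characters 8-11: the last four digits.
    for i in range(8, 12):
        if not text[i].isdecimal():
            return False
    return True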

print('Is 415-555-4242 a phone number?')
print(isPhoneNumber('415-555-4242'))
print('Is Moshi moshi a phone number?')
print(isPhoneNumber('Moshi moshi'))

In [None]:
# isPhoneNumber.py with for loop search

def isPhoneNumber(text):
    if len(text) != 12:
        return False
    for i in range(0, 3):
        if not text[i].isdecimal():
            return False
    if text[3] != '-':
        return False
    for i in range(4, 7):
        if not text[i].isdecimal():
            return False
    if text[7] != '-':
        return False
    for i in range(8, 12):
        if not text[i].isdecimal():
            return False
    return True

message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'
# Slide a 12-character window across the message and test each chunk.
for i in range(len(message)):
    chunk = message[i:i+12]
    if isPhoneNumber(chunk):
        print('Phone number found: ' + chunk)
print('Done')
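
For comparison, here is a sketch of how the same search might look with a regular expression instead of the hand-written checks. The names `phone_regex` and `is_phone_number` are introduced here for illustration and do not appear in the cells above.

```python
import re

# One pattern replaces the length check, the digit loops, and the hyphen checks.
phone_regex = re.compile(r'\d{3}-\d{3}-\d{4}')

def is_phone_number(text):
    # fullmatch() succeeds only if the entire string fits the pattern.
    return phone_regex.fullmatch(text) is not None

print(is_phone_number('415-555-4242'))   # True
print(is_phone_number('Moshi moshi'))    # False

# findall() scans the whole message, so no manual 12-character window is needed.
message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'
numbers = phone_regex.findall(message)
for number in numbers:
    print('Phone number found: ' + number)
print('Done')
```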

# Practice Questions

1. What is the function that creates `Regex` objects?
1. Why are raw strings often used when creating `Regex` objects?
1. What does the `search()` method return?
1. How do you get the actual strings that match the pattern from a `Match` object?
1. In the regex created from `r'(\d\d\d)-(\d\d\d-\d\d\d\d)'`, what does group 0 cover? Group 1? Group 2?
1. Parentheses and periods have specific meanings in regular expression syntax. How would you specify that you want a regex to match actual parentheses and period characters?
1. The `findall()` method returns a list of strings or a list of tuples of strings. What makes it return one or the other?
1. What does the `|` character signify in regular expressions?
1. What two things does the `?` character signify in regular expressions?
1. What is the difference between the `+` and `*` characters in regular expressions?
1. What is the difference between `{3}` and `{3,5}` in regular expressions?
1. What do the `\d`, `\w`, and `\s` shorthand character classes signify in regular expressions?
1. What do the `\D`, `\W`, and `\S` shorthand character classes signify in regular expressions?
1. What is the difference between `.*` and `.*?`?
1. What is the character class syntax to match all numbers and lowercase letters?
1. How do you make a regular expression case-insensitive?
1. What does the `.` character normally match? What does it match if `re.DOTALL` is passed as the second argument to `re.compile()`?
1. If `numRegex = re.compile(r'\d+')`, what will `numRegex.sub('X', '12 drummers, 11 pipers, five rings, 3 hens')` return?
1. What does passing `re.VERBOSE` as the second argument to `re.compile()` allow you to do?
1. How would you write a regex that matches a number with commas for every three digits? It must match the following:
* `'42'`
* `'1,234'`
* `'6,368,745'`

but not the following:
* `'12,34,567'` (which has only two digits between the commas)
* `'1234'` (which lacks commas)
21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:

* `'Haruto Watanabe'`
* `'Alice Watanabe'`
* `'RoboCop Watanabe'`

but not the following:

* `'haruto Watanabe'` (where the first name is not capitalized)
* `'Mr. Watanabe'` (where the preceding word has a nonletter character)
* `'Watanabe'` (which has no first name)
* `'Haruto watanabe'` (where Watanabe is not capitalized)
22. How would you write a regex that matches a sentence where the first word is either *Alice*, *Bob*, or *Carol*; the second word is either *eats*, *pets*, or *throws*; the third word is *apples*, *cats*, or *baseballs*; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:

* `'Alice eats apples.'`
* `'Bob pets cats.'`
* `'Carol throws baseballs.'`
* `'Alice throws Apples.'`
* `'BOB EATS CATS.'`

but not the following:

* `'RoboCop eats apples.'`
* `'ALICE THROWS FOOTBALLS.'`
* `'Carol eats 7 cats.'`

# Practice Projects

## Date Detection

Write a regular expression that can detect dates in the DD / MM / YYYY format. Assume that the days range from 01 to 31, the months range from 01 to 12, and the years range from 1000 to 2999. Note that if the day or month is a single digit, it’ll have a leading zero.

The regular expression doesn’t have to detect correct days for each month or for leap years; it will accept nonexistent dates like 31 / 02 / 2020 or 31 / 04 / 2021. Then store these strings into variables named month, day, and year, and write additional code that can detect if it is a valid date. April, June, September, and November have 30 days, February has 28 days, and the rest of the months have 31 days. February has 29 days in leap years. Leap years are every year evenly divisible by 4, except for years evenly divisible by 100, unless the year is also evenly divisible by 400. Note how this calculation makes it impossible to make a reasonably sized regular expression that can detect a valid date.
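
The calendar rules in the paragraph above translate fairly directly into code. The sketch below covers only that part and leaves the regular expression to you; the helper names `is_leap_year` and `days_in_month` are hypothetical, and the standard library's `calendar.isleap()` is used purely as a cross-check.

```python
import calendar

def is_leap_year(year):
    # Divisible by 400: always a leap year.
    if year % 400 == 0:
        return True
    # Other years divisible by 100: not a leap year.
    if year % 100 == 0:
        return False
    # Everything else: a leap year only if divisible by 4.
    return year % 4 == 0

def days_in_month(month, year):
    # February depends on whether the year is a leap year.
    if month == 2:
        return 29 if is_leap_year(year) else 28
    # April, June, September, and November have 30 days.
    if month in (4, 6, 9, 11):
        return 30
    return 31

# Cross-check against the standard library over the project's year range.
agrees = all(is_leap_year(y) == calendar.isleap(y) for y in range(1000, 3000))
print(agrees)                    # True
print(days_in_month(2, 2020))    # 29
print(days_in_month(4, 2021))    # 30
```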

## Strong Password Detection

Write a function that uses regular expressions to make sure the password string it is passed is strong. A strong password is defined as one that is at least eight characters long, contains both uppercase and lowercase characters, and has at least one digit. You may need to test the string against multiple regex patterns to validate its strength.

## Regex Version of the `strip()` Method

Write a function that takes a string and does the same thing as the `strip()` string method. If no arguments are passed other than the string to strip, then whitespace characters will be removed from the beginning and end of the string. Otherwise, the characters specified in the second argument to the function will be removed from the string.