# Chapter 7 - Pattern Matching with Regular Expressions

## Finding Patterns of Text Without Regular Expressions

Without regular expressions, you have to check a string character by character, which generally increases the number of lines of code needed to write a function. Another disadvantage of this method is that it can only look for one type of pattern.

Ex. Check whether a string is a phone number such as 415-555-4242: three digits, a hyphen, three more digits, another hyphen, then four digits.
```python
def isPhoneNumber(text):
    if len(text) != 12:             # Check to see if length of input is exactly 12 characters
        return False
    for i in range(0,3):            # Loop through the first 3 characters
        if not text[i].isdecimal(): # If the character is not numeric, return false  
            return False
    if text[3] != '-':              # If the fourth character is not a hyphen, return false
        return False
    for i in range(4,7):            # Loop through the next 3 characters
        if not text[i].isdecimal(): # If the character is not numeric, return false
            return False
    if text[7] != '-':              # If the eighth character is not a hyphen, return false
        return False
    for i in range(8,12):           # Loop through the last 4 characters
        if not text[i].isdecimal(): # If the character is not numeric, return false
            return False
    return True                     # Return True if all checks pass

print('Is 415-555-4242 a phone number?')
print(isPhoneNumber('415-555-4242'))   # prints True
print('Is Moshi Moshi a phone number?')
print(isPhoneNumber('Moshi Moshi'))    # prints False
```

## Regular Expressions

Regular expressions, or _regexes_, are descriptions of a pattern of text.

Examples of regexes:
- `\d` --> stands for a digit character
- `\d\d\d-\d\d\d-\d\d\d\d` --> Essentially does what the isPhoneNumber() function does
- `\d{3}-\d{3}-\d{4}` --> The curly brackets mean "repeat the preceding pattern this many times", so `\d{3}` is the same as `\d\d\d`
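
As a quick check, the sketch below compares the long and short forms of the phone-number pattern. It assumes Python's built-in `re` module (covered in the next section), and the sample sentence is made up for illustration.

```python
import re

# Two spellings of the same pattern: repeated \d versus the {n} shorthand.
long_form = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
short_form = re.compile(r'\d{3}-\d{3}-\d{4}')

sample = 'Call 415-555-4242 tomorrow'   # illustrative sample text

long_match = long_form.search(sample).group()
short_match = short_form.search(sample).group()

print(long_match)                  # 415-555-4242
print(long_match == short_match)   # True
```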

### Creating Regex Objects in Python
To use regex functions in Python, import the `re` module with `import re`. This is a __built-in__ module, so there is nothing extra to install. Passing a string value representing your regular expression to `re.compile()` returns a Regex object.

__Note:__ Python string literals also use `\` as their escape character. Therefore, when you type a string like `'\n'`, Python interprets it as a single newline character instead of a backslash followed by a lowercase n. There are two ways to keep the backslashes in a pattern:
1. `re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')`   --> the `r` at the beginning marks the string as a _raw_ string, so Python does not treat the backslashes as escape characters.
2. `re.compile('\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d')` --> the first `\` of each pair escapes the second, leaving a single `\` in the pattern.
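
The two spellings produce the same pattern string, which a short sketch can confirm. The variable names here are illustrative only.

```python
import re

raw_pattern = r'\d\d\d-\d\d\d-\d\d\d\d'                  # raw string
escaped_pattern = '\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d'     # doubled backslashes

# Both literals hold exactly the same characters.
print(raw_pattern == escaped_pattern)    # True

# A normal '\n' is one newline character; the raw r'\n' is two characters.
print(len('\n'), len(r'\n'))             # 1 2

# Compiling either one gives a Regex object with the same pattern.
same_pattern = re.compile(raw_pattern).pattern == re.compile(escaped_pattern).pattern
print(same_pattern)                      # True
```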

```python
import re

phoneNum = re.compile(r'\d{3}-\d{3}-\d{4}')       # returns a Regex object
mo = phoneNum.search('My number is 415-555-4242') # searches the string for the pattern; returns a Match object
print('Phone number found: ' + mo.group())        # mo.group() returns the matched text
```

### Other Regex Characters and Methods
- Search for literal parentheses by escaping them with a backslash. Ex. `re.compile(r'(\(\d\d\d\)) (\d{3}-\d{4})')`
- The pipe character, `|`, matches one of several alternatives. Ex. `re.compile(r'Batman|Tina Fey')` will match either 'Batman' or 'Tina Fey'. If both occur in the string, `search()` returns whichever occurs first.
- Use `x.findall()` to return the strings of *every* match in the searched string instead of just the first.
- Optional matching with `?`. Ex. `re.compile(r'Bat(wo)?man')` matches both 'Batman' and 'Batwoman'.
- The `?` can also indicate *non-greedy* matching when it follows a repetition. Ex. `re.compile(r'(Ha){3,5}?')` will match the shortest possible string instead of the default longest.
- Match zero or more with `*`. The group that precedes the star can occur any number of times in the text.
```python
batRegex = re.compile(r'Bat(wo)*man')
mo = batRegex.search('The Adventures of Batwowowowowoman')
mo.group() # returns 'Batwowowowowoman'
```
- Match one or more with `+`. The group that precedes the plus must appear at least once. 
```python
batRegex = re.compile(r'Bat(wo)+man')
mo = batRegex.search('The Adventures of Batman')
mo is None # True: there is no match, so calling mo.group() would raise an AttributeError
```
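
The bullets above that have no example of their own (escaped parentheses, the pipe, `findall()`, optional matching, and non-greedy matching) can be sketched as follows. The sample strings and variable names are made up for illustration.

```python
import re

# Escaped parentheses match literal ( and ); unescaped ones create groups.
phone_regex = re.compile(r'(\(\d\d\d\)) (\d{3}-\d{4})')
mo = phone_regex.search('My phone number is (415) 555-4242.')
area_code = mo.group(1)      # '(415)'
main_number = mo.group(2)    # '555-4242'

# The pipe matches whichever alternative appears first in the text.
hero_regex = re.compile(r'Batman|Tina Fey')
first_hero = hero_regex.search('Tina Fey and Batman').group()    # 'Tina Fey'

# findall() returns every match; with no groups in the pattern,
# the result is a list of strings.
number_regex = re.compile(r'\d{3}-\d{3}-\d{4}')
numbers = number_regex.findall('Cell: 415-555-9999 Work: 212-555-0000')
# ['415-555-9999', '212-555-0000']

# ? makes the preceding group optional.
bat_regex = re.compile(r'Bat(wo)?man')
optional_hits = [bat_regex.search(s).group() for s in ('Batman', 'Batwoman')]
# ['Batman', 'Batwoman']

# Greedy (default) versus non-greedy repetition on the same text.
greedy = re.compile(r'(Ha){3,5}').search('HaHaHaHaHa').group()   # 'HaHaHaHaHa'
lazy = re.compile(r'(Ha){3,5}?').search('HaHaHaHaHa').group()    # 'HaHaHa'
```

Note that when a pattern does contain groups, `findall()` returns a list of tuples (one item per group) rather than a list of strings.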

### Define Your Own Characters
- You can define your own character classes using square brackets `[]`. In the following example, the character class `[aeiouAEIOU]` will match any vowel, both lowercase and uppercase:
```python
vowelRegex = re.compile(r'[aeiouAEIOU]')
vowelRegex.findall('RoboCop eats baby food. BABY FOOD.') # returns ['o','o','o','e','a','a','o','o','A','O','O']
```
- You can also include ranges of letters or numbers using hyphens. Ex. `[a-zA-Z0-9]`
- Use the `^` character just after the opening bracket to match characters *not* in the character class. Ex. `re.compile(r'[^aeiouAEIOU]')` will match every character *except* the vowels.
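
A minimal sketch of ranges and negated character classes, using made-up sample strings:

```python
import re

# A range-based class: any ASCII letter or digit.
alnum_regex = re.compile(r'[a-zA-Z0-9]')
alnum_hits = alnum_regex.findall('R2-D2!')
# ['R', '2', 'D', '2']

# A negated class: anything that is NOT a vowel, including spaces and punctuation.
non_vowel_regex = re.compile(r'[^aeiouAEIOU]')
non_vowels = non_vowel_regex.findall('Robo Cop.')
# ['R', 'b', ' ', 'C', 'p', '.']
```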

### Review of Regex Matching
1. `import re`
2. Create a Regex object using `re.compile()`. Remember to add `r` at the beginning to designate the pattern as a raw string.
3. Pass the string you want to search into the `x.search()` method to return a _Match_ object, or `None` if nothing matches.
4. Use the `x.group()` method on the Match object to return the string of the actual matched text.
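
The four steps can be wrapped in a small helper. `find_first` is a hypothetical name introduced here for illustration, not part of the `re` module. It also guards against the no-match case, where `search()` returns `None`.

```python
import re                                    # step 1

def find_first(pattern, text):
    """Return the first match of pattern in text, or None (hypothetical helper)."""
    regex = re.compile(pattern)              # step 2: build the Regex object
    mo = regex.search(text)                  # step 3: Match object or None
    if mo is None:
        return None                          # nothing to call group() on
    return mo.group()                        # step 4: the matched text

found = find_first(r'\d{3}-\d{3}-\d{4}', 'My number is 415-555-4242')
missing = find_first(r'\d{3}-\d{3}-\d{4}', 'Moshi Moshi')
print(found)     # 415-555-4242
print(missing)   # None
```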

### Resource
[Regex Pal](http://regexpal.com/) - A web-based regular expression tester to show you how a regex matches a piece of text that you enter