## Pattern Matching with Regular Expressions

Regular expressions allow you to specify a pattern of text to search for.

### Finding Patterns of Text Without Regular Expressions

Say you want to find a phone number in a string. You know the pattern: three numbers, a hyphen, three numbers, a hyphen, and four numbers. Here’s an example: 415-555-4242. Let’s use a function named isPhoneNumber() to check whether a string matches this pattern, returning either True or False.

In [2]:
# str.isdecimal() returns True if every character in the string is a decimal digit
'2'.isdecimal()

True

In [3]:
not '2'.isdecimal()

False

In [9]:
def isPhoneNumber(text):
    if len(text) != 12: # a phone number in this format has exactly 12 characters
        return False
    for i in range(0, 3): # loop through the first 3 characters
        if not text[i].isdecimal(): # return False if the character at index i is not a digit
            return False
    if text[3] != '-': # the first hyphen must follow the first 3 digits
        return False
    for i in range(4,7): # the next 3 characters must be digits
        if not text[i].isdecimal():
            return False
    if text[7] != '-': # the second hyphen must follow the middle 3 digits
        return False
    for i in range(8,12): # the last 4 characters must be digits
        if not text[i].isdecimal():
            return False
    return True # return True if all checks pass

print('434-434-5678 is a phone number?')
print(isPhoneNumber('434-434-5678'))

434-434-5678 is a phone number?
True


In [10]:
print(isPhoneNumber('44-434-5678'))

False


In [11]:
print(isPhoneNumber('Wash- wash'))

False


In [12]:
message = 'Call me at 434-434-5678 tomorrow. 434-434-5678 is my office'
for i in range(len(message)):
    chunk = message[i:i+12]
    if isPhoneNumber(chunk):
        print('Phone number found: ' + chunk)
print('Done')

Phone number found: 434-434-5678
Phone number found: 434-434-5678
Done


#### Explanation of the In [12] code

On each iteration of the for loop, a new chunk of 12 characters from message is assigned to the variable chunk. For example, on the first iteration, i is 0, and chunk is assigned message[0:12] (that is, the string 'Call me at 4'). On the next iteration, i is 1, and chunk is assigned message[1:13] (the string 'all me at 43'). You pass chunk to isPhoneNumber() to see whether it matches the phone number pattern, and if so, you print the chunk. Continue to loop through message, and eventually the 12 characters in chunk will be a phone number. The loop goes through the entire string, testing each 12-character piece and printing any chunk it finds that satisfies isPhoneNumber(). Once we’re done going through message, we print Done.

### Finding Patterns of Text with Regular Expressions

Regular expressions, called _regexes_ for short, are descriptions for a pattern of text.

For example, __\d__ in a regex stands for a digit character, that is, any single numeral 0 to 9. The regex __\d\d\d-\d\d\d-\d\d\d\d__ is used by Python to match the same text the previous isPhoneNumber() function did: a string of three numbers, a hyphen, three more numbers, another hyphen, and four numbers. Any other string would not match the \d\d\d-\d\d\d-\d\d\d\d regex.

But regular expressions can be much more sophisticated. For example, adding a 3 in curly brackets ( __{3}__ ) after a pattern is like saying, “Match this pattern three times.” So the slightly shorter regex __\d{3}-\d{3}-\d{4}__ also matches the correct phone number format.
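
As a quick check of this claim, the sketch below compiles both spellings of the pattern and confirms that they find the same text. It relies on re.compile(), search(), and group(), which are covered in the sections that follow; the variable names and the sample string are illustrative assumptions.

```python
import re

# Two spellings of the same phone number pattern.
long_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
short_regex = re.compile(r'\d{3}-\d{3}-\d{4}')

sample = 'Call me at 434-434-5678 tomorrow.'  # hypothetical sample text

long_match = long_regex.search(sample)
short_match = short_regex.search(sample)

# Both patterns should report the same matched text.
same_text = long_match.group() == short_match.group()
print(long_match.group())
print(short_match.group())
print(same_text)
```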

In [None]:
# importing the regular expression module
import re

Passing a string value representing your regular expression to __re.compile()__ returns a Regex pattern object (or simply, a Regex object).

In [None]:
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d') # phoneNumRegex variable contains a Regex object

__NOTE__

By putting an __r__ before the first quote of the string value, you can mark the string as a raw string, in which backslashes are kept as literal characters instead of starting escape sequences.

Since regular expressions frequently use backslashes in them, it is convenient to pass raw strings to the __re.compile()__ function instead of typing extra backslashes.
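
To make the difference concrete, here is a small sketch comparing a regular string with doubled backslashes to the equivalent raw string. The variable names are assumptions chosen for illustration.

```python
# A regular string needs each backslash doubled; a raw string does not.
escaped = '\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d'
raw = r'\d\d\d-\d\d\d-\d\d\d\d'

# Both literals spell exactly the same characters.
same_pattern = escaped == raw
print(same_pattern)

# In a raw string, a backslash followed by n stays as two characters;
# in a regular string, it becomes a single newline character.
raw_length = len(r'\n')
regular_length = len('\n')
print(raw_length, regular_length)
```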

### Matching Regex Objects

A Regex object’s __search() method__ searches the string it is passed for any matches to the regex. The search() method will return __None__ if the regex pattern is not found in the string. If the pattern is found, the search() method __returns a Match object__. Match objects have a __group() method__ that will return the actual matched text from the searched string.

In [1]:
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('My number is: 451-456-3456.')
print('Phone number found: ' + mo.group())

Phone number found: 451-456-3456


#### Explanation 

We pass our desired pattern to __re.compile()__ and store the resulting Regex object in phoneNumRegex. Then we call __search()__ on phoneNumRegex and pass search() the string we want to search for a match. The result of the search gets stored in the variable __mo__. In this example, we know that our pattern will be found in the string, so we know that a Match object will be returned. Knowing that __mo__ contains a Match object and not the null value None, we can call __group()__ on __mo__ to return the match. Writing __mo.group()__ inside our print() call displays the whole match, 451-456-3456.
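
When you cannot be sure the pattern is present, calling group() directly is risky, because None has no group() method. A minimal sketch of a guard, assuming a hypothetical input string with no phone number in it:

```python
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

# Hypothetical input that contains no phone number.
mo = phoneNumRegex.search('My number is unlisted.')

# search() returned None here, so check before calling group().
if mo is None:
    result = 'No phone number found.'
else:
    result = 'Phone number found: ' + mo.group()
print(result)
```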

### Review of Regular Expression Matching

While there are several steps to using regular expressions in Python, each step is fairly simple.

1. Import the regex module with import re.

2. Create a Regex object with the re.compile() function. (Remember to use a raw string.)

3. Pass the string you want to search into the Regex object’s search() method. This returns a Match object, or None if the pattern is not found.

4. Call the Match object’s group() method to return a string of the actual matched text.
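
The four steps above can be sketched in a single cell. The sample string and the name matched_text are assumptions for illustration, and the shorter __\d{3}-\d{3}-\d{4}__ form of the pattern is used.

```python
# Step 1: import the regex module.
import re

# Step 2: create a Regex object (note the raw string).
phoneNumRegex = re.compile(r'\d{3}-\d{3}-\d{4}')

# Step 3: search a string; the result is a Match object or None.
mo = phoneNumRegex.search('Call me at 434-434-5678 tomorrow.')

# Step 4: call group() to get the matched text, checking for None first.
matched_text = mo.group() if mo is not None else None
print(matched_text)
```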