# Pattern Matching

The most fundamental and useful regular expression operations all involve pattern matching. In this lesson, we will explore the matching operations supported by Python's `re` module.

The `re` module is part of Python's standard library, so we can simply import it.

In [1]:
import re

## Matching operations

To perform a regex pattern matching operation using Python's `re`, we need three things:

1. The **pattern**
2. The **string** we are matching the pattern against
3. The type of **matching operation** we want to do

### Pattern
To use a pattern, we first compile it with the `re.compile()` function. This returns a `Pattern` object that we can reuse later on.

In [2]:
pattern = re.compile("Python")
print(pattern)
print(type(pattern))

re.compile('Python')
<class 're.Pattern'>
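
As a minimal sketch of why compiling is convenient, the snippet below reuses one `Pattern` object on several strings and reads back the original pattern text from its `pattern` attribute. The sample strings are made up for illustration, and the snippet uses `search()`, one of the methods introduced in the table further down.

```python
import re

# Compile once, then reuse the same Pattern object on several strings.
pattern = re.compile("Python")

# The original pattern text is kept on the object.
source_text = pattern.pattern

# Hypothetical sample inputs, not taken from the lesson.
samples = ["Python rocks", "I prefer Rust"]
results = [pattern.search(s) is not None for s in samples]

print(source_text)  # Python
print(results)      # [True, False]
```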


Note: In this lesson, we are focusing on getting familiar with the matching methods. So rather than using actual *patterns* like we would in real programs, we are simply using the literal string `"Python"`. In later lessons, we will learn about matching against real patterns such as *letters, numbers, whitespace, etc*.

### String

We can match our pattern against any string. We pass the string in when we call a matching method, so we will deal with it last.

### Matching operation

Lastly, we need to choose what kind of pattern matching we want to do (or what we want our program to do with the matching results). Here are our options:

| Method        | Description                                                   | Returns                                          |
|---------------|---------------------------------------------------------------|--------------------------------------------------|
| `match()`     | matches the pattern at the **beginning** of the text          | `None` if no match; one `Match` object otherwise |
| `fullmatch()` | matches the pattern against the **full** text                 | `None` if no match; one `Match` object otherwise |
| `search()`    | **searches** the text for the first occurrence of the pattern | `None` if no match; one `Match` object otherwise |
| `findall()`   | **finds all** non-overlapping matches in the text             | `list` of `str`s (for patterns without groups)   |
| `finditer()`  | **finds all** non-overlapping matches in the text, lazily     | iterator of `Match` objects                      |
| `sub()`       | **substitutes** a replacement for each match in the text      | `str`                                            |
| `split()`     | **splits** the text at each pattern match                     | `list` of split substrings                       |

To run a matching operation, we can simply call the corresponding method on the `Pattern` object with the text as argument.

In [3]:
match = pattern.match("Python is amazing! Python is awesome!")
print(match)
print(type(match))

match = pattern.match("I love Python! Python is amazing!")
print(match)

<re.Match object; span=(0, 6), match='Python'>
<class 're.Match'>
None
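
Because `match()` returns `None` when nothing matches, it is usually safest to check the result before using it. The following is a small sketch of that check, using the `group()`, `start()` and `end()` methods of `Match` objects; the helper name `describe` is made up for this example.

```python
import re

pattern = re.compile("Python")

def describe(text):
    """Return a short description of the match at the start of text, if any."""
    m = pattern.match(text)
    if m is None:
        # No match: None has no .group(), so we must check first.
        return "no match"
    # group() is the matched text; start() and end() are its indices.
    return f"{m.group()!r} at {m.start()}-{m.end()}"

print(describe("Python is amazing!"))  # 'Python' at 0-6
print(describe("I love Python!"))      # no match
```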


In [4]:
match = pattern.fullmatch("Python")
print(match)
print(type(match))

match = pattern.fullmatch("Python ")
print(match)

<re.Match object; span=(0, 6), match='Python'>
<class 're.Match'>
None
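
To make the difference between `match()` and `fullmatch()` concrete, here is a short sketch that runs both on the same string with a trailing space. `match()` only needs the pattern to fit at the beginning, while `fullmatch()` needs it to cover the whole string.

```python
import re

pattern = re.compile("Python")
text = "Python "  # note the trailing space

prefix_match = pattern.match(text)     # succeeds: the text starts with "Python"
whole_match = pattern.fullmatch(text)  # fails: the trailing space is not covered

print(prefix_match is not None)  # True
print(whole_match is not None)   # False
```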


In [5]:
match = pattern.search("I love Python! Python is amazing!")
print(match)
print(type(match))

match = pattern.search("I love python! python is amazing!")
print(match)

<re.Match object; span=(7, 13), match='Python'>
<class 're.Match'>
None
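
The second search above fails because matching is case-sensitive by default. As a brief aside, and only as a sketch, one way to relax this is to compile the pattern with the `re.IGNORECASE` flag; the name `pattern_ci` is introduced here just for the example.

```python
import re

# re.IGNORECASE makes letters match regardless of case.
pattern_ci = re.compile("Python", re.IGNORECASE)

m = pattern_ci.search("I love python! python is amazing!")

print(m.group())  # python
print(m.span())   # (7, 13)
```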


In [6]:
matches = pattern.findall("I love Python! Python is amazing!")
print(matches)
print(type(matches))
print(type(matches[0]))

['Python', 'Python']
<class 'list'>
<class 'str'>
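
Unlike `match()`, `fullmatch()` and `search()`, `findall()` does not return `None` when nothing matches; it returns an empty list. The sketch below, with made-up sample strings, shows that a plain truth test is enough to detect this case.

```python
import re

pattern = re.compile("Python")

no_hits = pattern.findall("I love Rust!")
hit_count = len(pattern.findall("Python, Python, Python"))

print(no_hits)    # []
print(hit_count)  # 3

# An empty list is falsy, so no comparison with None is needed.
if not no_hits:
    print("nothing found")
```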


In [7]:
matches = pattern.finditer("I love Python! Python is amazing!")
print(matches)
print(type(matches))

for match in matches:
    print(match)
    print(type(match))

<callable_iterator object at 0x00000247AF9C4C70>
<class 'callable_iterator'>
<re.Match object; span=(7, 13), match='Python'>
<class 're.Match'>
<re.Match object; span=(15, 21), match='Python'>
<class 're.Match'>
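
Since `finditer()` yields full `Match` objects, it is handy when we need positions as well as the matched text. The sketch below collects the spans into a list, and also shows that the iterator can be consumed only once.

```python
import re

pattern = re.compile("Python")
text = "I love Python! Python is amazing!"

matches = pattern.finditer(text)

# Collect the (start, end) span of every match.
spans = [m.span() for m in matches]
print(spans)  # [(7, 13), (15, 21)]

# The iterator is now exhausted; a second pass yields nothing.
leftover = list(matches)
print(leftover)  # []
```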


In [8]:
new_string = pattern.sub("C++", "Python is a language called Python")
print(new_string)

new_string = pattern.sub("C++", "R is a language")
print(new_string)

C++ is a language called C++
R is a language
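
By default `sub()` replaces every match. As a small sketch of two related options, the `count` argument limits how many replacements are made, and the companion method `subn()` also reports how many replacements happened.

```python
import re

pattern = re.compile("Python")
text = "Python is a language called Python"

# Replace only the first occurrence.
first_only = pattern.sub("C++", text, count=1)
print(first_only)  # C++ is a language called Python

# subn() returns the new string together with the number of replacements.
new_text, n = pattern.subn("C++", text)
print(new_text)  # C++ is a language called C++
print(n)         # 2
```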


In [9]:
print(pattern.split("APythonBPythonCPythonD"))
print(pattern.split("APythonBPythonCPythonD", maxsplit=2))
print(pattern.split("ABCD"))

['A', 'B', 'C', 'D']
['A', 'B', 'CPythonD']
['ABCD']
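
One detail worth knowing about `split()`: when a match sits at the very start or end of the text, the result contains empty strings at those positions. The sketch below shows this and one simple way to drop the empty pieces.

```python
import re

pattern = re.compile("Python")

# Matches at both ends produce empty strings in the result.
parts = pattern.split("PythonAPython")
print(parts)  # ['', 'A', '']

# Filter out the empty pieces if they are not wanted.
cleaned = [p for p in parts if p]
print(cleaned)  # ['A']
```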


## Module-level functions

All the operations above used methods of `re.Pattern` objects. Alternatively, the `re` module provides module-level functions that take the pattern as their first argument and do the same things.

In [10]:
s = "Python"
print(re.match(s, "Python is amazing! Python is awesome!"))
print(re.fullmatch(s, "Python"))
print(re.search(s, "I love Python! Python is amazing!"))
print(re.findall(s, "I love Python! Python is amazing!"))
print(re.finditer(s, "I love Python! Python is amazing!"))
print(re.sub(s, "C++", "Python is a language called Python"))
print(re.split(s, "APythonBPythonCPythonD"))

<re.Match object; span=(0, 6), match='Python'>
<re.Match object; span=(0, 6), match='Python'>
<re.Match object; span=(7, 13), match='Python'>
['Python', 'Python']
<callable_iterator object at 0x00000247AF9D60B0>
C++ is a language called C++
['A', 'B', 'C', 'D']


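Whether you use methods of `Pattern` objects or functions of the `re` module is largely up to you, since both produce the same results. The `re` module caches recently compiled patterns, so in a program with only a few patterns the speed difference is small. Compiling once is mainly worthwhile when the same pattern is reused many times. `Pattern` methods such as `search()` also accept optional `pos` and `endpos` arguments that the module-level functions do not.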
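
As a quick check of this equivalence, the sketch below compares the two styles on the same text. It also shows one small difference worth knowing: `Pattern.search()` accepts an optional starting position as its second argument, which `re.search()` does not offer.

```python
import re

text = "I love Python! Python is amazing!"
pattern = re.compile("Python")

# Both styles find the same first match.
same = re.search("Python", text).span() == pattern.search(text).span()
print(same)  # True

# Pattern methods can start searching from a given index (here, 8),
# which skips the first occurrence at index 7.
later = pattern.search(text, 8)
print(later.span())  # (15, 21)
```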

## Summary

That is it for basic pattern matching in Python using regular expressions. Of course, actual programs typically contain patterns more complicated than the `"Python"` we saw in this lesson's examples. In the upcoming lessons, we will learn about more ways to use regular expressions to fit our program's needs.