# Regular Expressions
Regular expressions are extremely useful in data processing and text searching. A regular expression describes a pattern of text, which can then be used to find, extract, or validate the matching parts of a string. For many tasks this is much more concise than combining the ordinary string methods built into Python and other languages.

## Imports
Python's regular expression support lives in the built-in `re` module. Run the following code to import it:

```python
import re
```

Now that the module is imported, we can begin to use it:

```python
re.search(r'dragon', 'dragonball z')
# Output: <re.Match object; span=(0, 6), match='dragon'>
```

The `search()` function in the `re` module scans a string for the first location where the pattern matches and returns a `re.Match` object. That object records both the span (the start and end indices of the match, with the end index excluded) and the matched text itself.

If there is no match, `None` is returned.
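The `Match` object can be inspected directly. The sketch below uses the standard `Match` methods `group()`, `span()`, `start()` and `end()`, which the text above does not name, and checks for `None` before using the result; the search strings are made-up examples.

```python
import re

# re.search returns a Match object for the first match, or None.
match = re.search(r'dragon', 'dragonball z')

if match is not None:
    # group() is the matched text; span() is (start, end) with end excluded.
    print(match.group())               # dragon
    print(match.span())                # (0, 6)
    print(match.start(), match.end())  # 0 6

# A pattern that does not occur in the string produces None.
missing = re.search(r'goku', 'dragonball z')
print(missing)  # None
```

Testing `is not None` before calling methods avoids an `AttributeError` when nothing matches.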

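One more thing to keep in mind: best practice is to write regular expressions as raw strings, denoted by the `r` prefix (as in `r'dragon'`). In a raw string, Python does not process backslash escape sequences, so backslashes in the pattern reach the `re` module unchanged instead of being turned into special characters such as a newline or a backspace.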
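A small comparison makes the difference concrete. In the sketch below, `\b` is the `re` escape for a word boundary, while in an ordinary string literal `'\b'` is a single backspace character; the sample sentence and variable names are illustrative only. `re.findall()`, which returns every match as a list, is also used here.

```python
import re

# In an ordinary string literal, '\b' is one character: a backspace (ASCII 8).
plain = '\bcat\b'
# In a raw string the backslash survives, so re receives the escape \b,
# which it interprets as a word boundary.
raw = r'\bcat\b'

print(len('\b'), len(r'\b'))  # 1 2

text = 'the cat sat on the concatenated mat'
print(re.search(raw, text))    # <re.Match object; span=(4, 7), match='cat'>
print(re.findall(raw, text))   # ['cat']  (the 'cat' inside 'concatenated' is not a whole word)
print(re.search(plain, text))  # None: the text contains no backspace characters
```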

## Character Classes and Wildcards

Metacharacters are characters that have a special meaning inside a pattern. Here is the complete list:

`. ^ $ * + ? { } [ ] \ | ( )`
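To match one of these characters literally, it has to be escaped with a backslash. The sketch below shows this with `.`, and also uses `re.escape()`, a standard `re` function not covered in this section, which inserts the backslashes for you. The sample strings are made up for illustration.

```python
import re

# Unescaped, '.' is the wildcard, so '3.11' also matches '3x11'.
print(re.search(r'3.11', 'v3x11'))         # <re.Match object; span=(1, 5), match='3x11'>
# Escaped with a backslash, '\.' matches only a literal period.
print(re.search(r'3\.11', 'v3x11'))        # None
print(re.search(r'3\.11', 'Python 3.11'))  # <re.Match object; span=(7, 11), match='3.11'>

# re.escape() backslash-escapes every metacharacter in a string,
# which helps when the literal text comes from user input.
pattern = re.escape('1+1=2?')
print(re.search(pattern, 'Is 1+1=2? Yes.'))  # <re.Match object; span=(3, 9), match='1+1=2?'>
```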

The first metacharacters we will look at are `[` and `]`. They specify a character class: a set of characters, any one of which can match at a single position in the string. Characters may be listed individually (`[abc]`) or as a range (`[a-z]`). Metacharacters are not active inside `[]`, with the exception of `\`.
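Each rule above can be checked in a couple of lines. The sketch below uses made-up sample strings and the `\d` escape (any digit), which the text does not otherwise introduce, to show that backslash escapes remain active inside a class while `.` does not.

```python
import re

# Individual characters: [abc] matches one 'a', 'b', or 'c'.
print(re.search(r'[abc]', 'xyzb'))        # <re.Match object; span=(3, 4), match='b'>
# A range: [0-9] matches one digit.
print(re.search(r'[0-9]', 'room 42'))     # <re.Match object; span=(5, 6), match='4'>
# Inside a class, '.' loses its wildcard meaning and matches only a period.
print(re.search(r'[.]', 'version 3.11'))  # <re.Match object; span=(9, 10), match='.'>
# Backslash escapes stay active: [\d] still means "any digit".
print(re.search(r'[\d]', 'abc7'))         # <re.Match object; span=(3, 4), match='7'>
```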

The `.` wildcard is the broadest of them all. It matches any single character except a newline (unless the `re.DOTALL` flag is set). For example:

```python
re.search(r'd.agon', 'dragon')
# Output: <re.Match object; span=(0, 6), match='dragon'>
```

```python
re.search(r'd.agon', 'dwagon')
# Output: <re.Match object; span=(0, 6), match='dwagon'>
```

As you can see, both strings match because `.` accepts any character in place of the `r`.
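Two edge cases are worth checking: by default `.` does not match a newline character, and it always consumes exactly one character. The sketch below uses the standard `re.DOTALL` flag to lift the newline restriction; the input strings are illustrative.

```python
import re

# By default '.' matches any character except a newline.
print(re.search(r'd.agon', 'd\nagon'))             # None
# With the re.DOTALL flag, '.' matches a newline as well.
print(re.search(r'd.agon', 'd\nagon', re.DOTALL))  # <re.Match object; span=(0, 6), match='d\nagon'>
# '.' stands for exactly one character, so it cannot match "nothing".
print(re.search(r'd.agon', 'dagon'))               # None
```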

We can also pass flags to `re.search()` as a third argument. For example, `re.IGNORECASE` makes the search case-insensitive:

```python
re.search(r'd.agon', 'DWAGON', re.IGNORECASE)
# Output: <re.Match object; span=(0, 6), match='DWAGON'>
```
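A few more details about flags, sketched below with made-up inputs: `re.I` is the standard short alias for `re.IGNORECASE`, the argument can also be passed by its keyword name `flags`, and several flags can be combined with the bitwise OR operator `|`. `re.DOTALL`, which lets `.` match a newline, is used only to show the combination.

```python
import re

# Without a flag, matching is case-sensitive.
print(re.search(r'd.agon', 'DWAGON'))              # None
# re.I is the short alias for re.IGNORECASE; flags may be passed by keyword.
print(re.search(r'd.agon', 'DWAGON', flags=re.I))  # <re.Match object; span=(0, 6), match='DWAGON'>
# Flags combine with the bitwise OR operator.
print(re.search(r'd.agon', 'D\nAGON', re.IGNORECASE | re.DOTALL))
# <re.Match object; span=(0, 6), match='D\nAGON'>
```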

Now let's add character classes into the mix. Recall that a character class defines a set or range of characters, any one of which may match at a single position. For example:

```python
re.search(r'[Dd]ragon', 'Dragonball z')
# Output: <re.Match object; span=(0, 6), match='Dragon'>
```

```python
re.search(r'[Dd]ragon', 'dragonball z')
# Output: <re.Match object; span=(0, 6), match='dragon'>
```

Again, both searches match: the pattern accepts either an uppercase or a lowercase `D` followed by `ragon`.
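Note that the class affects only the one position it occupies. The sketch below, with illustrative test strings, contrasts `[Dd]ragon` with the whole-pattern `re.IGNORECASE` flag.

```python
import re

# [Dd] covers exactly one position: the first letter may be 'D' or 'd'.
for s in ['Dragonball z', 'dragonball z', 'Wragonball z']:
    print(s, '->', re.search(r'[Dd]ragon', s))

# The rest of the pattern is still case-sensitive, unlike re.IGNORECASE.
print(re.search(r'[Dd]ragon', 'DRAGON'))              # None
print(re.search(r'dragon', 'DRAGON', re.IGNORECASE))  # <re.Match object; span=(0, 6), match='DRAGON'>
```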

As mentioned earlier, character classes can also define a range:

```python
re.search(r'[a-zA-Z]ragon', 'uragon')
# Output: <re.Match object; span=(0, 6), match='uragon'>
```

```python
re.search(r'[a-zA-Z]ragon', 'Eragon is a good movie')
# Output: <re.Match object; span=(0, 6), match='Eragon'>
```

You can combine as many ranges and individual characters as you want within a single class.
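For instance, letters, digits and the underscore can share one class. The sketch below uses `re.findall()`, a standard function that returns every non-overlapping match as a list, to show each character a combined class accepts; the class contents and sample strings are arbitrary examples.

```python
import re

# One class can mix several ranges and individual characters.
word_char = r'[a-zA-Z0-9_]'
print(re.search(word_char, '--> x_1'))  # <re.Match object; span=(4, 5), match='x'>

# re.findall returns every single-character match as a list.
print(re.findall(r'[a-cX-Z0-2]', 'abcdXYZ0123'))
# ['a', 'b', 'c', 'X', 'Y', 'Z', '0', '1', '2']
```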

Now suppose we want to match a character that is NOT in the character class. We can do this by placing a circumflex `^` (also called a caret) as the first character inside the brackets:

```python
re.search(r'[^a-zA-Z]', 'What is your favorite food?')
# Output: <re.Match object; span=(4, 5), match=' '>
```

You can think of a leading `^` as "not". In the example above, the search returns the first character that is NOT in the class `[a-zA-Z]`: the space at index 4, because the space was never added to the class. Watch how easily we can change that:

```python
re.search(r'[^a-zA-Z ]', 'What is your favorite food?')
# Output: <re.Match object; span=(26, 27), match='?'>
```

Now that the space character is part of the class, the only character left outside it is the `?` at index 26.
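Two follow-ups, sketched below: `re.findall()` (a standard `re` function) lists every character outside a negated class rather than only the first, and `^` negates only when it is the first character inside the brackets. The `'2^3'` input is a made-up example.

```python
import re

text = 'What is your favorite food?'
# re.findall collects every character outside the class, not just the first.
print(re.findall(r'[^a-zA-Z]', text))   # [' ', ' ', ' ', ' ', '?']
print(re.findall(r'[^a-zA-Z ]', text))  # ['?']

# '^' negates only when it is the first character inside the brackets;
# anywhere else in the class it is a literal caret.
print(re.search(r'[a^]', '2^3'))        # <re.Match object; span=(1, 2), match='^'>
```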