# Regular Expressions
## What is a *regular expression*?
A regular expression, or regex, is essentially a search query for text that's expressed by a string pattern.

On the command line, we can use the `grep` command to search a file.
The `/usr/share/dict/words` file is generally used by spell-checking programs to check if a word exists.  

```Bash
grep thon /usr/share/dict/words
```
This command returns every line (here, every word) that contains `thon`. The match is case-sensitive.
To make the search case-insensitive, add the `-i` flag.
<br>

```Bash
grep -i python /usr/share/dict/words
```
<br>



# Special Characters
Special characters are characters that allow us to do more advanced matching.
<br>
<br>

## The  `dot(.)`  character
Matches any single character.
<br>
```Bash
grep l.rts /usr/share/dict/words
```

The `dot(.)` will match exactly one character between `l` and `rts`. For example, `alerts` contains `lerts`, which fits the pattern.
<br>
<br>

## The  `circumflex(^)`  character
Anchors the pattern to the beginning of the line.
<br>

```Bash
grep ^fruit /usr/share/dict/words
```

The `circumflex(^)` will match any line that starts with "fruit".
<br>
<br>

## The  `dollar sign($)`  character
Anchors the pattern to the end of the line.
<br>

```Bash
grep cat$ /usr/share/dict/words
```

The `dollar sign($)` will match any line that ends with "cat".


# Basic Regular Expressions
In Python, the module for regular expressions is the `re` module.  
The `re.search()` function returns a match object whose span shows where in the string the match was found.  
The `r` prefix in the following examples marks a *raw string*, which tells Python not to interpret backslash escape sequences inside the string. 
<br>

If no match is found, `None` is returned.

In [4]:
import re
result = re.search(r"aza", "plaza")
print(result)

<re.Match object; span=(2, 5), match='aza'>
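
To make the raw-string note concrete, here is a minimal sketch comparing a normal string with a raw string. The `\n` sequence is chosen only for illustration. The patterns in this section contain no backslashes, so the `r` prefix changes nothing for them, but it is a safe habit once patterns do include backslashes.

```python
# A normal string turns the two characters \ and n into one newline character.
normal = "\n"
# A raw string keeps the backslash and the n as two separate characters.
raw = r"\n"

print(len(normal))  # 1
print(len(raw))     # 2

# Without backslashes, both spellings produce the same pattern string.
print(r"aza" == "aza")  # True
```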


In [5]:
result = re.search(r"aza","this")
print(result)

None
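
Because `re.search()` returns either a match object or `None`, code that uses the result usually checks it first. The sketch below assumes a hypothetical helper named `describe_match`; `group()` and `span()` are methods of the match object.

```python
import re

def describe_match(pattern, text):
    """Return a short description of the first match, or a fallback message."""
    result = re.search(pattern, text)
    if result is None:
        # Calling result.group() here would raise AttributeError.
        return "no match"
    start, end = result.span()
    return f"{result.group()!r} found at {start}-{end}"

print(describe_match(r"aza", "plaza"))  # 'aza' found at 2-5
print(describe_match(r"aza", "this"))   # no match
```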


In [9]:
result = re.search(r"p.ng","penguin")
print(result)
result = re.search(r"p.ng","sponge")
print(result)
# To match regardless of case, pass the re.IGNORECASE flag to the search function
result = re.search(r"p.ng","Pangaea", re.IGNORECASE)
print(result)

<re.Match object; span=(0, 4), match='peng'>
<re.Match object; span=(1, 5), match='pong'>
<re.Match object; span=(0, 4), match='Pang'>
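
The `^` and `$` anchors shown earlier with `grep` behave the same way in `re.search()`. A minimal sketch, assuming a small hypothetical word list in place of `/usr/share/dict/words`:

```python
import re

# Hypothetical sample standing in for a dictionary file.
words = ["fruit", "fruitful", "grapefruit", "cat", "bobcat", "catalog"]

# ^ anchors the pattern to the start of the string.
starts_with_fruit = [w for w in words if re.search(r"^fruit", w)]
# $ anchors the pattern to the end of the string.
ends_with_cat = [w for w in words if re.search(r"cat$", w)]

print(starts_with_fruit)  # ['fruit', 'fruitful']
print(ends_with_cat)      # ['cat', 'bobcat']
```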


The following example checks whether the text contains the letters a, e and i in that order, with exactly one character (any character) between each pair.

In [8]:
import re
def check_aei(text):
  result = re.search(r"a.e.i", text)
  return result is not None

print(check_aei("academia")) # True
print(check_aei("aerial")) # False
print(check_aei("paramedic")) # True

True
False
True


# Wildcards and Character Classes

The dot is known as a wildcard because it can stand in for more than one possible character.   
Using a dot is the broadest possible wildcard because it matches any character (in Python's `re` module, any character except a newline by default). But what if we wanted something stricter, like checking if an answer given by a user contains a valid character, or finding all the usernames in a CSV file that start with a vowel? We have to restrict our wildcards to a range of characters to do this. For this task we use another feature of regexes called character classes. 

## Character Classes
Character classes are written inside square brackets and let us list the characters we want to match inside of those brackets.

In [10]:
# In this example, [Pp] matches either P or p, so the pattern matches Python or python
result = re.search(r"[Pp]ython","Python")
print(result)

<re.Match object; span=(0, 6), match='Python'>
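
One of the motivating tasks above, finding usernames that start with a vowel, can be sketched by combining a character class with the `^` anchor. The usernames here are made up for illustration, and reading them from a CSV file is left out.

```python
import re

# Hypothetical usernames; in practice these might come from a CSV column.
usernames = ["alice", "bob", "Eve", "oscar", "trent", "ursula"]

# ^ pins the class to the first character; [aeiouAEIOU] lists the allowed letters.
vowel_first = [name for name in usernames if re.search(r"^[aeiouAEIOU]", name)]

print(vowel_first)  # ['alice', 'Eve', 'oscar', 'ursula']
```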


Inside the square brackets, we can also define a range of characters using a dash.
For example, we could use `a-z` (lowercase a to lowercase z) to match any lowercase letter. So if we wanted to look for the string `way` preceded by any lowercase letter, we could write the expression like this:

In [11]:
print(re.search(r"[a-z]way", "The end of the highway"))

<re.Match object; span=(18, 22), match='hway'>
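
Several ranges can sit inside the same pair of brackets. A minimal sketch, with example strings chosen for illustration, that looks for `way` preceded by any letter or digit:

```python
import re

# [a-zA-Z0-9] combines three ranges: lowercase letters, uppercase letters, digits.
pattern = r"[a-zA-Z0-9]way"

texts = ["The end of the highway", "Take Route 9way", "What a way to go"]
results = [re.search(pattern, text) for text in texts]

for text, result in zip(texts, results):
    # A space precedes "way" in the last string, so no match is found there.
    print(text, "->", result.group() if result else None)
```

The first two strings yield `hway` and `9way`; the third yields `None` because a space is not part of any of the three ranges.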
