# Regular Expressions 
---
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern.

The module **re** provides full support for Perl-like regular expressions in Python. 

```python
# first import re
import re
```

# Regular Expressions Functions
---

* ## match Function
Here is the syntax for this function:

```python
re.match(pattern, string, flags=0)
```
This function attempts to match the pattern at the **beginning** of the string, with optional flags. It does not look for matches further along the string.
#### The re.match function returns a match object on success, None on failure. 
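As an illustration (the sample strings are made up), this sketch shows that **re.match** only succeeds when the pattern matches at position 0, and how the optional **flags** argument, here **re.IGNORECASE**, changes the result.

```python
import re

# re.match only tries to match at the beginning of the string
m = re.match(r'\d+', '2002 was a good year')
print(m.group())    # 2002

# The digits are not at the start, so re.match returns None
no_match = re.match(r'\d+', 'the year 2002')
print(no_match)     # None

# flags=re.IGNORECASE makes the comparison case-insensitive
ci = re.match(r'hello', 'Hello world', flags=re.IGNORECASE)
print(ci.group())   # Hello
```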
---
* ## search Function
Here is the syntax for this function:

```python
re.search(pattern, string, flags=0)
```
This function scans through the whole string and returns the first location where the RE pattern matches, with optional flags.
#### The re.search function returns a match object on success, None on failure.
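A small sketch contrasting **re.search** with **re.match** on the same made-up input: search finds the first match anywhere in the string, while match fails because the string does not start with a digit.

```python
import re

text = 'in 1999 and 2002'

# re.search scans the string and stops at the first match
s = re.search(r'\d+', text)
print(s.group(), s.start(), s.end())   # 1999 3 7

# re.match on the same text fails: position 0 holds 'i', not a digit
print(re.match(r'\d+', text))          # None
```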
---
* ## findall Function
Here is the syntax for this function:

```python
re.findall(pattern, string, flags=0)
```
The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, a list of groups is returned; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
#### Return all non-overlapping matches of pattern in string, as a list of strings.
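To make the group rules concrete, here is a sketch on a made-up string: with no groups you get the full matches, with one group you get that group's text, and with two groups you get tuples.

```python
import re

text = 'Alice: 30, Bob: 25'

# No groups: a list of the full matches
no_groups = re.findall(r'\d+', text)
print(no_groups)    # ['30', '25']

# One group: a list of the group's text only
one_group = re.findall(r'(\w+): \d+', text)
print(one_group)    # ['Alice', 'Bob']

# Two groups: a list of tuples, one tuple per match
two_groups = re.findall(r'(\w+): (\d+)', text)
print(two_groups)   # [('Alice', '30'), ('Bob', '25')]
```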
---
* ## search and replace
Here is the syntax for this function:

```python
re.sub(pattern, repl, string, count=0, flags=0)
```
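This function replaces every non-overlapping occurrence of the RE pattern in string with repl. If count is given and non-zero, at most count occurrences are replaced; flags works as in the other functions.
#### This function returns the modified string (the original string, unchanged, if the pattern does not match).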
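A brief sketch of **re.sub** on a made-up date string, showing a plain replacement, the **count** argument, and a replacement string that refers back to captured groups with **\1** and **\2**.

```python
import re

text = '23/10/2002'

# Replace every '/' with '-'
dashed = re.sub(r'/', '-', text)
print(dashed)        # 23-10-2002

# count=1 replaces only the first occurrence
first_only = re.sub(r'/', '-', text, count=1)
print(first_only)    # 23-10/2002

# \1 and \2 in the replacement refer to the captured groups (swap day and month)
swapped = re.sub(r'(\d{2})/(\d{2})', r'\2/\1', text)
print(swapped)       # 10/23/2002

# With no match, the string comes back unchanged
unchanged = re.sub(r'x', '-', text)
print(unchanged)     # 23/10/2002
```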

For more information, see the regular expression [documentation](https://docs.python.org/2/library/re.html).

Another useful [Resource](https://www.tutorialspoint.com/python/python_reg_expressions.htm)

# Meta-characters: Character matches
- __.__ : wildcard, matches any single character except a newline
- __^__ : start of a string
- __$__ : end of a string
- __[ ]__ : matches any one character from the set, e.g. **[a-z]** matches one of **a, b, ..., z**
- __[^abc]__ : matches a character that is **not a, b, or c**
- __a|b__ : matches either **a or b**, where a and b are strings
- __( )__ : groups a subpattern, scoping operators such as alternation and repetition, and captures the matched text
- __\\__ : escape character; introduces sequences such as **\t, \n, \d** and lets special characters such as **.** or **$** be matched literally
- __\b__ : matches a word boundary
- __\d__ : Any digit, equivalent to **[0-9]**
- __\D__ : Any non-digit, equivalent to **[^0-9]**
- __\s__ : Any whitespace, equivalent to **[ \t\n\r\f\v]**
- __\S__ : Any non-whitespace, equivalent to **[^ \t\n\r\f\v]**
- __\w__ : Any alphanumeric character, equivalent to **[a-zA-Z0-9_]**
- __\W__ : Any non-alphanumeric character, equivalent to **[^a-zA-Z0-9_]**
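The sketch below tries several of these meta-characters with **re.findall** on a made-up string; the comments give the list each call returns.

```python
import re

text = 'cat catalog bat 42_x'

# . matches any single character: '.at' finds 'cat', the 'cat' in 'catalog', and 'bat'
any_char = re.findall(r'.at', text)          # ['cat', 'cat', 'bat']

# ^ and $ anchor to the start and end of the string
first_word = re.findall(r'^\w+', text)       # ['cat']
last_word = re.findall(r'\w+$', text)        # ['42_x']

# a|b matches either alternative
either = re.findall(r'cat|bat', text)        # ['cat', 'cat', 'bat']

# \b requires a word boundary, so the 'cat' inside 'catalog' is skipped
whole_word = re.findall(r'\bcat\b', text)    # ['cat']

# \d digits, \w word characters (letters, digits, underscore)
digits = re.findall(r'\d+', text)            # ['42']
words = re.findall(r'\w+', text)             # ['cat', 'catalog', 'bat', '42_x']

# \s whitespace, \S non-whitespace
spaces = re.findall(r'\s', 'a b\tc')         # [' ', '\t']
non_space = re.findall(r'\S+', 'a b\tc')     # ['a', 'b', 'c']

print(any_char, first_word, last_word, either, whole_word)
print(digits, words, spaces, non_space)
```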

# Meta-characters: Repetitions
- __*__ : matches **zero** or **more** occurrences
- __+__ : matches **one** or **more** occurrences
- __?__ : matches **zero** or **one** occurrence
- __{n}__ : exactly **n** repetitions, **n >= 0**
- __{n,}__ : at least **n** repetitions
- __{,n}__ : at most **n** repetitions
- __{m,n}__ : at least **m** and at most **n** repetitions
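Applying each quantifier to the letter **b** in the same made-up string makes the differences visible; the comments show what **re.findall** returns.

```python
import re

text = 'a ab abb abbb'

zero_or_more = re.findall(r'ab*', text)     # ['a', 'ab', 'abb', 'abbb']
one_or_more = re.findall(r'ab+', text)      # ['ab', 'abb', 'abbb']
zero_or_one = re.findall(r'ab?', text)      # ['a', 'ab', 'ab', 'ab']
exactly_two = re.findall(r'ab{2}', text)    # ['abb', 'abb']
at_least_two = re.findall(r'ab{2,}', text)  # ['abb', 'abbb']
at_most_one = re.findall(r'ab{,1}', text)   # ['a', 'ab', 'ab', 'ab']
one_to_two = re.findall(r'ab{1,2}', text)   # ['ab', 'abb', 'abb']

for name, result in [('*', zero_or_more), ('+', one_or_more), ('?', zero_or_one),
                     ('{2}', exactly_two), ('{2,}', at_least_two),
                     ('{,1}', at_most_one), ('{1,2}', one_to_two)]:
    print(name, result)
```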

## Example:

![Example](https://preview.ibb.co/g1gkta/Screenshot_34.png)



```python
# find all the vowels
text = 'ouagadougou'
print(re.findall(r'[aeiou]', text))
# or, since this pattern contains no backslashes, a plain string works too
print(re.findall('[aeiou]', text))
```

```
['o', 'u', 'a', 'a', 'o', 'u', 'o', 'u']
['o', 'u', 'a', 'a', 'o', 'u', 'o', 'u']
```


```python
# find all the non-vowel characters (in this word, the consonants)
text = 'ouagadougou'
print(re.findall('[^aeiou]', text))
```

```
['g', 'd', 'g']
```
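Note that **[^aeiou]** matches any character that is not a lowercase vowel, including spaces, digits and punctuation. A sketch on a made-up string, with an explicit lowercase-consonant class built from ranges for comparison:

```python
import re

text = 'ouagadougou 2002!'

# The negated class also picks up the space, the digits and '!'
non_vowels = re.findall(r'[^aeiou]', text)
print(non_vowels)   # ['g', 'd', 'g', ' ', '2', '0', '0', '2', '!']

# Ranges that skip the vowels keep only lowercase consonants
consonants = re.findall(r'[b-df-hj-np-tv-z]', text)
print(consonants)   # ['g', 'd', 'g']
```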


# Regular Expressions for dates
---
![Date](https://preview.ibb.co/he1P3a/Screenshot_35.png)

```python
dateStr = '23-10-2002\n23/10/2002\n23/10/02\n10/23/2002\n23 Oct 2002\n23 October 2002\nOct 23, 2002\nOctober 23, 2002\n'

# 2 digits, '/' or '-', 2 digits, '/' or '-', 4 digits
print(re.findall(r'\d{2}[/-]\d{2}[/-]\d{4}', dateStr))
# 2 digits, '/' or '-', 2 digits, '/' or '-', 2 to 4 digits
print(re.findall(r'\d{2}[/-]\d{2}[/-]\d{2,4}', dateStr))
# 1 or 2 digits, '/' or '-', 1 or 2 digits, '/' or '-', 2 to 4 digits
print(re.findall(r'\d{1,2}[/-]\d{1,2}[/-]\d{2,4}', dateStr))
```

```
['23-10-2002', '23/10/2002', '10/23/2002']
['23-10-2002', '23/10/2002', '23/10/02', '10/23/2002']
['23-10-2002', '23/10/2002', '23/10/02', '10/23/2002']
```


```python
dateStr = '23-10-2002\n23/10/2002\n23/10/02\n10/23/2002\n23 Oct 2002\n23 October 2002\nOct 23, 2002\nOctober 23, 2002\n'

# Matching dates that contain a month name; the capturing group ( ) makes findall return only the month
print(re.findall(r'\d{1,2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{2,4}', dateStr))
# (?: ) makes the group non-capturing, so findall returns the whole match
print(re.findall(r'\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{2,4}', dateStr))
# Adding [a-z]* also matches full month names such as October
print(re.findall(r'\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{2,4}', dateStr))
# Making the leading day optional and allowing an optional 'DD, ' after the month matches the other date patterns
print(re.findall(r'(?:\d{1,2} )?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* (?:\d{1,2}, )?\d{2,4}', dateStr))
```

```
['Oct']
['23 Oct 2002']
['23 Oct 2002', '23 October 2002']
['23 Oct 2002', '23 October 2002', 'Oct 23, 2002', 'October 23, 2002']
```
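Once a date pattern works, capturing groups can pull out its parts. As a sketch reusing the same **dateStr**, three groups make **re.findall** return one (day, month, year) tuple per match; the month group wraps the non-capturing alternation so that full names such as October are captured whole.

```python
import re

dateStr = '23-10-2002\n23/10/2002\n23/10/02\n10/23/2002\n23 Oct 2002\n23 October 2002\nOct 23, 2002\nOctober 23, 2002\n'

# Three capturing groups: day, month name (short or full), 4-digit year
pattern = r'(\d{1,2}) ((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*) (\d{4})'
parts = re.findall(pattern, dateStr)
print(parts)   # [('23', 'Oct', '2002'), ('23', 'October', '2002')]
```

Only the day-first formats match this particular pattern; the month-first dates would need a second pattern or an alternation.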
