# Regular Expressions
#### := sequences of characters that define specific patterns to search in text

![Saving the day with regular expressions!](https://imgs.xkcd.com/comics/regular_expressions.png)

Task to save the day: **"Look for something that looks like an Email Address"**

In [1]:
list_email = 'arjun@hotmail.com info@star-wars.shop 123@yahoo.de arjun_1-2@spiced-academy.com'

In [2]:
email_pattern = r'[\w_-]+@[\w-]+\.\w+'

In [3]:
import re
re.findall(email_pattern,list_email)

['arjun@hotmail.com',
 'info@star-wars.shop',
 '123@yahoo.de',
 'arjun_1-2@spiced-academy.com']

Try out patterns on regex101.com

| character | meaning |
|-----------|---------|
| `.` | any character except a newline |
| `\w` | any alphanumeric character or underscore |
| `\d` | any digit character |
| `\s` | any whitespace character (space, tab, newline) |
| `[a-z]` | any lowercase letter from a to z |
| `[0-9]` | any digit between 0 and 9 |
| `+` | repeats the previous token one or more times |
| `*` | repeats the previous token zero or more times |
|  `vertical bar` | logical OR; matches either the pattern on its left or the one on its right |
| `\` | escapes special characters |
| `(x)` | match group; extracts whatever you put in parentheses |
| `[^a]` | any character except "a" |
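
A minimal sketch of a few table entries in action. The sample string and all variable names are made up for illustration and are not part of the original notebook:

```python
import re

sample = "cat bat 42 rat_7"  # made-up sample text

words = re.findall(r'\w+', sample)              # \w also matches the underscore
digit_runs = re.findall(r'\d+', sample)         # one or more digits
c_or_b = re.findall(r'[cb]at', sample)          # character set: "c" or "b", then "at"
cat_or_rat = re.findall(r'cat|rat', sample)     # vertical bar: either alternative
first_letters = re.findall(r'(\w)at', sample)   # group: only the parenthesised part is returned
not_lower = re.findall(r'[^a-z\s]+', sample)    # negated set: no lowercase letters, no whitespace
dots = re.findall(r'\.', 'a.b.c')               # backslash turns "." into a literal dot

print(words)          # ['cat', 'bat', '42', 'rat_7']
print(digit_runs)     # ['42', '7']
print(c_or_b)         # ['cat', 'bat']
print(cat_or_rat)     # ['cat', 'rat']
print(first_letters)  # ['c', 'b', 'r']
print(not_lower)      # ['42', '_7']
print(dots)           # ['.', '.']
```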

### Regular expressions in Python

import re

Some useful functions:
- **re.findall()** 	returns a list of all matching strings
- **re.search()** 	returns a match object for the first match, or `None` if nothing matches
- **re.sub()** 	replaces every match of a pattern with a string

- **re.IGNORECASE** 	flag for case-insensitive matching

In [4]:
import re

In [5]:
text = "thyme coriander rosemary Cinnamon pepper tarragon basil salvia cumin nutmeg saffron"

#### Get all spices starting with c

In [11]:
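# Caution: the trailing | in pattern2 is an empty alternative, so findall() also returns empty strings;
# inside [...] the | is a literal character, so [c|C] matches "c", "|" or "C"
pattern2 = r'[c|C]\w+|p\w+|'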
pattern = r'c\w+'

### `findall()` <- most useful method
finds all matches and returns them in a list

In [7]:
re.findall(pattern,text,re.IGNORECASE)

['coriander', 'Cinnamon', 'cumin']

In [14]:
re.findall(pattern2,text)

['',
 '',
 '',
 '',
 '',
 '',
 'coriander',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'Cinnamon',
 '',
 'pepper',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'cumin',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '']
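
One way to avoid the empty strings returned above is to drop the trailing `|` and put the allowed first letters into a single character set. The sketch below assumes the intent was "words starting with c, C or p"; the name `pattern2_fixed` and the word boundary `\b` (which anchors the match at the start of a word) are additions for illustration:

```python
import re

text = "thyme coriander rosemary Cinnamon pepper tarragon basil salvia cumin nutmeg saffron"

# \b      -> word boundary, so the match must begin at the start of a word
# [cCp]   -> first letter is c, C or p (no | needed inside a character set)
# \w+     -> rest of the word
pattern2_fixed = r'\b[cCp]\w+'

matches = re.findall(pattern2_fixed, text)
print(matches)  # ['coriander', 'Cinnamon', 'pepper', 'cumin']
```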

In [8]:
re.findall(pattern,text)

['coriander', 'cumin']

In [9]:
first_c_word = re.search(pattern,text)

In [10]:
first_c_word

<re.Match object; span=(6, 15), match='coriander'>

In [38]:
first_c_word.span()

(6, 15)
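
Besides `span()`, a match object exposes the matched text through `group()` and the boundaries through `start()` and `end()`. Since `re.search()` returns `None` when nothing matches, a check is usually needed before calling these. A small sketch; the helper name `first_match` is hypothetical:

```python
import re

text = "thyme coriander rosemary Cinnamon pepper tarragon basil salvia cumin nutmeg saffron"

def first_match(pattern, string):
    """Return the first matched substring, or None if the pattern is not found."""
    match = re.search(pattern, string)
    return match.group() if match else None

m = re.search(r'c\w+', text)
print(m.group(), m.start(), m.end())  # coriander 6 15

# No word in text contains a "z", so search() finds nothing
print(first_match(r'z\w+', text))     # None
```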

### `sub()` Find a pattern and replace

In [15]:
text

'thyme coriander rosemary Cinnamon pepper tarragon basil salvia cumin nutmeg saffron'

In [16]:
re.sub(pattern,"some text to replace words starting with c",text)

'thyme some text to replace words starting with c rosemary Cinnamon pepper tarragon basil salvia some text to replace words starting with c nutmeg saffron'

In [17]:
text

'thyme coriander rosemary Cinnamon pepper tarragon basil salvia cumin nutmeg saffron'
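
As the cells above show, `re.sub()` returns a new string and leaves `text` unchanged. It also accepts the optional keyword arguments `count` and `flags`. A sketch, with the replacement string `SPICE` and the result names chosen only for illustration:

```python
import re

text = "thyme coriander rosemary Cinnamon pepper tarragon basil salvia cumin nutmeg saffron"
pattern = r'c\w+'

# flags=re.IGNORECASE also replaces "Cinnamon"
all_c = re.sub(pattern, 'SPICE', text, flags=re.IGNORECASE)

# count=1 stops after the first replacement
first_only = re.sub(pattern, 'SPICE', text, count=1)

print(all_c)
# thyme SPICE rosemary SPICE pepper tarragon basil salvia SPICE nutmeg saffron
print(first_only)
# thyme SPICE rosemary Cinnamon pepper tarragon basil salvia cumin nutmeg saffron
```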

### Challenge !!

In [18]:
number_text = 'There are some numbers 012-345-6789. You can call me at 012.345.6789 and of course I am always reachable at (012)345-6789 or 0123456789'

In [22]:
# '-' is placed last inside [...] so it is a literal hyphen rather than a range from ')' to '.'
pattern = r'\(?\d+[).-]?\d+[).-]?\d+'

In [23]:
numbers = re.findall(pattern,number_text)
numbers

['012-345-6789', '012.345.6789', '(012)345-6789', '0123456789']
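
The `(x)` match group from the table can also split each number into its parts. A sketch, assuming the three-three-four digit layout of the sample numbers; the name `grouped_pattern` and the `{n}` quantifier (exactly n repetitions) do not appear in the original notebook:

```python
import re

number_text = 'There are some numbers 012-345-6789. You can call me at 012.345.6789 and of course I am always reachable at (012)345-6789 or 0123456789'

# Three capture groups: 3 digits, 3 digits, 4 digits, with optional separators between them
grouped_pattern = r'\(?(\d{3})[).-]?(\d{3})[.-]?(\d{4})'

# With several groups, findall() returns one tuple per match
parts = re.findall(grouped_pattern, number_text)
for area, prefix, line in parts:
    print(area, prefix, line)  # prints "012 345 6789" four times
```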

In [24]:
# Pattern for finding the special characters in order to replace them
# (\W matches any non-word character; a "+" inside [...] would be a literal plus sign)
pattern_change = r'\W'

In [None]:
# List comprehension vs for loop

In [25]:
[num for num in numbers]

['012-345-6789', '012.345.6789', '(012)345-6789', '0123456789']

In [26]:
for num in numbers:
    print(num)

012-345-6789
012.345.6789
(012)345-6789
0123456789


In [None]:
# Replacing the special characters to get proper numbers

In [27]:
[re.sub(pattern_change,'',num) for num in numbers]

['0123456789', '0123456789', '0123456789', '0123456789']
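
The two steps of the challenge (find the numbers, then strip the separators) can be wrapped into one function. A sketch only: the function name `extract_phone_numbers` is hypothetical, and `\D` (any non-digit character) is used here as a stricter alternative to `pattern_change` for keeping digits only:

```python
import re

def extract_phone_numbers(text):
    """Find phone-number-like strings and return them as plain digit strings."""
    # Optional "(", then three digit runs separated by an optional ")", "." or "-"
    found = re.findall(r'\(?\d+[).-]?\d+[).-]?\d+', text)
    # Remove every character that is not a digit
    return [re.sub(r'\D', '', num) for num in found]

number_text = 'There are some numbers 012-345-6789. You can call me at 012.345.6789 and of course I am always reachable at (012)345-6789 or 0123456789'
cleaned = extract_phone_numbers(number_text)
print(cleaned)  # ['0123456789', '0123456789', '0123456789', '0123456789']
```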