# Python 101
## XI. Regular expressions

---

## The `re` module

Python supports regular expressions through the built-in `re` module.

In [None]:
import re

### Regular expressions

The regular expression syntax is documented <a href="https://docs.python.org/3/library/re.html">here</a>. Expressions can be tested interactively with online tools such as <a href="https://regex101.com/">regex101</a> or <a href="https://pythex.org/">pythex</a>.
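As a quick illustration of that syntax, the cell below exercises a few common building blocks (digit class, optional character, word boundary with an exact repetition count, and a character class) using `re.findall`. It is a small sketch; the sample strings are made up.

```python
import re

# \d = any digit, + = one or more repetitions
digits = re.findall(r'\d+', 'room 12, floor 3')
# ? = the preceding character is optional
spellings = re.findall(r'colou?r', 'color colour')
# \b = word boundary, {4} = exactly four repetitions of \w
four_letter_words = re.findall(r'\b\w{4}\b', 'this is some text')
# [...] = character class: any one of the listed characters
vowels = re.findall(r'[aeiou]', 'regex')

print(digits)             # ['12', '3']
print(spellings)          # ['color', 'colour']
print(four_letter_words)  # ['this', 'some', 'text']
print(vowels)             # ['e', 'e']
```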

### Finding matches

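- Raw strings vs regular strings: in a raw string (`r'...'`), a backslash is kept as a literal character instead of starting an escape sequence, which is why regex patterns are usually written as raw strings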

In [None]:
print('te\nst')

In [None]:
print(r'te\nst')
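The difference matters for patterns because some regex escapes are also Python string escapes. The sketch below uses `\b`: in a regular string it is a single backspace character, while in a raw string it reaches the regex engine as a word boundary.

```python
import re

# '\b' is one backspace character; r'\b' is a backslash followed by 'b'
print(len('\b'), len(r'\b'))  # 1 2

# The regex engine reads r'\b' as a word boundary...
with_raw = re.search(r'\bcat\b', 'the cat sat')
# ...but without the r prefix, the pattern looks for literal backspace characters
without_raw = re.search('\bcat\b', 'the cat sat')

print(with_raw)     # a match object for 'cat'
print(without_raw)  # None
```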

- Matching a pattern at the beginning of the text with `re.match`

In [None]:
re.match(r't\S', 'test string')
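`re.match` returns a match object (or `None` if the text does not start with the pattern). A short sketch of how to read the result, using two capturing groups:

```python
import re

m = re.match(r'(\w+) (\w+)', 'test string')
if m:  # re.match returns None when the pattern does not match at the start
    print(m.group())   # whole match: 'test string'
    print(m.group(1))  # first group: 'test'
    print(m.group(2))  # second group: 'string'
    print(m.span())    # (start, end) of the whole match: (0, 11)
```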

- Finding the first match anywhere in the text with `re.search`

In [None]:
re.search(r't\S', 'test string')
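With `t\S` both calls above find `'te'`, because the match happens to be at position 0. A pattern that first occurs later in the text shows the difference; `re.fullmatch` is included for comparison, since it requires the whole string to match.

```python
import re

text = 'test string'

# 's\S' first occurs at index 2 ('st'), not at index 0
at_start = re.match(r's\S', text)
anywhere = re.search(r's\S', text)
# fullmatch succeeds only if the pattern covers the entire string
whole = re.fullmatch(r'\w+ \w+', text)

print(at_start)         # None
print(anywhere.span())  # (2, 4)
print(whole.group())    # 'test string'
```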

- Finding every non-overlapping occurrence of a pattern with `re.findall`

In [None]:
re.findall(r't\S', 'test string')
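The return value of `re.findall` depends on the groups in the pattern: without groups it returns the matched substrings, with groups it returns the group contents (a tuple per match when there are several). When positions are needed too, `re.finditer` yields match objects instead. A sketch with made-up addresses:

```python
import re

text = 'alice@example.com, bob@test.org'

# No groups: a list of matched substrings
emails = re.findall(r'\w+@\w+\.\w+', text)
# Two groups: a list of (group 1, group 2) tuples
pairs = re.findall(r'(\w+)@(\w+)\.\w+', text)
# finditer: match objects, which also carry positions
positions = [(m.group(), m.start()) for m in re.finditer(r'\w+@\w+\.\w+', text)]

print(emails)     # ['alice@example.com', 'bob@test.org']
print(pairs)      # [('alice', 'example'), ('bob', 'test')]
print(positions)  # [('alice@example.com', 0), ('bob@test.org', 19)]
```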

- Splitting text at every match with `re.split`

In [None]:
re.split(r't\S', 'test string')
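A few variations of `re.split` that often come in handy: splitting on several separators at once, keeping the separators by wrapping them in a capturing group, and limiting the number of splits with `maxsplit`.

```python
import re

# Split on commas or semicolons, swallowing surrounding whitespace
parts = re.split(r'\s*[,;]\s*', 'a, b ;c')
# A capturing group keeps the separators in the result
with_separators = re.split(r'([,;])', 'a,b;c')
# maxsplit limits how many splits are made
limited = re.split(r'\s+', 'one two three', maxsplit=1)

print(parts)            # ['a', 'b', 'c']
print(with_separators)  # ['a', ',', 'b', ';', 'c']
print(limited)          # ['one', 'two three']
```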

- Substituting matches with `re.sub`

In [None]:
re.sub(r't\S', 'XX', 'test string')
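The replacement in `re.sub` does not have to be a fixed string. The sketch below shows backreferences to captured groups, a function that computes each replacement from its match object, and the `count` argument that limits the number of replacements.

```python
import re

# \1, \2, \3 in the replacement refer to the captured groups
reordered = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3.\2.\1', 'due 2024-01-31')
# A function receives each match object and returns the replacement text
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1 b22')
# count=1 replaces only the first match
first_only = re.sub(r't\S', 'XX', 'test string', count=1)

print(reordered)   # 'due 31.01.2024'
print(doubled)     # 'a2 b44'
print(first_only)  # 'XXst string'
```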

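- Compiling a regular expression once with `re.compile` and reusing the pattern object (the module-level functions also cache recently used patterns, so the speed-up is usually small)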

In [None]:
pattern = re.compile(r't\S')
pattern.findall('test string')
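A compiled pattern also fixes its flags once, and the pattern object offers the same methods as the module-level functions (`match`, `search`, `findall`, `split`, `sub`, ...). A sketch combining two standard flags: `re.IGNORECASE` for case-insensitive matching and `re.MULTILINE` so that `^` matches at the start of every line.

```python
import re

# Words starting with 't' or 'T' at the beginning of any line
t_word = re.compile(r'^t\w+', re.IGNORECASE | re.MULTILINE)

text = 'Test one\nnot this\ntable'
found = t_word.findall(text)

print(found)           # ['Test', 'table']
print(t_word.pattern)  # the original pattern string: ^t\w+
```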

### Exercises

#### 1. Find every email address in an example log file (data/test.log)

In [None]:
import os
import random

prefixes = ['ERROR', 'WARNING'] + ['DEBUG'] * 5 + ['INFO'] * 3
providers = ['gmail.com', 'yahoo.com', 'hotmail.com']


def random_word(min_len=5, max_len=15):
    """Return a random lowercase word (chr(97) is 'a', chr(122) is 'z')."""
    return ''.join(chr(random.randint(97, 122))
                   for _ in range(random.randint(min_len, max_len)))


def random_sentence(min_words=3, max_words=7):
    """Return a random sentence built from random words."""
    return ' '.join(random_word()
                    for _ in range(random.randint(min_words, max_words)))


# A pool of random user names (the pool size of 10 is an arbitrary choice)
users = [random_word() for _ in range(10)]

# Create the output folder if it does not exist yet
os.makedirs('data', exist_ok=True)

with open('data/test.log', 'w') as f:
    for _ in range(100):
        prefix = random.choice(prefixes)
        user = random.choice(users)
        provider = random.choice(providers)
        f.write(f'[{prefix}] {random_sentence()} - {user}@{provider} - '
                f'{random_sentence()}\n')
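One possible approach to the exercise, sketched under the assumption that a simplified email pattern is good enough for these generated logs (it is not a full RFC 5322 validator). `EMAIL` and `find_emails` are hypothetical names; the group is non-capturing (`(?:...)`) so that `findall` returns whole matches rather than group contents.

```python
import re

# Hypothetical simplified pattern: name characters, '@', then dot-separated labels
EMAIL = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')


def find_emails(text):
    """Return every email-like substring in text (hypothetical helper)."""
    return EMAIL.findall(text)


sample = ('[DEBUG] lorem ipsum - abcde@gmail.com - dolor sit\n'
          '[INFO] foo bar - fghij@yahoo.com - baz qux\n')
print(find_emails(sample))  # ['abcde@gmail.com', 'fghij@yahoo.com']
```

To apply it to the generated log, pass the contents of `data/test.log` (for example the result of `f.read()` on the opened file) to `find_emails`.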

#### 2. Parse a CSV file (data/text.csv)