# Regular Expressions (Regexes)
Regular expressions are a small meta-language for describing patterns in text, and they fall within the scope of lexical analysis. They have two main uses:
- Find and/or extract text that matches a pattern.
- Replace or substitute text that matches a pattern.
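
As a rough illustration of these two uses, the sketch below extracts the first match with `re.search` and replaces every match with `re.sub`. The sample text and the variable names are made up for this example.

```python
import re

# Hypothetical sample text, made up for this illustration
text = 'Order 66 shipped on 2019-04-16.'

# Find/extract: re.search returns a match object for the first match, or None
match = re.search(r'\d{4}-\d{2}-\d{2}', text)
found = match.group() if match else None

# Replace/substitute: re.sub replaces every match of the pattern
masked = re.sub(r'\d', '#', text)

print(found)
print(masked)
```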

In Python, regular expressions are provided by the standard library's `re` module. This notebook relies mostly on _findall_.
_findall_ takes two strings, the regular expression (the pattern) and the string to search, and returns a list of every non-overlapping match of the pattern in that string.
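
A minimal sketch of `re.findall`, borrowing the sentence used further down in this notebook. The two patterns are only illustrative choices. Note that when the pattern contains a capturing group, the returned list holds the group's text rather than the whole match.

```python
import re

sentence = 'Mary had a little lamb. 1 little lamb. Not 10, not 12, not 22, just one.'

# No groups in the pattern: each list item is the full text of one match
numbers = re.findall(r'\d+', sentence)

# One capturing group: each list item is just the text captured by that group
before_lamb = re.findall(r'(\w+) lamb', sentence)

print(numbers)
print(before_lamb)
```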
***
#### Raw Strings
Any string literal in Python prefixed with `r` is a raw string. Backslashes in a raw string are kept verbatim instead of starting escape sequences such as `\n`. Because regular expressions use backslashes heavily, patterns are very commonly written as raw strings.
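
The difference can be seen in a short sketch. The sample text is made up. The point is that `\b` means "word boundary" to the regex engine only if the backslash survives, which a raw string guarantees.

```python
import re

# '\n' is a single newline character; r'\n' is a backslash followed by 'n'
plain_length = len('\n')
raw_length = len(r'\n')

# Hypothetical sample text
text = 'Not 10, not 12'

# In a raw string, \b reaches the regex engine and means "word boundary"
raw_matches = re.findall(r'\bnot\b', text)

# In a normal string, '\b' is a backspace character, so nothing matches here
plain_matches = re.findall('\bnot\b', text)

print(plain_length, raw_length)
print(raw_matches, plain_matches)
```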
***

In [1]:
import re
import pandas as pd
import numpy as np

***
### ((Reg(ex))ercises)
***
1. Write a function named is_vowel. It should accept a string as input and use a regular expression to determine if the passed string is a vowel. While not explicitly mentioned in the lesson, you can treat the result of re.search as a boolean value that indicates whether or not the regular expression matches the given string.

2. Write a function named is_valid_username that accepts a string as input. A valid username starts with a lowercase letter, and only consists of lowercase letters, numbers, or the _ character. It should also be no longer than 32 characters. The function should return either True or False depending on whether the passed string is a valid username.

```
>>> is_valid_username('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa')
False
>>> is_valid_username('codeup')
True
>>> is_valid_username('Codeup')
False
>>> is_valid_username('codeup123')
True
>>> is_valid_username('1codeup')
False
```

3. Write a regular expression to capture phone numbers. It should match all of the following:
```
(210) 867 5309
+1 210.867.5309
867-5309
210-867-5309
```


4. Use regular expressions to convert the dates below to the standardized year-month-day format.

```
02/04/19
02/05/19
02/06/19
02/07/19
02/08/19
02/09/19
02/10/19
```

5. Write a regex to extract the various parts of these logfile lines:

```
GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58
POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58
GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58
```

*** 

Bonus

You can find a list of words on your Mac at /usr/share/dict/words. Use this file to answer the following questions:

- How many words have at least 3 vowels?
- How many words have at least 3 vowels in a row?
- How many words have at least 4 consonants in a row?
- How many words start and end with the same letter?
- How many words start and end with a vowel?
- How many words contain the same letter 3 times in a row?
- What other interesting patterns in words can you find?
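
One possible way to approach these questions is sketched below. The labels, patterns, and helper names (`PATTERNS`, `load_words`, `count_matches`) are hypothetical. The sketch assumes that "y" counts as a consonant, that words are lowercased before matching, and that a one-letter word counts as starting and ending with the same letter. The small sample list stands in for the real word file.

```python
import re

# Hypothetical labels and patterns; 'y' is treated as a consonant here
PATTERNS = {
    'at least 3 vowels': r'(?:[aeiou].*){3}',
    '3 vowels in a row': r'[aeiou]{3}',
    '4 consonants in a row': r'[b-df-hj-np-tv-z]{4}',
    'start and end with the same letter': r'^([a-z])(?:.*\1)?$',
    'start and end with a vowel': r'^[aeiou](?:.*[aeiou])?$',
    'same letter 3 times in a row': r'([a-z])\1{2}',
}


def load_words(path='/usr/share/dict/words'):
    """Read one word per line, lowercased, skipping blank lines."""
    with open(path) as f:
        return [line.strip().lower() for line in f if line.strip()]


def count_matches(words, pattern):
    """Count the words in which the pattern matches somewhere."""
    regex = re.compile(pattern)
    return sum(1 for word in words if regex.search(word.lower()))


# Made-up sample so the sketch runs without the word file
sample_words = ['banana', 'queue', 'strengths', 'area', 'level', 'brrr', 'sky']
sample_counts = {
    label: count_matches(sample_words, pattern)
    for label, pattern in PATTERNS.items()
}

for label, count in sample_counts.items():
    print('{:<36} {}'.format(label, count))
```

To run it against the real file, replace `sample_words` with `load_words()`, assuming the file exists at that path on your machine.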


***
***
1. __Write a function named is_vowel__
- It should accept a string as input and use a regular expression to determine if the passed string is a vowel.
- While not explicitly mentioned in the lesson, you can treat the result of re.search as a boolean value that indicates whether or not the regular expression matches the given string.

In [None]:
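def is_vowel(string):
    # True only when the entire string is exactly one vowel, in either case
    return bool(re.fullmatch(r'[aeiou]', string, re.IGNORECASE))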

***
***
2. __Write a function named is_valid_username__
- Accept a string as input. 
- A valid username starts with a lowercase letter, and only consists of lowercase letters, numbers, or the _ character.
- It should also be no longer than 32 characters.
- The function should return either True or False depending on whether the passed string is a valid username.

In [None]:
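def is_valid_username(username):
    return bool(re.fullmatch(r'[a-z][a-z0-9_]{0,31}', username))  # 1 lowercase letter + up to 31 more characters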

***
***
3. Write a regular expression to capture phone numbers.
- It should match all of the following:
    - (210) 867 5309
    - +1 210.867.5309
    - 867-5309
    - 210-867-5309
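
A minimal sketch of one pattern that covers the four formats above. It assumes an optional country code of one or two digits, an optional three-digit area code that may be wrapped in parentheses, and a space, dot, or dash as the separator. `PHONE_PATTERN` and `sample_numbers` are hypothetical names. The pattern is deliberately loose and would also accept some strings that are not real phone numbers.

```python
import re

PHONE_PATTERN = re.compile(r'''
    (?:\+?\d{1,2}[\s.-])?     # optional country code, e.g. "+1 "
    (?:\(?\d{3}\)?[\s.-])?    # optional area code, e.g. "(210) " or "210."
    \d{3}[\s.-]\d{4}          # local number, e.g. "867-5309"
''', re.VERBOSE)

sample_numbers = [
    '(210) 867 5309',
    '+1 210.867.5309',
    '867-5309',
    '210-867-5309',
]

for number in sample_numbers:
    # fullmatch requires the pattern to cover the entire string
    print(number, bool(PHONE_PATTERN.fullmatch(number)))
```

Because the pattern uses only non-capturing groups, `PHONE_PATTERN.findall(text)` returns the full matched numbers when searching a longer text.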


***
***
4. Use regular expressions to convert the dates below to the standardized year-month-day format.
    - 02/04/19
    - 02/05/19
    - 02/06/19
    - 02/07/19
    - 02/08/19
    - 02/09/19
    - 02/10/19
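
One way this could be done with `re.sub` and named groups is sketched below. It assumes the dates are written month/day/year and that the two-digit years belong to the 2000s. Neither is stated in the exercise, so adjust the replacement if that guess is wrong. `DATE_PATTERN` and `standardize_date` are hypothetical names.

```python
import re

# Assumption: the input is MM/DD/YY
DATE_PATTERN = re.compile(r'(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{2})')


def standardize_date(date):
    """Rewrite MM/DD/YY as YYYY-MM-DD, assuming the century is 20xx."""
    return DATE_PATTERN.sub(r'20\g<year>-\g<month>-\g<day>', date)


dates = ['02/04/19', '02/05/19', '02/06/19', '02/07/19',
         '02/08/19', '02/09/19', '02/10/19']
standardized = [standardize_date(date) for date in dates]

print(standardized)
```

Under these assumptions, `02/04/19` becomes `2019-02-04`.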

***
***
5. Write a regex to extract the various parts of these logfile lines:
            
                        
```
GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58
POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58
GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58
```
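
A possible sketch using named groups and `re.VERBOSE`. The group names (`method`, `path`, `timestamp`, `protocol`, `status`, `size`, `user_agent`, `ip`) are guesses at what each field means. In particular, reading the number after the status code as a response size is an assumption. `LOG_PATTERN` and `parse_line` are hypothetical names.

```python
import re

LOG_PATTERN = re.compile(r'''
    ^(?P<method>[A-Z]+)\s+
    (?P<path>\S+)\s+
    \[(?P<timestamp>[^\]]+)\]\s+
    (?P<protocol>HTTP/\d\.\d)\s+
    \{(?P<status>\d{3})\}\s+
    (?P<size>\d+)\s+
    "(?P<user_agent>[^"]*)"\s+
    (?P<ip>\d{1,3}(?:\.\d{1,3}){3})$
''', re.VERBOSE)


def parse_line(line):
    """Return a dict of the named parts, or None if the line does not match."""
    match = LOG_PATTERN.search(line.strip())
    return match.groupdict() if match else None


log_lines = [
    'GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58',
    'POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58',
    'GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58',
]
parsed = [parse_line(line) for line in log_lines]

for entry in parsed:
    print(entry)
```

Since `pandas` is already imported in this notebook, `pd.DataFrame(parsed)` would turn the parsed lines into a table with one column per named group.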


***
***
### BONUSES and extra materials

In [2]:
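# This function is taken from the DS textbook via Codeup
def show_all_matches(regexes, subject, re_length=6):
    """Print a table of each regex and the matches re.findall returns for subject."""
    print('Sentence:')
    print()
    print('    {}'.format(subject))
    print()
    # Pad the header so it lines up with a regex column re_length characters wide
    print(' regexp{} | matches'.format(' ' * (re_length - 6)))
    print(' ------{} | -------'.format(' ' * (re_length - 6)))
    # Left-align each regex in its column and show its matches with repr()
    fmt = ' {:<%d} | {!r}' % re_length
    for regexp in regexes:
        matches = re.findall(regexp, subject)
        # Truncate long result lists so each row stays readable
        if len(matches) > 8:
            matches = matches[:8] + ['...']
        print(fmt.format(regexp, matches))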

In [3]:
sentence = 'Mary had a little lamb. 1 little lamb. Not 10, not 12, not 22, just one.'

show_all_matches([
    r'a',
    r'm',
    r'M',
    r'Mary',
    r'little',
    r'1',
    r'10',
    r'22'
], sentence)


Sentence:

    Mary had a little lamb. 1 little lamb. Not 10, not 12, not 22, just one.

 regexp | matches
 ------ | -------
 a      | ['a', 'a', 'a', 'a', 'a']
 m      | ['m', 'm']
 M      | ['M']
 Mary   | ['Mary']
 little | ['little', 'little']
 1      | ['1', '1', '1']
 10     | ['10']
 22     | ['22']
