# Regex Exercises

In [1]:
import re
import pandas as pd

## 1. Write a function named is_vowel. It should accept a string as input and use a regular expression to determine if the passed string is a vowel. While not explicitly mentioned in the lesson, you can treat the result of re.search as a boolean value that indicates whether or not the regular expression matches the given string.

In [9]:
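def is_vowel(string):
    # A character class matches exactly one of the listed letters;
    # the ^ and $ anchors reject strings longer than a single character.
    pattern = r'^[aeiouAEIOU]$'
    return bool(re.search(pattern, string))

In [10]:
is_vowel('a')

True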

## 2. Write a function named is_valid_username that accepts a string as input. A valid username starts with a lowercase letter and consists only of lowercase letters, numbers, or the _ character. It should also be no longer than 32 characters. The function should return either True or False depending on whether the passed string is a valid username.

```
>>> is_valid_username('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa')
False
>>> is_valid_username('codeup')
True
>>> is_valid_username('Codeup')
False
>>> is_valid_username('codeup123')
True
>>> is_valid_username('1codeup')
False
```
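One possible solution is sketched below. It assumes the 32-character limit includes the leading letter, so the first character is followed by at most 31 more. The constant name `USERNAME_PATTERN` is chosen here for illustration only.

```python
import re

# Hypothetical constant name, chosen for this sketch.
# One lowercase letter, then up to 31 lowercase letters, digits, or underscores.
USERNAME_PATTERN = re.compile(r'[a-z][a-z0-9_]{0,31}')


def is_valid_username(string):
    # fullmatch succeeds only if the entire string fits the pattern,
    # so no explicit ^ and $ anchors are needed.
    return bool(USERNAME_PATTERN.fullmatch(string))
```

Using `re.fullmatch` rather than `re.search` also avoids the case where `$` would accept a trailing newline.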

## 3. Write a regular expression to capture phone numbers. It should match all of the following:

```
(210) 867 5309
+1 210.867.5309
867-5309
210-867-5309
```
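A minimal sketch of one pattern that matches all four samples follows. The group names and the choice of which parts are optional are assumptions inferred from these samples alone; real-world phone formats vary far more widely.

```python
import re

# Hypothetical pattern and group names, inferred from the four samples only.
phone_regex = re.compile(r'''
    ^
    (?:\+?(?P<country_code>\d+)\s)?          # optional country code such as "+1 "
    (?:\(?(?P<area_code>\d{3})\)?[\s.-])?    # optional area code, parentheses allowed
    (?P<exchange>\d{3})                      # three-digit exchange
    [\s.-]                                   # separator: space, dot, or hyphen
    (?P<line_number>\d{4})                   # four-digit line number
    $
''', re.VERBOSE)

samples = [
    '(210) 867 5309',
    '+1 210.867.5309',
    '867-5309',
    '210-867-5309',
]

# True for each sample when the pattern matches the whole string
results = [bool(phone_regex.search(sample)) for sample in samples]
```

Because the country code and area code are wrapped in optional non-capturing groups, the seven-digit form matches through the same pattern as the longer forms.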

## 4. Use regular expressions to convert the dates below to the standardized year-month-day format.

```
02/04/19
02/05/19
02/06/19
02/07/19
02/08/19
02/09/19
02/10/19
```

In [16]:
dates = [
    '02/04/19',
    '02/05/19',
    '02/06/19',
    '02/07/19',
    '02/08/19',
    '02/09/19',
    '02/10/19'
]

In [18]:
date_regex = r'(\d+)/(\d+)/(\d+)'

# testing
re.sub(date_regex, r'20\3-\1-\2', dates[0])

'2019-02-04'

In [19]:
# apply to all
[re.sub(date_regex, r'20\3-\1-\2', date) for date in dates]

['2019-02-04',
 '2019-02-05',
 '2019-02-06',
 '2019-02-07',
 '2019-02-08',
 '2019-02-09',
 '2019-02-10']
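The substitution above accepts any run of digits in each field. A stricter variant might require exactly two digits per field and use named groups in place of numbered backreferences. The sketch below assumes every two-digit year belongs to the 2000s, as the sample data suggests; `STRICT_DATE` and `standardize_date` are hypothetical names.

```python
import re

# Hypothetical stricter pattern: exactly two digits per field, named groups.
STRICT_DATE = re.compile(r'(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{2})')


def standardize_date(date):
    # Reject anything that is not exactly mm/dd/yy instead of passing it through.
    if STRICT_DATE.fullmatch(date) is None:
        raise ValueError(f'unexpected date format: {date!r}')
    # Assumption: all years are 20xx, so the century is hard-coded.
    return STRICT_DATE.sub(r'20\g<year>-\g<month>-\g<day>', date)
```

Raising on unexpected input is one design choice; `re.sub` on its own returns a non-matching string unchanged, which can hide malformed dates.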

## 5. Write a regex to extract the various parts of these logfile lines:

```
GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58
POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58
GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58
```

In [11]:
lines = """
GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58
POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58
GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58
"""

In [12]:
regexp = r'''
^
(?P<method>GET|POST)
\s
(?P<path>/[/\w\-\?=]+)
\s
\[(?P<timestamp>.+)\]
\s
(?P<http_version>HTTP/\d+\.\d+)
\s
\{(?P<status_code>\d+)\}
\s
(?P<bytes>\d+)
\s
"(?P<user_agent>.+)"
\s
(?P<ip>\d+\.\d+\.\d+\.\d+)
$
'''

In [14]:
regexp2 = r'''
^
(?P<method>GET|POST) (?# ?P<name> gives the capture group a name; the method is GET or POST)
\s (?# a single whitespace character separates each field)
(?P<path>/[/\w\-\?=]+) (?# the path is a slash followed by one or more of the bracketed characters, hence the plus)
\s
\[(?P<timestamp>.+)\] (?# the timestamp is one or more of any character, enclosed in square brackets)
\s
(?P<http_version>HTTP/\d+\.\d+) (?# version is HTTP/ then \d+ -one or more digits- then \. -a literal dot- then \d+)
\s
\{(?P<status_code>\d+)\} (?# status code is one or more digits enclosed in curly braces)
\s
(?P<bytes>\d+) (?# one or more digits after the whitespace give the byte count)
\s
"(?P<user_agent>.+)" (?# the user agent is everything between the quotation marks)
\s
(?P<ip>\d+\.\d+\.\d+\.\d+) (?# four groups of one or more digits separated by literal dots)
$ (?# end of the string)
'''

In [15]:
[re.search(regexp2, line, re.VERBOSE).groupdict() for line in lines.strip().split('\n')]

[{'method': 'GET',
  'path': '/api/v1/sales?page=86',
  'timestamp': '16/Apr/2019:193452+0000',
  'http_version': 'HTTP/1.1',
  'status_code': '200',
  'bytes': '510348',
  'user_agent': 'python-requests/2.21.0',
  'ip': '97.105.19.58'},
 {'method': 'POST',
  'path': '/users_accounts/file-upload',
  'timestamp': '16/Apr/2019:193452+0000',
  'http_version': 'HTTP/1.1',
  'status_code': '201',
  'bytes': '42',
  'user_agent': 'User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
  'ip': '97.105.19.58'},
 {'method': 'GET',
  'path': '/api/v1/items?page=3',
  'timestamp': '16/Apr/2019:193453+0000',
  'http_version': 'HTTP/1.1',
  'status_code': '429',
  'bytes': '3561',
  'user_agent': 'python-requests/2.21.0',
  'ip': '97.105.19.58'}]
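The list comprehension above raises an `AttributeError` if any line fails to match, because `re.search` returns `None` in that case. A more defensive sketch is shown below. The helper `parse_log_lines` is hypothetical, the pattern mirrors the verbose regex above in compact form, and converting the numeric fields to integers is an assumption about what downstream code would want.

```python
import re

# Hypothetical compact form of the verbose pattern, compiled once for reuse.
LOG_PATTERN = re.compile(
    r'^(?P<method>GET|POST)\s(?P<path>/[/\w\-\?=]+)\s\[(?P<timestamp>.+)\]\s'
    r'(?P<http_version>HTTP/\d+\.\d+)\s\{(?P<status_code>\d+)\}\s(?P<bytes>\d+)\s'
    r'"(?P<user_agent>.+)"\s(?P<ip>\d+\.\d+\.\d+\.\d+)$'
)


def parse_log_lines(text):
    """Return (parsed, skipped): dicts for matching lines, raw text for the rest."""
    parsed, skipped = [], []
    for line in text.strip().splitlines():
        match = LOG_PATTERN.search(line)
        if match is None:
            # Keep the unmatched line so it can be inspected later.
            skipped.append(line)
            continue
        record = match.groupdict()
        # Assumption: numeric fields are more useful as integers.
        record['status_code'] = int(record['status_code'])
        record['bytes'] = int(record['bytes'])
        parsed.append(record)
    return parsed, skipped
```

Calling `parse_log_lines(lines)` on the sample text would return the three records in the first list and leave the second list empty.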