# Regex Exercises

In [1]:
import re
import pandas as pd

### 1. Write a function named ```is_vowel```. It should accept a string as input and use a regular expression to determine if the passed string is a vowel. While not explicitly mentioned in the lesson, you can treat the result of ```re.search``` as a boolean value that indicates whether or not the regular expression matches the given string.

In [5]:
bool(re.search(r'[aeiou]', 'test'))

True

In [9]:
re.search(r'[aeiou]', 'banana')

<re.Match object; span=(1, 2), match='a'>

In [10]:
re.search(r'^[aeiou]', "bananarama")

In [11]:
def is_vowel(string):
    # A match object is truthy and None is falsy, so bool() gives True/False directly.
    return bool(re.search(r'^[aeiou]$', string, re.IGNORECASE))

In [12]:
is_vowel('help')

False
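As a minimal sketch of an alternative, `re.fullmatch` requires the whole string to match, so the `^` and `$` anchors are not needed. The function name `is_vowel_fullmatch` is hypothetical and only used here for illustration.

```python
import re

def is_vowel_fullmatch(string):
    # re.fullmatch succeeds only when the entire string matches the pattern.
    return bool(re.fullmatch(r'[aeiou]', string, re.IGNORECASE))

print(is_vowel_fullmatch('a'))     # True
print(is_vowel_fullmatch('E'))     # True
print(is_vowel_fullmatch('help'))  # False
print(is_vowel_fullmatch('a\n'))   # False
```

The last case is the main behavioral difference: with `re.search`, `$` also matches just before a trailing newline, so `'^[aeiou]$'` would accept `'a\n'`.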

### 2. Write a function named ```is_valid_username``` that accepts a string as input. A valid username starts with a lowercase letter, and only consists of lowercase letters, numbers, or the ```_``` character. It should also be no longer than 32 characters. The function should return either ```True``` or ```False``` depending on whether the passed string is a valid username.

In [16]:
re.search(r'^[a-z]', 'test')

<re.Match object; span=(0, 1), match='t'>

In [17]:
re.search(r'^[a-z]', 'Test')

In [30]:
bool(re.search(r'^[a-z]\w{1,31}$', 'teSt'))

True

In [32]:
re.search(r'^[a-z][a-z_0-9]{0,31}$', 'test')

<re.Match object; span=(0, 4), match='test'>

In [39]:
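def is_valid_username(string):
    # One lowercase letter, then 0 to 31 lowercase letters, digits, or underscores
    # (32 characters at most; {0,31} also allows a single-letter username).
    search = r'^[a-z][a-z_0-9]{0,31}$'
    return bool(re.search(search, string))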

In [41]:
is_valid_username('Mark1987')

False

In [48]:
is_valid_username('mark_1987_')

True

In [43]:
is_valid_username('marK_1987')

False
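The boundary cases of the rule (a single letter, exactly 32 characters, 33 characters) are worth checking explicitly. The sketch below is one way to do that; it assumes a one-character username counts as valid, and the names `USERNAME_RE` and `is_valid_username_strict` are hypothetical.

```python
import re

# One lowercase letter followed by up to 31 lowercase letters, digits, or underscores.
USERNAME_RE = re.compile(r'[a-z][a-z0-9_]{0,31}')

def is_valid_username_strict(string):
    # fullmatch rejects anything outside the pattern, including a trailing newline.
    return bool(USERNAME_RE.fullmatch(string))

print(is_valid_username_strict('a'))            # True
print(is_valid_username_strict('a' * 32))       # True
print(is_valid_username_strict('a' * 33))       # False
print(is_valid_username_strict('_mark'))        # False
print(is_valid_username_strict('mark_1987\n'))  # False
```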

### 3. Write a regular expression to capture phone numbers. 

In [68]:
phone_numbers = ['+86 186 6908 9563',
                 '+1 (325) 455-0446',
                 '+66 89-757-4306',
                 '216-0427',
                 '() 66 000',
                 '325:665~4633']

In [69]:
regex = r'(\+?\d+)?.?(\(?\d{3}\)?)?.?\d{3}.?\d{4}'

for num in phone_numbers:
    print(f'{num} {bool(re.search(regex, num))}')

+86 186 6908 9563 True
+1 (325) 455-0446 True
+66 89-757-4306 True
216-0427 True
() 66 000 False
325:665~4633 True
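The pattern above only tests whether something phone-like appears in the string. To actually capture the parts, named groups can be used. The following is a sketch under a narrower assumption: it handles only the North American layout (optional `+` country code, optional three-digit area code, then a 3-4 digit split) and accepts only spaces, dots, or hyphens as separators, so it is stricter than the pattern above. The group names and the `parse_phone` helper are hypothetical.

```python
import re

PHONE_RE = re.compile(r'''
    ^
    (?:\+(?P<country_code>\d{1,3})\s+)?          # optional country code, e.g. +1
    (?:\(?(?P<area_code>\d{3})\)?[\s.-]?)?       # optional area code, with or without parentheses
    (?P<exchange>\d{3})                          # first three digits of the local number
    [\s.-]?                                      # optional separator
    (?P<line_number>\d{4})                       # last four digits
    $
''', re.VERBOSE)

def parse_phone(number):
    # Returns a dict of the captured parts, or None when the layout does not match.
    match = PHONE_RE.search(number)
    return match.groupdict() if match else None

print(parse_phone('+1 (325) 455-0446'))
print(parse_phone('216-0427'))
print(parse_phone('() 66 000'))
```

Groups that did not take part in the match, such as `area_code` for `'216-0427'`, come back as `None` in the dictionary.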


### 4. Use regular expressions to convert the dates below to the standardized year-month-day format.

In [70]:
date_list = ["02/04/19",
             "02/05/19",
             "02/06/19",
             "02/07/19",
             "02/08/19",
             "02/09/19",
             "02/10/19",
             ]

In [71]:
regex = r'(\d{2})/(\d{2})/(\d{2})'

In [72]:
for date in date_list:
    print(re.sub(regex, r'20\3-\1-\2', date))

2019-02-04
2019-02-05
2019-02-06
2019-02-07
2019-02-08
2019-02-09
2019-02-10
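The same substitution can be written with named groups, which makes the reordering easier to read than `\3-\1-\2`. This is a sketch that assumes, as the output above does, that the input is month/day/year and that every year belongs to the 2000s; the names `DATE_RE` and `to_iso` are hypothetical.

```python
import re

DATE_RE = re.compile(r'(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{2})')

def to_iso(date):
    # \g<name> refers to a named group in the replacement string.
    return DATE_RE.sub(r'20\g<year>-\g<month>-\g<day>', date)

for date in ["02/04/19", "02/10/19"]:
    print(to_iso(date))
# 2019-02-04
# 2019-02-10
```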


### 5. Write a regex to extract the various parts of these logfile lines:

In [79]:
logfile_re = r'''
^(?P<method>GET|POST)
\s+
(?P<path>.*?)
\s+
\[(?P<timestamp>.*?)\]
\s+
(?P<http_version>.*?)
\s+
\{(?P<status>\d+)\}
\s+
(?P<bytes>\d+)
\s+
"(?P<user_agent>.*)"
\s+
(?P<ip>.*)$
'''

lines = pd.Series([
    'GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58',
    'POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58',
    'GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58',
])
lines.str.extract(logfile_re, flags=re.VERBOSE)

|   | method | path | timestamp | http_version | status | bytes | user_agent | ip |
|---|--------|------|-----------|--------------|--------|-------|------------|----|
| 0 | GET | /api/v1/sales?page=86 | 16/Apr/2019:193452+0000 | HTTP/1.1 | 200 | 510348 | python-requests/2.21.0 | 97.105.19.58 |
| 1 | POST | /users_accounts/file-upload | 16/Apr/2019:193452+0000 | HTTP/1.1 | 201 | 42 | User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; ... | 97.105.19.58 |
| 2 | GET | /api/v1/items?page=3 | 16/Apr/2019:193453+0000 | HTTP/1.1 | 429 | 3561 | python-requests/2.21.0 | 97.105.19.58 |
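For a single line, the same idea works without pandas by compiling the pattern with `re.VERBOSE` and reading `groupdict()`. The sketch below is a variation rather than the pattern used above: it swaps the lazy `.*?` groups for tighter character classes such as `\S+` and `[^"]*`, and the name `LOG_RE` is hypothetical.

```python
import re

LOG_RE = re.compile(r'''
    ^(?P<method>GET|POST)\s+
    (?P<path>\S+)\s+
    \[(?P<timestamp>[^\]]+)\]\s+
    (?P<http_version>\S+)\s+
    \{(?P<status>\d+)\}\s+
    (?P<bytes>\d+)\s+
    "(?P<user_agent>[^"]*)"\s+
    (?P<ip>\S+)$
''', re.VERBOSE)

line = ('GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 '
        '{200} 510348 "python-requests/2.21.0" 97.105.19.58')

match = LOG_RE.search(line)
fields = match.groupdict() if match else {}

# Every captured value is a string; convert numeric fields as needed.
for name, value in fields.items():
    print(f'{name}: {value}')
```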
