# Regular Expressions (Regex)

In [1]:
import re
import pandas as pd

# 1.
- Write a function named is_vowel. 
    - It should accept a string as input 
    - and use a regular expression to determine if the passed string is a vowel. 
    
While not explicitly mentioned in the lesson, you can treat the result of re.search as a boolean value. It returns a match object when the regular expression matches the given string and None when it does not, so wrapping it in bool() gives True or False.
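Here is a minimal illustration of that truthiness, using the same anchored pattern as the solution below. re.search returns a match object on success and None on failure, and bool() turns those into True and False.

```python
import re

# re.search returns a Match object when the pattern is found, otherwise None
match = re.search(r"^[aeiou]$", "a")
no_match = re.search(r"^[aeiou]$", "b")

print(match)                         # <re.Match object; span=(0, 1), match='a'>
print(no_match)                      # None
print(bool(match), bool(no_match))   # True False

# The same truthiness lets the result drive an if statement directly
if re.search(r"^[aeiou]$", "a"):
    print("a is a vowel")
```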

In [2]:
def is_vowel(string):
    # ^ and $ anchor the match to the start and end of the string; IGNORECASE accepts uppercase vowels
    return bool(re.search(r'^[aeiou]$', string, re.IGNORECASE))

assert is_vowel("a") == True
assert is_vowel("E") == True
assert is_vowel("aaa") == False
assert is_vowel("aeiou") == False

print("Exercise 1 is correct.")

Exercise 1 is correct.
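One subtlety: in Python, $ also matches just before a newline at the very end of the string, so the anchored re.search pattern above accepts "a" followed by a newline. The sketch below uses re.fullmatch instead, which must consume the entire string. is_vowel_strict is a hypothetical name and is not part of the exercise.

```python
import re

def is_vowel_strict(string):
    # fullmatch must consume the whole string, so no ^/$ anchors are needed,
    # and a trailing newline is not silently accepted
    return re.fullmatch(r"[aeiou]", string, re.IGNORECASE) is not None

print(bool(re.search(r"^[aeiou]$", "a\n")))  # True: $ matches before the final newline
print(is_vowel_strict("a\n"))                # False
print(is_vowel_strict("E"))                  # True
print(is_vowel_strict("aeiou"))              # False
```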


# 2. 
- Write a function named is_valid_username that accepts a string as input.
    - A valid username starts with a lowercase letter: [a-z]
    - and only consists of lowercase letters, numbers, or the _ character: [a-z0-9_]
    - It should also be no longer than 32 characters: {,31} after the first letter, since that letter already counts as one
    - The function should return either True or False depending on whether the passed string is a valid username.

In [3]:
def is_valid_username(string):
    # One lowercase letter, then up to 31 more allowed characters: 32 characters at most
    username_pattern = r"^[a-z][a-z0-9_]{,31}$"
    return bool(re.search(username_pattern, string))


assert is_valid_username('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa') == False
assert is_valid_username('codeup') == True
assert is_valid_username('Codeup') == False
assert is_valid_username('codeup123') == True
assert is_valid_username('1codeup') == False

print("Exercise 2 is correct.")

Exercise 2 is correct.
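In Python's re module, the {,31} quantifier is shorthand for {0,31}. One required first letter plus up to 31 more characters gives the 32-character limit. The sketch below probes that boundary with the quantifier spelled out and re.fullmatch doing the anchoring. USERNAME_RE and is_valid_username_strict are illustrative names and are not part of the exercise.

```python
import re

# A lowercase letter, then 0 to 31 lowercase letters, digits, or underscores
USERNAME_RE = re.compile(r"[a-z][a-z0-9_]{0,31}")

def is_valid_username_strict(string):
    # fullmatch anchors both ends of the string
    return USERNAME_RE.fullmatch(string) is not None

for candidate in ["a" * 32, "a" * 33, "code_up_2019", "codeUp"]:
    print(len(candidate), is_valid_username_strict(candidate))
```

The 32-character string passes and the 33-character one fails, which agrees with the first assertion in the exercise.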


# 3. 
- Write a regular expression to capture phone numbers. It should match all of the following:
    - (210) 867 5309
    - +1 210.867.5309
    - 867-5309
    - 210-867-5309
- Put the subject strings in order of increasing complexity.

In [4]:
# Match seven consecutive digits with no separator and no area code, e.g. 8675309
re.search(r"\d{7}", "8675309")

<re.Match object; span=(0, 7), match='8675309'>

In [5]:
# Match 3 digits, then a hyphen, then 4 digits
re.search(r"\d{3}-\d{4}", "867-5309")

<re.Match object; span=(0, 8), match='867-5309'>

In [6]:
# Match 3 digits, then a hyphen or a dot, then 4 digits
re.search(r"\d{3}[-.]\d{4}", "867-5309")

<re.Match object; span=(0, 8), match='867-5309'>

In [7]:
# Match 3 digits then a hyphen, dot, or space then 4 digits
re.search(r"\d{3}[-. ]\d{4}", "867 5309")

<re.Match object; span=(0, 8), match='867 5309'>

Alternative to the previous cell:

In [8]:
# Another approach to the delimiter: \D? allows one optional non-digit character
re.search(r"\d{3}\D?\d{4}", "8675309")

<re.Match object; span=(0, 7), match='8675309'>
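Note that \D? is looser than the [-. ] character class because it accepts any single non-digit, letters included. Here is a small comparison sketch. The input 867x5309 is a made-up example, and the ? after [-. ] is added here so the stricter pattern also accepts the unseparated form.

```python
import re

loose = re.compile(r"\d{3}\D?\d{4}")      # any one non-digit (or none) between the groups
strict = re.compile(r"\d{3}[-. ]?\d{4}")  # only a hyphen, dot, space, or nothing

for candidate in ["867-5309", "867.5309", "867 5309", "8675309", "867x5309"]:
    print(candidate, bool(loose.fullmatch(candidate)), bool(strict.fullmatch(candidate)))
```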

In [9]:
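# Optional parentheses around the 3-digit area code, then at most one character of any kind between digit groups
re.search(r"\(?\d{3}\)?.?\d{3}.?\d{4}", "210-867-5309")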

<re.Match object; span=(0, 12), match='210-867-5309'>

In [10]:

# The same pattern also accepts dots between the digit groups
re.search(r"\(?\d{3}\)?.?\d{3}.?\d{4}", "210.867.5309")

<re.Match object; span=(0, 12), match='210.867.5309'>
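Running the 10-digit pattern over all four subject strings shows where it stops short. re.search finds 210.867.5309 inside +1 210.867.5309 because the +1 is simply skipped. However, 867-5309 has no area code and does not match. Wrapping the area-code part in an optional non-capturing group is one way to cover all four. This is a sketch, not the only fix, and neither pattern captures the +1.

```python
import re

subjects = ["(210) 867 5309", "+1 210.867.5309", "867-5309", "210-867-5309"]

# Same pattern as the cells above: the area code is required
ten_digit = re.compile(r"\(?\d{3}\)?.?\d{3}.?\d{4}")
# Area code (with its optional parentheses and separator) made optional
optional_area = re.compile(r"(?:\(?\d{3}\)?.?)?\d{3}.?\d{4}")

for s in subjects:
    m1, m2 = ten_digit.search(s), optional_area.search(s)
    found1 = m1.group() if m1 else None
    found2 = m2.group() if m2 else None
    print(s, "|", found1, "|", found2)
```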

### 3. Alternative Answer

In [11]:
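# Raw string (r""") so sequences like \+ and \d reach the regex engine without Python's invalid-escape warnings
phone_regex = re.compile(
r"""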
^
(?P<country_code>\+\d+)?
\D*?
(?P<area_code>\d{3})?
\D*?
(?P<exchange_code>\d{3})
\D*?
(?P<line_number>\d{4})
""", re.VERBOSE)

In [12]:
df = pd.DataFrame()
df['number'] = [
    '(210) 867 5309',
    '+1 210.867.5309',
    '867-5309',
    '210-867-5309',
    '2108675309',
]
df

|   | number |
|---|---|
| 0 | (210) 867 5309 |
| 1 | +1 210.867.5309 |
| 2 | 867-5309 |
| 3 | 210-867-5309 |
| 4 | 2108675309 |


In [13]:
df.number.str.extract(phone_regex)

|   | country_code | area_code | exchange_code | line_number |
|---|---|---|---|---|
| 0 | NaN | 210 | 867 | 5309 |
| 1 | +1 | 210 | 867 | 5309 |
| 2 | NaN | NaN | 867 | 5309 |
| 3 | NaN | 210 | 867 | 5309 |
| 4 | NaN | 210 | 867 | 5309 |
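The named groups are not tied to pandas. On a single string, match().groupdict() returns the same pieces, with None for optional groups that did not participate. The sketch below recompiles the pattern above as a raw string and adds verbose-mode comments, which are an addition made here.

```python
import re

phone_regex = re.compile(r"""
^
(?P<country_code>\+\d+)?   # optional leading +country code
\D*?                       # lazily skip any separators
(?P<area_code>\d{3})?      # optional 3-digit area code
\D*?
(?P<exchange_code>\d{3})
\D*?
(?P<line_number>\d{4})
""", re.VERBOSE)

for number in ["+1 210.867.5309", "867-5309"]:
    print(number, phone_regex.match(number).groupdict())
```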


# 4. 
- Use regular expressions to convert the dates below to the standardized year-month-day format.

- 02/04/19
- 02/05/19
- 02/06/19
- 02/07/19
- 02/08/19
- 02/09/19
- 02/10/19

In [15]:
dates = pd.Series([
    '02/04/19',
    '02/05/19',
    '02/06/19',
    '02/07/19',
    '02/08/19',
    '02/09/19',
    '02/10/19',
])
# Capture month, day, and two-digit year, then reorder them as 20YY-MM-DD
dates.str.replace(r'(\d+)/(\d+)/(\d+)', r'20\3-\1-\2', regex=True)

0    2019-02-04
1    2019-02-05
2    2019-02-06
3    2019-02-07
4    2019-02-08
5    2019-02-09
6    2019-02-10
dtype: object
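The backreference version assumes zero-padded input and a year in the 2000s. The sketch below uses named groups and a replacement function, so unpadded dates such as 2/4/19 are also padded. That input is hypothetical and not in the list above. DATE_RE, to_iso, and standardize_dates are illustrative names.

```python
import re

# month/day/two-digit year; one or two digits allowed for month and day
DATE_RE = re.compile(r"(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{2})")

def to_iso(match):
    # Like the '20\3' replacement above, this assumes every year is in the 2000s
    return "20{}-{:0>2}-{:0>2}".format(match["year"], match["month"], match["day"])

def standardize_dates(text):
    # re.sub calls to_iso once per match and splices in the returned string
    return DATE_RE.sub(to_iso, text)

print(standardize_dates("02/04/19"))            # 2019-02-04
print(standardize_dates("2/4/19 and 2/10/19"))  # 2019-02-04 and 2019-02-10
```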

# 5. 
Write a regex to extract the various parts of these logfile lines:

- GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58
- POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58
- GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58

In [17]:
logfile_re = r'''
^(?P<method>GET|POST)
\s+
(?P<path>.*?)
\s+
\[(?P<timestamp>.*?)\]
\s+
(?P<http_version>.*?)
\s+
\{(?P<status>\d+)\}
\s+
(?P<bytes_sent>\d+)
\s+
"(?P<user_agent>.*)$
'''

lines = pd.Series([
    'GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58',
    'POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58',
    'GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58',
])
lines.str.extract(logfile_re, re.VERBOSE)

|   | method | path | timestamp | http_version | status | bytes_sent | user_agent |
|---|---|---|---|---|---|---|---|
| 0 | GET | /api/v1/sales?page=86 | 16/Apr/2019:193452+0000 | HTTP/1.1 | 200 | 510348 | python-requests/2.21.0" 97.105.19.58 |
| 1 | POST | /users_accounts/file-upload | 16/Apr/2019:193452+0000 | HTTP/1.1 | 201 | 42 | User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; ... |
| 2 | GET | /api/v1/items?page=3 | 16/Apr/2019:193453+0000 | HTTP/1.1 | 429 | 3561 | python-requests/2.21.0" 97.105.19.58 |
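In the output above, the user_agent column also swallows the closing quote and the IP address, because "(?P<user_agent>.*)$ runs to the end of the line. The sketch below stops the user agent at the closing quote and captures the IP address in its own group. Several choices here are not part of the exercise: the ip group name, the [A-Z]+ method pattern, and the use of re.match instead of pandas.

```python
import re

LOG_RE = re.compile(r'''
    ^(?P<method>[A-Z]+)               # HTTP verb such as GET or POST
    \s+(?P<path>\S+)                  # request path (contains no spaces)
    \s+\[(?P<timestamp>[^\]]+)\]      # everything inside the square brackets
    \s+(?P<http_version>HTTP/\S+)
    \s+\{(?P<status>\d{3})\}          # 3-digit status inside curly braces
    \s+(?P<bytes_sent>\d+)
    \s+"(?P<user_agent>[^"]*)"        # everything inside the double quotes
    \s+(?P<ip>\S+)$                   # trailing IP address
''', re.VERBOSE)

lines = [
    'GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58',
    'POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58',
    'GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58',
]

for line in lines:
    m = LOG_RE.match(line)
    print(m.group("status"), m.group("user_agent"), m.group("ip"))
```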


# BONUS

You can find a list of words on your Mac at /usr/share/dict/words. Use this file to answer the following questions:

- How many words have at least 3 vowels?
- How many words have at least 3 vowels in a row?
- How many words have at least 4 consonants in a row?
- How many words start and end with the same letter?
- How many words start and end with a vowel?
- How many words contain the same letter 3 times in a row?
- What other interesting patterns in words can you find?
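The sketch below covers the first six questions by mapping each one to a pattern and counting matches with re.search. It makes several assumptions:

- Words are lowercased first.
- y counts as a consonant.
- One-letter words count as starting and ending with the same letter, and also with a vowel when that letter is a vowel.
- The file lives at the macOS path named above.

PATTERNS, count_matches, and load_words are illustrative names. A small sample list stands in for the dictionary so the counts can be checked by hand.

```python
import re
from pathlib import Path

# One pattern per question; applied to lowercased words, with y treated as a consonant
PATTERNS = {
    "at_least_3_vowels": re.compile(r"[aeiou].*[aeiou].*[aeiou]"),
    "three_vowels_in_a_row": re.compile(r"[aeiou]{3}"),
    "four_consonants_in_a_row": re.compile(r"[b-df-hj-np-tv-z]{4}"),
    # The optional group lets single-letter words count for these two questions
    "same_first_and_last_letter": re.compile(r"^([a-z])(?:.*\1)?$"),
    "vowel_at_both_ends": re.compile(r"^[aeiou](?:.*[aeiou])?$"),
    "same_letter_three_in_a_row": re.compile(r"([a-z])\1\1"),
}

def count_matches(words):
    """Return how many words match each pattern anywhere (re.search semantics)."""
    lowered = [w.lower() for w in words]
    return {name: sum(1 for w in lowered if pattern.search(w))
            for name, pattern in PATTERNS.items()}

def load_words(path="/usr/share/dict/words"):
    """Read whitespace-separated words; returns an empty list if the file is missing."""
    p = Path(path)
    if not p.exists():
        return []
    return p.read_text(encoding="utf-8", errors="replace").split()

sample = ["a", "Anna", "queue", "strengths", "kayak", "zzz", "sky"]
print(count_matches(sample))
```

On a machine that has the file, count_matches(load_words()) gives the counts for the full dictionary. Because load_words returns an empty list when the file is missing, every count is 0 in that case.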