In [1]:
import re

1. Write a function named `is_vowel`. It should accept a string as input and use a regular expression to determine if the passed string is a vowel. While not explicitly mentioned in the lesson, you can treat the result of `re.search` as a boolean value that indicates whether or not the regular expression matches the given string.

In [139]:
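# Assumption: I am looking for strings that are exactly one vowel
# (upper- or lowercase; 'y' is not treated as a vowel)

def is_vowel(s):
    # ^ and $ anchor the pattern, so the whole string must be a single vowel
    return bool(re.search(r'^[aeiou]$', s, re.IGNORECASE))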

In [141]:
is_vowel('a'), is_vowel('ea'), is_vowel('hello'), is_vowel('A'), is_vowel('D')

(True, False, False, True, False)
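The `^` and `$` anchors are what restrict `is_vowel` to single-character strings. A quick comparison using only `re.search` and `re.fullmatch` from the standard library shows what happens without them (`word`, `unanchored`, `anchored` and `full` are just illustrative names):

```python
import re

word = 'hello'

# Unanchored: succeeds because 'e' appears somewhere in the string
unanchored = bool(re.search(r'[aeiou]', word, re.IGNORECASE))

# Anchored: the whole string must be exactly one vowel
anchored = bool(re.search(r'^[aeiou]$', word, re.IGNORECASE))

# re.fullmatch applies the same whole-string requirement without ^ and $
full = bool(re.fullmatch(r'[aeiou]', word, re.IGNORECASE))

print(unanchored, anchored, full)  # True False False
```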

2. Write a function named `is_valid_username` that accepts a string as input. A valid username starts with a lowercase letter, and only consists of lowercase letters, numbers, or the _ character. It should also be no longer than 32 characters. The function should return either True or False depending on whether the passed string is a valid username.

In [143]:
def is_valid_username(s):
    # one lowercase letter, then up to 31 lowercase letters, digits, or underscores
    return bool(re.match(r'^[a-z][a-z0-9_]{0,31}$', s))

In [146]:
is_valid_username('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'), is_valid_username('codeup'), is_valid_username('code_up')

(False, True, True)

In [145]:
is_valid_username('Codeup'), is_valid_username('codeup123'), is_valid_username('1codeup')

(False, True, False)

Breaking down the regex:
- `^[a-z]` must begin with a lowercase letter
- `[a-z0-9_]{0,31}` can be followed by up to 31 lowercase letters, numbers, or underscores, for a maximum of 32 characters in total
- `$` no more characters allowed
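The length limit is easiest to trust when it is checked right at the boundary. A minimal sketch, using a hypothetical `check_username` helper built on `re.fullmatch` with the same character rules, tests 32 and 33 characters along with a few edge cases:

```python
import re

# Same rules as is_valid_username: one lowercase letter, then up to
# 31 lowercase letters, digits, or underscores (32 characters at most).
USERNAME_RE = re.compile(r'[a-z][a-z0-9_]{0,31}')

def check_username(s):
    # fullmatch requires the entire string to match, so no ^/$ are needed
    return USERNAME_RE.fullmatch(s) is not None

checks = {
    'a' * 32: True,     # exactly 32 characters: allowed
    'a' * 33: False,    # 33 characters: too long
    'code_up': True,
    'code/up': False,   # '/' is not an allowed character
    '_codeup': False,   # must start with a letter
}
for name, expected in checks.items():
    assert check_username(name) == expected, name
```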

3. Write a regular expression to capture phone numbers. It should match all of the following:

`(210) 867 5309`

`+1 210.867.5309`

`867-5309`

`210-867-5309`


In [150]:
def match_phonenumber(s):
    return bool(re.match(r'^(\+?\d+\s)?(\(?[2-9][0-9]{2}\)?[\s\.\-]?)?[2-9][0-9]{2}[\.\-\s]?[0-9]{4}$',s))

In [151]:
match_phonenumber('(210) 867 5309'), match_phonenumber('+1 210.867.5309')

(True, True)

In [152]:
match_phonenumber('867-5309'), match_phonenumber('210-867-5309')

(True, True)

Breaking down the regular expression:

- `(\+?\d+\s)?` optional country code (digits, optionally preceded by a `+`) followed by whitespace
- `(\(?[2-9][0-9]{2}\)?[\s\.\-]?)?` optional area code
    - `\(?` can begin with a `(`
    - `[2-9][0-9]{2}` area codes begin with a digit from 2 through 9, followed by any two digits
    - `\)?` can end with a `)`
    - `[\s\.\-]?` can be followed by an optional space, period, or dash
- `[2-9][0-9]{2}` central office (exchange) code, formatted the same as the area code
- `[\.\-\s]?` optional space, period, or dash separator
- `[0-9]{4}` line number
- `^` ... `$` anchor the match to the whole string, so extra characters before or after the number make it fail
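Python's `re.VERBOSE` flag lets the same pattern carry this breakdown as inline comments. The sketch below is intended to be equivalent to the pattern in `match_phonenumber`; the names `PHONE_RE` and `match_phonenumber_verbose` are illustrative.

```python
import re

# In VERBOSE mode, whitespace outside character classes is ignored
# and '#' starts a comment, so each piece can be annotated.
PHONE_RE = re.compile(r"""
    ^
    (\+?\d+\s)?          # optional country code, e.g. '+1 '
    (\(?[2-9][0-9]{2}\)? # optional area code, optionally in parentheses
     [\s.\-]?)?          # optional separator after the area code
    [2-9][0-9]{2}        # central office (exchange) code
    [\s.\-]?             # optional separator
    [0-9]{4}             # line number
    $
""", re.VERBOSE)

def match_phonenumber_verbose(s):
    return PHONE_RE.match(s) is not None

for number in ['(210) 867 5309', '+1 210.867.5309', '867-5309', '210-867-5309']:
    print(number, match_phonenumber_verbose(number))
```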

Note one problem: this also matches `+1 (210-567-8729` and similar malformed formats, because the opening and closing parentheses are each optional on their own. Ideally, a function checking for phone numbers would remove all punctuation and then check that the remaining groups of digits are valid under the phone numbering system. To decide which digits are valid in each position, I followed the <a href='https://en.wikipedia.org/wiki/North_American_Numbering_Plan#Modern_plan'>North American Numbering Plan/Modern Plan</a>.
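The normalize-then-validate idea could be sketched as follows. This is an illustration under stated assumptions, not a complete validator: `normalize_phone` is a hypothetical helper, only NANP numbers are handled, an 11-digit number is assumed to carry a leading country code `1`, and all punctuation is discarded, so an unbalanced parenthesis is accepted here as well. Only the digit groups are checked.

```python
import re

def normalize_phone(s):
    """Return the digits of a NANP number if they form valid groups, else None."""
    digits = re.sub(r'\D', '', s)          # drop everything that is not a digit
    if len(digits) == 11 and digits.startswith('1'):
        digits = digits[1:]                # assume a leading 1 is the +1 country code
    if len(digits) == 10:
        pattern = r'[2-9]\d{2}[2-9]\d{6}'  # area code, exchange, line number
    elif len(digits) == 7:
        pattern = r'[2-9]\d{6}'            # exchange and line number only
    else:
        return None
    return digits if re.fullmatch(pattern, digits) else None

print(normalize_phone('(210) 867 5309'))   # 2108675309
print(normalize_phone('+1 210.867.5309'))  # 2108675309
print(normalize_phone('867-5309'))         # 8675309
print(normalize_phone('123-4567'))         # None
```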

4. Use regular expressions to convert the dates below to the standardized year-month-day format.

`02/04/19`

`02/05/19`

`02/06/19`

`02/07/19`

`02/08/19`

`02/09/19`

`02/10/19`

Note: I assume that the years will be 2000 or later

In [10]:
def format_date(s):
    # month 01-12, day 01-31, two-digit year (assumed to be 20xx)
    date_re = r'(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/([0-9]{2})'
    month, day, year = re.search(date_re, s).groups()
    return f"20{year}-{month}-{day}"

In [11]:
format_date('02/04/19'), format_date('02/05/19'), format_date('02/06/19'), format_date('02/07/19')

('2019-02-04', '2019-02-05', '2019-02-06', '2019-02-07')

In [12]:
format_date('02/08/19'), format_date('02/09/19'), format_date('02/10/19')

('2019-02-08', '2019-02-09', '2019-02-10')

This can also be done in pandas with `.str.replace(pattern, replacement, regex=True)`.
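Since the exercise asks to convert the dates, `re.sub` with backreferences can do the rearranging directly instead of unpacking groups and rebuilding a string. A sketch under the same assumption that every year is 2000 or later (`DATE_RE` and `reformat_dates` are illustrative names):

```python
import re

# Month 01-12, day 01-31, two-digit year; \b keeps the match from starting or
# ending inside a longer run of digits.
DATE_RE = re.compile(r'\b(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/([0-9]{2})\b')

def reformat_dates(text):
    # \g<3>, \g<1>, \g<2> insert the year, month and day groups in that order
    return DATE_RE.sub(r'20\g<3>-\g<1>-\g<2>', text)

print(reformat_dates('02/04/19 and 02/10/19'))  # 2019-02-04 and 2019-02-10
```

The same pattern and replacement string can be passed to the pandas `.str.replace(..., regex=True)` method to rewrite a whole column at once.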

5. Write a regex to extract the various parts of these logfile lines:

`GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58`

`POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58`

`GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58`

In [67]:
def parse_log_entry(s):
    # Groups: method, path, [timestamp], HTTP version, {status}, size, "user agent", IP address
    log_re = r'([A-Z]+) (/\S*) (\[[0-9]{2}/[A-Z][a-z]{2}/[0-9]{4}:[0-9]+\+[0-9]{4}\]) (HTTP/[0-9]\.[0-9]) (\{[0-9]+\}) ([0-9]+) ("[^"]*") ([0-9]{1,3}(?:\.[0-9]{1,3}){3})'
    result = re.findall(log_re, s)
    return result

In [68]:
parse_log_entry('GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58')

[('GET',
  '/api/v1/sales?page=86',
  '[16/Apr/2019:193452+0000]',
  'HTTP/1.1',
  '{200}',
  '510348',
  '"python-requests/2.21.0"',
  '97.105.19.58')]

In [69]:
parse_log_entry('POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58')

[('POST',
  '/users_accounts/file-upload',
  '[16/Apr/2019:193452+0000]',
  'HTTP/1.1',
  '{201}',
  '42',
  '"User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"',
  '97.105.19.58')]

In [70]:
parse_log_entry('GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58')

[('GET',
  '/api/v1/items?page=3',
  '[16/Apr/2019:193453+0000]',
  'HTTP/1.1',
  '{429}',
  '3561',
  '"python-requests/2.21.0"',
  '97.105.19.58')]
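Because `re.findall` returns an unlabeled tuple, and the captured values still include their `[ ]`, `{ }` and quote delimiters, named groups can make the output easier to use. A sketch with illustrative field names, using `re.match` and `Match.groupdict()`:

```python
import re

# Field names are illustrative. The brackets, braces and quotes sit outside
# the named groups, so only the values themselves are captured.
LOG_RE = re.compile(
    r'(?P<method>[A-Z]+) '
    r'(?P<path>/\S*) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'(?P<http_version>HTTP/[0-9.]+) '
    r'\{(?P<status>[0-9]{3})\} '
    r'(?P<size>[0-9]+) '
    r'"(?P<user_agent>[^"]*)" '
    r'(?P<ip>[0-9]{1,3}(?:\.[0-9]{1,3}){3})$'
)

def parse_log_fields(line):
    # Return a dict of named fields, or None if the line does not match
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

line = ('GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 '
        '{429} 3561 "python-requests/2.21.0" 97.105.19.58')
print(parse_log_fields(line))
```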

#### Bonus

In [104]:
def get_text():
    with open('/usr/share/dict/words', 'r') as f:
        text = f.read()
    return text
        
text = get_text()

In [111]:
#How many words have at least 3 vowels?
def at_least_three_vowels(t):
    regex_1 = r'.*[aeiou].*[aeiou].*[aeiou].*'
    return re.findall(regex_1, t, re.IGNORECASE)

In [113]:
len(at_least_three_vowels(text))

191365
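The `.*` padding works with `re.findall` on the whole file because `.` does not match a newline, so each match stays within one line, that is, one word of `/usr/share/dict/words`. An alternative that makes the per-word counting explicit is to split the text into words and test each one with `re.search`, which needs no padding. A sketch with a hypothetical `count_matching_words` helper and a tiny made-up word list (not the real dictionary file):

```python
import re

def count_matching_words(pattern, words):
    # Count the words in which the pattern matches anywhere (case-insensitive)
    return sum(1 for w in words if re.search(pattern, w, re.IGNORECASE))

sample = ['education', 'rhythm', 'queue', 'banana', "O'Neill"]

print(count_matching_words(r'[aeiou].*[aeiou].*[aeiou]', sample))  # 4
```

With the real file, `count_matching_words(pattern, text.splitlines())` should agree with the `len(re.findall(...))` counts for the first three patterns, given one word per line.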

In [114]:
#How many words have at least 3 vowels in a row?
def at_least_three_vowels_in_row(t):
    regex_2 = r'.*[aeiou][aeiou][aeiou].*'
    return re.findall(regex_2, t, re.IGNORECASE)

In [120]:
len(at_least_three_vowels_in_row(text))

6182
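Repeating a character class can be shortened with a quantifier: `[aeiou]{3}` means three characters from the class in a row, the same as `[aeiou][aeiou][aeiou]`. A small sketch (`THREE_VOWELS` is an illustrative name):

```python
import re

# [aeiou]{3} is equivalent to [aeiou][aeiou][aeiou]
THREE_VOWELS = re.compile(r'[aeiou]{3}', re.IGNORECASE)

for w in ['queue', 'beautiful', 'education']:
    print(w, bool(THREE_VOWELS.search(w)))
# queue True, beautiful True, education False
```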

In [129]:
#How many words have at least 4 consonants in a row?
def at_least_four_consonants_in_row(t):
    regex_3 = r'.*[^aeiou\s][^aeiou\s][^aeiou\s][^aeiou\s].*'
    return re.findall(regex_3, t, re.IGNORECASE)

In [130]:
len(at_least_four_consonants_in_row(text))

19241
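The negated class `[^aeiou\s]` counts every non-vowel, non-whitespace character as a consonant, including `y`, digits, apostrophes and hyphens. If only letters should count, an explicit consonant class is one option. A sketch comparing the two (whether `y` belongs in the class is a judgment call; here it is included):

```python
import re

NEGATED = re.compile(r'[^aeiou\s]{4}', re.IGNORECASE)
# b-d, f-h, j-n, p-t, v-z: every letter except a, e, i, o, u
CONSONANTS = re.compile(r'[b-df-hj-np-tv-z]{4}', re.IGNORECASE)

for w in ['strengths', 'rhythm', 'banana', '1990s']:
    print(w, bool(NEGATED.search(w)), bool(CONSONANTS.search(w)))
```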

In [134]:
#How many words start and end with a vowel?
def start_and_end_vowel(t):
    regex_4 = r'\b[aeiou].*[aeiou]\b'
    return re.findall(regex_4, t, re.IGNORECASE)

In [136]:
len(start_and_end_vowel(text))

14657
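`\b` marks a word boundary, and an apostrophe creates one mid-line, so in a word list that includes possessives an entry such as `Ali's` contributes the match `Ali` even though it ends in a consonant. Anchoring to the start and end of each line with `re.MULTILINE` avoids that. A sketch on a small made-up sample:

```python
import re

sample = "apple\nbanana\nAli's\nidea\na"

word_boundary = re.findall(r'\b[aeiou].*[aeiou]\b', sample, re.IGNORECASE)
# With MULTILINE, ^ and $ match at the start and end of every line
line_anchored = re.findall(r'^[aeiou].*[aeiou]$', sample, re.IGNORECASE | re.MULTILINE)

print(word_boundary)   # ['apple', 'Ali', 'idea']
print(line_anchored)   # ['apple', 'idea']
```

Neither pattern counts one-letter words such as `a`; `^[aeiou](?:.*[aeiou])?$` would include them if they should count.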