# Exercises for `01_regex` lesson

In [1]:
# library imports
import pandas as pd
import numpy as np

import re

# 1. `is_vowel` function
Write a function named `is_vowel`. It should accept a `string` as input and use a `regular expression` to determine if the passed string is a vowel. While not explicitly mentioned in the lesson, you can treat the result of `re.search` as a boolean value that indicates whether or not the regular expression matches the given string.

>Generating and testing the code for the function...

In [7]:

# first, testing on a string that contains vowels, but is not a vowel itself
print(re.search(r'aeiou', 'Test: This is a test.'))

None


>`None`: Passed! Now testing on string that is a vowel...

In [10]:
print(re.search(r'aeiou', 'a'))

None


>`None`: Failed. I will try the `|` sign in the regex between each vowel character...

In [11]:
print(re.search(r'a|e|i|o|u', 'a'))

<re.Match object; span=(0, 1), match='a'>


>`Match`: Passed! Now I will include another character that is not a vowel...

In [14]:
print(re.search(r'a|e|i|o|u', 'ab'))

<re.Match object; span=(0, 1), match='a'>


>`Match`: Failed. I will try to add the metacharacter `{n}` with *n* == `1`

In [20]:
print(re.search(r'aeiou', 'ab'))

None


>`None`: Passed! Now I will try it on a non-vowel...

In [37]:
print(re.match(r'[aeiou]', 'ab'))

<re.Match object; span=(0, 1), match='a'>


>`Match`: Failed. I will try using the `{n}` with *n* == `1` so it only matches one instance of the vowel and `$` so that it ends.

In [41]:
print(re.search(r'[aeiou]{1}$', 'ab'))

None


>`None`: Passed! I will try this with a single vowel...

In [40]:
print(re.search(r'[aeiou]{1}$', 'a'))

<re.Match object; span=(0, 1), match='a'>
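
>The trials above can be summarized side by side. This is a minimal sketch, not one of the original cells; the `candidates` list and the labels in `patterns` are illustrative assumptions. It suggests why `aeiou` without brackets never matched (it is a literal five-character sequence), and how anchors decide whether the whole string has to be the vowel.

```python
import re

# hypothetical test strings: a lone vowel, vowel-first, vowel-last, a sentence
candidates = ['a', 'ab', 'ba', 'Test: This is a test.']

patterns = {
    'literal sequence': r'aeiou',        # only matches the text "aeiou"
    'alternation': r'a|e|i|o|u',         # any one vowel, anywhere
    'character class': r'[aeiou]',       # same meaning as the alternation
    'class + end anchor': r'[aeiou]$',   # last character must be a vowel
    'class + both anchors': r'^[aeiou]$' # whole string is one vowel
}

# True/False per candidate, using the match object as a boolean
results = {
    label: [bool(re.search(pattern, s)) for s in candidates]
    for label, pattern in patterns.items()
}

for label, row in results.items():
    print(f'{label:22} {row}')
```

>Of these, only the pattern anchored at both ends accepts `'a'` while rejecting both `'ab'` and `'ba'`.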

>Defining `is_vowel` function...

In [162]:
def is_vowel(string):
    '''
    this function takes in a string and defines a regex statement that, when combined with the 
    re search function, will test the string to see whether it is a single vowel character.
        - if the string is not a single vowel, it will print: Not a vowel.
        - if the string is a single vowel, it will print: Vowel!
    '''
    # ^ and $ anchor both ends, so a string such as 'ba' is rejected as well as 'ab'
    regex = r'^[aeiou]$'
    
    # the lines of code below will print the corresponding statements... 
    if re.search(regex, string, re.IGNORECASE) is None:
        print('Not a vowel.')
    
    else:
        print('Vowel!')




>Testing...

In [163]:
# single vowel test
is_vowel('a')

Vowel!


In [164]:
# not single vowel test
is_vowel('ab')

Not a vowel.


In [165]:
is_vowel('A')

Vowel!


>It works!
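
>The exercise hints that the result of `re.search` can be treated as a boolean. As a hedged alternative to printing, a variant that returns `True` or `False` could look like the sketch below. The name `is_vowel_bool` is hypothetical, and `re.fullmatch` is swapped in for `re.search` so the whole string has to be the vowel; a pattern anchored only with `$` would also accept `'ba'`.

```python
import re

def is_vowel_bool(string):
    '''
    Return True if the string is exactly one vowel (case-insensitive),
    otherwise False. Hypothetical boolean variant of is_vowel.
    '''
    # fullmatch requires the pattern to cover the entire string
    return re.fullmatch(r'[aeiou]', string, re.IGNORECASE) is not None

for test_string in ['a', 'A', 'ab', 'ba', '']:
    print(repr(test_string), is_vowel_bool(test_string))
```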

### Instructor solution

# 2. `is_valid_username` function
Write a function named `is_valid_username` that accepts a `string` as input. A valid username starts with a lowercase letter, and only consists of lowercase letters, numbers, or the `_` character. It should also be no longer than 32 characters. 


Conditions:
1. starts with lowercase letters
2. only consists of:
    - lowercase letters,
    - numbers, or
    - `_` char
3. no longer than 32 chars
<br>
<br>

The function should return either `True` or `False` depending on whether the passed string is a valid username.


>Generating and testing code for the function...<br>
>1. Starts with a lowercase letter

In [67]:
# starts with lowercase letter 
re.search(r'^[a-z]', 'user_name')

# Pass >> match = 'u'

<re.Match object; span=(0, 1), match='u'>

In [68]:
# starts with uppercase letter
re.search(r'^[a-z]', 'User_name')

# None >> Pass

>2. only consists of:
>    - lowercase letters,
>    - numbers, or
>    - `_` char

In [111]:
# all lowercase and _ test
re.search(r'^[a-z0-9_]+$', 'user_name')

<re.Match object; span=(0, 9), match='user_name'>

In [112]:
# not all lowercase test
re.search(r'^[a-z0-9_]+$', 'user_Name')

In [113]:
# numbers test
re.search(r'^[a-z0-9_]+$', 'user_name1')

<re.Match object; span=(0, 10), match='user_name1'>

In [114]:
# no _ test
re.search(r'^[a-z0-9_]+$', 'username')

<re.Match object; span=(0, 8), match='username'>

In [116]:
# empty space test
re.search(r'^[a-z0-9_]+$', 'user name')

>3. no longer than 32 chars

In [141]:
def check_len(string, max_len):
    '''
    this function takes in a string and a max_len value and returns True if the length of the
    string is less than or equal to max_len.
    '''
    
    return len(string) <= max_len

In [142]:
check_len('string'*8, 100)

True

In [143]:
check_len('string'*8, 32)

False

>Now that I have a way to test the max length, I can put everything together in a function to test...

In [154]:
def is_valid_username(string, max_len):
    '''
    this function takes in a string and a max_len value and prints whether the string is a
    valid username: it must start with a lowercase letter, consist only of lowercase letters,
    numbers, or the _ character, and be no longer than max_len characters.
    '''
    
    # defining regular expression: the first character must be a lowercase letter
    regex = r'^[a-z][a-z0-9_]*$'
    
    if re.search(regex, string) is None:
        print('Invalid username. Username contains invalid character.')
    
    else:
        if len(string) > max_len:
            print(f'Invalid username. Username cannot be longer than {max_len} characters.')
        else:
            print('Valid username. Username meets criteria!')
    

In [155]:
# testing username that meets all criteria
is_valid_username('user_name', 32)

Valid username. Username meets criteria!


In [156]:
# testing username that contains empty space
is_valid_username('user name', 32)

Invalid username. Username contains invalid character.


In [157]:
# testing username that begins with uppercase character
is_valid_username('User_name', 32)

Invalid username. Username contains invalid character.


In [158]:
# testing username that contains uppercase character
is_valid_username('user_Name', 32)

Invalid username. Username contains invalid character.


In [159]:
# testing username that contains invalid special character
is_valid_username('user_name!', 32)

Invalid username. Username contains invalid character.


In [160]:
# testing username that is longer than max_len
is_valid_username('user_name123456789101112', 10)

Invalid username. Username cannot be longer than 10 characters.


>Function tests successful!
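
>The exercise asks for a function that returns `True` or `False`. One possible way to fold all three conditions into a single pattern is sketched below; the name `is_valid_username_bool` is hypothetical, and the 32-character limit is encoded as one leading letter plus up to 31 further characters.

```python
import re

# one lowercase letter, then 0 to 31 lowercase letters, digits, or underscores
USERNAME_RE = re.compile(r'[a-z][a-z0-9_]{0,31}')

def is_valid_username_bool(string):
    '''
    Return True if the string starts with a lowercase letter, contains only
    lowercase letters, digits, or _, and is at most 32 characters long.
    '''
    # fullmatch makes the pattern cover the entire string
    return USERNAME_RE.fullmatch(string) is not None

for name in ['user_name', 'User_name', '_user', '1user', 'a' * 32, 'a' * 33]:
    print(name, is_valid_username_bool(name))
```

>Names that begin with `_` or a digit are rejected here because the first character has its own character class.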

# 3. Phone number regex
Write a regular expression to capture phone numbers. It should match all of the following:
>`'(210) 867 5309'`<br>
`'+1 210.867.5309'`<br>
`'867-5309'`<br>
`'210-867-530'`<br>
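
>A possible starting point, offered as a sketch rather than a finished answer: the pattern below assumes an optional country code, an optional three-digit area code, and a seven-digit local number split 3-4, with a space, dot, or hyphen as separator. Under that assumption it matches the first three samples; the fourth sample as listed ends in only three digits, so this sketch does not match it (relaxing the final `\d{4}` to `\d{3,4}` would be one way to admit it).

```python
import re

phone_regex = re.compile(r'''
    ^
    (?:\+?\d+\s)?             # optional country code followed by a space
    (?:\(?\d{3}\)?[\s.-])?    # optional area code, with or without parentheses
    \d{3}[\s.-]\d{4}          # seven-digit local number, split 3-4
    $
''', re.VERBOSE)

samples = ['(210) 867 5309', '+1 210.867.5309', '867-5309', '210-867-530']

# map each sample to whether the assumed pattern matches it
matched = {s: bool(phone_regex.search(s)) for s in samples}

for sample, is_match in matched.items():
    print(f'{sample!r}: {is_match}')
```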

# 4. dates regex
Use regular expressions to convert the dates below to the standardized year-month-day format.

>`'02/04/19'`<br>
`'02/05/19'`<br>
`'02/06/19'`<br>
`'02/07/19'`<br>
`'02/08/19'`<br>
`'02/09/19'`<br>
`'02/10/19'`<br>
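
>One way this could be approached is with capture groups and `re.sub`. The sketch below assumes the input is month/day/two-digit-year and that every year belongs to the 2000s; neither is stated in the exercise.

```python
import re

dates = ['02/04/19', '02/05/19', '02/06/19', '02/07/19',
         '02/08/19', '02/09/19', '02/10/19']

# group 1 = month, group 2 = day, group 3 = two-digit year (assumed order)
date_regex = re.compile(r'(\d{2})/(\d{2})/(\d{2})')

# reorder the groups into year-month-day, prefixing the assumed century
standardized = [date_regex.sub(r'20\3-\1-\2', d) for d in dates]

print(standardized)
```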

# 5. logfile regex
Write a regex to extract the various parts of these logfile lines:

`'GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58'`<br><hr>
`'POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58'`<br><hr>
`'GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58'`<br><hr>
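
>A sketch of one possible pattern using named groups. The group names (`method`, `path`, `timestamp`, `http_version`, `status`, `size`, `user_agent`, `ip`) are assumptions about what each field means, in particular that the number after the status code is a response size.

```python
import re

log_regex = re.compile(r'''
    ^(?P<method>[A-Z]+)\s+             # HTTP method
    (?P<path>\S+)\s+                   # request path with query string
    \[(?P<timestamp>[^\]]+)\]\s+       # timestamp inside square brackets
    (?P<http_version>\S+)\s+           # protocol version
    \{(?P<status>\d+)\}\s+             # status code inside curly braces
    (?P<size>\d+)\s+                   # number after the status (assumed size)
    "(?P<user_agent>[^"]+)"\s+         # quoted user agent
    (?P<ip>\d{1,3}(?:\.\d{1,3}){3})$   # IPv4 address
''', re.VERBOSE)

lines = [
    'GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58',
    'POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58',
    'GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58',
]

# one dict of named fields per logfile line
parsed = [log_regex.search(line).groupdict() for line in lines]

for fields in parsed:
    print(fields)
```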


# Bonus | `/usr/share/dict/words` mac words
You can find a list of words on your mac at `/usr/share/dict/words`. Use this file to answer the following questions:
1. How many words have at least 3 vowels?
2. How many words have at least 3 vowels in a row?
3. How many words have at least 4 consonants in a row?
4. How many words start and end with the same letter?
5. How many words start and end with a vowel?
6. How many words contain the same letter 3 times in a row?
7. What other interesting patterns in words can you find?
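
>Questions 1 through 6 could each be expressed as one regex. The sketch below runs on a short hypothetical word list instead of the real file, treats `y` as a consonant, and requires at least two letters for "start and end with the same letter"; all three are assumptions.

```python
import re

# hypothetical stand-in for the contents of /usr/share/dict/words
words = ['aalii', 'queue', 'strength', 'level', 'area', 'rhythms', 'zzz', 'cat']

patterns = {
    'at least 3 vowels': r'[aeiou].*[aeiou].*[aeiou]',
    '3 vowels in a row': r'[aeiou]{3}',
    '4 consonants in a row': r'[b-df-hj-np-tv-z]{4}',
    'start and end with same letter': r'^(.).*\1$',
    'start and end with a vowel': r'^[aeiou].*[aeiou]$',
    'same letter 3 times in a row': r'(.)\1\1',
}

# count the words matched by each pattern, ignoring case
counts = {
    label: sum(1 for w in words if re.search(pattern, w, re.IGNORECASE))
    for label, pattern in patterns.items()
}

for label, n in counts.items():
    print(f'{label}: {n}')
```

>With the real file, `words` could be built by reading it line by line, which also avoids any type conversion of the entries.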

In [169]:
bonus = pd.read_csv('/usr/share/dict/words', header = None)
bonus

| | 0 |
|---|---|
| 0 | A |
| 1 | a |
| 2 | aa |
| 3 | aal |
| 4 | aalii |
| ... | ... |
| 235881 | zythem |
| 235882 | Zythia |
| 235883 | zythum |
| 235884 | Zyzomys |


In [172]:
def starts_w_z(string):
    
    return string[0] == 'z'

In [175]:
bonus['z_start'] = bonus[0].apply(starts_w_z)
bonus

TypeError: 'float' object is not subscriptable
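
>One plausible cause of the `TypeError` above: by default `pd.read_csv` parses strings such as `nan` and `null` as missing values (`NaN`, which is a float), and a word list is likely to contain entries like these. The sketch below uses an in-memory stand-in for the file; the four sample words are assumptions, not the real file contents.

```python
import io
import pandas as pd

# hypothetical stand-in for a few lines of the word list
sample = "a\nnan\nnull\nzebra\n"

# default parsing: 'nan' and 'null' are read as NaN, so string[0] would fail on them
default_read = pd.read_csv(io.StringIO(sample), header=None)
n_missing = int(default_read[0].isna().sum())

# keep_default_na=False keeps every entry as a plain string
safe_read = pd.read_csv(io.StringIO(sample), header=None, keep_default_na=False)

# the vectorized .str accessor replaces the row-by-row apply
safe_read['z_start'] = safe_read[0].str.startswith('z')

print(n_missing)
print(safe_read)
```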

In [171]:
st[0]

's'

In [176]:
bonus[0]

0                  A
1                  a
2                 aa
3                aal
4              aalii
             ...    
235881        zythem
235882        Zythia
235883        zythum
235884       Zyzomys
235885    Zyzzogeton
Name: 0, Length: 235886, dtype: object