# Part 0 of 2: Simple string processing review

**Exercise ordering:** Each exercise builds logically on previous exercises, but you may solve them in any order. That is, if you can't solve an exercise, you can still move on and try the next one. Use this to your advantage, as the exercises are **not** necessarily ordered in terms of difficulty. Higher point values generally indicate more difficult exercises. 

**Debugging your code:** Right before each exercise test cell, there is a block of text explaining the variables available to you for debugging. You may use these to test your code and can print/display them as needed. Be careful when printing large objects; you may want to print only the head or a few rows at a time.

**Exercise point breakdown:**

- Exercise 0: **1** point
- Exercise 1: **2** points
- Exercise 2: **2** points
- Exercise 3: **3** points

**Final reminders:** 

- Submit after **every exercise**
- Review the generated grade report after you submit to see what errors were returned
- Stay calm, skip problems as needed, and take short breaks at your leisure

In [1]:
### Global imports
import dill
from cse6040_devkit import plugins, utils
from cse6040_devkit.training_wheels import run_with_timeout, suppress_stdout
import tracemalloc
from time import time
import string 
import re 
from pprint import pprint



In [2]:
text = "sgtEEEr2020.0"

In [3]:
# Strings have methods for checking "global" string properties
print("1.", text.isalpha())

# These can also be applied per character
print("2.", [c.isalpha() for c in text])

1. False
2. [True, True, True, True, True, True, True, False, False, False, False, False, False]


In [4]:
# Here are a bunch of additional useful methods
print("BELOW: (global) -> (per character)")
print(text.isdigit(), "-->", [c.isdigit() for c in text])
print(text.isspace(), "-->", [c.isspace() for c in text])
print(text.islower(), "-->", [c.islower() for c in text])
print(text.isupper(), "-->", [c.isupper() for c in text])
print(text.isnumeric(), "-->", [c.isnumeric() for c in text])

BELOW: (global) -> (per character)
False --> [False, False, False, False, False, False, False, True, True, True, True, False, True]
False --> [False, False, False, False, False, False, False, False, False, False, False, False, False]
False --> [True, True, True, False, False, False, True, False, False, False, False, False, False]
False --> [False, False, False, True, True, True, False, False, False, False, False, False, False]
False --> [False, False, False, False, False, False, False, True, True, True, True, False, True]
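
As a small sketch of how these per-character checks compose, the helper below tests whether a string follows a fixed layout in which `X` stands for a digit and every other character must appear literally. The name `has_digit_layout` and the `X` convention are assumptions made for illustration; they are not part of the notebook or its tests.

```python
def has_digit_layout(s: str, layout: str) -> bool:
    """Return True if `s` follows `layout`, where 'X' means "any digit"."""
    if len(s) != len(layout):
        return False
    # Compare position by position: digits where the layout has 'X',
    # an exact character match everywhere else.
    return all(
        c.isdigit() if slot == 'X' else c == slot
        for c, slot in zip(s, layout)
    )

print(has_digit_layout("2020.0", "XXXX.X"))         # True
print(has_digit_layout("sgtEEEr2020.0", "XXXX.X"))  # False
```

Keep in mind that `str.isdigit()` also returns `True` for some non-ASCII digit characters (superscript digits, for instance), so this sketch is slightly more permissive than a check restricted to `0`-`9`.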


### Exercise 0: (1 point)
**is_ssn**  

**Your task:** define `is_ssn` as follows:

Create a new function that checks whether a given input string is a properly formatted social security number, i.e., has the pattern, `XXX-XX-XXXX`, _including_ the separator dashes, where each `X` is a digit. It should return `True` if so or `False` otherwise.


In [5]:
### Solution - Exercise 0  
def is_ssn(s: str) -> bool:
    # Pattern: three digits, a dash, two digits, a dash, four digits
    pattern_object = re.compile(r"^\d{3}-\d{2}-\d{4}$")
    # search() returns a match object on success and None otherwise
    matches = pattern_object.search(s)
    return matches is not None

### Demo function call
demo_ex0_s = ['832-38-1847', 
              '832 -38 -  1847', 
              '832-bc-3847', 
              '832381847']
results = []
for i, scenario in enumerate(demo_ex0_s):
    result = is_ssn(scenario)
    print(f"is_ssn({demo_ex0_s[i]})")
    print(f"--> {result}")
    print('\n')
    results.append(result)



is_ssn(832-38-1847)
--> True


is_ssn(832 -38 -  1847)
--> False


is_ssn(832-bc-3847)
--> False


is_ssn(832381847)
--> False




 

**The demo should display this printed output.**
```
is_ssn(832-38-1847)
--> True


is_ssn(832 -38 -  1847)
--> False


is_ssn(832-bc-3847)
--> False


is_ssn(832381847)
--> False
```
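
One subtlety of anchoring a pattern with `^...$` is worth knowing about: in Python's `re` module, `$` matches at the very end of the string *or* just before a trailing newline. The sketch below contrasts that with `fullmatch()`, which requires the whole string to match. The variable names are illustrative, and whether the test cases for this exercise include such inputs is not stated here.

```python
import re

ssn_dollar = re.compile(r"^\d{3}-\d{2}-\d{4}$")
ssn_plain = re.compile(r"\d{3}-\d{2}-\d{4}")

candidate = "832-38-1847\n"  # note the trailing newline

# `$` also matches just before a final newline, so this search succeeds.
print(ssn_dollar.search(candidate) is not None)        # True

# fullmatch() succeeds only if the entire string matches the pattern.
print(ssn_plain.fullmatch(candidate) is not None)      # False
print(ssn_plain.fullmatch("832-38-1847") is not None)  # True
```

If strict end-of-string behavior matters for your use case, `fullmatch()` (or the `\Z` anchor) is one way to get it.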


 ---
 <!-- Test Cell Boilerplate -->  
The cell below will test your solution for is_ssn (exercise 0). The testing variables will be available for debugging under the following names in a dictionary format.  
- `input_vars` - Input variables for your solution.   
- `original_input_vars` - Copy of input variables from prior to running your solution. Any `key:value` pair in `original_input_vars` should also exist in `input_vars` - otherwise the inputs were modified by your solution.  
- `returned_output_vars` - Outputs returned by your solution.  
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output. 


In [6]:
### Test Cell - Exercise 0  


from cse6040_devkit.tester_fw.testers import Tester
from yaml import safe_load
from time import time

tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")

# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
    executor = dill.load(f)

@run_with_timeout(error_threshold=200.0, warning_threshold=100.0)
@suppress_stdout
def execute_tests(**kwargs):
    return executor(**kwargs)


# Execute test
start_time = time()
passed, test_case_vars, e = execute_tests(func=is_ssn,
              ex_name='is_ssn',
              key=b'flr2UdDaaq0o1_4co0EqLREbUvnD9Ws2lhtvMkxG5HA=', 
              n_iter=20)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to is_ssn did not pass the test.'

###
### AUTOGRADER TEST - DO NOT REMOVE
###

print('Passed! Please submit.')

initial memory usage: 0.00 MB
Test duration: 0.11 seconds
memory after test: 3.07 MB
memory peak during test: 4.07 MB
Passed! Please submit.


# Regular expressions

Exercise 0 hints at the general problem of finding patterns in text. A handy tool for this problem is Python's Regular Expression module, `re`.

A _regular expression_ is a specially formatted pattern, written as a string. Matching patterns with regular expressions has 3 steps:

1. You come up with a pattern to find.
2. You compile it into a _pattern object_.
3. You apply the pattern object to a string to find _matches_, i.e., instances of the pattern within the string.

As you read through the examples below, refer also to the [regular expression HOWTO document](https://docs.python.org/3/howto/regex.html) for many more examples and details.

In [7]:
import re

## Basics

Let's see how this scheme works for the simplest case, in which the pattern is an *exact substring*. In the following example, suppose you want to look for the substring `'fox'` within a larger input string.

In [8]:
pattern = 'fox'
pattern_matcher = re.compile(pattern)

input_string = 'The quick brown fox jumps over the lazy dog'
matches = pattern_matcher.search(input_string)
print(matches)

<re.Match object; span=(16, 19), match='fox'>


Observe that the returned object, `matches`, is a _match object_. Inspecting the printed output, notice that the matching text, `'fox'`, was found at positions 16-18 of `input_string`; the reported span, `(16, 19)`, excludes its end index, just as Python slices do. Had there been no match, `.search()` would have returned `None`, as in this example:

In [9]:
print(pattern_matcher.search("This input has a FOX, but it's all uppercase and so won't match."))

None


In [10]:
print(matches.group())
print(matches.start())
print(matches.end())
print(matches.span())

fox
16
19
(16, 19)
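
Because a span follows Python's slicing convention, it can be used directly to index the original string. The following is a minimal sketch that repeats the `'fox'` search so that it runs on its own; the replacement text `'FOX'` is only for illustration.

```python
import re

input_string = 'The quick brown fox jumps over the lazy dog'
m = re.search('fox', input_string)

start, end = m.span()
print(input_string[start:end])               # fox
print(input_string[start:end] == m.group())  # True

# Rebuild the string with the matched text replaced.
print(input_string[:start] + 'FOX' + input_string[end:])
```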


**Module-level searching.** For infrequently used patterns, you can also skip creating the pattern object and just call the module-level search function, `re.search()`.

In [11]:
matches_2 = re.search('jump', input_string)
assert matches_2 is not None
print ("Found", matches_2.group(), "@", matches_2.span())

Found jump @ (20, 24)


**Other Search Methods.** Besides `search()`, there are several other pattern-matching procedures:

1. `match()`    - Determine if the regular expression (RE) matches at the beginning of the string.
2. `search()`   - Scan through a string, looking for any location where this RE matches.
3. `findall()`  - Find all substrings where the RE matches, and return them as a list.
4. `finditer()` - Find all substrings where the RE matches, and return them as an iterator of match objects.

We'll use several of these below; again, refer to the [HOWTO](https://docs.python.org/3/howto/regex.html) for more details.
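
To make the differences concrete, here is a small sketch that applies all four procedures to the same exact-substring patterns. The sample sentence is made up for illustration.

```python
import re

s = 'cat and dog and cat'

# match() only tries at the beginning of the string.
print(re.match('dog', s))           # None
print(re.match('cat', s).span())    # (0, 3)

# search() scans for the first location where the pattern matches.
print(re.search('dog', s).span())   # (8, 11)

# findall() returns the matched text of every non-overlapping match.
print(re.findall('cat', s))         # ['cat', 'cat']

# finditer() yields match objects, which also carry positions.
print([m.span() for m in re.finditer('cat', s)])  # [(0, 3), (16, 19)]
```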

## A pattern language

An exact substring is one kind of pattern, but the power of regular expressions is that they provide an entire "_mini-language_" for specifying more general patterns.

To start, read the section of the HOWTO on ["Simple Patterns"](https://docs.python.org/3/howto/regex.html#simple-patterns). We highlight a few constructs below.

In [12]:
# Metacharacter classes
vowels = '[aeiou]'

print(f"Scanning `{input_string}` for vowels, `{vowels}`:")
for match_vowel in re.finditer(vowels, input_string):
    print(match_vowel)

Scanning `The quick brown fox jumps over the lazy dog` for vowels, `[aeiou]`:
<re.Match object; span=(2, 3), match='e'>
<re.Match object; span=(5, 6), match='u'>
<re.Match object; span=(6, 7), match='i'>
<re.Match object; span=(12, 13), match='o'>
<re.Match object; span=(17, 18), match='o'>
<re.Match object; span=(21, 22), match='u'>
<re.Match object; span=(26, 27), match='o'>
<re.Match object; span=(28, 29), match='e'>
<re.Match object; span=(33, 34), match='e'>
<re.Match object; span=(36, 37), match='a'>
<re.Match object; span=(41, 42), match='o'>


In [13]:
# Counts: For instance, two or more consecutive vowels:
two_or_more_vowels = vowels + '{2,}'
print(f"Pattern: {two_or_more_vowels}")
print(re.findall(two_or_more_vowels, input_string))

Pattern: [aeiou]{2,}
['ui']


In [14]:
# Repetition: `+` matches one or more copies of the preceding character
cats = "ca+t"
print(re.search(cats, "is this a ct?"))
print(re.search(cats, "how about this cat?"))
print(re.search(cats, "and this one: caaaaat, yes or no?"))

None
<re.Match object; span=(15, 18), match='cat'>
<re.Match object; span=(14, 21), match='caaaaat'>
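
For comparison, the sketch below lines up `+` with two other repetition operators, `*` and `?`, and with the single-character wildcard `.`. The sample words and the dictionary name `hits_by_pattern` are chosen only for illustration; `re.fullmatch()` is used so that each word must match in its entirety.

```python
import re

samples = ["ct", "cat", "caaat", "cot"]
patterns = {
    "ca+t": "one or more 'a'",
    "ca*t": "zero or more 'a'",
    "ca?t": "zero or one 'a'",
    "c.t": "any single character between 'c' and 't'",
}

hits_by_pattern = {}
for pattern, meaning in patterns.items():
    hits_by_pattern[pattern] = [s for s in samples if re.fullmatch(pattern, s)]
    print(f"{pattern:5} ({meaning}): {hits_by_pattern[pattern]}")
```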


In [15]:
# Special operator: "or"
adjectives = "lazy|brown"
print(f"Scanning `{input_string}` for adjectives, `{adjectives}`:")
for match_adjective in re.finditer(adjectives, input_string):
    print(match_adjective)

Scanning `The quick brown fox jumps over the lazy dog` for adjectives, `lazy|brown`:
<re.Match object; span=(10, 15), match='brown'>
<re.Match object; span=(35, 39), match='lazy'>


In [16]:
# Predefined character classes
three_digits = '\d\d\d'
print(re.findall(three_digits, "My number is 555-123-4567"))

['555', '123', '456']


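> In the previous example, notice that the pattern search proceeds from left-to-right and does not return overlaps: here, the matcher returns `456` but not `567`. The reason is that `findall()` resumes scanning at the end of each match, so the digits `456` are consumed before `567` can be considered. This is distinct from the [_greedy behavior_](https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy) of repetition operators such as `+` and `*`, which match as much text as possible.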
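
The non-overlapping scan can be seen side by side with an overlapping variant in the sketch below. It uses a zero-width lookahead, `(?=...)`, which is a construct not otherwise covered in this notebook, and it writes patterns as raw strings (the `r` prefix), which are explained in the next part. The last two lines show greedy versus non-greedy repetition for comparison.

```python
import re

text = "My number is 555-123-4567"

# findall() resumes scanning after the end of each match: no overlaps.
print(re.findall(r'\d\d\d', text))        # ['555', '123', '456']

# A lookahead consumes no characters, so a capture group can be
# reported at every position where three digits follow.
print(re.findall(r'(?=(\d\d\d))', text))  # ['555', '123', '456', '567']

# Greedy `+` takes as many digits as it can; non-greedy `+?` takes as few.
print(re.findall(r'\d+', text))           # ['555', '123', '4567']
print(re.findall(r'\d+?', text))          # ten single-digit strings
```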

**The backslash plague.** In the "three-digits" example, we used the predefined metacharacter class, `'\d'`, to match digits. But what if you want to match a _literal_ backslash? The HOWTO describes how things can get out of control in its subsection on ["The Backslash Plague"](https://docs.python.org/3/howto/regex.html#the-backslash-plague), which occurs because the Python interpreter processes backslashes in string literals (e.g., so that `\t` expands to a tab character and `\n` to a newline) while the regular expression processor also gives backslashes meaning (e.g., so that `\d` is the digit metacharacter class).

For example, suppose you want to look for the text string, `\section`, in some input string. Which of the following will match it? Recall that `\s` is a predefined metacharacter class that matches any whitespace character.

In [17]:
input_with_slash_section = "This string contains `\section`, which we would like to match."

print(f"Searching: {input_with_slash_section}")

print(re.search("\section", input_with_slash_section))
print(re.search("\\section", input_with_slash_section))
print(re.search("\\\\section", input_with_slash_section))

Searching: This string contains `\section`, which we would like to match.
None
None
<re.Match object; span=(22, 30), match='\\section'>


To help mitigate this case, Python provides a special type of string called a _raw string_, which is a string literal prefixed by the letter `r`. For such strings, the Python interpreter will not process the backslash.

> Although the interpreter won't process the backslash, the regular expression processor will do so. As such, the pattern string still needs _two_ backslashes, as shown below.

In [18]:
print(re.search(r"\section", input_with_slash_section))
print(re.search(r"\\section", input_with_slash_section))
print(re.search(r"\\\\section", input_with_slash_section))

None
<re.Match object; span=(22, 30), match='\\section'>
None


Indeed, it is common style to always use raw strings for regular expression patterns, as we'll do in the examples that follow.
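
As a minimal sketch of what the `r` prefix changes, the snippet below compares raw and ordinary literals and then uses `re.escape()`, a standard-library helper that builds a pattern matching a piece of text literally. Using `re.escape()` here is our suggestion for illustration rather than something this notebook requires.

```python
import re

# A raw literal keeps the backslash; an ordinary literal needs it doubled.
print(r"\d" == "\\d")      # True
print(len(r"\\section"))   # 9: two backslashes plus the 7 letters

text = r"This string contains `\section`, which we would like to match."

# re.escape() inserts the backslashes needed to match the text literally.
pattern = re.escape(r"\section")
print(pattern)                          # \\section
print(re.search(pattern, text).span())  # (22, 30)
```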

**Creating pattern groups.** Another handy construct is the [_pattern group_](https://docs.python.org/3/howto/regex.html#grouping), as we show in the next code cell.

Suppose we have a string that we know contains a name of the form, "(first) (middle) (last)", where the middle name is _optional_. We can use pattern groups to isolate each component of the name and tag the middle name as optional using the "zero-or-one" metacharacter, `'?'`.

The group itself is a subpattern enclosed within parentheses. When a match is found, we can extract the groups by calling `.groups()` on the match object, which returns a tuple of all matched groups.

> To make this pattern more readable, we have also used Python's multiline string literal combined with the [`re.VERBOSE` option](https://docs.python.org/2/library/re.html#re.VERBOSE), which then allows us to include whitespace and comments as part of the pattern string.

In [19]:
# Make the expression more readable with a re.VERBOSE pattern
re_names2 = re.compile(r'''^              # Beginning of string
                           ([a-zA-Z]+)    # First name
                           \s+            # At least one space
                           ([a-zA-Z]+\s)? # Optional middle name
                           ([a-zA-Z]+)    # Last name
                           $              # End of string
                        ''',
                        re.VERBOSE)
print(re_names2.match('Rich Vuduc').groups())
print(re_names2.match('Rich S Vuduc').groups())
print(re_names2.match('Rich Salamander Vuduc').groups())

('Rich', None, 'Vuduc')
('Rich', 'S ', 'Vuduc')
('Rich', 'Salamander ', 'Vuduc')
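
Notice that the captured middle name above carries its trailing space, because the `\s` sits inside the group. One possible variation, sketched below under the assumption that you want the bare name only, wraps the middle name and its separator in a _non-capturing group_, `(?:...)`, so that only the letters are captured. The name `re_names_nc` is hypothetical.

```python
import re

re_names_nc = re.compile(r'''^
                             ([a-zA-Z]+)          # First name
                             \s+                  # At least one space
                             (?:([a-zA-Z]+)\s+)?  # Optional middle name, space not captured
                             ([a-zA-Z]+)          # Last name
                             $''',
                         re.VERBOSE)

print(re_names_nc.match('Rich Vuduc').groups())             # ('Rich', None, 'Vuduc')
print(re_names_nc.match('Rich S Vuduc').groups())           # ('Rich', 'S', 'Vuduc')
print(re_names_nc.match('Rich Salamander Vuduc').groups())  # ('Rich', 'Salamander', 'Vuduc')

# groups() accepts a default to substitute for groups that did not participate.
print(re_names_nc.match('Rich Vuduc').groups(default=''))   # ('Rich', '', 'Vuduc')
```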


**Tagging pattern groups.** You can also name pattern groups, which helps make your extraction code a bit more readable.

In [20]:
# Named groups
re_names3 = re.compile(r'''^
                           (?P<first>[a-zA-Z]+)
                           \s
                           (?P<middle>[a-zA-Z]+\s)?
                           \s*
                           (?P<last>[a-zA-Z]+)
                           $
                        ''',
                        re.VERBOSE)
print(re_names3.match('Rich Vuduc').group('first'))
print(re_names3.match('Rich S Vuduc').group('middle'))
print(re_names3.match('Rich Salamander Vuduc').group('last'))

Rich
S 
Vuduc
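
Named groups can also be retrieved all at once with `.groupdict()`, which returns a dictionary keyed by group name. The sketch below uses a slightly simplified variant of the named pattern (called `re_names_nd` here, a hypothetical name) so that it runs on its own.

```python
import re

re_names_nd = re.compile(r'''^
                             (?P<first>[a-zA-Z]+)
                             \s+
                             (?P<middle>[a-zA-Z]+\s+)?
                             (?P<last>[a-zA-Z]+)
                             $''',
                         re.VERBOSE)

parts = re_names_nd.match('Rich S Vuduc').groupdict()
print(parts)  # {'first': 'Rich', 'middle': 'S ', 'last': 'Vuduc'}

# Tidy up: strip whitespace from the groups that matched, keep None otherwise.
cleaned = {name: (text.strip() if text is not None else None)
           for name, text in parts.items()}
print(cleaned)  # {'first': 'Rich', 'middle': 'S', 'last': 'Vuduc'}
```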


**A regular expression debugger.** Regular expressions can be tough to write and debug, but thankfully, there are several online tools to help! See, for instance, [regex101](https://regex101.com/), [pythex](https://pythex.org/), [regexr](https://regexr.com/), or [debuggex](https://www.debuggex.com/). These all allow you to supply some sample input text and test what your pattern does in real time.

## Email addresses

In the next exercise, you'll apply what you've read and learned about regular expressions to build a pattern matcher for email addresses. Again, if you haven't looked through the HOWTO yet, take a moment to do that!

Although there is a [formal specification of what constitutes a valid email address](https://tools.ietf.org/html/rfc5322#section-3.4.1), for this exercise, let's use the following simplified rules.

* We will restrict our attention to ASCII addresses and ignore Unicode. If you don't know what that means, don't worry about it---you shouldn't need to do anything special given our code templates, below.
* An email address has two parts, the username and the domain name. These are separated by an `@` character.
* A username **must begin with an alphabetic** character. It may be followed by any number of additional _alphanumeric_ characters or any of the following special characters: `.` (period), `-` (hyphen), `_` (underscore), or `+` (plus).
* A domain name **must end with an alphabetic** character. It may consist of any of the following characters: alphanumeric characters, `.` (period), `-` (hyphen), or `_` (underscore).
* Alphabetic characters may be uppercase or lowercase.
* No whitespace characters are allowed.

Valid domain names usually have additional restrictions, e.g., there are a limited number of endings, such as `.com`, `.edu`, and so on. However, for this exercise you may ignore this fact.

### Exercise 1: (2 points)
**parse_email**  

**Your task:** define `parse_email` as follows:

Write a function `parse_email` that, given an email address `s`, returns a tuple, `(user-id, domain)` corresponding to the user name and domain name.

For instance, given `richie@cc.gatech.edu` it should return `('richie', 'cc.gatech.edu')`.

Your function should parse the email only if it exactly matches the email specification. For example, if there are leading or trailing spaces, the function should *not* match those. See the test cases for examples.

If the input is not a valid email address, the function should raise a `ValueError`.

> The requirement, "raise a `ValueError`" refers to a technique for handling errors in a program known as _exception handling_. The Python documentation covers [exceptions](https://docs.python.org/3/tutorial/errors.html) in more detail, including [raising `ValueError` objects](https://docs.python.org/3/tutorial/errors.html#raising-exceptions).
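
If exceptions are new to you, the following minimal sketch shows both halves of the technique: a function that raises a `ValueError` on bad input, and a caller that catches it with `try`/`except`. The function `parse_percentage` is a made-up example unrelated to the exercise.

```python
def parse_percentage(s):
    """Hypothetical helper: convert a string like '42%' to the integer 42."""
    if not (s.endswith('%') and s[:-1].isdigit()):
        raise ValueError(f"Not a percentage: {s!r}")
    return int(s[:-1])

for candidate in ['42%', 'forty-two']:
    try:
        print(candidate, '-->', parse_percentage(candidate))
    except ValueError as err:
        # Control jumps here as soon as the function raises.
        print(candidate, '--> ValueError:', err)
```

The first candidate prints its parsed value; the second prints the error message, and the loop continues instead of stopping the program.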


In [21]:
### Solution - Exercise 1  
def parse_email(s):
    """Parse a string as an email address, returning an (id, domain) pair."""
    pattern = re.compile(r'''^
                            (?P<user_id>[a-zA-Z][a-zA-Z0-9+._-]*)  # Username: starts with a letter
                            @
                            (?P<domain>[a-zA-Z0-9._-]*[a-zA-Z])    # Domain: ends with a letter
                            $''',
                            re.VERBOSE)
    matches = pattern.match(s)
    if matches:
        return matches.groups()
    raise ValueError("Bad email address")

### Demo function call
demo_ex1_s = ['richie@cc.gatech.edu',
              'what-do-you-know+not-much@gmail.com',
              'x @hpcgarage.org',
              'richie@cc.gatech.edu7']
results = []
for i, scenario in enumerate(demo_ex1_s):
    try:
        result = parse_email(scenario)
        print(f"parse_email({demo_ex1_s[i]}, parse_email)")
        print(f"--> {result}")
        print('\n')
        results.append(result)
    except ValueError as e:
        print(f"parse_email({demo_ex1_s[i]}, parse_email)")
        print(f"--> ValueError: {e}")
        print('\n')
        results.append(f"ValueError: {e}")



parse_email(richie@cc.gatech.edu, parse_email)
--> ('richie', 'cc.gatech.edu')


parse_email(what-do-you-know+not-much@gmail.com, parse_email)
--> ('what-do-you-know+not-much', 'gmail.com')


parse_email(x @hpcgarage.org, parse_email)
--> ValueError: Bad email address


parse_email(richie@cc.gatech.edu7, parse_email)
--> ValueError: Bad email address




 

**The demo should display this printed output.**
```
parse_email(richie@cc.gatech.edu, parse_email)
--> ('richie', 'cc.gatech.edu')


parse_email(what-do-you-know+not-much@gmail.com, parse_email)
--> ('what-do-you-know+not-much', 'gmail.com')


parse_email(x @hpcgarage.org, parse_email)
--> ValueError: Bad email address


parse_email(richie@cc.gatech.edu7, parse_email)
--> ValueError: Bad email address
```


 ---
 <!-- Test Cell Boilerplate -->  
The cell below will test your solution for parse_email (exercise 1). The testing variables will be available for debugging under the following names in a dictionary format.  
- `input_vars` - Input variables for your solution.   
- `original_input_vars` - Copy of input variables from prior to running your solution. Any `key:value` pair in `original_input_vars` should also exist in `input_vars` - otherwise the inputs were modified by your solution.  
- `returned_output_vars` - Outputs returned by your solution.  
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output. 


In [22]:
### Test Cell - Exercise 1  


from cse6040_devkit.tester_fw.testers import Tester
from yaml import safe_load
from time import time

tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")

# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
    executor = dill.load(f)

@run_with_timeout(error_threshold=200.0, warning_threshold=100.0)
@suppress_stdout
def execute_tests(**kwargs):
    return executor(**kwargs)


# Execute test
start_time = time()
passed, test_case_vars, e = execute_tests(func=plugins.error_handler(parse_email),
              ex_name='parse_email',
              key=b'flr2UdDaaq0o1_4co0EqLREbUvnD9Ws2lhtvMkxG5HA=', 
              n_iter=51)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to parse_email did not pass the test.'

###
### AUTOGRADER TEST - DO NOT REMOVE
###

print('Passed! Please submit.')

initial memory usage: 0.00 MB
Test duration: 0.04 seconds
memory after test: 0.02 MB
memory peak during test: 1.19 MB
Passed! Please submit.


## Phone numbers

### Exercise 2: (2 points)
**parse_phone1**  

**Your task:** define `parse_phone1` as follows:

Write a function to parse US phone numbers written in the canonical "(404) 555-1212" format, i.e., a three-digit area code enclosed in parentheses followed by a seven-digit local number in three-hyphen-four digit format. It should also **ignore** all leading and trailing spaces, as well as any spaces that appear between the area code and local numbers. However, it should **not** accept any spaces within the area code (e.g., in '(404)') or within the seven-digit local number.

For example, these would be considered valid phone number strings:
```python
'(404) 121-2121'
'(404)121-2121     '
'   (404)      121-2121'
```

By contrast, these should be rejected:
```python
'404-121-2121'
'(404)555 -1212'
' ( 404)121-2121'
'(abc) 555-12i2'
```

It should return a triple of strings, `(area_code, first_three, last_four)`. 

If the input is not a valid phone number, it should raise a `ValueError`.


In [23]:
### Solution - Exercise 2  
def parse_phone1(s):
    # Remove leading and trailing whitespace before matching
    s_string = s.strip()

    pattern = re.compile(r'''^
        \(
        (?P<area_code>\d{3})    # Three-digit area code, in parentheses
        \)
        \s*                     # Optional whitespace after the area code
        (?P<first_three>\d{3})
        -
        (?P<last_four>\d{4})
        $''',
        re.VERBOSE
        )
    matches = pattern.search(s_string)
    if matches:
        return matches.groups()
    raise ValueError(f"Invalid phone number? '{s}'")


### Demo function call
demo_ex2_s = ['(404) 121-2121',
              '404-121-2121']
results = []
for i, scenario in enumerate(demo_ex2_s):
    try:
        result = parse_phone1(scenario)
        print(f"parse_phone1({demo_ex2_s[i]}, parse_phone1)")
        print(f"--> {result}")
        print('\n')
        results.append(result)
    except ValueError as e:
        print(f"parse_phone1({demo_ex2_s[i]}, parse_phone1)")
        print(f"--> ValueError: {e}")
        print('\n')
        results.append(f"ValueError: {e}")



parse_phone1((404) 121-2121, parse_phone1)
--> ('404', '121', '2121')


parse_phone1(404-121-2121, parse_phone1)
--> ValueError: Invalid phone number? '404-121-2121'




 

**The demo should display this printed output.**
```
parse_phone1((404) 121-2121, parse_phone1)
--> ('404', '121', '2121')


parse_phone1(404-121-2121, parse_phone1)
--> ValueError: Invalid phone number? '404-121-2121'
```
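
As an alternative design, the whitespace handling can live entirely inside the pattern instead of relying on `str.strip()`. The sketch below is one such variant, not the reference solution; the names `PHONE1` and `parse_phone1_sketch` are hypothetical. It accepts only space characters (`[ ]`) as padding, whereas `strip()` and `\s` also accept tabs and newlines; the exercise text says "spaces", and which reading the test cases expect is not stated here.

```python
import re

PHONE1 = re.compile(r'''
    [ ]*                # Leading spaces
    \( (\d{3}) \)       # Area code in parentheses
    [ ]*                # Optional spaces after the area code
    (\d{3}) - (\d{4})   # Local number: three digits, hyphen, four digits
    [ ]*                # Trailing spaces
''', re.VERBOSE)

def parse_phone1_sketch(s):
    m = PHONE1.fullmatch(s)  # the whole string must match
    if m is None:
        raise ValueError(f"Invalid phone number? '{s}'")
    return m.groups()

for candidate in ['   (404)      121-2121', '(404)555 -1212']:
    try:
        print(repr(candidate), '-->', parse_phone1_sketch(candidate))
    except ValueError as err:
        print(repr(candidate), '--> ValueError:', err)
```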


 ---
 <!-- Test Cell Boilerplate -->  
The cell below will test your solution for parse_phone1 (exercise 2). The testing variables will be available for debugging under the following names in a dictionary format.  
- `input_vars` - Input variables for your solution.   
- `original_input_vars` - Copy of input variables from prior to running your solution. Any `key:value` pair in `original_input_vars` should also exist in `input_vars` - otherwise the inputs were modified by your solution.  
- `returned_output_vars` - Outputs returned by your solution.  
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output. 


In [72]:
### Test Cell - Exercise 2  


from cse6040_devkit.tester_fw.testers import Tester
from yaml import safe_load
from time import time

tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")

# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
    executor = dill.load(f)

@run_with_timeout(error_threshold=200.0, warning_threshold=100.0)
@suppress_stdout
def execute_tests(**kwargs):
    return executor(**kwargs)


# Execute test
start_time = time()
passed, test_case_vars, e = execute_tests(func=plugins.error_handler(parse_phone1),
              ex_name='parse_phone1',
              key=b'flr2UdDaaq0o1_4co0EqLREbUvnD9Ws2lhtvMkxG5HA=', 
              n_iter=20)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to parse_phone1 did not pass the test.'

###
### AUTOGRADER TEST - DO NOT REMOVE
###

print('Passed! Please submit.')

initial memory usage: 0.00 MB
Test duration: 0.03 seconds
memory after test: 0.02 MB
memory peak during test: 1.19 MB
Passed! Please submit.


### Exercise 3: (3 points)
**parse_phone2**  

**Your task:** define `parse_phone2` as follows:

Implement an enhanced phone number parser that can handle any of these patterns.

* (404) 555-1212
* (404) 5551212
* 404-555-1212
* 404-5551212
* 404555-1212
* 4045551212

As seen in the examples above, this parser should also handle the cases in which the area code is not enclosed in parentheses. As before, it should not be sensitive to leading or trailing spaces. Also, for the patterns in which the area code is enclosed in parentheses, it should not be sensitive to the number of spaces separating the area code from the remainder of the number.


In [37]:
### Solution - Exercise 3  
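def parse_phone2(s):
    # Remove leading and trailing whitespace before matching
    s_string = s.strip()

    pattern = re.compile(r'''^
        (?:
        \((?P<area_code>\d{3})\)   # Area code in parentheses ...
        \s*                        # ... followed by optional whitespace
        |
        (?P<area_code_alt>\d{3})   # or a bare area code ...
        [-\s]?                     # ... followed by an optional hyphen or whitespace character
        )
        (?P<first_three>\d{3})
        [-\s]?                     # Optional separator
        (?P<last_four>\d{4})
        $''',
        re.VERBOSE
        )
    matches = pattern.search(s_string)
    if matches:
        # Only one of the two area-code groups participates in any match
        area = matches.group('area_code') or matches.group('area_code_alt')
        return (area, matches.group('first_three'), matches.group('last_four'))

    raise ValueError(f"'{s}' is not in the right format.")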

### Demo function call
demo_ex3_s = ['404-5551212',
              '(404-555-1212']
results = []
for i, scenario in enumerate(demo_ex3_s):
    try:
        result = parse_phone2(scenario)
        print(f"parse_phone2({demo_ex3_s[i]}, parse_phone2)")
        print(f"--> {result}")
        print('\n')
        results.append(result)
    except ValueError as e:
        print(f"parse_phone2({demo_ex3_s[i]}, parse_phone2)")
        print(f"--> ValueError: {e}")
        print('\n')
        results.append(f"ValueError: {e}")



parse_phone2(404-5551212, parse_phone2)
--> ('404', '555', '1212')


parse_phone2((404-555-1212, parse_phone2)
--> ValueError: '(404-555-1212' is not in the right format.




 

**The demo should display this printed output.**
```
parse_phone2(404-5551212, parse_phone2)
--> ('404', '555', '1212')


parse_phone2((404-555-1212, parse_phone2)
--> ValueError: '(404-555-1212' is not in the right format.
```
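
A quick way to gain confidence in a parser like this is to run it against every pattern listed in the task. The sketch below does so with a stricter, hypothetical variant (`parse_phone2_sketch`) that allows only a hyphen, or nothing, between the digit groups. By contrast, a separator class such as `[-\s]?` would also accept a string like `'404 555 1212'`, which is not among the listed patterns; whether that input should be accepted is not specified here.

```python
import re

PHONE2 = re.compile(r'''
    (?: \( (\d{3}) \) [ ]*   # Area code in parentheses, then optional spaces
      | (\d{3}) -?           # or a bare area code with an optional hyphen
    )
    (\d{3}) -? (\d{4})       # Local number with an optional hyphen
''', re.VERBOSE)

def parse_phone2_sketch(s):
    m = PHONE2.fullmatch(s.strip())
    if m is None:
        raise ValueError(f"'{s}' is not in the right format.")
    paren_area, bare_area, first_three, last_four = m.groups()
    return (paren_area or bare_area, first_three, last_four)

listed = ['(404) 555-1212', '(404) 5551212', '404-555-1212',
          '404-5551212', '404555-1212', '4045551212']
for candidate in listed:
    print(candidate, '-->', parse_phone2_sketch(candidate))
```

Each of the six listed inputs should yield `('404', '555', '1212')`.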


 ---
 <!-- Test Cell Boilerplate -->  
The cell below will test your solution for parse_phone2 (exercise 3). The testing variables will be available for debugging under the following names in a dictionary format.  
- `input_vars` - Input variables for your solution.   
- `original_input_vars` - Copy of input variables from prior to running your solution. Any `key:value` pair in `original_input_vars` should also exist in `input_vars` - otherwise the inputs were modified by your solution.  
- `returned_output_vars` - Outputs returned by your solution.  
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output. 


In [38]:
### Test Cell - Exercise 3  


from cse6040_devkit.tester_fw.testers import Tester
from yaml import safe_load
from time import time

tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")

# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
    executor = dill.load(f)

@run_with_timeout(error_threshold=200.0, warning_threshold=100.0)
@suppress_stdout
def execute_tests(**kwargs):
    return executor(**kwargs)


# Execute test
start_time = time()
passed, test_case_vars, e = execute_tests(func=plugins.error_handler(parse_phone2),
              ex_name='parse_phone2',
              key=b'flr2UdDaaq0o1_4co0EqLREbUvnD9Ws2lhtvMkxG5HA=', 
              n_iter=51)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to parse_phone2 did not pass the test.'

###
### AUTOGRADER TEST - DO NOT REMOVE
###

print('Passed! Please submit.')

initial memory usage: 0.00 MB
Test duration: 0.04 seconds
memory after test: 0.02 MB
memory peak during test: 1.19 MB
Passed! Please submit.


**Fin!** This cell marks the end of Part 0. Don't forget to save, restart and rerun all cells, and submit it. When you are done, proceed to Parts 1 and 2.