# CSE 101: Computer Science Principles
#### Stony Brook University, Summer 2020

### Homework Assignment #8

#### Due: **Friday**, July 17, 2020 at 11:59 pm EDT

#### Learning Outcomes
By the end of this assignment you should be able to:
* Implement simple compression and cryptography algorithms

### Preliminaries

For this assignment you will be working with regular expressions in Python. Various functions for working with regular expressions are available in the `re` module. Fortunately, Python makes it pretty easy to check if a string matches
a particular pattern.

At the top of the Colab file we must import the `re` module:

`import re`

Then we can use the `search()` function to test whether a string matches a pattern. In the example below, the
regular expression has been saved in a string called `pattern` for convenience:

```
phone = "123-456-7890"
pattern = r"^\d{3}-\d{3}-\d{4}$"
if re.search(pattern, phone):
    print("The string matches the pattern.")
else:
    print("The string does not match the pattern.")
```

The `r` that precedes the pattern string is not a typo. Rather, the `r` indicates that the string is a "raw" string. In a raw string, as opposed to a "normal" string, any backslash character is interpreted as simply a backslash, as opposed to defining an escape sequence like `\n` or `\t`. Make sure you use raw strings in your Python code when defining regular expressions.
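The difference between the two kinds of string is easy to see by comparing lengths. The following is a minimal sketch; the variable names `normal`, `raw`, and `pattern` are illustrative only.

```python
# "\n" in a normal string is one character: a newline.
normal = "\n"
# r"\n" in a raw string is two characters: a backslash and the letter n.
raw = r"\n"

print(len(normal))  # 1
print(len(raw))     # 2

# For regex shortcuts such as \d, the raw prefix keeps the backslash intact
# so that the re module receives it unchanged.
pattern = r"\d{3}"
print(len(pattern))  # 5
```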

The `^` and `$` at the respective beginning and end of the regular expression indicate that the entire string must match the regular expression, and not just part of the string. Make sure you include these symbols in your regular
expressions too!
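To see why the anchors matter, compare an anchored pattern with an unanchored one. This is a small sketch; the names `unanchored`, `anchored`, and `text` are made up for the illustration.

```python
import re

unanchored = r"\d{3}-\d{3}-\d{4}"
anchored = r"^\d{3}-\d{3}-\d{4}$"

text = "call 123-456-7890 now"  # a phone number embedded in a longer string

# Without anchors, search() succeeds because part of the string matches.
print(bool(re.search(unanchored, text)))  # True
# With anchors, the whole string must match, so this fails.
print(bool(re.search(anchored, text)))    # False
# A string that is exactly a phone number matches the anchored pattern.
print(bool(re.search(anchored, "123-456-7890")))  # True
```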

You can play around with regular expressions on this web-based platform: [regex101.com](https://regex101.com/).

### Part 1: Phone Number Checker (20 points)

Write a function `phone_number()` that takes one parameter, which is a list of strings, each being a potential phone number. The function examines each string in the list and returns a list of the indexes of those strings that meet the following requirements.

A valid string starts with:
1. an *optional* open parenthesis, "("; followed by
1. three digits (the first one being non-zero); then followed by
1. an *optional* close parenthesis, ")" (the close parenthesis will only be present if there is a corresponding open parenthesis and vice versa); then followed by
1. an *optional* hyphen, "-"; followed by
1. three more digits; followed by
1. an *optional* hyphen, "-", again; and finally
1. four more digits.

A few valid examples of this format are: 
* "(631)111-2211"
* "631111-2211"
* "631-111-2211"

A few invalid examples are: 
* "(631111-2211"
* "631 111-2211"
* "631)-111-2211"

Note: A validly formatted string may not contain any spaces.
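The "both parentheses or neither" rule is the subtle part of this format. One possible way to express it is with alternation (`|`), sketched below for the three-digit group only. The pattern names `area_code` and `too_lenient` are hypothetical, and the sketch deliberately ignores the non-zero first digit and the rest of the number.

```python
import re

# Hypothetical helper pattern: three digits either wrapped in BOTH
# parentheses or in NEITHER. The | separates the two alternatives.
area_code = r"^(\(\d{3}\)|\d{3})$"

for candidate in ["(631)", "631", "(631", "631)"]:
    print(candidate, bool(re.search(area_code, candidate)))
# (631) True
# 631 True
# (631 False
# 631) False

# By contrast, two independent optional parentheses accept unbalanced input.
too_lenient = r"^\(?\d{3}\)?$"
print(bool(re.search(too_lenient, "(631")))  # True
```

Making each parenthesis optional on its own is shorter, but it cannot tie the presence of one parenthesis to the other, which is why the alternation form is used in this sketch.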

Expected Results:

Function Argument | Return Value
:-- | :--
`['(091)-111-1234', '6311112222', '(631)-111-2222', '(631) 111-2222']` | `[1, 2]`
`['091-111-1234', '631 111 2222', '(631)1112222', '0911111234']` | `[2]`
`['631-1111-234', '631--111-2222', '(6311112222', '631)1111234']` | `[]`
`['(631)-8675309', '6311112222', '(631)111-2222', '(631)1111234']` | `[0, 1, 2, 3]`

In [None]:
import re

def phone_number(numbers):
    # Area code with both parentheses or neither, then optional hyphens between the digit groups
    valid = r"^(\([1-9]\d{2}\)|[1-9]\d{2})-?\d{3}-?\d{4}$"
    return [i for i, number in enumerate(numbers) if re.search(valid, number)]

# Test cases
print(phone_number(['(091)-111-1234', '6311112222', '(631)-111-2222', '(631) 111-2222']))
print(phone_number(['091-111-1234', '631 111 2222', '(631)1112222', '0911111234']))
print(phone_number(['631-1111-234', '631--111-2222', '(6311112222', '631)1111234']))
print(phone_number(['(631)-8675309', '6311112222', '(631)111-2222', '(631)1111234']))

[1, 2]
[2]
[]
[0, 1, 2, 3]


### Part 2: Python Function Declaration Checker (20 points)

Write a function `python_function()` that takes one parameter, which is a string. The string provides a potential Python function declaration. The function examines the string and returns the length of the string if it
is a valid Python function declaration, or 0 otherwise. A valid Python function declaration meets the following requirements:

A valid string:

1. starts with an even number of `space` characters (0, 2, 4, etc.)
1. followed by the keyword `def`
1. followed by one or more `space` character(s)
1. followed by the name of the function (starts with either an underscore, "_" or a letter)
1. followed by zero or more `space` character(s)
1. followed by an open parenthesis, "("
1. followed by zero or more `space` character(s)
1. followed by a close parenthesis, ")"
1. followed by zero or more `space` character(s)
1. finally ends with a colon, ":" character.

Note that the function will never take any arguments.

Hint: You may use shortcuts like `\s` to denote whitespace characters, `\d` to denote digits, and `\w` to denote word characters (letters, digits, and the underscore).
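The shortcuts can be tried out on single characters. The sketch below is illustrative only; the pattern `name` is a hypothetical example of combining a character class with `\w`, not a complete solution.

```python
import re

# \s matches whitespace (space, tab, newline, ...).
print(bool(re.search(r"^\s$", " ")))   # True
print(bool(re.search(r"^\s$", "\t")))  # True

# \d matches a single digit.
print(bool(re.search(r"^\d$", "7")))   # True

# \w matches letters, digits, and the underscore.
print(bool(re.search(r"^\w$", "_")))   # True
print(bool(re.search(r"^\w$", "-")))   # False

# Because \w also accepts digits, a separate character class is needed
# to stop a name from starting with a digit.
name = r"^[a-zA-Z_]\w*$"
print(bool(re.search(name, "_1foo")))  # True
print(bool(re.search(name, "1foo")))   # False
```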

Note: The special symbol • in the following examples is for visualization purposes only; you will not receive these
symbols in your test cases. You will get the `space` character instead.

Expected Results:

Function Argument | Return Value
:-- | :--
`'••def•••___foo123•(•••)••:'` | `26`
`'•def•foo():'` | `0`
`'def•_1foo•(•)•:'` | `15`
`'••••def•foo(•••):'` | `17`
`'•••def•foo(•••):'` | `0`
`'def•1foo():'` | `0`

In [None]:
import re

def python_function(header_string):
    # An even number of leading spaces, then "def", a valid name, "()", and a closing colon
    valid = r"^(\s\s)*def\s+([a-zA-Z_]\w*)\s*\(\s*\)\s*:$"
    return len(header_string) if re.search(valid, header_string) else 0

# Test cases
print(python_function("  def   ___foo123 (   )  :"))
print(python_function(" def foo():"))
print(python_function("def _1foo ( ) :"))
print(python_function("    def foo(   ):"))
print(python_function("   def foo(   ):"))
print(python_function("def 1foo():"))

26
0
15
17
0
0


### Part 3: Score Formatter (20 points)

Write a function `score_formatter()` that takes one parameter, which is a formatted string that contains information about the score received by a student in a particular subject. The function analyzes the string and
returns a reformatted string for the score. You will need to use the function `re.sub` that is discussed in the lecture notes. If the string does not match the pattern, the function returns the string `"error"`.

The input string is always given in the format specified below. When there is a space in the output, there is exactly one space; when there’s an identifier like `(1)` or `(2)`, those are the parts that can be replaced (which means you
should use regular expression syntax to identify them in the pattern).

Format:

```
STUDENT got SCORE in SUBJECT on    DATE
  (1)   got  (2)  in   (3)   on (4) (5) (6)
```

* STUDENT (1): This part represents the name of the student. It will always be a string that starts with a letter.

* SCORE (2): This part represents the score of the student. It will always be a non-negative decimal integer.

* SUBJECT (3): This part represents the subject. It will always be a string that starts with a letter.

* DATE (4) (5) (6): These three parts all belong to the date. (4) represents the spelled-out month (e.g., June); (5) represents the two-digit day; and (6) represents the four-digit year. These three parts are separated by
spaces.

For this part you can safely assume that valid months are entered, so simply use `\w+` to match a month name.
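As a reminder of how `re.sub` uses capture groups, here is a small sketch that reorders only the date portion of the format. The names `date`, `pattern`, and `reordered` are illustrative, and the sketch is not the full solution.

```python
import re

# Illustrative input: only the date portion of the full format.
date = "July 22 2019"

# Three capture groups: (1) month, (2) two-digit day, (3) four-digit year.
pattern = r"^(\w+) (\d{2}) (\d{4})$"

# \2, \1, \3 in the replacement refer back to the captured groups.
reordered = re.sub(pattern, r"\2-\1-\3", date)
print(reordered)  # 22-July-2019

# When the pattern does not match, re.sub returns the string unchanged,
# so a separate check is needed to detect invalid input.
print(re.sub(pattern, r"\2-\1-\3", "July 2 2019"))  # July 2 2019
```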

**What the Function Returns**

As mentioned above, your program should take the input string and reformat it in a specific way. The return string has the format indicated below. Your function should return a string containing information about **Subject**, **Score**, **Date**, and **Student**, each on a separate line. When this string is printed out, it should look something like this:

```
Subject: The name of the subject
Score: The score obtained by the student
Date: Day-Month-Year
Student: The student’s name
```

Note: Please follow the format and do not include excessive elements or spaces, since we will be expecting the string to be formatted exactly as specified.

***Example #1:***

Function argument: `"Janet got 100 in CSE101 on August 11 2019"`

Return value: `"Subject: CSE101\nScore: 100\nDate: 11-August-2019\nStudent: Janet"`

***Example #2:***

Function argument: `"Bobby got 90 in CSE101 on July 12 2018"`

Return value: `"Subject: CSE101\nScore: 90\nDate: 12-July-2018\nStudent: Bobby"`

***Example #3:***

Function argument: `"Laura got 99 in Biology on July 02 2018"`

Return value: `"Subject: Biology\nScore: 99\nDate: 02-July-2018\nStudent: Laura"`

***Example #4:***

Function argument: `"alena got 15 in physics on July 22 2019"`

Return value: `"Subject: physics\nScore: 15\nDate: 22-July-2019\nStudent: alena"`

***Example #5:***

Function argument: `"Sam got 75 in physics on July 22 2019"`

Return value: `"Subject: physics\nScore: 75\nDate: 22-July-2019\nStudent: Sam"`


In [None]:
import re

def score_formatter(score_string):
    # Groups: (1) student, (2) score, (3) subject, (4) month, (5) day, (6) year
    pattern = r"^([a-zA-Z]\w*) got (\d+) in ([a-zA-Z]\w*) on (\w+) (\d{2}) (\d{4})$"
    replacement = r"Subject: \3\nScore: \2\nDate: \5-\4-\6\nStudent: \1"
    if re.match(pattern, score_string):
        return re.sub(pattern, replacement, score_string)
    # Invalid input: return the string "error", as the assignment specifies
    return "error"

# Test cases
print(score_formatter("Janet got 100 in CSE101 on August 11 2019"), end='\n\n')
print(score_formatter("Bobby got 90 in CSE101 on July 12 2018"), end='\n\n')
print(score_formatter("Laura got 99 in Biology on July 02 2018"), end='\n\n')
print(score_formatter("alena got 15 in physics on July 22 2019"), end='\n\n')
print(score_formatter("Sam got 75 in physics on July 22 2019"))

Subject: CSE101
Score: 100
Date: 11-August-2019
Student: Janet

Subject: CSE101
Score: 90
Date: 12-July-2018
Student: Bobby

Subject: Biology
Score: 99
Date: 02-July-2018
Student: Laura

Subject: physics
Score: 15
Date: 22-July-2019
Student: alena

Subject: physics
Score: 75
Date: 22-July-2019
Student: Sam
