# Homework Assignment 1: Introduction to Python

## What is Jupyter Notebook?

The Jupyter Notebook integrates code and its output into a single document that combines visualizations, narrative text, mathematical equations and other rich media. In other words: it's a single document where you can run code, display the output, and also add explanations, formulas, charts, and make your work more transparent, understandable, repeatable and shareable. 

Although it is possible to use many different programming languages in Jupyter Notebooks, in this course we will focus on Python.

## The Notebook Interface

Notebooks have code cells (generally followed by output cells) and text cells. The text cells contain what you are reading now. Code cells start with `In []:`, usually with a number between the brackets. If you put your cursor in a code cell and hit `Ctrl + Enter`, the code runs in the Python interpreter and the result is printed in the output cell below it.

# 1. Basics

## Using Python as a Calculator

Many of the things you used to use a calculator for, you can now use Python for. By hitting `Ctrl + Enter` you can run the code inside the code cells to generate the output. 

In [None]:
2+2

In [None]:
(50 - 5*6)/4

In [None]:
7/3

Calculating a number raised to some power requires the `**` operation:

In [None]:
2**10

**Exercise 1:** Calculate $\frac{2 \cdot (3-1)^4}{\sqrt{25}}$ 

Python has a huge number of libraries included with its distribution. To keep things simple, most of their variables and functions are not accessible from a normal Python interactive session. Instead, you have to import them from the library. For example, there is a __math__ module containing many useful functions. To access, say, the square root function, you can either import that function by name:

In [None]:
from math import sqrt
sqrt(81)

or you can import the entire `math` module and call the function through it:

In [None]:
import math
math.sqrt(81)

You can define variables using the equals `=` sign:

In [None]:
# Anything after a `#` will be ignored. This is what we call a 'comment'. 
width = 20 # Assigning a value to the variable width
length = 30 # Assigning a value to the variable length
area = length*width # Assigning the product of width and length to the variable area
area

If you try to access a variable that you haven't yet defined, you get a name error:

In [None]:
volume

and you need to define it:

In [None]:
height = 10
volume = area * height
volume

You can name a variable *almost* anything you want. It needs to start with an alphabetical character or an underscore `_`, and can contain alphanumeric characters plus underscores. Certain words, however, are reserved for the Python language:

    False, None, True, and, as, assert, async, await, break, class,
    continue, def, del, elif, else, except, finally, for, from, global,
    if, import, in, is, lambda, nonlocal, not, or, pass, raise, return,
    try, while, with, yield

Trying to define a variable using one of these will result in a syntax error:

In [None]:
return = 0
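
The exact set of reserved words depends on the Python version you are running. As a quick check, here is a small sketch using the standard-library `keyword` module (the variable names are only illustrative):

```python
import keyword

# The full list of reserved words for the Python version you are running.
print(keyword.kwlist)

# iskeyword() tells you whether a particular name is reserved.
return_is_reserved = keyword.iskeyword("return")   # True: cannot be a variable name
width_is_reserved = keyword.iskeyword("width")     # False: fine as a variable name
print(return_is_reserved, width_is_reserved)
```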

## Problem 1: Libraries and Defining Variables (1 point)

Use the **math** module to define a variable `a` which equals $e^2$ and define a variable `b` which equals $\sin(\pi/6)$. Write your solution in the code cell below by replacing everything (the parts that say `# YOUR CODE HERE` and `raise NotImplementedError()`) with the correct solution. You can check your work by running the cell below it and comparing it to the expected output.

N.B. Since we are dealing with approximations when doing numerical calculations, we have to deal with a bit of rounding. Observe that $\sin(\pi/6)$ isn't exactly equal to $1/2$ when you use higher precision. 
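
This kind of rounding is a general property of floating-point numbers and not specific to the sine function. A minimal illustration with unrelated numbers (`math.isclose` is the standard-library helper for approximate comparison; the variable names are made up for this sketch):

```python
import math

total = 0.1 + 0.2

# Printed with many decimals, the sum is visibly not exactly 0.3.
print("%.20f" % total)

# Exact comparison fails, approximate comparison succeeds.
exactly_equal = (total == 0.3)
approximately_equal = math.isclose(total, 0.3)
print(exactly_equal, approximately_equal)
```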

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Test case
print("a is approximately %.15f \nb is approximately %.15f" % (a, b))

Expected output:

    a is approximately 7.389056098930650  
    b is approximately 0.500000000000000

The cell below is used to autograde your solution. If you run it and it doesn't raise an `AssertionError`, your code passed all the tests that check whether your solution is correct. You don't have to understand this code. Sometimes certain tests will be included to catch common mistakes or workarounds to an intended solution, e.g. hardcoding the expected output. (Check what happens if you copy the numbers from the expected output into your solution.)

In [None]:
# AUTOGRADING
import hashlib

def _hash(s):
    return hashlib.blake2b(bytes(s, encoding='utf8'), digest_size=5).hexdigest()

assert _hash(str(a)) != '22cd62b4d4', 'Did you try to hardcode your answer? Tsk, tsk, tsk.'
assert _hash(str(b)) != 'a1a91a050a', 'Did you try to hardcode your answer? Tsk, tsk, tsk.'
assert _hash(str(a)) == '4dff9b1771', 'Wrong value for a'
assert _hash(str(b)) == '6015c6d6e9', 'Wrong value for b'

### HIDDEN TESTS

### HIDDEN TESTS

# 2. Built-in types

Everything in Python is an object. Each object is of a certain type. Here's a list of Python types you will often use:
* integer number (int)
* decimal number (float)
* boolean (bool)
* string of characters (str)
* list of objects (list)
* tuple of objects (tuple)
* dictionary (dict)
* set of objects (set)
* function (function)

The function `type` gives the type of object in its argument.

In [None]:
a = 10
type(a)

In [None]:
b = 10.0
type(b)
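
The same check works for the other types in the list above. A short sketch with arbitrary example values (the variable names are made up for illustration):

```python
flag = True
word = "ten"
items = [1, 2]
pair = (1, 2)
lookup = {"ten": 10}
unique = {1, 2}

print(type(flag))     # <class 'bool'>
print(type(word))     # <class 'str'>
print(type(items))    # <class 'list'>
print(type(pair))     # <class 'tuple'>
print(type(lookup))   # <class 'dict'>
print(type(unique))   # <class 'set'>
```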

## Strings

Strings are sequences of printable characters, and can be defined using either single quotes

In [None]:
'Hello, World!'

or double quotes

In [None]:
"Hello, World!"

The **print** function is often used for printing character strings or other data types. 

In [None]:
greeting = "Hello, World!"
introduction = "Welcome to this introduction to Python!"
print(greeting)
print("The area is", area)

In the above snippet, the integer 600 (stored in the variable `area`) is converted into a string before being printed out. 

**Exercise 2:** Use the `+` operator to concatenate the strings `greeting` and `introduction` together to form a combined string.

**Exercise 3:** The resulting string is missing a space in between the words 'World' and 'Welcome'. Correct this by inserting a third string into the sum.

**Exercise 4:** Use the built-in function `str` to turn the integer 8471 into a string. Call the resulting object `d`.

To add formatting to your printed output, you can use format codes that should be familiar to you from R:

In [None]:
print("pi is approximately %.8f \nEuler's constant e is approximately %.2f" % (math.pi, math.e))

Note that in the previous command, we didn't have to import the `math` library again; importing it once per session will suffice. 
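
The `%` style shown above is one of several ways to format strings. As an aside, and assuming Python 3.6 or newer, similar output can be produced with a formatted string literal (f-string). This is an alternative sketch, not something the exercises require:

```python
import math

# The part after the colon is the same format code as in the % style.
line_pi = f"pi is approximately {math.pi:.8f}"
line_e = f"e is approximately {math.e:.2f}"
print(line_pi)
print(line_e)
```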

Python has a set of built-in functions and methods that you can use to manipulate strings. The terms built-in function and method refer to two different things. A method is associated with objects of a particular type and is typically called in the form `object.method(arguments)`; see the examples below. A built-in function, on the other hand, is invoked just by its name and usually works on objects of several types; see for instance `len`. 

We mention a couple of them; you figure out what they do. 

In [None]:
statement = "Hello, World! Welcome to this introduction to Python!"
print(statement)
print(len(statement)) 
print(statement.lower())
print(statement.replace("!", "."))
print(statement.count("o"))
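
To make the difference between a built-in function and a method concrete, here is a small sketch (the variable names are made up for illustration):

```python
word = "Python"

# Built-in function: called by name, with the object as an argument.
# len also works on other types, such as lists.
length_of_word = len(word)
length_of_list = len([1, 2, 3])

# Method: belongs to the string object and is called with a dot.
starts_with_p = word.startswith("P")

print(length_of_word, length_of_list, starts_with_p)
```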

**Exercise 5:** Apply a single method to `statement` to transform all lowercase letters to uppercase letters, and vice versa. (Hint: Type `statement.<Tab>` and find an appropriate method). 

## Lists

Very often in a programming language, one wants to keep a group of similar items together. Python does this using a data type called **lists** which are constructed using square brackets `[ ]` or the built-in `list` function. 

In [None]:
days_of_the_week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
prime_numbers = [2, 3, 5, 9, 11, 13, 17, 19, 23, 29]

**Exercise 6:** Use the `append` method to attach the integer 31 to the end of `prime_numbers`. Print the result. 

You can access members of the list using the **index** of that item. One can also use these to re-define objects within a list.

In [None]:
days_of_the_week[2]

In [None]:
prime_numbers[3] = 7
prime_numbers

Python lists (unlike R lists) use 0 as the index of the first element of a list. Thus, in this example, the 0 element of days_of_the_week is "Monday", 1 is "Tuesday", and so on. If you need to access the $n$th element from the end of your list, you can use a negative index. For example, the -1 element of a list is the last element:

In [None]:
days_of_the_week[-1]
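
The relation between positive and negative indices can be sketched as follows, using a small made-up list:

```python
letters = ["a", "b", "c", "d", "e"]   # a small example list

first = letters[0]             # index 0 is the first element
last = letters[-1]             # index -1 is the last element
second_to_last = letters[-2]   # index -2 is the one before the last

# A negative index -n refers to the same element as index len(list) - n.
same_element = letters[-2] == letters[len(letters) - 2]

print(first, last, second_to_last, same_element)
```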

The `range()` command is a convenient way to make sequential lists of numbers:

In [None]:
list(range(10))

Note that `range(n)` starts at 0 and gives the sequential list of integers strictly less than $n$. If you want to start at a different number, use `range(start, stop)`. 

In [None]:
list(range(2,8))

The lists created above with range() have a *stepsize* of 1 between elements. You can also give a fixed step size via a third argument:

In [None]:
even = list(range(2, 20, 2))
print(even)
print("The fourth smallest positive even number is", even[3])

**Exercise 7:** Use `+` to concatenate `days_of_the_week` and `prime_numbers`, and print the result. 

**Exercise 8:** Use the function `len` to count the number of elements in `prime_numbers`. 

We can split a string into a list where each word is a list item, here using a single space `" "` as the separator (note that punctuation is not removed and stays attached to the words):

In [None]:
statement = "Hello, World! Welcome to this introduction to Python!"
x = statement.split(" ")
x
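
The choice of separator matters as soon as a string contains more than single spaces. A brief sketch comparing `split(" ")` with `split()` called without an argument (the example string is made up):

```python
text = "Hello,  World!\nWelcome"   # two spaces after the comma, then a newline

# With an explicit " " separator, every single space is a split point,
# so consecutive spaces produce an empty string and the newline is kept.
by_space = text.split(" ")

# Without an argument, split() treats any run of whitespace as one separator.
by_whitespace = text.split()

print(by_space)
print(by_whitespace)
```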

**Exercise 9:** Use the method `sort` to sort `days_of_the_week` alphabetically, and print the result.

## Tuples

Like a list, a tuple is an ordered sequence of Python objects. Crucially, unlike a list, a tuple is an immutable object. This means that once the tuple is defined, its length and its objects cannot be changed anymore.

One can define a tuple using commas only. Parentheses `()` are optional.

In [None]:
tuple1 = (1, 2, ['tree','house',9.9] , 4, 'king')
tuple2 = ('queen', 'door', 'leaf')
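
A short sketch of the "commas only" rule, including the one case that tends to surprise people, the tuple with a single element (all names are made up for illustration):

```python
# Parentheses are optional: the commas make the tuple.
point = 3, 4
same_point = (3, 4)

# A tuple with one element needs a trailing comma;
# without it, (5) is just the integer 5.
single = (5,)
not_a_tuple = (5)

print(type(point), point == same_point)
print(type(single), type(not_a_tuple))
```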

**Exercise 10:** Use the command `tuple` to create a tuple out of the string `greeting` defined earlier.

**Exercise 11:** Try to append the integer 6 to `tuple1`, like you did earlier for lists. You will encounter an error, because tuples are immutable. 

## Dictionaries

With a dictionary, you can connect a value to another value to represent the relationship between them in your code. This is similar to a regular dictionary, which connects words with their descriptions. In the example below, the dictionary connects the name of a number (a string) with its value (an integer). 

You can define a dictionary by enclosing a comma-separated list of key-value pairs in curly braces `{ }`. A colon `:` separates each key from its associated value:

In [None]:
numNames = {"One": 1,
            "Two": 2, 
            "Three": 3}

**Keys** are the equivalent of indices in lists to access a value. The **values** are what you can access with their corresponding key.

A value is retrieved from a dictionary by specifying its corresponding key in square brackets `[ ]` instead of the index number. If you refer to a key that is not in the dictionary, Python raises an exception:

In [None]:
print(numNames["One"])
print(numNames["Four"])
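
If a missing key should not stop your program, the dictionary method `get` is one option: it returns `None`, or a default value you supply, instead of raising an exception. A small sketch with its own example dictionary (`small_numbers` is a made-up name):

```python
small_numbers = {"One": 1, "Two": 2, "Three": 3}

present = small_numbers.get("One")        # key exists: returns its value
missing = small_numbers.get("Four")       # key missing: returns None
fallback = small_numbers.get("Four", 0)   # key missing: returns the default 0

print(present, missing, fallback)
```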

Adding an entry to an existing dictionary is simply a matter of assigning a new key and value:

In [None]:
numNames["Four"] = 4
numNames

Similar to adding a new key-value pair, we can just as easily modify a key-value pair.

In [None]:
numNames["One"] = 10
numNames

Sometimes it can be very helpful to check if a key already exists in a dictionary (remember that keys have to be unique). To check whether a single key is in the dictionary, use the `in` keyword. 

In [None]:
print("Two" in numNames)
print("Five" in numNames)

This `in` operator checks the keys, not the values. You can use the `in` operator to check if a value is in a dictionary with `<dict>.values()`: 

In [None]:
print(3 in numNames.values())
print(7 in numNames.values())

**Exercise 12:** Create a dictionary `id` with key-value pairs `"Name" : <Your name>`, `"Age" : <Your age>`,  `"Nationality" : <Your nationality>`.

# 3. Control flow

Control flow statements are an essential part of Python. In this section we introduce the most important ones.

## `for`-loops
One of the most useful things to do with lists is to iterate through them, i.e. to go through each element one at a time. To do this in Python, we use the `for` statement:

In [None]:
days_of_the_week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

for day in days_of_the_week:
    print(day)

This code snippet goes through each element of the list called `days_of_the_week` and assigns it to the variable `day`. It then executes everything in the indented block (in this case only one line of code, the print statement) using those variable assignments. When the program has gone through every element of the list, it exits the block. 

In R we would have used curly brackets `{}` to define these blocks. Python uses a colon `:` followed by an indented block of code. All consecutive lines that are indented relative to the `for` line belong to the same block. In the above example the block was only a single line, but we could have had longer blocks as well:

In [None]:
for day in days_of_the_week:
    statement = "Today is " + day + "."
    print(statement)

The `range()` command is particularly useful in combination with the `for` statement to execute loops of a specified length. We also included some formatting for numbers. 

In [None]:
squares =[] # Creating an empty list, which will be filled in the for loop.

for i in range(20):
    squares.append(i**2)
    print("The square of %2.d is %3.d" % (i, i**2))

print(squares)

You can iterate over dictionaries using a for loop. There are various approaches to do this and they are all equally relevant. The principal option is to iterate over the keys as follows:

In [None]:
for name in numNames:
    print("%s is the name for %d." % (name, numNames[name]))
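
Two of the other approaches are the dictionary methods `items()` and `values()`. A brief sketch on made-up example data (the names `ages` and `total_age` are only illustrative):

```python
ages = {"Ann": 31, "Bob": 27}   # made-up example data

# items() yields (key, value) pairs, which can be unpacked in the for statement.
for name, age in ages.items():
    print("%s is %d years old." % (name, age))

# values() yields only the values.
total_age = 0
for age in ages.values():
    total_age = total_age + age

print(total_age)
```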

## Problem 2: Strings, Lists, Loops and Dictionaries (2 points)

You are given a dictionary `names` whose keys are people's full names and whose values are lists containing each person's height and weight. **Use a `for` loop** to construct a new dictionary `first_names` whose keys are the first names of the people in `names` and whose values are their heights (as integers, not as lists). Again: replace the lines that say `# YOUR CODE HERE` and `raise NotImplementedError()` with your solution. 

In [None]:
names = {"Cierra Vega": [189, 74],
         "Alden Cantrell": [180, 65], 
         "Kierra Gentry": [158, 59], 
         "Pierre Cox": [176, 73],  
         "Natasha Howard": [169, 62], 
         "Austin Little": [178, 81], 
         "Jamie Rowe": [192, 79]}

first_names = {}

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Test case
print(first_names)

Expected output:

    {'Cierra': 189, 'Alden': 180, 'Kierra': 158, 'Pierre': 176, 'Natasha': 169, 'Austin': 178, 'Jamie': 192}

In [None]:
### AUTOGRADER
assert set(first_names.keys()) == {"Cierra", "Alden", "Kierra", "Pierre", 
                                    "Natasha", "Austin", "Jamie"}
assert set(first_names.values()) ==  {189, 180, 158, 176, 169, 178, 192}
assert first_names["Cierra"] == 189

## Booleans and Truth Testing

We invariably need some concept of *condition* in programming to control branching behaviour, so that a program can react differently to different situations. If it's Monday, I'll go to work. But if it's Sunday, I'll sleep in. To do this in Python, we use a combination of **boolean** variables, which evaluate to either `True` or `False`, and `if` statements that control branching based on boolean values.

In [None]:
day = "Sunday"

if day == "Sunday":
    print("Sleep in.")
else:
    print("Go to work.")

To explain what happened, first note that the statement `day == "Sunday"` evaluates to True, since we assigned the value `"Sunday"` to the variable `day` at the top of the cell. 

In [None]:
day == "Sunday"

If we evaluate it by itself, we see that it returns a boolean value *True*. The `==` operator performs *equality testing*. If the two items are equal, it returns True, otherwise it returns False. Be aware of the difference between a single equality `=` that is used in assigning variables, and a double equality `==` which is used to test whether two variables are equal. 

The first block of code is followed by an `else` statement, which is executed if nothing else in the above statement is true. Since the value was True, this code was not executed. 

You can compare any data type in Python:

In [None]:
1 == 2

In [None]:
50 == 2*25

In [None]:
3 < math.pi

In [None]:
1 == 1.0

In [None]:
1 != 0

In [None]:
1 <= 2

In [None]:
1 >= 1

Finally, note that you can also string multiple comparisons together, which can result in very intuitive tests:

In [None]:
hours = 5
0 < hours < 24

This would be equivalent to

In [None]:
hours = 5
(hours > 0) and (hours < 24)

If statements can have `elif` parts ("else if"), in addition to if/else parts. For example:

In [None]:
if day == "Sunday":
    print("Sleep in.")
elif day == "Saturday":
    print("Do chores.")
else:
    print("Go to work.")

Of course we can combine if statements with for loops, to make a snippet that is almost interesting:

In [None]:
for day in days_of_the_week:
    statement = "Today is " + day + "."
    print(statement)
    if day == "Sunday":
        print("   Sleep in.")
    elif day == "Saturday":
        print("   Do chores.")
    else:
        print("   Go to work.")

# 4. Functions

Similar to variables, Python allows for user-defined functions that can be easily reused. In this course we will rely heavily on user-defined functions. 

User-defined functions are declared using the `def` keyword. The output of a function is declared by the `return` statement.

In [None]:
def proportion(a, b):
    """
    We can use triple quotation marks " to describe what our function does.
    
    Here for example: Calculate the proportion of a in a+b.
    """
    p = a / (a + b)
    return(p)

We can call our function by passing it input values:

In [None]:
proportion(20, 21)

In [None]:
proportion(3, 1)

We can read our description of the function by typing

In [None]:
proportion?

A function can also return multiple values. They are returned as a tuple.

In [None]:
def plusminus(a, b):
    """This is the docstring of the function plusminus."""
    return(a+b, a-b)

c, d = plusminus(1, 2)

c, d

A function can have multiple `return` statements. As soon as a `return` statement is encountered, the function is exited. If no `return` statement is encountered, the function output is `None`.

In [None]:
def signtest(a):
    if a > 0:
        return 'Positive'
    if a < 0:
        return 'Negative'
    
print(signtest(-1.2))

print(signtest(0)) # Function output is None, since no return statement is encountered.

Instead of positional arguments, we can also pass keyword arguments (using the `=` sign). For keyword arguments, the order does not matter. But positional arguments always need to precede keyword arguments. 

In [None]:
proportion(b=2, a=3) # For keyword arguments the order does not matter.

It often happens that keyword arguments are used in the definition of a function. In that case they are used to specify default values for an argument.

In [None]:
def mypower(x, y=2):  # Positional (nonkeyword) arguments always precede keyword arguments.
    return x**y 

print(mypower(3))
print(mypower(3, 2))
print(mypower(3, y=2))
print(mypower(y=2, x=3))

As a last example we will create a function to calculate the sum of the first $n$ integers
$$\sum_{k=1}^{n} k$$

In [None]:
def arithmetic_sum(n):
    total = 0
    for k in range(1, n+1):
        total = total + k
    return(total)

print(arithmetic_sum(9))

Note that we are using `range(1, n+1)` to get the numbers from $1$ (inclusive) to $n+1$ (exclusive). 


From other courses, we recognize this is the arithmetic series that satisfies the simple closed form expression
$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2}.$$
Let us verify this for some fixed value.

In [None]:
n = 9
expected_sum = n * (n+1) / 2
total = arithmetic_sum(n)
total == expected_sum

There is a final shortcut in the form of the `sum` function. We combine this with a nice way to create lists by means of *list comprehensions*:

In [None]:
n = 9
first_n_integers = [i for i in range(1, n+1)]
sum(first_n_integers)
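
List comprehensions can also transform or filter the elements while the list is being built. A small sketch under the same choice of `n` (the names `squares` and `evens` are made up here, and `%` is the remainder operator):

```python
n = 9

# Transform each element: the squares of 1, ..., n.
squares = [k**2 for k in range(1, n + 1)]

# Keep only some elements: the even numbers among 1, ..., n.
evens = [k for k in range(1, n + 1) if k % 2 == 0]

print(squares)
print(evens, sum(evens))
```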

## Problem 3: Factorial (2 points)

Write a function called `factorial` that computes for a positive integer $n$ the product of the first $n$ positive integers
$$n! = n(n-1)(n-2)\cdot \dots \cdot 1.$$
For example, `factorial(5)` should return 120. You can check your function with the code in the cell below. Feel free to also include your own test cases in case you need to debug your solution. Again: replace the lines that say `# YOUR CODE HERE` and `raise NotImplementedError()` with your solution. 

In [None]:
def factorial(n):
    """
    Return n!. 
    Do not use a built-in factorial function. 
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# You can use this code cell to play around with your function to make sure
# it does what it is intended to do, i.e. to debug your code. 


In [None]:
# Test cases
print("1! = %d" % factorial(1))
print("5! = %d" % factorial(5))
print("20! = %d" % factorial(20))

Expected output:
    
    1! = 1
    5! = 120
    20! = 2432902008176640000

In [None]:
# AUTOGRADING
import inspect
sig = inspect.signature(factorial)
source = inspect.getsource(factorial)

assert "return" in source, "Please return to your answer. Return... get it?"
assert "math.factorial" not in source, "Did you try to use a built-in function?"
assert "120" not in source, "Did you try to hard code the answer? Tsk, tsk tsk."

assert factorial(1) == 1, "Expected output factorial(1) = 1"
assert factorial(5) == 120, "Expected output factorial(5) = 120"
assert factorial(20) == 2432902008176640000, "Expected output factorial(20) = 2432902008176640000"
### HIDDEN TESTS
assert factorial(10) == 3628800
### HIDDEN TESTS

## Problem 4: Fibonacci Numbers (2 points)

The Fibonacci sequence is a sequence that starts with 0 and 1, and then each successive entry is the sum of the previous two. Thus, the sequence goes 
$$0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89,...$$
Write a function called `fibonacci` that returns a **list** of the first $n$ Fibonacci numbers. Again: replace the lines that say `# YOUR CODE HERE` and `raise NotImplementedError()` with your solution. 

In [None]:
def fibonacci(n):
    """
    Return the Fibonacci sequence of length n.
    Return the error message "Fibonacci sequence only defined for length 1 or greater." when n < 1. 
    """
    sequence = [0, 1]
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# You can use this code cell to play around with your function to make sure
# it does what it is intended to do, i.e. to debug your code. 


In [None]:
# Test cases
print(fibonacci(12))
print(fibonacci(0))

Expected output: 

    [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
    Fibonacci sequence only defined for length 1 or greater.

In [None]:
# AUTOGRADING
assert fibonacci(12) == [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
assert fibonacci(0) == "Fibonacci sequence only defined for length 1 or greater."

### HIDDEN TESTS
assert fibonacci(10) == [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
assert fibonacci(-5) == "Fibonacci sequence only defined for length 1 or greater."
### HIDDEN TESTS

## Problem 5: Plagiarism Detection (3 points)

In this exercise we want to compute a metric that measures the distance between two text documents. This allows us to find out whether two documents are similar. This is important, for instance, for a search engine like Google, which tries to determine how well a website matches your search query. This distance will also allow us to detect duplicates (like Wikipedia mirrors) and cases of plagiarism (you are warned). 

The idea is to define distance in terms of shared words. We will think of a document D as a dictionary such that D[W] = # occurrences of word W. In other terms, we only keep track of how often each word occurs in the document. For example the documents 

    "The dog ate the homework."
and 

    "The cat ate the homework."
will be thought of as two dictionaries
    
    {"the": 2, "dog": 1, "ate": 1, "homework": 1}
    {"the": 2, "cat": 1, "ate": 1, "homework": 1}

Note that we make no distinction between upper and lower case words. To make the two documents comparable, we want to ensure that both dictionaries have the same keys, so

    d1 = {"the": 2, "dog": 1, "ate": 1, "homework": 1, "cat": 0}
    d2 = {"the": 2, "dog": 0, "ate": 1, "homework": 1, "cat": 1}

The distance measure between two documents that we will use is the angle between them. In this week's lecture we have learned that we can compute the angle between two vectors $\vec{x}$ and $\vec{y}$ as
$$\angle(\vec{x}, \vec{y}) = \cos^{-1}\left(\frac{\langle \vec{x}, \vec{y}\rangle}{\|\vec{x}\| \|\vec{y}\|}\right).$$
If we extract only the word frequencies of the two documents as lists, we can compute their angle:

    x = [2, 1, 1, 1, 0]
    y = [2, 0, 1, 1, 1]
    angle = arccos(6/7)
    
If the angle is zero radians, this means the two documents are identical (in terms of word counts), whereas an angle of $\frac{\pi}{2}$ radians means there are no common words. 
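
For the two example documents, the arithmetic behind `arccos(6/7)` works out as follows (rounded to four decimals in the last step):
$$\langle \vec{x}, \vec{y}\rangle = 2\cdot 2 + 1\cdot 0 + 1\cdot 1 + 1\cdot 1 + 0\cdot 1 = 6,$$
$$\|\vec{x}\|^2 = 2^2 + 1^2 + 1^2 + 1^2 + 0^2 = 7, \qquad \|\vec{y}\|^2 = 2^2 + 0^2 + 1^2 + 1^2 + 1^2 = 7,$$
$$\angle(\vec{x}, \vec{y}) = \cos^{-1}\left(\frac{6}{\sqrt{7}\,\sqrt{7}}\right) = \cos^{-1}\left(\frac{6}{7}\right) \approx 0.5411 \text{ radians}.$$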

Implement the following three functions below: `word_list`, which converts a string into a list containing each word (in lowercase) separately; `word_frequencies`, which converts two word lists into two dictionaries of word-count pairs; and `dictionary_angle`, which computes the angle between the count values of two dictionaries. The exact instructions are described in the docstrings. 

You may assume that the only types of punctuation that occur in the two documents are: `.,!?:;`. We also assume each document ends with a period. 

In [None]:
def word_list(document):
    """
    Return a list of lowercase words from a string, in the order
    in which they appear in the document. 
    That is; remove the punctuation symbols, turn every word into
    lowercase words and split the resulting string into a list of
    individual words. 
    
    Parameters
    ----------
        document (str): The string that is converted into a list.
        
        
    Returns
    -------
        document_list (list): The list containing the lowercase words in document.
        
    Example
    -------
        document = "The dog ate the homework."
        word_list(document)
            ["the", "dog", "ate", "the", "homework"]
            
    """
    remove_characters = ".,!?:;"
    # YOUR CODE HERE
    raise NotImplementedError()
    return(document_list)

In [None]:
# You can use this code cell to play around with your function to make sure
# it does what it is intended to do, i.e. to debug your code. 


In [None]:
# Test case
doc = "The dog ate the homework. Or did the cat eat the homework?"
print(word_list(doc))

Expected output:
    
    ['the', 'dog', 'ate', 'the', 'homework', 'or', 'did', 'the', 'cat', 'eat', 'the', 'homework']

In [None]:
def word_frequencies(word_list1, word_list2):
    """
    Return two dictionaries whose keys are the words that occur in the union
    of the two wordlists, and whose corresponding value is the word count
    in their respective document.
    
    Parameters
    ----------
        word_list1 (list): A list containing lowercase words.
        word_list2 (list): A second list containing lowercase words.
        
    Returns
    -------
        dictionary1 (dict): Dictionary containing word-count pairs for word_list1,
                            whose keys are words that occur in word_list1 or word_list2.
        dictionary2 (dict): Dictionary containing word-count pairs for word_list2,
                            whose keys are words that occur in word_list1 or word_list2.
                            
    Example
    -------
        word_list1 = ["the", "dog", "ate", "the", "homework"]
        word_list2 = ["the", "cat", "ate", "the", "homework"]
        dict1, dict2 = word_frequencies(word_list1, word_list2)
            ({'the': 2, 'dog': 1, 'ate': 1, 'homework': 1, 'cat': 0},
             {'the': 2, 'dog': 0, 'ate': 1, 'homework': 1, 'cat': 1})
    
    """
    dictionary1, dictionary2 = {}, {}
    combined_word_list = word_list1 + word_list2
        
    for word in combined_word_list:
        dictionary1[word] = 0
        dictionary2[word] = 0
        
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# You can use this code cell to play around with your function to make sure
# it does what it is intended to do, i.e. to debug your code. 


In [None]:
# Test case
word_list1 = ["the", "dog", "ate", "the", "homework"]
word_list2 = ["the", "cat", "ate", "the", "homework"]
dict1, dict2 = word_frequencies(word_list1, word_list2)
print(dict1)
print(dict2)

Expected output:
    
    {'the': 2, 'dog': 1, 'ate': 1, 'homework': 1, 'cat': 0}
    {'the': 2, 'dog': 0, 'ate': 1, 'homework': 1, 'cat': 1}

In [None]:
import math

def dictionary_angle(dictionary1, dictionary2):
    """
    Return the angle between two word-count dictionaries. 
    Hint: Use math.acos for the inverse cosine. 
    
    Parameters
    ----------
        dictionary1 (dict): First input dictionary containing word-count pairs
        dictionary2 (dict): Second input dictionary containing word-count pairs,
                            whose keys (the words) are the same as dictionary1,
                            and also in the same order.
    
    Returns
    -------
        angle (float): The angle between the value vectors (the word counts).
        
    Example
    -------
        dictionary1 = {'the': 2, 'dog': 1, 'ate': 1, 'homework': 1, 'cat': 0}
        dictionary2 = {'the': 2, 'dog': 0, 'ate': 1, 'homework': 1, 'cat': 1}
        dictionary_angle(dictionary1, dictionary2)
            0.541099525957146
    """
    
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# You can use this code cell to play around with your function to make sure
# it does what it is intended to do, i.e. to debug your code. 


In [None]:
# Test case
dict1 = {'the': 2, 'dog': 1, 'ate': 1, 'homework': 1, 'cat': 0}
dict2 = {'the': 2, 'dog': 0, 'ate': 1, 'homework': 1, 'cat': 1}
dictionary_angle(dict1, dict2)

Expected output:

    0.541099525957146

### Case study for autograding

Here are two excerpts of a similar nature: one from a speech Michelle Obama gave in 2008 and one from a speech Melania Trump gave eight years later. Together they serve as a nice test case for autograding.

In [None]:
# AUTOGRADING

Obama = "And Barack and I were raised with so many of the same values: \
         that you work hard for what you want in life; that your word \
         is your bond and you do what you say you're going to do; that \
         you treat people with dignity and respect, even if you don't know \
         them, and even if you don't agree with them. And Barack and I set \
         out to build lives guided by these values, and to pass them on to \
         the next generation. Because we want our children and all children \
         in this nation to know that the only limit to the height of your \
         achievements is the reach of your dreams and your willingness to \
         work for them."

Trump = "From a young age, my parents impressed on me the values that you \
         work hard for what you want in life, that your word is your bond and \
         you do what you say and keep your promise, that you treat people \
         with respect. They taught and showed me values and morals in their \
         daily lives. That is a lesson that I continue to pass along to our \
         son. And we need to pass those lessons on to the many generations to \
         follow. Because we want our children in this nation to know that the \
         only limit to your achievements is the strength of your dreams and \
         your willingness to work for them."

list_Obama = word_list(Obama)
list_Trump = word_list(Trump)

assert list_Obama[20] == 'want'
assert list_Trump[20] == 'life'

### HIDDEN TESTS

document = "The dog ate the homework. Or did the cat eat the homework?"
document_list = word_list(document)

assert document_list == ['the', 'dog', 'ate', 'the', 'homework', 'or', 'did', 'the', 'cat', 'eat', 'the', 'homework']

### HIDDEN TESTS

In [None]:
# AUTOGRADING

list_Obama = ['and', 'barack', 'and', 'i', 'were', 'raised', 'with', 'so', 'many', 'of', 'the', 'same', 'values', 
              'that', 'you', 'work', 'hard', 'for', 'what', 'you', 'want', 'in', 'life', 'that', 'your', 'word', 
              'is', 'your', 'bond', 'and', 'you', 'do', 'what', 'you', 'say', "you're", 'going', 'to', 'do', 'that',
              'you', 'treat', 'people', 'with', 'dignity', 'and', 'respect', 'even', 'if', 'you', "don't", 'know', 
              'them', 'and', 'even', 'if', 'you', "don't", 'agree', 'with', 'them', 'and', 'barack', 'and', 'i', 'set', 
              'out', 'to', 'build', 'lives', 'guided', 'by', 'these', 'values', 'and', 'to', 'pass', 'them', 'on', 'to', 
              'the', 'next', 'generation', 'because', 'we', 'want', 'our', 'children', 'and', 'all', 'children', 
              'in', 'this', 'nation', 'to', 'know', 'that', 'the', 'only', 'limit', 'to', 'the', 'height', 'of', 'your', 
              'achievements', 'is', 'the', 'reach', 'of', 'your', 'dreams', 'and', 'your', 'willingness', 'to', 
              'work', 'for', 'them']

list_Trump = ['from', 'a', 'young', 'age', 'my', 'parents', 'impressed', 'on', 'me', 'the', 'values', 'that', 'you',
              'work', 'hard', 'for', 'what', 'you', 'want', 'in', 'life', 'that', 'your', 'word', 'is', 'your', 'bond',
              'and', 'you', 'do', 'what', 'you', 'say', 'and', 'keep', 'your', 'promise', 'that', 'you', 'treat', 'people',
              'with', 'respect', 'they', 'taught', 'and', 'showed', 'me', 'values', 'and', 'morals', 'in', 'their', 
              'daily', 'lives', 'that', 'is', 'a', 'lesson', 'that', 'i', 'continue', 'to', 'pass', 'along', 'to', 'our',
              'son', 'and', 'we', 'need', 'to', 'pass', 'those', 'lessons', 'on', 'to', 'the', 'many', 'generations', 'to', 
              'follow', 'because', 'we', 'want', 'our', 'children', 'in', 'this', 'nation', 'to', 'know', 'that', 'the',
              'only', 'limit', 'to', 'your', 'achievements', 'is', 'the', 'strength', 'of', 'your', 'dreams', 'and', 
              'your', 'willingness', 'to', 'work', 'for', 'them']

dict_Obama, dict_Trump = word_frequencies(list_Obama, list_Trump)

assert dict_Obama['you'] == 7
assert dict_Trump['to'] == 8

### HIDDEN TESTS
word_list1 = ["the", "dog", "ate", "the", "homework"]
word_list2 = ["the", "cat", "ate", "the", "homework"]

dict1, dict2 = word_frequencies(word_list1, word_list2)

assert dict1 == {'the': 2, 'dog': 1, 'ate': 1, 'homework': 1, 'cat': 0} 
assert dict2 == {'the': 2, 'dog': 0, 'ate': 1, 'homework': 1, 'cat': 1}
### HIDDEN TESTS

In [None]:
# AUTOGRADING
dict_Obama = {'and': 10, 'barack': 2, 'i': 2, 'were': 1, 'raised': 1, 'with': 3, 'so': 1, 'many': 1, 
              'of': 3, 'the': 5, 'same': 1, 'values': 2, 'that': 4, 'you': 7, 'work': 2, 'hard': 1, 
              'for': 2, 'what': 2, 'want': 2, 'in': 2, 'life': 1, 'your': 5, 'word': 1, 'is': 2, 
              'bond': 1, 'do': 2, 'say': 1, "you're": 1, 'going': 1, 'to': 7, 'treat': 1, 'people': 1, 
              'dignity': 1, 'respect': 1, 'even': 2, 'if': 2, "don't": 2, 'know': 2, 'them': 4, 'agree': 1, 
              'set': 1, 'out': 1, 'build': 1, 'lives': 1, 'guided': 1, 'by': 1, 'these': 1, 'pass': 1, 
              'on': 1, 'next': 1, 'generation': 1, 'because': 1, 'we': 1, 'our': 1, 'children': 2, 
              'all': 1, 'this': 1, 'nation': 1, 'only': 1, 'limit': 1, 'height': 1, 'achievements': 1, 
              'reach': 1, 'dreams': 1, 'willingness': 1, 'from': 0, 'a': 0, 'young': 0, 'age': 0, 'my': 0, 
              'parents': 0, 'impressed': 0, 'me': 0, 'keep': 0, 'promise': 0, 'they': 0, 'taught': 0, 
              'showed': 0, 'morals': 0, 'their': 0, 'daily': 0, 'lesson': 0, 'continue': 0, 'along': 0, 
              'son': 0, 'need': 0, 'those': 0, 'lessons': 0, 'generations': 0, 'follow': 0, 'strength': 0} 

dict_Trump = {'and': 6, 'barack': 0, 'i': 1, 'were': 0, 'raised': 0, 'with': 1, 'so': 0, 'many': 1, 'of': 1, 
              'the': 4, 'same': 0, 'values': 2, 'that': 6, 'you': 5, 'work': 2, 'hard': 1, 'for': 2, 'what': 2, 
              'want': 2, 'in': 3, 'life': 1, 'your': 6, 'word': 1, 'is': 3, 'bond': 1, 'do': 1, 'say': 1, 
              "you're": 0, 'going': 0, 'to': 8, 'treat': 1, 'people': 1, 'dignity': 0, 'respect': 1, 'even': 0, 
              'if': 0, "don't": 0, 'know': 1, 'them': 1, 'agree': 0, 'set': 0, 'out': 0, 'build': 0, 'lives': 1, 
              'guided': 0, 'by': 0, 'these': 0, 'pass': 2, 'on': 2, 'next': 0, 'generation': 0, 'because': 1, 
              'we': 2, 'our': 2, 'children': 1, 'all': 0, 'this': 1, 'nation': 1, 'only': 1, 'limit': 1, 
              'height': 0, 'achievements': 1, 'reach': 0, 'dreams': 1, 'willingness': 1, 'from': 1, 'a': 2, 
              'young': 1, 'age': 1, 'my': 1, 'parents': 1, 'impressed': 1, 'me': 2, 'keep': 1, 'promise': 1, 
              'they': 1, 'taught': 1, 'showed': 1, 'morals': 1, 'their': 1, 'daily': 1, 'lesson': 1, 
              'continue': 1, 'along': 1, 'son': 1, 'need': 1, 'those': 1, 'lessons': 1, 'generations': 1, 
              'follow': 1, 'strength': 1} 

angle1 = dictionary_angle(dict_Obama, dict_Trump)
angle2 = dictionary_angle(dict_Trump, dict_Obama)

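# The angle is symmetric in its arguments; compare with a tolerance because
# the order of floating-point summation may differ between the two calls.
assert math.isclose(angle1, angle2)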

### HIDDEN TESTS
assert math.isclose(angle1, 0.578729546285134)

dict1 = {'the': 2, 'dog': 1, 'ate': 1, 'homework': 1, 'cat': 0}
dict2 = {'the': 2, 'dog': 0, 'ate': 1, 'homework': 1, 'cat': 1}
assert math.isclose(dictionary_angle(dict1, dict2), 0.541099525957146)

### HIDDEN TESTS