# Homework Assignment 1: Introduction to Python and Programming Review

## What is Jupyter Notebook?

The Jupyter Notebook integrates code and its output into a single document that combines visualizations, narrative text, mathematical equations and other rich media. In other words: it's a single document where you can run code, display the output, and also add explanations, formulas, charts, and make your work more transparent, understandable, repeatable and shareable. 

Although it is possible to use many different programming languages in Jupyter Notebooks, in this course we will focus on Python.

## The Notebook Interface

Notebooks have code cells (generally followed by output cells) and text cells. The text cells contain what you are reading now. Each code cell starts with `In [ ]:`, usually with a number between the brackets. If you put your cursor in a code cell and hit `Ctrl + Enter`, the code runs in the Python interpreter and the result is printed in the output cell.

# 1. Basics

## Using Python as a Calculator

Many of the things you used to use a calculator for, you can now use Python for. By hitting `Ctrl + Enter` you can run the code inside the code cells to generate the output. 

In [1]:
2+2

4

In [2]:
(50 - 5*6)/4

5.0

In [3]:
7/3

2.3333333333333335

Calculating a number raised to some power requires the `**` operation (instead of the perhaps more familiar [caret](https://docs.python.org/3/reference/expressions.html#binary-bitwise-operations) `^`, which we won't need):

In [4]:
2**10

1024
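
To see why the caret is not what you want here, the small sketch below compares the two operators. In Python, `^` is the bitwise XOR operator, so `2 ^ 10` gives a very different result from `2 ** 10`. The variable names `power` and `xor` are chosen only for this illustration.

```python
power = 2 ** 10   # exponentiation: 2 raised to the power 10
xor = 2 ^ 10      # bitwise XOR of 0b0010 and 0b1010, which is 0b1000

print(power)  # 1024
print(xor)    # 8
```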

**Exercise 1:** Calculate $\frac{2 \cdot (3-1)^4}{\sqrt{4096}}$. 

In [5]:
2*(3-1)**4/4096**(0.5)

0.5

## Libraries

Python has a huge number of libraries included with its distribution. To keep things simple, most of their variables and functions are not directly accessible from a normal Python interactive session. Instead, you have to first import the name of the library. For example, there is a __math__ module containing many useful functions. To access, say, the square root function, you can either:

In [6]:
from math import sqrt
sqrt(81)

9.0

or you can simply import the entire math library itself. Note that the latter requires you to prefix the function name with the name of the library it came from. These import statements need only be run once during a Python session for the imported functions to be available. When starting a new session, you need to rerun the import statements.

In [7]:
import math
math.sqrt(81)

9.0
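
A third variant, shown here only as a small sketch, imports the library under a shorter alias. The alias `m` is an arbitrary name picked for this example; any valid variable name would work.

```python
import math as m  # `m` is an arbitrary alias chosen for this example

root = m.sqrt(81)  # same function as math.sqrt, reached through the alias
print(root)        # 9.0
```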

## Variables

You can define variables using the equals sign `=`:

In [8]:
# Anything after a `#` within a code cell will be ignored. This is what we call a 'comment'. 
width = 20 # Assigning a value to the variable width
length = 30 # Assigning a value to the variable length
area = length*width # Assigning the product of width and length to the variable area
area

600

If you try to access a variable that you haven't yet defined, you get a name error:

In [9]:
volume

NameError: name 'volume' is not defined

and you need to first define it:

In [15]:
height = 10
volume = area * height
volume

6000

You can name a variable *almost* anything you want. It needs to start with an alphabetical character or an underscore `_`, and can contain alphanumeric characters plus underscores. Certain words, however, are reserved for the Python language:

    False, None, True, and, as, assert, async, await, break, class, continue,
    def, del, elif, else, except, finally, for, from, global, if, import, in,
    is, lambda, nonlocal, not, or, pass, raise, return, try, while, with, yield

Trying to define a variable using one of these will result in a syntax error:

In [16]:
return = 0

SyntaxError: invalid syntax (3966660672.py, line 1)
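
If you are unsure whether a name is allowed, you can ask Python itself instead of memorizing the rules. The sketch below uses the standard `keyword` module and the string method `isidentifier`; the candidate names are made up for illustration.

```python
import keyword

reserved = keyword.iskeyword("return")        # reserved word, cannot be a variable name
free = keyword.iskeyword("volume")            # not reserved, free to use
bad_name = "2nd_try".isidentifier()           # starts with a digit, so not a valid name
good_name = "_second_try".isidentifier()      # starts with an underscore, which is allowed

print(reserved, free)       # True False
print(bad_name, good_name)  # False True
```

Note that `isidentifier` only checks the spelling rules, so a reserved word such as `return` still passes that check; combine both tests if you need to be certain.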

## Problem 1: Libraries and Defining Variables (1 point)

Use the [**math**](https://docs.python.org/3/library/math.html) module to define a variable named `a` which equals $\cos(\pi^2)$ and define a variable `b` which stores the value $\ln(e+3)$. Write your solution in the code cell below by replacing everything (the parts that say `# YOUR CODE HERE` and `raise NotImplementedError()`) with the correct solution. You can check your solution by running the cell below it (the `Test case`) and comparing its output to the expected output.

In [22]:
# YOUR CODE HERE
from math import cos
from math import pi
from math import log
from math import exp

a = cos(pi**2)
b = log(exp(1) + 3)

In [23]:
# Test case
print("a is approximately %17.14f \nb is approximately %17.14f" % (a, b))

a is approximately -0.90268536193307 
b is approximately  1.74366838062868


Expected output:

    a is approximately -0.90268536193307 
    b is approximately  1.74366838062868

The cell below is used to autograde your solution for Problem 1. If you run it and it doesn't generate any `AssertionError` (i.e. nothing shows up below the code cell), it means your code passed all the tests to check if your solution is correct. You don't have to understand this piece of code. Sometimes certain tests will be included to catch common mistakes or workarounds to an intended solution (e.g. hardcoding the expected output; check what happens if you copy the numbers from the expected output into your solution).

In [24]:
# AUTOGRADING
import hashlib

def _hash(s):
    return hashlib.blake2b(bytes(s, encoding='utf8'), digest_size=5).hexdigest()

assert _hash(str(a)) != '5c0b6e63a3', 'Did you try to hardcode your answer? Tsk, tsk, tsk.'
assert _hash(str(b)) != 'fba443794e', 'Did you try to hardcode your answer? Tsk, tsk, tsk.'
assert _hash(str(a)) == '13f4aa948b', 'Wrong value for a'
assert _hash(str(b)) == 'fe1d8902fe', 'Wrong value for b'


# 2. Built-in types

Everything in Python is an object. Each object is of a certain type. Here's a list of Python types you will often use:
* integer number (int)
* decimal number (float)
* boolean (true/false) (bool)
* string of characters (str)
* list of objects (list)
* tuple of objects (tuple)
* dictionary (dict)
* set of objects (set)
* function (function)

The function `type` returns the type of the object passed as its argument.

In [25]:
a = 10
type(a)

int

In [26]:
b = 10.0
type(b)

float
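
The same check works for the other types in the list above. The sketch below uses a few made-up objects (the variable names are only for illustration). Note that `print(type(...))` shows the longer form `<class '...'>`, whereas a bare `type(...)` at the end of a cell shows just the name.

```python
flag = True
word = "ten"
items = [1, 2, 3]
pair = (1, 2)
lookup = {"ten": 10}
unique = {1, 2, 3}

print(type(flag))    # <class 'bool'>
print(type(word))    # <class 'str'>
print(type(items))   # <class 'list'>
print(type(pair))    # <class 'tuple'>
print(type(lookup))  # <class 'dict'>
print(type(unique))  # <class 'set'>
```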

## Strings

Strings are sequences of characters, and can be defined using either single quotes

In [27]:
'Hello, World!'

'Hello, World!'

or double quotes

In [28]:
"Hello, World!"

'Hello, World!'

The **print** function is often used for printing character strings or other data types.

In [29]:
greeting = "Hello, World!"
introduction = "Welcome to this introduction to Python!"
print(greeting)
print("The area is", area)

Hello, World!
The area is 600


In the above snippet, the integer 600 (stored in the variable `area`) was converted into a string before being printed out. 

**Exercise 2:** Use the `+` operator to concatenate the strings `greeting` and `introduction` together to form a combined string.

In [30]:
print(greeting + introduction)

Hello, World!Welcome to this introduction to Python!


**Exercise 3:** The resulting string is missing a space in between the words 'World' and 'Welcome'. Correct this by inserting a third string into the sum.

In [31]:
print(greeting + " " + introduction)

Hello, World! Welcome to this introduction to Python!


**Exercise 4:** Use the built-in function `str` to turn the integer 8471 into a string. Call the resulting object `d`.

In [32]:
d_int = 8471
d = str(d_int)
print(type(d_int), type(d))

<class 'int'> <class 'str'>


To include formatting into your print statements, you can use expressions that should be familiar to you from $\textsf{R}$:

In [33]:
print("pi is approximately %.8f \nEuler's number e is approximately %.3f" % (math.pi, math.e))

pi is approximately 3.14159265 
Euler's number e is approximately 2.718


Note that in the previous command, we didn't have to import the `math` library again; importing it once per session will suffice. 
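
An alternative to the `%` operator, available in Python 3.6 and later, is the formatted string literal (f-string). The sketch below reproduces the output of the cell above; the format codes after the colon (`.8f`, `.3f`) have the same meaning as before. The variable names are only for illustration.

```python
import math

line_pi = f"pi is approximately {math.pi:.8f}"
line_e = f"Euler's number e is approximately {math.e:.3f}"

print(line_pi)  # pi is approximately 3.14159265
print(line_e)   # Euler's number e is approximately 2.718
```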

Python has a set of built-in functions and methods that allow you to manipulate strings. The terms built-in function and method refer to two different things. Methods are associated with the object of a particular type they belong to. Typically methods are used in the form `object.method(arguments)`; see the examples below. Built-in functions, on the other hand, can be invoked just by their name and are usually applicable to many object types; see for instance `len`.

We mention a couple of them, you figure out what they do. 
<a id="string_cell"></a>

In [34]:
statement = "Hello, World! Welcome to this introduction to Python!"
print(statement)
print(len(statement)) 
print(statement.lower())
print(statement.replace("!", "."))
print(statement.count("o"))

Hello, World! Welcome to this introduction to Python!
53
hello, world! welcome to this introduction to python!
Hello, World. Welcome to this introduction to Python.
8


**Exercise 5:** Apply a single method to `statement` to transform all lowercase letters to uppercase letters, and vice versa. (Hint: Type `statement.<Tab>` and find an appropriate method. That is, type `statement.` in the code cell  below and then hit the `<Tab>` key on your keyboard). 

In [35]:
print(statement.swapcase())

hELLO, wORLD! wELCOME TO THIS INTRODUCTION TO pYTHON!


## Lists

Very often in a programming language, one wants to keep a group of similar items together. Python does this using a data type called **lists** which are constructed using square brackets `[ ]` or the built-in `list` function. 

In [36]:
days_of_the_week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
prime_numbers = [2, 3, 5, 9, 11, 13, 17, 19, 23, 29]

**Exercise 6:** Use the `append` method (see the [documentation](https://docs.python.org/3/tutorial/datastructures.html) if needed) to attach the integer 31 to the end of `prime_numbers`. Print the result. 

In [37]:
print(prime_numbers)
prime_numbers.append(31)
print(prime_numbers)


[2, 3, 5, 9, 11, 13, 17, 19, 23, 29]
[2, 3, 5, 9, 11, 13, 17, 19, 23, 29, 31]


You can access members of the list using the **index** of that item. One can also use these to re-define objects within a list.

In [38]:
days_of_the_week[2]

'Wednesday'

In [39]:
prime_numbers[3] = 7
prime_numbers

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]

**IMPORTANT**: Python lists (unlike $\mathsf{R}$ lists) use 0 as the index of their first element. Thus, in this example, element 0 of `days_of_the_week` is "Monday", element 1 is "Tuesday", and so on. If you need to access the $n$th element from the end of your list, you can use a negative index. For example, the -1 element of a list is the last element:

In [40]:
days_of_the_week[-1]

'Sunday'
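
As a small sketch of how negative indices relate to ordinary ones: the element at index `-n` is the same as the element at index `len(list) - n`. The list `days` below is a fresh copy made for this example.

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

last = days[-1]                          # last element
second_to_last = days[-2]                # second element from the end
same = days[-2] == days[len(days) - 2]   # both expressions point at index 5

print(last)            # Sunday
print(second_to_last)  # Saturday
print(same)            # True
```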

The `range()` command is a convenient way to make sequential lists of numbers and will often be used when creating `for` loops:

In [41]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Note that `range(n)` starts at 0 and gives the sequential list of integers strictly less than $n$. So it's a list of size $n$, ranging from $0$ up to and including $n-1$. If you want to start at a different number, use `range(start, stop)`. 

In [42]:
list(range(2,8))

[2, 3, 4, 5, 6, 7]

The lists created above with range() have a *stepsize* of 1 between elements. You can also give a fixed step size via a third argument:

In [43]:
even = list(range(2, 20, 2))
print(even)
print("The fourth smallest positive even number is", even[3])

[2, 4, 6, 8, 10, 12, 14, 16, 18]
The fourth smallest positive even number is 8
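
The step size may also be negative, in which case the list counts down. As before, the stop value itself is not included. This is a minimal sketch with made-up variable names.

```python
countdown = list(range(10, 0, -2))  # start at 10, stop before 0, step -2
empty = list(range(5, 5))           # start equals stop, so nothing is produced

print(countdown)  # [10, 8, 6, 4, 2]
print(empty)      # []
```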


**Exercise 7:** Use `+` to concatenate the lists `days_of_the_week` and `prime_numbers`, and print the result. 

In [44]:
print(days_of_the_week + prime_numbers)

['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday', 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]


**Exercise 8:** Use the function `len` to count the number of elements in `prime_numbers`. 

In [45]:
len(prime_numbers)

11

We can split a string into a list where each word is a list item, here using a single space `" "` as the separator (note that it doesn't take punctuation into account):
<a id="split_cell"></a>

In [46]:
statement = "Hello, World! Welcome to this introduction to Python!"
x = statement.split(" ")
x

['Hello,', 'World!', 'Welcome', 'to', 'this', 'introduction', 'to', 'Python!']
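
Be aware that `split(" ")` and `split()` behave differently. With an explicit `" "` the string is cut at every single space, whereas calling `split()` without an argument cuts at any run of whitespace (spaces, tabs, newlines). The string `spaced` below is a made-up example containing a double space and a newline.

```python
spaced = "Hello,  World!\nWelcome"   # two spaces, then a newline

by_space = spaced.split(" ")    # cuts at every single space only
by_whitespace = spaced.split()  # cuts at any run of whitespace

print(by_space)       # ['Hello,', '', 'World!\nWelcome']
print(by_whitespace)  # ['Hello,', 'World!', 'Welcome']
```

The empty string in the first result comes from the two adjacent spaces.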

**Exercise 9:** Use the *method* `sort` to sort `days_of_the_week` alphabetically, and print the result. 

In [47]:
days_of_the_week.sort()
print(days_of_the_week)

['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday', 'Wednesday']


## Tuples

Like a list, a tuple is an ordered sequence of Python objects. Crucially, unlike a list, a tuple is an immutable object. This means that once the tuple is defined, its length and its objects cannot be changed anymore.

One can define a tuple using commas only. Parentheses `()` are optional.

In [48]:
tuple1 = (1, 2, ['tree','house',9.9] , 4, 'king')
tuple2 = ('queen', 'door', 'leaf')
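
The sketch below illustrates the remark that the commas, not the parentheses, make the tuple. It also shows the one case where this matters most: a tuple with a single element needs a trailing comma. All names are made up for this example.

```python
point = 3, 4        # the commas make the tuple; parentheses are optional
single = (5,)       # a one-element tuple needs a trailing comma
not_a_tuple = (5)   # without the comma this is just the integer 5

print(type(point), type(single), type(not_a_tuple))
# <class 'tuple'> <class 'tuple'> <class 'int'>

x, y = point        # a tuple can be unpacked into separate variables
print(x, y)         # 3 4
```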

**Exercise 10:** Use the command `tuple` to create a tuple out of the string `greeting` defined earlier.

In [49]:
print(tuple(greeting))

('H', 'e', 'l', 'l', 'o', ',', ' ', 'W', 'o', 'r', 'l', 'd', '!')


**Exercise 11:** Try to append the integer 6 to `tuple1`, like you did earlier for lists. You will encounter an error, because tuples are immutable. 

In [50]:
tuple1.append(6)

AttributeError: 'tuple' object has no attribute 'append'

## Dictionaries

With a dictionary, you can connect a value to another value to represent the relationship between them in your code. This is similar to a regular dictionary, which connects words with their description. In the example below, the dictionary connects each number name (a string) with its value (an integer).

You can define a dictionary by enclosing a comma-separated list of key-value pairs in curly brackets `{ }`. A colon `:` separates each key from its associated value:

In [51]:
numNames = {"One": 1,
            "Two": 2, 
            "Three": 3}

**Keys** in dictionaries are the equivalent of indices in lists to access a value. The **values** are what you can access with their corresponding key.

A value is retrieved from a dictionary by specifying its corresponding key in square brackets `[ ]` instead of the index number. If you refer to a key that is not in the dictionary, Python raises an exception:

In [52]:
print(numNames["One"])
print(numNames["Four"])

1


KeyError: 'Four'
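
If you would rather not get an exception for a missing key, dictionaries also offer the `get` method, which returns `None` (or a default of your choice) instead. The dictionary `names` below is a separate copy made for this sketch so that `numNames` stays untouched.

```python
names = {"One": 1, "Two": 2, "Three": 3}  # hypothetical copy for this example

found = names.get("One")             # key exists: its value is returned
missing = names.get("Four")          # key is absent: None is returned, no error
with_default = names.get("Four", 0)  # key is absent: the given default is returned

print(found, missing, with_default)  # 1 None 0
```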

Adding an entry to an existing dictionary is simply a matter of assigning a new key to a value:

In [53]:
numNames["Four"] = 4
numNames

{'One': 1, 'Two': 2, 'Three': 3, 'Four': 4}

Similar to adding a new key-value pair, we can just as easily modify an existing key-value pair.

In [54]:
numNames["One"] = 10
numNames

{'One': 10, 'Two': 2, 'Three': 3, 'Four': 4}

Sometimes it can be very helpful to check if a key already exists in a dictionary (remember that keys have to be unique). To check whether a single key is in the dictionary, use the `in` keyword. 

In [55]:
print("Two" in numNames)
print("Five" in numNames)

True
False


This `in` operator checks the keys, not the values. You can use the `in` operator to check if a value is in a dictionary with `<dict>.values()`: 

In [56]:
print(3 in numNames.values())
print(7 in numNames.values())

True
False


**Exercise 12:** Create a dictionary called `id` containing the key-value pairs `"Name" : <Your name>`, `"Age" : <Your age>`,  `"Nationality" : <Your nationality>`.

In [57]:
id = {
    "Name": "Povilas",
    "Age": 22,
    "Nationality": "Lithuanian"
}
print(id)

{'Name': 'Povilas', 'Age': 22, 'Nationality': 'Lithuanian'}


# 3. Control flow

Control flow statements are an essential part of Python and programming in general. In this section we introduce the most important ones.

## `for`-loops
One of the most useful things to do with lists is to iterate through them, i.e. to go through each element one at a time. To do this in Python, we use the `for` statement:

In [58]:
days_of_the_week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

for day in days_of_the_week:
    print(day)

Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday


This code snippet goes through each element of the list called `days_of_the_week` and assigns it to the variable `day`. It then executes everything in the indented block (in this case only one line of code, the print statement) using those variable assignments. When the program has gone through every element of the list, it exits the block. 

In $\mathsf{R}$ we would have used curly brackets `{}` to define the beginning and end of these blocks. Python uses a colon `:`, followed by an indentation (`<tab>` or four spaces) to define code blocks. Everything at a higher level of indentation is taken to be in the same block. In the above example the block was only a single line, but we could have had longer blocks as well:

In [59]:
for day in days_of_the_week:
    statement = "Today is " + day + "."
    print(statement)

Today is Monday.
Today is Tuesday.
Today is Wednesday.
Today is Thursday.
Today is Friday.
Today is Saturday.
Today is Sunday.


The `range()` command is particularly useful in combination with the `for` statement to execute loops of a specified length. We also included some formatting for numbers. 

In [60]:
squares = [] # Creating an empty list, which will be filled inside the for loop.

for i in range(20):
    squares.append(i**2)
    print("The square of %2.d is %3.d" % (i, i**2))

print(squares)

The square of  0 is   0
The square of  1 is   1
The square of  2 is   4
The square of  3 is   9
The square of  4 is  16
The square of  5 is  25
The square of  6 is  36
The square of  7 is  49
The square of  8 is  64
The square of  9 is  81
The square of 10 is 100
The square of 11 is 121
The square of 12 is 144
The square of 13 is 169
The square of 14 is 196
The square of 15 is 225
The square of 16 is 256
The square of 17 is 289
The square of 18 is 324
The square of 19 is 361
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361]


You can iterate over dictionaries using a for loop. There are various approaches to do this and they are all equally relevant. The principal option is to iterate over the keys as follows:

In [61]:
for name in numNames:
    print("%s is the name for %d." % (name, numNames[name]))

One is the name for 10.
Two is the name for 2.
Three is the name for 3.
Four is the name for 4.
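
Two of the other approaches are sketched below: the `items` method yields key-value pairs, and the `values` method yields only the values. The dictionary `names` is a copy made for this example.

```python
names = {"One": 10, "Two": 2, "Three": 3, "Four": 4}  # copy for this example

lines = []
for name, value in names.items():   # each item is a (key, value) pair
    lines.append("%s is the name for %d." % (name, value))
print(lines[0])                     # One is the name for 10.

total = 0
for value in names.values():        # loop over the values only
    total = total + value
print(total)                        # 19
```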


## Problem 2: Strings, Lists, Loops and Dictionaries (2 points)

You are given a dictionary `capitals` that maps countries to their capital cities. **Use a `for` loop** to construct a new dictionary `reverse_capitals` whose keys are the capitals and whose values are the corresponding countries. For example, if `capitals = {"France": "Paris", "Germany": "Berlin"}`, then `reverse_capitals` should be `{"Paris": "France", "Berlin": "Germany"}`.

Again: replace the lines that say `# YOUR CODE HERE` and `raise NotImplementedError()` with your solution. 

In [62]:
capitals = {"France": "Paris",
            "Germany": "Berlin",
            "Italy": "Rome",
            "Spain": "Madrid",
            "Portugal": "Lisbon",
            "Netherlands": "Amsterdam"}

reverse_capitals = {}

# YOUR CODE HERE
for country in capitals:
    reverse_capitals[capitals[country]] = country

In [63]:
# Test case
print(reverse_capitals["Amsterdam"])

Netherlands


Expected output:

    Netherlands

In [64]:
### AUTOGRADER
assert reverse_capitals["Paris"] == "France"
assert reverse_capitals["Berlin"] == "Germany"
assert reverse_capitals["Rome"] == "Italy"
assert reverse_capitals["Amsterdam"] == "Netherlands"


## Booleans and Truth Testing

We invariably need some concept of *condition* in programming to control branching behaviour, so that a program can react differently to different situations. If it's Monday, I'll go to work. But if it's Sunday, I'll sleep in. To do this in Python, we use a combination of **boolean** variables, which evaluate to either `True` or `False`, and `if` statements that control branching based on boolean values.

In [65]:
day = "Sunday"

if day == "Sunday":
    print("Sleep in.")
else:
    print("Go to work.")

Sleep in.


The `==` operator performs *equality testing*. If the two items are equal, it returns `True`, otherwise it returns `False`. Be aware of the difference between a single equality `=` that is used in assigning variables, and a double equality `==` which is used to test whether two variables are equal. 

The first block of code is followed by an `else` statement, whose block is executed only if the condition of the `if` statement is false. Since the condition was `True` here, the `else` block was not executed.

You can compare any data type in Python:

In [66]:
1 == 2

False

In [67]:
50 == 2*25

True

In [68]:
3 < math.pi

True

In [69]:
1 == 1.0

True

In [70]:
1 != 0

True

In [71]:
1 <= 2

True

In [72]:
1 >= 1

True

Finally, note that you can also string multiple comparisons together, which can result in very intuitive tests:

In [73]:
hours = 5
0 < hours < 24

True

This would be equivalent to

In [74]:
hours = 5
(hours > 0) and (hours < 24)

True

If statements can have `elif` parts ("else if"), in addition to if/else parts. For example:

In [75]:
day = "Saturday"

if day == "Sunday":
    print("Sleep in.")
elif day == "Saturday":
    print("Do chores.")
else:
    print("Go to work.")

Do chores.


Of course we can combine if statements with for-loops, to make a snippet that is almost interesting:

In [76]:
for day in days_of_the_week:
    statement = "Today is " + day + "."
    print(statement)
    if day == "Sunday":
        print("   Sleep in.")
    elif day == "Saturday":
        print("   Do chores.")
    else:
        print("   Go to work.")

Today is Monday.
   Go to work.
Today is Tuesday.
   Go to work.
Today is Wednesday.
   Go to work.
Today is Thursday.
   Go to work.
Today is Friday.
   Go to work.
Today is Saturday.
   Do chores.
Today is Sunday.
   Sleep in.


# 4. Functions

Similar to variables, Python allows for user-defined functions that can be easily reused. In this course we will rely heavily on user-defined functions. 

User-defined functions are declared using the `def` keyword. The core of the function is coded inside the indented part of the function. The output of a function is declared by the `return` statement.

In [77]:
def proportion(a, b):
    """
    We can use triple quotation marks " to describe what our function does.
    Here for example: Calculates the proportion of a w.r.t. a+b.
    """
    p = a / (a + b)
    return p

# In R this code would have looked something like this
#
# proportion <- function(a,b) {
#     p <- a / (a + b)
#     return(p)
# }

We can call our function by passing it input values:

In [78]:
proportion(20, 21)

0.4878048780487805

In [79]:
proportion(3, 1)

0.75

We can read our description of the function by typing

In [80]:
proportion?

Signature: proportion(a, b)
Docstring:
We can use triple quotation marks " to describe what our function does.
Here for example: Calculates the proportion of a w.r.t. a+b.
File:      /tmp/ipykernel_3299342/3241175346.py
Type:      function

A function can also have multiple return values. They are returned as a tuple.

In [81]:
def plusminus(a, b):
    """This is the docstring of the function plusminus."""
    return a+b, a-b

c, d = plusminus(1, 2)
print(c, d)

3 -1


A function can have multiple `return` statements. As soon as a `return` statement is encountered for the first time, the function is terminated. If no `return` statement is encountered, the function output is `None`.

**IMPORTANT**: unlike $\mathsf{R}$ where an explicit `return` statement is not necessary, a Python function will not return anything unless you specifically mention what it needs to return. 

In [82]:
def signtest(a):
    if a > 0:
        return 'Positive'
    if a < 0:
        return 'Negative'
    
print(signtest(-1.2))

print(signtest(0)) # Function output is None, since no return statement is encountered.

Negative
None


Instead of positional arguments, we can also pass keyword arguments (using the `=` sign). For keyword arguments, the order does not matter. But positional arguments always need to precede keyword arguments.

In [83]:
proportion(b=2, a=3) # For keyword arguments the order does not matter.

0.6

It often happens that keyword arguments are used in the definition of a function. In that case they are used to specify default values for an argument.

In [84]:
def mypower(x, y=2):  # Positional (non-keyword) arguments always precede keyword arguments.
    return x**y 

print(mypower(3))
print(mypower(3, 2))
print(mypower(3, y=2))
print(mypower(y=2, x=3))

9
9
9
9


As a last example we will create a function to calculate the sum of the first $n$ integers
$$\sum_{k=1}^{n} k$$

In [85]:
def arithmetic_series(n):
    total = 0
    for k in range(1, n+1):
        total = total + k
    return(total)

print(arithmetic_series(9))

45


Note that we are using `range(1, n+1)` to get a list of numbers from $1$ (inclusive) to $n+1$ (exclusive). 


From other courses, we recognize this is the arithmetic series that satisfies the simple closed form expression
$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2}.$$
Let us verify this for some fixed value.

In [86]:
n = 9
expected_sum = n * (n+1) / 2
total = arithmetic_series(n)
total == expected_sum

True

There is a final shortcut in the form of the `sum` function. We combine this with a nice way to create lists by means of *list comprehensions*:

In [87]:
n = 9
first_n_integers = [k for k in range(1, n+1)]
print(first_n_integers)
print(sum(first_n_integers))

[1, 2, 3, 4, 5, 6, 7, 8, 9]
45
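
To make the connection with the earlier `for` loops explicit, the sketch below builds the same list twice: once with a loop and `append`, and once with a list comprehension. The names `squares_loop` and `squares_comp` are made up for this example.

```python
# Version 1: an explicit loop that appends to an empty list
squares_loop = []
for i in range(5):
    squares_loop.append(i**2)

# Version 2: the same list written as a list comprehension
squares_comp = [i**2 for i in range(5)]

print(squares_comp)                  # [0, 1, 4, 9, 16]
print(squares_loop == squares_comp)  # True
print(sum(range(1, 10)))             # 45, sum also accepts a range directly
```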


## Problem 3: Even-Odd Splitter (2 points)

Implement the function `split_even_odd` below that takes a list of integers and returns a tuple containing two lists:

- The first list contains all even numbers from the input.
- The second list contains all odd numbers from the input.

For example `split_even_odd([1, 2, 3, 4, 5])` should return `([2, 4], [1, 3, 5])`. Make sure the order of the numbers in the input list is preserved in the output.

You can check your function with the code in the cell below. Feel free to also include your own test cases in case you need to debug your solution. Again: **ONLY** replace the lines that say `# YOUR CODE HERE` and `raise NotImplementedError()` with your solution. 

**Hint:** You can use the modulo operator `%` to find the remainder after integer division. For example $17~ \% ~5 = 2$ since $17 = 3*5 + 2$.
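
The hint can be checked directly in Python. The sketch below also uses the integer division operator `//`, which gives the quotient that goes with the remainder. The variable names are only for illustration.

```python
quotient = 17 // 5     # integer division: how many times 5 fits into 17
remainder = 17 % 5     # modulo: what is left over

print(quotient, remainder)             # 3 2
print(quotient * 5 + remainder == 17)  # True

# In Python the remainder takes the sign of the divisor,
# so a negative odd number still leaves remainder 1 when divided by 2.
print(-7 % 2)  # 1
print(0 % 2)   # 0
```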

In [94]:
def split_even_odd(numbers):
    """
    Splits a list of integers into two lists: 
        - one containing the even numbers
        - one containing the odd numbers
    
    Parameters
    ----------
        numbers (list): List of integers
        
    Returns
    -------
        (tuple of (list, list)): A tuple (evens, odds) where evens contains all even numbers
        and odds contains all odd numbers, in the same order as input.
        
    """
    
    # YOUR CODE HERE
    evens = []
    odds = []
    for num in numbers:
        # 0 % 2 == 0, so zero is correctly treated as an even number.
        if num % 2 == 0:
            evens.append(num)
        else:
            odds.append(num)

    return evens, odds

In [95]:
# You can use this code cell to play around with your function to make sure
# it does what it is intended to do, i.e. to debug your code. 
3 %2

1

In [96]:
# Test cases
print(split_even_odd([1, 2, 3, 4, 5]))
print(split_even_odd([7]))

([2, 4], [1, 3, 5])
([], [7])


Expected output:
    
    ([2, 4], [1, 3, 5])
    ([], [7])

In [98]:
# AUTOGRADING
evens, odds = split_even_odd([1, 2, 3, 4, 5])
assert evens == [2, 4]
assert odds == [1, 3, 5]

evens, odds = split_even_odd([10, 11, 12, 13])
assert evens == [10, 12]
assert odds == [11, 13]

evens, odds = split_even_odd([])
assert evens == []
assert odds == []


## Problem 4: Tribonacci Numbers (2 points)

You might have heard of the Fibonacci sequence, which is a sequence that starts with 0 and 1, and then each successive entry is the sum of the previous two. Thus, the sequence goes 
$$0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89,...$$

Lesser known is the tribonacci sequence, which is a sequence that starts with 0, 0 and 1, and then each successive entry is the sum of the previous three. Thus the sequence goes
$$0, 0, 1, 1, 2, 4, 7, 13, 24, 44, 81, 149, 274, ...$$

Write a function called `tribonacci` that returns a **list** of the first $n$ tribonacci numbers. Again: replace the lines that say `# YOUR CODE HERE` and `raise NotImplementedError()` with your solution. 

In [138]:
def tribonacci(n):
    """
    Returns a list containing the first n tribonacci numbers. 
    
    Parameters
    ----------
        n (int): number of tribonacci numbers to be returned 
        
    Returns
    -------
        (list): list containing the first n tribonacci numbers
    
    """
    
    # YOUR CODE HERE
    seq = [0, 0, 1]
    # Extend the seed until it holds at least n entries.
    for i in range(n - 3):
        seq.append(sum(seq[-3:]))
    # Slicing handles n < 3 (including n = 0) without special cases.
    return seq[:n]

In [140]:
# You can use this code cell to play around with your function to make sure
# it does what it is intended to do, i.e. to debug your code. 
seq = [1,2,3,4,5,6]
seq[-4:]

[3, 4, 5, 6]

In [142]:
# Test cases
print(tribonacci(1))
print(tribonacci(2))
print(tribonacci(12))

[0]
[0, 0]
[0, 0, 1, 1, 2, 4, 7, 13, 24, 44, 81, 149]


Expected output: 

    [0]
    [0, 0]
    [0, 0, 1, 1, 2, 4, 7, 13, 24, 44, 81, 149]

In [143]:
# AUTOGRADING
assert tribonacci(12) == [0, 0, 1, 1, 2, 4, 7, 13, 24, 44, 81, 149]
assert tribonacci(1) == [0]


## Problem 5: Plagiarism Detection (3 points)

In this exercise we want to compute a metric that measures the distance between two text documents. This allows us to find out whether two documents are similar. This matters, for instance, to a search engine like Google that tries to determine how well a website matches your search query, or to YouTube and Netflix when they recommend the next video or show to watch based on your past activity. This distance will also allow us to detect duplicates (like Wikipedia mirrors) and cases of plagiarism (you have been warned).

The idea is to define distance in terms of shared words. We will think of a document D as a dictionary such that D[W] = # occurrences of word W. In other terms, we only keep track of how often each word occurs in the document. For example the documents 

    "The dog ate the homework."
and 

    "The cat ate the homework."
will be thought of as two dictionaries
    
    {"the": 2, "dog": 1, "ate": 1, "homework": 1}
    {"the": 2, "cat": 1, "ate": 1, "homework": 1}

Note that we make no distinction between upper and lower case words. To make the two documents comparable, we want to ensure that both dictionaries have the same keys, so

    d1 = {"the": 2, "dog": 1, "ate": 1, "homework": 1, "cat": 0}
    d2 = {"the": 2, "dog": 0, "ate": 1, "homework": 1, "cat": 1}

The distance measure between two documents that we will use is the angle between them. In this week's lecture we have learned that we can compute the angle between two vectors $\vec{x}$ and $\vec{y}$ as
$$\angle(\vec{x}, \vec{y}) = \cos^{-1}\left(\frac{\langle \vec{x}, \vec{y}\rangle}{\|\vec{x}\| \|\vec{y}\|}\right).$$
If we extract only the word frequencies of the two documents as lists, we can compute their angle:

    x = [2, 1, 1, 1, 0]
    y = [2, 0, 1, 1, 1]
    angle = arccos(6/7)
    
If the angle is zero radians, this means the two documents are identical (in terms of word counts), whereas an angle of $\frac{\pi}{2}$ radians means there are no common words. 
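
As a worked check of the numbers above, the sketch below computes the inner product, the two norms and the angle for the lists `x` and `y`. It is only an illustration on plain lists, not the required `dictionary_angle` function, and the helper names (`inner`, `norm_x`, `norm_y`, `cosine`) are made up for this example.

```python
import math

x = [2, 1, 1, 1, 0]
y = [2, 0, 1, 1, 1]

# Inner product: sum of the element-wise products
inner = 0
for i in range(len(x)):
    inner = inner + x[i] * y[i]

# Euclidean norms of both vectors
norm_x = math.sqrt(sum([v**2 for v in x]))
norm_y = math.sqrt(sum([v**2 for v in y]))

# Rounding errors can push the ratio slightly outside [-1, 1],
# which would make math.acos fail, so we clamp it first.
cosine = inner / (norm_x * norm_y)
cosine = min(1.0, max(-1.0, cosine))
angle = math.acos(cosine)

print(inner)  # 6
print(angle)  # roughly 0.54 radians, i.e. arccos(6/7)
```

The clamping step is a defensive choice: for two identical documents the ratio should be exactly 1, but floating-point arithmetic may return a value a tiny bit larger.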

Implement the following three functions below: `word_list`, which converts a string into a list containing each word (in lowercase) separately; `word_frequencies`, which converts two word lists into two dictionaries of word-count pairs; and `dictionary_angle`, which computes the angle between the count-values of two dictionaries. The exact instructions are described in their docstrings.

You may assume that the only types of punctuation that occur in the two documents are: `.,!?:;`. We also assume each document ends with a period. It could be useful to look back at the section on strings [here](#string_cell).
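
Since the set of punctuation symbols is fixed, one possible way to remove them all at once is a translation table. The following is only a sketch of that idea, assuming the six symbols listed above; the names `PUNCTUATION`, `table`, `sentence`, and `cleaned` are made up for illustration, and repeatedly calling `str.replace` works just as well.

```python
# Characters assumed to be the only punctuation in the documents.
PUNCTUATION = ".,!?:;"

# With three arguments, str.maketrans maps every character
# in the third argument to None, which deletes it.
table = str.maketrans("", "", PUNCTUATION)

sentence = "Wait: the dog, or the cat?"
cleaned = sentence.translate(table).lower()
print(cleaned)  # wait the dog or the cat
```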

In [2]:
def word_list(document):
    """
    Returns a list of lowercase words from a string, in the order
    in which they appear in the document. 
    That is: 
        1. remove the punctuation symbols, 
        2. convert every word to lowercase,
        3. split the resulting string into a list of individual words. 
    
    Parameters
    ----------
        document (str): The string that is converted into a list.
        
        
    Returns
    -------
        document_list (list): The list containing the lowercase words in document.
        
    Example
    -------
        document = "The dog ate the homework."
        word_list(document)
            ["the", "dog", "ate", "the", "homework"]
            
    """
    remove_characters = ".,!?:;"
    
    # YOUR CODE HERE
    for char in remove_characters:
        document = document.replace(char, "")

    document = document.lower()
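    # split() without an argument splits on any run of whitespace, so line
    # breaks and repeated spaces do not produce empty or malformed words
    document_list = document.split()
    
    return document_list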

In [None]:
# You can use this code cell to play around with your function to make sure
# it does what it is intended to do, i.e. to debug your code. 


In [3]:
# Test case
test_document = "The dog ate the homework. Or did the cat eat the homework?"
print(word_list(test_document))

['the', 'dog', 'ate', 'the', 'homework', 'or', 'did', 'the', 'cat', 'eat', 'the', 'homework']


Expected output:
    
    ['the', 'dog', 'ate', 'the', 'homework', 'or', 'did', 'the', 'cat', 'eat', 'the', 'homework']
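
The test document above contains only single spaces, but documents that span several lines or contain repeated spaces behave differently depending on how the string is split. The sketch below, using a made-up two-line string, compares `str.split(" ")` with `str.split()`; the variable names are for illustration only.

```python
# A made-up document with a line break followed by indentation.
text = "The dog ate\n    the homework."

cleaned = text.replace(".", "").lower()

by_space = cleaned.split(" ")    # splits at every single space character
by_whitespace = cleaned.split()  # splits at runs of any whitespace

print(by_space)       # contains 'ate\n' and empty strings
print(by_whitespace)  # ['the', 'dog', 'ate', 'the', 'homework']
```

Splitting on a single space keeps the newline attached to a word and yields empty strings between consecutive spaces, whereas `split()` without an argument returns only the words.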

In [4]:
def word_frequencies(word_list1, word_list2):
    """
    Returns two dictionaries whose keys are the words that occur in the union
    of the two wordlists, and whose corresponding value is the word count
    in their respective document.
    
    Parameters
    ----------
        word_list1 (list): A list containing lowercase words.
        word_list2 (list): A second list containing lowercase words.
        
    Returns
    -------
        dictionary1 (dict): Dictionary containing word-count pairs for word_list1,
                            whose keys are words that occur in word_list1 or word_list2.
        dictionary2 (dict): Dictionary containing word-count pairs for word_list2,
                            whose keys are words that occur in word_list1 or word_list2,
                            in the same order as dictionary1.
                            
    Example
    -------
        word_list1 = ["the", "dog", "ate", "the", "homework"]
        word_list2 = ["the", "cat", "ate", "the", "homework"]
        dict1, dict2 = word_frequencies(word_list1, word_list2)
            ({'the': 2, 'dog': 1, 'ate': 1, 'homework': 1, 'cat': 0},
             {'the': 2, 'dog': 0, 'ate': 1, 'homework': 1, 'cat': 1})
    
    """
    dictionary1, dictionary2 = {}, {}
    combined_word_list = word_list1 + word_list2
        
    for word in combined_word_list:
        dictionary1[word] = 0
        dictionary2[word] = 0
        
    # YOUR CODE HERE
    for word in word_list1:
        dictionary1[word] += 1

    for word in word_list2:
        dictionary2[word] += 1

    return dictionary1, dictionary2

In [5]:
# You can use this code cell to play around with your function to make sure
# it does what it is intended to do, i.e. to debug your code. 
["the", "dog", "ate", "the", "homework"] + ["the", "cat", "ate", "the", "homework"]

['the',
 'dog',
 'ate',
 'the',
 'homework',
 'the',
 'cat',
 'ate',
 'the',
 'homework']

In [6]:
# Test case
test_word_list1 = ["the", "dog", "ate", "the", "homework"]
test_word_list2 = ["the", "cat", "ate", "the", "homework"]
dict1, dict2 = word_frequencies(test_word_list1, test_word_list2)
print(dict1)
print(dict2)

{'the': 2, 'dog': 1, 'ate': 1, 'homework': 1, 'cat': 0}
{'the': 2, 'dog': 0, 'ate': 1, 'homework': 1, 'cat': 1}


Expected output:
    
    {'the': 2, 'dog': 1, 'ate': 1, 'homework': 1, 'cat': 0}
    {'the': 2, 'dog': 0, 'ate': 1, 'homework': 1, 'cat': 1}
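
For comparison, the same result can be obtained with `collections.Counter` from the standard library. This is an alternative sketch, not the intended solution of the exercise; the function name `word_frequencies_counter` is hypothetical.

```python
from collections import Counter

def word_frequencies_counter(word_list1, word_list2):
    """Hypothetical alternative to word_frequencies built on collections.Counter."""
    counts1, counts2 = Counter(word_list1), Counter(word_list2)
    # dict.fromkeys keeps the first occurrence of each word, so both
    # dictionaries get the same keys in the same order.
    vocabulary = dict.fromkeys(word_list1 + word_list2)
    # A Counter returns 0 for a missing key instead of raising a KeyError.
    dictionary1 = {word: counts1[word] for word in vocabulary}
    dictionary2 = {word: counts2[word] for word in vocabulary}
    return dictionary1, dictionary2

d1, d2 = word_frequencies_counter(
    ["the", "dog", "ate", "the", "homework"],
    ["the", "cat", "ate", "the", "homework"],
)
print(d1)  # {'the': 2, 'dog': 1, 'ate': 1, 'homework': 1, 'cat': 0}
print(d2)  # {'the': 2, 'dog': 0, 'ate': 1, 'homework': 1, 'cat': 1}
```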

In [10]:
import math

def dictionary_angle(dictionary1, dictionary2):
    """
    Returns the angle between two word-count dictionaries. 
    
    YOU DO NOT HAVE TO MAKE USE OF NUMPY ARRAYS TO COMPUTE INNER PRODUCTS OR LENGTHS OF VECTORS,
    THIS WILL BE COVERED IN A LATER ASSIGNMENT

    Hint: Use math.acos for the inverse cosine. 
    
    Parameters
    ----------
        dictionary1 (dict): First input dictionary containing word-count pairs
        dictionary2 (dict): Second input dictionary containing word-count pairs,
                            whose keys (the words) are the same as dictionary1,
                            and also in the same order.
    
    Returns
    -------
        angle (float): The angle between the value vectors (the word counts).
        
    Example
    -------
        dictionary1 = {'the': 2, 'dog': 1, 'ate': 1, 'homework': 1, 'cat': 0}
        dictionary2 = {'the': 2, 'dog': 0, 'ate': 1, 'homework': 1, 'cat': 1}
        dictionary_angle(dictionary1, dictionary2)
            0.541099525957146
    """
    
    # YOUR CODE HERE
    numerator = 0
    for word in dictionary1:
        numerator += dictionary1[word] * dictionary2[word]

    sum1, sum2 = 0, 0
    for word in dictionary1:
        sum1 += dictionary1[word]**2
        sum2 += dictionary2[word]**2

    length1, length2 = math.sqrt(sum1), math.sqrt(sum2)
    denominator = length1 * length2

    return math.acos(numerator / denominator)

In [None]:
# You can use this code cell to play around with your function to make sure
# it does what it is intended to do, i.e. to debug your code. 


In [8]:
# Test case
test_dictionary1 = {'the': 2, 'dog': 1, 'ate': 1, 'homework': 1, 'cat': 0}
test_dictionary2 = {'the': 2, 'dog': 0, 'ate': 1, 'homework': 1, 'cat': 1}
dictionary_angle(test_dictionary1, test_dictionary2)

0.541099525957146

Expected output:

    0.541099525957146

### Case study for autograding (NOT AN EXERCISE)

Here are two excerpts from speeches of a similar nature: one that Michelle Obama gave in 2008 and one that Melania Trump gave eight years later. They serve as a nice test case for autograding. You can try running these cells to see if your functions work correctly. The first cell is used for grading your function `word_list`, the second for `word_frequencies` and the third for `dictionary_angle`.

In [11]:
# AUTOGRADING

Obama = """And Barack and I were raised with so many of the same values:
         that you work hard for what you want in life; that your word 
         is your bond and you do what you say you're going to do; that 
         you treat people with dignity and respect, even if you don't know 
         them, and even if you don't agree with them. And Barack and I set 
         out to build lives guided by these values, and to pass them on to 
         the next generation. Because we want our children and all children 
         in this nation to know that the only limit to the height of your 
         achievements is the reach of your dreams and your willingness to 
         work for them."""

Trump = """From a young age, my parents impressed on me the values that you 
         work hard for what you want in life, that your word is your bond and 
         you do what you say and keep your promise, that you treat people 
         with respect. They taught and showed me values and morals in their 
         daily lives. That is a lesson that I continue to pass along to our 
         son. And we need to pass those lessons on to the many generations to 
         follow. Because we want our children in this nation to know that the 
         only limit to your achievements is the strength of your dreams and 
         your willingness to work for them."""

list_Obama = word_list(Obama)
list_Trump = word_list(Trump)

assert list_Obama[20] == 'want'
assert list_Trump[20] == 'life'



In [12]:
# AUTOGRADING

list_Obama = ['and', 'barack', 'and', 'i', 'were', 'raised', 'with', 'so', 'many', 'of', 'the', 'same', 'values', 
              'that', 'you', 'work', 'hard', 'for', 'what', 'you', 'want', 'in', 'life', 'that', 'your', 'word', 
              'is', 'your', 'bond', 'and', 'you', 'do', 'what', 'you', 'say', "you're", 'going', 'to', 'do', 'that',
              'you', 'treat', 'people', 'with', 'dignity', 'and', 'respect', 'even', 'if', 'you', "don't", 'know', 
              'them', 'and', 'even', 'if', 'you', "don't", 'agree', 'with', 'them', 'and', 'barack', 'and', 'i', 'set', 
              'out','to', 'build', 'lives','guided', 'by', 'these', 'values', 'and', 'to', 'pass', 'them', 'on', 'to', 
              'the', 'next', 'generation', 'because', 'we', 'want', 'our', 'children', 'and', 'all', 'children', 
              'in', 'this', 'nation', 'to', 'know', 'that', 'the', 'only', 'limit', 'to', 'the', 'height', 'of', 'your', 
              'achievements', 'is', 'the', 'reach', 'of', 'your', 'dreams', 'and', 'your', 'willingness', 'to', 
              'work', 'for', 'them']

list_Trump = ['from', 'a', 'young', 'age', 'my', 'parents', 'impressed', 'on', 'me', 'the', 'values', 'that', 'you',
              'work', 'hard', 'for', 'what', 'you', 'want', 'in', 'life', 'that', 'your', 'word', 'is', 'your', 'bond',
              'and', 'you', 'do', 'what', 'you', 'say', 'and', 'keep', 'your', 'promise', 'that', 'you', 'treat', 'people',
              'with', 'respect', 'they', 'taught', 'and', 'showed', 'me', 'values', 'and', 'morals', 'in', 'their', 
              'daily', 'lives', 'that', 'is', 'a', 'lesson', 'that', 'i', 'continue', 'to', 'pass', 'along', 'to', 'our',
              'son', 'and', 'we', 'need', 'to', 'pass', 'those', 'lessons', 'on', 'to', 'the', 'many', 'generations', 'to', 
              'follow', 'because', 'we', 'want', 'our', 'children', 'in', 'this', 'nation', 'to', 'know', 'that', 'the',
              'only', 'limit', 'to', 'your', 'achievements', 'is', 'the', 'strength', 'of', 'your', 'dreams', 'and', 
              'your', 'willingness', 'to', 'work', 'for', 'them']

dict_Obama, dict_Trump = word_frequencies(list_Obama, list_Trump)

assert dict_Obama['you'] == 7
assert dict_Trump['to'] == 8


In [13]:
# AUTOGRADING
dict_Obama = {'and': 10, 'barack': 2, 'i': 2, 'were': 1, 'raised': 1, 'with': 3, 'so': 1, 'many': 1, 
              'of': 3, 'the': 5, 'same': 1, 'values': 2, 'that': 4, 'you': 7, 'work': 2, 'hard': 1, 
              'for': 2, 'what': 2, 'want': 2, 'in': 2, 'life': 1, 'your': 5, 'word': 1, 'is': 2, 
              'bond': 1, 'do': 2, 'say': 1, "you're": 1, 'going': 1, 'to': 7, 'treat': 1, 'people': 1, 
              'dignity': 1, 'respect': 1, 'even': 2, 'if': 2, "don't": 2, 'know': 2, 'them': 4, 'agree': 1, 
              'set': 1, 'out': 1, 'build': 1, 'lives': 1, 'guided': 1, 'by': 1, 'these': 1, 'pass': 1, 
              'on': 1, 'next': 1, 'generation': 1, 'because': 1, 'we': 1, 'our': 1, 'children': 2, 
              'all': 1, 'this': 1, 'nation': 1, 'only': 1, 'limit': 1, 'height': 1, 'achievements': 1, 
              'reach': 1, 'dreams': 1, 'willingness': 1, 'from': 0, 'a': 0, 'young': 0, 'age': 0, 'my': 0, 
              'parents': 0, 'impressed': 0, 'me': 0, 'keep': 0, 'promise': 0, 'they': 0, 'taught': 0, 
              'showed': 0, 'morals': 0, 'their': 0, 'daily': 0, 'lesson': 0, 'continue': 0, 'along': 0, 
              'son': 0, 'need': 0, 'those': 0, 'lessons': 0, 'generations': 0, 'follow': 0, 'strength': 0} 

dict_Trump = {'and': 6, 'barack': 0, 'i': 1, 'were': 0, 'raised': 0, 'with': 1, 'so': 0, 'many': 1, 'of': 1, 
              'the': 4, 'same': 0, 'values': 2, 'that': 6, 'you': 5, 'work': 2, 'hard': 1, 'for': 2, 'what': 2, 
              'want': 2, 'in': 3, 'life': 1, 'your': 6, 'word': 1, 'is': 3, 'bond': 1, 'do': 1, 'say': 1, 
              "you're": 0, 'going': 0, 'to': 8, 'treat': 1, 'people': 1, 'dignity': 0, 'respect': 1, 'even': 0, 
              'if': 0, "don't": 0, 'know': 1, 'them': 1, 'agree': 0, 'set': 0, 'out': 0, 'build': 0, 'lives': 1, 
              'guided': 0, 'by': 0, 'these': 0, 'pass': 2, 'on': 2, 'next': 0, 'generation': 0, 'because': 1, 
              'we': 2, 'our': 2, 'children': 1, 'all': 0, 'this': 1, 'nation': 1, 'only': 1, 'limit': 1, 
              'height': 0, 'achievements': 1, 'reach': 0, 'dreams': 1, 'willingness': 1, 'from': 1, 'a': 2, 
              'young': 1, 'age': 1, 'my': 1, 'parents': 1, 'impressed': 1, 'me': 2, 'keep': 1, 'promise': 1, 
              'they': 1, 'taught': 1, 'showed': 1, 'morals': 1, 'their': 1, 'daily': 1, 'lesson': 1, 
              'continue': 1, 'along': 1, 'son': 1, 'need': 1, 'those': 1, 'lessons': 1, 'generations': 1, 
              'follow': 1, 'strength': 1} 

angle1 = dictionary_angle(dict_Obama, dict_Trump)
angle2 = dictionary_angle(dict_Trump, dict_Obama)

assert math.isclose(angle1, angle2)
assert math.isclose(angle1, 0.578729546285134)
