In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab01.ipynb")

# Run the cell below

To run a code cell (i.e., execute the Python code inside it), click the play button on the ribbon underneath the name of the notebook. Before you begin, run the cell by clicking the "Run cell" button at the top (it looks like ▶|) or by pressing `Shift` + `Return`.

## Lab 01: Python Basics

Welcome to the first lab of Advanced Topics in Data Science! Throughout the course you will complete assignments like this one. You can't learn technical subjects without hands-on practice, so these assignments are an important part of the course.

**Collaboration Policy:**

Collaborating on labs is more than okay -- it's encouraged! You should rarely remain stuck for more than a few minutes on questions in labs, so ask a neighbor or an instructor for help. Explaining things is beneficial, too -- the best way to solidify your knowledge of a subject is to explain it. You should **not** _just_ copy/paste someone else's code, but rather work together to gain understanding of the task you need to complete. 

**Due Date:** 

## Today's Assignment 

In today's assignment, you'll review:

- the basics of Python

- `NumPy`

- creating collection arrays

- writing `if` statements and `for` loops.

First, set up the imports by running the cell below.

In [None]:
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")

## 1. Magic Commands

`%matplotlib inline` is a Jupyter magic command that configures the notebook so that Matplotlib displays any plots that you draw directly in the notebook rather than to a file, allowing you to view the plots upon executing your code.

Another useful magic command is `%%time`, which times the execution of that cell. You can use this by writing it as the first line of a cell.

**Note:** `%%` is used for cell magic commands that apply to the entire cell, whereas `%` is used for line magic commands that only apply to a single line.
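
Magic commands only exist inside IPython and Jupyter. As a rough plain-Python analogue of what `%%time` reports, the sketch below measures wall-clock time with `time.perf_counter` from the standard library. The names `start`, `timed_list`, and `elapsed_seconds` are illustrative assumptions and are not part of the lab.

```python
import time

# Record a starting timestamp (in seconds) from a high-resolution clock
start = time.perf_counter()

# The work being timed: build a list of the values 0 to 99
timed_list = []
for i in range(100):
    timed_list.append(i)

# Elapsed wall-clock time in seconds
elapsed_seconds = time.perf_counter() - start
print(f"Elapsed: {elapsed_seconds:.6f} s")
```

The exact number printed will differ from run to run and from machine to machine.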

Run the cell below.

In [None]:
%%time

# Create an empty list
my_list = []

# for loop to iterate through the values 0 to 99
for i in range(100):
    my_list.append(i) # The .append method will add an item to the end of a list

The execution time of the previous cell is reported in $\mu$s (i.e., microseconds).

**Question 1.** How many seconds are in 1 microsecond? Assign your answer to `micro_second`.

In [None]:
micro_second = ...
micro_second

In [None]:
grader.check("q1")

## 2. Keyboard Shortcuts

Even if you are familiar with Jupyter, we strongly encourage you to become proficient with keyboard shortcuts (this will save you time in the future). To learn about keyboard shortcuts, go to **Help $\to$ Keyboard Shortcuts** in the menu above.

Here are a few that we like:

1. Ctrl + Return: Evaluate the current cell

1. Shift + Return: Evaluate the current cell and move to the next

1. ESC: command mode (may need to press before using any of the commands below)
 -  a: create a cell above
 -  b: create a cell below
 - dd: delete a cell
 -  z: undo the last cell operation
 -  m: convert a cell to markdown
 -  y: convert a cell to code

## 3. Prerequisites

It's time to answer some review questions. Each question has a response cell directly below it. Most response cells are followed by a test cell that runs automated tests to check your work. Please don't delete questions, response cells, or test cells. You won't get credit for your work if you do.

If you have extra content in a response cell, such as an example call to a function you're implementing, that's fine.

### Python

Python is the main programming language we'll use in the course. We expect that you've taken Foundations of Data Science or an equivalent class, so we will not be covering general Python syntax. If any of the below exercises are challenging (or if you would like to refresh your Python knowledge), please review one or more of the following materials.

- [Python Tutorial: An Introduction to Python from the Creators of Python.](https://docs.python.org/3/tutorial/index.html)

- [Advanced Crash Course in `NumPy`](http://cs231n.github.io/python-numpy-tutorial/)

**Question 1.** Write a function named `summation` that evaluates the following summation 

$$\sum_{i=1}^n \left(i^3+3i^2\right)$$

for $n \geq 1$. Add a check in the function to make sure that $n$ is an integer greater than or equal to 1. If $n$ fails this check, print the message

```
n must be an integer and greater than or equal to 1.
```

and do not return a value.

**Note:** You should test your function with multiple values.

In [None]:
def summation(n):
    """Compute the summation i^3 + 3 * i^2 for 1 <= i <= n."""
    ...

In [None]:
grader.check("q1")

### Lists 

* Lists are used to store multiple items in a single variable. 

* Lists are a built-in data type in Python used to store collections of data.

* Lists may contain items of different types.

* List items are ordered, changeable, and allow duplicate values.

* List items are indexed: the first item has index `[0]`, the second item has index `[1]`, and so on.

* Lists are created using square brackets. For example, `my_list = [2, 3, 5, 7, 11]` is a list of the first 5 prime numbers.
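
The properties listed above can be seen in a short sketch. The list `mixed` and its values are made up purely for illustration.

```python
# A list may mix types and contain duplicate values
mixed = [2, "three", 5.0, 2]

# Items are indexed starting at 0
first_item = mixed[0]
second_item = mixed[1]

# Lists are changeable (mutable): replace an item in place
mixed[1] = 3

# The .append method adds an item to the end of the list
mixed.append(7)

print(first_item, second_item)  # 2 three
print(mixed)                    # [2, 3, 5.0, 2, 7]
```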

**Question 2.** Write a function named `list_sum` that computes the square of each value in `list_1`, the cube of each value in `list_2`, then returns a list containing the element-wise sum of these results. 

**Note:** Assume that `list_1` and `list_2` have the same number of elements.

In [None]:
def list_sum(list_1, list_2):
    """Assume list_1 and list_2 have the same length.
       Compute x^2 + y^3 for each x in list_1, and each y in list_2. 
       Return a list containing the element-wise sum of the results."""
    ...

l1 = [1, 2, 3]
l2 = [4, 5, 6]
list_sum(l1, l2)

In [None]:
grader.check("q2")

**Question 3.** Write a function named `average_of_all` that takes a number $n$ and returns the average of the natural numbers less than or equal to $n$. For example, `average_of_all(3)` would return 2.0 because

$$1 + 2 + 3 = 6$$

and 

$$\frac{6}{3} = 2$$

**Hint:** You will need a check to make sure the value passed to the function is greater than 0. If the value passed to the function is not greater than 0 return 0.0. Otherwise assume that all values passed to the function are positive integers.

In [None]:
def average_of_all(n):
    """Return the average of all natural numbers less than or equal to n.
       >>> average_of_all(1)
       1.0
       >>> average_of_all(3)
       2.0
       >>> average_of_all(8)
       4.5
       >>> average_of_all(0)
       0.0
       >>> average_of_all(-2)
       0.0"""
    ...

average_of_all(3)

In [None]:
grader.check("q3")

### NumPy

`NumPy` is the numerical computing module introduced in Foundations of Data Science, which is a prerequisite for this course. Here's a quick recap of `NumPy`.

The core of `NumPy` is the array. Like Python lists, arrays store data; however, they store data in a more efficient manner. In many cases, this allows for faster computation and data manipulation.

In Foundations of Data Science, we used `make_array` from the `datascience` library, but that's not the most typical way. Instead, [use `np.array` to create an array](https://numpy.org/doc/stable/reference/generated/numpy.array.html). It takes a sequence, such as a list or range.
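
As a minimal sketch of `np.array` accepting both kinds of sequence, the example below builds one array from a list and one from a range. The names `from_list` and `from_range` and their values are illustrative only.

```python
import numpy as np

# Build an array from a Python list
from_list = np.array([10, 20, 30])

# Build an array from a range (the values 0, 1, 2)
from_range = np.array(range(3))

print(from_list)   # [10 20 30]
print(from_range)  # [0 1 2]
```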

**Question 4.** Below, import the `NumPy` module using the appropriate alias. Then create an array named `arr_5` containing the values 5, 4, 3, 2, 1 (in that order).

In [None]:
...
arr_5

In [None]:
grader.check("q4")

**Question 5.** Create an array named `arr_500` containing the values 500, 499, 498, 497, ..., 1 (in that order).

In [None]:
arr_500 = ...

In [None]:
grader.check("q5")

We can access attributes of an array such as shape and data type. A full list of attributes can be found [here](https://docs.scipy.org/doc/numpy-1.15.0/reference/arrays.ndarray.html#array-attributes).
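
To show a few of these attributes on something other than a one-dimensional array, here is a small sketch using a made-up two-dimensional array named `grid`. The integer data type it reports depends on your platform, so no specific type name is shown.

```python
import numpy as np

# A made-up array with 2 rows and 3 columns
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.shape)  # (2, 3): rows, then columns
print(grid.ndim)   # 2: the number of dimensions
print(grid.size)   # 6: the total number of elements
print(grid.dtype)  # an integer type; the exact name is platform dependent
```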

In [None]:
# The fourth item in the arr_5 array
arr_5[3]

In [None]:
# The items at indices 2 and 3 (the third and fourth items); the stop index 4 is excluded
arr_5[2:4]

In [None]:
# The shape attribute holds the array's dimensions as a tuple
# arr_5 is a one-dimensional array with 5 elements
arr_5.shape

In [None]:
# The data type for all the elements is numerical (integer)
arr_5.dtype

Arrays, unlike Python lists, cannot store items of different data types.

In [None]:
# A regular Python list can store items of different data types
[1, '3']

In [None]:
# Arrays will convert everything to the same data type
np.array([1, '3'])

Notice that the 1 was converted to a string and is now displayed between quotation marks.

In [None]:
# Another example of array type conversion
np.array([5, 8.3])

Notice that the 5 was changed from an integer to a decimal (float).
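
One way to confirm these conversions is to inspect the `dtype` of each result, as in the sketch below. The variable names `mixed_text` and `mixed_numbers` are illustrative assumptions.

```python
import numpy as np

# Mixing an integer and a string yields an array of strings
mixed_text = np.array([1, '3'])
print(mixed_text.dtype.kind)  # U (a Unicode string type)

# Mixing an integer and a float yields an array of floats
mixed_numbers = np.array([5, 8.3])
print(mixed_numbers.dtype)    # float64
```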

Arrays are also useful in performing **vectorized** operations. Given two or more arrays of equal length, arithmetic will perform element-wise computations across the arrays.
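
As a small sketch of element-wise arithmetic, the example below applies one formula to every element of an array and then multiplies two equal-length arrays. The temperatures, prices, and quantities are invented for illustration.

```python
import numpy as np

# Arithmetic with a scalar is applied to every element
celsius = np.array([0.0, 25.0, 100.0])
fahrenheit = celsius * 9 / 5 + 32
print(fahrenheit)  # [ 32.  77. 212.]

# Arithmetic between two equal-length arrays pairs up their elements
prices = np.array([1.5, 2.0, 4.0])
quantities = np.array([2, 3, 1])
totals = prices * quantities
print(totals)      # [3. 6. 4.]
```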

For example, observe the following:

In [None]:
# Python list addition will concatenate the two lists
[1, 2, 3] + [4, 5, 6]

In [None]:
# NumPy array addition will add them element-wise
np.array([1, 2, 3]) + np.array([4, 5, 6])

**Question 6.** Given the array `random_arr`, assign `valid_values` to an array containing all values $x$ such that

$$2x^4>1.$$

In [None]:
# np.random.seed() sets a seed so that 
# the results are the same each time.
# Do not delete this line. 
np.random.seed(42)

# The line below creates an array
# of 60 random values from the uniform
# distribution on [0, 1).
random_arr = np.random.rand(60)

valid_values = ...
valid_values

In [None]:
grader.check("q6")

**Question 7.** Use `NumPy` to recreate your answer to **Question 2.** The input parameters will both be lists, so you will need to convert the lists into arrays before performing your operations. Additionally, you will need to make sure that both lists have the same number of items. Assume that both lists are numeric data types.

**Hint:** You can use the `np.array` function to create a `NumPy` array.

In [None]:
def array_sum(list_1, list_2):
    """Assume list_1 and list_2 have the same length.
       Compute x^2 + y^3 for each x in list_1, and each y in list_2. 
       Return a list containing the element-wise sum of the results."""
    ...

l1 = [1, 2, 3]
l2 = [4, 5, 6]
array_sum(l1, l2)

In [None]:
grader.check("q7")

You might have been told that Python is slow, but array arithmetic is carried out very fast, even for large arrays. For ten numbers, `list_sum` and `array_sum` both take a similar amount of time.

**Note:** The code cell below uses a list comprehension to create a list of 10 elements starting at 0 and ending at 9. For more information on list comprehensions and when to use them, click [here](https://realpython.com/list-comprehension-python/).
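
For readers new to the syntax, here is a minimal sketch showing that a list comprehension is a compact form of a `for` loop with `.append`. The names `squares_loop`, `squares_comp`, and `multiples_of_three` are illustrative.

```python
# Building a list of squares with a for loop
squares_loop = []
for x in range(5):
    squares_loop.append(x ** 2)

# The same list built with a list comprehension
squares_comp = [x ** 2 for x in range(5)]
print(squares_comp)        # [0, 1, 4, 9, 16]

# A comprehension can also filter items with an if clause
multiples_of_three = [x for x in range(10) if x % 3 == 0]
print(multiples_of_three)  # [0, 3, 6, 9]
```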

In [None]:
# Here we use list comprehension to create a list
# of 10 elements starting at 0 and ending at 9.
sample_list_1 = [x for x in range(10)] 

# Here we use a NumPy function, np.arange(10)
# to create an array of 10 elements starting 
# at 0 and ending at 9.
sample_array_1 = np.arange(10)

In [None]:
%%time
list_sum(sample_list_1, sample_list_1)

In [None]:
%%time
array_sum(sample_array_1, sample_array_1)

The time difference seems negligible for a list/array of size 10; depending on your setup, you may even observe that `list_sum` executes faster than `array_sum`. However, we will commonly be working with much larger datasets:

In [None]:
# Here we use list comprehension to create a list
# of 100000 elements starting at 0 and ending at 99999.
sample_list_2 = [x for x in range(100000)]

# Here we use a NumPy function, np.arange(100000)
# to create an array of 100000 elements starting 
# at 0 and ending at 99999.
sample_array_2 = np.arange(100000)

In [None]:
%%time
# The trailing semicolon hides the output
list_sum(sample_list_2, sample_list_2);

In [None]:
%%time
# The trailing semicolon hides the output
array_sum(sample_array_2, sample_array_2);

With the larger dataset, we see that using `NumPy` results in code that executes over 50 times faster. Throughout this course (and in the real world), you will find that writing efficient code will be important; arrays and vectorized operations are the most common way of making Python programs run quickly.
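
If you want to repeat this kind of comparison outside a notebook, one option is the standard-library `timeit` module. The sketch below is self-contained and does not use your `list_sum` or `array_sum`; the helper functions `double_with_loop` and `double_with_numpy` are hypothetical stand-ins, and the measured times will vary by machine.

```python
import timeit

import numpy as np

values_list = list(range(100_000))
values_array = np.arange(100_000)

def double_with_loop(values):
    """Double each value using a Python-level loop."""
    return [2 * v for v in values]

def double_with_numpy(values):
    """Double each value using one vectorized operation."""
    return 2 * values

# Run each version 10 times and record the total time in seconds
loop_seconds = timeit.timeit(lambda: double_with_loop(values_list), number=10)
numpy_seconds = timeit.timeit(lambda: double_with_numpy(values_array), number=10)

print(f"Python loop: {loop_seconds:.4f} s")
print(f"NumPy:       {numpy_seconds:.4f} s")
```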

**Question 8.** Create a list of the first 100 even natural numbers using a `for` loop and the `.append()` method. Save the list to `even_list`.

**Hint:** You must use a for loop to earn all the points for this question.

In [None]:
# Initialize an empty list
even_list = ...

...
even_list[0:10]

In [None]:
grader.check("q8")

**Question 9.** Create a list of the first 100 odd natural numbers using a list comprehension. Save the list to `odd_list`.

**Hint:** You must use list comprehension to earn all the points for this question.

In [None]:
odd_list = ...
odd_list[0:10]

In [None]:
grader.check("q9")

### Dictionary

A Python dictionary is a mutable object, and it contains the data in the form of key-value pairs. Each key is separated from its value by a colon (:). For example, a dictionary can be created like this 

`my_dict = {'key1': value, 'key2': value, 'key3': value}`


Dictionaries are a widely used data structure, and a good understanding of their methods and operations is necessary for doing data science.
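
A few of the basic operations are sketched below on a made-up dictionary named `capitals`; the keys and values are chosen only for illustration.

```python
# Create a dictionary of key-value pairs
capitals = {"France": "Paris", "Japan": "Tokyo"}

# Look up a value by its key
print(capitals["Japan"])     # Tokyo

# Dictionaries are mutable: assigning to a new key adds a pair
capitals["Kenya"] = "Nairobi"

# Check whether a key is present
print("France" in capitals)  # True

# Iterate over the key-value pairs
for country, capital in capitals.items():
    print(country, "->", capital)
```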

**Question 10.** Below are two lists. Using a `for` loop and the dictionary `update()` method, convert them into a dictionary in which each item from `keys_1` is a key and the corresponding item from `values_1` is its value. Save it to `d`.

`keys_1 = ['Ten', 'Twenty', 'Thirty']`

`values_1 = [10, 20, 30]`

**Note:** Click [here](https://www.w3schools.com/python/ref_dictionary_update.asp) to read about how to use the `update` method.

In [None]:
keys_1 = ['Ten', 'Twenty', 'Thirty']
values_1 = [10, 20, 30]

# Initialize an empty dictionary
d = ...

...
d

In [None]:
grader.check("q10")

**Question 11.** Create a Python dictionary whose keys are the positive integers from 1 to 26 and whose values are tuples containing the uppercase and lowercase forms of the corresponding letter of the English alphabet. For example,

```
{1:('A', 'a'), 2:('B', 'b'), ..., 26: ('Z', 'z')}
```

Name the dictionary `alphabets`.

**Hint:** The [`string` module](https://docs.python.org/3/library/string.html) provides several English-centric values. Remember that to use the functions, methods, etc. from a module, you need to import the module into your notebook environment.
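
As a brief sketch of importing a module and reading the constants it provides, the example below looks at three values from the `string` module. How you combine such values into the dictionary is left to you.

```python
import string

# Constants defined by the string module
print(string.digits)                # 0123456789
print(string.ascii_lowercase[:3])   # abc
print(len(string.ascii_uppercase))  # 26
```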


In [None]:
...
print(alphabets)

In [None]:
grader.check("q11")

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(run_tests=True)