In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab01.ipynb")

# Lab 01: Python Basics

Welcome to the first lab of Advanced Data Science! This lab is meant to help you familiarize yourself with the basics of Python. In this lab you will review `NumPy`, create collections and arrays, practice writing `if` statements and `for` loops, and make a plot using `matplotlib`, a Python visualization library.

To receive credit for a lab, answer all questions correctly and submit before the deadline.

**Due Date:**

**Collaboration Policy:** Data science is a collaborative activity. While you may talk with others about the labs, we ask that you **write your solutions individually**. If you do discuss the assignments with others **please include their names below** (it's a good way to learn your classmates' names).

**Collaborators:** 

List collaborators here.

# 1. Importing Libraries and Magic Commands

In Advanced Data Science, we will be using common Python libraries to help us process data. By convention, we import all libraries at the very top of the notebook. There is also a set of standard aliases that are used to shorten the library names. Below are some of the libraries that you may encounter throughout the course, along with their respective aliases.

## Importing Libraries

Run the cell below, but please **do not** change it.

In [None]:
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")

## Magic Commands

`%matplotlib inline` is a Jupyter magic command that configures the notebook so that Matplotlib displays any plots that you draw directly in the notebook rather than to a file, allowing you to view the plots upon executing your code.

Another useful magic command is `%%time`, which times the execution of that cell. You can use this by writing it as the first line of a cell.

**Note:** `%%` is used for cell magic commands that apply to the entire cell, whereas `%` is used for line magic commands that only apply to a single line.

Run the cell below.

In [None]:
%%time

# Create an empty list
my_list = []

# for loop to iterate through the values 0 to 99
for i in range(100):
    my_list.append(i) # The .append method will add an item to the end of a list

The cell above uses the `.append()` method to add an item to a list. 

The time for executing the previous cell is given in $\mu$s (i.e. microseconds).
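Outside a notebook, where `%%time` is not available, a similar wall-clock measurement can be sketched with `time.perf_counter` from the standard library. This only approximates what the magic command reports, and the variable names `start` and `elapsed_seconds` are illustrative.

```python
import time

# Read a high-resolution clock (in seconds) before the work starts
start = time.perf_counter()

# The same loop as in the timed cell above
my_list = []
for i in range(100):
    my_list.append(i)

# The difference between two clock readings is the elapsed wall time
elapsed_seconds = time.perf_counter() - start
print(f"Elapsed wall time: {elapsed_seconds:.6f} seconds")
```

The value printed will vary from run to run and from machine to machine.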

**Question 1.** How many seconds are in 1 microsecond? Assign your answer to `micro_second`.


In [None]:
micro_second = ...
micro_second

In [None]:
grader.check("q1")

# 2. Keyboard Shortcuts

Even if you are familiar with Jupyter, we strongly encourage you to become proficient with keyboard shortcuts (this will save you time in the future). To learn about keyboard shortcuts, go to **Help $\to$ Keyboard Shortcuts** in the menu above.

Here are a few that we like:

1. Ctrl + Return: Evaluate the current cell
2. Shift + Return: Evaluate the current cell and move to the next
3. ESC: command mode (may need to press before using any of the commands below)
 -  a: create a cell above
 -  b: create a cell below
 - dd: delete a cell
 -  z: undo the last cell operation
 -  m: convert a cell to markdown
 -  y: convert a cell to code
 
 

# 3. Prerequisites

It's time to answer some review questions. Each question has a response cell directly below it. Most response cells are followed by a test cell that runs automated tests to check your work. Please don't delete questions, response cells, or test cells. You won't get credit for your work if you do.

If you have extra content in a response cell, such as an example call to a function you're implementing, that's fine.

To receive full credit on this assignment, you must pass all test cases by the deadline. Almost all test cases are **public** for labs.

## Python

Python is the main programming language we'll use in the course. We expect that you've taken Introduction to Data Science or an equivalent class, so we will not be covering general Python syntax. If any of the below exercises are challenging (or if you would like to refresh your Python knowledge), please review one or more of the following materials.

- [Python Tutorial:](https://docs.python.org/3.5/tutorial/) Introduction to Python from the creators of Python.
- [Composing Programs Chapter 1:](http://composingprograms.com/pages/11-getting-started.html) This is more of an introduction to programming with Python.
- [Advanced Crash Course in `NumPy`:](http://cs231n.github.io/python-numpy-tutorial/) A fast crash course which assumes some programming background.

**Question 2.** Write a function named `summation` that evaluates the following summation 

$$\sum_{i=1}^n \left(i^3+3i^2\right)$$

for $n \geq 1$.
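As a quick sanity check on the formula, taking $n = 2$ and adding both terms for each value of $i$ gives

$$\sum_{i=1}^{2} \left(i^3 + 3i^2\right) = (1 + 3) + (8 + 12) = 24.$$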


In [None]:
def summation(n):
    """Compute the summation i^3 + 3 * i^2 for 1 <= i <= n.
    """
    total = ...
    return ...

In [None]:
grader.check("q2")

## Lists 

* Lists are used to store multiple items in a single variable. 

* Lists are a built-in data type in Python used to store collections of data.

* Lists may contain items of different types.

* List items are ordered, changeable, and allow duplicate values. 

* List items are indexed: the first item has index `[0]`, the second item has index `[1]`, and so on.

* Lists are created using square brackets. For example, `my_list = [2, 3, 5, 7, 11]` is a list of the first 5 prime numbers.
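The properties listed above can be seen in a short sketch. The list name `mixed` and its contents are made up for illustration.

```python
# A list may hold different types and duplicate values
mixed = [2, "three", 5.0, 2]

# Indexing starts at 0, so this is the first item
first_item = mixed[0]

# Lists are changeable: replace an item, then add one to the end
mixed[1] = 3
mixed.append(7)

print(first_item)
print(mixed)
```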

**Question 3.** Write a function named `list_sum` that computes the square of each value in `list_1`, the cube of each value in `list_2`, then returns a list containing the element-wise sum of these results. 

**Note:** Assume that `list_1` and `list_2` have the same number of elements.


In [None]:
def list_sum(list_1, list_2):
    """Compute x^2 + y^3 for each x in list_1 and the corresponding y in list_2.
       Assume list_1 and list_2 have the same number of elements.
    """
    ...
    
    for ...
    
    return ...

In [None]:
grader.check("q3")

**Question 4.** Write a function named `average_of_all` that takes a non-negative integer `n` and returns the average of all integers from 1 to `n`. For example, `average_of_all(3)` would return 2 because 

$$1 + 2 + 3 = 6$$

and 

$$\frac{6}{3} = 2.$$

**Hint:** You will need a check to make sure the value passed to the function is not 0. Otherwise assume that all values passed to `n` are positive. 


In [None]:
def average_of_all(n):
    """Return the average of the integers from 1 to n (0.0 when n is 0).
    >>> average_of_all(1)
    1.0
    >>> average_of_all(3)
    2.0
    >>> average_of_all(8)
    4.5
    >>> average_of_all(0)
    0.0
    """
    return ...

In [None]:
grader.check("q4")

## NumPy

`NumPy` is the numerical computing module introduced in Foundations of Data Science, which is a prerequisite for this course. Here's a quick recap of `NumPy`. For more review, read the following [`NumPy` Quick Start Tutorial](https://docs.scipy.org/doc/numpy-1.15.4/user/quickstart.html).

The core of `NumPy` is the array. Like Python lists, arrays store data; however, they store data in a more efficient manner. In many cases, this allows for faster computation and data manipulation.

In Foundations of Data Science, we used `make_array` from the `datascience` module, but that's not the most typical way. Instead, use `np.array` to create an array. It takes a sequence, such as a list or range.
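As a minimal sketch of the two kinds of input mentioned above, the following builds one array from a list and one from a range. The names `from_list` and `from_range` are illustrative.

```python
import numpy as np

# Build an array from a Python list
from_list = np.array([10, 20, 30])

# Build an array from a range object
from_range = np.array(range(3))

print(from_list)
print(from_range)
```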

**Question 5.** Below, create an array named `arr` containing the values 5, 4, 3, 2, and 1 (in that order).


In [None]:
arr = ...

In [None]:
grader.check("q5")

In addition to values in the array, we can access attributes such as shape and data type. A full list of attributes can be found [here](https://docs.scipy.org/doc/numpy-1.15.0/reference/arrays.ndarray.html#array-attributes).

In [None]:
# The fourth item in the array
arr[3]

In [None]:
# Items at indices 2 and 3 (the stop index 4 is not included)
arr[2:4]

In [None]:
# The dimensions of the array
# arr is a one-dimensional array with 5 elements
arr.shape

In [None]:
# The data type for all the elements is numerical (integer)
arr.dtype

Arrays, unlike Python lists, cannot store items of different data types.

In [None]:
# A regular Python list can store items of different data types
[1, '3']

In [None]:
# Arrays will convert everything to the same data type
np.array([1, '3'])

In [None]:
# Another example of array type conversion
np.array([5, 8.3])

Notice that the 5 was changed from an integer to a decimal (float).
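One way to confirm these conversions is to inspect the `dtype` attribute of each result. The sketch below repeats the two arrays from the cells above under the illustrative names `mixed_text` and `mixed_numbers`.

```python
import numpy as np

# Mixing an integer and a string converts every item to a string
mixed_text = np.array([1, '3'])

# Mixing an integer and a float converts every item to a float
mixed_numbers = np.array([5, 8.3])

# dtype.kind is a one-letter code; 'U' stands for Unicode string
print(mixed_text.dtype.kind)
print(mixed_numbers.dtype)
```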

Arrays are also useful in performing **vectorized** operations. Given two or more arrays of equal length, arithmetic will perform element-wise computations across the arrays.

For example, observe the following:

In [None]:
# Python list addition will concatenate the two lists
[1, 2, 3] + [4, 5, 6]

In [None]:
# NumPy array addition will add them element-wise
np.array([1, 2, 3]) + np.array([4, 5, 6])

**Question 6.** Given the array `random_arr`, assign `valid_values` to an array containing all values $x$ such that $2x^4>1$.


**Note:** The `np.random.seed` function sets a [seed](https://en.wikipedia.org/wiki/Random_seed), a number (or vector) used to initialize a random number generator. The random number generator needs a number to start with (a seed value) to be able to generate a random number.
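To see what seeding does, the sketch below resets the generator to the same seed twice and compares the draws. The seed value 0 and the names `first_draw` and `second_draw` are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)
first_draw = np.random.rand(3)

# Setting the same seed again restarts the sequence of random numbers
np.random.seed(0)
second_draw = np.random.rand(3)

# Both draws contain exactly the same values
print(np.array_equal(first_draw, second_draw))  # True
```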

In [None]:
# Do not delete this line. It sets a seed so that the results are the same each time.
np.random.seed(42)

random_arr = np.random.rand(60)
valid_values = ...
valid_values

In [None]:
grader.check("q6")

**Question 7.** Use `NumPy` to recreate your answer to **Question 3**. The input parameters will both be lists, so you will need to convert the lists into arrays before performing your operations. Additionally, you will need to make sure that both lists have the same number of items. Assume that both lists are numeric data types.

**Hint:** You can use the `np.array` function to create a `NumPy` array.


In [None]:
def array_sum(list_1, list_2):
    """Compute x^2 + y^3 for each x in list_1 and the corresponding y in list_2.
       Assume list_1 and list_2 have the same number of elements,
       and return a NumPy array.
    """ 
    ...
    
    return ...

In [None]:
grader.check("q7")

You might have been told that Python is slow, but array arithmetic is carried out very fast, even for large arrays. For ten numbers, `list_sum` and `array_sum` both take a similar amount of time.

**Note:** The first statement in the code cell below uses a list comprehension to create a list of 10 elements starting at 0 and ending at 9. For more information on list comprehensions and when to use them, click [here](https://realpython.com/list-comprehension-python/).

In [None]:
# Here we use list comprehension to create a list
# of 10 elements starting at 0 and ending at 9.
sample_list_1 = [x for x in range(10)] 

# Here we use a NumPy function, np.arange(10)
# to create an array of 10 elements starting 
# at 0 and ending at 9.
sample_array_1 = np.arange(10)

In [None]:
%%time
list_sum(sample_list_1, sample_list_1)

In [None]:
%%time
array_sum(sample_array_1, sample_array_1)

The time difference seems negligible for a list/array of size 10; depending on your setup, you may even observe that `list_sum` executes faster than `array_sum`. However, we will commonly be working with much larger datasets:

In [None]:
# Here we use list comprehension to create a list
# of 100000 elements starting at 0 and ending at 99999.
sample_list_2 = [x for x in range(100000)]

# Here we use a NumPy function, np.arange(100000)
# to create an array of 100000 elements starting 
# at 0 and ending at 99999.
sample_array_2 = np.arange(100000)

In [None]:
%%time
# The trailing semicolon hides the output
list_sum(sample_list_2, sample_list_2);

In [None]:
%%time
# The trailing semicolon hides the output
array_sum(sample_array_2, sample_array_2);

With the larger dataset, we see that using `NumPy` results in code that executes over 50 times faster. Throughout this course (and in the real world), you will find that writing efficient code will be important; arrays and vectorized operations are the most common way of making Python programs run quickly.
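The same kind of comparison can be sketched outside a notebook with the standard library's `timeit` module. The example below uses a simple doubling operation rather than the functions from this lab; the helper names `double_with_loop` and `double_with_numpy` are hypothetical, and the measured times depend entirely on your machine.

```python
import timeit

import numpy as np

values = list(range(100_000))
array = np.arange(100_000)


def double_with_loop(items):
    """Double every item using an explicit Python loop."""
    result = []
    for item in items:
        result.append(2 * item)
    return result


def double_with_numpy(arr):
    """Double every item using one vectorized operation."""
    return 2 * arr


# Time 10 calls of each version; timeit returns the total time in seconds
loop_seconds = timeit.timeit(lambda: double_with_loop(values), number=10)
numpy_seconds = timeit.timeit(lambda: double_with_numpy(array), number=10)

print(f"Loop:  {loop_seconds:.4f} seconds for 10 runs")
print(f"NumPy: {numpy_seconds:.4f} seconds for 10 runs")
```

Both versions produce the same values; only the time taken to compute them differs.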

# 4. Collections

## List Comprehension

Lists in Python can be created by just placing the sequence inside the square brackets `[ ]`.

* Empty list `[ ]`

* List of numbers `[1, 3, 5, 7]`

* List of strings `['red', 'white', 'blue']`

* List of mixed data types `[1, 2, 'red', 3, 'white', 4, 'blue']`

Lists can be created using a `for` loop or using a list comprehension.
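As a small illustration of the two approaches, the sketch below builds the same list of squares both ways. The names `squares_loop` and `squares_comprehension` are illustrative.

```python
# Approach 1: start with an empty list and append inside a for loop
squares_loop = []
for n in range(5):
    squares_loop.append(n ** 2)

# Approach 2: a list comprehension expresses the same loop in one line
squares_comprehension = [n ** 2 for n in range(5)]

print(squares_loop)
print(squares_comprehension)
```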

**Question 8.** Create a list of the first 100 even integers using a `for` loop and the `.append()` method. Save the list to `even_list`.


In [None]:
# Initialize empty list
even_list = ...

for ...
even_list[0:10]

In [None]:
grader.check("q8")

**Question 9.** Create a list of the first 100 odd integers using a list comprehension. Save the list to `odd_list`.


In [None]:
odd_list = ...
odd_list[0:10]

In [None]:
grader.check("q9")

## Dictionary

A Python dictionary is a mutable object, and it contains the data in the form of key-value pairs. Each key is separated from its value by a colon (:). For example, a dictionary can be created like this 

`my_dict = {'key1': value, 'key2': value, 'key3': value}`


Dictionaries are a widely used data structure, and a good understanding of their methods and operations is necessary for doing data science. 
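A few basic dictionary operations are sketched below. The dictionary `prices` and its contents are made up for illustration.

```python
# Create a dictionary of key-value pairs
prices = {'apple': 0.5, 'pear': 0.75}

# Look up a value by its key
apple_price = prices['apple']

# Dictionaries are mutable: add a new pair and overwrite an existing one
prices['plum'] = 0.25
prices['pear'] = 0.8

# Iterate over the key-value pairs
for fruit, price in prices.items():
    print(fruit, price)
```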

**Question 10.** Below are two lists. Using a `for` loop and the `update()` method of a dictionary, convert them into a dictionary in which each item from `keys_1` is a key and the corresponding item from `values_1` is its value. Save it to `d1`.

`keys_1 = ['Ten', 'Twenty', 'Thirty']`

`values_1 = [10, 20, 30]`

**Note:** Click [here](https://www.w3schools.com/python/ref_dictionary_update.asp) to read about how to use the `update` method.


In [None]:
keys_1 = ['Ten', 'Twenty', 'Thirty']
values_1 = [10, 20, 30]

# Initialize an empty dictionary
d1 = ...

for ...
print(d1)

In [None]:
grader.check("q10")

**Question 11.** Below are two lists. Using the `dict` function and the `zip` function, convert them into a dictionary in which each item from `keys_2` is a key and the corresponding item from `values_2` is its value. Save it to `d2`.

`keys_2 = ['Forty', 'Fifty', 'Sixty']`

`values_2 = [40, 50, 60]`

**Note:** Click [here](https://www.w3schools.com/python/ref_func_zip.asp) to read about how to use the `zip` function and [here](https://www.w3schools.com/python/ref_func_dict.asp) to read about how to use the `dict` function. 


In [None]:
keys_2 = ['Forty', 'Fifty', 'Sixty']
values_2 = [40, 50, 60]

d2 = ...
print(d2)

In [None]:
grader.check("q11")

# 5. Matplotlib

We're going to start by going through the official `pyplot` tutorial. Please click [here](https://matplotlib.org/stable/tutorials/introductory/pyplot.html) and at some point before the end of the week go through the tutorial notebook and familiarize yourself with the basics of `pyplot`. This should take roughly 25 minutes.

**Note:** The tutorial uses `np.arange`, which returns an array that steps from $a$ to $b$ with a fixed step size $s$. While this is fine in some cases, we sometimes prefer to use `np.linspace(a, b, N)`, which divides the interval $[a, b]$ into $N$ equally spaced points.

By default, `np.linspace` includes both end points, while `np.arange` does not include the second end point $b$. For this reason, when we are plotting ranges of values we tend to prefer `np.linspace`.
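The difference at the end point can be seen in a small sketch over the interval $[0, 1]$. The names `stepped` and `spaced` are illustrative.

```python
import numpy as np

# Step from 0 toward 1 in increments of 0.25; the stop value 1 is excluded
stepped = np.arange(0, 1, 0.25)

# Divide the interval [0, 1] into 5 equally spaced points; 1 is included
spaced = np.linspace(0, 1, 5)

print(stepped)
print(spaced)
```

Here `stepped` has four values and stops at 0.75, while `spaced` has five values and ends exactly at 1.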

Notice how the following two statements have different parameters but return the same result.

In [None]:
np.arange(-5, 6, 1.0)

In [None]:
np.linspace(-5, 5, 11)

Now that you're familiar with the basics of pyplot, let's practice with a plotting question.

<!-- BEGIN QUESTION -->

**Question 12.** Let's visualize the function $f(t) = 3\sin(2\pi t)$.

- Set the $x$ limit of all figures to $[0, \pi]$ and the $y$ limit to $[-10, 10]$. 

- Plot the sine function using `plt.scatter` with 30 red plus signs. 

- Make sure the $x$ ticks are labeled $[0, \frac{\pi}{2}, \pi]$, and that your axes are labeled as well. 

**Note:** Click [here](https://matplotlib.org/api/pyplot_api.html) to use the matplotlib documentation for reference.

Your plot should look like the following:

<center><img src="images/graph1.png"></center>

**Hints:** 

* You can set axis bounds with `plt.axis`.

* You can set xticks and labels with `plt.xticks`.

* Make sure you add `plt.xlabel`, `plt.ylabel`, and `plt.title`.


In [None]:
t = ...
y = 3*np.sin(2*np.pi*t)
plt.ylim((..., ...))
plt.scatter(..., ..., ..., ..., linewidth=1)
plt.xticks([0, np.pi/2, np.pi],[r'$0$', r'$\pi/2$', r'$\pi$'])
plt.xlabel('t')
plt.ylabel('f(t)')
plt.title(r'f(t) = 3sin(2$\pi$t)');

<!-- END QUESTION -->



---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

When done exporting, download the .zip file by `SHIFT`-clicking on the file name and selecting **Save Link As**. Or, find the .zip file on the left side of the screen, right-click it, and select **Download**. You'll submit this .zip file for the assignment in Canvas to Gradescope for grading.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)