# Project 1: Setup, Prerequisites, and Image Classification

## Course Policies

Here are some important course policies. These are also located at
http://www.ds100.org/fa17/.

**Tentative Grading**

There will be 7 challenging projects. Projects must be completed
individually and will mix programming and short answer questions.

**Collaboration Policy**

Data science is a collaborative activity. While you may talk with others about
the homework, we ask that you **write your solutions individually**. If you do
discuss the assignments with others, please **include their names** at the top
of your solution.

## This assignment

In this project, we'll cover:

[__Part 1: Prerequisites__](#Part-1:-Prerequisites)

This part goes over the prerequisites for taking DS100. You should be able to quickly work through the coding and conceptual questions.

* How to set up Jupyter on your own computer.
* How to check out and submit assignments for this class.
* Python basics, like defining functions.
* How to use the `numpy` library to compute with arrays of numbers.
* Partial derivatives and matrix expressions

[__Part 2: Edge Detection__](#Part-2:-Edge-Detection)

In part 2, you'll use your knowledge to implement a basic image edge detection algorithm.

* Image processing using NumPy
* Edge detection using image gradients

[__Part 3: Image Classification__](#Part-3:-Image-Classification)

* Image classification using gradient magnitudes
* Kaggle competition

## Due Date

This assignment is due at 11:59pm Friday, September 1. Instructions for submission are at the bottom of this assignment.

## Part 1: Prerequisites

### Setup

If you haven't already, go through the instructions at
http://www.ds100.org/fa17/setup.

The instructions for submission are at the end of this notebook.

You should now be able to open this notebook in Jupyter and run cells.

### Running a Cell

Try running the following cell. If you are unfamiliar with Jupyter Notebooks, consider skimming [this tutorial](http://nbviewer.jupyter.org/github/jupyter/notebook/blob/master/docs/source/examples/Notebook/Notebook%20Basics.ipynb) or selecting **Help -> User Interface Tour** in the menu above.

In [None]:
print("Hello World!")

Even if you are familiar with Jupyter, we strongly encourage you to become proficient with keyboard shortcuts (this will save you time in the future).  To learn about keyboard shortcuts go to **Help -> Keyboard Shortcuts** in the menu above. 

Here are a few we like:
1. `ctrl`+`return` : *Evaluate the current cell*
1. `shift`+`return`: *Evaluate the current cell and move to the next*
1. `esc` : *command mode* (required before using any of the commands below)
1. `a` : *create a cell above*
1. `b` : *create a cell below*
1. `d` : *delete a cell*
1. `m` : *convert a cell to markdown*
1. `y` : *convert a cell to code*

### Setup Grading Tools 

First, let's make sure you have the latest version of okpy.

In [None]:
!pip install -U okpy

### Testing your Setup

If you've set up your environment properly, this cell should run without problems:

In [None]:
import math
import numpy as np
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
from datascience import *
import skimage
import skimage.io
import skimage.filters

from client.api.notebook import Notebook
ok = Notebook('proj1.ok')

Now, run this cell to log into OkPy:

In [None]:
ok.auth()

### Python

Python is the main programming language we'll use in the course. We expect that you've taken CS61A or an equivalent class, so you should be able to explain the following cells. Run them and make sure you understand what is happening in each.

If this seems difficult, please review one or more of the following materials.

- **[Python Tutorial](https://docs.python.org/3.5/tutorial/)**: Introduction to Python from the creators of Python
- **[Composing Programs Chapter 1](http://composingprograms.com/pages/11-getting-started.html)**: This is more of an introduction to programming with Python.
- **[Advanced Crash Course](http://cs231n.github.io/python-numpy-tutorial/)**: A fast crash course which assumes some programming background.


#### Mathematical Expressions

In [None]:
# This is a comment.
# In Python, the ** operator performs exponentiation.
math.sqrt(math.e ** (-math.pi + 1))
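If the expression above is hard to read, it may help to rewrite it with exponent rules: taking a square root halves the exponent, so $\sqrt{e^{1-\pi}} = e^{(1-\pi)/2}$. The short check below (the names `lhs` and `rhs` are just for illustration) computes both forms and compares them with `math.isclose`, since floating-point results can differ in the last few digits.

```python
import math

# sqrt(e ** a) equals e ** (a / 2); here a = 1 - pi.
lhs = math.sqrt(math.e ** (-math.pi + 1))
rhs = math.exp((1 - math.pi) / 2)

print(lhs, rhs)
print(math.isclose(lhs, rhs))  # True
```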

#### Output and Printing

In [None]:
"Why didn't this line print?"

print("Hello" + ",", "world!")

"Hello, cell " + "output!"

#### For Loops

In [None]:
# A for loop repeats a block of code once for each
# element in a given collection.
for i in range(5):
    if i % 2 == 0:
        print(2**i)
    else:
        print("Odd power of 2")

#### List Comprehension

In [None]:
[str(i) + " sheep." for i in range(1,5)] 

In [None]:
[i for i in range(10) if i % 2 == 0]
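A list comprehension with an `if` clause is shorthand for a `for` loop that appends only the elements passing the condition. The sketch below builds the same list both ways; `evens_loop` and `evens_comp` are illustrative names, not part of the assignment.

```python
# Build the even numbers below 10 with an explicit loop...
evens_loop = []
for i in range(10):
    if i % 2 == 0:
        evens_loop.append(i)

# ...and with the equivalent list comprehension.
evens_comp = [i for i in range(10) if i % 2 == 0]

print(evens_loop)                # [0, 2, 4, 6, 8]
print(evens_loop == evens_comp)  # True
```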

In [None]:
# A list comprehension is a convenient way to apply a function
# to each element in a given collection.
# The string method join takes a single iterable of strings and concatenates
# its elements, separated by the string it is called on. Here we join the
# elements produced by the list comprehension, separated by newlines ("\n").
print("\n".join([str(2**i) if i % 2 == 0 else "Odd power of 2"
                 for i in range(5)]))
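To make the behavior of `join` concrete: it is called on the separator string and receives one iterable, and every element must already be a string. The example values below are arbitrary.

```python
# str.join takes ONE iterable of strings and concatenates its elements,
# inserting the string it was called on between consecutive elements.
words = ["data", "science", "100"]
print("-".join(words))  # data-science-100
print("".join(words))   # datascience100

# join does not convert non-strings, so numbers must go through str() first.
powers = [2**i for i in range(4)]
print(", ".join(str(p) for p in powers))  # 1, 2, 4, 8
```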


#### Defining Functions

In [None]:
def add2(x):
    """This docstring explains what this function does: it adds 2 to a number."""
    return x + 2

#### Getting Help

In [None]:
help(add2)

In [None]:
add2?

You can close the window at the bottom by pressing `esc` several times. 

#### Passing Functions as Values

In [None]:
def makeAdder(amount):
    """Make a function that adds the given amount to a number."""
    def addAmount(x):
        return x + amount
    return addAmount

add3 = makeAdder(3)
add3(4)

In [None]:
makeAdder(3)(4)

#### Anonymous Functions and Lambdas

In [None]:
# add4 is very similar to add2, but it's been created using a lambda expression.
add4 = lambda x: x + 4
add4(5)
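Because a lambda is just an expression that evaluates to a function, the `makeAdder` example from the previous section can also be written with lambdas. This is only a comparison sketch; `make_adder_lambda` and `add10` are hypothetical names.

```python
# A lambda that returns another lambda: the inner function remembers `amount`.
make_adder_lambda = lambda amount: (lambda x: x + amount)

add10 = make_adder_lambda(10)
print(add10(5))                 # 15
print(make_adder_lambda(3)(4))  # 7, same as makeAdder(3)(4)
```

Named `def` functions are usually easier to read and debug; lambdas are most handy for short, one-off functions passed as arguments.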

#### Recursion

In [None]:
def fib(n):
    if n <= 1:
        return 1
    else:
        # Functions can call themselves recursively.
        return fib(n-1) + fib(n-2)

fib(6)
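With the base case `n <= 1` returning 1, this `fib` produces the Fibonacci sequence starting 1, 1. A quick way to see the first few values (the definition is repeated so the snippet runs on its own):

```python
def fib(n):
    if n <= 1:
        return 1
    # Each value is the sum of the two before it.
    return fib(n - 1) + fib(n - 2)

print([fib(n) for n in range(7)])  # [1, 1, 2, 3, 5, 8, 13]
```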

### Question 1

#### Question 1a
Write a function `nums_reversed` that takes in an integer `n` and returns a string
containing the numbers 1 through `n` (including `n`) in reverse order, separated
by spaces. For example:

    >>> nums_reversed(5)
    '5 4 3 2 1'

***Note:*** The ellipsis (`...`) indicates something you should fill in.  It *doesn't* necessarily imply you should replace it with only one line of code.

In [None]:
def nums_reversed(n):
    ...

In [None]:
_ = ok.grade('q01a')
_ = ok.backup()

#### Question 1b

Write a function `string_splosion` that takes in a non-empty string like
`"Code"` and returns a long string containing every prefix of the input.
For example:

    >>> string_splosion('Code')
    'CCoCodCode'
    >>> string_splosion('data!')
    'ddadatdatadata!'
    >>> string_splosion('hi')
    'hhi'

**Hint:** Try to use recursion. Think about how you might answer the following two questions:
1. **[Base Case]** What is the `string_splosion` of the empty string?
1. **[Inductive Step]** If you had a `string_splosion` function for the first $n-1$ characters of your string how could you extend it to the $n^{th}$ character? For example, `string_splosion("Cod") = "CCoCod"` becomes `string_splosion("Code") = "CCoCodCode"`.


In [None]:
def string_splosion(string):
    ...

In [None]:
_ = ok.grade('q01b')
_ = ok.backup()

#### Question 1c

Write a function `double100` that takes in a list of integers
and returns `True` if the list has two `100`s next to each other, and `False` otherwise.

    >>> double100([100, 2, 3, 100])
    False
    >>> double100([2, 3, 100, 100, 5])
    True


In [None]:
def double100(nums):
    ...

In [None]:
_ = ok.grade('q01c')
_ = ok.backup()

### NumPy and Tables

The `NumPy` library lets us do fast, simple computing with numbers in Python. The `datascience` Table class from Data 8 gives us simple operations on tabular data.

You should be able to understand the code in the following cells. If not, review the following:

- [Inferential Thinking Chapter 4](https://www.inferentialthinking.com/chapters/04/4/arrays.html)
- [Inferential Thinking Chapter 5](https://www.inferentialthinking.com/chapters/05/tables.html)
- [Inferential Thinking Chapter 6](https://www.inferentialthinking.com/chapters/05/tables.html)
- [Inferential Thinking Chapter 7](https://www.inferentialthinking.com/chapters/07/functions-and-tables.html)

**Jupyter pro-tip**: Pull up the docs for any function in Jupyter by running a cell with
the function name and a `?` at the end:

In [None]:
np.arange?

**Another Jupyter pro-tip**: Pull up the docs for any function in Jupyter by typing the function
name, then `<Shift>-<Tab>` on your keyboard. Super convenient when you forget the order
of the arguments to a function. You can press `<Tab>` multiple times while holding `<Shift>` to expand the docs.

Try it on the function below:

In [None]:
np.linspace
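A common source of confusion is the difference between `np.arange` and `np.linspace`, which the docs above describe. Roughly: `arange` takes a step size and excludes the stop value, while `linspace` takes a number of points and (by default) includes the stop value. The endpoints below are arbitrary.

```python
import numpy as np

# arange(start, stop, step): stop is excluded -> 0, 0.25, 0.5, 0.75
stepped = np.arange(0, 1, 0.25)

# linspace(start, stop, num): stop is included by default -> 0, 0.25, 0.5, 0.75, 1
spaced = np.linspace(0, 1, 5)

print(stepped)
print(spaced)
```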

You can use the tips above to help you decipher the following code.

In [None]:
# Let's take a 20-sided die...
NUM_FACES = 20

# ...and roll it 4 times
rolls = 4

# What's the probability that all 4 rolls are different? It's:
# 20/20 * 19/20 * 18/20 * 17/20
prob_diff = np.prod((NUM_FACES - np.arange(rolls))
                    / NUM_FACES)
prob_diff
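As a sanity check, the same product can be computed exactly with Python's built-in `fractions` module instead of NumPy; `prob_exact` is an illustrative name. The result should agree with `prob_diff` up to floating-point rounding: $\frac{20 \cdot 19 \cdot 18 \cdot 17}{20^4} = \frac{116280}{160000} = 0.72675$.

```python
from fractions import Fraction

NUM_FACES = 20
rolls = 4

# Multiply 20/20 * 19/20 * 18/20 * 17/20 using exact rational arithmetic.
prob_exact = Fraction(1)
for k in range(rolls):
    prob_exact *= Fraction(NUM_FACES - k, NUM_FACES)

print(prob_exact)         # 2907/4000
print(float(prob_exact))  # 0.72675
```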

In [None]:
# Let's compute that probability for 1 roll, 2 rolls, ..., 20 rolls.
# The array ys will contain:
# 
# 20/20
# 20/20 * 19/20
# 20/20 * 19/20 * 18/20
# ...
# 20/20 * 19/20 * ... * 1/20

xs = np.arange(20)
ys = np.cumprod((NUM_FACES - xs) / NUM_FACES)

# Python slicing works on arrays too
ys[:5]
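If `np.cumprod` is unfamiliar: it returns running products, so entry `i` is the product of the first `i + 1` factors. The sketch below reproduces the idea in plain Python with `itertools.accumulate` and `operator.mul`; `factors` and `running` are illustrative names.

```python
from itertools import accumulate
from operator import mul

NUM_FACES = 20

# The factors (20 - k) / 20 for k = 0, 1, ..., 19.
factors = [(NUM_FACES - k) / NUM_FACES for k in range(NUM_FACES)]

# Running products: entry i is factors[0] * factors[1] * ... * factors[i],
# which is what np.cumprod computes for ys.
running = list(accumulate(factors, mul))

print(running[:5])
```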

In [None]:
# Plot those probabilities. You should know how to interpret this plot!
die_probs = Table().with_columns(
    'Num Rolls', xs,
    'P(all different)', ys,
)
die_probs.plot(0, 1)

In [None]:
# Mysterious...
mystery = np.exp(-xs ** 2 / (2 * NUM_FACES))
mystery

In [None]:
# If you're curious, this is the exponential approximation for our probability:
# https://textbook.prob140.org/ch1/Exponential_Approximation.html
die_probs.with_column('Mystery', mystery).plot(0)
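One way to see where `mystery` comes from (a sketch, using the approximation $1 - t \approx e^{-t}$ for small $t$): with $N$ = `NUM_FACES` and $i$ an entry of `xs`, entry $i$ of `ys` is

$$
\text{ys}_i = \prod_{k=0}^{i} \left(1 - \frac{k}{N}\right)
\approx \prod_{k=0}^{i} e^{-k/N}
= \exp\left(-\frac{1}{N}\sum_{k=0}^{i} k\right)
= \exp\left(-\frac{i(i+1)}{2N}\right)
\approx \exp\left(-\frac{i^2}{2N}\right)
$$

The last step drops the lower-order term $\frac{i}{2N}$, which matches `np.exp(-xs ** 2 / (2 * NUM_FACES))`. Both approximations become rougher as $i$ grows, which may explain some of the gap between the two curves.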

### Question 2

This question uses the table shown in [Inferential Thinking Chapter 11](https://www.inferentialthinking.com/chapters/11/2/bootstrap.html).

The `sf` table contains the Organization, Job, and Total Compensation of all public workers in San Francisco in 2015 who made above $10000. 

In [None]:
sf = Table.read_table('san_francisco_2015.csv')
sf.set_format(2, NumberFormatter(0))
sf.show(3)

#### Question 2a

Create a Table called `richest` that contains the top 10 highest-paid public workers in SF.

In [None]:
richest = ...
richest

In [None]:
_ = ok.grade('q02a')
_ = ok.backup()

#### Question 2b

Create a Table called `orgs` that contains two columns. The first column should have one row for each Organization Group and the second column should contain the number of workers from each group. The table should be sorted in descending order of the number of workers in each group.

In [None]:
orgs = ...
orgs

In [None]:
_ = ok.grade('q02b')
_ = ok.backup()

#### Question 2c

You should have noticed that there was only one person in the General City Responsibilities organization. What is that person's total compensation? Store it in `lone_ranger`.

In [None]:
lone_ranger = ...
lone_ranger

In [None]:
_ = ok.grade('q02c')
_ = ok.backup()

## Multivariable Calculus and Linear Algebra

The following questions ask you to recall your knowledge of multivariable calculus and linear algebra. We will use some of the most fundamental concepts from each discipline in this class, so the following problems should at least seem familiar to you.

For the following problems, you should use LaTeX to format your answer. If you aren't familiar with LaTeX, not to worry. It's not hard to use in a Jupyter notebook. Just place your math in between dollar signs:

\$ f(x) = 2x \$ becomes $ f(x) = 2x $.

If you have a longer equation, use double dollar signs:

\$\$ \sum_{i=0}^n i^2 \$\$ becomes:

$$ \sum_{i=0}^n i^2 $$.

[For more about basic LaTeX formatting, you can read this article.](https://www.sharelatex.com/learn/Mathematical_expressions)
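Several answers below are matrices or column vectors, which use the `bmatrix` environment: `&` separates entries within a row, and `\\` starts a new row. For example, this source (a generic $2 \times 2$ matrix chosen only for illustration):

```latex
$$
M = \begin{bmatrix}
    1 & 2 \\
    3 & 4
\end{bmatrix}
$$
```

becomes:

$$
M = \begin{bmatrix}
    1 & 2 \\
    3 & 4
\end{bmatrix}
$$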

If you have trouble with these topics, we suggest reviewing:

- [Khan Academy's Multivariable Calculus](https://www.khanacademy.org/math/multivariable-calculus)
- [Khan Academy's Linear Algebra](https://www.khanacademy.org/math/linear-algebra)

### Question 3

#### Question 3a

Suppose we have the following scalar-valued function on $x$ and $y$:

$$ f(x, y) = x^2 + 4xy + 2y^3 $$

Derive the partial derivative with respect to $x$.

$$ \frac{\partial}{\partial x} f = ... $$

#### Question 3b

Suppose we have the same function $f$:

$$ f(x, y) = x^2 + 4xy + 2y^3 $$

Derive $ \nabla f(x, y) $, the vector-valued gradient of $f$.

$$
\nabla f(x, y) = \begin{bmatrix}
   ... \\
   ... \\
\end{bmatrix}
$$

#### Question 3c

Suppose we have the same function $f$:

$$ f(x, y) = x^2 + 4xy + 2y^3 $$

What is the direction of steepest ascent at the point $(0, 0)$? At $(5, 0)$? Leave your answers as vectors. (You don't have to normalize your vectors, although doing so may be helpful later.)

As we keep moving in the positive x direction, what happens to the direction of steepest ascent? What happens to the magnitude?

*Write your answer here, replacing this text.*

### Question 4

#### Question 4a

Joey, Deb, and Sam are shopping for fruit at Berkeley Bowl. Berkeley Bowl, true to its name, only sells fruit bowls. A fruit bowl contains some fruit and the price of a fruit bowl is the total price of all of its individual fruit.

Berkeley Bowl has apples, bananas, and cantaloupes. The price of each of these can be written in a vector:

$$
\vec{v} = \begin{bmatrix}
     2 \\
     1 \\
     4 \\
\end{bmatrix}
$$

So the price of a single apple is $2 (expensive!).

Berkeley Bowl sells the following fruit bowls:

1. 2 of each fruit
2. 5 apples and 8 bananas
3. 2 bananas and 3 cantaloupes
4. 10 cantaloupes

Create a matrix $B$ such that the matrix-vector multiplication

$$
B\vec{v}
$$

evaluates to a length-4 column vector containing the price of each fruit bowl. The first entry of the result should be the cost of fruit bowl #1, the second entry the cost of fruit bowl #2, and so on.

$$
B = \begin{bmatrix}
    ... % Use the & character to separate entries in a row, and \\ to start a new row.
\end{bmatrix}
$$

#### Question 4b

Joey, Deb, and Sam make the following purchases:

- Joey buys 2 fruit bowl #1's and 1 fruit bowl #2.
- Deb buys 1 of each fruit bowl.
- Sam buys 10 fruit bowl #4s (he really likes cantaloupes).

Create a matrix $A$ such that the matrix expression

$$
AB\vec{v}
$$

evaluates to a length-3 column vector containing how much each of them spent. The first entry of the result should be the total amount spent by Joey, the second entry the amount spent by Deb, and so on.

$$
A = \begin{bmatrix}
    ... % Use the & character to separate entries in a row, and \\ to start a new row.
\end{bmatrix}
$$

#### Question 4c

Now, compute the multiplication $ AB\vec{v} $ to get the actual amounts spent by each person. Who spent the most money?

$$
AB\vec{v} = \begin{bmatrix}
    ... % Use the & character to separate entries in a row, and \\ to start a new row.
\end{bmatrix}
$$

#### Question 4d

Let's suppose Berkeley Bowl changes its fruit prices, but you don't know the new prices. Joey, Deb, and Sam buy the same number of each fruit bowl as before, and each bowl contains the same fruit as before, but now they each spent these amounts:

$$
\vec{x} = \begin{bmatrix}
    80 \\
    80 \\
    100 \\
\end{bmatrix}
$$

Write a single matrix expression that evaluates to the new prices of the individual fruits. You may use the variables $A$, $B$, and $\vec{x}$ in your answer.

*Write your answer here, replacing this text.*

## End of Part 1

Note that this notebook is not the complete assignment, so you won't be able to submit it. When the completed assignment is released, you should download it and copy over your solutions there.