In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab07.ipynb")

# Lab 07: Modeling and Estimation with Loss Functions

Welcome to Lab 07! In this lab you will implement a basic model, define loss functions, and minimize loss functions using numeric libraries. 

To receive credit for a lab, answer all questions correctly and submit before the deadline.

**Due Date:** 

**Collaboration Policy:** Data science is a collaborative activity. While you may talk with others about the labs, we ask that you **write your solutions individually**. If you do discuss the assignments with others **please include their names below** (it's a good way to learn your classmates' names).

**Collaborators:** 

List collaborators here.

Run the cell below.

In [None]:
import pandas as pd
import numpy as np
np.random.seed(42)
import seaborn as sns

%matplotlib inline
import matplotlib.pyplot as plt

# 0. Loading the Tips Dataset

To begin with, we load the `tips` dataset from the `seaborn` library.  The `tips` data contains records of tips, total bill, and information about the person who paid the bill.

In [None]:
df = sns.load_dataset("tips")

print("Number of Records:", len(df))
df.head()

# 1. Constant Model and Loss Functions

## Constant Model

In the modeling context, $y$ represents our "true observations", which are typically what we are trying to model. $\hat{y}$ (pronounced "y hat") represents the prediction made by our model. In this lab, we will use the constant model, where our prediction for any input is a constant:

$$\hat{y} = c$$

$c$ is what we call a **parameter**. Our goal is to find the value of our parameter(s) that **best fit our data**. We represent the optimal parameter(s) with $\hat{\theta}$; for the constant model, this is the optimal value of $c$.

We call the constant model a **summary statistic**, as we are determining one number that best "summarizes" a set of values.


## Loss Functions

Loss functions are what we use to determine the optimal parameter(s) for our model. A loss function is a measure of how well a model is able to predict the expected outcome. In other words, it measures the deviations of the predicted values from the observed values. In the formulations below $y$ represents the observed values and $\hat{y}$ stands for our prediction.

- **Squared Loss** (also known as the $L_2$ loss, pronounced "ell-two") $L(y, \hat{y}) = (y - \hat{y})^2$


- **Absolute Loss** (also known as the $L_1$ loss, pronounced "ell-one") $L\left(y, \hat{y} \right) = \left| y - \hat{y} \right|$
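
As a quick check of the two definitions, take the illustrative values $y = 5$ and $\hat{y} = 3$ (chosen here only as an example; they do not come from the dataset):

$$(y - \hat{y})^2 = (5 - 3)^2 = 4 \qquad \text{and} \qquad \left| y - \hat{y} \right| = |5 - 3| = 2$$

The squared loss grows faster as the error grows: doubling the error to 4 gives a squared loss of 16 but an absolute loss of only 4.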

# 2. Squared Loss Function

$$L\left(y, c \right)=\left(y-c \right)^2$$

**Question 1.** Based on the comments below, implement the squared loss function on the constant model $\hat{y}=c$. 

**Note:** Your answer should not use any loops.


In [None]:
def squared_loss(y_obs, c):
    """
    Purpose
    -------
    Calculate the squared loss of the observed data and a summary statistic.
    
    Parameters
    ----------
    y_obs: An observed value
    c:     A constant representing a summary statistic
    
    Returns
    -------
    The squared loss between the observation and the summary statistic.
    """
    ...

In [None]:
grader.check("q1")

<!-- BEGIN QUESTION -->

**Question 2.** Let us now consider the case where `y_obs` equals 10. For arbitrary values of `c`, plot the squared loss using the function you implemented in the previous question. Don't forget to label your graph.


In [None]:
y_obs = ...

# arbitrary values of c
c_values = np.linspace(0, 20, 100)

# plot
plt.xlabel(r"Choice for $c$")
plt.ylabel(r"...")
plt.plot(..., ...);

<!-- END QUESTION -->

# 3. Average Loss

Our main concern is how "good" or how "bad" the model's predictions are for an entire data set, not just one observation. The average loss of a model 

$$\frac{1}{n}\sum\limits_{i=1}^n L(y_i,\hat{y}_i)$$ 

is a measure of how well the model "fits" the data. 

If squared loss is the loss function, then the average squared loss is referred to as mean squared error (MSE), and is of the following form 

$$\text{MSE}(y,\hat{y})=\frac{1}{n}\sum\limits_{i=1}^n (y_i-\hat{y}_i)^2$$

If absolute loss is the loss function, then the average absolute loss is referred to as mean absolute error (MAE), and is of the following form 

$$\text{MAE}(y,\hat{y})=\frac{1}{n}\sum\limits_{i=1}^n |y_i-\hat{y}_i|$$

where 

- $n$: Number of data values.

- $i$: Index of the $i$th value in the data set.

- $y_i$: Value for $i$th datum.

- $\hat y_i$: Prediction for $i$th datum.
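
The two averages above can be sketched on a tiny example. The arrays `y` and `y_hat` below are hypothetical values chosen only for illustration (they are not the tips data), and the predictions are deliberately not constant so that the sketch shows the general formulas rather than the constant model used in the questions.

```python
import numpy as np

# Hypothetical observations and predictions, chosen only for illustration.
y = np.array([3.0, 5.0, 2.0, 6.0])
y_hat = np.array([2.0, 5.0, 4.0, 3.0])

# Elementwise differences between observations and predictions (no loop needed).
residuals = y - y_hat

# Average squared loss (MSE) and average absolute loss (MAE).
mse_example = np.mean(residuals ** 2)
mae_example = np.mean(np.abs(residuals))

print(mse_example, mae_example)
```

Because `numpy` arithmetic is elementwise, the same expressions work for arrays of any length.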

Let's apply our knowledge to some real world data. In section **0. Loading the Tips Dataset** we loaded the `tips` dataset from the `seaborn` package.

In this section, you will try to find the best statistic $c$ to represent the tips in the dataset. The procedure consists of constructing the mean squared error (MSE) for the tips data and finding the value of $c$ that minimizes it.

**Question 3.** Make an array named `tips` using the `tip` column from the `df` dataframe that was loaded in **Section 0. Loading the Tips Dataset**. 


In [None]:
tips = ...

In [None]:
grader.check("q3")

Now, we can extend the above loss functions to an entire dataset by taking the average. Let the dataset $D$ be the set of observations:

$$D = \{y_1, \ldots, y_n\}$$

where $y_i$ is the $i^{\text{th}}$ tip.

We can define the average loss over the dataset as:

$$R\left(c\right)=\frac{1}{n} \sum_{i=1}^n L(y_i, c)$$

**Question 4.** Define the `mean_squared_error` function which computes the mean squared error given the data and a value for `c`. Assume that `data` will be a `numpy` array.


In [None]:
def mean_squared_error(c, data):
    """
    Purpose
    -------
    Calculate the mean square error.
    
    Parameters
    ----------
    c: A constant representing a summary statistic
    data: A numpy array of data values
    
    Returns
    -------
    The value of the mean square error.
    """
    ...

In [None]:
grader.check("q4")

<!-- BEGIN QUESTION -->

**Question 5.** In the cell below, plot the mean squared error for different `c` values. Note that `c_values` is given. Make sure to label the axes on your plot, and remember to use the `tips` variable we defined earlier.


In [None]:
c_values = np.linspace(0, 6, 100)
mse = ...

plt.xlabel(r"Choice for $c$")
plt.ylabel(r"...")
plt.plot(..., ...);

<!-- END QUESTION -->

**Question 6.** Find the value of `c` that minimizes the mean squared error above by inspecting the plot you generated. Round your answer to the nearest integer.


In [None]:
min_observed_mse = ...
min_observed_mse

In [None]:
grader.check("q6")

## Find the Minimizing Value for Our Tips Dataset

The cell below plots an arbitrary 4$^\text{th}$-degree polynomial function.

In [None]:
x_values = np.linspace(-4, 2.5, 100)

def fx(x):
    return 0.1 * x**4 + 0.2*x**3 + 0.2 * x **2 + 1 * x + 10

plt.plot(x_values, fx(x_values));

By looking at the plot, we see that the $x$ which minimizes the function is slightly larger than $-2$. What if we want the exact value?

The function `minimize` from [`scipy.optimize`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html) will attempt to minimize a function. Try running the cell below, and you will see that `minimize` appears to find the correct answer.

**Note:** For this lab, we'll just use the `minimize` function. We'll discuss how `minimize` works later in the course.

In [None]:
from scipy.optimize import minimize
minimize(fx, x0 = 1.1)

The `fun` value is the minimum value of the function, and `x` is the $x$ that minimizes it. We can index into the object returned by `minimize` to get these values, for example `minimize(fx, x0 = 1.1)['x'][0]`. The additional `[0]` at the end is needed because the minimizing $x$ is returned as an array; this is not necessarily the case for other attributes (e.g., `fun`). The reason is that `minimize` can also minimize multivariable functions.
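
The indexing described above can be sketched on a simpler function. The function `parabola` and the variable names below are hypothetical, introduced only for this illustration; the function is chosen so that its minimum is known in advance.

```python
from scipy.optimize import minimize

def parabola(x):
    # Hypothetical example function: its minimum is at x = 3, where it equals 2.
    return (x - 3) ** 2 + 2

result = minimize(parabola, x0=0.0)

x_min = result["x"][0]  # minimizing input: stored as an array, so take element 0
f_min = result["fun"]   # minimum value of the function: no extra index needed
same_x = result.x[0]    # attribute access reaches the same array

print(x_min, f_min)
```

Both `result["x"]` and `result.x` refer to the same array, so either style can be used.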

The parameter `x0` that we passed to the `minimize` function is where the `minimize` function starts looking as it tries to find the minimum. For example, above, `minimize` started its search at $x = 1.1$ because that's where we told it to start. For the function above, it doesn't really matter what $x$ we start at because the function has only a single global minimum. More technically, the function is [convex](https://en.wikipedia.org/wiki/Convex_function), a property of functions that we will discuss later in the course.

Alas, `minimize` isn't perfect. If we give it a function with many valleys (also known as local minima), it can get stuck. Consider the function below:

In [None]:
w_values = np.linspace(-2, 10, 100)

def fw(w):
    return 0.1 * w**4 - 1.5*w**3 + 6 * w **2 - 1 * w + 10

plt.plot(w_values, fw(w_values));

If we start the minimization at $w = 6.5$, we'll get stuck in the local minimum at $w = 7.03$. 

**Note:** No matter what your actual variable is called in your function, the `minimize` routine still calls the starting point `x0`.

In [None]:
minimize(fw, x0 = 6.5)['x'][0]
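
To see how much the starting point matters, the sketch below runs the search twice on the same polynomial. The second starting point, `x0 = 1.0`, is an assumption picked only for illustration; with it, the search should settle in the other valley, which has the lower function value.

```python
from scipy.optimize import minimize

def fw(w):
    # Same polynomial as in the cell above.
    return 0.1 * w**4 - 1.5 * w**3 + 6 * w**2 - 1 * w + 10

# Two searches that differ only in where they start.
start_right = minimize(fw, x0=6.5)
start_left = minimize(fw, x0=1.0)  # assumed starting point, for illustration

w_right = start_right["x"][0]
w_left = start_left["x"][0]

print("start 6.5:", w_right, start_right["fun"])
print("start 1.0:", w_left, start_left["fun"])
```

Comparing the two `fun` values shows which valley is lower; `minimize` itself reports only the valley it ended up in.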

**Question 7.** Using the `minimize` function, find the value of `c` that minimizes the mean squared error for our tips dataset. In other words, you want to find the exact minimum of the plot that you generated in **Question 5**.

Assign `min_sq_scipy` to the value of `c` that minimizes the MSE according to the `minimize` function.

You can't pass your `mean_squared_error` function to `minimize` because `mean_squared_error` has two variables: `c` and `data`. `minimize` will get confused because it thinks it needs to minimize by picking the best `c` and best `data` values. We only want it to use `c`.

Therefore, you need to pass a function of one variable, `c`, to the `minimize` function. This means you'll need to create a new function of only **one** variable `c`. In practice this is simple, but it can also be very tricky when you do this for the first time. Make sure to ask for help if you get stuck.
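
The general pattern can be sketched on an unrelated function. Everything below (`shifted_parabola`, `fixed_shift`, and the wrapper) is hypothetical and exists only to illustrate the idea of fixing one argument; it is not the function you need for this question.

```python
from scipy.optimize import minimize

def shifted_parabola(x, shift):
    # Hypothetical two-argument function; over x, its minimum is at x = shift.
    return (x - shift) ** 2

fixed_shift = 4.0  # the value we do NOT want minimize to change

def shifted_parabola_one_variable(x):
    # Only x is a parameter here; shift is filled in from the variable above.
    return shifted_parabola(x, fixed_shift)

best_x = minimize(shifted_parabola_one_variable, x0=0.0)["x"][0]
print(best_x)
```

The wrapper takes a single argument, so `minimize` only ever varies `x`, and the second argument stays at the value defined outside the function.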


In [None]:
def mean_squared_error_with_one_variable(c):
    """
    Purpose
    -------
    Calculate the mean square error with one variable.
    
    Parameters
    ----------
    c: A constant representing a summary statistic 
    
    Returns
    -------
    The value of the mean square error.
    """
    ...

min_sq_scipy = minimize(..., x0 = ...)[...][...]

In [None]:
grader.check("q7")

**Question 8.** For the constant model, the value of `c` that minimizes the mean squared error is the mean of the data. Assign `min_sq_computed` to the mean of the tips dataset, and compare it to the values you found in **Question 6** and **Question 7**.


In [None]:
min_sq_computed = ...
min_sq_computed

In [None]:
grader.check("q8")

Reflecting on the lab so far, we've now seen 3 ways to find the summary statistic `c` that minimizes the mean squared error:

1. Create a plot of the MSE for the given data array vs. `c` and eyeball the minimizing `c`.

2. Create a function that returns the MSE for a specific data array as a function of `c` and use the scipy `minimize` function to find the exact `c` which minimizes this function.

3. Simply compute the `mean` of the data array.

At this point, you've hopefully convinced yourself that the `mean` of the data is the summary statistic that minimizes mean squared error.
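
One way to see why is a short calculus sketch. This derivation is added for illustration and uses the average loss $R(c)$ defined earlier, with squared loss:

$$R(c) = \frac{1}{n}\sum_{i=1}^n (y_i - c)^2, \qquad \frac{dR}{dc} = -\frac{2}{n}\sum_{i=1}^n (y_i - c)$$

Setting the derivative to zero gives $\sum_{i=1}^n y_i = nc$, so

$$c = \frac{1}{n}\sum_{i=1}^n y_i$$

Since $\frac{d^2R}{dc^2} = 2 > 0$, this critical point is a minimum.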

# 4. Absolute Loss Function 

$$L\left(y, c \right) = \left| y - c \right|$$

In this section, you will follow the exact same steps as above but for the absolute loss function.

**Question 9.** In the cell below define the function `abs_loss` which returns the absolute loss given a value of `c` and `y_obs`. 


In [None]:
def abs_loss(y_obs, c):
    """
    Purpose
    -------
    Calculate the absolute loss of the observed data and a summary statistic.
    
    Parameters
    ----------
    y_obs: an observed value
    c: A constant representing a summary statistic
    
    Returns
    -------
    The absolute loss between the observation and the summary statistic.
    """
    ...

In [None]:
grader.check("q9")

<!-- BEGIN QUESTION -->

**Question 10.** Let us now consider the case where `y_obs` equals 10. For arbitrary values of `c`, plot the absolute loss using the function you implemented in the previous question. Don't forget to label your graph.


In [None]:
y_obs = ...

# arbitrary values of c
c_values = np.linspace(0, 20, 100)

# plot
plt.xlabel(r"Choice for $c$")
plt.ylabel(r"...")
plt.plot(..., ...);

<!-- END QUESTION -->

**Question 11.** Define the `mean_absolute_error` function which computes the mean absolute error given the data and a value for `c`. Assume that `data` will be a `numpy` array.


In [None]:
def mean_absolute_error(c, data):
    """
    Purpose
    -------
    Calculate the mean absolute error.
    
    Parameters
    ----------
    c: A constant representing a summary statistic
    data: A numpy array of data values
    
    Returns
    -------
    The value of the mean absolute error.
    """
    ...

In [None]:
grader.check("q11")

<!-- BEGIN QUESTION -->

**Question 12.** In the cell below, plot the mean absolute error for different `c` values on the `tips` dataset. Note that `c_values` is given. Make sure to label the axes on your plot.


In [None]:
c_values = np.linspace(0, 6, 100)
mae = ...

plt.xlabel(r"Choice for $c$")
plt.ylabel(r"...")
plt.plot(..., ...);

<!-- END QUESTION -->

You should see that the plot looks somewhat similar to the plot of the mean squared error. However, there are three key differences between the plots of the MSE and MAE.

1. The minimizing $c$ is different.

2. The plot for MAE increases linearly instead of quadratically as we move far away from the minimizing $c$.

3. The plot for MAE is piecewise linear instead of smooth. Each change in slope happens at the same $c$ value as a data point in our dataset.

<!-- BEGIN QUESTION -->

**Question 13.** To minimize the function, let's zoom in closer to the graph to get a better idea of the value of the minimizing `c`. Plot the mean absolute error again using the given `c_values` below. Don't forget to label your axes.

**Note:** You will need to generate the list of `mae` values again.


In [None]:
c_values = np.linspace(2.7, 3.02, 100)
mae = ...

plt.xlabel(r"Choice for $c$")
plt.ylabel(r"...")
plt.plot(..., ...);

<!-- END QUESTION -->

This time, observe that the function is piecewise linear and has a slope of zero near its minimum. Because of the large flat region at the minimum, there are multiple values of `c` that minimize the $L_1$ loss.
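
A small worked example, using the hypothetical dataset $\{1, 2, 4, 7\}$ (not the tips data), shows how such a flat region can arise. Evaluating the average absolute loss $R(c) = \frac{1}{n}\sum_{i=1}^n |y_i - c|$ at three values of $c$ gives

$$R(2) = \frac{1 + 0 + 2 + 5}{4} = 2, \qquad R(3) = \frac{2 + 1 + 1 + 4}{4} = 2, \qquad R(4) = \frac{3 + 2 + 0 + 3}{4} = 2$$

Every $c$ between 2 and 4 gives the same average loss for this dataset, while moving outside that interval increases it; for example, $R(1) = \frac{0 + 1 + 3 + 6}{4} = 2.5$.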

**Question 14.** Give a `c` rounded to the nearest tenth that minimizes $L_1$ loss. By "rounded to the nearest tenth" we mean you'd say 7.6 instead of 7.55.


In [None]:
min_observed_mae = ...
min_observed_mae

In [None]:
grader.check("q14")

# 5. Find the Minimizing Value for the Tips Dataset 

**Question 15.** As before, we will use the `minimize` function to find a solution. Assign `min_abs_scipy` to the value of `c` that minimizes the MAE according to the `minimize` function for the `tips` data. 

**Note:** Depending on the `x0` value you specify, you will get different results.


In [None]:
def mean_absolute_error_with_one_variable(c):
    """
    Purpose
    -------
    Calculate the mean absolute error with one variable.
    
    Parameters
    ----------
    c: A constant representing a summary statistic 
    
    Returns
    -------
    The value of the mean absolute error.
    """
    ...

min_abs_scipy = minimize(..., x0 = ...)[...][...]

In [None]:
grader.check("q15")

Just like the MSE, there are three ways to compute the summary statistic `c` that minimizes the MAE:

1. Create a plot of the MAE for the given data array vs. `c` and eyeball a minimizing `c`.

2. Create a function that returns the MAE for a specific data array as a function of `c` and use the scipy `minimize` function to find an exact `c` which minimizes this function.

3. Simply compute the ... of the data array.

Try to figure out what to substitute in for the ... above. To do this, try out various statistics functions provided by `np`. Click [here](https://docs.scipy.org/doc/numpy/reference/routines.statistics.html) to view the `numpy` documentation.

**Question 16.** Assign `min_abs_computed` to the correct summary statistic using method 3 from the list above.


In [None]:
min_abs_computed = ...
min_abs_computed

In [None]:
grader.check("q16")

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

When done exporting, download the .zip file by `SHIFT`-clicking on the file name and selecting **Save Link As**. Or, find the .zip file in the left side of the screen and right-click and select **Download**. You'll submit this .zip file for the assignment in Canvas to Gradescope for grading.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)