<a href="https://colab.research.google.com/github/ellylai/15-112-TP/blob/main/hw1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 10-315 S25 HW1: Nearest Mean Classification and Neural Networks

<img src="https://www.cs.cmu.edu/~10315/figures/nearest_mean_summary.png" width="400"/>
<img src="https://www.cs.cmu.edu/~10315/figures/autoencoder_network_summary.png" width="800"/>

# Table of Contents

## [Setup (cells to just run)](#setup)
## [Q0: Autograder introduction](#q0)
## [Q1: Nearest mean classification](#q1)
## [Q2: Neural Networks](#q2)

# Setup <a class="anchor" name="setup"></a>

You'll need to run these cells, but you don't have to worry about their contents. You can look through them if you'd like of course.

In [None]:
# Install otter-grader if needed

import importlib.util

if importlib.util.find_spec("otter") is None:
    !pip install otter-grader

In [None]:
# Copy additional files if needed

import os

if not os.path.isdir("tests") or not os.path.isdir("autoencoder"):
    !curl https://www.cs.cmu.edu/~10315/assignments/hw1_additional_files.zip --output hw1_additional_files.zip
    !unzip hw1_additional_files.zip

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re

import otter

In [None]:
grader = otter.Notebook()

## Functions to load data

You'll need to run these cells, but you don't have to worry about their contents. You can look through them if you'd like of course.

In [None]:
def load_dataset(filename, num_train):
    df = pd.read_csv(filename)

    # Pull label names out of label column header
    # Assuming the format is "Label (label1,label2,label3)"
    label_header = df.columns[0]
    if '(' in label_header:
        label_names_string = re.split(r'\(|\)', label_header)[1]
        label_names = label_names_string.split(',')
    else:
        label_names = [label_header]

    feature_names = list(df.columns[1:])

    y_train = df.values[:num_train, 0]
    y_test = df.values[num_train:, 0]

    x_train = df.values[:num_train, 1:]
    x_test = df.values[num_train:, 1:]

    return x_train, y_train, x_test, y_test, label_names, feature_names

In [None]:
def load_animals_dataset():
    num_train = 60
    return load_dataset('http://www.cs.cmu.edu/~10315/data/animals1.csv', num_train)

In [None]:
def load_iris_dataset():
    num_train = 120
    return load_dataset('http://www.cs.cmu.edu/~10315/data/iris.csv', num_train)

In [None]:
def load_iris_dataset_two_features():
    x_train, y_train, x_test, y_test, label_names, feature_names = load_iris_dataset()
    # Grab only the first two columns
    x_train = x_train[:, :2]
    x_test = x_test[:, :2]
    return x_train, y_train, x_test, y_test, label_names, feature_names

In [None]:
# Cache the digit dataset so it is downloaded and parsed only once per session
cached_digit_dataset = None

def load_digit_dataset():
    global cached_digit_dataset

    # Check to see if we have already loaded this before
    if cached_digit_dataset is not None:
        return cached_digit_dataset

#     num_train = 60000
#     cached_digit_dataset = load_dataset('http://www.cs.cmu.edu/~10315/data/mnist.csv', num_train)
    num_train = 900
    cached_digit_dataset = load_dataset('http://www.cs.cmu.edu/~10315/data/mnist_1000.csv', num_train)

    return cached_digit_dataset

In [None]:
# Source:
#     https://archive-beta.ics.uci.edu/ml/datasets/metro+interstate+traffic+volume
#     Hogue, John. (2019). Metro Interstate Traffic Volume. UCI Machine Learning Repository.
def load_traffic_dataset():
    num_train = 300
    return load_dataset('http://www.cs.cmu.edu/~10315/data/Metro_Interstate_Traffic_Volume_weekday_hour_small.csv', num_train)

## Functions to plot data and display images

You'll need to run these cells, but you don't have to worry about their contents. You can look through them if you'd like of course.

In [None]:
# Set the default font size for matplotlib
plt.rcParams.update({'font.size': 8})

In [None]:
def plot_points(x_data, feature_names=None):
    label_colors = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple", "tab:brown", "tab:pink", "tab:gray", "tab:olive", "tab:cyan"]
    color = label_colors[0]

    if feature_names is None:
        feature_names = ['Feature 1', 'Feature 2']

    plt.figure(figsize=(4,4))
    plt.plot(x_data[:, 0], x_data[:, 1], 'o', markersize=6,
             markerfacecolor="None", markeredgecolor=color)

    plt.xlabel(feature_names[0])
    plt.ylabel(feature_names[1])

In [None]:
def plot_labeled_points(x_data, y_data, label_names, feature_names=None):
    label_colors = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple", "tab:brown", "tab:pink", "tab:gray", "tab:olive", "tab:cyan"]

    if feature_names is None:
        feature_names = ['Feature 1', 'Feature 2']

    num_labels = len(label_names)

    plt.figure(figsize=(4,4))
    for label in range(num_labels):
        # Numpy trick to get just the rows of x_data corresponding to rows of
        # y_data that equal the current label.
        # This will still have the same number of columns as x_data but only
        # a subset of the rows.
        x_data_subset = x_data[y_data == label]

        plt.plot(x_data_subset[:, 0], x_data_subset[:, 1], 'o', markersize=6,
                 markerfacecolor="None", markeredgecolor=label_colors[label], label=label_names[label])

    plt.legend()
    plt.xlabel(feature_names[0])
    plt.ylabel(feature_names[1])

In [None]:
def plot_mean_points(means, label_names=None, feature_names=None):
    label_colors = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple", "tab:brown", "tab:pink", "tab:gray", "tab:olive", "tab:cyan"]

    if feature_names is None:
        feature_names = ['Feature 1', 'Feature 2']

    if label_names is None:
        num_labels = 1
    else:
        num_labels = len(label_names)

    for label in range(num_labels):
        if label_names is None:
            label_name = ""
        else:
            label_name = f'Mean {label_names[label]}'
        plt.plot(means[label][0], means[label][1], 's', markersize=6,
                 color=label_colors[label], label=label_name)

    if label_names is not None:
        plt.legend()
    plt.xlabel(feature_names[0])
    plt.ylabel(feature_names[1])

In [None]:
def plot_regression_points(x_data, y_data, x_label="Input", y_label="Output", fig=None, color="tab:blue", fill=False, label=None):
#     if fig is None:
#         fig = plt.figure(figsize=(4,4))

    if fill:
        fill_color = color
    else:
        fill_color = "None"

    plt.plot(x_data, y_data, 'o', markersize=5,
             markerfacecolor=fill_color, markeredgecolor=color, label=label)

    plt.xlabel(x_label)
    plt.ylabel(y_label)

In [None]:
def plot_line(x_data, y_data, x_label=None, y_label=None, fig=None, color="tab:green", label=None):
#     if fig is None:
#         fig = plt.figure(figsize=(4,4))

    plt.plot(x_data, y_data, '-', linewidth=2, color=color, label=label)

    if x_label is not None:
        plt.xlabel(x_label)
    if y_label is not None:
        plt.ylabel(y_label)

In [None]:
def plot_network_prediction(x_train, y_train, x_new,
                            network, w11, b1, w21, b2, w31, w32, b3,
                            x_label="Hour of day", y_label="Traffic volume",
                            x_min=0, x_max=24):
    fig = plt.figure(figsize=(5,4))

    if x_train is not None and y_train is not None:
        plot_regression_points(x_train, y_train, x_label=x_label, y_label=y_label, label="Training data")

    x_grid = np.linspace(x_min, x_max, 100)
    y_grid = np.zeros(len(x_grid))
    for i in range(len(x_grid)):
        y_grid[i] = network(x_grid[i], w11, b1, w21, b2, w31, w32, b3)

    plot_line(x_grid, y_grid, color="tab:green", label="Prediction", x_label=x_label, y_label=y_label)

    if x_new is not None:
        y_new = network(x_new, w11, b1, w21, b2, w31, w32, b3)
        plt.plot(x_new, y_new, 'o', color="tab:purple", markersize=8, label=f'Prediction x={x_new}')

    plt.legend()

In [None]:
def show_digit(x, y_meas=None, y_pred=None):
    plt.figure(figsize=(1,1))
    plt.imshow(x.reshape((28,28)), cmap='gray')
    plt.axis('off')

    if y_meas is not None:
        title = f'Label: {y_meas}'

        if y_pred is not None:
            title += f' Predicted: {y_pred}'

        plt.title(title)

# Question 0: Autograder introduction <a class="anchor" name="q0"></a>

A quick warm-up question so you can see the workflow used throughout this assignment:
- Implement a function for a given question
- Run cells to test your code on your own
- Run `grader.check` to execute local tests for a question

Here is a simple function for you to implement. Fill this out to implement $x^2$.

*Note*: Don't forget to `return` the result.

In [None]:
def square(x):
    """ Compute the square of input value x

        Input:
        x: numerical value

        Returns: x squared as a numerical value
    """
    ...

Cells that you can run to make sure your code is working before running the local test.

Are the output values what you expect them to be?

In [None]:
x = 2
square(x)

In [None]:
x = -1
square(x)

In [None]:
z = 3
square(z)

In [None]:
a = -3
square(a)

Feel free to add as many more test/debug cells as you want! You can add new cells anywhere you want.

Once you finish testing and debugging with your cells above, you can run the local autograder test by running `grader.check(<test_name_string>)`. These will run the same "public" tests that you will see later when you submit your notebook to Gradescope.

In [None]:
grader.check("Q0")

### Submit your code to Gradescope early and often

There is no limit on the number of submissions to Gradescope, so as you complete parts of the assignment it is a really good idea to save your notebook and upload it to Gradescope.

Not all of the tests are included in the local autograder. Some of the tests are "hidden" and only run in the server autograder on Gradescope.

Before continuing with the rest of the assignment, go ahead and save your notebook (or click File->Download->Download .ipynb) and then upload your hw1.ipynb file to Gradescope under assignment HW1 (programming).

# Question 1: Nearest Mean Classification <a class="anchor" name="q1"></a>

<img src="https://www.cs.cmu.edu/~10315/figures/nearest_mean_summary.png" width="400"/>

## Q1: Table of Contents

* [Q1a: Computing the mean](#q1a)
* [Q1b: Compute the mean for each label](#q1b)
* [Q1c: Nearest mean classification](#q1c)
* [Q1d: Classification performance measure](#q1d)
* [Q1e: Putting it all together](#q1e)
* [Q1f: Exploring results](#q1f)

## Q1 Overview

In this question, we'll implement a modification of the standard nearest neighbor algorithm.

### Nearest Neighbor Classification

Standard nearest neighbor classification:
* Use all of our training data
* When predicting a label for a new point, compute the distances from the new point to *all training points*
* Return the label associated with the closest *training data point*

<img src="http://www.cs.cmu.edu/~10315/figures/nearest_process.png" width="850"/>

### Nearest Mean Classification

We first compute the mean point for each label in our training data:

<img src="https://www.cs.cmu.edu/~10315/figures/nearest_means.png" width="400"/>

Nearest mean classification:
* Use just the mean point for each label
* When predicting a label for a new point, compute the distances from the new point to *each mean*
* Return the label associated with the closest *mean* data point


<img src="https://www.cs.cmu.edu/~10315/figures/nearest_mean_process.png" width="850"/>

## Q1a: Computing the mean <a class="anchor" name="q1a"></a>

Compute the mean of N M-dimensional data points. To accomplish this, we take the mean of each dimension independently. <span style="color:red">Note:</span> You must use NumPy in this function; no loops allowed. Points will be taken off during manual grading if you use loops, even though the autograder would still give full points.

In the example below, there are N=4 2-dimensional points. The mean of feature 1 across all four points is 3.5 and the mean of feature 2 across all four points is 2.0:

$$\text{mean}_1 = \frac{1}{4}\big(2.0 + 3.0 + 4.0 + 5.0\big) = 3.5$$

$$\text{mean}_2 = \frac{1}{4}\big(3.0 + 3.0 + 1.0 + 1.0\big) = 2.0$$
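
As a quick sanity check on the two sums above, the per-feature arithmetic can be reproduced in plain Python. This is only a sketch for checking numbers by hand: the list `points` is read off the sums above (the pairing of coordinates into points is an assumption), and the generator-based style is deliberately not the NumPy approach this question requires.

```python
# The four 2-D points from the worked example (pairing assumed from the sums above)
points = [(2.0, 3.0), (3.0, 3.0), (4.0, 1.0), (5.0, 1.0)]

num_points = len(points)
# Average each feature (column) independently
mean_1 = sum(p[0] for p in points) / num_points
mean_2 = sum(p[1] for p in points) / num_points

print(mean_1, mean_2)  # 3.5 2.0
```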

Generically, for dataset $\mathcal{D} = \{\mathbf{x}^{(i)}\}_{i=1}^{N}$, where $\mathbf{x}^{(i)} \in \mathbb{R}^M$  $\forall i \in \{1, \dots, N\}$, the mean, $\boldsymbol{\mu} \in \mathbb{R}^M$ of the dataset $\mathcal{D}$ is:

$$\boldsymbol{\mu} = \frac{1}{N}\sum_{i=1}^N \mathbf{x}^{(i)}$$

<img src="https://www.cs.cmu.edu/~10315/figures/mean_example.png" width="400"/>

In [None]:
def compute_mean(x_data):
    """ Compute the mean of datapoints stored as rows in x_data

        Input:
        x_data: Numpy array with shape (N, M) where the rows are filled with N M-dimensional data points

        Returns: Numpy array with shape (M,) containing the mean of x_data
    """
    ...

Cells that you can run to make sure your code is working before running the local tests.

Are the results what you expect them to be?

Feel free to modify these or add more cells to help you test your code.

In [None]:
points = np.array([[5, 0.0],
                   [3, 0.0],
                   [2, 0.0],
                   [4, 0.0]])
mean = compute_mean(points)
plot_points(points)
plot_mean_points([mean])
mean

In [None]:
points = np.array([[5, 1],
                   [3, 3],
                   [2, 3],
                   [4, 1]])
mean = compute_mean(points)
plot_points(points)
plot_mean_points([mean])
plt.ylim(-0.1, 4.1)
mean

In [None]:
points = np.array([[5, -2.0],
                   [3, -3.0],
                   [2, -1.0],
                   [4, -4.0]])
mean = compute_mean(points)
plot_points(points)
plot_mean_points([mean])
mean

In [None]:
points = np.array([[2.8, 2.7],
                   [3.6, 1.2],
                   [4.0, 1.9],
                   [3.7, 4.9],
                   [3.1, 3.4],
                   [3.0, 2.6],
                   [3.0, 1.8],
                   [3.4, 1.6],
                   [3.3, 1.7],
                   [3.3, 1.3]])

mean = compute_mean(points)
plot_points(points)
plot_mean_points([mean])
mean

In [None]:
# Load the iris dataset (with just two features)
x_train, y_train, x_test, y_test, label_names, feature_names = load_iris_dataset_two_features()

mean = compute_mean(x_train)
plot_points(x_train, feature_names=feature_names)
plot_mean_points([mean], feature_names=feature_names)
mean

### More than two features

If you implemented the `compute_mean` function above correctly, no changes are needed. Your code should also work on these example cells.

In [None]:
points = np.array([[5, 0.0, 0.0, 0.0],
                   [3, 0.0, 0.0, 0.0],
                   [2, 0.0, 0.0, 0.0],
                   [4, 0.0, 0.0, 0.0]])
mean = compute_mean(points)
mean

In [None]:
points = np.array([[5,  0.0, 2.0, 100.0],
                   [3, -1.0, 2.0, 101.0],
                   [2,  0.0, 2.0, 102.0],
                   [4, -1.0, 2.0, 103.0]])
mean = compute_mean(points)
mean

In [None]:
x_train, y_train, x_test, y_test, label_names, feature_names = load_iris_dataset()
print(x_train.shape)
mean = compute_mean(x_train)
mean

### Mean for images

If you implemented the `compute_mean` function above correctly, no changes are needed. Your code should also work on these example cells.

Load the MNIST handwritten digit dataset and display the first four images.

In [None]:
# This may take a minute or so

x_train, y_train, x_test, y_test, label_names, feature_names = load_digit_dataset()
print('x_train.shape:', x_train.shape)
print('y_train.shape:', y_train.shape)
print('x_test.shape:', x_test.shape)
print('y_test.shape:', y_test.shape)
print('label_names:', label_names)

In [None]:
print(x_train[0].shape)
print(x_train[0])
show_digit(x_train[0], y_train[0])

In [None]:
show_digit(x_train[1], y_train[1])

In [None]:
show_digit(x_train[2], y_train[2])

In [None]:
show_digit(x_train[3], y_train[3])

Average all of the zeros and then average all of the fours. Your `compute_mean` function should work on these example cells also.

In [None]:
# Numpy trick to get just the rows of x_data corresponding to rows of
# y_data that equal the current label.
# This will still have the same number of columns as x_data but only
# a subset of the rows.
label = 0
x_train_only_zeros = x_train[y_train == label]
print('x_train_only_zeros.shape:', x_train_only_zeros.shape)

mean = compute_mean(x_train_only_zeros)
print('mean.shape:', mean.shape)

In [None]:
show_digit(mean, label)

In [None]:
# Numpy trick to get just the rows of x_data corresponding to rows of
# y_data that equal the current label.
# This will still have the same number of columns as x_data but only
# a subset of the rows.
label = 4
x_train_only_fours = x_train[y_train == label]
print('x_train_only_fours.shape:', x_train_only_fours.shape)

mean = compute_mean(x_train_only_fours)
print('mean.shape:', mean.shape)

In [None]:
show_digit(mean, label)

For fun, we could average all of the zero and one digits :) Your `compute_mean` function should work on this example cell also.

In [None]:
# Get all x data for labels <= 1
x_train_zeros_ones = x_train[y_train <= 1]


mean = compute_mean(x_train_zeros_ones)
show_digit(mean)

### Run the local autograder tests

In [None]:
grader.check("Q1a")

### Again, remember to keep submitting to Gradescope as you go.

This will help you to make sure you are passing the hidden tests as well as the local tests. It also is a good reminder to save your work and make sure you officially collect points in Gradescope. You wouldn't want to finish all of your work and then realize at the last minute that there was something wrong with Q1 that caused problems for your whole assignment in Gradescope.

## Q1b: Compute the mean for each label <a class="anchor" name="q1b"></a>

To build our model for nearest mean classification, we first need to compute the mean for each label.

For each label:
1. Collect all of the input points with that label
2. Compute the mean for that subset of input points
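
Step 1 relies on NumPy boolean indexing, the same trick used in the plotting helpers in the Setup section. Here is a minimal sketch on made-up data; the arrays `toy_x` and `toy_y` are hypothetical and not part of the assignment.

```python
import numpy as np

# Hypothetical toy data: four 2-D points with labels 0 or 1
toy_x = np.array([[1.0, 7.0],
                  [2.0, 5.0],
                  [3.0, 9.0],
                  [4.0, 6.0]])
toy_y = np.array([1, 0, 0, 1])

# Comparing an array to a scalar gives one boolean per row
mask = (toy_y == 0)

# Indexing with the mask keeps only the rows where it is True
subset = toy_x[mask]
print(subset.shape)  # (2, 2)
```

The subset keeps all M columns of the original array and only the rows whose label matches.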

<img src="https://www.cs.cmu.edu/~10315/figures/nearest_means.png" width="400"/>

<span style="color:red">Note:</span> Looping over the number of labels is acceptable. Other than that, you must use NumPy in this function rather than loops. Points will be taken off during manual grading for any other loops, even though the autograder would still give full points.

In [None]:
def compute_mean_for_each_label(x_data, y_data, num_labels):
    """ For each label value from 0 to num_labels-1, compute the mean of the datapoints in the rows of x_data
        whose corresponding entries in y_data equal that label

        Input:
        x_data: Numpy array with shape (N, M) where the rows are filled with N M-dimensional data points
        y_data: Numpy array with shape (N,). The label value for each of the N points in x_data
        num_labels: number of distinct label values (labels run from 0 to num_labels-1)

        Returns: Return a list of length num_labels, where i-th entry in the list is a Numpy array with
                 shape (M,) containing the mean of datapoints in x_data with label equal to i.
    """
    ...

Cells that you can run to make sure your code is working before running the local tests.

Are the results what you expect them to be?

Feel free to modify these or add more cells to help you test your code.

In [None]:
x_data = np.array([[8, 10],
                   [4, 22],
                   [6, 20],
                   [9, 12]])
y_data = np.array([1, 0, 0, 1])

num_labels = 2
label_names = ['A', 'B']

means = compute_mean_for_each_label(x_data, y_data, num_labels)

plot_labeled_points(x_data, y_data, label_names)
plot_mean_points(means, label_names)
means

In [None]:
x_train, y_train, x_test, y_test, label_names, feature_names = load_animals_dataset()

num_labels = len(label_names)

means = compute_mean_for_each_label(x_train, y_train, num_labels)

plot_labeled_points(x_train, y_train, label_names)
plot_mean_points(means, label_names)
means

In [None]:
x_train, y_train, x_test, y_test, label_names, feature_names = load_iris_dataset_two_features()

num_labels = len(label_names)

means = compute_mean_for_each_label(x_train, y_train, num_labels)

plot_labeled_points(x_train, y_train, label_names)
plot_mean_points(means, label_names)
means

In [None]:
# Load iris dataset with all four features
x_train, y_train, x_test, y_test, label_names, feature_names = load_iris_dataset()

num_labels = len(label_names)

means = compute_mean_for_each_label(x_train, y_train, num_labels)

print('Feature names:', feature_names)
for label in range(num_labels):
    print(f'Mean {label_names[label]}: {means[label]}')

In [None]:
x_train, y_train, x_test, y_test, label_names, feature_names = load_digit_dataset()

In [None]:
num_labels = len(label_names)

means = compute_mean_for_each_label(x_train, y_train, num_labels)

for label in range(num_labels):
    show_digit(means[label], label)

### Run the local autograder tests for this question

In [None]:
grader.check("Q1b")

## Q1c: Nearest mean classification <a class="anchor" name="q1c"></a>

Given a list of mean points per label and a new point, predict the label for the new point.

1. Find the closest mean by computing the distances from the new point to each mean
2. Return the label (index of the list of means) of the closest mean
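
For step 2, one option (an assumption about approach, not a required method) is `np.argmin`, which returns the index of the smallest entry in an array. The distances below are made-up numbers standing in for the output of step 1.

```python
import numpy as np

# Hypothetical distances from a new point to the means of labels 0, 1, and 2
example_distances = np.array([2.5, 0.7, 1.9])

# Index of the smallest distance, which doubles as the predicted label
closest_label = int(np.argmin(example_distances))
print(closest_label)  # 1
```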

<img src="https://www.cs.cmu.edu/~10315/figures/nearest_mean_process.png" width="750"/>

### Distance function in higher dimensions

We may be used to seeing the distance function in 2-D:

$$dist\left(\mathbf{u}, \mathbf{v}\right) = \sqrt{\left(u_1 - v_1\right)^2 + \left(u_2 - v_2\right)^2}$$

The distance function in 3-D is similarly:

$$dist\left(\mathbf{u}, \mathbf{v}\right) = \sqrt{\left(u_1 - v_1\right)^2 + \left(u_2 - v_2\right)^2  + \left(u_3 - v_3\right)^2} = \left(\sum_{i=1}^3 \left(u_i - v_i\right)^2\right)^\frac{1}{2}$$

The distance function in N-D just continues this pattern:

$$dist\left(\mathbf{u}, \mathbf{v}\right) = \left(\sum_{i=1}^N \left(u_i - v_i\right)^2\right)^\frac{1}{2}$$

We can also write this using linear algebra with vector subtraction and the L-2 norm:

$$dist\left(\mathbf{u}, \mathbf{v}\right) = \left\|\mathbf{u} - \mathbf{v}\right\|_2$$

where the L-2 norm is defined as $\left\|\mathbf{z}\right\|_2 = \left(\sum_{i=1}^N {z_i}^2\right)^\frac{1}{2}$.
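
As a worked example (numbers chosen purely for illustration), take $\mathbf{u} = (1, 2, 3)$ and $\mathbf{v} = (4, 6, 3)$:

$$dist\left(\mathbf{u}, \mathbf{v}\right) = \sqrt{\left(1 - 4\right)^2 + \left(2 - 6\right)^2 + \left(3 - 3\right)^2} = \sqrt{9 + 16 + 0} = 5$$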


### Implement predict_nearest_mean

<span style="color:red">Note:</span> Looping over the number of labels is acceptable. Other than that, you must use NumPy in these functions rather than loops. Points will be taken off during manual grading for any other loops, even though the autograder would still give full points.

In [None]:
# It may be helpful to implement and use this multidimensional distance function
#
# Note: We are using the letter M as the number of dimensions (rather than N) to start
# getting used to our convention of using M to represent the number of features (of a
# single point) and N for the number of data points
def distance(point1, point2):
    """ Return the distance between point1 and point2

        Input:
        point1: Numpy array with shape (M,)
        point2: Numpy array with shape (M,)

        Returns: distance as a single number
    """

    ...

In [None]:
def predict_nearest_mean(means_for_each_label, x_new):
    """ Determine which mean in means_for_each_label is closest to x_new
        and return the associated label

        For example, if the means_for_each_label[2] is the closest to x_new,
        then this function should return 2

        Input:
        means_for_each_label: list of length K, where each entry in the list
            is a Numpy array with shape (M,) representing the mean point for
            each label class. K is the number of possible labels and M is the
            number of features in the data.
        x_new: Numpy array with shape (M,)

        Returns:
        Best label as an integer between 0 and K-1, inclusive
    """

    ...

Cells that you can run to make sure your code is working before running the local tests.

Are the results what you expect them to be?

Feel free to modify these or add more cells to help you test your code.

In [None]:
means = [np.array([0, 0]),
         np.array([1, 0]),
         np.array([2, 0])]

new_point = np.array([0, 1])

predicted_label = predict_nearest_mean(means, new_point)
predicted_label

In [None]:
# Visualize the above results
plt.figure(figsize=(4,4))
plot_mean_points(means, label_names=[0,1,2])
plt.plot(new_point[0], new_point[1], 'm*', markersize=10, label=f"New predicted as {predicted_label}")
plt.legend()

In [None]:
means = [np.array([0, 0]),
         np.array([0, 1]),
         np.array([0, 2])]

new_point = np.array([1, 2])

predicted_label = predict_nearest_mean(means, new_point)
predicted_label

In [None]:
# Visualize the above results
plt.figure(figsize=(4,4))
plot_mean_points(means, label_names=[0,1,2])
plt.plot(new_point[0], new_point[1], 'm*', markersize=10, label=f"New predicted as {predicted_label}")
plt.legend()

In [None]:
x_train, y_train, x_test, y_test, label_names, feature_names = load_animals_dataset()

# Skip the training and define our own means
means = np.array([[13.4,  8.0],
       [28.0, 22.9],
       [25.5, 10.6],
       [10.5,  4.7],
       [17.6,  6.4]])

# Pick one test point
test_index = 0
new_point = x_test[test_index]

predicted_label = predict_nearest_mean(means, new_point)
predicted_label

In [None]:
# Visualize the above results
plt.figure(figsize=(4,4))
plot_mean_points(means, label_names)
plt.plot(new_point[0], new_point[1], 'm*', markersize=10, label=f"New predicted as {label_names[predicted_label]}")
plt.legend()

In [None]:
x_train, y_train, x_test, y_test, label_names, feature_names = load_iris_dataset_two_features()

# Skip the training and define our own means
means = np.array([[5.0, 3.4],
                  [5.9, 2.8],
                  [6.6, 3.0]])

# Pick one test point
test_index = 3
new_point = x_test[test_index]

predicted_label = predict_nearest_mean(means, new_point)
predicted_label

In [None]:
# Visualize the above results
plt.figure(figsize=(4,4))
plot_mean_points(means, label_names)
plt.plot(new_point[0], new_point[1], 'm*', markersize=10, label=f"New predicted as {label_names[predicted_label]}")
plt.legend()

In [None]:
x_train, y_train, x_test, y_test, label_names, feature_names = load_digit_dataset()

num_labels = len(label_names)
means = compute_mean_for_each_label(x_train, y_train, num_labels)

# Pick one test point
test_index = 0
new_point = x_test[test_index]

predicted_label = predict_nearest_mean(means, new_point)

measured_label = y_test[test_index]

show_digit(new_point, y_meas=label_names[measured_label], y_pred=label_names[predicted_label])

In [None]:
x_train, y_train, x_test, y_test, label_names, feature_names = load_digit_dataset()

num_labels = len(label_names)
means = compute_mean_for_each_label(x_train, y_train, num_labels)

# Pick one test point
test_index = 1
new_point = x_test[test_index]

predicted_label = predict_nearest_mean(means, new_point)

measured_label = y_test[test_index]

show_digit(new_point, y_meas=label_names[measured_label], y_pred=label_names[predicted_label])

### Run the local autograder tests for this question

In [None]:
grader.check("Q1c")

## Q1d: Classification performance measure <a class="anchor" name="q1d"></a>

Using NumPy here is convenient and good practice, but it isn't required.
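
The error rate is the fraction of entries where the two arrays differ, and the accuracy is the fraction where they agree, so the two always sum to 1. As a small worked example (illustrative values), suppose `y_meas` is $[1, 1, 1, 1]$ and `y_pred` is $[0, 1, 1, 1]$. Exactly one of the $N = 4$ entries differs:

$$\text{error rate} = \frac{1}{4} = 0.25, \qquad \text{accuracy} = 1 - 0.25 = 0.75$$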

In [None]:
def classification_error_rate(y_meas, y_pred):
    """ Calculate the fraction of entries in the y_meas array that differ from the
        corresponding entries in the y_pred array.

        Input:
        y_meas: Numpy array with shape (N,)
        y_pred: Numpy array with shape (N,)

        Return: error rate as a numerical value from 0.0 to 1.0
    """
    ...

In [None]:
def classification_accuracy(y_meas, y_pred):
    """ Calculate the fraction of entries in the y_meas array that are the same as the
        corresponding entries in the y_pred array.

        Input:
        y_meas: Numpy array with shape (N,)
        y_pred: Numpy array with shape (N,)

        Return: accuracy as a numerical value from 0.0 to 1.0
    """
    ...

Cells that you can run to make sure your code is working before running the local tests.

Are the results what you expect them to be?

Feel free to modify these or add more cells to help you test your code.

In [None]:
y_meas = np.array([0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 1])
classification_error_rate(y_meas, y_pred)

In [None]:
y_meas = np.array([1, 1, 1, 1])
y_pred = np.array([1, 1, 1, 1])
classification_error_rate(y_meas, y_pred)

In [None]:
y_meas = np.array([1, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1])
classification_error_rate(y_meas, y_pred)

In [None]:
y_meas = np.array([1, 1, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 1, 1, 1])
classification_error_rate(y_meas, y_pred)

In [None]:
y_meas = np.array([0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1])
classification_error_rate(y_meas, y_pred)

In [None]:
y_meas = np.array([0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1])
classification_accuracy(y_meas, y_pred)

In [None]:
y_meas = np.array([0, 0, 0, 0])
y_pred = np.array([0, 0, 0, 0])
classification_accuracy(y_meas, y_pred)

In [None]:
y_meas = np.array([2, 2, 0, 1])
y_pred = np.array([2, 0, 4, 2])
classification_accuracy(y_meas, y_pred)

### Run the local autograder tests for this question

In [None]:
grader.check("Q1d")

## Q1e: Putting it all together <a class="anchor" name="q1e"></a>

### Provided functions

The `model = train(x_train, y_train, num_labels)` function has been provided for you.

We recommend that you look at this function to see exactly what we mean by "train" in this context.

In [None]:
# Given
# Do NOT change
def train(x_train, y_train, num_labels):
    """ Use training data x_train and y_train to train a nearest mean model.
        The returned model is simply a list of means, one for each label.

        Input:
        x_train: Numpy array with shape (N, M) where the rows are filled with N M-dimensional data points
        y_train: Numpy array with shape (N,) The label value for each of the N points in x_train

        Returns: Return a list of length num_labels, where i-th entry in the list is a Numpy array with
                 shape (M,) containing the mean of datapoints in x_train with label equal to i.
    """
    means_for_each_label = compute_mean_for_each_label(x_train, y_train, num_labels)

    return means_for_each_label

### Implement train_predict_and_measure_performance

In [None]:
# Helper function to predict many points
# A loop over points is acceptable here
def predict_all(means_for_each_label, x_data):
    """ For each input point in x_data, predict the output label y using
        predict_nearest_mean and the provided list of means for each label

        Input:
        means_for_each_label: a list of length K, where each entry in the list
            is a Numpy array with shape (M,) representing the mean point for
            each label class. K is the number of possible labels and M is the
            number of features in the data.
        x_data: Numpy array with shape (N, M) where the rows are filled with N M-dimensional data points

        Returns: Return Numpy array with shape (N,) The predicted label value for each of the N points in x_data
    """

    ...

In [None]:
def train_predict_and_measure_performance(x_train, y_train, x_test, y_test, label_names, feature_names):
    """ Put all the steps together to create and evaluate a machine learning system

        1. Given the training data x_train and y_train, train your nearest mean model
        2. Compute the classification accuracy on the training data
        3. Compute the classification accuracy on the test data

        Input:
        x_train: Numpy array with shape (N_train, M) where the rows are filled with N_train M-dimensional data points
        y_train: Numpy array with shape (N_train,). The label value for each of the N_train points in x_train
        x_test: Numpy array with shape (N_test, M) where the rows are filled with N_test M-dimensional data points
        y_test: Numpy array with shape (N_test,). The label value for each of the N_test points in x_test
        label_names: list of strings where the i-th string is the label name for class i. This is only
                     used for plotting
        feature_names: list of strings where the i-th string is the feature name for the i-th feature. This is only
                       used for plotting

        Returns: (training_accuracy, testing_accuracy). These two values are both numbers between
                 zero and one (they could also be exactly zero or one).
    """

    ...


    # Code to plot data and means. You can use this to visualize data and/or debug.
    # Uncomment the code below to use it
    #
    # Note the variable "means" is not defined. You would have to define it above if
    # you want to use it.

    # Plotting the data only really makes sense if you only have two features
#     if len(feature_names) == 2:
#         plot_labeled_points(x_train, y_train, label_names, feature_names)
#         plot_mean_points(means, label_names, feature_names)


    # Don't forget to return the training and testing accuracy
    ...

Cells that you can run to make sure your code is working before running the local tests.

Feel free to modify these or add more cells to help you test your code.

Change the dataset to see if your code works for all of them.

In [None]:
x_train, y_train, x_test, y_test, label_names, feature_names = load_animals_dataset()
# x_train, y_train, x_test, y_test, label_names, feature_names = load_iris_dataset_two_features()
# x_train, y_train, x_test, y_test, label_names, feature_names = load_iris_dataset()
# x_train, y_train, x_test, y_test, label_names, feature_names = load_digit_dataset()

In [None]:
train_accuracy, test_accuracy = train_predict_and_measure_performance(x_train, y_train, x_test, y_test, label_names, feature_names)

print('Performance on training dataset:')
print(f'    accuracy: {train_accuracy: 0.3f}')

print('Performance on test dataset:')
print(f'    accuracy: {test_accuracy: 0.3f}')

### Run the local autograder tests for this question

In [None]:
grader.check("Q1e")

## Q1f Exploring results <a class="anchor" name="q1f"></a>

One of the most important skills for improving your machine learning models in practice is exploring your model's predictions. It is particularly useful to look at the examples that your model got wrong.

Using a for loop over test points is acceptable here.
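
The scanning pattern described here can be illustrated on toy arrays. This is only a sketch under assumptions: the helper name `first_mismatch_for_label` and the arrays are hypothetical, and the predictions are supplied directly rather than computed from the means as the assignment requires.

```python
import numpy as np

def first_mismatch_for_label(y_true, y_pred, label):
    """Toy illustration (hypothetical helper, not part of the assignment API)."""
    for i in range(len(y_true)):
        # Only consider points whose true label matches the one we care about
        if y_true[i] == label and y_pred[i] != label:
            return i, int(y_pred[i])
    # Every point with this true label was predicted correctly
    return -1, -1

# Made-up true labels and predictions for five points
y_true = np.array([0, 1, 1, 2, 1])
y_pred = np.array([0, 1, 2, 2, 0])

print(first_mismatch_for_label(y_true, y_pred, 1))  # (2, 2)
print(first_mismatch_for_label(y_true, y_pred, 2))  # (-1, -1)
```

Index 1 also has true label 1, but it is predicted correctly, so the scan continues to index 2.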

In [None]:
def find_first_misclassified_datapoint(means_for_each_label, x_test, y_test, correct_label):
    """ Loop through the test data points in order and return the index
        of the first test point that you find where the true label is
        correct_label but your nearest means function predicts some other
        label.

        You have already been given the means for each label, so there is no need
        to train your model.

        For example, return the number 10 if 10 is the first index where both
        of the following are true:
        -- y_test[10] is the same as correct_label
        -- The predicted label for input x_test[10] does not equal y_test[10]

        Input:
        means_for_each_label: a list of Numpy array means, where the i-th entry in the list is a Numpy array with
                shape (M,) containing the mean of datapoints with label equal to i. The length of the
                list corresponds to the number of possible labels.

        x_test: Numpy array with shape (N, M) where the rows are filled with N M-dimensional data points
        y_test: Numpy array with shape (N,). The label value for each of the N points in x_test
        correct_label: The label that you want to find the first misclassified point for. This is an integer
                       between 0 and K-1, inclusively, where K is the number of possible labels.

        Return: first_index, y_pred. Where first_index is the integer index for the first incorrectly
                classified data point with the correct label equal to correct_label and y_pred is your
                predicted label for input x_test[first_index].
                Return (-1, -1) if all points with the correct label equal to correct_label are classified
                correctly.
    """
    ...

In [None]:
x_train, y_train, x_test, y_test, label_names, feature_names = load_animals_dataset()
means = train(x_train, y_train, len(label_names))

# Find the first Beaver (label 1) datapoint that was incorrectly predicted as some other label
correct_label = 1
correct_label_name = label_names[correct_label] # Beaver

first_index, y_pred = find_first_misclassified_datapoint(means, x_test, y_test, correct_label)

if first_index == -1:
    print(f"Could not find a misclassified point for correct label {correct_label}")
else:
    print(f"Test point {first_index} should be {label_names[correct_label]} but was predicted to be {label_names[y_pred]}.")

    plt.figure(figsize=(4,4))
    plot_mean_points(means, label_names, feature_names)
    plt.plot(x_test[first_index, 0], x_test[first_index,1], 'kx', markersize=8, markeredgewidth=2, label=f"Error {first_index}")
    plt.legend()

In [None]:
x_train, y_train, x_test, y_test, label_names, feature_names = load_animals_dataset()
means = train(x_train, y_train, len(label_names))

for correct_label in range(len(label_names)):
    first_index, y_pred = find_first_misclassified_datapoint(means, x_test, y_test, correct_label)
    if first_index == -1:
        print(f"Could not find a misclassified point for correct label {correct_label}")
    else:
        print(f"Test point {first_index} should be {label_names[correct_label]} but was predicted to be {label_names[y_pred]}.")

        plt.figure(figsize=(4,4))
        plot_mean_points(means, label_names, feature_names)
        plt.plot(x_test[first_index, 0], x_test[first_index,1], 'kx', markersize=8, markeredgewidth=2, label=f"Error {first_index}")
        plt.legend()

In [None]:
x_train, y_train, x_test, y_test, label_names, feature_names = load_iris_dataset_two_features()
means = train(x_train, y_train, len(label_names))

for correct_label in range(len(label_names)):
    first_index, y_pred = find_first_misclassified_datapoint(means, x_test, y_test, correct_label)
    if first_index == -1:
        print(f"Could not find a misclassified point for correct label {correct_label}")
    else:
        print(f"Test point {first_index} should be {label_names[correct_label]} but was predicted to be {label_names[y_pred]}.")

        plt.figure(figsize=(4,4))
        plot_mean_points(means, label_names, feature_names)
        plt.plot(x_test[first_index, 0], x_test[first_index,1], 'kx', markersize=8, markeredgewidth=2, label=f"Error {first_index}")
        plt.legend()

In [None]:
x_train, y_train, x_test, y_test, label_names, feature_names = load_digit_dataset()
means = train(x_train, y_train, len(label_names))

for correct_label in range(len(label_names)):
    first_index, y_pred = find_first_misclassified_datapoint(means, x_test, y_test, correct_label)
    if first_index == -1:
        print(f"Could not find a misclassified point for correct label {correct_label}")
    else:
        print(f"Test point {first_index} should be {label_names[correct_label]} but was predicted to be {label_names[y_pred]}.")

        show_digit(x_test[first_index], correct_label, y_pred)

### Run the local autograder tests for this question

In [None]:
grader.check("Q1f")

### Links back to questions

* [Q0: Autograder introduction](#q0)
* [Q1a: Computing the mean](#q1a)
* [Q1b: Compute the mean for each label](#q1b)
* [Q1c: Nearest mean classification](#q1c)
* [Q1d: Classification performance measure](#q1d)
* [Q1e: Putting it all together](#q1e)
* [Q1f: Exploring results](#q1f)

# Question 2: Neural Networks <a class="anchor" name="q2"></a>

<img src="https://www.cs.cmu.edu/~10315/figures/digit_network_summary.png" width="600"/>

## Q2 Table of Contents

* [Q2a: Three neuron network](#q2a)
* [Q2b: 28x28 image classification network](#q2b)
* [Q2c: 28x28 image autoencoder network](#q2c)

## Q2 Overview

Neural networks are made up of layers of neurons that transform the input data into a predicted output.

In this question, we'll wire up two different neural networks: a three neuron (7 parameter) network to predict the intensity of Minneapolis traffic given the time of day (below) and a 210 neuron (159,010 parameter) network to predict the class label of a 28x28 input image (above).

Results of three neuron network:
<img src="https://www.cs.cmu.edu/~10315/figures/traffic_data_network_summary.png" width="400"/>

## Q2a: Three-neuron network <a class="anchor" name="q2a"></a>

Implement the `network3` function to match the following three-neuron network diagram, where the ReLU function is simply the max of zero and the input, $a = \max(0, z)$:

<img src="https://www.cs.cmu.edu/~10315/figures/three_neuron_diagram.png" width="800"/>

Note: These are all scalar values, not vectors or matrices.

No need for NumPy here. We'll get to that in a second!
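
As a quick illustration of the ReLU definition on scalars, here is a minimal sketch. The name `relu_scalar` and the numbers are hypothetical, and this shows a single neuron only, not the full network in the diagram.

```python
def relu_scalar(z):
    # Hypothetical helper: the max of zero and the input
    return max(0, z)

print(relu_scalar(-3.5))  # 0
print(relu_scalar(2.0))   # 2.0

# A single neuron with made-up weight and bias: a = ReLU(w * x + b)
w, b, x = 1, -5, 8
print(relu_scalar(w * x + b))  # 3
```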

In [None]:
def network3(x, wA, bA, wB, bB, wC, wD, bC):
    """ Compute the predicted output y given the input x and the network parameters
        Note: These are all scalar values

        Input:
        x: numerical value of the input
        wA: numerical value of the weight from the input to the first neuron in the hidden layer
        bA: numerical value of the bias associated with the first neuron in the hidden layer
        wB: numerical value of the weight from the input to the second neuron in the hidden layer
        bB: numerical value of the bias associated with the second neuron in the hidden layer
        wC: numerical value of the weight from the first neuron in the hidden layer to the output neuron
        wD: numerical value of the weight from the second neuron in the hidden layer to the output neuron
        bC: numerical value of the bias associated with the output neuron

        Returns: numerical value of the predicted output
    """
    ...

Cells that you can run to make sure your code is working before running the local tests.

Are the results what you expect them to be?

Feel free to modify these or add more cells to help you test your code.

In [None]:
wA = 1
bA = 0
wB = 0
bB = 0
wC = 1
wD = 1
bC = 0

x_new = 8
y_new_pred = network3(x_new, wA, bA, wB, bB, wC, wD, bC)

plot_network_prediction(None, None, x_new,
                        network3, wA, bA, wB, bB, wC, wD, bC,
                        x_label="Input", y_label="Output",
                        x_min=-10, x_max=10)
y_new_pred

In [None]:
wA = 0
bA = 0
wB = 1
bB = 0
wC = 1
wD = 1
bC = 0

x_new = -8
y_new_pred = network3(x_new, wA, bA, wB, bB, wC, wD, bC)

plot_network_prediction(None, None, x_new,
                        network3, wA, bA, wB, bB, wC, wD, bC,
                        x_label="Input", y_label="Output",
                        x_min=-10, x_max=10)
y_new_pred

In [None]:
wA = 1
bA = -5
wB = -2
bB = 5
wC = 1
wD = -1
bC = 10

x_new = -8
y_new_pred = network3(x_new, wA, bA, wB, bB, wC, wD, bC)

plot_network_prediction(None, None, x_new,
                        network3, wA, bA, wB, bB, wC, wD, bC,
                        x_label="Input", y_label="Output",
                        x_min=-10, x_max=10)
y_new_pred

In [None]:
#@title Draw custom inputs { run: "auto"}
wA = 1 #@param {type:"slider", min:-5, max:5, step:0.1}
bA = 0 #@param {type:"slider", min:-10, max:10, step:0.1}
wB = -1 #@param {type:"slider", min:-5, max:5, step:0.1}
bB = 0 #@param {type:"slider", min:-10, max:10, step:0.1}
wC = 1 #@param {type:"slider", min:-5, max:5, step:0.1}
wD = 1 #@param {type:"slider", min:-5, max:5, step:0.1}
bC = 0 #@param {type:"slider", min:-10, max:10, step:0.1}
x_new = 0  #@param {type:"slider", min:-10, max:10, step:0.5}

y_new_pred = network3(x_new, wA, bA, wB, bB, wC, wD, bC)

plot_network_prediction(None, None, x_new,
                        network3, wA, bA, wB, bB, wC, wD, bC,
                        x_label="Input", y_label="Output",
                        x_min=-10, x_max=10)
y_new_pred

In [None]:
# Load the traffic dataset
x_train, y_train, x_test, y_test, label_names, feature_names = load_traffic_dataset()

wA = 0.4
bA = -5.9
wB = -0.6
bB = 5.2
wC = -1
wD = -1
bC = 5.1

x_new = 8
y_new_pred = network3(x_new, wA, bA, wB, bB, wC, wD, bC)

plot_network_prediction(x_train, y_train, x_new,
                        network3, wA, bA, wB, bB, wC, wD, bC)
y_new_pred

### Run the local autograder tests

In [None]:
grader.check("Q2a")

### Again, remember to keep submitting to Gradescope as you go.

This will help you to make sure you are passing the hidden tests as well as the local tests. It also is a good reminder to save your work and make sure you officially collect points in Gradescope. You wouldn't want to finish all of your work and then realize at the last minute that there was something wrong with Q1 that caused problems for your whole assignment in Gradescope.

## Q2b: 28x28 Image Classification Network <a class="anchor" name="q2b"></a>

In this question, we'll implement a two-layer neural network that will be able to classify 28x28 hand-written digits with 96% accuracy.

<img src="https://www.cs.cmu.edu/~10315/figures/digit_network_summary.png" width="600"/>

As a quick preview, the network function that you will implement is `network(x, linear1_w, linear1_b, linear2_w, linear2_b)`, which will take in the input image, `x`, and return 10 values indicating the strength of the prediction for each class (one value for each of the digits 0-9).
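
To turn 10 output values into a single predicted class, one common choice is to take the index of the largest value. The sketch below assumes that convention and uses made-up scores, not real network output.

```python
import numpy as np

# Made-up output values, one per digit class 0-9
scores = np.array([-1.2, 0.3, 0.1, 5.7, -0.4, 2.2, 0.0, 1.1, -3.0, 0.9])

# The index of the largest score is the predicted class
predicted_digit = int(np.argmax(scores))
print(predicted_digit)  # 3
```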

### Scaling up our network implementation using linear algebra

Notice that in our original three neuron network, there are actually multiple linear functions: two with a 1-D input and one with a 2-D input.

<img src="https://www.cs.cmu.edu/~10315/figures/three_neuron_diagram.png" width="800"/>

It would be really nice if we could use linear algebra to generalize these as linear functions with multiple inputs and multiple outputs:

<img src="https://www.cs.cmu.edu/~10315/figures/three_neuron_diagram_layers.png" width="800"/>

The diagram above now describes both our three-neuron network and the digit classification network below! The only difference is in the sizes of the vectors and matrices involved. For the digit classification network, here are the sizes:

$$\begin{align*}
\mathbf{x} &\in \mathbb{R}^{784} &
W_1 &\in \mathbb{R}^{200\times 784} &
\mathbf{z} &\in \mathbb{R}^{200} &
\mathbf{a} &\in \mathbb{R}^{200} &
W_2 &\in \mathbb{R}^{10\times 200} &
\mathbf{y}_{pred} &\in \mathbb{R}^{10} \\
& &
\mathbf{b}_1 &\in \mathbb{R}^{200} &
& &
& &
\mathbf{b}_2 &\in \mathbb{R}^{10} &
\end{align*}$$
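
The sizes listed above can be sanity-checked with placeholder arrays. This is a sketch under assumptions: the arrays are all zeros, `count_parameters` is a hypothetical helper, and a ReLU is assumed between the two linear layers, as in the three-neuron network.

```python
import numpy as np

def count_parameters(arrays):
    # Hypothetical helper: total number of entries across all arrays
    return sum(a.size for a in arrays)

# Placeholder (all-zero) arrays with the digit network's sizes
x = np.zeros(784)
W1, b1 = np.zeros((200, 784)), np.zeros(200)
W2, b2 = np.zeros((10, 200)), np.zeros(10)

z = W1 @ x + b1        # matrix-vector product plus bias
a = np.maximum(0, z)   # element-wise ReLU (assumed activation)
y_pred = W2 @ a + b2

print(z.shape, a.shape, y_pred.shape)      # (200,) (200,) (10,)
print(count_parameters([W1, b1, W2, b2]))  # 159010

# The three-neuron network in the same layered form
small = [np.zeros((2, 1)), np.zeros(2), np.zeros((1, 2)), np.zeros(1)]
print(count_parameters(small))             # 7
```

Both totals agree with the parameter counts quoted in the Q2 overview.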


<img src="https://www.cs.cmu.edu/~10315/figures/digit_network_summary.png" width="600"/>

### Parameters: Weights and biases

We've already trained this image classification network using PyTorch (https://www.kaggle.com/code/justuser/mnist-with-pytorch-fully-connected-network/notebook) and saved the resulting weight (W) and bias (b) values for both linear layers.

We'll pass these weights (W) and biases (b) into your `network(x, linear1_w, linear1_b, linear2_w, linear2_b)` function.

For the pretrained network we've stored these weights and bias values in \*.csv files:
- http://cs.cmu.edu/~10315/data/mnist_layer1_weights.csv
- http://cs.cmu.edu/~10315/data/mnist_layer1_biases.csv
- http://cs.cmu.edu/~10315/data/mnist_layer2_weights.csv
- http://cs.cmu.edu/~10315/data/mnist_layer2_biases.csv

<!-- We'll then read those \*.csv files into NumPy arrays using `np.loadtxt(filename, delimiter=',')`. The format of these NumPy arrays will be as follows:

#### Layer 1 weights
Shape: (200, 784)

The *first* row of this weight array contains the 784 weights to be applied to the 784 input pixels for the *first* neuron in layer 1.

The *k-th* row of this weight array contains the 784 weights to be applied to the 784 input pixels for the *k-th* neuron in layer 1.

#### Layer 1 bias
Shape: (200,)

The *first* value in this 1-D array contains the bias value to be applied to the *first* neuron in layer 1.

The *k-th* value in this 1-D array contains the bias value to be applied to the *k-th* neuron in layer 1.

#### Layer 2 weights
Shape: (10, 200)

The *first* row of this weight array contains the 200 weights to be applied to the 200 input values for the *first* neuron in layer 2.

The *k-th* row of this weight array contains the 200 weights to be applied to the 200 input values for the *k-th* neuron in layer 2.

#### Layer 2 bias
Shape: (10,)

The *first* value in this 1-D array contains the bias value to be applied to the *first* neuron in layer 2.

The *k-th* value in this 1-D array contains the bias value to be applied to the *k-th* neuron in layer 2. -->

### Implementation

<span style="color:red">Note:</span> You must use NumPy in these functions; no loops allowed. Points will be taken off during manual grading for using loops, even though the autograder would still give full points.

In [None]:
def linear(x, W, b):
    """ Generic linear layer where the input x is the input vector to this layer (note: not necessarily
        the input to the whole network, which we also tend to call x).
        Input:
        x: Numpy array with shape (num_in,)
        W: Numpy array with shape (num_out, num_in)
        b: Numpy array with shape (num_out,)

        Returns: Numpy array with shape (num_out,)
    """

    ...

def relu(z):
    """ ReLU (rectified linear unit) that returns the max of zero and the input value for each entry
        of the input vector.
        Input:
        z: Numpy array with shape (num_in,)

        Returns: Numpy array with shape (num_in,)
    """

    ...

def network(x, linear1_w, linear1_b, linear2_w, linear2_b):
    """ Return the result of passing input values x through the neural network given the weight and bias parameters
        for the two linear layers.

        Input:
        x: Numpy array with shape (784,)
        linear1_w: Numpy array with shape (200, 784) containing the weights for linear layer 1
        linear1_b: Numpy array with shape (200,) containing the bias values for linear layer 1
        linear2_w: Numpy array with shape (10, 200) containing the weights for linear layer 2
        linear2_b: Numpy array with shape (10,) containing the bias values for linear layer 2

        Returns: Numpy array with shape (10,) containing the output of linear layer 2
    """

    ...

Cells that you can run to make sure your code is working before running the local tests.

Are the results what you expect them to be?

Feel free to modify these or add more cells to help you test your code.

Let's jump right in and test with digits. There are some simpler test cases below to help you debug anything that might be going wrong.

In [None]:
# This will take a minute or so the first time you run it
x_train, y_train, x_test, y_test, label_names, feature_names = load_digit_dataset()
print(x_test.shape)

In [None]:
layer1_weights = np.loadtxt("http://cs.cmu.edu/~10315/data/mnist_layer1_weights.csv", delimiter=',')
layer1_biases = np.loadtxt("http://cs.cmu.edu/~10315/data/mnist_layer1_biases.csv", delimiter=',')
layer2_weights = np.loadtxt("http://cs.cmu.edu/~10315/data/mnist_layer2_weights.csv", delimiter=',')
layer2_biases = np.loadtxt("http://cs.cmu.edu/~10315/data/mnist_layer2_biases.csv", delimiter=',')
print(layer1_weights.shape)
print(layer1_biases.shape)
print(layer2_weights.shape)
print(layer2_biases.shape)

Output for the fifth image in the test set (index = 4):

<img src="https://www.cs.cmu.edu/~10315/figures/digit_network_output_4.png" width="500"/>

In [None]:
# For the fifth image in the test input (index = 4), the output should be the same as the bar chart in the
# figure above
image_index = 4
output = network(x_test[image_index], layer1_weights, layer1_biases, layer2_weights, layer2_biases)

show_digit(x_test[image_index])

plt.figure()
plt.bar(range(10), output)
plt.xticks(range(10))

output

In [None]:
# Running on a range of test images
for image_index in range(5):
    output = network(x_test[image_index], layer1_weights, layer1_biases, layer2_weights, layer2_biases)

    show_digit(x_test[image_index])

    plt.figure()
    plt.bar(range(10), output)
    plt.xticks(range(10))

In [None]:
# Test with really simple values to make sure the network functions are working
# Output should be all zeros

x = np.zeros(784)
layer1_weights = np.zeros((200, 784))
layer1_biases = np.zeros(200)
layer2_weights = np.zeros((10, 200))
layer2_biases = np.zeros(10)

output = network(x, layer1_weights, layer1_biases, layer2_weights, layer2_biases)

plt.figure()
plt.bar(range(10), output)
plt.xticks(range(10))

output

In [None]:
# Test with really simple values to make sure the network functions are working

x = np.zeros(784)
layer1_weights = np.zeros((200, 784))
layer1_biases = np.zeros(200)
layer2_weights = np.zeros((10, 200))
layer2_biases = np.arange(10)

output = network(x, layer1_weights, layer1_biases, layer2_weights, layer2_biases)

plt.figure()
plt.bar(range(10), output)
plt.xticks(range(10))

output

In [None]:
# Test with really simple values to make sure the network functions are working
# Should have the same results as above because the layer 2 weights are all still zero

x = np.zeros(784)
layer1_weights = np.random.normal(size=(200, 784))
layer1_biases = np.random.normal(size=200)
layer2_weights = np.zeros((10, 200))
layer2_biases = np.arange(10)

output = network(x, layer1_weights, layer1_biases, layer2_weights, layer2_biases)

plt.figure()
plt.bar(range(10), output)
plt.xticks(range(10))

output

In [None]:
# Test with really simple values to make sure the network functions are working

x = 99*np.ones(784)
layer1_weights = np.zeros((200, 784))
layer1_biases = np.arange(200)
layer2_weights = np.zeros((10, 200))
for i in range(10):
    layer2_weights[i, i] = -2
layer2_biases = np.zeros(10)

output = network(x, layer1_weights, layer1_biases, layer2_weights, layer2_biases)

plt.figure()
plt.bar(range(10), output)
plt.xticks(range(10))

output

In [None]:
# Test with really simple values to make sure the network functions are working

x = 10*np.arange(784)
layer1_weights = np.zeros((200, 784))
layer1_weights[33, 99] = 1
layer1_biases = np.zeros(200)
layer2_weights = np.zeros((10, 200))
layer2_weights[4, 33] = 1
layer2_biases = np.zeros(10)

output = network(x, layer1_weights, layer1_biases, layer2_weights, layer2_biases)

plt.figure()
plt.bar(range(10), output)
plt.xticks(range(10))

output

### Run the local autograder tests for this question

In [None]:
grader.check("Q2b")

## Q2c: 28x28 Image Autoencoder Network <a class="anchor" name="q2c"></a>

Let's keep going and create an autoencoder network to encode and decode images!

<img src="https://www.cs.cmu.edu/~10315/figures/autoencoder_network_summary.png" width="800"/>

As a quick preview, you will implement two autoencoder network functions: `encode(x, W1, b1, W2, b2, W3, b3, W4, b4)` and `decode(z, W5, b5, W6, b6, W7, b7, W8, b8)`.

### Parameters: Weights and biases

We've already trained this autoencoder network using PyTorch (https://github.com/L1aoXingyu/pytorch-beginner/blob/master/08-AutoEncoder/simple_autoencoder.py). The only modification to the network was changing the dimension of the encoded features to 2 rather than 3.

We've saved the resulting weight (W) and bias (b) values for all 8 linear layers.

We'll pass these weights (W) and biases (b) into your `encode` and `decode` functions.

The NumPy arrays should have been automatically downloaded and extracted to `autoencoder/*.npy` as part of the setup cell that processed `hw1_additional_files.zip`.

Rather than storing these arrays in csv files, we stored them more efficiently in NumPy's `*.npy` files, which can be written and read with `np.save` and `np.load`.
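
For reference, a round trip through a `*.npy` file looks roughly like the sketch below. The file name `example_W.npy` and the array contents are made up for illustration; the real files live under `autoencoder/`.

```python
import os
import tempfile

import numpy as np

# A small made-up array standing in for a weight matrix
W = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_W.npy")
    np.save(path, W)          # write the array to disk
    W_loaded = np.load(path)  # read it back into memory

print(W_loaded.shape, W_loaded.dtype)  # (2, 3) float32
print(np.array_equal(W, W_loaded))     # True
```

Unlike a csv file, the `.npy` format stores the shape and dtype alongside the data, so neither needs to be specified when loading.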

In [None]:
# Loading the weights and biases into a dictionary for convenient storage
# This may take a few seconds
params = {}
for i in range(1, 9):
    params[f'W{i}'] = np.load(f'autoencoder/autoencoder_W{i}.npy')
    params[f'b{i}'] = np.load(f'autoencoder/autoencoder_b{i}.npy')

for key in params:
    print(key, params[key].shape)

### Implementation

You've already written the `linear` and `relu` functions, which should work perfectly well in your autoencoder `encode` and `decode` functions. The only new layer type is a `tanh` layer that, like `relu`, operates element-wise: it applies the hyperbolic tangent function to each element of the input vector.
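
To get a feel for the hyperbolic tangent, here is a short sketch using NumPy's built-in `np.tanh` on made-up values. It only illustrates the function's behavior and is not meant as the required implementation.

```python
import numpy as np

# Made-up inputs spanning very negative to very positive values
z = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])

# np.tanh is applied element-wise, so the output has the same shape as z
a = np.tanh(z)
print(a)

# tanh is zero at zero, odd-symmetric, and squashes everything into [-1, 1]
print(a[2] == 0.0)                # True
print(np.allclose(a, -a[::-1]))   # True
print(np.all(np.abs(a) <= 1.0))   # True
```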

<span style="color:red">Note:</span> You must use NumPy in these functions; no loops allowed. Points will be taken off during manual grading for using loops, even though the autograder would still give full points.

In [None]:

def tanh(z):
    """ tanh function applied to each entry of the input vector.
        Input:
        z: Numpy array with shape (num_in,)

        Returns: Numpy array with shape (num_in,)
    """

    ...

def encode(x, W1, b1, W2, b2, W3, b3, W4, b4):
    """ Return the result of passing input values x through the encoder half of the autoencoder network
        given the weight and bias parameters for the four encoder linear layers.

        Input:
        x: Numpy array with shape (784,)
        W1: Numpy array with shape (128, 784) containing the weights for linear layer 1
        b1: Numpy array with shape (128,) containing the bias values for linear layer 1
        W2: Numpy array with shape (64, 128) containing the weights for linear layer 2
        b2: Numpy array with shape (64,) containing the bias values for linear layer 2
        W3: Numpy array with shape (12, 64) containing the weights for linear layer 3
        b3: Numpy array with shape (12,) containing the bias values for linear layer 3
        W4: Numpy array with shape (2, 12) containing the weights for linear layer 4
        b4: Numpy array with shape (2,) containing the bias values for linear layer 4

        Returns: Numpy array with shape (2,) containing the output of the encoder half of the network
    """

    ...


def decode(z, W5, b5, W6, b6, W7, b7, W8, b8):
    """ Return the result of passing encoded vector z through the decoder half of the autoencoder network
        given the weight and bias parameters for the four decoder linear layers.

        Input:
        z: Numpy array with shape (2,)
        W5: Numpy array with shape (12, 2) containing the weights for linear layer 5
        b5: Numpy array with shape (12,) containing the bias values for linear layer 5
        W6: Numpy array with shape (64, 12) containing the weights for linear layer 6
        b6: Numpy array with shape (64,) containing the bias values for linear layer 6
        W7: Numpy array with shape (128, 64) containing the weights for linear layer 7
        b7: Numpy array with shape (128,) containing the bias values for linear layer 7
        W8: Numpy array with shape (784, 128) containing the weights for linear layer 8
        b8: Numpy array with shape (784,) containing the bias values for linear layer 8

        Returns: Numpy array with shape (784,) containing the output of the decoder half of the network
    """

    ...


Cells that you can run to make sure your code is working before running the local tests.

Are the results what you expect them to be?

Feel free to modify these or add more cells to help you test your code.

In [None]:
# This will take a minute or so the first time you run it
x_train, y_train, x_test, y_test, label_names, feature_names = load_digit_dataset()
print(x_test.shape)

We trained the autoencoder network on images with pixel values ranging from -1 to 1 rather than 0 to 255, so we'll provide two quick pre/post-processing functions to make these adjustments.
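
To make the range mapping concrete, here is a small standalone sketch of the arithmetic. The names `to_unit_range` and `to_pixel_range` are hypothetical stand-ins used only for this illustration.

```python
import numpy as np

def to_unit_range(x):
    # Hypothetical stand-in: map [0, 255] to [0, 1], then to [-1, 1]
    return (x / 255 - 0.5) * 2

def to_pixel_range(x):
    # Hypothetical stand-in: map [-1, 1] to [0, 1], then to [0, 255] as uint8
    return ((x + 1) / 2 * 255).astype(np.uint8)

pixels = np.array([0.0, 127.5, 255.0])
scaled = to_unit_range(pixels)

print(scaled.tolist())                  # [-1.0, 0.0, 1.0]
print(to_pixel_range(scaled).tolist())  # [0, 127, 255]
```

Note that the cast to `np.uint8` truncates rather than rounds, which is why 127.5 comes back as 127.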

In [None]:
def preprocess(x):
    # These scalar, element-wise operations are easy to write with NumPy

    # Convert to 0 to 1
    x = x/255

    # Convert to -1 to 1
    x = (x-0.5) * 2

    return x

def postprocess(x):
    # These scalar, element-wise operations are easy to write with NumPy

    # Convert to 0 to 1
    x = (x+1) / 2

    # Convert to 0 to 255
    x = x * 255

    x = x.astype(np.uint8)

    return x

In [None]:
#@title Draw custom inputs { run: "auto"}
z1 = 0 #@param {type:"slider", min:-30, max:30, step:0.5}
z2 = 0 #@param {type:"slider", min:-30, max:30, step:0.5}

z = np.array([z1, z2])

x_prime = decode(z, params['W5'], params['b5'], params['W6'], params['b6'], params['W7'], params['b7'], params['W8'], params['b8'])

x_prime = postprocess(x_prime)

show_digit(x_prime)
plt.figure()
plt.plot(z[0], z[1], 'ro', markersize=12)
plt.xlim(-30, 30)
plt.ylim(-30, 30)
# Show axes
plt.axhline(0, color='lightgray')
plt.axvline(0, color='lightgray')
plt.xlabel("$z_1$")
plt.ylabel("$z_2$")
plt.title("Change values of z1 and z2 to create different output images!")

In [None]:
# Let's see how well we can reconstruct the fifth image in the test input (index = 4)
image_index = 4
x = x_test[image_index]
x = preprocess(x)

z = encode(x, params['W1'], params['b1'], params['W2'], params['b2'], params['W3'], params['b3'], params['W4'], params['b4'])
x_prime = decode(z, params['W5'], params['b5'], params['W6'], params['b6'], params['W7'], params['b7'], params['W8'], params['b8'])

x_prime = postprocess(x_prime)

show_digit(x_test[image_index])
show_digit(x_prime)
plt.figure()
plt.plot(z[0], z[1], 'ro', markersize=12)
plt.xlim(-30, 30)
plt.ylim(-30, 30)
# Show axes
plt.axhline(0, color='lightgray')
plt.axvline(0, color='lightgray')

x_prime

In [None]:
# Running on a range of test images
for image_index in range(5):
    x = x_test[image_index]
    x = preprocess(x)

    z = encode(x, params['W1'], params['b1'], params['W2'], params['b2'], params['W3'], params['b3'], params['W4'], params['b4'])
    x_prime = decode(z, params['W5'], params['b5'], params['W6'], params['b6'], params['W7'], params['b7'], params['W8'], params['b8'])

    x_prime = postprocess(x_prime)

    show_digit(x_test[image_index])
    show_digit(x_prime)
    plt.figure()
    plt.plot(z[0], z[1], 'ro', markersize=12)
    plt.xlim(-30, 30)
    plt.ylim(-30, 30)
    # Show axes
    plt.axhline(0, color='lightgray')
    plt.axvline(0, color='lightgray')


In [None]:
# Test with really simple values to make sure the network functions are working

x = np.zeros(784)

z = encode(x, params['W1'], params['b1'], params['W2'], params['b2'], params['W3'], params['b3'], params['W4'], params['b4'])

z

In [None]:
# Test with really simple values to make sure the network functions are working

x = np.ones(784)

z = encode(x, params['W1'], params['b1'], params['W2'], params['b2'], params['W3'], params['b3'], params['W4'], params['b4'])

z

### Run the local autograder tests for this question

In [None]:
grader.check("Q2c")

### Links back to questions

* [Q2a: Three neuron network](#q2a)
* [Q2b: 28x28 image classification network](#q2b)
* [Q2c: 28x28 image autoencoder network](#q2c)

## Submit your work

### Submit your code to Gradescope

Congratulations on finishing!!

Save your notebook (or click File->Download->Download .ipynb) and then upload your `hw1.ipynb` file to Gradescope under assignment HW1 (programming).

Not all of the tests are included in the local autograder. Some of the tests are "hidden" and only run in the server autograder on Gradescope.

There is no limit on the number of submissions to Gradescope, so as you complete parts of the assignment it is a really good idea to save your notebook and upload it to Gradescope.

### Check all of your work locally (but don't forget to submit to Gradescope)