## Data 188: Homework 0

The goal of this homework is to give you a quick overview of some of the concepts and ideas that you should be familiar with _prior_ to taking this course, namely: Python programming, numpy, and matrix-vector computations.
The assignment will require you to build a basic softmax regression algorithm, plus a simple two-layer neural network.
You will create these implementations in native Python (using the numpy library).
Along the way, we'll give some guidance as to how you might want to implement these different functions, but overall the details are up to you.
What we will say, though, is that you should make heavy use of the linear algebra calls in numpy: using explicit Python loops will usually make the code much slower (and more complicated) than it needs to be.
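As a small illustration of that point, the sketch below computes the same matrix-vector product twice on random data (sizes chosen arbitrarily): once with explicit Python loops and once with numpy's `@` operator. Both give the same answer, but the vectorized version is a single call and typically runs orders of magnitude faster.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 300))
x = rng.standard_normal(300)

# Explicit loops: one Python-level multiply-add per matrix entry (slow).
y_loop = np.zeros(A.shape[0])
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        y_loop[i] += A[i, j] * x[j]

# Vectorized: a single call that numpy dispatches to optimized compiled code.
y_vec = A @ x

print(np.allclose(y_loop, y_vec))  # True
```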

**We know that there is a lot of prose text in this assignment, especially in the beginning, and relatively little coding.  That being said, _please_ read carefully through the entirety of the text in this writeup.  It describes the process and philosophy behind how we structure our assignments, and reading it will make a huge difference in your ability to complete later assignments.**

All the code development for the homeworks in Data 188 can be done in the Google Colab environment.  However, instead of making extensive use of actual code blocks within a Colab notebook, most of the code you develop will be written in `.py` files downloaded (automatically) to your Google Drive, and you will largely use the notebook for running shell scripts that test and submit the code to the autograder (or, optionally, for testing out snippets of code as you develop, but this is not required).  This is a somewhat non-standard usage of Colab notebooks (typically one uses them more like interactive coding environments, with code cells directly in the notebook).  However, our rationale for using them this way is straightforward: in addition to being a nice cloud-based notebook environment, Colab also provides easy access to "standard" cloud-based GPU systems that you can spin up quickly, which will let you develop some of the later (CUDA-based) code without needing access to a physical GPU or setting up the CUDA libraries yourself.  That being said, **you are welcome to develop and submit your code in any environment you like**; we just can't guarantee the ability to support any environment other than the Colab-based one.

### (optional) "Python Tutorial with Colab"

Before you get started with HW0, it's recommended to run through this ["Python Tutorial with Colab"](https://github.com/data-188-berkeley/hw0/blob/main/colab_tutorial.ipynb).
This will give a brief tour of Python, numpy, and matplotlib.

Hint: for this course (and this assignment!), you will find the numpy section particularly useful.

Tip: if you are having significant struggles with this "Python Tutorial with Colab" (particularly the Python section), I recommend setting aside additional time to build up your Python programming skills (or reconsider taking this class), as this course relies heavily on Python programming.

### Clarifications Doc

In case we need to give any clarifications (or hints!) for the assignment, see this Google Doc: ["HW0 Clarifications"](https://docs.google.com/document/d/1P967Ok28mmpjpxQVqU6Y4MXn6Doq1VYuAYNqTbGYGY4/edit?tab=t.0).
This doc will be regularly updated.

### Getting started

To get started, **make a copy of this notebook** file by selecting "Save a copy in Drive" from the "File" menu, and then run the code block below.  This will load your Google Drive folder into the Colab notebook environment, create a `/data188/hw0` directory, and clone the HW0 public repository into this directory.
Notably, your code will be in this directory!
Spend some time to familiarize yourself with the contents of this directory.

I also recommend opening the notebook from your `/data188/hw0/` directory (rather than the copy you made via "Save a copy in Drive") to keep things organized.

**Acknowledgement**: this assignment is based on HW0 of CMU's ["Deep Learning Systems"](https://dlsyscourse.org/) course (10-414/714, Fall 2025). Thanks to: Prof. Zico Kolter, Prof. Tianqi Chen, Prof. Tim Dettmers.

In [None]:
# Run this cell each time your kernel is disconnected or restarted
# Tip: it's always safe to re-run this, this will never delete data

import os
import sys

# basedir_course: All colab material for this course will live here.
#   Feel free to modify this if you'd like.
basedir_course = "/content/drive/MyDrive/data188"
# asn_name: name of assignment, eg hw0. This must match github repo name.
asn_name = "hw0"
rootdir_asn = os.path.join(basedir_course, asn_name)

# Fetch code to set up the assignment, then set the correct working directory
# (eg for python imports to work)
from google.colab import drive
drive.mount('/content/drive')

os.makedirs(basedir_course, exist_ok=True)
os.chdir(basedir_course)
os.system(f"git clone https://github.com/data188sp26/{asn_name}.git")
os.chdir(rootdir_asn)

# install required libraries
!pip3 install numdifftools

# Validate that our current working directory is correct.
# This should output:
#   /content/drive/MyDrive/data188/hw0/
print("Current working directory: ", os.getcwd())
# Another check: let's double check that the files in hw0/ are in the current directory.
# This should output something like:
#   ['.git', 'LICENSE', 'apps', 'data', 'hw0.ipynb', 'tests', '.pytest_cache']
print("ls cwd: ", os.listdir(os.getcwd()))

print("Setup done!")

## Question 1: A basic `add` function, and testing/autograding basics

To illustrate the workflow of these assignments and the autograding system, we'll use a simple example of implementing an `add` function.  Note that the commands run above will create the following structure in your `data188/hw0` directory:

    data/
        train-images-idx3-ubyte.gz
        train-labels-idx1-ubyte.gz
        t10k-images-idx3-ubyte.gz
        t10k-labels-idx1-ubyte.gz
    apps/
        simple_ml_hw0.py
    tests/
        hw0/
            test_simple_ml_hw0.py
    
The `data/` directory contains the data needed for this assignment (a copy of the MNIST data set); the `apps/` directory contains the source files where you will write your implementations; the `tests/` directory contains tests that will evaluate (locally) your solution.

The first homework question requires you to implement the `simple_ml_hw0.add()` function (this trivial function is not used anywhere; it is just an example to get you used to the structure of the assignment).  If you look at the `apps/simple_ml_hw0.py` file, you will find the following function stub for `add()`.

```python
def add(x, y):
    """ A trivial 'add' function you should implement to get used to the
    autograder and submission system.  The solution to this problem is in
    the homework notebook.

    Args:
        x (Python number or numpy array)
        y (Python number or numpy array)

    Return:
        Sum of x + y
    """
    ### YOUR CODE HERE
    pass
    ### END YOUR CODE
```
The docstring in each function defines the expected input/output mapping that your function should produce (get used to reading these carefully: in our experience, the number one source of errors in submissions is simply not reading the spec).  Hopefully it's pretty obvious how to implement this function: just replace the `pass` statement with the correct code, namely the following:

```python
def add(x, y):
    """ A trivial 'add' function you should implement to get used to the
    autograder and submission system.  The solution to this problem is in
    the homework notebook.

    Args:
        x (Python number or numpy array)
        y (Python number or numpy array)

    Return:
        Sum of x + y
    """
    ### YOUR CODE HERE
    return x + y
    ### END YOUR CODE
```
Go ahead and do this in your `apps/simple_ml_hw0.py` file.

### Running local tests

Now you will want to test whether your code works, and if so, submit it to the autograding system.  Throughout this course, we use a standard tool for running unit tests on code, namely `pytest`.  Once you've written the correct code in the `apps/simple_ml_hw0.py` file, run the command below.

In [2]:
!python -m pytest -k "add"

platform linux -- Python 3.12.12, pytest-8.4.2, pluggy-1.6.0
rootdir: /content/drive/MyDrive/data188/hw0
plugins: langsmith-0.6.8, typeguard-4.4.4, anyio-4.12.1
collected 5 items / 4 deselected / 1 selected

tests/hw0/test_simple_ml_hw0.py .                                        [100%]



If all goes well, you will see that one test passes.  To see how this test works, take a look at the `tests/hw0/test_simple_ml_hw0.py` file, specifically the `test_add()` function:

```python
def test_add():
    assert add(5,6) == 11
    assert add(3.2,1.0) == 4.2
    assert type(add(4., 4)) == float
    np.testing.assert_allclose(add(np.array([1,2]), np.array([3,4])),
                               np.array([4,6]))
```

This code runs a suite of unit tests against your implemented function.  If the function is implemented correctly, then all the assertions above _should_ pass (i.e., the code will execute without errors).  If, on the other hand, you implemented something incorrectly (say, changed the `x + y` above to `x - y`), then these assertions will fail, and `pytest` will indicate that the corresponding test failed.

In [None]:
# in this example cell, we replaced "x + y" with "x - y" in simple_ml_hw0.add()
!python -m pytest -k "add"

platform darwin -- Python 3.7.3, pytest-4.3.1, py-1.8.0, pluggy-0.9.0
rootdir: /Users/zkolter/Dropbox/class/10-714/homework/hw0, inifile:
plugins: remotedata-0.3.1, openfiles-0.3.2, doctestplus-0.3.0, arraydiff-0.3
collected 6 items / 5 deselected / 1 selected

tests/test_simple_ml.py F                                                [100%]

___________________________________ test_add ___________________________________

    def test_add():
>       assert add(5,6) == 11
E       assert -1 == 11
E        +  where -1 = add(5, 6)

tests/test_simple_ml.py:16: AssertionError


As you can see, you will get an error that indicates the line where the assertion failed, which you can then use to go back and debug your implementation.  **You should get comfortable with reading and tracing through the tests file as a way of better understanding how your implementations should work.**

Learning to properly develop and use unit tests is crucial to modern software development, and hopefully a secondary outcome of this course is that you become familiar with the typical usage of unit tests within software development.  Admittedly, you don't necessarily need to _write_ your own tests to pass the questions here, but you _should_ become familiar with reading the test files that we provide, as a way to understand how your function should behave.  We would also _absolutely_ encourage you to write additional tests for your implementations, especially if you find that your code passes the local tests but still seems to fail on submission.
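For example, you can add your own `test_*` functions alongside the provided ones; by default, pytest collects functions whose names start with `test` from files whose names start with `test_`. The two tests below are hypothetical extras, not part of the provided suite. To keep the snippet self-contained it defines a stand-in `add`; in a real test file you would instead import `add` the same way the provided test file does, and you would not need the explicit calls at the bottom.

```python
import numpy as np

# Stand-in for the add() you implemented in apps/simple_ml_hw0.py,
# so that this snippet runs on its own.
def add(x, y):
    return x + y

def test_add_broadcasting():
    # Hypothetical extra test: adding a scalar to a 2D array should broadcast.
    np.testing.assert_allclose(add(np.array([[1.0, 2.0], [3.0, 4.0]]), 1.0),
                               np.array([[2.0, 3.0], [4.0, 5.0]]))

def test_add_is_symmetric():
    # Hypothetical extra test: argument order should not matter.
    a, b = np.array([1.0, -2.0]), np.array([0.5, 4.0])
    np.testing.assert_allclose(add(a, b), add(b, a))

# Only needed when running this snippet directly; pytest calls these itself.
test_add_broadcasting()
test_add_is_symmetric()
```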

One last quick comment: if you're used to debugging code via print statements, note that **pytest captures all output by default**. You can disable this behavior, and have the tests display all output in all cases, by passing the `-s` flag to pytest (e.g., `!python -m pytest -s -k "add"`).

## Question 2: Loading MNIST data

Now that you're familiar with the autograding system, try it out on the next function you need to implement in the `apps/simple_ml_hw0.py` file: the `parse_mnist()` function.  Here is the function declaration from the file (we typically won't walk through this whole process again, but will do so here one more time).

```python
def parse_mnist(image_filename, label_filename):
    """ Read an images and labels file in MNIST format.  See this page:
    http://yann.lecun.com/exdb/mnist/ for a description of the file format.

    Args:
        image_filename (str): name of gzipped images file in MNIST format
        label_filename (str): name of gzipped labels file in MNIST format

    Returns:
        Tuple (X,y):
            X (numpy.ndarray[np.float32]): 2D numpy array containing the loaded
                data.  The dimensionality of the data should be
                (num_examples x input_dim) where 'input_dim' is the full
                dimension of the data, e.g., since MNIST images are 28x28, it
                will be 784.  Values should be of type np.float32, and the data
                should be normalized to have a minimum value of 0.0 and a
                maximum value of 1.0 (i.e., scale original values of 0 to 0.0
                and 255 to 1.0).

            y (numpy.ndarray[dtype=np.uint8]): 1D numpy array containing the
                labels of the examples.  Values should be of type np.uint8 and
                for MNIST will contain the values 0-9.
    """
    ### BEGIN YOUR CODE
    pass
    ### END YOUR CODE
```

Hopefully you're now familiar with how this docstring works, and have an idea about how to go about implementing this function.  First, go to http://yann.lecun.com/exdb/mnist/ or this alternate [link](https://web.archive.org/web/20220509025752/http://yann.lecun.com/exdb/mnist/) (see the bottom of the page) to read about the binary format for the MNIST data.  Then write a loader that will read files of this type, and return numpy arrays according to the specification in the docstring (if you're having any issues with the implementation, be sure to read the docstring closely).  We'd recommend you use the `struct` module in Python (along with the `gzip` module and of course `numpy` itself) to implement this function.
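To make the format concrete, here is a minimal sketch that builds a tiny synthetic file in memory (two 2x3 "images", not real MNIST data) using the layout described on that page: a header of four big-endian unsigned 32-bit integers (magic number 2051, number of images, rows, columns) followed by one unsigned byte per pixel. It then parses the header with `struct.unpack` and the pixels with `np.frombuffer`. With the real files you would read the bytes via `gzip.open(filename, "rb")` instead; the label file uses a different magic number (2049) and a shorter, two-integer header. Converting to `np.float32` and normalizing, as the docstring requires, is left to you.

```python
import gzip
import struct
import numpy as np

# Build a tiny synthetic MNIST-style image file in memory (2 images of 2x3 pixels).
pixels = np.array([0, 51, 102, 153, 204, 255,
                   255, 0, 0, 0, 0, 255], dtype=np.uint8)
raw = struct.pack(">IIII", 2051, 2, 2, 3) + pixels.tobytes()
blob = gzip.compress(raw)

# Decompress, then unpack the 16-byte header: ">" = big-endian, "I" = uint32.
data = gzip.decompress(blob)
magic, num, rows, cols = struct.unpack(">IIII", data[:16])

# The remaining bytes are pixel values (row-major, one byte each); flatten each image.
imgs = np.frombuffer(data, dtype=np.uint8, offset=16).reshape(num, rows * cols)

print(magic, num, rows, cols)  # 2051 2 2 3
print(imgs.shape)              # (2, 6)
```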

When you've implemented the function, run the local unit tests.


In [4]:
!python -m pytest -k "parse_mnist"

platform linux -- Python 3.12.12, pytest-8.4.2, pluggy-1.6.0
rootdir: /content/drive/MyDrive/data188/hw0
plugins: langsmith-0.6.8, typeguard-4.4.4, anyio-4.12.1
collected 5 items / 4 deselected / 1 selected

tests/hw0/test_simple_ml_hw0.py .                                        [100%]



## Question 3: Softmax loss

Implement the softmax (a.k.a. cross-entropy) loss as defined in the `softmax_loss()` function in `apps/simple_ml_hw0.py`.  Recall (hopefully this is review, but we'll also cover it in lecture) that for a multi-class output that can take on values $y \in \{1,\ldots,k\}$, the softmax loss takes as input a vector of logits $z \in \mathbb{R}^k$ and the true class $y \in \{1,\ldots,k\}$, and returns a loss defined by
\begin{equation}
\ell_{\mathrm{softmax}}(z, y) = (\log\sum_{i=1}^k \exp z_i) - z_y.
\end{equation}

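where $z_y$ is the predicted logit for the ground-truth class $y$ (e.g., `z[y]` in numpy code, for `z: np.ndarray` and `y: int`). Note that in code the labels are 0-indexed (MNIST labels take the values 0-9), whereas the math above indexes classes from 1.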
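As a quick worked example of the formula (values chosen arbitrarily), take $k=3$ logits $z = (1, 2, 3)$ and let the true class be the third one. Then $\ell_{\mathrm{softmax}}(z, y) = \log(e^1 + e^2 + e^3) - 3 \approx 3.4076 - 3 = 0.4076$. For a single example in numpy, with a 0-indexed label, this is:

```python
import numpy as np

z = np.array([1.0, 2.0, 3.0])   # logits for k = 3 classes
y = 2                            # true class (0-indexed, i.e. the third class)

loss = np.log(np.sum(np.exp(z))) - z[y]
print(round(loss, 4))            # 0.4076
```

Note that the loss is small here because the true class already has the largest logit; it would be larger if `y` pointed at a smaller logit.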

Note that, as described in its docstring, `softmax_loss()` takes a _2D array_ of logits (i.e., the $k$-dimensional logits for a **batch** of samples), plus a corresponding 1D array of true labels, and should output the _average_ softmax loss over the entire **batch**.  To do this correctly, you should _not_ use any loops; instead, do all the computation with numpy vectorized operations (to set expectations, our reference solution consists of 1-3 lines of code).

Note that a "real/practical" implementation of softmax loss would shift the logits to prevent numerical issues like overflow, but we won't worry about that here (the rest of the assignment will work fine without it). If you're curious how deep learning libraries like PyTorch/TensorFlow compute the softmax loss in a numerically stable manner (e.g., via the "log-sum-exp" trick), see this [link](https://stackoverflow.com/a/63968725).
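For the curious, here is a minimal sketch of the log-sum-exp trick (optional for this assignment). It relies on the identity $\log \sum_i \exp z_i = c + \log \sum_i \exp(z_i - c)$, which holds for any constant $c$; choosing $c = \max_i z_i$ makes every exponent at most 0, so nothing overflows. The helper name `logsumexp_rows` is our own; SciPy also ships a ready-made `scipy.special.logsumexp`.

```python
import numpy as np

def logsumexp_rows(Z):
    """Row-wise log(sum(exp(Z))) computed stably by shifting each row by its max."""
    c = np.max(Z, axis=1, keepdims=True)                 # per-row maximum, shape (m, 1)
    return c[:, 0] + np.log(np.sum(np.exp(Z - c), axis=1))

Z = np.array([[1.0, 2.0, 3.0],
              [1000.0, 1001.0, 1002.0]])

with np.errstate(over="ignore"):                         # silence the overflow warning
    naive = np.log(np.sum(np.exp(Z), axis=1))            # exp(1000.0) overflows to inf
stable = logsumexp_rows(Z)

print(naive)    # second entry is inf
print(stable)   # second entry is about 1002.4076 (finite)
```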

*Hint*: you'll want to use the [np.log](https://numpy.org/doc/stable/reference/generated/numpy.log.html), [np.sum](https://numpy.org/doc/stable/reference/generated/numpy.sum.html), [np.exp](https://numpy.org/doc/stable/reference/generated/numpy.exp.html), and [np.mean](https://numpy.org/doc/stable/reference/generated/numpy.mean.html) functions. In particular, pay attention to the `axis` kwarg in `np.sum(..., axis=...)`; you'll want to use it!
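The sketch below illustrates two numpy mechanics on a toy array: how `axis` changes what `np.sum` reduces over, and how integer-array ("fancy") indexing picks one entry per row, which is one way to gather $z_y$ for every example in a batch at once.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(np.sum(A, axis=0))                        # [5 7 9]   (sum down each column)
print(np.sum(A, axis=1))                        # [ 6 15]   (sum across each row)
print(np.sum(A, axis=1, keepdims=True).shape)   # (2, 1)    (keeps a column, handy for broadcasting)

labels = np.array([2, 0])                       # one column index per row
print(A[np.arange(A.shape[0]), labels])         # [3 4]     (A[0, 2] and A[1, 0])
```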

The code below runs the test cases.

In [5]:
!python -m pytest -k "softmax_loss"

platform linux -- Python 3.12.12, pytest-8.4.2, pluggy-1.6.0
rootdir: /content/drive/MyDrive/data188/hw0
plugins: langsmith-0.6.8, typeguard-4.4.4, anyio-4.12.1
collected 5 items / 4 deselected / 1 selected

tests/hw0/test_simple_ml_hw0.py .                                        [100%]



## Question 4: Stochastic gradient descent for softmax regression

In this question you will implement stochastic gradient descent (SGD) for (linear) softmax regression.  In other words, as discussed in lecture, we will consider a hypothesis function that maps $n$-dimensional inputs to $k$-dimensional logits via the function
\begin{equation}
h(x) = \Theta^T x
\end{equation}
where $x \in \mathbb{R}^n$ is the input, and $\Theta \in \mathbb{R}^{n \times k}$ are the model parameters.  Given a dataset $\{(x^{(i)} \in \mathbb{R}^n, y^{(i)} \in \{1,\ldots,k\})\}$, for $i=1,\ldots,m$, the optimization problem associated with softmax regression is thus given by
\begin{equation}
\DeclareMathOperator*{\minimize}{minimize}
\minimize_{\Theta} \; \frac{1}{m} \sum_{i=1}^m \ell_{\mathrm{softmax}}(\Theta^T x^{(i)}, y^{(i)}).
\end{equation}
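To connect the per-example notation $h(x) = \Theta^T x$ with the batch form $X\Theta$ used later in this question, here is a small sketch on random data (shapes chosen arbitrarily): row $i$ of `X @ Theta` equals `Theta.T @ X[i]`, and an argmax over each row gives the predicted class.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 4, 5, 3                      # examples, input dim, classes
X = rng.standard_normal((m, n))        # one example per row
Theta = rng.standard_normal((n, k))

logits = X @ Theta                     # shape (m, k): one row of logits per example
for i in range(m):
    # Each row matches the per-example hypothesis h(x^{(i)}) = Theta^T x^{(i)}.
    assert np.allclose(logits[i], Theta.T @ X[i])

preds = np.argmax(logits, axis=1)      # predicted class per example, shape (m,)
print(logits.shape, preds.shape)       # (4, 3) (4,)
```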

Recall from class that the gradient of the linear softmax objective is given by
\begin{equation}
\nabla_\Theta \ell_{\mathrm{softmax}}(\Theta^T x, y) = x (z - e_y)^T
\end{equation}
where
\begin{equation}
\DeclareMathOperator*{\normalize}{normalize}
z = \frac{\exp(\Theta^T x)}{1^T \exp(\Theta^T x)} \equiv \normalize(\exp(\Theta^T x))
\end{equation}
(i.e., $z$ is just the vector of normalized softmax probabilities), and $e_y$ denotes the $y$th unit basis vector (a.k.a. one-hot encoding), i.e., a vector of all zeros with a one in the $y$th position.
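If you want to convince yourself of this gradient formula numerically, the sketch below compares $x (z - e_y)^T$ with central finite differences of the single-example loss on random data. The sizes, seed, and label are arbitrary choices, and the label is 0-indexed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 3
x = rng.standard_normal(n)              # a single input
y = 1                                   # its (0-indexed) label
Theta = rng.standard_normal((n, k))

def loss(Theta):
    h = Theta.T @ x                     # logits, shape (k,)
    return np.log(np.sum(np.exp(h))) - h[y]

# Analytic gradient x (z - e_y)^T: an (n, k) outer product.
h = Theta.T @ x
z = np.exp(h) / np.sum(np.exp(h))       # softmax probabilities
e_y = np.zeros(k)
e_y[y] = 1.0
grad = np.outer(x, z - e_y)

# Central finite differences, one parameter at a time.
eps = 1e-6
num = np.zeros_like(Theta)
for i in range(n):
    for j in range(k):
        step = np.zeros_like(Theta)
        step[i, j] = eps
        num[i, j] = (loss(Theta + step) - loss(Theta - step)) / (2 * eps)

print(np.max(np.abs(grad - num)))       # should be close to 0
```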

We can also write this in a more compact notation that extends to multiple samples at once (e.g., a **batch**).  Namely, if we let $X \in \mathbb{R}^{m \times n}$ denote a design matrix of $m$ inputs (either the entire dataset or a minibatch), $y \in \{1,\ldots,k\}^m$ a corresponding vector of labels, and overload $\ell_{\mathrm{softmax}}$ to refer to the average softmax loss, then
\begin{equation}
\nabla_\Theta \ell_{\mathrm{softmax}}(X \Theta, y) = \frac{1}{m} X^T (Z - I_y)
\end{equation}
where
\begin{equation}
Z = \normalize(\exp(X \Theta)) \quad \mbox{(normalization applied row-wise)}
\end{equation}
denotes the matrix of softmax probabilities (one row per example), and $I_y \in \mathbb{R}^{m \times k}$ stacks the one-hot vectors for the labels in $y$ as its rows.
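The batch formula is just the average of the per-example gradients $x^{(i)} (z^{(i)} - e_{y^{(i)}})^T$. The sketch below checks this on random data; building $I_y$ as `np.eye(k)[y]` (indexing rows of the identity matrix with the 0-indexed labels) is one convenient option, not the only one.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 5, 4, 3
X = rng.standard_normal((m, n))
y = rng.integers(0, k, size=m)          # 0-indexed labels
Theta = rng.standard_normal((n, k))

# Row-wise softmax probabilities Z and one-hot label matrix I_y.
Z = np.exp(X @ Theta)
Z /= np.sum(Z, axis=1, keepdims=True)
I_y = np.eye(k)[y]                      # row i is the one-hot vector for y[i]

grad_batch = X.T @ (Z - I_y) / m        # shape (n, k)

# Same quantity as an average of per-example outer products x (z - e_y)^T.
grad_avg = np.mean([np.outer(X[i], Z[i] - I_y[i]) for i in range(m)], axis=0)

print(np.allclose(grad_batch, grad_avg))   # True
```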

Using these gradients, implement the `softmax_regression_epoch()` function in `apps/simple_ml_hw0.py`, which runs a single epoch of SGD (one pass over a data set) using the specified learning rate / step size `lr` and minibatch size `batch`.  As described in the docstring, your function should modify the `Theta` array in-place.  After implementation, run the tests.
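Two details in that description are easy to get wrong, and the sketch below illustrates both on toy arrays (the function names are hypothetical). First, `Theta -= ...` writes into the caller's array, while `Theta = Theta - ...` only rebinds a local name and leaves the caller's array unchanged. Second, one way to walk through a dataset in consecutive minibatches of size `batch` is simple slicing; this assumes examples are visited in order without shuffling, so follow whatever the docstring specifies.

```python
import numpy as np

def step_inplace(Theta, grad, lr):
    Theta -= lr * grad           # writes into the caller's array

def step_rebind(Theta, grad, lr):
    Theta = Theta - lr * grad    # creates a new local array; caller's array is unchanged

T1 = np.ones((2, 2))
step_inplace(T1, np.ones((2, 2)), lr=0.5)
T2 = np.ones((2, 2))
step_rebind(T2, np.ones((2, 2)), lr=0.5)
print(T1[0, 0], T2[0, 0])        # 0.5 1.0

# Consecutive minibatches; the last one may be smaller than `batch`.
X = np.arange(250 * 3, dtype=np.float32).reshape(250, 3)
batch = 100
sizes = [X[start:start + batch].shape[0] for start in range(0, X.shape[0], batch)]
print(sizes)                     # [100, 100, 50]
```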

In [None]:
!python -m pytest -k "softmax_regression_epoch and not cpp"

### Training MNIST with softmax regression

Although it's not a part of the tests, now that you have written this code, you can also try training a full MNIST linear classifier using SGD.  For this you can use the `train_softmax()` function in the `apps/simple_ml_hw0.py` file (we have already written this function for you, so you don't need to write it yourself, though you can take a look to see what it's doing).  

You can see how this works using the following code.  For reference, as seen below, our implementation runs in ~3 seconds on Colab, and achieves 7.97% error.

In [None]:
# Reload the simple_ml_hw0 module. If you make changes to simple_ml_hw0.py, then `importlib.reload()` will
# pull in those changes to the currently-running kernel.
# Note: the `imp` module (once a common way to reload modules) was removed in Python 3.12; `importlib.reload()` replaces it.
# Important: any .py file you change must be explicitly reloaded via `importlib.reload()`.
import importlib
import apps.simple_ml_hw0
importlib.reload(apps.simple_ml_hw0)
import python.needle.utils.visualize_mnist
importlib.reload(python.needle.utils.visualize_mnist)

from apps.simple_ml_hw0 import train_softmax, parse_mnist

X_tr, y_tr = parse_mnist("data/train-images-idx3-ubyte.gz",
                         "data/train-labels-idx1-ubyte.gz")
X_te, y_te = parse_mnist("data/t10k-images-idx3-ubyte.gz",
                         "data/t10k-labels-idx1-ubyte.gz")

# set visualize_preds=True to visualize predictions at end of training
train_softmax(X_tr, y_tr, X_te, y_te, epochs=10, lr=0.2, batch=100, visualize_preds=True)

| Epoch | Train Loss | Train Err | Test Loss | Test Err |
|------:|-----------:|----------:|----------:|---------:|
|     0 |    0.35134 |   0.10182 |   0.33588 |  0.09400 |
|     1 |    0.32142 |   0.09268 |   0.31086 |  0.08730 |
|     2 |    0.30802 |   0.08795 |   0.30097 |  0.08550 |
|     3 |    0.29987 |   0.08532 |   0.29558 |  0.08370 |
|     4 |    0.29415 |   0.08323 |   0.29215 |  0.08230 |
|     5 |    0.28981 |   0.08182 |   0.28973 |  0.08090 |
|     6 |    0.28633 |   0.08085 |   0.28793 |  0.08080 |
|     7 |    0.28345 |   0.07997 |   0.28651 |  0.08040 |
|     8 |    0.28100 |   0.07923 |   0.28537 |  0.08010 |
|     9 |    0.27887 |   0.07847 |   0.28442 |  0.07970 |


## Question 5: SGD for a two-layer neural network

Now that you've written SGD for a linear classifier, let's consider the case of a simple two-layer neural network.  Specifically, for input $x \in \mathbb{R}^n$, we'll consider a two-layer neural network (without bias terms) of the form
\begin{equation}
z = W_2^T \mathrm{ReLU}(W_1^T x)
\end{equation}
where $W_1 \in \mathbb{R}^{n \times d}$ and $W_2 \in \mathbb{R}^{d \times k}$ represent the weights of the network (which has a $d$-dimensional hidden unit), and where $z \in \mathbb{R}^k$ represents the logits output by the network.  We again use the softmax / cross-entropy loss, meaning that we want to solve the optimization problem
\begin{equation}
\minimize_{W_1, W_2} \;\; \frac{1}{m} \sum_{i=1}^m \ell_{\mathrm{softmax}}(W_2^T \mathrm{ReLU}(W_1^T x^{(i)}), y^{(i)}).
\end{equation}
Or alternatively, overloading the notation to describe the batch form with matrix $X \in \mathbb{R}^{m \times n}$, this can also be written
\begin{equation}
\minimize_{W_1, W_2} \;\; \ell_{\mathrm{softmax}}(\mathrm{ReLU}(X W_1) W_2, y).
\end{equation}

Using the chain rule, we can derive the backpropagation updates for this network (we'll briefly cover these in class, but also provide the final form here for ease of implementation).  Specifically, let
\begin{equation}
\begin{split}
Z_1 \in \mathbb{R}^{m \times d} & = \mathrm{ReLU}(X W_1) \\
G_2 \in \mathbb{R}^{m \times k} & = \normalize(\exp(Z_1 W_2)) - I_y \\
G_1 \in \mathbb{R}^{m \times d} & = \mathrm{1}\{Z_1 > 0\} \circ (G_2 W_2^T)
\end{split}
\end{equation}
where $\mathrm{1}\{Z_1 > 0\}$ is a binary matrix with entries equal to zero or one depending on whether each term in $Z_1$ is strictly positive and where $\circ$ denotes elementwise multiplication.  Then the gradients of the objective are given by
\begin{equation}
\begin{split}
\nabla_{W_1} \ell_{\mathrm{softmax}}(\mathrm{ReLU}(X W_1) W_2, y) & = \frac{1}{m} X^T G_1  \\
\nabla_{W_2} \ell_{\mathrm{softmax}}(\mathrm{ReLU}(X W_1) W_2, y) & = \frac{1}{m} Z_1^T G_2.
\end{split}
\end{equation}
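If you want to sanity-check these equations before writing `nn_epoch()`, the sketch below compares them against central finite differences of the average softmax loss on a tiny random problem. The sizes, seed, and helper names (`avg_softmax_loss`, `numerical_grad`) are illustrative choices, not part of the assignment's code, and labels are 0-indexed as in the MNIST data. It computes gradients only; it does not perform an SGD update.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d, k = 5, 4, 6, 3                 # examples, input dim, hidden dim, classes
X = rng.standard_normal((m, n))
y = rng.integers(0, k, size=m)          # 0-indexed labels
W1 = rng.standard_normal((n, d))
W2 = rng.standard_normal((d, k))

def avg_softmax_loss():
    """Average softmax loss of ReLU(X W1) W2, reading the current W1 and W2."""
    logits = np.maximum(X @ W1, 0) @ W2
    lse = np.log(np.sum(np.exp(logits), axis=1))
    return np.mean(lse - logits[np.arange(m), y])

# Forward and backward passes, following the equations above.
Z1 = np.maximum(X @ W1, 0)              # (m, d)
S = np.exp(Z1 @ W2)
S /= np.sum(S, axis=1, keepdims=True)   # normalize(exp(Z1 W2)), row-wise
I_y = np.eye(k)[y]                      # (m, k) one-hot labels
G2 = S - I_y                            # (m, k)
G1 = (Z1 > 0) * (G2 @ W2.T)             # (m, d): elementwise ReLU mask
grad_W1 = X.T @ G1 / m                  # (n, d)
grad_W2 = Z1.T @ G2 / m                 # (d, k)

def numerical_grad(W, eps=1e-6):
    """Central differences of avg_softmax_loss w.r.t. each entry of W (perturbed, then restored)."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + eps
        f_plus = avg_softmax_loss()
        W[idx] = old - eps
        f_minus = avg_softmax_loss()
        W[idx] = old
        g[idx] = (f_plus - f_minus) / (2 * eps)
    return g

print(np.max(np.abs(grad_W1 - numerical_grad(W1))))  # should be close to 0
print(np.max(np.abs(grad_W2 - numerical_grad(W2))))  # should be close to 0
```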

**Note:** If the details of these precise equations seem a bit cryptic to you, don't worry too much.  These are the standard backpropagation equations for a two-layer ReLU network: the $Z_1$ term just computes the "forward" pass while the $G_2$ and $G_1$ terms denote the backward pass.  But the precise form of the updates can vary depending upon the notation you've used for neural networks, the precise ways you formulate the losses, if you've derived these previously in matrix form, etc.
If you feel comfortable with understanding the notation as well as implementing it mechanically via numpy/python code, then that is sufficient in terms of background (after all, the whole _point_ of deep learning systems, to some extent, is that we don't need to bother with these manual gradient calculations).
It's also OK if you feel that you can't yet derive the above gradient update equations on your own.
But: if these concepts are _completely_ foreign to you, then it may be better to take a separate course on ML and neural networks prior to this course, or at least be aware that there will be substantial catch-up work to do for the course.

Using these gradients, now write the `nn_epoch()` function in the `apps/simple_ml_hw0.py` file.  As with the previous question, your solution should modify the `W1` and `W2` arrays in place.  After implementing the function, run the following test.  Be sure to use matrix operations as indicated by the expressions above to implement the function: this will be _much_ faster and more efficient than attempting to use loops (and it requires far less code).

In [None]:
!python -m pytest -k "nn_epoch"

### Training a full neural network

As before, though it isn't a strict necessity to pass the autograder, it's rather fun to see how well you can use your neural network function to train an MNIST classifier.  Analogous to the softmax regression case, there is a `train_nn()` function in the `simple_ml_hw0.py` file you can use to train this two-layer network via SGD with multiple epochs.  Here is code, for example, that trains a two-layer network with 400 hidden units.

In [None]:
# Reload the simple_ml_hw0 module.
import importlib
import apps.simple_ml_hw0
importlib.reload(apps.simple_ml_hw0)
import python.needle.utils.visualize_mnist
importlib.reload(python.needle.utils.visualize_mnist)

from apps.simple_ml_hw0 import train_nn, parse_mnist

X_tr, y_tr = parse_mnist("data/train-images-idx3-ubyte.gz",
                         "data/train-labels-idx1-ubyte.gz")
X_te, y_te = parse_mnist("data/t10k-images-idx3-ubyte.gz",
                         "data/t10k-labels-idx1-ubyte.gz")
train_nn(X_tr, y_tr, X_te, y_te, hidden_dim=400, epochs=2, lr=0.2, visualize_preds=True)

| Epoch | Train Loss | Train Err | Test Loss | Test Err |
|------:|-----------:|----------:|----------:|---------:|
|     0 |    0.15324 |   0.04697 |   0.16305 |  0.04920 |
|     1 |    0.09854 |   0.02923 |   0.11604 |  0.03660 |
|     2 |    0.07392 |   0.02163 |   0.09750 |  0.03200 |
|     3 |    0.06006 |   0.01757 |   0.08825 |  0.02960 |
|     4 |    0.04869 |   0.01368 |   0.08147 |  0.02620 |
|     5 |    0.04061 |   0.01093 |   0.07698 |  0.02380 |
|     6 |    0.03494 |   0.00915 |   0.07446 |  0.02320 |
|     7 |    0.03027 |   0.00758 |   0.07274 |  0.02320 |
|     8 |    0.02674 |   0.00650 |   0.07103 |  0.02240 |
|     9 |    0.02373 |   0.00552 |   0.06989 |  0.02150 |
|    10 |    0.02092 |   0.00477 |   0.06870 |  0.02130 |
|    11 |    0.01914 |   0.00403 |   0.06837 |  0.02130 |
|    12 |    0.01705 |   0.00325 |   0.06748 |  0.02150 |
|    13 |    0.01541 |   0.00272 |   0.06688 |  0.02130 |
|    14 |    0.01417 |   0.00232 |   0.06657 |  0.02090 |
|    15 |    0.01282 |   0.00195 |   0.06591 |  0.02040 |
|    16 |    0

This takes about 1-2 minutes to run on Colab for our implementation, and as seen above, it achieves an error of ~1.9% on MNIST.  Not bad for less than 20 lines of code or so...

## Homework Submission

To submit your homework assignment, we will do two things:

1. Package your code (and any produced artifacts like model weights, etc.) into a zip file.
2. Submit your zip file to Gradescope.

Once you've submitted to Gradescope, the Gradescope autograder will run and assign your earned points. Run the cell below to prepare and download your submission zip file:

In [None]:
# Packages code+artifacts into zip file, and download it to your local machine
# Tip: the zip file will be downloaded to your browser's default download folder, eg `~/Downloads, C:\Users\YourUserName\Downloads`, etc.
# IMPORTANT: be sure that `rootdir_asn` is defined correctly (it is set in the setup cell at the top of this notebook)
from utils_public.utils import run_cmd

os.chdir(rootdir_asn)
print("cwd: ", os.getcwd())  # make sure we are in the right dir
print("ls cwd: ", os.listdir(os.getcwd()))

zip_outpath = f"{asn_name}_submission.zip"
print(f"Creating zip file (zip_outpath={zip_outpath})...")
run_cmd(["bash", "./utils_public/prepare_submission.sh", zip_outpath])
print("Created zipfile!")

# Check if filesize is too big
filesize_mb = os.path.getsize(zip_outpath) / (1024 * 1024)
print(f"Zip file size: {filesize_mb} MB")
if filesize_mb > 20:
    print("Warning: your submission zip is very large, and may result in autograder issues. Please investigate: perhaps you accidentally included unnecessary files?")

# Download created zipfile to your local machine
from google.colab import files
files.download(zip_outpath)
print(
    f"Finished downloading {zip_outpath}! Upload this zip file to Gradescope as your submission to run the autograder. "
    "\nThe zip file will be in your browser's default download directory, eg '~/Downloads', 'C:\\Users\\YourUserName\\Downloads', etc"
)