In [1]:
###########
# PRELUDE #
###########

# auto-reload changed python files
%load_ext autoreload
%autoreload 2

# Format cells with %%black
%load_ext blackcellmagic

# nice interactive plots
%matplotlib inline

# add repository directory to include path
from pathlib import Path
import sys
PROJECT_DIR = Path('../..').resolve()
sys.path.append(str(PROJECT_DIR))

import inspect
def _acceptable_global(name, value):
    """Returns True if a global variable with name/value can be safely ignored"""
    return (
        # stuff that's normal to share everywhere
        inspect.isroutine(value) or
        inspect.isclass(value) or
        inspect.ismodule(value) or
        # leading underscore marks private variables
        name.startswith('_') or
        # all-caps names indicate constants
        name.upper() == name or
        # ignore IPython stuff
        name in {'In', 'Out'} or
        # __module__ may be None on some objects, so fall back to ''
        (getattr(value, '__module__', None) or '').startswith('IPython'))

def assert_globals_clean():
    """Raise an AssertionError if there are unmanaged global variables.

    Variables that are considered 'managed' include those formatted with
    ALL_CAPS (constants), _a_leading_underscore (recognized as a global but at
    least indicated as private to the cell), classes and modules, automatic
    imports from IPython, and functions generally.
    """
    unmanaged_globals = {k: type(v) for k, v in globals().items() if not _acceptable_global(k, v)}
    if unmanaged_globals:
        raise AssertionError(f"Unmanaged globals found: {unmanaged_globals}")
    ok("No unmanaged globals detected")

from IPython.display import display, Markdown, HTML

def markdown(s):
    return display(Markdown(s))

def html(s):
    return display(HTML(s))

def ok(message="OK"):
    html(f"<div class=\"alert alert-block alert-success\">{message}</div>")

display(HTML("""
<style>
.custom-assignment-text {
    background-color: lightyellow;
    border: 1px solid darkkhaki; 
    padding: 10px;
    border-radius: 2px
}
</style>"""))

markdown("#### Custom functionality enabled:")
markdown("* Format a code cell by entering %%black at the top of it")
markdown("* Surround markdown cells with  `<div class=\"custom-assignment-text\">\\n\\n ... \\n\\n</div>` to format course-provided assignment text")
markdown("* Use `ok(<message>)` to notify of a passing test")
markdown("* Use `assert_globals_clean()` to check that all globals are managed (private, constants, etc.)")

#### Custom functionality enabled:

* Format a code cell by entering %%black at the top of it

* Surround markdown cells with  `<div class="custom-assignment-text">\n\n ... \n\n</div>` to format course-provided assignment text

* Use `ok(<message>)` to notify of a passing test

* Use `assert_globals_clean()` to check that all globals are managed (private, constants, etc.)

<div class="custom-assignment-text">

## Goal of mini-project

In the three problems of this mini-project, you will explore the idea of generalization, i.e., when the test error of a learned prediction function is roughly the same as its training error. You will explore how regularization and the choice of the learning algorithm (gradient descent, stochastic gradient descent, etc.) interact with generalization in a simple linear prediction setting. Many aspects of these relationships are still not well understood, and a fierce debate is currently raging within the Machine Learning community about whether our understanding of generalization lacks key components necessary for explaining the unreasonable effectiveness of stochastic gradient descent (particularly in the context of “deep learning”). This week will give you a glimpse of some of these mysteries.

</div>

<div class="custom-assignment-text">

# Part 1: Regression, Three Ways

We will consider the problem of fitting a linear model. Given $d$-dimensional input data $\mathbf{x}^{(1)}, \cdots, \mathbf{x}^{(n)} \in \mathbb{R}^d$ with real-valued labels $y^{(1)},\cdots, y^{(n)} \in \mathbb{R}$, the goal is to find the coefficient vector $\mathbf{a}$ that minimizes the sum of the squared errors. The total squared error of $\mathbf{a}$ can be written as $f(\mathbf{a}) = \sum_{i=1}^{n} f_i(\mathbf{a})$, where $f_i(\mathbf{a})=(\mathbf{a}^\top\mathbf{x}^{(i)} - y^{(i)})^2$ denotes the squared error of the $i$th data point.

The data in this problem will be drawn from the following linear model. For the training data, we select $n$ data points $\mathbf{x}^{(1)}, \cdots, \mathbf{x}^{(n)}$, each drawn independently from a $d$-dimensional Gaussian distribution. We then pick the "true" coefficient vector $\mathbf{a}^*$ (again from a $d$-dimensional Gaussian), and give each training point $\mathbf{x}^{(i)}$ a label equal to $(\mathbf{a}^*)^\top\mathbf{x}^{(i)}$ plus some noise (which is drawn from a 1-dimensional Gaussian distribution).

The following Python code will generate the data used in this problem.

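    import numpy as np

    d = 100  # dimensions of data
    n = 1000  # number of data points
    X = np.random.normal(0, 1, size=(n, d))  # one row per data point
    a_true = np.random.normal(0, 1, size=(d, 1))  # "true" coefficient vector
    y = X.dot(a_true) + np.random.normal(0, 0.5, size=(n, 1))  # labels plus Gaussian noise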

</div>
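
As a rough illustration of the objective defined above, the sketch below evaluates $f(\mathbf{a})$ twice: once as an explicit sum of the per-point errors $f_i(\mathbf{a})$, and once in vectorized form. The helper name `total_squared_error` and the seeded generator `rng` are assumptions introduced here for reproducibility; they are not part of the course-provided code.

```python
import numpy as np


def total_squared_error(a, X, y):
    """Return f(a) = sum_i (a^T x_i - y_i)^2 using a vectorized residual."""
    residuals = X @ a - y  # shape (n, 1): one residual per data point
    return float(np.sum(residuals ** 2))


# Assumption: a fixed seed, used only so the sketch is reproducible.
rng = np.random.default_rng(0)

d = 100   # dimensions of data
n = 1000  # number of data points
X = rng.normal(0, 1, size=(n, d))
a_true = rng.normal(0, 1, size=(d, 1))
y = X @ a_true + rng.normal(0, 0.5, size=(n, 1))

# Explicit sum over data points, mirroring the definition of f_i.
per_point_sum = sum(
    (a_true[:, 0] @ X[i] - y[i, 0]) ** 2 for i in range(n)
)

vectorized = total_squared_error(a_true, X, y)
print(f"explicit sum: {per_point_sum:.4f}, vectorized: {vectorized:.4f}")
```

Evaluated at `a_true`, the residuals are exactly the injected noise terms, so the printed value should be on the order of $n \cdot 0.5^2 = 250$.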

<div class="custom-assignment-text">

(a) (4 points) Least-squares regression has the closed-form solution $\mathbf{a} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$, which minimizes the squared error on the data. (Here $\mathbf{X}$ is the $n \times d$ data matrix as in the code above, with one row per data point, and $\mathbf{y}$ is the $n$-vector of their labels.) Solve for $\mathbf{a}$ and report the value of the objective function using this value of $\mathbf{a}$. For comparison, what is the total squared error if you just set $\mathbf{a}$ to be the all-zeros vector?

Comment: Computing the closed-form solution requires time $O(nd^2+d^3)$, which is slow for large $d$. Although gradient descent methods will not yield an exact solution, they do give a close approximation in much less time. For the purpose of this assignment, you can use the closed-form solution as a sanity check in the following parts.

</div>
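
One possible way to approach part (a) is sketched below; it is an illustration under stated assumptions, not the reference solution. It solves the normal equations $\mathbf{X}^\top\mathbf{X}\,\mathbf{a} = \mathbf{X}^\top\mathbf{y}$ with `np.linalg.solve` instead of forming the inverse explicitly, which is mathematically equivalent when $\mathbf{X}^\top\mathbf{X}$ is invertible and is usually better behaved numerically. The seeded generator `rng` and the names `objective`, `a_closed`, `err_closed`, and `err_zero` are assumptions introduced for this sketch.

```python
import numpy as np

# Assumption: a fixed seed, used only so the sketch is reproducible.
rng = np.random.default_rng(0)

d = 100   # dimensions of data
n = 1000  # number of data points
X = rng.normal(0, 1, size=(n, d))
a_true = rng.normal(0, 1, size=(d, 1))
y = X @ a_true + rng.normal(0, 0.5, size=(n, 1))


def objective(a):
    """Total squared error f(a) = ||X a - y||^2 on the generated data."""
    return float(np.sum((X @ a - y) ** 2))


# Closed-form least squares: solve (X^T X) a = X^T y for a.
a_closed = np.linalg.solve(X.T @ X, X.T @ y)

err_closed = objective(a_closed)
err_zero = objective(np.zeros((d, 1)))  # baseline: all-zeros coefficient vector

# Cross-check against NumPy's least-squares routine.
a_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"closed-form objective: {err_closed:.2f}")
print(f"all-zeros objective:   {err_zero:.2f}")
print(f"matches lstsq: {np.allclose(a_closed, a_lstsq)}")
```

Because the closed-form solution minimizes the training objective, its error can be no larger than the error at `a_true`, and it should be far below the all-zeros baseline.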