<a href="https://colab.research.google.com/github/olinml2024/notebooks/blob/main/ML24_Assignment07.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 7: Autograd

**High-level picture:** In this assignment you'll be implementing your very own system for automatic differentiation.  You'll then see how this framework can be used as a machine learning algorithm by using your autograd engine within a gradient descent optimizer.

*Credits:* the design of the autograd engine you will be implementing is based on the micrograd framework from **Andrej Karpathy**.  You may find it useful to watch [Andrej Karpathy's walkthrough video](https://www.youtube.com/watch?v=VMj-3S1tku0&t=2414s) of micrograd.  The first two-thirds of the assignment follows along the same lines, but we felt it important to define explicit exercises so that you are actively implementing and reasoning about the engine rather than passively watching.

Before we start, here is some possibly useful background for those with little familiarity with the idea of neural networks and gradient descent (we'll go into this soon, but we thought some might find these videos useful now).
* [But what is a neural network?](https://www.youtube.com/watch?v=aircAruvnKk)
* [Gradient descent, how neural networks learn](https://www.youtube.com/watch?v=IHZwWFHWa-w)

## Exercise 1

Let's do a little calculus in Python.

Define a function, `f`, that takes a scalar, `x`, as input and returns the expression $x^2 + 5x$.  Define a second function, `df`, that takes a scalar, `x`, as input and returns $\frac{d}{dx}f(x)$ (i.e., the derivative of $f$).  Verify that your function returns the correct derivative by computing a numerical approximation of the derivative.  **Hint:** to numerically estimate the derivative, calculate the ratio of rise over run as you nudge the input to the function, `f`, by a small amount.  $f'(x) \approx \frac{f(x+h) - f(x)}{h}$ (where $h$ is some small value, like $0.001$).

Define a function, `g`, that takes two scalars as input, `x` and `y`, and returns the expression $\sin(x)y^2$.  Define two additional functions that return $\frac{\partial g}{\partial x}$ and $\frac{\partial g}{\partial y}$.  As before, verify that your partial derivative functions are working by taking a numerical approximation.
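One way to package the rise-over-run check is a small reusable helper.  The sketch below is only an illustration: the names `numerical_derivative`, `cube`, and `dcube` are hypothetical, the test function is deliberately not one from the exercise, and it uses a central difference (nudging in both directions) rather than the one-sided formula in the hint.

```python
def numerical_derivative(fn, x, h=1e-3):
    """Central-difference estimate of the derivative of fn at x (hypothetical helper)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

def cube(x):
    # stand-in function, not part of the exercise
    return x**3

def dcube(x):
    # derivative of cube worked out by hand
    return 3 * x**2

x0 = 2.0
exact = dcube(x0)
estimate = numerical_derivative(cube, x0)
print(f"derivative using calculus {exact} approximation of derivative {estimate}")
```

For a function of two variables, the same helper can be applied to one input at a time by holding the other fixed, e.g. `numerical_derivative(lambda x: g(x, y), x)`.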

### Solution


In [None]:
# from math import sin, cos

# def f(x):
#     return x**2 + 5*x

# def df(x):
#     # doin' the real calculus
#     return 2*x + 5

# x = 3
# h = 0.001

# df_approx = (f(x+h) - f(x))/h

# print(f"derivative using calculus {df(x)} approximation of derivative {df_approx}")

# def g(x,y):
#     return sin(x)*y**2

# def dg_dx(x,y):
#     return cos(x)*y**2

# def dg_dy(x,y):
#     return 2*y*sin(x)

# x = 2
# y = 1

# dg_dx_approx = (g(x+h,y) - g(x,y))/h
# dg_dy_approx = (g(x,y+h) - g(x,y))/h

# print(f"derivative using calculus {dg_dx(x,y)} approximation of derivative {dg_dx_approx}")
# print(f"derivative using calculus {dg_dy(x,y)} approximation of derivative {dg_dy_approx}")

## Exercise 2

Let's create a Python class called `Value` that can store a scalar value.  Define an addition function that allows you to add two `Value` objects together.  You'll also keep a placeholder to store, as a Python `set`, the `Value` objects (if any) that are used to compute the scalar stored in the `Value` object (e.g., if a `Value` object is computed as the sum of two other `Value` objects, you'd want to keep track of that in this set).  This might be a bit confusing at first, so we'll give you some commented starter code.  Once you've defined your addition function, create a suitable test case (nothing formal, just a few print statements) to make sure it's working.
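If storing inputs in a `set` is unfamiliar, the following sketch may help.  It uses a hypothetical `Node` class (not the `Value` class you will write) to show how a set built from a tuple of objects behaves, including the case where the same object is passed twice.

```python
class Node:
    """Hypothetical stand-in for a class that records its inputs in a set."""
    def __init__(self, data, inputs=()):
        self.data = data
        self.inputs = set(inputs)

a = Node(2.0)
b = Node(5.0)

# two distinct input objects: the set has two members
two_inputs = Node(a.data + b.data, (a, b))
# the same object passed twice: the set keeps only one copy of it
same_input_twice = Node(a.data + a.data, (a, a))

print(len(two_inputs.inputs))
print(len(same_input_twice.inputs))
```

Objects of a plain Python class are hashed by identity by default, so two `Node` objects holding the same number would still count as two different members of the set.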

In [None]:
class Value:
    def __init__(self, data, inputs=()):
        self.data = data
        self.inputs = set(inputs)

    def __add__(self, other):
        """ return a new `Value` object that represents the sum of `self` and `other` """
        pass

# write a test case here.

### Solution

In [None]:
# class Value:
#     def __init__(self, data, inputs=()):
#         self.data = data
#         self.inputs = set(inputs)

#     def __add__(self, other):
#         return Value(self.data + other.data, (self, other))

# x = Value(3)
# y = Value(7)
# z = x + y
# print(z.data)       # should be 10

## Exercise 3

Add a multiplication function to your `Value` class.  You can copy-paste your code from exercise 2 into the cell below so that you keep a record of what your code looked like at the end of exercise 2.  In order to use Python's syntax `a*b`, you'll want to call your multiplication function `__mul__`.

Create a suitable test case to make sure your new function is working properly.
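As a reminder of how Python connects the `*` operator to a method, here is a minimal sketch using a hypothetical `Scale` class that is unrelated to `Value`.  The point is only that `a * b` is evaluated by calling `a.__mul__(b)`.

```python
class Scale:
    """Hypothetical class used only to illustrate operator overloading."""
    def __init__(self, factor):
        self.factor = factor

    def __mul__(self, other):
        # Python calls this method when it evaluates `self * other`
        return Scale(self.factor * other.factor)

double = Scale(2)
triple = Scale(3)
combined = double * triple              # operator syntax
explicit = double.__mul__(triple)       # the equivalent explicit method call
print(combined.factor, explicit.factor)
```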

### Solution

In [None]:
# class Value:
#     def __init__(self, data, inputs=()):
#         self.data = data
#         self.inputs = set(inputs)

#     def __add__(self, other):
#         return Value(self.data + other.data, (self, other))

#     def __mul__(self, other):
#         return Value(self.data * other.data, (self, other))

# q = Value(2) * Value(3)
# print(q.data)           # should be 6

## Exercise 4

Draw a dataflow diagram representing the expression $f = xy + yz$.  Work out the partial derivatives using the methods we learned about in assignment 5.

### Solution


<img src="https://raw.githubusercontent.com/olinml2024/notebooks/refs/heads/main/images/a7_dataflow_v2.png" width="50%"/>

\begin{align}
\frac{\partial f}{\partial x} &= \frac{\partial r}{\partial x} \frac{\partial f}{\partial r} \\
&= y \\
\frac{\partial f}{\partial y} &= \frac{\partial r}{\partial y} \frac{\partial f}{\partial r} + \frac{\partial s}{\partial y} \frac{\partial f}{\partial s} \\
&= x + z \\
\frac{\partial f}{\partial z} &= \frac{\partial s}{\partial z} \frac{\partial f}{\partial s} \\
&= y
\end{align}
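These results can be sanity-checked numerically in the same way as in exercise 1.  The sketch below assumes an arbitrary test point (`x = 3`, `y = 4`, `z = 5`, chosen only for illustration) and nudges one input at a time.

```python
def f(x, y, z):
    return x * y + y * z

x, y, z = 3.0, 4.0, 5.0     # arbitrary test point (an assumption for this sketch)
h = 1e-6

# nudge one input at a time while holding the other two fixed
df_dx = (f(x + h, y, z) - f(x, y, z)) / h
df_dy = (f(x, y + h, z) - f(x, y, z)) / h
df_dz = (f(x, y, z + h) - f(x, y, z)) / h

print(f"df_dx {df_dx} (formula gives y = {y})")
print(f"df_dy {df_dy} (formula gives x + z = {x + z})")
print(f"df_dz {df_dz} (formula gives y = {y})")
```

Because $f$ is linear in each input separately, the estimates should agree with the formulas up to floating-point rounding.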

## Exercise 5

In the code block below, we've provided you with some code to visualize dataflow diagrams created with your `Value` class.  
First, modify your `Value` class so it stores the name of the operation used to compute the `Value` (e.g., + or *).  You should store this as an attribute of your `Value` class called `op`.  Next, use your `Value` class to generate the expression from exercise 4 and use the `draw_dot` function to visualize it.  Feel free to set the values of `x`, `y`, and `z` to whatever you want.  Make sure that the visualization of the dataflow diagram looks right.

In [None]:
from graphviz import Digraph

def trace(root):
    nodes, edges = set(), set()
    def build(v):
        if v not in nodes:
            nodes.add(v)
            for child in v.inputs:
                edges.add((child, v))
                build(child)
    build(root)
    return nodes, edges

def draw_dot(root, format='svg', rankdir='LR'):
    """
    format: png | svg | ...
    rankdir: TB (top to bottom graph) | LR (left to right)
    """
    assert rankdir in ['LR', 'TB']
    nodes, edges = trace(root)
    dot = Digraph(format=format, graph_attr={'rankdir': rankdir}) #, node_attr={'rankdir': 'TB'})

    for n in nodes:
        dot.node(name=str(id(n)), label = "{ data %.4f }" % (n.data), shape='record')
        if n.op:
            dot.node(name=str(id(n)) + n.op, label=n.op)
            dot.edge(str(id(n)) + n.op, str(id(n)))

    for n1, n2 in edges:
        dot.edge(str(id(n1)), str(id(n2)) + n2.op)

    return dot

### Solution

In [None]:
# from graphviz import Digraph

# def trace(root):
#     nodes, edges = set(), set()
#     def build(v):
#         if v not in nodes:
#             nodes.add(v)
#             for child in v.inputs:
#                 edges.add((child, v))
#                 build(child)
#     build(root)
#     return nodes, edges

# def draw_dot(root, format='svg', rankdir='LR'):
#     """
#     format: png | svg | ...
#     rankdir: TB (top to bottom graph) | LR (left to right)
#     """
#     assert rankdir in ['LR', 'TB']
#     nodes, edges = trace(root)
#     dot = Digraph(format=format, graph_attr={'rankdir': rankdir}) #, node_attr={'rankdir': 'TB'})

#     for n in nodes:
#         dot.node(name=str(id(n)), label = "{ data %.4f }" % (n.data), shape='record')
#         if n.op:
#             dot.node(name=str(id(n)) + n.op, label=n.op)
#             dot.edge(str(id(n)) + n.op, str(id(n)))

#     for n1, n2 in edges:
#         dot.edge(str(id(n1)), str(id(n2)) + n2.op)

#     return dot

# class Value:
#     def __init__(self, data, inputs=(), op=""):
#         self.data = data
#         self.inputs = set(inputs)
#         self.op = op

#     def __add__(self, other):
#         return Value(self.data + other.data, (self, other), '+')

#     def __mul__(self, other):
#         return Value(self.data * other.data, (self, other), '*')

# x = Value(7)
# y = Value(4)
# z = Value(2)
# q = x*y + y*z
# draw_dot(q)

## Exercise 6

Let's add a placeholder to store the partial derivative of some output expression with respect to the `data` stored in a `Value` object (let's call this partial derivative `grad` and store it as an attribute of your `Value` class).  In the `__init__` method of your class we'll set its initial value to 0.  We'll also update our visualization code to display `grad`.

We'll also define a function called `_backward` that will propagate the appropriate partial derivatives to each of `Value`'s inputs (take a look at the figure below to see the picture from day 6).

<img src="https://mermaid.ink/svg/eyJjb2RlIjoiZmxvd2NoYXJ0IFRCXG5pZDFbXCIkJGdyYWRfZiA9IDEgfn5-fiQkXCJdXG5pZDJbXCIkJGdyYWRfeCA9IFxcZnJhY3tcXHBhcnRpYWwgZn17XFxwYXJ0aWFsIHh9IGdyYWRfZn5-JCRcIl1cbmlkM1tcIiQkZ3JhZF95ID0gXFxmcmFje1xccGFydGlhbCBmfXtcXHBhcnRpYWwgeX0gZ3JhZF9mfn4kJFwiXVxuaWQ0W1wiJCRncmFkX3QgPSBcXGZyYWN7XFxwYXJ0aWFsIHh9e1xccGFydGlhbCB0fSBncmFkX3ggKyBcXGZyYWN7XFxwYXJ0aWFsIHl9e1xccGFydGlhbCB0fSBncmFkX3l-fn5-fn4kJFwiXVxuaWQxIC0tXCIkJFxcZnJhY3tcXHBhcnRpYWwgZn17XFxwYXJ0aWFsIHh9IGdyYWRfZn5-JCRcIi0tPiBpZDJcbmlkMSAtLVwiJCRcXGZyYWN7XFxwYXJ0aWFsIGZ9e1xccGFydGlhbCB5fSBncmFkX2Z-fiQkXCItLT4gaWQzXG5pZDIgLS1cIiQkXFxmcmFje1xccGFydGlhbCB4fXtcXHBhcnRpYWwgdH0gZ3JhZF94IH5-JCRcIi0tPiBpZDRcbmlkMyAtLVwiJCRcXGZyYWN7XFxwYXJ0aWFsIHl9e1xccGFydGlhbCB0fSBncmFkX3kgfn4kJFwiLS0-IGlkNCIsIm1lcm1haWQiOnsidGhlbWUiOiJkZWZhdWx0In19" width="50%"/>

This is probably a bit confusing, but we'll give you a template for `_backward()` to show you what we mean, and you can fill in the specifics.  Test your function by defining values to represent `x` and `y` and then computing `z = x*y`.  Kick off the base case by setting `z.grad = 1`, then call `_backward` on `x`, `y`, and `z` in an appropriate order (think about what order these have to be called in).  Finally, make sure the values of `x.grad`, `y.grad`, and `z.grad` are as expected by visualizing your dataflow diagram using `draw_dot`.
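The template relies on an inner function that is created inside a method and then stored on the returned object.  If that pattern is new to you, the following sketch shows it in isolation.  The `Box` class and `pair_up` function are hypothetical and have nothing to do with gradients; they only show that the inner function can still see the variables of the call that created it.

```python
class Box:
    """Hypothetical container used only to illustrate inner functions."""
    def __init__(self, data):
        self.data = data
        self.describe = lambda: None    # placeholder, replaced in pair_up

def pair_up(left, right):
    out = Box((left.data, right.data))

    def describe():
        # `left`, `right`, and `out` come from the enclosing call to pair_up
        return f"{out.data} was built from {left.data} and {right.data}"

    out.describe = describe             # store the inner function for later use
    return out

p = pair_up(Box(1), Box(2))
message = p.describe()                  # called long after pair_up has returned
print(message)
```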

**Important question to ponder:** In the template code, why did we define `_backward` using `+=` instead of `=`?

In [None]:
def draw_dot(root, format='svg', rankdir='LR'):
    """
    format: png | svg | ...
    rankdir: TB (top to bottom graph) | LR (left to right)
    """
    assert rankdir in ['LR', 'TB']
    nodes, edges = trace(root)
    dot = Digraph(format=format, graph_attr={'rankdir': rankdir}) #, node_attr={'rankdir': 'TB'})

    for n in nodes:
        dot.node(name=str(id(n)), label = "{ data %.4f | grad %.4f }" % (n.data, n.grad), shape='record')
        if n.op:
            dot.node(name=str(id(n)) + n.op, label=n.op)
            dot.edge(str(id(n)) + n.op, str(id(n)))

    for n1, n2 in edges:
        dot.edge(str(id(n1)), str(id(n2)) + n2.op)

    return dot

class Value:
    def __init__(self, data, inputs=(), op=""):
        self.data = data
        self.inputs = set(inputs)
        self.op = op
        self._backward = lambda : None
        # set this to 0, it will be updated by calling `_backward` on various Value objects
        self.grad = 0

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other), '+')
        # note: this uses a feature of Python called inner functions
        # see: https://www.geeksforgeeks.org/python-inner-functions/
        def _backward():
            # you should assume that out.grad has already been computed
            self.grad +=  # fill in this expression!
            other.grad +=  # fill in this expression!
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other), '*')
        def _backward():
            # you should assume that out.grad has already been computed
            self.grad +=  # fill in this expression!
            other.grad +=  # fill in this expression!
        out._backward = _backward
        return out

# fill in some values for x and y, define z, and visualize your graph


### Solution

In [None]:
# def draw_dot(root, format='svg', rankdir='LR'):
#     """
#     format: png | svg | ...
#     rankdir: TB (top to bottom graph) | LR (left to right)
#     """
#     assert rankdir in ['LR', 'TB']
#     nodes, edges = trace(root)
#     dot = Digraph(format=format, graph_attr={'rankdir': rankdir}) #, node_attr={'rankdir': 'TB'})

#     for n in nodes:
#         dot.node(name=str(id(n)), label = "{ data %.4f | grad %.4f }" % (n.data, n.grad), shape='record')
#         if n.op:
#             dot.node(name=str(id(n)) + n.op, label=n.op)
#             dot.edge(str(id(n)) + n.op, str(id(n)))

#     for n1, n2 in edges:
#         dot.edge(str(id(n1)), str(id(n2)) + n2.op)

#     return dot

# class Value:
#     def __init__(self, data, inputs=(), op=""):
#         self.data = data
#         self.inputs = set(inputs)
#         self.op = op
#         self._backward = lambda : None
#         # set this to 0, it will be updated by calling `_backward` on various Value objects
#         self.grad = 0

#     def __add__(self, other):
#         out = Value(self.data + other.data, (self, other), '+')
#         # note: this uses a feature of Python called inner functions
#         # see: https://www.geeksforgeeks.org/python-inner-functions/
#         def _backward():
#             self.grad += out.grad * 1
#             other.grad += out.grad * 1
#         out._backward = _backward
#         return out

#     def __mul__(self, other):
#         out = Value(self.data * other.data, (self, other), '*')
#         def _backward():
#             self.grad +=  out.grad * other.data
#             other.grad +=  out.grad * self.data
#         out._backward = _backward
#         return out

# x = Value(7)
# y = Value(4)
# z = x*y
# z.grad = 1
# z._backward()
# x._backward()
# y._backward()
# draw_dot(z)

## Exercise 7

Add support to your `Value` class for the sigmoid and log loss functions (log loss should take `self` and `y` as inputs, where `y` is either 0 or 1 and represents the true class).  Come up with a suitable test case for your new functions and make sure that your implementation agrees with what you expect.  Make sure that the output and `grad` are computed appropriately (you'll have to call `_backward` in a suitable order).  Note: you should choose something pretty simple here.  Avoid doing more than one operation in a single assignment, as that will make it hard to properly call `_backward()` (i.e., don't do `z = (x*y+z).sigmoid()`).  You can use `draw_dot` to verify your answer with some hand-done calculations.
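Before wiring derivatives into `_backward`, it can help to confirm them with plain floats.  The sketch below is one possible check, not part of the `Value` class: the helper names `sigmoid` and `log_loss` and the test point are assumptions made for illustration, and the derivative formulas are compared against central-difference estimates.

```python
from math import exp, log

def sigmoid(s):
    return 1.0 / (1.0 + exp(-s))

def log_loss(p, y):
    # p is the predicted probability, y is the true class (0 or 1)
    return -y * log(p) - (1 - y) * log(1 - p)

s, y, h = 0.5, 1, 1e-6      # arbitrary test point and nudge size
p = sigmoid(s)

# derivative of sigmoid with respect to its input
dsigmoid_exact = p * (1 - p)
dsigmoid_approx = (sigmoid(s + h) - sigmoid(s - h)) / (2 * h)

# derivative of log loss with respect to the predicted probability
dloss_dp_exact = -y / p + (1 - y) / (1 - p)
dloss_dp_approx = (log_loss(p + h, y) - log_loss(p - h, y)) / (2 * h)

print(f"sigmoid:  calculus {dsigmoid_exact} approximation {dsigmoid_approx}")
print(f"log loss: calculus {dloss_dp_exact} approximation {dloss_dp_approx}")
```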

### Solution

In [None]:
# from math import exp, log

# class Value:
#     def __init__(self, data, inputs=(), op=""):
#         self.data = data
#         self.inputs = set(inputs)
#         self.op = op
#         self._backward = lambda : None
#         self.grad = 0

#     def __add__(self, other):
#         out = Value(self.data + other.data, (self, other), '+')
#         def _backward():
#             self.grad += 1 * out.grad
#             other.grad += 1 * out.grad
#         out._backward = _backward
#         return out

#     def __mul__(self, other):
#         out = Value(self.data * other.data, (self, other), '*')
#         def _backward():
#             self.grad += other.data * out.grad
#             other.grad += self.data * out.grad
#         out._backward = _backward
#         return out

#     def sigmoid(self):
#         out = Value(1./(1+exp(-self.data)), (self,), 'sigmoid')
#         def _backward():
#             self.grad += out.grad * out.data * (1 - out.data)
#         out._backward = _backward
#         return out

#     def logloss(self, y):
#         # Note: self is yhat and y is the actual value
#         out = Value(-y.data*log(self.data) - (1-y.data)*log(1-self.data), (self, y), 'logloss')
#         def _backward():
#             self.grad += out.grad * (-y.data/self.data + (1-y.data)/(1-self.data))
#             y.grad += out.grad * (-log(self.data) + log(1-self.data))

#         out._backward = _backward
#         return out

# x = Value(3)
# z = x.sigmoid()
# y = Value(1.0)
# log_loss = z.logloss(y)
# print(f"{z.data}")       # this should be 1/(1+exp(-3)) = 0.9525741268224334
# print(f"{log_loss.data}")       # this should be -log(0.9525741268224334) = 0.04858735157374191 (note: log is natural log)
# log_loss.grad = 1
# log_loss._backward()
# z._backward()
# y._backward()
# x._backward()
# draw_dot(log_loss)

## Exercise 8

Add the `backward` function that we provide to your `Value` class (or if you are super ambitious you can implement it yourself by learning about the concept of [topological sorting](https://en.wikipedia.org/wiki/Topological_sorting), which will automatically ensure that the `_backward` functions are called in the correct order).  In your response, explain what could happen if the `_backward` functions are not called in a suitable order.

Use the `backward` function to evaluate the partial derivatives of $f = xy + z^2x$ with respect to $x$, $y$, and $z$.  Instead of manually having to run `_backward` on each `Value` object, you can just run `backward` on `f`.
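To get a feel for topological sorting separately from the `Value` class, here is a minimal sketch on a graph stored as a plain dictionary.  The dictionary format and the function name `topological_order` are assumptions made for illustration; the graph encodes the expression from exercise 4, with `r = x*y`, `s = y*z`, and `f = r + s`.

```python
def topological_order(graph, root):
    """graph maps each node name to the names of its inputs (hypothetical format)."""
    order, visited = [], set()

    def visit(node):
        if node not in visited:
            visited.add(node)
            for parent in graph[node]:
                visit(parent)
            order.append(node)      # appended only after all of its inputs

    visit(root)
    return order

graph = {
    "f": ["r", "s"],
    "r": ["x", "y"],
    "s": ["y", "z"],
    "x": [], "y": [], "z": [],
}

order = topological_order(graph, "f")
backward_order = list(reversed(order))   # output first, every node before its inputs
print(backward_order)
```

In the reversed list, each node appears before any of the nodes it was computed from, which is the order in which `backward` visits them.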

In [None]:
# add this to your `Value` class (make sure to fix the indentation)
def backward(self):
    # set our base case
    self.grad = 1
    # topologically order all of the nodes in the graph
    topo = []
    visited = set()
    def build_topo(v):
        if v not in visited:
            visited.add(v)
            for input in v.inputs:
                build_topo(input)
            topo.append(v)
    build_topo(self)

    # go one variable at a time and apply the chain rule to get its gradient
    self.grad = 1
    for v in reversed(topo):
        v._backward()

# create the specified expression and test that the gradients are correct.

### Solution

In [None]:
# class Value:
#     def __init__(self, data, inputs=(), op=""):
#         self.data = data
#         self.inputs = set(inputs)
#         self.op = op
#         self._backward = lambda : None
#         self.grad = 0

#     def __add__(self, other):
#         out = Value(self.data + other.data, (self, other), '+')
#         def _backward():
#             self.grad += 1 * out.grad
#             other.grad += 1 * out.grad
#         out._backward = _backward
#         return out

#     def __mul__(self, other):
#         out = Value(self.data * other.data, (self, other), '*')
#         def _backward():
#             self.grad += other.data * out.grad
#             other.grad += self.data * out.grad
#         out._backward = _backward
#         return out

#     def sigmoid(self):
#         out = Value(1./(1+exp(-self.data)), (self,), 'sigmoid')
#         def _backward():
#             self.grad += out.grad * out.data * (1 - out.data)
#         out._backward = _backward
#         return out

#     def logloss(self, y):
#         # Note: self is yhat and y is the actual value
#         out = Value(-y.data*log(self.data) - (1-y.data)*log(1-self.data), (self, y), 'logloss')
#         def _backward():
#             self.grad += out.grad * (-y.data/self.data + (1-y.data)/(1-self.data))
#             y.grad += out.grad * (-log(self.data) + log(1-self.data))

#         out._backward = _backward
#         return out

#     def backward(self):
#         # set our base case
#         self.grad = 1
#         # topologically order all of the nodes in the graph
#         topo = []
#         visited = set()
#         def build_topo(v):
#             if v not in visited:
#                 visited.add(v)
#                 for input in v.inputs:
#                     build_topo(input)
#                 topo.append(v)
#         build_topo(self)

#         # go one variable at a time and apply the chain rule to get its gradient
#         self.grad = 1
#         for v in reversed(topo):
#             v._backward()

# x = Value(3)
# y = Value(8)
# z = Value(2)
# f = x*y + z*z*x
# f.backward()
# print(f"df_dx {x.grad}")        # should be 12
# print(f"df_dy {y.grad}")        # should be 3
# print(f"df_dz {z.grad}")        # should be 12

## Exercise 9

Now, it's all going to come together!  In this exercise you will use your `Value` class to perform machine learning on the [Wisconsin Breast Cancer dataset](https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic), which is a classic dataset in machine learning.  In this dataset, you will use machine learning for medical diagnosis (in this case determining if a tumor is malignant or benign from a set of features derived from images of a tumor).

We have left several TODOs in the starter code below.  The high-level steps are listed below, and we have specified whether we have implemented these for you or you should implement them yourself.

1.  Load the data, scale the features, create a train / test split, and define a helper function for determining accuracy of the logistic regression on the test set (we provide this)
2.  Initialize the weights and bias term of the logistic regression to 0.0 (we provide this).
3.  Compute the total log loss of the model on the training set (you will do this).
4.  Call backward to populate the `.grad` (you will do this)
5.  Adjust the weights and bias using the computed gradient (you will do this).
6.  Reset the gradients for the next step (we provide this)
7.  Compute the accuracy of the model on the test set (we provide this)
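The loss / gradient / update cycle in steps 3 to 6 can be seen in miniature on a one-parameter problem.  The sketch below uses plain floats and a hand-written derivative rather than your `Value` class; the loss function, starting point, and `step_size` are all assumptions chosen for illustration.

```python
def loss(w):
    # a toy loss with its minimum at w = 3 (not the log loss from the assignment)
    return (w - 3.0) ** 2

def dloss_dw(w):
    # derivative of the toy loss, worked out by hand
    return 2.0 * (w - 3.0)

w = 0.0                 # initial parameter value
step_size = 0.1         # plays the role of `lam` in the starter code
history = []

for step in range(50):
    grad = dloss_dw(w)              # compute the gradient at the current w
    w = w - step_size * grad        # move against the gradient
    history.append(loss(w))         # record the loss after the update

print(f"final w {w}, first loss {history[0]}, final loss {history[-1]}")
```

In the assignment, calling `backward` on the total loss replaces the hand-written derivative, and there is one such update per weight plus one for the bias.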

In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from math import exp

def get_test_accuracy(X_test, y_test, weights, bias):
    # compute the accuracy on the test set
    correct = 0
    for i in range(X_test.shape[0]):
        s = bias.data
        for j in range(X_test.shape[1]):
            s = s + weights[j].data * X_test[i,j]
        p = 1./(1+exp(-s))
        if p > 0.5:
            y_pred = 1
        else:
            y_pred = 0
        if y_pred == y_test[i]:
            correct += 1
    return correct

# see https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.33, random_state=3)
s = StandardScaler()
s.fit(X_train)
X_train = s.transform(X_train)
X_test = s.transform(X_test)
print(f"X_train.shape {X_train.shape} y_train.shape {y_train.shape} X_test.shape {X_test.shape}")
print(f"feature names={data.feature_names}")
# this defines the number of times to loop through the dataset
n_epochs = 10

# create a weight for each feature and set it to 0
weights = [Value(0.0) for _ in range(len(data.feature_names))]
# this is a constant value (like the y-intercept in linear regression)
bias = Value(0.0)

lam = 10**-2
correct = get_test_accuracy(X_test, y_test, weights, bias)
print(f"correct / total ({correct} / {X_test.shape[0]})")
for epoch in range(n_epochs):
    print(f"epoch {epoch}")
    total_loss = Value(0.0)
    # TODO: compute the total loss over the training set using your `Value` class
    # Hint: you'll want to look up how to loop over the rows and columns of a numpy array
    pass

    # TODO: call `.backward()` on total_loss to populate the `.grad` variables
    pass

    # TODO: adjust the weights and bias term by performing a step of gradient descent
    # using the provided step size `lam`
    pass

    # reset the grad variables for the next step
    for j in range(X_train.shape[1]):
        weights[j].grad = 0.0
    bias.grad = 0.0

    # print out the accuracy
    correct = get_test_accuracy(X_test, y_test, weights, bias)
    print(f"correct / total ({correct} / {X_test.shape[0]})")

### Solution

In [None]:
# from sklearn.datasets import load_breast_cancer
# from sklearn.model_selection import train_test_split
# from sklearn.preprocessing import StandardScaler
# from math import exp

# def get_test_accuracy(X_test, y_test, weights, bias):
#     # compute the accuracy on the test set
#     correct = 0
#     for i in range(X_test.shape[0]):
#         s = bias.data
#         for j in range(X_test.shape[1]):
#             s = s + weights[j].data * X_test[i,j]
#         p = 1./(1+exp(-s))
#         if p > 0.5:
#             y_pred = 1
#         else:
#             y_pred = 0
#         if y_pred == y_test[i]:
#             correct += 1
#     return correct

# # see https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html
# data = load_breast_cancer()
# X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.33, random_state=3)
# s = StandardScaler()
# s.fit(X_train)
# X_train = s.transform(X_train)
# X_test = s.transform(X_test)
# print(f"X_train.shape {X_train.shape} y_train.shape {y_train.shape} X_test.shape {X_test.shape}")
# print(f"feature names={data.feature_names}")
# # this defines the number of times to loop through the dataset
# n_epochs = 10

# # create a weight for each feature and set it to 0
# weights = [Value(0.0) for _ in range(len(data.feature_names))]
# # this is a constant value (like the y-intercept in linear regression)
# bias = Value(0.0)

# lam = 10**-2
# correct = get_test_accuracy(X_test, y_test, weights, bias)
# print(f"correct / total ({correct} / {X_test.shape[0]})")
# for epoch in range(n_epochs):
#     print(f"epoch {epoch}")
#     total_loss = Value(0.0)
#     # TODO: compute the total loss over the training set using your `Value` class
#     for i in range(X_train.shape[0]):
#         s = bias
#         for j in range(X_train.shape[1]):
#             s = s + weights[j] * Value(X_train[i,j])
#         p = s.sigmoid()
#         loss = p.logloss(Value(y_train[i]))
#         total_loss = total_loss + loss

#     # TODO: call `.backward()` on total_loss
#     total_loss.backward()
#     # TODO: adjust the weights and bias term by performing a step of gradient descent
#     # using the provided step size `lam`
#     for j in range(X_train.shape[1]):
#         weights[j].data -= lam * weights[j].grad
#     bias.data -= lam * bias.grad

#     # reset the grad variables for the next step
#     for j in range(X_train.shape[1]):
#         weights[j].grad = 0.0
#     bias.grad = 0.0
#     correct = get_test_accuracy(X_test, y_test, weights, bias)
#     print(f"correct / total ({correct} / {X_test.shape[0]})")

## Exercise 10

What just happened?  Summarize in your own words what we did in this assignment.  How might the framework we developed help us build more complex machine learning models (e.g., like the ones shown in the 3B1B video [But what is a neural network?](https://www.youtube.com/watch?v=aircAruvnKk))?