<a href="https://colab.research.google.com/github/osipov/edu/blob/master/pyt0/Solution_Autodiff_Algorithm.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg"/></a>

## Import the __`torch`__ package

In [0]:
import torch as pt
pt.__version__

'1.4.0'

In [0]:
class Scalar:    
    def __init__(self, val):
        self.val = val
        self.grad = 0.
        self.backward = lambda: None
        
    def __repr__(self):
        return f"Value: {self.val}, Gradient: {self.grad}"
    
    def __add__(self, other):
        result = Scalar(self.val + other.val)
        return result

    def __mul__(self, other):
        result = Scalar(self.val * other.val)
        return result        

## Create a `Scalar` instance for `x = 2.0`

In [0]:
x = Scalar(2.)
x

Value: 2.0, Gradient: 0.0

## Define `y = x` 

In [0]:
y = x

## Prepare for and call `backward` on `y`
* Use floating point values
* Zero out the accumulating gradients
* Initialize $ \frac{\partial y}{ \partial y} = 1 $


In [0]:
x.grad = 0.
y.grad = 1.
y.backward()

* check that $ \frac{\partial y}{ \partial x} = 1.0 $

In [0]:
x.grad

1.0

## Self-check

* Why did the implementation return the correct answer?

## Implement `backward` support in the `__add__` function
* **hint:** given $ y = a + b $, you need to update `a.grad` and `b.grad` to accumulate $ \frac{\partial y}{ \partial a} $ and $ \frac{\partial y}{ \partial b} $ respectively.



* **hint:** don't forget to make the recursive `backward` call
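In chain-rule terms, write $ \bar{v} $ for the value accumulated in `v.grad` (shorthand introduced here for brevity). Both local derivatives of a sum are 1, so each operand simply receives the result's gradient:

$$
y = a + b, \qquad \frac{\partial y}{\partial a} = \frac{\partial y}{\partial b} = 1
\quad\Longrightarrow\quad
\bar{a} \mathrel{+}= 1 \cdot \bar{y}, \qquad \bar{b} \mathrel{+}= 1 \cdot \bar{y}
$$

The update is `+=` rather than `=` because an operand that appears in several operations should collect one contribution from each of them.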

In [0]:
class Scalar:    
    def __init__(self, val):
        self.val = val
        self.grad = 0.
        self.backward = lambda: None
        
    def __repr__(self):
        return f"Value: {self.val}, Gradient: {self.grad}"
    
    def __add__(self, other):
        result = Scalar(self.val + other.val)
        def backward():
            self.grad += result.grad
            other.grad += result.grad
            self.backward()
            other.backward()
        result.backward = backward
        return result

    def __mul__(self, other):
        result = Scalar(self.val * other.val)
        return result
        

## Define `y = 3 * x` for `x = 3.0`
* **hint:** recall that $ 3 * x = x + x + x $
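Python evaluates `x + x + x` left to right as `(x + x) + x`. Naming the intermediate sum $ t $ (for illustration only) and writing $ \bar{v} $ for `v.grad`, the backward pass should deliver three unit contributions to $ \bar{x} $:

$$
\begin{aligned}
t &= x + x, \qquad y = t + x \\
\bar{y} = 1 &\;\Longrightarrow\; \bar{t} \mathrel{+}= 1, \quad \bar{x} \mathrel{+}= 1 \\
\bar{t} = 1 &\;\Longrightarrow\; \bar{x} \mathrel{+}= 1 + 1 \\
\bar{x} &= 1 + 2 = 3
\end{aligned}
$$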

In [0]:
x = Scalar(3.0)
y = x + x + x

## Prepare for and run the backward pass

In [0]:
x.grad = 0.
y.grad = 1.
y.backward()

* check that $ \frac{\partial y}{ \partial x} = 3.0 $

In [0]:
x.grad

3.0

## Implement `backward` support in the `__mul__` function
* **hint:** given $ y = c * x$, $ \frac{\partial y}{ \partial x} = c $
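For a product of two `Scalar` values, treat each factor as the constant $ c $ from the hint with respect to the other, then scale by the result's gradient (chain rule). Writing $ \bar{v} $ for `v.grad`:

$$
y = a \cdot b, \qquad \frac{\partial y}{\partial a} = b, \quad \frac{\partial y}{\partial b} = a
\quad\Longrightarrow\quad
\bar{a} \mathrel{+}= b\,\bar{y}, \qquad \bar{b} \mathrel{+}= a\,\bar{y}
$$

When both operands are the same object, as in `x * x`, both updates land on `x.grad` and sum to $ 2x\,\bar{y} $, the derivative of $ x^2 $ scaled by $ \bar{y} $.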

In [0]:
class Scalar:    
    def __init__(self, val):
        self.val = val
        self.grad = 0.
        self.backward = lambda: None
        
    def __repr__(self):
        return f"Value: {self.val}, Gradient: {self.grad}"
    
    def __add__(self, other):
        result = Scalar(self.val + other.val)
        def backward():
            self.grad += result.grad
            other.grad += result.grad
            self.backward()
            other.backward()
        result.backward = backward
        return result

    def __mul__(self, other):
        result = Scalar(self.val * other.val)
        def backward():
            self.grad += other.val * result.grad
            other.grad += self.val * result.grad
            self.backward()
            other.backward()
        result.backward = backward
        return result

## Use `y = x^3 + 2*x` for `x = 4.0`
* **hint:** recall that $ x^3 = x * x * x $

In [0]:
x = Scalar(4.0)
y = x * x * x + x + x

* given $ y = x^3 + 2x $, the analytical derivative is $ \frac{\partial y}{ \partial x} = 3x^2+2 $, which equals $ 3 \cdot 4^2 + 2 = 50 $ at $ x = 4.0 $
* check that your implementation of `Scalar` returns the correct value of $ \frac{\partial y}{ \partial x} $ when $ x = 4.0 $
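An independent way to confirm the expected value is a numerical derivative. The sketch below uses a central finite difference; the helper names `f` and `central_difference` and the step size `h` are illustrative choices, not part of the notebook.

```python
def f(x: float) -> float:
    """y = x^3 + 2x, the function differentiated above."""
    return x ** 3 + 2 * x


def central_difference(func, x: float, h: float = 1e-5) -> float:
    """Approximate d(func)/dx at x with the symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


approx = central_difference(f, 4.0)
exact = 3 * 4.0 ** 2 + 2  # analytical derivative 3x^2 + 2 evaluated at x = 4
print(approx, exact)
```

The numerical estimate should agree with `x.grad` to several decimal places; a large mismatch would point to a bug in one of the `backward` closures.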

In [0]:
x.grad = 0.
y.grad = 1.
y.backward()

In [0]:
x.grad

50.0

## Apply `Scalar` to linear regression
* set the random seed to `0`
* randomly init the model parameter `w`

In [0]:
pt.manual_seed(0)
w = Scalar(pt.randn(1).item())
w

Value: 1.5409960746765137, Gradient: 0.0

## Make linear regression data


In [0]:
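ptX = pt.linspace(-5, 5, 100)
# targets follow y = 5x plus standard normal noise
pty = 5 * ptX + pt.randn(len(ptX))

# wrap each element in a Scalar; distinct loop names avoid confusion with x and y above
X = [Scalar(x_i.item()) for x_i in ptX]
y = [Scalar(y_i.item()) for y_i in pty]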

## Implement a `forward` function using `w`
* **hint:** the function should return the list of predictions $ w * X_i $, one for each element of `X`

In [0]:
def forward(w, X):
    # one prediction w * x_i per input Scalar
    return [w * x_i for x_i in X]

## Implement the mean squared error calculation
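* **hint:** Python's built-in `sum` accepts a start value as its second argument; since `Scalar` defines `__add__` but not `__radd__`, start from a `Scalar` rather than the default integer `0`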
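The start value matters because `sum` begins with the integer `0` by default, and `0 + Scalar(...)` has no handler. The minimal sketch below uses a hypothetical `Value` class with the same limitation as `Scalar` (an `__add__` but no `__radd__`) to show both cases.

```python
class Value:
    """Hypothetical stand-in for Scalar: supports a + b but not 0 + a."""

    def __init__(self, val):
        self.val = val

    def __add__(self, other):
        return Value(self.val + other.val)


items = [Value(1.0), Value(2.0), Value(3.0)]

# Without a start value, sum() computes 0 + items[0] first: int.__add__ returns
# NotImplemented and Value has no __radd__, so Python raises TypeError.
try:
    sum(items)
except TypeError as exc:
    print("TypeError:", exc)

# With a Value start, every addition dispatches to Value.__add__.
total = sum(items, Value(0.0))
print(total.val)  # 6.0
```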

In [0]:
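def loss(y_pred, y):
    # Scalar has no __sub__, so subtract by adding -1 * y[i]
    error = [y_pred[i] + Scalar(-1.) * y[i] for i in range(len(y))]
    squared_error = [error[i] * error[i] for i in range(len(y))]
    # start the sum from a Scalar so every addition goes through Scalar.__add__
    mean_squared_error = sum(squared_error, Scalar(0.)) * Scalar(1.0 / len(y))
    return mean_squared_error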
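For this one-parameter model the derivative that `mse.backward()` is meant to accumulate in `w.grad` has a closed form, which is handy as a reference value. With $ n $ points $ (x_i, y_i) $ and predictions $ w x_i $:

$$
\begin{aligned}
\text{MSE}(w) &= \frac{1}{n}\sum_{i=1}^{n} \left(w x_i - y_i\right)^2 \\
\frac{\partial\,\text{MSE}}{\partial w} &= \frac{2}{n}\sum_{i=1}^{n} \left(w x_i - y_i\right) x_i
\end{aligned}
$$

Setting the derivative to zero gives $ w = \sum_i x_i y_i \,/\, \sum_i x_i^2 $, the single-feature case of the least-squares formula used at the end of the notebook.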

## Confirm that gradient descent reduces MSE

In [0]:
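LEARNING_RATE = 0.03

for _ in range(20):
    # forward pass: predictions and mean squared error
    y_pred = forward(w, X)
    mse = loss(y_pred, y)

    # backward pass: reset the parameter gradient, seed d(mse)/d(mse) = 1
    w.grad = 0.
    mse.grad = 1.
    mse.backward()

    # gradient descent step
    w.val -= LEARNING_RATE * w.grad

    print(mse.val)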

1.0128978723013213
1.0128978716163717
1.0128978714237493
1.0128978713695798
1.0128978713543464
1.0128978713500625
1.0128978713488572
1.0128978713485186
1.0128978713484234
1.012897871348397
1.0128978713483887
1.0128978713483872
1.0128978713483867
1.012897871348386
1.0128978713483858
1.0128978713483863
1.0128978713483856
1.0128978713483863
1.0128978713483863
1.0128978713483863
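One caveat is worth checking here. In this recursive design, each `backward` closure runs once per *use* of its result and reads the node's accumulated `grad` every time, so when an intermediate (non-leaf) node feeds more than one operation, its gradient is pushed upstream more than once. The earlier test expressions only reuse the leaf `x`, whose `backward` does nothing, so they are unaffected; `loss`, however, reuses each `error[i]` in `error[i] * error[i]`. Tracing the calls by hand suggests that `w.grad` there comes out three times the true derivative, which scales each step without flipping its sign, plausibly why the loop still converges. The sketch below reproduces the effect on $ y = (x + x)^2 $ and contrasts it with a hypothetical `Node` class and `backprop` helper that process nodes in reverse topological order, a standard way to make reverse-mode autodiff visit each node exactly once. Neither is part of the original notebook.

```python
class Scalar:
    """Copy of the notebook's recursive Scalar (add and mul only)."""

    def __init__(self, val):
        self.val = val
        self.grad = 0.
        self.backward = lambda: None

    def __add__(self, other):
        result = Scalar(self.val + other.val)
        def backward():
            self.grad += result.grad
            other.grad += result.grad
            self.backward()
            other.backward()
        result.backward = backward
        return result

    def __mul__(self, other):
        result = Scalar(self.val * other.val)
        def backward():
            self.grad += other.val * result.grad
            other.grad += self.val * result.grad
            self.backward()
            other.backward()
        result.backward = backward
        return result


class Node:
    """Hypothetical variant: stores (parent, local derivative) pairs instead of a closure."""

    def __init__(self, val, parents=()):
        self.val = val
        self.grad = 0.
        self.parents = parents

    def __add__(self, other):
        return Node(self.val + other.val, ((self, 1.), (other, 1.)))

    def __mul__(self, other):
        return Node(self.val * other.val, ((self, other.val), (other, self.val)))


def backprop(root):
    """Hypothetical helper: push gradients in reverse topological order, so each
    node forwards its gradient exactly once, after it has been fully accumulated."""
    order, seen = [], set()

    def visit(node):
        if node not in seen:
            seen.add(node)
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(root)
    root.grad = 1.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


# y = (x + x)^2 = 4x^2, so dy/dx = 8x = 24 at x = 3
x = Scalar(3.)
t = x + x      # intermediate node ...
y = t * t      # ... used twice, like error[i] * error[i] in loss
x.grad = 0.
y.grad = 1.
y.backward()
print("recursive Scalar:", x.grad)              # 48.0, double the true value

xn = Node(3.)
tn = xn + xn
yn = tn * tn
backprop(yn)
print("reverse topological order:", xn.grad)    # 24.0
```

The topological version trades the closures for an explicit list of parents; the extra bookkeeping is what lets it defer each node's update until all of its consumers have contributed.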


In [0]:
w.val

4.963128189963196

## Compare the value of `w` to the analytical solution
* the ordinary least squares solution is $ (X^TX)^{-1}X^Ty $


* **self-check:** why is neither value equal to `5`?
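Because `ptX` is a single feature with no intercept column, $ X^TX $ is the scalar $ \sum_i x_i^2 $ and the formula reduces to $ w = \sum_i x_i y_i \,/\, \sum_i x_i^2 $. A plain-Python sketch of that reduction, using a hypothetical `ols_slope` helper on small made-up noiseless data where the slope is known to be 5:

```python
def ols_slope(xs, ys):
    """Least-squares slope for y = w * x (single feature, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))  # X^T y
    sxx = sum(x * x for x in xs)              # X^T X, a scalar here
    return sxy / sxx


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [5.0 * x for x in xs]  # noiseless y = 5x
print(ols_slope(xs, ys))    # 5.0
```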

In [0]:
# ptX is 1-D, so ptX.T @ ptX is the scalar sum of x_i^2 and pow(..., -1) is its reciprocal
(pt.pow(ptX.T @ ptX, -1) * ptX.T @ pty).item()

4.963138580322266

Copyright 2020 CounterFactual.AI LLC. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.