<img style="float: right;" src="../htwlogo.jpg">

# Exercise: using PyTorch for non-linear optimization

**Author**: Dive into Deep Learning, adapted by _Erik Rodner_<br>
**Lecture**: Computer Vision and Machine Learning I

In the following exercise, we will dive into automatic differentiation and write
a basic gradient descent optimizer for simple functions. In particular, we try to solve
the linear equation system $\mathbf{A} w = b$, where $\mathbf{A}$ is a matrix and $w$ and $b$
are vectors of matching size.
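
As a quick warm-up for automatic differentiation, here is a minimal sketch. The function $f(x) = x^2 + 3x$ is a hypothetical example chosen only for illustration and is not part of the exercise itself.

```python
import torch

# Hypothetical example function f(x) = x^2 + 3x (illustration only).
x = torch.tensor(2.0, requires_grad=True)
f = x ** 2 + 3 * x

# backward() stores df/dx, evaluated at the current value of x, in x.grad.
f.backward()

# By hand: df/dx = 2x + 3, so the gradient at x = 2 is 7.
print(x.grad)
```

Autograd records the operations applied to `x` and applies the chain rule in reverse, so no derivative has to be written out by hand.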


In [None]:
import torch
import sys
import os
%load_ext autoreload
%autoreload 2

sys.path.append(os.path.join("..", "utils"))

from torchutils import make_dot
import numpy as np

import matplotlib.pylab as plt


### Definition of the linear equation system

Let's define a matrix $\mathbf{A}$ and a vector $b$ for our linear equation system that we would like to solve.

Advanced question (out of scope for the lecture): the matrix needs to have special properties for the iterative algorithm to work. Can you find out which ones?

In [None]:
b = np.array([.1, .6, .9])
A = np.array([[ 3.98481808,  1.48175294, -0.53227922],
       [ 1.48175294,  1.98511523,  0.84631116],
       [-0.53227922,  0.84631116,  1.83430071]])
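
One possible starting point for exploring the advanced question is to inspect a few standard properties of the matrix with NumPy. The sketch below only reports these quantities; which of them matter for the iterative algorithm is left open. The variable names are chosen for illustration.

```python
import numpy as np

A = np.array([[ 3.98481808,  1.48175294, -0.53227922],
              [ 1.48175294,  1.98511523,  0.84631116],
              [-0.53227922,  0.84631116,  1.83430071]])

# Is the matrix equal to its transpose?
is_symmetric = np.allclose(A, A.T)

# eigvalsh assumes a symmetric matrix and returns real eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(A)

# The condition number indicates how sensitive the solution is to perturbations.
condition_number = np.linalg.cond(A)

print(f"symmetric: {is_symmetric}")
print(f"eigenvalues: {eigenvalues}")
print(f"condition number: {condition_number}")
```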

### Solving it the traditional way

Do you remember linear algebra and how to solve a linear equation system? Traditionally, you would use Gaussian elimination, which is also available in NumPy. Another, numerically less stable way is to use the inverse matrix: $\mathbf{A}^{-1}b$.
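
To illustrate both approaches without giving away the exercise, here is a small sketch on a hypothetical $2 \times 2$ system. The names `M` and `c` are made up for this example. `np.linalg.solve` uses an LU factorization, which is closely related to Gaussian elimination.

```python
import numpy as np

# Hypothetical 2x2 system M x = c (not the system from this exercise).
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
c = np.array([3.0, 5.0])

# Direct solver based on an LU factorization.
x_solve = np.linalg.solve(M, c)

# Explicit inverse: works here, but is generally less stable and slower.
x_inv = np.linalg.inv(M) @ c

# The residual norm tells us how well the solution satisfies the system.
residual = np.linalg.norm(M @ x_solve - c)

print(x_solve, x_inv, residual)
```

A residual norm close to zero is a simple sanity check that a candidate solution actually solves the system.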

In [None]:
w = np.dot(A.T, b) # everything correct here?
w

In [None]:
np.linalg.norm(np.dot(A, w)-b)

### Solving the system with gradient descent

Next, we use gradient descent to minimize the squared error loss $\epsilon(w) = \|\mathbf{A} w - b\|^2$ with respect to $w$. Gradient descent needs one essential ingredient: the gradient. In our case, however, we do not compute it by hand but rely on PyTorch's autograd mechanism.
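
For this particular loss, the gradient can also be written in closed form as $\nabla_w \epsilon(w) = 2 \mathbf{A}^T (\mathbf{A} w - b)$. The following sketch compares that formula with the result of autograd. It uses random data as a stand-in and assumes double precision to keep the comparison tight; neither choice is prescribed by the exercise.

```python
import torch

torch.manual_seed(0)

# Random stand-in data (illustration only, not the exercise's A and b).
A = torch.randn(3, 3, dtype=torch.float64)
b = torch.randn(3, dtype=torch.float64)
w = torch.ones(3, dtype=torch.float64, requires_grad=True)

# Squared error loss ||A w - b||^2.
loss = torch.sum((A @ w - b) ** 2)
loss.backward()

# Closed-form gradient 2 A^T (A w - b), computed without tracking gradients.
with torch.no_grad():
    analytic = 2 * A.T @ (A @ w - b)

print(torch.allclose(w.grad, analytic))
```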

First, let's define the required PyTorch tensors:

In [None]:
tA = torch.Tensor(np.array(A))
tb = torch.Tensor(np.array(b))
tw = torch.ones(3, requires_grad=True)
tw.requires_grad_(True)

### Ready, steady, go!

The following loop implements gradient descent on the corresponding quadratic loss for the linear equation system. Can you spot the errors here? One statement is wrong, and for performance reasons you need to add another one.

In [None]:
learning_rate = 0.1
losses = []
for i in range(1000):
    loss = torch.norm(torch.matmul(tA,tw))
    
    loss.backward()
    
    with torch.no_grad():
        print (f"Loss in iteration {i}: {loss}")
        losses.append(loss.data)
    
    with torch.no_grad():
        tw -= learning_rate * tw.grad    
    tw.grad.zero_()
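
For comparison, the general pattern of a gradient descent step can be sketched on a much simpler problem. The scalar objective $f(x) = (x - 3)^2$ and the names `x` and `step_size` are hypothetical and serve only as an illustration; this is not the solution to the exercise above.

```python
import torch

# Hypothetical toy objective f(x) = (x - 3)^2 with its minimum at x = 3.
x = torch.tensor(0.0, requires_grad=True)
step_size = 0.1

for _ in range(100):
    f = (x - 3.0) ** 2

    # Compute df/dx and store it in x.grad.
    f.backward()

    # Update the parameter without recording the update in the graph.
    with torch.no_grad():
        x -= step_size * x.grad

    # Gradients accumulate across backward() calls, so reset them.
    x.grad.zero_()

print(x.item())
```

Each step moves `x` against the gradient, and the value approaches the minimizer at 3.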
    
    

### Visualize the solution and the optimization progress of the loss

In [None]:
print(f"Solution from the non-linear optimization: {tw.data}")
print(f"Solution from the algebraic method: {w}")

In [None]:
plt.plot(losses[:100])
plt.yscale("log")
plt.xlabel("steps")
plt.ylabel("loss")
plt.show()

### Show the computation graph of the loss function again

In [None]:
make_dot(loss, locals())
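
`make_dot` comes from the course utilities, and its internals are not shown here. If it is not available, the graph recorded by autograd can still be inspected in text form through the `grad_fn` attribute. The helper `list_graph_nodes` below is a hypothetical name introduced for this sketch, and the toy loss is a stand-in for the one from the exercise.

```python
import torch

# Toy loss as a stand-in for the loss of the exercise.
w = torch.ones(3, requires_grad=True)
loss = torch.sum((2.0 * w) ** 2)


def list_graph_nodes(fn, depth=0, out=None):
    """Collect (depth, node name) pairs by walking the autograd graph."""
    out = [] if out is None else out
    if fn is None:
        # Inputs that do not require gradients show up as None.
        return out
    out.append((depth, type(fn).__name__))
    for next_fn, _ in fn.next_functions:
        list_graph_nodes(next_fn, depth + 1, out)
    return out


nodes = list_graph_nodes(loss.grad_fn)
for depth, name in nodes:
    print("  " * depth + name)
```

The exact node names depend on the PyTorch version, so the printed output is best read as a rough outline of the operations that produced the loss.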