# Lab 1 - Linear Regression with PyTorch

## Objective

* To perform linear regression using PyTorch, in order to understand the link between linear models and neural networks.

**Suggested reading**: 
* *What is PyTorch* from the [PyTorch tutorial](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py)

#### Assumptions: basic Python programming and [Anaconda](https://anaconda.org/) installed.



## Why

[Linear regression](https://en.wikipedia.org/wiki/Linear_regression) is a fundamental problem in statistics and machine learning. Using PyTorch, a deep learning library, to do linear regression will help bridge simple linear models with complex neural networks.

## 1. PyTorch Installation and Basics

### Install-1: direct installation (e.g., on your own machine with full installation rights)

#### Install [PyTorch](https://github.com/pytorch/pytorch) via [Anaconda](https://anaconda.org/)
`conda install -c pytorch pytorch`

When you are asked whether to proceed, say `y`

#### Install [torchvision](https://github.com/pytorch/vision)
`conda install -c pytorch torchvision`

When you are asked whether to proceed, say `y`

### Install-2: Set up Anaconda Python environment (e.g., on a university desktop)

On a university desktop, you may not have permission to install new packages in the main Anaconda environment. Please follow the instructions below to set up a new environment. This is also recommended if you have several Python projects that may require different environments.

Open a command line terminal.

**Create a new conda environment with Python 3.6**<br>
`conda create -n mlai19 python=3.6 anaconda`

**Activate the conda environment `mlai19`**<br>
`activate mlai19` (Windows)<br>
`source activate mlai19` (Mac/Linux)<br>
You will see `(mlai19)` on the left, indicating your active environment.

**Install Pytorch and Torchvision** (non-CUDA/GPU version for simplicity)<br>
`conda install pytorch torchvision cpuonly -c pytorch`<br>
If you have a GPU, install the GPU version using the command given on the [PyTorch website](https://pytorch.org/).

**Start Jupyter notebook server**: `jupyter notebook`
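
Once the notebook server is running, a quick check along the following lines can confirm that the installation works. This is only a sketch; the variable name `t` is chosen here for illustration.

```python
import torch

# Print the installed PyTorch version.
print(torch.__version__)

# Reports whether a CUDA GPU can be used; this is False for the CPU-only install.
print(torch.cuda.is_available())

# Create a small random tensor to confirm that basic operations run.
t = torch.rand(2, 3)
print(t)
```

If the import fails, the most likely cause is that the notebook was started outside the `mlai19` environment.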

### Tensor
Optional: Go over the first two modules of [PyTorch tutorial](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py), *What is PyTorch* and *Autograd*

`torch.Tensor` is a multidimensional array data structure. You may check out the full list of [tensor types](http://pytorch.org/docs/master/tensors.html) and various [tensor operations](https://pytorch.org/docs/stable/torch.html).
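
As a small illustration of tensors with different numbers of dimensions, the sketch below creates a scalar, a vector and a matrix and applies two basic operations. The values and variable names are chosen here purely for illustration.

```python
import torch

# A 0-D tensor (scalar), a 1-D tensor (vector) and a 2-D tensor (matrix).
scalar = torch.tensor(3.0)
vector = torch.tensor([1.0, 2.0, 3.0])
matrix = torch.ones(2, 3)

print(scalar.dim(), scalar.shape)  # 0 torch.Size([])
print(vector.dim(), vector.shape)  # 1 torch.Size([3])
print(matrix.dim(), matrix.shape)  # 2 torch.Size([2, 3])

# Element-wise arithmetic and matrix-vector multiplication.
doubled = vector * 2         # tensor([2., 4., 6.])
product = matrix.mv(vector)  # each row of ones sums the vector: tensor([6., 6.])
print(doubled, product)
```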


### Computational Graph
A computation graph defines/visualises a sequence of operations to go from input to model output. 

Consider a linear regression model $\hat y = Wx + b$, where $x$ is our input, $W$ is a weight matrix, $b$ is a bias, and $\hat y$ is the predicted output. As a computation graph, this looks like:

![Linear Regression Computation Graph](https://imgur.com/IcBhTjS.png)

PyTorch builds the computational graph dynamically, as the operations run, for example:
![DynamicGraph.gif](https://raw.githubusercontent.com/pytorch/pytorch/master/docs/source/_static/img/dynamic_graph.gif)
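
To make the graph for $\hat y = Wx + b$ concrete, here is a minimal sketch using autograd. The numeric values are hypothetical and chosen only so that the result is easy to verify by hand.

```python
import torch

# Hypothetical small values, chosen only for illustration.
x = torch.tensor([1.0, 2.0])
W = torch.tensor([[3.0, 4.0]], requires_grad=True)  # 1x2 weight matrix
b = torch.tensor([0.5], requires_grad=True)

# The graph is recorded while these operations execute.
y_hat = W.mv(x) + b
print(y_hat.item())  # 3*1 + 4*2 + 0.5 = 11.5

# Walking the graph backwards gives d(y_hat)/dW = x and d(y_hat)/db = 1.
y_hat.sum().backward()
print(W.grad)  # tensor([[1., 2.]])
print(b.grad)  # tensor([1.])
```

Because the graph is rebuilt on every forward pass, ordinary Python control flow (loops, conditionals) can change its structure from one iteration to the next.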

## 2. Linear Regression using PyTorch

Let us start right away by implementing linear regression in PyTorch to study PyTorch concepts closely. This part follows the [PyTorch Linear regression example](https://github.com/pytorch/examples/tree/master/regression) that trains a **single fully-connected layer** to fit a 4th-degree polynomial.

### A synthetic linear regression problem

* Generate the model parameters, weight and bias. Both are tensors: the weight has shape `(4, 1)` (2D) and the bias has shape `(1,)` (1D).

In [None]:
import torch
import torch.nn.functional as F

POLY_DEGREE = 4
W_target = torch.randn(POLY_DEGREE, 1) * 5
b_target = torch.randn(1) * 5

In [None]:
print(W_target)
print(b_target)

* Next, define a number of functions to generate the input (variables) and output (target/response). 

In [None]:
def make_features(x):
    """Builds features i.e. a matrix with columns [x, x^2, x^3, x^4]."""
    x = x.unsqueeze(1)
    return torch.cat([x ** i for i in range(1, POLY_DEGREE+1)], 1)

def f(x):
    """Approximated function."""
    return x.mm(W_target) + b_target.item()

def poly_desc(W, b):
    """Creates a string description of a polynomial."""
    result = 'y = '
    for i, w in enumerate(W):
        # Column i of the features is x^(i+1), so weight i multiplies x^(i+1).
        result += '{:+.2f} x^{} '.format(w, i + 1)
    result += '{:+.2f}'.format(b[0])
    return result

def get_batch(batch_size=32):
    """Builds a batch i.e. (x, f(x)) pair."""
    random = torch.randn(batch_size)
    x = make_features(random)
    y = f(x)
    return x, y
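
To see what the feature matrix looks like, the sketch below applies `make_features` to two hand-picked inputs. The function is repeated so that the snippet runs on its own; the input values are hypothetical.

```python
import torch

POLY_DEGREE = 4

def make_features(x):
    """Same definition as in the lab: columns [x, x^2, x^3, x^4]."""
    x = x.unsqueeze(1)
    return torch.cat([x ** i for i in range(1, POLY_DEGREE + 1)], 1)

# Hypothetical inputs chosen so the powers are easy to check by hand.
features = make_features(torch.tensor([2.0, -1.0]))
print(features.shape)  # torch.Size([2, 4])
print(features)
# tensor([[ 2.,  4.,  8., 16.],
#         [-1.,  1., -1.,  1.]])
```

Each row is one sample and each column one power of $x$, so the polynomial is linear in these features and a single linear layer can fit it.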

* Define a simple(st) neural network, which is a **single fully connected** (FC) layer. See [`torch.nn.Linear`](https://pytorch.org/docs/master/nn.html#torch.nn.Linear)

In [None]:
fc = torch.nn.Linear(W_target.size(0), 1)
print(fc)

    This is a *network* with four input units, one output unit, with a bias term.
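
The following sketch checks that such a layer computes $xW^\top + b$, which links the layer back to the linear model. The names `layer`, `x` and `manual` are introduced here for illustration, with `layer` standing in for `fc`.

```python
import torch

torch.manual_seed(0)  # only to make the illustration repeatable

layer = torch.nn.Linear(4, 1)  # stands in for `fc`
x = torch.randn(5, 4)          # a hypothetical batch of 5 inputs

# nn.Linear stores its weight with shape (out_features, in_features).
print(layer.weight.shape, layer.bias.shape)  # torch.Size([1, 4]) torch.Size([1])

# The forward pass is a matrix product plus the bias.
manual = x.mm(layer.weight.t()) + layer.bias
print(torch.allclose(layer(x), manual))  # True
```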
    
* Now generate the data. Let us try to get five pairs of (x,y) first to inspect.


In [None]:
sample_x, sample_y = get_batch(5)
print(sample_x)
print(sample_y)

    Take a look at the FC layer weights (randomly initialised)

In [None]:
print(fc.weight)

    Reset the gradients to zero, perform a forward pass to get prediction, and compute the loss

In [None]:
fc.zero_grad()
output = F.smooth_l1_loss(fc(sample_x), sample_y)
loss = output.item()
print(loss)
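
For reference, the sketch below reproduces `F.smooth_l1_loss` by hand, assuming its default settings (a threshold of 1 and averaging over all elements). The predictions and targets are hypothetical values chosen for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical predictions and targets for illustration.
pred = torch.tensor([[0.5], [3.0], [-2.0]])
target = torch.tensor([[0.0], [0.0], [0.0]])

diff = (pred - target).abs()

# Quadratic for small errors (|d| < 1), linear for large ones.
per_element = torch.where(diff < 1, 0.5 * diff ** 2, diff - 0.5)
manual = per_element.mean()
builtin = F.smooth_l1_loss(pred, target)

# Both values are approximately (0.125 + 2.5 + 1.5) / 3 = 1.375.
print(manual.item(), builtin.item())
```

The linear tail makes this loss less sensitive to large errors than the squared error, which helps when the $x^4$ features produce very large targets.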

    The randomly initialised weights do not give a good prediction. Let us backpropagate and update the model parameters using the gradients.

In [None]:
output.backward() 
for param in fc.parameters():  
    param.data.add_(-0.1 * param.grad.data)

    Check the updated weights and respective loss

In [None]:
print(fc.weight)
output = F.smooth_l1_loss(fc(sample_x), sample_y)
loss = output.item()
print(loss)
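
The manual update is one step of plain gradient descent with a learning rate of 0.1. As a sketch, the code below compares it with one step of `torch.optim.SGD` on a copy of the same layer. The data and the names `manual_fc` and `optim_fc` are hypothetical, and the manual update is written with `torch.no_grad()` rather than `.data`, which performs the same arithmetic.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x, y = torch.randn(5, 4), torch.randn(5, 1)  # hypothetical data

manual_fc = torch.nn.Linear(4, 1)
optim_fc = torch.nn.Linear(4, 1)
optim_fc.load_state_dict(manual_fc.state_dict())  # identical starting point

# Manual gradient-descent step: param <- param - 0.1 * grad.
manual_fc.zero_grad()
F.smooth_l1_loss(manual_fc(x), y).backward()
with torch.no_grad():
    for param in manual_fc.parameters():
        param.add_(-0.1 * param.grad)

# The same step through the optimizer interface.
optimizer = torch.optim.SGD(optim_fc.parameters(), lr=0.1)
optimizer.zero_grad()
F.smooth_l1_loss(optim_fc(x), y).backward()
optimizer.step()

print(torch.allclose(manual_fc.weight, optim_fc.weight))  # True
print(torch.allclose(manual_fc.bias, optim_fc.bias))      # True
```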

   * Now keep feeding more data until the loss is small enough. 

In [None]:
from itertools import count
for batch_idx in count(1):
    # Get data
    batch_x, batch_y = get_batch()

    # Reset gradients
    fc.zero_grad()

    # Forward pass
    output = F.smooth_l1_loss(fc(batch_x), batch_y)
    loss = output.item()

    # Backward pass
    output.backward()

    # Apply gradients
    for param in fc.parameters():
        param.data.add_(-0.1 * param.grad.data)

    # Stop criterion
    if loss < 1e-3:
        break


In [None]:
print('Loss: {:.6f} after {} batches'.format(loss, batch_idx))
print('==> Learned function:\t' + poly_desc(fc.weight.view(-1), fc.bias))
print('==> Actual function:\t' + poly_desc(W_target.view(-1), b_target))

## 3. Exercises


* Change the [loss function](https://pytorch.org/docs/stable/nn.html#loss-functions) to different choices and compare the results.
  
* Formulate another regression problem and solve it using `torch.nn`
* Compare the `torch.nn` solution against the closed-form solution
* Explore any other variations that you can think of to learn more


# Acknowledgement
Parts of this notebook are taken from the following source:

* [PyTorch tutorial from CSE446, University of Washington](https://courses.cs.washington.edu/courses/cse446/18wi/sections/section7/446_pytorch_tutorial.html)