# 1. Linear regression

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import pandas as pd
from torch.utils.data import TensorDataset, DataLoader
from torchvision import transforms, utils
import torch.nn as nn
import torch.nn.functional as F
import csv

In [None]:
data = pd.read_csv("Animals.csv")
print(data.shape)

#### Have you ever wondered how brain weight relates to body weight across different animal species?
We will try to answer this question.

In [None]:
data

Let's look at the graph:

In [None]:
data.plot.scatter(x="BodyWeight(kg)", y="BrainWeight(kg)")

At first glance the data do not follow any obvious pattern. However, if we switch both axes to a logarithmic scale, something interesting appears:

In [None]:
data.plot.scatter(x="BodyWeight(kg)", y="BrainWeight(kg)", logx=True, logy=True)
#plt.scatter(np.log(data['BodyWeight(kg)']), np.log(data['BrainWeight(kg)']))

On log-log axes the data points lie roughly along a straight line, so we can try to find the equation of that line.

This is exactly the problem that linear regression solves.
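A straight line on log-log axes corresponds to a power law in the original units. Writing $x$ for body weight and $y$ for brain weight, and assuming the line is expressed with natural logarithms (as with `np.log`, which is used later in section 1.3):

$$ \ln y = w_0 + w_1 \ln x \quad\Longleftrightarrow\quad y = e^{w_0}\, x^{w_1} $$

So the slope $w_1$ of the line on the log-log plot is the exponent of the power law, and $e^{w_0}$ is its scale factor.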

### 1.0 Linear equation

Let's consider two sets of numbers:

In [None]:
x = [1, 2, 3, 4]
y = [3.2, 5.1, 6.9, 9.3]

plt.scatter(x, y)
plt.show()

The scatterplot shows that the relationship between these two sets of numbers is almost linear. It turns out that many real-world relationships can be described simply by fitting a linear equation to the observed data. We will try to apply the equation:

$$ y = w_0 + w_1x $$

to the analysed dataset. The only problem is how to find $w_0$ and $w_1$.

### 1.1 Loss function

We need a way to measure whether the coefficients in the equation describe our data well enough. To do this we define a loss function: a function that tells us how much our approximation differs from the expected output.

The loss function should:
* depend only on the coefficients of the model, the expected output and our approximation,
* decrease as our approximation gets better and increase as it gets worse.

For linear regression the most common choice is the least-squares loss function: we calculate the average of the squared vertical deviations between each data point and the line. Because the deviations are squared, it does not matter whether a data point lies above or below the line.


$$y^{ pred}_{i} = w_0 + w_1x_{i} $$
$$L=\frac{1}{N}\sum_{i=1}^{N}( y^{pred}_{i} - y_{i})^2 $$
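The formula can be checked by hand on the four points from section 1.0 with the arbitrary choice $w_0 = 1$, $w_1 = 2$: the predictions are 3, 5, 7, 9, the deviations are -0.2, -0.1, 0.1, -0.3, and the mean of their squares is 0.15 / 4 = 0.0375. A minimal NumPy sketch of the same computation (the function name `mse_loss` is illustrative, not part of the notebook):

```python
import numpy as np

def mse_loss(w0, w1, x, y):
    """Mean squared error of the line y_pred = w0 + w1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_pred = w0 + w1 * x                # prediction for every data point
    return np.mean((y_pred - y) ** 2)   # average of the squared vertical deviations

x = [1, 2, 3, 4]
y = [3.2, 5.1, 6.9, 9.3]
loss = mse_loss(1.0, 2.0, x, y)
print(loss)  # approximately 0.0375
```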

We start with random values of $w_0$ and $w_1$ and evaluate our loss function.

In [None]:
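W = torch.randn(1,2)  # random initial [w_0, w_1]
# first row of ones multiplies w_0 (the intercept), second row holds the x values
X = torch.tensor([[1.0,1.0,1.0,1.0], x])
Y = torch.tensor(y)
print(W)
print(X)
print(Y)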

In [None]:
def Y_pred(W, X):
    # (1, 2) x (2, N) -> (1, N): w_0 * 1 + w_1 * x_i for every data point
    return torch.mm(W, X)  # matrix multiplication
Y_pred(W, X)

In [None]:
def Loss(W, X, Y):
    y_pred = Y_pred(W, X)
    return torch.mean((y_pred-Y)*(y_pred-Y))
Loss(W, X, Y)

### 1.2 Minimizing the loss function

A more detailed explanation of what happens here can be found in the chapter "Gradient descent".

Now that we have defined the loss function, we want to minimize it. Step by step, this rotates and shifts the line until it reflects the actual location of the data points. To do so we repeatedly adjust the weights until we reach a minimum of the loss function. What we need is a mathematical operation that tells us how the loss function changes if we increase or decrease $w_0$ or $w_1$. That operation is the partial derivative:

$$\dfrac{\partial L}{\partial w_0} = \frac{2}{N}\sum_{i=1}^{N} (y^{pred}_{i} -y_{i})$$


$$\dfrac{\partial L}{\partial w_1}  = \frac{2}{N}\sum_{i=1}^{N} (y^{pred}_{i} -y_{i}) \cdot x_{i}$$
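A quick way to convince yourself that these derivatives are right is to compare them with a numerical estimate from central differences, $(L(w + h) - L(w - h)) / 2h$. The sketch below does this in plain NumPy on the data from section 1.0; the evaluation point $w_0 = 0.5$, $w_1 = 1.5$ and the step `h` are arbitrary choices for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.2, 5.1, 6.9, 9.3])

def loss(w0, w1):
    return np.mean((w0 + w1 * x - y) ** 2)

def analytic_grad(w0, w1):
    residual = w0 + w1 * x - y                        # y_pred_i - y_i
    return 2 * np.mean(residual), 2 * np.mean(residual * x)

def numeric_grad(w0, w1, h=1e-6):
    # central differences, one coefficient at a time
    d0 = (loss(w0 + h, w1) - loss(w0 - h, w1)) / (2 * h)
    d1 = (loss(w0, w1 + h) - loss(w0, w1 - h)) / (2 * h)
    return d0, d1

g_analytic = analytic_grad(0.5, 1.5)
g_numeric = numeric_grad(0.5, 1.5)
print(g_analytic, g_numeric)  # both approximately (-3.75, -10.65)
```

Because the loss is quadratic in $w_0$ and $w_1$, central differences agree with the analytic formulas up to floating-point rounding.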

In [None]:
def dL_dw0 (W, X, Y):
    y_pred = Y_pred(W, X)
    return 2*torch.mean(y_pred - Y)
dL_dw0(W, X, Y)

In [None]:
def dL_dw1 (W, X, Y):
    y_pred = Y_pred(W, X)
    return 2*torch.mean((y_pred - Y)*X[1])
dL_dw1(W, X, Y)

Two more things we have to specify are **learning\_rate**, a hyperparameter that defines how strongly the value of the derivative influences the change of $w_0$ and $w_1$, and **num\_epochs**, a hyperparameter defining how many iterations we run to sufficiently minimize the loss function.
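Writing $\eta$ for **learning\_rate**, each iteration moves both coefficients against their partial derivatives, with both derivatives evaluated at the current values of $w_0$ and $w_1$:

$$ w_0 \leftarrow w_0 - \eta \dfrac{\partial L}{\partial w_0}, \qquad w_1 \leftarrow w_1 - \eta \dfrac{\partial L}{\partial w_1} $$

Since every iteration here uses all data points, one update is one epoch, and **num\_epochs** is the number of times the update is repeated.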

In [None]:
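def gradient_step(W, X, Y, learning_rate):
    # compute both partial derivatives at the current point before changing W
    grad_w0 = dL_dw0(W, X, Y)
    grad_w1 = dL_dw1(W, X, Y)
    W[0][0] -= learning_rate * grad_w0
    W[0][1] -= learning_rate * grad_w1
    return W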

In [None]:
def minimise_loss_function(W, X, Y, learning_rate, num_epochs):
    loss_history = []
    for i in range(num_epochs):
        W = gradient_step(W, X, Y, learning_rate)
        loss_history.append(Loss(W, X, Y).item())  # store a plain float for plotting
    return W, loss_history

In [None]:
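print(W)

learning_rate = 0.1
num_epochs = 30
# note: gradient_step updates W in place, so W_trained and W are the same tensor
W_trained, loss_history = minimise_loss_function(W, X, Y, learning_rate, num_epochs)

print(W_trained, loss_history)
plt.plot(range(num_epochs), loss_history)  # loss after each epoch
plt.xlabel("epoch")
plt.ylabel("loss")
plt.show()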

In [None]:
w_0, w_1 = W_trained.numpy()[0]  # trained intercept and slope
x_a = np.array(x)
plt.scatter(x, y)
plt.plot(x_a, w_1*x_a + w_0)
plt.show()

And that is how we find the proper line!
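For a model this small, the least-squares problem also has an exact closed-form solution, which makes a handy cross-check for the gradient-descent result (after only 30 epochs, gradient descent may not have fully converged). A sketch using NumPy's `np.linalg.lstsq` on the same design matrix as above, a column of ones for $w_0$ next to the $x$ values; for these four points the exact answer is $w_0 = 1.1$, $w_1 = 2.01$:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.2, 5.1, 6.9, 9.3])

# design matrix: first column of ones multiplies w_0, second column multiplies w_1
A = np.column_stack([np.ones_like(x), x])

# solve min ||A @ w - y||^2 exactly; lstsq also returns residuals, rank and singular values
(w0_exact, w1_exact), *_ = np.linalg.lstsq(A, y, rcond=None)
print(w0_exact, w1_exact)  # approximately 1.1 and 2.01
```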

### 1.3 Linear regression using PyTorch

Knowing how linear regression works, let's come back to the relation between body and brain weights. This time we will use built-in PyTorch functions.

Firstly, we need to prepare the data.

In [None]:
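# log-transform both columns; nn.Linear expects float32 inputs of shape (N, in_features)
X = torch.tensor(np.log(data['BodyWeight(kg)'].to_numpy()), dtype=torch.float32).view(-1, 1)
Y = torch.tensor(np.log(data['BrainWeight(kg)'].to_numpy()), dtype=torch.float32).view(-1, 1)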

Instead of initializing the coefficients manually, we can define the model using the built-in class **nn.Linear**. Since both the input and the output in this problem are one-dimensional, we pass **(1, 1)** as its arguments.

In [None]:
model = nn.Linear(1, 1)
print(model.weight)
print(model.bias)
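`nn.Linear(in_features, out_features)` stores a weight matrix of shape `(out_features, in_features)` and a bias of shape `(out_features,)`, and its forward pass computes $xW^T + b$. For `nn.Linear(1, 1)` this is just $w_1 x + w_0$, with `model.weight` playing the role of $w_1$ and `model.bias` the role of $w_0$. The NumPy sketch below mirrors that computation; `linear_forward` is a hypothetical helper and the weight and bias values are arbitrary, chosen only for illustration:

```python
import numpy as np

def linear_forward(x, weight, bias):
    """NumPy version of the affine map nn.Linear applies: x @ weight.T + bias.

    x: (N, in_features), weight: (out_features, in_features), bias: (out_features,)
    """
    return x @ weight.T + bias

x = np.array([[0.0], [1.0], [2.0]])  # three samples, one feature each
weight = np.array([[0.75]])          # shape (1, 1), like model.weight
bias = np.array([-2.0])              # shape (1,), like model.bias
out = linear_forward(x, weight, bias)
print(out)  # [[-2.  ], [-1.25], [-0.5 ]]
```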

Instead of the **gradient\_step** function, we will define an **optimizer** with a learning rate and use a built-in **loss function**.

In [None]:
optim = torch.optim.SGD(model.parameters(), lr=0.001)

In [None]:
loss_function = F.mse_loss
loss = loss_function(model(X), Y)
print(loss)

Now we can train the model - minimise the loss function.

In [None]:
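def train(num_epochs, X, Y, model, loss_function, optim):
    for epoch in range(num_epochs):
        y_pred = model(X)                 # forward pass
        loss = loss_function(y_pred, Y)
        
        loss.backward()                   # gradients of the loss w.r.t. weight and bias
        optim.step()                      # plain SGD update: p -= lr * p.grad
        optim.zero_grad()                 # reset gradients, otherwise they accumulate
        
    print('Training loss: ', loss_function(model(X), Y).item())

train(5000, X, Y, model, loss_function, optim)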

Let's see if we fitted the line properly!

In [None]:
X = X.view(-1)  # flatten to 1-D for plotting
Y = Y.view(-1)

In [None]:
plt.scatter(X, Y)
plt.plot(X.numpy(),model.weight.item()*X.numpy()+model.bias.item())
plt.show()
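Because the model was fitted to natural logarithms, its parameters translate back into a power law $\text{brain} = e^{b}\,\text{body}^{w}$, where $w$ is `model.weight` and $b$ is `model.bias`. A small helper for turning fitted coefficients into predictions in kilograms; the name `predict_brain_weight` is hypothetical, and the coefficients in the usage line are made up, not the values this notebook produces:

```python
import numpy as np

def predict_brain_weight(body_weight_kg, slope, intercept):
    """Undo the log transform: ln(brain) = intercept + slope * ln(body)
    implies brain = exp(intercept) * body ** slope (weights in kg)."""
    body_weight_kg = np.asarray(body_weight_kg, dtype=float)
    return np.exp(intercept) * body_weight_kg ** slope

# hypothetical coefficients, only to show the arithmetic;
# with the trained model use slope=model.weight.item(), intercept=model.bias.item()
pred = predict_brain_weight([1.0, 100.0], slope=0.5, intercept=np.log(0.01))
print(pred)  # approximately [0.01, 0.1]
```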

We have found a relation between brain and body weight across various animal species: on a log-log scale it is approximately linear. We also discussed how linear regression works and applied it both step by step and with PyTorch built-ins.

### Here are some interesting websites on the subject of linear regression:


* [Linear regression](http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm)
* [Ordinary Least Squares Regression-Explained Visually](http://setosa.io/ev/ordinary-least-squares-regression/)
* [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient)
