# Lab1 - Introduction + Neural Networks + PyTorch recap

## Plan for today

1. Get to know course rules, timetable, etc.
2. Briefly recap our ML knowledge:
    * implement basic logistic regression from scratch
    * get (re)accustomed to PyTorch

## 1. Course logistics

Let's go over [the course page](https://github.com/gmum/wzum-22) on GitHub.

## 2. Logistic regression from scratch

We will tackle the problem of **classification**, i.e. prediction of a discrete value (class):

$$
f(x) = y, \quad y \in \{0, \dots, N\}
$$

The most basic variant of this is **binary** classification: $y \in \{0, 1\}$. We will focus on that for the time being.

**Logistic regression** is a model which predicts the probability that a given example belongs to class 1:

$$
g(x) = \hat{p}(y = 1 | x )
$$

**Questions for you:**
* what is the probability that $y=0$?
* in the multi-class case, how many outputs will the model have?
* what conditions must the model outputs satisfy?

As an example, we will work with a breast cancer prediction dataset.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from typing import Tuple

import matplotlib.pyplot as plt
import numpy as np
import torch
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


In [None]:
print(load_breast_cancer().DESCR)

In [None]:
np.random.seed(0)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, train_size=0.9)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)


print("array shapes", [t.shape for t in [X_train, X_val, y_train, y_val]])
print("y values", np.unique(y_train), np.unique(y_val))


### Linear vs logistic regression

We need to transform a vector of 30 features into a value $\in (0,1)$. Can we use linear regression for that?

![classification_regression](https://raw.githubusercontent.com/aghbit/BIT_AI/master/3_logistic_regression/img/clas_reg.png)

Recall that in linear regression $f(x) \in \mathbb{R} $ is defined as:
$$
f(x) = w^T x + b
$$

where $w$ and $b$ are trainable parameters.
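
To see why the raw linear output cannot be read as a probability, here is a minimal sketch on made-up data. The random features and parameters below are assumptions for illustration only, not the lab dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 5 examples with 30 features each, plus random parameters.
x = rng.normal(size=(5, 30))
w = rng.normal(size=(30, 1))
b = rng.normal(size=1)

# Linear regression output: one unbounded real number per example.
linear_output = x @ w + b

# These are arbitrary real numbers, so nothing confines them to (0, 1).
print(linear_output.reshape(-1))
```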

In logistic regression, we need to squash the output so that $f(x) \in (0,1)$. A convenient way to do this is the **sigmoid** function:

$$\sigma(x) = \frac{1}{1+e^{-x}}$$

In [None]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

for x in [-np.inf, 0, 1, np.inf]:
    print(f"sigmoid({x}) = {sigmoid(x)}")

x = np.linspace(-10, 10)

plt.plot(x, sigmoid(x))
plt.grid(True)
plt.show()
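
For inputs with a large magnitude, `np.exp(-x)` in the straightforward implementation can overflow and emit a runtime warning. A common workaround, sketched here under the hypothetical name `stable_sigmoid`, is to only ever exponentiate non-positive numbers:

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that only exponentiates non-positive values (illustrative sketch)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0: exp(-x) <= 1, so 1 / (1 + exp(-x)) cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0: use the equivalent form exp(x) / (1 + exp(x)), where exp(x) < 1.
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out

print(stable_sigmoid([-1000.0, 0.0, 1000.0]))
```

If SciPy is available, `scipy.special.expit` provides a ready-made sigmoid and may be preferable to a hand-written version.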

To sum up, in the case of binary classification:
$$
\hat{p}(y=1 | x) = \sigma(w^Tx + b)
$$

Which loss function can we use to train such a model?

We'll use a logarithmic loss function, which captures the intuition that predictions for datapoints labeled $0$ should be as close to $0$ as possible and, analogously, predictions for datapoints labeled $1$ should be as close to $1$ as possible:

$$ L = -\frac{1}{n}\sum_{i=1}^n \Big( y^{(i)}\log{f(x^{(i)})} + (1-y^{(i)})\log{(1-f(x^{(i)}))} \Big)$$

where $f(x^{(i)}) = \sigma(w^T x^{(i)} + b)$ is the predicted probability for the $i$-th example and $n$ is the number of examples.

This function is called **Binary Cross-Entropy**.
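
As a small worked example of the formula, the sketch below evaluates the loss on three made-up label/prediction pairs (the numbers are assumptions chosen for illustration):

```python
import numpy as np

# Hypothetical labels and predicted probabilities for three examples.
y = np.array([1, 0, 1])
p = np.array([0.9, 0.2, 0.6])

# Per-example loss: -log(p) when y = 1, -log(1 - p) when y = 0.
per_example = -(y * np.log(p) + (1 - y) * np.log(1 - p))
loss = per_example.mean()

print(per_example)
print(loss)
```

The third example is classified correctly but with the least confidence, so it contributes the largest share of the loss.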

In [None]:
x = np.linspace(1e-6, 1 - (1e-6), 1000)

plt.plot(x, -np.log(1 - x), label="loss when y = 0")
plt.plot(x, -np.log(x), label="loss when y = 1")
plt.legend()
plt.show()

### Task for you: implement the binary cross-entropy function.

In [None]:
def binary_cross_entropy(x, y, w, b) -> float:
    """
    All arguments are numpy arrays with shapes:
        x: [N, F]
        y: [N]
        w: [F, 1]
        b: [1]

    Returns:
        The value of binary cross-entropy (a single number).
    """
    ...
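
One possible way to fill in the stub, shown as a sketch rather than the reference solution: compute the probabilities with the sigmoid, clip them away from exactly 0 and 1 (the `eps` value is an assumption), and average the per-example losses. The function is named `binary_cross_entropy_sketch` here to keep it separate from your own implementation.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def binary_cross_entropy_sketch(x, y, w, b, eps=1e-12) -> float:
    # x: [N, F], w: [F, 1], b: [1] -> probabilities flattened to shape [N],
    # so that they line up with y (shape [N]) without accidental broadcasting.
    p = sigmoid(x @ w + b).reshape(-1)
    # Clip so that log(0) never occurs; eps is an assumed small constant.
    p = np.clip(p, eps, 1 - eps)
    losses = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float(losses.mean())

# Tiny check: with w = 0 and b = 0 every prediction is 0.5, so the loss is log(2).
x = np.zeros((4, 3))
y = np.array([0, 1, 1, 0])
value = binary_cross_entropy_sketch(x, y, w=np.zeros((3, 1)), b=np.zeros(1))
print(value, np.log(2))
```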

In order to train the parameters $w, b$ of our model, we need to calculate the gradients of the loss with respect to those parameters:

$$ \frac{\partial L}{\partial w} \text{ and } \frac{\partial L}{\partial b} $$
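
A sketch of the derivation, in case you want to check your own: write $z^{(i)} = w^T x^{(i)} + b$ and $\hat{p}^{(i)} = \sigma(z^{(i)})$, and let $\ell^{(i)} = -y^{(i)}\log{\hat{p}^{(i)}} - (1-y^{(i)})\log{(1-\hat{p}^{(i)})}$ be the loss of a single example. Using $\sigma'(z) = \sigma(z)(1-\sigma(z))$:

$$ \frac{\partial \ell^{(i)}}{\partial z^{(i)}} = \Big(-\frac{y^{(i)}}{\hat{p}^{(i)}} + \frac{1-y^{(i)}}{1-\hat{p}^{(i)}}\Big)\,\hat{p}^{(i)}(1-\hat{p}^{(i)}) = \hat{p}^{(i)} - y^{(i)} $$

Since $\frac{\partial z^{(i)}}{\partial w} = x^{(i)}$ and $\frac{\partial z^{(i)}}{\partial b} = 1$, averaging over the $n$ examples gives:

$$ \frac{\partial L}{\partial w} = \frac{1}{n}\sum_{i=1}^n (\hat{p}^{(i)} - y^{(i)})\,x^{(i)}, \qquad \frac{\partial L}{\partial b} = \frac{1}{n}\sum_{i=1}^n (\hat{p}^{(i)} - y^{(i)}) $$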


### Task for you: implement the function which calculates $\frac{\partial L}{\partial w}$ and $\frac{\partial L}{\partial b}$:

In [None]:
def calculate_gradients(x, y, w, b) -> Tuple[np.ndarray, np.ndarray]:
    """
    All arguments are numpy arrays with shapes:
        x: [N, F]
        y: [N]
        w: [F, 1]
        b: [1]

    Returns:
        The gradients of loss L with respect to `w` and `b`. Their shapes should be identical to the shapes of `w` and `b`, respectively.
    """
    ...
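
One possible implementation, given as a sketch and not as the reference solution. It uses the fact that the gradient of binary cross-entropy with respect to the pre-sigmoid output is the prediction error $\hat{p} - y$, and verifies one gradient entry against a finite-difference estimate. The names `calculate_gradients_sketch` and `bce`, and the random data, are assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def bce(x, y, w, b):
    p = sigmoid(x @ w + b).reshape(-1)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

def calculate_gradients_sketch(x, y, w, b):
    n = x.shape[0]
    # Prediction error p - y, kept as a column of shape [N, 1].
    error = sigmoid(x @ w + b) - y.reshape(-1, 1)
    dw = x.T @ error / n       # shape [F, 1], same as w
    db = error.mean(axis=0)    # shape [1], same as b
    return dw, db

# Finite-difference check on made-up data.
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20)
w = rng.normal(size=(3, 1))
b = rng.normal(size=1)

dw, db = calculate_gradients_sketch(x, y, w, b)

h = 1e-6
w_shift = w.copy()
w_shift[0, 0] += h
numeric_dw0 = (bce(x, y, w_shift, b) - bce(x, y, w, b)) / h
print(dw[0, 0], numeric_dw0)  # the two values should agree closely
```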

With the gradient calculation implemented, we should be able to train our model with the **Gradient Descent** method.
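
As a self-contained illustration of the update rule $w \leftarrow w - \eta \frac{\partial L}{\partial w}$, the sketch below runs gradient descent on a made-up, linearly separable toy problem. The data, the zero initialization, the learning rate, and the number of steps are all assumptions chosen for this example, not recommendations for the lab dataset.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)

# Hypothetical toy data: the label depends on the sign of the first feature only.
x = rng.normal(size=(200, 2))
y = (x[:, 0] > 0).astype(int)

w = np.zeros((2, 1))
b = np.zeros(1)
lr = 0.5  # assumed value for this toy problem

for step in range(200):
    p = sigmoid(x @ w + b)           # [N, 1] predicted probabilities
    error = p - y.reshape(-1, 1)     # [N, 1] prediction error
    dw = x.T @ error / len(x)        # gradient with respect to w
    db = error.mean(axis=0)          # gradient with respect to b
    # Gradient descent update: step against the gradient.
    w = w - lr * dw
    b = b - lr * db

accuracy = ((sigmoid(x @ w + b).reshape(-1) > 0.5) == y).mean()
print(accuracy)
```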

In [None]:
w = np.random.randn(30, 1)
b = np.random.randn(1)
# how should we initialize w and b?

lr = 1e-1
# how big should the learning rate be?

history = []

for epoch in range(100):
    
    y_pred_train = ... # calculate predictions based on X_train, w, and b (don't forget sigmoid!)
    y_pred_train = (y_pred_train > 0.5).astype(int)
    train_accuracy = (y_pred_train.reshape(-1) == y_train).mean()
    
    y_pred_val = ... # calculate predictions based on X_val, w, and b (don't forget sigmoid!)
    y_pred_val = (y_pred_val > 0.5).astype(int)
    val_accuracy = (y_pred_val.reshape(-1) == y_val).mean()
    
    l_train = binary_cross_entropy(X_train, y_train, w=w, b=b)
    dw, db = calculate_gradients(X_train, y_train, w=w, b=b)

    w = w - (dw*lr)
    b = b - (db*lr)
    
    elem = dict(epoch=epoch, loss=l_train, train_accuracy=train_accuracy, val_accuracy=val_accuracy)
    history.append(elem)



plt.plot([h["epoch"] for h in history], [h["loss"] for h in history], label="loss")
plt.legend()
plt.show()

plt.plot([h["epoch"] for h in history], [h["train_accuracy"] for h in history], label="train acc")
plt.plot([h["epoch"] for h in history], [h["val_accuracy"] for h in history], label="val acc")
plt.legend()
plt.show()

## 3. Logistic regression with PyTorch automatic differentiation / loss / optimization tools

In [None]:
from torch.optim import SGD
from torch import nn
import torch

### Task for you: initialize a logistic regression PyTorch model and train it

In [None]:
model = nn.Sequential(
    # we need two pytorch layers for a basic logistic regression model
)
opt = SGD(model.parameters(), lr=1e-1)
loss_fn = nn.BCELoss()

print({
    name: p.shape
    for (name, p) in model.named_parameters()
})

In [None]:
history = []

for epoch in range(100):
    opt.zero_grad()
    
    ### YOUR CODE HERE ###
    y_pred_train = ... # make predictions for train set
    y_pred_val = ... # make predictions for val set
    
    # calculate loss
    l_train = ...
    # calculate gradients with respect to l_train
    
    # perform the optimization step with the optimizer
    
    #######################
    
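    # flatten the [N, 1] predictions so they compare element-wise with the [N] labels
    train_accuracy = ((y_pred_train.detach().numpy().reshape(-1) > 0.5) == y_train).mean()
    val_accuracy = ((y_pred_val.detach().numpy().reshape(-1) > 0.5) == y_val).mean()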

    elem = dict(epoch=epoch, loss=l_train.item(), train_accuracy=train_accuracy, val_accuracy=val_accuracy)
    history.append(elem)
    


    
plt.plot([h["epoch"] for h in history], [h["loss"] for h in history], label="loss")
plt.legend()
plt.show()

plt.plot([h["epoch"] for h in history], [h["train_accuracy"] for h in history], label="train acc")
plt.plot([h["epoch"] for h in history], [h["val_accuracy"] for h in history], label="val acc")
plt.legend()
plt.show()
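
For comparison, here is a minimal end-to-end sketch of the same training pattern on made-up tensors. The toy data is an assumption standing in for the scaled features, and expressing the model as `nn.Linear` followed by `nn.Sigmoid` is one possible answer to the task above, not necessarily the intended one.

```python
import torch
from torch import nn
from torch.optim import SGD

torch.manual_seed(0)

# Hypothetical toy data: 200 examples, 30 features, label from the first feature.
x = torch.randn(200, 30)
# BCELoss expects float targets with the same shape as the model output.
y = (x[:, 0] > 0).float().reshape(-1, 1)

model = nn.Sequential(nn.Linear(30, 1), nn.Sigmoid())
opt = SGD(model.parameters(), lr=1e-1)
loss_fn = nn.BCELoss()

losses = []
for epoch in range(100):
    opt.zero_grad()          # clear gradients from the previous step
    y_pred = model(x)        # forward pass, shape [200, 1]
    loss = loss_fn(y_pred, y)
    loss.backward()          # autograd computes the gradients
    opt.step()               # SGD parameter update
    losses.append(loss.item())

print(losses[0], losses[-1])
```

With the NumPy arrays from this lab, the inputs would first need converting, for example `torch.from_numpy(X_train).float()` for the features and `torch.from_numpy(y_train).float().reshape(-1, 1)` for the labels.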

## 4. FashionMNIST: a bigger task

Let's now train a neural net on a more challenging, multi-class FashionMNIST task.

In [None]:
from torchvision.datasets import FashionMNIST
from torchvision import transforms as tv
from torch.utils.data import DataLoader
from sklearn.metrics import accuracy_score

### Task: load the dataset and visualize some examples along with their classes
* what is the shape of a single example from the dataset?

In [None]:
# tv.ToTensor() transforms the data from a PIL image to a tensor
ds = FashionMNIST('./data', train=True, download=True, transform=tv.ToTensor())
ds_test = FashionMNIST('./data', train=False, download=True, transform=tv.ToTensor())
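
A possible starting point for the visualization task is the hypothetical helper below. It assumes only that indexing the dataset returns an `(image, label)` pair with the image shaped `[1, H, W]`, and it is not part of torchvision.

```python
import matplotlib.pyplot as plt
import numpy as np

def show_examples(dataset, class_names, n=8):
    """Plot the first n (image, label) pairs of a dataset in one row (hypothetical helper)."""
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2))
    for ax, idx in zip(np.atleast_1d(axes), range(n)):
        image, label = dataset[idx]
        # Drop the channel axis so imshow receives a 2D array.
        ax.imshow(np.asarray(image).squeeze(), cmap="gray")
        ax.set_title(class_names[label])
        ax.axis("off")
    return fig
```

With the dataset loaded above, this could be called as `show_examples(ds, ds.classes)`, and `ds[0][0].shape` shows the shape of a single example.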

In [None]:
batch_size = 64

train_dl = DataLoader(ds, batch_size, shuffle=True)
valid_dl = DataLoader(ds_test, batch_size, shuffle=False)  # no need to shuffle for evaluation

### Task: implement and train a neural network with two linear layers and ReLU activation between them

In [None]:
class FashionNN(nn.Module):
    def __init__(self):
        super().__init__()
        
        # initialize the layers of the network
        # what is the size of the input?
        # what is the output size?
        
    def forward(self, x):
        # process x through:
        # 1) first layer
        # 2) relu activation
        # 3) second layer
        ...

In [None]:
net = FashionNN() # actually initialize the net
loss_fn = ... # what loss do we use for multi-class classification?
opt = torch.optim.Adam(net.parameters()) 
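
One possible shape for such a network, given as a sketch under assumptions: the class name `FashionNNSketch` and the hidden size of 128 are made up for this example, and `nn.CrossEntropyLoss` is used because it accepts raw logits together with integer class labels.

```python
import torch
from torch import nn

class FashionNNSketch(nn.Module):
    """Two linear layers with a ReLU in between (hidden size is an assumed value)."""

    def __init__(self, hidden_size: int = 128):
        super().__init__()
        self.flatten = nn.Flatten()                 # [B, 1, 28, 28] -> [B, 784]
        self.fc1 = nn.Linear(28 * 28, hidden_size)
        self.fc2 = nn.Linear(hidden_size, 10)       # one logit per class

    def forward(self, x):
        x = self.flatten(x)
        x = torch.relu(self.fc1(x))
        return self.fc2(x)                          # raw logits, no softmax here

sketch_net = FashionNNSketch()
sketch_loss_fn = nn.CrossEntropyLoss()

# Shape check on a random batch shaped like FashionMNIST images.
images = torch.rand(4, 1, 28, 28)
labels = torch.tensor([0, 3, 9, 1])
logits = sketch_net(images)
print(logits.shape, sketch_loss_fn(logits, labels).item())
```

Returning logits instead of probabilities keeps `argmax(dim=1)` valid for predictions, since softmax does not change which class has the largest score.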

In [None]:
number_of_epochs = 10

for i in range(number_of_epochs):
    train_loss = 0
    for iteration, (X_train, y_train) in enumerate(train_dl):
        
        # perform optimization on the train set and calculate the total train loss
        ...
        
        
    val_loss = 0
    y_predicted = []
    y_true = []

    with torch.no_grad():
        for iteration, (X_val, y_val) in enumerate(valid_dl):
            # perform predictions on the validation set and gather them to calculate accuracy
            
            y_pred = net(X_val)
            loss = loss_fn(y_pred, y_val)
            val_loss += loss.item()
            y_pred = y_pred.argmax(dim=1)
            y_true.extend(y_val.numpy())
            y_predicted.extend(y_pred.numpy())
    
            
    val_acc = accuracy_score(y_true, y_predicted)
    print(f'#Epoch: {i}, train loss: {train_loss}, val loss: {val_loss}, val_acc: {val_acc}')
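
The training part of the loop follows the same zero-grad, forward, backward, step pattern as in section 3, applied once per batch. A minimal sketch, wrapped in a hypothetical helper `train_one_epoch` and exercised on random stand-in data instead of FashionMNIST:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train_one_epoch(net, loader, loss_fn, opt) -> float:
    """Run one pass over `loader` and return the summed batch losses (hypothetical helper)."""
    net.train()
    total_loss = 0.0
    for x_batch, y_batch in loader:
        opt.zero_grad()                        # reset gradients from the previous batch
        loss = loss_fn(net(x_batch), y_batch)  # forward pass and loss
        loss.backward()                        # backpropagate
        opt.step()                             # update parameters
        total_loss += loss.item()
    return total_loss

# Usage on random stand-in data (assumed shapes: 32 images of 1x28x28, 10 classes).
torch.manual_seed(0)
data = TensorDataset(torch.rand(32, 1, 28, 28), torch.randint(0, 10, (32,)))
loader = DataLoader(data, batch_size=8, shuffle=True)
toy_net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
toy_opt = torch.optim.Adam(toy_net.parameters())
epoch_loss = train_one_epoch(toy_net, loader, nn.CrossEntropyLoss(), toy_opt)
print(epoch_loss)
```

Note that the loss summed over batches grows with the number of batches, so dividing by `len(loader)` gives a value that is easier to compare between the train and validation sets.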
    