# Logistic Regression (binary classification)


## Hypothesis space
- Samples of dimension $ d $ 
- Linearly separable  
$ \implies $ 
  - $h_w(x)$ defines a linear discriminant (a hyperplane) of dimension $ d-1 $ (e.g. with two features we get a line)
  - linear discriminant expressed as $ w^Tx + b = 0$
  - if $ x_0 = 1 $ for every sample and $ w_0 = b $ then it reduces to $ w^Tx = 0$
- Specifically we use the sigmoid function: $h_w(x) = g(w^Tx) = \frac{1}{1 + e^{-w^Tx}}$

**So how do we predict?**
- Suppose we have two classes, 0 and 1  
- We impose a threshold on the sigmoid output, say 0.5
  - If $g(w^Tx) \geq 0.5$ (equivalently $w^Tx \geq 0$) then we predict class 1
  - If $g(w^Tx) < 0.5$ (equivalently $w^Tx < 0$) then we predict class 0
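
The thresholding rule above can be sketched in a few lines of NumPy. This is only an illustration: the weights and samples are made up, and the helper names `sigmoid` and `predict` are not part of the notebook's toolbox. The first column of `X` is assumed to be the constant $x_0 = 1$, so that `w[0]` plays the role of the bias.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w, threshold=0.5):
    """Return 0/1 labels for the rows of X (first column assumed to be x_0 = 1)."""
    probs = sigmoid(X @ w)
    return (probs >= threshold).astype(int)

# Hypothetical weights: w[0] is the bias, w[1] and w[2] weigh the two features.
w = np.array([-1.0, 2.0, 0.5])
X = np.array([
    [1.0, 0.0, 0.0],   # score w^T x = -1.0
    [1.0, 1.0, 0.0],   # score w^T x =  1.0
    [1.0, 0.5, 0.0],   # score w^T x =  0.0, exactly on the boundary
])
labels = predict(X, w)
```

With a threshold of 0.5 the labels depend only on the sign of $w^Tx$, so `(X @ w >= 0)` gives the same result without evaluating the sigmoid.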

**Role of the bias?**
- The bias does not multiply any feature: it adds the same constant to the score $w^Tx$ of every sample, which shifts the decision boundary away from the origin
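
A one-feature sketch makes the role of the bias concrete. With a single feature the boundary $wx + b = 0$ sits at $x = -b/w$, so changing $b$ slides the point where the prediction flips. The values of `w` and `b` below are arbitrary, chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def boundary_1d(w, b):
    """Location of the decision boundary w * x + b = 0 for a single feature."""
    return -b / w

w = 2.0                      # hypothetical weight of the single feature
x = np.linspace(-3, 3, 7)    # the grid -3, -2, ..., 3

results = {}
for b in (0.0, -2.0):
    preds = (sigmoid(w * x + b) >= 0.5).astype(int)
    results[b] = (boundary_1d(w, b), preds)
```

With `b = 0.0` the predictions flip at `x = 0`; with `b = -2.0` they flip at `x = 1`, even though the weight on the feature is unchanged.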

In [None]:
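import numpy as np
import matplotlib.pyplot as plt

def logisticFunction(t):
    """Sigmoid g(t) = 1 / (1 + e^{-t}), applied element-wise."""
    return 1/(1+np.exp(-t))

# Evaluate the sigmoid on a grid and plot its S-shaped curve
t = np.arange(-5,5,0.01)
y = logisticFunction(t)

plt.plot(t,y); plt.grid()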

## How do we fit $w$ ?

### Likelihood 
- $ l(w) = p(y|x; w) = (h_w(x))^y \cdot (1-h_w(x))^{1-y} $ where $y$ can be either $0$ or $1$
- In words, given certain features, $y$ is $1$ with probability $h_w(x)$ and $0$ with probability $1-h_w(x)$

**Using maximum likelihood**
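- At each step we update the weights in the direction of the gradient of the log-likelihood $\log l(w)$ (gradient ascent); since the logarithm is monotonic, this also maximises the likelihood  
- Update rule: $w_j = w_j + \alpha \frac{\partial \log l}{\partial w_j} = w_j + \alpha (y - h_w(x))x_j $ 
- This has the same form as the update for linear regression with a squared-error loss, except that $h_w(x)$ is now the sigmoid of $w^Tx$ rather than $w^Tx$ itself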
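
As a sanity check on the update rule, the sketch below compares the analytic gradient $(y - h_w(x))x$ of the log-likelihood with a finite-difference estimate, then takes one gradient-ascent step. The sample, the weights, and the helper names are all made up for illustration and are not part of the notebook's toolbox.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, x, y):
    """log l(w) for one sample: y * log(h) + (1 - y) * log(1 - h)."""
    h = sigmoid(w @ x)
    return y * np.log(h) + (1 - y) * np.log(1 - h)

def gradient(w, x, y):
    """Analytic gradient of the log-likelihood: (y - h_w(x)) * x."""
    return (y - sigmoid(w @ x)) * x

def ascent_step(w, x, y, alpha=0.1):
    """One gradient-ascent update on a single sample."""
    return w + alpha * gradient(w, x, y)

# Toy sample (made up); x[0] = 1 plays the role of the bias input.
x = np.array([1.0, 2.0, -1.0])
y = 1.0
w = np.array([0.1, -0.2, 0.3])

# Central finite differences as an independent check of the gradient.
eps = 1e-6
numeric = np.array([
    (log_likelihood(w + eps * e, x, y) - log_likelihood(w - eps * e, x, y)) / (2 * eps)
    for e in np.eye(len(w))
])
analytic = gradient(w, x, y)
w_new = ascent_step(w, x, y)
```

The two gradients should agree to within the finite-difference error, and the log-likelihood of the sample should be higher at `w_new` than at `w`.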

### Loss function
$ L(w) = \begin{cases} -\log(h_w(x)) & \text{if } y = 1 \\ -\log(1 - h_w(x)) & \text{if } y = 0 \end{cases}$  
or, equivalently,  
$ L(w) = y(-\log(h_w(x))) + (1-y)(-\log(1 - h_w(x))) $ 
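
The two forms of the loss can be checked against each other numerically. The following is a minimal sketch with hypothetical helper names. It evaluates both on a few made-up pairs of predicted probability `h` and label `y`.

```python
import numpy as np

def loss_piecewise(h, y):
    """Case-by-case form: -log(h) if y == 1, else -log(1 - h)."""
    return -np.log(h) if y == 1 else -np.log(1 - h)

def loss_compact(h, y):
    """Single-expression form: y * (-log h) + (1 - y) * (-log(1 - h))."""
    return y * (-np.log(h)) + (1 - y) * (-np.log(1 - h))

# (predicted probability of class 1, true label), values chosen for illustration
cases = [(0.9, 1), (0.1, 1), (0.9, 0), (0.1, 0)]
losses = [(loss_piecewise(h, y), loss_compact(h, y)) for h, y in cases]
```

Both forms return the same value for every pair. A confident correct prediction (`h = 0.9`, `y = 1`) costs much less than a confident wrong one (`h = 0.1`, `y = 1`).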

In [None]:
from ng.lab_utils_common import plotTwoLogLosses
plotTwoLogLosses()

In [None]:
from toolbox.datamodule import *
import torch
# https://raw.githubusercontent.com/jbrownlee/Datasets/master/haberman.names
# https://archive.ics.uci.edu/datasets

# load the dataset
# url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/haberman.csv'
# features = ["age","opYear","nodes","class"]
# dataset = DataLoader(url, features, num_train = 256, num_val = 50, batch_size = 16)
# dataset.summarize()
# print(dataset.dataframe)

In [None]:
from toolbox.datamodule import *
import pandas as pd

# load the dataset
dataframe = pd.read_csv("../data/diabetes_prediction_dataset.csv")
dataframe.head()

# dataset = DataLoader(path = "diabetes_prediction_dataset.csv", num_train = 256, num_val = 50, batch_size = 16)
# dataset.summarize()
# print(dataset.dataframe)

# Ng Example

In [1]:
from toolbox.base_models import *
from toolbox.trainer import *
from toolbox.datamodule import *

data = DataLoader(txtfile = "../data/log_reg.txt")
print("First five elements in X_train are:\n", data.X[:5])
print("Type of X_train:",type(data.X))
data.summarize()

model = LogisticRegressionScratch(input_dim = 2, lr=0.001)
# model = LogisticRegression(3, lr=0.01)
trainer = Trainer(max_epochs = 10000)
trainer.fit(model,data)

First five elements in X_train are:
 tensor([[34.6237, 78.0247],
        [30.2867, 43.8950],
        [35.8474, 72.9022],
        [60.1826, 86.3086],
        [79.0327, 75.3444]])
Type of X_train: <class 'torch.Tensor'>
N Examples: 100
N Inputs: 2
N Classes: 2
Classes: [0. 1.]
 - Class 0.0: 40 (40.0)
 - Class 1.0: 60 (60.0)
Loss at epoch 1,batch 1: 4.782772541046143

Loss at epoch 2,batch 1: 0.9942498207092285

Loss at epoch 3,batch 1: 0.4216228723526001

Loss at epoch 4,batch 1: 0.30580687522888184

Loss at epoch 5,batch 1: 0.30547741055488586

Loss at epoch 6,batch 1: 0.30546554923057556

Loss at epoch 7,batch 1: 0.3054640591144562

Loss at epoch 8,batch 1: 0.30546292662620544

Loss at epoch 9,batch 1: 0.30546197295188904

Loss at epoch 10,batch 1: 0.305461049079895

Loss at epoch 11,batch 1: 0.30546021461486816

Loss at epoch 12,batch 1: 0.3054594397544861

Loss at epoch 13,batch 1: 0.3054587244987488

Loss at epoch 14,batch 1: 0.30545803904533386

Loss at epoch 15,batch 1: 0.30545741

# Penguins

In [None]:
import tensorflow_datasets as tfds
import torch
# You can see the load documentation here: https://www.tensorflow.org/datasets
# The dataset itself is described here: https://www.tensorflow.org/datasets/catalog/penguins
penguins = tfds.load('penguins', as_supervised=True, split='train')

In [None]:
# By default, the Dataset object is an iterator over the elements.
# The instructions below extract the underlying tensors.
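# batch(500) packs the examples into one batch; get_single_element() expects the
# dataset to contain exactly one element, so this assumes at most 500 examples.
X, y = penguins.batch(500).get_single_element()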
X, y = X.numpy(), y.numpy()
# One row is an example, one column a feature of the input.
X.shape, X[0]

In [None]:
# We split into a training set and a test set using the train_test_split utility from sklearn.
from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, stratify=y)
Xtrain = torch.from_numpy(Xtrain).float()
Xtest = torch.from_numpy(Xtest).float()
ytrain = torch.from_numpy(ytrain).long()
ytest = torch.from_numpy(ytest).long()

# Perceptron (binary classification)

## Hypothesis space
- Same linear discriminant as logistic regression
- But the $h_w(x)$ is different, namely

 $ h_w(x) = g(w^Tx) = \begin{cases} 1 & \text{if } w^Tx \geq 0 \\ 0 & \text{if } w^Tx < 0 \end{cases}$ 

- So the perceptron is basically logistic regression with the sigmoid replaced by a hard threshold (step function); the decision boundary is linear in both cases

**So how do we predict?**

- We predict by projecting the samples onto the weight vector and looking at the sign of the resulting number $w^Tx$
- Looking at the sign and deciding the class is the activation function (which is nonlinear)
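
A minimal sketch of this prediction rule, with made-up weights and samples and a hypothetical helper name. Only the sign of $w^Tx$ matters, so rescaling $w$ by any positive constant leaves the predictions unchanged.

```python
import numpy as np

def perceptron_predict(X, w):
    """Hard-threshold activation: 1 where w^T x >= 0, else 0."""
    return (X @ w >= 0).astype(int)

# Hypothetical weights; the first column of X is the constant x_0 = 1.
w = np.array([-1.0, 2.0, 0.5])
X = np.array([
    [1.0, 0.0, 0.0],   # w^T x = -1.0
    [1.0, 1.0, 0.0],   # w^T x =  1.0
    [1.0, 0.0, 4.0],   # w^T x =  1.0
])

labels = perceptron_predict(X, w)
labels_scaled = perceptron_predict(X, 10.0 * w)   # same signs, same labels
```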

## How do we fit $w$ ?
- We can use the squared loss $ E = \frac{1}{2} \sum_{i=1}^n (y_i - w^Tx_i)^2 $

**Using gradient descent**
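- At each step we update the weights in the direction of the negative gradient of the loss function
- The update has the same form as for logistic regression: $w_j = w_j + \alpha (y_i - w^Tx_i)x_{ij}$ for the squared loss above, while the classic perceptron rule replaces $w^Tx_i$ with the thresholded output $h_w(x_i)$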
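
The sketch below applies the classic perceptron learning rule, $w \leftarrow w + \alpha (y - h_w(x))x$ with the thresholded $h_w$, to a tiny linearly separable problem (logical AND). It is only an illustration: the data, the learning rate `alpha = 1.0`, the epoch cap, and the helper names are assumptions, not the notebook's toolbox.

```python
import numpy as np

def step(z):
    """Hard threshold: 1.0 where z >= 0, else 0.0."""
    return (z >= 0).astype(float)

def train_perceptron(X, y, alpha=1.0, max_epochs=100):
    """Sweep over the samples, updating w only on misclassified ones."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            h = step(w @ x_i)
            if h != y_i:
                w += alpha * (y_i - h) * x_i   # same form as the logistic update
                errors += 1
        if errors == 0:    # every sample classified correctly: stop
            break
    return w

# Logical AND; the first column is the constant x_0 = 1 (bias input).
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

w = train_perceptron(X, y)
preds = step(X @ w)
```

Because the data are linearly separable, the loop stops after a finite number of updates and `preds` matches `y`. On non-separable data the epoch cap is what ends the loop.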
