## Logistic Regression

Simple classification.

It is basically a one-layer NN with one output neuron with a sigmoid as a activation function. It calculates:

$p(y=1|x)$

if $p >= 0.5$ then y = 1

if $p < 0.5$ then y = 0

In [350]:
import torch as T
import numpy as np
from sklearn.datasets import load_iris

### Load data

I will only use the first 2 classes of the iris dataset to make it a simple binary classification.

In [351]:
X, y = load_iris(return_X_y=True)
X = X[:np.where(y == 2)[0][0]]
y = y[:np.where(y == 2)[0][0]]
print(X.shape)
print(y.shape)

(100, 4)
(100,)


In [352]:
X = T.Tensor(X)
X = (X-X.mean(axis=0))/X.std(axis=0)
y = T.Tensor(y)

## Log Loss / Binary Cross Entropy Loss

$H(p,q)\ =\ -\sum_ip_i\log q_i\ =\ -y\log\hat{y} - (1-y)\log(1-\hat{y})$

In [353]:
def loss(y_hat, y):
    return ((-y*T.log(y_hat))-((1-y)*T.log(1-y_hat))).sum()

In [354]:
w = T.randn((5,1),requires_grad=True)
b = T.zeros((100,1), requires_grad=True)
o = T.ones((100,1))
X = T.cat([T.Tensor(X), o],axis=1)

In [355]:
def sigmoid(x):
    return 1 / (1 + T.exp(-x))

In [356]:
epochs = 50
lr = 1e-2

for e in range(epochs):
    y_hat = sigmoid(X@w + b)
    l = loss(y_hat, y)
    print(l.data)
    l.backward()
    
    with T.no_grad():
        w -= T.clamp(w.grad, -3.5, 3.5) * lr
        b -= T.clamp(b.grad, -3.5, 3.5) * lr
        w.grad.zero_()
        b.grad.zero_()

tensor(10585.5889)
tensor(10181.9854)
tensor(9808.4639)
tensor(9466.1670)
tensor(9139.4170)
tensor(8832.9443)
tensor(8547.8906)
tensor(8285.1436)
tensor(8045.8110)
tensor(7831.1392)
tensor(7642.2090)
tensor(7479.8379)
tensor(7344.1660)
tensor(7235.6162)
tensor(7154.0898)
tensor(7087.7095)
tensor(7036.6616)
tensor(7009.1792)
tensor(6974.3169)
tensor(6958.0498)
tensor(6949.5850)
tensor(6953.1567)
tensor(6942.5977)
tensor(6945.7642)
tensor(6939.3979)
tensor(6941.7837)
tensor(6938.1318)
tensor(6939.7710)
tensor(6937.7290)
tensor(6938.7646)
tensor(6937.6787)
tensor(6938.2935)
tensor(6937.7139)
tensor(6938.0762)
tensor(6937.7568)
tensor(6937.9829)
tensor(6937.8003)
tensor(6937.9238)
tensor(6937.8330)
tensor(6937.9014)
tensor(6937.8467)
tensor(6937.8936)
tensor(6937.8525)
tensor(6937.8799)
tensor(6937.8682)
tensor(6937.8730)
tensor(6937.8711)
tensor(6937.8750)
tensor(6937.8711)
tensor(6937.8691)


## It worked! Obviously it overfits but it definetly shows that our math is right.

In [357]:
y_hat = T.round(sigmoid(X@w + b))

In [358]:
(y_hat - y.reshape(100,1)).sum().data

tensor(0.)