# **5-1. Logistic Regression**

**Jonathan Choi 2021**

**[Deep Learning By Torch] End to End study scripts of Deep Learning by implementing code practice with Pytorch.**

If you find any issues, please open a PR at the repository below.

[[Deep Learning By Torch] - Github @JonyChoi](https://github.com/jonychoi/Deep-Learning-By-Torch)

## Reminder: Logistic Regression

### Hypothesis

$ H(X) = \frac{1}{1+e^{-W^T X}} $

### Cost

$ cost(W) = -\frac{1}{m} \sum \left[ y \log\left(H(x)\right) + (1-y) \log\left(1-H(x)\right) \right] $

- If $H(x) \simeq y$, the cost is near 0.

- If $H(x)$ is far from $y$, the cost is high.
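
To make the two cases above concrete, here is a minimal plain-Python sketch (standard library only) that evaluates the per-example loss for a confident correct prediction and a confident wrong one. The helper name `bce` is ours for illustration; it is not a PyTorch function.

```python
import math

def bce(y, h):
    """Per-example binary cross-entropy for label y in {0, 1} and prediction h in (0, 1)."""
    return -(y * math.log(h) + (1 - y) * math.log(1 - h))

good = bce(1, 0.99)  # prediction agrees with the label
bad = bce(1, 0.01)   # prediction contradicts the label

print(f"y=1, H(x)=0.99 -> loss {good:.4f}")
print(f"y=1, H(x)=0.01 -> loss {bad:.4f}")
```

The loss grows without bound as the prediction approaches the wrong extreme, which is what pushes the model toward the correct label during training.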

### Weight Update via Gradient Descent

$ W := W - \alpha \frac{\partial}{\partial W} cost(W) $

- $\alpha$: Learning rate
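
For reference, with the sigmoid hypothesis and the cross-entropy cost above, the gradient takes a simple form. Writing $z = W^T x$ and using $\frac{dH}{dz} = H(1-H)$, the derivative of each loss term with respect to $z$ is $H(x) - y$, so

$ \frac{\partial}{\partial W} cost(W) = \frac{1}{m} \sum \left( H(x) - y \right) x $

Each update therefore moves $W$ against the average prediction error, weighted by the input. In the code that follows, autograd computes this gradient for us, so we never write it out by hand.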

## Imports

In [263]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [264]:
torch.manual_seed(1)

<torch._C.Generator at 0x22db807b450>

## Training Data

In [265]:
x_data = [[1, 2], [2, 3], [3, 1], [4, 3], [5, 3], [6, 2]]
y_data = [[0], [0], [0], [1], [1], [1]]

Consider the following classification problem: given the number of hours each student spent watching the lecture and working in the code lab, predict whether the student passed or failed a course. For example, the first (index 0) student watched the lecture for 1 hour and spent 2 hours in the lab session ```([1, 2])```, and ended up failing the course ```([0])```.

In [266]:
x_train = torch.FloatTensor(x_data)
y_train = torch.FloatTensor(y_data)

As always, we need these data to be in torch.Tensor format, so we convert them.

In [267]:
print(x_train.shape)
print(y_train.shape)

torch.Size([6, 2])
torch.Size([6, 1])


## Computing the Hypothesis

$ H(X) = \frac{1}{1+e^{-W^T X}} $

PyTorch has a torch.exp() function that computes the exponential function element-wise.

In [268]:
print('e^1 equals: ', torch.exp(torch.FloatTensor([1])))

e^1 equals:  tensor([2.7183])


We can use it to compute the hypothesis function conveniently.

In [269]:
W = torch.zeros((2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad = True)

In [270]:
hypothesis = 1 / (1 + torch.exp(-(x_train.matmul(W) + b)))

In [271]:
print(hypothesis)
print(hypothesis.shape)

tensor([[0.5000],
        [0.5000],
        [0.5000],
        [0.5000],
        [0.5000],
        [0.5000]], grad_fn=<MulBackward0>)
torch.Size([6, 1])


Alternatively, we can use the torch.sigmoid() function, which computes the sigmoid directly:

In [272]:
print('1/(1+e^{-1}) equals: ', torch.sigmoid(torch.FloatTensor([1])))

1/(1+e^{-1}) equals:  tensor([0.7311])


Now the code for the hypothesis function is cleaner.

In [273]:
hypothesis = torch.sigmoid(x_train.matmul(W) + b)

In [274]:
print(hypothesis)
print(hypothesis.shape)

tensor([[0.5000],
        [0.5000],
        [0.5000],
        [0.5000],
        [0.5000],
        [0.5000]], grad_fn=<SigmoidBackward>)
torch.Size([6, 1])
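
The uniform 0.5 values follow directly from the zero initialization: with W and b at zero, every row gives z = 0, and the sigmoid of 0 is 0.5. The following plain-Python sketch (standard library only, not the tensor code itself) reproduces the same numbers; `sigmoid` here is a local helper, not `torch.sigmoid`.

```python
import math

def sigmoid(z):
    """Scalar sigmoid, 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

x_data = [[1, 2], [2, 3], [3, 1], [4, 3], [5, 3], [6, 2]]
W = [0.0, 0.0]  # same zero initialization as the tensors above
b = 0.0

# Dot product of each row with W, plus the bias, passed through the sigmoid.
hypothesis = [sigmoid(sum(w * x for w, x in zip(W, row)) + b) for row in x_data]
print(hypothesis)  # every entry is 0.5 because z = 0 for every row
```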


## Computing the Cost Function (Low-level)

$ cost(W) = -\frac{1}{m} \sum \left[ y \log\left(H(x)\right) + (1-y) \log\left(1-H(x)\right) \right] $

We want to measure the difference between hypothesis and y_train.

In [275]:
print(hypothesis)
print(y_train)

tensor([[0.5000],
        [0.5000],
        [0.5000],
        [0.5000],
        [0.5000],
        [0.5000]], grad_fn=<SigmoidBackward>)
tensor([[0.],
        [0.],
        [0.],
        [1.],
        [1.],
        [1.]])


For one element, the loss can be computed as follows:

In [276]:
-(y_train[0] * torch.log(hypothesis[0]) + (1 - y_train[0]) * torch.log(1 - hypothesis[0]))

tensor([0.6931], grad_fn=<NegBackward>)

To compute the losses for the entire batch, we can simply pass in the whole tensors; the operations are applied element-wise.

In [277]:
losses = -(y_train * torch.log(hypothesis) + (1 - y_train) * torch.log(1 - hypothesis))

print(losses)

tensor([[0.6931],
        [0.6931],
        [0.6931],
        [0.6931],
        [0.6931],
        [0.6931]], grad_fn=<NegBackward>)


Then, we just call .mean() to take the mean of these individual losses.

In [278]:
cost = losses.mean()
print(cost)

tensor(0.6931, grad_fn=<MeanBackward0>)
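
The value 0.6931 is not arbitrary: with every prediction at 0.5, each loss term equals $-\log(0.5) = \log 2$, regardless of the label. A quick standard-library check of the same mean, written with plain lists instead of tensors:

```python
import math

y = [0, 0, 0, 1, 1, 1]
h = [0.5] * 6  # the untrained model predicts 0.5 for every example

# Same per-example formula as the tensor expression, evaluated one element at a time.
losses = [-(yi * math.log(hi) + (1 - yi) * math.log(1 - hi)) for yi, hi in zip(y, h)]
cost = sum(losses) / len(losses)

print(round(cost, 4), round(math.log(2), 4))
```

This makes log 2 a useful baseline: a binary classifier whose cost stays near 0.693 has learned nothing beyond guessing 0.5.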


## Computing the Cost Function with ```F.binary_cross_entropy```

In practice, binary classification is so common that PyTorch provides F.binary_cross_entropy, which computes the same mean loss in a single call.

In [279]:
F.binary_cross_entropy(hypothesis, y_train)

tensor(0.6931, grad_fn=<BinaryCrossEntropyBackward>)
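
One practical difference from the hand-written formula: according to the PyTorch documentation for BCELoss, the log terms are clamped to be at least -100, so a prediction of exactly 0 or 1 yields a large finite loss instead of an infinite one. The sketch below imitates that behavior in plain Python. `clamped_log` and `bce_clamped` are hypothetical helper names for illustration, not PyTorch functions, and this is not PyTorch's actual implementation.

```python
import math

def clamped_log(p, floor=-100.0):
    """log(p), but never below `floor`; log(0) is treated as the floor value."""
    return max(math.log(p), floor) if p > 0 else floor

def bce_clamped(y, h):
    """Per-example binary cross-entropy with clamped log terms."""
    return -(y * clamped_log(h) + (1 - y) * clamped_log(1 - h))

print(bce_clamped(1, 0.0))  # finite, where a bare math.log(0) would raise an error
print(bce_clamped(1, 0.5))  # ordinary case, unaffected by the clamp
```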

## Training with Low-level Binary Cross Entropy Loss

In [280]:
x_data = [[1, 2], [2, 3], [3, 1], [4, 3], [5, 3], [6, 2]]
y_data = [[0], [0], [0], [1], [1], [1]]
x_train = torch.FloatTensor(x_data)
y_train = torch.FloatTensor(y_data)

In [281]:
#Model Initialize
W = torch.zeros((2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

#Set Optimizer
optimizer = optim.SGD([W, b], lr=1)

nb_epochs = 1000

for epoch in range(nb_epochs + 1):

    #Hypothesis
    pred = torch.sigmoid(x_train.matmul(W) + b)

    #Cost
    cost = -(y_train * torch.log(pred) + (1 - y_train) * torch.log(1 - pred)).mean()

    #Reduce Cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    if epoch % 100 == 0:
        print('Epoch {:4d}/{} \n Hypothesis: {} \n Cost: {:.6f}'.format(epoch, nb_epochs, pred.squeeze().detach(), cost.item()))
        print()

Epoch    0/1000 
 Hypothesis: tensor([0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000]) 
 Cost: 0.693147

Epoch  100/1000 
 Hypothesis: tensor([0.0245, 0.1484, 0.2770, 0.7954, 0.9484, 0.9834]) 
 Cost: 0.134722

Epoch  200/1000 
 Hypothesis: tensor([0.0080, 0.1065, 0.1632, 0.8566, 0.9769, 0.9931]) 
 Cost: 0.080643

Epoch  300/1000 
 Hypothesis: tensor([0.0037, 0.0822, 0.1161, 0.8888, 0.9869, 0.9965]) 
 Cost: 0.057900

Epoch  400/1000 
 Hypothesis: tensor([0.0021, 0.0669, 0.0902, 0.9090, 0.9916, 0.9979]) 
 Cost: 0.045300

Epoch  500/1000 
 Hypothesis: tensor([0.0013, 0.0564, 0.0739, 0.9229, 0.9941, 0.9986]) 
 Cost: 0.037261

Epoch  600/1000 
 Hypothesis: tensor([8.7256e-04, 4.8759e-02, 6.2629e-02, 9.3312e-01, 9.9567e-01, 9.9906e-01]) 
 Cost: 0.031673

Epoch  700/1000 
 Hypothesis: tensor([6.2087e-04, 4.2945e-02, 5.4368e-02, 9.4091e-01, 9.9668e-01, 9.9932e-01]) 
 Cost: 0.027556

Epoch  800/1000 
 Hypothesis: tensor([4.6039e-04, 3.8371e-02, 4.8050e-02, 9.4706e-01, 9.9737e-01, 9.9949e-01]) 
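
To see what `cost.backward()` and `optimizer.step()` are doing in this loop, the same full-batch gradient descent can be sketched in plain Python with the gradient written out by hand as the mean of (prediction - label) times the input. This is an illustrative re-implementation under that assumption, not the PyTorch code path; it uses 64-bit floats, so the printed digits may differ slightly from the tensor output.

```python
import math

x_data = [[1, 2], [2, 3], [3, 1], [4, 3], [5, 3], [6, 2]]
y_data = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(W, b):
    """Sigmoid of the linear score for every training row."""
    return [sigmoid(W[0] * x1 + W[1] * x2 + b) for x1, x2 in x_data]

def bce_cost(pred):
    """Mean binary cross-entropy over the training set."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_data, pred)) / len(y_data)

W, b, lr, m = [0.0, 0.0], 0.0, 1.0, len(x_data)

for epoch in range(1001):
    pred = predict(W, b)
    err = [p - y for p, y in zip(pred, y_data)]  # H(x) - y for each example
    grad_W = [sum(e * x[j] for e, x in zip(err, x_data)) / m for j in range(2)]
    grad_b = sum(err) / m
    W = [w - lr * g for w, g in zip(W, grad_W)]  # plain gradient descent update
    b -= lr * grad_b

final_pred = predict(W, b)
final_cost = bce_cost(final_pred)
print([round(p, 4) for p in final_pred], round(final_cost, 6))
```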


## Training with ```F.binary_cross_entropy```

In [282]:
#Model Initialize
W = torch.zeros((2, 1), requires_grad = True)
b = torch.zeros(1, requires_grad = True)

#Set Optimizer
optimizer = optim.SGD([W, b], lr=1)

nb_epochs = 1000

for epoch in range(nb_epochs + 1):

    #Hypothesis
    pred = torch.sigmoid(x_train.matmul(W) + b)

    #Cost Function
    cost = F.binary_cross_entropy(pred, y_train)

    #Reduce Cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    if epoch % 100 == 0:
        print('Epoch {:4d}/{} \n Hypothesis: {} \n Cost: {:.6f}'.format(epoch, nb_epochs, pred.squeeze().detach(), cost.item()))

Epoch    0/1000 
 Hypothesis: tensor([0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000]) 
 Cost: 0.693147
Epoch  100/1000 
 Hypothesis: tensor([0.0245, 0.1484, 0.2770, 0.7954, 0.9484, 0.9834]) 
 Cost: 0.134722
Epoch  200/1000 
 Hypothesis: tensor([0.0080, 0.1065, 0.1632, 0.8566, 0.9769, 0.9931]) 
 Cost: 0.080643
Epoch  300/1000 
 Hypothesis: tensor([0.0037, 0.0822, 0.1161, 0.8888, 0.9869, 0.9965]) 
 Cost: 0.057900
Epoch  400/1000 
 Hypothesis: tensor([0.0021, 0.0669, 0.0902, 0.9090, 0.9916, 0.9979]) 
 Cost: 0.045300
Epoch  500/1000 
 Hypothesis: tensor([0.0013, 0.0564, 0.0739, 0.9229, 0.9941, 0.9986]) 
 Cost: 0.037261
Epoch  600/1000 
 Hypothesis: tensor([8.7256e-04, 4.8759e-02, 6.2629e-02, 9.3312e-01, 9.9567e-01, 9.9906e-01]) 
 Cost: 0.031673
Epoch  700/1000 
 Hypothesis: tensor([6.2087e-04, 4.2945e-02, 5.4368e-02, 9.4091e-01, 9.9668e-01, 9.9932e-01]) 
 Cost: 0.027556
Epoch  800/1000 
 Hypothesis: tensor([4.6039e-04, 3.8371e-02, 4.8050e-02, 9.4706e-01, 9.9737e-01, 9.9949e-01]) 
 Cost: 0

## Loading Real Data

In [283]:
import numpy as np

In [284]:
xy = np.loadtxt('../datasets/data-03-diabetes.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]

x_train = torch.FloatTensor(x_data)
y_train = torch.FloatTensor(y_data)

In [285]:
print(x_train.shape)
print(y_train.shape)

torch.Size([759, 8])
torch.Size([759, 1])


In [286]:
print(x_train[:5])
print(y_train[:5])

tensor([[-0.2941,  0.4874,  0.1803, -0.2929,  0.0000,  0.0015, -0.5312, -0.0333],
        [-0.8824, -0.1457,  0.0820, -0.4141,  0.0000, -0.2072, -0.7669, -0.6667],
        [-0.0588,  0.8392,  0.0492,  0.0000,  0.0000, -0.3055, -0.4927, -0.6333],
        [-0.8824, -0.1055,  0.0820, -0.5354, -0.7778, -0.1624, -0.9240,  0.0000],
        [ 0.0000,  0.3769, -0.3443, -0.2929, -0.6028,  0.2846,  0.8873, -0.6000]])
tensor([[0.],
        [1.],
        [0.],
        [1.],
        [0.]])


## Training with Real Data using low-level Binary Cross Entropy Loss

In [287]:
#Model Initialize
W = torch.zeros((8, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

#Set optimizer
optimizer = optim.SGD([W, b], lr=1)

nb_epochs = 1000

for epoch in range(nb_epochs + 1):

    #Hypothesis
    pred = torch.sigmoid(x_train.matmul(W) + b)

    #Cost Function
    cost = -(y_train * torch.log(pred) + (1 - y_train) * torch.log(1 - pred)).mean()

    #Reduce Cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    if epoch % 10 == 0:
        print('Epoch {:4d}/{} Cost: {:.6f}'.format(epoch, nb_epochs, cost.item()))

Epoch    0/1000 Cost: 0.693147
Epoch   10/1000 Cost: 0.572727
Epoch   20/1000 Cost: 0.539493
Epoch   30/1000 Cost: 0.519708
Epoch   40/1000 Cost: 0.507066
Epoch   50/1000 Cost: 0.498539
Epoch   60/1000 Cost: 0.492549
Epoch   70/1000 Cost: 0.488209
Epoch   80/1000 Cost: 0.484985
Epoch   90/1000 Cost: 0.482543
Epoch  100/1000 Cost: 0.480661
Epoch  110/1000 Cost: 0.479189
Epoch  120/1000 Cost: 0.478023
Epoch  130/1000 Cost: 0.477088
Epoch  140/1000 Cost: 0.476331
Epoch  150/1000 Cost: 0.475711
Epoch  160/1000 Cost: 0.475198
Epoch  170/1000 Cost: 0.474771
Epoch  180/1000 Cost: 0.474411
Epoch  190/1000 Cost: 0.474107
Epoch  200/1000 Cost: 0.473846
Epoch  210/1000 Cost: 0.473622
Epoch  220/1000 Cost: 0.473428
Epoch  230/1000 Cost: 0.473259
Epoch  240/1000 Cost: 0.473111
Epoch  250/1000 Cost: 0.472980
Epoch  260/1000 Cost: 0.472864
Epoch  270/1000 Cost: 0.472761
Epoch  280/1000 Cost: 0.472669
Epoch  290/1000 Cost: 0.472586
Epoch  300/1000 Cost: 0.472511
Epoch  310/1000 Cost: 0.472444
Epoch  3

## Training with Real Data using ```F.binary_cross_entropy```

In [288]:
# Model Initialize
W = torch.zeros((8, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

#Set Optimizer
optimizer = optim.SGD([W, b], lr=1)

nb_epochs = 1000

for epoch in range(nb_epochs + 1):

    #Hypothesis
    pred = torch.sigmoid(x_train.matmul(W) + b)

    #Cost
    cost = F.binary_cross_entropy(pred, y_train)

    #Reduce Cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    if epoch % 10 == 0:
        print('Epoch {:4d}/{} \n Cost: {:.6f}'.format(epoch, nb_epochs, cost.item()))

Epoch    0/1000 
 Cost: 0.693147
Epoch   10/1000 
 Cost: 0.572727
Epoch   20/1000 
 Cost: 0.539493
Epoch   30/1000 
 Cost: 0.519708
Epoch   40/1000 
 Cost: 0.507066
Epoch   50/1000 
 Cost: 0.498539
Epoch   60/1000 
 Cost: 0.492549
Epoch   70/1000 
 Cost: 0.488209
Epoch   80/1000 
 Cost: 0.484985
Epoch   90/1000 
 Cost: 0.482543
Epoch  100/1000 
 Cost: 0.480661
Epoch  110/1000 
 Cost: 0.479189
Epoch  120/1000 
 Cost: 0.478023
Epoch  130/1000 
 Cost: 0.477088
Epoch  140/1000 
 Cost: 0.476331
Epoch  150/1000 
 Cost: 0.475711
Epoch  160/1000 
 Cost: 0.475198
Epoch  170/1000 
 Cost: 0.474771
Epoch  180/1000 
 Cost: 0.474411
Epoch  190/1000 
 Cost: 0.474107
Epoch  200/1000 
 Cost: 0.473846
Epoch  210/1000 
 Cost: 0.473622
Epoch  220/1000 
 Cost: 0.473428
Epoch  230/1000 
 Cost: 0.473259
Epoch  240/1000 
 Cost: 0.473111
Epoch  250/1000 
 Cost: 0.472980
Epoch  260/1000 
 Cost: 0.472864
Epoch  270/1000 
 Cost: 0.472761
Epoch  280/1000 
 Cost: 0.472669
Epoch  290/1000 
 Cost: 0.472586
Epoch  300

## Checking the Accuracy of Our Model

After we finish training the model, we want to check how well our model fits the training set.

In [289]:
hypothesis = torch.sigmoid(x_train.matmul(W) + b)
print(hypothesis[:5])

tensor([[0.3555],
        [0.9564],
        [0.1934],
        [0.9623],
        [0.0706]], grad_fn=<SliceBackward>)


We can convert the hypothesis values (real numbers between 0 and 1) to binary predictions (either 0 or 1) by comparing them to 0.5.

In [290]:
prediction = hypothesis >= torch.FloatTensor([0.5])
print(prediction[:5].float())

tensor([[0.],
        [1.],
        [0.],
        [1.],
        [0.]])


Then, we compare it with the correct labels y_train.

In [291]:
print(prediction[:5].float())
print(y_train[:5])

tensor([[0.],
        [1.],
        [0.],
        [1.],
        [0.]])
tensor([[0.],
        [1.],
        [0.],
        [1.],
        [0.]])


In [292]:
correct_prediction = prediction.float() == y_train
print(correct_prediction[:5])

tensor([[True],
        [True],
        [True],
        [True],
        [True]])


Finally, we can calculate the accuracy by counting the number of correct predictions and dividing by the total number of predictions.

### Take a Moment!

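Why do we call tensor.item()?

print(tensor) shows the wrapped form, tensor(value), while print(tensor.item()) shows the plain Python number, value. Here correct_prediction.sum() is a one-element tensor, so .item() extracts the count as a Python number before we divide.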

In [293]:
accuracy = correct_prediction.sum().item() / len(correct_prediction)

print('The model has an accuracy of {:2.2f}% for the training set.'.format(accuracy * 100))

The model has an accuracy of 76.94% for the training set.
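
The thresholding and counting steps can be cross-checked in plain Python on the five rows printed earlier. This is only a sketch on that small slice, using list comprehensions in place of tensor comparisons; because it covers five examples rather than all 759, its accuracy is not comparable to the 76.94% above.

```python
# First five predicted probabilities and labels, copied from the outputs above.
hypothesis = [0.3555, 0.9564, 0.1934, 0.9623, 0.0706]
y_true = [0.0, 1.0, 0.0, 1.0, 0.0]

# Threshold at 0.5, then compare each prediction with its label.
prediction = [1.0 if h >= 0.5 else 0.0 for h in hypothesis]
correct = [p == y for p, y in zip(prediction, y_true)]
accuracy = sum(correct) / len(correct)  # True counts as 1, False as 0

print(prediction, accuracy)
```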


## Optional: High-level Implementation with ```nn.Module```

In [294]:
class BinaryClassfier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.linear(x))

In [295]:
model = BinaryClassfier()

In [296]:
#Set Optimizer
optimizer = optim.SGD(model.parameters(), lr=1)

nb_epochs = 100

for epoch in range(nb_epochs + 1):

    #Hypothesis
    pred = model(x_train)

    #Cost
    cost = F.binary_cross_entropy(pred, y_train)

    #Reduce Cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    if epoch % 10 == 0:
        prediction = pred >= torch.FloatTensor([0.5])
        correct_prediction = prediction.float() == y_train
        accuracy = correct_prediction.sum().item() / len(correct_prediction)
        print('Epoch {:4d}/{} Cost: {:.6f} Accuracy: {:2.2f}'.format(epoch, nb_epochs, cost.item(), accuracy * 100))

Epoch    0/100 Cost: 0.704829 Accuracy: 45.72
Epoch   10/100 Cost: 0.572392 Accuracy: 67.59
Epoch   20/100 Cost: 0.539563 Accuracy: 73.25
Epoch   30/100 Cost: 0.520041 Accuracy: 75.89
Epoch   40/100 Cost: 0.507561 Accuracy: 76.15
Epoch   50/100 Cost: 0.499125 Accuracy: 76.42
Epoch   60/100 Cost: 0.493177 Accuracy: 77.21
Epoch   70/100 Cost: 0.488846 Accuracy: 76.81
Epoch   80/100 Cost: 0.485612 Accuracy: 76.28
Epoch   90/100 Cost: 0.483146 Accuracy: 76.55
Epoch  100/100 Cost: 0.481234 Accuracy: 76.81
