# Lab 6.1: Softmax Classification

Edited By Steve Ive

Reference From Seungjae Lee.

https://github.com/deeplearningzerotoall/PyTorch/blob/master/lab-06_1_softmax_classification.ipynb

## Imports

In [97]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [98]:
torch.manual_seed(1)

<torch._C.Generator at 0x23a6f9603f0>

## Softmax

Convert numbers to probabilities with softmax.

$ P(class=i) = \frac{e^{z_i}}{\sum_j e^{z_j}} $

In [99]:
z = torch.FloatTensor([1, 2, 3])

PyTorch has a ```softmax``` function.

In [100]:
hypothesis = F.softmax(z, dim = 0)
print(hypothesis)

tensor([0.0900, 0.2447, 0.6652])


Since they are probabilities, they should add up to 1. Let's do a sanity check.

In [101]:
hypothesis.sum()

tensor(1.)
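
As a minimal sketch of what ```F.softmax``` computes, the formula can be written out by hand and compared with the built-in. The helper name `manual_softmax` is made up for illustration and is not part of PyTorch; subtracting the maximum before exponentiating is a common stability trick that leaves the result unchanged.

```python
import torch
import torch.nn.functional as F

def manual_softmax(z, dim=0):
    # Hypothetical helper: exponentiate, then normalize along `dim`.
    # Subtracting the max first avoids overflow without changing the result.
    shifted = z - z.max(dim=dim, keepdim=True).values
    exp_z = torch.exp(shifted)
    return exp_z / exp_z.sum(dim=dim, keepdim=True)

z = torch.tensor([1.0, 2.0, 3.0])
manual = manual_softmax(z)
builtin = F.softmax(z, dim=0)

print(manual)
print(torch.allclose(manual, builtin))
```

If the two results agree, the hand-written version is doing the same normalization as ```F.softmax``` for this input.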

## Cross Entropy Loss (Low-level)

For multi-class classification, we use the cross entropy loss.

$ L = \frac{1}{N} \sum_{n=1}^{N} \sum_{c} - y_{n,c} \log(\hat{y}_{n,c}) $

where $\hat{y}_{n,c}$ is the predicted probability of class $c$ for sample $n$, and $y_{n,c}$ is the correct probability (0 or 1).

### Create Arbitrary Softmax Values

In [102]:
z = torch.rand(3, 5, requires_grad = True)
hypothesis = F.softmax(z, dim = 1)
print(hypothesis)

tensor([[0.2645, 0.1639, 0.1855, 0.2585, 0.1277],
        [0.2430, 0.1624, 0.2322, 0.1930, 0.1694],
        [0.2226, 0.1986, 0.2326, 0.1594, 0.1868]], grad_fn=<SoftmaxBackward>)


### Take a Moment!

**TORCH.RANDINT**

torch.randint(low=0, high, size, *, generator=None, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a tensor filled with random integers generated uniformly between low (inclusive) and high (exclusive).

The shape of the tensor is defined by the variable argument size.

Below, the first argument (```high```) is set to 5 because ```z = torch.rand(3, 5, requires_grad=True)``` above has 5 classes, so the labels are drawn from 0 to 4.


In [103]:
y = torch.randint(5, (3,)).long()
print(y)

tensor([0, 2, 1])


### Create Arbitrary Correct Values

In [104]:
y_one_hot = torch.zeros_like(hypothesis)
y_one_hot.scatter_(1, y.unsqueeze(1), 1)

tensor([[1., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 1., 0., 0., 0.]])
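
The ```scatter_``` call above writes a 1 into column `y[i]` of row `i`. As a sketch, the same tensor can be built with ```F.one_hot```, which returns a LongTensor and therefore needs a cast to float before comparing. The variable names `via_scatter` and `via_one_hot` are illustrative only.

```python
import torch
import torch.nn.functional as F

y = torch.tensor([0, 2, 1])   # class indices, matching the printed y above
num_classes = 5

# scatter_ writes the value 1 at column y[i] of row i
via_scatter = torch.zeros(len(y), num_classes)
via_scatter.scatter_(1, y.unsqueeze(1), 1)

# F.one_hot returns integers, so cast to float to compare
via_one_hot = F.one_hot(y, num_classes=num_classes).float()

print(via_scatter)
print(torch.equal(via_scatter, via_one_hot))
```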

### Get Loss

In [105]:
cost = (y_one_hot * -torch.log(hypothesis)).sum(dim = 1).mean()
print(cost)

tensor(1.4689, grad_fn=<MeanBackward0>)
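
Because each row of the one-hot target contains a single 1, the sum over classes simply selects the predicted probability of the correct class. The sketch below checks this with ```gather```. It re-creates its own `z` and `y`, so the printed numbers are not expected to match the cell above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(1)
z = torch.rand(3, 5)
y = torch.tensor([0, 2, 1])
hypothesis = F.softmax(z, dim=1)

# One-hot formulation, as in the cell above
y_one_hot = F.one_hot(y, num_classes=5).float()
cost_one_hot = (y_one_hot * -torch.log(hypothesis)).sum(dim=1).mean()

# Equivalent: pick the probability of the correct class in each row
picked = hypothesis.gather(1, y.unsqueeze(1)).squeeze(1)
cost_gather = -torch.log(picked).mean()

print(cost_one_hot, cost_gather)
print(torch.allclose(cost_one_hot, cost_gather))
```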


## Cross-entropy Loss with ```torch.nn.functional```

PyTorch has an ```F.log_softmax()``` function.

In [106]:
# Low-level loss with low-level log-softmax
print((y_one_hot * -torch.log(F.softmax(z, dim=1))).sum(dim=1).mean())

# Low-level loss with high-level log-softmax
print((y_one_hot * -F.log_softmax(z, dim=1)).sum(dim=1).mean())

tensor(1.4689, grad_fn=<MeanBackward0>)
tensor(1.4689, grad_fn=<MeanBackward0>)


PyTorch also has an ```F.nll_loss()``` function that computes the negative log likelihood.

In [107]:
# High Level Loss with F.nll_loss(F.log_softmax)
F.nll_loss(F.log_softmax(z, dim =1), y)

tensor(1.4689, grad_fn=<NllLossBackward>)

PyTorch also has ```F.cross_entropy``` that combines ```F.log_softmax()``` and ```F.nll_loss()```.

In [108]:
# High Level Loss with F.cross_entropy
F.cross_entropy(z, y)

tensor(1.4689, grad_fn=<NllLossBackward>)
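
One practical reason to prefer ```F.log_softmax()``` or ```F.cross_entropy``` over taking ```torch.log``` of a softmax is numerical stability. The sketch below uses deliberately extreme logits (`z_big` and `y_big` are made-up values for illustration): the naive form underflows to `log(0)`, while the fused forms stay finite.

```python
import torch
import torch.nn.functional as F

z_big = torch.tensor([[1000.0, 0.0]])   # illustrative extreme logits
y_big = torch.tensor([1])               # the correct class is the unlikely one

naive = torch.log(F.softmax(z_big, dim=1))   # second probability underflows to 0
stable = F.log_softmax(z_big, dim=1)         # computed in log space

print(naive)
print(stable)
print(F.cross_entropy(z_big, y_big))
print(F.nll_loss(stable, y_big))
```

With the naive form, a loss built on these values would be infinite and its gradient unusable, which is why the high-level functions are the usual choice in training code.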

## Training with Low-Level Cross Entropy Loss

Toy data for practice.

In [109]:
x_train = [[1, 2, 1, 1],
           [2, 1, 3, 2],
           [3, 1, 3, 4],
           [4, 1, 5, 5],
           [1, 7, 5, 5],
           [1, 2, 5, 6],
           [1, 6, 6, 7],
           [1, 7, 7, 7]]

y_train = [2, 2, 2, 1, 1, 1, 0, 0]
x_train = torch.FloatTensor(x_train)
y_train = torch.LongTensor(y_train)

### Take a Moment!

**TORCH.NN.FUNCTIONAL.ONE_HOT**

torch.nn.functional.one_hot(tensor, num_classes=-1) → LongTensor

Takes a LongTensor with index values of shape (*) and returns a tensor of shape (*, num_classes) that has zeros everywhere except where the index of the last dimension matches the corresponding value of the input tensor, in which case it will be 1.
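
One detail worth seeing in code: with the default `num_classes=-1`, ```F.one_hot``` infers the width from the largest index present in the input. The sketch below, using a made-up `preds` tensor, shows how the inferred width can come out smaller than the real number of classes when some classes never appear.

```python
import torch
import torch.nn.functional as F

# Illustrative case: every sample is predicted as class 0
preds = torch.tensor([0, 0, 0])

inferred = F.one_hot(preds)                  # width inferred as max index + 1
explicit = F.one_hot(preds, num_classes=3)   # width fixed to the real class count

print(inferred.shape)
print(explicit.shape)
```

Passing `num_classes` explicitly keeps the shape fixed regardless of which classes happen to appear.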

In [144]:
# Initialize the model parameters
W = torch.zeros((4, 3), requires_grad=True)
b = torch.zeros(1, requires_grad=True)  # a single bias, broadcast across all classes

# Set the optimizer
optimizer = optim.SGD([W, b], lr=0.1)

nb_epochs = 10000

for epoch in range(nb_epochs + 1):

    # Hypothesis
    pred = F.softmax(x_train.matmul(W) + b, dim=1)

    # One-hot encoding of the labels
    y_one_hot = torch.zeros_like(pred)
    y_one_hot = y_one_hot.scatter_(1, y_train.unsqueeze(1), 1)
    # y_one_hot holds the same values as F.one_hot(y_train), as floats

    # Cost
    cost = (y_one_hot * -torch.log(pred)).sum(dim=1).mean()

    # Reduce the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    if epoch % 1000 == 0:
        # Note: this compares one-hot entries element-wise, so matching 0s are
        # counted as well as matching 1s; it is not the per-sample accuracy.
        x_one_hot = F.one_hot(torch.argmax(pred, dim=1))
        accuracy = (x_one_hot == y_one_hot).float().mean() * 100
        print('Epoch: {}/{} Accuracy: {:6f} Cost:{:.6f}'.format(epoch, nb_epochs, accuracy, cost.item()))

Epoch: 0/10000 Accuracy: 33.333336 Cost:1.098612
Epoch: 1000/10000 Accuracy: 75.000000 Cost:0.585815
Epoch: 2000/10000 Accuracy: 91.666672 Cost:0.405933
Epoch: 3000/10000 Accuracy: 100.000000 Cost:0.245907
Epoch: 4000/10000 Accuracy: 100.000000 Cost:0.207278
Epoch: 5000/10000 Accuracy: 100.000000 Cost:0.178310
Epoch: 6000/10000 Accuracy: 100.000000 Cost:0.155908
Epoch: 7000/10000 Accuracy: 100.000000 Cost:0.138157
Epoch: 8000/10000 Accuracy: 100.000000 Cost:0.123801
Epoch: 9000/10000 Accuracy: 100.000000 Cost:0.111988
Epoch: 10000/10000 Accuracy: 100.000000 Cost:0.102121
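
The accuracy printed by the training loop compares one-hot vectors element by element, so agreeing zeros raise the figure. A more conventional per-sample accuracy compares predicted class indices with the labels directly. The sketch below contrasts the two on a small made-up `pred` tensor (the values are illustrative, not taken from the training run).

```python
import torch
import torch.nn.functional as F

# Illustrative predicted probabilities for 4 samples and 3 classes
pred = torch.tensor([[0.7, 0.2, 0.1],
                     [0.1, 0.3, 0.6],
                     [0.2, 0.5, 0.3],
                     [0.6, 0.3, 0.1]])
y = torch.tensor([0, 2, 2, 1])

pred_class = pred.argmax(dim=1)

# Per-sample accuracy: fraction of samples whose predicted class is correct
accuracy = (pred_class == y).float().mean() * 100

# Element-wise comparison of one-hot vectors, as in the loop above
elementwise = (F.one_hot(pred_class, num_classes=3)
               == F.one_hot(y, num_classes=3)).float().mean() * 100

print(accuracy.item())
print(elementwise.item())
```

Here two of the four samples are classified correctly, yet the element-wise figure is higher because each wrong sample still agrees with its target in one of three positions.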


## High-level Implementation with ```nn.Module```

In [146]:
class SoftmaxClassifierModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 3)

    def forward(self, x):
        return self.linear(x)

In [147]:
model = SoftmaxClassifierModel()

In [153]:
# Set the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)

nb_epochs = 10000

for epoch in range(nb_epochs + 1):

    # Prediction (raw logits; F.cross_entropy applies log_softmax internally)
    pred = model(x_train)

    # Cost
    cost = F.cross_entropy(pred, y_train)

    # Reduce the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    if epoch % 1000 == 0:
        # Note: element-wise comparison of one-hot vectors, as in the low-level loop;
        # matching 0s are counted too, so this is not the per-sample accuracy.
        accuracy = (F.one_hot(torch.argmax(F.log_softmax(pred, dim=1), dim=1)) == F.one_hot(y_train)).float().mean() * 100
        print('Epoch {:2d}/{} Accuracy: {} Cost: {:.6f}'.format(epoch, nb_epochs, accuracy, cost))

Epoch  0/10000 Accuracy: 66.66667175292969 Cost: 0.932692
Epoch 1000/10000 Accuracy: 83.33332824707031 Cost: 0.334209
Epoch 2000/10000 Accuracy: 100.0 Cost: 0.173013
Epoch 3000/10000 Accuracy: 100.0 Cost: 0.127700
Epoch 4000/10000 Accuracy: 100.0 Cost: 0.100328
Epoch 5000/10000 Accuracy: 100.0 Cost: 0.082227
Epoch 6000/10000 Accuracy: 100.0 Cost: 0.069463
Epoch 7000/10000 Accuracy: 100.0 Cost: 0.060020
Epoch 8000/10000 Accuracy: 100.0 Cost: 0.052773
Epoch 9000/10000 Accuracy: 100.0 Cost: 0.047048
Epoch 10000/10000 Accuracy: 100.0 Cost: 0.042417
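
Since the model returns raw logits, predictions at inference time need an explicit ```F.softmax``` (for probabilities) or just an ```argmax``` (for class labels). The following is a minimal sketch under a few assumptions: it uses a bare ```nn.Linear(4, 3)``` in place of `SoftmaxClassifierModel`, trains for a shortened 1000 steps, and reuses the training data as input purely for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

x_train = torch.FloatTensor([[1, 2, 1, 1], [2, 1, 3, 2], [3, 1, 3, 4], [4, 1, 5, 5],
                             [1, 7, 5, 5], [1, 2, 5, 6], [1, 6, 6, 7], [1, 7, 7, 7]])
y_train = torch.LongTensor([2, 2, 2, 1, 1, 1, 0, 0])

model = nn.Linear(4, 3)   # stand-in for the linear layer wrapped by the class above
optimizer = optim.SGD(model.parameters(), lr=0.1)

for _ in range(1000):     # shortened run, for illustration only
    loss = F.cross_entropy(model(x_train), y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Inference: no gradients needed
with torch.no_grad():
    logits = model(x_train)
    probs = F.softmax(logits, dim=1)     # each row sums to 1
    pred_class = probs.argmax(dim=1)     # same result as logits.argmax(dim=1)

print(pred_class)
print((pred_class == y_train).float().mean().item())
```

Softmax preserves the ordering of the logits, so taking ```argmax``` on the logits directly gives the same class labels; the softmax is only needed when the probabilities themselves are of interest.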
