### Softmax and Cross Entropy 

#### Let's practice building softmax from scratch!

We start with NumPy.

In [1]:
import torch
import torch.nn as nn
import numpy as np

def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)

x = np.array([2.0, 1.0, 0.1])
output = softmax(x)
print('softmax numpy:' , output)

softmax numpy: [0.65900114 0.24243297 0.09856589]
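
One caveat with the version above: `np.exp` overflows for large inputs. A common remedy, sketched here assuming a 1-D input array, is to subtract the maximum before exponentiating. The shift cancels in the ratio, so the result is unchanged. The name `stable_softmax` is ours, purely for illustration.

```python
import numpy as np

def stable_softmax(x):
    # Shifting by the max leaves the result unchanged but keeps exp() from overflowing
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

small = stable_softmax(np.array([2.0, 1.0, 0.1]))
large = stable_softmax(np.array([1000.0, 1001.0, 1002.0]))
print('stable softmax (small inputs):', small)
print('stable softmax (large inputs):', large)
```

For the small inputs this should match the plain `softmax` output above; for the large inputs the plain version would produce `nan` values, while this one stays finite.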


#### Now let's call the built-in softmax in PyTorch.

It's pretty much the same.

In [2]:
x = torch.tensor([2.0, 1.0, 0.1])
output = torch.softmax(x, dim=0)
print('pytorch built in softmax:', output)

pytorch built in softmax: tensor([0.6590, 0.2424, 0.0986])
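
The `dim` argument matters once we have more than one sample. As a small sketch, assuming a made-up batch of two samples with three classes each, `dim=1` normalizes across the class axis so that every row sums to 1:

```python
import torch

# Hypothetical batch: 2 samples, 3 classes each
batch = torch.tensor([[2.0, 1.0, 0.1],
                      [0.5, 2.0, 0.3]])

# dim=1 normalizes across the class axis, so each row is its own distribution
probs = torch.softmax(batch, dim=1)
print('per-sample probabilities:', probs)
print('row sums:', probs.sum(dim=1))
```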


#### Now let's see how to handle cross entropy. 

We start with NumPy again and build it ourselves.
Cross entropy: `-1/n * sum(truth_distribution * log(prediction_distribution))`, where `n` is the number of samples.

In [6]:
import torch
import torch.nn as nn
import numpy as np

def cross_entropy(actual, predicted):
    loss = -np.sum(actual * np.log(predicted))
    # We skip the averaging step: there is only one sample here, and dividing
    # by a constant would not change which prediction minimizes the loss.
    return loss

# y must be one-hot encoded. For example
# if class 0: [1, 0, 0]
# if class 1: [0, 1, 0]
# if class 2: [0, 0, 1]
Y = np.array([1, 0, 0])

# y_pred comes in the form of probabilities
Y_pred_good = np.array([0.7, 0.2, 0.1])
Y_pred_bad = np.array([0.1, 0.3, 0.6])
loss_good = cross_entropy(Y, Y_pred_good)
loss_bad = cross_entropy(Y, Y_pred_bad)
print(f'Loss 1 numpy: {loss_good:.4f}')
print(f'Loss 2 numpy: {loss_bad:.4f}')


Loss 1 numpy: 0.3567
Loss 2 numpy: 2.3026
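
With a one-hot target, only the true class contributes, so the two values above are simply `-log(0.7)` and `-log(0.1)`. For several samples at once, a minimal sketch might sum over classes and then average over samples. The function name `batch_cross_entropy` and the clipping constant `eps` are our own illustrative choices, not part of the code above.

```python
import numpy as np

def batch_cross_entropy(actual, predicted, eps=1e-12):
    # Clip to avoid log(0); eps is an illustrative choice, not a canonical value
    predicted = np.clip(predicted, eps, 1.0)
    # Sum over classes (axis=1), then average over samples
    per_sample = -np.sum(actual * np.log(predicted), axis=1)
    return per_sample.mean()

# Two samples, both belonging to class 0
Y_batch = np.array([[1, 0, 0],
                    [1, 0, 0]])
Y_pred_batch = np.array([[0.7, 0.2, 0.1],
                         [0.1, 0.3, 0.6]])

batch_loss = batch_cross_entropy(Y_batch, Y_pred_batch)
print(f'Mean loss over the batch: {batch_loss:.4f}')
```

The result is the average of the two single-sample losses printed above.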


#### Again, let's now do this with PyTorch.

*Note:* PyTorch's built-in cross entropy loss, **nn.CrossEntropyLoss()**, already applies the usual softmax layer internally. The predictions we pass in must be raw logit values, not softmax probabilities, and the targets used here are class indices, not one-hot encoded vectors. So if we use it, we don't one-hot encode and we don't apply softmax; we let PyTorch do it.

nn.CrossEntropyLoss() is really nn.LogSoftmax + nn.NLLLoss (negative log likelihood loss)

**No One-Hot and No Softmax layer if we use pytorch Cross Entropy**
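
To make the LogSoftmax + NLLLoss claim concrete, here is a small check on a single made-up sample. It is only a sketch: both paths should produce the same number up to floating-point error.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 1.0, 0.1]])  # raw scores, no softmax applied
target = torch.tensor([0])                # class index, not one-hot

# Path 1: the combined loss
ce = nn.CrossEntropyLoss()(logits, target)

# Path 2: log-softmax over the class axis, then negative log likelihood
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, target)

print('CrossEntropyLoss:    ', ce.item())
print('LogSoftmax + NLLLoss:', nll.item())
```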

In [11]:
import torch
import torch.nn as nn
import numpy as np

loss = nn.CrossEntropyLoss()

# We just have one sample for now, and 3 possible labels
Y = torch.tensor([0])

# output should be nsamples x nclasses = 1 * 3
Y_pred_good = torch.tensor([[2.0, 1.0, 0.1]])
Y_pred_bad = torch.tensor([[0.5, 2.0, 0.3]])

l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)

print('loss good', l1.item())
print('loss bad', l2.item())

_, pred1 = torch.max(Y_pred_good, 1)
_, pred2 = torch.max(Y_pred_bad, 1)

print('good prediction:', pred1)
print('bad prediction:', pred2)

loss good 0.4170299470424652
loss bad 1.840616226196289
good prediction: tensor([0])
bad prediction: tensor([1])
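
As a sanity check, the same loss values can be reproduced with the from-scratch NumPy functions from earlier: apply softmax to the logits ourselves, one-hot encode the label ourselves, then take the cross entropy. This is a sketch of the equivalence, not how PyTorch computes it internally.

```python
import numpy as np

def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)

def cross_entropy(actual, predicted):
    return -np.sum(actual * np.log(predicted))

logits_good = np.array([2.0, 1.0, 0.1])
logits_bad = np.array([0.5, 2.0, 0.3])
one_hot = np.array([1, 0, 0])  # class 0, encoded by hand

# Softmax first, then cross entropy against the one-hot label
manual_good = cross_entropy(one_hot, softmax(logits_good))
manual_bad = cross_entropy(one_hot, softmax(logits_bad))
print(f'manual loss good: {manual_good:.4f}')
print(f'manual loss bad: {manual_bad:.4f}')
```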


Let's beef it up and handle three samples instead of one with PyTorch. It's very simple.

In [13]:
import torch
import torch.nn as nn
import numpy as np

loss = nn.CrossEntropyLoss()

# We now have three samples, each with 3 possible labels
Y = torch.tensor([2, 0, 1])

# output should be nsamples x nclasses = 3 x 3
Y_pred_good = torch.tensor([[0.1, 1.0, 2.1], [2.0, 1.0, 0.1], [0.1, 3.0, 0.1]])
Y_pred_bad = torch.tensor([[2.1, 1.0, 0.1], [0.5, 2.0, 0.3],[0.1, 3.0, 0.1]])

l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)

print('loss good', l1.item())
print('loss bad', l2.item())

_, pred1 = torch.max(Y_pred_good, 1)
_, pred2 = torch.max(Y_pred_bad, 1)

print('good prediction:', pred1)
print('bad prediction:', pred2)

loss good 0.3018244206905365
loss bad 1.4430197477340698
good prediction: tensor([2, 0, 1])
bad prediction: tensor([0, 1, 1])
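
The single number returned for a batch is the mean of the per-sample losses, which is the default `reduction='mean'` behavior. Passing `reduction='none'` exposes the individual values, as this small sketch shows:

```python
import torch
import torch.nn as nn

Y = torch.tensor([2, 0, 1])
Y_pred_good = torch.tensor([[0.1, 1.0, 2.1], [2.0, 1.0, 0.1], [0.1, 3.0, 0.1]])

# reduction='none' keeps one loss value per sample
per_sample = nn.CrossEntropyLoss(reduction='none')(Y_pred_good, Y)
# The default reduction averages over the batch
mean_loss = nn.CrossEntropyLoss()(Y_pred_good, Y)

print('per-sample losses:', per_sample)
print('their mean:', per_sample.mean().item())
print('default loss:', mean_loss.item())
```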


### Now let's put all these together, in a multi-class classification problem.

In [21]:
import torch
import torch.nn as nn

class NNMultiClassification(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NNMultiClassification, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, num_classes)
        
    def forward(self, x):
        
        # Note we don't explicitly do any one-hot encoding
        
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        
        # Note we don't have a softmax layer here at the very end
        
        return out
    
model = NNMultiClassification(input_size=28*28, hidden_size = 5, num_classes=3)
criterion = nn.CrossEntropyLoss() ## remember this applies softmax for us
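
To see the pieces working together, here is a sketch of a single training step. The random input data, the SGD optimizer, and the learning rate are all assumptions made for illustration; they stand in for a real dataset and tuned settings.

```python
import torch
import torch.nn as nn

class NNMultiClassification(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Returns raw logits; no softmax here
        return self.linear2(self.relu(self.linear1(x)))

torch.manual_seed(0)
model = NNMultiClassification(input_size=28*28, hidden_size=5, num_classes=3)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # illustrative choice

# Dummy data (assumption): 4 flattened 28x28 "images" with random class indices
images = torch.randn(4, 28*28)
labels = torch.randint(0, 3, (4,))

logits = model(images)            # shape (4, 3), raw scores
loss = criterion(logits, labels)  # softmax is applied inside the loss

optimizer.zero_grad()
loss.backward()
optimizer.step()

# Softmax is only needed if we want actual probabilities, e.g. at inference time
with torch.no_grad():
    probs = torch.softmax(model(images), dim=1)
    predicted = probs.argmax(dim=1)

print('loss:', loss.item())
print('predicted classes:', predicted)
```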

### We can easily turn this multi-class classification into a binary classification problem

Say we want to tell whether a picture is a cat. All we have to do is change the final linear layer to an output size of 1 and apply a sigmoid function to turn that output into a single probability, which we read as "yes" if it is greater than 0.5 and "no" otherwise.

Note we have to **explicitly** add the sigmoid layer here, unlike in multi-class classification with cross entropy loss, where PyTorch applies softmax for us.

And we can use **nn.BCELoss()** as the binary classification loss function.



In [22]:
import torch
import torch.nn as nn

class NNBinaryClassification(nn.Module):
    # num_classes is dropped here: the output size is fixed at 1
    def __init__(self, input_size, hidden_size):
        super(NNBinaryClassification, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, 1) # Note the output size is now set to 1
        
    def forward(self, x):
        
        # Note we don't explicitly do any one-hot encoding
        
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        
        # Note we have to explicitly build our sigmoid layer
        out = torch.sigmoid(out)
        
        return out
    
model = NNBinaryClassification(input_size=28*28, hidden_size=5)
criterion = nn.BCELoss()
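
A short sketch of how this loss is used, with made-up raw outputs standing in for the final linear layer. Note that `nn.BCELoss()` expects float targets with the same shape as the predictions. PyTorch also provides `nn.BCEWithLogitsLoss()`, which applies the sigmoid internally; with it, the model would return the raw output and skip the explicit sigmoid layer. The two should agree up to floating-point error.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical raw outputs of the final linear layer (before sigmoid), 4 samples
raw = torch.randn(4, 1)
# Float targets with the same shape as the predictions: 1.0 = cat, 0.0 = not a cat
targets = torch.tensor([[1.0], [0.0], [1.0], [0.0]])

# Explicit sigmoid followed by BCELoss, as in the model above
probs = torch.sigmoid(raw)
bce = nn.BCELoss()(probs, targets)

# Alternative: BCEWithLogitsLoss takes the raw outputs directly
bce_logits = nn.BCEWithLogitsLoss()(raw, targets)

# Threshold at 0.5 to get a yes/no answer
is_cat = probs > 0.5

print('BCELoss:', bce.item())
print('BCEWithLogitsLoss:', bce_logits.item())
print('is cat:', is_cat.squeeze(1))
```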