## Cross-Entropy
(based on a tutorial by Python Engineer on YouTube)

**Cross Entropy**


![title](img/soft-CE.png)

source: https://levelup.gitconnected.com/killer-combo-softmax-and-cross-entropy-5907442f60ba

Its formula is:

![title](img/formula.png)

source: https://levelup.gitconnected.com/grokking-the-cross-entropy-loss-cda6eb9ec307
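
For reference, here is the formula written out for a single sample. This assumes the image shows the standard categorical cross-entropy, which is what the NumPy implementation in the next part computes. The symbols are my own notation: $C$ is the number of classes, $y_i$ the one-hot target and $\hat{y}_i$ the predicted probability for class $i$.

```latex
L(y, \hat{y}) = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)
```

Because $y$ is one-hot, only the term of the correct class survives. For example, with $y = [1, 0, 0]$ and $\hat{y} = [0.7, 0.2, 0.1]$ the loss is $-\log(0.7) \approx 0.357$.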

**Cross Entropy Implementation with Numpy**

Consider a problem with 3 classes, where each class is represented by a one-hot vector:

class 0: [1 0 0]
class 1: [0 1 0]
class 2: [0 0 1]
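
As a small sketch of how such vectors can be built from integer labels, one option is to index the rows of an identity matrix. The names `num_classes`, `labels` and `one_hot` are made up for this illustration.

```python
import numpy as np

num_classes = 3                      # assumed: the 3-class problem described above
labels = np.array([0, 1, 2])         # hypothetical integer class labels

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[labels]
print(one_hot)
```

Each row of `one_hot` is the encoding of the corresponding entry in `labels`.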

In [20]:
import torch
import torch.nn as nn
import numpy as np

In [15]:
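def cross_entropy(y_gold, y_predicted):
    # y_gold is one-hot, y_predicted holds probabilities (e.g. after softmax)
    loss = -np.sum(y_gold * np.log(y_predicted))
    # For a batch of shape (n_samples, n_classes) we can also divide by the
    # number of samples to normalize it: loss / float(y_predicted.shape[0])
    return loss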

In [16]:
# Y must be one-hot encoded; the predictions are probabilities that sum to 1
Y = np.array([1, 0, 0])

Y_pred_good = np.array([0.7, 0.2, 0.1])
Y_pred_bad = np.array([0.1, 0.3, 0.6])

l1 = cross_entropy(Y, Y_pred_good)
l2 = cross_entropy(Y, Y_pred_bad)

print(f'Loss1:{l1:.3f}')
print(f'Loss2:{l2:.3f}')

Loss1:0.357
Loss2:2.303
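
The function above handles one sample and would return `inf` if the probability of the true class were exactly 0. Here is a minimal sketch of a batched variant, assuming inputs of shape (n_samples, n_classes). The function name `cross_entropy_batch` and the `eps` clipping value are my own choices, not part of the tutorial.

```python
import numpy as np

def cross_entropy_batch(y_gold, y_predicted, eps=1e-12):
    """Mean cross-entropy over a batch of shape (n_samples, n_classes)."""
    # Clip so that a predicted probability of exactly 0 does not give log(0) = -inf
    y_predicted = np.clip(y_predicted, eps, 1.0)
    per_sample = -np.sum(y_gold * np.log(y_predicted), axis=1)
    return per_sample.mean()

# The good and the bad prediction from above, stacked into one batch
Y_batch = np.array([[1, 0, 0], [1, 0, 0]])
Y_pred_batch = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])

batch_loss = cross_entropy_batch(Y_batch, Y_pred_batch)
print(f'Mean loss: {batch_loss:.3f}')
```

The result is the average of the two single-sample losses printed above, about 1.330.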


**Cross Entropy Implementation with Pytorch**



In [21]:
loss = nn.CrossEntropyLoss()

    Attention: nn.CrossEntropyLoss applies nn.LogSoftmax + nn.NLLLoss (Negative Log Likelihood Loss)
    => no need for Softmax in the last layer!

    Attention2: Y holds class indices => not one-hot, just pass the index of the correct class.

    Attention3: Y_pred holds raw scores (logits) => do not apply Softmax!
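
The first note can be checked directly. This sketch compares `nn.CrossEntropyLoss` with the explicit two-step version. The logits and the target are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1.5, 0.2, -0.3]])  # hypothetical raw scores, shape (1, 3)
target = torch.tensor([0])                 # class index, not one-hot

# One step: cross-entropy directly on the logits
ce = nn.CrossEntropyLoss()(logits, target)

# Two steps: log-softmax over the class dimension, then negative log likelihood
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, target)

print(ce.item(), nll.item())
```

Both values should agree up to floating-point precision.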

In [24]:
Y = torch.tensor([0])     # actual labels (class indices)

# For Y_pred_good we must be careful about the shape:
# n_samples x n_classes = 1 x 3,
# so it must be a nested list (a 2D tensor)
Y_pred_good = torch.tensor([[2.0, 1.0, 0.1]])    # raw values (logits); we didn't apply softmax
Y_pred_bad = torch.tensor([[0.5, 2.0, 0.3]])

#compute the loss
l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)

print(l1.item())  #it has only one value so we can call item
print(l2.item()) 

0.4170299470424652
1.840616226196289
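
To connect these numbers to the NumPy part, the same losses can be reproduced by hand: apply softmax to the logits, then take the negative log of the probability of the true class. This is only a sketch, and the helper `softmax` is written here for illustration.

```python
import numpy as np

def softmax(x):
    # Subtracting the max improves numerical stability and does not change the result
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits_good = np.array([2.0, 1.0, 0.1])
logits_bad = np.array([0.5, 2.0, 0.3])
true_class = 0

manual_l1 = -np.log(softmax(logits_good)[true_class])
manual_l2 = -np.log(softmax(logits_bad)[true_class])
print(f'{manual_l1:.4f} {manual_l2:.4f}')
```

This prints `0.4170 1.8406`, matching the PyTorch values above to four decimals.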


In [26]:
_, predictions1 = torch.max(Y_pred_good, 1)
_, predictions2 = torch.max(Y_pred_bad, 1)

print(predictions1) # index of the highest score in each row of Y_pred (the predicted class)
print(predictions2)

tensor([0])
tensor([1])


In [27]:
#the loss in pytorch allows for multiple samples
#so let's say we have 3 samples 
Y = torch.tensor([2, 0, 1])

#n_samples x n_classes = 3 x 3 
Y_pred_good = torch.tensor([[0.1, 1.0, 2.1],[2.0, 1.0, 0.1],[0.1, 3.0, 0.1]])
Y_pred_bad = torch.tensor([[0.5, 2.0, 0.3],[0.0, 1.0, 0.1],[0.1, 0.0, 3.1]]) 

#compute the loss
l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)

print(l1.item())  #it has only one value so we can call item
print(l2.item()) 

_, predictions1 = torch.max(Y_pred_good, 1)
_, predictions2 = torch.max(Y_pred_bad, 1)

print(predictions1) # index of the highest score in each row of Y_pred (the predicted class)
print(predictions2)

0.3018244206905365
2.2682371139526367
tensor([2, 0, 1])
tensor([1, 1, 2])
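
The single number returned for several samples is the mean over the batch, because `nn.CrossEntropyLoss` uses `reduction='mean'` by default. As a sketch, the per-sample losses can be inspected with `reduction='none'`. The variable names below are chosen for this example.

```python
import torch
import torch.nn as nn

Y = torch.tensor([2, 0, 1])
Y_pred_good = torch.tensor([[0.1, 1.0, 2.1], [2.0, 1.0, 0.1], [0.1, 3.0, 0.1]])

per_sample = nn.CrossEntropyLoss(reduction='none')(Y_pred_good, Y)  # one loss per sample
mean_loss = nn.CrossEntropyLoss()(Y_pred_good, Y)                   # default: mean over the batch

print(per_sample)
print(per_sample.mean().item(), mean_loss.item())
```

The mean of the per-sample losses equals the value printed for `Y_pred_good` above.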


## Example of a multiclass problem

In [29]:
class NeuralNet2(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet2, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)  #one linear layer
        self.relu = nn.ReLU()                              #activation function
        self.linear2 = nn.Linear(hidden_size,num_classes)  #last layer
        
    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        # no softmax here, because nn.CrossEntropyLoss applies it internally
        return out

model = NeuralNet2(input_size=28*28, hidden_size=5, num_classes=3)
criterion = nn.CrossEntropyLoss() # applies LogSoftmax internally
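
A minimal sketch of how such a model and criterion fit together in one forward and backward pass. To keep it self-contained, it uses an `nn.Sequential` stand-in with the same three layers as `NeuralNet2`. The random batch, the labels and the names `net`, `x`, `y` are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in with the same layers as NeuralNet2 (sizes taken from the call above)
net = nn.Sequential(nn.Linear(28 * 28, 5), nn.ReLU(), nn.Linear(5, 3))
ce_criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 28 * 28)          # hypothetical batch of 4 flattened 28x28 inputs
y = torch.tensor([0, 2, 1, 2])       # hypothetical class indices, one per sample

logits = net(x)                      # shape (4, 3): raw scores, no softmax
batch_loss = ce_criterion(logits, y)
batch_loss.backward()                # computes gradients for the model parameters

print(logits.shape, batch_loss.item())
```

The model outputs logits of shape (n_samples, num_classes), which is exactly the input that `nn.CrossEntropyLoss` expects.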

## Example of a binary classification problem

In [30]:
class NeuralNet1(nn.Module):
    #num_classes = 1
    def __init__(self, input_size, hidden_size):
        super(NeuralNet1, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)  #one linear layer
        self.relu = nn.ReLU()                              #activation function
        self.linear2 = nn.Linear(hidden_size, 1)           #last layer
        
    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        #sigmoid at the end
        y_pred = torch.sigmoid(out)
        return y_pred

model = NeuralNet1(input_size=28*28, hidden_size=5)
criterion = nn.BCELoss() # expects probabilities in [0, 1], so the sigmoid is applied in forward()
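
`nn.BCELoss` expects probabilities and float targets of the same shape as the model output. An alternative that is not used in this notebook is `nn.BCEWithLogitsLoss`, which combines the sigmoid and the binary cross-entropy in one numerically more stable step. With it the model would return raw logits instead. The sketch below compares the two on made-up values.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[0.8], [-1.2], [2.5]])   # hypothetical raw outputs, shape (3, 1)
targets = torch.tensor([[1.0], [0.0], [1.0]])   # float targets with the same shape

# Variant used above: sigmoid in the model, then BCELoss on probabilities
bce = nn.BCELoss()(torch.sigmoid(logits), targets)

# Alternative: no sigmoid in the model, BCEWithLogitsLoss on raw logits
bce_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(bce.item(), bce_logits.item())
```

Both variants should give the same loss up to floating-point precision.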