## Softmax Function Tutorial

### From NumPy

In [48]:
import torch
import torch.nn as nn
import numpy as np

In [49]:
def softmax(x):
    return np.exp(x)/ np.sum(np.exp(x), axis = 0)
x = np.array([2.0, 1.0, 0.1])
outputs = softmax(x)
print('softmax', outputs)

softmax [0.65900114 0.24243297 0.09856589]
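
The NumPy version above exponentiates the raw inputs, which can overflow when the values are large. A common workaround, sketched here on the assumption that the input is a 1-D array, is to subtract the maximum before exponentiating; mathematically the result is unchanged. The name `stable_softmax` is illustrative and not part of the original notebook.

```python
import numpy as np

def stable_softmax(x):
    # Shifting by the max leaves the result unchanged but keeps exp() from overflowing
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

x = np.array([2.0, 1.0, 0.1])
print('stable softmax', stable_softmax(x))

# Large inputs with the same pairwise differences give the same probabilities
big = np.array([1000.0, 999.0, 998.1])
print('stable softmax (large inputs)', stable_softmax(big))
```

Because only the differences between inputs matter, both calls should print the same probabilities.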


### From Torch

In [50]:
x = torch.tensor([2.0, 1.0, 0.1])
outputs = torch.softmax(x, dim = 0)
outputs

tensor([0.6590, 0.2424, 0.0986])

## Cross-entropy

### Softmax is often combined with cross-entropy loss to measure classification error.
A lower loss means a better prediction.

$$ D(\hat{Y}, Y) = -\frac{1}{N} \sum_i Y_i \log(\hat{Y}_i) $$

#### 1. Cross Entropy Loss for Binary classification
$$loss = label * (-1) * log(pred) + (1 - label) * (-1) * log(1 - pred)$$
#### Here “label” is either 0 or 1, and “pred” is a probability: any real value between 0 and 1.<br>
The loss is a scalar value.<br>
So, for label=1, the loss is<br>
$$ {loss\_label\_1} = label * (-1) * log(pred) + 0$$
$$ loss\_label\_1 = -log(pred) $$
For label=0, the loss is
$$loss\_label\_0 = 0 + (1 - label) * (-1) * log(1 - pred)$$
$${loss\_label\_0} = -log(1 - pred)$$
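
The two cases above can be checked numerically. The following is a minimal sketch of the binary formula in NumPy; the function name `binary_cross_entropy` and the `eps` clipping (added to avoid `log(0)`) are assumptions of this example, not part of the formula itself.

```python
import numpy as np

def binary_cross_entropy(label, pred, eps=1e-12):
    # eps is an assumption added here to avoid log(0); it is not in the formula above
    pred = np.clip(pred, eps, 1.0 - eps)
    return -(label * np.log(pred) + (1 - label) * np.log(1 - pred))

print(binary_cross_entropy(1, 0.9))  # label=1: reduces to -log(pred)
print(binary_cross_entropy(0, 0.9))  # label=0: reduces to -log(1 - pred)
```

With pred = 0.9, the loss is small when the label is 1 and much larger when the label is 0, as the formula suggests.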


#### 2. Softmax Cross Entropy Loss for Binary Classification
$${softmax\_logits} = softmax(logits)$$
$${loss\_softmax\_cross} = label * (-1) * log{(softmax\_logits)} + (1 - label) * (-1) * log(1 - {(softmax\_logits)})$$
Here, because softmax has been applied to the logits, softmax_logits is the predicted probability of the positive class.

#### 3. Softmax Cross Entropy Loss for Multi Class Classification
$${softmax\_logits} = softmax(logits)$$
$${loss\_softmax\_cross\_multi} = sum(label * (-1) * log({softmax\_logits}))$$
Here, labels and logits are both vectors (single-column arrays); e.g. for 10 classes each is a [10,] array. The sum runs over the class dimension (num_of_classes), which in this case is the last dimension, so the loss is a scalar value.

#### 4. Weighted Softmax Cross Entropy Loss for Multi Class Classification
$${softmax\_logits} = softmax(logits)$$
$${loss\_softmax\_cross\_multi} = sum({cls\_weight} * label * (-1) * log({softmax\_logits}))$$
Here, labels, logits and cls_weight all have the same shape: a vector (single-column array).<br>
E.g. for 10 classes each is a [10,] array. The loss is a scalar value. cls_weight holds a separate weight for each class.
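
As a rough illustration of the weighted formula, here is a NumPy sketch for a single sample with 3 classes. The function name and the weight values are hypothetical, and the softmax is max-shifted for numerical stability, which does not change its result.

```python
import numpy as np

def weighted_softmax_cross_entropy(logits, label, cls_weight):
    # Softmax over the class dimension (max-shifted for numerical stability)
    exps = np.exp(logits - np.max(logits))
    softmax_logits = exps / np.sum(exps)
    # Sum over classes of cls_weight * label * (-1) * log(softmax_logits)
    return np.sum(cls_weight * label * -np.log(softmax_logits))

logits = np.array([2.0, 1.0, 0.1])
label = np.array([1.0, 0.0, 0.0])        # one-hot label for class 0
cls_weight = np.array([2.0, 1.0, 1.0])   # hypothetical weights: class 0 counts double

unweighted = weighted_softmax_cross_entropy(logits, label, np.ones(3))
weighted = weighted_softmax_cross_entropy(logits, label, cls_weight)
print(f'unweighted: {unweighted:.4f}, weighted: {weighted:.4f}')
```

With all weights set to 1 this reduces to the unweighted multi-class loss; doubling the weight of the true class doubles the loss for this sample.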


#### 5. Sigmoid Cross Entropy Loss 
The sigmoid cross entropy is the same as softmax cross entropy, except that a sigmoid function, rather than softmax, is applied to the logits before they are fed into the loss.
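
One way to read the sentence above is: apply the sigmoid to each logit independently, then use the binary cross-entropy formula per output. The sketch below follows that reading; the function names and the example values are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_cross_entropy(logits, labels):
    # Each logit is squashed independently, so labels need not be one-hot
    probs = sigmoid(logits)
    return -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

logits = np.array([2.0, -1.0, 0.0])
labels = np.array([1.0, 0.0, 1.0])
losses = sigmoid_cross_entropy(logits, labels)
print(losses)          # one loss per output
print(losses.mean())   # averaged into a scalar
```

Unlike softmax, the sigmoid outputs do not have to sum to 1, which is why this form is often used when several labels can be active at once.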


### From NumPy

In [51]:
def cross_entropy(actual, predicted):
    loss = -np.sum(actual * np.log(predicted))
    return loss

# One hot encoded vector
Y = np.array([1,0,0])

Y_pred_good = np.array([0.7, 0.2, 0.1])
Y_pred_bad = np.array([0.1, 0.3, 0.6])
l1 = cross_entropy(Y, Y_pred_good)
l2 = cross_entropy(Y, Y_pred_bad)
print(f'loss Numpy for good pred : {l1:.4f}')
print(f'loss Numpy for bad pred : {l2:.4f}')

loss Numpy for good pred : 0.3567
loss Numpy for bad pred : 2.3026


### From torch
***nn.CrossEntropyLoss applies nn.LogSoftmax + nn.NLLLoss (negative log likelihood loss)***

In [52]:
loss = nn.CrossEntropyLoss()
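# Y holds class indices (not one-hot vectors); shape: nsamples = 1
# Y_pred holds raw logits (no softmax applied); shape: nsamples * nclasses = 1 * 3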
Y = torch.tensor([0])
Y_pred_good = torch.tensor([[2.0, 1.0, 0.1]])
Y_pred_bad =  torch.tensor([[0.5, 2.0, 0.3]])

l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)

print (l1.item()) 
print (l2.item()) 

0.4170299470424652
1.840616226196289
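
To make the LogSoftmax + NLLLoss decomposition concrete, the following NumPy sketch recomputes the two losses by hand from the same logits and class index. The helper names `log_softmax` and `nll_loss` are written for this example and only mimic the behaviour of the PyTorch modules on this small input.

```python
import numpy as np

def log_softmax(logits):
    # log(softmax(x)) computed as x - max - log(sum(exp(x - max)))
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))

def nll_loss(log_probs, targets):
    # Pick the log-probability of the true class for each sample, negate, average
    return -np.mean(log_probs[np.arange(len(targets)), targets])

Y = np.array([0])
Y_pred_good = np.array([[2.0, 1.0, 0.1]])
Y_pred_bad = np.array([[0.5, 2.0, 0.3]])

print(nll_loss(log_softmax(Y_pred_good), Y))
print(nll_loss(log_softmax(Y_pred_bad), Y))
```

The printed values should agree with the PyTorch losses above up to floating-point precision.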


In [53]:
# torch.max along dim 1 returns (values, indices); the indices are the predicted classes
_, predictions1 = torch.max(Y_pred_good, 1)
_, predictions2 = torch.max(Y_pred_bad, 1)

In [54]:
print(predictions1)
print(predictions2)

tensor([0])
tensor([1])


## Define Cross-Entropy for a Neural Net

In [58]:
class NeuralNet2(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet2, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, num_classes)
        
    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        # No softmax at the end
        return out
    
model = NeuralNet2(input_size = 28*28, hidden_size = 5, num_classes = 3)
criterion = nn.CrossEntropyLoss()  # Applies softmax internally (LogSoftmax + NLLLoss)


In [59]:
print(criterion)

CrossEntropyLoss()
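
As a usage sketch, the model and criterion can be combined as below. The batch of 4 random inputs and the target indices are made up for illustration; the class is repeated so the snippet runs on its own.

```python
import torch
import torch.nn as nn

class NeuralNet2(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Returns raw logits; no softmax here
        return self.linear2(self.relu(self.linear1(x)))

torch.manual_seed(0)
model = NeuralNet2(input_size=28*28, hidden_size=5, num_classes=3)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(4, 28*28)          # hypothetical batch of 4 flattened images
targets = torch.tensor([0, 2, 1, 0])    # class indices, not one-hot vectors

logits = model(inputs)                  # shape: (4, 3)
loss = criterion(logits, targets)       # scalar tensor
print(logits.shape, loss.item())
```

The criterion receives the raw logits directly, which is why the model's forward pass leaves out the softmax.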


### For binary classification use:
***nn.BCELoss()***

In [60]:
class NeuralNet1(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet1, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, 1) # Only one output
        
    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        # Sigmoid at the end, because nn.BCELoss expects probabilities
        y_pred = torch.sigmoid(out)
        return y_pred
    
model = NeuralNet1(input_size = 28*28, hidden_size = 5, num_classes = 1)  # num_classes is unused: the output size is fixed to 1
criterion = nn.BCELoss()  # Does not apply sigmoid or softmax itself
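
The short sketch below shows how `nn.BCELoss` is typically fed: it takes probabilities and float targets of the same shape. The logits and targets are invented for the example. For comparison it also uses `nn.BCEWithLogitsLoss`, which accepts raw logits and applies the sigmoid internally.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1.5], [-0.5], [0.2]])   # hypothetical raw outputs, shape (3, 1)
targets = torch.tensor([[1.0], [0.0], [1.0]])   # float targets with the same shape

# nn.BCELoss expects probabilities, so sigmoid is applied first
probs = torch.sigmoid(logits)
loss_bce = nn.BCELoss()(probs, targets)

# nn.BCEWithLogitsLoss folds the sigmoid into the loss and takes raw logits
loss_bce_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_bce.item(), loss_bce_logits.item())
```

Both losses should agree up to floating-point precision; the logits-based variant is generally the more numerically stable choice when the model does not need to output probabilities directly.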
