
Softmax receives a vector of numbers (scores) and returns a vector of the same length whose entries can be interpreted as probabilities: each entry is between 0 and 1, and all entries sum to 1.
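For a score vector $\overrightarrow{Z} = (z_1, \dots, z_K)$ over $K$ classes, the standard definition is:

$$
\mathrm{softmax}(\overrightarrow{Z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K
$$

Every numerator is positive and the denominator is the sum of all numerators, which is why each output lies between 0 and 1 and the outputs add up to 1. Because $e^{x}$ is increasing, a larger score always gets a larger probability.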

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
np.set_printoptions(suppress=True, precision=3) # Print values with at most 3 decimal places
import ipywidgets as widgets # for cells with interactivity
from tensorflow.keras.utils import to_categorical # Function to convert labels to one-hot encoding

In [2]:
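def softmax(Z):
  """Row-wise softmax. Z has shape (n, k): n samples, one column per class."""
  # Subtract each row's maximum before exponentiating so np.exp cannot overflow;
  # the result is unchanged because softmax ignores a constant added to a whole row.
  EZ = np.exp(Z - Z.max(axis=1, keepdims=True))
  S = EZ / EZ.sum(axis=1, keepdims=True)  # normalize so each row sums to 1
  return S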
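Exponentiating raw scores can overflow: `np.exp(1000)` is `inf`, and `inf / inf` is `nan`. Since softmax gives the same result when the same constant is added to every score in a row, a common fix is to subtract the row maximum first. The sketch below compares the two approaches; the names `softmax_naive` and `softmax_stable` are hypothetical, chosen for this illustration.

```python
import numpy as np

def softmax_naive(Z):
    # Exponentiate the raw scores directly: overflows to inf for large scores
    EZ = np.exp(Z)
    return EZ / EZ.sum(axis=1, keepdims=True)

def softmax_stable(Z):
    # Shift each row so its largest score is 0, so every exp() is at most 1
    EZ = np.exp(Z - Z.max(axis=1, keepdims=True))
    return EZ / EZ.sum(axis=1, keepdims=True)

# The second row is the first row shifted by +999
Z = np.array([[1.0, 2.0, 3.0],
              [1000.0, 1001.0, 1002.0]])

with np.errstate(over="ignore", invalid="ignore"):  # silence the overflow warnings
    naive = softmax_naive(Z)
stable = softmax_stable(Z)

print(naive)   # second row is nan (inf / inf)
print(stable)  # both rows are identical
```

Both versions agree whenever the naive one does not overflow, so the shift costs nothing in accuracy.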

In [3]:
# Test the code on a case with n = 10 samples (rows) and k = 3 classes (columns)

aux = np.linspace(-2, 6.0, 10).reshape(-1,1)
Z = np.hstack([aux, np.ones_like(aux), 0.2 * np.ones_like(aux)])

S = softmax(Z)
print('Z=\n',Z)
print('S=\n',S)

Z=
 [[-2.     1.     0.2  ]
 [-1.111  1.     0.2  ]
 [-0.222  1.     0.2  ]
 [ 0.667  1.     0.2  ]
 [ 1.556  1.     0.2  ]
 [ 2.444  1.     0.2  ]
 [ 3.333  1.     0.2  ]
 [ 4.222  1.     0.2  ]
 [ 5.111  1.     0.2  ]
 [ 6.     1.     0.2  ]]
S=
 [[0.033 0.667 0.3  ]
 [0.077 0.637 0.286]
 [0.169 0.573 0.258]
 [0.331 0.462 0.207]
 [0.546 0.313 0.141]
 [0.745 0.176 0.079]
 [0.877 0.085 0.038]
 [0.945 0.038 0.017]
 [0.977 0.016 0.007]
 [0.99  0.007 0.003]]
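A quick sanity check on this result: every row of `S` should sum to 1, and because softmax preserves the ordering of the scores, the class with the largest score in each row of `Z` should also have the largest probability in `S`. The sketch rebuilds the same `Z` and defines a local `softmax` so it runs on its own.

```python
import numpy as np

def softmax(Z):
    # Row-wise softmax with the max-subtraction trick for numerical stability
    EZ = np.exp(Z - Z.max(axis=1, keepdims=True))
    return EZ / EZ.sum(axis=1, keepdims=True)

# Same test input as above: n = 10 samples, k = 3 classes
aux = np.linspace(-2, 6.0, 10).reshape(-1, 1)
Z = np.hstack([aux, np.ones_like(aux), 0.2 * np.ones_like(aux)])
S = softmax(Z)

print(np.allclose(S.sum(axis=1), 1.0))   # True: each row is a probability distribution
print(np.all((S > 0) & (S < 1)))         # True: entries strictly between 0 and 1
print(np.array_equal(np.argmax(Z, axis=1), np.argmax(S, axis=1)))  # True: argmax is preserved
```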


In [4]:
def plotmodel(s1,s2,s3):
    scores = np.array([[s1, s2, s3]]) # shape: (1,3) 
    S = softmax(scores)[0] # (1,3) array as (3,) array
    plt.rcdefaults()
    fig, ax = plt.subplots(figsize=(3, 2))
    classes = ('0', '1', '2')
    x_pos = [2,4,6]
    ax.bar(x_pos, S, align='center',color='green', ecolor='black')
    ax.set_xticks(x_pos)
    ax.set_xticklabels(classes)
    ax.set_ylim([0,1])
    ax.set_xlabel(r'$\overrightarrow{Z}$')  # raw string so '\o' is not read as an escape sequence
    ax.set_ylabel('Softmax')
    plt.show()
                       
widgets.interact(plotmodel,s1 = (1,10,.1),s2 = (1,10,.1),s3 = (1,10,.1))


# One hot encoding
One hot encoding represents categorical data as a list of binary values with one element in the list for each possible category. The name "one hot" comes from the fact that only one binary element is set to 1 (hot) at a time and all other elements are set to 0 (cold).
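The definition above takes only a few lines of NumPy: give each category an index, then pick the matching row of an identity matrix. The class list in this sketch is a hypothetical example; its order decides which position is "hot" for each category.

```python
import numpy as np

# Hypothetical class list; position i in this list becomes column i of the encoding
classes = ["green", "blue", "black"]
index = {name: i for i, name in enumerate(classes)}  # category name -> integer

samples = ["blue", "green", "black", "blue"]
labels = np.array([index[name] for name in samples])  # [1 0 2 1]

# Row i of the k x k identity matrix is the one-hot vector for class i
one_hot = np.eye(len(classes), dtype=int)[labels]
print(one_hot)
# [[0 1 0]
#  [1 0 0]
#  [0 0 1]
#  [0 1 0]]
```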

Most deep learning algorithms cannot work with categorical data directly. The categories need to be converted into numerical representations. This is true for both the input ($X$) and output ($\widehat{Y}$) of our models.

Consider the garbage classification assignment in this course. There are three classes: "green", "blue" and "black" garbage bins. We often encode these classes by assigning each one an integer label, without giving it much thought:

* "green" - class 0
* "blue" - class 1
* "black" - class 2

This label assignment is called label encoding. Label encoding can be a proper representation when there is a natural ordering among the categories, because integers carry an order and a distance: with the labels above, "black" (2) looks twice as far from "green" (0) as "blue" (1) does, which means nothing for bin colors. In garbage classification, where there is no clear ordering, label encoding is therefore not a good strategy. An example of categorical data with a natural ordering is a five-point Likert scale, whose categories are clearly ordered: "Like", "Like Somewhat", "Neutral", "Dislike Somewhat", "Dislike".
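The distance argument can be checked numerically. The sketch below uses the integer labels listed above, compares distances under label encoding and under one-hot encoding, and then encodes a few hypothetical Likert answers with an ordered pandas `Categorical`, which keeps the natural order of the scale.

```python
import numpy as np
import pandas as pd

# Nominal categories: the integer labels from the list above
label = {"green": 0, "blue": 1, "black": 2}
print(abs(label["black"] - label["green"]))  # 2
print(abs(label["blue"] - label["green"]))   # 1  (an arbitrary, meaningless difference)

# One-hot vectors place every pair of distinct classes at the same distance, sqrt(2)
eye = np.eye(3)
d_black_green = np.linalg.norm(eye[label["black"]] - eye[label["green"]])
d_blue_green = np.linalg.norm(eye[label["blue"]] - eye[label["green"]])
print(np.isclose(d_black_green, d_blue_green))  # True

# Ordinal categories: keep the Likert order as listed in the text
likert = ["Like", "Like Somewhat", "Neutral", "Dislike Somewhat", "Dislike"]
answers = pd.Categorical(["Neutral", "Like", "Dislike Somewhat"],  # hypothetical answers
                         categories=likert, ordered=True)
print(answers.codes)  # [2 0 3]: integer codes follow the scale's order
```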

When our data does not have an ordering relationship, we employ one hot encoding.

The code snippet below shows how to get the one hot encoding of a column of strings with the pandas `get_dummies` function.

In [5]:
s = pd.DataFrame({"bin": ["blue","blue","green","black","green","black","black","green"]})
s.head(8)

     bin
0   blue
1   blue
2  green
3  black
4  green
5  black
6  black
7  green


In [6]:
one_hot = pd.get_dummies(s, dtype=int) # One-hot encode the "bin" column; dtype=int gives 0/1 (pandas >= 2.0 defaults to True/False)
# Join the one hot encoding to the data frame
s2 = s.join(one_hot)
s2.head(8)

     bin  bin_black  bin_blue  bin_green
0   blue          0         1          0
1   blue          0         1          0
2  green          0         0          1
3  black          1         0          0
4  green          0         0          1
5  black          1         0          0
6  black          1         0          0
7  green          0         0          1


In [7]:
s3 = s2.drop('bin',axis = 1) # Drop the bin column 
s3.head(8)

   bin_black  bin_blue  bin_green
0          0         1          0
1          0         1          0
2          0         0          1
3          1         0          0
4          0         0          1
5          1         0          0
6          1         0          0
7          0         0          1
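`pd.get_dummies` orders the columns alphabetically (`bin_black`, `bin_blue`, `bin_green`), which does not match the label encoding green = 0, blue = 1, black = 2 used earlier. If the two representations must agree, one option is to fix the class order explicitly. A sketch, assuming that green, blue, black is the order we want:

```python
import numpy as np
import pandas as pd

s = pd.DataFrame({"bin": ["blue", "blue", "green", "black", "green", "black", "black", "green"]})

# Desired class order, matching the label encoding green=0, blue=1, black=2
classes = ["green", "blue", "black"]

# Categorical codes give the label encoding in the chosen order
labels = pd.Categorical(s["bin"], categories=classes).codes
print(labels)  # [1 1 0 2 0 2 2 0]

# get_dummies sorts columns alphabetically; reorder them to follow `classes`
one_hot = pd.get_dummies(s["bin"], prefix="bin", dtype=int)
one_hot = one_hot[["bin_" + c for c in classes]]
print(one_hot.columns.tolist())  # ['bin_green', 'bin_blue', 'bin_black']

# With a matching order, argmax over each row recovers the integer labels
print(np.array_equal(one_hot.to_numpy().argmax(axis=1), labels))  # True
```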


In [8]:
# The code snippet below shows how to get the one hot encoding of an array of integer labels using the Keras to_categorical function.
Y = np.array([0, 0, 1, 0, 2, 1, 1])
Yoh = to_categorical(Y)

print('Y=')
print(Y)
print('\nYoh=')
print(Yoh)

Y=
[0 0 1 0 2 1 1]

Yoh=
[[1. 0. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 1. 0.]]
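When its `num_classes` argument is left out, `to_categorical` infers the number of columns as the largest label plus one. If a subset of the data happens not to contain the highest class, the encoding gets too few columns, so passing `num_classes` explicitly is safer. The NumPy sketch below mimics that behavior; `one_hot` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def one_hot(y, num_classes=None):
    """Hypothetical NumPy stand-in for integer-label -> one-hot conversion."""
    y = np.asarray(y, dtype=int)
    if num_classes is None:
        num_classes = y.max() + 1  # infer from the largest label present
    Yoh = np.zeros((y.size, num_classes), dtype=np.float32)
    Yoh[np.arange(y.size), y] = 1.0  # set the "hot" position in each row
    return Yoh

batch = np.array([0, 1, 1, 0])  # class 2 happens to be missing from this batch
print(one_hot(batch).shape)                 # (4, 2): one column too few
print(one_hot(batch, num_classes=3).shape)  # (4, 3)
```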


In [9]:
# To go back from one hot encoding to label encoding, use numpy's argmax along each row (axis = 1): it returns the column index of the 1 in each row.
Y2 = np.argmax(Yoh, axis = 1)
print('\nY2=')
print(Y2)
print("\nY = Y2?")
print(np.all(Y == Y2))


Y2=
[0 0 1 0 2 1 1]

Y = Y2?
True


# Categorical cross-entropy and Accuracy

**Important comment:** Accuracy is a score to maximize, whereas categorical cross-entropy (CCE) is a loss to minimize. If accuracy were used as the objective of our model, our goal would be to maximize it; with CCE, we want to minimize it.
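For $N$ samples and $K$ classes, with one-hot labels $y_{ik}$ and predicted probabilities $\hat{y}_{ik}$, the standard definitions are:

$$
\mathrm{CCE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} y_{ik}\,\log \hat{y}_{ik},
\qquad
\mathrm{Accuracy} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\!\left[\arg\max_k y_{ik} = \arg\max_k \hat{y}_{ik}\right]
$$

Since $y_{ik}$ is 1 only for the true class $c_i$ of sample $i$, each sample contributes $-\log \hat{y}_{i c_i}$ to the CCE: the lower the probability assigned to the correct class, the larger the loss. Accuracy only checks whether the correct class has the highest probability.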

In [10]:
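def compute_cce(Yoh, Yoh_pred):
    # Note: .mean() averages over all N*K entries, so this equals the per-sample CCE
    # (sum over classes, mean over samples) divided by the number of classes K.
    cce = (-Yoh * np.log(Yoh_pred)).mean()
    return cce

def compute_accuracy(Yoh, Yoh_pred):
    Y = np.argmax(Yoh, axis=1)           # true class index of each sample
    Ypred = np.argmax(Yoh_pred, axis=1)  # predicted class (highest probability)
    accuracy = (Y == Ypred).sum() / Y.size  # fraction of correct predictions
    return accuracy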
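The `.mean()` in `compute_cce` averages over all N × K entries of the array. The more common convention, used for example by Keras' `CategoricalCrossentropy` loss, sums over the classes first and then averages over the samples, which gives K times the value of `compute_cce`. Because the two differ only by a positive constant, they rank predictions the same way. A sketch of the per-sample version, with probabilities clipped away from 0 so that the log never returns -inf; `cce_per_sample` and `eps` are hypothetical names.

```python
import numpy as np

def cce_per_sample(Yoh, Yoh_pred, eps=1e-7):
    # Clip to avoid log(0) = -inf when a predicted probability is exactly 0
    P = np.clip(Yoh_pred, eps, 1.0 - eps)
    # Sum over classes (only the true class survives), then average over samples
    return -(Yoh * np.log(P)).sum(axis=1).mean()

Yoh = np.array([[0, 0, 1],
                [1, 0, 0],
                [0, 1, 0]])
Yoh_pred = np.array([[0.01, 0.02, 0.97],
                     [0.94, 0.03, 0.03],
                     [0.02, 0.95, 0.03]])

per_sample = cce_per_sample(Yoh, Yoh_pred)
per_entry = (-Yoh * np.log(Yoh_pred)).mean()  # same formula as compute_cce
print(per_sample)                                         # about 0.0479
print(np.isclose(per_sample, per_entry * Yoh.shape[1]))  # True: differs by a factor K = 3
```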

In [11]:
# Labels (one-hot)
Yoh = np.array([[0, 0, 1],
                [1, 0, 0],
                [0, 1, 0]])

# Confident predictions
Yoh_pred = np.array([[0.01, 0.02, 0.97],
                     [0.94, 0.03, 0.03],
                     [0.02, 0.95, 0.03]])

# Low confidence predictions
Yoh_pred2 = np.array([[0.33, 0.33, 0.34],
                      [0.40, 0.30, 0.30],
                      [0.31, 0.36, 0.33]])

In [12]:
print("Confident predictions case")
print("CCE:")
print(compute_cce(Yoh,Yoh_pred))
print("Accuracy:")
print(compute_accuracy(Yoh,Yoh_pred))

Confident predictions case
CCE:
0.015958656176705183
Accuracy:
1.0


In [13]:
print("Low confidence predictions case")
print("CCE:")
print(compute_cce(Yoh,Yoh_pred2))
print("Accuracy:")
print(compute_accuracy(Yoh,Yoh_pred2))

Low confidence predictions case
CCE:
0.3351946267531185
Accuracy:
1.0


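In both the high and low confidence cases, the accuracy is 1. The CCE, on the other hand, is considerably smaller for the confident predictions than for the low confidence ones, because it measures how much probability the model assigns to the correct class, not just whether that class has the highest probability. We want confident predictions, and that is one of the many reasons why we prefer CCE over accuracy as a loss function for training our models. Another is that accuracy changes only when a prediction flips from one class to another, so its gradient with respect to the model's outputs is zero almost everywhere, while CCE changes smoothly and can be minimized with gradient descent.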
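To make the contrast concrete, the sketch below moves the predictions step by step from the low confidence case to the confident case, using a hypothetical linear interpolation between `Yoh_pred2` and `Yoh_pred`. Accuracy stays at 1.0 the whole way, so it gives no signal about which predictions are better, while CCE decreases at every step.

```python
import numpy as np

def compute_cce(Yoh, Yoh_pred):
    # Same per-entry mean as the notebook's compute_cce
    return (-Yoh * np.log(Yoh_pred)).mean()

def compute_accuracy(Yoh, Yoh_pred):
    # Fraction of samples whose highest-probability class is the true class
    return np.mean(np.argmax(Yoh, axis=1) == np.argmax(Yoh_pred, axis=1))

Yoh = np.array([[0, 0, 1],
                [1, 0, 0],
                [0, 1, 0]])
Yoh_pred = np.array([[0.01, 0.02, 0.97],    # confident predictions
                     [0.94, 0.03, 0.03],
                     [0.02, 0.95, 0.03]])
Yoh_pred2 = np.array([[0.33, 0.33, 0.34],   # low confidence predictions
                      [0.40, 0.30, 0.30],
                      [0.31, 0.36, 0.33]])

# t = 0 gives the low confidence predictions, t = 1 the confident ones
ts = np.linspace(0.0, 1.0, 6)
cces, accs = [], []
for t in ts:
    P = (1 - t) * Yoh_pred2 + t * Yoh_pred  # each row still sums to 1
    cces.append(compute_cce(Yoh, P))
    accs.append(compute_accuracy(Yoh, P))
    print(f"t={t:.1f}  CCE={cces[-1]:.4f}  accuracy={accs[-1]:.1f}")
```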