![title](https://image.ibb.co/erDntK/logo2018.png)

---
# [Class Exercise] Linear Classification 

In this exercise you will practice simple linear classification and its multiclass losses, 
including:
* implementing simple steps to understand the basic linear classification pipeline, 
* implementing the Softmax and Multiclass SVM losses

---
## Simple Dataset

We use a simple case with
* 4 data points `x`, each with 8 dimensions, 
* 3 target classes. 

Thus, in the code below, the weight parameter `W` has shape (3, 8) and the bias `b` has shape (3, 1).

![Linear Classifier](https://image.ibb.co/iCvLL9/01.png)

In [0]:
import numpy as np
np.set_printoptions(precision=2)

In [0]:
def simple_random(size, seed):
    np.random.seed(seed)
    return np.random.randint(20,size=size)/10-1

In [0]:
n_dim = 8
n_data = 4
n_class = 3

In [0]:
X = simple_random((n_data,n_dim),1)
print(X)
print('shape=',X.shape)

[[-0.5  0.1  0.2 -0.2 -0.1  0.1 -0.5  0.5]
 [-1.   0.6 -0.9  0.2 -0.3  0.3 -0.4  0.8]
 [-0.5  0.8  0.1  0.   0.4  0.8 -0.6 -0.1]
 [ 0.7 -1.   0.3 -0.1 -0.1 -0.3 -0.9 -1. ]]
shape= (4, 8)


In [0]:
W = simple_random((n_class, n_dim),2)
print(W)
print('shape=',W.shape)

[[-0.2  0.5  0.3 -0.2  0.1  0.8  0.1 -0.2]
 [-0.3 -0.8  0.7  0.1  0.5 -0.5 -0.3 -0.7]
 [-0.4 -0.6  0.   0.1  0.9 -0.3 -0.4  0. ]]
shape= (3, 8)


In [0]:
# simple_random seeds the generator itself, so no separate np.random.seed call is needed
b = simple_random((n_class,1),3)
print(b)
print('shape=',b.shape)

[[ 0. ]
 [-0.7]
 [-0.2]]
shape= (3, 1)


## Linear Function

Tutorials on the Internet often use different mathematical formulations for the forward (linear) function. 

Depending on how you shape the matrices, they are all equivalent.

Below is an example using the $WX'+b$ formulation, where $'$ denotes the transpose. The $XW'+b'$ formulation gives the same scores, only transposed.

In [0]:
scores1 = W.dot(X.T) + b
print('scores =')
print(scores1)
print('shape=',scores1.shape)

scores =
[[ 0.17  0.2   1.17 -0.67]
 [-0.81 -2.23 -1.07  1.16]
 [ 0.   -0.34 -0.12  0.47]]
shape= (3, 4)


In [0]:
scores = X.dot(W.T) + b.T
print('scores =')
print(scores.T)
print('shape=',scores.shape)

scores =
[[ 0.17  0.2   1.17 -0.67]
 [-0.81 -2.23 -1.07  1.16]
 [ 0.   -0.34 -0.12  0.47]]
shape= (4, 3)
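
As a quick check that the two formulations agree, the sketch below compares them with `np.allclose`. The arrays are arbitrary stand-ins generated with `np.random.default_rng` (an assumption for illustration, not the notebook's `simple_random` data); only the shapes match the exercise, and the `_demo` names are hypothetical.

```python
import numpy as np

# Hypothetical stand-in data with the same shapes as the exercise
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((4, 8))   # (n_data, n_dim)
W_demo = rng.standard_normal((3, 8))   # (n_class, n_dim)
b_demo = rng.standard_normal((3, 1))   # (n_class, 1)

# Formulation 1: classes along rows -> shape (n_class, n_data)
scores_wx = W_demo.dot(X_demo.T) + b_demo

# Formulation 2: data along rows -> shape (n_data, n_class)
scores_xw = X_demo.dot(W_demo.T) + b_demo.T

# One result is the transpose of the other
same = np.allclose(scores_wx, scores_xw.T)
print('shapes:', scores_wx.shape, scores_xw.shape)
print('equal up to transpose:', same)
```

The rest of the notebook uses the second layout, with one row per data point, which makes per-example indexing such as `scores[img]` straightforward.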


# Multiclass Loss Function

For a multiclass classification problem, the end of the system/network needs an activation/scoring head that determines the classification. From that activation we can calculate the loss, and then the gradient to propagate back through the entire network.

There are two popular loss functions for multiclass classification problems:
* Softmax Loss or Categorical Cross-entropy Loss
* SVM Loss or Hinge Loss
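
For reference, the two losses can be written as follows. This is a sketch using the standard textbook definitions; the symbols are introduced here for illustration and are not the notebook's own notation: $s_j$ is the score of class $j$ for example $i$, $y_i$ is its target class, and $N$ is the number of examples.

$$L_i^{\text{SVM}} = \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + 1\right)$$

$$L_i^{\text{softmax}} = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$$

$$L = \frac{1}{N}\sum_{i=1}^{N} L_i$$

Summing the hinge terms over all classes and then subtracting 1 gives the same value as summing over $j \neq y_i$, because the target-class term is always exactly 1.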

Let's say that for our four inputs above, the targets are as follows

In [0]:
y = np.array([0, 1, 2, 1])

print(scores)
print('\n target:',y)

[[ 0.17 -0.81  0.  ]
 [ 0.2  -2.23 -0.34]
 [ 1.17 -1.07 -0.12]
 [-0.67  1.16  0.47]]

 target: [0 1 2 1]


## Multiclass SVM Loss

Multiclass SVM Loss needs no separate scoring function, so we can go straight to calculating the loss from the raw scores.

First, get the score on the actual class (target class)

In [0]:
img = 0

print('score image',img,'      =', scores[img])
print('score on true class =', scores[img, y[img]])

score image 0       = [ 0.17 -0.81  0.  ]
score on true class = 0.17000000000000004


Then subtract the actual class score from every class score and add the margin of 1. The entry for the actual class is always 0 + 1 = 1; it is not part of the loss and is removed in the last step.

In [0]:
print('(score image',img,') minus (score on true class) =', scores[img]-scores[img, y[img]])
print('margin is added by 1                         =', scores[img]-scores[img, y[img]]+1)

(score image 0 ) minus (score on true class) = [ 0.   -0.98 -0.17]
margin is added by 1                         = [1.   0.02 0.83]


Clip the negative values to zero: a class that scores below the actual class by more than the margin contributes no loss

In [0]:
margin = scores[img]-scores[img, y[img]] + 1
print('remove all negative loss',img, '=', np.maximum(0, margin ))

remove all negative loss 0 = [1.   0.02 0.83]


Lastly, sum the loss over all classes and subtract 1 to remove the target class's own margin

In [0]:
img = 0

margin = scores[img]-scores[img, y[img]] + 1
losses_i = np.maximum(0, margin)
print('loss of example',img, '(Li) is the sum of it, minus 1 (for target) =', np.sum(losses_i) - 1 )

loss of example 0 (Li) is the sum of it, minus 1 (for target) = 0.8500000000000001


The SVM Loss is the average of the losses over all examples (data)

In [0]:
Loss_svm = []

for img in range(n_data):
    margin = scores[img]-scores[img, y[img]] + 1
    losses_i = np.maximum(0, margin)
    L_i = np.sum(losses_i) - 1
    print('SVM Loss for data',img,':',L_i)
    Loss_svm.append(L_i)

Loss_svm = np.array(Loss_svm)
print('\nHinge Loss or Multiclass SVM Loss is the average of all example losses')
print('SVM Loss (avg) =', np.mean(Loss_svm))

SVM Loss for data 0 : 0.8500000000000001
SVM Loss for data 1 : 6.32
SVM Loss for data 2 : 2.34
SVM Loss for data 3 : 0.31000000000000005

Hinge Loss or Multiclass SVM Loss is the average of all example losses
SVM Loss (avg) = 2.455
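
The loop above can also be written without iterating over the examples. The following is a minimal vectorized sketch, not the notebook's own implementation: the helper name `svm_loss_vectorized`, its `delta` parameter, and the `_demo` variables are hypothetical, and the scores are copied from the printed output above.

```python
import numpy as np

# Scores copied from the printed output above, shape (n_data, n_class)
scores_demo = np.array([[ 0.17, -0.81,  0.  ],
                        [ 0.2 , -2.23, -0.34],
                        [ 1.17, -1.07, -0.12],
                        [-0.67,  1.16,  0.47]])
y_demo = np.array([0, 1, 2, 1])

def svm_loss_vectorized(scores, y, delta=1.0):
    """Multiclass hinge loss per example and on average (hypothetical helper)."""
    rows = np.arange(scores.shape[0])
    # True-class score of each example, kept as a column for broadcasting
    correct = scores[rows, y][:, np.newaxis]
    margins = np.maximum(0, scores - correct + delta)
    # Zero the true-class entries instead of subtracting delta afterwards
    margins[rows, y] = 0
    per_example = margins.sum(axis=1)
    return per_example, per_example.mean()

per_example_svm, avg_svm = svm_loss_vectorized(scores_demo, y_demo)
print('per-example SVM loss:', per_example_svm)
print('SVM loss (avg) =', avg_svm)
```

Zeroing the true-class entry is equivalent to the "sum, then subtract 1" step used in the loop, and it stays correct if a margin other than 1 is chosen.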


---
## Softmax Loss
In Softmax Loss, there are two steps. First we calculate the score, then the loss. 
![Softmax Loss](https://image.ibb.co/msQy7p/03.png)


In [0]:
print(scores)
print('shape=',scores.shape)

[[ 0.17 -0.81  0.  ]
 [ 0.2  -2.23 -0.34]
 [ 1.17 -1.07 -0.12]
 [-0.67  1.16  0.47]]
shape= (4, 3)


### Softmax Score
The softmax score turns the raw output scores into a normalized probability distribution over the classes. The raw scores are treated as unnormalized log probabilities.

First we exponentiate the output scores to get the unnormalized probabilities

In [0]:
e_scores = np.exp(scores)
print(e_scores)
print('shape=',e_scores.shape)

[[1.19 0.44 1.  ]
 [1.22 0.11 0.71]
 [3.22 0.34 0.89]
 [0.51 3.19 1.6 ]]
shape= (4, 3)


Sum over the classes for each example

In [0]:
sum_e_score = np.sum(e_scores, axis=1, keepdims = True)
print(sum_e_score.T)

[[2.63 2.04 4.45 5.3 ]]


Divide by the sum to get the normalized probabilities (despite its name, `norm_log_prob` holds probabilities, not log probabilities)

In [0]:
norm_log_prob = e_scores / sum_e_score
print(norm_log_prob)

[[0.45 0.17 0.38]
 [0.6  0.05 0.35]
 [0.72 0.08 0.2 ]
 [0.1  0.6  0.3 ]]


Note that the sum over all classes for each data point is now equal to 1. The score now better represents the classification confidence for each class.

In [0]:
img = 0

print('probability over all classes on image', img, '      =', norm_log_prob[img])
print('total probability over all classes on image', img, '=', np.sum(norm_log_prob[img]))
print('this is the softmax score')

probability over all classes on image 0       = [0.45 0.17 0.38]
total probability over all classes on image 0 = 1.0
this is the softmax score
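
The three steps above (exponentiate, sum over classes, divide) can be collected into one function. This is a minimal sketch; the helper name `softmax` and the `_demo` variables are hypothetical, and the scores are copied from the printed output above.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax for a (n_data, n_class) array (hypothetical helper)."""
    e = np.exp(scores)
    # keepdims=True keeps the sum as a column so it broadcasts across classes
    return e / np.sum(e, axis=1, keepdims=True)

# Scores copied from the printed output above
scores_demo = np.array([[ 0.17, -0.81,  0.  ],
                        [ 0.2 , -2.23, -0.34],
                        [ 1.17, -1.07, -0.12],
                        [-0.67,  1.16,  0.47]])

probs_demo = softmax(scores_demo)
print(probs_demo)
print('row sums:', probs_demo.sum(axis=1))
```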


### Categorical Crossentropy Loss

To calculate the Softmax loss, also called categorical cross-entropy, take the negative log of the softmax score.

We can use the base-10 log, which differs from the natural-log version only by a constant factor of $\ln 10$

In [0]:
print('log10 loss')
loss_i = -np.log10(norm_log_prob)
print(loss_i)

log10 loss
[[0.35 0.77 0.42]
 [0.22 1.28 0.46]
 [0.14 1.11 0.7 ]
 [1.02 0.22 0.52]]


or use the natural log, which is the usual convention

In [0]:
loss_i_natural = -np.log(norm_log_prob)
print('natural log loss')
print(loss_i_natural)

natural log loss
[[0.8  1.78 0.97]
 [0.51 2.94 1.05]
 [0.32 2.56 1.61]
 [2.34 0.51 1.2 ]]


Like the SVM Loss, the Softmax loss is the average over all examples (data), taking for each example the loss at its target class

In [0]:
Loss_softmax = []

for img in range(n_data):
    L_i = loss_i[img,y[img]]
    print('Softmax Loss for data',img,':',L_i)
    Loss_softmax.append(L_i)

Loss_softmax = np.array(Loss_softmax)
print('\nSoftmax Loss or Categorical Crossentropy Loss is the average of all example losses')
print('Softmax Loss (avg) =', np.mean(Loss_softmax))

Softmax Loss for data 0 : 0.3461525884668476
Softmax Loss for data 1 : 1.2782561807026984
Softmax Loss for data 2 : 0.7006628447549723
Softmax Loss for data 3 : 0.2206283114597039

Softmax Loss or Categorical Crossentropy Loss is the average of all example losses
Softmax Loss (avg) = 0.6364249813460556


In [0]:
Loss_natural = []

for img in range(n_data):
    L_i = loss_i_natural[img,y[img]]
    print('Softmax Loss for data',img,':',L_i)
    Loss_natural.append(L_i)

Loss_natural = np.array(Loss_natural)
print('\nSoftmax Loss or Categorical Crossentropy Loss is the average of all example losses')
print('Softmax Natural Loss(avg) =', np.mean(Loss_natural))

Softmax Loss for data 0 : 0.7970457901050659
Softmax Loss for data 1 : 2.943293626713537
Softmax Loss for data 2 : 1.6133358215476006
Softmax Loss for data 3 : 0.5080154610595616

Softmax Loss or Categorical Crossentropy Loss is the average of all example losses
Softmax Natural Loss(avg) = 1.4654226748564412
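
The per-example lookup in the two loops can also be done with integer-array indexing, which picks the target-class probability of every example at once. The sketch below is an illustration under the same data (scores copied from the printed output above, `_demo` names hypothetical) and also checks that the base-10 and natural-log losses differ only by the factor $\ln 10$.

```python
import numpy as np

# Scores and targets copied from the cells above
scores_demo = np.array([[ 0.17, -0.81,  0.  ],
                        [ 0.2 , -2.23, -0.34],
                        [ 1.17, -1.07, -0.12],
                        [-0.67,  1.16,  0.47]])
y_demo = np.array([0, 1, 2, 1])

e = np.exp(scores_demo)
probs = e / e.sum(axis=1, keepdims=True)

# Pick the target-class probability of each example in one indexing step
p_true = probs[np.arange(len(y_demo)), y_demo]

loss_ln = -np.log(p_true)        # natural-log cross-entropy per example
loss_log10 = -np.log10(p_true)   # base-10 variant per example

print('natural-log loss (avg) =', loss_ln.mean())
print('base-10 loss (avg)     =', loss_log10.mean())
# The two differ only by the constant factor ln(10)
print('ratio =', loss_ln.mean() / loss_log10.mean(), ', ln(10) =', np.log(10))
```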


---
# Comparison

Below is the comparison between the 3 losses

In [0]:
print('SVM Loss (avg)  =', Loss_svm, ',loss =',np.mean(Loss_svm))
print('Softmax Loss    =', Loss_softmax, ',loss =',np.mean(Loss_softmax))
print('Softmax Natural =', Loss_natural, ',loss =',np.mean(Loss_natural))

SVM Loss (avg)  = [0.85 6.32 2.34 0.31] ,loss = 2.455
Softmax Loss    = [0.35 1.28 0.7  0.22] ,loss = 0.6364249813460556
Softmax Natural = [0.8  2.94 1.61 0.51] ,loss = 1.4654226748564412


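## Practical Technique: Shift Scores for Numerical Stability
Exponentiating large scores can overflow floating-point numbers, which turns the softmax into `inf/inf`.

Shift the raw scores by subtracting the maximum before exponentiating. Adding the same constant to every score in a row leaves the softmax unchanged, so the probabilities stay the same. The code below subtracts the global maximum; subtracting each row's own maximum is the more common choice.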

In [0]:
shifted_scores = scores - np.max(scores)
print('shifted scores')
print(shifted_scores)

shifted scores
[[-1.   -1.98 -1.17]
 [-0.97 -3.4  -1.51]
 [ 0.   -2.24 -1.29]
 [-1.84 -0.01 -0.7 ]]


In [0]:
print('unnormalized probability')
e_shifted_scores = np.exp(shifted_scores)
print(e_shifted_scores)

unnormalized probability
[[0.37 0.14 0.31]
 [0.38 0.03 0.22]
 [1.   0.11 0.28]
 [0.16 0.99 0.5 ]]


In [0]:
print('normalized probability')
sum_e_shifted_score = np.sum(e_shifted_scores, axis=1, keepdims = True)
norm_log_prob_shifted = e_shifted_scores / sum_e_shifted_score
print(norm_log_prob_shifted)

normalized probability
[[0.45 0.17 0.38]
 [0.6  0.05 0.35]
 [0.72 0.08 0.2 ]
 [0.1  0.6  0.3 ]]


Compare the vanilla Softmax Loss with the shifted Softmax Loss: the per-example losses agree up to floating-point rounding

In [0]:
loss_i_shifted = -np.log10(norm_log_prob_shifted)

Loss_shifted = []

for img in range(n_data):
    L_i = loss_i_shifted[img,y[img]]
    print('Softmax Loss for data',img,':',L_i)
    Loss_shifted.append(L_i)
    
Loss_shifted = np.array(Loss_shifted)

Softmax Loss for data 0 : 0.34615258846684765
Softmax Loss for data 1 : 1.2782561807026986
Softmax Loss for data 2 : 0.7006628447549723
Softmax Loss for data 3 : 0.2206283114597039


In [0]:
print('SVM Loss (avg)  =', Loss_svm, ',loss =',np.mean(Loss_svm))
print('Softmax Loss    =', Loss_softmax, ',loss =',np.mean(Loss_softmax))
print('Softmax Shifted =', Loss_shifted, ',loss =',np.mean(Loss_shifted))
print('Softmax Natural =', Loss_natural, ',loss =',np.mean(Loss_natural))


SVM Loss (avg)  = [0.85 6.32 2.34 0.31] ,loss = 2.455
Softmax Loss    = [0.35 1.28 0.7  0.22] ,loss = 0.6364249813460556
Softmax Shifted = [0.35 1.28 0.7  0.22] ,loss = 0.6364249813460556
Softmax Natural = [0.8  2.94 1.61 0.51] ,loss = 1.4654226748564412
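
With the small scores in this exercise the shift makes no visible difference. It matters when scores are large, because `np.exp` overflows in float64. The sketch below illustrates this with made-up scores around 1000 (an assumption chosen only to trigger the overflow); the helper names `softmax_naive` and `softmax_stable` are hypothetical, and the stable version subtracts each row's own maximum rather than the global one.

```python
import numpy as np

def softmax_naive(scores):
    e = np.exp(scores)
    return e / np.sum(e, axis=1, keepdims=True)

def softmax_stable(scores):
    # Subtract each row's maximum so the largest exponent is exp(0) = 1
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=1, keepdims=True)

# Hypothetical large scores chosen to trigger overflow in float64
big_scores = np.array([[1000.0, 1001.0, 1002.0]])

# Silence the overflow/invalid warnings that the naive version raises
with np.errstate(over='ignore', invalid='ignore'):
    naive = softmax_naive(big_scores)   # exp overflows to inf, inf/inf gives nan

stable = softmax_stable(big_scores)

print('naive :', naive)
print('stable:', stable)
```

The naive version returns `nan` for every class, while the shifted version returns a valid probability distribution.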



<p>Copyright &copy;  <a href=https://www.linkedin.com/in/andityaarifianto/>2019 - ADF</a> </p>

![footer](https://image.ibb.co/hAHDYK/footer2018.png)