# Additive Margin Softmax

### 1. Reference
- [Additive Margin Softmax for Face Verification](https://arxiv.org/abs/1801.05599)

- [Lochappy's GitHub notebook](https://github.com/lochappy/MyTensorflowPractices/blob/master/CNN/01-b-MNIST-AMSoftmax.ipynb)

### 2. Theory
#### Softmax:
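Write each logit as $w_j^T x_i = \lVert w_j\rVert \lVert x_i\rVert \cos\theta_j$ (no bias term). After L2-normalizing $w$ and $x$ so that $\lVert w_j\rVert = \lVert x_i\rVert = 1$, the loss depends on $\cos\theta$ only: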
$$
\begin{aligned}
L_s & = -\frac{1}{n}\sum_{i=1}^{n}{\log{\frac{e^{w_{y_i}^T x_i}}{\sum_{j=1}^{c}{e^{w_j^T x_i}}}}} \\
& = -\frac{1}{n}\sum_{i=1}^{n}{\log{\frac{e^{\lVert w_{y_i}\rVert \lVert x_i\rVert \cos\theta_{y_i}}}{\sum_{j=1}^{c}{e^{\lVert w_j\rVert \lVert x_i\rVert \cos\theta_j}}}}} \\
& = -\frac{1}{n}\sum_{i=1}^{n}{\log{\frac{e^{\cos\theta_{y_i}}}{\sum_{j=1}^{c}{e^{\cos\theta_j}}}}}
\end{aligned}
$$

#### AM-Softmax:
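With $w$ and $x$ L2-normalized as above, AM-Softmax subtracts an additive margin $m$ from the target-class cosine $\cos\theta_{y_i}$ and multiplies every logit by a scale $s$: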
$$
\begin{aligned}
L_{ams} & = -\frac{1}{n}\sum_{i=1}^{n}{\log{\frac{e^{s(\cos\theta_{y_i}-m)}}{e^{s(\cos\theta_{y_i}-m)}+\sum_{j=1,j\neq y_i}^{c}{e^{s\cos\theta_j}}}}} \\
& = -\frac{1}{n}\sum_{i=1}^{n}{\log{\frac{e^{s(w_{y_i}^T x_i -m)}}{e^{s(w_{y_i}^T x_i-m)}+\sum_{j=1,j\neq y_i}^{c}{e^{sw_j^T x_i}}}}}
\end{aligned}
$$

Suggested values: $s=30$, $m=0.35$.
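
As a compact reference for the formula above, here is a minimal NumPy sketch; it is not the paper's implementation. The function name `am_softmax_loss` and its argument layout (`x` of shape `(n, d)`, `w` of shape `(d, c)`, integer labels `y` of shape `(n,)`) are assumptions made for illustration. It subtracts `m` from the target-class cosine of every sample and shifts the logits by their row maximum before exponentiating.

```python
import numpy as np

def am_softmax_loss(x, w, y, s=30.0, m=0.35):
    """Hypothetical helper: mean AM-Softmax loss for features x (n, d),
    class weights w (d, c) and integer labels y (n,)."""
    # L2-normalize features by row and class weights by column
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    w = w / np.linalg.norm(w, axis=0, keepdims=True)
    cos = x @ w                                   # (n, c), cos(theta_j)
    rows = np.arange(cos.shape[0])
    logits = s * cos
    logits[rows, y] = s * (cos[rows, y] - m)      # margin on the target class only
    # log-softmax with the row maximum subtracted before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[rows, y].mean()

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
w = rng.normal(size=(4, 3))
y = rng.integers(0, 3, size=8)
print(am_softmax_loss(x, w, y))            # with margin
print(am_softmax_loss(x, w, y, m=0.0))     # m = 0: scaled, normalized softmax
```

Lowering the target logit can only lower the target probability, so the loss with $m>0$ is larger than with $m=0$ on the same data.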

### 3. Implementation

In [1]:
#import tensorflow as tf
import numpy as np

input: $x\_shape = (batch\_size,4)\qquad$ 
weight: $w\_shape = (4,num\_class)\qquad$
label: $y = [0,3,1]\qquad$
scale: $s=30\qquad$
margin: $m=0.35$

In [2]:
x = np.arange(12).reshape((3, 4))
w = np.arange(16).reshape((4, 4))-8
y = np.array([0, 3, 1]).reshape((-1,1))
s = 30
m = 0.35
print('x= \n', x)
print('w= \n', w)
print('y =', y, ', s = ', s, ', m = ', m)

x= 
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
w= 
 [[-8 -7 -6 -5]
 [-4 -3 -2 -1]
 [ 0  1  2  3]
 [ 4  5  6  7]]
y = [[0]
 [3]
 [1]] , s =  30 , m =  0.35


L2-normalize $x$ (by row) and $w$ (by column):

$\frac{x}{\lVert x\rVert}$ and $\frac{w}{\lVert w\rVert}$

In [3]:
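# Row-wise L2 norm of x, kept as a column so it broadcasts over each row
x_l2 = np.linalg.norm(x, axis=1, keepdims=True)
x = x / x_l2

# Column-wise L2 norm of w, kept as a row so it broadcasts over each column
w_l2 = np.linalg.norm(w, axis=0, keepdims=True)
w = w / w_l2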

print('x= \n', x, '\n')
print('w= \n', w)

x= 
 [[ 0.          0.26726124  0.53452248  0.80178373]
 [ 0.35634832  0.4454354   0.53452248  0.62360956]
 [ 0.4181667   0.47043754  0.52270837  0.57497921]] 

w= 
 [[-0.81649658 -0.76376262 -0.67082039 -0.54554473]
 [-0.40824829 -0.32732684 -0.2236068  -0.10910895]
 [ 0.          0.10910895  0.2236068   0.32732684]
 [ 0.40824829  0.54554473  0.67082039  0.76376262]]


$\cos\theta_j = \frac{w_j^T x_i}{\lVert w_j\rVert \lVert x_i\rVert}$, which is a plain matrix product once $x$ and $w$ are normalized:

In [4]:
cos = np.dot(x,w)
print(cos)

[[ 0.21821789  0.40824829  0.5976143   0.7581754 ]
 [-0.21821789 -0.01944039  0.19920477  0.40824829]
 [-0.29875272 -0.10265789  0.11688115  0.33078652]]
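
Because both factors are unit-length, each entry of this product is a cosine similarity. A quick check, assuming the unnormalized inputs from the setup cell (re-created here under the hypothetical names `x_raw` and `w_raw`):

```python
import numpy as np

x_raw = np.arange(12).reshape((3, 4))
w_raw = np.arange(16).reshape((4, 4)) - 8

# cosine between sample 0 and class 0, computed from the raw vectors
xi, wj = x_raw[0], w_raw[:, 0]
cos_00 = xi @ wj / (np.linalg.norm(xi) * np.linalg.norm(wj))
print(round(float(cos_00), 8))   # 0.21821789, the [0, 0] entry printed above

# every entry at once: raw dot products divided by the product of norms
cos_all = (x_raw @ w_raw) / np.outer(np.linalg.norm(x_raw, axis=1),
                                     np.linalg.norm(w_raw, axis=0))
print(np.all(np.abs(cos_all) <= 1.0))
```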


$s\cdot(\cos\theta_{y_i}-m)$ and $s\cdot\cos\theta_{j,j\neq y_i}$

In [5]:
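# Pick the target-class cosine of each sample; y has shape (n, 1)
groundtruth_score = []
for i in range(cos.shape[0]):
    # index with a scalar label so no 1-element array has to be cast to float
    groundtruth_score.append(cos[i, y[i, 0]])
groundtruth_score = np.array(groundtruth_score).reshape((-1,1))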
print(groundtruth_score)

[[ 0.21821789]
 [ 0.40824829]
 [-0.10265789]]
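
The loop above can also be written as a single indexing operation. A sketch using `np.take_along_axis`, with the `cos` values copied from the printed output and `y` kept in its `(n, 1)` shape:

```python
import numpy as np

cos = np.array([[ 0.21821789,  0.40824829,  0.5976143 ,  0.7581754 ],
                [-0.21821789, -0.01944039,  0.19920477,  0.40824829],
                [-0.29875272, -0.10265789,  0.11688115,  0.33078652]])
y = np.array([0, 3, 1]).reshape((-1, 1))

# pick cos[i, y[i]] for every row i; the result keeps shape (n, 1)
groundtruth_score = np.take_along_axis(cos, y, axis=1)
print(groundtruth_score)
```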


In [6]:
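# Margin per sample: m where the target cosine exceeds m, otherwise 0
# (the formula in Section 2 subtracts m unconditionally)
M = np.greater(groundtruth_score, m)*m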
print(M)

[[ 0.  ]
 [ 0.35]
 [ 0.  ]]


In [7]:
one_hot_y = np.zeros(cos.shape)
for i in range(len(one_hot_y)):
    one_hot_y[i][y[i]] = 1.
print('one hot y = \n', one_hot_y, '\n')
cos_min_m = (cos - one_hot_y * M)
print('cos-m = \n', cos_min_m)
print('only change 1 value, from 0.4082 to 0.0582.')

one hot y = 
 [[ 1.  0.  0.  0.]
 [ 0.  0.  0.  1.]
 [ 0.  1.  0.  0.]] 

cos-m = 
 [[ 0.21821789  0.40824829  0.5976143   0.7581754 ]
 [-0.21821789 -0.01944039  0.19920477  0.05824829]
 [-0.29875272 -0.10265789  0.11688115  0.33078652]]
only change 1 value, from 0.4082 to 0.0582.
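
Only one target cosine changed because the margin here is applied conditionally, while the AM-Softmax formula applies it to every sample. The sketch below contrasts the two on the `cos` values printed earlier. The names `cos_walkthrough` and `cos_formula` are illustrative, and `np.eye` stands in for the loop-built one-hot matrix.

```python
import numpy as np

cos = np.array([[ 0.21821789,  0.40824829,  0.5976143 ,  0.7581754 ],
                [-0.21821789, -0.01944039,  0.19920477,  0.40824829],
                [-0.29875272, -0.10265789,  0.11688115,  0.33078652]])
y = np.array([0, 3, 1])
m = 0.35

one_hot_y = np.eye(cos.shape[1])[y]                      # (n, c) one-hot rows
target = (cos * one_hot_y).sum(axis=1, keepdims=True)    # (n, 1) target cosines

# walkthrough: margin only where the target cosine is greater than m
cos_walkthrough = cos - one_hot_y * np.where(target > m, m, 0.0)
# formula: margin on the target cosine of every sample
cos_formula = cos - one_hot_y * m

rows = np.arange(len(y))
print(cos_walkthrough[rows, y])   # approx. 0.2182, 0.0582, -0.1027
print(cos_formula[rows, y])       # approx. -0.1318, 0.0582, -0.4527
```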


$\exp(s\cdot(\cos\theta_{y_i}-m))$ and $\exp(s\cdot\cos\theta_{j,j\neq y_i})$

In [8]:
exp_feature = np.exp(cos_min_m*s)
print(exp_feature)

[[  6.96826675e+02   2.08448797e+05   6.11248554e+07   7.55338691e+09]
 [  1.43507709e-03   5.58102945e-01   3.93918111e+02   5.73993975e+00]
 [  1.28115085e-04   4.59713640e-02   3.33292159e+01   2.04062337e+04]]


$e^{s(\cos\theta_{y_i}-m)}+\sum_{j=1,j\neq y_i}^{c}{e^{s\cos\theta_j}}$

In [9]:
sum_feature = np.sum(exp_feature, axis=1).reshape((-1,1))
print(sum_feature)

[[  7.61472091e+09]
 [  4.00217588e+02]
 [  2.04396090e+04]]


$\frac{e^{s(\cos\theta_{y_i}-m)}}{e^{s(\cos\theta_{y_i}-m)}+\sum_{j=1,j\neq y_i}^{c}{e^{s\cos\theta_j}}}$

In [10]:
feature_norm = exp_feature / sum_feature
feature_groundtruth_norm = np.sum(feature_norm * one_hot_y, axis=1).reshape((-1,1))
print('feature_groundtruth_norm = \n', feature_groundtruth_norm)

feature_groundtruth_norm = 
 [[  9.15104681e-08]
 [  1.43420477e-02]
 [  2.24913128e-06]]


$loss = -\frac{1}{n}\sum_{i=1}^{n}{\log{\frac{e^{s(\cos\theta_{y_i}-m)}}{e^{s(\cos\theta_{y_i}-m)}+\sum_{j=1,j\neq y_i}^{c}{e^{s\cos\theta_j}}}}}$

In [11]:
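# Per-sample terms -(1/n)*log(p_i), shape (n,1)
loss_terms = -1/3*np.log(feature_groundtruth_norm)
print(loss_terms)
# Summing them gives the loss in the formula: loss_terms.sum(), approximately 11.1521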

[[ 5.40227082]
 [ 1.41485322]
 [ 4.33498884]]
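
To tie the cells together, here is a compact sketch that recomputes the walkthrough from the original inputs and reduces the per-sample terms to one scalar. It keeps the walkthrough's behaviour of applying `m` only where the target cosine exceeds `m`; the variable names are illustrative.

```python
import numpy as np

x = np.arange(12).reshape((3, 4)).astype(float)
w = np.arange(16).reshape((4, 4)).astype(float) - 8
y = np.array([0, 3, 1])
s, m = 30, 0.35
rows = np.arange(len(y))

x /= np.linalg.norm(x, axis=1, keepdims=True)   # normalize rows
w /= np.linalg.norm(w, axis=0, keepdims=True)   # normalize columns
cos = x @ w

logits = s * cos
target = cos[rows, y]
# same rule as the walkthrough: subtract m only where the target cosine > m
logits[rows, y] = s * (target - np.where(target > m, m, 0.0))

# log of the target probability for each sample
log_prob = logits[rows, y] - np.log(np.exp(logits).sum(axis=1))
per_sample = -log_prob / len(y)
print(per_sample)          # approx. 5.4023, 1.4149, 4.3350
print(per_sample.sum())    # approx. 11.1521
```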
