## Multi-class classification pipeline
![](https://watertap.oss-ap-southeast-2.aliyuncs.com/img/20210313214431.png)

### Softmax
$$ \mathrm{softmax}(\vec{x})_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}} $$
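As a quick numeric check of the formula, the sketch below applies it to a small made-up score vector (`scores` is an illustrative assumption, not data from this notebook). The outputs are positive, sum to 1, and larger scores receive larger shares.

```python
import numpy as np

# Hypothetical scores, chosen only to illustrate the formula.
scores = np.array([1.0, 2.0, 3.0])

# Direct transcription of the formula: exponentiate, then normalize.
exp_scores = np.exp(scores)
probs = exp_scores / exp_scores.sum()

print(probs)        # approximately [0.090, 0.245, 0.665]
print(probs.sum())  # 1.0 up to floating-point rounding
```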

Suppose a vector describes a colleague's recent circumstances, and we want to use it to predict their mood.

In [1]:
person1 = [8.5, 34, 173, 54] # worked 8.5 hours, 34 years old, 173 cm, 54 kg

In [5]:
心情 = ['happy', 'sad', 'calm', 'angry', 'surprised']  # 心情: the mood labels

In [4]:
import numpy as np

In [8]:
weights = np.random.randn(4,5)
weights

array([[-1.48646436,  1.33464191, -0.16997196,  0.31751428, -1.52605198],
       [ 3.15491439,  0.79582594, -0.40743345, -0.25555758, -0.25794931],
       [ 0.29781294, -0.8930184 , -1.37124732, -1.35311116,  1.56214044],
       [ 0.03991902,  0.23419123,  0.22428729, -0.51159647,  1.56125781]])

In [10]:
bias = np.random.random()
bias

0.0953484180385229

### y_ = wx + b

In [13]:
y_ = np.dot(person1, weights) + bias
y_

array([ 148.40475534, -103.34796987, -240.31642269, -267.60917886,
        332.91184811])

In [23]:
def softmax(x):
    x = x - np.max(x)
    return np.exp(x)/np.sum(np.exp(x))
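Subtracting `np.max(x)` does not change the result, because softmax is unaffected by adding the same constant to every score; the shift only keeps `np.exp` from overflowing. A minimal sketch, in which `naive_softmax`, `stable_softmax`, and the large scores are assumptions made up for illustration:

```python
import numpy as np

def naive_softmax(x):
    # Hypothetical helper: the formula without the max shift.
    x = np.asarray(x, dtype=float)
    return np.exp(x) / np.sum(np.exp(x))

def stable_softmax(x):
    # Same idea as the notebook's softmax: shift so the largest score is 0.
    x = np.asarray(x, dtype=float)
    exp_shifted = np.exp(x - np.max(x))
    return exp_shifted / np.sum(exp_shifted)

small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]  # made-up scores; exp(1000) overflows float64

# On moderate scores the two versions agree.
print(np.allclose(naive_softmax(small), stable_softmax(small)))

# On large scores the naive version computes inf / inf, which is nan,
# while the shifted version returns the same probabilities as for `small`.
with np.errstate(over="ignore", invalid="ignore"):
    naive_large = naive_softmax(large)
print(naive_large)
print(stable_softmax(large))
```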

In [17]:
softmax(y_)

array([7.40606868e-081, 3.42585025e-190, 1.12234175e-249, 1.57409783e-261,
       1.00000000e+000])

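### The weights start as small random values drawn from a standard normal distribution. If x is not normalized, the entries of wx + b that feed the softmax differ wildly in magnitude, so softmax returns one value close to 1 and the rest close to 0, which is not the desired behavior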

### Normalize x. With only one person's data there is nothing to normalize against, so the values below are set arbitrarily for now
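With more than one sample, a common choice is z-score standardization per feature: subtract the column mean and divide by the column standard deviation. The sketch below assumes a small made-up dataset `people`; only its first row is `person1` from above, and the other rows are invented for illustration.

```python
import numpy as np

# Rows: people; columns: hours worked, age, height (cm), weight (kg).
# Only the first row comes from the notebook; the rest are invented.
people = np.array([
    [8.5, 34.0, 173.0, 54.0],
    [10.0, 28.0, 180.0, 75.0],
    [6.0, 45.0, 165.0, 60.0],
    [9.0, 39.0, 170.0, 68.0],
])

# Per-feature statistics, computed over the rows (axis=0).
mean = people.mean(axis=0)
std = people.std(axis=0)

# Standardized features: each column now has mean 0 and standard deviation 1.
people_norm = (people - mean) / std

print(people_norm[0])            # standardized version of person1
print(people_norm.mean(axis=0))  # close to 0 for every feature
print(people_norm.std(axis=0))   # close to 1 for every feature
```

After this step all four features live on a comparable scale, so no single feature (such as height in centimeters) dominates wx + b.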

In [18]:
person1_norm = [-1.48646436,  1.33464191, -0.16997196,  0.31751428]

In [19]:
y_ = np.dot(person1_norm, weights) + bias
y_

array([ 6.47766069, -0.6002594 ,  0.10851593, -0.65015133,  2.24970192])

In [25]:
y_predict = softmax(y_)
y_predict

array([9.82374484e-01, 8.28658199e-04, 1.68342065e-03, 7.88329256e-04,
       1.43251081e-02])

## Cross-Entropy
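### Measures how good this set of weights and bias is: the loss function (Loss_func)
$$ \text{cross-entropy}(\text{label}, y_{\text{softmax}}) = -\sum_{i=1}^{N} \text{label}_i \, \log\left(y_{\text{softmax},i}\right) $$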

In [26]:
y_label = [0, 1, 0, 0, 0]

In [27]:
weights

array([[-1.48646436,  1.33464191, -0.16997196,  0.31751428, -1.52605198],
       [ 3.15491439,  0.79582594, -0.40743345, -0.25555758, -0.25794931],
       [ 0.29781294, -0.8930184 , -1.37124732, -1.35311116,  1.56214044],
       [ 0.03991902,  0.23419123,  0.22428729, -0.51159647,  1.56125781]])

In [28]:
bias

0.0953484180385229

In [29]:
def cross_entropy(label, y_softmax):
    return -sum(label[i] * np.log(y_softmax[i]) for i in range(len(label)))

In [30]:
cross_entropy(y_label, y_predict)

7.0957027933760255

In [31]:
np.log(8.28658199e-04)

-7.0957027930371055
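The two numbers agree (up to sign and rounding) because `y_label` is one-hot: every term of the sum vanishes except the one for the true class, so the loss is just the negative log of the probability assigned to that class. A small sketch of that shortcut, reusing the rounded `y_predict` values printed above; `cross_entropy_onehot` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

# Rounded copies of the values printed earlier in the notebook.
y_predict = np.array([9.82374484e-01, 8.28658199e-04, 1.68342065e-03,
                      7.88329256e-04, 1.43251081e-02])
y_label = np.array([0, 1, 0, 0, 0])

def cross_entropy(label, y_softmax):
    # Full sum from the definition.
    return -np.sum(label * np.log(y_softmax))

def cross_entropy_onehot(label, y_softmax):
    # Hypothetical shortcut: pick out the true class via argmax of the label.
    true_class = int(np.argmax(label))
    return -np.log(y_softmax[true_class])

full = cross_entropy(y_label, y_predict)
shortcut = cross_entropy_onehot(y_label, y_predict)
print(full, shortcut)  # both about 7.0957
```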

### Gradient: partial derivatives of the cross-entropy loss
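Writing the pre-softmax scores `y_` as $z = xW + b$ and the loss as $L$ (the symbols $z$, $W$, $L$, and the Kronecker delta $\delta_{ij}$ are notation introduced here, not the notebook's), a sketch of the standard derivation starts from the softmax derivative:

$$ \frac{\partial\, y_{\text{softmax},i}}{\partial z_j} = y_{\text{softmax},i}\,\left(\delta_{ij} - y_{\text{softmax},j}\right) $$

Applying the chain rule to $L = -\sum_i \text{label}_i \log\left(y_{\text{softmax},i}\right)$ and using $\sum_i \text{label}_i = 1$ for a one-hot label:

$$ \frac{\partial L}{\partial z_j} = -\sum_i \text{label}_i \left(\delta_{ij} - y_{\text{softmax},j}\right) = y_{\text{softmax},j} - \text{label}_j $$

$$ \frac{\partial L}{\partial W_{kj}} = x_k \left(y_{\text{softmax},j} - \text{label}_j\right) $$

With a single scalar bias shared by all classes, as in this notebook, the bias gradient is the sum of the per-class terms, which vanishes because both the probabilities and the label sum to 1:

$$ \frac{\partial L}{\partial b} = \sum_j \left(y_{\text{softmax},j} - \text{label}_j\right) = 0 $$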

In [38]:
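# Partial derivatives
# gradient_loss = (y_predict - y_label) * person1_norm
# Note: the line below uses the single scalar max(y_predict) - 1 in place of
# the vector (y_predict - y_label), so the result is one 4-element vector
# shared by all classes rather than a separate gradient per class.
print(max(y_predict) - 1)
print(person1_norm)
gradient_loss = np.dot((max(y_predict) - 1) , person1_norm)
gradient_loss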

-0.01762551622212638
[-1.48646436, 1.33464191, -0.16997196, 0.31751428]


array([ 0.0261997 , -0.02352375,  0.00299584, -0.00559635])
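The commented-out formula `(y_predict - y_label) * person1_norm` points at the usual softmax cross-entropy gradient, in which each class gets its own error term rather than a single scalar. As a hedged sketch (the helper `loss_fn`, the names `analytic_grad` and `numeric_grad`, the stand-in weights, and the fixed random seed are all assumptions for illustration), the analytic gradient `np.outer(x, y_pred - label)` can be compared against central finite differences:

```python
import numpy as np

def softmax(x):
    x = x - np.max(x)
    return np.exp(x) / np.sum(np.exp(x))

def loss_fn(w, b, x, label):
    # Cross-entropy of softmax(x @ w + b) against a one-hot label.
    return -np.sum(label * np.log(softmax(x @ w + b)))

rng = np.random.default_rng(0)  # fixed seed, assumed for repeatability
x = np.array([-1.48646436, 1.33464191, -0.16997196, 0.31751428])  # person1_norm
label = np.array([0.0, 1.0, 0.0, 0.0, 0.0])
w = rng.standard_normal((4, 5))  # stand-in weights, not the notebook's values
b = 0.1

# Analytic gradient: one row per input feature, one column per class.
y_pred = softmax(x @ w + b)
analytic_grad = np.outer(x, y_pred - label)

# Numerical gradient by central differences, one weight at a time.
eps = 1e-6
numeric_grad = np.zeros_like(w)
for k in range(w.shape[0]):
    for j in range(w.shape[1]):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[k, j] += eps
        w_minus[k, j] -= eps
        numeric_grad[k, j] = (loss_fn(w_plus, b, x, label)
                              - loss_fn(w_minus, b, x, label)) / (2 * eps)

# Largest disagreement between the two gradients; expected to be tiny.
print(np.max(np.abs(analytic_grad - numeric_grad)))
```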

In [39]:
print(weights.T)

[[-1.48646436  3.15491439  0.29781294  0.03991902]
 [ 1.33464191  0.79582594 -0.8930184   0.23419123]
 [-0.16997196 -0.40743345 -1.37124732  0.22428729]
 [ 0.31751428 -0.25555758 -1.35311116 -0.51159647]
 [-1.52605198 -0.25794931  1.56214044  1.56125781]]


In [40]:
print(weights.T-gradient_loss)

[[-1.51266406  3.17843814  0.29481709  0.04551537]
 [ 1.3084422   0.8193497  -0.89601424  0.23978759]
 [-0.19617166 -0.38390969 -1.37424316  0.22988364]
 [ 0.29131458 -0.23203383 -1.35610701 -0.50600012]
 [-1.55225168 -0.23442556  1.5591446   1.56685416]]


In [42]:
weights_adj = (weights.T-gradient_loss).T
weights_adj

array([[-1.51266406,  1.3084422 , -0.19617166,  0.29131458, -1.55225168],
       [ 3.17843814,  0.8193497 , -0.38390969, -0.23203383, -0.23442556],
       [ 0.29481709, -0.89601424, -1.37424316, -1.35610701,  1.5591446 ],
       [ 0.04551537,  0.23978759,  0.22988364, -0.50600012,  1.56685416]])

In [43]:
bias_adj = bias - (max(y_predict) - 1)
bias_adj

0.11297393426064928

In [44]:
y_adj = np.dot(person1_norm, weights_adj) + bias_adj
y_adj

array([ 6.56791305, -0.51000705,  0.19876829, -0.55989897,  2.33995428])

In [45]:
softmax(y_adj)

array([9.82374484e-01, 8.28658199e-04, 1.68342065e-03, 7.88329256e-04,
       1.43251081e-02])
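The probabilities are identical to the ones before the update. The update subtracts the same amount from every class column of `weights` and adds one scalar to `bias`, so all five scores move by the same constant, and softmax ignores a constant shift. A brief check using the rounded values printed above:

```python
import numpy as np

def softmax(x):
    x = x - np.max(x)
    return np.exp(x) / np.sum(np.exp(x))

# Rounded scores copied from the outputs above.
y_before = np.array([6.47766069, -0.6002594, 0.10851593, -0.65015133, 2.24970192])
y_after = np.array([6.56791305, -0.51000705, 0.19876829, -0.55989897, 2.33995428])

# Every class score moved by the same amount (about 0.0903) ...
shift = y_after - y_before
print(shift)

# ... so the softmax outputs coincide.
print(np.allclose(softmax(y_before), softmax(y_after)))
```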

### Iterate n times

In [55]:
_n_ = 10000

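for i in range(_n_):
    y_pred = softmax(np.dot(person1_norm, weights) + bias)
    loss = cross_entropy(y_label, y_pred)
    if i%1000 == 0:
        print(loss)
    # Note: the update uses y_predict (computed before the loop), not y_pred,
    # and the scalar max(y_predict) - 1 shifts every class score equally,
    # so the softmax output and the loss stay the same from step to step.
    gradient_loss = np.dot((max(y_predict) - 1) , person1_norm)
    weights = (weights.T - gradient_loss).T
    bias = bias - (max(y_predict) - 1)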

7.09570279337647
7.095702793376473
7.095702793376697
7.095702793376693
7.09570279337669
7.095702793376697
7.09570279337647
7.095702793367394
7.095702793367617
7.0957027933747066
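The printed loss stays at about 7.0957 because the update reuses the stale `y_predict` from before the loop and shifts every class score by the same scalar, which softmax cancels out. A minimal sketch of a loop that does descend, assuming the per-class gradient `np.outer(x, y_pred - y_label)`, one bias per class, a learning rate `lr = 0.1`, stand-in random weights, and a fixed seed (all of these are illustrative choices, not the notebook's):

```python
import numpy as np

def softmax(x):
    x = x - np.max(x)
    return np.exp(x) / np.sum(np.exp(x))

def cross_entropy(label, y_softmax):
    return -np.sum(label * np.log(y_softmax))

rng = np.random.default_rng(0)  # assumed seed
x = np.array([-1.48646436, 1.33464191, -0.16997196, 0.31751428])  # person1_norm
y_label = np.array([0.0, 1.0, 0.0, 0.0, 0.0])
weights = rng.standard_normal((4, 5))  # stand-in weights
bias = np.zeros(5)                     # assumed: one bias per class
lr = 0.1                               # assumed learning rate
n_steps = 1000

losses = []
for step in range(n_steps):
    y_pred = softmax(x @ weights + bias)
    losses.append(cross_entropy(y_label, y_pred))
    error = y_pred - y_label            # dL/d(scores), recomputed every step
    weights -= lr * np.outer(x, error)  # dL/dW
    bias -= lr * error                  # dL/db
    if step % 200 == 0:
        print(step, losses[-1])
```

The two differences from the loop above are that the error term is recomputed from the current prediction on every step, and that it is a vector with one entry per class, so the true class is pushed up while the others are pushed down.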
