<h1 style="text-align: center">Implement a Multilayer Perceptron</h1>

# Import libraries

```python
import unittest
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
```

## Computing the network output (forward pass)

Repeat $L$ times: with the convention $\mathbf f_0 = \mathbf x$, for $l = 1,2,\ldots,L$ compute

$$ \begin{align*} \mathbf a_l &= \mathbf W_l \mathbf f_{l-1}\in\mathbb R^{p_l}\\ \mathbf f_l &= \phi_l(\mathbf a_l)\in\mathbb R^{p_l} \end{align*} $$
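The loop above can be sketched in NumPy as follows. This is a minimal sketch, not the notebook's `FC` class: it assumes one input vector at a time, weights stored as in the formulas (`W_l` with shape `(p_l, p_{l-1})`), a bias already folded into `x`, and a hypothetical helper name `forward_pass`.

```python
import numpy as np

def sigmoid(a):
    # Numerically stable logistic function: exp(-|a|) never overflows.
    z = np.exp(-np.abs(a))
    return np.where(a >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

def forward_pass(x, weights, activations):
    """Return [f_0, f_1, ..., f_L] for one input vector x.

    weights[l-1] is W_l with shape (p_l, p_{l-1});
    activations[l-1] is phi_l, applied element-wise.
    """
    fs = [x]                          # f_0 = x
    for W, phi in zip(weights, activations):
        a = W @ fs[-1]                # a_l = W_l f_{l-1}
        fs.append(phi(a))             # f_l = phi_l(a_l)
    return fs

rng = np.random.default_rng(0)
p = [4, 3, 2]                         # p_0 = 4 inputs, one hidden layer of 3, C = 2
weights = [rng.standard_normal((p[l], p[l - 1])) for l in range(1, len(p))]
activations = [sigmoid, lambda a: a]  # sigmoid hidden layer, linear output layer
fs = forward_pass(rng.standard_normal(p[0]), weights, activations)
print([f.shape for f in fs])          # [(4,), (3,), (2,)]
```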

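Then compute the logits, the probabilities (softmax), and the cross-entropy loss:

$$ \begin{align*} \mathbf f &= \mathbf f_L \in\mathbb R^{C}\\ \boldsymbol\mu &=\mathcal S(\mathbf f)\in\mathbb R^{C}\\ \ell &= -\mathbf y^T\log(\boldsymbol\mu)\in\mathbb R \end{align*} $$

Here $\mathcal S$ is the softmax function, $\mathcal S(\mathbf f)_i = e^{f_i}/\sum_{c=1}^C e^{f_c}$, the $\log$ is applied element-wise, and $\mathbf y$ is the **one-hot encoding** of the label $y\in\{1,2,\ldots, C\}$.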
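A small NumPy sketch of the softmax and cross-entropy formulas, using hypothetical helpers `softmax` and `cross_entropy`. Subtracting the largest logit is a standard stability trick that leaves the softmax output unchanged; since NumPy indexes from 0, label $y$ maps to row `y - 1` of the identity matrix.

```python
import numpy as np

def softmax(f):
    # Subtracting max(f) leaves the result unchanged but avoids overflow in exp.
    e = np.exp(f - np.max(f))
    return e / e.sum()

def cross_entropy(f, y_onehot):
    # ell = -y^T log(mu), with log applied element-wise
    mu = softmax(f)
    return -y_onehot @ np.log(mu)

C = 3
f = np.array([2.0, 1.0, 0.1])    # logits f = f_L
y = 1                            # label, 1-based as in the text
y_onehot = np.eye(C)[y - 1]      # one-hot encoding of y
mu = softmax(f)
loss = cross_entropy(f, y_onehot)
print(mu, loss)
```

With a one-hot $\mathbf y$, the loss is simply $-\log\mu_y$, the negative log-probability of the true class.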

The neural network has $L-1$ hidden layers and one output layer.

## Choosing the activation function

- Linear: $\phi_l(a) = a$, commonly used in the last layer
- Sigmoid: $\phi_l(a) = \sigma(a) = \frac 1 {1+e^{-a}}$, commonly used in the layers before the last one (the hidden layers)
- ReLU: $\phi_l(a) = \max(0, a)$
- tanh: $\phi_l(a) = \tanh(a) = 2\sigma(2a)-1$
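These activations can be written element-wise in NumPy. The sketch below (the dictionary name `activations` is illustrative) also checks numerically that $\tanh(a) = 2\sigma(2a) - 1$; note that $2\sigma(a) - 1$ equals $\tanh(a/2)$, not $\tanh(a)$.

```python
import numpy as np

def sigmoid(a):
    z = np.exp(-np.abs(a))   # stable for large |a|
    return np.where(a >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

activations = {
    "linear":  lambda a: a,
    "sigmoid": sigmoid,
    "relu":    lambda a: np.maximum(0.0, a),
    "tanh":    np.tanh,
}

a = np.linspace(-5, 5, 11)
for name, phi in activations.items():
    print(name, np.round(phi(a), 3))

# tanh is a rescaled, recentred sigmoid: tanh(a) = 2*sigmoid(2a) - 1
print(np.allclose(np.tanh(a), 2 * sigmoid(2 * a) - 1))  # True
```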

💡 If $\phi_l(a) = a$ is chosen for every layer, the network reduces to (multinomial) logistic regression, a special case of neural networks: the composition $\mathbf W_L\cdots\mathbf W_1$ of linear layers is itself a single linear map.
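A quick numerical illustration of why stacking linear layers adds no expressive power: two linear layers produce the same logits as one layer whose weight matrix is the product of the two. The shapes are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4)
W1 = rng.standard_normal((5, 4))   # hidden layer with linear activation
W2 = rng.standard_normal((3, 5))   # output layer, C = 3

f_two_layers = W2 @ (W1 @ x)       # two linear layers
f_one_layer = (W2 @ W1) @ x        # one equivalent weight matrix of shape (3, 4)
print(np.allclose(f_two_layers, f_one_layer))  # True
```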



## Inference with a neural network

**Classification rule**: the predicted class of $\mathbf x$ is the position of the largest element of $\mathbf f$.
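The rule is an `argmax`. A minimal sketch with example logits for a single sample and for a small batch (one row per sample); the `+ 1` converts NumPy's 0-based index to the 1-based labels $\{1,\ldots,C\}$. Because softmax preserves the ordering of the logits, the argmax of the probabilities gives the same class.

```python
import numpy as np

f = np.array([0.3, 2.1, -1.0])   # logits f = f_L for C = 3 classes
mu = np.exp(f - f.max())
mu /= mu.sum()                    # softmax probabilities

pred = np.argmax(f) + 1           # +1 because labels are 1..C in the text
print(pred)                       # 2
print(np.argmax(mu) == np.argmax(f))  # True: softmax keeps the order of the logits

F = np.array([[0.3, 2.1, -1.0],
              [1.5, 0.2, 0.9]])   # one row of logits per sample
print(np.argmax(F, axis=1) + 1)   # [2 1]
```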

## Weight dimensions

- The output of each layer is the input of the next, so $\mathbf W_l\in \mathbb R^{p_l\times p_{l-1}}$, where $p_l$ is the number of outputs of layer $l$ and $p_{l-1}$ is the number of outputs of layer $l-1$.
- By convention, $p_0 = d+1$ is the number of inputs to the network ($d$ features plus a constant 1 for the bias).
- Last layer: $p_L = C$ is the number of classes in the classification problem.
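A sketch of how these shapes chain together. It assumes, as one reading of $p_0 = d+1$, that a constant 1 is appended to each input so the first layer's bias lives inside $\mathbf W_1$; the layer sizes are arbitrary and the activations are omitted because only the shapes matter here.

```python
import numpy as np

d, hidden_sizes, C = 4, [8, 6], 3
p = [d + 1] + hidden_sizes + [C]     # p_0 = d + 1, ..., p_L = C

rng = np.random.default_rng(0)
weights = [rng.standard_normal((p[l], p[l - 1])) for l in range(1, len(p))]
print([W.shape for W in weights])    # [(8, 5), (6, 8), (3, 6)]

x = np.append(rng.standard_normal(d), 1.0)   # augmented input of length d + 1
h = x
for W in weights:
    h = W @ h                        # each W_l maps R^{p_{l-1}} -> R^{p_l}
print(h.shape)                       # (3,)
```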

```python
class TestFullyConnected(unittest.TestCase):

    def test_fc_init(self):
        fc = FC(n_in = 3, n_out = 5, activation = "sigmoid")
        self.assertEqual(fc.n_in, 3)
        self.assertEqual(fc.n_out, 5)
        self.assertEqual(fc.activation, "sigmoid")
        self.assertEqual(fc.W.shape, (3, 5))
        self.assertEqual(fc.dW.shape, (3, 5))

    def test_fc_forward(self):
        fc = FC(n_in = 3, n_out = 5, activation = "sigmoid")
        x = np.zeros((10, 3), dtype=np.float32)
        y = fc.forward(x)
        error = np.sum(np.abs((y - np.ones_like(y) * 0.5)))
        self.assertEqual(y.shape, (10, 5))
        self.assertLess(error, 1e-6)

    def test_fc_forward_identity(self):
        fc = FC(n_in = 3, n_out = 5, activation = None)
        x = np.zeros((10, 3), dtype=np.float32)
        y = fc.forward(x)
        error = np.sum(np.abs(y - np.zeros_like(y)))
        self.assertEqual(y.shape, (10, 5))
        self.assertLess(error, 1e-6)

    def test_fc_backward(self):
        fc = FC(n_in = 3, n_out = 5, activation = "sigmoid")
        x = np.zeros((10, 3), dtype=np.float32)
        y = fc.forward(x)
        dx = fc.backward(np.zeros_like(y))
        self.assertEqual(dx.shape, x.shape)
        self.assertEqual(fc.dW.shape, fc.W.shape)

    def test_fc_backward_identity(self):
        fc = FC(n_in = 3, n_out = 5, activation = None)
        x = np.zeros((10, 3), dtype=np.float32)
        y = fc.forward(x)
        dx = fc.backward(np.zeros_like(y))
        self.assertEqual(dx.shape, x.shape)
        self.assertEqual(fc.dW.shape, fc.W.shape)
```

## Derivatives of the loss (backward pass)

### Step 1. Compute the derivative $\delta_{\boldsymbol\mu}$

$$ \delta_{\boldsymbol\mu} = -\mathbf y^T / \boldsymbol\mu^T\in\mathbb R^{1\times C} $$

<aside> 💡 Note: this is a row vector, and the division is element-wise.

</aside>

### Step 2. Compute the Jacobian $\mathbf J_{\boldsymbol\mu}(\mathbf f)$

Write $\mu_i = u/v$ with $u = e^{f_i}$ and $v = \sum_{c=1}^C e^{f_c}$; the quotient rule gives

$$ \begin{align*} \frac{\partial \mu_i}{\partial f_j} &= \frac{u'v-v'u}{v^2} =\frac{\mathbb I(i=j) e^{f_j}\sum_{c=1}^C e^{f_c}-e^{f_j}e^{f_i}}{(\sum_{c=1}^C e^{f_c})^2}\\ &= \mathbb I(i=j) \mu_j - \mu_i\mu_j=\begin{cases}(1-\mu_i)\mu_i&i=j\\-\mu_i\mu_j&i\neq j\end{cases}\\ \mathbf J_{\boldsymbol\mu}(\mathbf f)&=\begin{bmatrix}(1-\mu_1)\mu_1 &-\mu_1\mu_2 &\cdots&-\mu_1\mu_C\\ -\mu_2\mu_1&(1-\mu_2)\mu_2 &\cdots&-\mu_2\mu_C\\\vdots&\vdots&&\vdots\\-\mu_C\mu_1 &-\mu_C\mu_2 &\cdots&(1-\mu_C)\mu_C\end{bmatrix}=\mathrm{diag}(\boldsymbol\mu)-\boldsymbol\mu\boldsymbol\mu^T\in\mathbb R^{C\times C} \end{align*} $$
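A sketch that builds the softmax Jacobian as $\mathrm{diag}(\boldsymbol\mu)-\boldsymbol\mu\boldsymbol\mu^T$ and compares it against central finite differences; `softmax_jacobian` is a hypothetical helper name and the logits are arbitrary. Each column of the Jacobian sums to zero because the probabilities always sum to 1.

```python
import numpy as np

def softmax(f):
    e = np.exp(f - f.max())
    return e / e.sum()

def softmax_jacobian(f):
    # J[i, j] = d mu_i / d f_j = mu_i * (I(i == j) - mu_j)
    mu = softmax(f)
    return np.diag(mu) - np.outer(mu, mu)

f = np.array([0.5, -1.2, 2.0, 0.3])
J = softmax_jacobian(f)

# Central finite differences, one column j at a time
eps = 1e-6
J_num = np.zeros_like(J)
for j in range(f.size):
    step = np.zeros_like(f)
    step[j] = eps
    J_num[:, j] = (softmax(f + step) - softmax(f - step)) / (2 * eps)

print(np.allclose(J, J_num, atol=1e-7))  # True
```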

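### Step 3. Compute the derivative $\delta_{\mathbf f}$

$$ \delta_{\mathbf f_L}=\delta_{\mathbf f} = \delta_{\boldsymbol\mu}\mathbf J_{\boldsymbol\mu}(\mathbf f)\in\mathbb R^{1\times C} $$

For softmax combined with cross-entropy this simplifies to $\delta_{\mathbf f} = \boldsymbol\mu^T-\mathbf y^T$: since $\mathbf J_{\boldsymbol\mu}(\mathbf f)=\mathrm{diag}(\boldsymbol\mu)-\boldsymbol\mu\boldsymbol\mu^T$, we get $\delta_{\mathbf f} = -\mathbf y^T + (\mathbf y^T\mathbf 1)\boldsymbol\mu^T$, and $\mathbf y^T\mathbf 1 = 1$ for a one-hot $\mathbf y$.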
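The shortcut can be checked numerically: chaining $\delta_{\boldsymbol\mu}$ with the softmax Jacobian by hand should reproduce $\boldsymbol\mu - \mathbf y$. The logits and the label are arbitrary example values.

```python
import numpy as np

def softmax(f):
    e = np.exp(f - f.max())
    return e / e.sum()

C = 4
f = np.array([0.5, -1.2, 2.0, 0.3])   # logits
y = np.eye(C)[2]                       # one-hot label (third class)
mu = softmax(f)

delta_mu = -y / mu                     # row vector -y^T / mu^T (element-wise)
J = np.diag(mu) - np.outer(mu, mu)     # softmax Jacobian J_mu(f)
delta_f = delta_mu @ J                 # chain rule

print(np.allclose(delta_f, mu - y))    # True
```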

### Step 4. Compute the Jacobian $\mathbf J_{\mathbf f_l}(\mathbf a_l)$

For the linear activation $\phi_l(a) = a$: $\mathbf J_{\mathbf f_l}(\mathbf a_l)=\mathbf I\in\mathbb R^{p_l\times p_l}$.

For the sigmoid activation $\phi_l(a) = \sigma(a)$: $\mathbf J_{\mathbf f_l}(\mathbf a_l)=\mathrm{diag}(f_{l1}(1-f_{l1}),f_{l2}(1-f_{l2}),\ldots,f_{lp_l}(1-f_{lp_l}))$, using $\sigma'(a)=\sigma(a)(1-\sigma(a))$.

💡 Because the activation function is applied to each element of $\mathbf a_l$ independently, $\mathbf J_{\mathbf f_l}(\mathbf a_l)$ is a diagonal matrix.
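A sketch checking that the sigmoid Jacobian is diagonal with entries $f_i(1-f_i)$, and that multiplying a row vector by it is the same as an element-wise product (the form the `FC` class uses). The input values and the upstream vector `g` are arbitrary.

```python
import numpy as np

def sigmoid(a):
    z = np.exp(-np.abs(a))
    return np.where(a >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

a = np.array([-2.0, 0.0, 1.5])
f = sigmoid(a)
J = np.diag(f * (1 - f))                 # J_f(a) = diag(f_i (1 - f_i))

# Finite-difference Jacobian: column j perturbs a_j only
eps = 1e-6
J_num = np.column_stack([
    (sigmoid(a + eps * e) - sigmoid(a - eps * e)) / (2 * eps) for e in np.eye(a.size)
])
print(np.allclose(J, J_num))             # True: off-diagonal entries are 0

# Multiplying by a diagonal matrix is just an element-wise product
g = np.array([0.1, -0.4, 2.0])           # an arbitrary upstream row vector delta_f
print(np.allclose(g @ J, g * f * (1 - f)))  # True
```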

### Step 5. Compute the Jacobian $\mathbf J_{\mathbf a_l}(\mathbf f_{l-1})$

$$ \mathbf a_l = \mathbf W_l \mathbf f_{l-1} $$

$$ \mathbf J_{\mathbf a_l}(\mathbf f_{l-1}) = \mathbf W_l $$

### Step 6. Compute the derivative of $\mathbf a_l$ with respect to $\mathbf W_l$

Since $a_{li} = \sum_{k} W_{l,ik} f_{l-1,k}$ depends only on row $i$ of $\mathbf W_l$,

$$ \frac {\partial a_{li}}{\partial\mathbf W_l}=\mathbf e_i\mathbf f_{l-1}^T=\begin{bmatrix}0 &\cdots&0\\\vdots&&\vdots\\ f_{l-1,1}&\cdots&f_{l-1,p_{l-1}}\\\vdots&&\vdots\\0&\cdots&0\end{bmatrix}\in\mathbb R^{p_l\times p_{l-1}} $$

Here $\mathbf e_i$ is the $i$-th standard basis vector of $\mathbb R^{p_l}$: the matrix is all zeros except row $i$, which equals the input $\mathbf f_{l-1}^T$.
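A finite-difference check of this derivative, where $\mathbf e_i$ denotes the $i$-th standard basis vector and `np.outer(e_i, f_prev)` builds $\mathbf e_i\mathbf f_{l-1}^T$. The sizes and the choice `i = 1` (0-based, so the second component) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
p_prev, p_l = 3, 4
W = rng.standard_normal((p_l, p_prev))
f_prev = rng.standard_normal(p_prev)

i = 1                                  # which component a_{l,i} to differentiate
e_i = np.eye(p_l)[i]
dai_dW = np.outer(e_i, f_prev)         # zeros except row i, which equals f_prev

# Finite differences: perturb each entry of W and watch a_i = (W @ f_prev)[i]
eps = 1e-6
num = np.zeros_like(W)
for r in range(p_l):
    for c in range(p_prev):
        step = np.zeros_like(W)
        step[r, c] = eps
        num[r, c] = ((W + step) @ f_prev - (W - step) @ f_prev)[i] / (2 * eps)

print(np.allclose(dai_dW, num))        # True
print(np.allclose(dai_dW[i], f_prev))  # True
```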

### Step 7. Compute the derivatives $\delta_{\mathbf a_l}, \delta_{\mathbf W_l}, \delta_{\mathbf f_{l-1}}$

$$ \begin{align*} \delta_{\mathbf a_l} &= \delta_{\mathbf f_l}\mathbf J_{\mathbf f_l}(\mathbf a_l)\in\mathbb R^{1\times p_l}\\ \delta_{\mathbf W_l} &= \delta_{\mathbf a_l}^T\mathbf f_{l-1}^T \in\mathbb R^{p_l\times p_{l-1}}\\ \delta_{\mathbf f_{l-1}}&=\delta_{\mathbf a_l}\mathbf W_l\in\mathbb R^{1\times p_{l-1}} \end{align*} $$

<aside> 💡 The first formula multiplies by a diagonal matrix, so in practice it is computed as an element-wise product: $\delta_{\mathbf a_l} = \delta_{\mathbf f_l}\odot\phi_l'(\mathbf a_l)^T$.

</aside>

💡 The second formula follows from the component-wise chain rule $\frac{\partial \ell}{\partial\mathbf W_l}=\sum_{i=1}^{p_l}\frac{\partial\ell}{\partial a_{li}}\frac {\partial a_{li}}{\partial\mathbf W_l}$: row $i$ of the result is $\frac{\partial\ell}{\partial a_{li}}\mathbf f_{l-1}^T$, and stacking these rows gives the outer product $\delta_{\mathbf a_l}^T\mathbf f_{l-1}^T$.
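Putting the seven steps together, here is a minimal end-to-end sketch for one sample and one hidden layer (sigmoid hidden layer, linear output layer, softmax cross-entropy), with the analytic gradients compared against finite differences. All names (`loss`, `grads`, `numerical_grad`) and sizes are illustrative assumptions; it follows the column-vector convention of the formulas rather than the `FC` class.

```python
import numpy as np

def sigmoid(a):
    z = np.exp(-np.abs(a))
    return np.where(a >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

def softmax(f):
    e = np.exp(f - f.max())
    return e / e.sum()

def loss(W1, W2, x, y):
    f1 = sigmoid(W1 @ x)               # hidden layer (sigmoid)
    f2 = W2 @ f1                       # output layer (linear) -> logits
    return -y @ np.log(softmax(f2))    # cross-entropy

def grads(W1, W2, x, y):
    # Forward pass, keeping intermediate values
    f0 = x
    f1 = sigmoid(W1 @ f0)
    mu = softmax(W2 @ f1)
    # Output layer: delta_f = mu - y; linear activation, so delta_a2 = delta_f
    delta_a2 = mu - y
    dW2 = np.outer(delta_a2, f1)       # delta_a^T f_{l-1}^T
    delta_f1 = delta_a2 @ W2           # delta_a W_l
    # Hidden layer: diagonal Jacobian applied as an element-wise product
    delta_a1 = delta_f1 * f1 * (1 - f1)
    dW1 = np.outer(delta_a1, f0)
    return dW1, dW2

def numerical_grad(fun, W, eps=1e-6):
    g = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        step = np.zeros_like(W)
        step[idx] = eps
        g[idx] = (fun(W + step) - fun(W - step)) / (2 * eps)
    return g

rng = np.random.default_rng(3)
d, h, C = 4, 5, 3
x = np.append(rng.standard_normal(d), 1.0)   # p_0 = d + 1
y = np.eye(C)[0]
W1 = rng.standard_normal((h, d + 1))
W2 = rng.standard_normal((C, h))

dW1, dW2 = grads(W1, W2, x, y)
num1 = numerical_grad(lambda W: loss(W, W2, x, y), W1)
num2 = numerical_grad(lambda W: loss(W1, W, x, y), W2)
print(np.allclose(dW1, num1, atol=1e-6), np.allclose(dW2, num2, atol=1e-6))  # True True
```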

```python
class FC:
    """Fully connected layer without bias.

    Uses the row-vector (batch) convention a = x @ W, so W has shape
    (n_in, n_out), i.e. the transpose of W_l in the formulas above.
    """

    def __init__(self, n_in, n_out, activation=None):
        self.n_in = n_in
        self.n_out = n_out
        self.activation = activation
        self.W = np.random.randn(n_in, n_out)
        self.dW = np.zeros_like(self.W, dtype=np.float32)

    @staticmethod
    def stable_sigmoid(X):
        # exp(-|X|) never overflows, so neither branch of np.where
        # produces inf/nan warnings (np.where evaluates both branches).
        z = np.exp(-np.abs(X))
        return np.where(X >= 0, 1 / (1 + z), z / (1 + z))

    def __activation(self, a):
        if self.activation in (None, "linear"):
            f = a.copy()
        elif self.activation == "sigmoid":
            f = self.stable_sigmoid(a)
        else:
            raise NotImplementedError(f"{self.activation} has not been implemented yet")
        return f

    def __deactivation(self, output_grad, f):
        # Multiply the upstream gradient by the diagonal Jacobian J_f(a),
        # implemented as an element-wise product.
        if self.activation in (None, "linear"):
            da = output_grad.copy()
        elif self.activation == "sigmoid":
            da = f * (1 - f) * output_grad
        else:
            raise NotImplementedError(f"{self.activation} has not been implemented yet")
        return da

    def forward(self, x):
        # x: (n_samples, n_in), W: (n_in, n_out) -> f: (n_samples, n_out)
        self.x = x.copy()
        self.a = x @ self.W
        self.f = self.__activation(self.a)
        return self.f

    def backward(self, output_grad):
        # output_grad: dloss/df, shape (n_samples, n_out)
        da = self.__deactivation(output_grad, self.f)
        # Sum over samples of outer(x_n, da_n) -> (n_in, n_out)
        self.dW = self.x.T @ da
        # Gradient w.r.t. the layer input -> (n_samples, n_in)
        self.dx = da @ self.W.T
        return self.dx
```
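The `FC` class stores `W` with shape `(n_in, n_out)` and processes a batch of row vectors, so its weight gradient is the transpose of $\delta_{\mathbf W_l}$ from the formulas, summed over samples. A small sketch with arbitrary sizes confirming that the batched product equals the sum of per-sample outer products (and matches the `einsum` form):

```python
import numpy as np

rng = np.random.default_rng(4)
N, n_in, n_out = 6, 3, 5
x = rng.standard_normal((N, n_in))      # one sample per row
da = rng.standard_normal((N, n_out))    # upstream gradient w.r.t. a, one row per sample

dW_batched = x.T @ da                                    # batched weight gradient
dW_loop = sum(np.outer(x[n], da[n]) for n in range(N))   # explicit per-sample sum
print(np.allclose(dW_batched, dW_loop))                  # True
print(np.allclose(np.einsum("ij,ik->jk", x, da), dW_batched))  # True
```

Each term `np.outer(x[n], da[n])` is the transpose of $\delta_{\mathbf a_l}^T\mathbf f_{l-1}^T$, matching the transposed storage of `W`.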

```python
fc = FC(n_in=3, n_out=5, activation="sigmoid")
x = np.ones((4, 3))
y = fc.forward(x)                   # shape (4, 5)
dx = fc.backward(np.ones_like(y))   # shape (4, 3); also sets fc.dW with shape (3, 5)
```

```python
unittest.main(argv=[""], verbosity=2, exit=False)
```

```
test_fc_backward (__main__.TestFullyConnected) ... ok
test_fc_backward_identity (__main__.TestFullyConnected) ... ok
test_fc_forward (__main__.TestFullyConnected) ... ok
test_fc_forward_identity (__main__.TestFullyConnected) ... ok
test_fc_init (__main__.TestFullyConnected) ... ok

----------------------------------------------------------------------
Ran 5 tests in 0.026s

OK
```

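```python
# Toy example: build a diagonal matrix from element-wise values a * (1 - a).
# `a` holds arbitrary integers, not real sigmoid outputs, hence the negative entries.
a = np.array([[1, 2], [3, 4]])
np.diag((a * (1 - a)).ravel())
```

```
array([[  0,   0,   0,   0],
       [  0,  -2,   0,   0],
       [  0,   0,  -6,   0],
       [  0,   0,   0, -12]])
```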
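The cell above builds an explicit diagonal matrix. A sketch with random example values showing that multiplying by such a diagonal Jacobian gives the same result as the element-wise product used in the sigmoid branch of `FC`, which avoids building an $n\times n$ matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
f = 1 / (1 + np.exp(-rng.standard_normal((2, 3))))  # sigmoid outputs, 2 samples x 3 units
g = rng.standard_normal((2, 3))                      # upstream gradient, same shape

# Explicit route: one (3 x 3) diagonal Jacobian per sample
da_explicit = np.stack([g[n] @ np.diag(f[n] * (1 - f[n])) for n in range(2)])

# Element-wise route: f * (1 - f) * output_grad
da_elementwise = f * (1 - f) * g
print(np.allclose(da_explicit, da_elementwise))      # True

# Flattening the whole batch, as in the cell above, gives one (6 x 6) diagonal matrix
D = np.diag((f * (1 - f)).ravel())
print(np.allclose(g.ravel() @ D, da_elementwise.ravel()))  # True
```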