In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import math
from sklearn.metrics import accuracy_score
from sklearn import metrics
from sklearn.metrics import confusion_matrix
import sys,os
sys.path.append(r"C:\Users\anai\dive\oreilly\deep-learning-from-scratch\common")
from utils import im2col, col2im

In [None]:
sys.path

# [Problem 1] Creating a 2-D convolutional layer

Extend the 1D convolutional layer class Conv1d to create a 2D convolutional layer class, Conv2d.

The formula for forward propagation is as follows:
$$a_{i, j, m} = \sum_{k=0}^{K-1} \sum_{s=0}^{F_h -1}\sum_{t=0}^{F_w -1} x_{(i+s),(j+t),k}w_{s,t,k,m} + b_m$$

$a_{i,j,m}$ : value of row i, column j and channel m of the output array

$i$ : index of the array in the row direction

$j$ : Array column index

$m$ : index of the output channel

$K$ : Number of input channels

$F_h,F_w$ : Size of the filter in height (h) and width (w) direction

$x_{(i+s),(j+t),k}$ : value at row (i+s), column (j+t), channel k of the input array

$w_{s,t,k,m}$ : weight at row s, column t of the filter, connecting input channel k to output channel m

$b_m$ : bias term of the output to the m-channel.

All are scalars.
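To make the index bookkeeping concrete, the forward formula can be transcribed directly into loops. This is a minimal NumPy sketch, not the im2col-based class used later in this notebook; it assumes a single sample stored channel-last as in the formula (`x` of shape (H, W, K), `w` of shape (F_h, F_w, K, M)), stride 1 and no padding. The function name `conv2d_forward_naive` is hypothetical.

```python
import numpy as np

def conv2d_forward_naive(x, w, b):
    """Direct transcription of a_{i,j,m} = sum_k sum_s sum_t x_{(i+s),(j+t),k} w_{s,t,k,m} + b_m.

    x: (H, W, K) input, w: (F_h, F_w, K, M) weights, b: (M,) bias.
    Returns a: (H - F_h + 1, W - F_w + 1, M). Stride 1, no padding.
    """
    H, W, K = x.shape
    F_h, F_w, _, M = w.shape
    out_h, out_w = H - F_h + 1, W - F_w + 1
    a = np.zeros((out_h, out_w, M))
    for i in range(out_h):
        for j in range(out_w):
            # Input patch covering rows i..i+F_h-1 and columns j..j+F_w-1, all channels
            patch = x[i:i + F_h, j:j + F_w, :]
            for m in range(M):
                # Elementwise product summed over s, t and k, plus the bias of channel m
                a[i, j, m] = np.sum(patch * w[:, :, :, m]) + b[m]
    return a

# Usage: a 4x4 single-channel input with a 2x2 all-ones filter
x = np.arange(16, dtype=float).reshape(4, 4, 1)
w = np.ones((2, 2, 1, 1))
b = np.zeros(1)
a = conv2d_forward_naive(x, w, b)
print(a.shape)     # (3, 3, 1)
print(a[0, 0, 0])  # 0 + 1 + 4 + 5 = 10.0
```

Each output value is the sum of one 2×2 window, so `a[0, 0, 0]` adds the input values 0, 1, 4 and 5.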

Next is the update formula, which has the same form as for the 1D convolutional and fully connected layers.
$$w'_{s, t, k, m} = w_{s, t, k, m} - \alpha \frac{\partial L}{\partial w_{s, t, k, m}}$$

$$b'_m = b_m - \alpha \frac{\partial L}{\partial b_m}$$

$\alpha$ : Learning rate

$\frac{\partial L}{\partial w_{s, t, k, m}}$: Gradient of loss $L$ with respect to $w_{s,t,k,m}$.

$\frac{\partial L}{\partial b_m}$ : Gradient of loss $L$ with respect to $b_m$.

The backpropagation formulas for the gradients $\frac{\partial L}{\partial w_{s, t, k, m}}$ and $\frac{\partial L}{\partial b_m}$ are as follows:
$$\frac{\partial L}{\partial w_{s, t, k, m}} = \sum_{i=0}^{N_{out, h}-1} \sum_{j=0}^{N_{out, w}-1} \frac{\partial L}{\partial a_{i,j,m}} x_{(i+s),(j+t),k}$$

$$\frac{\partial L}{\partial b_m} = \sum_{i=0}^{N_{out, h}-1} \sum_{j=0}^{N_{out, w}-1}\frac{\partial L}{\partial a_{i,j,m}}$$

$\frac{\partial L}{\partial a_{i,j,m}}$ : The value of the i-th row, j-th column and m-channel of the gradient array

$N_{out,h},N_{out,w}$ : Size of the output in height direction (h) and width direction (w)

The error to be passed to the previous layer is given by:
$$\frac{\partial L}{\partial x_{i,j,k}} = \sum_{m=0}^{M-1} \sum_{s=0}^{F_h-1} \sum_{t=0}^{F_w-1} \frac{\partial L}{\partial a_{(i-s),(j-t),m}}w_{s, t, k, m}$$

$\frac{\partial L}{\partial x_{i,j,k}}$ : value at row i, column j, channel k of the error array passed to the previous layer

$M$ : Number of output channels

Any term whose output index falls outside the output array, that is, when $i-s<0$, $i-s>N_{out,h}-1$, $j-t<0$ or $j-t>N_{out,w}-1$, is treated as zero:
$$\frac{\partial L}{\partial a_{(i-s),(j-t),m}}w_{s, t, k, m}=0$$
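The three gradient formulas can be checked numerically. The sketch below uses hypothetical function names and the same assumptions as the forward formula (single sample, channel-last layout, stride 1, no padding). It accumulates $\frac{\partial L}{\partial x}$ by scattering each $\frac{\partial L}{\partial a_{i,j,m}} w_{s,t,k,m}$ back over the input patch it came from; this produces the same sum as the formula above, and out-of-range terms are simply never generated. A central finite difference on $L = \sum a \cdot g$ (so that $\frac{\partial L}{\partial a} = g$) confirms the result.

```python
import numpy as np

def conv2d_forward_naive(x, w, b):
    """x: (H, W, K), w: (F_h, F_w, K, M), b: (M,) -> a: (H-F_h+1, W-F_w+1, M)."""
    F_h, F_w, _, M = w.shape
    out_h, out_w = x.shape[0] - F_h + 1, x.shape[1] - F_w + 1
    a = np.zeros((out_h, out_w, M))
    for i in range(out_h):
        for j in range(out_w):
            for m in range(M):
                a[i, j, m] = np.sum(x[i:i + F_h, j:j + F_w, :] * w[:, :, :, m]) + b[m]
    return a

def conv2d_backward_naive(x, w, da):
    """Return (dL/dw, dL/db, dL/dx) given da = dL/da of shape (out_h, out_w, M)."""
    F_h, F_w, _, M = w.shape
    out_h, out_w, _ = da.shape
    dw = np.zeros(w.shape)
    dx = np.zeros(x.shape)
    db = da.sum(axis=(0, 1))  # dL/db_m = sum_i sum_j dL/da_{i,j,m}
    for i in range(out_h):
        for j in range(out_w):
            for m in range(M):
                # dL/dw_{s,t,k,m} += dL/da_{i,j,m} * x_{(i+s),(j+t),k}
                dw[:, :, :, m] += da[i, j, m] * x[i:i + F_h, j:j + F_w, :]
                # dL/dx_{(i+s),(j+t),k} += dL/da_{i,j,m} * w_{s,t,k,m}
                dx[i:i + F_h, j:j + F_w, :] += da[i, j, m] * w[:, :, :, m]
    return dw, db, dx

# Gradient check with L = sum(a * g), so that dL/da = g
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4, 2))
w = rng.normal(size=(3, 2, 2, 3))
b = rng.normal(size=3)
g = rng.normal(size=(3, 3, 3))
dw, db, dx = conv2d_backward_naive(x, w, g)

def loss(x_, w_):
    return np.sum(conv2d_forward_naive(x_, w_, b) * g)

eps = 1e-6
e_x = np.zeros_like(x); e_x[2, 1, 0] = eps
e_w = np.zeros_like(w); e_w[1, 0, 1, 2] = eps
num_dx = (loss(x + e_x, w) - loss(x - e_x, w)) / (2 * eps)
num_dw = (loss(x, w + e_w) - loss(x, w - e_w)) / (2 * eps)
print(np.isclose(num_dx, dx[2, 1, 0]), np.isclose(num_dw, dw[1, 0, 1, 2]))  # True True
```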

# [Problem 2] Experiments with 2D convolutional layers on small arrays

Convolution changes the size of the feature map. The formulas below give the output size; create a function that performs this calculation. When the division is not exact, the result is rounded down.
$$N_{h, out} = \frac{N_{h, in} + 2P_h - F_h}{S_h} + 1$$

$$N_{w, out} = \frac{N_{w, in} + 2P_w - F_w}{S_w} + 1$$

$N_{out}$ : size of the output (number of features)

$N_{in}$ : size of the input (number of features)

$P$ : number of padding pixels added to each side in that direction

$F$ : size of the filter

$S$ : stride

The subscripts $h$ and $w$ denote the height and width directions.
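A minimal sketch of the requested function, assuming integer sizes. Floor division (`//`) rounds the result down when the filter does not fit evenly, which matches the `int(...)` truncation in the `_output_size` method below for non-negative values. The names `conv_output_size` and `conv2d_output_size` are illustrative.

```python
def conv_output_size(n_in, f, p=0, s=1):
    """N_out = (N_in + 2P - F) / S + 1 along one axis, rounded down."""
    return (n_in + 2 * p - f) // s + 1

def conv2d_output_size(n_h_in, n_w_in, f_h, f_w, p_h=0, p_w=0, s_h=1, s_w=1):
    """Return (N_h_out, N_w_out) for a 2D convolution."""
    return (conv_output_size(n_h_in, f_h, p_h, s_h),
            conv_output_size(n_w_in, f_w, p_w, s_w))

# Usage: a 28x28 MNIST image with a 3x3 filter, stride 1, no padding
print(conv2d_output_size(28, 28, 3, 3))                # (26, 26)
# Same image with padding 1 on each side keeps the size
print(conv2d_output_size(28, 28, 3, 3, p_h=1, p_w=1))  # (28, 28)
```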

In [29]:
class Conv2d:
    """
    All coupling layers from n_nodes1 to n_nodes2
    Parameters
    ----------
    n_nodes1 : int
      Number of nodes in previous layer
    n_nodes2 : int
      Number of nodes in the next layer
    initializer : instance of initialization method
    optimizer : instance of optimisation method
    """
    def __init__(self):
        self.P = 0
        self.Str = 1
        self.a = np.array([])
        self.dW = np.array([])
        self.dX = np.array([])
        #self.s=None
        
    def forward(self, X, W ,B):
        """
        Forward
        Parameters
        ----------
        X : ndarray of the following form, shape (batch_size, n_nodes_bf)
            Input
        Returns
        ----------
        A : ndarray of the following form, shape (batch_size, n_nodes_af)
            Outputs
        """
        self.X = X
        self.XN, self.XC, self.XH, self.XW = self.X.shape
        
        self.W = W 
        self.FN, self.FC, self.FH, self.FW = self.W.shape
        self.B = B
        self.s = self.W.shape[3]
        
        self._output_size()
        
        self.Wre = self.W.reshape(self.FN,-1)
        #self.Bre = np.array([self.B] * self.Nhout * self.Nwout).reshape(-1,1)
        self.Bre = self.B.reshape(1,-1)
        self.col = im2col(self.X , self.FH, self.FW, self.Str, self.P)
        
        display(self.col)
        display(self.Wre.T)
        #display(self.Bre)
        display(self.col @ self.Wre.T)
        self.Are = (self.col @ self.Wre.T) + self.Bre
        display(self.Are)
        self.a = self.Are.reshape(self.XN, self.Nhout, self.Nwout,-1).transpose(0,3,1,2)
        display(self.a.shape)
        return self.a
    
    def _output_size(self):
        self.Nhout = int((self.XH + 2*self.P - self.FH) / self.Str + 1)       
        self.Nwout = int((self.XW + 2*self.P - self.FW) / self.Str + 1)
        
    def backward(self, dA):
        """
        Backward
        Parameters
        ----------
        dA : ndarray of the following form, shape (batch_size, n_nodes2)
            The gradient flowed from behind
        Returns
        ----------
        dZ : ndarray of the next shape, shape (batch_size, n_nodes1)
            Gradient flowing forward
        """
        self.dA=dA.transpose(0,2,3,1).reshape(-1,self.FN)
        self.dB = np.sum(self.dA, axis=0)
        
        display(self.dA)
        display(self.col.T)
        self.dW = np.dot(self.col.T , self.dA)
        display(self.dW)
        self.dW = self.dW.transpose(1, 0).reshape(self.FN, self.FC, self.FH, self.FW)
        display(self.dW)
        
        self.dXre = np.dot(self.dA, self.Wre) 
        self.dX = col2im(self.dXre, self.X.shape, self.FH, self.FW, self.Str, self.P)

        return self.dX

In [30]:
x = np.array([[2, 3, 4, 5], [1, 2, 3, 4]]) # shape(2, 4), where (number of input channels, number of features).
w = np.array([[[[1,1,1],
            [1,1,1]],
            [[1,1,1],
            [2,1,1]],
            [[2,1,1],
            [1,1,2]]]])
w=w.transpose(1,0,2,3)
display(w.shape)
b = np.array([3, 2, 1]) # (Number of output channels)
b=np.array([b]*1)
x = np.array([[x]*1]*1)
x.shape

(3, 1, 2, 3)

(1, 1, 2, 4)

In [None]:
dnn2 = Conv2d()
dnn2.forward(x,w,b)

In [None]:
dA = np.array([[[52,56]],
            [[32,35]],
            [[9,11]]])
dA = np.array([dA]*1)
dA.shape

In [None]:
dnn2.backward(dA)

In [None]:
ten_2 = np.array([[[1., 2., 3., 4.],
                 [1., 2., 3., 4.]],
                [[2., 3., 4., 5.],
                 [2., 3., 4., 5.]]])

ten_2 = np.expand_dims(ten_2, 0)
print('ten_2.shape{}'.format(ten_2.shape))
print('ten_2{}'.format(ten_2)) # (1, 2, 2, 4)

kernel = np.array([[[[1., 2., 1.],
                    [1., 1., 1.],
                    [2., 1., 1.]]],
                   [[[2., 1., 1.],
                    [1., 1., 1.],
                    [1., 1., 1.]]]])

kernel = kernel.transpose(3,0,1,2)
print('kernel.shape{}'.format(kernel.shape))
print('kernel{}'.format(kernel))

bias = np.array([[1],[2],[3]])
print('bias.shape{}'.format(bias.shape))

print('bias{}'.format(bias))

In [None]:
dnn3 = Conv2d()
dnn3.forward(ten_2,kernel,bias)

In [None]:
loss_2 = np.array([[[[9., 11.],
                   [9., 11.]],
                  [[32., 35.],
                   [32., 35.]],
                  [[52., 56.],
                   [52., 56.]]]])
display(loss_2.shape)
dnn3.backward(loss_2)

# [Problem 4] Creating a maximum pooling layer

$$a_{i,j,k} = \max_{(p,q)\in P_{i,j}} x_{p,q,k}$$

$P_{i,j}$ : the set of input indices pooled into output row i, column j: rows $(p)$ and columns $(q)$ within an $S_h×S_w$ window.

$S_h,S_w$ : stride in the height $(h)$ and width $(w)$ directions (here equal to the window size).

$(p,q)\in P_{i,j}$ : row $(p)$ and column $(q)$ indices in $P_{i,j}$.

$a_{i,j,k}$ : value at row i, column j, channel k of the output array

$x_{p,q,k}$ : value at row $p$, column $q$, channel $k$ of the input array

The maximum is taken within each window, separately for each channel, so the channel axis is preserved.

For backpropagation, the index $(p,q)$ of each maximum must be stored during forward propagation, because the incoming error is passed unchanged to the position that held the maximum and all other positions receive zero.
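The forward rule and the stored index can be sketched with plain loops. This assumes a single sample in channel-first layout (`x` of shape (C, H, W), matching the classes in this notebook) and non-overlapping windows whose stride equals the window size; rows or columns that do not fill a whole window are dropped. The function names are hypothetical, and the usage reuses one channel of the `xin` values below.

```python
import numpy as np

def maxpool2d_forward(x, ph=2, pw=2):
    """x: (C, H, W) -> out: (C, H//ph, W//pw), plus the (p, q) index of each maximum."""
    C, H, W = x.shape
    out_h, out_w = H // ph, W // pw
    out = np.zeros((C, out_h, out_w), dtype=x.dtype)
    argmax = np.zeros((C, out_h, out_w, 2), dtype=int)
    for c in range(C):
        for i in range(out_h):
            for j in range(out_w):
                window = x[c, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
                # Position of the maximum inside the window
                p, q = np.unravel_index(np.argmax(window), window.shape)
                out[c, i, j] = window[p, q]
                # Store it as an index into the full input
                argmax[c, i, j] = (i * ph + p, j * pw + q)
    return out, argmax

def maxpool2d_backward(dout, argmax, x_shape):
    """Send each upstream gradient to the stored maximum; every other position gets 0."""
    dx = np.zeros(x_shape)
    C, out_h, out_w = dout.shape
    for c in range(C):
        for i in range(out_h):
            for j in range(out_w):
                p, q = argmax[c, i, j]
                dx[c, p, q] += dout[c, i, j]
    return dx

x = np.array([[[1, 3, 2, 9],
               [7, 4, 1, 5],
               [8, 5, 2, 3],
               [4, 2, 1, 4]]])
out, argmax = maxpool2d_forward(x)
print(out.tolist())  # [[[7, 9], [8, 4]]]
dx = maxpool2d_backward(np.array([[[1, 2], [3, 4]]]), argmax, x.shape)
print(dx)  # nonzero only where 7, 9, 8 and 4 sit in x
```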

In [None]:
class Pooling:
    """
    All coupling layers from n_nodes1 to n_nodes2
    Parameters
    ----------
    n_nodes1 : int
      Number of nodes in previous layer
    n_nodes2 : int
      Number of nodes in the next layer
    initializer : instance of initialization method
    optimizer : instance of optimisation method
    """
    def __init__(self, PH ,PW):
        self.P = 0
        self.Str = 2
        self.PH = PH
        self.PW = PW
        
    def forward(self, X):
        """
        Forward propagation
        Parameters
        ----------
        X : ndarray, shape (batch_size, n_channels, height, width)
            Input
        Returns
        ----------
        p_out : ndarray, shape (batch_size, n_channels, out_h, out_w)
            Output
        """
        self.X = X
        self.XN, self.XC, self.XH, self.XW = self.X.shape
        # Treat every channel of every sample as a separate single-channel image
        self.Xre = self.X.reshape(-1, 1, self.XH, self.XW)
        # Each row of pcol holds one PH*PW window
        self.pcol = im2col(self.Xre , self.PH, self.PW, self.Str, self.P)
        # Remember where each maximum was, for backpropagation
        self.p_index = np.argmax(self.pcol, axis=1)
        self.p_out_re = np.max(self.pcol, axis=1)
        self._out_psize()
        # Rows of pcol are ordered (batch, channel, out_h, out_w)
        self.p_out = self.p_out_re.reshape(self.XN, -1, self.Phout, self.Pwout)
        
        return self.p_out
    
    def _out_psize(self):
        self.Phout = int((self.XH + 2*self.P - self.PH) / self.Str + 1)       
        self.Pwout = int((self.XW + 2*self.P - self.PW) / self.Str + 1)
    
    def backward(self, dPin):
        """
        Backward propagation
        Parameters
        ----------
        dPin : ndarray, shape (batch_size, n_channels, out_h, out_w)
            Gradient flowing back from the next layer
        Returns
        ----------
        dbackP : ndarray, shape (batch_size, n_channels, height, width)
            Gradient routed to the positions of the maxima; zero elsewhere
        """
        # Rows of self.pcol are ordered (batch, channel, out_h, out_w), matching dPin.ravel()
        self.backP = np.zeros(self.pcol.shape)
        self.backP[np.arange(self.p_index.size), self.p_index] = dPin.ravel()
        self.dbackP = col2im(self.backP, self.Xre.shape, self.PH, self.PW, self.Str, self.P)
        self.dbackP = self.dbackP.reshape(self.X.shape)
        return self.dbackP

In [None]:
xin = np.array([[[[1,3,2,9],
                  [7,4,1,5],
                  [8,5,2,3],
                  [4,2,1,4]],
                 
                 [[1,3,2,9],
                  [7,4,1,5],
                  [8,5,2,3],
                  [4,2,1,4]]],
               [[[1,3,2,9],
                  [7,4,1,5],
                  [8,5,2,3],
                  [4,2,1,4]],
                 
                 [[1,3,2,9],
                  [7,4,1,5],
                  [8,5,2,3],
                  [4,2,1,4]]]])
display(xin.shape)
scr_pool=Pooling(2,2)
scr_pool.forward(xin).shape

In [None]:
#dP = np.array([[[[1,2,3],
#                 [4,5,6],
#                 [7,8,9]]]])
dP = np.array([[[[1,2],
                 [3,4]],
                
                [[1,2],
                 [3,4]]],
              [[[1,2],
                 [3,4]],
                
                [[1,2],
                 [3,4]]]])
display(dP.shape)
scr_pool.backward(dP)

# [Problem 5] (Advance task) Creating average pooling

# [Problem 6] Flattening

In [None]:
class Flatten:
    """
    Flatten layer: reshapes (batch_size, n_channels, height, width) to (batch_size, n_channels*height*width)
    """
    def forward(self, X):
        """
        Forward propagation
        Parameters
        ----------
        X : ndarray, shape (batch_size, n_channels, height, width)
            Input
        Returns
        ----------
        flatout : ndarray, shape (batch_size, n_channels*height*width)
            Output
        """
        self.XN, self.XC, self.XH, self.XW = X.shape
        self.flatout = X.reshape(self.XN,-1)
        
        return self.flatout
    
    def backward(self, dPin):
        """
        Backward propagation
        Parameters
        ----------
        dPin : ndarray, shape (batch_size, n_channels*height*width)
            Gradient flowing back from the next layer
        Returns
        ----------
        dFlatten : ndarray, shape (batch_size, n_channels, height, width)
            Gradient reshaped to the input shape
        """
        self.dFlatten = dPin.reshape(self.XN, self.XC, self.XH, self.XW )
        
        return self.dFlatten

In [None]:
scr_flatten = Flatten()
scr_flatten.forward(dnn2.forward(x,w,b))

In [None]:
delta_flat = np.array([1,2,3,4,5,6])
scr_flatten.backward(delta_flat)

# [Problem 7] Training and estimation

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import math
from sklearn.metrics import accuracy_score
from sklearn import metrics
from sklearn.metrics import confusion_matrix

In [None]:
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [None]:
X_train = X_train.astype(np.float64)  # np.float was removed in NumPy 1.24
X_test = X_test.astype(np.float64)
X_train /= 255
X_test /= 255

X_train = X_train[:, np.newaxis,:,:]
X_test = X_test[:, np.newaxis,:,:]

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore', sparse_output=False)  # this argument was named `sparse` before scikit-learn 1.2
y_train_one_hot = enc.fit_transform(y_train[:, np.newaxis])
y_test_one_hot = enc.transform(y_test[:, np.newaxis])

from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train_one_hot, test_size=0.2)

In [None]:
class GetMiniBatch:
    """
    Iterator to retrieve the mini-batch

    Parameters
    ----------
    X : ndarray, shape (n_samples, ...)
      Training data
    y : ndarray, shape (n_samples, ...)
      The correct answer values (one-hot in this notebook)
    batch_size : int
      batch size
    seed : int
      Seed of random number in NumPy
    """
    def __init__(self, X, y, batch_size = 20, seed=0):
        self.batch_size = batch_size
        np.random.seed(seed)
        shuffle_index = np.random.permutation(np.arange(X.shape[0]))
        self._X = X[shuffle_index]
        self._y = y[shuffle_index]
        self._stop = int(np.ceil(X.shape[0]/self.batch_size))  # np.int was removed in NumPy 1.24

    def __len__(self):
        return self._stop

    def __getitem__(self,item):
        p0 = item*self.batch_size
        p1 = item*self.batch_size + self.batch_size
        return self._X[p0:p1], self._y[p0:p1]        

    def __iter__(self):
        self._counter = 0
        return self

    def __next__(self):
        if self._counter >= self._stop:
            raise StopIteration()
        p0 = self._counter*self.batch_size
        p1 = self._counter*self.batch_size + self.batch_size
        self._counter += 1
        return self._X[p0:p1], self._y[p0:p1]

In [None]:
class SimpleInitializer:
    """
    Simple initialization with Gaussian distribution
    Parameters
    ----------
    sigma : float
      Standard deviation of the Gaussian distribution
    """
    def __init__(self, sigma):
        self.sigma = sigma

    def W(self, FC, FN, FH=3, FW=3):
        """
        Initialize the filter weights
        Parameters
        ----------
        FC : int
          Number of input channels
        FN : int
          Number of filters (output channels)
        FH, FW : int
          Filter height and width

        Returns
        ----------
        W : ndarray, shape (FN, FC, FH, FW)
        """
        W = self.sigma * np.random.randn(FN, FC, FH, FW)
        return W
    
    def B(self, FN):
        """
        Bias initialization
        Parameters
        ----------
        n_nodes2 : int
          Number of nodes in the next layer

        Returns
        ----------
        B :
        """
        B = self.sigma * np.random.randn(1, FN)
        return B

In [None]:
class SGD:
    """
    Stochastic gradient descent method
    Parameters
    ----------
    lr : learning rate
    """
    def __init__(self, lr):
        self.lr = lr

    def update(self, dWorB, WorB):
        """
        Update a weight or bias array in place: WorB -= lr * dWorB
        Parameters
        ----------
        dWorB : ndarray
          Gradient of the loss with respect to WorB
        WorB : ndarray
          Weight or bias array before the update

        Returns
        ----------
        WorB : ndarray
          The updated array
        """
        self.WorB = WorB
        self.WorB -= self.lr*dWorB
        return self.WorB

In [None]:
class Activation:
    """
    Activation function tanh
    Parameters
    ----------
    """
    def __init__(self):
        pass
    def tanh_fw(self, X):
        """
        Forward
        Parameters
        ----------
        X : ndarray of the following form, shape (batch_size, n_nodes_bf)
            Input
        Returns
        ----------
        A : ndarray of the following form, shape (batch_size, n_nodes_af)
            Outputs
        """     
        self.A = X
        Z = np.tanh(X)
        return Z

    def tanh_bw(self, dZ):
        """
        Backward
        Parameters
        ----------
        dA : ndarray of the following form, shape (batch_size, n_nodes2)
            The gradient flowed from behind
        Returns
        ----------
        dZ : ndarray of the next shape, shape (batch_size, n_nodes1)
            Gradient flowing forward
        """
        dA = dZ * (1 - np.tanh(self.A)**2)  
        return dA
    
    def softmax_fw(self, X):
        """
        Softmax forward
        Parameters
        ----------
        X : ndarray, shape (batch_size, n_classes)
            Input (logits)
        Returns
        ----------
        Z : ndarray, shape (batch_size, n_classes)
            Class probabilities
        """
        # Subtract the row-wise maximum for numerical stability (does not change the result)
        exp_X = np.exp(X - np.max(X, axis=1, keepdims=True))
        Z = exp_X / np.sum(exp_X, axis=1, keepdims=True)
        return Z

    def softmax_bw(self, Z, y):
        """
        Backward through softmax combined with cross-entropy
        Parameters
        ----------
        Z : ndarray, shape (batch_size, n_classes)
            Softmax output
        y : ndarray, shape (batch_size, n_classes)
            One-hot correct labels
        Returns
        ----------
        dA : ndarray, shape (batch_size, n_classes)
            Gradient of the batch-summed cross-entropy with respect to the softmax input
        """
        dA = Z - y
        return dA

    def entropy(self, Z, y):
        """
        Cross-entropy loss averaged over the batch
        Parameters
        ----------
        Z : ndarray, shape (batch_size, n_classes)
            Softmax output
        y : ndarray, shape (batch_size, n_classes)
            One-hot correct labels
        Returns
        ----------
        L : float
            Loss
        """
        # A small constant avoids log(0)
        L = -1*np.average(np.sum(y * np.log(Z + 1e-7), axis=1), axis=0)
        return L
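`softmax_bw` returns `Z - y`. The finite-difference sketch below (hypothetical helper names, one-hot labels assumed) checks that this is the gradient of the cross-entropy *summed* over the batch with respect to the softmax input. Because `entropy` averages over the batch, the gradient of that averaged loss would be `(Z - y) / batch_size`; using `Z - y` instead is equivalent to scaling the learning rate by the batch size.

```python
import numpy as np

def softmax(X):
    # Row-wise softmax with the maximum subtracted for stability
    e = np.exp(X - X.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def summed_cross_entropy(X, y):
    # Cross-entropy summed (not averaged) over the batch
    return -np.sum(y * np.log(softmax(X)))

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3))   # logits for a batch of 4 samples, 3 classes
y = np.eye(3)[[0, 2, 1, 2]]   # one-hot labels
analytic = softmax(X) - y     # what softmax_bw returns

eps = 1e-6
numeric = np.zeros_like(X)
for idx in np.ndindex(*X.shape):
    E = np.zeros_like(X)
    E[idx] = eps
    numeric[idx] = (summed_cross_entropy(X + E, y) - summed_cross_entropy(X - E, y)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```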

In [None]:
class Xavier:
    """
    Activation function tanh
    Parameters
    ----------
    """
    def __init__(self,n):
        self.n=n
        self.calc = 1/np.sqrt(self.n)

In [None]:
class FC:
    """
    All coupling layers from n_nodes1 to n_nodes2
    Parameters
    ----------
    n_nodes1 : int
      Number of nodes in previous layer
    n_nodes2 : int
      Number of nodes in the next layer
    initializer : instance of initialization method
    optimizer : instance of optimisation method
    """
    def __init__(self, n_nodes1, n_nodes2, sigma, optimizer):
        # Initialization
        # Use the initializer method to initialize self.W and self.B
        self.n_nodes1 = n_nodes1
        self.n_nodes2 = n_nodes2
        self.optimizer = optimizer
        #self.W = initializer.W(self.n_nodes1, self.n_nodes2)
        #self.B = initializer.B(self.n_nodes2)
        display(sigma)
        self.W = sigma * np.random.randn(self.n_nodes1, self.n_nodes2)
        self.B = sigma * np.random.randn(1, self.n_nodes2)
        
    def forward(self, X):
        """
        Forward
        Parameters
        ----------
        X : ndarray of the following form, shape (batch_size, n_nodes_bf)
            Input
        Returns
        ----------
        A : ndarray of the following form, shape (batch_size, n_nodes_af)
            Outputs
        """
        self.X = X
        #display(self.X.shape)
        #display(self.W.shape)
        #display(self.B.shape)
        A = np.dot(self.X, self.W) +  self.B
        #display(A.shape)
        return A

    def backward(self, dA):
        """
        Backward
        Parameters
        ----------
        dA : ndarray of the following form, shape (batch_size, n_nodes2)
            The gradient flowed from behind
        Returns
        ----------
        dZ : ndarray of the next shape, shape (batch_size, n_nodes1)
            Gradient flowing forward
        """
        dB = np.sum(dA, axis=0)
        dW = np.dot(self.X.T, dA)
        dZ = np.dot(dA, self.W.T)     

        # Updates
        #self = self.optimizer.update(dW, dB, self.W, self.W)
        self.W = self.optimizer.update(dW, self.W)
        self.B = self.optimizer.update(dB, self.B)

        return dZ

In [None]:
class Conv2d:
    """
    All coupling layers from n_nodes1 to n_nodes2
    Parameters
    ----------
    n_nodes1 : int
      Number of nodes in previous layer
    n_nodes2 : int
      Number of nodes in the next layer
    initializer : instance of initialization method
    optimizer : instance of the optimisation method
    """
    def __init__(self, initializer, optimizer, FC, FN, FH=3, FW=3):
        self.P = 0
        self.Str = 1
        self.optimizer = optimizer
        self.W = initializer.W(FC, FN)
        self.B = initializer.B(FN)

        self.a = np.array([])
        self.dW = np.array([])
        self.dX = np.array([])
        #self.s=None
        
    def forward(self, X):
        """
        Forward
        Parameters
        ----------
        X : ndarray of the following form, shape (batch_size, n_nodes_bf)
            Input
        Returns
        ----------
        A : ndarray of the following form, shape (batch_size, n_nodes_af)
            Outputs
        """
        self.X = X
        self.XN, self.XC, self.XH, self.XW = self.X.shape
        self.FN, self.FC, self.FH, self.FW = self.W.shape
        self.s = self.W.shape[3]
        
        self._output_size()
        
        self.Wre = self.W.reshape(self.FN,-1)
        self.Bre = self.B.reshape(1,-1)
        self.col = im2col(self.X , self.FH, self.FW, self.Str, self.P)
        
        self.Are = (self.col @ self.Wre.T) + self.Bre
        self.a = self.Are.reshape(self.XN, self.Nhout, self.Nwout,-1).transpose(0,3,1,2)

        return self.a
    
    def _output_size(self):
        self.Nhout = int((self.XH + 2*self.P - self.FH) / self.Str + 1)       
        self.Nwout = int((self.XW + 2*self.P - self.FW) / self.Str + 1)
        
    def backward(self, dA):
        """
        Backward
        Parameters
        ----------
        dA : ndarray of the following form, shape (batch_size, n_nodes2)
            The gradient flowed from behind
        Returns
        ----------
        dZ : ndarray of the next shape, shape (batch_size, n_nodes1)
            Gradient flowing forward
        """
        self.dA=dA.transpose(0,2,3,1).reshape(-1,self.FN)
        self.dB = np.sum(self.dA, axis=0)
        
        self.dW = np.dot(self.col.T , self.dA)
        self.dW = self.dW.transpose(1, 0).reshape(self.FN, self.FC, self.FH, self.FW)
        
        self.dXre = np.dot(self.dA, self.Wre) 
        self.dX = col2im(self.dXre, self.X.shape, self.FH, self.FW, self.Str, self.P)

        self.W = self.optimizer.update(self.dW, self.W)
        self.B = self.optimizer.update(self.dB, self.B)
        
        return self.dX

In [None]:
class ScratchSimpleNeuralNetrowkClassifier():
    """
    A simple three-layer neural network classifier

    Parameters
    ----------
    bp : int
        Number of backpropagations
    Attributes
    ----------
    """
    def __init__(self, how_prp="htan", FN1=3, FN2=6, n_output=10, n_epoq=5, batch_size=20, 
                 lr=0.001, verbose = True):
        self.n_epoq = n_epoq
        self.FN1 = FN1
        self.FN2 = FN2
        self.n_output = n_output
        self.batch_size = batch_size
        self.how_prp = how_prp
        self.lr = lr
        self.loss_list = np.array([])
        self.loss_val_list = np.array([])

    def fit(self, X, y):
        """
        Trains a neural network classifier.
        Parameters
        ----------
        X : ndarray, shape (n_samples, n_channels, height, width)
            Features of the training data.
        y : ndarray, shape (n_samples, n_classes)
            One-hot correct labels of the training data
        """
        self.X = X
        self.XN, self.XC, self.XH, self.XW = self.X.shape
        
        optimizer = SGD(self.lr)
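        sigma1=Xavier(self.XC*3*3)
        sigma2=Xavier(self.FN1*3*3)
        
        # 28x28 input -> conv 26 -> pool 13 -> conv 11 -> pool 5, so 5x5 per filter
        flatten_node = self.FN2*5*5
        # Like sigma1 and sigma2, the Xavier scale uses the number of input nodes of the layer
        sigma3=Xavier(flatten_node)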
        
        self.CONV1 = Conv2d(SimpleInitializer(sigma1.calc), optimizer, self.XC, self.FN1, FH=3, FW=3)
        self.activation1 = Activation()
        self.MaxPool1 = Pooling(PH=2,PW=2)
        self.CONV2 = Conv2d(SimpleInitializer(sigma2.calc), optimizer, self.FN1, self.FN2, FH=3, FW=3)
        self.activation2 = Activation()
        self.MaxPool2 = Pooling(PH=2,PW=2)
        self.FLAT = Flatten()
        self.DENSE = FC(flatten_node, self.n_output, sigma3.calc, optimizer)
        self.activation3 = Activation()
        
        self.val = 0
        get_mini_batch = GetMiniBatch(X, y, batch_size=self.batch_size)
        for _ in range(self.n_epoq):
            for X_mini, y_mini in get_mini_batch:
                self.X_ = X_mini
                self.y_ = y_mini
                #display(sigma3)
                self._forward_propagation(self.X_)
                self._back_propagation()
                
            # Record the loss of the last mini-batch of this epoch
            self.L = self.activation3.entropy(self.Z3, self.y_)
            self.loss_list = np.append(self.loss_list, self.L)
        
    def _forward_propagation(self, X):
        self.a1 = self.CONV1.forward(X)
        self.Z1 = self.activation1.tanh_fw(self.a1)
        #display(self.Z1.shape)
        self.p1 = self.MaxPool1.forward(self.Z1)
        #display(self.p1.shape)
        self.a2 = self.CONV2.forward(self.p1)
        #display(self.a2.shape)
        self.Z2 = self.activation2.tanh_fw(self.a2)
        #display(self.Z2.shape)
        self.p2 = self.MaxPool2.forward(self.Z2)
        #display(self.p2.shape)
        self.flat = self.FLAT.forward(self.p2) 
        #display(self.flat.shape)
        
        self.den = self.DENSE.forward(self.flat)
        #display(self.den.shape)
        self.Z3 = self.activation3.softmax_fw(self.den)
        #display(self.Z3.shape)
        
    def _back_propagation(self):
        dA3 = self.activation3.softmax_bw(self.Z3, self.y_)
        #display(dA3.shape)
        dZ2 = self.DENSE.backward(dA3)
        #display(dZ2.shape)
        dflat = self.FLAT.backward(dZ2)
        #display(dflat.shape)
        dP2 = self.MaxPool2.backward(dflat)
        #display(dP2.shape)
        dA2 = self.activation2.tanh_bw(dP2)
        dZ1 = self.CONV2.backward(dA2)
        
        dP1 = self.MaxPool1.backward(dZ1)
        dA1 = self.activation1.tanh_bw(dP1)
        # Update the first convolutional layer too; the gradient w.r.t. the input image is not needed
        self.CONV1.backward(dA1)

    def graph_cost_func(self):
        """
        Plot the training loss against the epoch number.
        Each point is the loss on the last mini-batch of that epoch.
        """
        plt.title("Epoch vs Loss")
        plt.xlabel("Epoch")
        plt.ylabel("Loss")
        plt.plot(range(1,self.n_epoq+1), self.loss_list, color="b", marker="o", label="train_loss")
        plt.grid()
        plt.legend()
        plt.show()
  
    def predict(self, X):
        """
        Estimation using the trained classifier.
        Parameters
        ----------
        X : ndarray, shape (n_samples, n_channels, height, width)
            Samples
            
        Returns
        -------
            ndarray, shape (n_samples,)
            Predicted class labels
        """
        self._forward_propagation(X)
        return np.argmax(self.Z3, axis=1)

In [None]:
scr_nnc = ScratchSimpleNeuralNetrowkClassifier(batch_size=100)
scr_nnc.fit(X_train, y_train)

In [None]:
scr_nnc.predict(X_val)

In [None]:
from sklearn.metrics import accuracy_score
from sklearn import metrics
from sklearn.metrics import confusion_matrix

print("accuracy:{}".format(accuracy_score(y_test, scr_nnc.predict(X_test))))
print(" {}".format(confusion_matrix(y_test, scr_nnc.predict(X_test))))

In [None]:
scr_nnc.graph_cost_func()

# [Problem 9] (Advance assignment) Survey of famous image recognition models

# [Problem 10] Calculation of output size and number of parameters

When building a CNN model, the number of features entering the fully connected layer must be calculated in advance. For large models, the number of parameters also has to be calculated, because it determines memory use and computation speed. Frameworks can display the number of parameters for each layer, but you need to understand what those numbers mean in order to adjust them properly.

Calculate the output size and the number of parameters for the following three convolutional layers. For the number of parameters also consider the bias term.

1. Input size: 144×144, 3 channels; filter size: 3×3, 6 channels; stride: 1; padding: none

2. Input size: 60×60, 24 channels; filter size: 3×3, 48 channels; stride: 1; padding: none

3. Input size: 20×20, 10 channels; filter size: 3×3, 20 channels; stride: 2; padding: none

In the last example the filter does not fit the input exactly: (20 + 0 - 3)/2 + 1 = 9.5 is not an integer. Frameworks may then simply ignore the leftover pixels, which is why such a setting is undesirable: information at the edges is lost.

1. Input size: 144×144, 3 channels; filter size: 3×3, 6 channels; stride: 1; padding: none

Answer: Output size = (144 - 3)/1 + 1 = 142, i.e. 142 × 142. Number of weights = 6 (output channels) × 3 (input channels) × 3 × 3 = 162; bias terms = 6; total parameters = 168.

2. Input size: 60×60, 24 channels; filter size: 3×3, 48 channels; stride: 1; padding: none

Answer: Output size = (60 - 3)/1 + 1 = 58, i.e. 58 × 58. Number of weights = 48 (output channels) × 24 (input channels) × 3 × 3 = 10368; bias terms = 48; total parameters = 10416.

3. Input size: 20×20, 10 channels; filter size: 3×3, 20 channels; stride: 2; padding: none

Answer: Output size = floor((20 - 3)/2) + 1 = 9, i.e. 9 × 9. Number of weights = 20 (output channels) × 10 (input channels) × 3 × 3 = 1800; bias terms = 20; total parameters = 1820.
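These numbers can be reproduced with a small helper. This is a sketch assuming square inputs and filters, using the output-size formula given earlier, rounded down when the division is not exact; the name `conv_layer_stats` is illustrative.

```python
def conv_layer_stats(n_in, c_in, f, c_out, stride=1, pad=0):
    """Return (output size, number of weights, number of biases, total parameters)."""
    n_out = (n_in + 2 * pad - f) // stride + 1  # floored when the filter does not fit evenly
    n_weights = c_out * c_in * f * f            # one f x f x c_in filter per output channel
    n_bias = c_out                              # one bias per output channel
    return n_out, n_weights, n_bias, n_weights + n_bias

for args in [(144, 3, 3, 6, 1), (60, 24, 3, 48, 1), (20, 10, 3, 20, 2)]:
    print(conv_layer_stats(*args))
# (142, 162, 6, 168)
# (58, 10368, 48, 10416)
# (9, 1800, 20, 1820)
```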