# __WSI - ćwiczenie 5.__

### __Sztuczne sieci neuronowe__

#### __Treść ćwiczenia__

- Celem cwiczenia jest implementacja perceptronu wielowarstwowego oraz wybranego algorytmu
optymalizacji gradientowej z algorytmem propagacji wstecznej.
- Nastepnie nalezy wytrenowac perceptron wielowarstwowy do klasyfikacji zbioru danych wine
(https://archive.ics.uci.edu/ml/datasets/wine). Zbiór ten dostepny jest w pakiecie scikitlearn
(sklearn.datasets.load wine).

In [160]:
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_curve, roc_auc_score, precision_recall_curve, auc, RocCurveDisplay, PrecisionRecallDisplay, recall_score, precision_score, f1_score, classification_report
from sklearn.model_selection import train_test_split
from seaborn import heatmap
import plotly.express as px
from math import log, inf, e, tanh, sqrt
from sklearn.utils import resample, shuffle
import unittest

RNG = np.random.default_rng()

cele: 
- perceptron wielowarstwowy, implementacja ze zmienną ilością warstw głębokich oraz zmienną ilością długości wektora neuronów
- kilka algorytmów optymalizacji wag sieci (gradient prosty, SGD, algorytm ewolucyjny??)

zadania:
1. model sieci
2. propagacja wsteczna
3. optymalizacja wag

__Weight matrix per layer:__

$$
\theta^{l}=
\left[\begin{array}{ccc}
\omega_{1,1}& \cdots&\omega_{1,k+1}\\
\vdots&\ddots&\vdots\\
\omega_{n,1}&\cdots&\omega_{n,k+1}
\end{array}\right]
$$
where  $ \omega_{i,j} $ is the $j$-th weight of the $i$-th neuron (in layer $l$), and $ \omega_{i,k+1} $ is its bias

__Matrix of layers:__

$$
\Theta=
\left[\begin{array}{ccc}
\theta^{1}& \cdots&\theta^{\lambda}\
\end{array}\right]
$$
where  $ \theta^{\lambda} $ is the output layer 

__Input data vector:__

$$
y^0=\left[\begin{array}{ccc}
x^T& 1
\end{array}\right]^T
$$
it is extended by 1 to allow easier multiplication

__Opertation of a single neuron:__

$$
y^l_i=\psi(\theta^l_i y^{l-1})
$$

$\psi$ is the neuron activation function

__Output layer:__

$$
f_i(x, \Theta)=\theta^\lambda_i y^\lambda
$$

__Backpropagation__
$$
\frac{de}{ds^l_i}=\frac{de}{dy^l_i}\frac{∂\psi^l(s^l_i)}{∂s^l_i}
$$

$$
\frac{de}{dy^l_i}=\sum_{\gamma} \frac{de}{ds^{l+1}_\gamma}\theta^{l+1}_{\gamma,i} 
$$
For the last layer this can be calculated immidiately:

$$
\frac{∂e}{∂\theta^l_{i,j}}=\frac{de}{ds^l_i}y^{l-1}_j= \frac{de}{dy^l_i} \frac{∂\psi^l(s^l_i)}{∂s^l_i}y^{l-1}_j
$$

For the rest:
$$
\frac{∂e}{∂\theta^l_{i,j}}= \left( \sum_{\gamma} \frac{de}{ds^{l+1}_\gamma}\theta^{l+1}_{\gamma,i} \right) \frac{∂\psi^l(s^l_i)}{∂s^l_i}y^{l-1}_j
$$

So in order to calculate all the derivatives we need to first calculate all derivatives of neuron input sums

__Default weigths initialization:__

todo

In [161]:
class MLP():
    """
    todo fully-connected?

    Attribubtes:
        _layers: 

    Methods:
        fit:
    """
    def __init__(self, dimensions:list, activations:list, derivatives:list, feature_number:int) -> None:
        """
        todo

        Args:
            dimensions: starting from first hidden layer
            activations: last actvation function should be linear if a basic MLP is being modeled

        Returns:
            MLP object

        Raises:
            None
        """

        # todo assertions

        self._layers = np.array([np.empty((dimensions[0], feature_number + 1))] + \
                                [np.empty((dimensions[i+1], dimensions[i]+1)) for i in range(len(dimensions)-1)], 
                                dtype=object)
        self._activations = activations
        self._derivatives = derivatives

    def initialize_weights(self, strategy='default'):
        # todo strategies
        if strategy == 'default':
            for layer in self._layers[:-1]:
                size = layer.shape[1]
                size_sqrt = sqrt(size)
                with np.nditer(layer, op_flags=['writeonly']) as it:
                    for w in it:
                        w[...] = RNG.uniform(-1/size_sqrt, 1/size_sqrt)
            self._layers[-1].fill(0)
        if strategy == 'ones':
            for layer in self._layers:
                layer.fill(1)

    def fit(self):
        pass

    def feed_forward(self, input_vector):
        # todo description
        # this returns a vector consisting of tuples 
        # of single neuron input sum and effective activated output 
        all_inputs_outputs = []
        current_layer_activated_outputs = input_vector
        for layer, activate in zip(self._layers, self._activations):
            current_layer_activated_outputs = current_layer_activated_outputs + [1]
            current_layer_input_sums = [np.matmul(weights, current_layer_activated_outputs) for weights in layer]
            current_layer_activated_outputs = [activate(s) for s in current_layer_input_sums]
            current_layer_pairs = np.array([current_layer_input_sums, current_layer_activated_outputs]).T
            all_inputs_outputs.append(current_layer_pairs)
        return all_inputs_outputs

    def propagate(self, input_vector):
        return self.feed_forward(input_vector)[-1][:, 1].tolist()

    def backprop(self, input_vector, true_output_vector, loss_func_derivative):
        
        gradient_estimate = []
        
        all_inputs_outputs = self.feed_forward(input_vector)
        all_partial_deriv_input_sums = [[deriv(s) for s in layer[:, 0]] for layer, deriv in zip(all_inputs_outputs, self._derivatives)]
        
        # configure for output layer
        # calculate loss derivative values
        total_deriv_outputs = [loss_func_derivative(output, true_value) for output, true_value in zip(all_inputs_outputs[-1][:,1], true_output_vector)]
        # all outputs appended with 1 (bias) at each layer
        all_outputs_with_input_vector = [input_vector + [1]] + [layer[:,1].tolist() + [1] for layer in all_inputs_outputs]
        # for each layer
        layer_idx = -1
        for layer_idx in range(len(self._layers)-1, -1, -1):
            partial_deriv_input_sums = all_partial_deriv_input_sums[layer_idx]
            total_deriv_input_sums = np.multiply(total_deriv_outputs, partial_deriv_input_sums)
            # calculate estimated gradient for layer 
            layer_gradient = np.matmul(total_deriv_input_sums[np.newaxis].T, np.array(all_outputs_with_input_vector[layer_idx])[np.newaxis])
            gradient_estimate.insert(0, layer_gradient)
            
            # prepare for next layer 
            if not layer_idx == 0:
                # shorten the arrays by 1 to avoid calculating derivatives for biases
                total_deriv_outputs = [np.multiply(total_deriv_input_sums, neuron_weights).sum() for neuron_weights in self._layers[layer_idx][:,:-1].T]

        return np.array(gradient_estimate, dtype=object)

    def add_gradients():
        ...

    def BGD_step(self, data, target_label, learning_rate):
        gradient_sum = np.shape(self._layers)
        for _, row in data.iterrows():
            gradient_sum += gradient_sum + self.backprop(row.drop([target_label]).tolist(), 
                                                         [row[target_label]], 
                                                         lambda v, t: 2*(v-t))
        avg_gradient = (gradient_sum / data.shape[0])
        self._layers = self._layers - ( avg_gradient * learning_rate)
    
    def SGD_step(self, data, mini_batch_size):
        ...

    def predict(self, data):
        return data.apply(lambda row : self.propagate(row.tolist()), axis=1)


In [162]:
class TestMLP(unittest.TestCase):

    def test_init_1_neuron(self):
        mlp = MLP([1], [], [], 1)
        self.assertEqual(len(mlp._layers), 1)
        self.assertEqual(mlp._layers[0].shape, (1, 2))

    def test_init_multi_neuron(self):
        mlp = MLP([5, 3, 11], [], [], 15)
        self.assertEqual(len(mlp._layers), 3)
        self.assertEqual(mlp._layers[0].shape, (5, 16))
        self.assertEqual(mlp._layers[1].shape, (3, 6))
        self.assertEqual(mlp._layers[2].shape, (11, 4))

    def test_1_feature_feed_forward_1_neuron(self):
        mlp = MLP([1], [lambda x: x], [], 1)
        mlp._layers[0] = np.array([[3, 2]])
        self.assertEqual(mlp.propagate([3]), [11.])

    def test_multi_features_feed_forward_mutli_neuron(self):
        mlp = MLP([2, 3, 2], [lambda x: x, lambda x: 2*x, lambda x: x], [], 2)
        mlp.initialize_weights('ones')
        self.assertEqual(mlp.propagate([2,  3]), [79., 79.])

    def test_initialize_weigths_default(self):
        mlp = MLP([2, 3, 2], [lambda x: x, lambda x: 2*x, lambda x: x], [], 2)
        mlp.initialize_weights(strategy='default')
        for layer in mlp._layers[:-1]:
            self.assertTrue(((layer > -1) & (layer < 1)).all())
        self.assertTrue((mlp._layers[-1] == 0).all())

    def test_single_predict(self):
        mlp = MLP([2, 3, 2], [lambda x: x, lambda x: 2*x, lambda x: x], [lambda x: 1, lambda x: 2, lambda x: 1], 2)
        mlp.initialize_weights('ones')
        self.assertEqual(mlp.propagate([2,  3]), [79, 79])

    def test_backprop(self):
        mlp = MLP([2, 3, 1], [lambda x: x**2, lambda x: x**2, lambda x: x], [lambda x: 2*x, lambda x: 2*x, lambda x: 1], 1)
        mlp.initialize_weights('ones') 
        gradient = mlp.backprop([3], [3000], lambda t, v: 2*(t - v))
        true_gradient = np.array([np.array([[2547072.,  849024.],
                                   [2547072.,  849024.]]),
                         np.array([[566016., 566016.,  35376.],
                                   [566016., 566016.,  35376.],
                                   [566016., 566016.,  35376.]]),
                         np.array([[5.83704e+05, 5.83704e+05, 5.83704e+05, 5.36000e+02]])], dtype=object)
        self.assertTrue(all([(ar1 == ar2).all() for ar1, ar2 in zip (gradient, true_gradient)]))

unittest.main(argv=[''],  exit=False)

.......
----------------------------------------------------------------------
Ran 7 tests in 0.008s

OK


<unittest.main.TestProgram at 0x21dff419480>

In [163]:
data = pd.DataFrame(data=[[x, x**2] for x in RNG.uniform(-1, 1, 100)], columns=['x', 'y'])

In [164]:
mlp = MLP([2, 3, 1], [lambda x: x**2, lambda x: x**2, lambda x: x], [lambda x: 2*x, lambda x: 2*x, lambda x: 1], 1)
mlp.initialize_weights('default')
mlp._layers

array([array([[ 0.65750109, -0.12846013],
              [-0.68677102, -0.5221293 ]]),
       array([[ 0.08767372, -0.20051741, -0.30937769],
              [ 0.26951933, -0.0827613 , -0.21743387],
              [ 0.12484229, -0.42868849, -0.01910858]]),
       array([[0., 0., 0., 0.]])], dtype=object)

In [165]:
mlp.BGD_step(data, 'y', 0.01)
mlp._layers

array([array([[-3.8029518e+26, -3.8029518e+26],
              [-3.8029518e+26, -3.8029518e+26]]),
       array([[-3.8029518e+26, -3.8029518e+26, -3.8029518e+26],
              [-3.8029518e+26, -3.8029518e+26, -3.8029518e+26],
              [-3.8029518e+26, -3.8029518e+26, -3.8029518e+26]]),
       array([[-3.66273355e+26, -3.79005716e+26, -3.79016299e+26,
               -1.91677948e+26]])                                ],
      dtype=object)

In [166]:
mlp.BGD_step(data, 'y', 0.01)
mlp._layers

  layer_gradient = np.matmul(total_deriv_input_sums[np.newaxis].T, np.array(all_outputs_with_input_vector[layer_idx])[np.newaxis])
  total_deriv_outputs = [np.multiply(total_deriv_input_sums, neuron_weights).sum() for neuron_weights in self._layers[layer_idx][:,:-1].T]
  layer_gradient = np.matmul(total_deriv_input_sums[np.newaxis].T, np.array(all_outputs_with_input_vector[layer_idx])[np.newaxis])
  total_deriv_outputs = [np.multiply(total_deriv_input_sums, neuron_weights).sum() for neuron_weights in self._layers[layer_idx][:,:-1].T]
  layer_gradient = np.matmul(total_deriv_input_sums[np.newaxis].T, np.array(all_outputs_with_input_vector[layer_idx])[np.newaxis])
  total_deriv_outputs = [np.multiply(total_deriv_input_sums, neuron_weights).sum() for neuron_weights in self._layers[layer_idx][:,:-1].T]
  layer_gradient = np.matmul(total_deriv_input_sums[np.newaxis].T, np.array(all_outputs_with_input_vector[layer_idx])[np.newaxis])
  total_deriv_outputs = [np.multiply(total_deriv_input_sums

array([array([[nan, inf],
              [nan, inf]]), array([[inf, inf, inf],
                                   [inf, inf, inf],
                                   [inf, inf, inf]]),
       array([[            inf,             inf,             inf,
               9.25492375e+212]])                                ],
      dtype=object)