# Deep Neural Network

Many people around you are interested in deep neural networks, and you think it would be worthwhile to create software that is as flexible as possible and allows novice users to experiment with these methods.

You have no prior knowledge of the subject. While searching the internet, you come across the project https://github.com/HyTruongSon/Neural-Network-MNIST-CPP. You tell yourself that this is a good starting point and decide to spend more time on it.

Here we recall the key elements of deep neural networks. We will not go into the mathematical details, as this is not the purpose of this course.

A deep neural network is composed of an input layer, an output layer and several hidden layers.

A neuron is illustrated in the following figure:

![image](./figures/dnn1.png)

This figure comes from the CNRS course Fidle (https://gricad-gitlab.univ-grenoble-alpes.fr/talks/fidle).

We can observe that a neuron is made of weights, a bias and an activation function. The activation function can be, for example, a sigmoid, ReLU or tanh.
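The neuron described above fits in a few lines of code. The sketch below is in C++ (assumed because the referenced project is C++) and computes $\sigma\left(\sum_i w_i x_i + b\right)$ for a choice of activation; the names `sigmoid`, `relu`, `tanh_act` and `neuron_output` are hypothetical and are not taken from the project.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Three common activation functions sigma (hypothetical names).
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }
double relu(double z) { return std::max(0.0, z); }
double tanh_act(double z) { return std::tanh(z); }

// Output of one neuron: sigma(sum_i w_i * x_i + b).
// The activation is passed as a plain function pointer so it can be swapped.
double neuron_output(const std::vector<double>& w, double b,
                     const std::vector<double>& x, double (*sigma)(double)) {
    assert(w.size() == x.size());  // one weight per input entry
    double z = b;                  // aggregation starts from the bias
    for (std::size_t i = 0; i < w.size(); ++i) z += w[i] * x[i];
    return sigma(z);
}
```

For example, `neuron_output({1.0, -2.0}, 0.5, {3.0, 1.0}, relu)` aggregates to $z = 3 - 2 + 0.5 = 1.5$ and returns 1.5.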

A deep neural network is composed of several hidden layers, each containing several neurons, as illustrated in the following figure:

![image](./figures/dnn2.png)

This figure also comes from the CNRS course Fidle.

In the following, we will use these notations:

- $w^l_{j,i}$ is the weight of layer $l$ connecting the input entry $i$ to the neuron $j$.
- $b^l_j$ is the bias of the neuron $j$ in layer $l$.
- $z^l_j = \sum_i w^l_{j,i} x^l_i + b^l_j$ is the aggregation, where $x^l_i$ is the $i$-th input of layer $l$. The input of a layer is the output of the previous one, $x^l_i = a^{l-1}_i$, with $a^0$ the input of the network.
- $\sigma$ is the activation function.
- $a^l_j = \sigma(z^l_j)$ is the output of the neuron $j$ in layer $l$.
- $L$ is the index of the last layer.
- $C(a^L, y)$ is the cost function, where $a^L$ is the predicted value and $y$ is the expected result.
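With these notations, one layer maps $a^{l-1}$ to $a^l$, and the whole network chains the layers starting from $a^0 = x$. The following C++ sketch stores $w^l_{j,i}$ as `w[j][i]` and $b^l_j$ as `b[j]`; the `Layer` type, the function names and the use of a sigmoid in every layer are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layer type: w[j][i] stores w^l_{j,i}, b[j] stores b^l_j.
struct Layer {
    std::vector<std::vector<double>> w;
    std::vector<double> b;
};

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Computes a^l from a^{l-1}: z^l_j = sum_i w^l_{j,i} a^{l-1}_i + b^l_j, a^l_j = sigma(z^l_j).
std::vector<double> layer_forward(const Layer& layer, const std::vector<double>& a_prev) {
    assert(layer.w.size() == layer.b.size());  // one weight row per neuron
    std::vector<double> a(layer.b.size());
    for (std::size_t j = 0; j < layer.w.size(); ++j) {
        assert(layer.w[j].size() == a_prev.size());
        double z = layer.b[j];
        for (std::size_t i = 0; i < a_prev.size(); ++i) z += layer.w[j][i] * a_prev[i];
        a[j] = sigmoid(z);
    }
    return a;
}

// Chains the layers: starts from a^0 = x and returns a^L.
std::vector<double> network_forward(const std::vector<Layer>& layers, std::vector<double> a) {
    for (const Layer& layer : layers) a = layer_forward(layer, a);
    return a;
}
```

Nested `std::vector`s keep the indices identical to the formulas; a single contiguous array per layer would be faster but harder to read.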

The algorithm has three steps:

- Forward propagation: for a given input, go through all the layers up to the output.
- Backward propagation: using this output, update the weights and biases to minimize the cost function by gradient descent.
- Iteration: repeat the two previous steps until the maximum number of iterations or a given tolerance is reached.
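The three steps can be seen at work on the smallest possible network, a single sigmoid neuron. This C++ sketch assumes the quadratic cost $C = \frac{1}{2}(a - y)^2$ averaged over the samples (the section does not fix a cost function), full-batch gradient descent and a stopping test on the mean cost; `TrainResult` and `train_neuron` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Hypothetical result type for this sketch.
struct TrainResult {
    std::vector<double> w;  // weights of the single neuron
    double b = 0.0;         // bias of the single neuron
    double cost = 0.0;      // mean cost evaluated in the last iteration
    int iterations = 0;     // number of iterations performed
};

// Trains one sigmoid neuron with C = 1/2 (a - y)^2, averaged over the samples.
TrainResult train_neuron(const std::vector<std::vector<double>>& X,
                         const std::vector<double>& Y,
                         double mu, int max_iter, double tol) {
    assert(!X.empty() && X.size() == Y.size());
    const std::size_t n = X.size(), d = X[0].size();
    TrainResult r;
    r.w.assign(d, 0.0);
    for (r.iterations = 0; r.iterations < max_iter; ++r.iterations) {
        std::vector<double> grad_w(d, 0.0);
        double grad_b = 0.0;
        r.cost = 0.0;
        for (std::size_t k = 0; k < n; ++k) {
            // 1. Forward propagation: z = sum_i w_i x_i + b, a = sigma(z).
            double z = r.b;
            for (std::size_t i = 0; i < d; ++i) z += r.w[i] * X[k][i];
            const double a = sigmoid(z);
            r.cost += 0.5 * (a - Y[k]) * (a - Y[k]) / n;
            // 2. Backward propagation: delta = dC/da * sigma'(z), with sigma'(z) = a (1 - a).
            const double delta = (a - Y[k]) * a * (1.0 - a);
            for (std::size_t i = 0; i < d; ++i) grad_w[i] += X[k][i] * delta / n;
            grad_b += delta / n;
        }
        // 3. Stop once the tolerance is reached, otherwise take a gradient descent step.
        if (r.cost < tol) break;
        for (std::size_t i = 0; i < d; ++i) r.w[i] -= mu * grad_w[i];
        r.b -= mu * grad_b;
    }
    return r;
}
```

For instance, `train_neuron({{0, 0}, {0, 1}, {1, 0}, {1, 1}}, {0, 1, 1, 1}, 2.0, 20000, 0.01)` trains the neuron on the logical OR and stops either after 20000 iterations or as soon as the mean cost drops below 0.01.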

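The gradient descent updates of the weights and biases can be written as

$$
w_{j, i}^l \leftarrow w_{j, i}^l - \mu \frac{\partial C}{\partial w_{j, i}^l},
\qquad
b_j^l \leftarrow b_j^l - \mu \frac{\partial C}{\partial b_j^l},
$$

where $\mu$ is the learning rate.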
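The update rule can be checked on a one-dimensional toy cost $C(w) = (w-3)^2$, chosen only for illustration, whose derivative is $2(w-3)$. Each step gives $w - 3 \leftarrow (1 - 2\mu)(w - 3)$: the error shrinks by a factor 0.8 with $\mu = 0.1$ but grows by a factor 1.2 (with alternating sign) with $\mu = 1.1$, which shows why the learning rate must not be too large. `gradient_descent` is a hypothetical helper written in C++.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Repeats w <- w - mu * dC/dw for a fixed number of steps and returns the final w.
double gradient_descent(const std::function<double(double)>& dC,
                        double w, double mu, int steps) {
    assert(mu > 0.0 && steps >= 0);
    for (int k = 0; k < steps; ++k) w -= mu * dC(w);
    return w;
}
```

With `auto dC = [](double w) { return 2.0 * (w - 3.0); };`, the call `gradient_descent(dC, 0.0, 0.1, 100)` ends within $10^{-6}$ of the minimum $w = 3$, whereas `gradient_descent(dC, 0.0, 1.1, 100)` moves away from it.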

The backward propagation equations, applied from the last layer $L$ back to the first one, are:

- $\delta^L_j = \frac{\partial C}{\partial a_j^L}\sigma'(z_j^L)$
- $\delta^l_j = \sum_i w^{l+1}_{i, j}\delta^{l+1}_i \sigma'(z_j^l)$
- $\frac{\partial C}{\partial b^l_j} = \delta_j^l$
- $\frac{\partial C}{\partial w^l_{j, i}} = a^{l-1}_i \delta_j^l$
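These four equations translate almost line by line into code. The C++ sketch below assumes sigmoid activations, so that $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, and the quadratic cost $C = \frac{1}{2}\sum_j (a^L_j - y_j)^2$, for which $\frac{\partial C}{\partial a^L_j} = a^L_j - y_j$. These choices, and the names `Layer`, `forward`, `backward` and `cost`, are illustrative assumptions, not the layout of the referenced project.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // Mat[j][i] stores w_{j,i}

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// 0-based storage: net[k] is layer k+1 of the text, a[k] its input, a[k+1] its output.
struct Layer { Mat w; Vec b; };

// Forward propagation keeping every a^l (a[0] = x), sigmoid in every layer.
std::vector<Vec> forward(const std::vector<Layer>& net, const Vec& x) {
    std::vector<Vec> a{x};
    for (const Layer& layer : net) {
        Vec out(layer.b.size());
        for (std::size_t j = 0; j < layer.b.size(); ++j) {
            double z = layer.b[j];
            for (std::size_t i = 0; i < a.back().size(); ++i) z += layer.w[j][i] * a.back()[i];
            out[j] = sigmoid(z);
        }
        a.push_back(out);
    }
    return a;
}

// Quadratic cost C = 1/2 sum_j (a^L_j - y_j)^2.
double cost(const std::vector<Layer>& net, const Vec& x, const Vec& y) {
    const Vec out = forward(net, x).back();
    double c = 0.0;
    for (std::size_t j = 0; j < out.size(); ++j) c += 0.5 * (out[j] - y[j]) * (out[j] - y[j]);
    return c;
}

// Backward propagation: fills dw and db with dC/dw and dC/db, same shapes as the network.
void backward(const std::vector<Layer>& net, const Vec& x, const Vec& y,
              std::vector<Mat>& dw, std::vector<Vec>& db) {
    const std::vector<Vec> a = forward(net, x);
    const std::size_t nl = net.size();
    assert(y.size() == a[nl].size());
    dw.assign(nl, Mat());
    db.assign(nl, Vec());
    // delta^L_j = dC/da^L_j * sigma'(z^L_j)
    Vec delta(a[nl].size());
    for (std::size_t j = 0; j < delta.size(); ++j)
        delta[j] = (a[nl][j] - y[j]) * a[nl][j] * (1.0 - a[nl][j]);
    for (std::size_t l = nl; l-- > 0;) {
        db[l] = delta;  // dC/db^l_j = delta^l_j
        dw[l].assign(delta.size(), Vec(a[l].size()));
        for (std::size_t j = 0; j < delta.size(); ++j)
            for (std::size_t i = 0; i < a[l].size(); ++i)
                dw[l][j][i] = a[l][i] * delta[j];  // dC/dw^l_{j,i} = a^{l-1}_i delta^l_j
        if (l == 0) break;
        // delta^l_j = sum_i w^{l+1}_{i,j} delta^{l+1}_i sigma'(z^l_j)
        Vec prev(a[l].size(), 0.0);
        for (std::size_t j = 0; j < prev.size(); ++j) {
            for (std::size_t i = 0; i < delta.size(); ++i) prev[j] += net[l].w[i][j] * delta[i];
            prev[j] *= a[l][j] * (1.0 - a[l][j]);
        }
        delta = prev;
    }
}
```

Comparing `backward` with central finite differences of `cost` (perturbing one weight or bias by $\pm h$) is a cheap way to check an implementation of `backward_propagation` before training.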


We need two sets of data: a training set to train the neural network and a test set to evaluate the final weights and biases.
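On the test set, the quality of the final weights and biases is commonly measured by the accuracy, the fraction of correctly classified samples. This C++ sketch assumes one network output per class (ten for the MNIST digits) and takes the predicted class as the index of the largest output; `predicted_class` and `accuracy` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Index of the largest output, i.e. the predicted class (the digit for MNIST).
std::size_t predicted_class(const std::vector<double>& output) {
    assert(!output.empty());
    return static_cast<std::size_t>(
        std::distance(output.begin(), std::max_element(output.begin(), output.end())));
}

// Fraction of test samples whose predicted class matches the expected label.
double accuracy(const std::vector<std::vector<double>>& outputs,
                const std::vector<std::size_t>& labels) {
    assert(outputs.size() == labels.size() && !labels.empty());
    std::size_t correct = 0;
    for (std::size_t k = 0; k < outputs.size(); ++k)
        if (predicted_class(outputs[k]) == labels[k]) ++correct;
    return static_cast<double>(correct) / static_cast<double>(labels.size());
}
```

Only the test set should be used for this measure: samples seen during training give an optimistic estimate.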


- Read the code https://github.com/HyTruongSon/Neural-Network-MNIST-CPP carefully and try to recognize each element of the algorithm.

- Think of a code organization and data structures that offer more flexibility and readability.

- Duplicate `step_0` into `step_1` and add all the `CMakeLists.txt` files needed to build a library from the `dnn` source files and an executable from the main function.

- Duplicate `step_1` into `step_2` and implement the following functions:
    - `forward_propagation`
    - `backward_propagation`
    - `evaluate`
    
- How can you make the choice of the activation function more flexible?
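One possible approach, among others, is to bundle each activation function with its derivative, since forward propagation needs $\sigma$ and backward propagation needs $\sigma'$, and to select the pair at run time. The C++ sketch below uses `std::function`; the `Activation` struct, `make_activation` and the accepted names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <stdexcept>
#include <string>

// An activation bundles sigma and its derivative sigma'.
struct Activation {
    std::function<double(double)> f;
    std::function<double(double)> df;
};

// Hypothetical factory: the activation is chosen at run time by name.
Activation make_activation(const std::string& name) {
    if (name == "sigmoid") {
        auto s = [](double z) { return 1.0 / (1.0 + std::exp(-z)); };
        return {s, [s](double z) { return s(z) * (1.0 - s(z)); }};
    }
    if (name == "relu")
        return {[](double z) { return z > 0.0 ? z : 0.0; },
                [](double z) { return z > 0.0 ? 1.0 : 0.0; }};  // convention: sigma'(0) = 0
    if (name == "tanh")
        return {[](double z) { return std::tanh(z); },
                [](double z) { const double t = std::tanh(z); return 1.0 - t * t; }};
    throw std::invalid_argument("unknown activation: " + name);
}
```

A layer can then hold an `Activation` and call `act.f(z)` in `forward_propagation` and `act.df(z)` in `backward_propagation`; adding a new activation only touches `make_activation`. An abstract base class with virtual functions, or a template parameter that avoids the indirect call through `std::function`, are common alternatives.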