# Neural Network Intro

**Summary of Article**
- Theoretical Introduction to Neural Networks.
- FeedForward Neural Network Implementation for Regression.
- FeedForward Neural Network Implementation for Classification.



## Neural Network Intro
### Theoretical Introduction to Neural Networks
Neural networks (NNs) are a class of machine learning models built from layers of connected artificial neurons. Each connection between neurons in adjacent layers carries a weight, and each neuron has a bias; both are updated during the training process. An activation function determines the output of a neuron from its weighted input. Nonlinear activation functions are what allow a NN to learn expressive, nonlinear relationships between inputs and outputs. The following illustration represents the architecture of a neural network. (Only on thesis)
** Figure here **.
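
To make the role of weights, biases, and activation functions concrete, the following is a minimal pure-Python sketch of the forward pass through one layer, computing `activation(W x + b)`. The helper name `dense_layer` and the toy values are assumptions made for illustration; they are not part of the thesis code.

```python
import math

def sigmoid(z):
    """Logistic activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation):
    """Forward pass of one layer: activation(W x + b).

    weights[j][i] is the weight on the connection from input i to neuron j,
    and biases[j] is the bias of neuron j.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # Weighted sum of the inputs plus the neuron's bias.
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs

# Hypothetical toy values: 3 inputs feeding a layer of 2 neurons.
x = [1.0, 2.0, 3.0]
W = [[0.5, -0.25, 0.0],
     [0.1, 0.2, -0.3]]
b = [0.0, 0.4]
hidden = dense_layer(x, W, b, sigmoid)
```

Stacking several such layers, each consuming the previous layer's outputs, gives a feedforward network.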
### Training Process 
Training a NN means updating its weights and biases so that its predictions for the given inputs get closer to the target outputs. Backpropagation computes the derivative of the loss function with respect to each weight and bias, and these gradients are then used to update the respective values. The training process takes the following steps:

- Take a batch of training data.
- Forward propagate the batch of data through the neural network.
- Compute the loss function for the batch of data.
- Backpropagate the loss function to get the gradients.
- Update the weights and biases using the gradients.
- Repeat the above steps until the loss falls below a chosen threshold.
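
The steps above can be sketched on the smallest possible model, a single linear neuron `w * x + b` trained with mini-batch gradient descent. This is an illustrative sketch under stated assumptions: the gradients are written out by hand rather than obtained by backpropagation through several layers, the loss is the mean squared error, and the function name, learning rate, batch size, and toy data are all hypothetical.

```python
import random

def train_linear_model(xs, ys, lr=0.05, batch_size=4,
                       loss_threshold=1e-4, max_epochs=5000, seed=0):
    """Fit y ~ w * x + b with mini-batch gradient descent on the MSE loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    full_loss = float("inf")
    indices = list(range(len(xs)))
    for _ in range(max_epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]           # 1. take a batch
            preds = [w * xs[i] + b for i in batch]               # 2. forward pass
            errors = [p - ys[i] for p, i in zip(preds, batch)]
            batch_loss = sum(e * e for e in errors) / len(batch) # 3. batch loss (MSE)
            # 4. gradients of the batch loss with respect to w and b
            grad_w = 2.0 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2.0 * sum(errors) / len(batch)
            # 5. gradient descent update
            w -= lr * grad_w
            b -= lr * grad_b
        # 6. stop once the loss over the whole dataset is below the threshold
        full_loss = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        if full_loss < loss_threshold:
            break
    return w, b, full_loss

# Hypothetical noiseless data generated from y = 2x + 1.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [2.0 * x + 1.0 for x in xs]
w, b, final_loss = train_linear_model(xs, ys)
```

In practice a maximum number of epochs is usually kept alongside the threshold, as done here with `max_epochs`, so that training still terminates if the threshold is never reached.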

The most common activation functions are:
- Sigmoid function: $$g(z) = \frac{1}{1+e^{-z}}$$
- Tanh: $$g(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$
- ReLU: $$g(z) = \max(0,z)$$
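
As a quick check of the formulas above, here is a direct pure-Python transcription of the three functions. It is a sketch for scalar inputs only; the function names are chosen for this example, and the naive exponential forms can overflow for very large `|z|`, which library implementations such as `math.tanh` avoid.

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}); output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """g(z) = (e^z - e^{-z}) / (e^z + e^{-z}); output lies in (-1, 1)."""
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

def relu(z):
    """g(z) = max(0, z); negative inputs are clipped to zero."""
    return max(0.0, z)

# Evaluate each activation at a few sample points.
samples = [-2.0, 0.0, 2.0]
table = {z: (sigmoid(z), tanh(z), relu(z)) for z in samples}
```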

The most common loss function for regression is:
- RMSE: $$L(z,y) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - z_i)^2}$$
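
The RMSE formula above translates directly into a few lines of Python. This sketch assumes `z` holds the predictions and `y` the targets, both as plain lists of equal length; the function name and the toy values are illustrative.

```python
import math

def rmse(z, y):
    """Root mean squared error between predictions z and targets y."""
    if len(z) != len(y) or not z:
        raise ValueError("z and y must be non-empty and of equal length")
    squared_errors = [(y_i - z_i) ** 2 for z_i, y_i in zip(z, y)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Every prediction is off by exactly 2, so the RMSE is 2.
predictions = [1.0, 2.0, 3.0, 4.0]
targets = [3.0, 4.0, 5.0, 6.0]
error = rmse(predictions, targets)  # 2.0
```

Because the errors are squared before averaging, a few large errors raise the RMSE more than many small ones.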


## Training a Neural Network with PyTorch 
PyTorch is now used to train a neural network. The data is the normalized sparse dataset.

In [1]:
import pandas as pd
import torch
import sys; sys.path.append('..')
from thesis_package import utils

# Raw strings keep the Windows path separators from being read as escape sequences.
y_max_u_bool = pd.read_csv(r'..\data\ground_truth\res_bus_vm_pu_max_bool_constr.csv').drop(columns='timestamps')
y_max_u = y_max_u_bool[utils.cols_with_positive_values(y_max_u_bool)]
exogenous_data = pd.read_csv(r'..\data\processed\production\exogenous_data_extended.csv').drop(columns=['date'])
# Split into train and test sets (scaling=True) with the thesis_package helper.
X_max_u_bool_train, X_max_u_bool_test, y_max_u_bool_train, y_max_u_bool_test = utils.split_and_suffle(exogenous_data, y_max_u_bool, scaling=True)
# Collect the four splits as tensors.
data = {'X_train': torch.tensor(X_max_u_bool_train),
        'X_test': torch.tensor(X_max_u_bool_test),
        'y_train': torch.tensor(y_max_u_bool_train.astype(bool)),
        'y_test': torch.tensor(y_max_u_bool_test.astype(bool))
        }


In [None]:
import optuna
from optuna.trial import TrialState
# Setup
import torch 
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data

DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
BATCH_SIZE = 128
CLASSES = 2
EPOCHS = 10

First, we define a model.

In [None]:
def define_model():
    # Fixed architecture for now; the number of layers, hidden units and
    # dropout rate are the values to optimize.
    n_layers = 2
    layers = []
    in_features = data['X_train'].shape[1]
    for i in range(n_layers):
        out_features = 20
        layers.append(nn.Linear(in_features, out_features))
        layers.append(nn.ReLU())
        p = 0.5 # Probability of element being zeroed.
        layers.append(nn.Dropout(p))
        in_features = out_features
    layers.append(nn.Linear(in_features, CLASSES))
    layers.append(nn.Sigmoid())
    return nn.Sequential(*layers)
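
The `nn.Dropout(p)` layers in the model zero each element with probability `p` during training and scale the surviving elements by `1 / (1 - p)`; in evaluation mode they pass the input through unchanged. The following pure-Python sketch illustrates that behaviour on a list of floats. It is not PyTorch's implementation: the function name and toy values are assumptions, and for simplicity it only accepts `p` in `[0, 1)`.

```python
import random

def dropout(values, p, training=True, rng=random):
    """Inverted dropout on a list of floats (illustrative sketch)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # Evaluation mode: the input is returned unchanged.
        return list(values)
    # Rescale survivors so the expected value of each element is preserved.
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)  # seeded only to make the sketch reproducible
activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, p=0.5, rng=rng)
eval_out = dropout(activations, p=0.5, training=False)
```

With `p = 0.5`, each training-mode output is either `0.0` or twice the corresponding input. This is also why the model should be switched to evaluation mode with `model.eval()` before computing test metrics.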

In [None]:
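model = define_model()
optimizer_name = 'Adam'
lr = 0.001
# Look up the optimizer class by name in torch.optim and bind it to the model parameters.
optimizer = getattr(optim, optimizer_name)(model.parameters(), lr=lr)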
