# Neural Network Intro

**Summary of Article**
- Theoretical Introduction to Neural Networks.
- FeedForward Neural Network Implementation for Regression.
- FeedForward Neural Network Implementation for Classification.



## Neural Network Intro
### Theoretical Introduction to Neural Networks
Neural networks (NNs) are a class of ML models built from connected layers of artificial neurons. The connections between layers are parameterized by weights and biases, which are updated during the training process. An activation function determines the output of each neuron from the weighted sum of its inputs. Because these functions are nonlinear, the NN can learn relationships that a purely linear model cannot express. The following illustration represents the architecture of a neural network. (Only on thesis)
** Figure here **.
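
As a minimal sketch of the description above (not the thesis model), a single layer computes `g(Wx + b)`: a weighted sum of the inputs plus a bias, passed through an activation function. The sizes and values are arbitrary assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes: 3 input features feeding a layer of 2 neurons.
x = torch.tensor([1.0, 2.0, 3.0])
W = torch.randn(2, 3)   # one row of weights per neuron
b = torch.zeros(2)      # one bias per neuron

z = W @ x + b           # weighted sum of the inputs plus the bias
a = torch.tanh(z)       # the activation function gives the neuron outputs

print(z.shape, a.shape)
```

Stacking several such layers, each feeding its output to the next, gives a feedforward network.
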
### Training Process 
The training process of a NN consists of updating its weights and biases so that its predictions for a given input get closer to the expected output. Backpropagation computes the derivative of the loss function with respect to each weight and bias, and these gradients are then used to update the respective values. The training process takes the following steps:

- Take a batch of training data.
- Forward propagate the batch of data through the neural network.
- Compute the loss function for the batch of data.
- Backpropagate the loss function to get the gradients.
- Update the weights and biases using the gradients.
- Repeat the above steps until the loss function is less than a determined threshold.
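
The steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the thesis setup: the data, the single linear layer, the threshold and the epoch cap are all assumptions made for the example.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data (an assumption for illustration): y = 2x + 1.
X = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * X + 1
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

threshold = 1e-3   # hypothetical stopping threshold
max_epochs = 500   # safety cap so the loop always terminates

for epoch in range(max_epochs):
    epoch_loss = 0.0
    for X_batch, y_batch in loader:          # 1. take a batch
        optimizer.zero_grad()                # clear gradients from the last step
        y_pred = model(X_batch)              # 2. forward propagate
        loss = criterion(y_pred, y_batch)    # 3. compute the loss
        loss.backward()                      # 4. backpropagate to get gradients
        optimizer.step()                     # 5. update weights and biases
        epoch_loss += loss.item() * len(X_batch)
    epoch_loss /= len(X)                     # mean loss over the epoch
    if epoch_loss < threshold:               # 6. stop once below the threshold
        break

print(f"stopped after {epoch + 1} epochs, loss={epoch_loss:.5f}")
```

In practice a cap on the number of epochs is usually kept alongside the threshold, since the loss is not guaranteed to ever fall below it.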

The most common activation functions are:
- Sigmoid: $$g(z) = \frac{1}{1+e^{-z}}$$
- Tanh: $$ g(z)= \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$
- ReLU: $$ g(z) = \max(0,z)$$
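
As a quick check, the three formulas can be written out directly and compared with PyTorch's built-in versions. The input values are arbitrary.

```python
import torch

z = torch.tensor([-2.0, 0.0, 2.0])

# Direct implementations of the three formulas.
sigmoid = 1 / (1 + torch.exp(-z))
tanh = (torch.exp(z) - torch.exp(-z)) / (torch.exp(z) + torch.exp(-z))
relu = torch.maximum(torch.zeros_like(z), z)

# Compare against the functions that ship with PyTorch.
print(torch.allclose(sigmoid, torch.sigmoid(z)))
print(torch.allclose(tanh, torch.tanh(z)))
print(torch.equal(relu, torch.relu(z)))
```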

A common loss function for regression is:
- RMSE: $$L(z,y) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - z_i)^2}$$
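
A small sketch of the RMSE formula in PyTorch, using made-up predictions `z` and targets `y`. The helper name `rmse` is an assumption for this example, not part of PyTorch.

```python
import torch

def rmse(z: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Root mean squared error between predictions z and targets y."""
    return torch.sqrt(torch.mean((y - z) ** 2))

z = torch.tensor([1.0, 2.0, 3.0])   # made-up predictions
y = torch.tensor([1.0, 2.0, 5.0])   # made-up targets

# Squared errors are 0, 0 and 4, so the result is sqrt(4 / 3), about 1.1547.
print(rmse(z, y))
```

The same value can be obtained as the square root of `torch.nn.MSELoss()` applied to the two tensors.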


## Training a Neural Network with PyTorch 
PyTorch will now be used to train a neural network. The data is the normalized sparse dataset.

In [1]:
import pandas as pd
import torch
import sys; sys.path.append('..')
from thesis_package import utils

The first step is to create the data set class, which extends `torch.utils.data.Dataset`. Afterwards, we create a data loader by instantiating `torch.utils.data.DataLoader` on that data set. The data loader splits the data set into batches, shuffles it and exposes an iterator over the batches.

First, the dataset is loaded and prepared in the same fashion as for the other ML models.

In [3]:
# Raw strings keep the backslashes in the Windows paths from being read as escape sequences.
y_max_u_bool = pd.read_csv(r'..\data\ground_truth\res_bus_vm_pu_max_bool_constr.csv').drop(columns='timestamps')
y_max_u = y_max_u_bool[utils.cols_with_positive_values(y_max_u_bool)]
exogenous_data = pd.read_csv(r'..\data\processed\production\exogenous_data_extended.csv').drop(columns=['date'])
X_max_u_bool_train, X_max_u_bool_test, y_max_u_bool_train, y_max_u_bool_test = utils.split_and_suffle(exogenous_data, y_max_u_bool, scaling=True)
data = {'X_train':X_max_u_bool_train,
        'X_test': X_max_u_bool_test,
        'y_train':y_max_u_bool_train.astype(bool),
        'y_test': y_max_u_bool_test.astype(bool)
    }

Then the dataset class is declared.

In [10]:
from torch.utils.data import Dataset, DataLoader
class ThesisDataset(Dataset):
    def __init__(self, data) -> None:
        train_X, train_y = data['X_train'], data['y_train']
        test_X, test_y = data['X_test'], data['y_test']
        self.X = torch.tensor(train_X.values)
        self.y = torch.tensor(train_y.values)
        self.X_test = torch.tensor(test_X.values)
        self.y_test = torch.tensor(test_y.values)
    def __getitem__(self, index) -> tuple:
        return self.X[index], self.y[index]
    def __len__(self) -> int:
        return len(self.X)
dataset = ThesisDataset(data)
dataset[0]

(tensor([0.9610, 0.9391, 0.2836, 0.0202, 0.6667, 0.1429, 0.2948, 0.0231, 0.9624,
         0.2588, 0.9390], dtype=torch.float64),
 tensor([False, False, False, False, False, False, False, False, False, False,
         False, False, False, False, False, False, False, False, False, False,
         False, False, False, False, False, False, False, False, False, False,
         False, False, False, False]))

Now we create our data loader object.

In [16]:
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
dataiter = iter(dataloader)
from math import ceil
total_samples = len(dataset)
n_iterations = ceil(total_samples / 32)
print('Total samples: {}, total iterations: {}'.format(total_samples, n_iterations))

Total samples: 36172, total iterations: 1131
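
To see what the data loader yields, the sketch below iterates over a toy stand-in for the thesis data. The random tensors are an assumption. Only their shapes, 11 features and 34 boolean targets per sample, follow the `dataset[0]` output shown earlier.

```python
import torch
from math import ceil
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the thesis data: 100 samples, 11 features, 34 targets.
X = torch.rand(100, 11)
y = torch.rand(100, 34) > 0.5
toy_loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

# Each iteration yields one (features, labels) batch.
batch_sizes = [features.shape[0] for features, labels in toy_loader]
print(batch_sizes)                        # three full batches and a last one of 4
print(len(toy_loader) == ceil(100 / 32))  # number of iterations per epoch
```

The last batch is smaller because 100 is not a multiple of 32. Passing `drop_last=True` to `DataLoader` would discard it instead.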


Our training loop will then look like the following:

```python
for epoch in range(num_epochs):
    for i, (features, labels) in enumerate(dataloader):
        # Zero grads, forward, backward and update
        ...  # placeholder: the training step is defined further below
    # Compute and print loss
    # Evaluate model on validation set
```

Now, it is necessary to decide on the hyperparameters for the neural network. These hyperparameters will later be tuned using Optuna.

In [17]:
hyper_params = {
    'input_size': dataset.X.shape[1],
    'hidden_size': 32,
    'output_size': dataset.y.shape[1],
    'num_epochs': 100,
    'batch_size': 32,
    'learning_rate': 0.001  
}

It is important to configure the device, since models train faster on a GPU when one is available. Later, the model and the tensors must be moved to this device.

In [21]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device # In my case I don't have a GPU, so I use the CPU ... Sad, so if anyone wants to give me a GPU, hit me up on LinkedIn :) 

device(type='cpu')
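
Moving data to the device can be sketched as below. The layer sizes and the random batch are assumptions for illustration. The point is that `Tensor.to` returns a new tensor that has to be reassigned, while `Module.to` moves the parameters of the module itself.

```python
import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(11, 34).to(device)   # moves the layer's parameters to the device
features = torch.rand(32, 11)          # a made-up batch, created on the CPU
features = features.to(device)         # .to() returns a tensor on the target device

output = model(features)               # model and input now live on the same device
print(output.device.type == device.type)
```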

First, we define a model.

In [None]:
import argparse
import matplotlib.pyplot as plt
import torch.nn as nn
class FeedforwardNetwork(nn.Module):
    def __init__(
            self, output_size, input_size, hidden_size, layers,
            activation_type, dropout, **kwargs):
        """
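        output_size (int): number of outputs (classes)
        input_size (int): number of input features
        hidden_size (int): number of units in each hidden layer
        layers (int): currently unused, the network always has three hidden layers
        activation_type (str): currently unused, tanh is always applied
        dropout (float): dropout probability
        The __init__ here defines the attributes that each FeedforwardNetwork
        instance has. Note that nn includes modules for several activation
        functions and dropout as well.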
        """
        super(FeedforwardNetwork, self).__init__()
        self.feedforward_nn = nn.Sequential(nn.Linear(input_size, hidden_size),
                                            nn.Tanh(),
                                            nn.Dropout(dropout),
                                            nn.Linear(hidden_size, hidden_size),
                                            nn.Tanh(),
                                            nn.Dropout(dropout),
                                            nn.Linear(hidden_size, hidden_size),
                                            nn.Tanh(),
                                            nn.Dropout(dropout),
                                            nn.Linear(hidden_size, output_size))
    def forward(self, x, **kwargs):
        """
        x (batch_size x n_features): a batch of training examples
        This method needs to perform all the computation needed to compute
        the output logits from x. This will include using various hidden
        layers, pointwise nonlinear functions, and dropout.
        """
        return self.feedforward_nn(x)


def train_batch(X, y, model, optimizer, criterion, **kwargs):
    """
    X (n_examples x n_features)
    y (n_examples): gold labels
    model: a PyTorch defined model
    optimizer: optimizer used in gradient step
    criterion: loss function
    To train a batch, the model needs to predict outputs for X, compute the
    loss between these predictions and the "gold" labels y using the criterion,
    and compute the gradient of the loss with respect to the model parameters.
    Check out https://pytorch.org/docs/stable/optim.html for examples of how
    to use an optimizer object to update the parameters.
    This function should return the loss (tip: call loss.item()) to get the
    loss as a numerical value that is not part of the computation graph.
    """
    optimizer.zero_grad()  # Reset the stored gradients to zero
    output = model(X)  # Forward pass: compute the model outputs for the batch
    loss = criterion(output, y)  # Cross entropy in this case
    loss.backward()  # Compute the gradient of the loss w.r.t. graph leaves
    optimizer.step()  # Update weights and biases with the optimizer (SGD or Adam)
    return loss.item()
    


def predict(model, X):
    """X (n_examples x n_features)"""
    scores = model(X)  # (n_examples x n_classes)
    predicted_labels = scores.argmax(dim=-1)  # (n_examples)
    return predicted_labels


def evaluate(model, X, y):
    """
    X (n_examples x n_features)
    y (n_examples): gold labels
    """
    model.eval()
    y_hat = predict(model, X)
    n_correct = (y == y_hat).sum().item()
    n_possible = float(y.shape[0])
    model.train()
    return n_correct / n_possible


def plot(epochs, plottable, ylabel='', name='', title=''):
    plt.clf()
    plt.xlabel('Epoch')
    plt.ylabel(ylabel)
    plt.plot(epochs, plottable)
    plt.grid()
    plt.title(title)
    plt.savefig('%s.pdf' % (name), bbox_inches='tight')


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('model',
                        choices=['logistic_regression', 'mlp'],
                        help="Which model should the script run?")
    parser.add_argument('-epochs', default=20, type=int,
                        help="""Number of epochs to train for. You should not
                        need to change this value for your plots.""")
    parser.add_argument('-batch_size', default=1, type=int,
                        help="Size of training batch.")
    parser.add_argument('-learning_rate', type=float, default=0.01)
    parser.add_argument('-l2_decay', type=float, default=0)
    parser.add_argument('-hidden_sizes', type=int, default=200)
    parser.add_argument('-layers', type=int, default=1)
    parser.add_argument('-dropout', type=float, default=0.3)
    parser.add_argument('-activation', choices=['tanh', 'relu'], default='relu')
    parser.add_argument('-optimizer', choices=['sgd', 'adam'], default='sgd')
    opt = parser.parse_args()
    print(opt)
    utils.configure_seed(seed=42)

    data = utils.load_classification_data()
    dataset = utils.ClassificationDataset(data)
    train_dataloader = DataLoader(
        dataset, batch_size=opt.batch_size, shuffle=True)

    dev_X, dev_y = dataset.dev_X, dataset.dev_y
    test_X, test_y = dataset.test_X, dataset.test_y

    n_classes = torch.unique(dataset.y).shape[0]  # 10
    n_feats = dataset.X.shape[1]

    # initialize the model    
    model = FeedforwardNetwork(
        n_classes, n_feats,
        opt.hidden_sizes, opt.layers,
        opt.activation, opt.dropout)

    # get an optimizer
    optims = {"adam": torch.optim.Adam, "sgd": torch.optim.SGD}

    optim_cls = optims[opt.optimizer]
    optimizer = optim_cls(
        model.parameters(),
        lr=opt.learning_rate,
        weight_decay=opt.l2_decay)

    # get a loss criterion
    criterion = nn.CrossEntropyLoss()

    # training loop
    epochs = torch.arange(1, opt.epochs + 1)
    train_mean_losses = []
    valid_accs = []
    for ii in epochs:
        print('Training epoch {}'.format(ii))
        train_losses = []  # reset so the mean below covers only this epoch
        for X_batch, y_batch in train_dataloader:
            loss = train_batch(
                X_batch, y_batch, model, optimizer, criterion)
            train_losses.append(loss)

        mean_loss = torch.tensor(train_losses).mean().item()
        print('Training loss: %.4f' % (mean_loss))

        train_mean_losses.append(mean_loss)
        valid_accs.append(evaluate(model, dev_X, dev_y))
        print('Valid acc: %.4f' % (valid_accs[-1]))

    final_acc = evaluate(model, test_X, test_y)
    print('Final Test acc: %.4f' % (final_acc))
    # plot
    # Raw strings keep the backslashes in the output paths literal.
    file_name = r'.\q4_1_results\q4_1_loss_' + str(opt.learning_rate) + '_' + str(opt.hidden_sizes) + '_' + str(opt.dropout) + '_' + str(opt.activation) + '_' + str(opt.optimizer) + '_' + str(opt.layers)
    plot(epochs, train_mean_losses, ylabel='Loss', name=file_name, title='Loss(Epoch)')

    file_name = r'.\q4_1_results\q4_1_acc' + str(opt.learning_rate) + '_' + str(opt.hidden_sizes) + '_' + str(opt.dropout) + '_' + str(opt.activation) + '_' + str(opt.optimizer) + '_' + str(opt.layers) + '_' + str(final_acc)
    plot(epochs, valid_accs, ylabel='Accuracy', name=file_name, title='Accuracy(Epoch); Final Accuracy=' + str(final_acc))
#%%
if __name__ == '__main__':
    main()

In [None]:
# main
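
The `ThesisDataset` built earlier yields `float64` features and one boolean target per output column, which is a multi-label setting, whereas `nn.CrossEntropyLoss` and `argmax` assume exactly one class per sample. One possible adaptation, given here as an assumption rather than as the method used in the thesis, is to train with `nn.BCEWithLogitsLoss` on float targets and to threshold the logits at zero. Random data with the same shapes stands in for the thesis data, and the small network is only illustrative.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in data: 11 float64 features and 34 boolean targets per sample.
X = torch.rand(256, 11, dtype=torch.float64)
y = torch.rand(256, 34) > 0.5
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(11, 32), nn.Tanh(), nn.Linear(32, 34))
criterion = nn.BCEWithLogitsLoss()   # sigmoid + binary cross entropy per target
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for X_batch, y_batch in loader:      # a single epoch, for illustration
    optimizer.zero_grad()
    logits = model(X_batch.float())              # nn.Linear weights are float32 by default
    loss = criterion(logits, y_batch.float())    # the loss expects float targets
    loss.backward()
    optimizer.step()

with torch.no_grad():
    predicted = model(X.float()) > 0   # logit > 0 means probability > 0.5
print(predicted.shape, predicted.dtype)
```

With this setup, accuracy can be computed per target column by comparing `predicted` with `y` elementwise.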
