**Group information**

| Family name | First name | Email address |
| ----------- | ---------- | ------------- |
|             |            |               |
|             |            |               |
|             |            |               |

# Network - Practice

This tutorial explores how to implement a simple neural network that predicts the probability of loan default from borrower and loan characteristics. The labelled dataset contains 100,000 observations and 16 predictors (e.g. income, credit score). The response is a binary variable indicating whether the borrower defaulted on the loan.

In [None]:
# Packages
import numpy as np
import pandas as pd
import shutil
import os
import torch
import torchinfo

from captum import attr
from matplotlib import pyplot as plt
from sklearn import metrics, model_selection, preprocessing
from torch import nn, optim, utils
from tqdm import tqdm
from urllib import request

# Device
device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
device = torch.device(device)

# Utilities
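def download_data():
    '''Downloads and extracts the data folder (if needed) and makes it the working directory'''
    if os.path.basename(os.getcwd()) == 'data':
        # Already inside the data folder (e.g. the cell was re-run in the same session)
        print('Data folder already exists')
    elif os.path.isdir('data'):
        # Folder was downloaded in an earlier session: just move into it
        os.chdir('data')
    else:
        request.urlretrieve('https://www.dropbox.com/scl/fo/tniycpagp0c3p72uy0ag1/ACdmVyp71Zw_89tPERPN2mI?rlkey=0nxq0gifiqh5fwl9j0dk8lgk9&dl=1', 'data.zip')
        shutil.unpack_archive('data.zip', 'data')
        os.remove('data.zip')
        os.chdir('data')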

In [None]:
# Execute on first run
download_data()

**1. Descriptive statistics**

Load the `X.csv` and `y.csv` files using `pd.read_csv`. Display the first few observations with the `head` method and generate descriptive statistics for both continuous (e.g. `describe`) and categorical (e.g. `value_counts`) variables.
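A minimal sketch of this step, assuming `X` and `y` are pandas DataFrames. The column names of `X.csv` are not listed in this notebook, so the toy frames below use hypothetical columns (`income`, `credit_score`, `loan_purpose`, `default`) purely for illustration. In the notebook you would pass `pd.read_csv('X.csv')` and `pd.read_csv('y.csv')` instead.

```python
import pandas as pd


def describe_data(X: pd.DataFrame, y: pd.DataFrame):
    """Print the first rows and summary statistics for features and target."""
    print(X.head())
    # Continuous variables: count, mean, std, min, quartiles, max
    numeric_summary = X.describe()
    print(numeric_summary)
    # Categorical variables: every non-numeric column, summarised by level frequencies
    categorical_cols = [c for c in X.columns if not pd.api.types.is_numeric_dtype(X[c])]
    for col in categorical_cols:
        print(X[col].value_counts(), '\n')
    # Target: class balance as proportions
    print(y.value_counts(normalize=True))
    return numeric_summary, categorical_cols


# Hypothetical toy data standing in for X.csv and y.csv
X_toy = pd.DataFrame({
    'income': [52000, 61000, 38000, 75000],
    'credit_score': [680, 720, 590, 750],
    'loan_purpose': ['car', 'home', 'car', 'education'],
})
y_toy = pd.DataFrame({'default': [0, 0, 1, 0]})

summary, cat_cols = describe_data(X_toy, y_toy)
```

Checking the class balance of the target early is worthwhile: if defaults are rare, accuracy alone will be a misleading metric later on.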

**2. Format data**

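Pre-process the data by one-hot encoding the categorical variables with `pd.get_dummies` and converting the binary target to floating point (`float`), which is the target format `nn.BCEWithLogitsLoss` expects. Recent pandas versions return boolean dummy columns by default, so these should be cast to `float` as well.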
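A possible sketch, reusing hypothetical toy frames in place of the real `X.csv`/`y.csv` data. Casting the whole encoded frame with `.astype(float)` is one choice among several; it converts both the boolean dummies and any integer columns so that later tensor conversion produces a single numeric type.

```python
import pandas as pd


def format_data(X: pd.DataFrame, y: pd.DataFrame):
    """One-hot encode categorical columns and cast features and target to float."""
    # get_dummies encodes object/string/category columns; astype(float) turns the
    # boolean dummies (and integer columns) into float64
    X_encoded = pd.get_dummies(X).astype(float)
    # BCEWithLogitsLoss expects float targets in [0, 1]
    y_float = y.astype(float)
    return X_encoded, y_float


# Hypothetical toy data (column names are illustrative only)
X_toy = pd.DataFrame({
    'income': [52000, 61000, 38000, 75000],
    'credit_score': [680, 720, 590, 750],
    'loan_purpose': ['car', 'home', 'car', 'education'],
})
y_toy = pd.DataFrame({'default': [0, 0, 1, 0]})

X_enc, y_enc = format_data(X_toy, y_toy)
print(X_enc)
```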

**3. Split samples and scale data**

Split the dataset into a training sample and a test sample using `model_selection.train_test_split`, allocating 80% of the observations to the training sample.

Scale the input variables using `preprocessing.MinMaxScaler`: fit the scaler on the training sample only, then apply the fitted transformation to both the training and the test sample (fitting on the full dataset would leak information from the test sample into training). Explain why input scaling is required for gradient-based models such as neural networks.
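A minimal sketch of the split-then-scale order, using random placeholder arrays instead of the encoded loan data. The `random_state` value is arbitrary, and `stratify=y` is an optional extra that keeps the default rate the same in both samples.

```python
import numpy as np
from sklearn import model_selection, preprocessing

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the encoded feature matrix and float target
X = rng.normal(loc=50, scale=20, size=(1000, 5))
y = rng.integers(0, 2, size=1000).astype(float)

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, train_size=0.8, random_state=42, stratify=y
)

scaler = preprocessing.MinMaxScaler()
X_train = scaler.fit_transform(X_train)  # learn per-column min/max on training data only
X_test = scaler.transform(X_test)        # reuse the training min/max: no test leakage
```

Because the test sample is scaled with the training minima and maxima, some test values can fall slightly outside `[0, 1]`; that is expected and harmless.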

**4. Data loaders**

Convert the training and test sets to `torch.Tensor` objects using `torch.from_numpy` (casting them to `float32`, since `from_numpy` preserves NumPy's default `float64` whereas the model weights are `float32`) and wrap them in `utils.data.TensorDataset`. Use these datasets to create `utils.data.DataLoader` instances for training and testing. Explain why `shuffle=True` is necessary for the training loader, select an appropriate batch size, and discuss the trade-offs involved in that choice.
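One way to build the loaders, sketched with random placeholder arrays shaped like the output of the scaling step. The helper `make_loader` is hypothetical, and the batch sizes (64 for training, 256 for testing) are arbitrary starting points: smaller batches give noisier but more frequent updates, larger ones give smoother gradients and faster epochs on a GPU at the cost of fewer updates per epoch.

```python
import numpy as np
import torch
import torch.utils.data  # make sure the data submodule is loaded
from torch import utils

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the scaled arrays (float64, as returned by sklearn)
X_train = rng.random((800, 5))
y_train = rng.integers(0, 2, size=800).astype(float)
X_test = rng.random((200, 5))
y_test = rng.integers(0, 2, size=200).astype(float)


def make_loader(X, y, batch_size, shuffle):
    """Wrap feature/target arrays in a TensorDataset and DataLoader."""
    # from_numpy keeps float64, so cast to float32 to match nn.Linear weights
    X_t = torch.from_numpy(np.asarray(X)).float()
    # Reshape the target to (N, 1) so it matches the model's single-logit output
    y_t = torch.from_numpy(np.asarray(y)).float().reshape(-1, 1)
    dataset = utils.data.TensorDataset(X_t, y_t)
    return utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)


train_loader = make_loader(X_train, y_train, batch_size=64, shuffle=True)   # reshuffled every epoch
test_loader = make_loader(X_test, y_test, batch_size=256, shuffle=False)    # order irrelevant for evaluation
```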


**5. Model structure**

Define a feedforward neural network class in PyTorch. The model takes a feature vector as input and outputs a single logit, i.e. an unnormalised score that the sigmoid function maps to the probability of default. It consists of two hidden layers of 16 units, each followed by a ReLU activation, and ends with a linear output layer. Instantiate the model and print its architecture.

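Note: The model itself should not apply a sigmoid to its output. For numerical stability, `nn.BCEWithLogitsLoss` takes raw logits rather than probabilities and applies the sigmoid internally, combining it with the logarithm in a single stable computation. By contrast, `nn.BCELoss` expects probabilities.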
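A minimal sketch of such a class. The class name `LoanNet` is hypothetical, and `n_inputs=16` is only a placeholder: after one-hot encoding, the real input width is the number of encoded columns. The last lines check numerically that `nn.BCEWithLogitsLoss` on logits matches `nn.BCELoss` on sigmoid probabilities. As an alternative to `print(model)`, `torchinfo.summary(model, input_size=(batch_size, n_inputs))` also reports output shapes and parameter counts.

```python
import torch
from torch import nn


class LoanNet(nn.Module):
    """Feedforward net: two hidden layers of 16 ReLU units, one output logit."""

    def __init__(self, n_inputs: int):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_inputs, 16),
            nn.ReLU(),
            nn.Linear(16, 16),
            nn.ReLU(),
            nn.Linear(16, 1),  # raw logit: no sigmoid here
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


model = LoanNet(n_inputs=16)
print(model)

# Sanity check: BCEWithLogitsLoss(logits) equals BCELoss(sigmoid(logits)) up to rounding
torch.manual_seed(0)
logits = model(torch.rand(8, 16))
targets = torch.randint(0, 2, (8, 1)).float()
loss_logits = nn.BCEWithLogitsLoss()(logits, targets)
loss_probs = nn.BCELoss()(torch.sigmoid(logits), targets)
```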

**6. Model training** 

Define the loss function (`nn.BCEWithLogitsLoss`) and an optimisation algorithm (e.g. `optim.AdamW`). Write a PyTorch training loop that estimates the model parameters on the training sample for at most 25 epochs with a learning rate of `1e-3`. Remember to move both the model and each batch of data to the correct device.
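A sketch of a standard training loop, not the only valid one. To keep it self-contained it trains a small network on hypothetical, linearly separable random data rather than the loan data; `train` is an illustrative helper that returns the average training loss per epoch.

```python
import torch
import torch.utils.data
from torch import nn, optim, utils

device = torch.device('cuda' if torch.cuda.is_available()
                      else 'mps' if torch.backends.mps.is_available() else 'cpu')


def train(model, train_loader, n_epochs=25, lr=1e-3):
    """Fit `model` with BCEWithLogitsLoss and AdamW; return mean loss per epoch."""
    model = model.to(device)
    loss_fn = nn.BCEWithLogitsLoss()
    optimizer = optim.AdamW(model.parameters(), lr=lr)
    history = []
    for epoch in range(n_epochs):
        model.train()  # enables dropout etc. if the model has any
        running_loss, n_obs = 0.0, 0
        for xb, yb in train_loader:
            xb, yb = xb.to(device), yb.to(device)
            optimizer.zero_grad()           # clear gradients from the previous step
            loss = loss_fn(model(xb), yb)   # logits in, float targets in
            loss.backward()                 # backpropagate
            optimizer.step()                # update parameters
            running_loss += loss.item() * len(xb)
            n_obs += len(xb)
        history.append(running_loss / n_obs)
    return history


# Hypothetical toy data: label is 1 when the first two features sum to a positive value
torch.manual_seed(0)
X = torch.randn(512, 4)
y = (X[:, :1] + X[:, 1:2] > 0).float()
loader = utils.data.DataLoader(utils.data.TensorDataset(X, y), batch_size=64, shuffle=True)
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 1))

history = train(model, loader)
print(f'first epoch loss {history[0]:.3f}, last epoch loss {history[-1]:.3f}')
```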

**7. Model performance** 

Write a PyTorch evaluation loop to assess the model's generalisation performance on the test sample. Convert the predicted probabilities to class labels (e.g. with a 0.5 threshold) and interpret the results using `metrics.classification_report` and `metrics.confusion_matrix`.
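A minimal sketch of the evaluation loop. The helper `predict_proba` is hypothetical, the model below is untrained and the data random, so the printed metrics are meaningless and only the mechanics matter. The 0.5 threshold is a conventional default, not necessarily the best choice when defaults are rare.

```python
import torch
import torch.utils.data
from torch import nn, utils
from sklearn import metrics


def predict_proba(model, loader, device):
    """Return predicted default probabilities and true labels as 1-D NumPy arrays."""
    model.eval()  # disable dropout / use running statistics
    probs, labels = [], []
    with torch.no_grad():  # no gradient tracking needed for evaluation
        for xb, yb in loader:
            logits = model(xb.to(device))
            probs.append(torch.sigmoid(logits).cpu())  # logits -> probabilities
            labels.append(yb)
    return torch.cat(probs).numpy().ravel(), torch.cat(labels).numpy().ravel()


# Hypothetical toy setup (untrained model, random data) for illustration only
device = torch.device('cpu')
torch.manual_seed(0)
X = torch.randn(200, 4)
y = (X[:, :1] > 0).float()
test_loader = utils.data.DataLoader(utils.data.TensorDataset(X, y), batch_size=64, shuffle=False)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

p, y_true = predict_proba(model, test_loader, device)
y_true = y_true.astype(int)
y_pred = (p >= 0.5).astype(int)
print(metrics.classification_report(y_true, y_pred, labels=[0, 1], zero_division=0))
cm = metrics.confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)  # rows: true class, columns: predicted class
```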

**8. Feature importance** 

Using the `attr.IntegratedGradients` class, compute local (per-observation) feature attributions on the test sample. Aggregate them across all observations (e.g. as the mean absolute attribution per variable) and display the result as a bar plot. Which variables are most important for the classification?
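To show what `attr.IntegratedGradients` computes, here is a from-scratch sketch: attributions are the input-minus-baseline difference times the average gradient along the straight path from a baseline to the input, approximated here with a midpoint Riemann sum. Captum uses Gauss-Legendre quadrature by default, so `attr.IntegratedGradients(model).attribute(X_test_tensor, baselines=torch.zeros_like(X_test_tensor), target=0)` should approximate, not exactly reproduce, the same values. The all-zeros baseline is an assumption (after min-max scaling it corresponds to each feature's training minimum), and the helper names and feature names are hypothetical.

```python
import torch
from torch import nn


def integrated_gradients(model, inputs, baseline=None, n_steps=50):
    """Midpoint Riemann approximation of integrated gradients for a model
    that outputs one logit per row. Puts the model in eval mode."""
    model.eval()
    if baseline is None:
        baseline = torch.zeros_like(inputs)
    total_grad = torch.zeros_like(inputs)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        # Point on the straight line from baseline to input
        point = (baseline + alpha * (inputs - baseline)).detach().requires_grad_(True)
        # Summing over rows is valid because each row's output depends only on that row
        output = model(point).sum()
        (grad,) = torch.autograd.grad(output, point)
        total_grad += grad
    # Average path gradient scaled by the input-baseline difference
    return (inputs - baseline) * total_grad / n_steps


def global_importance(attributions, feature_names):
    """Aggregate local attributions: mean absolute value per feature, largest first."""
    scores = attributions.abs().mean(dim=0).tolist()
    return sorted(zip(feature_names, scores), key=lambda t: t[1], reverse=True)


# Hypothetical example: small untrained network, random inputs, made-up feature names
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1))
X_sample = torch.rand(100, 3)
attributions = integrated_gradients(model, X_sample, n_steps=100)
ranking = global_importance(attributions, ['feature_a', 'feature_b', 'feature_c'])
print(ranking)
```

The ranking can then be plotted with, for example, `plt.barh([n for n, _ in ranking], [s for _, s in ranking])`.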

**9. Regularisation**

Apply regularisation, for example by setting the optimiser's `weight_decay`, adding a penalty term to the loss function, inserting dropout layers after the activation functions, or implementing early stopping based on performance on a validation sample.
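A sketch combining three of these options, under stated assumptions: `EarlyStopping` and `make_regularised_model` are hypothetical helpers (not PyTorch APIs), and the dropout rate, weight decay and patience are arbitrary starting values. Note that `optim.AdamW` already defaults to `weight_decay=0.01`, so regularising through it means trying other values. The validation losses in the loop are made up to show when the stopper fires; in practice they come from a validation sample carved out of the training data, with `model.train()` during updates and `model.eval()` during validation so dropout is switched off.

```python
import copy
from torch import nn, optim


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience`
    consecutive epochs, keeping a copy of the best weights."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float('inf')
        self.best_state = None
        self.counter = 0

    def step(self, val_loss, model):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(model.state_dict())
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience


def make_regularised_model(n_inputs, p_dropout=0.2):
    """Same architecture as before, with dropout after each ReLU."""
    return nn.Sequential(
        nn.Linear(n_inputs, 16), nn.ReLU(), nn.Dropout(p_dropout),
        nn.Linear(16, 16), nn.ReLU(), nn.Dropout(p_dropout),
        nn.Linear(16, 1),
    )


model = make_regularised_model(n_inputs=16)
optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-1)

# Illustration with made-up validation losses
stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, val_loss in enumerate([0.60, 0.55, 0.56, 0.57, 0.50]):
    if stopper.step(val_loss, model):
        stopped_at = epoch
        break
model.load_state_dict(stopper.best_state)  # restore the best weights seen
```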

**10. Model architecture tuning**

Modify the model structure (e.g. the number of hidden layers, the number of units per layer, or the activation functions) to improve predictive performance.
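One convenient way to experiment is a configurable builder, sketched below. `make_mlp` and the candidate configurations are hypothetical illustrations, not recommended settings; each candidate would be trained with the same loop and compared on a validation sample, keeping the test sample for the final choice only.

```python
import torch
from torch import nn


def make_mlp(n_inputs, hidden_sizes=(16, 16), activation=nn.ReLU, p_dropout=0.0):
    """Build a feedforward net with configurable depth, width, activation and dropout."""
    layers = []
    in_features = n_inputs
    for width in hidden_sizes:
        layers += [nn.Linear(in_features, width), activation()]
        if p_dropout > 0:
            layers.append(nn.Dropout(p_dropout))
        in_features = width
    layers.append(nn.Linear(in_features, 1))  # single output logit
    return nn.Sequential(*layers)


candidates = {
    'baseline': make_mlp(16),
    'wider': make_mlp(16, hidden_sizes=(64, 64)),
    'deeper': make_mlp(16, hidden_sizes=(32, 32, 32), p_dropout=0.1),
    'gelu': make_mlp(16, activation=nn.GELU),
}
for name, net in candidates.items():
    n_params = sum(p.numel() for p in net.parameters())
    print(f'{name}: {n_params} parameters')
```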