# 📌 <span style="font-size:18px; color:#007acc;"><b>Introduction</b></span>

This is the second notebook in the series. The first notebook covered data cleaning and preprocessing: handling missing values, detecting and removing outliers, and exploratory data analysis (EDA). This notebook picks up from there. It imports the required modules, defines a custom dataset class that loads the data, and defines the neural network architecture. The dataset is then instantiated and prepared, which includes a stratified train/validation/test split, normalization, and conversion to tensors. A custom batch sampler is implemented for data loading. The notebook then defines the training and validation functions and the evaluation metrics, trains and evaluates the network, and uses SHAP analysis to explain feature importance. Finally, the trained model is saved, a FastAPI server is created to serve it for local inference, and predictions are made by sending requests to the API.

Note: This work was run on MS Azure, so some settings may need adjusting in other environments.

🚀 <span style="font-size:18px; color:#e63946;"><b>Let's get started!</b></span> 🚀


# <span style="font-size:18px; color:#007acc;"><b> Table of Contents</b></span>
1. [Import Modules](#Import-Modules) 
2. [Dataset Class](#Dataset-Class)  
3. [Define Network](#Define-Network) 
4. [Dataset Instantiation and Preprocessing](#Dataset-Instantiation-and-Preprocessing) 
5. [Custom Batch Sampler](#Custom-Batch-Sampler) 
6. [Training and Validation Functions](#Training-and-Validation-Functions) 
7. [Evaluation Metrics](#Evaluation-Metrics) 
8. [Training and Evaluation of Neural Network](#Training-and-Evaluation-of-Neural-Network)
9. [SHAP Analysis](#SHAP-Analysis)
10. [Save Model](#Save-Model)
11. [Create FAST API](#Create-FAST-API)
12. [Inference from API](#Inference-from-API)

## <span style="font-size:18px; color:#007acc;"><b> 1. Import Modules <a id="Import-Modules"></a></b></span> ##

In [None]:
import sys
import logging
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import fsspec
import seaborn as sns
import mlflow
import datetime
from torchvision import datasets, transforms, models
from torch.utils.data import Dataset, random_split, DataLoader, TensorDataset
from torchmetrics import Precision, Recall, F1Score
from torch.utils.data import Sampler
import math
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_curve, auc, precision_recall_curve
import requests
from fastapi import FastAPI
from pydantic import BaseModel
from uvicorn import run
import threading  # Used to run the FastAPI server in a background thread

## <span style="font-size:18px; color:#007acc;"><b> 2. Dataset Class <a id="Dataset-Class"></a></b></span> ##
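A map-style dataset only needs `__getitem__` and `__len__`. The sketch below illustrates that protocol in plain Python, without importing PyTorch, so it is not the class defined in this notebook. `ToyDataSet` and its values are hypothetical and exist only for illustration.

```python
# Minimal sketch of the map-style dataset protocol (plain Python, hypothetical data).
class ToyDataSet:
    def __init__(self, features, labels):
        # Features and labels must be aligned row by row
        if len(features) != len(labels):
            raise ValueError("features and labels must have the same length")
        self.X = features
        self.Y = labels

    def __getitem__(self, index):
        # Return one (features, label) pair
        return self.X[index], self.Y[index]

    def __len__(self):
        # Number of samples
        return len(self.X)


toy = ToyDataSet([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [0, 1, 0])
first_features, first_label = toy[0]
pairs = [toy[i] for i in range(len(toy))]
print(len(toy), first_features, first_label)
```

The class in the cell below follows the same two-method contract, with the data read from tab-separated files instead of inline lists.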

In [None]:
##### Define dataset class #####
################################
# Define a custom dataset class inheriting from PyTorch's Dataset class
class FreddieMacDataSet(Dataset):
    # Constructor method, used for initializing the dataset
    def __init__(self):
        super().__init__()
        # Read Full 2004 Data
        feature_data = pd.read_csv("feature2004_7_all.txt", sep='\t', skiprows=1)
        label_data = pd.read_csv("label2004_7_all.txt", sep='\t', skiprows=1)
        print(feature_data.shape)
        print(label_data.shape)
        # Convert feature data to a numpy array and store it as X
        self.X = np.array(feature_data)
        # Convert label data to a numpy array and store it as Y
        self.Y = np.array(label_data)
        # Store the number of samples in the dataset
        self.n_samples = self.X.shape[0]
    
    # Method to retrieve a sample from the dataset given its index
    def __getitem__(self, index):
        # Return the feature and label of the sample at the given index
        return self.X[index], self.Y[index]

    # Method to return the total number of samples in the dataset
    def __len__(self):
        # Return the number of samples
        return self.n_samples


## <span style="font-size:18px; color:#007acc;"><b> 3. Define Network <a id="Define-Network"></a></b></span> ##
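To get a feel for the size of the network, the sketch below derives the linear layer shapes and the parameter count from the layer widths used later in this notebook, and computes the He (Kaiming) uniform bound for the first layer. It is plain Python arithmetic, not the model itself. The helper name `he_uniform_bound` is hypothetical, and the bound assumes the ReLU gain of sqrt(2) with fan-in mode.

```python
import math

# Layer widths used later in this notebook: 23 inputs, ten hidden layers, 1 output
input_size = 23
hidden_sizes = [2560, 1280, 640, 320, 160, 80, 40, 20, 10, 5]
output_size = 1

# (fan_in, fan_out) for every linear layer the model would create
widths = [input_size] + hidden_sizes + [output_size]
layer_shapes = list(zip(widths[:-1], widths[1:]))

# Each linear layer holds fan_in * fan_out weights plus fan_out biases
total_params = sum(fan_in * fan_out + fan_out for fan_in, fan_out in layer_shapes)


def he_uniform_bound(fan_in):
    # He uniform with ReLU gain sqrt(2): weights ~ U(-b, b) with b = sqrt(6 / fan_in)
    return math.sqrt(6.0 / fan_in)


first_layer_bound = he_uniform_bound(input_size)
print(len(layer_shapes), total_params, round(first_layer_bound, 4))
```

Most of the parameters sit in the second layer (2560 x 1280), which is worth keeping in mind when adjusting `hidden_sizes`.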

In [None]:
##### Define network #####
##########################
class NNModel(nn.Module):
    def __init__(self, input_size, hidden_sizes, output_size, initialization='he'):
        super().__init__()
        self.dropout = nn.Dropout(p=0.3)  # NOTE: defined but not applied in forward()
        self.input_size = input_size
        self.hidden_sizes = hidden_sizes
        self.output_size = output_size
        
        # Define the first hidden layer
        self.fc1 = nn.Linear(input_size, hidden_sizes[0])
        self.initialize_weights(self.fc1, initialization)
        
        # Define subsequent hidden layers
        self.hidden_layers = nn.ModuleList()
        for i in range(len(hidden_sizes) - 1):
            layer = nn.Linear(hidden_sizes[i], hidden_sizes[i+1])
            self.hidden_layers.append(layer)
            self.initialize_weights(layer, initialization)
        
        # Define the output layer
        self.fc_out = nn.Linear(hidden_sizes[-1], output_size)
        self.initialize_weights(self.fc_out, initialization)
    
    def forward(self, x):
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        #x = self.fc1(x)  # Linear activation applied to the output of self.fc1
        
        # Pass through each hidden layer with decreasing neuron count
        for hidden_layer in self.hidden_layers:
            x = F.relu(hidden_layer(x))
        
        # Output layer 
        x = self.fc_out(x)
        return x
    
    def num_flat_features(self, x):
        size = x.size()[1:] 
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

    def initialize_weights(self, layer, initialization, **kwargs):
        if initialization == 'uniform':
            scale = kwargs.get('scale', 0.1)
            nn.init.uniform_(layer.weight, -scale, scale)
            nn.init.constant_(layer.bias, 0)
        elif initialization == 'normal':
            mean = kwargs.get('mean', 0)
            std = kwargs.get('std', 0.01)
            nn.init.normal_(layer.weight, mean, std)
            nn.init.constant_(layer.bias, 0)
        elif initialization == 'xavier':
            nn.init.xavier_uniform_(layer.weight)
            nn.init.constant_(layer.bias, 0)
        elif initialization == 'he':
            nn.init.kaiming_uniform_(layer.weight, nonlinearity='relu')
            nn.init.constant_(layer.bias, 0)
        else:
            raise ValueError("Invalid initialization type. Choose from 'uniform', 'normal', 'xavier', or 'he'.")
    
    def print_output_layer_weights(self, epoch):
        # Print the output-layer (fc_out) weights for the given epoch
        output_layer_weights = self.fc_out.weight.data
        print(f"Epoch {epoch}: fc_out")
        print(output_layer_weights)


## <span style="font-size:18px; color:#007acc;"><b> 4. Dataset Instantiation and Preprocessing <a id="Dataset-Instantiation-and-Preprocessing"></a></b></span> ##
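The cell below fits the scaler on the training split only and reuses it for the validation and test splits. The sketch here shows why that matters, using a hand-written min-max transform on one hypothetical feature column. `fit_min_max` and `apply_min_max` are illustrative helpers, not scikit-learn functions, and the constant-column guard is an assumption of this sketch.

```python
# Sketch of min-max scaling for a single feature column (hypothetical values).
def fit_min_max(column):
    # Learn the range from the training data only
    return min(column), max(column)


def apply_min_max(column, lo, hi):
    # Map values relative to the training range; a constant column maps to 0.0
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]
    return [(v - lo) / span for v in column]


train_col = [620.0, 700.0, 780.0]
valid_col = [660.0, 800.0]  # 800 lies outside the training range

lo, hi = fit_min_max(train_col)
train_scaled = apply_min_max(train_col, lo, hi)
valid_scaled = apply_min_max(valid_col, lo, hi)
print(train_scaled, valid_scaled)
```

Because the range comes from the training data, validation or test values can fall slightly outside [0, 1]. That is expected and avoids leaking information from the held-out splits into the scaler.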

In [None]:
##### Instantiating dataset class #####
#######################################
# Set random seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Instantiate your custom dataset
dataset = FreddieMacDataSet()

# Split the dataset into features (X) and labels (Y)
X, Y = dataset.X, dataset.Y

count_of_ones = np.sum(Y == 1)
print(count_of_ones)

# Perform initial split of the data into training and temporary sets, stratifying by Y
X_train, X_temp, Y_train, Y_temp = train_test_split(X, Y, test_size=0.30, random_state=42, stratify=Y)

# Further split the temporary set into validation and test sets, stratifying by Y_temp
X_val, X_test, Y_val, Y_test = train_test_split(X_temp, Y_temp, test_size=0.5, random_state=42, stratify=Y_temp)

# Instantiate MinMaxScaler to scale the features
sc = MinMaxScaler()

# Fit and transform the training data
X_train_normalized = sc.fit_transform(X_train)

# Transform the validation and test data using the same scaler
X_val_normalized = sc.transform(X_val)
X_test_normalized = sc.transform(X_test)

# Converting to PyTorch tensors
train_input_tensor = torch.from_numpy(X_train_normalized).float()
train_output_tensor = torch.from_numpy(Y_train).float()
valid_input_tensor = torch.from_numpy(X_val_normalized).float()
valid_output_tensor = torch.from_numpy(Y_val).float()
test_input_tensor = torch.from_numpy(X_test_normalized).float()
test_output_tensor = torch.from_numpy(Y_test).float()

# Pytorch train, validation, and test sets
train = TensorDataset(train_input_tensor, train_output_tensor)
valid = TensorDataset(valid_input_tensor, valid_output_tensor)
test = TensorDataset(test_input_tensor, test_output_tensor)

# Class weights calculation for handling class imbalance
train_num_positives = torch.sum(train_output_tensor == 1)
train_num_negatives = torch.sum(train_output_tensor == 0)
print ("train_num_positives", train_num_positives )
print("train_num_negatives", train_num_negatives)

valid_num_positives = torch.sum(valid_output_tensor == 1)
valid_num_negatives = torch.sum(valid_output_tensor == 0)
print ("valid_num_positives", valid_num_positives )
print("valid_num_negatives", valid_num_negatives)

## <span style="font-size:18px; color:#007acc;"><b> 5. Custom Batch Sampler <a id="Custom-Batch-Sampler"></a></b></span> ##
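The sampler in the cell below gives each group a share of every batch that matches its share of the training set. The arithmetic can be sketched on its own, as below. The group names and counts are hypothetical, and `per_batch_allocation` is an illustrative helper rather than part of the sampler.

```python
# Sketch: how many samples each group contributes to one batch (hypothetical counts).
def per_batch_allocation(group_counts, batch_size):
    total = sum(group_counts.values())
    # Round each group's proportional share to a whole number of samples
    return {
        group: int(round(count / total * batch_size))
        for group, count in group_counts.items()
    }


even = per_batch_allocation({"CA": 600, "TX": 300, "NY": 100}, batch_size=20)
uneven = per_batch_allocation({"A": 1, "B": 1, "C": 1}, batch_size=10)

# Rounding can leave a batch short; the sampler tops up such batches with random indices
shortfall = 10 - sum(uneven.values())
print(even, uneven, shortfall)
```

Groups whose share rounds to zero contribute nothing to a regular batch and only appear through the random top-up, which is one reason to check the batch size against the smallest group.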

In [None]:
#################### Custom Batch Sampler #############################
#######################################################################
class ProportionalBatchSampler(Sampler):
    def __init__(self, data_source, batch_size, state_start_idx=19, state_end_idx=70):
        self.data_source = data_source
        self.batch_size = batch_size
        self.state_start_idx = state_start_idx
        self.state_end_idx = state_end_idx
        self.num_samples = len(data_source)
        self._prepare_indices()

    def _prepare_indices(self):
        # For each data point, identify its state
        # Assuming data_source.tensors[0] is the input tensor
        input_tensor = self.data_source.tensors[0]
        states_tensor = input_tensor[:, self.state_start_idx:self.state_end_idx]
        state_indices = torch.argmax(states_tensor, dim=1).numpy()

        # Collect indices per state
        self.state_to_indices = {}
        for idx, state in enumerate(state_indices):
            if state not in self.state_to_indices:
                self.state_to_indices[state] = []
            self.state_to_indices[state].append(idx)

        # Shuffle indices within each state
        for state in self.state_to_indices:
            np.random.shuffle(self.state_to_indices[state])

        # Calculate the proportion of each state
        self.state_proportions = {}
        for state in self.state_to_indices:
            self.state_proportions[state] = len(self.state_to_indices[state]) / self.num_samples

        # Initialize pointers for each state
        self.state_pointers = {state: 0 for state in self.state_to_indices}

        # Calculate the number of batches
        self.num_batches = math.ceil(self.num_samples / self.batch_size)

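    def __iter__(self):
        # Reshuffle and rewind the per-state pointers so every epoch starts fresh;
        # without this, all pointers stay exhausted after the first pass
        for state in self.state_to_indices:
            np.random.shuffle(self.state_to_indices[state])
            self.state_pointers[state] = 0

        for _ in range(self.num_batches):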
            batch_indices = []
            for state, proportion in self.state_proportions.items():
                num_samples_state = int(round(proportion * self.batch_size))
                start_idx = self.state_pointers[state]
                end_idx = start_idx + num_samples_state
                state_indices = self.state_to_indices[state]

                # If not enough samples left in this state, take as many as possible
                if end_idx > len(state_indices):
                    end_idx = len(state_indices)

                batch_indices.extend(state_indices[start_idx:end_idx])
                self.state_pointers[state] = end_idx

            # If we don't have enough samples in the batch (due to rounding), fill randomly
            if len(batch_indices) < self.batch_size:
                remaining = self.batch_size - len(batch_indices)
                all_indices = []
                for state in self.state_to_indices:
                    all_indices.extend(self.state_to_indices[state])
                np.random.shuffle(all_indices)
                batch_indices.extend(all_indices[:remaining])

            np.random.shuffle(batch_indices)
            yield batch_indices

    def __len__(self):
        return self.num_batches

## <span style="font-size:18px; color:#007acc;"><b> 6. Training and Validation Functions <a id="Training-and-Validation-Functions"></a></b></span> ##
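Both functions in the cell below use binary cross-entropy on logits with a positive-class weight. As a rough illustration of what that weight does, the sketch below evaluates the loss by hand for two samples. It is a naive formula in plain Python (not numerically stable for large logits) and is not the PyTorch implementation. The function name `weighted_bce_with_logits` is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def weighted_bce_with_logits(logits, targets, pos_weight):
    # Mean over samples of -[pos_weight * y * log(p) + (1 - y) * log(1 - p)], p = sigmoid(z)
    losses = []
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        losses.append(-(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)


logits = [0.0, 0.0]    # the model is undecided on both samples
targets = [1.0, 0.0]   # one positive, one negative

unweighted = weighted_bce_with_logits(logits, targets, pos_weight=1.0)
weighted = weighted_bce_with_logits(logits, targets, pos_weight=3.0)
print(unweighted, weighted)
```

With `pos_weight=3.0` the positive sample's error counts three times as much as the negative sample's, so the mean loss doubles in this example even though the predictions are unchanged.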

In [None]:
###### Define Loss Function and Train Model #####
#################################################
def train_model(model, device, train_loader, optimizer, pos_weight):
    model = model.double() 
    model.train()
    train_loss = 0.0
    criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

    for batch_index, (data, target) in enumerate(train_loader):
        data, target = data.to(device).double(), target.to(device).double()  # Convert data and target to double
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
    train_loss /= len(train_loader)  # criterion returns a batch mean, so average over batches
    return train_loss

####### Compute validation loss #############
#############################################
def valid_model(model, device, valid_loader, pos_weight):
    model.eval()
    model = model.to(device).double()
    valid_loss = 0.0
    criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    criterion = criterion.to(device).double()
    with torch.no_grad():
        for data, target in valid_loader:
            data, target = data.to(device).double(), target.to(device).double()
            output = model(data)
            loss = criterion(output, target)
            valid_loss += loss.item()
    valid_loss /= len(valid_loader)  # average over batches, matching train_model
    return valid_loss

## <span style="font-size:18px; color:#007acc;"><b> 7. Evaluation Metrics <a id="Evaluation-Metrics"></a></b></span> ##
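The metrics returned by the functions below all derive from the four confusion-matrix counts. The sketch below computes them by hand for a hypothetical, imbalanced example, to show why accuracy alone can be misleading when positives are rare. `binary_metrics` is an illustrative helper, and returning 0.0 for an empty denominator is an assumption of this sketch.

```python
def binary_metrics(tp, fp, fn, tn):
    # Guard against empty denominators so the sketch never divides by zero
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts: 10 positives among 1000 samples
accuracy, precision, recall, f1 = binary_metrics(tp=6, fp=24, fn=4, tn=966)
print(accuracy, precision, recall, f1)
```

Here accuracy is above 97% while only one in five predicted positives is correct, which is why the notebook tracks false positives, false negatives, and the ROC and precision-recall curves alongside accuracy.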

In [None]:
########### Evaluation Metrics ##############
#############################################
def compute_accuracy(model, data_loader, device):
    model = model.to(device).float()
    model.eval()  # Set the model to evaluation mode
    
    CM = torch.zeros(2, 2, dtype=torch.int32)  # Initialize confusion matrix
    correct = 0
    total = 0
    all_predictions = torch.tensor([], dtype=torch.float32, device=device)
    all_targets = torch.tensor([], dtype=torch.float32, device=device)

    # Initialize binary precision, recall, and F1 metrics
    precision = Precision(task='binary').to(device)
    recall = Recall(task='binary').to(device)
    f1_score = F1Score(task='binary').to(device)

    with torch.no_grad():  # No need to compute gradients during inference
        for data, target in data_loader:
            data, target = data.to(device).float(), target.to(device).float()
            output = model(data)
            predicted = (torch.sigmoid(output) >= 0.5).float()

            correct += (predicted == target).sum().item()
            total += target.size(0)

            all_predictions = torch.cat((all_predictions, predicted), dim=0)
            all_targets = torch.cat((all_targets, target), dim=0)

            CM += torch.tensor(confusion_matrix(target.cpu(), predicted.cpu(), labels=[0, 1]))

    # Calculate accuracy
    accuracy = correct / total * 100  

    # Update metrics with predictions and targets
    precision.update(all_predictions, all_targets)
    recall.update(all_predictions, all_targets)
    f1_score.update(all_predictions, all_targets)

    # Get the computed values
    precision_value = precision.compute().item()
    recall_value = recall.compute().item()
    f1_score_value = f1_score.compute().item()

    true_negative = CM[0][0].item()
    true_positive = CM[1][1].item()
    false_positive = CM[0][1].item()
    false_negative = CM[1][0].item()

    return accuracy, precision_value, recall_value, f1_score_value, false_positive, false_negative, true_positive, true_negative, CM


########## ROC Function ############
####################################

def compute_roc(model, data_loader, device):
    model = model.to(device).float()
    model.eval()  # Set the model to evaluation mode
    
    all_predictions = torch.tensor([], dtype=torch.float32, device=device)
    all_targets = torch.tensor([], dtype=torch.float32, device=device)

    with torch.no_grad():  # No need to compute gradients during inference
        for data, target in data_loader:
            data, target = data.to(device).float(), target.to(device).float()
            output = model(data)
            predicted = torch.sigmoid(output)

            all_predictions = torch.cat((all_predictions, predicted), dim=0)
            all_targets = torch.cat((all_targets, target), dim=0)

    # Convert predictions and targets to CPU and numpy arrays
    fpr, tpr, thresholds = roc_curve(all_targets.cpu().numpy(), all_predictions.cpu().numpy())
    roc_auc = auc(fpr, tpr)

    # Print number of thresholds
    print(f'Number of thresholds: {len(thresholds)}')

    # Print thresholds to inspect
    print(np.round(thresholds, decimals=4))

    return fpr, tpr, roc_auc, thresholds

###### PRC Function ##############
##################################

def compute_prc(model, data_loader, device):
    model = model.to(device).float()
    model.eval()  # Set the model to evaluation mode
    
    all_predictions = torch.tensor([], dtype=torch.float32, device=device)
    all_targets = torch.tensor([], dtype=torch.float32, device=device)

    with torch.no_grad():  # No need to compute gradients during inference
        for data, target in data_loader:
            data, target = data.to(device).float(), target.to(device).float()
            output = model(data)
            predicted = torch.sigmoid(output)

            all_predictions = torch.cat((all_predictions, predicted), dim=0)
            all_targets = torch.cat((all_targets, target), dim=0)

    # Convert predictions and targets to CPU and numpy arrays
    precision, recall, thresholds = precision_recall_curve(all_targets.cpu().numpy(), all_predictions.cpu().numpy())
    prc_auc = auc(recall, precision)
    
    return precision, recall, prc_auc, thresholds

## <span style="font-size:18px; color:#007acc;"><b> 8. Training and Evaluation of Neural Network <a id="Training-and-Evaluation-of-Neural-Network"></a></b></span> ##
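Two quantities in the cell below are easy to reason about by hand: the positive-class weight and the stepwise learning-rate schedule. The sketch reproduces both in plain Python. `multistep_lr` and `positive_class_weight` are hypothetical helpers, the class counts are made up, and the schedule assumes the scheduler is stepped once at the end of every epoch, as the training loop does.

```python
def multistep_lr(base_lr, steps_taken, milestones, gamma):
    # After `steps_taken` scheduler steps, the rate has been multiplied by
    # gamma once for every milestone that has been reached
    drops = sum(1 for m in milestones if steps_taken >= m)
    return base_lr * gamma ** drops


def positive_class_weight(neg_count, pos_count, multiplier):
    # Ratio of negatives to positives, scaled by a tunable multiplier
    if pos_count == 0:
        raise ValueError("No positive samples, cannot compute the weight.")
    return neg_count / pos_count * multiplier


base_lr = 0.0002
milestones = [10, 25, 50]

# Epochs are 1-based in the training loop, so epoch k runs after k - 1 scheduler steps
schedule = {
    epoch: multistep_lr(base_lr, epoch - 1, milestones, gamma=0.1)
    for epoch in (1, 10, 11, 26, 50)
}
weight = positive_class_weight(neg_count=9900, pos_count=100, multiplier=2.0)
print(schedule, weight)
```

Under these assumptions the last milestone (50) is only reached after the final epoch has finished, so it never affects a training step when `num_epochs` is 50.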

In [None]:
###### Main program ##############
##################################
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_curve, auc, precision_recall_curve

# Define the device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

# Parameters
num_epochs = 50
input_size = 23
weight_decay = 0.0
hidden_sizes = [2560, 1280, 640, 320, 160, 80, 40, 20, 10, 5]
output_size = 1
learning_rate = 0.0002  # Fixed learning rate
pos_weights = [2.0]  # Iterate over positive weights
batch_percentage = 0.02

# Calculate batch size
total_train_samples = len(train)
batch_size = int(total_train_samples * batch_percentage)
batch_size = max(1, batch_size)  # Ensure at least 1

# Initialize dictionaries to store values for plotting later
all_training_losses = {}
all_valid_losses = {}
all_train_accuracies = {}
all_valid_accuracies = {}
all_train_false_positives = {}
all_train_false_negatives = {}
all_valid_false_positives = {}
all_valid_false_negatives = {}
all_train_fpr = {}
all_train_tpr = {}
all_train_roc_auc = {}
all_valid_fpr = {}
all_valid_tpr = {}
all_valid_roc_auc = {}
all_train_precision = {}
all_train_recall = {}
all_valid_precision = {}
all_valid_recall = {}
all_train_prc_auc = {}
all_valid_prc_auc = {}
all_train_thresholds = {}
all_valid_thresholds = {}

# Converting to DataLoaders
train_sampler = ProportionalBatchSampler(train, batch_size=batch_size)
train_loader = DataLoader(train, batch_sampler=train_sampler)
valid_loader = DataLoader(valid, batch_size=batch_size, shuffle=False)

# Iterate through positive weights
for pos_weight_mul in pos_weights:
    # Calculate the total number of samples
    pos_count = torch.sum(train_output_tensor == 1).item()
    neg_count = torch.sum(train_output_tensor == 0).item()

    # Check for zero positive samples
    if pos_count == 0:
        raise ValueError("No positive samples in the dataset, cannot compute pos_weight.")

    # Calculate the positive weight
    pos_weight = (neg_count / pos_count) * pos_weight_mul
    pos_weight = torch.tensor([pos_weight], device=device)

    # Reset lists for the current positive weight
    Epoch_ind = []
    training_losses = []
    valid_losses = []
    train_accuracies = []
    valid_accuracies = []
    train_false_positives = []
    train_false_negatives = []
    valid_false_positives = []
    valid_false_negatives = []

    # Initialize model
    model = NNModel(input_size, hidden_sizes, output_size).to(device)
    optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 25, 50], gamma=0.1)

    for epoch in range(1, num_epochs + 1):
        train_loss = train_model(model, device, train_loader, optimizer, pos_weight)
        training_losses.append(train_loss)

        train_accuracy, _, _, _, train_false_positive, train_false_negative, train_true_positive, train_true_negative, _ = compute_accuracy(model, train_loader, device)
        train_accuracies.append(train_accuracy)
        train_false_negatives.append(train_false_negative)
        train_false_positives.append(train_false_positive)

        valid_loss = valid_model(model, device, valid_loader, pos_weight)
        valid_losses.append(valid_loss)

        valid_accuracy, _, _, _, valid_false_positive, valid_false_negative, valid_true_positive, valid_true_negative, _ = compute_accuracy(model, valid_loader, device)
        valid_accuracies.append(valid_accuracy)
        valid_false_negatives.append(valid_false_negative)
        valid_false_positives.append(valid_false_positive)

        Epoch_ind.append(epoch)

        scheduler.step()
        
        print ('epoch '+ str(epoch))

    # Compute ROC curve
    train_fpr, train_tpr, train_roc_auc, train_thresholds = compute_roc(model, train_loader, device)
    valid_fpr, valid_tpr, valid_roc_auc, valid_thresholds = compute_roc(model, valid_loader, device)
    
    # Compute PRC
    train_precision, train_recall, train_prc_auc, train_thresholds = compute_prc(model, train_loader, device)
    valid_precision, valid_recall, valid_prc_auc, valid_thresholds = compute_prc(model, valid_loader, device)
    
    # Store results for the current positive weight
    all_training_losses[pos_weight_mul] = training_losses
    all_valid_losses[pos_weight_mul] = valid_losses
    all_train_accuracies[pos_weight_mul] = train_accuracies
    all_valid_accuracies[pos_weight_mul] = valid_accuracies
    all_train_false_positives[pos_weight_mul] = train_false_positives
    all_train_false_negatives[pos_weight_mul] = train_false_negatives
    all_valid_false_positives[pos_weight_mul] = valid_false_positives
    all_valid_false_negatives[pos_weight_mul] = valid_false_negatives
    
    all_train_fpr[pos_weight_mul] = train_fpr
    all_train_tpr[pos_weight_mul] = train_tpr
    all_train_roc_auc[pos_weight_mul] = train_roc_auc
    all_valid_fpr[pos_weight_mul] = valid_fpr
    all_valid_tpr[pos_weight_mul] = valid_tpr
    all_valid_roc_auc[pos_weight_mul] = valid_roc_auc

    all_train_precision[pos_weight_mul] = train_precision
    all_train_recall[pos_weight_mul] = train_recall
    all_valid_precision[pos_weight_mul] = valid_precision
    all_valid_recall[pos_weight_mul] = valid_recall
    
    all_train_prc_auc[pos_weight_mul] = train_prc_auc
    all_valid_prc_auc[pos_weight_mul] = valid_prc_auc

    all_train_thresholds[pos_weight_mul] = train_thresholds
    all_valid_thresholds[pos_weight_mul] = valid_thresholds

# Plotting results for all positive weights
plt.figure(figsize=(15, 25))

# Plot Train False Positives vs Epochs
plt.subplot(5, 2, 1)
for pos_weight_mul in pos_weights:
    plt.plot(Epoch_ind, all_train_false_positives[pos_weight_mul], label=f"Pos Weight {pos_weight_mul}")
plt.xlabel('Epochs')
plt.ylabel('Count')
plt.title('Train False Positives')
plt.legend()

# Plot Train False Negatives vs Epochs
plt.subplot(5, 2, 2)
for pos_weight_mul in pos_weights:
    plt.plot(Epoch_ind, all_train_false_negatives[pos_weight_mul], label=f"Pos Weight {pos_weight_mul}")
plt.xlabel('Epochs')
plt.ylabel('Count')
plt.title('Train False Negatives')
plt.legend()

# Plot Valid False Positives vs Epochs
plt.subplot(5, 2, 3)
for pos_weight_mul in pos_weights:
    plt.plot(Epoch_ind, all_valid_false_positives[pos_weight_mul], label=f"Pos Weight {pos_weight_mul}")
plt.xlabel('Epochs')
plt.ylabel('Count')
plt.title('Valid False Positives')
plt.legend()

# Plot Valid False Negatives vs Epochs
plt.subplot(5, 2, 4)
for pos_weight_mul in pos_weights:
    plt.plot(Epoch_ind, all_valid_false_negatives[pos_weight_mul], label=f"Pos Weight {pos_weight_mul}")
plt.xlabel('Epochs')
plt.ylabel('Count')
plt.title('Valid False Negatives')
plt.legend()

# Plot Training and Validation Loss vs Epochs
plt.subplot(5, 2, 5)
for pos_weight_mul in pos_weights:
    plt.plot(Epoch_ind, all_training_losses[pos_weight_mul], label=f"Train Loss (Pos Weight {pos_weight_mul})")
    plt.plot(Epoch_ind, all_valid_losses[pos_weight_mul], label=f"Valid Loss (Pos Weight {pos_weight_mul})")
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Train and Validation Loss')
plt.legend()

# Plot ROC curve for the last epoch
plt.subplot(5, 2, 6)
for pos_weight_mul in pos_weights:
    plt.plot(all_train_fpr[pos_weight_mul], all_train_tpr[pos_weight_mul], label=f'Train ROC Curve (AUC = {all_train_roc_auc[pos_weight_mul]:.2f}, Pos Weight {pos_weight_mul})')
    plt.plot(all_valid_fpr[pos_weight_mul], all_valid_tpr[pos_weight_mul], label=f'Validation ROC Curve (AUC = {all_valid_roc_auc[pos_weight_mul]:.2f}, Pos Weight {pos_weight_mul})')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve (Last Epoch)')
plt.legend()

# Plot PRC Curve
plt.subplot(5, 2, 7)
for pos_weight_mul in pos_weights:
    plt.plot(all_train_recall[pos_weight_mul], all_train_precision[pos_weight_mul], label=f'Train PR Curve (AUC = {all_train_prc_auc[pos_weight_mul]:.2f}, Pos Weight {pos_weight_mul})')
    plt.plot(all_valid_recall[pos_weight_mul], all_valid_precision[pos_weight_mul], label=f'Validation PR Curve (AUC = {all_valid_prc_auc[pos_weight_mul]:.2f}, Pos Weight {pos_weight_mul})')

plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('PR Curve (Last Epoch)')
plt.legend()

plt.tight_layout()
plt.show()

## <span style="font-size:18px; color:#007acc;"><b> 9. SHAP Analysis <a id="SHAP-Analysis"></a></b></span> ##
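SHAP attributions are additive: the base value plus the per-feature contributions should reconstruct the model output for an instance. The sketch below shows that identity on a small linear model, where the attributions have a simple closed form under the assumption of independent features. It does not use the `shap` library or the network from this notebook, and all weights and data are hypothetical.

```python
# Hypothetical linear model f(x) = bias + sum(w_i * x_i)
weights = [0.5, -1.0, 2.0]
bias = 0.25
background = [[0.0, 1.0, 0.5], [1.0, 0.0, 0.5], [0.5, 0.5, 0.5]]


def predict(x):
    return bias + sum(w * v for w, v in zip(weights, x))


# Base value: the prediction at the mean of the background data
n_rows = len(background)
feature_means = [sum(row[i] for row in background) / n_rows for i in range(len(weights))]
base_value = predict(feature_means)

# For a linear model, feature i contributes w_i * (x_i - mean_i)
x = [1.0, 0.0, 1.0]
attributions = [w * (v - m) for w, v, m in zip(weights, x, feature_means)]

# Additivity: base value plus contributions equals the prediction
reconstructed = base_value + sum(attributions)
print(attributions, reconstructed, predict(x))
```

For a deep network the explainer only approximates this identity, and the cell below passes `check_additivity = False`, so comparing the per-instance sums with the model predictions is a useful sanity check rather than a guarantee.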

In [None]:
#### SHAP Analysis ####
######################
import shap
def shap_analysis(model, input_tensor):
    # Ensure model is in evaluation mode and on CPU
    model.eval()
    model = model.cpu()

    # Create SHAP explainer
    explainer = shap.DeepExplainer(model, input_tensor)

    # Compute SHAP values
    shap_values = explainer.shap_values(input_tensor, check_additivity = False)
    
    return shap_values

# Perform SHAP analysis
subset_tensor = valid_input_tensor[:10000]
model.eval()
model = model.cpu()
model_predictions = model(subset_tensor).detach().numpy()[:, 0]
#print(model_predictions)

shap_values = shap_analysis(model,subset_tensor)
shap_values = np.squeeze(shap_values)  # This will remove all dimensions of size 1

shap_values = np.array(shap_values)
# Sum over the feature dimension (axis=1)
instance_sum = np.sum(shap_values, axis=1)
#print("Sum for each instance (shape (2,)):")
#print(instance_sum)

#shap_exp = shap.Explanation(shap_values, feature_names=["Feature_" + str(i) for i in range(train_input_tensor.size(1))])
shap_exp = shap.Explanation(shap_values, feature_names=['Credit Score', 'First Time Homebuyer Flag', 'MSA', 'No of Units', 'Occupancy Status', 
                                                        'CLTV', 'DTI Ratio', 'Ori UPB', 'Ori Interest Rate', 'Channel', 'Amortization Type', 
                                                        'Property State', 'Property Type', 'Loan Purpose', 'Ori Loan Term', 'Number of Borrowers',
                                                        'Servicer Name', 'I/O Indicator', 'MI Cancellation Indicator', 'Current actual UPB',
                                                        'loan age', 'Remaining months to legal maturity', 'Current interest rate'])


# Generate the summary plot
shap.summary_plot(shap_exp.values, features=subset_tensor.cpu().numpy(), feature_names=shap_exp.feature_names)
# Now, you can plot using the bar plot
shap.plots.bar(shap_exp)

## <span style="font-size:18px; color:#007acc;"><b> 10. Save Model <a id="Save-Model"></a></b></span> ##

In [None]:
torch.save(model.state_dict(), 'model_weights.pth')
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()  # Set the model to evaluation mode

## <span style="font-size:18px; color:#007acc;"><b> 11. Create FAST API <a id="Create-FAST-API"></a></b></span> ##

In [None]:
# Assuming NNModel is already defined earlier in the notebook
# If not, make sure to define it before using it here

# Initialize the FastAPI app
app = FastAPI()

# Define a Pydantic model for input data
class InputData(BaseModel):
    features: list

# Initialize your model with the same parameters as during training
input_size = 23
hidden_sizes = [2560, 1280, 640, 320, 160, 80, 40, 20, 10, 5]
output_size = 1

# Create the model instance
model = NNModel(input_size, hidden_sizes, output_size, initialization='he')
model.load_state_dict(torch.load('model_weights.pth', map_location='cpu'))  # Load the saved weights on CPU
model.eval()  # Set the model to evaluation mode

@app.post("/predict/")
async def predict(data: InputData):
    # Convert input data to tensor and ensure it has the correct shape
    input_tensor = torch.tensor(data.features, dtype=torch.float32).unsqueeze(0)  # Add batch dimension
    with torch.no_grad():
        prediction = model(input_tensor)
    return {"prediction": prediction.tolist()}

# Function to run the app in a separate thread
def run_app():
    run(app, host="0.0.0.0", port=8000, log_level="info")

# Start the FastAPI server in a separate thread
thread = threading.Thread(target=run_app)
thread.start()

## <span style="font-size:18px; color:#007acc;"><b> 12. Inference from API <a id="Inference-from-API"></a></b></span> ##
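The `/predict/` endpoint returns raw logits, so the client has to apply the sigmoid and a threshold itself. The sketch below shows that decoding step on a hard-coded response body, without contacting the server. `decode_prediction` is a hypothetical helper, and the payload shape `{"prediction": [[logit]]}` is assumed from the handler defined in the previous section.

```python
import json
import math


def decode_prediction(response_text, threshold=0.5):
    # Assumed payload shape: {"prediction": [[logit, ...], ...]}
    payload = json.loads(response_text)
    logits = [logit for row in payload["prediction"] for logit in row]
    # Convert each logit to a probability, then to a 0/1 label
    probabilities = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    labels = [1 if p >= threshold else 0 for p in probabilities]
    return probabilities, labels


probabilities, labels = decode_prediction('{"prediction": [[-1.2]]}')
edge_probabilities, edge_labels = decode_prediction('{"prediction": [[0.0]]}')
print(probabilities, labels)
```

A logit of exactly 0 maps to a probability of 0.5 and is labelled positive here because the comparison is `>=`, matching the threshold used in `compute_accuracy`.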

In [None]:
# Extract the first example (row) from test_input_tensor and convert it to a list
features = test_input_tensor[:1].tolist()[0]  # Extracts the first example and converts it to a list

# Extract the true label corresponding to the first example
true_label = test_output_tensor[0].item()  # Converts the tensor value to a Python scalar

# Define the endpoint
url = "http://localhost:8000/predict/"

data = {"features": features}  # Use the extracted features

try:
    # Make the POST request
    response = requests.post(url, json=data)
    
    # Print status code
    print(f"Status Code: {response.status_code}")
    
    # Print response text for debugging
    print(f"Response Text: {response.text}")

    # The endpoint returns {"prediction": [[logit]]}; flatten it to a 1-D array of logits
    logits = np.array(response.json()["prediction"], dtype=float).ravel()

    # Apply the sigmoid function and a 0.5 threshold to each logit
    probabilities = 1.0 / (1.0 + np.exp(-logits))
    predictions = [1 if p >= 0.5 else 0 for p in probabilities]

    # Print the resulting predictions (0 or 1)
    print("Predicted Labels:", predictions)

    # Print the true label
    print("True Label:", true_label)
    
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
except (ValueError, KeyError) as e:
    print(f"Unexpected response format: {e}")
