# Assignment 7 

## Grade: /100 pts

This notebook contains the questions for Assignment 7. 

Make sure to complete this assignment individually and appropriately reference all external code and documentation used. ***In order for your submission to be valid, you must adhere to the function definitions which have been made (failure to do so will result in a grade of 0). You must upload this completed Jupyter Notebook file as your submission (other file types are not permitted and will result in a grade of 0).*** You are responsible for selecting and importing additional packages.

### Preliminaries

Feel free to add any libraries to the Preliminaries. However, be mindful of every question's restrictions as some may exclude use of some functions.

In [1]:
## perform the necessary imports
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, LinearRegression, Ridge
from sklearn.metrics import recall_score, make_scorer, mean_squared_error, confusion_matrix, precision_score, roc_curve, auc, accuracy_score, f1_score, roc_auc_score
from sklearn.utils.multiclass import unique_labels
from sklearn.preprocessing import StandardScaler, MinMaxScaler, LabelEncoder
import torch
import time

# Plotting
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline





### Data set

In this Assignment you need to download and use "Dataset.csv". 

This dataset is a modified version of the Kaggle dataset "Lower Back Pain Symptoms Dataset". Lower back pain can be caused by a variety of problems with any part of the complex, interconnected network of spinal muscles, nerves, bones, discs, or tendons in the lumbar spine.
The goal is to label a person as normal or abnormal using collected physical measurements of the spine.



### Question 1: Load Datasets (15pts)

A) Load the "Dataset.csv" file.

B) Encode the output classes `Label` (0: Normal, 1: Abnormal) and separate inputs and outputs (features and target). (2 pts)

C) Split the data into equal-sized training and test sets. Use `random_state = 42`, and ensure a balanced distribution of labels when splitting the data (a stratified split).

D) How many observations do you have in your training set?  

E) How many observations for each class in your training set?

F) Z-standardize the input features of the training and test sets.
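
For part C, a minimal sketch of what a stratified split does, using made-up labels rather than "Dataset.csv" (`X_demo` and `y_demo` are hypothetical names introduced only for this illustration). Passing the labels to `stratify` keeps the class ratio the same in both halves:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 70 positives and 30 negatives
y_demo = np.array([1] * 70 + [0] * 30)
X_demo = np.arange(100).reshape(-1, 1)

X_a, X_b, y_a, y_b = train_test_split(
    X_demo, y_demo,
    test_size=0.5,      # equal-sized halves
    random_state=42,    # reproducible split
    stratify=y_demo,    # preserve the class ratio in both halves
)

# Class counts per half, in the order [count of 0, count of 1]
print(np.bincount(y_a), np.bincount(y_b))
```

Without `stratify`, the two halves would generally end up with slightly different class proportions.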

In [None]:
### Q1A) 
# Read Data 
data = pd.read_csv('Dataset.csv')
data.head()

In [None]:
### Q1B) 
# Encode Output Class

# Map the string labels to integers (0: Normal, 1: Abnormal)
encode_map = {
    'Abnormal': 1,
    'Normal': 0
}

# Assign the result back instead of using an in-place replace on a column
data['Label'] = data['Label'].map(encode_map).astype(int)

Xdata = data.drop('Label', axis=1)
ydata = data['Label']

### Q1C) 

Xtrain, Xtest, ytrain, ytest = train_test_split(
    Xdata, ydata, 
    test_size=0.5,       # 50% of the data will be used for testing
    random_state=42,     # Ensures a reproducible split
    stratify=ydata           # Preserve the class distribution in the split
)

### Q1D) 
print(Xtrain.shape)


### Q1E) 
print(ytrain.value_counts())

### Q1F) 
scaler = StandardScaler()
XtrainScaled = scaler.fit_transform(Xtrain)  # Estimate mean/std on the training set only
XtestScaled = scaler.transform(Xtest)        # Reuse the training statistics for the test set
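
A common convention for part F is to estimate the mean and standard deviation on the training set only and reuse them for the test set, so that no test-set information leaks into preprocessing. The sketch below illustrates this with synthetic data (`X_train_demo`, `X_test_demo`, and `demo_scaler` are hypothetical names, not part of the assignment):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, not taken from Dataset.csv
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=5.0, scale=2.0, size=(50, 3))
X_test_demo = rng.normal(loc=5.0, scale=2.0, size=(50, 3))

demo_scaler = StandardScaler()
X_train_z = demo_scaler.fit_transform(X_train_demo)  # learn mean/std from the training data
X_test_z = demo_scaler.transform(X_test_demo)        # apply the training mean/std

# The same transformation written out by hand
manual_test_z = (X_test_demo - X_train_demo.mean(axis=0)) / X_train_demo.std(axis=0)
```

The scaled test set will usually not have exactly zero mean and unit variance, which is expected.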

### Question 2: Logistic Regression (15 pts)

A) Fit an L1-regularized logistic regression model to all the training data, and then predict the label of each item in the test set. Tip: use the 'saga' solver for L1 regularization.

B) Print out the precision, recall, and F1-score of the test set.

C) Print out the model execution time (both training and test time) in milliseconds. Please keep two decimal places.

D) Plot ROC curve and report the area under the ROC curve for the test data set. 
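
The metrics in part B can all be derived from the confusion matrix. A small sketch with made-up labels and predictions (`y_true_demo` and `y_pred_demo` are hypothetical) shows how they relate:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Hypothetical labels and predictions, for illustration only
y_true_demo = np.array([0, 1, 1, 1, 0, 1])
y_pred_demo = np.array([0, 1, 0, 1, 1, 1])

# For binary labels {0, 1}, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)  # fraction of predicted positives that are correct
recall = tp / (tp + fn)     # fraction of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, recall, f1)
```

These hand-computed values match `precision_score`, `recall_score`, and `f1_score` with their default settings, which treat label 1 as the positive class.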


In [None]:
### Q2A) 
model = LogisticRegression(penalty = 'l1', solver='saga')

time1 = time.time()
#fit model 
model.fit(XtrainScaled,ytrain)
time2 = time.time()
#predict labels
ypred = model.predict(XtestScaled)
time3 = time.time()

### Q2B) 
print('The Logistic Regression Precision is ', precision_score(ytest, ypred))
print('The Logistic Regression Recall is ', recall_score(ytest, ypred))
print('The Logistic Regression F1-score is ', f1_score(ytest, ypred))

### Q2C) 
print('The Logistic Regression model training time is '+ str(round((time2-time1)*1000,2)) + ' ms')
print('The Logistic Regression model test time is '+ str(round((time3-time2)*1000,2)) + ' ms')



### Q2D) 
# Predicted probability of the positive class (1: Abnormal)
prob_estimates = model.predict_proba(XtestScaled)[:, 1]

# Compute ROC AUC
fpr, tpr, thresholds = roc_curve(ytest, prob_estimates)
auc_roc = auc(fpr, tpr)

# Plot the ROC curve
plt.plot(fpr, tpr, label=f'AUC - Logistic Regression = {auc_roc:.4f}')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc='lower right')
plt.show()

print('The area under the ROC curve is', auc_roc)
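
As a cross-check on the area reported above, `roc_auc_score` (already imported in the Preliminaries) computes the same quantity directly from labels and scores. A minimal sketch with four made-up samples (`y_true_demo` and `scores_demo` are hypothetical):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

# Hypothetical labels and classifier scores
y_true_demo = np.array([0, 0, 1, 1])
scores_demo = np.array([0.1, 0.4, 0.35, 0.8])

# Route 1: build the ROC curve, then integrate it with the trapezoidal rule
fpr_demo, tpr_demo, _ = roc_curve(y_true_demo, scores_demo)
area_trapezoid = auc(fpr_demo, tpr_demo)

# Route 2: compute the area directly
area_direct = roc_auc_score(y_true_demo, scores_demo)

print(area_trapezoid, area_direct)
```

Here three of the four positive/negative pairs are ranked correctly, so both routes give 0.75.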

### Question 3: Neural Network


### Q3a) Building model (15 pts)

Build a simple neural network model (NN_model1) using PyTorch packages with the features in the data set as the input units and two output units for the two output classes:

* Use a LogSigmoid as your output non-linearity.
* Use the Cross-entropy loss as a training criterion. 
* Use the stochastic gradient descent (SGD) optimizer with a learning rate of 0.01. 
* Run the optimization for 8000 iterations and record the loss for each iteration. 
* Plot the loss versus iterations.
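
One detail of this setup is worth keeping in mind: `torch.nn.CrossEntropyLoss` applies a log-softmax to its input internally, so the LogSigmoid outputs are treated as logits rather than as final log-probabilities. The sketch below, using random made-up outputs (`z` and `targets` are hypothetical), checks that equivalence:

```python
import torch

torch.manual_seed(0)
z = torch.randn(4, 2)                 # hypothetical pre-activation outputs: 4 samples, 2 classes
targets = torch.tensor([0, 1, 1, 0])  # hypothetical class labels

outputs = torch.nn.LogSigmoid()(z)    # output non-linearity; all values are <= 0
ce = torch.nn.CrossEntropyLoss()(outputs, targets)

# CrossEntropyLoss is log-softmax followed by the negative log-likelihood loss
manual = torch.nn.NLLLoss()(torch.log_softmax(outputs, dim=1), targets)

print(ce.item(), manual.item())
```

Because of this, the raw network outputs should not be read as class probabilities; a softmax over the two outputs gives values that sum to one.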

In [None]:
# Define linear model 
class LinearModel(torch.nn.Module):
    
    def __init__(self, num_features, num_classes):
        
        super().__init__()
        
        # Neural Network Architecture
        self.dense1 = torch.nn.Linear(in_features=num_features, out_features=num_classes)
        self.activation = torch.nn.LogSigmoid()
    
    def forward(self, X):
        X = self.dense1(X)
        X = self.activation(X)
        return X

In [None]:
NN_model1 = LinearModel(12,2)
Xt = torch.FloatTensor(XtrainScaled)
yt = torch.LongTensor(ytrain.values)
y_pred = NN_model1.forward(Xt)

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(NN_model1.parameters(), lr=1e-2)

max_iter = 8000
lossRecord = np.zeros(max_iter)
time1 = time.time()
for i in range(max_iter):
    # Initialize the gradient 
    optimizer.zero_grad()
    # Forward pass on the full training set (gradients are tracked)
    y_pred = NN_model1(Xt)
    loss = criterion(input=y_pred, target=yt) # Calculate the loss  
    lossRecord[i] = loss.item() # Store the loss as a plain Python float
    loss.backward() # Propagate the derivative backwards 
    optimizer.step() # Take one update step
time2 = time.time()
plt.plot(np.arange(max_iter),lossRecord)
plt.xlabel('Iteration')
plt.ylabel('Cross entropy Loss')
plt.title('Iteration vs Loss for NN_model1')


print('The NN_model1 training time is '+ str(round((time2-time1)*1000,2)) + ' ms')

### Q3b) Prediction (20 pts)

Now use your trained model (NN_model1) to make predictions on the test set.

A) Print out the precision, recall, and F1-score of the test set.

B) Print out the model execution time (both training and test time) in milliseconds. Please keep two decimal places.

C) Plot ROC curve and report the area under the ROC curve for the test data set. 
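
For the timing in part B, one possible pattern is a small helper built on `time.perf_counter`, which is designed for measuring short durations. The helper name `time_call_ms` is hypothetical and not part of the assignment:

```python
import time

def time_call_ms(func, *args, **kwargs):
    """Call func once and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Example: time a simple built-in call
total, elapsed_ms = time_call_ms(sum, range(100_000))
print(f"Elapsed: {elapsed_ms:.2f} ms")  # two decimal places, as requested
```

The same helper could wrap a model's fit or predict call, so that training and test times are measured in the same way.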

In [None]:
Xtt = torch.FloatTensor(XtestScaled)
time1 = time.time()
with torch.no_grad():
    y_pred = NN_model1(Xtt)
time2 = time.time()

yp = y_pred.argmax(dim=1).numpy()  # Convert to class indices
print('The NN_model1 Precision is ', precision_score(ytest, yp))
print('The NN_model1 Recall is ', recall_score(ytest, yp))
print('The NN_model1 F1-score is ', f1_score(ytest, yp))


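print('The NN_model1 test time is '+ str(round((time2-time1)*1000,2)) + ' ms')

# The LogSigmoid outputs are not probabilities; a softmax over the two outputs
# gives a positive-class score that is consistent with the argmax predictions
yprob = torch.softmax(y_pred, dim=1).numpy()[:, 1]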
fpr, tpr, thresholds = roc_curve(ytest, yprob)
auc_roc = auc(fpr, tpr)

# Plot the ROC curve
plt.plot(fpr, tpr, label=f'AUC - NN_model1 = {auc_roc:.4f}')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc='lower right')
plt.show()

print('The area under the ROC curve is', auc_roc)
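
One way to turn the two network outputs into a single positive-class score for the ROC curve is a softmax over the outputs, which for two classes equals a sigmoid of their difference. A minimal sketch with random made-up outputs (`out` is hypothetical) checks this and confirms that thresholding the score at 0.5 reproduces the argmax predictions:

```python
import torch

torch.manual_seed(0)
out = torch.nn.LogSigmoid()(torch.randn(5, 2))  # hypothetical outputs for 5 samples

# Positive-class probability from a softmax over the two outputs
p1 = torch.softmax(out, dim=1)[:, 1]

# Equivalent form: sigmoid of the difference between the two outputs
p1_alt = torch.sigmoid(out[:, 1] - out[:, 0])

pred_from_prob = (p1 > 0.5).long()    # threshold the score at 0.5
pred_from_argmax = out.argmax(dim=1)  # class with the larger output
```

Using only one output column as the score ignores the other column, so the resulting ROC curve may not reflect the decision rule used for the predicted labels.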

### Q3c) Adding hidden layers (15 pts)
Change the neural network (NN_model2) and add two hidden layers with 100 and 60 units, respectively. Use the LogSigmoid non-linearity for the hidden layers. Leave all the other parameters the same as for Question 3a. Again train for 8000 iterations and plot the loss as a function of the iteration. 
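
To get a feel for how much larger this network is, the sketch below counts trainable parameters for both architectures. It assumes 12 input features and 2 classes, as in the `LinearModel(12,2)` call, and `count_parameters` is a hypothetical helper:

```python
import torch

def count_parameters(model):
    """Total number of trainable weights and biases in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Single linear layer, as in Question 3a (12 features and 2 classes assumed)
single_layer = torch.nn.Linear(12, 2)

# Two hidden layers with 100 and 60 units, as described in this question
two_hidden = torch.nn.Sequential(
    torch.nn.Linear(12, 100), torch.nn.LogSigmoid(),
    torch.nn.Linear(100, 60), torch.nn.LogSigmoid(),
    torch.nn.Linear(60, 2), torch.nn.LogSigmoid(),
)

n_single = count_parameters(single_layer)  # 12*2 + 2 = 26
n_deep = count_parameters(two_hidden)      # 1300 + 6060 + 122 = 7482
print(n_single, n_deep)
```

The deeper model has several hundred times more parameters, which is one reason its training time per iteration tends to be higher.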

In [None]:
class NonLinearModel(torch.nn.Module):
    
    def __init__(self, num_features, num_classes):
        
        super().__init__()
        
        # Neural Network Architecture: 
        self.dense1 = torch.nn.Linear(in_features=num_features, out_features=100)
        self.activation1 = torch.nn.LogSigmoid()
        self.dense2 = torch.nn.Linear(in_features=100, out_features=60)
        self.activation2 = torch.nn.LogSigmoid()
        self.dense3 = torch.nn.Linear(in_features=60, out_features=num_classes)
        self.activation3 = torch.nn.LogSigmoid()
        
    def forward(self, X):
        X = self.dense1(X)  
        X = self.activation1(X)
        X = self.dense2(X)
        X = self.activation2(X)
        X = self.dense3(X)
        X = self.activation3(X)
        return X

NN_model2 = NonLinearModel(12,2)
y_pred = NN_model2.forward(Xt)


criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(NN_model2.parameters(), lr=1e-2)

max_iter = 8000
lossRecord = np.zeros(max_iter)
time1 = time.time()
for i in range(max_iter):
    # Initialize the gradient 
    optimizer.zero_grad()
    # Forward pass on the full training set (gradients are tracked)
    y_pred = NN_model2(Xt)
    loss = criterion(input=y_pred, target=yt) # Calculate the loss  
    lossRecord[i] = loss.item() # Store the loss as a plain Python float
    loss.backward() # Propagate the derivative backwards 
    optimizer.step() # Take one update step
time2 = time.time()
plt.plot(np.arange(max_iter),lossRecord)
plt.xlabel('Iteration')
plt.ylabel('Cross entropy Loss')
plt.title('Iteration vs Loss for NN_model2')

print('The NN_model2 training time is '+ str(round((time2-time1)*1000,2)) + ' ms')

### Q3d) Prediction and model selection (20 pts)
Now use your trained model in Question 3c (NN_model2) to make predictions on the test set.

A) Print out the precision, recall, and F1-score of the test set.

B) Print out the model execution time (both training and test time) in milliseconds. Please keep two decimal places.

C) Plot ROC curve and report the area under the ROC curve for the test data set. 

__Written answer:__ Compare this model (NN_model2) with the results from Question 2 (Logistic Regression) and Question 3b (NN_model1). What do you conclude? 

In [None]:
Xtt = torch.FloatTensor(XtestScaled)
time1 = time.time()
with torch.no_grad():
    y_pred = NN_model2(Xtt)
time2 = time.time()

yp = y_pred.argmax(dim=1).numpy()  # Convert to class indices
print('The NN_model2 Precision is ', precision_score(ytest, yp))
print('The NN_model2 Recall is ', recall_score(ytest, yp))
print('The NN_model2 F1-score is ', f1_score(ytest, yp))

 
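print('The NN_model2 test time is '+ str(round((time2-time1)*1000,2)) + ' ms')

# The LogSigmoid outputs are not probabilities; a softmax over the two outputs
# gives a positive-class score that is consistent with the argmax predictions
yprob = torch.softmax(y_pred, dim=1).numpy()[:, 1]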
fpr, tpr, thresholds = roc_curve(ytest, yprob)
auc_roc = auc(fpr, tpr)

# Plot the ROC curve
plt.plot(fpr, tpr, label=f'AUC - NN_model2 = {auc_roc:.4f}')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc='lower right')
plt.show()

print('The area under the ROC curve is', auc_roc)

__Written answer:__ Making the network deeper does not necessarily improve performance, but it does increase the execution time. To decide whether depth and non-linearity are needed, we have to compare the generalization performance of the models on the test set (precision, recall, F1-score, and AUC).
Logistic regression is the preferred choice for this dataset and problem, especially if we have limited resources for training.