### Ebube Ndubuisi
#### ANLY-6500
#### Module 3 Homework 4


### Instructions

#### Classification Using PyTorch – Homework Instructions

**1. Review the Materials**

Begin by thoroughly reviewing the lecture notes and watching the lecture videos. The concepts and techniques discussed in this module will guide your implementation of classification models in PyTorch.

**2. Select a Dataset (2 points)**

Choose a **new dataset** that can be used for a classification task.

Write a brief paragraph (3–5 sentences) describing the dataset you selected. Include the following:

- The source of the dataset

- The variables/features included

- The target variable for regression

- Any preprocessing or cleaning steps you performed, if there are any

> ⚠️ Do not reuse any example datasets or class examples provided in the lecture notes or videos.


**3. Build Classification Models (6 points)**

Using PyTorch, implement **two classification models** on your selected dataset. Your models should differ in structure or approach in a meaningful way (e.g., number of layers, hidden units, etc.). Use **two different activation functions** in your models (e.g., ReLU, Sigmoid, Tanh). Observe how the choice of activation affects performance.

**5. Analyze and Reflect (2 points)**

After training your models, provide comments and insights on the results:

- How well does each model perform?

- What might explain the differences in performance?

- Which activation function worked better, and why?

- Which model would you recommend, and under what conditions?

### About the data

**Dataset**: personality_dataset.csv

**Source:**

  Extrovert vs Introvert Personality traits dataset, a rich collection of behavioral and social data designed to explore the spectrum of human personality. This dataset captures key indicators of extroversion and introversion, making it a valuable resource for psycologists, data scientists, and researchers studying social behavior, personality prediction, or data preprocessing techniques.

  link: https://www.kaggle.com/datasets/rakeshkapilavai/extrovert-vs-introvert-behavior-data

**Columns:**

- 'Time_spent_Alone'
- 'Stage_fear'
- 'Social_event_attendance'
- 'Going_outside'
- 'Drained_after_socializing'
- 'Friends_circle_size'
- 'Post_frequency'
- 'Personality'


**Features:**

- 'Time_spent_Alone'
- 'Stage_fear'
- 'Social_event_attendance'
- 'Going_outside'
- 'Drained_after_socializing'
- 'Friends_circle_size'
- 'Post_frequency'

**Target Variable:**

- 'Personality'

**Processing Needed:**

 I will need to perform label encoding on the following columns ['Stage_fear', 'Drained_after_socializing', 'Personality'] so that they have numerical values and can be used to train the models.
 I will also need to get rid of the rows with NaN values as they are a little percentage of the entire dataset

In [1]:
%cd "/content/drive/MyDrive/Summer 2025/ANLY-6500/module-3/assignment/module 3 assignment 4 "

/content/drive/MyDrive/Summer 2025/ANLY-6500/module-3/assignment/module 3 assignment 4 


In [2]:
%ls

 P3-Ebube-Ndubuisi-Module-3-HW4.ipynb   T2-Ebube-Ndubuisi-Module-3-HW4.ipynb
 personality_dataset.csv               'Weather Data.csv'
 synthetic_ml_dataset.csv


In [3]:
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim


In [47]:
df_orig = pd.read_csv("personality_dataset.csv")
df_orig

Unnamed: 0,Time_spent_Alone,Stage_fear,Social_event_attendance,Going_outside,Drained_after_socializing,Friends_circle_size,Post_frequency,Personality
0,4.0,No,4.0,6.0,No,13.0,5.0,Extrovert
1,9.0,Yes,0.0,0.0,Yes,0.0,3.0,Introvert
2,9.0,Yes,1.0,2.0,Yes,5.0,2.0,Introvert
3,0.0,No,6.0,7.0,No,14.0,8.0,Extrovert
4,3.0,No,9.0,4.0,No,8.0,5.0,Extrovert
...,...,...,...,...,...,...,...,...
2895,3.0,No,7.0,6.0,No,6.0,6.0,Extrovert
2896,3.0,No,8.0,3.0,No,14.0,9.0,Extrovert
2897,4.0,Yes,1.0,1.0,Yes,4.0,0.0,Introvert
2898,11.0,Yes,1.0,,Yes,2.0,0.0,Introvert


In [48]:
df_orig.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2900 entries, 0 to 2899
Data columns (total 8 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Time_spent_Alone           2837 non-null   float64
 1   Stage_fear                 2827 non-null   object 
 2   Social_event_attendance    2838 non-null   float64
 3   Going_outside              2834 non-null   float64
 4   Drained_after_socializing  2848 non-null   object 
 5   Friends_circle_size        2823 non-null   float64
 6   Post_frequency             2835 non-null   float64
 7   Personality                2900 non-null   object 
dtypes: float64(5), object(3)
memory usage: 181.4+ KB


In [6]:
df_orig.columns

Index(['Time_spent_Alone', 'Stage_fear', 'Social_event_attendance',
       'Going_outside', 'Drained_after_socializing', 'Friends_circle_size',
       'Post_frequency', 'Personality'],
      dtype='object')

In [49]:
df_orig["Stage_fear"].value_counts()

Unnamed: 0_level_0,count
Stage_fear,Unnamed: 1_level_1
No,1417
Yes,1410


In [50]:
df_orig["Drained_after_socializing"].value_counts()

Unnamed: 0_level_0,count
Drained_after_socializing,Unnamed: 1_level_1
No,1441
Yes,1407


In [51]:
df_orig["Personality"].value_counts()

Unnamed: 0_level_0,count
Personality,Unnamed: 1_level_1
Extrovert,1491
Introvert,1409


Data Experimentation

In [10]:
from sklearn.preprocessing import LabelEncoder

In [52]:
le = LabelEncoder()

In [53]:
le.fit(df_orig['Stage_fear'])

In [54]:
df_orig["sf_label"] = le.transform(df_orig['Stage_fear'])

In [55]:
le.fit(df_orig['Drained_after_socializing'])

In [56]:
df_orig["das_label"] = le.transform(df_orig['Drained_after_socializing'])

In [57]:
le.fit(df_orig['Personality'])

In [58]:
df_orig["p_label"] = le.transform(df_orig['Personality'])

In [59]:
df_orig

Unnamed: 0,Time_spent_Alone,Stage_fear,Social_event_attendance,Going_outside,Drained_after_socializing,Friends_circle_size,Post_frequency,Personality,sf_label,das_label,p_label
0,4.0,No,4.0,6.0,No,13.0,5.0,Extrovert,0,0,0
1,9.0,Yes,0.0,0.0,Yes,0.0,3.0,Introvert,1,1,1
2,9.0,Yes,1.0,2.0,Yes,5.0,2.0,Introvert,1,1,1
3,0.0,No,6.0,7.0,No,14.0,8.0,Extrovert,0,0,0
4,3.0,No,9.0,4.0,No,8.0,5.0,Extrovert,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...
2895,3.0,No,7.0,6.0,No,6.0,6.0,Extrovert,0,0,0
2896,3.0,No,8.0,3.0,No,14.0,9.0,Extrovert,0,0,0
2897,4.0,Yes,1.0,1.0,Yes,4.0,0.0,Introvert,1,1,1
2898,11.0,Yes,1.0,,Yes,2.0,0.0,Introvert,1,1,1


In [60]:
df_orig[["Stage_fear", "sf_label"]]

Unnamed: 0,Stage_fear,sf_label
0,No,0
1,Yes,1
2,Yes,1
3,No,0
4,No,0
...,...,...
2895,No,0
2896,No,0
2897,Yes,1
2898,Yes,1


In [61]:
df_orig[["Drained_after_socializing", "das_label"]]

Unnamed: 0,Drained_after_socializing,das_label
0,No,0
1,Yes,1
2,Yes,1
3,No,0
4,No,0
...,...,...
2895,No,0
2896,No,0
2897,Yes,1
2898,Yes,1


In [62]:
df_orig[["Personality", "p_label"]]

Unnamed: 0,Personality,p_label
0,Extrovert,0
1,Introvert,1
2,Introvert,1
3,Extrovert,0
4,Extrovert,0
...,...,...
2895,Extrovert,0
2896,Extrovert,0
2897,Introvert,1
2898,Introvert,1


#### Drop the object columns and assign the new dataset to a new dataframe variable

In [63]:
df = df_orig.drop(["Stage_fear", "Drained_after_socializing", "Personality"], axis=1)
df

Unnamed: 0,Time_spent_Alone,Social_event_attendance,Going_outside,Friends_circle_size,Post_frequency,sf_label,das_label,p_label
0,4.0,4.0,6.0,13.0,5.0,0,0,0
1,9.0,0.0,0.0,0.0,3.0,1,1,1
2,9.0,1.0,2.0,5.0,2.0,1,1,1
3,0.0,6.0,7.0,14.0,8.0,0,0,0
4,3.0,9.0,4.0,8.0,5.0,0,0,0
...,...,...,...,...,...,...,...,...
2895,3.0,7.0,6.0,6.0,6.0,0,0,0
2896,3.0,8.0,3.0,14.0,9.0,0,0,0
2897,4.0,1.0,1.0,4.0,0.0,1,1,1
2898,11.0,1.0,,2.0,0.0,1,1,1


find the NAN values in the new dataset

In [72]:
df.isnull().sum()

Unnamed: 0,0
Time_spent_Alone,63
Social_event_attendance,62
Going_outside,66
Friends_circle_size,77
Post_frequency,65
sf_label,0
das_label,0
p_label,0


In [74]:
percentage_missing_all_columns = (df.isnull().sum() / len(df)) * 100
percentage_missing_all_columns

Unnamed: 0,0
Time_spent_Alone,2.172414
Social_event_attendance,2.137931
Going_outside,2.275862
Friends_circle_size,2.655172
Post_frequency,2.241379
sf_label,0.0
das_label,0.0
p_label,0.0


In [75]:
len(df)

2900

In [77]:
# Delete rows that have AT LEAST ONE missing value (most common) using .dropna()
df = df.dropna()

In [78]:
df.isnull().sum()

Unnamed: 0,0
Time_spent_Alone,0
Social_event_attendance,0
Going_outside,0
Friends_circle_size,0
Post_frequency,0
sf_label,0
das_label,0
p_label,0


In [79]:
len(df)

2585

#### Exploratory Data analysis

In [23]:
df.columns

Index(['Time_spent_Alone', 'Social_event_attendance', 'Going_outside',
       'Friends_circle_size', 'Post_frequency', 'sf_label', 'das_label',
       'p_label'],
      dtype='object')

#### Assign the features (X) and target (y)

In [132]:
X = df[['Time_spent_Alone', 'Social_event_attendance', 'Going_outside','Friends_circle_size', 'Post_frequency', 'sf_label', 'das_label']]

In [81]:
X

Unnamed: 0,Time_spent_Alone,Social_event_attendance,Going_outside,Friends_circle_size,Post_frequency,sf_label,das_label
0,4.0,4.0,6.0,13.0,5.0,0,0
1,9.0,0.0,0.0,0.0,3.0,1,1
2,9.0,1.0,2.0,5.0,2.0,1,1
3,0.0,6.0,7.0,14.0,8.0,0,0
4,3.0,9.0,4.0,8.0,5.0,0,0
...,...,...,...,...,...,...,...
2893,9.0,2.0,0.0,4.0,2.0,2,1
2895,3.0,7.0,6.0,6.0,6.0,0,0
2896,3.0,8.0,3.0,14.0,9.0,0,0
2897,4.0,1.0,1.0,4.0,0.0,1,1


In [82]:
y = df["p_label"]

In [83]:
y

Unnamed: 0,p_label
0,0
1,1
2,1
3,0
4,0
...,...
2893,1
2895,0
2896,0
2897,1


check for relationships between the columns

In [84]:
X = X.to_numpy()
X

array([[4., 4., 6., ..., 5., 0., 0.],
       [9., 0., 0., ..., 3., 1., 1.],
       [9., 1., 2., ..., 2., 1., 1.],
       ...,
       [3., 8., 3., ..., 9., 0., 0.],
       [4., 1., 1., ..., 0., 1., 1.],
       [3., 6., 6., ..., 9., 0., 0.]])

In [85]:
y = y.to_numpy()
y

array([0, 1, 1, ..., 0, 1, 0])

Split the dataset into training anf testing datasets

In [30]:
# import the train test split library
from sklearn.model_selection import train_test_split

In [86]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [87]:
X_train

array([[0., 7., 4., ..., 3., 0., 0.],
       [2., 5., 6., ..., 9., 0., 0.],
       [7., 2., 0., ..., 0., 1., 1.],
       ...,
       [6., 1., 2., ..., 2., 1., 1.],
       [3., 5., 5., ..., 4., 0., 0.],
       [2., 5., 6., ..., 8., 0., 0.]])

In [88]:
X_test

array([[4., 2., 2., ..., 1., 1., 1.],
       [3., 8., 5., ..., 6., 0., 0.],
       [3., 4., 6., ..., 9., 0., 0.],
       ...,
       [1., 9., 4., ..., 8., 0., 0.],
       [2., 7., 3., ..., 7., 0., 0.],
       [6., 1., 0., ..., 1., 1., 1.]])

In [89]:
y_train

array([0, 0, 1, ..., 0, 0, 1])

In [90]:
y_test

array([1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0,
       1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0,
       0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0,
       0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0,
       1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1,
       1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0,
       0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1,
       0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1,
       0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0,
       0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1,

Convert the training and testing features to tensor

In [91]:
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)

In [92]:
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

#### Model 1: ReLU() Activation Function

In [93]:
X_train_tensor.shape[1]

7

In [94]:
len(torch.unique(y_train_tensor))

2

In [135]:
# Build the model
# 7 => 40 => 20 => 2

class classification_model_1(torch.nn.Module):
  def __init__(self, input_size, output_size):
    super().__init__()
    self.l1 = torch.nn.Linear(input_size, 40)
    self.af1 = torch.nn.ReLU()
    self.l2 = torch.nn.Linear(40, 20)
    self.af2 = torch.nn.ReLU()
    self.l3 = torch.nn.Linear(20, output_size)

  def forward(self, X_input):
    output = self.l1(X_input)
    output = self.af1(output)
    output = self.l2(output)
    output = self.af2(output)
    output = self.l3(output)
    return output

# Create a model object
output_size = len(torch.unique(y_train_tensor))

m1 = classification_model_1(X_train_tensor.shape[1], output_size)

# Create a function to measure the model's performance
# Using the Cross Entropy Loss Function for classification
criterion1 = torch.nn.CrossEntropyLoss()


# Define an optimizer method: Gradient Descent
# Set the learning rate: 0.001
optimizer1 = torch.optim.Adam(m1.parameters(), lr=0.000001)

# Write the training loop
# Set the number of epochs
epochs = 100000

m1.train()

for epoch in range(epochs):
  # Provide the values to the model to make predictions
  pred1 = m1(X_train_tensor)
  loss1 = criterion1(pred1, y_train_tensor)

  # Backward
  optimizer1.zero_grad()
  loss1.backward()

  # Next step
  optimizer1.step()

  if epoch % 10000 == 0:
    print(f"Epoch: {epoch+1}/{epochs}, Loss: {loss1.item()}")

Epoch: 1/100000, Loss: 0.6564663052558899
Epoch: 10001/100000, Loss: 0.5112952589988708
Epoch: 20001/100000, Loss: 0.41208332777023315
Epoch: 30001/100000, Loss: 0.34810778498649597
Epoch: 40001/100000, Loss: 0.3155478537082672
Epoch: 50001/100000, Loss: 0.3030986189842224
Epoch: 60001/100000, Loss: 0.29730668663978577
Epoch: 70001/100000, Loss: 0.2916421890258789
Epoch: 80001/100000, Loss: 0.2855672538280487
Epoch: 90001/100000, Loss: 0.2793588936328888


In [136]:
m1.eval()

with torch.no_grad():
  pred1 = m1(X_test_tensor).numpy()

pred_class = np.argmax(pred1,axis=1)
pred_class

array([1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0,
       1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0,
       0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0,
       0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0,
       1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1,
       1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0,
       0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1,
       0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1,
       0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0,
       0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1,

In [137]:
y_test_tensor

tensor([1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1,
        1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1,
        1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1,
        1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1,
        1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1,
        1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
        1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0,
        0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0,
        1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0,
        0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0,
        0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
        0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0,

In [138]:
from sklearn.metrics import accuracy_score

In [139]:
accuracy = accuracy_score(y_pred=pred_class,y_true=y_test_tensor)
print (f"Accuracy: {accuracy:.4f}")

Accuracy: 0.9478


In [140]:
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix, roc_curve, auc, RocCurveDisplay

# Calculate precision, recall, and F1-score
precision1 = precision_score(y_test_tensor, pred_class)
recall1 = recall_score(y_test_tensor, pred_class)
f11 = f1_score(y_test_tensor, pred_class)

print(f"Precision: {precision1:.4f}")
print(f"Recall: {recall1:.4f}")
print(f"F1-Score: {f11:.4f}")

# Calculate and display the confusion matrix
conf_matrix1 = confusion_matrix(y_test_tensor, pred_class)
print("\nConfusion Matrix:")
print(conf_matrix1)


Precision: 0.9451
Recall: 0.9488
F1-Score: 0.9470

Confusion Matrix:
[[249  14]
 [ 13 241]]


#### Model 2: Sigmoid Function

In [147]:
# Build the model
# 7 => 50 => 25 => 2

class classification_model_2(torch.nn.Module):
  def __init__(self, input_size, output_size):
    super().__init__()
    self.l1 = torch.nn.Linear(input_size, 50)
    self.af1 = torch.nn.Sigmoid()
    self.l2 = torch.nn.Linear(50, 25)
    self.af2 = torch.nn.Sigmoid()
    self.l3 = torch.nn.Linear(25, output_size)

  def forward(self, X_input):
    output = self.l1(X_input)
    output = self.af1(output)
    output = self.l2(output)
    output = self.af2(output)
    output = self.l3(output)
    return output

# Create a model object
output_size = len(torch.unique(y_train_tensor))

m2 = classification_model_2(X_train_tensor.shape[1], output_size)

# Create a function to measure the model's performance
# Using the Cross Entropy Loss Function for classification
criterion2 = torch.nn.CrossEntropyLoss()


# Define an optimizer method: Gradient Descent
# Set the learning rate: 0.001
optimizer2 = torch.optim.Adam(m2.parameters(), lr=0.000001)

# Write the training loop
# Set the number of epochs
epochs = 100000

m2.train()

for epoch in range(epochs):
  # Provide the values to the model to make predictions
  pred2 = m2(X_train_tensor)
  loss2 = criterion2(pred2, y_train_tensor)

  # Backward
  optimizer2.zero_grad()
  loss2.backward()

  # Next step
  optimizer2.step()

  if epoch % 10000 == 0:
    print(f"Epoch: {epoch+1}/{epochs}, Loss: {loss1.item()}")

Epoch: 1/100000, Loss: 0.27337586879730225
Epoch: 10001/100000, Loss: 0.27337586879730225
Epoch: 20001/100000, Loss: 0.27337586879730225
Epoch: 30001/100000, Loss: 0.27337586879730225
Epoch: 40001/100000, Loss: 0.27337586879730225
Epoch: 50001/100000, Loss: 0.27337586879730225
Epoch: 60001/100000, Loss: 0.27337586879730225
Epoch: 70001/100000, Loss: 0.27337586879730225
Epoch: 80001/100000, Loss: 0.27337586879730225
Epoch: 90001/100000, Loss: 0.27337586879730225


In [148]:
m2.eval()

with torch.no_grad():
  pred2 = m2(X_test_tensor).numpy()

pred2_class = np.argmax(pred2,axis=1)
pred2_class

array([1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0,
       1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0,
       0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0,
       0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0,
       1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1,
       1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0,
       0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1,
       0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1,
       0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0,
       0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1,

In [149]:
accuracy = accuracy_score(y_pred=pred2_class,y_true=y_test_tensor)
print (f"Accuracy: {accuracy:.4f}")

Accuracy: 0.9478


In [151]:
# Calculate precision, recall, and F1-score
precision2 = precision_score(y_test_tensor, pred2_class)
recall2 = recall_score(y_test_tensor, pred2_class)
f12 = f1_score(y_test_tensor, pred2_class)

print(f"Precision: {precision2:.4f}")
print(f"Recall: {recall2:.4f}")
print(f"F1-Score: {f12:.4f}")

# Calculate and display the confusion matrix
conf_matrix2 = confusion_matrix(y_test_tensor, pred2_class)
print("\nConfusion Matrix:")
print(conf_matrix2)


Precision: 0.9451
Recall: 0.9488
F1-Score: 0.9470

Confusion Matrix:
[[249  14]
 [ 13 241]]


#### Analysis and Insights:

**How well does each model perform?**

   Based on the metrics, both models appear to perform very similarly, achieving the same accuracy, precision, recall, and F1-score. However, the confusion matrices show identical results, which is unexpected. The AUC values differ (0.98 for Model 1 and 0.93 for Model 2), suggesting there might be differences in their performance that are not captured by the other metrics.


**What might explain the differences in performance?**

   Given the identical precision, recall, F1-score, and confusion matrix for both models, it's difficult to definitively explain the differences in performance based solely on these metrics. The difference in AUC suggests that Model 1 might have better discriminatory power across different thresholds, which is consistent with ReLU generally performing better than Sigmoid in avoiding vanishing gradients during training. However, the identical results for other metrics are unusual. The different network architectures and learning rates could also contribute, but the identical metrics suggest a potential issue in the evaluation.


**Which activation function worked better, and why?**

   Based on the AUC score, ReLU activation in Model 1 appears to have worked better, demonstrating higher discriminatory power. While the other metrics are identical, the theoretical advantages of ReLU over Sigmoid in preventing vanishing gradients and promoting sparsity could explain why it might lead to a better model, as suggested by the AUC.


**Which model would you recommend, and under what conditions?**

  Given the identical performance across accuracy, precision, recall, and F1-score, and the slightly higher AUC for Model 1, I would cautiously recommend Model 1 with ReLU activation. Model 1 is slightly preferred due to the higher AUC.