# 10.1P

_**Question 1:  Create a MLP model with 10 hidden layers using  "data.csv" dataset and report performances using appropriate metrics**_

For this question I will use 10 hidden layers and run the classifier with arbitrarily selected values of 10, 25, and 100 neurons to see if/how it affects performance.

In [1]:
# Import modules

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import Perceptron




In [2]:
# Read in dataset
df = pd.read_csv('../data/data_10.csv')

# Print dimensions
rows, cols = df.shape
print("Data has {} rows with {} columns".format(rows, cols))

# Show head
df.head()

#df.shape


Data has 43876 rows with 101 columns


Unnamed: 0,t_0,t_1,t_2,t_3,t_4,t_5,t_6,t_7,t_8,t_9,...,t_91,t_92,t_93,t_94,t_95,t_96,t_97,t_98,t_99,malware
0,112,274,158,215,274,158,215,298,76,208,...,71,297,135,171,215,35,208,56,71,1
1,82,208,187,208,172,117,172,117,172,117,...,81,240,117,71,297,135,171,215,35,1
2,16,110,240,117,240,117,240,117,240,117,...,65,112,123,65,112,123,65,113,112,1
3,82,208,187,208,172,117,172,117,172,117,...,208,302,208,302,187,208,302,228,302,1
4,82,240,117,240,117,240,117,240,117,172,...,209,260,40,209,260,141,260,141,260,1


In [3]:
# Show unique values for malware - already numeric so no need to convert
df['malware'].unique()

#Check for missing values - none found
df.isnull().sum()

# df.shape

t_0        0
t_1        0
t_2        0
t_3        0
t_4        0
          ..
t_96       0
t_97       0
t_98       0
t_99       0
malware    0
Length: 101, dtype: int64

In [4]:
# Separate features and target
X = df.drop('malware', axis=1)
y = df['malware']

feat_cols = X.columns

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 42)
xtrain_samples = X_train.shape[0]
xtest_samples = X_test.shape[0]

print(f'There are {xtrain_samples} samples for training and {xtest_samples} samples for testing.')

# Standardise data with standard scaler
scaler = StandardScaler()
scaler.fit(X)

There are 35100 samples for training and 8776 samples for testing.


In [5]:
# Instantiate MLP model with 10 hidden layers with 10 neurons each
mlp = MLPClassifier(hidden_layer_sizes=(10,10,10,10,10,10,10,10,10,10), max_iter=2000, random_state=42) # max_iter default is 200

# Fit the model
mlp.fit(X_train, y_train)

# Predict on test set
y_pred = mlp.predict(X_test)

In [6]:
# Performance metrics for MLP model with 10 hidden layers with 10 neurons each

# Accuracy rounded to 3 decimal places
print("Accuracy: ", round(accuracy_score(y_test, y_pred), 3))
print("Classification report: \n", classification_report(y_test, y_pred))
print("Confusion matrix: \n", confusion_matrix(y_test, y_pred))

## An accuracy score of 0.986 indicates that the model classifies data very well
## The model correctly classified 130 positives and 8523 negatives, and misclassified 96 negatives (false positives) and 27 positives (false negatives)

Accuracy:  0.986
Classification report: 
               precision    recall  f1-score   support

           0       0.80      0.62      0.70       226
           1       0.99      1.00      0.99      8550

    accuracy                           0.99      8776
   macro avg       0.89      0.81      0.84      8776
weighted avg       0.98      0.99      0.99      8776

Confusion matrix: 
 [[ 140   86]
 [  36 8514]]


In [7]:
# Instantiate MLP model with 10 hidden layers with 25 neurons each
mlp = MLPClassifier(hidden_layer_sizes=(25,25,25,25,25,25,25,25,25,25), max_iter=2000, random_state=42) # max_iter default is 200

# Fit the model
mlp.fit(X_train, y_train)

# Predict on test set
y_pred = mlp.predict(X_test)

In [8]:
# Performance metrics for MLP model with 10 hidden layers with 25 neurons each

# Accuracy rounded to 3 decimal places
print("Accuracy: ", round(accuracy_score(y_test, y_pred), 3))
print("Classification report: \n", classification_report(y_test, y_pred))
print("Confusion matrix: \n", confusion_matrix(y_test, y_pred))

## An accuracy score of 0.985 indicates that the model classifies data very well
## The model correctly classified 128 positives and 8520 negatives, and misclassified 98 negatives (false positives) and 30 positives (false negatives)

Accuracy:  0.982
Classification report: 
               precision    recall  f1-score   support

           0       0.70      0.54      0.61       226
           1       0.99      0.99      0.99      8550

    accuracy                           0.98      8776
   macro avg       0.85      0.77      0.80      8776
weighted avg       0.98      0.98      0.98      8776

Confusion matrix: 
 [[ 123  103]
 [  52 8498]]


In [9]:
# Instantiate MLP model with 10 hidden layers with 100 neurons each
mlp = MLPClassifier(hidden_layer_sizes=(100,100,100,100,100,100,100,100,100,100), max_iter=20000, random_state=42) # max_iter default is 200

# Fit the model
mlp.fit(X_train, y_train)

# Predict on test set
y_pred = mlp.predict(X_test)

In [10]:
# Performance metrics for MLP model with 10 hidden layers with 100 neurons each

# Accuracy rounded to 3 decimal places
print("Accuracy: ", round(accuracy_score(y_test, y_pred), 3))
print("Classification report: \n", classification_report(y_test, y_pred))
print("Confusion matrix: \n", confusion_matrix(y_test, y_pred))

## An accuracy score of 0.986 indicates that the model classifies data very well
## The model correctly classified 130 positives and 8523 negatives, and misclassified 96 negatives (false positives) and 27 positives (false negatives)
## 10 hidden layers with 100 neurons each is the best model of these three, so I will use this for the rest of the tasks

Accuracy:  0.986
Classification report: 
               precision    recall  f1-score   support

           0       0.79      0.60      0.68       226
           1       0.99      1.00      0.99      8550

    accuracy                           0.99      8776
   macro avg       0.89      0.80      0.84      8776
weighted avg       0.98      0.99      0.98      8776

Confusion matrix: 
 [[ 136   90]
 [  37 8513]]


In [11]:
# Find the optimum alpha value for MLP model with 100 hidden layers with 100 neurons each

alpha = [0.0001, 0.001, 0.01]

# Loop through different alpha values to find the best alpha value
for a in alpha:
    # Run the model
    mlp = MLPClassifier(hidden_layer_sizes=(100,100,100,100,100,100,100,100,100,100), alpha=a, max_iter=100000, random_state=42) # max_iter default is 200
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    # Performance metrics
    print("\nFor alpha = ", a)
    print("Accuracy: ", round(accuracy_score(y_test, y_pred), 3))
    print("Classification report: \n", classification_report(y_test, y_pred))
    print("Confusion matrix: \n", confusion_matrix(y_test, y_pred))

## alpha = 0.001 performs the best overall


For alpha =  0.0001
Accuracy:  0.986
Classification report: 
               precision    recall  f1-score   support

           0       0.79      0.60      0.68       226
           1       0.99      1.00      0.99      8550

    accuracy                           0.99      8776
   macro avg       0.89      0.80      0.84      8776
weighted avg       0.98      0.99      0.98      8776

Confusion matrix: 
 [[ 136   90]
 [  37 8513]]

For alpha =  0.001
Accuracy:  0.986
Classification report: 
               precision    recall  f1-score   support

           0       0.87      0.54      0.66       226
           1       0.99      1.00      0.99      8550

    accuracy                           0.99      8776
   macro avg       0.93      0.77      0.83      8776
weighted avg       0.98      0.99      0.98      8776

Confusion matrix: 
 [[ 122  104]
 [  19 8531]]

For alpha =  0.01
Accuracy:  0.986
Classification report: 
               precision    recall  f1-score   support

           

_**Question 2: Analyse impact of different activation function with adam solver on the model**_

In [12]:
# ReLU activation function with adam solver
relu = MLPClassifier(hidden_layer_sizes=(100,100,100,100,100,100,100,100,100,100), activation='relu', solver='adam', alpha=0.001, max_iter=20000, random_state=42)

# Train the model on our data
relu.fit(X_train, y_train)

y_pred = relu.predict(X_test)

# Performance metrics
print("Accuracy: ", round(accuracy_score(y_test, y_pred), 3))
print("Classification report: \n", classification_report(y_test, y_pred))
print("Confusion matrix: \n", confusion_matrix(y_test, y_pred))

## All four activation functions perform similarly well in terms of accuracy but relu, tahn, and identity perform slightly better than sigmoid

Accuracy:  0.986
Classification report: 
               precision    recall  f1-score   support

           0       0.87      0.54      0.66       226
           1       0.99      1.00      0.99      8550

    accuracy                           0.99      8776
   macro avg       0.93      0.77      0.83      8776
weighted avg       0.98      0.99      0.98      8776

Confusion matrix: 
 [[ 122  104]
 [  19 8531]]


In [13]:
# Sigmoid activation function with adam solver
sigmoid = MLPClassifier(hidden_layer_sizes=(100,100,100,100,100,100,100,100,100,100), activation='logistic', solver='adam', alpha=0.001, max_iter=20000, random_state=42)

# Train the model on our data
relu.fit(X_train, y_train)

y_pred1 = relu.predict(X_test)

# Performance metrics
print("Accuracy: ", round(accuracy_score(y_test, y_pred1), 3))
print("Classification report: \n", classification_report(y_test, y_pred))
print("Confusion matrix: \n", confusion_matrix(y_test, y_pred))

## All four activation functions perform similarly well in terms of accuracy but relu, tahn, and identity perform slightly better than sigmoid 
## Sigmoid correctly classifies more true positives (and less false positives) than the other activation functions, but it also misclassifies more false negatives than the other activation functions

Accuracy:  0.986
Classification report: 
               precision    recall  f1-score   support

           0       0.87      0.54      0.66       226
           1       0.99      1.00      0.99      8550

    accuracy                           0.99      8776
   macro avg       0.93      0.77      0.83      8776
weighted avg       0.98      0.99      0.98      8776

Confusion matrix: 
 [[ 122  104]
 [  19 8531]]


In [14]:
#Tanh activation function with adam solver
tanh = MLPClassifier(hidden_layer_sizes=(100,100,100,100,100,100,100,100,100,100), activation='tanh', solver='adam', alpha=0.001, max_iter=20000, random_state=42)

# Train the model on our data
tanh.fit(X_train, y_train)

y_pred2 = relu.predict(X_test)

# Performance metrics
print("Accuracy: ", round(accuracy_score(y_test, y_pred2), 3))
print("Classification report: \n", classification_report(y_test, y_pred))
print("Confusion matrix: \n", confusion_matrix(y_test, y_pred))

## All four activation functions perform similarly well in terms of accuracy but relu, tahn, and identity perform slightly better than sigmoid 

Accuracy:  0.986
Classification report: 
               precision    recall  f1-score   support

           0       0.87      0.54      0.66       226
           1       0.99      1.00      0.99      8550

    accuracy                           0.99      8776
   macro avg       0.93      0.77      0.83      8776
weighted avg       0.98      0.99      0.98      8776

Confusion matrix: 
 [[ 122  104]
 [  19 8531]]


In [15]:
# Identity activation function with adam solver
identity = MLPClassifier(hidden_layer_sizes=(100,100,100,100,100,100,100,100,100,100), activation='identity', solver='adam', max_iter=20000, random_state=42)

# Train the model on our data
identity.fit(X_train, y_train)

y_pred3 = relu.predict(X_test)

# Performance metrics
print("Accuracy: ", round(accuracy_score(y_test, y_pred3), 3))
print("Classification report: \n", classification_report(y_test, y_pred))
print("Confusion matrix: \n", confusion_matrix(y_test, y_pred))

Accuracy:  0.986
Classification report: 
               precision    recall  f1-score   support

           0       0.87      0.54      0.66       226
           1       0.99      1.00      0.99      8550

    accuracy                           0.99      8776
   macro avg       0.93      0.77      0.83      8776
weighted avg       0.98      0.99      0.98      8776

Confusion matrix: 
 [[ 122  104]
 [  19 8531]]


In [16]:
# Find the optimum value for learning_rate

learning_rate = ['constant', 'invscaling', 'adaptive']

# Loop through different alpha values to find the best alpha value
for l in learning_rate:
    # Run the model
    mlp = MLPClassifier(hidden_layer_sizes=(100,100,100,100,100,100,100,100,100,100), activation='relu', solver='adam', alpha=0.001, learning_rate=l, max_iter=100000, random_state=42) # max_iter default is 200
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    # Performance metrics
    print("\nFor learning_rate = ", l)
    print("Accuracy: ", round(accuracy_score(y_test, y_pred), 3))
    print("Classification report: \n", classification_report(y_test, y_pred))
    print("Confusion matrix: \n", confusion_matrix(y_test, y_pred))

## In this case, learning_rate does not affect performance


For learning_rate =  constant
Accuracy:  0.986
Classification report: 
               precision    recall  f1-score   support

           0       0.87      0.54      0.66       226
           1       0.99      1.00      0.99      8550

    accuracy                           0.99      8776
   macro avg       0.93      0.77      0.83      8776
weighted avg       0.98      0.99      0.98      8776

Confusion matrix: 
 [[ 122  104]
 [  19 8531]]

For learning_rate =  invscaling
Accuracy:  0.986
Classification report: 
               precision    recall  f1-score   support

           0       0.87      0.54      0.66       226
           1       0.99      1.00      0.99      8550

    accuracy                           0.99      8776
   macro avg       0.93      0.77      0.83      8776
weighted avg       0.98      0.99      0.98      8776

Confusion matrix: 
 [[ 122  104]
 [  19 8531]]

For learning_rate =  adaptive
Accuracy:  0.986
Classification report: 
               precision    reca

In [17]:
# Find the optimum value learning_rate_init

learning_rate_init = [0.0001, 0.001, 0.01] #default is 0.001

# Loop through different alpha values to find the best alpha value
for l in learning_rate_init:
    # Run the model
    mlp = MLPClassifier(hidden_layer_sizes=(100,100,100,100,100,100,100,100,100,100), activation='relu', solver='adam', alpha=0.001, learning_rate_init=l, max_iter=100000, random_state=42) # max_iter default is 200
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    # Performance metrics
    print("\nFor learning_rate_init = ", l)
    print("Accuracy: ", round(accuracy_score(y_test, y_pred), 3))
    print("Classification report: \n", classification_report(y_test, y_pred))
    print("Confusion matrix: \n", confusion_matrix(y_test, y_pred))

## learning_rate_init = 0.001 performs the best overall


For learning_rate_init =  0.0001
Accuracy:  0.982
Classification report: 
               precision    recall  f1-score   support

           0       0.67      0.62      0.65       226
           1       0.99      0.99      0.99      8550

    accuracy                           0.98      8776
   macro avg       0.83      0.81      0.82      8776
weighted avg       0.98      0.98      0.98      8776

Confusion matrix: 
 [[ 141   85]
 [  70 8480]]

For learning_rate_init =  0.001
Accuracy:  0.986
Classification report: 
               precision    recall  f1-score   support

           0       0.87      0.54      0.66       226
           1       0.99      1.00      0.99      8550

    accuracy                           0.99      8776
   macro avg       0.93      0.77      0.83      8776
weighted avg       0.98      0.99      0.98      8776

Confusion matrix: 
 [[ 122  104]
 [  19 8531]]

For learning_rate_init =  0.01
Accuracy:  0.982
Classification report: 
               precision    

_**Question 3:  Explain your findings and report the best performance**_

All four activation functions perform similarly well in terms of accuracy but relu, tahn, and identity perform slightly better (accuracy = 0.987) than sigmoid (accuracy = 0.986).
Tuning the learning rate did not have affect the model's performance, however adjusting the initial learning rate used by the model to 0.001 gave the best results.