### **IMPORTANT NOTE: The following code and explanations were generated with the ChatGPT o1-preview model.**

# Step 1: Import Necessary Libraries

In your Jupyter Notebook, start by importing the libraries we'll need. The libraries to install are listed below and in the `README.md`.

**Explanation**:
- **pandas**: For data manipulation and analysis.
- **numpy**: For numerical computations.
- **os**: For interacting with the operating system.
- **matplotlib.pyplot** and **seaborn**: For data visualization.
- **sklearn**: For machine learning algorithms and evaluation metrics.

In [1]:
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import (
    accuracy_score, classification_report, confusion_matrix,
    precision_score, recall_score, f1_score, roc_auc_score, roc_curve, auc
)
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
# Add any other imports you may have

# Step 2: Load and Combine Dataset Files

We load all CSV files from the `dataset/` folder and combine them into one DataFrame for analysis.

1. **Import Libraries**: Import the modules needed for file handling and data manipulation.
2. **Locate CSV Files**: Find all CSV files in the `dataset/` directory.
3. **Load Data**: Read each CSV file into a DataFrame, strip whitespace from the column names, and store the DataFrames in a list.
4. **Combine Data**: Concatenate all DataFrames into a single DataFrame, `data`.
5. **Confirm Loading**: Print the shapes to verify that the data has been loaded correctly.

Now, we have all our data in one place, ready for preprocessing and modeling.

In [None]:
import glob
import os
import pandas as pd

# Path to your dataset folder
data_path = 'dataset/'

# Get a list of all CSV files in the dataset folder
csv_files = glob.glob(os.path.join(data_path, '*.csv'))

dataframes = []

for file in csv_files:
    df = pd.read_csv(file, encoding='utf-8')  # Adjust encoding if necessary
    # Strip whitespace from column names
    df.columns = df.columns.str.strip()
    dataframes.append(df)
    print(f"Loaded {file} with shape {df.shape}")

# Combine all DataFrames
data = pd.concat(dataframes, ignore_index=True)
print(f"Combined DataFrame shape: {data.shape}")
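
To see the load-and-combine pattern in isolation, here is a minimal, self-contained sketch. It writes two tiny, made-up CSV files to a temporary folder (the file names and values are assumptions for illustration, not part of the dataset) and applies the same steps. The `sorted(...)` call and the empty-folder check are optional additions that are not in the cell above.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Two tiny hypothetical CSV files whose column names carry stray spaces
    pd.DataFrame({" Flow Duration": [1, 2], " Label": ["BENIGN", "DDoS"]}).to_csv(
        os.path.join(tmp_dir, "a.csv"), index=False
    )
    pd.DataFrame({" Flow Duration": [3], " Label": ["BENIGN"]}).to_csv(
        os.path.join(tmp_dir, "b.csv"), index=False
    )

    # sorted() gives a stable file order across runs and platforms
    demo_files = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))
    if not demo_files:
        # pd.concat raises a less helpful ValueError on an empty list
        raise FileNotFoundError(f"No CSV files found in {tmp_dir}")

    demo_frames = []
    for path in demo_files:
        frame = pd.read_csv(path)
        frame.columns = frame.columns.str.strip()  # ' Label' -> 'Label'
        demo_frames.append(frame)

demo_data = pd.concat(demo_frames, ignore_index=True)
print(demo_data.shape)
print(demo_data.columns.tolist())
```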

# Step 3: Clean the `Label` Column

We clean the 'Label' column to ensure our labels are consistent:

#### Replace Unwanted Characters:
- Some labels contain the Unicode replacement character '�', a sign that a character was lost to an encoding issue.
- We replace '�' with a hyphen '-' to fix these labels.

#### Remove Extra Spaces:
- We strip any leading or trailing whitespace from the labels.
- This ensures there are no hidden spaces that could cause problems later.

By cleaning the 'Label' column, we make sure that all labels are properly formatted for our analysis.

In [3]:
# Replace the unidentified character '�' with a hyphen '-' in the 'Label' column
data['Label'] = data['Label'].str.replace('�', '-', regex=False)

# Optionally, strip any leading/trailing whitespace from the 'Label' column
data['Label'] = data['Label'].str.strip()
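
The effect of the two cleaning calls can be checked on a few made-up labels. This sketch assumes the stray character is U+FFFD (written as the escape `\ufffd` so the example does not depend on the file's encoding). The toy labels are hypothetical.

```python
import pandas as pd

# Hypothetical labels showing the two problems: a stray U+FFFD and extra spaces
toy_labels = pd.Series([
    "BENIGN",
    " Web Attack \ufffd XSS",
    "Web Attack \ufffd Brute Force ",
])

toy_cleaned = (
    toy_labels.str.replace("\ufffd", "-", regex=False)  # literal match, no regex
    .str.strip()                                         # drop leading/trailing spaces
)
print(toy_cleaned.tolist())
```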

# Step 4: Explore the Data
### 4.1 View the First Few Rows

Preview the Data:
- `data.head()`: Displays the first five rows of the dataset.
- Purpose: Gives us a quick look at the data structure, column names, and some sample values.

In [None]:
data.head()

### 4.2 Get DataFrame Information
Dataset Summary:
- `data.info()`: Provides a summary of the dataset.
    - Shows the number of entries (rows) and columns.
    - Displays the data type of each column.
    - Indicates the number of non-null values in each column.
- Purpose: Helps us understand the overall structure of the data and identify any columns with missing values or incorrect data types.

In [None]:
data.info()

### 4.3 Check for Missing Values
Check for Missing Values:
- `data.isnull().sum()`: Calculates the number of missing values in each column.
- `print(missing_values)`: Outputs the count of missing values per column.
- Purpose: Identifies columns that may need data cleaning or imputation due to missing values.

In [None]:
missing_values = data.isnull().sum()
print(missing_values)

# Step 5: Preprocess the Data
### 5.1 Convert Non-Numeric Columns to Numeric

We start preprocessing by finding any non-numeric columns in our data:

- Find Non-Numeric Columns:
    - Use `data.select_dtypes(include=['object']).columns` to list columns with data type `object`.
    - These are usually categorical or text features that need to be converted to numbers for our machine learning models.
- Print the Columns:
    - We display the names of these non-numeric columns to decide how to handle them next.

Identifying these columns helps us prepare for encoding them into numeric format, ensuring our data is ready for modeling.

In [None]:
# Identify non-numeric columns
non_numeric_cols = data.select_dtypes(include=['object']).columns
print("Non-numeric columns:", non_numeric_cols)

### 5.2 Encode Categorical Variables
#### 5.2.1 Encode the Target Variable

Before encoding the target variable, let's check the unique classes present in the 'Label' column.

Explanation:
- View Unique Labels:
    - We print all the unique values in the `Label` column to see the different classes in our target variable.

In [None]:
# Check unique values in the Label column
print(data['Label'].unique())

#### 5.2.2 Encode Labels for Binary Classification

In this step, we'll map all attack types to `Attack` and benign traffic to `Benign`. We'll then encode these labels into numerical values.

The Label Values are as follows:
```
    ['BENIGN' 'Infiltration' 'Bot' 'PortScan' 'DDoS' 'FTP-Patator'
     'SSH-Patator' 'DoS slowloris' 'DoS Slowhttptest' 'DoS Hulk'
     'DoS GoldenEye' 'Heartbleed' 'Web Attack - Brute Force'
     'Web Attack - XSS' 'Web Attack - Sql Injection']
```

Explanation:
- We create a new column `Label_binary` where all attack types are mapped to `Attack` and benign traffic to `Benign`.
- `LabelEncoder` is used to convert these categorical labels into numerical values (0 and 1).
- We print the label mapping to verify that `Attack` and `Benign` are correctly encoded.

In [None]:
# Encode labels for binary classification
data['Label_binary'] = data['Label'].apply(lambda x: 'Benign' if x == 'BENIGN' else 'Attack')

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
data['Label_encoded'] = le.fit_transform(data['Label_binary'])

# Display label encoding mapping
label_mapping = dict(zip(le.classes_, le.transform(le.classes_)))
print("Label Encoding Mapping:")
for label, encoding in label_mapping.items():
    print(f"{label}: {encoding}")
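
`LabelEncoder` assigns integers in sorted order of the class names, which is why `Attack` becomes 0 and `Benign` becomes 1. The following small sketch, using made-up labels, shows the mapping end to end.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical multi-class labels
toy_multi = pd.Series(["BENIGN", "DDoS", "BENIGN", "PortScan"])

# Collapse every non-BENIGN label into 'Attack'
toy_binary = toy_multi.apply(lambda x: "Benign" if x == "BENIGN" else "Attack")

toy_le = LabelEncoder()
toy_encoded = toy_le.fit_transform(toy_binary)

print(list(toy_le.classes_))  # classes are stored in sorted order
print(toy_encoded.tolist())
```

Because the mapping depends on alphabetical order, renaming the classes (for example to `Malicious` and `Benign`) would flip which class is encoded as 1.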

### 5.3 Prepare Features (X) and Target (y)

We'll separate our dataset into features and target variables.

Explanation:
- `X` contains all the features used for training, excluding the original and encoded labels.
- `y` is our target variable containing the encoded labels (`0` for **Attack**, `1` for **Benign**).

In [10]:
# Features (drop unnecessary columns)
X = data.drop(['Label', 'Label_binary', 'Label_encoded'], axis=1, errors='ignore')

# Target variable
y = data['Label_encoded']

### 5.4 Handle Non-Numeric Features

We need to ensure all features are numeric.

Explanation:
- We check for any non-numeric columns in `X`.
- If non-numeric columns are found, we apply one-hot encoding to convert them into numeric format.

In [None]:
# Identify non-numeric columns
non_numeric_cols = X.select_dtypes(include=['object']).columns
print("Non-numeric columns:", non_numeric_cols.tolist())

# Encode non-numeric features
if len(non_numeric_cols) > 0:
    X = pd.get_dummies(X, columns=non_numeric_cols)

### 5.5 Split Data into Training and Testing Sets

We'll split our data into training and testing sets to evaluate model performance.

Explanation:
- We use `train_test_split` to split the data.
    - `test_size=0.2` reserves 20% of the data for testing and 80% for training.
    - `stratify=y` ensures the class distribution is consistent in both training and testing sets.
    - `random_state=42` sets a seed for random number generation to ensure reproducibility.

In [12]:
from sklearn.model_selection import train_test_split

# Split data with stratification to maintain class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
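
What `stratify` does is easiest to see on a small synthetic example. The arrays below are made up for illustration: 20 samples of class 0 and 80 of class 1.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 20% class 0, 80% class 1
X_split_demo = np.arange(100).reshape(-1, 1)
y_split_demo = np.array([0] * 20 + [1] * 80)

X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(
    X_split_demo, y_split_demo, test_size=0.2, stratify=y_split_demo, random_state=42
)

# Both splits keep the original 20/80 class ratio
print("train counts:", np.bincount(y_tr_demo))
print("test counts: ", np.bincount(y_te_demo))
```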

### 5.6 Handling Infinite and Missing Values

We need to replace infinite values and handle any missing data in our features.

Explanation:
- Infinite values are replaced with `NaN` to handle them appropriately.
- We check for missing values in `X_train` and `X_test`.
- `SimpleImputer` is used to fill missing values with the mean of each feature, calculated from the training set.

In [None]:
import numpy as np
from sklearn.impute import SimpleImputer

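# Replace infinite values with NaN (reassign instead of using inplace=True,
# which can trigger SettingWithCopyWarning on frames returned by train_test_split)
X_train = X_train.replace([np.inf, -np.inf], np.nan)
X_test = X_test.replace([np.inf, -np.inf], np.nan)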

# Check for NaN values
print("Checking for NaN values in X_train and X_test:")
print(f"X_train contains NaN values: {X_train.isnull().values.any()}")
print(f"X_test contains NaN values: {X_test.isnull().values.any()}")

# Impute missing values with the mean
imputer = SimpleImputer(strategy='mean')

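# Fit on X_train and transform both X_train and X_test
# (index=... keeps the original row labels so X and y stay aligned)
X_train = pd.DataFrame(imputer.fit_transform(X_train), columns=X_train.columns, index=X_train.index)
X_test = pd.DataFrame(imputer.transform(X_test), columns=X_test.columns, index=X_test.index)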
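
The sketch below replays the same steps on a single made-up column. It shows that the fill value comes from the training data only and is reused for the test data. The column name `rate` and its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical one-column train/test frames with an inf and a NaN
train_imp_demo = pd.DataFrame({"rate": [1.0, np.inf, 3.0]})
test_imp_demo = pd.DataFrame({"rate": [np.nan, 10.0]})

# Step 1: turn +/-inf into NaN so the imputer can handle them
train_imp_demo = train_imp_demo.replace([np.inf, -np.inf], np.nan)
test_imp_demo = test_imp_demo.replace([np.inf, -np.inf], np.nan)

# Step 2: learn the mean on the training data only (mean of 1.0 and 3.0)
demo_imputer = SimpleImputer(strategy="mean")
train_filled = demo_imputer.fit_transform(train_imp_demo)

# Step 3: reuse the training mean for the test data
test_filled = demo_imputer.transform(test_imp_demo)

print(train_filled.ravel())
print(test_filled.ravel())
```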

# Step 6: Feature Scaling
We standardize the features so that they are on a comparable scale.

Explanation:
- `StandardScaler` standardizes features by removing the mean and scaling to unit variance.
- We fit the scaler on `X_train` and transform both `X_train` and `X_test` to prevent data leakage.

In [14]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# Fit the scaler on the training data
X_train_scaled = scaler.fit_transform(X_train)

# Transform the test data
X_test_scaled = scaler.transform(X_test)
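
As a quick sanity check of what "fit on train, transform test" means, this sketch compares `StandardScaler` against the manual formula `(x - train_mean) / train_std` on made-up numbers. Only the training statistics are used, even for the test values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature train and test columns
train_col_demo = np.array([[0.0], [2.0], [4.0]])
test_col_demo = np.array([[2.0], [6.0]])

demo_scaler = StandardScaler().fit(train_col_demo)

# Manual standardization with the training mean and population std (ddof=0)
manual_scaled = (test_col_demo - train_col_demo.mean(axis=0)) / train_col_demo.std(axis=0)
sklearn_scaled = demo_scaler.transform(test_col_demo)

print(np.allclose(manual_scaled, sklearn_scaled))
```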

# Step 7: Model Evaluation
### Baseline with no adjustments
We train a baseline Random Forest model without any adjustments to see how it performs:

- Train the Model:
    - Create a `RandomForestClassifier` with 100 trees.
    - Fit the model to the scaled training data.

- Make Predictions:
    - Predict the labels for the test data.

- Evaluate Performance:
    - Accuracy: Calculate the overall accuracy of the model.
    - Classification Report: Get precision, recall, F1-score, and support for each class.
    - Confusion Matrix: See the breakdown of correct and incorrect predictions for 'Attack' and 'Benign'.

This gives us a baseline to assess whether further improvements are needed.

In [None]:
# Baseline Model: RandomForestClassifier without class weights
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Train the baseline model
baseline_model = RandomForestClassifier(n_estimators=100, random_state=42)
baseline_model.fit(X_train_scaled, y_train)

# Make predictions
y_pred_baseline = baseline_model.predict(X_test_scaled)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred_baseline)
print(f"Baseline Model Accuracy: {accuracy:.4f}")

print("Baseline Model Classification Report:")
print(classification_report(y_test, y_pred_baseline, target_names=['Attack', 'Benign']))

print("Baseline Model Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_baseline))

### Baseline with class weights
We modify the Random Forest model to give more attention to the minority class:

- Adjust Class Weights:
    - Use `class_weight='balanced'` to automatically balance class weights.
    - Helps the model focus more on detecting `Attack` instances.

- Train, Predict, and Evaluate:
    - Fit the model to the training data.
    - Make predictions on the test data.
    - Evaluate using accuracy, classification report, and confusion matrix.

This step helps us assess whether class weighting improves model performance compared to the baseline.

In [None]:
# Model with class_weight='balanced'
model_class_weight = RandomForestClassifier(n_estimators=100, class_weight='balanced', random_state=42)
model_class_weight.fit(X_train_scaled, y_train)

# Make predictions
y_pred_class_weight = model_class_weight.predict(X_test_scaled)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred_class_weight)
print(f"Model with Class Weight Accuracy: {accuracy:.4f}")

print("Model with Class Weight Classification Report:")
print(classification_report(y_test, y_pred_class_weight, target_names=['Attack', 'Benign']))

print("Model with Class Weight Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_class_weight))

### Oversample Minority Class using SMOTE

We address class imbalance by oversampling the minority class ('Attack') using **SMOTE** (Synthetic Minority Oversampling Technique).

Explanation:
- `SMOTE` generates synthetic samples of the minority class to balance the dataset.
- `sampling_strategy='auto'` (the default, which the code relies on) oversamples every class except the majority class until it matches the majority class size.
- We apply `SMOTE` only to the training data to avoid data leakage.
- We print class distributions before and after resampling to verify the changes.

In [None]:
from imblearn.over_sampling import SMOTE

# Define the SMOTE object
smote = SMOTE(random_state=42)

# Resample the training data
X_train_resampled, y_train_resampled = smote.fit_resample(X_train_scaled, y_train)

print("After SMOTE oversampling:")
print(f"Original y_train distribution: {np.bincount(y_train)}")
print(f"Resampled y_train distribution: {np.bincount(y_train_resampled)}")

# Train the model on resampled data
model_smote = RandomForestClassifier(n_estimators=100, random_state=42)
model_smote.fit(X_train_resampled, y_train_resampled)

# Make predictions
y_pred_smote = model_smote.predict(X_test_scaled)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred_smote)
print(f"Model with SMOTE Accuracy: {accuracy:.4f}")

print("Model with SMOTE Classification Report:")
print(classification_report(y_test, y_pred_smote, target_names=['Attack', 'Benign']))

print("Model with SMOTE Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_smote))

### Logistic Regression Evaluation
We test Logistic Regression as an alternative model:

- Set Up the Model:
    - Use `LogisticRegression` with:
        - `max_iter=1000` to allow more iterations for convergence.
        - `class_weight='balanced'` to handle class imbalance.
        - `random_state=42` for consistent results.

- Train the Model:
    - Fit the logistic regression model to the scaled training data.

- Make Predictions:
    - Predict labels for the test data.

- Evaluate Performance:
    - Accuracy: Check the overall correctness of the model.
    - Classification Report: Get detailed metrics like precision and recall for 'Attack' and 'Benign'.
    - Confusion Matrix: See how well the model distinguishes between the classes.

By comparing Logistic Regression to our previous models, we can see which algorithm performs best on our data.

In [None]:
from sklearn.linear_model import LogisticRegression

# Initialize and train the model
model_logreg = LogisticRegression(max_iter=1000, class_weight='balanced', random_state=42)
model_logreg.fit(X_train_scaled, y_train)

# Make predictions
y_pred_logreg = model_logreg.predict(X_test_scaled)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred_logreg)
print(f"Logistic Regression Accuracy: {accuracy:.4f}")

print("Logistic Regression Classification Report:")
print(classification_report(y_test, y_pred_logreg, target_names=['Attack', 'Benign']))

print("Logistic Regression Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_logreg))

### XGBoost Evaluation
We train an XGBoost model to classify network traffic, adjusting for class imbalance:

- Handle Class Imbalance:
    - Calculate `scale_pos_weight` from the class counts in the training data. XGBoost's documentation suggests the ratio of negative to positive instances; with our encoding, the negative class is `Attack` (0) and the positive class is `Benign` (1).
    - Because `Benign` is the majority class, this ratio is below 1. It down-weights `Benign` and so gives relatively more weight to the minority `Attack` class.

- Train the XGBoost Model:
    - Initialize `XGBClassifier` with the calculated `scale_pos_weight`.
    - Train the model on the scaled training data.

- Make Predictions:
    - Use the trained model to predict labels for the test data.

- Evaluate Performance:
    - Accuracy: Measure how often the model's predictions are correct.
    - Classification Report: Get detailed metrics like precision and recall for `Attack` and `Benign`.
    - Confusion Matrix: See the breakdown of correct and incorrect predictions for each class.

By adjusting for class imbalance, we aim to improve the model's ability to detect attacks effectively.

In [None]:
# Import the XGBoost classifier
from xgboost import XGBClassifier

# Calculate the scale_pos_weight parameter
from collections import Counter
counter = Counter(y_train)
# negatives (Attack, label 0) / positives (Benign, label 1)
ratio = counter[0] / counter[1]
print(f"Scale_pos_weight ratio: {ratio}")

# Initialize and train the model
model_xgb = XGBClassifier(scale_pos_weight=ratio, use_label_encoder=False, eval_metric='logloss', random_state=42)
model_xgb.fit(X_train_scaled, y_train)

# Make predictions
y_pred_xgb = model_xgb.predict(X_test_scaled)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred_xgb)
print(f"XGBoost Classifier Accuracy: {accuracy:.4f}")

print("XGBoost Classifier Classification Report:")
print(classification_report(y_test, y_pred_xgb, target_names=['Attack', 'Benign']))

print("XGBoost Classifier Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_xgb))
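
For intuition about `scale_pos_weight`, here is the arithmetic on made-up class counts. It follows the heuristic suggested in the XGBoost documentation (negative count divided by positive count). The 20/80 split is an assumption for illustration, not the real class distribution.

```python
from collections import Counter

# Hypothetical encoded labels: 0 = Attack (negative), 1 = Benign (positive)
y_weight_demo = [0] * 20 + [1] * 80

weight_counts = Counter(y_weight_demo)

# negatives / positives; below 1 when the positive class is the majority
demo_scale_pos_weight = weight_counts[0] / weight_counts[1]
print(demo_scale_pos_weight)  # 0.25
```

A value below 1 shrinks the contribution of positive (`Benign`) samples, which is the direction we want when `Attack` is the minority class encoded as 0.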


### Threshold Sensitivity Analysis


- Get Prediction Probabilities:
    - We use `baseline_model.predict_proba(X_test_scaled)[:, 1]` to get the predicted probabilities for the positive class ('Benign').
        - `predict_proba` returns an array with one row per sample and one column of probabilities per class.
        - `[:, 1]` selects the probability of the class labeled 1 (which we assigned to 'Benign').
    - These probabilities represent the model's confidence that each instance belongs to the 'Benign' class.

- Define Thresholds to Evaluate:
    - We set up a list of thresholds `[0.3, 0.5, 0.7]` to test.
        - `0.5` is the default threshold used by most classifiers.
        - By adjusting the threshold, we can control the sensitivity of the model to the positive class.

- Adjust Predictions Based on Thresholds:
    - For each threshold in the list:
        - We compare the predicted probabilities to the threshold: `(y_probs >= thresh)`.
        - This creates a boolean array where `True` indicates the probability is greater than or equal to the threshold.
        - We convert the boolean array to integers (`0` or `1`) using `.astype(int)`, resulting in the adjusted predictions.

- Evaluate the Model:
    - Accuracy Score:
        - We calculate the accuracy using `accuracy_score(y_test, y_pred_thresh)`.
        - This tells us the proportion of correct predictions at the given threshold.
    - Classification Report:
        - We generate a classification report showing precision, recall, F1-score, and support for both 'Attack' and 'Benign' classes.
        - This helps us understand how adjusting the threshold affects these metrics.
    - Confusion Matrix:
        - We display the confusion matrix using `confusion_matrix(y_test, y_pred_thresh)`.
        - It shows the counts of true positives, false positives, true negatives, and false negatives.

- Repeat for Each Threshold:
    - The loop runs for each threshold `(0.3, 0.5, 0.7)`, allowing us to compare the model's performance at different sensitivity levels.

**Why Adjust Thresholds?**
- Purpose:
    - The default threshold of `0.5` might not be optimal, especially in imbalanced datasets.
    - Adjusting the threshold can help balance between precision and recall based on our specific needs.
        - Lower Threshold (e.g., `0.3`):
            - The model is more likely to predict 'Benign'.
            - May increase recall for 'Benign' (the positive class here) but decrease its precision, because more attacks are labeled 'Benign'.
        - Higher Threshold (e.g., `0.7`):
            - The model is less likely to predict 'Benign'.
            - May increase precision for 'Benign' (fewer attacks labeled 'Benign') but decrease its recall (more benign traffic flagged as 'Attack').

- Application:
    - In intrusion detection, we usually prefer to minimize undetected attacks. Because the threshold here applies to the 'Benign' probability, an undetected attack is an attack predicted as 'Benign'. We would therefore choose a higher threshold to catch more potential attacks, accepting more false alarms on benign traffic as a trade-off.
    - By evaluating different thresholds, we can select the one that offers the best balance for our specific objectives.

In [None]:
# Get prediction probabilities
y_probs = baseline_model.predict_proba(X_test_scaled)[:, 1]  # Probability of class 'Benign' (label 1)

# Define thresholds to evaluate
thresholds = [0.3, 0.5, 0.7]

for thresh in thresholds:
    # Predict based on adjusted threshold
    y_pred_thresh = (y_probs >= thresh).astype(int)
    
    # Evaluate the model
    accuracy = accuracy_score(y_test, y_pred_thresh)
    print(f"Threshold {thresh} - Accuracy: {accuracy:.4f}")
    print(f"Threshold {thresh} - Classification Report:")
    print(classification_report(y_test, y_pred_thresh, target_names=['Attack', 'Benign']))
    print(f"Threshold {thresh} - Confusion Matrix:")
    print(confusion_matrix(y_test, y_pred_thresh))
    print("-" * 50)
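
The direction of the trade-off is easy to get backwards when the positive class is 'Benign', so here is a tiny worked example. The probabilities and labels are made up for illustration. The sketch counts undetected attacks (attacks predicted as 'Benign') and false alarms (benign traffic predicted as 'Attack') at each threshold.

```python
import numpy as np

# Hypothetical P(Benign) scores and true labels (0 = Attack, 1 = Benign)
p_benign_demo = np.array([0.10, 0.35, 0.45, 0.60, 0.90])
y_true_thresh_demo = np.array([0, 0, 1, 1, 1])

threshold_summary = {}
for t in (0.3, 0.5, 0.7):
    pred = (p_benign_demo >= t).astype(int)
    # Attack predicted as Benign = undetected attack
    missed_attacks = int(((y_true_thresh_demo == 0) & (pred == 1)).sum())
    # Benign predicted as Attack = false alarm
    false_alarms = int(((y_true_thresh_demo == 1) & (pred == 0)).sum())
    threshold_summary[t] = (missed_attacks, false_alarms)
    print(f"threshold={t}: missed attacks={missed_attacks}, false alarms={false_alarms}")
```

On this toy data, raising the threshold on the 'Benign' probability removes the missed attack at the cost of more false alarms.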

### Cross-Validation
Explanation:

- Why Use Cross-Validation?
    - Cross-validation provides a more reliable estimate of the model's performance by training and evaluating it on multiple splits of the data.
    - It does not prevent overfitting by itself, but it helps reveal it and shows how well the model might perform on new, unseen data.

- Initialize Stratified K-Fold:
    - We use `StratifiedKFold` to maintain the same class distribution in each fold as in the original dataset.
    - Parameters:
        - `n_splits=5`: Splits the data into five folds.
        - `shuffle=True`: Randomizes the data before splitting to ensure a good mix.
        - `random_state=42`: Sets a seed for reproducibility.

- Perform Cross-Validation:
    - The `cross_val_score` function evaluates the model using cross-validation.
    - Parameters:
        - `baseline_model`: The model we want to evaluate.
        - `X_train_scaled`: The features from the training set.
        - `y_train`: The target variable from the training set.
        - `cv=skf`: Specifies the cross-validation strategy to use.
        - `scoring='f1'`: Uses the F1 score of the positive class (label 1, 'Benign') as the evaluation metric.
    - Returns:
        - An array `cv_scores` containing the F1 score for each fold.

By using cross-validation, we gain a better understanding of how our model performs across different subsets of the data, leading to a more robust evaluation than a single train-test split.

In [None]:
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Initialize stratified k-fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Cross-validate the model
cv_scores = cross_val_score(baseline_model, X_train_scaled, y_train, cv=skf, scoring='f1')

print(f"Cross-Validation F1 Scores: {cv_scores}")
print(f"Mean F1 Score: {cv_scores.mean():.4f}")
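
One possible refinement, which is not part of the original notebook: when features are scaled before cross-validation, each validation fold has already influenced the scaler. Wrapping the scaler and the model in a scikit-learn pipeline lets every fold fit its own scaler. The sketch below uses synthetic data from `make_classification` and a smaller forest (25 trees) purely to keep it fast. Both are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the real features and labels
X_cv_demo, y_cv_demo = make_classification(
    n_samples=300, n_features=8, weights=[0.2, 0.8], random_state=42
)

# The scaler is re-fitted on the training part of every fold
cv_pipeline = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=25, random_state=42),
)

skf_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_demo_scores = cross_val_score(cv_pipeline, X_cv_demo, y_cv_demo, cv=skf_demo, scoring="f1")

print(f"Fold F1 scores: {cv_demo_scores}")
print(f"Mean F1 score: {cv_demo_scores.mean():.4f}")
```

For tree-based models such as Random Forest, scaling has little effect on the result, so the difference matters more for scale-sensitive models such as Logistic Regression.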

# Step 8: Visualizing Results
### ROC Curve Analysis
We visualize our model's performance using the ROC curve:

- Purpose:
    - The ROC curve shows how the true positive rate (TPR) relates to the false positive rate (FPR) at different thresholds.
    - The AUC summarizes this relationship in a single number.

- Steps:
    - Get Predicted Probabilities:
        - Extract probabilities for the 'Benign' class from the model.
    - Calculate ROC Metrics:
        - Compute false positive rates, true positive rates, and thresholds.
        - Calculate the AUC.
    - Plot the ROC Curve:
        - Plot TPR vs. FPR.
        - Add a diagonal line representing random guessing.
        - Include labels, a title, and a legend showing the AUC.

- Interpretation:
    - The ROC curve helps visualize the trade-off between true positive rate and false positive rate across different thresholds.
    - The AUC provides a single metric to evaluate the model's ability to distinguish between the classes.
        - An AUC of 1.0 indicates perfect classification; 0.5 suggests no discriminative ability.

By analyzing the ROC curve and AUC, we gain deeper insights into our model's performance beyond accuracy, helping us assess how well it can detect attacks while minimizing false alarms.

In [None]:
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

# Get prediction probabilities
y_probs_baseline = baseline_model.predict_proba(X_test_scaled)[:, 1]  # Probability of 'Benign'

# Calculate ROC curve and AUC
fpr, tpr, thresholds = roc_curve(y_test, y_probs_baseline, pos_label=1)
roc_auc = auc(fpr, tpr)

# Plot ROC curve
plt.figure()
plt.plot(fpr, tpr, color='blue', lw=2, label=f'Baseline Model ROC curve (area = {roc_auc:.4f})')
plt.plot([0, 1], [0, 1], color='grey', lw=1, linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve - Baseline Model')
plt.legend(loc="lower right")
plt.show()
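
To make the AUC less abstract, here is a four-sample example with made-up scores. Three of the four (negative, positive) pairs are ranked correctly, so the AUC is 3/4. The sketch also shows that `auc(fpr, tpr)` and `roc_auc_score` agree.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical labels and scores for the positive class (label 1)
y_true_roc_demo = np.array([0, 0, 1, 1])
scores_roc_demo = np.array([0.1, 0.4, 0.35, 0.8])

fpr_demo, tpr_demo, _ = roc_curve(y_true_roc_demo, scores_roc_demo, pos_label=1)
auc_from_curve = auc(fpr_demo, tpr_demo)                       # area under the plotted curve
auc_direct = roc_auc_score(y_true_roc_demo, scores_roc_demo)   # same value, computed directly

print(auc_from_curve, auc_direct)  # both 0.75
```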

### Compile Performance Metrics
We compile and compare the performance metrics of all our models:

- Purpose:
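    - Gather key metrics like Accuracy, Precision, Recall, F1-Score, and AUC for each model. Precision, Recall, and F1-Score use scikit-learn's default `pos_label=1`, so they describe the 'Benign' class.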
    - Organize them into a DataFrame for easy comparison.

- Steps:
    - Initialize an Empty List:
        - Create a list `metrics` to store performance data.
    - Collect Metrics for Each Model:
        - Baseline Random Forest:
            - Calculate metrics using predictions and probabilities.
        - Random Forest with Class Weight:
            - Compute metrics for the model adjusted for class imbalance.
        - Random Forest with SMOTE:
            - Gather metrics for the model trained with SMOTE.
        - Logistic Regression:
            - Collect performance data.
        - XGBoost Classifier:
            - Compute metrics for the XGBoost model.
    - Create a DataFrame:
        - Convert the list of metrics into a DataFrame `metrics_df`.
    - Display the Results:
        - Print the DataFrame to compare the models side by side.

- Why It's Helpful:
    - Allows us to see which model performs best according to different metrics.
    - Facilitates informed decision-making on which model to choose for our intrusion detection system.

In [None]:
# Create a DataFrame to store metrics
import pandas as pd

metrics = []

# Baseline Model Metrics
metrics.append({
    'Model': 'Baseline RandomForest',
    'Accuracy': accuracy_score(y_test, y_pred_baseline),
    'Precision': precision_score(y_test, y_pred_baseline),
    'Recall': recall_score(y_test, y_pred_baseline),
    'F1-Score': f1_score(y_test, y_pred_baseline),
    'AUC': roc_auc_score(y_test, y_probs_baseline)
})

# Model with Class Weight Metrics
y_probs_class_weight = model_class_weight.predict_proba(X_test_scaled)[:, 1]
metrics.append({
    'Model': 'RandomForest with Class Weight',
    'Accuracy': accuracy_score(y_test, y_pred_class_weight),
    'Precision': precision_score(y_test, y_pred_class_weight),
    'Recall': recall_score(y_test, y_pred_class_weight),
    'F1-Score': f1_score(y_test, y_pred_class_weight),
    'AUC': roc_auc_score(y_test, y_probs_class_weight)
})

# SMOTE Model Metrics
y_probs_smote = model_smote.predict_proba(X_test_scaled)[:, 1]
metrics.append({
    'Model': 'RandomForest with SMOTE',
    'Accuracy': accuracy_score(y_test, y_pred_smote),
    'Precision': precision_score(y_test, y_pred_smote),
    'Recall': recall_score(y_test, y_pred_smote),
    'F1-Score': f1_score(y_test, y_pred_smote),
    'AUC': roc_auc_score(y_test, y_probs_smote)
})

# Logistic Regression Metrics
y_probs_logreg = model_logreg.predict_proba(X_test_scaled)[:, 1]
metrics.append({
    'Model': 'Logistic Regression',
    'Accuracy': accuracy_score(y_test, y_pred_logreg),
    'Precision': precision_score(y_test, y_pred_logreg),
    'Recall': recall_score(y_test, y_pred_logreg),
    'F1-Score': f1_score(y_test, y_pred_logreg),
    'AUC': roc_auc_score(y_test, y_probs_logreg)
})

# XGBoost Classifier Metrics
y_probs_xgb = model_xgb.predict_proba(X_test_scaled)[:, 1]
metrics.append({
    'Model': 'XGBoost Classifier',
    'Accuracy': accuracy_score(y_test, y_pred_xgb),
    'Precision': precision_score(y_test, y_pred_xgb),
    'Recall': recall_score(y_test, y_pred_xgb),
    'F1-Score': f1_score(y_test, y_pred_xgb),
    'AUC': roc_auc_score(y_test, y_probs_xgb)
})

# Create DataFrame
metrics_df = pd.DataFrame(metrics)
print(metrics_df)
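
Because `precision_score`, `recall_score`, and `f1_score` default to `pos_label=1`, the table reports these metrics for the 'Benign' class. If attack detection is the main interest, the same functions accept `pos_label=0`. The sketch below shows the difference on made-up predictions (the label arrays are assumptions for illustration).

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels and predictions (0 = Attack, 1 = Benign)
y_true_metric_demo = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred_metric_demo = [0, 0, 1, 1, 1, 1, 1, 0]

# Default pos_label=1: share of Benign samples predicted as Benign
benign_recall = recall_score(y_true_metric_demo, y_pred_metric_demo)

# pos_label=0: share of Attack samples predicted as Attack
attack_recall = recall_score(y_true_metric_demo, y_pred_metric_demo, pos_label=0)

# pos_label=0: share of Attack predictions that really are attacks
attack_precision = precision_score(y_true_metric_demo, y_pred_metric_demo, pos_label=0)

print(f"Benign recall:    {benign_recall:.3f}")
print(f"Attack recall:    {attack_recall:.3f}")
print(f"Attack precision: {attack_precision:.3f}")
```

Here one of three attacks is missed, which the default 'Benign' recall of 0.8 does not reveal. The attack-class recall of about 0.667 does.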

### Plotting Performance Metrics

This grouped bar chart compares the models across each metric.

In [None]:
# Plotting the metrics
import seaborn as sns

# Melt the DataFrame for easier plotting
metrics_melted = metrics_df.melt(id_vars='Model', var_name='Metric', value_name='Value')

plt.figure(figsize=(12, 6))
sns.barplot(data=metrics_melted, x='Metric', y='Value', hue='Model')
plt.title('Model Comparison')
plt.ylabel('Score')
plt.show()
