# Feature Importance

## Data Preparation

In [1]:
import pandas as pd
import numpy as np
import joblib
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split


Load the Previously Trained Model

In [None]:
from tensorflow.keras.models import load_model

model = load_model('../../../2_Modeling_Phase/Binary/Saved-Models/Farm-Flow_DNN-Deep-Neural-Network_Model.keras')

Load Train Dataset

In [None]:
df_train = pd.read_csv("../../../0_Datasets/Farm-Flow/train.csv")

Load Test Dataset

In [None]:
df_test = pd.read_csv("../../../0_Datasets/Farm-Flow/test.csv")

In [None]:
display(df_train)

In [None]:
display(df_test)

-----
## Train and Test Datasets

Drop Multiclass Column

In [None]:
df_train = df_train.drop('traffic', axis=1)
df_test = df_test.drop('traffic', axis=1)

Select the feature columns, excluding the target variable

In [None]:
X_columns = df_train.columns.drop('is_attack')

Create a feature matrix X by selecting only the columns specified in X_columns. Then convert the selected data into a NumPy array.

In [None]:
X = df_train[X_columns].values

Create the target vector y from the is_attack column

In [None]:
y = df_train["is_attack"].values

Split into training and testing sets

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)

Get Feature Names and Class Names

In [None]:
feature_names = list(X_columns)
class_names = ["Normal", "Malicious"]
response_dict = {0: 'Normal', 1: 'Malicious'}

Generate Prediction

In [None]:
pred = model.predict(X_test)

Labeled DataFrames

In [None]:
X_test_labeled = pd.DataFrame(X_test, columns=feature_names)
X_train_labeled = pd.DataFrame(X_train, columns=feature_names)

# Flatten the prediction array so it becomes a one-dimensional Series like the targets
pred_series = pd.Series(pred.flatten())
y_test_target_series = pd.Series(y_test)
y_train_target_series = pd.Series(y_train)

Create a 10% stratified subset of the training data for faster computation

In [None]:
subset_percentage = 0.1
# Fixed seed so this subset and the labeled subset below contain the same rows
X_subset, _, y_subset, _ = train_test_split(X_train, y_train, test_size=1 - subset_percentage, random_state=42, stratify=y_train)

In [None]:
subset_percentage = 0.1
X_subset_labeled, _, y_subset_labeled, _ = train_test_split(X_train_labeled, y_train_target_series, test_size=1 - subset_percentage, random_state=42, stratify=y_train_target_series)

Row to explain

In [None]:
idx = 0

---

## Neural Network

In [None]:
weights = model.get_weights()

# Extract the first layer weights
input_layer_weights = weights[0]

feature_importance = np.mean(np.abs(input_layer_weights), axis=1)
df_feature_importance = pd.DataFrame({'Feature': feature_names, 'Importance': feature_importance})
df_feature_importance = df_feature_importance.sort_values(by='Importance', ascending=False).reset_index(drop=True)

df_feature_importance
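
The cell above scores each input feature by the mean absolute value of its first-layer weights. The following is a minimal sketch of that calculation on a hypothetical 3x2 weight matrix (the values and feature names are invented for illustration). It assumes the Keras `Dense` kernel layout of `(n_inputs, n_units)`, which is why the average is taken over `axis=1`. Note that this score ignores feature scale and everything after the first layer, so it is only a rough indicator.

```python
import numpy as np

# Hypothetical first-layer kernel: 3 input features (rows) x 2 hidden units (columns).
toy_weights = np.array([
    [0.5, -1.5],   # feature_a
    [0.1,  0.1],   # feature_b
    [-2.0, 1.0],   # feature_c
])
toy_feature_names = ["feature_a", "feature_b", "feature_c"]

# Mean absolute weight per input feature (axis=1 averages across hidden units).
toy_importance = np.mean(np.abs(toy_weights), axis=1)

# Sort features from the largest to the smallest score.
order = np.argsort(-toy_importance)
ranking = [(toy_feature_names[i], float(toy_importance[i])) for i in order]
print(ranking)
```

Here `feature_c` ranks first because its weights have the largest average magnitude, regardless of their sign.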

## Imodels

In [None]:
import imodels
from imodels import FIGSClassifier

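# The FIGS model is left commented out, so nothing from imodels is fitted here.
# NOTE: `feature_importance` below is still the DNN first-layer weight score from the
# Neural Network section, not a Gini importance, despite the column names.
#model_figs = FIGSClassifier(max_rules=7, max_trees=3)
#model_figs.fit(X_test_labeled, y_test_target_series, feature_names=feature_names)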

dfp_importance = pd.DataFrame({'feat_names': feature_names})
dfp_importance['feature'] = dfp_importance.index
dfp_importance_gini = pd.DataFrame({'importance_gini': feature_importance})
dfp_importance_gini['feature'] = dfp_importance_gini.index
dfp_importance_gini['importance_gini_pct'] = dfp_importance_gini['importance_gini'].rank(pct=True)
dfp_importance = pd.merge(dfp_importance, dfp_importance_gini, on='feature', how='left')
dfp_importance = dfp_importance.sort_values(by=['importance_gini', 'feature'], ascending=[False, True]).reset_index(drop=True)
display(dfp_importance)

## Shap

In [None]:
import shap

masker = shap.maskers.Independent(X_subset)

explainer = shap.Explainer(model, masker=masker)
#explainer = shap.KernelExplainer(model, data=X_subset)
#explainer = shap.TreeExplainer(model)

shap_values = explainer.shap_values(X_test_labeled)

shap.summary_plot(shap_values, X_test_labeled, feature_names=feature_names, class_names=class_names)

## Shapash

## InterpretML

## LOFO

In [None]:
from lofo import LOFOImportance, Dataset, plot_importance
from sklearn.model_selection import KFold

target_name = "is_attack"

column_names = feature_names + [target_name]

combined_data = np.column_stack((X_test_labeled, y_test_target_series))
combined_df = pd.DataFrame(combined_data, columns=column_names)

# define the validation scheme
cv = KFold(n_splits=4, shuffle=False, random_state=None) # Don't shuffle, to keep the time-ordered split for validation

# define the binary target and the features
dataset = Dataset(df=combined_df, target="is_attack", features=[col for col in combined_df.columns if col != "is_attack"])

# define the validation scheme and scorer. The default model is LightGBM
lofo_imp = LOFOImportance(dataset, cv=cv, scoring="roc_auc")

# get the mean and standard deviation of the importances in pandas format
importance = lofo_imp.get_importance()

importance

------
## Notes

**SHAP Values vs Permutation Importance vs Morris Sensitivity vs LOFO (Leave One Feature Out)**

1. **SHAP Values:**
- **Concept:** SHAP values are based on cooperative game theory and aim to fairly distribute the contribution of each feature to the model's prediction.
- **How it works:** It computes each feature's average marginal contribution across all possible feature subsets (in practice this is usually approximated) and assigns that value to the feature as its impact on the prediction.
- **Interpretation:** A positive SHAP value means the feature pushed the model's output above the baseline, while a negative value means it pushed the output below it.
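
The averaging over feature subsets described above can be sketched by brute force for a tiny model. This is an illustrative example, not how the `shap` library computes its values: `toy_model`, the input `x`, and the all-zero `baseline` are hypothetical, and "removing" a feature is simulated by replacing it with its baseline value.

```python
from itertools import combinations
from math import factorial

def toy_model(x):
    # Hypothetical model: a linear part plus an interaction between features 0 and 1.
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[0] * x[1] - 1.0 * x[2]

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` keep their real value; the rest take the baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for subsets of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to subset s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # approximately [5.0, 5.0, -3.0]
```

The values sum to `toy_model(x) - toy_model(baseline)`, and the interaction term is split equally between features 0 and 1. The cost grows exponentially with the number of features, which is why practical explainers rely on approximations.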

2. **Permutation Importance:**
- **Concept:** Permutation Importance assesses the importance of each feature by permuting (randomly shuffling) its values and observing the change in the model's performance.
- **How it works:** It measures the decrease in model performance (e.g., accuracy) when the values of a specific feature are randomly permuted. The larger the decrease, the more important the feature is considered.
- **Interpretation:** A larger decrease in performance suggests that the feature is crucial for the model's predictions.
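
The shuffle-and-rescore procedure can be sketched in a few lines of NumPy. Everything here is hypothetical: the synthetic data, the `toy_predict` stand-in for a trained classifier, and the helper `permutation_drop`. For real models, scikit-learn provides `sklearn.inspection.permutation_importance`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the label depends only on feature 0; feature 1 is noise.
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)

def toy_predict(X):
    # Stand-in for a trained classifier that thresholds feature 0.
    return (X[:, 0] > 0).astype(int)

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def permutation_drop(predict, X, y, n_repeats=5):
    """Average drop in accuracy after shuffling each column in turn."""
    baseline = accuracy(y, predict(X))
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column to break its link with the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j] += baseline - accuracy(y, predict(X_perm))
    return drops / n_repeats

importances = permutation_drop(toy_predict, X, y)
print(importances)
```

Shuffling feature 0 drops accuracy to roughly chance level, while shuffling the unused feature 1 changes nothing. No retraining is involved, which is the main practical difference from LOFO.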

3. **Morris Sensitivity:**
- **Concept:** Morris Sensitivity is a global sensitivity analysis method that assesses the impact of small variations in individual features on the model's output.
- **How it works:** It involves perturbing one feature at a time while keeping other features constant, observing how the output changes, and quantifying the sensitivity of the model to each feature.
- **Interpretation:** A higher Morris Sensitivity value indicates a greater impact of the feature on the model output.
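
The one-at-a-time perturbation idea can be sketched as follows. This is a simplified variant that perturbs each feature from random base points rather than building full Morris trajectories. The `toy_model`, the unit-hypercube input range, and the step size `delta` are assumptions made for illustration.

```python
import numpy as np

def toy_model(x):
    # Hypothetical model: strong effect of x0, weak effect of x1, none of x2.
    return 5.0 * x[0] + 0.5 * x[1] ** 2 + 0.0 * x[2]

def elementary_effects(model, n_features, n_points=50, delta=0.1, seed=0):
    """Mean absolute elementary effect per feature on the unit hypercube."""
    rng = np.random.default_rng(seed)
    effects = np.zeros((n_points, n_features))
    for r in range(n_points):
        # Random base point, kept below 1 - delta so the step stays inside [0, 1].
        base = rng.uniform(0.0, 1.0 - delta, size=n_features)
        f_base = model(base)
        for i in range(n_features):
            stepped = base.copy()
            stepped[i] += delta  # perturb one feature, hold the others fixed
            effects[r, i] = (model(stepped) - f_base) / delta
    return np.mean(np.abs(effects), axis=0)

mu_star = elementary_effects(toy_model, n_features=3)
print(mu_star)
```

The score for `x0` is close to 5 (its slope), the score for `x1` is much smaller, and the unused `x2` scores exactly zero.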

4. **LOFO (Leave One Feature Out):**
- **Concept:** LOFO evaluates the impact of leaving out each feature one at a time on the model's performance.
- **How it works:** It systematically removes each feature, re-trains the model, and measures the change in performance metrics (e.g., accuracy, AUC) to understand the importance of each feature.
- **Interpretation:** A larger decrease in performance when a specific feature is left out suggests that the feature is more critical for the model's predictions.
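
The remove-retrain-rescore loop can be sketched with a deliberately simple model. This example is not the `lofo` package: it uses ordinary least squares and a single hold-out split on hypothetical synthetic data, whereas the notebook cell above uses the package's default LightGBM model with K-fold cross-validation and ROC AUC.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y depends on features 0 and 1; feature 2 is pure noise.
n = 400
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Simple hold-out split: first 300 rows for fitting, last 100 for validation.
X_tr, X_va, y_tr, y_va = X[:300], X[300:], y[:300], y[300:]

def fit_and_score(cols):
    """Fit least squares on the given columns and return the validation MSE."""
    A_tr = np.column_stack([X_tr[:, cols], np.ones(len(X_tr))])
    A_va = np.column_stack([X_va[:, cols], np.ones(len(X_va))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.mean((A_va @ coef - y_va) ** 2))

all_cols = [0, 1, 2]
baseline_mse = fit_and_score(all_cols)

# LOFO importance: how much the error grows when a feature is left out and the model is refit.
lofo = {
    j: fit_and_score([c for c in all_cols if c != j]) - baseline_mse
    for j in all_cols
}
print(lofo)
```

Leaving out feature 0 increases the error the most, feature 1 less so, and the noise feature has almost no effect. Because the model is refit each time, a feature that is redundant with others can score near zero even if the full model uses it.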

**Assumptions:**
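- **SHAP Values:** Treats the prediction as the payout of a cooperative game that is divided among the features. With an independent masker, as used above, it also assumes features can be masked independently of one another.
- **Permutation Importance:** Assumes that the change in model performance is solely due to the shuffled feature, which is unreliable when features are correlated.
- **Morris Sensitivity:** Assumes that one-at-a-time variations in individual features are representative of the model's behavior.
- **LOFO:** Assumes that leaving out a feature changes the retrained model's performance. A feature that is redundant with others can score near zero because the remaining features compensate.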

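### Q: Why do the XGBoost feature importance results differ from the DNN results?
XGBoost is built from decision trees, so it reports importance directly from its own structure: a feature scores highly when the splits that use it reduce the training loss the most (gain), or when it is used in many splits. This yields a single, readily interpretable ranking. A DNN has no such built-in measure. Every input feeds every hidden unit and its effect is spread across many weights and layers, so importance has to be estimated with post-hoc techniques such as those above, each of which measures something different.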

**In cybersecurity:**

The number of packets with payload appears as a common feature, while the rest of the results vary with the feature importance algorithm used:

- **Permutation Importance:** identifies the time between consecutive packets as the most influential feature.
- **SHAP:** identifies the backward-direction features starting with subflow or packet size as the most influential.
- **Morris Sensitivity:** identifies the minimum payload size as the most influential, which is reasonable given that this is an IoT dataset, where consistency in the minimum payload size is crucial.
- **LOFO:** suggests that the payload size, including the packet header size, has more influence, although it is listed only after several other features.

Understanding each feature with XGBoost is possible, but it becomes challenging with the neural network because the different techniques produce different results. Nevertheless, the packet feature consistently yields the same result across all techniques.

This highlights the importance of considering the interpretability of models, especially when dealing with complex neural networks, and understanding that different interpretability techniques may yield divergent results. The consistency in the interpretation of the "packet" feature across various techniques adds confidence to its significance in the context of the cybersecurity dataset.