<a href="https://colab.research.google.com/github/krishnamalani1164/xai-intrusion-detection-shap/blob/main/xai_intrusion_detection_shap.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Project Overview
This project applies Explainable AI (XAI) techniques to a security problem: Network Intrusion
Detection Systems (IDS). The goal is to build machine learning models that detect network
intrusions effectively while providing transparent explanations for their decisions.

## Key Components
1. Data Processing: Preparing the NSL-KDD dataset, a standard benchmark for intrusion detection
2. Model Training: Training an XGBoost classifier to detect attacks
3. Explainable AI: Applying SHAP and LIME to explain model predictions
4. Security Analysis: Evaluating model performance and robustness against adversarial inputs
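SHAP builds on Shapley values from cooperative game theory. A feature's attribution is its average marginal contribution to the prediction over all subsets of the other features, and the attributions sum to the difference between the prediction and a baseline prediction. The sketch below computes these values exactly by enumeration for a hypothetical three-feature alert score (`toy_alert_score` and its feature names are illustrative, not taken from NSL-KDD or this notebook), treating an "absent" feature as taking a baseline value of 0. Enumeration is exponential in the number of features; the `shap` library's explainers exist to compute or approximate the same idea efficiently for real models such as XGBoost, with their own definition of an absent feature.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a subset take baseline values."""
    n = len(x)

    def value(subset):
        # Features in `subset` keep their real values, all others use the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for subsets S of size `size`.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical alert score over [failed_logins, bytes_sent_kb, is_root_shell].
def toy_alert_score(z):
    return 0.5 * z[0] + 0.01 * z[1] + 2.0 * z[0] * z[2]


x = [3, 100, 1]
baseline = [0, 0, 0]
phi = shapley_values(toy_alert_score, x, baseline)
print([round(p, 3) for p in phi])  # → [4.5, 1.0, 3.0]
```

The interaction term `2.0 * z[0] * z[2]` contributes 6.0 and is split evenly between features 0 and 2, which is why `failed_logins` receives 4.5 rather than only its linear effect of 1.5. The three values sum to 8.5, the full score at `x`.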

## Why XAI for Security?
Many ML/DL models behave as "black boxes": security analysts cannot see why a particular
alert was triggered. This lack of transparency creates several challenges:
- Difficulty trusting model decisions
- Inability to debug false positives and false negatives
- Limited insight for improving security posture
- Difficulty complying with regulations that require explainable decisions

By applying XAI techniques, this implementation enables:
- Feature importance analysis to understand which network attributes contribute to detections
- Instance-level explanations for specific security alerts
- Improved trust in automated security systems
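Instance-level explanations of the kind LIME produces come from fitting a simple, interpretable model in the neighborhood of a single prediction. The following is a minimal NumPy sketch of that idea, not the `lime` package's implementation (for tabular data, `lime` samples using training-set statistics and discretizes continuous features by default). It perturbs one instance with Gaussian noise, weights each sample by its distance to the instance, and fits a weighted linear model whose coefficients act as local feature weights. `local_surrogate` and `black_box` are hypothetical names; `black_box` stands in for a trained classifier's attack probability.

```python
import numpy as np


def local_surrogate(predict_fn, x, n_samples=2000, scale=1.0, kernel_width=None, seed=0):
    """LIME-style local explanation: weighted linear fit of predict_fn around x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    if kernel_width is None:
        # Default width formula used by lime's tabular explainer: 0.75 * sqrt(n_features).
        kernel_width = 0.75 * np.sqrt(x.size)
    # 1. Sample perturbations around the instance.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = predict_fn(Z)
    # 2. Weight samples by proximity to x with an exponential kernel.
    dist = np.linalg.norm((Z - x) / scale, axis=1)
    weights = np.sqrt(np.exp(-(dist ** 2) / kernel_width ** 2))
    # 3. Weighted least squares on [intercept, centered features].
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # local slope for each feature


def black_box(Z):
    # Hypothetical classifier: attack probability rises with feature 0,
    # falls with feature 1, and ignores feature 2.
    return 1.0 / (1.0 + np.exp(-(2.0 * Z[:, 0] - 1.0 * Z[:, 1])))


local_weights = local_surrogate(black_box, [0.0, 0.0, 0.0])
print(np.round(local_weights, 3))
```

The signs of the local weights recover the black box's behavior near the instance (positive for feature 0, negative for feature 1, near zero for feature 2), which is the kind of per-alert explanation an analyst would inspect.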

## Implementation Structure
The code follows a systematic approach:
1. Data acquisition and preprocessing
2. Model training and evaluation
3. Global explanations using SHAP
4. Local explanations using LIME
5. Visualization of results for better interpretation
6. Adversarial robustness testing
7. Security implications analysis
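One common starting point for step 6 is the fast gradient sign method (FGSM), which shifts every input feature by ±ε in the direction that increases the model's loss. Gradient-based attacks need a differentiable model and XGBoost's trees are not differentiable, so this sketch assumes a logistic-regression surrogate trained on synthetic data (nothing here comes from NSL-KDD). Attacking the tree model itself would call for black-box or feature-perturbation methods, and realistic perturbations should respect which features an attacker can actually change; categorical fields such as `protocol_type` cannot be shifted by ε.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 5
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 1.0, 0.0, 0.5])
# Synthetic labels (1 = "attack", 0 = "normal") from a noisy linear score.
y = (X @ true_w + 0.3 * rng.normal(size=n) > 0).astype(float)


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


# Train the logistic-regression surrogate with plain gradient descent on the log-loss.
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.5 * X.T @ (p - y) / n
    b -= 0.5 * np.mean(p - y)


def accuracy(X_eval):
    return float(np.mean((sigmoid(X_eval @ w + b) > 0.5) == (y == 1)))


def fgsm(X_in, eps):
    # For logistic regression, d(log-loss)/dx = (p - y) * w; FGSM adds eps * sign of it.
    grad = (sigmoid(X_in @ w + b) - y)[:, None] * w[None, :]
    return X_in + eps * np.sign(grad)


results = {eps: accuracy(fgsm(X, eps)) for eps in (0.0, 0.1, 0.3, 0.5)}
for eps, acc in results.items():
    print(f"eps={eps:.1f}  accuracy={acc:.3f}")
```

Accuracy can only stay level or fall as ε grows, and how quickly it falls is a crude robustness measure. For brevity this evaluates on the training data; a real test would use held-out samples.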

In [11]:
!pip install lime



In [12]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler, OneHotEncoder
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, roc_auc_score, roc_curve
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import xgboost as xgb
import shap
import lime
from lime.lime_tabular import LimeTabularExplainer
import urllib.request
import zipfile
import os
import warnings
import shutil
warnings.filterwarnings('ignore')
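The imports above include `ColumnTransformer`, `OneHotEncoder`, and `StandardScaler`, which fit a preprocessing step along the following lines. This is a sketch only: it one-hot encodes NSL-KDD's three categorical columns (`protocol_type`, `service`, `flag`) and standardizes a few numeric columns. The four-row DataFrame is made-up sample data, and a real run would list every numeric feature column.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

categorical = ["protocol_type", "service", "flag"]
numeric = ["duration", "src_bytes", "dst_bytes"]  # illustrative subset of numeric columns

preprocess = ColumnTransformer([
    # handle_unknown="ignore" encodes categories unseen during fit as all zeros.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", StandardScaler(), numeric),
])

# Hypothetical sample rows, only to show the resulting shape.
df = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "tcp", "icmp"],
    "service": ["http", "domain_u", "ftp", "ecr_i"],
    "flag": ["SF", "SF", "REJ", "SF"],
    "duration": [0, 0, 12, 0],
    "src_bytes": [215, 45, 0, 1032],
    "dst_bytes": [45076, 44, 0, 0],
})
X = preprocess.fit_transform(df)
print(X.shape)  # 3 + 4 + 2 one-hot columns plus 3 scaled numeric columns
```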

# Set random seed for reproducibility

In [13]:
np.random.seed(42)
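`np.random.seed(42)` seeds only NumPy's legacy global generator; Python's `random` module is seeded separately. Passing `random_state` explicitly to calls such as `train_test_split` and `xgb.XGBClassifier` also keeps each step reproducible regardless of the order in which cells run. A small helper, where `SEED` and `set_seeds` are hypothetical names:

```python
import random

import numpy as np

SEED = 42  # hypothetical constant matching the seed used above


def set_seeds(seed: int = SEED) -> np.random.Generator:
    """Seed Python's and NumPy's global RNGs and return a dedicated NumPy Generator."""
    random.seed(seed)
    np.random.seed(seed)
    return np.random.default_rng(seed)


rng = set_seeds()
print(rng.integers(0, 100, size=5))
```

A later model cell would then pass `random_state=SEED` to `train_test_split` and to the XGBoost classifier.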

# 1. Load and Explore the NSL-KDD Dataset

In [14]:
print("Step 1: Loading and Exploring the Dataset")

Step 1: Loading and Exploring the Dataset


## Upload the Dataset


In [15]:
from google.colab import files
uploaded = files.upload()

Saving archive (1).zip to archive (1) (1).zip
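The uploaded archive still has to be unpacked before it can be read. `files.upload()` returns a dict mapping each saved filename to its contents, so its keys can be passed to a helper like the one below. This is a sketch: `extract_archive` is a hypothetical name, and the archive's internal layout (for example, whether it contains `KDDTrain+.txt` and `KDDTest+.txt`) is an assumption to check against the list of files it returns.

```python
import os
import zipfile


def extract_archive(zip_path: str, dest_dir: str = "NSL-KDD") -> list:
    """Extract a dataset archive and return the paths of the .txt/.csv files it contained."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    found = []
    for root, _, names in os.walk(dest_dir):
        for name in names:
            if name.lower().endswith((".txt", ".csv")):
                found.append(os.path.join(root, name))
    return sorted(found)


# Usage in Colab, with `uploaded` from the upload cell:
# for name in uploaded:
#     print(extract_archive(name))
```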


In [16]:
def download_nslkdd():
    """Create the local NSL-KDD data directory if it does not already exist.

    The download itself is attempted in the next cell.
    """
    os.makedirs('NSL-KDD', exist_ok=True)

In [17]:
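# Download the training data.
# NOTE: this URL points to the original KDD Cup 1999 10% subset, not NSL-KDD,
# and it returned HTTP 404 in the recorded run below. If the download fails,
# use the dataset archive uploaded earlier instead.
import gzip
import os
import shutil
import urllib.error
import urllib.request

# Ensure the target folder exists
os.makedirs('NSL-KDD', exist_ok=True)

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/kddcup99/kddcup.data_10_percent.gz'
gz_path = 'NSL-KDD/kddcup.data_10_percent.gz'
txt_path = 'NSL-KDD/KDDTrain.txt'

# Download and decompress the file only if it is not already present
if not os.path.exists(txt_path):
    print("Downloading and extracting train dataset...")
    try:
        urllib.request.urlretrieve(url, gz_path)

        # Extract the .gz file
        with gzip.open(gz_path, 'rb') as f_in:
            with open(txt_path, 'wb') as f_out:
                shutil.copyfileobj(f_in, f_out)

        print("Dataset downloaded and extracted to:", txt_path)
    except urllib.error.URLError as err:
        # Report the failure (e.g. HTTP 404) instead of stopping the notebook
        print(f"{type(err).__name__}: {err}")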

Downloading and extracting train dataset...


HTTPError: HTTP Error 404: Not Found
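Because the download failed, the training data has to come from the uploaded archive instead. A minimal loading sketch, assuming the extracted file is the standard NSL-KDD `KDDTrain+.txt`: a headerless CSV with the 41 KDD feature columns followed by the attack label and a difficulty score. The column names follow the KDD Cup 1999 feature list; `load_nslkdd` and the binary `is_attack` target are hypothetical additions for illustration.

```python
import pandas as pd

# The 41 KDD feature names, in file order.
FEATURES = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]
COLUMNS = FEATURES + ["label", "difficulty"]


def load_nslkdd(path):
    """Read an NSL-KDD '+' file and add a binary target (0 = normal, 1 = attack)."""
    df = pd.read_csv(path, header=None, names=COLUMNS)
    # Collapse the specific attack names (neptune, smurf, ...) into one class.
    df["is_attack"] = (df["label"] != "normal").astype(int)
    return df


# Example (the path depends on how the archive was extracted):
# train_df = load_nslkdd("NSL-KDD/KDDTrain+.txt")
# print(train_df["is_attack"].value_counts())
```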