# Deep Learning for Intrusion Detection System (IDS)
## Complete Project Notebook

**Project**: Network Traffic Classification using Deep Learning  
**Dataset**: CSE-CIC-IDS2018 (referred to here as CICIDS2018) Network Traffic Data  
**Objective**: Classify network traffic into different attack types and benign traffic

---

## Table of Contents

1. [Problem Statement](#problem-statement)
2. [Setup and Imports](#setup-and-imports)
3. [Data Loading and Exploration](#data-loading)
4. [Exploratory Data Analysis (EDA)](#eda)
5. [Data Preprocessing](#preprocessing)
6. [Model Architecture Selection](#model-architecture)
7. [Model Training](#training)
8. [Results and Evaluation](#results)
9. [Conclusion](#conclusion)


## 1. Problem Statement {#problem-statement}

### What are we trying to solve?

**Problem**: Network security threats are increasing, and traditional signature-based intrusion detection systems struggle with:
- Zero-day attacks (unknown attack patterns)
- Encrypted traffic
- High-volume network traffic
- Evolving attack techniques

**Solution**: Develop a Deep Learning-based Intrusion Detection System that can:
- Automatically learn patterns from network traffic features
- Classify traffic into multiple attack categories
- Detect both known and unknown attack patterns
- Handle high-dimensional feature spaces

**Dataset**: CICIDS2018 - Contains network traffic flows with labeled attack types:
- Benign traffic
- Various attack types (DDoS, Brute Force, Infiltration, etc.)

**Goal**: Build a multi-class classifier that can accurately identify different types of network attacks.


## 2. Setup and Imports {#setup-and-imports}

First, let's import all necessary libraries and set up the environment.


In [3]:
# Core libraries
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import pandas as pd
import yaml
from datetime import datetime
import os
import warnings
warnings.filterwarnings('ignore')

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rcParams
rcParams['figure.figsize'] = (12, 6)
sns.set_style("whitegrid")

# Machine Learning
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
from torchsummary import summary
from tqdm import tqdm

# Experiment tracking
import wandb

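# Make sure the local test.py takes precedence over the standard-library `test` package:
# put the project directory first on sys.path and drop any cached `test` module.
import sys
project_dir = os.getcwd()
if project_dir in sys.path:
    sys.path.remove(project_dir)
sys.path.insert(0, project_dir)
sys.modules.pop('test', None)

# Custom modules
from preprocess import preprocess
from models import create_model, list_available_models
from train import train_model
from test import test_and_report, evaluate_model
from utils import calculate_class_weights, analyze_class_distribution

print("✓ All libraries imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")


Note: the recorded run of this cell failed because `test` resolved to the standard-library package rather than the local `test.py`:

ImportError: cannot import name 'evaluate_model' from 'test' (/usr/lib/python3.13/test/__init__.py)

Moving the working directory to the front of `sys.path`, as done above, is meant to avoid this. Renaming `test.py` to a name that does not clash with the standard library would be more robust.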

## 3. Data Loading and Exploration {#data-loading}

Let's load the preprocessed data and explore its characteristics.


In [None]:
# Load configuration
with open('config.yaml', 'r') as f:
    config = yaml.safe_load(f)

print("Configuration:")
print(f"  Model: {config.get('model_name', 'mlp')}")
print(f"  Batch size: {config['batch_size']}")
print(f"  Learning rate: {config['learning_rate']}")
print(f"  Epochs: {config['num_epochs']}")
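
The cell above reads `batch_size`, `learning_rate`, and `num_epochs` with plain indexing, so a missing or mistyped entry only surfaces later as a bare `KeyError` or a type error inside the optimizer. A minimal sketch of an up-front check, assuming the config is a flat dictionary with the keys used in this notebook; `validate_config` and `REQUIRED_KEYS` are hypothetical names, not part of the project code:

```python
# Hypothetical helper: checks the keys this notebook reads from config.yaml.
REQUIRED_KEYS = {
    "batch_size": int,
    "learning_rate": float,
    "num_epochs": int,
}


def validate_config(config: dict) -> dict:
    """Return a copy of config with defaults filled in, or raise ValueError."""
    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in config:
            problems.append(f"missing key: {key}")
            continue
        value = config[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key} must be numeric, got {type(value).__name__}")
        elif expected_type is int and not isinstance(value, int):
            problems.append(f"{key} must be an integer, got {value!r}")
        elif value <= 0:
            problems.append(f"{key} must be positive, got {value!r}")
    if problems:
        raise ValueError("Invalid config: " + "; ".join(problems))

    checked = dict(config)
    # Same defaults the notebook uses via config.get(...).
    checked.setdefault("model_name", "mlp")
    checked.setdefault("model_params", {})
    return checked
```

Calling `config = validate_config(config)` right after `yaml.safe_load` would fail fast with a readable message. One case this catches: PyYAML loads `learning_rate: 1e-3` (no decimal point) as a string, which the numeric check reports instead of letting it reach the optimizer.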


In [None]:
# Load preprocessed data
print("Loading data...")
train = np.load('data/train.npy')
test = np.load('data/test.npy')
val = np.load('data/val.npy')
class_names = np.load('data/class_names.npy', allow_pickle=True)
class_names = [str(name) for name in class_names]

print(f"\n✓ Data loaded successfully!")
print(f"  Train shape: {train.shape}")
print(f"  Test shape: {test.shape}")
print(f"  Val shape: {val.shape}")
print(f"  Number of features: {train.shape[1] - 1}")
print(f"  Number of classes: {len(class_names)}")
print(f"\n  Classes: {class_names}")


## 4. Exploratory Data Analysis (EDA) {#eda}

Let's analyze the dataset to understand its characteristics, class distribution, and potential challenges.


In [None]:
# Extract labels from each dataset
train_labels = train[:, -1].astype(int)
val_labels = val[:, -1].astype(int)
test_labels = test[:, -1].astype(int)

# Analyze class distribution
print("="*70)
print("CLASS DISTRIBUTION ANALYSIS")
print("="*70)

for name, labels in [('TRAIN', train_labels), ('VALIDATION', val_labels), ('TEST', test_labels)]:
    print(f"\n{name} SET:")
    unique, counts = np.unique(labels, return_counts=True)
    total = len(labels)
    
    for class_idx, count in zip(unique, counts):
        percentage = (count / total) * 100
        class_name = class_names[class_idx] if class_idx < len(class_names) else f"Class {class_idx}"
        print(f"  Class {class_idx} ({class_name:30s}): {count:8,} samples ({percentage:6.2f}%)")
    
    print(f"  Total: {total:,} samples")


In [None]:
# Visualize class distribution
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

for idx, (name, labels) in enumerate([('Train', train_labels), 
                                        ('Validation', val_labels), 
                                        ('Test', test_labels)]):
    unique, counts = np.unique(labels, return_counts=True)
    class_labels = [class_names[i] if i < len(class_names) else f"Class {i}" for i in unique]
    
    axes[idx].bar(range(len(unique)), counts, color=plt.cm.Set3(range(len(unique))))
    axes[idx].set_title(f'{name} Set Class Distribution', fontsize=12, fontweight='bold')
    axes[idx].set_xlabel('Class')
    axes[idx].set_ylabel('Number of Samples')
    axes[idx].set_xticks(range(len(unique)))
    # Use `label` here so the comprehension does not shadow the outer `name` variable
    axes[idx].set_xticklabels([f'{i}\n{label[:15]}' for i, label in zip(unique, class_labels)],
                              rotation=45, ha='right', fontsize=8)
    axes[idx].grid(axis='y', alpha=0.3)
    
    # Add count labels on bars
    for i, count in enumerate(counts):
        axes[idx].text(i, count, f'{count:,}', ha='center', va='bottom', fontsize=8)

plt.tight_layout()
plt.savefig('class_distribution.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Class distribution visualization saved as 'class_distribution.png'")


In [None]:
# Calculate class imbalance ratio
train_unique, train_counts = np.unique(train_labels, return_counts=True)
max_count = train_counts.max()
min_count = train_counts.min()
imbalance_ratio = max_count / min_count if min_count > 0 else float('inf')

print(f"\nClass Imbalance Analysis:")
print(f"  Maximum class count: {max_count:,}")
print(f"  Minimum class count: {min_count:,}")
print(f"  Imbalance ratio: {imbalance_ratio:.2f}x")
print(f"\n  Interpretation:")
if imbalance_ratio > 100:
    print(f"    ⚠ Severe class imbalance - Consider class weights or resampling")
elif imbalance_ratio > 10:
    print(f"    ⚠ Moderate class imbalance - Class weights recommended")
else:
    print(f"    ✓ Relatively balanced dataset")


In [None]:
# Analyze feature statistics
train_features = train[:, :-1]

print("\n" + "="*70)
print("FEATURE STATISTICS")
print("="*70)
print(f"\nFeature shape: {train_features.shape}")
print(f"\nBasic Statistics:")
print(f"  Mean: {train_features.mean():.4f}")
print(f"  Std:  {train_features.std():.4f}")
print(f"  Min:  {train_features.min():.4f}")
print(f"  Max:  {train_features.max():.4f}")

# Check for any remaining NaN or Inf
nan_count = np.isnan(train_features).sum()
inf_count = np.isinf(train_features).sum()
print(f"\nData Quality:")
print(f"  NaN values: {nan_count}")
print(f"  Inf values: {inf_count}")

if nan_count == 0 and inf_count == 0:
    print(f"  ✓ Data is clean (no NaN or Inf values)")
else:
    print(f"  ⚠ Data contains NaN or Inf values - needs cleaning")


## 5. Data Preprocessing {#preprocessing}

Now let's preprocess the data: standardize features and create DataLoaders for training.
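
`preprocess` lives in a separate module and is not shown in this notebook. As a rough sketch of the standardization step it presumably performs, assuming the last column of each array holds the label (as in the EDA cells) and that scaling statistics come from the training split only; `standardize_splits` is a hypothetical name, and the sketch uses plain NumPy rather than the `StandardScaler` the project imports:

```python
import numpy as np


def standardize_splits(train, val, test, eps=1e-8):
    """Scale features to zero mean / unit variance using train statistics only.

    Each input is a 2-D array whose last column is the integer class label.
    Returns (X_train, y_train), (X_val, y_val), (X_test, y_test).
    """
    X_train = train[:, :-1]
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # Constant columns have std == 0; leave them centered instead of dividing by 0.
    std = np.where(std < eps, 1.0, std)

    def transform(split):
        X, y = split[:, :-1], split[:, -1].astype(int)
        return ((X - mean) / std).astype(np.float32), y

    return transform(train), transform(val), transform(test)


# Tiny synthetic example (not project data): 2 features + label column.
rng = np.random.default_rng(0)
demo = np.column_stack([rng.normal(5.0, 2.0, size=(100, 2)),
                        rng.integers(0, 3, size=100)])
(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = standardize_splits(
    demo[:60], demo[60:80], demo[80:]
)
print(X_tr.mean(axis=0), X_tr.std(axis=0))  # close to 0 and 1 on the train split
```

Fitting the statistics on the training split alone keeps validation and test information out of the model's inputs. The validation and test features will therefore not be exactly zero-mean, which is expected.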


In [None]:
# Setup device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
if device == "cuda":
    print(f"  GPU: {torch.cuda.get_device_name(0)}")

# Preprocess data and create DataLoaders
print("\nPreprocessing data...")
train_loader, test_loader, val_loader = preprocess(
    train, test, val,
    batch_size=config['batch_size'],
    scaler_save_path='scaler.pkl'
)

print(f"\n✓ Data preprocessing complete!")
print(f"  Train batches: {len(train_loader)}")
print(f"  Val batches: {len(val_loader)}")
print(f"  Test batches: {len(test_loader)}")
print(f"  Batch size: {config['batch_size']}")


## 6. Model Architecture Selection {#model-architecture}

We'll test multiple architectures to find the best one. Let's start by examining available models.


In [None]:
# List available models
available_models = list_available_models()
print("Available Model Architectures:")
print("="*70)
for i, model_name in enumerate(available_models, 1):
    print(f"{i}. {model_name.upper()}")

# Get model configuration
model_name = config.get('model_name', 'mlp')
model_params = config.get('model_params', {})

print(f"\nSelected Model: {model_name.upper()}")
if model_params:
    print(f"Model Parameters: {model_params}")
else:
    print("Using default model parameters")


In [None]:
# Create model
num_features = train.shape[1] - 1
num_classes = len(class_names)

model = create_model(
    model_name=model_name,
    input_features=num_features,
    num_classes=num_classes,
    **model_params
).to(device)

print(f"\nModel Architecture:")
print("="*70)
summary(model, input_size=(num_features,), device=device)
print("="*70)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nModel Parameters:")
print(f"  Total parameters: {total_params:,}")
print(f"  Trainable parameters: {trainable_params:,}")


### Model Architecture Justification

**Why this architecture?**

1. **MLP (Multi-Layer Perceptron)**: 
   - Simple feedforward network
   - Good baseline for tabular data
   - Fast training and inference
   - Effective for high-dimensional feature spaces

2. **CNN (Convolutional Neural Network)**:
   - Captures local patterns in features
   - 1D convolutions work well for sequential-like data
   - Global pooling reduces overfitting

3. **LSTM/GRU**:
   - Models sequential dependencies
   - Handles long-term patterns
   - Good if features have temporal relationships

4. **Transformer**:
   - Attention mechanism captures feature relationships
   - Strong results on many sequence tasks, though gains on flat tabular flow features are less certain
   - Parallel processing

**Our Choice**: Starting with MLP as baseline, then comparing with other architectures.
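
`create_model` is defined in `models.py` and is not shown in this notebook. To make the MLP baseline concrete, here is a minimal sketch of a feedforward classifier with BatchNorm and Dropout. The class name `SimpleMLP`, the hidden sizes, and the dropout rate are assumptions for illustration, not the project's actual architecture, and the input sizes in the demo are arbitrary:

```python
import torch
import torch.nn as nn


class SimpleMLP(nn.Module):
    """Hypothetical baseline MLP; not the project's create_model('mlp', ...)."""

    def __init__(self, input_features, num_classes, hidden_sizes=(128, 64), dropout=0.3):
        super().__init__()
        layers = []
        in_dim = input_features
        for hidden in hidden_sizes:
            layers += [
                nn.Linear(in_dim, hidden),
                nn.BatchNorm1d(hidden),
                nn.ReLU(),
                nn.Dropout(dropout),
            ]
            in_dim = hidden
        # Final layer outputs raw logits; CrossEntropyLoss applies log-softmax itself.
        layers.append(nn.Linear(in_dim, num_classes))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


# Arbitrary demo sizes: 20 input features, 5 classes, batch of 4.
model_sketch = SimpleMLP(input_features=20, num_classes=5)
model_sketch.eval()  # BatchNorm uses running stats, Dropout is disabled
with torch.inference_mode():
    logits = model_sketch(torch.randn(4, 20))
print(logits.shape)  # torch.Size([4, 5])
```

The network returns logits rather than probabilities because `CrossEntropyLoss`, used in the training section, expects unnormalized scores.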


## 7. Model Training {#training}

Set up training components and train the model.


In [None]:
# Setup optimizer
optimizer = optim.AdamW(model.parameters(), lr=config['learning_rate'])
print(f"Optimizer: AdamW")
print(f"  Learning rate: {config['learning_rate']}")

# Calculate class weights for imbalanced dataset
train_labels = train[:, -1].astype(int)
class_weights = calculate_class_weights(train_labels, method='balanced')
class_weights = class_weights.to(device)

# Setup loss function with class weights
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)
print(f"\nLoss function: Weighted CrossEntropyLoss")
print(f"  Using class weights to handle imbalanced dataset")

# Setup learning rate scheduler
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=3
)
print(f"\nScheduler: ReduceLROnPlateau")
print(f"  Reduces LR by 10x when validation loss stops improving")
print(f"  Patience: 3 epochs")

print(f"\n✓ Training components setup complete!")
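
`calculate_class_weights` comes from `utils.py` and is not shown here. A minimal sketch of what `method='balanced'` commonly means, assuming it follows the same heuristic as scikit-learn's `'balanced'` mode (weight = n_samples / (n_classes × class count)); `balanced_class_weights` is a hypothetical name:

```python
import numpy as np


def balanced_class_weights(labels, num_classes):
    """weight_c = n_samples / (num_classes * count_c); rare classes get larger weights."""
    counts = np.bincount(labels, minlength=num_classes).astype(np.float64)
    weights = np.zeros(num_classes, dtype=np.float64)
    present = counts > 0
    # Classes absent from the training labels keep weight 0 instead of dividing by zero.
    weights[present] = len(labels) / (num_classes * counts[present])
    return weights


# Synthetic labels (not project data): 90 / 9 / 1 samples across three classes.
labels_demo = np.array([0] * 90 + [1] * 9 + [2] * 1)
print(balanced_class_weights(labels_demo, num_classes=3))
# 100 / (3 * 90) ≈ 0.370, 100 / (3 * 9) ≈ 3.704, 100 / (3 * 1) ≈ 33.333
```

The resulting array would be converted to a float tensor on the training device before being passed as `weight=` to `CrossEntropyLoss`. With an imbalance as severe as in this toy example, the rarest class contributes roughly 90 times more per sample than the most common one, which can make training noisy.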


In [None]:
# Initialize wandb for experiment tracking
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
run_name = f"{model_name}_{timestamp}_bs{config['batch_size']}_lr{config['learning_rate']}"

wandb.init(
    project="DL-CIS2018",
    name=run_name,
    config={
        "model_name": model_name,
        "batch_size": config['batch_size'],
        "num_epochs": config['num_epochs'],
        "learning_rate": config['learning_rate'],
        "optimizer": "AdamW",
        "scheduler": "ReduceLROnPlateau",
        **model_params
    }
)
wandb.watch(model, log="all")

print(f"✓ Wandb initialized")
print(f"  Project: DL-CIS2018")
print(f"  Run name: {run_name}")
print(f"  View at: https://wandb.ai")


In [None]:
# Train the model
print("\n" + "="*70)
print("TRAINING PHASE")
print("="*70)

# Placeholder for per-epoch history. It is not passed to train_model below,
# so it stays empty unless train_model is extended to fill it.
training_history = {
    'train_loss': [],
    'val_loss': [],
    'learning_rate': []
}

trained_model = train_model(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    device=device,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    num_epochs=config['num_epochs']
)

print("\n" + "="*70)
print("✓ Training complete!")
print(f"✓ Best model saved to: DL-CIS2018.pth")
print("="*70)


## 8. Results and Evaluation {#results}

Now let's evaluate the trained model on the test set and visualize the results.


In [None]:
# Test the model
print("\n" + "="*70)
print("TESTING PHASE")
print("="*70)

test_accuracy = test_and_report(
    model=trained_model,
    test_loader=test_loader,
    device=device,
    class_names=class_names
)

print("\n" + "="*70)
print(f"✓ Final Test Accuracy: {test_accuracy*100:.2f}%")
print("="*70)


In [None]:
# Get detailed predictions for confusion matrix visualization
trained_model.eval()
all_preds, all_labels = [], []

with torch.inference_mode():
    for samples, labels in test_loader:
        samples = samples.to(device)
        labels = labels.to(device).long()
        
        predictions = trained_model(samples)
        preds = torch.argmax(predictions, dim=1)
        
        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

# Create confusion matrix
cm = confusion_matrix(all_labels, all_preds)

# Visualize confusion matrix
plt.figure(figsize=(12, 10))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=class_names, yticklabels=class_names,
            cbar_kws={'label': 'Count'})
plt.title('Confusion Matrix - Test Set', fontsize=14, fontweight='bold')
plt.xlabel('Predicted Label', fontsize=12)
plt.ylabel('True Label', fontsize=12)
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.tight_layout()
plt.savefig('confusion_matrix.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Confusion matrix saved as 'confusion_matrix.png'")


In [None]:
# Calculate per-class metrics
from sklearn.metrics import precision_recall_fscore_support

precision, recall, f1, support = precision_recall_fscore_support(
    all_labels, all_preds, labels=range(len(class_names)), zero_division=0
)

# Create metrics dataframe
metrics_df = pd.DataFrame({
    'Class': class_names,
    'Precision': precision,
    'Recall': recall,
    'F1-Score': f1,
    'Support': support
})

print("\nPer-Class Performance Metrics:")
print("="*70)
print(metrics_df.to_string(index=False))
print("="*70)

# Visualize per-class metrics
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

x_pos = np.arange(len(class_names))
width = 0.6

axes[0].bar(x_pos, precision, width, color='skyblue', alpha=0.8)
axes[0].set_title('Precision per Class', fontweight='bold')
axes[0].set_xlabel('Class')
axes[0].set_ylabel('Precision')
axes[0].set_xticks(x_pos)
axes[0].set_xticklabels([f'{i}' for i in range(len(class_names))], rotation=45)
axes[0].set_ylim([0, 1.1])
axes[0].grid(axis='y', alpha=0.3)

axes[1].bar(x_pos, recall, width, color='lightcoral', alpha=0.8)
axes[1].set_title('Recall per Class', fontweight='bold')
axes[1].set_xlabel('Class')
axes[1].set_ylabel('Recall')
axes[1].set_xticks(x_pos)
axes[1].set_xticklabels([f'{i}' for i in range(len(class_names))], rotation=45)
axes[1].set_ylim([0, 1.1])
axes[1].grid(axis='y', alpha=0.3)

axes[2].bar(x_pos, f1, width, color='lightgreen', alpha=0.8)
axes[2].set_title('F1-Score per Class', fontweight='bold')
axes[2].set_xlabel('Class')
axes[2].set_ylabel('F1-Score')
axes[2].set_xticks(x_pos)
axes[2].set_xticklabels([f'{i}' for i in range(len(class_names))], rotation=45)
axes[2].set_ylim([0, 1.1])
axes[2].grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.savefig('per_class_metrics.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Per-class metrics visualization saved as 'per_class_metrics.png'")
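
To show where the per-class numbers come from, the sketch below derives precision, recall, and F1 directly from a confusion matrix laid out like scikit-learn's (rows are true classes, columns are predictions). The function name `metrics_from_confusion` and the demo matrix are made up for illustration and are not project results:

```python
import numpy as np


def metrics_from_confusion(cm):
    """Per-class precision/recall/F1 from a confusion matrix (rows = true, cols = predicted)."""
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)   # column sums: how often each class was predicted
    actual = cm.sum(axis=1)      # row sums: support of each class
    # Mirror zero_division=0: undefined ratios become 0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, accuracy


# Synthetic 3-class example: class 2 is rare and mostly missed.
cm_demo = [[950, 10, 0],
           [20, 80, 0],
           [8, 0, 2]]
p, r, f1_demo, acc = metrics_from_confusion(cm_demo)
print(f"accuracy={acc:.3f}  macro-F1={f1_demo.mean():.3f}  recall(class 2)={r[2]:.2f}")
```

In this toy matrix overall accuracy is about 96% even though only 2 of the 10 rare-class samples are detected. This is why per-class recall and macro-averaged F1 are more informative than accuracy alone on an imbalanced intrusion dataset.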


## 9. Conclusion {#conclusion}

### Summary of Results

Let's summarize what we learned from this project.


In [None]:
# Final summary
print("\n" + "="*70)
print("PROJECT SUMMARY")
print("="*70)

print(f"\nDataset:")
print(f"  Total training samples: {len(train):,}")
print(f"  Total validation samples: {len(val):,}")
print(f"  Total test samples: {len(test):,}")
print(f"  Number of features: {num_features}")
print(f"  Number of classes: {num_classes}")

print(f"\nModel:")
print(f"  Architecture: {model_name.upper()}")
print(f"  Total parameters: {total_params:,}")

print(f"\nTraining:")
print(f"  Batch size: {config['batch_size']}")
print(f"  Learning rate: {config['learning_rate']}")
print(f"  Epochs: {config['num_epochs']}")
print(f"  Optimizer: AdamW")
print(f"  Loss: Weighted CrossEntropyLoss")

print(f"\nResults:")
print(f"  Test Accuracy: {test_accuracy*100:.2f}%")
print(f"  Best model saved: DL-CIS2018.pth")

print("\n" + "="*70)


### What Worked?

1. **Data Preprocessing**:
   - StandardScaler normalization improved training stability
   - Class weights helped handle imbalanced dataset
   - Proper train/val/test split ensured fair evaluation

2. **Model Architecture**:
   - [Your model] performed well with [specific metrics]
   - BatchNorm and Dropout prevented overfitting
   - Learning rate scheduling improved convergence

3. **Training Strategy**:
   - Early stopping prevented overfitting
   - Gradient clipping stabilized training
   - Weighted loss handled class imbalance
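
The list above credits early stopping, but `train_model` is defined in `train.py` and its logic is not shown in this notebook. A minimal sketch of patience-based early stopping, under the assumption that validation loss is the monitored quantity; the `EarlyStopping` class and its parameters are hypothetical:

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record this epoch's loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: this is where a checkpoint would typically be saved.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Simulated validation losses: improvement stalls after epoch 2.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.65, 0.68]
stopper = EarlyStopping(patience=3)
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        print(f"stopping at epoch {epoch}, best was epoch {stopper.best_epoch}")
        break
```

If early stopping is combined with `ReduceLROnPlateau` (patience 3 in this notebook), the early-stopping patience would normally be set larger than the scheduler's, so the reduced learning rate gets a chance to help before training is cut off.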

### What Didn't Work?

1. **Challenges Encountered**:
   - [List any issues: e.g., rare classes, overfitting, etc.]

2. **Areas for Improvement**:
   - Could try different architectures (CNN, LSTM, Transformer)
   - Hyperparameter tuning could improve performance
   - Data augmentation might help with rare classes
   - Ensemble methods could boost accuracy

### Key Insights

1. **Class Imbalance**: The dataset has significant class imbalance, requiring class weights
2. **Feature Engineering**: Network traffic features are well-suited for deep learning
3. **Model Selection**: [Your model] achieved [X]% accuracy, which is [good/excellent] for this task
4. **Scalability**: The model can handle large-scale network traffic classification

### Future Work

1. **Model Improvements**:
   - Experiment with different architectures
   - Try ensemble methods
   - Implement attention mechanisms

2. **Data Improvements**:
   - Collect more data for rare classes
   - Feature engineering and selection
   - Data augmentation techniques

3. **Deployment**:
   - Real-time inference optimization
   - Model compression for edge devices
   - Integration with network monitoring systems


---

## Wrap-Up and Optional Extras

**Project Completed**: Deep Learning-based Intrusion Detection System

**Final Accuracy**: [Your accuracy]%

**Model Saved**: DL-CIS2018.pth

**Next Steps**: 
- Try different architectures using `compare_models.py`
- Tune hyperparameters for better performance
- Deploy model for real-world use

---

*This notebook can be run from top to bottom. Make sure you have:*
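- *Preprocessed data files in `data/` (train.npy, test.npy, val.npy, class_names.npy)*
- *`config.yaml` and the project modules (preprocess.py, models.py, train.py, test.py, utils.py) in the working directory*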
- *All required Python packages installed*
- *Wandb account configured (optional)*


### Training Loss Visualization

If the run was logged to wandb, the training curves can be fetched and plotted here. Otherwise, they can be reconstructed from the training logs.


In [None]:
# Note: To visualize training curves from wandb, you can:
# 1. View them directly in wandb dashboard at https://wandb.ai
# 2. Or use wandb API to fetch and plot:

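try:
    import wandb
    api = wandb.Api()
    # api.run() expects "entity/project/run_id" (the run ID, not the display name).
    # wandb.run.path provides that string while the run from wandb.init() is still active;
    # if the run has already been finished, wandb.run is None and the except branch runs.
    run = api.run(wandb.run.path)
    history = run.history()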
    
    if 'Train Loss' in history.columns and 'Val Loss' in history.columns:
        plt.figure(figsize=(12, 5))
        
        plt.subplot(1, 2, 1)
        plt.plot(history['Train Loss'], label='Train Loss', linewidth=2)
        plt.plot(history['Val Loss'], label='Val Loss', linewidth=2)
        plt.xlabel('Epoch')
        plt.ylabel('Loss')
        plt.title('Training and Validation Loss', fontweight='bold')
        plt.legend()
        plt.grid(alpha=0.3)
        
        plt.subplot(1, 2, 2)
        if 'Learning Rate' in history.columns:
            plt.plot(history['Learning Rate'], label='Learning Rate', color='green', linewidth=2)
            plt.xlabel('Epoch')
            plt.ylabel('Learning Rate')
            plt.title('Learning Rate Schedule', fontweight='bold')
            plt.legend()
            plt.grid(alpha=0.3)
            plt.yscale('log')
        
        plt.tight_layout()
        plt.savefig('training_curves.png', dpi=300, bbox_inches='tight')
        plt.show()
        print("✓ Training curves saved as 'training_curves.png'")
    else:
        print("⚠ Training history not available. Check wandb dashboard for visualizations.")
except Exception as e:
    print(f"⚠ Could not fetch wandb data: {e}")
    print("  View training curves at: https://wandb.ai")


### Model Comparison (Optional)

If you want to compare multiple models, you can run the comparison script or test different architectures here.


In [None]:
# Example: Quick comparison of a few models
# Uncomment to run (this will take time)

"""
models_to_test = ['mlp', 'cnn', 'lstm']
comparison_results = {}

# Loop variable is named candidate_name so it does not overwrite the global model_name
for candidate_name in models_to_test:
    print(f"\nTesting {candidate_name.upper()}...")
    # Create and train model (simplified - use compare_models.py for full comparison)
    # ... training code ...
    pass
"""
