# 🧠 Deep Learning MLP for Network Intrusion Detection

## CSE-CIC-IDS-2018 Dataset

### 📌 Why MLP Instead of LSTM?

**The Problem with LSTM/CNN-LSTM:**
- CSE-CIC-IDS-2018 contains **aggregated flow statistics** (one row = one complete flow)
- Rows are **NOT temporally ordered** - consecutive rows are unrelated flows
- LSTM and CNN-LSTM models assume temporal sequences → Result: **51% accuracy (random guessing)**

**Why MLP Works:**
- Treats each flow **independently** (like XGBoost/Random Forest)
- Learns complex non-linear patterns within a single flow
- No artificial temporal assumptions
- Expected performance: **75-85% accuracy**

### 📊 Expected Performance:
- **Target Accuracy**: 75-85%
- **Target ROC-AUC**: 0.80-0.90
- **Training Time**: ~5-10 minutes on GPU
- **Inference**: Faster than LSTM, similar to XGBoost

## 📦 Step 1: Setup and Imports

In [None]:
# Install required packages (if needed)
# !pip install tensorflow pandas numpy scikit-learn matplotlib seaborn

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, callbacks, regularizers
from sklearn.metrics import (
    classification_report, confusion_matrix,
    accuracy_score, precision_score, recall_score,
    f1_score, roc_auc_score, roc_curve
)
from sklearn.utils.class_weight import compute_class_weight
import time
import pickle
import json
import warnings
warnings.filterwarnings('ignore')

print("="*80)
print("DEEP LEARNING MLP FOR INTRUSION DETECTION")
print("="*80)
print(f"TensorFlow version: {tf.__version__}")
print(f"GPU Available: {len(tf.config.list_physical_devices('GPU'))} GPU(s)")
if tf.config.list_physical_devices('GPU'):
    print("GPU devices:", tf.config.list_physical_devices('GPU'))
else:
    print("⚠️  No GPU detected. Training will be slower on CPU.")

## ⚙️ Step 2: Configuration

In [None]:
# Configuration
# UPDATE THESE PATHS TO MATCH YOUR GOOGLE DRIVE STRUCTURE

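PROJECT_DIR = '/content/drive/MyDrive/IDS_Research'
MODEL_DIR = f'{PROJECT_DIR}/models'
RESULTS_DIR = f'{PROJECT_DIR}/results'

# Create the output directories up front so checkpoint and figure saves do not fail
import os
os.makedirs(MODEL_DIR, exist_ok=True)
os.makedirs(RESULTS_DIR, exist_ok=True)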

# Model hyperparameters
BATCH_SIZE = 512       # Larger batch size (no sequences = more memory available)
EPOCHS = 50            # Maximum epochs
LEARNING_RATE = 0.001  # Initial learning rate

# Choose architecture: 'deep', 'standard', or 'lightweight'
ARCHITECTURE = 'deep'  # Start with deep, can try others later

print("Configuration:")
print(f"  Batch size: {BATCH_SIZE}")
print(f"  Epochs: {EPOCHS}")
print(f"  Architecture: {ARCHITECTURE}")

## 📊 Step 3: Load Preprocessed Data

**Important:** This notebook assumes you've already run your data preprocessing.

You should have:
- `X_train_scaled`, `y_train`
- `X_val_scaled`, `y_val`  
- `X_test_scaled`, `y_test`

If you haven't done this yet, run your ML_IDS_v4.ipynb notebook up to the preprocessing section first.
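
The preprocessing notebook is not reproduced here. As a rough sketch of the kind of pipeline that could produce these splits, the following uses synthetic data and plain NumPy. The 70/15/15 split, the random data, the `_demo` variable names, and the helper `standardize_with_train_stats` are all assumptions for illustration, not the contents of ML_IDS_v4.ipynb. The point it is meant to show is that the scaling statistics come from the training split only.

```python
import numpy as np

def standardize_with_train_stats(X_train, *others):
    """Z-score every split using mean/std computed on the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    return tuple((X - mean) / std for X in (X_train, *others))

# Synthetic stand-in for flow features (hypothetical: 1,000 flows, 8 features)
rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=3.0, size=(1000, 8))
y = rng.integers(0, 2, size=1000)

# Hypothetical 70/15/15 split on shuffled row indices
idx = rng.permutation(len(X))
train_idx, val_idx, test_idx = np.split(idx, [700, 850])

X_train_demo, X_val_demo, X_test_demo = standardize_with_train_stats(
    X[train_idx], X[val_idx], X[test_idx]
)
y_train_demo, y_val_demo, y_test_demo = y[train_idx], y[val_idx], y[test_idx]

print(X_train_demo.shape, X_val_demo.shape, X_test_demo.shape)
```

In the real pipeline the scaled arrays would be bound to `X_train_scaled`, `X_val_scaled`, and `X_test_scaled`; the `_demo` suffix is used here only so the sketch cannot overwrite real data.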

In [None]:
# Check if preprocessed data exists
try:
    print("Checking preprocessed data...")
    print(f"X_train_scaled shape: {X_train_scaled.shape}")
    print(f"X_val_scaled shape: {X_val_scaled.shape}")
    print(f"X_test_scaled shape: {X_test_scaled.shape}")
    print(f"\nLabel distributions:")
    print(f"  Train: {np.bincount(y_train)}")
    print(f"  Val:   {np.bincount(y_val)}")
    print(f"  Test:  {np.bincount(y_test)}")
    
    # Convert to numpy arrays if they're DataFrames
    if isinstance(X_train_scaled, pd.DataFrame):
        X_train = X_train_scaled.values
        X_val = X_val_scaled.values
        X_test = X_test_scaled.values
    else:
        X_train = X_train_scaled
        X_val = X_val_scaled
        X_test = X_test_scaled
    
    # Convert labels to numpy arrays if needed
    if isinstance(y_train, pd.Series):
        y_train = y_train.values
        y_val = y_val.values
        y_test = y_test.values
    
    print("\n✓ Data loaded successfully!")
    print(f"\nNumber of features: {X_train.shape[1]}")
    print(f"Training samples: {len(X_train):,}")
    print(f"Validation samples: {len(X_val):,}")
    print(f"Test samples: {len(X_test):,}")
    
except NameError:
    print("❌ Error: Preprocessed data not found!")
    print("\nPlease run your data preprocessing first.")
    print("You can either:")
    print("1. Run ML_IDS_v4.ipynb up to the preprocessing section")
    print("2. Or copy the preprocessing code from that notebook here")
    raise

## 🏗️ Step 4: Build MLP Architecture

### Key Differences from LSTM:
- **No sequence creation** - each flow is independent
- **Input shape**: (num_features,) instead of (time_steps, num_features)
- **Feedforward** architecture - no recurrent connections
- **Faster training** - no temporal dependencies to compute
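
The shape difference can be seen with a few lines of NumPy. This is an illustrative sketch on a toy array: `time_steps = 10` and the array sizes are assumed values, and `sliding_window_view` merely stands in for whatever windowing an LSTM pipeline would use.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

num_flows, num_features, time_steps = 100, 6, 10  # assumed toy sizes
X = np.arange(num_flows * num_features, dtype=np.float32).reshape(num_flows, num_features)

# MLP input: one row per flow, used as-is
mlp_input = X
print(mlp_input.shape)   # (100, 6)

# LSTM-style input: overlapping windows of consecutive rows.
# sliding_window_view puts the window axis last, so swap it into position 1
# to get (num_windows, time_steps, num_features).
windows = sliding_window_view(X, window_shape=time_steps, axis=0)
lstm_input = np.swapaxes(windows, 1, 2)
print(lstm_input.shape)  # (91, 10, 6)
```

Each window treats ten adjacent rows as one sequence, which only makes sense if adjacent rows are related. For independent flow records they are not.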

In [None]:
def build_deep_mlp(input_dim):
    """
    Deep MLP with 5 hidden layers
    
    Good for:
    - Learning complex non-linear patterns
    - Large datasets (100K+ samples)
    - When you have compute power available
    """
    model = models.Sequential([
        # Input
        layers.Input(shape=(input_dim,)),
        
        # Layer 1: Wide entry layer
        layers.Dense(512, activation='relu',
                    kernel_regularizer=regularizers.l2(0.001)),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        
        # Layer 2
        layers.Dense(256, activation='relu',
                    kernel_regularizer=regularizers.l2(0.001)),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        
        # Layer 3
        layers.Dense(128, activation='relu',
                    kernel_regularizer=regularizers.l2(0.001)),
        layers.BatchNormalization(),
        layers.Dropout(0.2),
        
        # Layer 4
        layers.Dense(64, activation='relu',
                    kernel_regularizer=regularizers.l2(0.001)),
        layers.BatchNormalization(),
        layers.Dropout(0.2),
        
        # Layer 5: Narrow bottleneck
        layers.Dense(32, activation='relu'),
        layers.Dropout(0.1),
        
        # Output
        layers.Dense(1, activation='sigmoid')
    ])
    
    return model


def build_standard_mlp(input_dim):
    """
    Standard MLP with 3 hidden layers
    
    Good for:
    - Balanced performance/complexity
    - Medium datasets (10K-100K samples)
    - General purpose use
    """
    model = models.Sequential([
        layers.Input(shape=(input_dim,)),
        
        layers.Dense(256, activation='relu',
                    kernel_regularizer=regularizers.l2(0.001)),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        
        layers.Dense(128, activation='relu',
                    kernel_regularizer=regularizers.l2(0.001)),
        layers.BatchNormalization(),
        layers.Dropout(0.2),
        
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.2),
        
        layers.Dense(1, activation='sigmoid')
    ])
    
    return model


def build_lightweight_mlp(input_dim):
    """
    Lightweight MLP with 2 hidden layers
    
    Good for:
    - Fast training/inference
    - Small datasets (<10K samples)
    - Resource-constrained environments
    """
    model = models.Sequential([
        layers.Input(shape=(input_dim,)),
        
        layers.Dense(128, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.2),
        
        layers.Dense(1, activation='sigmoid')
    ])
    
    return model


# Build selected architecture
print("="*80)
print("BUILDING MODEL")
print("="*80)

input_dim = X_train.shape[1]
print(f"Input dimension: {input_dim} features")
print(f"Architecture: {ARCHITECTURE}")

if ARCHITECTURE == 'deep':
    model = build_deep_mlp(input_dim)
elif ARCHITECTURE == 'standard':
    model = build_standard_mlp(input_dim)
elif ARCHITECTURE == 'lightweight':
    model = build_lightweight_mlp(input_dim)
else:
    raise ValueError(f"Unknown architecture: {ARCHITECTURE}")

model.summary()
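
As a sanity check on what `model.summary()` reports, the parameter count of the deep architecture can be worked out by hand. The sketch below mirrors the layer sizes in `build_deep_mlp`; the helper names and the input dimension of 10 are assumptions chosen for illustration.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def batchnorm_params(units):
    """gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * units

def deep_mlp_param_count(input_dim):
    hidden = [512, 256, 128, 64, 32]
    has_batchnorm = [True, True, True, True, False]  # mirrors build_deep_mlp
    total, n_in = 0, input_dim
    for units, bn in zip(hidden, has_batchnorm):
        total += dense_params(n_in, units)
        if bn:
            total += batchnorm_params(units)
        n_in = units
    return total + dense_params(n_in, 1)  # single sigmoid output unit

print(deep_mlp_param_count(10))  # 184065
```

Dropout layers add no parameters. Each extra input feature adds 512 weights, because only the first Dense layer depends on `input_dim`.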

## 🎯 Step 5: Compile Model with Class Weights

In [None]:
# Calculate class weights for imbalanced data
class_weights_array = compute_class_weight(
    'balanced',
    classes=np.unique(y_train),
    y=y_train
)
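# Map each class label to its weight (plain int/float keys and values for Keras)
class_weights = {
    int(cls): float(weight)
    for cls, weight in zip(np.unique(y_train), class_weights_array)
}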

print(f"Class weights: {class_weights}")
print(f"This helps handle imbalanced data (more benign than attack samples)")

# Compile model
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=LEARNING_RATE),
    loss='binary_crossentropy',
    metrics=[
        'accuracy',
        keras.metrics.Precision(name='precision'),
        keras.metrics.Recall(name='recall'),
        keras.metrics.AUC(name='auc')
    ]
)

print("\n✓ Model compiled successfully!")
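
For reference, the `'balanced'` option weights each class by `n_samples / (n_classes * count_of_class)`. The following pure-Python sketch reproduces that arithmetic on a hypothetical 900/100 split; the helper name and the label counts are made up for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c), the 'balanced' heuristic."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in sorted(counts.items())}

# Hypothetical split: 900 benign (0) and 100 attack (1) flows
labels = [0] * 900 + [1] * 100
weights = balanced_class_weights(labels)
print(weights)
```

With these counts the benign class gets a weight of about 0.556 and the attack class 5.0, so both classes contribute the same total weight to the loss (900 × 0.556 ≈ 100 × 5.0 = 500).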

## 🔧 Step 6: Setup Training Callbacks

In [None]:
# Create callbacks for better training
callbacks_list = [
    # Early stopping - stop if no improvement
    callbacks.EarlyStopping(
        monitor='val_auc',
        patience=10,
        restore_best_weights=True,
        mode='max',
        verbose=1
    ),
    
    # Reduce learning rate when stuck
    callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=5,
        min_lr=1e-7,
        verbose=1
    ),
    
    # Save best model
    callbacks.ModelCheckpoint(
        f'{MODEL_DIR}/deep_learning_mlp_best.h5',
        monitor='val_auc',
        mode='max',
        save_best_only=True,
        verbose=1
    ),
    
    # TensorBoard logging
    callbacks.TensorBoard(
        log_dir=f'{PROJECT_DIR}/logs/mlp',
        histogram_freq=1
    )
]

print("✓ Callbacks configured:")
print("  - Early stopping (patience=10)")
print("  - Learning rate reduction (patience=5)")
print("  - Model checkpoint (save best)")
print("  - TensorBoard logging")
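
The patience mechanism behind early stopping can be modelled in a few lines. This is a simplified sketch of the idea for a metric that should increase (as with `mode='max'`), not the Keras implementation: it ignores options such as `min_delta` and `baseline`, and the function name and the AUC values are invented for the example.

```python
def early_stop_epoch(values, patience):
    """Return (stopped_epoch, best_epoch) for a metric that should increase.

    stopped_epoch is None if training would run through every epoch.
    """
    best, best_epoch, wait = float("-inf"), -1, 0
    for epoch, value in enumerate(values):
        if value > best:
            best, best_epoch, wait = value, epoch, 0  # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Hypothetical validation AUC per epoch
val_auc = [0.80, 0.85, 0.84, 0.83, 0.86, 0.85, 0.85, 0.85]
print(early_stop_epoch(val_auc, patience=3))  # (7, 4)
```

With `restore_best_weights=True`, the weights from the best epoch (index 4 in this toy run) are the ones kept after training stops.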

## 🚀 Step 7: Train Model

This will take 5-10 minutes on GPU, longer on CPU.

**Expected behavior:**
- Accuracy should start around 60-70% and climb to 75-85%
- AUC should reach 0.80-0.90
- Much better than LSTM's 51% random guessing!

In [None]:
print("="*80)
print("TRAINING MODEL")
print("="*80)
print(f"Epochs: {EPOCHS}")
print(f"Batch size: {BATCH_SIZE}")
print(f"Training samples: {len(X_train):,}")
print(f"Validation samples: {len(X_val):,}")
print("\nTraining started...")

start_time = time.time()

history = model.fit(
    X_train, y_train,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_data=(X_val, y_val),
    class_weight=class_weights,
    callbacks=callbacks_list,
    verbose=1
)

training_time = time.time() - start_time

print("\n" + "="*80)
print(f"✓ TRAINING COMPLETED!")
print("="*80)
print(f"Total time: {training_time:.2f} seconds ({training_time/60:.2f} minutes)")
print(f"Final training accuracy: {history.history['accuracy'][-1]*100:.2f}%")
print(f"Final validation accuracy: {history.history['val_accuracy'][-1]*100:.2f}%")
print(f"Best validation AUC: {max(history.history['val_auc']):.4f}")

## 📊 Step 8: Visualize Training History

In [None]:
# Plot comprehensive training history
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
fig.suptitle('Deep Learning MLP Training History', fontsize=16, fontweight='bold')

metrics = ['loss', 'accuracy', 'precision', 'recall', 'auc']
titles = ['Loss', 'Accuracy', 'Precision', 'Recall', 'AUC']

for idx, (metric, title) in enumerate(zip(metrics, titles)):
    ax = axes[idx // 3, idx % 3]
    
    ax.plot(history.history[metric], label=f'Training {title}', linewidth=2)
    ax.plot(history.history[f'val_{metric}'], label=f'Validation {title}', linewidth=2)
    ax.set_title(f'Model {title}', fontsize=12, fontweight='bold')
    ax.set_xlabel('Epoch')
    ax.set_ylabel(title)
    ax.legend()
    ax.grid(True, alpha=0.3)

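# Learning rate plot (older Keras logs 'lr'; Keras 3 logs 'learning_rate')
lr_key = next((k for k in ('lr', 'learning_rate') if k in history.history), None)
ax = axes[1, 2]
if lr_key is not None:
    ax.plot(history.history[lr_key], label='Learning Rate', linewidth=2, color='red')
    ax.set_title('Learning Rate Schedule', fontsize=12, fontweight='bold')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Learning Rate')
    ax.set_yscale('log')
    ax.legend()
    ax.grid(True, alpha=0.3)
else:
    ax.axis('off')  # hide the unused sixth panel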

plt.tight_layout()
plt.savefig(f'{RESULTS_DIR}/mlp_training_history.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Training history visualized and saved!")

## 🎯 Step 9: Evaluate on Test Set

In [None]:
print("="*80)
print("TEST SET EVALUATION")
print("="*80)

# Make predictions
start_time = time.time()
y_pred_proba = model.predict(X_test, verbose=0)
inference_time = time.time() - start_time

y_pred = (y_pred_proba > 0.5).astype(int).flatten()
y_pred_proba = y_pred_proba.flatten()

# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)
f1 = f1_score(y_test, y_pred, zero_division=0)
roc_auc = roc_auc_score(y_test, y_pred_proba)

# Confusion matrix
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
fpr_value = fp / (fp + tn) if (fp + tn) > 0 else 0

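# Latency: total batched predict() time divided by sample count,
# i.e. an amortized per-sample figure rather than single-request latency
avg_latency = (inference_time / len(X_test)) * 1000  # ms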

print(f"\nDeep Learning MLP Test Performance:")
print(f"  Accuracy:  {accuracy*100:.2f}%")
print(f"  Precision: {precision*100:.2f}%")
print(f"  Recall:    {recall*100:.2f}%")
print(f"  F1-Score:  {f1*100:.2f}%")
print(f"  ROC-AUC:   {roc_auc:.4f}")
print(f"  FPR:       {fpr_value*100:.2f}%")

print(f"\nConfusion Matrix:")
print(f"  TN: {tn:>6,}  FP: {fp:>6,}")
print(f"  FN: {fn:>6,}  TP: {tp:>6,}")

print(f"\nInference Performance:")
print(f"  Avg Latency: {avg_latency:.2f} ms/sample")
print(f"  Total Time:  {inference_time:.2f} seconds")
print(f"  Throughput:  {len(X_test)/inference_time:.2f} samples/sec")

# Comparison with other models
print(f"\n" + "="*80)
print("COMPARISON WITH OTHER MODELS")
print("="*80)

comparison_data = {
    'Model': ['Failed LSTM', 'Deep MLP (This)', 'XGBoost', 'Random Forest'],
    'Accuracy': [0.510, accuracy, 0.876, 0.877],
    'ROC-AUC': [0.501, roc_auc, 0.951, 0.955],
    'Latency (ms)': [114.15, avg_latency, 6.97, 36.25]
}

comparison_df = pd.DataFrame(comparison_data)
print(comparison_df.to_string(index=False))

# Store results
results = {
    'accuracy': float(accuracy),
    'precision': float(precision),
    'recall': float(recall),
    'f1_score': float(f1),
    'roc_auc': float(roc_auc),
    'fpr': float(fpr_value),
    'confusion_matrix': {'tn': int(tn), 'fp': int(fp), 'fn': int(fn), 'tp': int(tp)},
    'avg_latency_ms': float(avg_latency),
    'training_time_seconds': float(training_time)
}
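
The metrics above all derive from the four confusion-matrix counts. As a worked example, the sketch below recomputes them by hand; the helper name and the counts (850 TN, 50 FP, 20 FN, 80 TP) are hypothetical and chosen only to show how accuracy and precision can diverge on imbalanced data.

```python
def binary_metrics(tn, fp, fn, tp):
    """Derive the standard binary metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "fpr": fpr}

# Hypothetical test set: 900 benign flows, 100 attack flows
m = binary_metrics(tn=850, fp=50, fn=20, tp=80)
for name, value in m.items():
    print(f"{name:<10} {value:.4f}")
```

Here accuracy is 93% while precision is only about 62%, which is why the evaluation reports precision, recall, F1, and FPR alongside accuracy.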

## 📈 Step 10: Visualize Results

In [None]:
# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Benign', 'Attack'],
            yticklabels=['Benign', 'Attack'])
plt.title('Deep Learning MLP Confusion Matrix', fontsize=14, fontweight='bold')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.tight_layout()
plt.savefig(f'{RESULTS_DIR}/mlp_confusion_matrix.png', dpi=300, bbox_inches='tight')
plt.show()

# ROC Curve
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
plt.figure(figsize=(10, 8))
plt.plot(fpr, tpr, color='blue', lw=2.5,
         label=f'Deep MLP (AUC = {roc_auc:.4f})')
plt.plot([0, 1], [0, 1], color='red', lw=2, linestyle='--', label='Random Classifier')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate', fontsize=12)
plt.ylabel('True Positive Rate', fontsize=12)
plt.title('ROC Curve - Deep Learning MLP', fontsize=14, fontweight='bold')
plt.legend(loc="lower right", fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig(f'{RESULTS_DIR}/mlp_roc_curve.png', dpi=300, bbox_inches='tight')
plt.show()

# Comparison bar chart
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Named model_names (not `models`) so the imported Keras `models` module is not shadowed
model_names = ['Failed\nLSTM', 'Deep\nMLP', 'XGBoost', 'Random\nForest']
accuracies = [0.510, accuracy, 0.876, 0.877]
aucs = [0.501, roc_auc, 0.951, 0.955]
colors = ['red', 'blue', 'green', 'orange']

ax1.bar(model_names, accuracies, color=colors)
ax1.set_title('Accuracy Comparison', fontsize=14, fontweight='bold')
ax1.set_ylabel('Accuracy')
ax1.set_ylim([0, 1])
ax1.axhline(y=0.5, color='black', linestyle='--', alpha=0.3, label='Random Chance')
ax1.legend()
ax1.grid(True, alpha=0.3, axis='y')

ax2.bar(model_names, aucs, color=colors)
ax2.set_title('ROC-AUC Comparison', fontsize=14, fontweight='bold')
ax2.set_ylabel('ROC-AUC')
ax2.set_ylim([0, 1])
ax2.axhline(y=0.5, color='black', linestyle='--', alpha=0.3, label='Random Chance')
ax2.legend()
ax2.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig(f'{RESULTS_DIR}/mlp_model_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Visualizations saved!")

## 💾 Step 11: Save Model and Results

In [None]:
print("="*80)
print("SAVING MODEL AND RESULTS")
print("="*80)

# Save final model
model_path = f'{MODEL_DIR}/deep_learning_mlp_model.h5'
model.save(model_path)
print(f"✓ Model saved to: {model_path}")

# Save results as JSON
results_dict = {
    'timestamp': time.strftime('%Y-%m-%d %H:%M:%S'),
    'dataset': 'CSE-CIC-IDS-2018',
    'model_type': 'Deep Learning MLP (Feedforward)',
    'architecture': ARCHITECTURE,
    'config': {
        'batch_size': BATCH_SIZE,
        'epochs': EPOCHS,
        'learning_rate': LEARNING_RATE,
        'num_features': int(X_train.shape[1])
    },
    'results': results,
    'training_samples': int(len(X_train)),
    'validation_samples': int(len(X_val)),
    'test_samples': int(len(X_test)),
    'comparison_with_lstm': {
        'lstm_accuracy': 0.510,
        'mlp_accuracy': float(accuracy),
        'improvement_percent': float(((accuracy - 0.510) / 0.510) * 100),
        'lstm_auc': 0.501,
        'mlp_auc': float(roc_auc),
        'auc_improvement_percent': float(((roc_auc - 0.501) / 0.501) * 100)
    }
}

results_path = f'{RESULTS_DIR}/deep_learning_mlp_results.json'
with open(results_path, 'w') as f:
    json.dump(results_dict, f, indent=4)

print(f"✓ Results saved to: {results_path}")

print("\n" + "="*80)
print("✅ ALL DONE!")
print("="*80)
print("\nYour Deep Learning MLP model is ready for deployment!")
print(f"\nKey files saved:")
print(f"  1. Model: {model_path}")
print(f"  2. Results: {results_path}")
print(f"  3. Visualizations: {RESULTS_DIR}/mlp_*.png")

print(f"\n📊 Performance Summary:")
print(f"  MLP Accuracy:  {accuracy*100:.2f}% (vs LSTM: 51.0%)")
print(f"  MLP ROC-AUC:   {roc_auc:.4f} (vs LSTM: 0.501)")
print(f"  Improvement:   {((accuracy - 0.510) / 0.510) * 100:+.1f}% relative accuracy gain over LSTM")
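
The improvement figures stored in the results are relative changes, which read quite differently from a difference in percentage points. A small worked example, assuming a hypothetical MLP accuracy of 0.800 against the LSTM baseline of 0.510 (the helper name is also an assumption):

```python
def improvement(baseline, new):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute_points = (new - baseline) * 100
    relative_percent = (new - baseline) / baseline * 100
    return absolute_points, relative_percent

# 0.800 is a hypothetical MLP accuracy; 0.510 is the LSTM baseline from the text
points, relative = improvement(baseline=0.510, new=0.800)
print(f"{points:+.1f} percentage points, {relative:+.1f}% relative")
# +29.0 percentage points, +56.9% relative
```

When quoting the gain, it helps to say which of the two is meant.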

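## 🔬 Step 12: Feature Importance Analysis (Optional)

A rough proxy for feature importance is the mean absolute weight each input feature receives in the first Dense layer. This is a weight-magnitude heuristic rather than a gradient-based attribution, and it is only meaningful when the inputs are scaled to comparable ranges.

In [None]:
# Optional: Estimate feature importance from first-layer weight magnitudes
# This requires the feature names from preprocessing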

# Uncomment if you have feature names available:
# try:
#     # Get weights of first layer
#     first_layer_weights = model.layers[0].get_weights()[0]
#     feature_importance = np.abs(first_layer_weights).mean(axis=1)
#     
#     # Create dataframe (you need feature_names from preprocessing)
#     # importance_df = pd.DataFrame({
#     #     'feature': feature_names,
#     #     'importance': feature_importance
#     # }).sort_values('importance', ascending=False)
#     # 
#     # print("\nTop 20 Most Important Features:")
#     # print(importance_df.head(20))
#     
# except Exception as e:
#     print(f"Could not compute feature importance: {e}")
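
The commented-out idea can be tried on a stand-in weight matrix without a trained model. In this sketch the kernel values and the feature names are made up; the only fact carried over from Keras is that a Dense kernel has shape `(num_features, units)`, so averaging absolute weights over `axis=1` yields one score per input feature.

```python
import numpy as np

# Stand-in for model.layers[0].get_weights()[0]; a Keras Dense kernel has
# shape (num_features, units). Values and feature names are hypothetical.
feature_names = ["flow_duration", "tot_fwd_pkts", "tot_bwd_pkts", "dst_port"]
kernel = np.array([
    [0.10, -0.20,  0.10],
    [1.50, -2.00,  1.00],
    [0.30,  0.30, -0.30],
    [0.00,  0.05, -0.05],
])

# Mean absolute outgoing weight per input feature
importance = np.abs(kernel).mean(axis=1)
ranking = sorted(zip(feature_names, importance), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{name:<15} {score:.3f}")
```

The ranking ignores everything downstream of the first layer, so it is best treated as a quick screening aid and cross-checked against a method such as permutation importance.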

## 🎓 Understanding Why MLP Works

### Data Type Matters!

**CSE-CIC-IDS-2018 Dataset:**
- Each row = one complete network flow
- Features = aggregated statistics (total bytes, packet count, etc.)
- Rows are **independent** - not temporally ordered

**LSTM Approach (Failed):**
```
Input: [flow_1, flow_2, ..., flow_10] → Predict: flow_11
Problem: Flows 1-10 are unrelated random flows!
Result: 51% accuracy (random guessing)
```

**MLP Approach (Works):**
```
Input: [single_flow_features] → Predict: benign or attack
Learns: "If bytes_sent > X AND port = 80 AND duration < Y, then benign"
Result: 75-85% accuracy (actual learning!)
```

### When Would LSTM Work?

LSTM would be appropriate if you had:
1. **Packet-level traces** where consecutive rows are packets from the same flow
2. **Time-ordered logs** from a single host showing attack progression
3. **Session sequences** tracking user behavior over time

For aggregated flow statistics, MLP is the correct choice!