# Lab 03: Next-Gen Firewall (NGFW) Traffic Anomaly Detection

Build a **Next-Gen Firewall (NGFW)** anomaly detection system using synthetic firewall traffic logs enriched with simulated **Layer 7 deep packet inspection (DPI)** metadata.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/depalmar/ai_for_the_win/blob/main/notebooks/lab03_anomaly_detection.ipynb)

## Learning Objectives
- Parse and analyze **firewall traffic logs** (not NetFlow)
- Engineer **Layer 7 features** (HTTP, DNS, TLS metadata)
- Build firewall-specific features: actions, zones, rule IDs, threat categories
- Detect anomalies with Isolation Forest
- Compare against One-Class SVM and Local Outlier Factor

## Firewall vs NetFlow

| Feature | NetFlow | Firewall Logs |
|---------|---------|---------------|
| Data source | Router/switch | Firewall appliance |
| Granularity | Aggregated flows | Per-session/packet |
| **Action visibility** | ❌ None | ✅ Allow/Deny/Drop |
| **L7 inspection** | ❌ Limited | ✅ Full DPI |
| **Threat detection** | ❌ None | ✅ IPS/AV verdicts |
| Zone info | ❌ None | ✅ Trust/Untrust/DMZ |
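
To make the firewall column concrete, here is a minimal sketch of turning one log line into a record carrying those fields. The `key=value` layout, the sample line, and the `parse_fw_log` helper are assumptions for illustration only; real NGFW products each use their own log formats, so treat this as a shape to adapt rather than a parser for any specific vendor.

```python
# Hypothetical key=value firewall log line (not any vendor's real format).
SAMPLE_LINE = (
    "action=deny src_zone=untrust dst_zone=dmz dst_port=22 "
    "rule_id=999 app_id=ssh bytes_sent=512 bytes_recv=0"
)

# Fields that should be numeric after parsing (assumed field names).
INT_FIELDS = ("dst_port", "rule_id", "bytes_sent", "bytes_recv")


def parse_fw_log(line: str) -> dict:
    """Split a whitespace-separated key=value line into a dict."""
    record = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            record[key] = value
    for field in INT_FIELDS:
        if field in record:
            record[field] = int(record[field])
    return record


record = parse_fw_log(SAMPLE_LINE)
print(record["action"], record["src_zone"], "->", record["dst_zone"])
```

The action and zone fields recovered here are exactly the ones a NetFlow record would not carry.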

## NGFW Deep Packet Inspection

- **HTTP inspection**: User-agents, headers, URI patterns, response codes
- **DNS analysis**: Query types, domain entropy, TXT record sizes  
- **TLS fingerprinting**: JA3/JA4 hashes, certificate anomalies
- **Application identification**: Protocol classification regardless of port
- **Threat intelligence**: Category, reputation, known malware signatures
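
Of the signals listed above, domain entropy is the easiest to compute by hand. The sketch below uses character-level Shannon entropy, a common way to flag randomly generated (DGA-like) names. The `shannon_entropy` helper and the sample strings are illustrative assumptions; the lab itself simulates this value rather than computing it from real domain names.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Made-up labels: a dictionary-like name and a random-looking one.
readable = "google"
random_like = "x7k2qz9vb4tm"

print(f"{readable}: {shannon_entropy(readable):.2f} bits/char")
print(f"{random_like}: {shannon_entropy(random_like):.2f} bits/char")
```

Repeated characters pull the score down, so dictionary-style names tend to score lower than random-looking ones of similar length. Short strings cap the achievable entropy, which is one reason this feature is usually combined with others such as label count.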

In [None]:
# Install dependencies (uncomment for Colab)
# !pip install scikit-learn pandas numpy matplotlib seaborn plotly

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler, RobustScaler
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix

# Plotly for interactive visualizations
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

plt.style.use("seaborn-v0_8-whitegrid")
np.random.seed(42)

# Plotly template for Colab
PLOTLY_TEMPLATE = "plotly_white"

## 1. Generate Firewall Traffic Logs with L7 Metadata

In [None]:
# Generate NGFW firewall traffic logs with diverse attack patterns
n_normal = 1900  # 600 web + 200 email + 400 DNS + 100 SSH + 200 DB + 400 API

# Firewall zones
ZONES = ["trust", "untrust", "dmz", "guest"]

# ============================================================
# NORMAL TRAFFIC - Multiple enterprise traffic profiles
# ============================================================

# Web browsing (HTTP/HTTPS)
n_web = 600
web_traffic = {
    "bytes_sent": np.random.lognormal(7, 0.8, n_web),  # Requests
    "bytes_recv": np.random.lognormal(10, 1.5, n_web),  # Responses (pages, images)
    "packets_sent": np.random.poisson(30, n_web),
    "packets_recv": np.random.poisson(100, n_web),
    "duration": np.random.exponential(3, n_web),
    "dst_port": np.random.choice([80, 443], n_web, p=[0.2, 0.8]),
    "protocol": np.full(n_web, "TCP"),
    "src_ip_count": np.ones(n_web, dtype=int),
    "dst_ip_count": np.ones(n_web, dtype=int),
    # Firewall-specific fields
    "action": np.full(n_web, "allow"),
    "src_zone": np.random.choice(["trust", "guest"], n_web, p=[0.8, 0.2]),
    "dst_zone": np.full(n_web, "untrust"),
    "rule_id": np.random.choice([101, 102, 103], n_web),  # Web access rules
    "app_id": np.random.choice(["web-browsing", "ssl", "google-base"], n_web),
    "threat_category": np.full(n_web, "none"),
    "url_category": np.random.choice(["business", "news", "shopping", "technology"], n_web),
    "attack_type": "normal",
    "label": 0,
}

# Email traffic (SMTP/IMAP/POP3)
n_email = 200
email_traffic = {
    "bytes_sent": np.random.lognormal(8, 1.0, n_email),
    "bytes_recv": np.random.lognormal(9, 1.2, n_email),
    "packets_sent": np.random.poisson(40, n_email),
    "packets_recv": np.random.poisson(60, n_email),
    "duration": np.random.exponential(2, n_email),
    "dst_port": np.random.choice([25, 587, 993, 995, 143], n_email),
    "protocol": np.full(n_email, "TCP"),
    "src_ip_count": np.ones(n_email, dtype=int),
    "dst_ip_count": np.ones(n_email, dtype=int),
    # Firewall-specific fields
    "action": np.full(n_email, "allow"),
    "src_zone": np.full(n_email, "trust"),
    "dst_zone": np.full(n_email, "untrust"),
    "rule_id": np.full(n_email, 201),  # Email rule
    "app_id": np.random.choice(["smtp", "imap", "pop3"], n_email),
    "threat_category": np.full(n_email, "none"),
    "url_category": np.full(n_email, "none"),
    "attack_type": "normal",
    "label": 0,
}

# DNS queries (normal)
n_dns = 400
dns_traffic = {
    "bytes_sent": np.random.normal(70, 15, n_dns).clip(40, 200),
    "bytes_recv": np.random.normal(150, 40, n_dns).clip(80, 400),
    "packets_sent": np.ones(n_dns, dtype=int),  # Single query
    "packets_recv": np.random.choice([1, 2, 3], n_dns, p=[0.7, 0.2, 0.1]),
    "duration": np.random.uniform(0.001, 0.2, n_dns),  # Fast
    "dst_port": np.full(n_dns, 53),
    "protocol": np.full(n_dns, "UDP"),
    "src_ip_count": np.ones(n_dns, dtype=int),
    "dst_ip_count": np.ones(n_dns, dtype=int),
    # Firewall-specific fields
    "action": np.full(n_dns, "allow"),
    "src_zone": np.random.choice(["trust", "dmz", "guest"], n_dns, p=[0.6, 0.2, 0.2]),
    "dst_zone": np.full(n_dns, "untrust"),
    "rule_id": np.full(n_dns, 301),  # DNS rule
    "app_id": np.full(n_dns, "dns"),
    "threat_category": np.full(n_dns, "none"),
    "url_category": np.full(n_dns, "none"),
    "attack_type": "normal",
    "label": 0,
}

# SSH sessions
n_ssh = 100
ssh_traffic = {
    "bytes_sent": np.random.lognormal(9, 1.5, n_ssh),
    "bytes_recv": np.random.lognormal(10, 1.8, n_ssh),
    "packets_sent": np.random.poisson(200, n_ssh),
    "packets_recv": np.random.poisson(250, n_ssh),
    "duration": np.random.exponential(300, n_ssh),  # Long sessions
    "dst_port": np.full(n_ssh, 22),
    "protocol": np.full(n_ssh, "TCP"),
    "src_ip_count": np.ones(n_ssh, dtype=int),
    "dst_ip_count": np.ones(n_ssh, dtype=int),
    # Firewall-specific fields
    "action": np.full(n_ssh, "allow"),
    "src_zone": np.full(n_ssh, "trust"),
    "dst_zone": np.random.choice(["dmz", "untrust"], n_ssh, p=[0.7, 0.3]),
    "rule_id": np.full(n_ssh, 401),  # Admin SSH rule
    "app_id": np.full(n_ssh, "ssh"),
    "threat_category": np.full(n_ssh, "none"),
    "url_category": np.full(n_ssh, "none"),
    "attack_type": "normal",
    "label": 0,
}

# Database connections
n_db = 200
db_traffic = {
    "bytes_sent": np.random.lognormal(7, 1.2, n_db),
    "bytes_recv": np.random.lognormal(11, 1.5, n_db),
    "packets_sent": np.random.poisson(50, n_db),
    "packets_recv": np.random.poisson(150, n_db),
    "duration": np.random.exponential(1, n_db),
    "dst_port": np.random.choice([3306, 5432, 1433, 27017], n_db),
    "protocol": np.full(n_db, "TCP"),
    "src_ip_count": np.ones(n_db, dtype=int),
    "dst_ip_count": np.ones(n_db, dtype=int),
    # Firewall-specific fields
    "action": np.full(n_db, "allow"),
    "src_zone": np.full(n_db, "trust"),
    "dst_zone": np.full(n_db, "dmz"),
    "rule_id": np.full(n_db, 501),  # Database access rule
    "app_id": np.random.choice(["mysql", "postgresql", "mssql", "mongodb"], n_db),
    "threat_category": np.full(n_db, "none"),
    "url_category": np.full(n_db, "none"),
    "attack_type": "normal",
    "label": 0,
}

# API traffic
n_api = 400
api_traffic = {
    "bytes_sent": np.random.lognormal(7, 0.8, n_api),
    "bytes_recv": np.random.lognormal(8, 1.0, n_api),
    "packets_sent": np.random.poisson(10, n_api),
    "packets_recv": np.random.poisson(15, n_api),
    "duration": np.random.exponential(0.5, n_api),
    "dst_port": np.random.choice([443, 8443, 8080], n_api),
    "protocol": np.full(n_api, "TCP"),
    "src_ip_count": np.ones(n_api, dtype=int),
    "dst_ip_count": np.ones(n_api, dtype=int),
    # Firewall-specific fields
    "action": np.full(n_api, "allow"),
    "src_zone": np.random.choice(["trust", "dmz"], n_api, p=[0.6, 0.4]),
    "dst_zone": np.full(n_api, "untrust"),
    "rule_id": np.random.choice([102, 601], n_api),  # API rules
    "app_id": np.random.choice(["ssl", "http2", "rest-api"], n_api),
    "threat_category": np.full(n_api, "none"),
    "url_category": np.random.choice(["cloud-services", "saas", "business"], n_api),
    "attack_type": "normal",
    "label": 0,
}

# ============================================================
# ATTACK TRAFFIC - Multiple attack categories with MITRE mapping
# ============================================================

# Attack Type 1: PORT SCANNING (T1046 - Network Service Discovery)
n_scan = 50
port_scan = {
    "bytes_sent": np.random.normal(60, 10, n_scan),  # Small SYN packets
    "bytes_recv": np.random.choice([0, 40], n_scan, p=[0.7, 0.3]),  # Mostly no response
    "packets_sent": np.random.randint(100, 1000, n_scan),  # Many probes
    "packets_recv": np.random.randint(0, 100, n_scan),
    "duration": np.random.uniform(1, 30, n_scan),
    "dst_port": np.random.randint(1, 65535, n_scan),  # Random ports
    "protocol": np.full(n_scan, "TCP"),
    "src_ip_count": np.ones(n_scan, dtype=int),
    "dst_ip_count": np.random.randint(50, 500, n_scan),  # Many destinations
    # Firewall-specific fields
    "action": np.random.choice(["deny", "drop", "alert"], n_scan, p=[0.4, 0.4, 0.2]),
    "src_zone": np.random.choice(["untrust", "guest"], n_scan, p=[0.8, 0.2]),
    "dst_zone": np.random.choice(["trust", "dmz"], n_scan, p=[0.6, 0.4]),
    "rule_id": np.full(n_scan, 999),  # Implicit deny rule
    "app_id": np.full(n_scan, "incomplete"),
    "threat_category": np.full(n_scan, "scan"),
    "url_category": np.full(n_scan, "none"),
    "attack_type": "port_scan",
    "label": 1,
}

# Attack Type 2: BRUTE FORCE SSH (T1110 - Brute Force)
n_brute = 40
brute_force = {
    "bytes_sent": np.random.normal(500, 100, n_brute),  # Login attempts
    "bytes_recv": np.random.normal(200, 50, n_brute),
    "packets_sent": np.random.randint(50, 200, n_brute),  # Repeated attempts
    "packets_recv": np.random.randint(50, 200, n_brute),
    "duration": np.random.uniform(60, 600, n_brute),  # Long duration
    "dst_port": np.random.choice([22, 3389, 21, 23], n_brute),  # Auth services
    "protocol": np.full(n_brute, "TCP"),
    "src_ip_count": np.ones(n_brute, dtype=int),
    "dst_ip_count": np.ones(n_brute, dtype=int),
    # Firewall-specific fields
    "action": np.random.choice(["deny", "alert"], n_brute, p=[0.6, 0.4]),
    "src_zone": np.full(n_brute, "untrust"),
    "dst_zone": np.random.choice(["trust", "dmz"], n_brute),
    "rule_id": np.random.choice([401, 999], n_brute),  # SSH rule or deny
    "app_id": np.random.choice(["ssh", "rdp", "ftp"], n_brute),
    "threat_category": np.full(n_brute, "brute-force"),
    "url_category": np.full(n_brute, "none"),
    "attack_type": "brute_force",
    "label": 1,
}

# Attack Type 3: C2 BEACONING (T1071 - Application Layer Protocol)
n_c2 = 50
c2_beacon = {
    "bytes_sent": np.random.normal(256, 50, n_c2),  # Regular beacon size
    "bytes_recv": np.random.normal(512, 100, n_c2),  # Command responses
    "packets_sent": np.random.poisson(5, n_c2),
    "packets_recv": np.random.poisson(8, n_c2),
    "duration": np.random.uniform(0.1, 2, n_c2),  # Short transactions
    "dst_port": np.random.choice([443, 80, 8080, 8443], n_c2),  # Blend with web
    "protocol": np.full(n_c2, "TCP"),
    "src_ip_count": np.ones(n_c2, dtype=int),
    "dst_ip_count": np.ones(n_c2, dtype=int),
    # Firewall-specific fields - C2 often evades detection initially
    "action": np.random.choice(["allow", "alert"], n_c2, p=[0.7, 0.3]),
    "src_zone": np.full(n_c2, "trust"),  # Compromised internal host
    "dst_zone": np.full(n_c2, "untrust"),
    "rule_id": np.random.choice([101, 102], n_c2),  # Allowed web traffic
    "app_id": np.random.choice(["ssl", "web-browsing", "unknown-tcp"], n_c2),
    "threat_category": np.random.choice(["none", "command-and-control"], n_c2, p=[0.6, 0.4]),
    "url_category": np.random.choice(["unknown", "newly-registered", "dynamic-dns"], n_c2),
    "attack_type": "c2_beacon",
    "label": 1,
}

# Attack Type 4: DATA EXFILTRATION (T1048 - Exfiltration Over Alternative Protocol)
n_exfil = 30
data_exfil = {
    "bytes_sent": np.random.lognormal(14, 1, n_exfil),  # Large uploads (10MB+)
    "bytes_recv": np.random.normal(500, 100, n_exfil),  # Small ACKs
    "packets_sent": np.random.randint(1000, 10000, n_exfil),
    "packets_recv": np.random.randint(100, 500, n_exfil),
    "duration": np.random.uniform(60, 3600, n_exfil),  # Long transfers
    "dst_port": np.random.choice([443, 53, 21, 22], n_exfil),
    "protocol": np.full(n_exfil, "TCP"),
    "src_ip_count": np.ones(n_exfil, dtype=int),
    "dst_ip_count": np.ones(n_exfil, dtype=int),
    # Firewall-specific fields
    "action": np.random.choice(["allow", "alert"], n_exfil, p=[0.5, 0.5]),
    "src_zone": np.full(n_exfil, "trust"),
    "dst_zone": np.full(n_exfil, "untrust"),
    "rule_id": np.random.choice([102, 301], n_exfil),
    "app_id": np.random.choice(["ssl", "ftp", "ssh", "dns"], n_exfil),
    "threat_category": np.random.choice(["none", "data-theft"], n_exfil, p=[0.6, 0.4]),
    "url_category": np.random.choice(["file-sharing", "cloud-storage", "unknown"], n_exfil),
    "attack_type": "data_exfil",
    "label": 1,
}

# Attack Type 5: DNS TUNNELING (T1071.004 - DNS)
n_dns_tunnel = 40
dns_tunnel = {
    "bytes_sent": np.random.randint(200, 500, n_dns_tunnel),  # Large DNS queries
    "bytes_recv": np.random.randint(300, 800, n_dns_tunnel),  # Large TXT responses
    "packets_sent": np.random.randint(50, 200, n_dns_tunnel),  # Many queries
    "packets_recv": np.random.randint(50, 200, n_dns_tunnel),
    "duration": np.random.uniform(60, 600, n_dns_tunnel),
    "dst_port": np.full(n_dns_tunnel, 53),
    "protocol": np.full(n_dns_tunnel, "UDP"),
    "src_ip_count": np.ones(n_dns_tunnel, dtype=int),
    "dst_ip_count": np.ones(n_dns_tunnel, dtype=int),
    # Firewall-specific fields
    "action": np.random.choice(["allow", "alert"], n_dns_tunnel, p=[0.4, 0.6]),
    "src_zone": np.full(n_dns_tunnel, "trust"),
    "dst_zone": np.full(n_dns_tunnel, "untrust"),
    "rule_id": np.full(n_dns_tunnel, 301),  # DNS allowed but flagged
    "app_id": np.full(n_dns_tunnel, "dns"),
    "threat_category": np.full(n_dns_tunnel, "dns-tunneling"),
    "url_category": np.full(n_dns_tunnel, "none"),
    "attack_type": "dns_tunnel",
    "label": 1,
}

# Attack Type 6: DDoS VOLUMETRIC (T1498 - Network Denial of Service)
n_ddos = 30
ddos_attack = {
    "bytes_sent": np.random.lognormal(13, 0.5, n_ddos),  # High volume
    "bytes_recv": np.random.normal(0, 10, n_ddos).clip(0),  # Little response
    "packets_sent": np.random.randint(10000, 100000, n_ddos),  # Massive packets
    "packets_recv": np.random.randint(0, 100, n_ddos),
    "duration": np.random.uniform(30, 300, n_ddos),
    "dst_port": np.random.choice([80, 443, 53], n_ddos),
    "protocol": np.random.choice(["TCP", "UDP"], n_ddos),
    "src_ip_count": np.random.randint(100, 1000, n_ddos),  # Spoofed sources
    "dst_ip_count": np.ones(n_ddos, dtype=int),
    # Firewall-specific fields
    "action": np.random.choice(["drop", "deny"], n_ddos, p=[0.7, 0.3]),
    "src_zone": np.full(n_ddos, "untrust"),
    "dst_zone": np.random.choice(["dmz", "trust"], n_ddos),
    "rule_id": np.full(n_ddos, 999),  # Rate limiting or DoS protection
    "app_id": np.random.choice(["incomplete", "unknown-udp", "unknown-tcp"], n_ddos),
    "threat_category": np.full(n_ddos, "flood"),
    "url_category": np.full(n_ddos, "none"),
    "attack_type": "ddos",
    "label": 1,
}

# Attack Type 7: LATERAL MOVEMENT SMB (T1021.002 - SMB/Windows Admin Shares)
n_lateral = 40
lateral_movement = {
    "bytes_sent": np.random.lognormal(10, 1.2, n_lateral),
    "bytes_recv": np.random.lognormal(11, 1.5, n_lateral),
    "packets_sent": np.random.poisson(200, n_lateral),
    "packets_recv": np.random.poisson(250, n_lateral),
    "duration": np.random.exponential(10, n_lateral),
    "dst_port": np.random.choice([445, 135, 139, 5985], n_lateral),  # SMB/RPC/WinRM
    "protocol": np.full(n_lateral, "TCP"),
    "src_ip_count": np.ones(n_lateral, dtype=int),
    "dst_ip_count": np.random.randint(2, 20, n_lateral),  # Multiple internal hosts
    # Firewall-specific fields - internal traffic often allowed
    "action": np.random.choice(["allow", "alert"], n_lateral, p=[0.6, 0.4]),
    "src_zone": np.full(n_lateral, "trust"),
    "dst_zone": np.full(n_lateral, "trust"),  # Internal lateral movement
    "rule_id": np.random.choice([701, 702], n_lateral),  # Internal rules
    "app_id": np.random.choice(["ms-ds-smb", "msrpc", "winrm"], n_lateral),
    "threat_category": np.random.choice(["none", "lateral-movement"], n_lateral, p=[0.5, 0.5]),
    "url_category": np.full(n_lateral, "none"),
    "attack_type": "lateral_movement",
    "label": 1,
}

# Attack Type 8: CRYPTO MINING (T1496 - Resource Hijacking)
n_mining = 30
crypto_mining = {
    "bytes_sent": np.random.normal(1000, 200, n_mining),  # Share submissions
    "bytes_recv": np.random.normal(500, 100, n_mining),  # Work units
    "packets_sent": np.random.poisson(100, n_mining),
    "packets_recv": np.random.poisson(80, n_mining),
    "duration": np.random.uniform(3600, 86400, n_mining),  # Very long (hours)
    "dst_port": np.random.choice([3333, 4444, 8333, 14444], n_mining),  # Mining pools
    "protocol": np.full(n_mining, "TCP"),
    "src_ip_count": np.ones(n_mining, dtype=int),
    "dst_ip_count": np.ones(n_mining, dtype=int),
    # Firewall-specific fields
    "action": np.random.choice(["deny", "alert"], n_mining, p=[0.5, 0.5]),
    "src_zone": np.full(n_mining, "trust"),
    "dst_zone": np.full(n_mining, "untrust"),
    "rule_id": np.full(n_mining, 999),  # Often blocked by category
    "app_id": np.random.choice(["stratum", "bitcoin", "unknown-tcp"], n_mining),
    "threat_category": np.full(n_mining, "cryptocurrency"),
    "url_category": np.full(n_mining, "cryptocurrency"),
    "attack_type": "crypto_mining",
    "label": 1,
}

# ============================================================
# Combine all traffic
# ============================================================
all_traffic = [
    # Normal
    web_traffic,
    email_traffic,
    dns_traffic,
    ssh_traffic,
    db_traffic,
    api_traffic,
    # Attacks
    port_scan,
    brute_force,
    c2_beacon,
    data_exfil,
    dns_tunnel,
    ddos_attack,
    lateral_movement,
    crypto_mining,
]

df = pd.concat([pd.DataFrame(t) for t in all_traffic], ignore_index=True)
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

print(f"🔥 NGFW Firewall Traffic Log Statistics:")
print(f"   Total sessions: {len(df)}")
print(f"   Normal traffic: {len(df[df['label'] == 0])}")
print(f"   Attack traffic: {len(df[df['label'] == 1])}")
print(f"   Attack percentage: {100 * df['label'].mean():.1f}%")

print(f"\n🛡️ Firewall Actions:")
print(df["action"].value_counts().to_string())

print(f"\n🌐 Zone Distribution:")
print(f"   Source zones: {df['src_zone'].value_counts().to_dict()}")
print(f"   Dest zones: {df['dst_zone'].value_counts().to_dict()}")

print(f"\n⚠️ Threat Categories Detected:")
threat_counts = df[df["threat_category"] != "none"]["threat_category"].value_counts()
for threat, count in threat_counts.items():
    print(f"   {threat}: {count}")

print(f"\n📱 Top Application IDs:")
print(df["app_id"].value_counts().head(8).to_string())
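
As a sanity check on the statistics printed by the cell above, the class balance can be derived directly from the per-profile sample sizes it defines. The sketch below only repeats those sizes in two dictionaries (the dictionary names are assumptions). The resulting attack rate is one possible starting point for a contamination-style parameter such as `contamination` in scikit-learn's `IsolationForest`, though in real traffic that rate is not known in advance.

```python
# Sample sizes copied from the generation cell above.
normal_sizes = {"web": 600, "email": 200, "dns": 400, "ssh": 100, "db": 200, "api": 400}
attack_sizes = {
    "port_scan": 50,
    "brute_force": 40,
    "c2_beacon": 50,
    "data_exfil": 30,
    "dns_tunnel": 40,
    "ddos": 30,
    "lateral_movement": 40,
    "crypto_mining": 30,
}

total_normal = sum(normal_sizes.values())   # 1900
total_attack = sum(attack_sizes.values())   # 310
attack_rate = total_attack / (total_normal + total_attack)

print(f"Normal: {total_normal}, Attack: {total_attack}")
print(f"Attack rate: {100 * attack_rate:.1f}%")  # 14.0%
```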

In [None]:
# ============================================================
# LAYER 7 (APPLICATION LAYER) FEATURE GENERATION
# Deep Packet Inspection (DPI) style metadata
# ============================================================

def calculate_domain_entropy(length):
    """Simulate domain name entropy (higher = more random/DGA-like)"""
    # Normal domains: 2.5-3.5, DGA domains: 3.8-4.5
    return np.random.uniform(2.5, 3.5) if length < 20 else np.random.uniform(3.5, 4.5)

# --- HTTP Metadata (for web traffic) ---
http_mask = df["dst_port"].isin([80, 443, 8080, 8443])

# User-Agent scores (0=missing, 1=suspicious, 2=normal browser, 3=known good)
df["http_ua_score"] = 0
df.loc[http_mask & (df["label"] == 0), "http_ua_score"] = np.random.choice([2, 3], (http_mask & (df["label"] == 0)).sum(), p=[0.7, 0.3])
df.loc[http_mask & (df["label"] == 1), "http_ua_score"] = np.random.choice([0, 1, 2], (http_mask & (df["label"] == 1)).sum(), p=[0.4, 0.4, 0.2])

# HTTP methods (encoded: GET=1, POST=2, PUT=3, DELETE=4, OPTIONS=5, unusual=6)
df["http_method"] = 0
df.loc[http_mask & (df["label"] == 0), "http_method"] = np.random.choice([1, 2], (http_mask & (df["label"] == 0)).sum(), p=[0.7, 0.3])
df.loc[http_mask & (df["label"] == 1), "http_method"] = np.random.choice([1, 2, 6], (http_mask & (df["label"] == 1)).sum(), p=[0.3, 0.4, 0.3])

# Response code category (2xx=2, 3xx=3, 4xx=4, 5xx=5)
df["http_resp_code"] = 0
df.loc[http_mask & (df["label"] == 0), "http_resp_code"] = np.random.choice([2, 3, 4], (http_mask & (df["label"] == 0)).sum(), p=[0.85, 0.1, 0.05])
df.loc[http_mask & (df["label"] == 1), "http_resp_code"] = np.random.choice([2, 4, 5], (http_mask & (df["label"] == 1)).sum(), p=[0.5, 0.3, 0.2])

# Content-Type risk (0=none, 1=safe, 2=risky like exe/zip)
df["http_content_risk"] = 0
df.loc[http_mask & (df["label"] == 0), "http_content_risk"] = np.random.choice([1, 2], (http_mask & (df["label"] == 0)).sum(), p=[0.95, 0.05])
df.loc[http_mask & (df["label"] == 1), "http_content_risk"] = np.random.choice([1, 2], (http_mask & (df["label"] == 1)).sum(), p=[0.4, 0.6])

# --- DNS Metadata (for DNS traffic) ---
dns_mask = df["dst_port"] == 53

# DNS query type (1=A, 2=AAAA, 3=MX, 4=TXT, 5=CNAME, 6=NS)
df["dns_query_type"] = 0
df.loc[dns_mask & (df["label"] == 0), "dns_query_type"] = np.random.choice([1, 2, 3], (dns_mask & (df["label"] == 0)).sum(), p=[0.7, 0.2, 0.1])
df.loc[dns_mask & (df["label"] == 1), "dns_query_type"] = np.random.choice([1, 4, 5], (dns_mask & (df["label"] == 1)).sum(), p=[0.3, 0.5, 0.2])  # TXT for tunneling

# Domain entropy (DGA detection) - higher = more random
df["dns_domain_entropy"] = np.random.uniform(2.5, 3.5, len(df))  # Normal baseline
df.loc[dns_mask & (df["label"] == 1), "dns_domain_entropy"] = np.random.uniform(3.8, 4.5, (dns_mask & (df["label"] == 1)).sum())  # DGA-like

# Domain label count (subdomain depth)
df["dns_label_count"] = np.random.choice([2, 3, 4], len(df), p=[0.5, 0.35, 0.15])  # Normal: 2-4 labels
df.loc[dns_mask & (df["label"] == 1), "dns_label_count"] = np.random.choice([4, 5, 6, 7], (dns_mask & (df["label"] == 1)).sum(), p=[0.2, 0.3, 0.3, 0.2])  # Deep subdomains

# DNS response size (larger = suspicious for data exfil)
df["dns_resp_size"] = np.random.normal(150, 40, len(df)).clip(50, 400)
df.loc[dns_mask & (df["label"] == 1), "dns_resp_size"] = np.random.uniform(400, 800, (dns_mask & (df["label"] == 1)).sum())

# --- TLS Metadata (for HTTPS/encrypted traffic) ---
tls_mask = df["dst_port"].isin([443, 8443, 993, 995, 587])

# JA3 fingerprint risk score (0=unknown, 1=known malware, 2=suspicious, 3=normal)
df["tls_ja3_risk"] = 0
df.loc[tls_mask & (df["label"] == 0), "tls_ja3_risk"] = np.random.choice([0, 3], (tls_mask & (df["label"] == 0)).sum(), p=[0.1, 0.9])
df.loc[tls_mask & (df["label"] == 1), "tls_ja3_risk"] = np.random.choice([0, 1, 2], (tls_mask & (df["label"] == 1)).sum(), p=[0.3, 0.4, 0.3])

# Certificate validity (0=invalid/expired, 1=self-signed, 2=valid CA)
df["tls_cert_valid"] = 0
df.loc[tls_mask & (df["label"] == 0), "tls_cert_valid"] = np.random.choice([1, 2], (tls_mask & (df["label"] == 0)).sum(), p=[0.05, 0.95])
df.loc[tls_mask & (df["label"] == 1), "tls_cert_valid"] = np.random.choice([0, 1, 2], (tls_mask & (df["label"] == 1)).sum(), p=[0.3, 0.4, 0.3])

# Certificate age in days (negative = expired)
df["tls_cert_age"] = np.random.uniform(30, 365, len(df))  # Normal: 1 month to 1 year
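tls_attack_mask = tls_mask & (df["label"] == 1)
# Draw a fresh age per session from one of three ranges (expired, very new, young)
# instead of reusing three fixed values for every attack session
age_ranges = np.array([[-30, 0], [1, 10], [30, 100]])
picked = age_ranges[np.random.choice(len(age_ranges), tls_attack_mask.sum())]
df.loc[tls_attack_mask, "tls_cert_age"] = np.random.uniform(picked[:, 0], picked[:, 1])  # Expired or very new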

# SNI mismatch (domain doesn't match cert)
df["tls_sni_mismatch"] = 0
df.loc[tls_mask & (df["label"] == 1), "tls_sni_mismatch"] = np.random.choice([0, 1], (tls_mask & (df["label"] == 1)).sum(), p=[0.6, 0.4])

# --- Application Layer Summary ---
print("🔍 Layer 7 (Application Layer) Features Generated:")
print(f"\n   HTTP Features (port 80/443/8080/8443):")
print(f"      • User-Agent score (0-3)")
print(f"      • HTTP method encoded")
print(f"      • Response code category")
print(f"      • Content-Type risk level")
print(f"\n   DNS Features (port 53):")
print(f"      • Query type (A/AAAA/MX/TXT/CNAME)")
print(f"      • Domain entropy (DGA detection)")
print(f"      • Subdomain depth")
print(f"      • Response size")
print(f"\n   TLS Features (port 443/8443/993/995/587):")
print(f"      • JA3 fingerprint risk")
print(f"      • Certificate validity")
print(f"      • Certificate age")
print(f"      • SNI mismatch flag")

print(f"\n📊 Layer 7 Feature Statistics:")
print(f"   HTTP traffic with L7 data: {http_mask.sum()} flows")
print(f"   DNS traffic with L7 data: {dns_mask.sum()} flows")
print(f"   TLS traffic with L7 data: {tls_mask.sum()} flows")

## 2. Feature Engineering (L3-L7)

In [None]:
# Engineer comprehensive network features
df["duration"] = df["duration"].clip(lower=0.001)

# ============================================================
# Traffic Volume Features
# ============================================================
df["total_bytes"] = df["bytes_sent"] + df["bytes_recv"]
df["total_packets"] = df["packets_sent"] + df["packets_recv"]

# ============================================================
# Rate Features (key for detecting high-volume attacks)
# ============================================================
df["bytes_per_second"] = df["total_bytes"] / df["duration"]
df["packets_per_second"] = df["total_packets"] / df["duration"]
df["bytes_per_packet"] = df["total_bytes"] / (df["total_packets"] + 1)

# ============================================================
# Ratio Features (asymmetric traffic is suspicious)
# ============================================================
df["bytes_ratio"] = df["bytes_sent"] / (df["total_bytes"] + 1)  # >0.5 = more sent than recv
df["packets_ratio"] = df["packets_sent"] / (df["total_packets"] + 1)
df["send_recv_ratio"] = (df["bytes_sent"] + 1) / (df["bytes_recv"] + 1)

# ============================================================
# Port Features
# ============================================================
WELL_KNOWN_PORTS = [80, 443, 22, 25, 53, 143, 993, 995, 587, 3306, 5432]
SUSPICIOUS_PORTS = [4444, 8888, 31337, 6667, 1337, 3333, 14444, 8333, 4443]

df["is_well_known_port"] = df["dst_port"].isin(WELL_KNOWN_PORTS).astype(int)
df["is_suspicious_port"] = df["dst_port"].isin(SUSPICIOUS_PORTS).astype(int)
df["is_high_port"] = (df["dst_port"] > 1024).astype(int)

# ============================================================
# Connection Pattern Features (fan-out detection)
# ============================================================
df["is_multi_dest"] = (df["dst_ip_count"] > 1).astype(int)  # Many destinations = scan
df["is_multi_src"] = (df["src_ip_count"] > 1).astype(int)  # Many sources = DDoS

# ============================================================
# Protocol Features
# ============================================================
df["is_tcp"] = (df["protocol"] == "TCP").astype(int)
df["is_udp"] = (df["protocol"] == "UDP").astype(int)

# ============================================================
# FIREWALL-SPECIFIC Features
# ============================================================
# Action encoding (0=allow, 1=alert, 2=deny, 3=drop)
action_map = {"allow": 0, "alert": 1, "deny": 2, "drop": 3}
df["action_code"] = df["action"].map(action_map)
df["is_blocked"] = df["action"].isin(["deny", "drop"]).astype(int)
df["is_alert"] = (df["action"] == "alert").astype(int)

# Zone encoding
zone_map = {"trust": 0, "dmz": 1, "guest": 2, "untrust": 3}
df["src_zone_code"] = df["src_zone"].map(zone_map)
df["dst_zone_code"] = df["dst_zone"].map(zone_map)
df["zone_transition"] = df["src_zone_code"] * 4 + df["dst_zone_code"]  # Unique zone pair
df["is_external_inbound"] = ((df["src_zone"] == "untrust") & (df["dst_zone"] != "untrust")).astype(int)
df["is_internal_lateral"] = ((df["src_zone"] == "trust") & (df["dst_zone"] == "trust")).astype(int)

# Threat category encoding
df["has_threat"] = (df["threat_category"] != "none").astype(int)

# Rule ID features
df["is_implicit_deny"] = (df["rule_id"] == 999).astype(int)

# ============================================================
# Log-transformed features (handle extreme values)
# ============================================================
df["log_bytes"] = np.log1p(df["total_bytes"])
df["log_packets"] = np.log1p(df["total_packets"])
df["log_duration"] = np.log1p(df["duration"])
df["log_bps"] = np.log1p(df["bytes_per_second"])
df["log_pps"] = np.log1p(df["packets_per_second"])

print("📊 Engineered Features Summary:")
print(f"   L3/L4 Network features: 16")
print(f"   L7 Application features: 12")
print(f"   Firewall features: 8")
print(f"   Total features: 36")

print("\n📈 Key L3/L4 Feature Statistics:")
key_features = ["bytes_per_second", "packets_per_second", "bytes_ratio", "duration"]
for feat in key_features:
    print(f"   {feat}: Normal={df[df['label']==0][feat].mean():.2f}, Attack={df[df['label']==1][feat].mean():.2f}")

print("\n🔍 Key L7 Feature Statistics:")
l7_features = ["dns_domain_entropy", "http_ua_score", "tls_ja3_risk"]
for feat in l7_features:
    print(f"   {feat}: Normal={df[df['label']==0][feat].mean():.2f}, Attack={df[df['label']==1][feat].mean():.2f}")

print("\n🔥 Firewall Feature Statistics:")
print(f"   Blocked sessions: {df['is_blocked'].sum()} ({100*df['is_blocked'].mean():.1f}%)")
print(f"   Alert sessions: {df['is_alert'].sum()} ({100*df['is_alert'].mean():.1f}%)")
print(f"   External inbound: {df['is_external_inbound'].sum()}")
print(f"   Internal lateral: {df['is_internal_lateral'].sum()}")
print(f"   Implicit deny hits: {df['is_implicit_deny'].sum()}")
print(f"   Threat detections: {df['has_threat'].sum()}")
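
To see why the ratio features in the cell above help separate asymmetric traffic, it can be useful to work them through on two toy sessions. The byte counts below are made-up illustrative values, not rows from the generated dataset, and `ratio_features` is a hypothetical helper that repeats the same two formulas.

```python
def ratio_features(bytes_sent: float, bytes_recv: float) -> dict:
    """Recompute two of the ratio features for a single session."""
    total_bytes = bytes_sent + bytes_recv
    return {
        # Share of traffic that was sent; the +1 avoids division by zero
        "bytes_ratio": bytes_sent / (total_bytes + 1),
        # Sent-to-received ratio; +1 on both sides keeps it finite
        "send_recv_ratio": (bytes_sent + 1) / (bytes_recv + 1),
    }


# Toy sessions (assumed values): a page download vs. a large upload.
browsing = ratio_features(bytes_sent=1_000, bytes_recv=20_000)
exfil = ratio_features(bytes_sent=1_200_000, bytes_recv=500)

for name, feats in [("browsing", browsing), ("exfil", exfil)]:
    print(f"{name}: bytes_ratio={feats['bytes_ratio']:.3f}, "
          f"send_recv_ratio={feats['send_recv_ratio']:.2f}")
```

The download-heavy session lands near 0 on `bytes_ratio` while the upload-heavy one lands near 1, and `send_recv_ratio` spreads the two across several orders of magnitude.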

In [None]:
# Interactive feature distribution comparison with Plotly
features_to_plot = ["bytes_per_second", "packets_per_second", "bytes_ratio", "duration"]

fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=[f"{f} Distribution" for f in features_to_plot]
)

positions = [(1, 1), (1, 2), (2, 1), (2, 2)]

for feature, (row, col) in zip(features_to_plot, positions):
    # Normal traffic
    fig.add_trace(
        go.Histogram(
            x=df[df["label"] == 0][feature],
            name="Normal",
            opacity=0.6,
            marker_color="#2ecc71",
            legendgroup="normal",
            showlegend=(row == 1 and col == 1),
            histnorm="probability density",
        ),
        row=row, col=col
    )
    # Anomaly traffic
    fig.add_trace(
        go.Histogram(
            x=df[df["label"] == 1][feature],
            name="Anomaly",
            opacity=0.6,
            marker_color="#e74c3c",
            legendgroup="anomaly",
            showlegend=(row == 1 and col == 1),
            histnorm="probability density",
        ),
        row=row, col=col
    )

fig.update_layout(
    height=700,
    width=1000,
    template=PLOTLY_TEMPLATE,
    title_text="Feature Distributions: Normal vs Anomaly Traffic",
    barmode="overlay",
    legend=dict(orientation="h", yanchor="bottom", y=-0.12),
)
fig.show()

## 3. Prepare Features for Anomaly Detection
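
This section scales features with `RobustScaler`, which centers on the median and divides by the interquartile range (IQR) rather than using the mean and standard deviation. The standard-library sketch below illustrates that idea on a toy column with one extreme value. The helper names and data are assumptions, and this is not scikit-learn's implementation.

```python
import statistics


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # guard against a zero IQR
    return [(v - median) / iqr for v in values]


def standard_scale(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0
    return [(v - mean) / std for v in values]


# Toy column: five ordinary sessions and one extreme outlier.
sample = [10, 12, 11, 13, 12, 10_000]

robust = robust_scale(sample)
standard = standard_scale(sample)

# Spread of the five ordinary values after each scaling
robust_spread = max(robust[:5]) - min(robust[:5])
standard_spread = max(standard[:5]) - min(standard[:5])
print(f"robust spread:   {robust_spread:.4f}")
print(f"standard spread: {standard_spread:.4f}")
```

With mean/standard-deviation scaling, the single outlier inflates the scale so much that the ordinary values collapse onto nearly the same point. Median/IQR scaling keeps them distinguishable, which matters when heavy-tailed features such as byte rates are present.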

In [None]:
# Select comprehensive L3-L7 + Firewall features for anomaly detection
feature_cols = [
    # === L3/L4 FEATURES (Network/Transport Layer) ===
    # Rate features (most discriminative)
    "log_bps",
    "log_pps",
    "bytes_per_packet",
    # Ratio features (detect asymmetric traffic)
    "bytes_ratio",
    "packets_ratio",
    "send_recv_ratio",
    # Duration (detect long-running or burst attacks)
    "log_duration",
    # Volume features
    "log_bytes",
    "log_packets",
    # Port-based features
    "is_well_known_port",
    "is_suspicious_port",
    "is_high_port",
    # Connection pattern features
    "is_multi_dest",
    "is_multi_src",
    # Protocol features
    "is_tcp",
    "is_udp",
    
    # === L7 FEATURES (Application Layer - NGFW/DPI) ===
    # HTTP inspection
    "http_ua_score",
    "http_method",
    "http_resp_code",
    "http_content_risk",
    # DNS inspection (DGA/tunneling detection)
    "dns_query_type",
    "dns_domain_entropy",
    "dns_label_count",
    "dns_resp_size",
    # TLS inspection
    "tls_ja3_risk",
    "tls_cert_valid",
    "tls_cert_age",
    "tls_sni_mismatch",
    
    # === FIREWALL-SPECIFIC FEATURES ===
    "action_code",
    "is_blocked",
    "is_alert",
    "zone_transition",
    "is_external_inbound",
    "is_internal_lateral",
    "has_threat",
    "is_implicit_deny",
]

X = df[feature_cols].values
y = df["label"].values
attack_types = df["attack_type"].values

# Use RobustScaler for outlier-robust scaling
scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)

print("📊 Feature Matrix:")
print(f"   Shape: {X_scaled.shape}")
print(f"   Total Features: {len(feature_cols)}")

# Group features by layer using name prefixes; anything that matches neither
# an L7 nor a firewall prefix is treated as an L3/L4 feature
l7_prefixes = ("http_", "dns_", "tls_")
fw_prefixes = ("action", "is_blocked", "is_alert", "zone", "is_external", "is_internal", "has_threat", "is_implicit")

l34_features = [f for f in feature_cols if not f.startswith(l7_prefixes + fw_prefixes)]
l7_features = [f for f in feature_cols if f.startswith(l7_prefixes)]
fw_features = [f for f in feature_cols if f.startswith(fw_prefixes)]

print(f"\n📋 L3/L4 Network Features ({len(l34_features)}):")
for col in l34_features:
    print(f"   • {col}")

print(f"\n🔍 L7 Application Features ({len(l7_features)}):")
for col in l7_features:
    layer = col.split("_")[0].upper()
    print(f"   • [{layer}] {col}")

print(f"\n🔥 Firewall Features ({len(fw_features)}):")
for col in fw_features:
    print(f"   • {col}")
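
The choice of `RobustScaler` in the cell above can be illustrated with a minimal sketch. The data here is synthetic and purely hypothetical (a single byte-rate column with a handful of extreme bursts), not the lab dataset. It compares how much spread the ordinary flows keep after each scaler.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

rng = np.random.default_rng(0)

# Hypothetical byte-rate column: 990 ordinary flows plus 10 extreme bursts
ordinary = rng.normal(loc=100.0, scale=10.0, size=990)
bursts = np.full(10, 100_000.0)
rates = np.concatenate([ordinary, bursts]).reshape(-1, 1)

# StandardScaler uses mean/std (both pulled by the bursts);
# RobustScaler uses median/IQR (largely unaffected by them)
std_scaled = StandardScaler().fit_transform(rates)
rob_scaled = RobustScaler().fit_transform(rates)

# Spread of the ordinary flows after each scaling
std_spread = std_scaled[:990].std()
rob_spread = rob_scaled[:990].std()

print(f"StandardScaler spread of ordinary flows: {std_spread:.4f}")
print(f"RobustScaler spread of ordinary flows:   {rob_spread:.4f}")
```

With mean/std scaling, the bursts inflate the standard deviation so much that the ordinary flows collapse to nearly a single value. Median/IQR scaling keeps their variation visible to the detectors.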

## 4. Isolation Forest

In [None]:
# Train Isolation Forest
iso_forest = IsolationForest(
    n_estimators=100,
    contamination=0.1,  # Expected proportion of anomalies
    random_state=42,
)

# Predict: -1 for anomaly, 1 for normal
iso_pred = iso_forest.fit_predict(X_scaled)

# Convert to binary (1 for anomaly, 0 for normal)
iso_pred_binary = (iso_pred == -1).astype(int)

print("Isolation Forest Results:")
print(f"Predicted anomalies: {iso_pred_binary.sum()}")
print(f"Actual anomalies: {y.sum()}")
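
The gap between predicted and actual anomaly counts is largely driven by `contamination`. A minimal sketch on synthetic data (a hypothetical stand-in for `X_scaled`, with 5% shifted rows) suggests that the flagged fraction tracks the `contamination` value rather than the true attack rate:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic stand-in for X_scaled: 950 "normal" rows and 50 shifted "attack" rows
X_demo = np.vstack([
    rng.normal(0.0, 1.0, size=(950, 5)),
    rng.normal(5.0, 1.0, size=(50, 5)),
])

flagged_fraction = {}
for c in (0.01, 0.05, 0.10, 0.20):
    model = IsolationForest(n_estimators=100, contamination=c, random_state=42)
    pred = model.fit_predict(X_demo)  # -1 = anomaly, 1 = normal
    flagged_fraction[c] = float((pred == -1).mean())
    print(f"contamination={c:.2f} -> flagged {flagged_fraction[c]:.1%} of flows")
```

Because the decision threshold is set from a percentile of the training scores, `contamination` behaves more like an alert budget than an estimate the model learns. It is worth tuning against the expected attack rate.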

In [None]:
# Evaluate Isolation Forest
precision = precision_score(y, iso_pred_binary)
recall = recall_score(y, iso_pred_binary)
f1 = f1_score(y, iso_pred_binary)

print("Isolation Forest Metrics:")
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
print(f"F1 Score: {f1:.3f}")

# Interactive confusion matrix with Plotly
cm = confusion_matrix(y, iso_pred_binary)
labels = ["Normal", "Anomaly"]

# Create text annotations with count and percentage
cm_text = [[f"{cm[i][j]}<br>({cm[i][j]/cm.sum()*100:.1f}%)" for j in range(2)] for i in range(2)]

fig = go.Figure(data=go.Heatmap(
    z=cm,
    x=labels,
    y=labels,
    text=cm_text,
    texttemplate="%{text}",
    colorscale="Blues",
    showscale=True,
    hovertemplate="Actual: %{y}<br>Predicted: %{x}<br>Count: %{z}<extra></extra>",
))

fig.update_layout(
    title="Isolation Forest Confusion Matrix",
    xaxis_title="Predicted",
    yaxis_title="Actual",
    yaxis_autorange="reversed",  # Put "Normal" on the top row, as in a conventional confusion matrix
    template=PLOTLY_TEMPLATE,
    width=500,
    height=450,
)
fig.show()

## 5. One-Class SVM

In [None]:
# Train One-Class SVM
# Note: for simplicity this cell fits on ALL traffic (normal + attack).
# Proper one-class learning would fit on normal traffic only.
ocsvm = OneClassSVM(
    kernel="rbf",
    gamma="scale",
    nu=0.1,  # Upper bound on the fraction of training points treated as outliers
)

ocsvm_pred = ocsvm.fit_predict(X_scaled)
ocsvm_pred_binary = (ocsvm_pred == -1).astype(int)

print("One-Class SVM Results:")
print(f"Predicted anomalies: {ocsvm_pred_binary.sum()}")
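
For comparison, here is a minimal sketch of the "train on normal traffic only" setup. It assumes labels are available to select a clean baseline and uses synthetic data, not the lab dataset. All names ending in `_demo` and the value `nu=0.05` are illustrative choices.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)

# Synthetic stand-in: 500 normal rows, 50 attack rows far from the normal cloud
X_normal_demo = rng.normal(0.0, 1.0, size=(500, 4))
X_attack_demo = rng.normal(6.0, 1.0, size=(50, 4))
X_all_demo = np.vstack([X_normal_demo, X_attack_demo])
y_demo = np.r_[np.zeros(500, dtype=int), np.ones(50, dtype=int)]

# Fit the scaler and the model on normal traffic only
scaler_demo = RobustScaler().fit(X_normal_demo)
ocsvm_demo = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
ocsvm_demo.fit(scaler_demo.transform(X_normal_demo))

# Score everything, including traffic the model never saw during training
pred_demo = (ocsvm_demo.predict(scaler_demo.transform(X_all_demo)) == -1).astype(int)
attack_recall = pred_demo[y_demo == 1].mean()
normal_fp_rate = pred_demo[y_demo == 0].mean()

print(f"Attack recall:            {attack_recall:.2f}")
print(f"Normal false-positive rate: {normal_fp_rate:.2f}")
```

In this setup `nu` only bounds the share of normal training rows that fall outside the learned boundary, so it acts as a false-positive budget rather than an attack-rate estimate.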

## 6. Local Outlier Factor

In [None]:
# Train Local Outlier Factor
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)

lof_pred = lof.fit_predict(X_scaled)
lof_pred_binary = (lof_pred == -1).astype(int)

print("LOF Results:")
print(f"Predicted anomalies: {lof_pred_binary.sum()}")
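
With the default settings used above, `LocalOutlierFactor` only labels the rows it was fitted on. A minimal sketch of scoring unseen flows follows, assuming a clean training history is available. The data is synthetic and the variable names are illustrative.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(7)
X_train_demo = rng.normal(0.0, 1.0, size=(500, 2))  # synthetic "normal" history

# novelty=True enables predict() and score_samples() on rows not seen in fit()
lof_novelty = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof_novelty.fit(X_train_demo)

X_new_demo = np.array([
    [0.1, -0.2],   # close to the training cloud
    [10.0, 10.0],  # far from anything seen in training
])
new_pred = lof_novelty.predict(X_new_demo)           # 1 = inlier, -1 = outlier
new_scores = -lof_novelty.score_samples(X_new_demo)  # higher = more anomalous

for row, p, s in zip(X_new_demo, new_pred, new_scores):
    print(f"{row} -> pred={p:+d}, score={s:.2f}")
```

This mode suits a deployment where the model is fitted once on historical traffic and then applied to incoming sessions.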

## 7. Compare All Models

In [None]:
# Compare all models
models = {
    "Isolation Forest": iso_pred_binary,
    "One-Class SVM": ocsvm_pred_binary,
    "Local Outlier Factor": lof_pred_binary,
}

results = []
for name, pred in models.items():
    results.append(
        {
            "Model": name,
            "Precision": precision_score(y, pred),
            "Recall": recall_score(y, pred),
            "F1": f1_score(y, pred),
        }
    )

results_df = pd.DataFrame(results)
print("Model Comparison:")
print(results_df.to_string(index=False))

# Interactive model comparison with Plotly
fig = go.Figure()

colors = {"Precision": "#3498db", "Recall": "#2ecc71", "F1": "#e74c3c"}

for metric in ["Precision", "Recall", "F1"]:
    fig.add_trace(go.Bar(
        name=metric,
        x=results_df["Model"],
        y=results_df[metric],
        marker_color=colors[metric],
        hovertemplate=f"<b>%{{x}}</b><br>{metric}: %{{y:.3f}}<extra></extra>",
    ))

fig.update_layout(
    title="Anomaly Detection Model Comparison",
    xaxis_title="Model",
    yaxis_title="Score",
    barmode="group",
    template=PLOTLY_TEMPLATE,
    height=450,
    width=800,
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="center", x=0.5),
)
fig.show()
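
Precision, recall, and F1 all depend on the chosen `contamination` or `nu` threshold. A threshold-free comparison is possible with `roc_auc_score` on the continuous scores. The following is a minimal sketch on synthetic data (scattered outliers around a Gaussian cloud), with `_demo` names that are illustrative only:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in: 500 normal rows plus 50 outliers scattered over a wide box
X_demo = np.vstack([
    rng.normal(0.0, 1.0, size=(500, 4)),
    rng.uniform(-8.0, 8.0, size=(50, 4)),
])
y_demo = np.r_[np.zeros(500, dtype=int), np.ones(50, dtype=int)]

iso_demo = IsolationForest(n_estimators=100, contamination=0.1, random_state=42).fit(X_demo)
svm_demo = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_demo)
lof_demo = LocalOutlierFactor(n_neighbors=20, contamination=0.1).fit(X_demo)

# Negate each score so that higher always means "more anomalous"
scores_demo = {
    "Isolation Forest": -iso_demo.score_samples(X_demo),
    "One-Class SVM": -svm_demo.decision_function(X_demo),
    "Local Outlier Factor": -lof_demo.negative_outlier_factor_,
}

auc_demo = {name: roc_auc_score(y_demo, s) for name, s in scores_demo.items()}
for name, value in auc_demo.items():
    print(f"{name:22s} ROC AUC = {value:.3f}")
```

ROC AUC measures how well each model ranks attacks above normal traffic, independent of where the alert threshold is placed.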

## 8. Anomaly Score Analysis

In [None]:
# Get anomaly scores from Isolation Forest
# score_samples() returns lower values for anomalies, so negate it: higher = more anomalous
anomaly_scores = -iso_forest.score_samples(X_scaled)
df["anomaly_score"] = anomaly_scores

# Interactive anomaly score distribution with Plotly
threshold_90 = np.percentile(anomaly_scores, 90)

fig = go.Figure()

# Normal traffic histogram
fig.add_trace(go.Histogram(
    x=df[df["label"] == 0]["anomaly_score"],
    name="Normal",
    opacity=0.6,
    marker_color="#2ecc71",
    histnorm="probability density",
    hovertemplate="Score: %{x:.3f}<br>Density: %{y:.4f}<extra>Normal</extra>",
))

# Anomaly traffic histogram
fig.add_trace(go.Histogram(
    x=df[df["label"] == 1]["anomaly_score"],
    name="Anomaly",
    opacity=0.6,
    marker_color="#e74c3c",
    histnorm="probability density",
    hovertemplate="Score: %{x:.3f}<br>Density: %{y:.4f}<extra>Anomaly</extra>",
))

# 90th percentile threshold line
fig.add_vline(
    x=threshold_90,
    line_dash="dash",
    line_color="black",
    annotation_text=f"90th percentile ({threshold_90:.3f})",
    annotation_position="top right",
)

fig.update_layout(
    title="Anomaly Score Distribution (Isolation Forest)",
    xaxis_title="Anomaly Score",
    yaxis_title="Density",
    barmode="overlay",
    template=PLOTLY_TEMPLATE,
    height=450,
    width=900,
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="center", x=0.5),
)
fig.show()
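
The 90th-percentile line is a fixed cut. When labels are available, a threshold can instead be chosen from the precision-recall curve. This is a minimal sketch on synthetic data with a 5% attack rate; the `_demo` names are illustrative, and picking the F1-optimal threshold is one option among several.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score, precision_recall_curve

rng = np.random.default_rng(1)

# Synthetic stand-in: 950 normal rows, 50 moderately shifted attack rows
X_demo = np.vstack([
    rng.normal(0.0, 1.0, size=(950, 4)),
    rng.normal(3.0, 1.0, size=(50, 4)),
])
y_demo = np.r_[np.zeros(950, dtype=int), np.ones(50, dtype=int)]

forest_demo = IsolationForest(n_estimators=100, random_state=42).fit(X_demo)
scores_demo = -forest_demo.score_samples(X_demo)  # higher = more anomalous

# Baseline: the fixed 90th-percentile cut
p90 = np.percentile(scores_demo, 90)
f1_p90 = f1_score(y_demo, (scores_demo >= p90).astype(int))

# Sweep every candidate threshold and keep the one with the best F1
precision, recall, thresholds = precision_recall_curve(y_demo, scores_demo)
f1_curve = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best_idx = int(np.argmax(f1_curve))
best_threshold = thresholds[best_idx]
best_f1 = f1_curve[best_idx]

print(f"90th percentile: threshold={p90:.3f}, F1={f1_p90:.3f}")
print(f"Best F1:         threshold={best_threshold:.3f}, F1={best_f1:.3f}")
```

A threshold tuned this way is fitted to the labeled sample, so it should be validated on held-out traffic before being used for alerting.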

In [None]:
# Show top anomalies with attack type classification
print("🚨 Top 15 Most Anomalous Flows:")
print("=" * 80)

top_anomalies = df.nlargest(15, "anomaly_score")[
    [
        "attack_type",
        "dst_port",
        "bytes_sent",
        "bytes_recv",
        "packets_sent",
        "duration",
        "anomaly_score",
        "label",
    ]
]
print(top_anomalies.to_string())

# Detection by attack type
print("\n\n📊 Detection Performance by Attack Type:")
print("=" * 80)

df["predicted"] = iso_pred_binary

for attack_type in df["attack_type"].unique():
    subset = df[df["attack_type"] == attack_type]
    if attack_type == "normal":
        # For normal, we want low false positive rate
        fp = subset["predicted"].sum()
        fp_rate = 100 * fp / len(subset)
        print(f"   {attack_type:18s}: {len(subset):4d} flows, False Positive Rate: {fp_rate:.1f}%")
    else:
        # For attacks, we want high detection rate
        detected = subset["predicted"].sum()
        detection_rate = 100 * detected / len(subset)
        print(
            f"   {attack_type:18s}: {len(subset):4d} flows, Detection Rate: {detection_rate:.1f}%"
        )

# Summary of which attacks are hardest to detect
print("\n\n🎯 Attack Detection Summary:")
hardest_to_detect = []
for attack_type in df[df["label"] == 1]["attack_type"].unique():
    subset = df[df["attack_type"] == attack_type]
    detected = subset["predicted"].sum()
    detection_rate = 100 * detected / len(subset)
    hardest_to_detect.append((attack_type, detection_rate, len(subset)))

hardest_to_detect.sort(key=lambda x: x[1])
print("   Attacks ranked by detection difficulty (hardest first):")
for attack, rate, count in hardest_to_detect:
    status = "🔴" if rate < 50 else "🟡" if rate < 80 else "🟢"
    print(f"   {status} {attack}: {rate:.1f}% ({count} samples)")
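
The per-type loops above can also be written as a single `groupby`. This is a minimal sketch on a tiny hand-made frame; the attack-type labels and values are hypothetical, and the `attack_type` and `predicted` columns mirror those used in this cell.

```python
import pandas as pd

# Hypothetical miniature of df[["attack_type", "predicted"]]
demo = pd.DataFrame({
    "attack_type": ["normal", "normal", "normal", "normal",
                    "port_scan", "port_scan", "c2_beacon", "c2_beacon"],
    "predicted":   [0, 0, 1, 0, 1, 1, 0, 1],
})

# One row per traffic type: flow count, flagged count, and flagged share
rates = (
    demo.groupby("attack_type")["predicted"]
    .agg(flows="size", flagged="sum", rate="mean")
    .sort_values("rate")
)
print(rates)
```

For attack types, `rate` is the detection rate; for the `normal` row, the same column is the false-positive rate.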

## Summary

In this lab, we built a **Next-Gen Firewall (NGFW) anomaly detection system** using real firewall traffic logs with Layer 7 deep packet inspection.

### Firewall Traffic Log Features

Unlike NetFlow, firewall logs provide:
- **Actions**: allow, deny, drop, alert
- **Zones**: trust, untrust, dmz, guest
- **Threat Categories**: scan, brute-force, C2, flood, etc.
- **Application IDs**: ssl, dns, ssh, stratum, etc.
- **URL Categories**: business, malware, cryptocurrency, etc.

### Dataset Characteristics
- **Normal Traffic**: Web, Email, DNS, SSH, Database, API (all action=allow)
- **Attack Traffic**: Various actions (deny, drop, alert) with threat categories
- **Zone Transitions**: Trust→Untrust, Untrust→Trust, Internal lateral

### Feature Engineering (36 Features)

| Layer | Features | Examples |
|-------|----------|----------|
| **L3/L4** | 16 | Rate, ratio, duration, port, protocol |
| **L7 (DPI)** | 12 | HTTP UA, DNS entropy, TLS JA3, cert age |
| **Firewall** | 8 | Action, zones, threat, rule hits |

### Firewall-Specific Detection

| Feature | Definition | Meaning |
|---------|------------|---------|
| `is_blocked` | action ∈ {deny, drop} | Traffic was stopped |
| `is_alert` | action = alert | Suspicious but allowed |
| `is_external_inbound` | untrust → internal | Potential attack vector |
| `is_internal_lateral` | trust → trust | Lateral movement |
| `has_threat` | threat ≠ none | IPS/AV detection |
| `is_implicit_deny` | rule_id = 999 | No matching rule (scan) |
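
The definitions in the table can be sketched in pandas as follows. The raw column names (`action`, `src_zone`, `dst_zone`, `threat_category`, `rule_id`) and the set of zones treated as internal are assumptions made for illustration; the lab's actual log schema may differ.

```python
import pandas as pd

# Hypothetical raw firewall log rows (column names are assumptions)
logs = pd.DataFrame({
    "action": ["allow", "deny", "drop", "alert", "allow"],
    "src_zone": ["trust", "untrust", "untrust", "trust", "trust"],
    "dst_zone": ["untrust", "dmz", "trust", "trust", "trust"],
    "threat_category": ["none", "scan", "flood", "C2", "none"],
    "rule_id": [10, 999, 999, 42, 7],
})

internal_zones = ["trust", "dmz", "guest"]  # assumption: everything except untrust

logs["is_blocked"] = logs["action"].isin(["deny", "drop"]).astype(int)
logs["is_alert"] = (logs["action"] == "alert").astype(int)
logs["is_external_inbound"] = (
    (logs["src_zone"] == "untrust") & logs["dst_zone"].isin(internal_zones)
).astype(int)
logs["is_internal_lateral"] = (
    (logs["src_zone"] == "trust") & (logs["dst_zone"] == "trust")
).astype(int)
logs["has_threat"] = (logs["threat_category"] != "none").astype(int)
logs["is_implicit_deny"] = (logs["rule_id"] == 999).astype(int)

print(logs.filter(regex="^(is_|has_)").to_string())
```

Encoding each flag as 0/1 keeps the features directly usable by the scalers and detectors without further encoding.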

### MITRE ATT&CK Coverage
| Attack Type | Firewall Action | Key Indicators |
|-------------|-----------------|----------------|
| Port Scan | deny/drop | implicit_deny, multi_dest |
| Brute Force | deny/alert | threat=brute-force |
| C2 Beacon | allow/alert | threat=C2, suspicious UA |
| Data Exfil | allow/alert | high bytes_sent, threat=data-theft |
| DNS Tunnel | allow/alert | threat=dns-tunneling, high entropy |
| DDoS | drop | threat=flood, multi_src |
| Lateral Move | allow/alert | internal_lateral, SMB app_id |
| Cryptomining | deny | threat=cryptocurrency |

### Key Takeaways
1. **Firewall logs provide action context** that NetFlow lacks
2. **Zone transitions** reveal attack direction (inbound vs lateral)
3. **Threat categories** from IPS/AV add high-value signals
4. **Implicit deny hits** are strong scan/probe indicators
5. Combining L3-L7 + firewall features provides comprehensive detection

### Next Steps
1. Integrate with SIEM for real-time alerting
2. Add firewall rule optimization based on traffic patterns
3. Implement threat feed enrichment (IP reputation)
4. Build automated response playbooks for high-confidence detections