# BGP Anomaly Label Reinforcement Methodology

## Purpose
This notebook implements a **reinforcement methodology for anomaly labeling**: it validates the anomaly labels assigned to real-world BGP incidents and quantifies how much confidence each label deserves.

## Key Differences from Normal Traffic Labeling
While normal traffic labeling uses anomaly detection to **discover** outliers, anomaly label reinforcement:
1. **Validates** that labeled anomalies truly deviate from normal patterns
2. **Quantifies** the confidence level of each anomaly label
3. **Verifies** incident coherence (samples from the same incident should cluster together)
4. **Cross-validates** anomaly types using multiple methodologies

## Methodology Overview

### 1. Baseline Normal Profile Construction
- Build statistical profiles of "normal" BGP behavior from reference data
- Define multivariate boundaries for normal traffic

### 2. Anomaly Deviation Scoring (5 Methods)
| Method | Purpose |
|--------|---------|
| Mahalanobis Distance | Deviation from the normal-baseline centroid |
| One-Class SVM | Boundary violation scoring |
| Statistical Z-Score | Multi-feature z-score aggregation |
| Local Outlier Factor (LOF) | Local density deviation |
| Isolation Forest | Ease of isolation via random splits |

An optional sixth score, autoencoder reconstruction error, is added in Section 9 when TensorFlow is available.

### 3. Incident Coherence Validation
- Verify that samples within the same incident cluster together
- Measure inter-incident separation

### 4. Temporal Consistency Check
- Validate anomaly patterns align with known incident timelines

### 5. Cross-Validation with Supervised Learning
- Train classifiers to distinguish anomaly types
- Identify ambiguous or potentially mislabeled samples

### 6. Final Confidence Scoring
$$\text{confidence}(x) = \frac{\sum_{m=1}^{M} w_m \cdot s_m(x)}{\sum_{m=1}^{M} w_m}$$

Where $s_m(x)$ is the normalized score from method $m$ and $w_m$ is the method weight.
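
As a quick numeric check of the formula above, the sketch below computes the weighted average for a single sample. The method names, scores, and weights are made-up illustrative values, not results from this notebook.

```python
import numpy as np

def weighted_confidence(scores, weights):
    """Weighted mean of normalized per-method scores s_m(x) with weights w_m."""
    methods = list(scores)
    s = np.array([scores[m] for m in methods], dtype=float)
    w = np.array([weights[m] for m in methods], dtype=float)
    return float(np.sum(w * s) / np.sum(w))

# Hypothetical normalized scores for a single sample
example_scores = {"mahalanobis": 0.9, "ocsvm": 0.6, "lof": 0.3}
# Hypothetical weights: trust the baseline-aware method twice as much
example_weights = {"mahalanobis": 2.0, "ocsvm": 1.0, "lof": 1.0}

confidence = weighted_confidence(example_scores, example_weights)
print(f"confidence = {confidence:.3f}")
```

With equal weights the same sample would score 0.6, so the heavier Mahalanobis weight pulls the result upward.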

---
## Setup and Imports

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap
plt.style.use('seaborn-v0_8-whitegrid')

# Preprocessing
from sklearn.preprocessing import RobustScaler, StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold

# Anomaly Detection Methods
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor
from sklearn.ensemble import IsolationForest
from sklearn.covariance import EllipticEnvelope
from scipy.spatial.distance import mahalanobis
from scipy import stats

# Clustering
from sklearn.cluster import KMeans, DBSCAN
import hdbscan

# Classification for cross-validation
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import classification_report, confusion_matrix, silhouette_score
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Dimensionality Reduction
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap

# Deep Learning for Autoencoder
try:
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers, Model
    TF_AVAILABLE = True
except ImportError:
    TF_AVAILABLE = False
    print("TensorFlow not available. Autoencoder method will be skipped.")

# Utilities
from collections import Counter
from datetime import datetime
import json

print("All imports successful!")
print(f"TensorFlow available: {TF_AVAILABLE}")

---
## 1. Load and Explore Anomaly Dataset

In [None]:
# Load the anomaly dataset
anomaly_df = pd.read_csv('all_incidents_anomalies_only.csv')

print("="*60)
print("ANOMALY DATASET OVERVIEW")
print("="*60)
print(f"Total samples: {len(anomaly_df):,}")
print(f"Features: {anomaly_df.shape[1]}")
print(f"\nColumns: {list(anomaly_df.columns)}")

In [None]:
# Display first few rows
anomaly_df.head()

In [None]:
# Label distribution
print("\nLABEL DISTRIBUTION:")
print("-"*40)
label_counts = anomaly_df['label'].value_counts()
for label, count in label_counts.items():
    pct = count / len(anomaly_df) * 100
    print(f"  {label}: {count:,} ({pct:.1f}%)")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Pie chart
colors = ['#FF6B6B', '#4ECDC4', '#45B7D1']
axes[0].pie(label_counts.values, labels=label_counts.index, autopct='%1.1f%%', 
            colors=colors, explode=[0.02]*len(label_counts))
axes[0].set_title('Anomaly Type Distribution', fontsize=14, fontweight='bold')

# Bar chart
bars = axes[1].bar(label_counts.index, label_counts.values, color=colors)
axes[1].set_title('Sample Count by Anomaly Type', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Count')
for bar, count in zip(bars, label_counts.values):
    axes[1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 200, 
                 f'{count:,}', ha='center', fontsize=11)

plt.tight_layout()
plt.show()

In [None]:
# Incident distribution
print("\nINCIDENT DISTRIBUTION:")
print("-"*60)
incident_counts = anomaly_df['Incident'].value_counts()
for incident, count in incident_counts.head(15).items():
    label = anomaly_df.loc[anomaly_df['Incident'] == incident, 'label'].iloc[0]
    print(f"  {incident}: {count:,} samples [{label}]")
# Only report a remainder when more than 15 incidents exist
remaining = len(incident_counts) - 15
if remaining > 0:
    print(f"  ... and {remaining} more incidents")

---
## 2. Feature Preparation

In [None]:
# Define feature columns (exclude metadata columns)
metadata_cols = ['label', 'Incident', 'window_start', 'window_end']
feature_cols = [col for col in anomaly_df.columns if col not in metadata_cols]

print(f"Feature columns ({len(feature_cols)}):")
print(feature_cols)

In [None]:
# Extract features and handle missing values
X = anomaly_df[feature_cols].copy()
y_label = anomaly_df['label'].copy()
y_incident = anomaly_df['Incident'].copy()

# Treat infinite values as missing first, so they cannot distort the medians
inf_mask = np.isinf(X.values)
if inf_mask.any():
    print(f"Infinite values found: {inf_mask.sum()}")
    X = X.replace([np.inf, -np.inf], np.nan)

# Check for missing values
missing = X.isnull().sum()
if missing.sum() > 0:
    print("Missing values found:")
    print(missing[missing > 0])
    # Fill with the per-feature median
    X = X.fillna(X.median())
else:
    print("No missing values found.")

print(f"\nFeature matrix shape: {X.shape}")

In [None]:
# Normalize features using RobustScaler (handles outliers better)
scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)
X_scaled_df = pd.DataFrame(X_scaled, columns=feature_cols)

print("Features normalized with RobustScaler")
print(f"Scaled feature statistics:")
print(f"  Mean range: [{X_scaled.mean(axis=0).min():.3f}, {X_scaled.mean(axis=0).max():.3f}]")
print(f"  Std range: [{X_scaled.std(axis=0).min():.3f}, {X_scaled.std(axis=0).max():.3f}]")

---
## 3. Baseline Normal Profile Construction

To validate anomalies, we need a reference "normal" profile. There are two ways to obtain one:
1. **Real normal data**: if labeled normal traffic is available, fit the profile to it
2. **Synthetic normal profile**: otherwise, generate samples from the typical value ranges of normal BGP traffic

The cell below tries the real-data paths first and falls back to a synthetic baseline built from domain knowledge of normal BGP behavior.
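
Whichever source is used, the baseline and the samples being scored need to live in the same feature space. The sketch below illustrates this on toy data: a median/IQR scaler (a hand-rolled stand-in for `RobustScaler`) is fitted once and applied to both the normal reference and the candidate samples before distances are computed. All arrays and helper names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "normal" reference: two quiet features (e.g. announcements, withdrawals)
normal_raw = rng.normal(loc=[20.0, 3.0], scale=[5.0, 1.0], size=(500, 2))
# Two candidates: one typical, one far outside the normal range
candidates_raw = np.array([[21.0, 3.2], [400.0, 90.0]])

def fit_robust_scaler(data):
    """Return per-feature median and IQR (assumed non-zero in this toy case)."""
    median = np.median(data, axis=0)
    q75, q25 = np.percentile(data, [75, 25], axis=0)
    return median, q75 - q25

def mahalanobis_to_baseline(samples, baseline):
    """Distance of each sample from the baseline mean, using the baseline covariance."""
    mean = baseline.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(baseline, rowvar=False))
    diff = samples - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

median, iqr = fit_robust_scaler(normal_raw)
normal_scaled = (normal_raw - median) / iqr
candidates_scaled = (candidates_raw - median) / iqr

distances = mahalanobis_to_baseline(candidates_scaled, normal_scaled)
print(distances)
```

The Mahalanobis distance is unchanged by an invertible per-feature rescaling as long as it is applied to both sets, so the choice of scaler matters less than applying it consistently. Mixing a raw-unit baseline with scaled samples would distort every distance.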

In [None]:
class NormalBaselineProfile:
    """
    Constructs a statistical profile of "normal" BGP traffic behavior.
    
    Based on domain knowledge:
    - Normal traffic has low announcement/withdrawal ratios
    - Stable AS paths with minimal changes
    - Low edit distances (path consistency)
    - Minimal origin changes and flaps
    """
    
    def __init__(self):
        # Domain knowledge: typical ranges for normal BGP traffic
        # These are approximate - adjust based on your reference data
        self.normal_ranges = {
            'announcements': (0, 50),
            'withdrawals': (0, 10),
            'nlri_ann': (0, 100),
            'dups': (0, 5),
            'origin_changes': (0, 2),
            'imp_wd': (0, 5),
            'as_path_max': (1, 10),
            'edit_distance_avg': (0, 2),
            'edit_distance_max': (0, 5),
            'flaps': (0, 3),
            'nadas': (0, 5)
        }
        
        self.fitted = False
        self.mean_ = None
        self.cov_ = None
        self.cov_inv_ = None
    
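    def fit_from_data(self, normal_data):
        """Fit baseline from actual normal traffic data."""
        normal_data = np.asarray(normal_data, dtype=float)
        self.mean_ = normal_data.mean(axis=0)
        self.cov_ = np.cov(normal_data, rowvar=False)
        try:
            # The pseudo-inverse tolerates a singular covariance (e.g. constant features)
            self.cov_inv_ = np.linalg.pinv(self.cov_)
        except np.linalg.LinAlgError:
            # Fall back to Euclidean distance if the SVD fails to converge
            self.cov_inv_ = np.eye(normal_data.shape[1])
        self.fitted = True
        return self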
    
    def fit_synthetic(self, feature_names, n_samples=10000):
        """
        Generate synthetic normal baseline when real normal data unavailable.
        Uses domain knowledge to create realistic normal traffic patterns.
        """
        np.random.seed(42)
        n_features = len(feature_names)
        
        # Generate synthetic normal samples
        synthetic_normal = np.zeros((n_samples, n_features))
        
        for i, feat in enumerate(feature_names):
            # Use known ranges or generate low-variance normal distribution
            if feat in self.normal_ranges:
                low, high = self.normal_ranges[feat]
                mean = (low + high) / 2
                std = (high - low) / 4
            else:
                # For unknown features, use low values (normal = quiet)
                mean, std = 1.0, 0.5
            
            synthetic_normal[:, i] = np.abs(np.random.normal(mean, std, n_samples))
        
        self.fit_from_data(synthetic_normal)
        self.synthetic_data = synthetic_normal
        return self
    
    def mahalanobis_distance(self, X):
        """Calculate Mahalanobis distance from normal baseline."""
        if not self.fitted:
            raise ValueError("Profile not fitted. Call fit_from_data() or fit_synthetic() first.")
        
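        X = np.asarray(X, dtype=float)
        if X.shape[1] != self.mean_.shape[0]:
            raise ValueError(
                f"X has {X.shape[1]} features but the baseline was fitted on {self.mean_.shape[0]}."
            )
        # Vectorized d(x) = sqrt((x - mu)^T Sigma^-1 (x - mu))
        diff = X - self.mean_
        sq_dist = np.einsum('ij,jk,ik->i', diff, self.cov_inv_, diff)
        return np.sqrt(np.clip(sq_dist, 0.0, None))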


# Try to load real normal data if available
normal_data_paths = [
    '../scripts/labeled_bgp_features.csv',
    '../scripts/ripe_rrc04_features_labeled.csv',
    'normal_traffic_features.csv'
]

normal_baseline = NormalBaselineProfile()
real_normal_loaded = False

for path in normal_data_paths:
    try:
        normal_df = pd.read_csv(path)
    except (OSError, pd.errors.ParserError, pd.errors.EmptyDataError) as exc:
        print(f"Skipping {path}: {exc}")
        continue
    if 'label' not in normal_df.columns:
        continue
    is_normal = normal_df['label'].astype(str).str.contains('normal', case=False, na=False)
    normal_samples = normal_df[is_normal]
    # The baseline must cover every scored feature, in the same order as X_scaled
    if len(normal_samples) <= 100 or not set(feature_cols).issubset(normal_samples.columns):
        continue
    normal_X = normal_samples[feature_cols].replace([np.inf, -np.inf], np.nan)
    normal_X = normal_X.fillna(normal_X.median()).fillna(0.0)
    # Express the baseline in the same scaled space as X_scaled
    normal_baseline.fit_from_data(scaler.transform(normal_X))
    real_normal_loaded = True
    print(f"Loaded real normal baseline from: {path}")
    print(f"  Normal samples: {len(normal_samples):,}")
    break

if not real_normal_loaded:
    print("No real normal data found. Using synthetic baseline.")
    normal_baseline.fit_synthetic(feature_cols)
    # fit_synthetic works in raw feature units, so refit in the scaled space used for scoring
    synthetic_df = pd.DataFrame(normal_baseline.synthetic_data, columns=feature_cols)
    normal_baseline.fit_from_data(scaler.transform(synthetic_df))
    print(f"Synthetic baseline generated with {len(feature_cols)} features")

---
## 4. Anomaly Deviation Scoring (5 Methods)

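Each method provides an independent score indicating how "anomalous" each sample is. Only the Mahalanobis score is measured against the normal baseline; the other four methods are fitted on the anomaly set itself, so they score how atypical a sample is relative to the other labeled anomalies.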
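
Because the raw scores come out on different scales, each one is min-max normalized to [0, 1] before being combined. The sketch below mirrors that step on toy values and contrasts it with a rank-based variant. The rank-based version is not part of this notebook; it is shown only as an assumption-labeled alternative for cases where one extreme raw score compresses all the others toward zero.

```python
import numpy as np

def min_max_normalize(scores):
    """Rescale to [0, 1]; a constant input maps to all zeros."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span

def rank_normalize(scores):
    """Hypothetical alternative: map each score to its rank, scaled to [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort().argsort()  # 0 = smallest (ties broken by position)
    return ranks / max(len(scores) - 1, 1)

raw = np.array([1.0, 2.0, 3.0, 100.0])  # one extreme raw score
mm = min_max_normalize(raw)
rk = rank_normalize(raw)
print(mm)
print(rk)
```

Under min-max scaling the three ordinary values all land below 0.03, which matters later because method agreement is counted with a fixed 0.5 cutoff on the normalized score.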

In [None]:
class AnomalyReinforcementScorer:
    """
    Ensemble anomaly reinforcement scoring using 5 complementary methods.
    Higher scores indicate stronger anomaly evidence.
    """
    
    def __init__(self, normal_baseline=None):
        self.normal_baseline = normal_baseline
        self.scores = {}
        self.models = {}
    
    def score_mahalanobis(self, X):
        """
        Method 1: Mahalanobis Distance from Normal Baseline
        
        Measures how far each sample is from the "normal" centroid,
        accounting for feature correlations.
        """
        print("[1/5] Computing Mahalanobis distances...")
        
        if self.normal_baseline is not None and self.normal_baseline.fitted:
            distances = self.normal_baseline.mahalanobis_distance(X)
        else:
            # Fallback: compute relative to data centroid
            mean = X.mean(axis=0)
            cov = np.cov(X.T)
            try:
                cov_inv = np.linalg.pinv(cov)
                distances = np.array([mahalanobis(x, mean, cov_inv) for x in X])
            except:
                distances = np.sqrt(np.sum((X - mean)**2, axis=1))
        
        # Normalize to [0, 1]
        self.scores['mahalanobis'] = self._normalize_scores(distances)
        return self.scores['mahalanobis']
    
    def score_one_class_svm(self, X):
        """
        Method 2: One-Class SVM Decision Function
        
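        Fits the boundary on the anomaly samples themselves (subsampled to
        10,000 rows when the dataset is larger), then scores every sample
        with the decision function. Raw decision values are negative outside
        the boundary, so they are negated below: higher = more atypical.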
        """
        print("[2/5] Fitting One-Class SVM...")
        
        # Use a sample if data is too large
        if len(X) > 10000:
            np.random.seed(42)
            sample_idx = np.random.choice(len(X), 10000, replace=False)
            X_train = X[sample_idx]
        else:
            X_train = X
        
        ocsvm = OneClassSVM(kernel='rbf', nu=0.1, gamma='scale')
        ocsvm.fit(X_train)
        self.models['ocsvm'] = ocsvm
        
        # Decision function: negative = outlier
        decision = -ocsvm.decision_function(X)  # Negate so higher = more anomalous
        self.scores['ocsvm'] = self._normalize_scores(decision)
        return self.scores['ocsvm']
    
    def score_statistical(self, X):
        """
        Method 3: Statistical Z-Score Aggregation
        
        Computes z-scores for each feature and aggregates.
        Higher aggregate z-score = more anomalous.
        """
        print("[3/5] Computing statistical z-scores...")
        
        # Compute z-scores
        mean = X.mean(axis=0)
        std = X.std(axis=0) + 1e-10  # Avoid division by zero
        z_scores = np.abs((X - mean) / std)
        
        # Aggregate: max z-score or sum of extreme z-scores
        max_z = z_scores.max(axis=1)
        extreme_count = (z_scores > 3).sum(axis=1)  # Count features beyond 3 sigma
        
        # Combined score
        combined = max_z + extreme_count * 0.5
        self.scores['statistical'] = self._normalize_scores(combined)
        return self.scores['statistical']
    
    def score_lof(self, X):
        """
        Method 4: Local Outlier Factor
        
        Measures local density deviation. Samples in sparse regions
        relative to their neighbors get higher LOF scores.
        """
        print("[4/5] Computing Local Outlier Factor...")
        
        n_neighbors = min(int(np.sqrt(len(X))), 50)
        n_neighbors = max(n_neighbors, 10)
        
        lof = LocalOutlierFactor(n_neighbors=n_neighbors, contamination=0.1, novelty=False)
        lof.fit_predict(X)
        
        # Negative outlier factor: more negative = more anomalous
        lof_scores = -lof.negative_outlier_factor_
        self.scores['lof'] = self._normalize_scores(lof_scores)
        return self.scores['lof']
    
    def score_isolation_forest(self, X):
        """
        Method 5: Isolation Forest Score
        
        Anomalies are easier to isolate, requiring fewer splits.
        Lower score = more anomalous (we negate for consistency).
        """
        print("[5/5] Fitting Isolation Forest...")
        
        iso_forest = IsolationForest(n_estimators=200, contamination=0.1, 
                                      random_state=42, n_jobs=-1)
        iso_forest.fit(X)
        self.models['isolation_forest'] = iso_forest
        
        # Score: lower = more anomalous, negate for consistency
        iso_scores = -iso_forest.score_samples(X)
        self.scores['isolation_forest'] = self._normalize_scores(iso_scores)
        return self.scores['isolation_forest']
    
    def _normalize_scores(self, scores):
        """Normalize scores to [0, 1] range."""
        scores = np.array(scores)
        min_s, max_s = scores.min(), scores.max()
        if max_s - min_s > 0:
            return (scores - min_s) / (max_s - min_s)
        return np.zeros_like(scores)
    
    def compute_all_scores(self, X):
        """Compute all 5 anomaly scores."""
        print("="*60)
        print("COMPUTING ANOMALY REINFORCEMENT SCORES")
        print("="*60)
        
        self.score_mahalanobis(X)
        self.score_one_class_svm(X)
        self.score_statistical(X)
        self.score_lof(X)
        self.score_isolation_forest(X)
        
        print("\nAll scores computed.")
        return self.scores
    
    def get_ensemble_score(self, weights=None):
        """
        Compute weighted ensemble score.
        
        Higher score = stronger evidence of anomaly.
        """
        if not self.scores:
            raise ValueError("No scores computed. Call compute_all_scores() first.")
        
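        if weights is None:
            weights = {}
        
        # Methods missing from `weights` default to 1.0; normalize by the weights actually used
        used_weights = {m: weights.get(m, 1.0) for m in self.scores}
        total_weight = sum(used_weights.values())
        if total_weight <= 0:
            raise ValueError("Weights must sum to a positive value.")
        
        ensemble = np.zeros(len(next(iter(self.scores.values()))))
        for method, score in self.scores.items():
            ensemble += used_weights[method] * score
        
        return ensemble / total_weight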


# Initialize scorer and compute all scores
scorer = AnomalyReinforcementScorer(normal_baseline=normal_baseline)
all_scores = scorer.compute_all_scores(X_scaled)

In [None]:
# Visualize score distributions
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

for i, (method, scores) in enumerate(all_scores.items()):
    ax = axes[i]
    
    # Histogram by label
    for label in y_label.unique():
        mask = y_label == label
        ax.hist(scores[mask], bins=50, alpha=0.6, label=label, density=True)
    
    ax.set_title(f'{method.upper()} Score Distribution', fontweight='bold')
    ax.set_xlabel('Anomaly Score')
    ax.set_ylabel('Density')
    ax.legend()

# Ensemble score
ensemble_score = scorer.get_ensemble_score()
ax = axes[5]
for label in y_label.unique():
    mask = y_label == label
    ax.hist(ensemble_score[mask], bins=50, alpha=0.6, label=label, density=True)
ax.set_title('ENSEMBLE Score Distribution', fontweight='bold')
ax.set_xlabel('Anomaly Score')
ax.set_ylabel('Density')
ax.legend()

plt.tight_layout()
plt.show()

In [None]:
# Score correlation analysis
score_df = pd.DataFrame(all_scores)
score_df['ensemble'] = ensemble_score

print("\nSCORE CORRELATION MATRIX:")
print("-"*40)
corr_matrix = score_df.corr()

plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='RdYlBu_r', center=0, 
            fmt='.2f', square=True, linewidths=0.5)
plt.title('Anomaly Score Method Correlation', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

---
## 5. Confidence Label Assignment

Based on ensemble scores, assign confidence levels to anomaly labels.
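
Each tier combines two conditions: a minimum ensemble score and a minimum number of methods whose own normalized score exceeds 0.5. The sketch below shows that two-condition rule on four toy samples. The three method names and the tier cutoffs are hypothetical and scaled down to three methods.

```python
import numpy as np

# Hypothetical normalized scores for four samples from three methods
method_scores = {
    "method_a": np.array([0.9, 0.6, 0.4, 0.1]),
    "method_b": np.array([0.8, 0.7, 0.6, 0.2]),
    "method_c": np.array([0.7, 0.4, 0.3, 0.1]),
}
ensemble = np.mean(list(method_scores.values()), axis=0)

# A method "agrees" when its normalized score exceeds 0.5
agreement = sum((s > 0.5).astype(int) for s in method_scores.values())

# (label, minimum ensemble score, minimum agreeing methods), strictest first
tiers = [
    ("high_confidence_anomaly", 0.7, 3),
    ("confirmed_anomaly", 0.5, 2),
    ("likely_anomaly", 0.3, 1),
]

# Apply the loosest tier first so that stricter tiers overwrite it
labels = np.full(ensemble.shape, "uncertain_anomaly", dtype=object)
for name, min_score, min_agree in reversed(tiers):
    labels[(ensemble >= min_score) & (agreement >= min_agree)] = name

print(labels)
```

Requiring agreement as well as a high average guards against a single method with an extreme score dominating the ensemble.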

In [None]:
def assign_anomaly_confidence(ensemble_scores, method_scores, thresholds=None):
    """
    Assign confidence levels to anomaly labels.
    
    Confidence Levels:
    - high_confidence_anomaly: Ensemble score >= 0.7 AND 4+ methods agree
    - confirmed_anomaly: Ensemble score >= 0.5 AND 3+ methods agree
    - likely_anomaly: Ensemble score >= 0.3 AND 2+ methods agree
    - uncertain_anomaly: Lower scores, needs review
    """
    if thresholds is None:
        thresholds = {
            'high_confidence': (0.7, 4),
            'confirmed': (0.5, 3),
            'likely': (0.3, 2)
        }
    
    n_samples = len(ensemble_scores)
    confidence_labels = np.array(['uncertain_anomaly'] * n_samples, dtype=object)
    
    # Count methods flagging as high anomaly (score > 0.5)
    method_agreement = np.zeros(n_samples)
    for method, scores in method_scores.items():
        method_agreement += (scores > 0.5).astype(int)
    
    # Assign confidence levels
    for i in range(n_samples):
        score = ensemble_scores[i]
        agreement = method_agreement[i]
        
        if score >= thresholds['high_confidence'][0] and agreement >= thresholds['high_confidence'][1]:
            confidence_labels[i] = 'high_confidence_anomaly'
        elif score >= thresholds['confirmed'][0] and agreement >= thresholds['confirmed'][1]:
            confidence_labels[i] = 'confirmed_anomaly'
        elif score >= thresholds['likely'][0] and agreement >= thresholds['likely'][1]:
            confidence_labels[i] = 'likely_anomaly'
    
    return confidence_labels, method_agreement


# Assign confidence labels
confidence_labels, method_agreement = assign_anomaly_confidence(ensemble_score, all_scores)

# Summary
print("\nANOMALY CONFIDENCE DISTRIBUTION:")
print("="*60)
conf_counts = pd.Series(confidence_labels).value_counts()
for conf, count in conf_counts.items():
    pct = count / len(confidence_labels) * 100
    print(f"  {conf}: {count:,} ({pct:.1f}%)")

In [None]:
# Cross-tabulate confidence with original labels
print("\nCONFIDENCE vs ORIGINAL LABEL:")
print("="*60)
crosstab = pd.crosstab(y_label, confidence_labels, margins=True)
print(crosstab)

# Visualize
fig, ax = plt.subplots(figsize=(12, 6))
crosstab_pct = pd.crosstab(y_label, confidence_labels, normalize='index') * 100

crosstab_pct.plot(kind='bar', stacked=True, ax=ax, 
                  colormap='RdYlGn_r', edgecolor='white')
ax.set_title('Confidence Distribution by Anomaly Type', fontsize=14, fontweight='bold')
ax.set_xlabel('Original Anomaly Label')
ax.set_ylabel('Percentage')
ax.legend(title='Confidence', bbox_to_anchor=(1.02, 1))
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

---
## 6. Incident Coherence Validation

Validate that samples from the same incident cluster together in feature space.
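
To make the idea concrete, the toy example below computes a coherence ratio (centroid separation divided by within-incident spread) for one tight incident and one diffuse incident. The data and names are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical incidents: one tight and far from the rest, one diffuse
tight = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(200, 2))
diffuse = rng.normal(loc=[0.0, 0.0], scale=5.0, size=(200, 2))
X_toy = np.vstack([tight, diffuse])
incident_ids = np.array(["tight"] * 200 + ["diffuse"] * 200)

def coherence(X, ids, incident):
    """Centroid separation from all other samples divided by RMS spread within the incident."""
    inside = X[ids == incident]
    outside = X[ids != incident]
    centroid = inside.mean(axis=0)
    rms_spread = np.sqrt(np.mean(np.sum((inside - centroid) ** 2, axis=1)))
    separation = np.linalg.norm(centroid - outside.mean(axis=0))
    return separation / rms_spread

coherence_tight = coherence(X_toy, incident_ids, "tight")
coherence_diffuse = coherence(X_toy, incident_ids, "diffuse")
print(f"tight: {coherence_tight:.2f}, diffuse: {coherence_diffuse:.2f}")
```

Both incidents are equally far from each other, so the difference in the ratio comes entirely from how spread out each incident is.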

In [None]:
def compute_incident_coherence(X, incidents):
    """
    Compute coherence metrics for each incident.
    
    Metrics:
    - Intra-incident variance: mean squared distance of samples to the incident centroid
    - Inter-incident distance: distance from the incident centroid to the centroid of all other samples
    - Coherence score: inter-incident distance divided by the intra-incident RMS spread
    """
    unique_incidents = incidents.unique()
    coherence_results = []
    
    for incident in unique_incidents:
        mask = incidents == incident
        X_incident = X[mask]
        X_other = X[~mask]
        
        if len(X_incident) < 2:
            continue
        
        # Intra-incident variance (lower = more coherent)
        centroid = X_incident.mean(axis=0)
        intra_variance = np.mean(np.sum((X_incident - centroid)**2, axis=1))
        
        # Inter-incident distance (higher = more distinct)
        other_centroid = X_other.mean(axis=0) if len(X_other) > 0 else centroid
        inter_distance = np.sqrt(np.sum((centroid - other_centroid)**2))
        
        # Coherence score (higher = better)
        coherence = inter_distance / (np.sqrt(intra_variance) + 1e-10)
        
        coherence_results.append({
            'incident': incident,
            'n_samples': mask.sum(),
            'intra_variance': intra_variance,
            'inter_distance': inter_distance,
            'coherence_score': coherence
        })
    
    return pd.DataFrame(coherence_results)


# Compute incident coherence
coherence_df = compute_incident_coherence(X_scaled, y_incident)
coherence_df = coherence_df.sort_values('coherence_score', ascending=False)

print("\nINCIDENT COHERENCE SCORES:")
print("="*80)
print(f"{'Incident':<45} {'Samples':>8} {'Coherence':>12}")
print("-"*80)
for _, row in coherence_df.head(15).iterrows():
    print(f"{row['incident']:<45} {row['n_samples']:>8,} {row['coherence_score']:>12.2f}")

In [None]:
# Visualize coherence scores
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Coherence score distribution
ax = axes[0]
ax.hist(coherence_df['coherence_score'], bins=30, color='steelblue', edgecolor='white')
ax.axvline(coherence_df['coherence_score'].median(), color='red', linestyle='--', 
           label=f"Median: {coherence_df['coherence_score'].median():.2f}")
ax.set_title('Incident Coherence Score Distribution', fontsize=14, fontweight='bold')
ax.set_xlabel('Coherence Score')
ax.set_ylabel('Count')
ax.legend()

# Top/bottom incidents
ax = axes[1]
top_5 = coherence_df.head(5)
bottom_5 = coherence_df.tail(5)
combined = pd.concat([top_5, bottom_5])

colors = ['green']*5 + ['red']*5
y_pos = range(len(combined))
ax.barh(y_pos, combined['coherence_score'], color=colors, alpha=0.7)
ax.set_yticks(y_pos)
ax.set_yticklabels([inc[:30] + '...' if len(inc) > 30 else inc 
                    for inc in combined['incident']])
ax.set_xlabel('Coherence Score')
ax.set_title('Top 5 (Green) vs Bottom 5 (Red) Incidents', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

---
## 7. Cross-Validation with Supervised Learning

Use supervised classifiers to verify anomaly type labels and identify potentially mislabeled samples.
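
One caveat when using a classifier to audit labels: a sample should be judged by a model that did not see it during training, and ideally did not see other windows from the same incident either. The sketch below illustrates that idea with leave-one-incident-out evaluation on toy data. A deliberately simple nearest-centroid classifier is swapped in for the random forest to keep the example dependency-free, and all names and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: two anomaly types, two incidents each, 50 windows per incident
centers = {"hijack": [0.0, 0.0], "leak": [6.0, 6.0]}
X_parts, labels, incidents = [], [], []
for label, center in centers.items():
    for k in range(2):
        X_parts.append(rng.normal(loc=center, scale=1.0, size=(50, 2)))
        labels += [label] * 50
        incidents += [f"{label}_{k}"] * 50
X_toy = np.vstack(X_parts)
labels = np.array(labels)
incidents = np.array(incidents)

# Deliberately mislabel one "leak" window as "hijack"
flipped_idx = 150
labels[flipped_idx] = "hijack"

def leave_one_incident_out_predictions(X, y, groups):
    """Predict each sample's label from class centroids fitted without its incident.

    Assumes every class still has samples left after one incident is held out.
    """
    classes = np.unique(y)
    predicted = np.empty(len(y), dtype=object)
    for group in np.unique(groups):
        held_out = groups == group
        centroids = np.array([X[~held_out & (y == c)].mean(axis=0) for c in classes])
        dists = np.linalg.norm(X[held_out][:, None, :] - centroids[None, :, :], axis=2)
        predicted[held_out] = classes[dists.argmin(axis=1)]
    return predicted

predicted = leave_one_incident_out_predictions(X_toy, labels, incidents)
suspects = np.flatnonzero(predicted != labels)
print(f"Flagged {len(suspects)} of {len(labels)} samples; "
      f"flipped sample flagged: {flipped_idx in suspects}")
```

Flagged samples are candidates for manual review rather than proof of a labeling error, since a disagreement can also mean the classifier is wrong.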

In [None]:
# Encode labels
le = LabelEncoder()
y_encoded = le.fit_transform(y_label)

print("Label encoding:")
for i, label in enumerate(le.classes_):
    print(f"  {i}: {label}")

In [None]:
# Train Random Forest classifier
print("\nTRAINING RANDOM FOREST FOR CROSS-VALIDATION")
print("="*60)

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_encoded, test_size=0.2, random_state=42, stratify=y_encoded
)

# Train
rf = RandomForestClassifier(n_estimators=200, max_depth=20, 
                            random_state=42, n_jobs=-1, class_weight='balanced')
rf.fit(X_train, y_train)

# Evaluate
y_pred = rf.predict(X_test)
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=le.classes_))

In [None]:
# Confusion matrix
plt.figure(figsize=(10, 8))
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=le.classes_, yticklabels=le.classes_)
plt.title('Anomaly Type Classification Confusion Matrix', fontsize=14, fontweight='bold')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.tight_layout()
plt.show()

In [None]:
# Identify potentially mislabeled samples using prediction probabilities
print("\nIDENTIFYING POTENTIALLY MISLABELED SAMPLES")
print("="*60)

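# Get out-of-fold probabilities so that no sample is scored by a model trained on it
# (in-sample predictions from a random forest would hide most label disagreements)
from sklearn.model_selection import StratifiedKFold, cross_val_predict

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
proba = cross_val_predict(rf, X_scaled, y_encoded, cv=cv, method='predict_proba')
predictions = proba.argmax(axis=1)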

# Find samples where prediction differs from label
mismatch_mask = predictions != y_encoded
mismatch_count = mismatch_mask.sum()

print(f"Samples with prediction mismatch: {mismatch_count:,} ({mismatch_count/len(y_encoded)*100:.1f}%)")

# Find samples with low confidence in their assigned label
assigned_proba = proba[np.arange(len(y_encoded)), y_encoded]
low_confidence_mask = assigned_proba < 0.5
low_conf_count = low_confidence_mask.sum()

print(f"Samples with low confidence (<50%): {low_conf_count:,} ({low_conf_count/len(y_encoded)*100:.1f}%)")

# Combined: mismatch AND low confidence = potentially mislabeled
potentially_mislabeled = mismatch_mask & low_confidence_mask
print(f"Potentially mislabeled: {potentially_mislabeled.sum():,} ({potentially_mislabeled.sum()/len(y_encoded)*100:.1f}%)")

In [None]:
# Feature importance
print("\nFEATURE IMPORTANCE FOR ANOMALY TYPE CLASSIFICATION:")
print("-"*60)

importance_df = pd.DataFrame({
    'feature': feature_cols,
    'importance': rf.feature_importances_
}).sort_values('importance', ascending=False)

print(importance_df.head(15).to_string(index=False))

# Visualize
plt.figure(figsize=(12, 8))
top_features = importance_df.head(15)
plt.barh(range(len(top_features)), top_features['importance'], color='steelblue')
plt.yticks(range(len(top_features)), top_features['feature'])
plt.xlabel('Importance')
plt.title('Top 15 Features for Anomaly Type Discrimination', fontsize=14, fontweight='bold')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()

---
## 8. Clustering Validation (HDBSCAN)

Verify that natural clusters align with anomaly type labels.
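
Alongside ARI and NMI, a simple purity figure can make the alignment easier to read: for each cluster, count the samples carrying the cluster's majority label, then divide by the number of clustered samples. Purity is not computed elsewhere in this notebook; the helper below is a hypothetical sketch on toy values.

```python
import numpy as np

def cluster_purity(cluster_ids, labels, noise_id=-1):
    """Fraction of non-noise samples whose label matches their cluster's majority label."""
    cluster_ids = np.asarray(cluster_ids)
    labels = np.asarray(labels)
    keep = cluster_ids != noise_id
    matched = 0
    for cid in np.unique(cluster_ids[keep]):
        _, counts = np.unique(labels[cluster_ids == cid], return_counts=True)
        matched += counts.max()
    return matched / keep.sum()

# Hypothetical clustering output: -1 marks noise, as in HDBSCAN
toy_clusters = [0, 0, 0, 1, 1, 1, 1, -1]
toy_labels = ["hijack", "hijack", "leak", "leak", "leak", "leak", "outage", "outage"]

purity = cluster_purity(toy_clusters, toy_labels)
print(f"purity = {purity:.3f}")
```

Purity rises as the number of clusters grows, so it is best read together with the cluster count and the noise fraction.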

In [None]:
# Reduce dimensionality for clustering
print("Performing dimensionality reduction for clustering...")

# PCA first for speed
pca = PCA(n_components=min(10, X_scaled.shape[1]), random_state=42)
X_pca = pca.fit_transform(X_scaled)
print(f"PCA explained variance: {pca.explained_variance_ratio_.sum():.2%}")

# UMAP for visualization
try:
    reducer = umap.UMAP(n_components=2, random_state=42, n_neighbors=30, min_dist=0.1)
    X_umap = reducer.fit_transform(X_pca)
    print("UMAP reduction complete.")
except Exception as exc:
    print(f"UMAP failed ({exc}); falling back to t-SNE...")
    X_umap = TSNE(n_components=2, random_state=42, perplexity=30).fit_transform(X_pca)

In [None]:
# HDBSCAN clustering
print("\nPerforming HDBSCAN clustering...")

clusterer = hdbscan.HDBSCAN(min_cluster_size=50, min_samples=10, 
                            cluster_selection_epsilon=0.5)
cluster_labels = clusterer.fit_predict(X_pca)

n_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)
n_noise = (cluster_labels == -1).sum()

print(f"Number of clusters found: {n_clusters}")
print(f"Noise points: {n_noise:,} ({n_noise/len(cluster_labels)*100:.1f}%)")

In [None]:
# Compare clusters with labels
print("\nCLUSTER vs LABEL ALIGNMENT:")
print("="*60)

# Compute alignment metrics
valid_mask = cluster_labels != -1
if valid_mask.sum() > 100:
    ari = adjusted_rand_score(y_encoded[valid_mask], cluster_labels[valid_mask])
    nmi = normalized_mutual_info_score(y_encoded[valid_mask], cluster_labels[valid_mask])
    print(f"Adjusted Rand Index: {ari:.3f}")
    print(f"Normalized Mutual Information: {nmi:.3f}")

# Cross-tabulation
cluster_label_crosstab = pd.crosstab(cluster_labels, y_label, margins=True)
print("\nCluster vs Anomaly Type:")
print(cluster_label_crosstab)

In [None]:
# Visualize clusters vs labels
fig, axes = plt.subplots(1, 2, figsize=(16, 7))

# Plot by anomaly type
ax = axes[0]
for label in y_label.unique():
    mask = y_label == label
    ax.scatter(X_umap[mask, 0], X_umap[mask, 1], alpha=0.5, s=10, label=label)
ax.set_title('2D Projection by Anomaly Type', fontsize=14, fontweight='bold')
ax.set_xlabel('UMAP 1')
ax.set_ylabel('UMAP 2')
ax.legend()

# Plot by cluster
ax = axes[1]
scatter = ax.scatter(X_umap[:, 0], X_umap[:, 1], c=cluster_labels, 
                     cmap='tab20', alpha=0.5, s=10)
ax.set_title('2D Projection by HDBSCAN Cluster', fontsize=14, fontweight='bold')
ax.set_xlabel('UMAP 1')
ax.set_ylabel('UMAP 2')
plt.colorbar(scatter, ax=ax, label='Cluster')

plt.tight_layout()
plt.show()

---
## 9. Autoencoder Reconstruction Error (Deep Learning)

Train an autoencoder on anomaly data and use reconstruction error as additional validation.
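
The intuition behind reconstruction error can be shown without a neural network. The sketch below uses a linear projection onto the top principal component as a stand-in for the encoder/decoder pair. It is not the Keras model used in this section, and all data is synthetic. A sample that follows the learned feature relationship reconstructs well, while one that breaks it does not.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data lying close to a 1-D line inside 3-D space
t = rng.normal(size=(300, 1))
direction = np.array([[1.0, 2.0, -1.0]])
X_toy = t @ direction + rng.normal(scale=0.05, size=(300, 3))

# Encoder/decoder stand-in: project onto the top principal component and back
mean = X_toy.mean(axis=0)
_, _, vt = np.linalg.svd(X_toy - mean, full_matrices=False)
components = vt[:1]  # bottleneck of size 1

def reconstruction_error(samples):
    """Mean squared error between samples and their low-rank reconstruction."""
    centered = samples - mean
    reconstructed = centered @ components.T @ components
    return np.mean((centered - reconstructed) ** 2, axis=1)

on_manifold = np.array([[2.0, 4.0, -2.0]])   # follows the learned structure
off_manifold = np.array([[2.0, -4.0, 2.0]])  # breaks the feature relationship

err_on = reconstruction_error(on_manifold)[0]
err_off = reconstruction_error(off_manifold)[0]
print(f"on-manifold: {err_on:.4f}, off-manifold: {err_off:.4f}")
```

Since the model here is trained on the anomaly data itself, a high error marks a sample as unusual among the labeled anomalies, not necessarily as far from normal traffic.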

In [None]:
if TF_AVAILABLE:
    print("Building Autoencoder for reconstruction-based validation...")
    
    # Build autoencoder
    input_dim = X_scaled.shape[1]
    encoding_dim = 8
    
    # Encoder
    inputs = keras.Input(shape=(input_dim,))
    x = layers.Dense(64, activation='relu')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(32, activation='relu')(x)
    x = layers.BatchNormalization()(x)
    encoded = layers.Dense(encoding_dim, activation='relu', name='encoding')(x)
    
    # Decoder
    x = layers.Dense(32, activation='relu')(encoded)
    x = layers.BatchNormalization()(x)
    x = layers.Dense(64, activation='relu')(x)
    x = layers.BatchNormalization()(x)
    decoded = layers.Dense(input_dim, activation='linear')(x)
    
    # Model
    autoencoder = keras.Model(inputs, decoded)
    autoencoder.compile(optimizer='adam', loss='mse')
    
    # summary() prints the table itself and returns None, so it is not wrapped in print()
    autoencoder.summary()
else:
    print("TensorFlow not available. Skipping autoencoder analysis.")

In [None]:
if TF_AVAILABLE:
    # Train autoencoder
    print("\nTraining autoencoder...")
    
    history = autoencoder.fit(
        X_scaled, X_scaled,
        epochs=50,
        batch_size=256,
        validation_split=0.1,
        verbose=1,
        callbacks=[
            keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
        ]
    )
    
    # Plot training history
    plt.figure(figsize=(10, 4))
    plt.plot(history.history['loss'], label='Training Loss')
    plt.plot(history.history['val_loss'], label='Validation Loss')
    plt.xlabel('Epoch')
    plt.ylabel('MSE Loss')
    plt.title('Autoencoder Training History', fontsize=14, fontweight='bold')
    plt.legend()
    plt.tight_layout()
    plt.show()

In [None]:
if TF_AVAILABLE:
    # Compute reconstruction error
    reconstructed = autoencoder.predict(X_scaled, verbose=0)
    reconstruction_error = np.mean((X_scaled - reconstructed)**2, axis=1)
    
    print("\nRECONSTRUCTION ERROR BY ANOMALY TYPE:")
    print("="*60)
    for label in y_label.unique():
        mask = y_label == label
        errors = reconstruction_error[mask]
        print(f"{label}:")
        print(f"  Mean: {errors.mean():.4f}, Std: {errors.std():.4f}")
        print(f"  Range: [{errors.min():.4f}, {errors.max():.4f}]")
    
    # Add to scores
    all_scores['autoencoder'] = (reconstruction_error - reconstruction_error.min()) / \
                                 (reconstruction_error.max() - reconstruction_error.min())
    
    # Visualize
    plt.figure(figsize=(10, 6))
    for label in y_label.unique():
        mask = y_label == label
        plt.hist(reconstruction_error[mask], bins=50, alpha=0.6, label=label, density=True)
    plt.xlabel('Reconstruction Error')
    plt.ylabel('Density')
    plt.title('Autoencoder Reconstruction Error by Anomaly Type', fontsize=14, fontweight='bold')
    plt.legend()
    plt.tight_layout()
    plt.show()

---
## 10. Final Reinforced Labels and Confidence Scores
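
The final score is a weighted average of four components, each expected to lie in [0, 1], which is then binned into five confidence tiers. As a small worked illustration, the sketch below applies the weights and thresholds used in this notebook to three made-up samples. The component values and the names `WEIGHTS`, `demo`, `demo_confidence`, and `demo_tier` are hypothetical, and `pd.cut` is used here as a vectorized alternative to an if/elif chain.

```python
import numpy as np
import pandas as pd

# Weights and thresholds as used in this notebook; the component values are made up.
WEIGHTS = {
    'anomaly_score': 0.30,
    'classifier_confidence': 0.35,
    'incident_coherence': 0.20,
    'cluster_aligned': 0.15,
}

demo = pd.DataFrame({
    'anomaly_score':         [1.0, 0.6, 0.0],
    'classifier_confidence': [1.0, 0.6, 0.0],
    'incident_coherence':    [1.0, 0.6, 0.0],
    'cluster_aligned':       [1.0, 0.5, 0.5],  # 1.0 = in majority cluster, 0.5 = otherwise
})

# Weighted average; dividing by the weight sum keeps the result in [0, 1]
# even if the weights are later changed so that they no longer sum to 1.
weight_vec = pd.Series(WEIGHTS)
demo_confidence = (demo[weight_vec.index] * weight_vec).sum(axis=1) / weight_vec.sum()

# Left-closed bins reproduce the ">=" thresholds of the tier assignment.
demo_tier = pd.cut(
    demo_confidence,
    bins=[-np.inf, 0.35, 0.5, 0.65, 0.8, np.inf],
    right=False,
    labels=['needs_review', 'low_confidence', 'medium_confidence',
            'high_confidence', 'very_high_confidence'],
).astype(str)

print(pd.DataFrame({'confidence': demo_confidence.round(3), 'tier': demo_tier}))
```

Because the cluster alignment term never falls below 0.5, the lowest attainable confidence with these weights is 0.075 and not 0.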

In [None]:
def compute_final_reinforcement(original_labels, ensemble_scores, method_scores,
                                classifier_proba, cluster_labels, incidents,
                                coherence_df):
    """
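    Compute final reinforced labels with comprehensive confidence scoring.
    
    Components (weights in parentheses; they sum to 1.0):
    1. Anomaly deviation score from the ensemble (0.30)
    2. Classifier confidence in the assigned label (0.35)
    3. Incident coherence, scaled by the maximum coherence (0.20)
    4. Cluster alignment: 1.0 in the label's majority cluster, else 0.5 (0.15)
    
    Note: `method_scores` is currently unused.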
    """
    n_samples = len(original_labels)
    results = pd.DataFrame({
        'original_label': original_labels,
        'incident': incidents
    })
    
    # 1. Anomaly deviation score
    results['anomaly_score'] = ensemble_scores
    
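    # 2. Classifier confidence (probability of assigned class)
    # LabelEncoder sorts the classes; this matches the column order of
    # classifier_proba only if the classifier was fit on the same label set.
    le = LabelEncoder()
    y_enc = le.fit_transform(original_labels)
    results['classifier_confidence'] = classifier_proba[np.arange(n_samples), y_enc]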
    
    # 3. Incident coherence (lookup from coherence_df)
    coherence_map = dict(zip(coherence_df['incident'], coherence_df['coherence_score']))
    max_coherence = coherence_df['coherence_score'].max()
    results['incident_coherence'] = [coherence_map.get(inc, 0) / max_coherence 
                                      for inc in incidents]
    
    # 4. Cluster alignment (is sample in majority cluster for its label?)
    label_cluster_majority = {}
    for label in original_labels.unique():
        mask = original_labels == label
        clusters_for_label = cluster_labels[mask]
        valid_clusters = clusters_for_label[clusters_for_label != -1]
        if len(valid_clusters) > 0:
            majority_cluster = Counter(valid_clusters).most_common(1)[0][0]
            label_cluster_majority[label] = majority_cluster
    
    results['cluster_aligned'] = [
        1.0 if cluster_labels[i] == label_cluster_majority.get(original_labels.iloc[i], -99) else 0.5
        for i in range(n_samples)
    ]
    
    # Combined confidence score
    weights = {
        'anomaly_score': 0.3,
        'classifier_confidence': 0.35,
        'incident_coherence': 0.2,
        'cluster_aligned': 0.15
    }
    
    results['final_confidence'] = (
        weights['anomaly_score'] * results['anomaly_score'] +
        weights['classifier_confidence'] * results['classifier_confidence'] +
        weights['incident_coherence'] * results['incident_coherence'] +
        weights['cluster_aligned'] * results['cluster_aligned']
    )
    
    # Assign reinforced confidence labels
    def assign_label(conf):
        if conf >= 0.8:
            return 'very_high_confidence'
        elif conf >= 0.65:
            return 'high_confidence'
        elif conf >= 0.5:
            return 'medium_confidence'
        elif conf >= 0.35:
            return 'low_confidence'
        else:
            return 'needs_review'
    
    results['reinforced_label'] = results['final_confidence'].apply(assign_label)
    
    return results


# Compute final reinforcement
final_results = compute_final_reinforcement(
    y_label, ensemble_score, all_scores, proba, cluster_labels, y_incident, coherence_df
)

print("\nFINAL REINFORCEMENT SUMMARY:")
print("="*60)
reinforced_counts = final_results['reinforced_label'].value_counts()
for label, count in reinforced_counts.items():
    pct = count / len(final_results) * 100
    print(f"  {label}: {count:,} ({pct:.1f}%)")

In [None]:
# Cross-tabulate with original labels
print("\nREINFORCED CONFIDENCE vs ORIGINAL LABEL:")
print("="*60)
crosstab_final = pd.crosstab(final_results['original_label'], 
                              final_results['reinforced_label'], 
                              margins=True)
print(crosstab_final)

# Visualize
fig, ax = plt.subplots(figsize=(12, 6))
crosstab_pct = pd.crosstab(final_results['original_label'], 
                           final_results['reinforced_label'], 
                           normalize='index') * 100

# Reorder columns
col_order = ['very_high_confidence', 'high_confidence', 'medium_confidence', 
             'low_confidence', 'needs_review']
crosstab_pct = crosstab_pct[[c for c in col_order if c in crosstab_pct.columns]]

crosstab_pct.plot(kind='bar', stacked=True, ax=ax, 
                  colormap='RdYlGn', edgecolor='white')
ax.set_title('Reinforced Confidence by Anomaly Type', fontsize=14, fontweight='bold')
ax.set_xlabel('Original Anomaly Label')
ax.set_ylabel('Percentage')
ax.legend(title='Confidence', bbox_to_anchor=(1.02, 1))
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
# Detailed statistics
print("\nDETAILED CONFIDENCE STATISTICS BY ANOMALY TYPE:")
print("="*80)

for label in final_results['original_label'].unique():
    mask = final_results['original_label'] == label
    subset = final_results[mask]
    
    print(f"\n{label.upper()}:")
    print(f"  Samples: {len(subset):,}")
    print(f"  Final Confidence: mean={subset['final_confidence'].mean():.3f}, "
          f"std={subset['final_confidence'].std():.3f}")
    print(f"  Anomaly Score: mean={subset['anomaly_score'].mean():.3f}")
    print(f"  Classifier Confidence: mean={subset['classifier_confidence'].mean():.3f}")
    print(f"  Incident Coherence: mean={subset['incident_coherence'].mean():.3f}")

---
## 11. Export Reinforced Dataset

In [None]:
# Merge reinforcement results with original data
reinforced_df = anomaly_df.copy()

# Add reinforcement columns
reinforced_df['anomaly_deviation_score'] = final_results['anomaly_score'].values
reinforced_df['classifier_confidence'] = final_results['classifier_confidence'].values
reinforced_df['incident_coherence'] = final_results['incident_coherence'].values
reinforced_df['cluster_aligned'] = final_results['cluster_aligned'].values
reinforced_df['final_confidence_score'] = final_results['final_confidence'].values
reinforced_df['reinforced_confidence_label'] = final_results['reinforced_label'].values

# Add individual method scores
for method, scores in all_scores.items():
    reinforced_df[f'score_{method}'] = scores

# Add method agreement count
reinforced_df['method_agreement_count'] = method_agreement

print(f"Reinforced dataset shape: {reinforced_df.shape}")
print(f"New columns added: {len(reinforced_df.columns) - len(anomaly_df.columns)}")

In [None]:
# Save reinforced dataset
import os

output_path = 'all_incidents_anomalies_reinforced.csv'
reinforced_df.to_csv(output_path, index=False)
print(f"\nReinforced dataset saved to: {output_path}")
# Report the size of the file on disk, not the DataFrame's in-memory footprint
print(f"File size: {os.path.getsize(output_path) / 1024 / 1024:.2f} MB")

In [None]:
# Preview
print("\nSample of reinforced dataset:")
reinforced_df[['label', 'Incident', 'final_confidence_score', 
               'reinforced_confidence_label', 'method_agreement_count']].head(10)

---
## 12. Summary Report

In [None]:
print("\n" + "="*80)
print("BGP ANOMALY LABEL REINFORCEMENT - SUMMARY REPORT")
print("="*80)

print("\n1. DATASET OVERVIEW")
print("-"*40)
print(f"   Total samples: {len(anomaly_df):,}")
print(f"   Anomaly types: {anomaly_df['label'].nunique()}")
print(f"   Unique incidents: {anomaly_df['Incident'].nunique()}")
print(f"   Features used: {len(feature_cols)}")

print("\n2. ANOMALY TYPE DISTRIBUTION")
print("-"*40)
for label, count in anomaly_df['label'].value_counts().items():
    pct = count / len(anomaly_df) * 100
    print(f"   {label}: {count:,} ({pct:.1f}%)")

print("\n3. REINFORCEMENT METHODS USED")
print("-"*40)
print("   1. Mahalanobis Distance from Normal Baseline")
print("   2. One-Class SVM Decision Function")
print("   3. Statistical Z-Score Aggregation")
print("   4. Local Outlier Factor (LOF)")
print("   5. Isolation Forest Anomaly Score")
if TF_AVAILABLE:
    print("   6. Autoencoder Reconstruction Error")

print("\n4. REINFORCED CONFIDENCE DISTRIBUTION")
print("-"*40)
for label, count in final_results['reinforced_label'].value_counts().items():
    pct = count / len(final_results) * 100
    print(f"   {label}: {count:,} ({pct:.1f}%)")

from sklearn.metrics import accuracy_score, f1_score

print("\n5. CLASSIFIER PERFORMANCE (Random Forest)")
print("-"*40)
print(f"   Test Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"   Macro F1 Score: {f1_score(y_test, y_pred, average='macro'):.3f}")

print("\n6. CLUSTERING VALIDATION")
print("-"*40)
print(f"   Clusters found: {n_clusters}")
print(f"   Noise points: {n_noise:,} ({n_noise/len(cluster_labels)*100:.1f}%)")
if valid_mask.sum() > 100:
    print(f"   Adjusted Rand Index: {ari:.3f}")
    print(f"   Normalized Mutual Info: {nmi:.3f}")

print("\n7. INCIDENT COHERENCE")
print("-"*40)
print(f"   Mean coherence score: {coherence_df['coherence_score'].mean():.2f}")
print(f"   Median coherence score: {coherence_df['coherence_score'].median():.2f}")
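# Look up the maximum explicitly so the result does not depend on coherence_df's sort order
most_coherent = coherence_df.loc[coherence_df['coherence_score'].idxmax(), 'incident']
print(f"   Most coherent incident: {most_coherent}")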

print("\n8. OUTPUT FILES")
print("-"*40)
print(f"   Reinforced dataset: {output_path}")

print("\n" + "="*80)
print("REINFORCEMENT COMPLETE")
print("="*80)

---
## Methodology Summary

### Anomaly Label Reinforcement Algorithm

```
Algorithm: BGP Anomaly Label Reinforcement

Input: Anomaly dataset X with labels Y, incidents I
Output: Reinforced labels with confidence scores

1. BASELINE CONSTRUCTION
   a. Load real normal traffic data (if available)
   b. OR construct synthetic normal baseline from domain knowledge
   c. Compute normal centroid and covariance

2. ANOMALY DEVIATION SCORING (5 Methods)
   For each method m ∈ {Mahalanobis, OCSVM, Statistical, LOF, IF}:
      a. Compute deviation score s_m(x) for each sample
      b. Normalize to [0, 1]
   Ensemble score: S(x) = mean(s_1(x), ..., s_M(x))

3. INCIDENT COHERENCE VALIDATION
   For each incident i:
      a. Compute intra-incident variance
      b. Compute inter-incident distance
      c. Coherence(i) = inter_distance / sqrt(intra_variance)

4. SUPERVISED CROSS-VALIDATION
   a. Train Random Forest on anomaly types
   b. Compute prediction probabilities
   c. Identify potentially mislabeled samples

5. CLUSTERING VALIDATION
   a. Apply HDBSCAN to discover natural clusters
   b. Compute cluster-label alignment metrics
   c. Flag samples outside the majority cluster of their label

6. FINAL CONFIDENCE SCORING
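   Confidence(x) = w₁·S(x) + w₂·P(y|x) + w₃·C(i) + w₄·A(x)
   Where:
      S(x) = anomaly deviation score (w₁ = 0.30)
      P(y|x) = classifier probability of the assigned label (w₂ = 0.35)
      C(i) = incident coherence, scaled by the maximum coherence (w₃ = 0.20)
      A(x) = cluster alignment: 1.0 in the label's majority cluster, else 0.5 (w₄ = 0.15)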

7. ASSIGN REINFORCED LABELS
   very_high_confidence: Confidence >= 0.8
   high_confidence: 0.65 <= Confidence < 0.8
   medium_confidence: 0.5 <= Confidence < 0.65
   low_confidence: 0.35 <= Confidence < 0.5
   needs_review: Confidence < 0.35

Return: Reinforced dataset with confidence scores and labels
```
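
Step 3 of the algorithm can be sketched as follows. The definitions of intra-incident variance and inter-incident distance used here are assumptions made for illustration (mean squared distance to the incident centroid, and mean distance from that centroid to the other incidents' centroids); the notebook's own computation may differ. The function `incident_coherence` and the demo data are hypothetical.

```python
import numpy as np

def incident_coherence(X, incidents):
    """Return {incident: inter_distance / sqrt(intra_variance)}.

    Assumed definitions (the notebook's own may differ):
      intra_variance = mean squared distance of an incident's samples to its centroid
      inter_distance = mean distance from that centroid to the other incidents' centroids
    """
    X = np.asarray(X, dtype=float)
    incidents = np.asarray(incidents)
    names = np.unique(incidents)
    centroids = {n: X[incidents == n].mean(axis=0) for n in names}
    scores = {}
    for n in names:
        members = X[incidents == n]
        intra_var = np.mean(np.sum((members - centroids[n]) ** 2, axis=1))
        others = [np.linalg.norm(centroids[n] - centroids[m]) for m in names if m != n]
        inter = float(np.mean(others)) if others else 0.0
        # Small epsilon avoids division by zero for single-sample incidents
        scores[str(n)] = inter / np.sqrt(intra_var + 1e-12)
    return scores

# Two hypothetical incidents: one tight, one diffuse, centred 10 units apart.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 0.5, size=(100, 3)),
                    rng.normal(0.0, 3.0, size=(100, 3)) + [10.0, 0.0, 0.0]])
labels_demo = np.array(['tight'] * 100 + ['diffuse'] * 100)

coherence_demo = incident_coherence(X_demo, labels_demo)
print(coherence_demo)
```

With only two incidents the inter-incident distance is identical for both, so the tighter incident scores higher purely because of its smaller spread.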

### Key Differences from Normal Traffic Labeling

| Aspect | Normal Traffic Labeling | Anomaly Label Reinforcement |
|--------|------------------------|-----------------------------|
| Goal | Discover anomalies in unlabeled data | Validate pre-labeled anomalies |
| Baseline | Data itself | External normal profile |
| Methods | Anomaly detection | Deviation + Classification |
| Output | Binary (normal/anomaly) | Multi-level confidence |
| Validation | Cluster discovery | Incident coherence |
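
The "Baseline" row is the central distinction: deviation is measured against a profile built from separate normal data, not against the anomaly set itself. A minimal sketch of that idea with Mahalanobis distance is shown below. The reference data is randomly generated, and `normal_ref`, `mahalanobis_from_normal`, `typical`, and `labeled_anomaly` are hypothetical names used only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical reference data standing in for the external normal profile.
normal_ref = rng.normal(0.0, 1.0, size=(500, 4))
mu = normal_ref.mean(axis=0)
# Pseudo-inverse tolerates a singular covariance matrix
cov_inv = np.linalg.pinv(np.cov(normal_ref, rowvar=False))

def mahalanobis_from_normal(X):
    """Mahalanobis distance of each row of X from the normal centroid."""
    diff = np.asarray(X, dtype=float) - mu
    return np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))

typical = np.zeros((1, 4))             # sits at the centre of the normal profile
labeled_anomaly = np.full((1, 4), 6.0)  # far from the normal profile in every feature

d_typical = mahalanobis_from_normal(typical)[0]
d_anomaly = mahalanobis_from_normal(labeled_anomaly)[0]
print(f"typical: {d_typical:.2f}, labeled anomaly: {d_anomaly:.2f}")
```

A labeled anomaly whose distance is no larger than that of typical normal samples would be a candidate for review under this methodology.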