# Kolmogorov-Smirnov Test Analysis - Jitterbug v2.0

This notebook demonstrates network congestion analysis using Jitterbug v2.0's Kolmogorov-Smirnov test method through both:
1. **Programmatic API** - Direct library usage
2. **REST API** - HTTP service integration

## Method Overview

The Kolmogorov-Smirnov (KS) test is a non-parametric test that compares empirical distributions. Jitterbug uses it to detect changes in the distribution of RTT measurements, which makes congestion detection less dependent on domain-specific assumptions than the jitter dispersion method.
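
To make the statistic concrete, here is a minimal sketch that computes the two-sample KS statistic directly from two empirical CDFs and cross-checks it against `scipy.stats.ks_2samp`. The sample values and the helper name `ks_statistic_by_hand` are illustrative assumptions, not part of Jitterbug.

```python
import numpy as np
from scipy import stats


def ks_statistic_by_hand(a, b):
    """Largest vertical gap between the empirical CDFs of a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # The gap can only change at observed values, so evaluating there is enough
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Illustrative RTT samples in ms (made up for this sketch)
rtt_before = [20.1, 20.4, 20.9, 21.3, 21.8, 22.0]
rtt_after = [21.5, 23.2, 24.8, 26.1, 27.9, 30.4]

d_manual = ks_statistic_by_hand(rtt_before, rtt_after)
result = stats.ks_2samp(rtt_before, rtt_after)
print(f"KS statistic (by hand): {d_manual:.4f}")
print(f"KS statistic (scipy):   {result.statistic:.4f}, p-value: {result.pvalue:.4f}")
```

A statistic near 0 means the two empirical CDFs almost coincide; a statistic near 1 means the samples barely overlap.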

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
import json
from datetime import datetime
from typing import List, Dict, Any
from scipy import stats

# Jitterbug v2.0 imports for programmatic usage
from jitterbug import JitterbugAnalyzer
from jitterbug.models.config import JitterbugConfig, ChangePointDetectionConfig, JitterAnalysisConfig
from jitterbug.models.rtt_data import RTTMeasurement, RTTDataset
from jitterbug.visualization import JitterbugDashboard

## Load and Explore Data

In [None]:
# Load the example dataset
raw_data_path = "network_analysis/data/raw.csv"
mins_data_path = "network_analysis/data/mins.csv"
expected_results_path = "network_analysis/expected_results/kstest_inferences.csv"

# Load raw RTT data
raw_df = pd.read_csv(raw_data_path)
print(f"Raw RTT data: {len(raw_df)} measurements")
print(raw_df.head())

# Load minimum RTT data
mins_df = pd.read_csv(mins_data_path)
print(f"\nMinimum RTT data: {len(mins_df)} intervals")
print(mins_df.head())

# Load expected KS-test results
expected_df = pd.read_csv(expected_results_path)
print(f"\nExpected KS-test congestion periods: {len(expected_df)} periods")
print(expected_df.head())

# Display data statistics
print(f"\nData Statistics:")
print(f"RTT range: {raw_df['values'].min():.2f} - {raw_df['values'].max():.2f} ms")
print(f"RTT mean: {raw_df['values'].mean():.2f} ms")
print(f"RTT std: {raw_df['values'].std():.2f} ms")
print(f"Time span: {(raw_df['epoch'].max() - raw_df['epoch'].min()) / 86400:.1f} days")
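
The `mins.csv` file holds one minimum RTT value per interval. A series of that kind could be derived from raw measurements as sketched below. This assumes the `epoch` (seconds) and `values` (ms) columns used above and a 15-minute interval; the helper `minimum_rtt_per_interval` is hypothetical and is not necessarily how Jitterbug builds its own minimum series.

```python
import pandas as pd


def minimum_rtt_per_interval(df, interval_minutes=15):
    """Return the smallest RTT observed in each fixed-width time bucket."""
    width = interval_minutes * 60
    # Floor each timestamp to the start of its bucket
    bucket_start = (df["epoch"] // width) * width
    mins = df.groupby(bucket_start)["values"].min()
    return mins.rename_axis("epoch").reset_index()


# Tiny made-up sample: three measurements in the first bucket, two in the next
sample = pd.DataFrame({
    "epoch": [0, 300, 600, 900, 1200],
    "values": [21.0, 20.2, 25.7, 30.1, 28.4],
})
print(minimum_rtt_per_interval(sample))
```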

## Method 1: Programmatic API Usage

Using Jitterbug v2.0's Python API directly for KS-test based analysis.

In [None]:
# Configure Jitterbug for Kolmogorov-Smirnov test analysis
config = JitterbugConfig(
    change_point_detection=ChangePointDetectionConfig(
        algorithm="bcp",  # Bayesian Change Point detection
        threshold=0.25,
        min_time_elapsed=1800  # 30 minutes minimum between change points
    ),
    jitter_analysis=JitterAnalysisConfig(
        method="ks_test",  # Focus on Kolmogorov-Smirnov test
        threshold=0.25,
        significance_level=0.05,  # Reject the null hypothesis when p < 0.05
        min_samples=50  # Minimum samples for reliable KS test
    ),
    data_processing={
        "minimum_interval_minutes": 15,
        "outlier_detection": True
    },
    output_format="json",
    verbose=True
)

print("Configuration created for KS-test analysis:")
print(f"Algorithm: {config.change_point_detection.algorithm}")
print(f"Jitter method: {config.jitter_analysis.method}")
print(f"KS significance level: {config.jitter_analysis.significance_level}")
print(f"Thresholds: CPD={config.change_point_detection.threshold}, Jitter={config.jitter_analysis.threshold}")
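
The `significance_level` and `min_samples` settings suggest a decision rule along the following lines. This is a sketch with a hypothetical helper, `distribution_changed`, built on `scipy.stats.ks_2samp` and synthetic data; Jitterbug's internal logic may differ.

```python
import numpy as np
from scipy import stats


def distribution_changed(before, after, significance_level=0.05, min_samples=50):
    """Hypothetical rule: run the test only when both windows are large enough."""
    if len(before) < min_samples or len(after) < min_samples:
        return None  # too few samples for a reliable test
    result = stats.ks_2samp(before, after)
    return {
        "ks_statistic": float(result.statistic),
        "ks_p_value": float(result.pvalue),
        "significant": bool(result.pvalue < significance_level),
    }


rng = np.random.default_rng(seed=0)
baseline = rng.normal(loc=20.0, scale=1.0, size=200)   # synthetic RTTs in ms
congested = rng.normal(loc=26.0, scale=4.0, size=200)  # shifted and more dispersed

print(distribution_changed(baseline, congested))
print(distribution_changed(baseline[:10], congested[:10]))  # None: below min_samples
```

Returning `None` for short windows keeps "not tested" distinct from "tested and not significant", which matters when the results are summarized later.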

In [None]:
# Create analyzer and run analysis
analyzer = JitterbugAnalyzer(config)

# Analyze the data
print("Running Kolmogorov-Smirnov test analysis...")
results = analyzer.analyze_from_file(raw_data_path, 'csv')

# Get summary statistics
summary = analyzer.get_summary_statistics(results)
print("\nKS-Test Analysis Results:")
print(f"Total analysis periods: {summary['total_periods']}")
print(f"Congested periods detected: {summary['congested_periods']}")
print(f"Congestion ratio: {summary['congestion_ratio']:.1%}")
print(f"Average confidence: {summary['average_confidence']:.2f}")
print(f"Change points detected: {len(analyzer.change_points)}")

In [None]:
# Extract congestion periods with KS-test specific metrics
congestion_periods = []
ks_statistics = []

for inference in results.inferences:
    if inference.is_congested and inference.jitter_analysis:
        period_info = {
            'start_time': datetime.fromtimestamp(inference.start_time),
            'end_time': datetime.fromtimestamp(inference.end_time),
            'duration_hours': (inference.end_time - inference.start_time) / 3600,
            'confidence': inference.confidence_score,
            'has_latency_jump': inference.latency_jump.has_jump if inference.latency_jump else False,
            'has_jitter_change': inference.jitter_analysis.has_significant_jitter,
            'ks_statistic': getattr(inference.jitter_analysis, 'ks_statistic', None),
            'ks_p_value': getattr(inference.jitter_analysis, 'ks_p_value', None)
        }
        congestion_periods.append(period_info)
        
        # Collect KS statistics for analysis
        if period_info['ks_statistic'] is not None:
            ks_statistics.append(period_info['ks_statistic'])

# Display detected congestion periods with KS-test metrics
print(f"\nDetailed KS-Test Congestion Periods ({len(congestion_periods)} total):")
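for i, period in enumerate(congestion_periods, 1):
    # Compare against None explicitly: 0.0 is a valid statistic or p-value
    ks_stat = f"{period['ks_statistic']:.4f}" if period['ks_statistic'] is not None else "N/A"
    ks_p = f"{period['ks_p_value']:.4f}" if period['ks_p_value'] is not None else "N/A"
    print(f"{i:2d}. {period['start_time'].strftime('%Y-%m-%d %H:%M')} - {period['end_time'].strftime('%H:%M')} "
          f"({period['duration_hours']:.1f}h) - Conf: {period['confidence']:.2f} - "
          f"KS: {ks_stat} (p={ks_p})")

if ks_statistics:
    print(f"\nKS-Test Statistics Summary:")
    print(f"Average KS statistic: {np.mean(ks_statistics):.4f}")
    print(f"KS statistic range: {np.min(ks_statistics):.4f} - {np.max(ks_statistics):.4f}")
    # Count the significant periods instead of assuming that all of them are
    alpha = config.jitter_analysis.significance_level
    p_values = [p['ks_p_value'] for p in congestion_periods if p['ks_p_value'] is not None]
    n_significant = sum(p < alpha for p in p_values)
    print(f"Periods with p < {alpha}: {n_significant} of {len(p_values)} with a reported p-value")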
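
One way to read the reported KS statistics is to compare them with the large-sample critical value of the two-sample test, D_crit = c(α) · sqrt((n + m) / (n · m)) with c(α) = sqrt(-ln(α / 2) / 2). The sketch below computes that approximation; the window sizes are illustrative, and the helper name `ks_critical_value` is an assumption rather than a Jitterbug function.

```python
import math


def ks_critical_value(n, m, alpha=0.05):
    """Approximate two-sample KS critical value for sample sizes n and m."""
    c_alpha = math.sqrt(-math.log(alpha / 2.0) / 2.0)
    return c_alpha * math.sqrt((n + m) / (n * m))


for n in (50, 200, 1000):
    print(f"n = m = {n:4d}: reject at alpha = 0.05 when D > {ks_critical_value(n, n):.4f}")
```

The threshold shrinks as the windows grow, so with many samples even a small KS statistic can be statistically significant without being operationally meaningful.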

## Method 2: REST API Usage

Demonstrating how to use Jitterbug's KS-test analysis through the REST API.

In [None]:
# Prepare data for API submission
def prepare_rtt_data_for_api(df: pd.DataFrame) -> List[Dict[str, Any]]:
    """Convert DataFrame to API-compatible format."""
    # Vectorized conversion; the API expects the RTT under the 'rtt' key
    return (
        df[['epoch', 'values']]
        .astype(float)
        .rename(columns={'values': 'rtt'})
        .to_dict('records')
    )

# Convert data for API
api_measurements = prepare_rtt_data_for_api(raw_df)
print(f"Prepared {len(api_measurements)} measurements for KS-test API submission")
print(f"Sample measurement: {api_measurements[0]}")

In [None]:
# Prepare API request payload for KS-test analysis
api_request = {
    "data": api_measurements,
    "config": {
        "change_point_detection": {
            "algorithm": "bcp",
            "threshold": 0.25,
            "min_time_elapsed": 1800
        },
        "jitter_analysis": {
            "method": "ks_test",
            "threshold": 0.25,
            "significance_level": 0.05,
            "min_samples": 50
        },
        "data_processing": {
            "minimum_interval_minutes": 15,
            "outlier_detection": True
        },
        "output_format": "json"
    }
}

print("KS-Test API request prepared with:")
print(f"- {len(api_request['data'])} measurements")
print(f"- Algorithm: {api_request['config']['change_point_detection']['algorithm']}")
print(f"- Jitter method: {api_request['config']['jitter_analysis']['method']}")
print(f"- Significance level: {api_request['config']['jitter_analysis']['significance_level']}")

In [None]:
# Simulate API call for KS-test analysis
api_url = "http://localhost:8000/api/v1/analyze"  # Default Jitterbug API endpoint

# Simulate the API response
print("\n=== KS-Test API Call Simulation ===")
print(f"POST {api_url}")
print(f"Content-Type: application/json")
print(f"Payload size: {len(json.dumps(api_request))} bytes")
print("\nKS-Test specific features in API:")
print("- Statistical significance testing")
print("- P-value calculations")
print("- Distribution change detection")
print("- Configurable significance levels")

# Example API response with KS-test specific fields
example_api_response = {
    "status": "success",
    "analysis_id": "ks_analysis_67890",
    "method": "kolmogorov_smirnov_test",
    "summary": {
        "total_periods": summary['total_periods'],
        "congested_periods": summary['congested_periods'],
        "congestion_ratio": summary['congestion_ratio'],
        "average_confidence": summary['average_confidence'],
        # Compare against None explicitly so that a p-value of 0.0 is still counted
        "significant_ks_tests": len([p for p in congestion_periods if p['ks_p_value'] is not None and p['ks_p_value'] < 0.05])
    },
    "statistics": {
        "average_ks_statistic": np.mean(ks_statistics) if ks_statistics else None,
        "significance_level": 0.05,
        "min_samples_per_test": 50
    },
    "inferences": "[...KS-test inference results with p-values...]",
    "change_points": "[...detected change points...]",
    "metadata": {
        "algorithm_used": "bcp",
        "jitter_method": "ks_test",
        "processing_time_ms": 52300,
        "statistical_tests_performed": summary['total_periods']
    }
}

print(f"\nExample KS-Test API Response:")
print(json.dumps(example_api_response, indent=2, default=str))
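
Against a running service, the simulated call above would become a real HTTP request. The sketch below assumes the endpoint accepts the JSON payload built earlier and returns JSON. The function name `submit_analysis`, the timeout value, and the injectable `post` argument (useful for testing without a server) are illustrative choices, and the response schema is not confirmed.

```python
from typing import Any, Callable, Dict, Optional


def submit_analysis(
    payload: Dict[str, Any],
    url: str = "http://localhost:8000/api/v1/analyze",
    post: Optional[Callable[..., Any]] = None,
    timeout: float = 120.0,
) -> Dict[str, Any]:
    """POST the analysis request and return the decoded JSON body."""
    if post is None:
        import requests  # imported lazily so a stub can be injected instead
        post = requests.post
    response = post(url, json=payload, timeout=timeout)
    response.raise_for_status()  # surface HTTP 4xx/5xx responses as exceptions
    return response.json()
```

In this notebook the call would look like `api_response = submit_analysis(api_request)`, wrapped in a `try`/`except requests.RequestException` block if the service may be unavailable.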

## Statistical Analysis and Visualization

In [None]:
# Create comprehensive KS-test visualization
dashboard = JitterbugDashboard()

# Prepare data for visualization
def epoch_to_datetime(epoch_series):
    return [datetime.fromtimestamp(t) for t in epoch_series]

# Convert timestamps
raw_times = epoch_to_datetime(raw_df['epoch'])
mins_times = epoch_to_datetime(mins_df['epoch'])

# Create visualization with 4 subplots for comprehensive KS-test analysis
fig, axes = plt.subplots(4, 1, figsize=(18, 16), sharex=True)

# Configure grid for all subplots
for ax in axes:
    ax.grid(True, linestyle='-', color='#bababa', alpha=0.5)
    ax.tick_params(labelsize=12)

# Plot 1: Raw RTT measurements
axes[0].plot(raw_times, raw_df['values'], 
             color='C0', alpha=0.7, linewidth=1.5, label='Raw RTT')
axes[0].set_ylabel('RTT (ms)', fontsize=12)
axes[0].set_title('Network RTT Analysis - Kolmogorov-Smirnov Test Method', fontsize=16, fontweight='bold')
axes[0].legend(loc='upper right', fontsize=12)

# Plot 2: Minimum RTT (baseline)
axes[1].plot(mins_times, mins_df['values'], 
             color='C1', alpha=0.8, linewidth=2, label='Minimum RTT')
axes[1].set_ylabel('Min RTT (ms)', fontsize=12)
axes[1].legend(loc='upper right', fontsize=12)

# Plot 3: Congestion inference results
congestion_times = []
congestion_values = []

for inference in results.inferences:
    start_dt = datetime.fromtimestamp(inference.start_time)
    end_dt = datetime.fromtimestamp(inference.end_time)
    congestion_value = 1.0 if inference.is_congested else 0.0
    
    # Add points for step plot
    congestion_times.extend([start_dt, end_dt])
    congestion_values.extend([congestion_value, congestion_value])

axes[2].plot(congestion_times, congestion_values, 
             color='red', alpha=0.9, linewidth=4, label='KS-Test Detected Congestion')
axes[2].set_ylabel('Congested', fontsize=12)
axes[2].set_ylim(-0.1, 1.1)
axes[2].set_yticks([0, 1])
axes[2].set_yticklabels(['No', 'Yes'])
axes[2].legend(loc='upper right', fontsize=12)

# Plot 4: KS-Test Statistics (if available)
if ks_statistics and len(congestion_periods) > 0:
    # Plot KS statistics for congested periods
    ks_times = [p['start_time'] for p in congestion_periods if p['ks_statistic'] is not None]
    ks_values = [p['ks_statistic'] for p in congestion_periods if p['ks_statistic'] is not None]
    
    if ks_times and ks_values:
        axes[3].scatter(ks_times, ks_values, 
                       color='purple', alpha=0.8, s=60, label='KS Statistics')
        axes[3].axhline(y=np.mean(ks_values), color='purple', linestyle='--', alpha=0.6, 
                       label=f'Mean KS = {np.mean(ks_values):.3f}')
        axes[3].set_ylabel('KS Statistic', fontsize=12)
        axes[3].legend(loc='upper right', fontsize=12)
    else:
        axes[3].text(0.5, 0.5, 'KS Statistics Not Available', 
                    transform=axes[3].transAxes, ha='center', va='center', fontsize=14)
        axes[3].set_ylabel('KS Statistic', fontsize=12)
else:
    axes[3].text(0.5, 0.5, 'No KS Statistics Available', 
                transform=axes[3].transAxes, ha='center', va='center', fontsize=14)
    axes[3].set_ylabel('KS Statistic', fontsize=12)

axes[3].set_xlabel('Time', fontsize=12)

# Format x-axis
plt.xticks(rotation=45)
plt.tight_layout()

print(f"\nKS-Test Visualization Summary:")
print(f"- Raw measurements: {len(raw_df)}")
print(f"- Minimum RTT intervals: {len(mins_df)}")
print(f"- Congestion periods detected: {len(congestion_periods)}")
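n_significant_plotted = sum(1 for p in congestion_periods if p['ks_p_value'] is not None and p['ks_p_value'] < 0.05)
print(f"- Periods with p < 0.05: {n_significant_plotted} of {len(congestion_periods)}")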
print(f"- KS-test provides robust distribution change detection")

plt.show()
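
Congested periods can also be drawn as shaded bands on the RTT panel, which makes it easier to line detections up with the raw signal. Below is a minimal sketch using `Axes.axvspan`; the `shade_periods` helper and the sample data are assumptions for illustration.

```python
from datetime import datetime, timedelta

import matplotlib.pyplot as plt


def shade_periods(ax, periods, color="red", alpha=0.2):
    """Shade each (start, end) datetime pair on the given axes."""
    patches = []
    for start, end in periods:
        patches.append(ax.axvspan(start, end, color=color, alpha=alpha))
    return patches


t0 = datetime(2024, 1, 1)
times = [t0 + timedelta(minutes=15 * i) for i in range(8)]
rtts = [20.0, 20.5, 31.2, 33.8, 32.1, 21.0, 20.3, 20.1]  # made-up values in ms

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(times, rtts, label="Raw RTT")
shade_periods(ax, [(times[2], times[4])])
ax.set_ylabel("RTT (ms)")
ax.legend(loc="upper right")
```

With the notebook's data, the periods could be taken from `congestion_periods` as `(p['start_time'], p['end_time'])` pairs and applied to `axes[0]`.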

## Comparison with Expected Results

In [None]:
# Compare KS-test results with expected outcomes
expected_congested = len(expected_df[expected_df['congestion'] == 1.0])
detected_congested = len(congestion_periods)

print("=== KS-Test Performance Analysis ===")
print(f"Expected congestion periods: {expected_congested}")
print(f"KS-test detected periods: {detected_congested}")
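# Ratio of period counts only; it does not check that the periods overlap in time
if expected_congested:
    print(f"Detected/expected period count: {detected_congested / expected_congested:.1%}")
else:
    print("Detected/expected period count: N/A (no expected congestion periods)")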

# Statistical robustness analysis
valid_ks_periods = [p for p in congestion_periods if p['ks_p_value'] is not None]
significant_periods = [p for p in valid_ks_periods if p['ks_p_value'] < 0.05]

print(f"\nStatistical Robustness:")
print(f"Periods with valid KS tests: {len(valid_ks_periods)}")
print(f"Statistically significant periods: {len(significant_periods)}")
significant_share = f"{len(significant_periods) / len(valid_ks_periods):.1%}" if valid_ks_periods else "N/A"
print(f"Share of statistically significant periods: {significant_share}")

# Duration analysis
expected_durations = []
for _, row in expected_df[expected_df['congestion'] == 1.0].iterrows():
    duration = (row['ends'] - row['starts']) / 3600  # Convert to hours
    expected_durations.append(duration)

detected_durations = [p['duration_hours'] for p in congestion_periods]

print(f"\nDuration Comparison:")
# np.min and np.max raise on empty input, so guard against missing periods
if expected_durations and detected_durations:
    print(f"Expected avg duration: {np.mean(expected_durations):.1f} hours")
    print(f"KS-test avg duration: {np.mean(detected_durations):.1f} hours")
    print(f"Expected range: {np.min(expected_durations):.1f} - {np.max(expected_durations):.1f} hours")
    print(f"KS-test range: {np.min(detected_durations):.1f} - {np.max(detected_durations):.1f} hours")
else:
    print("Not enough periods to compare durations")

# Method comparison summary
print(f"\n=== Kolmogorov-Smirnov Test Method Summary ===")
print(f"✓ Statistical foundation: Non-parametric distribution comparison")
print(f"✓ Significance testing: p-values compared against the 0.05 significance level")
print(f"✓ Robust to outliers: Less sensitive to extreme values")
print(f"✓ General applicability: No network-specific assumptions")
print(f"✓ Quantified evidence: KS statistic and p-value for each tested period")
print(f"✓ API flexibility: Both programmatic and REST interfaces")

if ks_statistics:
    print(f"✓ Average KS statistic: {np.mean(ks_statistics):.4f}")
    print(f"✓ Significant detections (p < 0.05): {len(significant_periods)} of {len(valid_ks_periods)}")

print(f"\n=== Method Characteristics ===")
print(f"KS-Test advantages:")
print(f"- Domain-independent statistical approach")
print(f"- Provides p-values for significance testing")
print(f"- Robust to different data distributions")
print(f"- Well-established statistical foundation")
print(f"\nKS-Test considerations:")
print(f"- Requires sufficient sample sizes")
print(f"- May be less sensitive to network-specific patterns")
print(f"- Computational overhead for statistical tests")
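
Comparing counts of periods does not show whether detected and expected periods actually coincide in time. A time-overlap comparison can be sketched as follows, assuming periods are `(start, end)` epoch pairs that do not overlap within each list; the function names are illustrative.

```python
def overlap_seconds(periods_a, periods_b):
    """Total time covered by both lists of (start, end) intervals."""
    total = 0.0
    for a_start, a_end in periods_a:
        for b_start, b_end in periods_b:
            total += max(0.0, min(a_end, b_end) - max(a_start, b_start))
    return total


def time_precision_recall(detected, expected):
    """Share of detected time that is expected, and of expected time that is detected."""
    shared = overlap_seconds(detected, expected)
    detected_time = sum(end - start for start, end in detected)
    expected_time = sum(end - start for start, end in expected)
    precision = shared / detected_time if detected_time else float("nan")
    recall = shared / expected_time if expected_time else float("nan")
    return precision, recall


# Made-up intervals in epoch seconds
detected = [(0, 3600), (7200, 9000)]
expected = [(1800, 5400), (7200, 10800)]
precision, recall = time_precision_recall(detected, expected)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

With the notebook's data, `detected` could be built from `results.inferences` (`start_time`, `end_time` of congested inferences) and `expected` from the `starts` and `ends` columns of the congested rows in `expected_df`.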

## Algorithm Comparison Demo

In [None]:
# Quick comparison with jitter dispersion method
print("=== Method Comparison: KS-Test vs Jitter Dispersion ===")

# Run jitter dispersion for comparison
jd_config = JitterbugConfig(
    change_point_detection=ChangePointDetectionConfig(
        algorithm="bcp",
        threshold=0.25
    ),
    jitter_analysis=JitterAnalysisConfig(
        method="jitter_dispersion",
        threshold=0.25
    )
)

jd_analyzer = JitterbugAnalyzer(jd_config)
jd_results = jd_analyzer.analyze_from_file(raw_data_path, 'csv')
jd_summary = jd_analyzer.get_summary_statistics(jd_results)

jd_congested = [inf for inf in jd_results.inferences if inf.is_congested]

print(f"\nComparison Results:")
print(f"{'Method':<20} {'Periods':<8} {'Ratio':<8} {'Avg Conf':<10}")
print(f"{'-'*50}")
print(f"{'KS-Test':<20} {detected_congested:<8} {summary['congestion_ratio']:<8.1%} {summary['average_confidence']:<10.3f}")
print(f"{'Jitter Dispersion':<20} {len(jd_congested):<8} {jd_summary['congestion_ratio']:<8.1%} {jd_summary['average_confidence']:<10.3f}")
print(f"{'Expected':<20} {expected_congested:<8} {'N/A':<8} {'N/A':<10}")

print(f"\nMethod Selection Guidelines:")
print(f"Choose KS-Test when:")
print(f"- You need statistical rigor and p-values")
print(f"- Working with diverse network conditions")
print(f"- Robustness to outliers is important")
print(f"- Domain-independent analysis is preferred")
print(f"\nChoose Jitter Dispersion when:")
print(f"- Network-specific patterns are important")
print(f"- Faster computation is required")
print(f"- Domain expertise in networking is available")
print(f"- Traditional network analysis approach is preferred")

## Conclusion

This notebook demonstrated the Kolmogorov-Smirnov test method in Jitterbug v2.0 through the programmatic API and a simulated REST API call.

### Key Features of KS-Test Method

1. **Statistical Rigor**: Provides p-values and significance testing
2. **Non-Parametric**: Makes no assumption about the shape of the RTT distribution
3. **Robust Detection**: Less sensitive to isolated extreme values, because the statistic depends on the empirical CDFs rather than on raw magnitudes
4. **Quantified Evidence**: Each test reports a KS statistic and a p-value

### API Flexibility

Both programmatic and REST API approaches provide:
- Configurable significance levels
- Statistical test results with p-values
- KS statistic values for analysis
- Integration with multiple change point algorithms

### Performance Characteristics

- **Accuracy**: Assessed by comparing detected periods against the expected results
- **Reliability**: Measured as the share of detections with p < 0.05
- **Robustness**: The non-parametric test needs no retuning for a particular distribution shape, though this notebook evaluates only one dataset
- **Interpretability**: Clear statistical interpretation

The KS-test method provides a statistically grounded approach to congestion detection, which makes it well suited to scenarios that require rigorous analysis and quantified confidence in results. Combined with Jitterbug v2.0's programmatic and REST interfaces, it is straightforward to use in both exploratory and research settings.