# Full Net Tracking Analysis: Multi-System Performance Assessment

This notebook performs a comprehensive analysis of synchronized comparison data from three net-tracking systems (FFT, Sonar, and DVL) using the `comparison_analysis` utility module.

**What this does:**
1. Loads and optionally smooths comparison data
2. Detects stable baseline segments
3. Computes noise, bias, and outlier statistics
4. Generates visualizations and summary reports

**Quick Start:**
1. Set `TARGET_BAG` to your bag ID
2. Run all cells
3. Check the outputs in the `comparison_data` and `plots` directories

#### **Generate a summary CSV for all available comparison data:**
Using the defaults: `python scripts/generate_bag_summary.py`

## Configuration

In [1]:
from pathlib import Path
from utils.comparison_analysis import (
    load_and_prepare_data,
    compute_pairwise_differences,
    detect_stable_segments,
    compute_baseline_statistics,
    detect_outliers,
    analyze_failure_correlation,
    create_visualizations,
    generate_summary_report,
    print_multi_bag_summary
)

# Configuration
TARGET_BAG = "2024-08-20_13-57-42"
COMPARISON_DATA_DIR = Path("/Volumes/LaCie/SOLAQUA/comparison_data")
PLOTS_DIR = Path("/Volumes/LaCie/SOLAQUA/exports/plots")
PLOTS_DIR.mkdir(parents=True, exist_ok=True)

# Analysis parameters
SMOOTHING_ALPHA = None  # None = no smoothing, 0-1 = smoothing factor
ROLLING_WINDOW_SEC = 3.0
SIGMA_THRESH = 0.15
DELTA_THRESH = 0.15
MIN_SEGMENT_SEC = 1.0
OUTLIER_K = 3.5

print(f"Target: {TARGET_BAG}")
print(f"Smoothing: {'Disabled' if SMOOTHING_ALPHA is None else f'α={SMOOTHING_ALPHA:.2f}'}")
print(f"Stability thresholds: σ={SIGMA_THRESH}m, Δ={DELTA_THRESH}m")

   Missing TRACKING_CONFIG keys: {'corridor_both_directions', 'use_corridor_splitting', 'corridor_widen'}
Target: 2024-08-20_13-57-42
Smoothing: Disabled
Stability thresholds: σ=0.15m, Δ=0.15m
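
`SMOOTHING_ALPHA` is described only as a smoothing factor between 0 and 1. One common reading, assumed here and not confirmed by `comparison_analysis`, is exponential smoothing, where each output is `alpha * x[t] + (1 - alpha) * y[t-1]`. A minimal sketch with pandas (the helper name `smooth_series` is hypothetical):

```python
import pandas as pd

def smooth_series(values, alpha=None):
    """Return the input unchanged when alpha is None, else an exponential moving average."""
    series = pd.Series(values, dtype=float)
    if alpha is None:
        return series
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    # adjust=False selects the recursive form y[t] = alpha*x[t] + (1-alpha)*y[t-1]
    return series.ewm(alpha=alpha, adjust=False).mean()

raw = [1.0, 2.0, 3.0]
smoothed = smooth_series(raw, alpha=0.5)
unsmoothed = smooth_series(raw, alpha=None)
```

Under this reading, a smaller `alpha` smooths more heavily and `alpha=1` reproduces the input.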


## Load and Process Data

In [2]:
# Load comparison data
data_path = COMPARISON_DATA_DIR / f"{TARGET_BAG}_raw_comparison.csv"
df, sampling_rate, available_systems = load_and_prepare_data(data_path, SMOOTHING_ALPHA)

# Compute pairwise differences
df = compute_pairwise_differences(df)

Loading data from: 2024-08-20_13-57-42_raw_comparison.csv

Data loaded: 2744 samples
Time range: 2024-08-20 11:57:45.106798649+00:00 to 2024-08-20 11:59:04.976633549+00:00
Duration: 79.9 seconds
Median sampling interval: 0.025 s (39.2 Hz)
Available systems: FFT, Sonar, DVL
Pairwise differences computed:
  diff_fft_nav: 2704 valid samples, mean=-0.133, std=1.176
  diff_sonar_nav: 2704 valid samples, mean=-0.021, std=0.169
  diff_fft_sonar: 2744 valid samples, mean=-0.110, std=1.148
  diff_pitch_fft_nav: 2704 valid samples, mean=-1.633, std=21.451
  diff_pitch_sonar_nav: 2704 valid samples, mean=-1.389, std=8.883
  diff_pitch_fft_sonar: 2744 valid samples, mean=-0.085, std=21.371
  diff_x_fft_nav: 2704 valid samples, mean=-0.145, std=1.167
  diff_x_sonar_nav: 2704 valid samples, mean=-0.022, std=0.185
  diff_x_fft_sonar: 2744 valid samples, mean=-0.120, std=1.136
  diff_y_fft_nav: 2704 valid samples, mean=-0.003, std=0.430
  diff_y_sonar_nav: 2704 valid samples, mean=-0.037, std=0.212
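
The column names above suggest that each `diff_<a>_<b>` column is the sample-wise difference `a - b`, with `nav` apparently referring to the DVL-based estimate. The sketch below illustrates that idea. The helper name and the input column names are hypothetical, and the sign convention is an assumption. It also shows why pairs involving a source with missing samples report fewer valid rows.

```python
from itertools import combinations

import numpy as np
import pandas as pd

def pairwise_differences(df, columns):
    """Add diff_<a>_<b> = a - b for every pair of systems.

    `columns` maps a short system name to a (hypothetical) column name.
    """
    out = df.copy()
    for a, b in combinations(columns, 2):
        out[f"diff_{a}_{b}"] = out[columns[a]] - out[columns[b]]
    return out

demo = pd.DataFrame({
    "fft_distance_m": [1.0, 1.2, 1.1],
    "sonar_distance_m": [1.1, 1.1, 1.0],
    "nav_distance_m": [1.0, np.nan, 1.2],  # one missing nav sample
})
cols = {"fft": "fft_distance_m", "sonar": "sonar_distance_m", "nav": "nav_distance_m"}
result = pairwise_differences(demo, cols)

# NaN propagates through subtraction, so nav pairs have fewer valid samples.
valid_counts = result.filter(like="diff_").notna().sum()
```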
  

## Detect Stable Baseline Segments

In [3]:
df, segments, window_samples = detect_stable_segments(
    df, sampling_rate,
    rolling_window_sec=ROLLING_WINDOW_SEC,
    sigma_thresh=SIGMA_THRESH,
    delta_thresh=DELTA_THRESH,
    min_segment_sec=MIN_SEGMENT_SEC
)

Rolling window: 117 samples (3.0 s)

Computing rolling statistics...
Applying stability criteria...
Stable samples: 1017 / 2744 (37.1%)

Identifying stable baseline segments...
Found 3 stable baseline segments (≥ 1.0 s):

  Segment 1: 11:57:58 - 11:58:09 (11.6 s, 405 samples)
  Segment 2: 11:58:22 - 11:58:26 (4.2 s, 149 samples)
  Segment 3: 11:58:50 - 11:59:02 (12.6 s, 421 samples)
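
The log above reports a 117-sample window for 3.0 s, which matches truncating 3.0 s × 39.2 Hz, although the module may derive it differently. The sketch below shows one plausible form of the stability test, under stated assumptions: a sample counts as stable when the rolling standard deviation is below `sigma_thresh` and the rolling peak-to-peak range is below `delta_thresh`, and a segment is a run of stable samples lasting at least `min_segment_sec`. The function name and the exact meaning of Δ are assumptions, not the behavior of `detect_stable_segments`.

```python
import numpy as np
import pandas as pd

def find_stable_segments(values, sampling_rate, rolling_window_sec=3.0,
                         sigma_thresh=0.15, delta_thresh=0.15, min_segment_sec=1.0):
    """Return (stable_mask, segments); segments are (start_idx, end_idx), inclusive."""
    series = pd.Series(values, dtype=float)
    window = max(int(rolling_window_sec * sampling_rate), 2)
    roll = series.rolling(window, center=True, min_periods=window)
    # Assumed criteria: low rolling std AND small rolling peak-to-peak range.
    # Incomplete windows at the edges yield NaN, which compares as not stable.
    stable = (roll.std() < sigma_thresh) & ((roll.max() - roll.min()) < delta_thresh)
    # Give every run of identical mask values its own id.
    run_id = stable.astype(int).diff().ne(0).cumsum()
    min_samples = int(np.ceil(min_segment_sec * sampling_rate))
    segments = []
    for _, run in stable.groupby(run_id):
        if run.iloc[0] and len(run) >= min_samples:
            segments.append((int(run.index[0]), int(run.index[-1])))
    return stable, segments

# Synthetic signal: flat at 2 m, ramp to 5 m, flat at 5 m, sampled at 10 Hz.
rate = 10.0
signal = np.concatenate([np.full(100, 2.0), np.linspace(2.0, 5.0, 50), np.full(100, 5.0)])
mask, segs = find_stable_segments(signal, rate, rolling_window_sec=1.0)
```

On the synthetic signal, the two flat sections come back as separate segments and the ramp between them is rejected.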


## Baseline Statistics
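
The cell below reports a standard deviation (σ) and a median absolute deviation (MAD) per system over the stable samples. As a reference for how these two noise measures differ, here is a minimal sketch. The helper name is hypothetical, and the 1.4826 scale factor (which makes MAD comparable to σ for Gaussian data) is an assumption, since the scaling used by `compute_baseline_statistics` is not shown.

```python
import numpy as np

def std_and_mad(values, scale=1.4826):
    """Return (sample std, scaled MAD) of the non-NaN values."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    sigma = np.std(x, ddof=1)
    # MAD: median of absolute deviations from the median, robust to outliers.
    mad = scale * np.median(np.abs(x - np.median(x)))
    return sigma, mad

# A single large spike inflates sigma but barely moves the MAD.
sigma, mad = std_and_mad([1.0, 2.0, 3.0, 4.0, 100.0, np.nan])
```

Comparing the two values per system is a quick check on whether a noise figure is driven by a few large excursions or by the bulk of the samples.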

In [4]:
baseline_stats = compute_baseline_statistics(df)

=== BASELINE STATISTICS (1017 samples) ===

1. PER-METHOD NOISE - DISTANCE
  FFT: σ = 0.1124 m, MAD = 0.1410 m
  Sonar: σ = 0.0975 m, MAD = 0.0759 m
  DVL: σ = 0.1115 m, MAD = 0.1038 m

2. PER-METHOD NOISE - PITCH
  FFT: σ = 6.7911°, MAD = 6.6285°
  Sonar: σ = 8.9854°, MAD = 9.2353°
  DVL: σ = 8.3640°, MAD = 10.3814°

3. PER-METHOD NOISE - X POSITION
  FFT: σ = 0.1142 m, MAD = 0.1380 m
  Sonar: σ = 0.1001 m, MAD = 0.0862 m
  DVL: σ = 0.1106 m, MAD = 0.1185 m

4. PER-METHOD NOISE - Y POSITION
  FFT: σ = 0.1701 m, MAD = 0.1673 m
  Sonar: σ = 0.2134 m, MAD = 0.2416 m
  DVL: σ = 0.2088 m, MAD = 0.2640 m

5. PAIRWISE BIASES - DISTANCE
  FFT vs DVL: Bias = +0.0343 m, Std = 0.0721 m
  Sonar vs DVL: Bias = -0.0460 m, Std = 0.0656 m
  FFT vs Sonar: Bias = +0.0803 m, Std = 0.0811 m

6. PAIRWISE BIASES - PITCH
  FFT vs DVL: Bias = +5.4050 °, Std = 10.5018 °
  Sonar vs DVL: Bias = -0.9884 °, Std = 4.5380 °
  FFT vs Sonar: Bias = +6.3934 °, Std = 10.7871 °

7. PAIRWISE BIASES - X POSITION
  FFT vs 

## Outlier Detection
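
The outlier threshold is configured as `OUTLIER_K` × MAD, and `detect_outliers` receives the rolling window size. One common scheme that fits those inputs is a Hampel-style test: compare each sample with a rolling median and flag it when the deviation exceeds `k` times the MAD of the deviations. The sketch below assumes that scheme with a single unscaled MAD. The function name is hypothetical and the real implementation may differ, for instance by using a rolling or scaled MAD.

```python
import numpy as np
import pandas as pd

def flag_outliers(values, window_samples, outlier_k=3.5):
    """Flag samples whose deviation from a rolling median exceeds k * MAD."""
    series = pd.Series(values, dtype=float)
    local_median = series.rolling(window_samples, center=True, min_periods=1).median()
    deviation = series - local_median
    # Assumed: one unscaled MAD computed over all deviations.
    mad = np.nanmedian(np.abs(deviation - np.nanmedian(deviation)))
    flags = deviation.abs() > outlier_k * mad
    return flags, mad

# Roughly 1 cm of jitter around 2 m, with one 3 m spike at index 30.
pattern = [2.00, 2.01, 1.99, 2.00, 2.02, 1.98]
distance = np.tile(pattern, 10)
distance[30] = 5.0
flags, mad = flag_outliers(distance, window_samples=11)
```

Here only the spike is flagged, because the normal jitter stays well inside 3.5 × MAD.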

In [5]:
df, outlier_stats = detect_outliers(df, window_samples, outlier_k=OUTLIER_K)
correlation_stats = analyze_failure_correlation(df)

=== OUTLIER DETECTION ===

FFT: 518 outliers (18.88%), MAD = 0.0598 m
Sonar: 157 outliers (5.72%), MAD = 0.0211 m
DVL: 345 outliers (12.57%), MAD = 0.0297 m

=== OUTLIER CHARACTERISTICS ===

FFT:
  Magnitude:
    Mean: 1.3346 m, Median: 1.0244 m, Max: 19.5926 m
  Magnitude Categories:
    Small (≤0.4784m): 171 (33.0%)
    Medium (0.4784-1.2642m): 176 (34.0%)
    Large (>1.2642m): 171 (33.0%)
  Clustering:
    Total clusters: 33
    Avg cluster size: 16.7 frames
    Max cluster size: 120 frames
    Isolated outliers: 0 (0.0%)
    Multi-frame clusters: 33 (100.0%)

SONAR:
  Magnitude:
    Mean: 0.1032 m, Median: 0.1065 m, Max: 0.1430 m
  Magnitude Categories:
    Small (≤0.0917m): 52 (33.1%)
    Medium (0.0917-0.1128m): 53 (33.8%)
    Large (>0.1128m): 52 (33.1%)
  Clustering:
    Total clusters: 13
    Avg cluster size: 13.1 frames
    Max cluster size: 65 frames
    Isolated outliers: 0 (0.0%)
    Multi-frame clusters: 13 (100.0%)

DVL:
  Magnitude:
    Mean: 0.2162 m, Median: 0.1500 m

## Generate Visualizations

In [6]:
print("\n=== GENERATING VISUALIZATIONS ===\n")

figs = create_visualizations(
    df, segments, TARGET_BAG, PLOTS_DIR,
    sigma_thresh=SIGMA_THRESH,
    outlier_k=OUTLIER_K
)

# Display plots
for name, fig in figs.items():
    print(f"\nDisplaying {name}...")
    fig.show()

print(f"\n✓ All plots saved to: {PLOTS_DIR}")


=== GENERATING VISUALIZATIONS ===

Discarding nonzero nanoseconds in conversion.

Saved: 2024-08-20_13-57-42_timeseries.html
Saved: 2024-08-20_13-57-42_stability.html
Saved: 2024-08-20_13-57-42_outliers.html

Displaying timeseries...

Displaying stability...

Displaying outliers...

✓ All plots saved to: /Volumes/LaCie/SOLAQUA/exports/plots


## Generate Summary Report

In [7]:
config = {
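    # Compare against None so that alpha = 0 is not reported as 'Disabled' (matches the Configuration cell)
    'Smoothing': f'α={SMOOTHING_ALPHA:.2f}' if SMOOTHING_ALPHA is not None else 'Disabled',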
    'Rolling Window': f'{ROLLING_WINDOW_SEC} s ({window_samples} samples)',
    'Stability σ Threshold': f'{SIGMA_THRESH} m',
    'Stability Δ Threshold': f'{DELTA_THRESH} m',
    'Min Segment Length': f'{MIN_SEGMENT_SEC} s',
    'Outlier Threshold': f'{OUTLIER_K} × MAD'
}

report_path = COMPARISON_DATA_DIR / f"{TARGET_BAG}_analysis_summary.md"
generate_summary_report(
    df, segments, baseline_stats, available_systems,
    TARGET_BAG, report_path, config, outlier_stats
)

print("\n✓ Analysis complete!")
print(f"  Plots: {PLOTS_DIR}")
print(f"  Report: {report_path}")


=== GENERATING SUMMARY REPORT ===

✓ Summary report saved: 2024-08-20_13-57-42_analysis_summary.md
  Location: /Volumes/LaCie/SOLAQUA/comparison_data/2024-08-20_13-57-42_analysis_summary.md

✓ Analysis complete!
  Plots: /Volumes/LaCie/SOLAQUA/exports/plots
  Report: /Volumes/LaCie/SOLAQUA/comparison_data/2024-08-20_13-57-42_analysis_summary.md


## Multi-Bag Summary Analysis

Load and analyze the comprehensive summary CSV containing all processed bags.

In [10]:
# Load and display multi-bag summary statistics
summary_csv_path = COMPARISON_DATA_DIR / "bag_summary.csv"
summary_df = print_multi_bag_summary(summary_csv_path)

# Optional: further analysis on summary_df
if summary_df is not None:
    print("\n📊 Additional insights:")
    print("  DataFrame available for custom analysis")
    print("  Example: summary_df[summary_df['baseline_percentage'] > 50]")
    
    # Check if outlier magnitude columns exist
    has_magnitude = any('outlier_mean_magnitude' in col for col in summary_df.columns)
    has_clustering = any('outlier_cluster_count' in col for col in summary_df.columns)
    
    if not has_magnitude or not has_clustering:
        print("\n⚠️  NOTE: Outlier magnitude and clustering metrics not found.")
        print("  The bag_summary.csv needs to be regenerated with the updated analysis.")
        print("  Run the cell below or: python scripts/generate_bag_summary.py")
        print("  Then re-run this cell to see outlier characteristics.")

Loading multi-bag summary from: bag_summary.csv

MULTI-BAG SUMMARY STATISTICS

📊 DATASET OVERVIEW
  Total bags analyzed: 46
  Total recording time: 52.7 hours
  Total samples: 54,347.0

🔧 SYSTEM AVAILABILITY
  FFT: 23/46 bags (50.0%)
  Sonar: 23/46 bags (50.0%)
  DVL: 23/46 bags (50.0%)

📈 BASELINE SEGMENTS (Stable Data)
  Avg baseline percentage: 9.5%
  Median baseline percentage: 1.3%
  Avg segments per bag: 2.6
  Total baseline time: 0.1 hours

📏 DISTANCE MEASUREMENT NOISE (Baseline STD)
  FFT: mean=0.1115m, median=0.0951m, min=0.0255m, max=0.2357m
  SONAR: mean=0.0983m, median=0.0709m, min=0.0178m, max=0.2221m
  DVL: mean=0.1101m, median=0.0949m, min=0.0130m, max=0.2500m

📐 PITCH MEASUREMENT NOISE (Baseline STD)
  FFT: mean=7.7254°, median=7.0645°, min=4.1893°, max=15.9795°
  SONAR: mean=3.7445°, median=3.7934°, min=0.8143°, max=8.9854°
  DVL: mean=4.5033°, median=4.3645°, min=1.7703°, max=8.3640°

⚖️  PAIRWISE BIASES (Systematic Differences)
  FFT vs DVL: mean=+0.0369m, median=+0.

## Regenerate Multi-Bag Summary (Optional)

Run this cell to regenerate the comprehensive bag summary CSV with all updated metrics, including outlier characteristics.
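
The `._`-prefixed names that fail in the log below look like AppleDouble sidecar files, which macOS writes next to real files on some external volumes. They are binary, which would explain the UTF-8 decode errors. How `generate_multi_bag_summary_csv` enumerates its inputs is not shown, so the following is only a sketch of one way to skip such files. The helper name is hypothetical, and the `*_raw_comparison.csv` pattern is taken from the loading cell earlier in this notebook.

```python
import tempfile
from pathlib import Path

def list_comparison_files(directory, pattern="*_raw_comparison.csv"):
    """Return matching CSVs, skipping hidden files such as macOS '._' sidecars."""
    return sorted(
        path for path in Path(directory).glob(pattern)
        if not path.name.startswith(".")
    )

# Demo in a temporary directory: one real file and one sidecar file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("2024-08-20_13-57-42_raw_comparison.csv",
                 "._2024-08-20_13-57-42_raw_comparison.csv"):
        (Path(tmp) / name).write_bytes(b"demo")
    names = [path.name for path in list_comparison_files(tmp)]
```

Filtering at enumeration time keeps the sidecars out of both the processing loop and the reported bag count.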

In [9]:
# Regenerate the multi-bag summary CSV with all metrics
from utils.comparison_analysis import generate_multi_bag_summary_csv

print("Regenerating bag_summary.csv with all metrics...")
print("This may take a few minutes depending on the number of bags.\n")

output_path = generate_multi_bag_summary_csv(
    comparison_data_dir=COMPARISON_DATA_DIR,
    output_path=COMPARISON_DATA_DIR / "bag_summary.csv"
)

if output_path:
    print("\n" + "="*70)
    print("✓ Summary CSV regenerated successfully!")
    print(f"  Location: {output_path}")
    print("\nThe summary now includes:")
    print("  ✓ Outlier magnitude metrics (mean, median, max)")
    print("  ✓ Outlier clustering analysis (cluster count, sizes)")
    print("  ✓ Isolated vs. multi-frame outlier statistics")
    print("\nRe-run the 'Multi-Bag Summary Analysis' cell above to see the updated statistics.")
else:
    print("\n✗ Failed to regenerate summary CSV")

Regenerating bag_summary.csv with all metrics...
This may take a few minutes depending on the number of bags.


=== GENERATING MULTI-BAG SUMMARY CSV ===

Found 46 bags to process

Processing: ._2024-08-20_13-39-34
  ✗ Error processing ._2024-08-20_13-39-34: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte
Processing: ._2024-08-20_13-40-35
  ✗ Error processing ._2024-08-20_13-40-35: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte
Processing: ._2024-08-20_13-42-51
  ✗ Error processing ._2024-08-20_13-42-51: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte
Processing: ._2024-08-20_13-55-34
  ✗ Error processing ._2024-08-20_13-55-34: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte
Processing: ._2024-08-20_13-57-42
  ✗ Error processing ._2024-08-20_13-57-42: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte
Processing: ._2024-08-20_14-16-05
  ✗ Error processing ._2024-08-20_14