# Full Net Tracking Analysis: Multi-System Performance Assessment

This notebook analyzes synchronized comparison data from three net-tracking systems (FFT, Sonar, and DVL) using the `comparison_analysis` utility module.

**What this does:**
1. Loads and optionally smooths comparison data
2. Detects stable baseline segments
3. Computes noise, bias, and outlier statistics
4. Generates visualizations and summary reports

**Quick Start:**
1. Set `TARGET_BAG` to your bag ID
2. Run all cells
3. Check the outputs in `COMPARISON_DATA_DIR` (summary report) and `PLOTS_DIR` (HTML plots)

#### **Generate a summary CSV for all available comparison data:**
Run the script with its default parameters:
`python scripts/generate_bag_summary.py`

## Configuration

In [10]:
from pathlib import Path
from utils.comparison_analysis import (
    load_and_prepare_data,
    compute_pairwise_differences,
    detect_stable_segments,
    compute_baseline_statistics,
    detect_outliers,
    analyze_failure_correlation,
    create_visualizations,
    generate_summary_report,
    print_multi_bag_summary
)

# Configuration
TARGET_BAG = "2024-08-20_13-57-42"
COMPARISON_DATA_DIR = Path("/Volumes/LaCie/SOLAQUA/comparison_data")
PLOTS_DIR = Path("/Volumes/LaCie/SOLAQUA/exports/plots")
PLOTS_DIR.mkdir(parents=True, exist_ok=True)

# Analysis parameters
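SMOOTHING_ALPHA = 0.75      # None = no smoothing, 0-1 = smoothing factor
ROLLING_WINDOW_SEC = 2.0    # window for rolling statistics (s)
SIGMA_THRESH = 0.25         # stability σ threshold (m); increase to allow more variance (e.g., 0.4 → 0.6)
DELTA_THRESH = 0.25         # stability Δ threshold (m); increase to tolerate larger inter-system drift
MIN_SEGMENT_SEC = 1.0       # minimum stable segment length (s); lower to accept shorter periods
OUTLIER_K = 3.5             # outlier threshold in multiples of MAD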

print(f"Target: {TARGET_BAG}")
print(f"Smoothing: {'Disabled' if SMOOTHING_ALPHA is None else f'α={SMOOTHING_ALPHA:.2f}'}")
print(f"Stability thresholds: σ={SIGMA_THRESH}m, Δ={DELTA_THRESH}m")

Target: 2024-08-20_13-57-42
Smoothing: α=0.75
Stability thresholds: σ=0.25m, Δ=0.25m
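`SMOOTHING_ALPHA` is only described as a smoothing factor between 0 and 1. A minimal sketch of standard exponential smoothing, assuming `load_and_prepare_data` uses the common convention in which α weights the newest sample (so α=0.75 would mean fairly light smoothing). The module's actual convention and its handling of gaps are not shown here, and `exponential_smoothing` is a hypothetical helper.

```python
import numpy as np


def exponential_smoothing(values, alpha):
    """Exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1].

    Assumed convention (hypothetical helper, not the module's API): alpha
    weights the newest sample, the first valid sample seeds the state, and
    NaN inputs stay NaN in the output without resetting the state.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    x = np.asarray(values, dtype=float)
    out = np.full_like(x, np.nan)
    state = np.nan
    for i, v in enumerate(x):
        if np.isnan(v):
            continue  # keep gaps as gaps; the state carries over
        state = v if np.isnan(state) else alpha * v + (1.0 - alpha) * state
        out[i] = state
    return out
```

For the input `[0, 1, 1]` with α=0.75 this yields 0, 0.75, 0.9375. For data without gaps, pandas' `Series.ewm(alpha=..., adjust=False).mean()` implements the same recursion.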


## Load and Process Data

In [11]:
# Load comparison data
data_path = COMPARISON_DATA_DIR / f"{TARGET_BAG}_raw_comparison.csv"
df, sampling_rate, available_systems = load_and_prepare_data(data_path, SMOOTHING_ALPHA)

# Compute pairwise differences
df = compute_pairwise_differences(df)

Loading data from: 2024-08-20_13-57-42_raw_comparison.csv

Applying exponential smoothing (α=0.75)...
  ✓ Smoothed 12 columns

Data loaded: 2744 samples
Time range: 2024-08-20 11:57:45.106798649+00:00 to 2024-08-20 11:59:04.976633549+00:00
Duration: 79.9 seconds
Median sampling interval: 0.025 s (39.2 Hz)
Available systems: FFT, Sonar, DVL
Pairwise differences computed:
  diff_fft_nav: 2715 valid samples, mean=-0.132, std=1.123
  diff_sonar_nav: 2715 valid samples, mean=-0.021, std=0.167
  diff_fft_sonar: 2744 valid samples, mean=-0.110, std=1.097
  diff_pitch_fft_nav: 2715 valid samples, mean=-1.628, std=20.708
  diff_pitch_sonar_nav: 2715 valid samples, mean=-1.391, std=8.712
  diff_pitch_fft_sonar: 2744 valid samples, mean=-0.084, std=20.741
  diff_x_fft_nav: 2715 valid samples, mean=-0.144, std=1.115
  diff_x_sonar_nav: 2715 valid samples, mean=-0.022, std=0.183
  diff_x_fft_sonar: 2744 valid samples, mean=-0.120, std=1.086
  diff_y_fft_nav: 2715 valid samples, mean=-0.003, std=0.4
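The log suggests each `diff_<a>_<b>` column is system `a` minus system `b`, with `nav` apparently standing for the DVL-based navigation estimate, and that the `pitch`, `x`, and `y` variants follow the same pattern. Valid-sample counts differ between pairs (2715 vs 2744) because a difference is undefined wherever either input is missing. A minimal sketch of that computation in pandas; the input column names such as `fft_distance` are hypothetical, and the sign convention is inferred from the column names only.

```python
import pandas as pd

# Pair order follows the log above; "nav" is assumed to be the DVL/navigation reference.
PAIRS = [("fft", "nav"), ("sonar", "nav"), ("fft", "sonar")]


def pairwise_differences(df, quantity="distance", pairs=PAIRS):
    """Add diff columns (a minus b) and return (df_copy, per-column stats).

    Input columns are assumed to be named f"{system}_{quantity}" (hypothetical).
    Output names mirror the log: diff_fft_nav for distance, diff_pitch_fft_nav
    for other quantities.
    """
    out = df.copy()
    stats = {}
    prefix = "diff" if quantity == "distance" else f"diff_{quantity}"
    for a, b in pairs:
        name = f"{prefix}_{a}_{b}"
        # NaN on either side propagates, so only samples where both exist count.
        out[name] = out[f"{a}_{quantity}"] - out[f"{b}_{quantity}"]
        valid = out[name].dropna()
        stats[name] = {
            "valid": int(valid.size),
            "mean": float(valid.mean()),
            "std": float(valid.std()),
        }
    return out, stats
```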

## Detect Stable Baseline Segments

In [12]:
df, segments, window_samples = detect_stable_segments(
    df, sampling_rate,
    rolling_window_sec=ROLLING_WINDOW_SEC,
    sigma_thresh=SIGMA_THRESH,
    delta_thresh=DELTA_THRESH,
    min_segment_sec=MIN_SEGMENT_SEC
)

Rolling window: 78 samples (2.0 s)

Computing rolling statistics...
Applying stability criteria...
Stable samples: 610 / 2744 (22.2%)

Identifying stable baseline segments...
Found 5 stable baseline segments (≥ 1.0 s):

  Segment 1: 11:58:09 - 11:58:12 (2.8 s, 97 samples)
  Segment 2: 11:58:21 - 11:58:22 (1.7 s, 60 samples)
  Segment 3: 11:58:23 - 11:58:26 (2.5 s, 88 samples)
  Segment 4: 11:58:50 - 11:58:56 (5.9 s, 202 samples)
  Segment 5: 11:58:58 - 11:59:03 (5.0 s, 168 samples)
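The log converts the 2.0 s window to 78 samples at 39.2 Hz (2.0 × 39.2 = 78.4 → 78) and keeps only stable runs of at least `MIN_SEGMENT_SEC`. A minimal sketch of how such a detector could work, assuming the σ criterion applies to each system's rolling standard deviation and the Δ criterion to the absolute rolling mean of each pairwise difference, as the parameter comments suggest. The function names and this exact combination of criteria are assumptions, not the confirmed logic of `detect_stable_segments`.

```python
import numpy as np
import pandas as pd


def window_in_samples(window_sec, sampling_rate_hz):
    """Convert a window length in seconds to a sample count (at least 1)."""
    return max(1, int(round(window_sec * sampling_rate_hz)))


def stability_mask(df, system_cols, diff_cols, window, sigma_thresh, delta_thresh):
    """True where every system is quiet (rolling std < sigma_thresh) and every
    pairwise difference has a small rolling mean (|mean| < delta_thresh).
    Incomplete windows or windows containing NaN yield NaN, which compares False."""
    ok = pd.Series(True, index=df.index)
    for col in system_cols:
        ok &= df[col].rolling(window, center=True, min_periods=window).std() < sigma_thresh
    for col in diff_cols:
        ok &= df[col].rolling(window, center=True, min_periods=window).mean().abs() < delta_thresh
    return ok


def find_stable_runs(mask, min_samples):
    """Return (start, stop) index pairs (stop exclusive) of True runs
    with length >= min_samples."""
    m = np.asarray(mask, dtype=bool).astype(np.int8)
    edges = np.diff(np.concatenate(([0], m, [0])))
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return [(int(s), int(e)) for s, e in zip(starts, stops) if e - s >= min_samples]
```

With a run-length filter like this, `min_samples` would be `window_in_samples(MIN_SEGMENT_SEC, sampling_rate)`.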


## Baseline Statistics

In [13]:
baseline_stats = compute_baseline_statistics(df)

=== BASELINE STATISTICS (610 samples) ===

1. PER-METHOD NOISE - DISTANCE
  FFT: σ = 0.1297 m, MAD = 0.1326 m
  Sonar: σ = 0.1198 m, MAD = 0.1347 m
  DVL: σ = 0.1307 m, MAD = 0.0889 m

2. PER-METHOD NOISE - PITCH
  FFT: σ = 6.6722°, MAD = 6.0851°
  Sonar: σ = 5.3350°, MAD = 3.1061°
  DVL: σ = 6.1546°, MAD = 7.5401°

3. PER-METHOD NOISE - X POSITION
  FFT: σ = 0.1302 m, MAD = 0.1210 m
  Sonar: σ = 0.1211 m, MAD = 0.1277 m
  DVL: σ = 0.1329 m, MAD = 0.0890 m

4. PER-METHOD NOISE - Y POSITION
  FFT: σ = 0.1679 m, MAD = 0.1611 m
  Sonar: σ = 0.1263 m, MAD = 0.0816 m
  DVL: σ = 0.1519 m, MAD = 0.1855 m

5. PAIRWISE BIASES - DISTANCE
  FFT vs DVL: Bias = +0.0311 m, Std = 0.0981 m
  Sonar vs DVL: Bias = -0.0637 m, Std = 0.0899 m
  FFT vs Sonar: Bias = +0.0947 m, Std = 0.0794 m

6. PAIRWISE BIASES - PITCH
  FFT vs DVL: Bias = +1.3891 °, Std = 7.9303 °
  Sonar vs DVL: Bias = +0.3609 °, Std = 5.9444 °
  FFT vs Sonar: Bias = +1.0282 °, Std = 7.1647 °

7. PAIRWISE BIASES - X POSITION
  FFT vs DVL:
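The baseline section reports, per system, a standard deviation σ and a median absolute deviation (MAD), and for each pair a bias (mean difference) together with the standard deviation of that difference. A minimal sketch of these estimators on baseline samples; the helper names are hypothetical. Whether the module scales the MAD by 1.4826 (which makes it comparable to σ for Gaussian noise) or removes level changes between segments first is not shown in the log, so the sketch exposes the scale as a parameter and assumes the input is already restricted to baseline samples.

```python
import numpy as np


def noise_stats(values, mad_scale=1.0):
    """Return (sigma, mad) of the non-NaN samples.

    sigma is the sample standard deviation (ddof=1); mad is
    median(|x - median(x)|) multiplied by mad_scale. Use mad_scale=1.4826
    for a Gaussian-consistent estimate of sigma.
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    sigma = float(np.std(x, ddof=1))
    mad = float(np.median(np.abs(x - np.median(x)))) * mad_scale
    return sigma, mad


def pairwise_bias(a, b):
    """Bias = mean(a - b) and spread = std(a - b) over samples where both exist."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    d = d[~np.isnan(d)]
    return float(d.mean()), float(np.std(d, ddof=1))
```

The input `[1, 2, 3, 4, 100]` shows why both noise measures are useful: the single large value inflates σ to about 43.6, while the MAD stays at 1.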

## Outlier Detection

In [14]:
df, outlier_stats = detect_outliers(df, window_samples, outlier_k=OUTLIER_K)
correlation_stats = analyze_failure_correlation(df)

=== OUTLIER DETECTION ===

FFT: 603 outliers (21.98%), MAD = 0.0418 m
Sonar: 274 outliers (9.99%), MAD = 0.0107 m
DVL: 473 outliers (17.24%), MAD = 0.0148 m

=== OUTLIER CHARACTERISTICS ===

FFT:
  Magnitude:
    Mean: 1.0950 m, Median: 0.6326 m, Max: 19.0845 m
  Magnitude Categories:
    Small (≤0.2717m): 199 (33.0%)
    Medium (0.2717-1.1094m): 205 (34.0%)
    Large (>1.1094m): 199 (33.0%)
  Clustering:
    Total clusters: 62
    Avg cluster size: 10.7 frames
    Max cluster size: 41 frames
    Isolated outliers: 0 (0.0%)
    Multi-frame clusters: 62 (100.0%)

SONAR:
  Magnitude:
    Mean: 0.0592 m, Median: 0.0514 m, Max: 0.1116 m
  Magnitude Categories:
    Small (≤0.0454m): 91 (33.2%)
    Medium (0.0454-0.0665m): 92 (33.6%)
    Large (>0.0665m): 91 (33.2%)
  Clustering:
    Total clusters: 26
    Avg cluster size: 11.5 frames
    Max cluster size: 60 frames
    Isolated outliers: 0 (0.0%)
    Multi-frame clusters: 26 (100.0%)

DVL:
  Magnitude:
    Mean: 0.1667 m, Median: 0.1124 m,
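The outlier log reports a per-system MAD, magnitude bins that each hold about one third of the outliers (i.e., terciles), and clusters of consecutive outlier frames. A minimal sketch of one way to produce such figures, assuming outliers are samples whose deviation from a rolling median exceeds `OUTLIER_K` × MAD of those deviations. The rolling-median baseline, the tercile binning, and the function names are assumptions inferred from the log, not the confirmed behavior of `detect_outliers`.

```python
import numpy as np
import pandas as pd


def flag_outliers(series, window, k=3.5):
    """Flag samples with |x - rolling median| > k * MAD of the residuals."""
    s = pd.Series(series, dtype=float)
    residual = s - s.rolling(window, center=True, min_periods=1).median()
    mad = float(np.nanmedian(np.abs(residual - np.nanmedian(residual))))
    flags = (residual.abs() > k * mad).to_numpy()  # NaN residuals compare as False
    return flags, residual.abs().to_numpy(), mad


def summarize_outliers(flags, magnitudes):
    """Cluster consecutive flagged frames and split magnitudes into terciles."""
    f = np.asarray(flags, dtype=bool).astype(np.int8)
    edges = np.diff(np.concatenate(([0], f, [0])))
    sizes = np.flatnonzero(edges == -1) - np.flatnonzero(edges == 1)
    mags = np.asarray(magnitudes, dtype=float)[np.asarray(flags, dtype=bool)]
    q1, q2 = np.quantile(mags, [1 / 3, 2 / 3]) if mags.size else (np.nan, np.nan)
    return {
        "n_outliers": int(f.sum()),
        "n_clusters": int(sizes.size),
        "max_cluster": int(sizes.max()) if sizes.size else 0,
        "isolated": int((sizes == 1).sum()),
        "small": int((mags <= q1).sum()),
        "medium": int(((mags > q1) & (mags <= q2)).sum()),
        "large": int((mags > q2).sum()),
    }
```

Using the raw MAD (as here) with k=3.5 flags far more samples than a 1.4826-scaled MAD would, so the choice of scale matters when comparing outlier rates across systems.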

## Generate Visualizations

In [15]:
print("\n=== GENERATING VISUALIZATIONS ===\n")

figs = create_visualizations(
    df, segments, TARGET_BAG, PLOTS_DIR,
    sigma_thresh=SIGMA_THRESH,
    outlier_k=OUTLIER_K
)

# Display plots
for name, fig in figs.items():
    print(f"\nDisplaying {name}...")
    fig.show()

print(f"\n✓ All plots saved to: {PLOTS_DIR}")


=== GENERATING VISUALIZATIONS ===

Discarding nonzero nanoseconds in conversion.

Saved: 2024-08-20_13-57-42_xy_positions.html
Saved: 2024-08-20_13-57-42_timeseries.html
Saved: 2024-08-20_13-57-42_stability.html
Saved: 2024-08-20_13-57-42_outliers.html

Displaying timeseries...

Displaying stability...

Displaying outliers...

Displaying xy_positions...

✓ All plots saved to: /Volumes/LaCie/SOLAQUA/exports/plots


## Generate Summary Report

In [16]:
config = {
    'Smoothing': f'α={SMOOTHING_ALPHA:.2f}' if SMOOTHING_ALPHA is not None else 'Disabled',
    'Rolling Window': f'{ROLLING_WINDOW_SEC} s ({window_samples} samples)',
    'Stability σ Threshold': f'{SIGMA_THRESH} m',
    'Stability Δ Threshold': f'{DELTA_THRESH} m',
    'Min Segment Length': f'{MIN_SEGMENT_SEC} s',
    'Outlier Threshold': f'{OUTLIER_K} × MAD'
}

report_path = COMPARISON_DATA_DIR / f"{TARGET_BAG}_analysis_summary.md"
generate_summary_report(
    df, segments, baseline_stats, available_systems,
    TARGET_BAG, report_path, config, outlier_stats
)

print("\n✓ Analysis complete!")
print(f"  Plots: {PLOTS_DIR}")
print(f"  Report: {report_path}")


=== GENERATING SUMMARY REPORT ===

✓ Summary report saved: 2024-08-20_13-57-42_analysis_summary.md
  Location: /Volumes/LaCie/SOLAQUA/comparison_data/2024-08-20_13-57-42_analysis_summary.md

✓ Analysis complete!
  Plots: /Volumes/LaCie/SOLAQUA/exports/plots
  Report: /Volumes/LaCie/SOLAQUA/comparison_data/2024-08-20_13-57-42_analysis_summary.md


In [17]:
# Regenerate the multi-bag summary CSV with all metrics
from utils.comparison_analysis import generate_multi_bag_summary_csv

print("Regenerating bag_summary.csv with all metrics...")
print("This may take a few minutes depending on the number of bags.\n")

output_path = generate_multi_bag_summary_csv(
    comparison_data_dir=COMPARISON_DATA_DIR,
    output_path=COMPARISON_DATA_DIR / "bag_summary.csv",
    smoothing_alpha=SMOOTHING_ALPHA,
    rolling_window_sec=ROLLING_WINDOW_SEC,
    sigma_thresh=SIGMA_THRESH,
    delta_thresh=DELTA_THRESH,
    min_segment_sec=MIN_SEGMENT_SEC,
    outlier_k=OUTLIER_K
)

if output_path:
    print("\n" + "="*70)
    print("✓ Summary CSV regenerated successfully!")
    print(f"  Location: {output_path}")
    print("\nThe summary now includes:")
    print("  ✓ Outlier magnitude metrics (mean, median, max)")
    print("  ✓ Outlier clustering analysis (cluster count, sizes)")
    print("  ✓ Isolated vs. multi-frame outlier statistics")
    print("\nRe-run the 'Multi-Bag Summary Analysis' cell to see the updated statistics.")
else:
    print("\n✗ Failed to regenerate summary CSV")

Regenerating bag_summary.csv with all metrics...
This may take a few minutes depending on the number of bags.


=== GENERATING MULTI-BAG SUMMARY CSV ===

Found 46 bags to process

Processing: ._2024-08-20_13-39-34
  ✗ Error processing ._2024-08-20_13-39-34: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte
Processing: ._2024-08-20_13-40-35
  ✗ Error processing ._2024-08-20_13-40-35: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte
Processing: ._2024-08-20_13-42-51
  ✗ Error processing ._2024-08-20_13-42-51: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte
Processing: ._2024-08-20_13-55-34
  ✗ Error processing ._2024-08-20_13-55-34: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte
Processing: ._2024-08-20_13-57-42
  ✗ Error processing ._2024-08-20_13-57-42: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte
Processing: ._2024-08-20_14-16-05
  ✗ Error processing ._2024-08-20_14
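Every failing entry starts with `._`. On macOS, such files are typically AppleDouble metadata companions written to volumes without native extended-attribute support (for example FAT/exFAT-formatted external drives); they are binary, which would explain the UTF-8 decode errors. They are also counted in the "Found 46 bags to process" total, which may be why each system appears in only 23/46 bags in the summary below. A minimal sketch of a discovery step that skips them, assuming bags are identified by the `_raw_comparison.csv` suffix used earlier; `discover_bag_ids` is hypothetical, and the discovery code inside `generate_multi_bag_summary_csv` is not shown here.

```python
from pathlib import Path

RAW_SUFFIX = "_raw_comparison.csv"  # naming used by the single-bag cells above


def discover_bag_ids(comparison_data_dir):
    """Return sorted bag IDs for *_raw_comparison.csv files, skipping macOS
    AppleDouble files ("._" prefix) and anything that is not a regular file."""
    bag_ids = []
    for path in sorted(Path(comparison_data_dir).glob(f"*{RAW_SUFFIX}")):
        if path.name.startswith("._") or not path.is_file():
            continue
        bag_ids.append(path.name[: -len(RAW_SUFFIX)])
    return bag_ids
```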

## Multi-Bag Summary Analysis

Load and analyze the comprehensive summary CSV containing all processed bags.

In [18]:
# Load and display multi-bag summary statistics
summary_csv_path = COMPARISON_DATA_DIR / "bag_summary.csv"
summary_df = print_multi_bag_summary(summary_csv_path)

# Optional: further analysis on summary_df
if summary_df is not None:
    print("\n📊 Additional insights:")
    print("  DataFrame available for custom analysis")
    print("  Example: summary_df[summary_df['baseline_percentage'] > 50]")
    
    # Check if outlier magnitude columns exist
    has_magnitude = any('outlier_mean_magnitude' in col for col in summary_df.columns)
    has_clustering = any('outlier_cluster_count' in col for col in summary_df.columns)
    
    if not has_magnitude or not has_clustering:
        print("\n⚠️  NOTE: Outlier magnitude and clustering metrics not found.")
        print("  The bag_summary.csv needs to be regenerated with the updated analysis.")
        print("  Run the cell below or: python scripts/generate_bag_summary.py")
        print("  Then re-run this cell to see outlier characteristics.")

Loading multi-bag summary from: bag_summary.csv

MULTI-BAG SUMMARY STATISTICS

📊 DATASET OVERVIEW
  Total bags analyzed: 46
  Total recording time: 52.7 hours
  Total samples: 54,347.0

🔧 SYSTEM AVAILABILITY
  FFT: 23/46 bags (50.0%)
  Sonar: 23/46 bags (50.0%)
  DVL: 23/46 bags (50.0%)

📈 BASELINE SEGMENTS (Stable Data)
  Avg baseline percentage: 8.1%
  Median baseline percentage: 0.7%
  Avg segments per bag: 2.8
  Total baseline time: 0.1 hours

📏 DISTANCE MEASUREMENT NOISE (Baseline STD)
  FFT: mean=0.1156m, median=0.1058m, min=0.0306m, max=0.2196m
  SONAR: mean=0.0899m, median=0.0848m, min=0.0121m, max=0.1815m
  DVL: mean=0.1121m, median=0.1227m, min=0.0175m, max=0.2246m

📐 PITCH MEASUREMENT NOISE (Baseline STD)
  FFT: mean=6.6144°, median=5.9748°, min=1.9306°, max=20.0705°
  SONAR: mean=3.5119°, median=3.4452°, min=0.5557°, max=6.2299°
  DVL: mean=4.8225°, median=3.9342°, min=1.8718°, max=11.6527°

🧭 POSITION NOISE (Baseline STD)
  X FFT: mean=0.1165m, median=0.1000m, min=0.0335m,
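`print_multi_bag_summary` reports the mean, median, minimum, and maximum of each per-bag metric, and the cell above suggests filtering `summary_df` directly (e.g., `summary_df[summary_df['baseline_percentage'] > 50]`). A minimal sketch of the same kind of aggregate for a single column, skipping bags where the metric is missing; the helper and the `fft_distance_std` column name used in the test are hypothetical.

```python
import pandas as pd


def summarize_metric(summary_df, column):
    """Mean/median/min/max of one per-bag metric, ignoring missing or non-numeric entries."""
    values = pd.to_numeric(summary_df[column], errors="coerce").dropna()
    return {
        "n_bags": int(values.size),
        "mean": float(values.mean()),
        "median": float(values.median()),
        "min": float(values.min()),
        "max": float(values.max()),
    }

# Hypothetical usage: summarize_metric(summary_df, "fft_distance_std")
```

Reporting `n_bags` alongside the statistics makes it visible when a metric is available for only a subset of the bags.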

## Regenerate Multi-Bag Summary (Optional)

Run this cell to regenerate the comprehensive bag summary CSV with all updated metrics, including outlier characteristics.