# Full Net Tracking Analysis: Multi-System Performance Assessment

This notebook analyzes synchronized comparison data from three net-tracking systems (FFT, Sonar, and DVL) using the `comparison_analysis` utility module.

**What this does:**
1. Loads and optionally smooths comparison data
2. Detects stable baseline segments
3. Computes noise, bias, and outlier statistics
4. Generates visualizations and summary reports

**Quick Start:**
1. Set `TARGET_BAG` to your bag ID
2. Run all cells
3. Check outputs in the `comparison_data` and `plots` directories

#### **Generate a summary CSV for all available comparison data:**
Using the defaults: `python scripts/generate_bag_summary.py`

## Configuration

In [28]:
from pathlib import Path
from utils.comparison_analysis import (
    load_and_prepare_data,
    compute_pairwise_differences,
    detect_stable_segments,
    compute_baseline_statistics,
    detect_outliers,
    analyze_failure_correlation,
    create_visualizations,
    generate_summary_report,
    print_multi_bag_summary
)

# Configuration
TARGET_BAG = "2024-08-20_17-02-00"
COMPARISON_DATA_DIR = Path("/Volumes/LaCie/SOLAQUA/comparison_data")
PLOTS_DIR = Path("/Volumes/LaCie/SOLAQUA/exports/plots")
PLOTS_DIR.mkdir(parents=True, exist_ok=True)

# Analysis parameters
SMOOTHING_ALPHA = None       # None = no smoothing, 0-1 = smoothing factor
ROLLING_WINDOW_SEC = 2.0     # length of the rolling-statistics window, in seconds
SIGMA_THRESH = 0.25          # increase to allow more variance (e.g., 0.25 → 0.4)
DELTA_THRESH = 0.25          # increase to tolerate larger inter-system drift
MIN_SEGMENT_SEC = 1.0        # lower to accept shorter stable periods
OUTLIER_K = 3.5              # outlier threshold, in multiples of the MAD

print(f"Target: {TARGET_BAG}")
print(f"Smoothing: {'Disabled' if SMOOTHING_ALPHA is None else f'α={SMOOTHING_ALPHA:.2f}'}")
print(f"Stability thresholds: σ={SIGMA_THRESH}m, Δ={DELTA_THRESH}m")

Target: 2024-08-20_17-02-00
Smoothing: Disabled
Stability thresholds: σ=0.25m, Δ=0.25m
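
`SMOOTHING_ALPHA` is described only as a 0-1 smoothing factor. One common reading is an exponentially weighted moving average, sketched below. Whether `load_and_prepare_data` smooths this way is an assumption, and `exponential_smooth` is a hypothetical helper, not part of `comparison_analysis`.

```python
from typing import List, Optional, Sequence


def exponential_smooth(values: Sequence[float], alpha: Optional[float]) -> List[float]:
    """Exponentially weighted moving average; alpha=None returns the data unchanged."""
    if alpha is None:
        return list(values)
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    for x in values:
        if not smoothed:
            smoothed.append(x)  # seed with the first sample
        else:
            # New estimate = alpha * current sample + (1 - alpha) * previous estimate
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


raw = [1.0, 2.0, 3.0]
print(exponential_smooth(raw, None))  # smoothing disabled
print(exponential_smooth(raw, 0.5))   # lower alpha = heavier smoothing
```

Under this reading, values of alpha close to 1 follow the raw signal closely, while small values suppress noise at the cost of lag.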


## Load and Process Data

In [29]:
# Load comparison data
data_path = COMPARISON_DATA_DIR / f"{TARGET_BAG}_raw_comparison.csv"
df, sampling_rate, available_systems = load_and_prepare_data(data_path, SMOOTHING_ALPHA)

# Compute pairwise differences
df = compute_pairwise_differences(df)

Loading data from: 2024-08-20_17-02-00_raw_comparison.csv

Data loaded: 1851 samples
Time range: 2024-08-20 15:02:03.319382668+00:00 to 2024-08-20 15:03:03.660551786+00:00
Duration: 60.3 seconds
Median sampling interval: 0.030 s (33.5 Hz)
Available systems: FFT, Sonar, DVL
Pairwise differences computed:
  diff_fft_nav: 1739 valid samples, mean=-0.268, std=0.444
  diff_sonar_nav: 1813 valid samples, mean=0.035, std=0.448
  diff_fft_sonar: 1769 valid samples, mean=-0.238, std=0.523
  diff_pitch_fft_nav: 1739 valid samples, mean=5.272, std=17.470
  diff_pitch_sonar_nav: 1813 valid samples, mean=-4.176, std=13.194
  diff_pitch_fft_sonar: 1769 valid samples, mean=8.841, std=17.994
  diff_x_fft_nav: 1739 valid samples, mean=-0.258, std=0.415
  diff_x_sonar_nav: 1813 valid samples, mean=-0.004, std=0.313
  diff_x_fft_sonar: 1769 valid samples, mean=-0.227, std=0.461
  diff_y_fft_nav: 1739 valid samples, mean=0.101, std=0.227
  diff_y_sonar_nav: 1813 valid samples, mean=-0.123, std=0.453
  dif

## Detect Stable Baseline Segments

In [30]:
df, segments, window_samples = detect_stable_segments(
    df, sampling_rate,
    rolling_window_sec=ROLLING_WINDOW_SEC,
    sigma_thresh=SIGMA_THRESH,
    delta_thresh=DELTA_THRESH,
    min_segment_sec=MIN_SEGMENT_SEC
)

Rolling window: 67 samples (2.0 s)

Computing rolling statistics...
Applying stability criteria...
Stable samples: 1172 / 1851 (63.3%)

Identifying stable baseline segments...
Found 3 stable baseline segments (≥ 1.0 s):

  Segment 1: 15:02:06 - 15:02:21 (15.2 s, 506 samples)
  Segment 2: 15:02:23 - 15:02:25 (2.0 s, 64 samples)
  Segment 3: 15:02:25 - 15:02:44 (18.5 s, 596 samples)
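
The log above suggests a three-step procedure: rolling statistics, a per-sample stability test, then grouping into segments of a minimum length. The sketch below is one plausible implementation under stated assumptions, not the actual `detect_stable_segments`. It assumes σ is each system's rolling standard deviation and Δ is the spread between the systems' rolling means. The function name and the column names `fft`, `sonar`, `dvl` are hypothetical.

```python
import numpy as np
import pandas as pd


def find_stable_segments(df, columns, sampling_rate, rolling_window_sec=2.0,
                         sigma_thresh=0.25, delta_thresh=0.25, min_segment_sec=1.0):
    """Return (stable mask, list of (start, end) index pairs, window size in samples)."""
    window = max(int(round(rolling_window_sec * sampling_rate)), 2)
    rolling = df[columns].rolling(window, center=True, min_periods=window)

    # Criterion 1: every system's rolling std stays below sigma_thresh.
    low_noise = (rolling.std() < sigma_thresh).all(axis=1)

    # Criterion 2: the systems' rolling means agree to within delta_thresh.
    means = rolling.mean()
    agree = (means.max(axis=1) - means.min(axis=1)) < delta_thresh

    stable = (low_noise & agree).to_numpy()

    # Group consecutive stable samples and keep runs that are long enough.
    min_samples = int(np.ceil(min_segment_sec * sampling_rate))
    segments, start = [], None
    for i, flag in enumerate(stable):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_samples:
                segments.append((start, i - 1))
            start = None
    if start is not None and len(stable) - start >= min_samples:
        segments.append((start, len(stable) - 1))
    return stable, segments, window


# Synthetic demo at 10 Hz: 5 s of quiet data followed by 5 s of heavy noise.
base = np.array([1.0] * 50 + [0.0 if i % 2 == 0 else 2.0 for i in range(50)])
demo = pd.DataFrame({"fft": base, "sonar": base + 0.05, "dvl": base - 0.05})
stable, segments, window = find_stable_segments(demo, ["fft", "sonar", "dvl"], 10.0)
print(f"window={window} samples, segments={segments}")
```

With the roughly 33.5 Hz sampling rate reported earlier, the same rounding turns a 2.0 s window into 67 samples, matching the log.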


## Baseline Statistics

In [31]:
baseline_stats = compute_baseline_statistics(df)

=== BASELINE STATISTICS (1172 samples) ===

1. PER-METHOD NOISE - DISTANCE
  FFT: σ = 0.0633 m, MAD = 0.0362 m
  Sonar: σ = 0.0646 m, MAD = 0.0290 m
  DVL: σ = 0.0672 m, MAD = 0.0297 m

2. PER-METHOD NOISE - PITCH
  FFT: σ = 5.2860°, MAD = 4.4117°
  Sonar: σ = 2.7808°, MAD = 1.9022°
  DVL: σ = 4.1112°, MAD = 2.4660°

3. PER-METHOD NOISE - X POSITION
  FFT: σ = 0.0653 m, MAD = 0.0368 m
  Sonar: σ = 0.0628 m, MAD = 0.0287 m
  DVL: σ = 0.0687 m, MAD = 0.0249 m

4. PER-METHOD NOISE - Y POSITION
  FFT: σ = 0.0804 m, MAD = 0.0673 m
  Sonar: σ = 0.0492 m, MAD = 0.0321 m
  DVL: σ = 0.0704 m, MAD = 0.0442 m

5. PAIRWISE BIASES - DISTANCE
  FFT vs DVL: Bias = -0.1279 m, Std = 0.0361 m
  Sonar vs DVL: Bias = -0.0677 m, Std = 0.0445 m
  FFT vs Sonar: Bias = -0.0602 m, Std = 0.0434 m

6. PAIRWISE BIASES - PITCH
  FFT vs DVL: Bias = +5.9530 °, Std = 5.9368 °
  Sonar vs DVL: Bias = -2.5482 °, Std = 3.9881 °
  FFT vs Sonar: Bias = +8.5012 °, Std = 5.4897 °

7. PAIRWISE BIASES - X POSITION
  FFT vs DVL
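
The report lists both σ and MAD for each system because the two react differently to occasional spikes. The sketch below shows the textbook definitions using only the standard library. It assumes MAD means the unscaled median absolute deviation and that "A vs B" bias means mean(A − B). Both are assumptions about `compute_baseline_statistics`, and the helper names and sample values are made up for illustration.

```python
import statistics
from typing import Sequence, Tuple


def noise_stats(values: Sequence[float]) -> Tuple[float, float]:
    """Sample standard deviation and (unscaled) median absolute deviation."""
    sigma = statistics.stdev(values)
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return sigma, mad


def pairwise_bias(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation of the sample-wise difference a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs), statistics.stdev(diffs)


# Illustrative values only: one spike (3.00) in an otherwise quiet series.
fft = [1.00, 1.10, 0.90, 1.00, 3.00]
dvl = [1.10, 1.20, 1.00, 1.10, 1.10]

sigma, mad = noise_stats(fft)
bias, spread = pairwise_bias(fft[:4], dvl[:4])  # compare the spike-free samples
print(f"sigma={sigma:.3f} m, MAD={mad:.3f} m")
print(f"bias={bias:+.3f} m, std={spread:.3f} m")
```

The single spike inflates σ substantially while leaving MAD near the typical deviation, which is why a σ noticeably larger than the MAD hints at residual spikes inside the baseline.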

## Outlier Detection

In [32]:
df, outlier_stats = detect_outliers(df, window_samples, outlier_k=OUTLIER_K)
correlation_stats = analyze_failure_correlation(df)

=== OUTLIER DETECTION ===

FFT: 226 outliers (12.21%), MAD = 0.0145 m
Sonar: 167 outliers (9.02%), MAD = 0.0092 m
DVL: 317 outliers (17.13%), MAD = 0.0148 m

=== OUTLIER CHARACTERISTICS ===

FFT:
  Magnitude:
    Mean: 0.6251 m, Median: 0.5741 m, Max: 2.7162 m
  Magnitude Categories:
    Small (≤0.1946m): 78 (34.5%)
    Medium (0.1946-0.8110m): 75 (33.2%)
    Large (>0.8110m): 73 (32.3%)
  Clustering:
    Total clusters: 15
    Avg cluster size: 16.1 frames
    Max cluster size: 97 frames
    Isolated outliers: 0 (0.0%)
    Multi-frame clusters: 15 (100.0%)

SONAR:
  Magnitude:
    Mean: 0.1598 m, Median: 0.0557 m, Max: 0.5798 m
  Magnitude Categories:
    Small (≤0.0473m): 55 (32.9%)
    Medium (0.0473-0.1436m): 57 (34.1%)
    Large (>0.1436m): 55 (32.9%)
  Clustering:
    Total clusters: 13
    Avg cluster size: 13.8 frames
    Max cluster size: 53 frames
    Isolated outliers: 0 (0.0%)
    Multi-frame clusters: 13 (100.0%)

DVL:
  Magnitude:
    Mean: 0.4115 m, Median: 0.2350 m, Max
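
The configuration expresses the outlier threshold as `OUTLIER_K` × MAD, and the report groups flagged samples into clusters. A minimal sketch of that idea follows. It assumes residuals are measured against a centered rolling median, which may differ from what `detect_outliers` does, and `flag_outliers` and `cluster_sizes` are hypothetical names.

```python
import pandas as pd


def flag_outliers(series: pd.Series, window: int, k: float = 3.5):
    """Flag samples whose deviation from a rolling median exceeds k * MAD."""
    baseline = series.rolling(window, center=True, min_periods=1).median()
    residual = (series - baseline).abs()
    mad = residual.median()  # median absolute deviation of the residuals
    return residual > k * mad, mad


def cluster_sizes(flags) -> list:
    """Lengths of runs of consecutive flagged samples."""
    sizes, run = [], 0
    for flagged in flags:
        if flagged:
            run += 1
        elif run:
            sizes.append(run)
            run = 0
    if run:
        sizes.append(run)
    return sizes


# Synthetic demo: a small ripple around 1.0 m with a two-frame spike.
values = [1.0 + 0.01 * ((i % 3) - 1) for i in range(50)]
values[20] = values[21] = 3.0

flags, mad = flag_outliers(pd.Series(values), window=7, k=3.5)
sizes = cluster_sizes(flags)
isolated = sum(1 for s in sizes if s == 1)
print(f"outliers={int(flags.sum())}, MAD={mad:.4f} m")
print(f"clusters={len(sizes)}, isolated={isolated}, max size={max(sizes, default=0)}")
```

Counting runs this way separates isolated single-frame glitches from multi-frame dropouts, the distinction the report draws under "Clustering".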

## Generate Visualizations

In [33]:
print("\n=== GENERATING VISUALIZATIONS ===\n")

figs = create_visualizations(
    df, segments, TARGET_BAG, PLOTS_DIR,
    sigma_thresh=SIGMA_THRESH,
    outlier_k=OUTLIER_K
)

# Display plots
for name, fig in figs.items():
    print(f"\nDisplaying {name}...")
    fig.show()

print(f"\n✓ All plots saved to: {PLOTS_DIR}")


=== GENERATING VISUALIZATIONS ===




Discarding nonzero nanoseconds in conversion.



Saved: 2024-08-20_17-02-00_xy_positions.html
Saved: 2024-08-20_17-02-00_timeseries.html
Saved: 2024-08-20_17-02-00_stability.html
Saved: 2024-08-20_17-02-00_outliers.html

Displaying timeseries...



Displaying stability...



Displaying outliers...



Displaying xy_positions...



✓ All plots saved to: /Volumes/LaCie/SOLAQUA/exports/plots


## Generate Summary Report

In [34]:
config = {
    'Smoothing': f'α={SMOOTHING_ALPHA:.2f}' if SMOOTHING_ALPHA is not None else 'Disabled',
    'Rolling Window': f'{ROLLING_WINDOW_SEC} s ({window_samples} samples)',
    'Stability σ Threshold': f'{SIGMA_THRESH} m',
    'Stability Δ Threshold': f'{DELTA_THRESH} m',
    'Min Segment Length': f'{MIN_SEGMENT_SEC} s',
    'Outlier Threshold': f'{OUTLIER_K} × MAD'
}

report_path = COMPARISON_DATA_DIR / f"{TARGET_BAG}_analysis_summary.md"
generate_summary_report(
    df, segments, baseline_stats, available_systems,
    TARGET_BAG, report_path, config, outlier_stats
)

print(f"\n✓ Analysis complete!")
print(f"  Plots: {PLOTS_DIR}")
print(f"  Report: {report_path}")


=== GENERATING SUMMARY REPORT ===

✓ Summary report saved: 2024-08-20_17-02-00_analysis_summary.md
  Location: /Volumes/LaCie/SOLAQUA/comparison_data/2024-08-20_17-02-00_analysis_summary.md

✓ Analysis complete!
  Plots: /Volumes/LaCie/SOLAQUA/exports/plots
  Report: /Volumes/LaCie/SOLAQUA/comparison_data/2024-08-20_17-02-00_analysis_summary.md


In [35]:
# Regenerate the multi-bag summary CSV with all metrics
from utils.comparison_analysis import generate_multi_bag_summary_csv

print("Regenerating bag_summary.csv with all metrics...")
print("This may take a few minutes depending on the number of bags.\n")

output_path = generate_multi_bag_summary_csv(
    comparison_data_dir=COMPARISON_DATA_DIR,
    output_path=COMPARISON_DATA_DIR / "bag_summary.csv",
    smoothing_alpha=SMOOTHING_ALPHA,
    rolling_window_sec=ROLLING_WINDOW_SEC,
    sigma_thresh=SIGMA_THRESH,
    delta_thresh=DELTA_THRESH,
    min_segment_sec=MIN_SEGMENT_SEC,
    outlier_k=OUTLIER_K
)

if output_path:
    print("\n" + "="*70)
    print("✓ Summary CSV regenerated successfully!")
    print(f"  Location: {output_path}")
    print("\nThe summary now includes:")
    print("  ✓ Outlier magnitude metrics (mean, median, max)")
    print("  ✓ Outlier clustering analysis (cluster count, sizes)")
    print("  ✓ Isolated vs. multi-frame outlier statistics")
    print("\nRe-run the 'Multi-Bag Summary Analysis' cell to see the updated statistics.")
else:
    print("\n✗ Failed to regenerate summary CSV")

Regenerating bag_summary.csv with all metrics...
This may take a few minutes depending on the number of bags.


=== GENERATING MULTI-BAG SUMMARY CSV ===

Found 46 bags to process

Processing: ._2024-08-20_13-39-34
  ✗ Error processing ._2024-08-20_13-39-34: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte
Processing: ._2024-08-20_13-40-35
  ✗ Error processing ._2024-08-20_13-40-35: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte
Processing: ._2024-08-20_13-42-51
  ✗ Error processing ._2024-08-20_13-42-51: 'utf-8' codec can't decode byte 0xb0 in
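
The failing entries all start with `._`, which is the naming pattern of the AppleDouble sidecar files macOS writes next to real files on volumes without native metadata support (exFAT or FAT external drives, for example). They are binary, so reading them as UTF-8 CSV fails. If that is what these are, it would also be consistent with exactly half of the 46 "bags" reporting no available system in the summary. The sketch below filters them out before processing. `list_comparison_csvs` is a hypothetical helper, and whether `generate_multi_bag_summary_csv` can accept a pre-filtered file list is not known from this notebook.

```python
import tempfile
from pathlib import Path
from typing import List


def list_comparison_csvs(data_dir: Path, suffix: str = "_raw_comparison.csv") -> List[Path]:
    """Return comparison CSVs in data_dir, skipping macOS '._' sidecar files."""
    return sorted(
        p for p in data_dir.glob(f"*{suffix}")
        if p.is_file() and not p.name.startswith("._")
    )


# Demo in a throwaway directory: one real CSV plus its binary sidecar.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "2024-08-20_13-39-34_raw_comparison.csv").write_text("t,x\n0,1\n")
    (root / "._2024-08-20_13-39-34_raw_comparison.csv").write_bytes(b"\x00\x05\x16\x07\xb0")
    kept = [p.name for p in list_comparison_csvs(root)]
    print(kept)
```

Filtering by name is cheap and avoids decoding errors, but deleting the sidecars from the drive (or excluding them where the files are discovered inside the module) would fix the bag count at the source.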

## Multi-Bag Summary Analysis

Load and analyze the comprehensive summary CSV containing all processed bags.

In [36]:
# Load and display multi-bag summary statistics
summary_csv_path = COMPARISON_DATA_DIR / "bag_summary.csv"
summary_df = print_multi_bag_summary(summary_csv_path)

# Optional: further analysis on summary_df
if summary_df is not None:
    print("\n📊 Additional insights:")
    print("  DataFrame available for custom analysis")
    print("  Example: summary_df[summary_df['baseline_percentage'] > 50]")
    
    # Check if outlier magnitude columns exist
    has_magnitude = any('outlier_mean_magnitude' in col for col in summary_df.columns)
    has_clustering = any('outlier_cluster_count' in col for col in summary_df.columns)
    
    if not has_magnitude or not has_clustering:
        print("\n⚠️  NOTE: Outlier magnitude and clustering metrics not found.")
        print("  The bag_summary.csv needs to be regenerated with the updated analysis.")
        print("  Run the 'Regenerate Multi-Bag Summary' cell or: python scripts/generate_bag_summary.py")
        print("  Then re-run this cell to see outlier characteristics.")

Loading multi-bag summary from: bag_summary.csv

MULTI-BAG SUMMARY STATISTICS

📊 DATASET OVERVIEW
  Total bags analyzed: 46
  Total recording time: 52.7 hours
  Total samples: 54,347.0

🔧 SYSTEM AVAILABILITY
  FFT: 23/46 bags (50.0%)
  Sonar: 23/46 bags (50.0%)
  DVL: 23/46 bags (50.0%)

📈 BASELINE SEGMENTS (Stable Data)
  Avg baseline percentage: 7.8%
  Median baseline percentage: 0.7%
  Avg segments per bag: 2.7
  Total baseline time: 0.1 hours

📏 DISTANCE MEASUREMENT NOISE (Baseline STD)
  FFT: mean=0.1056m, median=0.0831m, min=0.0239m, max=0.2244m
  SONAR: mean=0.0858m, median=0.0646m, min=0.0105m, max=0.1819m
  DVL: mean=0.1107m, median=0.1054m, min=0.0176m, max=0.2309m

📐 PITCH MEASUREMENT NOISE (Baseline STD)
  FFT: mean=6.8184°, median=6.2233°, min=0.8609°, max=23.4524°
  SONAR: mean=3.3893°, median=3.4600°, min=0.1260°, max=6.2635°
  DVL: mean=5.3473°, median=4.0465°, min=1.8891°, max=15.2511°

🧭 POSITION NOISE (Baseline STD)
  X FFT: mean=0.1079m, median=0.0842m, min=0.0290m,

## Regenerate Multi-Bag Summary (Optional)

Run the `generate_multi_bag_summary_csv` cell (In [35]) to regenerate the comprehensive bag summary CSV with all updated metrics, including outlier characteristics.