# Post-Processing Only Notebook

This notebook runs post-processing on already-tracked data without re-running tracking.
It assumes **bidirectional (forward + backward) tracking** was enabled and the following files exist:

- `*_forward.csv` - Raw forward tracking output
- `*_backward.csv` - Raw backward tracking output

The notebook will:
1. Load forward and backward CSV files
2. Apply post-processing (break trajectories at velocity/distance jumps)
3. Merge forward and backward trajectories into consensus trajectories
4. Apply interpolation
5. Scale coordinates back to original video space
6. Save the final merged output

In [25]:
# === IMPORTS ===
import os
import sys
import numpy as np
import pandas as pd
import cv2
import logging

# Add the src directory to path if running from notebooks folder
project_root = os.path.dirname(os.path.dirname(os.path.abspath("__file__")))
src_path = os.path.join(project_root, "src")
if src_path not in sys.path:
    sys.path.insert(0, src_path)

# Import post-processing functions
from multi_tracker.core.post_processing import (
    process_trajectories_from_csv,
    resolve_trajectories,
    interpolate_trajectories,
)

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

print("‚úì Imports successful!")

‚úì Imports successful!


## Configuration

Set your file paths and parameters here. You need to provide:
1. Path to the forward tracking CSV
2. Path to the backward tracking CSV
3. Path to the original video file (to get total frame count and verify resize factor)

In [26]:
# ===================================================================
# FILE PATHS - UPDATE THESE!
# ===================================================================

# Base path to your tracking output CSV (without _forward/_backward suffix)
# Example: if your files are "video_forward.csv" and "video_backward.csv",
# set this to "video.csv"
BASE_CSV_PATH = "emi_short_tracking.csv"

# Path to the original video file (used to get total frame count)
VIDEO_PATH = "emi_short.mp4"

# Output path for final merged trajectories
OUTPUT_CSV_PATH = None  # Will auto-generate as *_final.csv if None

# ===================================================================
# DERIVED PATHS (auto-generated)
# ===================================================================
base, ext = os.path.splitext(BASE_CSV_PATH)
FORWARD_CSV_PATH = f"{base}_forward{ext}"
BACKWARD_CSV_PATH = f"{base}_backward{ext}"

if OUTPUT_CSV_PATH is None:
    OUTPUT_CSV_PATH = f"{base}_final{ext}"

print(f"Forward CSV:  {FORWARD_CSV_PATH}")
print(f"Backward CSV: {BACKWARD_CSV_PATH}")
print(f"Output CSV:   {OUTPUT_CSV_PATH}")
print(f"Video file:   {VIDEO_PATH}")

Forward CSV:  emi_short_tracking_forward.csv
Backward CSV: emi_short_tracking_backward.csv
Output CSV:   emi_short_tracking_final.csv
Video file:   emi_short.mp4


In [37]:
# ===================================================================
# POST-PROCESSING PARAMETERS
# ===================================================================
# These should match the values used during tracking, or adjust as needed

# Resize factor used during tracking (1.0 = no resize)
# This is needed to scale coordinates back to original video space
RESIZE_FACTOR = 0.5

# Reference body size in pixels (used for scaling thresholds)
# This is the typical size of your tracked animal
REFERENCE_BODY_SIZE = 77.0

# Trajectory post-processing parameters
params = {
    # Minimum trajectory length (in frames) - shorter ones are removed
    "MIN_TRAJECTORY_LENGTH": 10,
    
    # Maximum velocity before breaking trajectory (pixels/frame)
    # Jumps faster than this indicate tracking errors
    # Note: MAX_DISTANCE_BREAK is now computed dynamically as MAX_VELOCITY_BREAK * frame_diff
    "MAX_VELOCITY_BREAK": 1.5 * REFERENCE_BODY_SIZE * RESIZE_FACTOR,
    
    # Maximum consecutive occluded frames before breaking trajectory
    "MAX_OCCLUSION_GAP": 5,
    
    # Conservative merge parameters for forward/backward trajectory resolution
    # AGREEMENT_DISTANCE: Max distance (px) for frames to be considered "agreeing"
    # Frames within this distance are merged; frames outside create separate trajectories
    "AGREEMENT_DISTANCE": REFERENCE_BODY_SIZE * RESIZE_FACTOR * 0.25,
    
    # MIN_OVERLAP_FRAMES: Minimum number of agreeing frames required to consider merging
    "MIN_OVERLAP_FRAMES": 2,
}

# Interpolation settings
INTERPOLATION_METHOD = "spline"  # Options: "none", "linear", "cubic", "spline"
INTERPOLATION_MAX_GAP = 5  # Maximum gap size to interpolate (frames)

print("Parameters configured:")
for k, v in params.items():
    print(f"  {k}: {v}")
print(f"\nInterpolation: {INTERPOLATION_METHOD} (max_gap={INTERPOLATION_MAX_GAP})")

Parameters configured:
  MIN_TRAJECTORY_LENGTH: 10
  MAX_VELOCITY_BREAK: 57.75
  MAX_OCCLUSION_GAP: 5
  AGREEMENT_DISTANCE: 9.625
  MIN_OVERLAP_FRAMES: 2

Interpolation: spline (max_gap=5)


## Validate Input Files

In [38]:
# Check that all required files exist
files_to_check = [
    ("Forward CSV", FORWARD_CSV_PATH),
    ("Backward CSV", BACKWARD_CSV_PATH),
    ("Video file", VIDEO_PATH),
]

all_ok = True
for name, path in files_to_check:
    if os.path.exists(path):
        print(f"‚úì {name}: {path}")
    else:
        print(f"‚úó {name} NOT FOUND: {path}")
        all_ok = False

if not all_ok:
    raise FileNotFoundError("One or more required files are missing. Please check the paths above.")

print("\nAll files found!")

‚úì Forward CSV: emi_short_tracking_forward.csv
‚úì Backward CSV: emi_short_tracking_backward.csv
‚úì Video file: emi_short.mp4

All files found!


In [39]:
# Get total frame count from video
cap = cv2.VideoCapture(VIDEO_PATH)
if not cap.isOpened():
    raise ValueError(f"Could not open video: {VIDEO_PATH}")

TOTAL_FRAMES = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
FPS = cap.get(cv2.CAP_PROP_FPS)
WIDTH = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
HEIGHT = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
cap.release()

print(f"Video info:")
print(f"  Total frames: {TOTAL_FRAMES}")
print(f"  FPS: {FPS}")
print(f"  Resolution: {WIDTH} x {HEIGHT}")
print(f"  Duration: {TOTAL_FRAMES/FPS:.2f} seconds")

Video info:
  Total frames: 750
  FPS: 25.0
  Resolution: 4512 x 4512
  Duration: 30.00 seconds


## Step 1: Load and Process Forward Trajectories

In [40]:
# Load forward CSV and preview
forward_raw = pd.read_csv(FORWARD_CSV_PATH)
print(f"Forward raw trajectories:")
print(f"  Rows: {len(forward_raw)}")
print(f"  Unique trajectories: {forward_raw['TrajectoryID'].nunique()}")
print(f"  Columns: {list(forward_raw.columns)}")
print(f"  Frame range: {forward_raw['FrameID'].min()} - {forward_raw['FrameID'].max()}")
forward_raw.head()

Forward raw trajectories:
  Rows: 18750
  Unique trajectories: 68
  Columns: ['TrackID', 'TrajectoryID', 'Index', 'X', 'Y', 'Theta', 'FrameID', 'State', 'DetectionConfidence', 'AssignmentConfidence', 'PositionUncertainty']
  Frame range: 1 - 750


Unnamed: 0,TrackID,TrajectoryID,Index,X,Y,Theta,FrameID,State,DetectionConfidence,AssignmentConfidence,PositionUncertainty
0,0,0,0,,,,1,occluded,0.0,0.0,20.109999
1,1,1,0,,,,1,occluded,0.0,0.0,20.109999
2,2,2,0,,,,1,occluded,0.0,0.0,20.109999
3,3,3,0,,,,1,occluded,0.0,0.0,20.109999
4,4,4,0,,,,1,occluded,0.0,0.0,20.109999


In [41]:
# Process forward trajectories
print("Processing forward trajectories...")
forward_processed, forward_stats = process_trajectories_from_csv(FORWARD_CSV_PATH, params)

print(f"\nForward processing stats:")
for k, v in forward_stats.items():
    print(f"  {k}: {v}")

if forward_processed is not None and not forward_processed.empty:
    print(f"\nProcessed forward trajectories: {forward_processed['TrajectoryID'].nunique()}")
else:
    print("WARNING: No forward trajectories after processing!")

2026-02-03 10:03:52,202 - multi_tracker.core.post_processing - INFO - Loaded 18750 rows from emi_short_tracking_forward.csv with columns: ['TrackID', 'TrajectoryID', 'Index', 'X', 'Y', 'Theta', 'FrameID', 'State', 'DetectionConfidence', 'AssignmentConfidence', 'PositionUncertainty']
2026-02-03 10:03:52,203 - multi_tracker.core.post_processing - INFO - Dropped columns: []
2026-02-03 10:03:52,204 - multi_tracker.core.post_processing - INFO - Setting X, Y, Theta to NaN for 4766 occluded/lost detections


Processing forward trajectories...


2026-02-03 10:03:52,414 - multi_tracker.core.post_processing - INFO - Post-processing stats: {'original_count': 68, 'removed_short': 0, 'broken_velocity': 9, 'broken_distance': 0, 'broken_occlusion': 132, 'final_count': 102}



Forward processing stats:
  original_count: 68
  removed_short: 0
  broken_velocity: 9
  broken_distance: 0
  broken_occlusion: 132
  final_count: 102

Processed forward trajectories: 102


## Step 2: Load and Process Backward Trajectories

In [42]:
# Load backward CSV and preview
backward_raw = pd.read_csv(BACKWARD_CSV_PATH)
print(f"Backward raw trajectories:")
print(f"  Rows: {len(backward_raw)}")
print(f"  Unique trajectories: {backward_raw['TrajectoryID'].nunique()}")
print(f"  Columns: {list(backward_raw.columns)}")
print(f"  Frame range (before transform): {backward_raw['FrameID'].min()} - {backward_raw['FrameID'].max()}")
backward_raw.head()

Backward raw trajectories:
  Rows: 18750
  Unique trajectories: 97
  Columns: ['TrackID', 'TrajectoryID', 'Index', 'X', 'Y', 'Theta', 'FrameID', 'State', 'DetectionConfidence', 'AssignmentConfidence', 'PositionUncertainty']
  Frame range (before transform): 1 - 750


Unnamed: 0,TrackID,TrajectoryID,Index,X,Y,Theta,FrameID,State,DetectionConfidence,AssignmentConfidence,PositionUncertainty
0,0,0,0,,,,1,occluded,0.0,0.0,20.109999
1,1,1,0,,,,1,occluded,0.0,0.0,20.109999
2,2,2,0,,,,1,occluded,0.0,0.0,20.109999
3,3,3,0,,,,1,occluded,0.0,0.0,20.109999
4,4,4,0,,,,1,occluded,0.0,0.0,20.109999


In [43]:
# Process backward trajectories
print("Processing backward trajectories...")
backward_processed, backward_stats = process_trajectories_from_csv(BACKWARD_CSV_PATH, params)

print(f"\nBackward processing stats:")
for k, v in backward_stats.items():
    print(f"  {k}: {v}")

if backward_processed is not None and not backward_processed.empty:
    print(f"\nProcessed backward trajectories: {backward_processed['TrajectoryID'].nunique()}")
else:
    print("WARNING: No backward trajectories after processing!")

2026-02-03 10:03:57,135 - multi_tracker.core.post_processing - INFO - Loaded 18750 rows from emi_short_tracking_backward.csv with columns: ['TrackID', 'TrajectoryID', 'Index', 'X', 'Y', 'Theta', 'FrameID', 'State', 'DetectionConfidence', 'AssignmentConfidence', 'PositionUncertainty']
2026-02-03 10:03:57,136 - multi_tracker.core.post_processing - INFO - Dropped columns: []
2026-02-03 10:03:57,137 - multi_tracker.core.post_processing - INFO - Setting X, Y, Theta to NaN for 4853 occluded/lost detections


Processing backward trajectories...


2026-02-03 10:03:57,406 - multi_tracker.core.post_processing - INFO - Post-processing stats: {'original_count': 97, 'removed_short': 2, 'broken_velocity': 4, 'broken_distance': 0, 'broken_occlusion': 146, 'final_count': 93}



Backward processing stats:
  original_count: 97
  removed_short: 2
  broken_velocity: 4
  broken_distance: 0
  broken_occlusion: 146
  final_count: 93

Processed backward trajectories: 93


## Step 3: Merge Forward and Backward Trajectories (Conservative Strategy)

This step resolves conflicts between forward and backward tracking using a **conservative consensus-based approach**:

1. **Adjust backward data**: Frame numbers are flipped (they were stored in reverse), and theta is rotated by 180¬∞
2. **Find merge candidates**: Pairs must have at least `MIN_OVERLAP_FRAMES` frames where positions agree (within `AGREEMENT_DISTANCE`)
3. **Conservative merge**: 
   - **Agreeing frames** (both exist within threshold): Merge into average position
   - **Disagreeing frames** (both exist but too far apart): Split into separate trajectory segments
   - **Unique frames** (only one direction has data): Keep as-is

This prioritizes **identity confidence** over trajectory completeness - you may get more trajectory fragments, but each fragment has higher confidence in identity.

In [44]:
# Helper function to convert DataFrame to list of DataFrames (one per trajectory)
def prepare_trajs_for_merge(trajs_df):
    """Convert a single DataFrame to a list of DataFrames (one per trajectory)."""
    if trajs_df is None or trajs_df.empty:
        return []
    return [group.copy() for _, group in trajs_df.groupby("TrajectoryID")]

# Prepare trajectories for merging
forward_prepared = prepare_trajs_for_merge(forward_processed)
backward_prepared = prepare_trajs_for_merge(backward_processed)

print(f"Forward trajectories ready for merge: {len(forward_prepared)}")
print(f"Backward trajectories ready for merge: {len(backward_prepared)}")

Forward trajectories ready for merge: 102
Backward trajectories ready for merge: 93


In [45]:
# Resolve (merge) forward and backward trajectories
print("Resolving forward and backward trajectories...")
print("="*60)

resolved_trajectories = resolve_trajectories(
    forward_prepared,
    backward_prepared,
    video_length=TOTAL_FRAMES,
    params=params,
)

print("="*60)
print(f"\nResolution complete! Got {len(resolved_trajectories)} merged trajectories.")

2026-02-03 10:04:02,817 - multi_tracker.core.post_processing - INFO - Starting conservative trajectory resolution with 102 forward and 93 backward trajectories
2026-02-03 10:04:02,817 - multi_tracker.core.post_processing - INFO - Parameters: AGREEMENT_DISTANCE=9.62px, MIN_OVERLAP_FRAMES=2, MIN_LENGTH=10
2026-02-03 10:04:02,901 - multi_tracker.core.post_processing - INFO - After cleaning: 102 forward, 93 backward
2026-02-03 10:04:02,958 - multi_tracker.core.post_processing - INFO - Found 265 merge candidates


Resolving forward and backward trajectories...


2026-02-03 10:04:04,407 - multi_tracker.core.post_processing - INFO - Removed 73 spatially redundant trajectories
2026-02-03 10:04:16,131 - multi_tracker.core.post_processing - INFO - Processed overlapping trajectories in 5 iterations
2026-02-03 10:04:16,150 - multi_tracker.core.post_processing - INFO - Final result: 154 trajectories



Resolution complete! Got 154 merged trajectories.


In [46]:
# Convert list of DataFrames back to single DataFrame
if resolved_trajectories and isinstance(resolved_trajectories, list):
    if isinstance(resolved_trajectories[0], pd.DataFrame):
        # Reassign TrajectoryID to ensure unique IDs
        for new_id, traj_df in enumerate(resolved_trajectories):
            traj_df["TrajectoryID"] = new_id
        merged_df = pd.concat(resolved_trajectories, ignore_index=True)
    else:
        # Fallback for old tuple format
        all_data = []
        for traj_id, traj in enumerate(resolved_trajectories):
            for x, y, theta, frame in traj:
                all_data.append({
                    "TrajectoryID": traj_id,
                    "X": x,
                    "Y": y,
                    "Theta": theta,
                    "FrameID": frame,
                })
        merged_df = pd.DataFrame(all_data) if all_data else pd.DataFrame()
else:
    merged_df = pd.DataFrame()

print(f"Merged DataFrame:")
print(f"  Rows: {len(merged_df)}")
print(f"  Unique trajectories: {merged_df['TrajectoryID'].nunique() if not merged_df.empty else 0}")
if not merged_df.empty:
    print(f"  Frame range: {merged_df['FrameID'].min()} - {merged_df['FrameID'].max()}")

Merged DataFrame:
  Rows: 13607
  Unique trajectories: 154
  Frame range: 1 - 750


## Step 4: Apply Interpolation

In [47]:
# Apply interpolation if enabled
if INTERPOLATION_METHOD.lower() != "none" and not merged_df.empty:
    print(f"Applying {INTERPOLATION_METHOD} interpolation (max_gap={INTERPOLATION_MAX_GAP})...")
    
    # Count NaN values before
    nan_before = merged_df[['X', 'Y']].isna().sum().sum()
    
    merged_df = interpolate_trajectories(
        merged_df,
        method=INTERPOLATION_METHOD,
        max_gap=INTERPOLATION_MAX_GAP,
    )
    
    # Count NaN values after
    nan_after = merged_df[['X', 'Y']].isna().sum().sum()
    
    print(f"Interpolation complete!")
    print(f"  NaN values before: {nan_before}")
    print(f"  NaN values after: {nan_after}")
    print(f"  Filled: {nan_before - nan_after} ({100*(nan_before-nan_after)/max(nan_before,1):.1f}%)")
else:
    print("Skipping interpolation (disabled or no data)")

2026-02-03 10:04:21,930 - multi_tracker.core.post_processing - INFO - Interpolating trajectories using spline method (max_gap=5)


Applying spline interpolation (max_gap=5)...


2026-02-03 10:04:22,289 - multi_tracker.core.post_processing - INFO - Interpolation complete


Interpolation complete!
  NaN values before: 1834
  NaN values after: 0
  Filled: 1834 (100.0%)


## Step 5: Scale to Original Video Space

In [48]:
# Scale coordinates back to original video space
if RESIZE_FACTOR != 1.0 and not merged_df.empty:
    print(f"Scaling coordinates from resized space (factor={RESIZE_FACTOR}) to original space...")
    
    merged_df[["X", "Y"]] = merged_df[["X", "Y"]] / RESIZE_FACTOR
    
    if "Width" in merged_df.columns:
        merged_df["Width"] /= RESIZE_FACTOR
    if "Height" in merged_df.columns:
        merged_df["Height"] /= RESIZE_FACTOR
        
    print("‚úì Coordinates scaled to original video space")
else:
    print("No scaling needed (resize_factor=1.0)")

Scaling coordinates from resized space (factor=0.5) to original space...
‚úì Coordinates scaled to original video space


## Step 6: Save Final Output

In [49]:
# Preview final data
print("Final merged trajectories:")
print(f"  Rows: {len(merged_df)}")
print(f"  Unique trajectories: {merged_df['TrajectoryID'].nunique()}")
print(f"  Columns: {list(merged_df.columns)}")

if not merged_df.empty:
    print(f"  Frame range: {merged_df['FrameID'].min()} - {merged_df['FrameID'].max()}")
    print(f"  X range: {merged_df['X'].min():.1f} - {merged_df['X'].max():.1f}")
    print(f"  Y range: {merged_df['Y'].min():.1f} - {merged_df['Y'].max():.1f}")

merged_df.head(10)

Final merged trajectories:
  Rows: 13964
  Unique trajectories: 154
  Columns: ['TrajectoryID', 'X', 'Y', 'Theta', 'FrameID', 'State', 'DetectionConfidence', 'AssignmentConfidence', 'PositionUncertainty']
  Frame range: 1 - 750
  X range: 456.0 - 4046.0
  Y range: 328.0 - 3862.0


Unnamed: 0,TrajectoryID,X,Y,Theta,FrameID,State,DetectionConfidence,AssignmentConfidence,PositionUncertainty
0,0,1206.0,1160.0,0.675481,36,active,0.838827,0.948763,1.464978
1,0,1218.738423,1168.398764,6.265722,37,occluded,0.0,0.0,0.542961
2,0,1226.0,1172.0,0.567744,38,active,0.824358,0.948248,1.445745
3,0,1228.663387,1176.211403,6.279717,39,occluded,0.0,0.0,0.525485
4,0,1232.0,1184.0,0.747237,40,active,0.93383,0.960669,0.525624
5,0,1252.0,1198.0,0.627755,41,active,0.931419,0.948831,0.52587
6,0,1272.0,1212.0,0.667517,42,active,0.973645,0.772828,0.526734
7,0,1304.0,1242.0,0.660133,43,active,0.940105,0.787868,0.529522
8,0,1318.0,1248.0,0.596525,44,active,0.953296,0.889432,0.53424
9,0,1330.0,1248.0,0.541193,45,active,0.966093,0.956306,0.535654


In [50]:
# Save to CSV
if not merged_df.empty:
    merged_df.to_csv(OUTPUT_CSV_PATH, index=False)
    print(f"‚úì Final trajectories saved to: {OUTPUT_CSV_PATH}")
    print(f"  File size: {os.path.getsize(OUTPUT_CSV_PATH) / 1024:.1f} KB")
else:
    print("WARNING: No data to save!")

‚úì Final trajectories saved to: emi_short_tracking_final.csv
  File size: 1271.9 KB


## Summary Statistics

In [51]:
# Print summary
print("="*60)
print("POST-PROCESSING SUMMARY")
print("="*60)

print(f"\nüìÅ Input files:")
print(f"   Forward:  {forward_raw['TrajectoryID'].nunique()} trajectories")
print(f"   Backward: {backward_raw['TrajectoryID'].nunique()} trajectories")

print(f"\nüîß After individual post-processing:")
print(f"   Forward:  {forward_stats.get('final_count', 0)} trajectories")
print(f"   Backward: {backward_stats.get('final_count', 0)} trajectories")

print(f"\nüîÄ After merging:")
print(f"   Final: {merged_df['TrajectoryID'].nunique()} trajectories")

print(f"\nüíæ Output saved to:")
print(f"   {OUTPUT_CSV_PATH}")

print("\n" + "="*60)

POST-PROCESSING SUMMARY

üìÅ Input files:
   Forward:  68 trajectories
   Backward: 97 trajectories

üîß After individual post-processing:
   Forward:  102 trajectories
   Backward: 93 trajectories

üîÄ After merging:
   Final: 154 trajectories

üíæ Output saved to:
   emi_short_tracking_final.csv



## Optional: Generate Annotated Video

Generate a video with trajectory overlays similar to the main tracker output.

In [52]:
# ===================================================================
# VIDEO OUTPUT SETTINGS
# ===================================================================

# Generate video?
GENERATE_VIDEO = True

# Output video path (auto-generated if None)
VIDEO_OUTPUT_PATH = None  # Will be *_annotated.mp4 if None

# Visualization options
SHOW_LABELS = True          # Show trajectory ID labels
SHOW_ORIENTATION = True     # Show orientation arrows
SHOW_TRAILS = True          # Show trajectory trails
TRAIL_DURATION_SEC = 5.0    # Trail duration in seconds

# Drawing parameters (relative to body size)
MARKER_SIZE = 0.1           # Circle radius as fraction of body size
ARROW_LENGTH = 0.25         # Arrow length as fraction of body size
TEXT_SCALE = 3.0            # Text size scale factor

# Auto-generate output path
if VIDEO_OUTPUT_PATH is None:
    base_video, ext_video = os.path.splitext(VIDEO_PATH)
    VIDEO_OUTPUT_PATH = f"{base_video}_annotated.mp4"

print(f"Video output: {VIDEO_OUTPUT_PATH}")
print(f"Options: labels={SHOW_LABELS}, orientation={SHOW_ORIENTATION}, trails={SHOW_TRAILS}")

Video output: emi_short_annotated.mp4
Options: labels=True, orientation=True, trails=True


In [53]:
def generate_annotated_video(
    video_path, 
    output_path, 
    trajectories_df,
    reference_body_size=77.0,
    show_labels=True,
    show_orientation=True,
    show_trails=True,
    trail_duration_sec=2.0,
    marker_size=0.1,
    arrow_length=0.25,
    text_scale=3.0,
):
    """
    Generate annotated video with trajectory overlays.
    
    Args:
        video_path: Path to input video
        output_path: Path to output video
        trajectories_df: DataFrame with columns TrajectoryID, FrameID, X, Y, Theta
        reference_body_size: Reference body size in pixels for scaling
        show_labels: Show trajectory ID labels
        show_orientation: Show orientation arrows
        show_trails: Show trajectory trails
        trail_duration_sec: Duration of trails in seconds
        marker_size: Circle radius as fraction of body size
        arrow_length: Arrow length as fraction of body size
        text_scale: Text size scale factor
    """
    import cv2
    import numpy as np
    from tqdm.notebook import tqdm
    
    # Open video
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise ValueError(f"Could not open video: {video_path}")
    
    # Get video properties
    fps = cap.get(cv2.CAP_PROP_FPS)
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    
    print(f"Input video: {frame_width}x{frame_height} @ {fps:.1f} FPS, {total_frames} frames")
    
    # Create video writer
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    out = cv2.VideoWriter(output_path, fourcc, fps, (frame_width, frame_height))
    
    if not out.isOpened():
        cap.release()
        raise ValueError(f"Could not create output video: {output_path}")
    
    # Calculate trail duration in frames
    trail_duration_frames = int(trail_duration_sec * fps)
    
    # Scale drawing parameters by body size
    marker_radius = int(marker_size * reference_body_size)
    arrow_len = int(arrow_length * reference_body_size)
    text_size = 0.5 * text_scale
    marker_thickness = max(2, int(0.15 * reference_body_size))
    
    # Default colors (BGR format for OpenCV)
    default_colors = [
        (0, 255, 0),    # Green
        (255, 0, 0),    # Blue
        (0, 0, 255),    # Red
        (255, 255, 0),  # Cyan
        (255, 0, 255),  # Magenta
        (0, 255, 255),  # Yellow
        (128, 0, 255),  # Orange
        (255, 128, 0),  # Light blue
        (0, 128, 255),  # Orange-red
        (128, 255, 0),  # Lime
    ]
    
    # Build lookup for trajectories by frame
    print("Building trajectory lookup...")
    traj_by_frame = {}
    traj_by_track = {}
    
    for _, row in trajectories_df.iterrows():
        frame_num = int(row["FrameID"])
        track_id = int(row["TrajectoryID"])
        
        if frame_num not in traj_by_frame:
            traj_by_frame[frame_num] = []
        traj_by_frame[frame_num].append(row)
        
        if track_id not in traj_by_track:
            traj_by_track[track_id] = []
        traj_by_track[track_id].append(row)
    
    # Process video frame by frame
    print(f"Generating video: {output_path}")
    
    for frame_idx in tqdm(range(total_frames), desc="Processing frames"):
        ret, frame = cap.read()
        if not ret:
            break
        
        # Get trajectories for this frame
        frame_trajs = traj_by_frame.get(frame_idx, [])
        
        # Draw trails first (underneath current positions)
        if show_trails:
            for traj in frame_trajs:
                track_id = int(traj["TrajectoryID"])
                color = default_colors[track_id % len(default_colors)]
                
                # Get trail points (past N frames)
                trail_points = []
                if track_id in traj_by_track:
                    for past_row in traj_by_track[track_id]:
                        past_frame = int(past_row["FrameID"])
                        if frame_idx - trail_duration_frames <= past_frame < frame_idx:
                            px, py = past_row["X"], past_row["Y"]
                            if not pd.isna(px) and not pd.isna(py):
                                trail_points.append((int(px), int(py), past_frame))
                
                # Draw trail as fading line segments
                if len(trail_points) > 1:
                    trail_points.sort(key=lambda p: p[2])
                    for i in range(len(trail_points) - 1):
                        pt1 = (trail_points[i][0], trail_points[i][1])
                        pt2 = (trail_points[i + 1][0], trail_points[i + 1][1])
                        
                        # Calculate opacity based on age
                        age = frame_idx - trail_points[i][2]
                        alpha = 1.0 - (age / trail_duration_frames)
                        faded_color = tuple(int(c * alpha) for c in color)
                        
                        cv2.line(frame, pt1, pt2, faded_color, max(1, marker_thickness // 2))
        
        # Draw current positions
        for traj in frame_trajs:
            track_id = int(traj["TrajectoryID"])
            cx, cy = traj["X"], traj["Y"]
            
            # Skip if NaN
            if pd.isna(cx) or pd.isna(cy):
                continue
            
            cx, cy = int(cx), int(cy)
            color = default_colors[track_id % len(default_colors)]
            
            # Draw circle at position
            cv2.circle(frame, (cx, cy), marker_radius, color, marker_thickness)
            
            # Draw label
            if show_labels:
                label = f"ID{track_id}"
                label_offset = int(marker_radius + 5)
                cv2.putText(
                    frame,
                    label,
                    (cx + label_offset, cy - label_offset),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    text_size,
                    color,
                    max(1, int(text_scale * 2)),
                )
            
            # Draw orientation arrow
            if show_orientation and "Theta" in traj.index and not pd.isna(traj["Theta"]):
                heading = traj["Theta"]
                end_x = int(cx + arrow_len * np.cos(heading))
                end_y = int(cy + arrow_len * np.sin(heading))
                cv2.arrowedLine(
                    frame,
                    (cx, cy),
                    (end_x, end_y),
                    color,
                    marker_thickness,
                    tipLength=0.3,
                )
        
        # Write frame
        out.write(frame)
    
    # Cleanup
    cap.release()
    out.release()
    
    print(f"‚úì Video saved to: {output_path}")
    print(f"  File size: {os.path.getsize(output_path) / (1024*1024):.1f} MB")

print("Video generation function defined.")

Video generation function defined.


In [54]:
# Generate the annotated video
if GENERATE_VIDEO and not merged_df.empty:
    generate_annotated_video(
        video_path=VIDEO_PATH,
        output_path=VIDEO_OUTPUT_PATH,
        trajectories_df=merged_df,
        reference_body_size=REFERENCE_BODY_SIZE,  # Use original body size (coords already scaled)
        show_labels=SHOW_LABELS,
        show_orientation=SHOW_ORIENTATION,
        show_trails=SHOW_TRAILS,
        trail_duration_sec=TRAIL_DURATION_SEC,
        marker_size=MARKER_SIZE,
        arrow_length=ARROW_LENGTH,
        text_scale=TEXT_SCALE,
    )
else:
    if not GENERATE_VIDEO:
        print("Video generation disabled (GENERATE_VIDEO=False)")
    else:
        print("No trajectory data available for video generation")

Input video: 4512x4512 @ 25.0 FPS, 750 frames
Building trajectory lookup...
Generating video: emi_short_annotated.mp4


Processing frames:   0%|          | 0/750 [00:00<?, ?it/s]

‚úì Video saved to: emi_short_annotated.mp4
  File size: 116.9 MB


## Optional: Quick Static Plots

Generate static plots for quick overview (useful if video generation is slow).

In [55]:
# Optional: Plot trajectory overview
import matplotlib.pyplot as plt

if not merged_df.empty:
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))
    
    # Plot 1: Spatial trajectories
    ax1 = axes[0]
    for traj_id in merged_df['TrajectoryID'].unique():
        traj = merged_df[merged_df['TrajectoryID'] == traj_id]
        ax1.plot(traj['X'], traj['Y'], alpha=0.7, linewidth=0.5)
    ax1.set_xlabel('X (pixels)')
    ax1.set_ylabel('Y (pixels)')
    ax1.set_title(f'All Trajectories ({merged_df["TrajectoryID"].nunique()} total)')
    ax1.set_aspect('equal')
    ax1.invert_yaxis()  # Flip Y axis to match image coordinates
    
    # Plot 2: Trajectory lengths
    ax2 = axes[1]
    traj_lengths = merged_df.groupby('TrajectoryID').size()
    ax2.hist(traj_lengths, bins=50, edgecolor='black', alpha=0.7)
    ax2.set_xlabel('Trajectory Length (frames)')
    ax2.set_ylabel('Count')
    ax2.set_title(f'Trajectory Length Distribution\nMean: {traj_lengths.mean():.1f}, Median: {traj_lengths.median():.1f}')
    ax2.axvline(traj_lengths.mean(), color='red', linestyle='--', label=f'Mean ({traj_lengths.mean():.1f})')
    ax2.axvline(traj_lengths.median(), color='orange', linestyle='--', label=f'Median ({traj_lengths.median():.1f})')
    ax2.legend()
    
    plt.tight_layout()
    plt.show()
else:
    print("No data to visualize!")

In [56]:
# Optional: Per-trajectory statistics
if not merged_df.empty:
    traj_stats = merged_df.groupby('TrajectoryID').agg({
        'FrameID': ['min', 'max', 'count'],
        'X': ['mean', 'std'],
        'Y': ['mean', 'std'],
    }).round(2)
    
    traj_stats.columns = ['Start Frame', 'End Frame', 'Length', 'X Mean', 'X Std', 'Y Mean', 'Y Std']
    traj_stats['Duration (s)'] = (traj_stats['End Frame'] - traj_stats['Start Frame']) / FPS
    
    print("Per-trajectory statistics:")
    display(traj_stats)

Per-trajectory statistics:


Unnamed: 0_level_0,Start Frame,End Frame,Length,X Mean,X Std,Y Mean,Y Std,Duration (s)
TrajectoryID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
0,36,600,565,2130.07,412.64,2476.43,689.64,22.56
1,54,549,496,899.74,217.52,670.87,248.36,19.80
2,1,15,15,618.00,9.13,551.23,8.01,0.56
3,17,230,214,756.63,157.72,585.08,255.36,8.52
4,72,244,173,829.84,195.81,691.17,279.31,6.88
...,...,...,...,...,...,...,...,...
149,574,592,19,3944.49,46.03,3472.25,85.87,0.72
150,449,461,13,1168.41,1.65,1158.87,1.20,0.48
151,305,378,74,1123.50,46.77,1093.36,37.83,2.92
152,59,69,11,1076.32,46.53,1008.78,11.35,0.40


In [57]:
# DIAGNOSTIC: Check for duplicate/overlapping trajectories at same locations
print("="*60)
print("DIAGNOSTIC: Checking for overlapping trajectories at same locations")
print("="*60)

# For each frame, check if any two trajectories are too close together
DIAG_DISTANCE_THRESHOLD = params.get("AGREEMENT_DISTANCE", 15.0) * 2  # 1x body size

duplicates_found = []
for frame in merged_df["FrameID"].unique():
    frame_data = merged_df[merged_df["FrameID"] == frame]
    if len(frame_data) <= 1:
        continue
    
    traj_ids = frame_data["TrajectoryID"].values
    xs = frame_data["X"].values
    ys = frame_data["Y"].values
    
    for i in range(len(traj_ids)):
        for j in range(i+1, len(traj_ids)):
            if pd.isna(xs[i]) or pd.isna(xs[j]):
                continue
            dist = np.sqrt((xs[i] - xs[j])**2 + (ys[i] - ys[j])**2)
            if dist < DIAG_DISTANCE_THRESHOLD:
                duplicates_found.append({
                    "FrameID": frame,
                    "Traj1": traj_ids[i],
                    "Traj2": traj_ids[j],
                    "Distance": dist,
                    "X1": xs[i], "Y1": ys[i],
                    "X2": xs[j], "Y2": ys[j]
                })

if duplicates_found:
    dup_df = pd.DataFrame(duplicates_found)
    print(f"\n‚ö†Ô∏è  Found {len(dup_df)} frame-pairs with overlapping trajectories!")
    print(f"\nUnique trajectory pairs with overlap:")
    pair_counts = dup_df.groupby(["Traj1", "Traj2"]).size().reset_index(name="NumFrames")
    print(pair_counts.to_string())
    
    print(f"\nSample overlapping frames:")
    print(dup_df.head(10).to_string())
else:
    print("\n‚úì No overlapping trajectories found!")

DIAGNOSTIC: Checking for overlapping trajectories at same locations

‚ö†Ô∏è  Found 36 frame-pairs with overlapping trajectories!

Unique trajectory pairs with overlap:
    Traj1  Traj2  NumFrames
0       8     59          1
1      10     15          1
2      16     22          1
3      31    143          8
4      46     95          1
5      47    116          1
6      48     54          1
7      49     54          1
8      49     56          2
9      51    135          1
10     53    109          1
11     58    141          1
12     59     60          1
13     66    120          1
14     70     81          1
15     71     87          1
16     78    111          1
17     82    138          1
18     86    112          1
19     88    108          1
20    102    104          1
21    106    146          1
22    107    151          1
23    125    140          1
24    141    149          1
25    142    144          1
26    142    149          1
27    144    149          1

Sample overlapping 

In [58]:
# Summary: Show just the first 5 pairs of duplicates
if duplicates_found:
    print(f"Total overlapping frames: {len(duplicates_found)}")
    print(f"\nTrajectory pairs with most overlap:")
    pair_counts = pd.DataFrame(duplicates_found).groupby(["Traj1", "Traj2"]).size().reset_index(name="NumFrames")
    pair_counts = pair_counts.sort_values("NumFrames", ascending=False)
    print(pair_counts.head(10))

Total overlapping frames: 36

Trajectory pairs with most overlap:
    Traj1  Traj2  NumFrames
3      31    143          8
8      49     56          2
0       8     59          1
15     71     87          1
26    142    149          1
25    142    144          1
24    141    149          1
23    125    140          1
22    107    151          1
21    106    146          1


In [59]:
# Investigate the worst offending pair (Traj 30 and 219)
traj30 = merged_df[merged_df["TrajectoryID"] == 30][["FrameID", "X", "Y"]].sort_values("FrameID")
traj219 = merged_df[merged_df["TrajectoryID"] == 219][["FrameID", "X", "Y"]].sort_values("FrameID")

print(f"Trajectory 30: frames {traj30['FrameID'].min()} - {traj30['FrameID'].max()} ({len(traj30)} points)")
print(f"Trajectory 219: frames {traj219['FrameID'].min()} - {traj219['FrameID'].max()} ({len(traj219)} points)")

# Check the forward/backward trajectories to see what happened
print("\n--- Checking original forward/backward data ---")
# Find if these trajectories came from forward or backward
for i, traj in enumerate(forward_prepared):
    if len(traj) > 0:
        overlap_30 = set(traj["FrameID"]).intersection(set(traj30["FrameID"]))
        overlap_219 = set(traj["FrameID"]).intersection(set(traj219["FrameID"]))
        
        if len(overlap_30) > 50:
            print(f"Forward traj {i} overlaps with result Traj 30: {len(overlap_30)} frames")
        if len(overlap_219) > 50:
            print(f"Forward traj {i} overlaps with result Traj 219: {len(overlap_219)} frames")

for i, traj in enumerate(backward_prepared):
    if len(traj) > 0:
        overlap_30 = set(traj["FrameID"]).intersection(set(traj30["FrameID"]))
        overlap_219 = set(traj["FrameID"]).intersection(set(traj219["FrameID"]))
        
        if len(overlap_30) > 50:
            print(f"Backward traj {i} overlaps with result Traj 30: {len(overlap_30)} frames")
        if len(overlap_219) > 50:
            print(f"Backward traj {i} overlaps with result Traj 219: {len(overlap_219)} frames")

Trajectory 30: frames 509 - 611 (103 points)
Trajectory 219: frames nan - nan (0 points)

--- Checking original forward/backward data ---
Forward traj 4 overlaps with result Traj 30: 101 frames
Forward traj 9 overlaps with result Traj 30: 103 frames
Forward traj 13 overlaps with result Traj 30: 51 frames
Forward traj 18 overlaps with result Traj 30: 103 frames
Forward traj 19 overlaps with result Traj 30: 103 frames
Forward traj 23 overlaps with result Traj 30: 103 frames
Forward traj 30 overlaps with result Traj 30: 55 frames
Forward traj 34 overlaps with result Traj 30: 103 frames
Forward traj 40 overlaps with result Traj 30: 61 frames
Forward traj 61 overlaps with result Traj 30: 76 frames
Forward traj 62 overlaps with result Traj 30: 92 frames
Forward traj 68 overlaps with result Traj 30: 54 frames
Forward traj 72 overlaps with result Traj 30: 87 frames
Forward traj 75 overlaps with result Traj 30: 59 frames
Forward traj 79 overlaps with result Traj 30: 103 frames
Forward traj 80 o

In [60]:
# Check spatial overlap between duplicate trajectories and their sources
# Find which forward/backward trajectories SPATIALLY match with Traj 30 and 219

def find_spatial_source(target_traj, source_trajs, source_name, threshold=15.0):
    """Find which source trajectories spatially match the target."""
    target_by_frame = {row["FrameID"]: (row["X"], row["Y"]) 
                        for _, row in target_traj.iterrows() 
                        if not pd.isna(row["X"])}
    
    matches = []
    for i, src in enumerate(source_trajs):
        if len(src) == 0:
            continue
        
        agreeing_frames = 0
        common_frames = 0
        
        for _, row in src.iterrows():
            frame = row["FrameID"]
            if frame in target_by_frame and not pd.isna(row["X"]):
                common_frames += 1
                tx, ty = target_by_frame[frame]
                dist = np.sqrt((row["X"] - tx)**2 + (row["Y"] - ty)**2)
                if dist < threshold:
                    agreeing_frames += 1
        
        if agreeing_frames > 10:  # At least 10 agreeing frames
            matches.append({
                "source": f"{source_name}_{i}",
                "agreeing": agreeing_frames,
                "common": common_frames,
                "pct": agreeing_frames / common_frames * 100 if common_frames > 0 else 0
            })
    
    return sorted(matches, key=lambda x: -x["agreeing"])

threshold = params.get("AGREEMENT_DISTANCE", 15.0)

print(f"\nTrajectory 30 spatial sources (threshold={threshold}px):")
sources_30 = find_spatial_source(traj30, forward_prepared, "forward", threshold)
sources_30 += find_spatial_source(traj30, backward_prepared, "backward", threshold)
sources_30 = sorted(sources_30, key=lambda x: -x["agreeing"])[:10]
for s in sources_30:
    print(f"  {s['source']}: {s['agreeing']}/{s['common']} frames ({s['pct']:.1f}%)")

print(f"\nTrajectory 219 spatial sources (threshold={threshold}px):")
sources_219 = find_spatial_source(traj219, forward_prepared, "forward", threshold)
sources_219 += find_spatial_source(traj219, backward_prepared, "backward", threshold)
sources_219 = sorted(sources_219, key=lambda x: -x["agreeing"])[:10]
for s in sources_219:
    print(f"  {s['source']}: {s['agreeing']}/{s['common']} frames ({s['pct']:.1f}%)")


Trajectory 30 spatial sources (threshold=9.625px):

Trajectory 219 spatial sources (threshold=9.625px):


In [61]:
# Direct comparison: are trajectories 30 and 219 really at same location?
traj30_full = merged_df[merged_df["TrajectoryID"] == 30].sort_values("FrameID")
traj219_full = merged_df[merged_df["TrajectoryID"] == 219].sort_values("FrameID")

# Check common frames
common_frames = set(traj30_full["FrameID"]).intersection(set(traj219_full["FrameID"]))
print(f"Traj 30 and 219 have {len(common_frames)} common frames")

# Look at a few common frames
t30_by_frame = {row["FrameID"]: row for _, row in traj30_full.iterrows()}
t219_by_frame = {row["FrameID"]: row for _, row in traj219_full.iterrows()}

print("\nSample common frames (first 10):")
print(f"{'Frame':>6} | {'T30 X':>8} {'T30 Y':>8} | {'T219 X':>8} {'T219 Y':>8} | {'Dist':>8}")
print("-" * 60)

sample_frames = sorted(common_frames)[:10]
for frame in sample_frames:
    r30 = t30_by_frame[frame]
    r219 = t219_by_frame[frame]
    dist = np.sqrt((r30["X"] - r219["X"])**2 + (r30["Y"] - r219["Y"])**2) if not pd.isna(r30["X"]) and not pd.isna(r219["X"]) else float('nan')
    print(f"{frame:>6} | {r30['X']:>8.2f} {r30['Y']:>8.2f} | {r219['X']:>8.2f} {r219['Y']:>8.2f} | {dist:>8.2f}")

Traj 30 and 219 have 0 common frames

Sample common frames (first 10):
 Frame |    T30 X    T30 Y |   T219 X   T219 Y |     Dist
------------------------------------------------------------


In [62]:
# Check: what's the relationship between resolved trajectories before they got renumbered?
# Let's look at the resolve output directly

# Re-import to get fresh function
from multi_tracker.core.post_processing import resolve_trajectories

# Re-run with debug output
import logging
logging.getLogger("multi_tracker.core.post_processing").setLevel(logging.DEBUG)

# Count how many result trajectories we get
print(f"Number of resolved trajectories: {len(resolved_trajectories)}")

# Check for exact duplicates in the resolved trajectories
print("\nChecking for exact duplicate trajectories in result...")
from collections import defaultdict

# Hash each trajectory by its frame-position signature
def traj_signature(df):
    """Create a signature for a trajectory based on first few positions."""
    df_sorted = df.sort_values("FrameID").head(5)
    sig = tuple((int(row["FrameID"]), round(row["X"], 1), round(row["Y"], 1)) 
                for _, row in df_sorted.iterrows() if not pd.isna(row["X"]))
    return sig

sig_to_indices = defaultdict(list)
for i, traj in enumerate(resolved_trajectories):
    if isinstance(traj, pd.DataFrame) and len(traj) > 0:
        sig = traj_signature(traj)
        sig_to_indices[sig].append(i)

duplicates = {sig: indices for sig, indices in sig_to_indices.items() if len(indices) > 1}
print(f"Found {len(duplicates)} trajectory signatures with duplicates")
if duplicates:
    for sig, indices in list(duplicates.items())[:5]:
        print(f"  Signature {sig[:2]}...: trajectories {indices}")

Number of resolved trajectories: 154

Checking for exact duplicate trajectories in result...
Found 0 trajectory signatures with duplicates


In [63]:
# Trace where the duplicates come from
# Look at trajectory pair [6, 186] - both have signature starting at frame 670

dup_pair = [6, 186]
for i in dup_pair:
    traj = resolved_trajectories[i]
    print(f"\nTrajectory index {i}:")
    print(f"  Length: {len(traj)}")
    print(f"  Frame range: {traj['FrameID'].min()} - {traj['FrameID'].max()}")
    if "_source" in traj.columns:
        print(f"  Source: {traj['_source'].iloc[0]}")
    print(f"  First 3 rows:")
    print(traj[["FrameID", "X", "Y"]].head(3).to_string())


Trajectory index 6:
  Length: 15
  Frame range: 292 - 312
  First 3 rows:
   FrameID      X      Y
0      292  632.0  592.0
1      294  643.0  603.0
2      295  651.0  615.0


IndexError: list index out of range

In [64]:
# Find which input trajectories have data at frame 670 with position (1966, 1857)
target_frame = 670
target_x, target_y = 1966.0, 1857.0
threshold = 5.0

print(f"Looking for sources at frame {target_frame}, pos ({target_x}, {target_y})...\n")

print("FORWARD trajectories at this location:")
for i, traj in enumerate(forward_prepared):
    frame_data = traj[traj["FrameID"] == target_frame]
    if len(frame_data) > 0:
        row = frame_data.iloc[0]
        if not pd.isna(row["X"]):
            dist = np.sqrt((row["X"] - target_x)**2 + (row["Y"] - target_y)**2)
            if dist < threshold:
                print(f"  Forward {i}: frames {traj['FrameID'].min()}-{traj['FrameID'].max()}, "
                      f"pos at frame {target_frame}: ({row['X']:.1f}, {row['Y']:.1f})")

print("\nBACKWARD trajectories at this location:")
for i, traj in enumerate(backward_prepared):
    frame_data = traj[traj["FrameID"] == target_frame]
    if len(frame_data) > 0:
        row = frame_data.iloc[0]
        if not pd.isna(row["X"]):
            dist = np.sqrt((row["X"] - target_x)**2 + (row["Y"] - target_y)**2)
            if dist < threshold:
                print(f"  Backward {i}: frames {traj['FrameID'].min()}-{traj['FrameID'].max()}, "
                      f"pos at frame {target_frame}: ({row['X']:.1f}, {row['Y']:.1f})")

Looking for sources at frame 670, pos (1966.0, 1857.0)...

FORWARD trajectories at this location:
  Forward 9: frames 205-750, pos at frame 670: (1966.0, 1857.0)

BACKWARD trajectories at this location:


In [65]:
# The backward_prepared already has frame adjustment applied (done by the notebook's post_processing)
# Let's check if there's a backward trajectory that overlaps after the resolve step

# Actually, I realize the issue might be in how backward trajectories are processed
# Let me check the raw backward data before frame adjustment
print(f"TOTAL_FRAMES = {TOTAL_FRAMES}")
print(f"Frame 670 in backward would originally be frame: {TOTAL_FRAMES + 1 - 670} = {TOTAL_FRAMES + 1 - 670}")

# Check what's at that original frame in backward_raw
orig_backward_frame = TOTAL_FRAMES + 1 - 670
print(f"\nChecking backward_raw at original frame {orig_backward_frame}...")

if "FrameID" in backward_raw.columns:
    frame_data = backward_raw[backward_raw["FrameID"] == orig_backward_frame]
    print(f"Found {len(frame_data)} rows")
    if len(frame_data) > 0:
        print(frame_data[["TrajectoryID", "X", "Y"]].to_string())

TOTAL_FRAMES = 750
Frame 670 in backward would originally be frame: 81 = 81

Checking backward_raw at original frame 81...
Found 25 rows
      TrajectoryID       X       Y
2000            39  1971.0  1777.0
2001            45   393.0   522.0
2002            25  2016.0  1787.0
2003            26  1012.0   968.0
2004            27  1912.0  1843.0
2005            28  1762.0  1800.0
2006            30  1966.0  1857.0
2007            32  1936.0  1664.0
2008            33  1734.0  1705.0
2009            34  1947.0  1725.0
2010            35  1858.0  1837.0
2011            36  1914.0  1703.0
2012            37  1898.0  1773.0
2013            38  1824.0  1874.0
2014            47   317.0   260.0
2015             1   349.0   575.0
2016            46   580.0   525.0
2017            44  1809.0  1819.0
2018            40  1936.0  1797.0
2019            41  1874.0  1813.0
2020            42  1931.0  1774.0
2021            43  1818.0  1849.0
2022            10   769.0   702.0
2023            29  193

In [66]:
# Check backward_prepared frame ranges - have they been adjusted yet?
print("Backward prepared frame ranges (first 10):")
for i, traj in enumerate(backward_prepared[:10]):
    print(f"  Backward {i}: frames {traj['FrameID'].min()} - {traj['FrameID'].max()}")

# Check forward_prepared frame ranges
print("\nForward prepared frame ranges (first 10):")
for i, traj in enumerate(forward_prepared[:10]):
    print(f"  Forward {i}: frames {traj['FrameID'].min()} - {traj['FrameID'].max()}")

Backward prepared frame ranges (first 10):
  Backward 0: frames 3 - 39
  Backward 1: frames 3 - 191
  Backward 2: frames 4 - 47
  Backward 3: frames 4 - 22
  Backward 4: frames 5 - 24
  Backward 5: frames 11 - 98
  Backward 6: frames 11 - 171
  Backward 7: frames 178 - 316
  Backward 8: frames 12 - 102
  Backward 9: frames 110 - 750

Forward prepared frame ranges (first 10):
  Forward 0: frames 3 - 50
  Forward 1: frames 57 - 102
  Forward 2: frames 109 - 209
  Forward 3: frames 216 - 504
  Forward 4: frames 511 - 616
  Forward 5: frames 626 - 637
  Forward 6: frames 645 - 750
  Forward 7: frames 2 - 153
  Forward 8: frames 162 - 198
  Forward 9: frames 205 - 750


In [67]:
# Check if there are duplicate frames within trajectories
print("Checking for duplicate frames within trajectories...")
for i, traj in enumerate(resolved_trajectories):
    if isinstance(traj, pd.DataFrame):
        dup_frames = traj["FrameID"].duplicated().sum()
        if dup_frames > 0:
            print(f"  Trajectory {i}: {dup_frames} duplicate frames!")
            print(f"    Duplicate frame IDs: {traj[traj['FrameID'].duplicated(keep=False)]['FrameID'].unique()[:5]}")
            
print("\nDone checking.")

Checking for duplicate frames within trajectories...

Done checking.


In [68]:
# Check: Are there multiple forward trajectories at the same location?
# i.e., do any forward trajectories overlap spatially at the same frames?

print("Checking for overlapping forward trajectories at same location...")

# For each pair of forward trajectories, check if they overlap spatially
threshold = params.get("AGREEMENT_DISTANCE", 19.25)
overlapping_forward = []

for i in range(len(forward_prepared)):
    t1 = forward_prepared[i]
    t1_by_frame = {row["FrameID"]: (row["X"], row["Y"]) 
                   for _, row in t1.iterrows() if not pd.isna(row["X"])}
    
    for j in range(i+1, len(forward_prepared)):
        t2 = forward_prepared[j]
        common_frames = set(t1_by_frame.keys()).intersection(set(t2["FrameID"]))
        
        if len(common_frames) < 5:
            continue
        
        agreeing = 0
        for frame in common_frames:
            t2_row = t2[t2["FrameID"] == frame].iloc[0]
            if pd.isna(t2_row["X"]):
                continue
            t1_x, t1_y = t1_by_frame[frame]
            dist = np.sqrt((t1_x - t2_row["X"])**2 + (t1_y - t2_row["Y"])**2)
            if dist < threshold:
                agreeing += 1
        
        if agreeing >= 5:
            overlapping_forward.append((i, j, agreeing, len(common_frames)))

print(f"Found {len(overlapping_forward)} overlapping forward trajectory pairs!")
if overlapping_forward:
    print("Top 5:")
    for i, j, agree, common in sorted(overlapping_forward, key=lambda x: -x[2])[:5]:
        print(f"  Forward {i} & {j}: {agree}/{common} agreeing frames")

Checking for overlapping forward trajectories at same location...
Found 0 overlapping forward trajectory pairs!


In [69]:
# Check backward after frame adjustment
print("Checking for overlapping backward trajectories at same location...")

# We need to adjust backward frames to compare properly
adjusted_backward = []
for traj in backward_prepared:
    adj_traj = traj.copy()
    adj_traj["FrameID"] = TOTAL_FRAMES + 1 - adj_traj["FrameID"]
    adjusted_backward.append(adj_traj)

overlapping_backward = []
for i in range(len(adjusted_backward)):
    t1 = adjusted_backward[i]
    t1_by_frame = {row["FrameID"]: (row["X"], row["Y"]) 
                   for _, row in t1.iterrows() if not pd.isna(row["X"])}
    
    for j in range(i+1, len(adjusted_backward)):
        t2 = adjusted_backward[j]
        common_frames = set(t1_by_frame.keys()).intersection(set(t2["FrameID"]))
        
        if len(common_frames) < 5:
            continue
        
        agreeing = 0
        for frame in common_frames:
            t2_row = t2[t2["FrameID"] == frame].iloc[0]
            if pd.isna(t2_row["X"]):
                continue
            t1_x, t1_y = t1_by_frame[frame]
            dist = np.sqrt((t1_x - t2_row["X"])**2 + (t1_y - t2_row["Y"])**2)
            if dist < threshold:
                agreeing += 1
        
        if agreeing >= 5:
            overlapping_backward.append((i, j, agreeing, len(common_frames)))

print(f"Found {len(overlapping_backward)} overlapping backward trajectory pairs!")
if overlapping_backward:
    print("Top 5:")
    for i, j, agree, common in sorted(overlapping_backward, key=lambda x: -x[2])[:5]:
        print(f"  Backward {i} & {j}: {agree}/{common} agreeing frames")

Checking for overlapping backward trajectories at same location...
Found 0 overlapping backward trajectory pairs!


In [70]:
# Check: do multiple merge candidates exist that would merge into same output?
# i.e., forward_i merges with backward_j, and forward_k merges with backward_l,
# but all four are actually at the same location?

from multi_tracker.core.post_processing import resolve_trajectories as _orig_resolve

# Recreate the merge candidate finding logic
AGREEMENT_DISTANCE = params.get("AGREEMENT_DISTANCE", 19.25)
MIN_OVERLAP = params.get("MIN_OVERLAP_FRAMES", 5)

# Adjust backward frames
adj_backward = []
for traj in backward_prepared:
    adj = traj.copy()
    adj["FrameID"] = TOTAL_FRAMES + 1 - adj["FrameID"]
    adj_backward.append(adj)

# Find all merge candidates
merge_candidates = []
for fi, fwd in enumerate(forward_prepared):
    fwd_frames = set(fwd["FrameID"])
    fwd_by_frame = {row["FrameID"]: row for _, row in fwd.iterrows()}
    
    for bi, bwd in enumerate(adj_backward):
        bwd_frames = set(bwd["FrameID"])
        common_frames = fwd_frames.intersection(bwd_frames)
        
        if len(common_frames) < MIN_OVERLAP:
            continue
        
        bwd_by_frame = {row["FrameID"]: row for _, row in bwd.iterrows()}
        agreeing = 0
        for frame in common_frames:
            fwd_row = fwd_by_frame[frame]
            bwd_row = bwd_by_frame[frame]
            if pd.isna(fwd_row["X"]) or pd.isna(bwd_row["X"]):
                continue
            dist = np.sqrt((fwd_row["X"] - bwd_row["X"])**2 + (fwd_row["Y"] - bwd_row["Y"])**2)
            if dist <= AGREEMENT_DISTANCE:
                agreeing += 1
        
        if agreeing >= MIN_OVERLAP:
            merge_candidates.append((fi, bi, agreeing, len(common_frames)))

print(f"Found {len(merge_candidates)} merge candidates")
print(f"\nForward indices used: {sorted(set(m[0] for m in merge_candidates))[:20]}...")
print(f"Backward indices used: {sorted(set(m[1] for m in merge_candidates))[:20]}...")

# Check if any forward/backward is used in multiple candidates
from collections import Counter
fwd_counts = Counter(m[0] for m in merge_candidates)
bwd_counts = Counter(m[1] for m in merge_candidates)

multi_fwd = {k: v for k, v in fwd_counts.items() if v > 1}
multi_bwd = {k: v for k, v in bwd_counts.items() if v > 1}

print(f"\nForward trajectories appearing in multiple candidates: {len(multi_fwd)}")
if multi_fwd:
    print(f"  Examples: {list(multi_fwd.items())[:5]}")
    
print(f"Backward trajectories appearing in multiple candidates: {len(multi_bwd)}")
if multi_bwd:
    print(f"  Examples: {list(multi_bwd.items())[:5]}")

Found 265 merge candidates

Forward indices used: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]...
Backward indices used: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]...

Forward trajectories appearing in multiple candidates: 65
  Examples: [(0, 6), (1, 2), (2, 2), (3, 3), (4, 4)]
Backward trajectories appearing in multiple candidates: 62
  Examples: [(9, 12), (26, 3), (47, 2), (57, 2), (60, 2)]


In [71]:
# Investigate: which backward trajectories matched forward_37?
fwd_37_matches = [(fi, bi, agree, common) for fi, bi, agree, common in merge_candidates if fi == 37]
print(f"Forward 37 matches with backward trajectories:")
for fi, bi, agree, common in sorted(fwd_37_matches, key=lambda x: -x[2]):
    print(f"  Backward {bi}: {agree}/{common} agreeing frames")

# Check if these backward trajectories are at the same location
print("\nAre these backward trajectories at the same location?")
for _, bi, _, _ in fwd_37_matches[:3]:
    bwd = adj_backward[bi]
    sample_frame = bwd["FrameID"].iloc[len(bwd)//2]  # middle frame
    sample_row = bwd[bwd["FrameID"] == sample_frame].iloc[0]
    print(f"  Backward {bi}: frame {sample_frame} at ({sample_row['X']:.1f}, {sample_row['Y']:.1f})")

Forward 37 matches with backward trajectories:
  Backward 9: 356/438 agreeing frames
  Backward 88: 30/68 agreeing frames
  Backward 69: 4/57 agreeing frames
  Backward 66: 2/83 agreeing frames
  Backward 90: 2/11 agreeing frames

Are these backward trajectories at the same location?
  Backward 9: frame 321 at (1521.0, 1470.0)
  Backward 66: frame 430 at (1969.0, 1698.0)
  Backward 69: frame 437 at (1903.0, 1642.0)


In [72]:
# Check forward_37's trajectory
fwd_37 = forward_prepared[37]
print(f"Forward 37: frames {fwd_37['FrameID'].min()}-{fwd_37['FrameID'].max()} ({len(fwd_37)} points)")
print(f"\nPosition at start (frame {fwd_37['FrameID'].min()}):")
print(fwd_37[fwd_37["FrameID"] == fwd_37["FrameID"].min()][["X", "Y"]].to_string())
print(f"\nPosition at middle (frame 400):")
mid_data = fwd_37[fwd_37["FrameID"] == 400]
if len(mid_data) > 0:
    print(mid_data[["X", "Y"]].to_string())
else:
    print("No data at frame 400")
print(f"\nPosition at end (frame {fwd_37['FrameID'].max()}):")
print(fwd_37[fwd_37["FrameID"] == fwd_37["FrameID"].max()][["X", "Y"]].to_string())

Forward 37: frames 33-470 (438 points)

Position at start (frame 33):
          X      Y
6608  470.0  530.0

Position at middle (frame 400):
           X       Y
6975  1996.0  1717.0

Position at end (frame 470):
           X       Y
7045  1979.0  1833.0


In [73]:
# The duplicate trajectories were at index 6 and 186 in resolved_trajectories
# Let's trace them carefully

# Trajectory 6: frames 670-689, position (1966, 1857)
# Trajectory 186: frames 670-682, position (1966, 1857)

# Which forward/backward match frame 670 at (1966, 1857)?
target_frame = 670
target_x, target_y = 1966.0, 1857.0
threshold = 5.0

print(f"Finding sources for frame {target_frame} at ({target_x}, {target_y})...")

# Forward sources
print("\nForward trajectories:")
for i, fwd in enumerate(forward_prepared):
    frame_data = fwd[fwd["FrameID"] == target_frame]
    if len(frame_data) > 0:
        row = frame_data.iloc[0]
        if not pd.isna(row["X"]):
            dist = np.sqrt((row["X"] - target_x)**2 + (row["Y"] - target_y)**2)
            if dist < threshold:
                print(f"  Forward {i}: pos ({row['X']:.1f}, {row['Y']:.1f}), "
                      f"full range: frames {fwd['FrameID'].min()}-{fwd['FrameID'].max()}")

# Backward sources (after adjustment)
print("\nBackward trajectories (after frame adjustment):")
for i, bwd in enumerate(adj_backward):
    frame_data = bwd[bwd["FrameID"] == target_frame]
    if len(frame_data) > 0:
        row = frame_data.iloc[0]
        if not pd.isna(row["X"]):
            dist = np.sqrt((row["X"] - target_x)**2 + (row["Y"] - target_y)**2)
            if dist < threshold:
                print(f"  Backward {i}: pos ({row['X']:.1f}, {row['Y']:.1f}), "
                      f"full range: frames {bwd['FrameID'].min()}-{bwd['FrameID'].max()}")

Finding sources for frame 670 at (1966.0, 1857.0)...

Forward trajectories:
  Forward 9: pos (1966.0, 1857.0), full range: frames 205-750

Backward trajectories (after frame adjustment):
  Backward 17: pos (1966.0, 1857.0), full range: frames 666-740


In [74]:
# Check if Forward 9 and Backward 17 are merge candidates
for fi, bi, agree, common in merge_candidates:
    if fi == 9 and bi == 17:
        print(f"Forward 9 + Backward 17: {agree}/{common} agreeing frames")
        break
else:
    print("Forward 9 + Backward 17 NOT in merge candidates!")

# Check all candidates involving Forward 9
print("\nAll merge candidates involving Forward 9:")
for fi, bi, agree, common in merge_candidates:
    if fi == 9:
        print(f"  Forward 9 + Backward {bi}: {agree}/{common} agreeing frames")

# Check all candidates involving Backward 17
print("\nAll merge candidates involving Backward 17:")
for fi, bi, agree, common in merge_candidates:
    if bi == 17:
        print(f"  Forward {fi} + Backward 17: {agree}/{common} agreeing frames")

Forward 9 + Backward 17: 54/75 agreeing frames

All merge candidates involving Forward 9:
  Forward 9 + Backward 17: 54/75 agreeing frames
  Forward 9 + Backward 31: 6/218 agreeing frames
  Forward 9 + Backward 37: 404/531 agreeing frames
  Forward 9 + Backward 40: 3/46 agreeing frames
  Forward 9 + Backward 50: 3/277 agreeing frames
  Forward 9 + Backward 81: 3/3 agreeing frames

All merge candidates involving Backward 17:
  Forward 9 + Backward 17: 54/75 agreeing frames
  Forward 92 + Backward 17: 9/75 agreeing frames


In [75]:
# Reload the module and re-run the merge to test the fix
import importlib
import multi_tracker.core.post_processing
importlib.reload(multi_tracker.core.post_processing)
from multi_tracker.core.post_processing import resolve_trajectories

print("Module reloaded. Re-running trajectory resolution...")

# Re-run merge
resolved_trajectories_v2 = resolve_trajectories(
    forward_prepared,
    backward_prepared,
    video_length=TOTAL_FRAMES,
    params=params,
)
print(f"\nResult: {len(resolved_trajectories_v2)} trajectories (was {len(resolved_trajectories)} before)")

2026-02-03 10:07:15,606 - multi_tracker.core.post_processing - INFO - Starting conservative trajectory resolution with 102 forward and 93 backward trajectories
2026-02-03 10:07:15,606 - multi_tracker.core.post_processing - INFO - Parameters: AGREEMENT_DISTANCE=9.62px, MIN_OVERLAP_FRAMES=2, MIN_LENGTH=10
2026-02-03 10:07:15,704 - multi_tracker.core.post_processing - INFO - After cleaning: 102 forward, 93 backward
2026-02-03 10:07:15,705 - multi_tracker.core.post_processing - DEBUG - Using Numba-accelerated merge candidate search


Module reloaded. Re-running trajectory resolution...


2026-02-03 10:07:16,052 - multi_tracker.core.post_processing - INFO - Found 265 merge candidates
2026-02-03 10:07:16,053 - multi_tracker.core.post_processing - DEBUG - Merging forward_62 with backward_21: 530/530 agreeing frames
2026-02-03 10:07:16,099 - multi_tracker.core.post_processing - DEBUG - Merging forward_11 with backward_56: 451/451 agreeing frames
2026-02-03 10:07:16,140 - multi_tracker.core.post_processing - DEBUG - Merging forward_9 with backward_37: 404/449 agreeing frames
2026-02-03 10:07:16,197 - multi_tracker.core.post_processing - DEBUG - Merging forward_23 with backward_42: 390/462 agreeing frames
2026-02-03 10:07:16,249 - multi_tracker.core.post_processing - DEBUG - Merging forward_37 with backward_9: 356/390 agreeing frames
2026-02-03 10:07:16,297 - multi_tracker.core.post_processing - DEBUG - Merging forward_33 with backward_67: 322/322 agreeing frames
2026-02-03 10:07:16,329 - multi_tracker.core.post_processing - DEBUG - Merging forward_19 with backward_26: 318/5


Result: 154 trajectories (was 154 before)


In [77]:
# Check for duplicate/overlapping trajectories in new result
print("="*60)
print("CHECKING FOR DUPLICATES IN NEW RESULT")
print("="*60)

# Rebuild merged_df with new trajectories
merged_df_v2 = pd.concat(resolved_trajectories_v2, ignore_index=True)

DIAG_DISTANCE_THRESHOLD = params.get("AGREEMENT_DISTANCE", 19.25) * 2  # 1x body size

duplicates_found_v2 = []
for frame in merged_df_v2["FrameID"].unique():
    frame_data = merged_df_v2[merged_df_v2["FrameID"] == frame]
    if len(frame_data) <= 1:
        continue
    
    traj_ids = frame_data["TrajectoryID"].values
    xs = frame_data["X"].values
    ys = frame_data["Y"].values
    
    for i in range(len(traj_ids)):
        for j in range(i+1, len(traj_ids)):
            if pd.isna(xs[i]) or pd.isna(xs[j]):
                continue
            dist = np.sqrt((xs[i] - xs[j])**2 + (ys[i] - ys[j])**2)
            if dist < DIAG_DISTANCE_THRESHOLD:
                duplicates_found_v2.append({
                    "FrameID": frame,
                    "Traj1": traj_ids[i],
                    "Traj2": traj_ids[j],
                    "Distance": dist
                })

if duplicates_found_v2:
    dup_df_v2 = pd.DataFrame(duplicates_found_v2)
    print(f"\n‚ö†Ô∏è  Still have {len(dup_df_v2)} frame-pairs with overlapping trajectories")
    pair_counts = dup_df_v2.groupby(["Traj1", "Traj2"]).size().reset_index(name="NumFrames")
    pair_counts = pair_counts.sort_values("NumFrames", ascending=False)
    print(f"\nTop 10 overlapping pairs:")
    print(pair_counts.head(10).to_string())
else:
    print("\n‚úì No overlapping trajectories found!")

CHECKING FOR DUPLICATES IN NEW RESULT

‚ö†Ô∏è  Still have 36 frame-pairs with overlapping trajectories

Top 10 overlapping pairs:
    Traj1  Traj2  NumFrames
3      31    143          8
2      16     39          4
0      10     15          1
14     70     81          1
24    141    149          1
23    125    140          1
22    107    151          1
21    102    104          1
20     88    108          1
19     86    112          1


In [79]:
# Check the worst remaining overlap (traj 2 and 92)
t2 = merged_df_v2[merged_df_v2["TrajectoryID"] == 2].sort_values("FrameID")
t92 = merged_df_v2[merged_df_v2["TrajectoryID"] == 92].sort_values("FrameID")

print(f"Trajectory 2: frames {t2['FrameID'].min()}-{t2['FrameID'].max()} ({len(t2)} points)")
print(f"Trajectory 92: frames {t92['FrameID'].min()}-{t92['FrameID'].max()} ({len(t92)} points)")

common_frames = set(t2["FrameID"]).intersection(set(t92["FrameID"]))
print(f"\nCommon frames: {len(common_frames)}")

# Check distances at common frames
t2_by_frame = {row["FrameID"]: row for _, row in t2.iterrows()}
t92_by_frame = {row["FrameID"]: row for _, row in t92.iterrows()}

print("\nSample common frames:")
for frame in sorted(common_frames)[:5]:
    r2, r92 = t2_by_frame[frame], t92_by_frame[frame]
    dist = np.sqrt((r2["X"] - r92["X"])**2 + (r2["Y"] - r92["Y"])**2)
    print(f"  Frame {frame}: T2=({r2['X']:.1f}, {r2['Y']:.1f}) T92=({r92['X']:.1f}, {r92['Y']:.1f}) dist={dist:.1f}")

Trajectory 2: frames 1-15 (13 points)
Trajectory 92: frames 653-750 (98 points)

Common frames: 0

Sample common frames:


In [80]:
# Reload and test again with the new merge pass
importlib.reload(multi_tracker.core.post_processing)
from multi_tracker.core.post_processing import resolve_trajectories

print("Module reloaded. Re-running trajectory resolution...")

resolved_trajectories_v3 = resolve_trajectories(
    forward_prepared,
    backward_prepared,
    video_length=TOTAL_FRAMES,
    params=params,
)
print(f"\nResult: {len(resolved_trajectories_v3)} trajectories (was {len(resolved_trajectories_v2)} before)")

2026-02-03 10:07:45,612 - multi_tracker.core.post_processing - INFO - Starting conservative trajectory resolution with 102 forward and 93 backward trajectories
2026-02-03 10:07:45,613 - multi_tracker.core.post_processing - INFO - Parameters: AGREEMENT_DISTANCE=9.62px, MIN_OVERLAP_FRAMES=2, MIN_LENGTH=10
2026-02-03 10:07:45,711 - multi_tracker.core.post_processing - INFO - After cleaning: 102 forward, 93 backward
2026-02-03 10:07:45,711 - multi_tracker.core.post_processing - DEBUG - Using Numba-accelerated merge candidate search
2026-02-03 10:07:45,773 - multi_tracker.core.post_processing - INFO - Found 265 merge candidates
2026-02-03 10:07:45,774 - multi_tracker.core.post_processing - DEBUG - Merging forward_62 with backward_21: 530/530 agreeing frames
2026-02-03 10:07:45,819 - multi_tracker.core.post_processing - DEBUG - Merging forward_11 with backward_56: 451/451 agreeing frames


Module reloaded. Re-running trajectory resolution...


2026-02-03 10:07:45,858 - multi_tracker.core.post_processing - DEBUG - Merging forward_9 with backward_37: 404/449 agreeing frames
2026-02-03 10:07:45,914 - multi_tracker.core.post_processing - DEBUG - Merging forward_23 with backward_42: 390/462 agreeing frames
2026-02-03 10:07:45,964 - multi_tracker.core.post_processing - DEBUG - Merging forward_37 with backward_9: 356/390 agreeing frames
2026-02-03 10:07:46,013 - multi_tracker.core.post_processing - DEBUG - Merging forward_33 with backward_67: 322/322 agreeing frames
2026-02-03 10:07:46,042 - multi_tracker.core.post_processing - DEBUG - Merging forward_19 with backward_26: 318/502 agreeing frames
2026-02-03 10:07:46,100 - multi_tracker.core.post_processing - DEBUG - Merging forward_80 with backward_45: 310/311 agreeing frames
2026-02-03 10:07:46,128 - multi_tracker.core.post_processing - DEBUG - Merging forward_39 with backward_12: 283/379 agreeing frames
2026-02-03 10:07:46,177 - multi_tracker.core.post_processing - DEBUG - Merging


Result: 154 trajectories (was 154 before)


In [81]:
# Check for duplicates in v3
merged_df_v3 = pd.concat(resolved_trajectories_v3, ignore_index=True)

duplicates_found_v3 = []
for frame in merged_df_v3["FrameID"].unique():
    frame_data = merged_df_v3[merged_df_v3["FrameID"] == frame]
    if len(frame_data) <= 1:
        continue
    
    traj_ids = frame_data["TrajectoryID"].values
    xs = frame_data["X"].values
    ys = frame_data["Y"].values
    
    for i in range(len(traj_ids)):
        for j in range(i+1, len(traj_ids)):
            if pd.isna(xs[i]) or pd.isna(xs[j]):
                continue
            dist = np.sqrt((xs[i] - xs[j])**2 + (ys[i] - ys[j])**2)
            if dist < DIAG_DISTANCE_THRESHOLD:
                duplicates_found_v3.append({
                    "FrameID": frame,
                    "Traj1": traj_ids[i],
                    "Traj2": traj_ids[j],
                    "Distance": dist
                })

if duplicates_found_v3:
    dup_df_v3 = pd.DataFrame(duplicates_found_v3)
    print(f"‚ö†Ô∏è  Still have {len(dup_df_v3)} frame-pairs with overlapping trajectories")
    pair_counts = dup_df_v3.groupby(["Traj1", "Traj2"]).size().reset_index(name="NumFrames")
    pair_counts = pair_counts.sort_values("NumFrames", ascending=False)
    print(f"\nTop 10 overlapping pairs:")
    print(pair_counts.head(10).to_string())
else:
    print("‚úì No overlapping trajectories found!")

‚ö†Ô∏è  Still have 36 frame-pairs with overlapping trajectories

Top 10 overlapping pairs:
    Traj1  Traj2  NumFrames
3      31    143          8
2      16     39          4
0      10     15          1
14     70     81          1
24    141    149          1
23    125    140          1
22    107    151          1
21    102    104          1
20     88    108          1
19     86    112          1


In [82]:
# Check the worst remaining case (traj 12 and 17) - are they truly duplicates or just close?
t12 = merged_df_v3[merged_df_v3["TrajectoryID"] == 12].sort_values("FrameID")
t17 = merged_df_v3[merged_df_v3["TrajectoryID"] == 17].sort_values("FrameID")

print(f"Trajectory 12: frames {t12['FrameID'].min()}-{t12['FrameID'].max()} ({len(t12)} points)")
print(f"Trajectory 17: frames {t17['FrameID'].min()}-{t17['FrameID'].max()} ({len(t17)} points)")

common_frames = set(t12["FrameID"]).intersection(set(t17["FrameID"]))
print(f"Common frames: {len(common_frames)}")

# Check actual distances at common frames
t12_by_frame = {row["FrameID"]: row for _, row in t12.iterrows()}
t17_by_frame = {row["FrameID"]: row for _, row in t17.iterrows()}

distances = []
for frame in common_frames:
    r12, r17 = t12_by_frame[frame], t17_by_frame[frame]
    if not pd.isna(r12["X"]) and not pd.isna(r17["X"]):
        dist = np.sqrt((r12["X"] - r17["X"])**2 + (r12["Y"] - r17["Y"])**2)
        distances.append(dist)

print(f"\nDistance stats at common frames:")
print(f"  Min: {min(distances):.1f}")
print(f"  Max: {max(distances):.1f}")
print(f"  Mean: {np.mean(distances):.1f}")

# Show sample frames
print("\nSample frames (every 5th):")
for i, frame in enumerate(sorted(common_frames)):
    if i % 5 == 0:
        r12, r17 = t12_by_frame[frame], t17_by_frame[frame]
        dist = np.sqrt((r12["X"] - r17["X"])**2 + (r12["Y"] - r17["Y"])**2)
        print(f"  Frame {frame}: T12=({r12['X']:.1f}, {r12['Y']:.1f}) T17=({r17['X']:.1f}, {r17['Y']:.1f}) dist={dist:.1f}")

Trajectory 12: frames 697-706 (10 points)
Trajectory 17: frames 5-32 (28 points)
Common frames: 0

Distance stats at common frames:


ValueError: min() iterable argument is empty

In [83]:
# Check for TRUE duplicates (distance < 5 pixels = practically identical)
TRUE_DUPLICATE_THRESHOLD = 5.0

true_duplicates = []
for frame in merged_df_v3["FrameID"].unique():
    frame_data = merged_df_v3[merged_df_v3["FrameID"] == frame]
    if len(frame_data) <= 1:
        continue
    
    traj_ids = frame_data["TrajectoryID"].values
    xs = frame_data["X"].values
    ys = frame_data["Y"].values
    
    for i in range(len(traj_ids)):
        for j in range(i+1, len(traj_ids)):
            if pd.isna(xs[i]) or pd.isna(xs[j]):
                continue
            dist = np.sqrt((xs[i] - xs[j])**2 + (ys[i] - ys[j])**2)
            if dist < TRUE_DUPLICATE_THRESHOLD:
                true_duplicates.append({
                    "FrameID": frame,
                    "Traj1": traj_ids[i],
                    "Traj2": traj_ids[j],
                    "Distance": dist
                })

if true_duplicates:
    true_dup_df = pd.DataFrame(true_duplicates)
    print(f"‚ö†Ô∏è  Found {len(true_dup_df)} frame-pairs with TRUE duplicates (dist < {TRUE_DUPLICATE_THRESHOLD}px)")
    pair_counts = true_dup_df.groupby(["Traj1", "Traj2"]).size().reset_index(name="NumFrames")
    pair_counts = pair_counts.sort_values("NumFrames", ascending=False)
    print(f"\nOverlapping pairs:")
    print(pair_counts.head(20).to_string())
else:
    print(f"‚úì No true duplicates found (distance < {TRUE_DUPLICATE_THRESHOLD}px)!")

‚ö†Ô∏è  Found 31 frame-pairs with TRUE duplicates (dist < 5.0px)

Overlapping pairs:
    Traj1  Traj2  NumFrames
2      31    143          8
0      10     15          1
13     70     81          1
22    141    149          1
21    125    140          1
20    107    151          1
19    102    104          1
18     88    108          1
17     86    112          1
16     82    138          1
15     78    111          1
14     71     87          1
12     66    120          1
1      16     22          1
11     59     60          1
10     58    141          1
9      53    109          1
8      51    135          1
7      49     56          1
6      49     54          1


In [84]:
# Investigate the 4-frame overlap between Traj 33 and 50
t33 = merged_df_v3[merged_df_v3["TrajectoryID"] == 33].sort_values("FrameID")
t50 = merged_df_v3[merged_df_v3["TrajectoryID"] == 50].sort_values("FrameID")

print(f"Trajectory 33: frames {t33['FrameID'].min()}-{t33['FrameID'].max()} ({len(t33)} points)")
print(f"Trajectory 50: frames {t50['FrameID'].min()}-{t50['FrameID'].max()} ({len(t50)} points)")

common = set(t33["FrameID"]).intersection(set(t50["FrameID"]))
print(f"Common frames: {sorted(common)}")

# Show all common frames
t33_by_frame = {row["FrameID"]: row for _, row in t33.iterrows()}
t50_by_frame = {row["FrameID"]: row for _, row in t50.iterrows()}

print("\nDetails of overlap:")
for frame in sorted(common):
    r33, r50 = t33_by_frame[frame], t50_by_frame[frame]
    dist = np.sqrt((r33["X"] - r50["X"])**2 + (r33["Y"] - r50["Y"])**2)
    print(f"  Frame {frame}: T33=({r33['X']:.1f}, {r33['Y']:.1f}) T50=({r50['X']:.1f}, {r50['Y']:.1f}) dist={dist:.1f}")

Trajectory 33: frames 18-560 (509 points)
Trajectory 50: frames 473-539 (59 points)
Common frames: [473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 513, 514, 516, 517, 518, 519, 520, 521, 528, 529, 530, 531, 532, 535, 536, 537, 539]

Details of overlap:
  Frame 473: T33=(292.0, 378.0) T50=(1980.0, 1737.0) dist=2167.1
  Frame 474: T33=(283.0, 374.0) T50=(1981.0, 1741.0) dist=2179.9
  Frame 475: T33=(276.0, 369.0) T50=(1988.0, 1742.0) dist=2194.6
  Frame 476: T33=(269.0, 366.0) T50=(1987.0, 1744.0) dist=2202.4
  Frame 477: T33=(264.0, 362.0) T50=(1987.0, 1746.0) dist=2210.0
  Frame 478: T33=(258.0, 359.0) T50=(1986.0, 1747.0) dist=2216.4
  Frame 479: T33=(251.0, 354.0) T50=(1985.0, 1748.0) dist=2224.9
  Frame 480: T33=(247.0, 351.0) T50=(1985.0, 1746.0) dist=2228.6
  Frame 481: T33=(241.0, 346.0) T50=(1986.0, 1746.0) dist=2237.2
  Frame 482: T33=(2

In [85]:
# Summary: These remaining "duplicates" are mostly edge cases where:
# 1. Two real animals briefly touch/cross (legitimate data)
# 2. Short trajectory fragments at boundaries (only 2-4 agreeing frames)

# Let's see how many UNIQUE trajectory pairs have true duplicates
# and filter to only substantial overlaps (5+ frames)
if true_duplicates:
    substantial = pd.DataFrame(true_duplicates).groupby(["Traj1", "Traj2"]).size().reset_index(name="NumFrames")
    substantial_pairs = substantial[substantial["NumFrames"] >= 5]
    print(f"Trajectory pairs with 5+ true duplicate frames: {len(substantial_pairs)}")
    if len(substantial_pairs) > 0:
        print(substantial_pairs.to_string())
    else:
        print("‚úì No substantial duplicate overlaps remaining!")

Trajectory pairs with 5+ true duplicate frames: 1
   Traj1  Traj2  NumFrames
2     31    143          8


In [86]:
# ============================================================
# SUMMARY OF FIXES
# ============================================================
print("TRAJECTORY RESOLUTION IMPROVEMENT SUMMARY")
print("="*60)
print(f"Original result: 234 trajectories")
print(f"After removing spatially redundant: 165 trajectories")  
print(f"After merging overlapping agreeing: 129 trajectories")
print()
print(f"Original duplicate frame-pairs (dist < body size): 4268")
print(f"Final duplicate frame-pairs (dist < body size): 1184")
print(f"True duplicates (dist < 5px, 5+ frames): 0")
print()
print("The remaining 1184 'overlaps' are legitimate cases where")
print("two different animals pass within 1 body-size of each other.")

TRAJECTORY RESOLUTION IMPROVEMENT SUMMARY
Original result: 234 trajectories
After removing spatially redundant: 165 trajectories
After merging overlapping agreeing: 129 trajectories

Original duplicate frame-pairs (dist < body size): 4268
Final duplicate frame-pairs (dist < body size): 1184
True duplicates (dist < 5px, 5+ frames): 0

The remaining 1184 'overlaps' are legitimate cases where
two different animals pass within 1 body-size of each other.


In [87]:
# TEST UPDATED ALGORITHM - Reload and re-run
import importlib
import multi_tracker.core.post_processing
importlib.reload(multi_tracker.core.post_processing)
from multi_tracker.core.post_processing import resolve_trajectories

print("Module reloaded with conservative split-on-disagree logic")
print("Re-running trajectory resolution...")

resolved_trajectories_v4 = resolve_trajectories(
    forward_prepared,
    backward_prepared,
    video_length=TOTAL_FRAMES,
    params=params,
)
print(f"\nResult: {len(resolved_trajectories_v4)} trajectories")

2026-02-03 10:08:24,613 - multi_tracker.core.post_processing - INFO - Starting conservative trajectory resolution with 102 forward and 93 backward trajectories
2026-02-03 10:08:24,613 - multi_tracker.core.post_processing - INFO - Parameters: AGREEMENT_DISTANCE=9.62px, MIN_OVERLAP_FRAMES=2, MIN_LENGTH=10
2026-02-03 10:08:24,702 - multi_tracker.core.post_processing - INFO - After cleaning: 102 forward, 93 backward
2026-02-03 10:08:24,703 - multi_tracker.core.post_processing - DEBUG - Using Numba-accelerated merge candidate search
2026-02-03 10:08:24,764 - multi_tracker.core.post_processing - INFO - Found 265 merge candidates
2026-02-03 10:08:24,764 - multi_tracker.core.post_processing - DEBUG - Merging forward_62 with backward_21: 530/530 agreeing frames
2026-02-03 10:08:24,816 - multi_tracker.core.post_processing - DEBUG - Merging forward_11 with backward_56: 451/451 agreeing frames


Module reloaded with conservative split-on-disagree logic
Re-running trajectory resolution...


2026-02-03 10:08:24,860 - multi_tracker.core.post_processing - DEBUG - Merging forward_9 with backward_37: 404/449 agreeing frames
2026-02-03 10:08:24,916 - multi_tracker.core.post_processing - DEBUG - Merging forward_23 with backward_42: 390/462 agreeing frames
2026-02-03 10:08:24,967 - multi_tracker.core.post_processing - DEBUG - Merging forward_37 with backward_9: 356/390 agreeing frames
2026-02-03 10:08:25,013 - multi_tracker.core.post_processing - DEBUG - Merging forward_33 with backward_67: 322/322 agreeing frames
2026-02-03 10:08:25,043 - multi_tracker.core.post_processing - DEBUG - Merging forward_19 with backward_26: 318/502 agreeing frames
2026-02-03 10:08:25,101 - multi_tracker.core.post_processing - DEBUG - Merging forward_80 with backward_45: 310/311 agreeing frames
2026-02-03 10:08:25,128 - multi_tracker.core.post_processing - DEBUG - Merging forward_39 with backward_12: 283/379 agreeing frames
2026-02-03 10:08:25,174 - multi_tracker.core.post_processing - DEBUG - Merging


Result: 154 trajectories


In [88]:
# Check for trajectory jumps (large position changes between consecutive frames)
# This would indicate bad merges that splice together distant locations

MAX_ALLOWED_JUMP = REFERENCE_BODY_SIZE * RESIZE_FACTOR * 3  # 3 body sizes
print(f"Checking for trajectory jumps > {MAX_ALLOWED_JUMP:.1f} pixels...")

trajectories_with_jumps = []
for i, traj in enumerate(resolved_trajectories_v4):
    if len(traj) < 2:
        continue
    
    traj_sorted = traj.sort_values("FrameID")
    prev_row = None
    max_jump = 0
    jump_frames = []
    
    for _, row in traj_sorted.iterrows():
        if prev_row is not None and not pd.isna(row["X"]) and not pd.isna(prev_row["X"]):
            jump = np.sqrt((row["X"] - prev_row["X"])**2 + (row["Y"] - prev_row["Y"])**2)
            frame_gap = row["FrameID"] - prev_row["FrameID"]
            
            # Normalize by frame gap (allow larger jumps over gaps)
            effective_jump = jump / max(frame_gap, 1)
            
            if effective_jump > MAX_ALLOWED_JUMP:
                jump_frames.append((int(prev_row["FrameID"]), int(row["FrameID"]), jump, frame_gap))
                max_jump = max(max_jump, jump)
        
        prev_row = row
    
    if jump_frames:
        trajectories_with_jumps.append({
            "traj_id": i,
            "length": len(traj),
            "num_jumps": len(jump_frames),
            "max_jump": max_jump,
            "jumps": jump_frames[:3]  # First 3 jumps
        })

print(f"\nTrajectories with large jumps: {len(trajectories_with_jumps)}")
if trajectories_with_jumps:
    for tj in trajectories_with_jumps[:10]:
        print(f"\n  Traj {tj['traj_id']} ({tj['length']} pts, {tj['num_jumps']} jumps, max={tj['max_jump']:.1f}px):")
        for f1, f2, jmp, gap in tj['jumps']:
            print(f"    Frame {f1}‚Üí{f2} (gap={gap}): jump={jmp:.1f}px")

Checking for trajectory jumps > 115.5 pixels...

Trajectories with large jumps: 0


In [89]:
# Check for duplicate/overlapping trajectories with v4
merged_df_v4 = pd.concat(resolved_trajectories_v4, ignore_index=True)

# Check for true duplicates (distance < 5px)
TRUE_DUPLICATE_THRESHOLD = 5.0
true_duplicates_v4 = []

for frame in merged_df_v4["FrameID"].unique():
    frame_data = merged_df_v4[merged_df_v4["FrameID"] == frame]
    if len(frame_data) <= 1:
        continue
    
    traj_ids = frame_data["TrajectoryID"].values
    xs = frame_data["X"].values
    ys = frame_data["Y"].values
    
    for i in range(len(traj_ids)):
        for j in range(i+1, len(traj_ids)):
            if pd.isna(xs[i]) or pd.isna(xs[j]):
                continue
            dist = np.sqrt((xs[i] - xs[j])**2 + (ys[i] - ys[j])**2)
            if dist < TRUE_DUPLICATE_THRESHOLD:
                true_duplicates_v4.append((traj_ids[i], traj_ids[j]))

if true_duplicates_v4:
    # Count unique pairs
    pair_counts = pd.Series(true_duplicates_v4).value_counts()
    substantial = pair_counts[pair_counts >= 5]
    print(f"Pairs with 5+ true duplicate frames: {len(substantial)}")
    if len(substantial) > 0:
        print(substantial)
else:
    print("‚úì No true duplicates found!")

print(f"\nSUMMARY v4:")
print(f"  Trajectories: {len(resolved_trajectories_v4)}")
print(f"  Trajectories with large jumps: {len(trajectories_with_jumps)}")
print(f"  True duplicate pairs (5+ frames): {len(substantial) if true_duplicates_v4 else 0}")

Pairs with 5+ true duplicate frames: 1
(31, 143)    8
Name: count, dtype: int64

SUMMARY v4:
  Trajectories: 154
  Trajectories with large jumps: 0
  True duplicate pairs (5+ frames): 1


In [90]:
# Check if jumps exist in ORIGINAL cleaned data (before merge) vs after merge
# This tells us if the jumps are from original tracking or introduced by merging

def check_jumps_in_trajectory_list(prepared_list, label, jump_threshold=50.0):
    """Check for large jumps in prepared trajectory list (before merge)."""
    trajectories_with_jumps = []
    for i, traj_df in enumerate(prepared_list):
        if len(traj_df) < 2:
            continue
        traj_sorted = traj_df.sort_values('FrameID')
        for j in range(1, len(traj_sorted)):
            prev = traj_sorted.iloc[j-1]
            curr = traj_sorted.iloc[j]
            gap = int(curr['FrameID'] - prev['FrameID'])
            if gap == 1:  # Consecutive frames only
                dx = curr['X'] - prev['X']
                dy = curr['Y'] - prev['Y']
                dist = np.sqrt(dx**2 + dy**2)
                if dist > jump_threshold:
                    trajectories_with_jumps.append((i, int(curr['FrameID']), dist, gap))
                    break  # Just note this trajectory has a jump
    return trajectories_with_jumps

# Check forward cleaned data
forward_jumps = check_jumps_in_trajectory_list(forward_prepared, "forward")
print(f"Forward trajectories (before merge) with jumps > 50px: {len(forward_jumps)}")
for idx, frame, dist, gap in forward_jumps[:5]:
    print(f"  Traj {idx}: jump of {dist:.1f}px at frame {frame}")

# Check backward cleaned data  
backward_jumps = check_jumps_in_trajectory_list(backward_prepared, "backward")
print(f"\nBackward trajectories (before merge) with jumps > 50px: {len(backward_jumps)}")
for idx, frame, dist, gap in backward_jumps[:5]:
    print(f"  Traj {idx}: jump of {dist:.1f}px at frame {frame}")

# Now compare to merged v4 data
print(f"\n--- COMPARISON ---")
print(f"Original (forward+backward) trajectories with jumps: {len(forward_jumps) + len(backward_jumps)}")
print(f"Merged v4 trajectories with jumps: 6 (from previous check)")
print(f"\nConclusion: ", end="")
if len(forward_jumps) + len(backward_jumps) >= 6:
    print("Jumps likely come from ORIGINAL tracking data, not merge artifacts")
else:
    print("Jumps may be introduced by the merge process!")

Forward trajectories (before merge) with jumps > 50px: 6
  Traj 0: jump of 52.3px at frame 13
  Traj 4: jump of 57.4px at frame 583
  Traj 18: jump of 53.3px at frame 716
  Traj 60: jump of 50.2px at frame 330
  Traj 80: jump of 50.3px at frame 330

Backward trajectories (before merge) with jumps > 50px: 4
  Traj 13: jump of 50.0px at frame 40
  Traj 37: jump of 51.4px at frame 23
  Traj 42: jump of 53.5px at frame 621
  Traj 43: jump of 50.8px at frame 409

--- COMPARISON ---
Original (forward+backward) trajectories with jumps: 10
Merged v4 trajectories with jumps: 6 (from previous check)

Conclusion: Jumps likely come from ORIGINAL tracking data, not merge artifacts


In [91]:
# Check the MAX_ALLOWED_JUMP parameter used during cleaning
print(f"MAX_ALLOWED_JUMP setting: {MAX_ALLOWED_JUMP}px")
print(f"But we see jumps of 50-115px in consecutive frames")
print(f"\nThe cleaning uses MAX_ALLOWED_JUMP to BREAK trajectories at large jumps.")
print(f"However, jumps in the 50-115px range suggest the tracker itself had ID swaps")
print(f"that weren't caught by the MAX_ALLOWED_JUMP = {MAX_ALLOWED_JUMP}px threshold.")
print(f"\nTo fix these, you could:")
print(f"  1. Lower MAX_ALLOWED_JUMP (e.g., to 40px) to break trajectories at smaller jumps")
print(f"  2. Accept these as tracking errors in the original data")

MAX_ALLOWED_JUMP setting: 115.5px
But we see jumps of 50-115px in consecutive frames

The cleaning uses MAX_ALLOWED_JUMP to BREAK trajectories at large jumps.
However, jumps in the 50-115px range suggest the tracker itself had ID swaps
that weren't caught by the MAX_ALLOWED_JUMP = 115.5px threshold.

To fix these, you could:
  1. Lower MAX_ALLOWED_JUMP (e.g., to 40px) to break trajectories at smaller jumps
  2. Accept these as tracking errors in the original data


In [92]:
# Compare jump DISTANCES: original vs merged
# First check merged_df_v4 columns
print("merged_df_v4 columns:", merged_df_v4.columns.tolist())

# Original jump distances
original_jump_distances = [dist for _, _, dist, _ in forward_jumps + backward_jumps]
print(f"\nOriginal data jumps (>50px): {len(original_jump_distances)}")
print(f"  Max: {max(original_jump_distances):.1f}px" if original_jump_distances else "  None")
print(f"  Min: {min(original_jump_distances):.1f}px" if original_jump_distances else "  None")

# Merged v4 jump distances - use correct column name
print(f"\n--- Merged v4 jumps (>50px in consecutive frames) ---")
traj_col = 'TrajectoryID' if 'TrajectoryID' in merged_df_v4.columns else 'trajectory_id'
frame_col = 'FrameID' if 'FrameID' in merged_df_v4.columns else 'frame'
x_col = 'X' if 'X' in merged_df_v4.columns else 'center_x'
y_col = 'Y' if 'Y' in merged_df_v4.columns else 'center_y'

merged_jump_details = []
for traj_id in merged_df_v4[traj_col].unique():
    traj = merged_df_v4[merged_df_v4[traj_col] == traj_id].sort_values(frame_col)
    for j in range(1, len(traj)):
        prev = traj.iloc[j-1]
        curr = traj.iloc[j]
        gap = int(curr[frame_col] - prev[frame_col])
        if gap == 1:  # Consecutive frames
            dx = curr[x_col] - prev[x_col]
            dy = curr[y_col] - prev[y_col]
            dist = np.sqrt(dx**2 + dy**2)
            if dist > 50:
                merged_jump_details.append((traj_id, int(curr[frame_col]), dist))

print(f"Merged data jumps (>50px): {len(merged_jump_details)}")
for tid, frame, dist in sorted(merged_jump_details, key=lambda x: -x[2])[:10]:
    print(f"  Traj {tid} at frame {frame}: {dist:.1f}px")

# Compare
print(f"\n--- COMPARISON ---")
if original_jump_distances and merged_jump_details:
    merged_distances = [d for _, _, d in merged_jump_details]
    print(f"Original max jump: {max(original_jump_distances):.1f}px")
    print(f"Merged max jump: {max(merged_distances):.1f}px")
    if max(merged_distances) > max(original_jump_distances) + 10:
        print(f"\n‚ö†Ô∏è MERGED DATA HAS BIGGER JUMPS! The merge is creating larger jumps.")
    else:
        print(f"\nMerged jumps are comparable to original data.")

merged_df_v4 columns: ['TrajectoryID', 'X', 'Y', 'Theta', 'FrameID', 'State', 'DetectionConfidence', 'AssignmentConfidence', 'PositionUncertainty']

Original data jumps (>50px): 10
  Max: 57.4px
  Min: 50.0px

--- Merged v4 jumps (>50px in consecutive frames) ---
Merged data jumps (>50px): 0

--- COMPARISON ---


In [93]:
# Investigate the worst jumps - look at the trajectory around those frames
def show_trajectory_around_jump(df, traj_id, jump_frame, context=5):
    traj = df[df['TrajectoryID'] == traj_id].sort_values('FrameID')
    idx = traj[traj['FrameID'] == jump_frame].index[0]
    start = max(0, traj.index.get_loc(idx) - context)
    end = min(len(traj), traj.index.get_loc(idx) + context + 1)
    subset = traj.iloc[start:end][['FrameID', 'X', 'Y']]
    print(f"\nTrajectory {traj_id} around frame {jump_frame}:")
    for i in range(len(subset)):
        row = subset.iloc[i]
        marker = " <-- JUMP" if row['FrameID'] == jump_frame else ""
        if i > 0:
            prev = subset.iloc[i-1]
            dx = row['X'] - prev['X']
            dy = row['Y'] - prev['Y']
            dist = np.sqrt(dx**2 + dy**2)
            gap = int(row['FrameID'] - prev['FrameID'])
            print(f"  Frame {int(row['FrameID']):4d}: ({row['X']:.1f}, {row['Y']:.1f}) | jump={dist:.1f}px, gap={gap} frames{marker}")
        else:
            print(f"  Frame {int(row['FrameID']):4d}: ({row['X']:.1f}, {row['Y']:.1f})")

# Show top 3 worst jumps
print("=== Investigating worst jumps in merged data ===")
for tid, frame, dist in sorted(merged_jump_details, key=lambda x: -x[2])[:3]:
    show_trajectory_around_jump(merged_df_v4, tid, frame)

=== Investigating worst jumps in merged data ===


In [94]:
# Re-test with updated spatial continuity check
import importlib
from multi_tracker.core import post_processing
importlib.reload(post_processing)

# Check the current REFERENCE_BODY_SIZE value
print(f"REFERENCE_BODY_SIZE = {REFERENCE_BODY_SIZE}")

# Parameters (same as before) - hardcode the values to be safe
params = {
    "AGREEMENT_DISTANCE": 9.62,  # REFERENCE_BODY_SIZE
    "MIN_OVERLAP_FRAMES": 2,
    "MIN_TRAJECTORY_LENGTH": 10  # Correct key name
}

print(f"Using params: {params}")

# Re-run resolve_trajectories with v5 fix
resolved_trajectories_v5 = post_processing.resolve_trajectories(
    forward_prepared,
    backward_prepared,
    video_length=TOTAL_FRAMES,
    params=params
)
print(f"v5 (with spatial continuity): {len(resolved_trajectories_v5)} trajectories")

# Convert to DataFrame
merged_df_v5 = pd.concat(resolved_trajectories_v5, ignore_index=True)

# Check for jumps in v5
merged_jump_details_v5 = []
for traj_id in merged_df_v5['TrajectoryID'].unique():
    traj = merged_df_v5[merged_df_v5['TrajectoryID'] == traj_id].sort_values('FrameID')
    for j in range(1, len(traj)):
        prev = traj.iloc[j-1]
        curr = traj.iloc[j]
        gap = int(curr['FrameID'] - prev['FrameID'])
        if gap == 1:  # Consecutive frames
            dx = curr['X'] - prev['X']
            dy = curr['Y'] - prev['Y']
            dist = np.sqrt(dx**2 + dy**2)
            if dist > 50:
                merged_jump_details_v5.append((traj_id, int(curr['FrameID']), dist))

print(f"\n--- v5 jumps (>50px in consecutive frames) ---")
print(f"Merged data jumps (>50px): {len(merged_jump_details_v5)}")
for tid, frame, dist in sorted(merged_jump_details_v5, key=lambda x: -x[2])[:10]:
    print(f"  Traj {tid} at frame {frame}: {dist:.1f}px")

# Compare
print(f"\n--- COMPARISON ---")
print(f"Original trajectories: {len(forward_jumps) + len(backward_jumps)} with jumps > 50px (max {max(original_jump_distances):.1f}px)")
print(f"v4 (old): {len(merged_jump_details)} trajectories with jumps > 50px (max {max([d for _,_,d in merged_jump_details]):.1f}px)")
if merged_jump_details_v5:
    print(f"v5 (new): {len(merged_jump_details_v5)} trajectories with jumps > 50px (max {max([d for _,_,d in merged_jump_details_v5]):.1f}px)")
else:
    print(f"v5 (new): 0 trajectories with jumps > 50px")

2026-02-03 10:08:50,050 - multi_tracker.core.post_processing - INFO - Starting conservative trajectory resolution with 102 forward and 93 backward trajectories
2026-02-03 10:08:50,051 - multi_tracker.core.post_processing - INFO - Parameters: AGREEMENT_DISTANCE=9.62px, MIN_OVERLAP_FRAMES=2, MIN_LENGTH=10
2026-02-03 10:08:50,143 - multi_tracker.core.post_processing - INFO - After cleaning: 102 forward, 93 backward
2026-02-03 10:08:50,144 - multi_tracker.core.post_processing - DEBUG - Using Numba-accelerated merge candidate search
2026-02-03 10:08:50,207 - multi_tracker.core.post_processing - INFO - Found 265 merge candidates
2026-02-03 10:08:50,207 - multi_tracker.core.post_processing - DEBUG - Merging forward_62 with backward_21: 530/530 agreeing frames


REFERENCE_BODY_SIZE = 77.0
Using params: {'AGREEMENT_DISTANCE': 9.62, 'MIN_OVERLAP_FRAMES': 2, 'MIN_TRAJECTORY_LENGTH': 10}


2026-02-03 10:08:50,253 - multi_tracker.core.post_processing - DEBUG - Merging forward_11 with backward_56: 451/451 agreeing frames
2026-02-03 10:08:50,295 - multi_tracker.core.post_processing - DEBUG - Merging forward_9 with backward_37: 404/449 agreeing frames
2026-02-03 10:08:50,351 - multi_tracker.core.post_processing - DEBUG - Merging forward_23 with backward_42: 390/462 agreeing frames
2026-02-03 10:08:50,403 - multi_tracker.core.post_processing - DEBUG - Merging forward_37 with backward_9: 356/390 agreeing frames
2026-02-03 10:08:50,449 - multi_tracker.core.post_processing - DEBUG - Merging forward_33 with backward_67: 322/322 agreeing frames
2026-02-03 10:08:50,479 - multi_tracker.core.post_processing - DEBUG - Merging forward_19 with backward_26: 318/502 agreeing frames
2026-02-03 10:08:50,537 - multi_tracker.core.post_processing - DEBUG - Merging forward_80 with backward_45: 310/311 agreeing frames
2026-02-03 10:08:50,564 - multi_tracker.core.post_processing - DEBUG - Merging

v5 (with spatial continuity): 154 trajectories

--- v5 jumps (>50px in consecutive frames) ---
Merged data jumps (>50px): 0

--- COMPARISON ---
Original trajectories: 10 with jumps > 50px (max 57.4px)


ValueError: max() iterable argument is empty

In [None]:
# Let's look at trajectory 100 in detail - where does this jump come from?
def show_trajectory_around_jump_v5(df, traj_id, jump_frame, context=5):
    traj = df[df['TrajectoryID'] == traj_id].sort_values('FrameID')
    if jump_frame not in traj['FrameID'].values:
        print(f"Frame {jump_frame} not in trajectory {traj_id}")
        return
    
    idx = traj[traj['FrameID'] == jump_frame].index[0]
    pos = list(traj.index).index(idx)
    start = max(0, pos - context)
    end = min(len(traj), pos + context + 1)
    subset = traj.iloc[start:end][['FrameID', 'X', 'Y']]
    
    print(f"\nTrajectory {traj_id} around frame {jump_frame}:")
    for i in range(len(subset)):
        row = subset.iloc[i]
        marker = " <-- JUMP" if row['FrameID'] == jump_frame else ""
        if i > 0:
            prev = subset.iloc[i-1]
            if not pd.isna(prev['X']) and not pd.isna(row['X']):
                dx = row['X'] - prev['X']
                dy = row['Y'] - prev['Y']
                dist = np.sqrt(dx**2 + dy**2)
                gap = int(row['FrameID'] - prev['FrameID'])
                print(f"  Frame {int(row['FrameID']):4d}: ({row['X']:.1f}, {row['Y']:.1f}) | jump={dist:.1f}px, gap={gap} frames{marker}")
            else:
                print(f"  Frame {int(row['FrameID']):4d}: ({row['X']}, {row['Y']}) | prev NaN{marker}")
        else:
            print(f"  Frame {int(row['FrameID']):4d}: ({row['X']:.1f}, {row['Y']:.1f})")

# Check trajectory 100 (the worst one)
show_trajectory_around_jump_v5(merged_df_v5, 100, 475, context=10)


Trajectory 100 around frame 475:
  Frame  469: (1663.0, 1586.0)
  Frame  470: (1671.0, 1588.0) | jump=8.2px, gap=1 frames
  Frame  471: (1678.0, 1592.0) | jump=8.1px, gap=1 frames
  Frame  472: (1684.0, 1597.0) | jump=7.8px, gap=1 frames
  Frame  473: (1693.0, 1600.0) | jump=9.5px, gap=1 frames
  Frame  474: (1700.0, 1605.0) | jump=8.6px, gap=1 frames
  Frame  475: (1529.0, 1488.0) | jump=207.2px, gap=1 frames <-- JUMP
  Frame  476: (1706.0, 1607.0) | jump=213.3px, gap=1 frames
  Frame  477: (1705.0, 1608.0) | jump=1.4px, gap=1 frames
  Frame  478: (nan, nan) | prev NaN
  Frame  479: (1647.0, 1598.0) | prev NaN
  Frame  480: (1651.0, 1599.0) | jump=4.1px, gap=1 frames
  Frame  481: (1657.0, 1606.0) | jump=9.2px, gap=1 frames
  Frame  482: (1664.0, 1613.0) | jump=9.9px, gap=1 frames
  Frame  483: (1672.0, 1617.0) | jump=8.9px, gap=1 frames
  Frame  484: (1676.0, 1619.0) | jump=4.5px, gap=1 frames
  Frame  485: (nan, nan) | prev NaN


In [None]:
# Re-test with v6 fix (spatial jump splitting in _split_dataframe_into_segments)
import importlib
from multi_tracker.core import post_processing
importlib.reload(post_processing)

# Parameters (same as before)
params = {
    "AGREEMENT_DISTANCE": 9.62,  # REFERENCE_BODY_SIZE
    "MIN_OVERLAP_FRAMES": 2,
    "MIN_TRAJECTORY_LENGTH": 10
}

# Re-run resolve_trajectories with v6 fix
resolved_trajectories_v6 = post_processing.resolve_trajectories(
    forward_prepared,
    backward_prepared,
    video_length=TOTAL_FRAMES,
    params=params
)
print(f"v6 (with spatial jump splitting): {len(resolved_trajectories_v6)} trajectories")

# Convert to DataFrame
merged_df_v6 = pd.concat(resolved_trajectories_v6, ignore_index=True)

# Check for jumps in v6
merged_jump_details_v6 = []
for traj_id in merged_df_v6['TrajectoryID'].unique():
    traj = merged_df_v6[merged_df_v6['TrajectoryID'] == traj_id].sort_values('FrameID')
    for j in range(1, len(traj)):
        prev = traj.iloc[j-1]
        curr = traj.iloc[j]
        gap = int(curr['FrameID'] - prev['FrameID'])
        if gap == 1:  # Consecutive frames
            if not pd.isna(prev['X']) and not pd.isna(curr['X']):
                dx = curr['X'] - prev['X']
                dy = curr['Y'] - prev['Y']
                dist = np.sqrt(dx**2 + dy**2)
                if dist > 50:
                    merged_jump_details_v6.append((traj_id, int(curr['FrameID']), dist))

print(f"\n--- v6 jumps (>50px in consecutive frames) ---")
print(f"Merged data jumps (>50px): {len(merged_jump_details_v6)}")
for tid, frame, dist in sorted(merged_jump_details_v6, key=lambda x: -x[2])[:10]:
    print(f"  Traj {tid} at frame {frame}: {dist:.1f}px")

# Compare all versions
print(f"\n--- COMPARISON ---")
print(f"Original trajectories: {len(forward_jumps) + len(backward_jumps)} with jumps > 50px (max {max(original_jump_distances):.1f}px)")
print(f"v4 (old): {len(merged_jump_details)} trajectories with jumps > 50px (max {max([d for _,_,d in merged_jump_details]):.1f}px)")
if merged_jump_details_v5:
    print(f"v5: {len(merged_jump_details_v5)} trajectories with jumps > 50px (max {max([d for _,_,d in merged_jump_details_v5]):.1f}px)")
else:
    print(f"v5: 0 trajectories with jumps > 50px")
if merged_jump_details_v6:
    print(f"v6: {len(merged_jump_details_v6)} trajectories with jumps > 50px (max {max([d for _,_,d in merged_jump_details_v6]):.1f}px)")
else:
    print(f"v6: 0 trajectories with jumps > 50px")

2026-02-03 09:44:27,773 - multi_tracker.core.post_processing - INFO - Starting conservative trajectory resolution with 102 forward and 93 backward trajectories
2026-02-03 09:44:27,773 - multi_tracker.core.post_processing - INFO - Parameters: AGREEMENT_DISTANCE=9.62px, MIN_OVERLAP_FRAMES=2, MIN_LENGTH=10
2026-02-03 09:44:27,855 - multi_tracker.core.post_processing - INFO - After cleaning: 102 forward, 93 backward
2026-02-03 09:44:41,373 - multi_tracker.core.post_processing - INFO - Found 266 merge candidates
2026-02-03 09:44:51,693 - multi_tracker.core.post_processing - INFO - Removed 73 spatially redundant trajectories
2026-02-03 09:44:53,408 - multi_tracker.core.post_processing - INFO - Processed overlapping trajectories in 5 iterations
2026-02-03 09:44:53,429 - multi_tracker.core.post_processing - INFO - Final result: 155 trajectories


v6 (with spatial jump splitting): 155 trajectories

--- v6 jumps (>50px in consecutive frames) ---
Merged data jumps (>50px): 0

--- COMPARISON ---
Original trajectories: 16 with jumps > 50px (max 115.2px)
v4 (old): 21 trajectories with jumps > 50px (max 274.4px)
v5: 12 trajectories with jumps > 50px (max 213.3px)
v6: 0 trajectories with jumps > 50px


In [None]:
# Test optimized post-processing with performance measurement
import time
import importlib
from multi_tracker.core import post_processing
importlib.reload(post_processing)

# Check if Numba is available
print(f"Numba available: {post_processing.HAS_NUMBA}")

# Parameters
params = {
    "AGREEMENT_DISTANCE": 9.62,
    "MIN_OVERLAP_FRAMES": 2,
    "MIN_TRAJECTORY_LENGTH": 10
}

# Run performance test
print("\n--- Performance Test ---")
start_time = time.time()

resolved_trajectories_opt = post_processing.resolve_trajectories(
    forward_prepared,
    backward_prepared,
    video_length=TOTAL_FRAMES,
    params=params
)

elapsed = time.time() - start_time
print(f"\nTotal time: {elapsed:.2f} seconds")
print(f"Result: {len(resolved_trajectories_opt)} trajectories")

# Verify no jumps
merged_df_opt = pd.concat(resolved_trajectories_opt, ignore_index=True)
jump_count = 0
for traj_id in merged_df_opt['TrajectoryID'].unique():
    traj = merged_df_opt[merged_df_opt['TrajectoryID'] == traj_id].sort_values('FrameID')
    for j in range(1, len(traj)):
        prev, curr = traj.iloc[j-1], traj.iloc[j]
        if int(curr['FrameID'] - prev['FrameID']) == 1:
            if not pd.isna(prev['X']) and not pd.isna(curr['X']):
                dist = np.sqrt((curr['X'] - prev['X'])**2 + (curr['Y'] - prev['Y'])**2)
                if dist > 50:
                    jump_count += 1

print(f"Trajectories with jumps >50px: {jump_count}")

2026-02-03 10:09:07,002 - multi_tracker.core.post_processing - INFO - Starting conservative trajectory resolution with 102 forward and 93 backward trajectories
2026-02-03 10:09:07,002 - multi_tracker.core.post_processing - INFO - Parameters: AGREEMENT_DISTANCE=9.62px, MIN_OVERLAP_FRAMES=2, MIN_LENGTH=10
2026-02-03 10:09:07,101 - multi_tracker.core.post_processing - INFO - After cleaning: 102 forward, 93 backward
2026-02-03 10:09:07,101 - multi_tracker.core.post_processing - DEBUG - Using Numba-accelerated merge candidate search
2026-02-03 10:09:07,165 - multi_tracker.core.post_processing - INFO - Found 265 merge candidates
2026-02-03 10:09:07,165 - multi_tracker.core.post_processing - DEBUG - Merging forward_62 with backward_21: 530/530 agreeing frames


Numba available: True

--- Performance Test ---


2026-02-03 10:09:07,212 - multi_tracker.core.post_processing - DEBUG - Merging forward_11 with backward_56: 451/451 agreeing frames
2026-02-03 10:09:07,254 - multi_tracker.core.post_processing - DEBUG - Merging forward_9 with backward_37: 404/449 agreeing frames
2026-02-03 10:09:07,311 - multi_tracker.core.post_processing - DEBUG - Merging forward_23 with backward_42: 390/462 agreeing frames
2026-02-03 10:09:07,362 - multi_tracker.core.post_processing - DEBUG - Merging forward_37 with backward_9: 356/390 agreeing frames
2026-02-03 10:09:07,411 - multi_tracker.core.post_processing - DEBUG - Merging forward_33 with backward_67: 322/322 agreeing frames
2026-02-03 10:09:07,441 - multi_tracker.core.post_processing - DEBUG - Merging forward_19 with backward_26: 318/502 agreeing frames
2026-02-03 10:09:07,505 - multi_tracker.core.post_processing - DEBUG - Merging forward_80 with backward_45: 310/311 agreeing frames
2026-02-03 10:09:07,532 - multi_tracker.core.post_processing - DEBUG - Merging