# Full Pipeline Demo

This notebook demonstrates the complete pipeline workflow:
1. **Full Pipeline** - Run from raw video through all stages
2. **Reuse Outputs** - Use intermediate outputs as inputs to skip stages

## Pipeline Stages
```
Video → Pose Estimation → Postprocessing → Event Extraction → Soundscape
```


In [None]:
import importlib
from pathlib import Path

import torch
import datafawn

# Reload so local edits to datafawn take effect without restarting the kernel
importlib.reload(datafawn)

# Check GPU
if torch.cuda.is_available():
    device = torch.device('cuda')
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device('cpu')
    print("CUDA not available, using CPU")


---
## Setup: Create All Pipeline Components


In [None]:
# =============== POSE ESTIMATOR =============== #
dlc_estimator = datafawn.DeepLabCutPoseEstimator(
    model_name='superanimal_quadruped',
    detector_name='fasterrcnn_resnet50_fpn_v2',
    hrnet_model='hrnet_w32',
    max_individuals=1,
    pcutoff=0.15,
    device=device
)

# =============== POSTPROCESSORS =============== #
rel_paws = ['front_left_paw_rel', 'front_right_paw_rel', 'back_left_paw_rel', 'back_right_paw_rel']
reference_map = {
    'back_base': ['front_left_paw', 'front_right_paw'],
    'tail_base': ['back_left_paw', 'back_right_paw']
}

rel_pp = datafawn.RelativePawPositionPostprocessor()
error_pp = datafawn.ErrorPostprocessor(
    bodyparts=rel_paws,
    use_velocity=True,
    use_likelihood=True,
    use_distance=True,
    velocity_kwargs={'threshold_pixels': 50, 'window_size': 5},
    likelihood_kwargs={'min_likelihood': 0.5},
    distance_kwargs={'reference_map': reference_map, 'max_distance': 300}
)

# =============== EVENT EXTRACTOR =============== #
zeni_extractor = datafawn.ZeniExtractor(
    smooth_window_size=5,
    prominence_percentage=0.05,
    orientation_likelihood_threshold=0.0,
    orientation_smooth_window_size=15,
    show_plots=False
)

# =============== SOUNDSCAPE GENERATOR =============== #
SOUNDSCAPE_CONFIG = {
    'event_sound_map': {
        'front_left_paw_strike': Path('data_example/sounds/22415__anthousai__wind-chimes/398494__anthousai__wind-chimes-single-01.wav'),
        'front_right_paw_strike': Path('data_example/sounds/22415__anthousai__wind-chimes/398493__anthousai__wind-chimes-single-02.wav'),
        'back_left_paw_strike': Path('data_example/sounds/22415__anthousai__wind-chimes/398492__anthousai__wind-chimes-single-03.wav'),
        'back_right_paw_strike': Path('data_example/sounds/22415__anthousai__wind-chimes/398496__anthousai__wind-chimes-single-04.wav')
    }
}
ss_generator = datafawn.SoundScapeFromConfig(soundscape_config=SOUNDSCAPE_CONFIG)

print("All components created!")
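
The `distance_kwargs` above pair a `reference_map` with a `max_distance` of 300. As a rough illustration of what a distance check of this kind could look like, the sketch below flags frames where a paw lies farther than `max_distance` from its reference body part. This is an assumption about the general technique, not datafawn's implementation; `flag_distance_errors` and the coordinate layout are hypothetical.

```python
import math

def flag_distance_errors(coords, reference_map, max_distance):
    """Return {paw: [bool per frame]}; True where the paw is farther
    than max_distance from its reference body part."""
    flags = {}
    for reference, paws in reference_map.items():
        for paw in paws:
            flags[paw] = [
                math.hypot(px - rx, py - ry) > max_distance
                for (px, py), (rx, ry) in zip(coords[paw], coords[reference])
            ]
    return flags

# Toy data: two frames of (x, y) pixel coordinates per body part (made up).
reference_map = {'back_base': ['front_left_paw'], 'tail_base': ['back_left_paw']}
coords = {
    'back_base': [(100.0, 100.0), (110.0, 100.0)],
    'tail_base': [(50.0, 100.0), (60.0, 100.0)],
    'front_left_paw': [(120.0, 180.0), (900.0, 180.0)],  # second frame jumps far away
    'back_left_paw': [(40.0, 180.0), (55.0, 182.0)],
}
flags = flag_distance_errors(coords, reference_map, max_distance=300)
print(flags)
```

Only the second frame of `front_left_paw` is flagged here, since it sits roughly 800 pixels from `back_base`.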


In [None]:
# Create the FULL pipeline with all components
full_pipeline = datafawn.EventDetectionPipeline(
    pose_estimator=dlc_estimator,
    postprocessors=[rel_pp, error_pp],
    event_extractors=[zeni_extractor],
    soundscape_generators=[ss_generator]
)

print("Full pipeline ready!")


---
## Part 1: Run Full Pipeline from Video

This runs ALL stages:
- ✅ Pose Estimation (DLC)
- ✅ Postprocessing (relative positions + error detection)
- ✅ Event Extraction (Zeni algorithm)
- ✅ Soundscape Generation


In [None]:
# Run full pipeline from raw video
RAW_VIDEO_PATH = 'data_example/raw_videos/deerrunning.mp4'
OUTPUT_DIR = 'data/deerrunning'

results = full_pipeline.run(
    video_path=RAW_VIDEO_PATH,
    output_dir=OUTPUT_DIR,
    soundscape_input_video="pose_est"  # Use the labeled video for soundscape
)

print("Full pipeline complete!")


In [None]:
results.keys()

In [None]:
# Check what stages ran
print("Stages executed:")
for stage, ran in results['metadata']['stages_run'].items():
    status = "✅" if ran else "⏭️ skipped"
    print(f"  {stage}: {status}")

print("\nOutput files created:")
for name, path in results['output_paths'].items():
    print(f"  {name}: {path}")


In [None]:
# Check the results
print("Results summary:")
print(f"  Pose data shape: {results['pose_data'].shape}")
print(f"  Postprocessed data shape: {results['postprocessed_data'].shape}")
print("\nEvents extracted:")
for (scorer, individual), event_dict in results['events'].items():
    print(f"  {individual}:")
    for event_type, frames in event_dict.items():
        print(f"    {event_type}: {len(frames)} events")
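
The loop above suggests that `results['events']` maps `(scorer, individual)` keys to dictionaries of event type to frames. Assuming the frames are integer frame indices, a small helper can convert them to timestamps, which may be handy when checking event timing against the soundscape. `events_to_seconds` and the `fps` value are hypothetical; the real frame rate would come from the video.

```python
def events_to_seconds(events, fps):
    """Convert {(scorer, individual): {event_type: frames}} to seconds."""
    return {
        key: {
            event_type: [frame / fps for frame in frames]
            for event_type, frames in event_dict.items()
        }
        for key, event_dict in events.items()
    }

# Stand-in for results['events']; the scorer and individual names are made up.
example_events = {('scorer', 'animal0'): {'front_left_paw_strike': [15, 45]}}
event_times = events_to_seconds(example_events, fps=30.0)
print(event_times)
```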


---
## Part 2: Reuse Intermediate Outputs

Now let's see how to use outputs from the first run as inputs to skip stages.
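
Every run in this part reports which stages executed by reading `results['metadata']['stages_run']`. That repeated loop could be wrapped in a small helper; the sketch below is one way to do it. `format_stage_report` is a hypothetical name and the stage names in the example dictionary are placeholders, not necessarily the keys datafawn uses.

```python
def format_stage_report(results):
    """Return one line per stage from results['metadata']['stages_run']."""
    lines = []
    for stage, ran in results['metadata']['stages_run'].items():
        status = "ran" if ran else "skipped"
        lines.append(f"  {stage}: {status}")
    return "\n".join(lines)

# Stand-in for a real results dict; the stage names here are hypothetical.
example_results = {
    'metadata': {'stages_run': {'pose_estimation': False, 'postprocessing': True}}
}
report = format_stage_report(example_results)
print(report)
```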


### Option A: Start from Pose Data (skip pose estimation)

Use the pose data from the first run. This is useful when:
- Pose estimation has already been run
- You want to try different postprocessors
- You want to experiment with different event extractors
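
When trying different postprocessor settings, it can help to enumerate the combinations up front and give each one its own output directory name. The sketch below builds such a grid with `itertools.product`. The helper `parameter_grid`, the run-name scheme, and the alternative values 25 and 0.7 are illustrative assumptions; only `threshold_pixels=50` and `min_likelihood=0.5` come from the setup cell.

```python
from itertools import product

def parameter_grid(**options):
    """Yield one dict per combination of the given option lists."""
    names = list(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))

grid = list(parameter_grid(threshold_pixels=[25, 50], min_likelihood=[0.5, 0.7]))
for params in grid:
    # A hypothetical naming scheme for per-run output directories.
    run_name = f"vel{params['threshold_pixels']}_lik{params['min_likelihood']}"
    print(run_name, params)
```

Each dictionary could then configure a fresh `ErrorPostprocessor` and a new pipeline that is run from the saved pose data, so pose estimation is not repeated for every combination.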


In [None]:
# Option A1: Use pose_data directly from previous results (in-memory)
results_from_pose = full_pipeline.run(
    pose_data=results['pose_data'],
    output_dir='full_demo_output/from_pose_data_memory',
    soundscape_input_video='data_example/raw_videos/deerrunning.mp4'  # Need video for soundscape
)

print("From pose_data (in-memory):")
for stage, ran in results_from_pose['metadata']['stages_run'].items():
    status = "✅" if ran else "⏭️ skipped"
    print(f"  {stage}: {status}")


In [None]:
# Option A2: Use pose_data_path from saved file (from disk)
# Index directly so a missing key raises here instead of passing None to run()
POSE_DATA_FILE = results['output_paths']['pose_data_file']
print(f"Loading from: {POSE_DATA_FILE}")

results_from_pose_file = full_pipeline.run(
    pose_data_path=POSE_DATA_FILE,
    output_dir='full_demo_output/from_pose_data_file',
    soundscape_input_video='data_example/raw_videos/deerrunning.mp4'
)

print("\nFrom pose_data_path (from file):")
for stage, ran in results_from_pose_file['metadata']['stages_run'].items():
    status = "✅" if ran else "⏭️ skipped"
    print(f"  {stage}: {status}")


### Option B: Start from Postprocessed Data (skip pose estimation + postprocessing)

Use the postprocessed data from the first run. This is useful when:
- You only want to try different event extractors
- You already have postprocessed data from a previous run


In [None]:
# Option B1: Use postprocessed_data directly (in-memory)
results_from_postproc = full_pipeline.run(
    postprocessed_data=results['postprocessed_data'],
    output_dir='full_demo_output/from_postprocessed_memory',
    soundscape_input_video='data_example/raw_videos/deerrunning.mp4'
)

print("From postprocessed_data (in-memory):")
for stage, ran in results_from_postproc['metadata']['stages_run'].items():
    status = "✅" if ran else "⏭️ skipped"
    print(f"  {stage}: {status}")


In [None]:
# Option B2: Use postprocessed_data_path from saved file (from disk)
# Index directly so a missing key raises here instead of passing None to run()
POSTPROC_FILE = results['output_paths']['postprocessed_data_file']
print(f"Loading from: {POSTPROC_FILE}")

results_from_postproc_file = full_pipeline.run(
    postprocessed_data_path=POSTPROC_FILE,
    output_dir='full_demo_output/from_postprocessed_file',
    soundscape_input_video='data_example/raw_videos/deerrunning.mp4'
)

print("\nFrom postprocessed_data_path (from file):")
for stage, ran in results_from_postproc_file['metadata']['stages_run'].items():
    status = "✅" if ran else "⏭️ skipped"
    print(f"  {stage}: {status}")


### Option C: Start from Events (soundscape generation only)

Use the events extracted in the first run. This is useful when:
- You only want to regenerate the soundscape with different sounds
- You want to try different soundscape configurations
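
To swap in different sounds, a new `event_sound_map` has to cover the same four event names used in the setup cell. The sketch below pairs a list of sound files with those names in order. `build_event_sound_map` is a hypothetical helper and the `.wav` paths are placeholders; the resulting dictionary follows the shape of `SOUNDSCAPE_CONFIG` shown earlier.

```python
from pathlib import Path

EVENT_NAMES = [
    'front_left_paw_strike',
    'front_right_paw_strike',
    'back_left_paw_strike',
    'back_right_paw_strike',
]

def build_event_sound_map(sound_files, event_names=EVENT_NAMES):
    """Pair each event name with a sound file, in order."""
    sound_files = [Path(f) for f in sound_files]
    if len(sound_files) < len(event_names):
        raise ValueError(
            f"Need {len(event_names)} sound files, got {len(sound_files)}"
        )
    return dict(zip(event_names, sound_files))

# Hypothetical file names, for illustration only.
new_config = {
    'event_sound_map': build_event_sound_map(
        ['sounds/a.wav', 'sounds/b.wav', 'sounds/c.wav', 'sounds/d.wav']
    )
}
print(new_config)
```

A dictionary built this way could be passed as `soundscape_config` to a new `SoundScapeFromConfig`, and the pipeline rerun from the saved events.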


In [None]:
# Option C1: Use events directly (in-memory)
results_from_events = full_pipeline.run(
    events=results['events'],
    output_dir='full_demo_output/from_events_memory',
    soundscape_input_video='data_example/raw_videos/deerrunning.mp4'
)

print("From events (in-memory):")
for stage, ran in results_from_events['metadata']['stages_run'].items():
    status = "✅" if ran else "⏭️ skipped"
    print(f"  {stage}: {status}")


In [None]:
# Option C2: Use events_path from saved file (from disk)
# Index directly so a missing key raises here instead of passing None to run()
EVENTS_FILE = results['output_paths']['events_file']
print(f"Loading from: {EVENTS_FILE}")

results_from_events_file = full_pipeline.run(
    events_path=EVENTS_FILE,
    output_dir='full_demo_output/from_events_file',
    soundscape_input_video='data_example/raw_videos/deerrunning.mp4'
)

print("\nFrom events_path (from file):")
for stage, ran in results_from_events_file['metadata']['stages_run'].items():
    status = "✅" if ran else "⏭️ skipped"
    print(f"  {stage}: {status}")


---
## Part 3: Save and Load for Later

You can also call `save_results()` to write results to a directory of your choice and `load_results()` to read them back, for example in a later session.
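
Before relying on saved files in a later session, it may be worth confirming that every reported path actually exists. The sketch below checks a name-to-path mapping of the kind that `save_results()` and `results['output_paths']` appear to return. `find_missing_outputs` is a hypothetical helper, and the file names in the example are made up.

```python
from pathlib import Path
import tempfile

def find_missing_outputs(paths):
    """Return the names whose path is None or does not exist on disk."""
    return [
        name for name, path in paths.items()
        if path is None or not Path(path).exists()
    ]

# Self-contained demonstration with one real file and one missing file.
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / 'events.json'
    existing.write_text('{}')
    example_paths = {
        'events_file': existing,
        'pose_data_file': Path(tmp) / 'missing.h5',
    }
    missing = find_missing_outputs(example_paths)

print(missing)
```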


In [None]:
# Save all results to a specific directory
saved_paths = full_pipeline.save_results(results, 'full_demo_output/saved_for_later')
print("Saved to:")
for name, path in saved_paths.items():
    print(f"  {name}: {path}")


In [None]:
# Load results back in a new session
loaded = datafawn.EventDetectionPipeline.load_results('full_demo_output/saved_for_later')

print("Loaded:")
print(f"  pose_data: {loaded['pose_data'].shape if 'pose_data' in loaded else 'None'}")
print(f"  postprocessed_data: {loaded['postprocessed_data'].shape if 'postprocessed_data' in loaded else 'None'}")
print(f"  events: {len(loaded.get('events', {}))} individuals")


---
## Summary

| Input Type | Stages Skipped | Stages Run |
|------------|----------------|------------|
| `video_path` | None | Pose Est → Postproc → Events → Soundscape |
| `pose_data` / `pose_data_path` | Pose Est | Postproc → Events → Soundscape |
| `postprocessed_data` / `postprocessed_data_path` | Pose Est, Postproc | Events → Soundscape |
| `events` / `events_path` | Pose Est, Postproc, Events | Soundscape only |

**In-memory vs File:**
- Use `pose_data`, `postprocessed_data`, `events` for in-memory data (same session)
- Use `pose_data_path`, `postprocessed_data_path`, `events_path` for data from files
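
The table can be read as a simple rule: each input type fixes the first stage that still has to run, and every later stage runs too. The sketch below encodes that rule for illustration. It is not datafawn's internal logic, and the stage names in `STAGES` are hypothetical labels rather than the keys found in `results['metadata']['stages_run']`.

```python
STAGES = ['pose_estimation', 'postprocessing', 'event_extraction', 'soundscape']

# Index of the first stage that still has to run for each kind of input.
FIRST_STAGE_FOR_INPUT = {
    'video_path': 0,
    'pose_data': 1, 'pose_data_path': 1,
    'postprocessed_data': 2, 'postprocessed_data_path': 2,
    'events': 3, 'events_path': 3,
}

def stages_run_for(input_name):
    """Return {stage: bool} for one input type, following the summary table."""
    first = FIRST_STAGE_FOR_INPUT[input_name]
    return {stage: index >= first for index, stage in enumerate(STAGES)}

for input_name in ('video_path', 'postprocessed_data', 'events_path'):
    print(input_name, stages_run_for(input_name))
```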


In [None]:
# Optional: Cleanup demo output (Part 1 wrote to 'data/deerrunning', which is not touched here)
import shutil
if Path('full_demo_output').exists():
    # shutil.rmtree('full_demo_output')  # Uncomment to delete
    print("Demo outputs in 'full_demo_output/' - uncomment above to delete")
