# Tutorial 1: Human Pose Estimation with YOLO & Ultralytics

**Pipeline Stage:** 2D Pose Estimation from Multi-Camera Video

---

## Overview

This tutorial walks you through the **first stage** of our multi-camera human tracking pipeline:
extracting 2D human pose (skeleton) keypoints from video using the **YOLOv8 pose estimation model**.

### What You Will Learn

1. **Installation** — How to install Ultralytics, OpenCV, SLEAP-IO, and all dependencies
2. **Frame Extraction** — How to pull individual frames from video at specific timestamps
3. **Single-Frame Inference** — Running YOLO pose estimation on a single image
4. **Understanding the Output** — Bounding boxes, keypoints, confidence scores, and the COCO skeleton
5. **Batch Video Processing** — Running pose estimation across an entire video
6. **Handling Portrait/Rotated Cameras** — Rotating frames before inference and mapping keypoints back
7. **Exporting Results** — Saving to SLEAP (.slp), JSON, and Pickle formats

### Pipeline Context

```
┌─────────────────────┐     ┌──────────────────────┐     ┌─────────────────────┐
│  Tutorial 1 (HERE)  │ ──► │  Tutorial 2           │ ──► │  Tutorial 3          │
│  YOLO Pose (2D)     │     │  Person ReID          │     │  3D Triangulation    │
│  per-camera          │     │  cross-camera match   │     │  multi-camera fusion │
└─────────────────────┘     └──────────────────────┘     └─────────────────────┘
```

**Why this order?** Each camera independently detects people, but has no idea which
detection in Camera 1 is the same person as which detection in Camera 3. **ReID must
come before triangulation** — otherwise triangulation doesn't know which 2D keypoints
across cameras to match up into 3D points.

### Prerequisites

- A machine with a **CUDA-capable GPU** (recommended) or CPU
- Python 3.8+ (3.10+ recommended)
- Video files from your camera setup (`.mkv` or `.mp4`)

---

## Part 1: Environment Setup & Installation

Before running any code, you need:

1. **Conda** — a package/environment manager
2. **A dedicated conda environment** with Python and CUDA support (for GPU acceleration)
3. **The Python packages** used in this tutorial
4. **A registered Jupyter kernel** so Jupyter can use the environment

> **Why conda?** Conda manages both Python packages *and* system-level CUDA libraries.
> This means you do **not** need a system-wide CUDA install — conda handles it inside
> your environment. This is the most reliable way to get GPU support across platforms.

---

### Step 1a: Install Conda (if you don't already have it)

If you already have `conda` or `mamba` on your system, skip to Step 1b.

We recommend **Miniforge** (includes `mamba`, a faster drop-in replacement for `conda`):

| Platform | Download Link | Notes |
|----------|--------------|-------|
| **Windows** | [Miniforge3-Windows-x86_64.exe](https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Windows-x86_64.exe) | Run the `.exe` installer. Check "Add to PATH" when prompted. |
| **macOS (Intel)** | [Miniforge3-MacOSX-x86_64.sh](https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-x86_64.sh) | Run in Terminal (see below) |
| **macOS (Apple Silicon)** | [Miniforge3-MacOSX-arm64.sh](https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh) | For M1/M2/M3/M4 Macs |
| **Linux (x86_64)** | [Miniforge3-Linux-x86_64.sh](https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh) | Most common for servers/workstations |
| **Linux (aarch64)** | [Miniforge3-Linux-aarch64.sh](https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-aarch64.sh) | For ARM64 machines (e.g., Jetson) |

**macOS / Linux install commands** (after downloading):

```bash
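# Run the installer with bash (no chmod needed).
# On Linux, $(uname)-$(uname -m) expands to e.g. Linux-x86_64, matching the downloaded file name.
bash Miniforge3-$(uname)-$(uname -m).sh
# On macOS, $(uname) prints "Darwin", so pass the downloaded file name instead, e.g.:
# bash Miniforge3-MacOSX-arm64.sh

# Follow the prompts (accept license, choose install location)
# Then restart your terminal, or reload the config file for your shell:
source ~/.bashrc   # bash (default on most Linux systems)
source ~/.zshrc    # zsh (default on macOS)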
```

**Verify installation:**

```bash
conda --version
# Should print something like: conda 24.x.x
```

> **Alternative installers:**
> - [Anaconda](https://www.anaconda.com/download) — larger distribution with GUI, 250+ packages pre-installed
> - [Miniconda](https://docs.anaconda.com/miniconda/) — minimal installer from Anaconda (similar to Miniforge but uses the `defaults` channel)

---

### Step 1b: Create a Conda Environment with CUDA Support

Open a terminal (on Windows, the Miniforge Prompt or Anaconda Prompt) and run the command for **your platform**:

#### Linux (with NVIDIA GPU)

```bash
conda create -n yolo-pose python=3.11 pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia -y
conda activate yolo-pose
pip install ultralytics opencv-python sleap-io numpy tqdm jupyter ipykernel
```

#### Windows (with NVIDIA GPU)

```bash
conda create -n yolo-pose python=3.11 pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia -y
conda activate yolo-pose
pip install ultralytics opencv-python sleap-io numpy tqdm jupyter ipykernel
```

#### macOS (Apple Silicon — M1/M2/M3/M4)

```bash
conda create -n yolo-pose python=3.11 pytorch torchvision -c pytorch -y
conda activate yolo-pose
pip install ultralytics opencv-python sleap-io numpy tqdm jupyter ipykernel
```

> **Note:** Apple Silicon Macs use **MPS** (Metal Performance Shaders) for GPU acceleration
> instead of CUDA. PyTorch supports MPS automatically — no extra flags needed.
> Performance is good but not as fast as a dedicated NVIDIA GPU.

#### macOS (Intel)

```bash
conda create -n yolo-pose python=3.11 pytorch torchvision -c pytorch -y
conda activate yolo-pose
pip install ultralytics opencv-python sleap-io numpy tqdm jupyter ipykernel
```

> **Note:** Intel Macs have no GPU acceleration for PyTorch. Inference will run on CPU only.

#### CPU Only (any platform, no GPU)

```bash
conda create -n yolo-pose python=3.11 pytorch torchvision cpuonly -c pytorch -y
conda activate yolo-pose
pip install ultralytics opencv-python sleap-io numpy tqdm jupyter ipykernel
```

---

### Step 1c: Register the Kernel & Launch Jupyter

After creating and activating the environment, you must **register it as a Jupyter kernel**
so that Jupyter (and VSCode, JupyterLab, etc.) can find and use it:

```bash
conda activate yolo-pose
python -m ipykernel install --user --name yolo-pose --display-name "Python (yolo-pose)"
```

Then launch Jupyter:

```bash
jupyter notebook
# or
jupyter lab
```

In the Jupyter interface, select the **"Python (yolo-pose)"** kernel from the kernel picker
(top-right in JupyterLab, or Kernel → Change Kernel in classic Notebook).

> **VSCode users:** After registering the kernel, open the Command Palette
> (`Ctrl+Shift+P` / `Cmd+Shift+P`) → "Notebook: Select Notebook Kernel" →
> choose **"Python (yolo-pose)"**.
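
To confirm that a notebook is really running inside the `yolo-pose` environment, a quick check along these lines can help. This is a minimal sketch: `running_in_env` is a hypothetical helper, and matching the environment name against the interpreter path is only a heuristic, since conda install locations vary.

```python
import sys
from pathlib import Path


def running_in_env(env_name: str, executable: str = sys.executable) -> bool:
    """Heuristic: True if env_name is one of the directories in the interpreter path."""
    return env_name in Path(executable).parts


print(f"Interpreter: {sys.executable}")
print(f"Looks like 'yolo-pose': {running_in_env('yolo-pose')}")
```

If the second line prints `False`, the notebook is probably attached to a different kernel than the one registered above.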

---

### Package Summary

| Package | Purpose |
|---|---|
| `pytorch` + `torchvision` | Deep learning framework (with CUDA/MPS support) |
| `ultralytics` | YOLO model for pose estimation |
| `opencv-python` | Video reading, image manipulation |
| `sleap-io` | Saving pose data in SLEAP format |
| `numpy` | Numerical operations |
| `tqdm` | Progress bars |
| `jupyter` | Run this notebook |
| `ipykernel` | Register this environment as a Jupyter kernel |

### GPU Support Notes

| Platform | GPU Backend | How to Check |
|---|---|---|
| Linux / Windows (NVIDIA) | **CUDA** | `nvidia-smi` in terminal |
| macOS (Apple Silicon) | **MPS** | `python -c "import torch; print(torch.backends.mps.is_available())"` (requires PyTorch ≥ 1.12) |
| macOS (Intel) | None | CPU only |

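Ultralytics uses CUDA automatically when it is available. Depending on the Ultralytics version, MPS may not be selected automatically; if inference on Apple Silicon falls back to the CPU, pass the device explicitly, for example `model.predict(source, device="mps")`.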

In [1]:
# ============================================================
# STEP 1a: Install packages (run this if not already installed)
# ============================================================
# If you followed the conda setup in the markdown cell above,
# everything should already be installed. This cell is a fallback
# for users who skipped the terminal setup or are running on
# Google Colab / other hosted notebooks.
#
# Uncomment the section that matches your platform:

# ── Google Colab (GPU runtime recommended) ───────────────────
# !pip install ultralytics opencv-python sleap-io numpy tqdm

# ── Linux / Windows with NVIDIA GPU (CUDA 12.1) ─────────────
# !pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# !pip install ultralytics opencv-python sleap-io numpy tqdm ipykernel

# ── Linux / Windows CPU only ────────────────────────────────
# !pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
# !pip install ultralytics opencv-python sleap-io numpy tqdm ipykernel

# ── macOS (Apple Silicon or Intel) ──────────────────────────
# !pip install torch torchvision
# !pip install ultralytics opencv-python sleap-io numpy tqdm ipykernel

In [2]:
# ============================================================
# STEP 1b: Register this conda environment as a Jupyter kernel
# ============================================================
# If you created the "yolo-pose" conda environment but Jupyter
# doesn't show it as a kernel option, run this cell to register it.
#
# You only need to do this ONCE per environment.
# After running, restart Jupyter and select "Python (yolo-pose)"
# from the kernel picker.

# !pip install ipykernel

# Uncomment and run:
# !python -m ipykernel install --user --name yolo-pose --display-name "Python (yolo-pose)"

# To verify which kernels are available:
# !jupyter kernelspec list

In [3]:
# ============================================================
# STEP 2: Verify installation and GPU availability
# ============================================================
import platform
import sys

print(f"Python version:   {sys.version}")
print(f"Platform:         {platform.system()} {platform.machine()}")
print()

import torch
print(f"PyTorch version:  {torch.__version__}")
print(f"CUDA available:   {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device:      {torch.cuda.get_device_name(0)}")
    print(f"CUDA version:     {torch.version.cuda}")
    device = "cuda"
elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    print(f"MPS available:    True (Apple Silicon GPU)")
    device = "mps"
else:
    print("WARNING: No GPU detected. Inference will run on CPU (slower).")
    device = "cpu"
print(f"Selected device:  {device}")

print()
import ultralytics
print(f"Ultralytics:      {ultralytics.__version__}")

import cv2
print(f"OpenCV:           {cv2.__version__}")

import sleap_io as sio
print(f"SLEAP-IO:         {sio.__version__}")

import numpy as np
print(f"NumPy:            {np.__version__}")

print("\nAll packages installed successfully!")

Python version:   3.11.14 | packaged by conda-forge | (main, Jan 26 2026, 23:48:32) [GCC 14.3.0]
Platform:         Linux x86_64

PyTorch version:  2.5.1
CUDA available:   True
CUDA device:      NVIDIA A40
CUDA version:     12.1
Selected device:  cuda

Ultralytics:      8.4.16
OpenCV:           4.13.0
SLEAP-IO:         0.6.4
NumPy:            2.4.2

All packages installed successfully!


---

## Part 2: Understanding the YOLO Pose Model

### What is YOLOv8n-pose?

YOLO (You Only Look Once) is a real-time object detection model. The **pose** variant adds
**human skeleton keypoint estimation** on top of person detection.

- **yolov8n-pose** — the "nano" (smallest/fastest) pose model
- Detects people AND their 17 body keypoints in a single forward pass
- Fast enough for real-time use: throughput can exceed 100 FPS on a modern GPU, depending on image size, batch size, and hardware
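
Throughput depends heavily on hardware and input size, so it is worth measuring on your own machine. The sketch below times any callable; `measure_fps` is a hypothetical helper, and the stand-in workload would be replaced by a real inference call such as `lambda: model(frame, verbose=False)` once a model and a frame are loaded.

```python
import time


def measure_fps(infer, n_frames: int = 50, warmup: int = 5) -> float:
    """Return the average number of `infer` calls per second."""
    for _ in range(warmup):  # warm-up calls are excluded from the timing
        infer()
    start = time.perf_counter()
    for _ in range(n_frames):
        infer()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed


# Stand-in workload so the sketch runs without a model
fps = measure_fps(lambda: sum(range(10_000)))
print(f"{fps:.1f} calls per second")
```

The warm-up calls matter for GPU inference in particular, because the first few calls typically include one-time setup costs.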

### The COCO 17-Keypoint Skeleton

YOLO pose models use the **COCO keypoint format** with 17 joints:

```
Index  Name             Index  Name
─────  ──────────────   ─────  ──────────────
  0    nose               9    left_wrist
  1    left_eye          10    right_wrist
  2    right_eye         11    left_hip
  3    left_ear          12    right_hip
  4    right_ear         13    left_knee
  5    left_shoulder     14    right_knee
  6    right_shoulder    15    left_ankle
  7    left_elbow        16    right_ankle
  8    right_elbow
```
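
In code it is often handier to look keypoints up by name than by index. The sketch below builds that lookup from the table above; `KEYPOINT_INDEX` and `mirror` are illustrative names, not part of Ultralytics.

```python
# COCO keypoint names in index order (same order as the table above)
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Name -> index lookup
KEYPOINT_INDEX = {name: i for i, name in enumerate(COCO_KEYPOINTS)}


def mirror(name: str) -> str:
    """Return the left/right counterpart of a keypoint name (nose maps to itself)."""
    if name.startswith("left_"):
        return "right_" + name[len("left_"):]
    if name.startswith("right_"):
        return "left_" + name[len("right_"):]
    return name


print(KEYPOINT_INDEX["left_wrist"])  # → 9
print(mirror("left_knee"))           # → right_knee
```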

### Skeleton Connections (Edges)

```
         nose(0)
        /     \
   l_eye(1)  r_eye(2)
      |         |
   l_ear(3)  r_ear(4)

 l_wrist(9)──l_elbow(7)──l_shoulder(5)───r_shoulder(6)──r_elbow(8)──r_wrist(10)
                               |               |
                           l_hip(11)───────r_hip(12)
                               |               |
                          l_knee(13)       r_knee(14)
                               |               |
                          l_ankle(15)     r_ankle(16)
```
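
The same connections can be expressed as an adjacency map, which makes it easy to ask which joints are attached to a given joint. This is a small sketch using the edge list defined in Step 8 of this tutorial; `build_adjacency` is a hypothetical helper.

```python
from collections import defaultdict

# Edge list as pairs of keypoint indices (same pairs as Step 8)
EDGES = [
    [0, 1], [0, 2], [1, 3], [2, 4],          # face
    [5, 7], [7, 9], [6, 8], [8, 10],         # arms
    [5, 6], [5, 11], [6, 12], [11, 12],      # torso
    [11, 13], [13, 15], [12, 14], [14, 16],  # legs
]


def build_adjacency(edges):
    """Map each keypoint index to the sorted list of indices it connects to."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return {node: sorted(neighbours) for node, neighbours in adj.items()}


adjacency = build_adjacency(EDGES)
print(adjacency[5])  # neighbours of left_shoulder → [6, 7, 11]
print(adjacency[9])  # neighbours of left_wrist    → [7]
```

Note that in this edge list the wrists connect only to the elbows, and the hips hang off the shoulders.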

In [4]:
# ============================================================
# STEP 3: Load the YOLO pose model
# ============================================================
# The first time you run this, it will download the model weights
# (~6 MB for the nano model). Subsequent runs use the cached file.

from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")  # nano pose model
print(f"Model loaded: {model.model_name}")
print(f"Task: {model.task}")

Model loaded: yolov8n-pose.pt
Task: pose


---

## Part 3: Extracting a Frame from Video

Before running inference on entire videos, let's start by extracting a single frame.
This is useful for:
- Verifying the video loads correctly
- Testing the model on one image
- Quick visual inspection

### How Timestamp-to-Frame Conversion Works

```
frame_number = timestamp_in_seconds × FPS
```

For example, to get the frame at 1h 28m 10s in a 30 FPS video:
```
seconds = 3600 + (28×60) + 10 = 5290
frame   = 5290 × 30 = 158,700
```
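
The conversion can be wrapped in a small helper. This is a sketch, with `timestamp_to_frame` as an illustrative name; it truncates to an integer in the same way as the `int(...)` call in Step 4.

```python
def timestamp_to_frame(hours: int, minutes: int, seconds: float, fps: float) -> int:
    """Convert a timestamp to a zero-based frame number, truncating fractions."""
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return int(total_seconds * fps)


print(timestamp_to_frame(1, 28, 10, fps=30))    # → 158700
print(timestamp_to_frame(0, 0, 10, fps=29.97))  # → 299
```

With non-integer frame rates such as 29.97, the result is the last frame that starts at or before the requested time.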

In [None]:
# ============================================================
# STEP 4: Extract a single frame from a video file
# ============================================================
import cv2
import os

# ── CONFIGURE THESE ──────────────────────────────────────────
video_path = "PUT_YOUR_VIDEO_HERE.mp4"  # e.g. "/path/to/CAM1_recording.mkv"

# Target timestamp: 1 hour, 28 minutes, 0 seconds
hours, minutes, seconds = 1, 28, 0
# ─────────────────────────────────────────────────────────────

# Convert timestamp to total seconds
timestamp_seconds = (hours * 3600) + (minutes * 60) + seconds
print(f"Target time: {hours}h {minutes}m {seconds}s = {timestamp_seconds} seconds")

# Open the video
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
    raise FileNotFoundError(f"Could not open video: {video_path}")

# Read video properties
fps = cap.get(cv2.CAP_PROP_FPS)
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
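if fps <= 0:
    cap.release()
    raise RuntimeError(f"Video reports an invalid FPS ({fps}): {video_path}")
duration = total_frames / fps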

print(f"\nVideo properties:")
print(f"  FPS:        {fps}")
print(f"  Resolution: {width} x {height}")
print(f"  Frames:     {total_frames:,}")
print(f"  Duration:   {duration:.1f}s ({duration/60:.1f} min)")

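# Calculate target frame number and check that it exists in the video
frame_number = int(timestamp_seconds * fps)
print(f"\nTarget frame: {frame_number:,}")
if 0 < total_frames <= frame_number:
    cap.release()
    raise ValueError(
        f"Timestamp {timestamp_seconds}s is past the end of the video ({duration:.1f}s)"
    )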

# Seek to the target frame
cap.set(cv2.CAP_PROP_POS_FRAMES, frame_number)

# Read the frame
success, frame = cap.read()
cap.release()

if not success:
    raise RuntimeError(f"Could not read frame at {timestamp_seconds}s")

# Save the extracted frame
frame_path = "extracted_frame.jpg"
cv2.imwrite(frame_path, frame)
print(f"Frame saved to: {frame_path}")
print(f"Frame shape: {frame.shape}  (height, width, channels)")

---

## Part 4: Single-Frame Pose Estimation

Now let's run YOLO pose estimation on the extracted frame and examine the results.

In [None]:
# ============================================================
# STEP 5: Run pose estimation on a single frame
# ============================================================
from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")

# Run inference — returns a list of Results objects (one per image)
results = model("extracted_frame.jpg")

# We passed a single image, so take the first (and only) result
result = results[0]

print(f"Number of people detected: {len(result.boxes)}")

In [None]:
# ============================================================
# STEP 6: Examine the detection results in detail
# ============================================================
import numpy as np

# --- Bounding Boxes ---
# result.boxes contains detection bounding boxes
boxes = result.boxes
print("=" * 60)
print("BOUNDING BOXES")
print("=" * 60)
print(f"Format: [x1, y1, x2, y2] — top-left and bottom-right corners")
print(f"Number of boxes: {len(boxes)}")
print(f"\nBox coordinates (xyxy):")
print(boxes.xyxy.cpu().numpy())
print(f"\nConfidence scores:")
print(boxes.conf.cpu().numpy())
print(f"\nClass IDs (0 = person):")
print(boxes.cls.cpu().numpy())

# --- Keypoints ---
# result.keypoints contains the 17 body keypoints per person
kpts = result.keypoints
print("\n" + "=" * 60)
print("KEYPOINTS")
print("=" * 60)
print(f"Keypoints shape: {kpts.xy.shape}")
print(f"  → (num_people, 17_keypoints, 2_xy_coords)")

# Show keypoints for the first detected person
if len(kpts.xy) > 0:
    person_0_kpts = kpts.xy[0].cpu().numpy()
    person_0_conf = kpts.conf[0].cpu().numpy() if kpts.conf is not None else None
    
    node_names = [
        "nose", "left_eye", "right_eye", "left_ear", "right_ear",
        "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
        "left_wrist", "right_wrist", "left_hip", "right_hip",
        "left_knee", "right_knee", "left_ankle", "right_ankle"
    ]
    
    print(f"\nPerson 0 keypoints:")
    print(f"{'Index':<6} {'Name':<18} {'X':>8} {'Y':>8} {'Conf':>8}")
    print("-" * 50)
    for i, name in enumerate(node_names):
        x, y = person_0_kpts[i]
        conf = person_0_conf[i] if person_0_conf is not None else float('nan')
        visible = "✓" if not (x == 0 and y == 0) else "✗"
        print(f"{i:<6} {name:<18} {x:>8.1f} {y:>8.1f} {conf:>8.3f} {visible}")

In [None]:
# ============================================================
# STEP 7: Visualize the detection inline in the notebook
# ============================================================
import cv2
import matplotlib.pyplot as plt

# Get annotated frame with bounding boxes and skeleton overlay
annotated = results[0].plot()  # returns BGR numpy array

# Convert BGR → RGB for matplotlib
annotated_rgb = cv2.cvtColor(annotated, cv2.COLOR_BGR2RGB)

fig, ax = plt.subplots(1, 1, figsize=(16, 9))
ax.imshow(annotated_rgb)
ax.set_title(f"YOLO Pose Detection — {len(results[0].boxes)} people detected", fontsize=14)
ax.axis("off")
plt.tight_layout()
plt.show()

# Also save to file for reference
results[0].save(filename="detected_frame.jpg")
print("Also saved to: detected_frame.jpg")

---

## Part 5: Processing an Entire Video

Now we scale up from a single frame to processing every frame in a video.

### Strategy

1. Use `model.predict(video_path, stream=True)` for memory-efficient streaming
2. For each frame, extract keypoints, boxes, and confidence scores
3. Store results in both **SLEAP format** (.slp) and **JSON/Pickle** for flexibility
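
Step 2 includes one detail worth spelling out: the code in this tutorial treats a keypoint at exactly `(0, 0)` as not detected and replaces it with NaN before saving to SLEAP. The sketch below shows that rule on plain Python lists; `mark_missing` is a hypothetical helper and the coordinates are made up.

```python
import math


def mark_missing(keypoints):
    """Replace (0, 0) keypoints with (nan, nan); keep all others as floats."""
    return [
        (math.nan, math.nan) if (x == 0 and y == 0) else (float(x), float(y))
        for x, y in keypoints
    ]


# Three made-up keypoints for one person; the second was not detected
person = [(412.5, 130.0), (0.0, 0.0), (420.1, 125.7)]
cleaned = mark_missing(person)

n_visible = sum(not math.isnan(x) for x, _ in cleaned)
print(n_visible)  # → 2
```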

### Output Formats

| Format | File | Use Case |
|---|---|---|
| SLEAP (.slp) | `*_pose.slp` | Visual inspection in SLEAP GUI, downstream triangulation |
| JSON | `*_results.json` | Human-readable, cross-language |
| Pickle | `*_results.pkl` | Fast Python loading, preserves types |
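
To illustrate the JSON and Pickle rows, the sketch below writes one made-up frame record in both formats and reads it back. The record keys mirror the ones used in Step 9; the values and file names are invented for the example.

```python
import json
import pickle
import tempfile
from pathlib import Path

# One made-up frame record with the same keys as the Step 9 results
record = {
    "frame_idx": 0,
    "keypoints": [[[412.5, 130.0], [0.0, 0.0]]],
    "keypoints_conf": [[0.98, 0.02]],
    "boxes": [[380.0, 90.0, 470.0, 400.0]],
    "boxes_conf": [0.91],
    "boxes_cls": [0.0],
}
results = [record]

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "example_results.json"
    pkl_path = Path(tmp) / "example_results.pkl"

    # Write both formats
    json_path.write_text(json.dumps(results, indent=2))
    pkl_path.write_bytes(pickle.dumps(results))

    # Read them back
    from_json = json.loads(json_path.read_text())
    from_pkl = pickle.loads(pkl_path.read_bytes())

print(from_json == results, from_pkl == results)
```

Pickle files can execute code when loaded, so only unpickle files you created yourself or otherwise trust; JSON is the safer choice for sharing.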

In [None]:
# ============================================================
# STEP 8: Define the SLEAP skeleton structure
# ============================================================
# We need to define the skeleton so SLEAP knows how to connect
# the keypoints when we save the results.

import sleap_io as sio
import numpy as np

# COCO 17-keypoint names
node_names = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle"
]

# COCO skeleton edge connections (pairs of keypoint indices)
edge_inds = [
    [0, 1], [0, 2],   # nose → eyes
    [1, 3], [2, 4],   # eyes → ears
    [5, 7], [7, 9],   # left arm: shoulder → elbow → wrist
    [6, 8], [8, 10],  # right arm: shoulder → elbow → wrist
    [5, 6],           # shoulder to shoulder
    [5, 11], [6, 12], # shoulders → hips
    [11, 12],         # hip to hip
    [11, 13], [13, 15],  # left leg: hip → knee → ankle
    [12, 14], [14, 16]   # right leg: hip → knee → ankle
]

# Create the SLEAP skeleton
skeleton = sio.Skeleton(nodes=node_names, edges=edge_inds)
print(f"Skeleton created with {len(node_names)} nodes and {len(edge_inds)} edges")

In [None]:
# ============================================================
# STEP 9: Process an entire video — standard (landscape) camera
# ============================================================
import os
import json
import pickle
import math
from pathlib import Path
from tqdm import tqdm
from ultralytics import YOLO
import sleap_io as sio
import numpy as np
import cv2

# ── CONFIGURE THESE ──────────────────────────────────────────
video_path = "PUT_YOUR_LANDSCAPE_VIDEO_HERE.mp4"  # e.g. "/path/to/CAM1_video.mp4"
output_dir = "pose_results"
# ─────────────────────────────────────────────────────────────

os.makedirs(output_dir, exist_ok=True)
video_name = Path(video_path).stem

# Load model
model = YOLO("yolov8n-pose.pt")

# Create SLEAP objects
video_obj = sio.Video(filename=video_path)
labels = sio.Labels(videos=[video_obj], skeletons=[skeleton])

# Storage for raw Ultralytics results
ultralytics_results = []

# Get video properties
cap = cv2.VideoCapture(video_path)
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
vid_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
vid_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
cap.release()

print(f"Processing: {video_name}")
print(f"Total frames: {total_frames:,}")
print(f"Video resolution: {vid_w}x{vid_h}")

# Stream inference (memory-efficient: processes one frame at a time)
results = model.predict(video_path, stream=True, verbose=False)

for frame_idx, result in tqdm(enumerate(results), total=total_frames, desc="Processing"):
    
    # ── Store raw results ──
    frame_result = {
        'frame_idx': frame_idx,
        'keypoints': result.keypoints.xy.cpu().numpy().tolist()
            if hasattr(result, 'keypoints') and result.keypoints is not None else [],
        'keypoints_conf': result.keypoints.conf.cpu().numpy().tolist()
            if hasattr(result, 'keypoints') and result.keypoints is not None
            and getattr(result.keypoints, 'conf', None) is not None else [],
        'boxes': result.boxes.xyxy.cpu().numpy().tolist()
            if hasattr(result, 'boxes') and result.boxes is not None else [],
        'boxes_conf': result.boxes.conf.cpu().numpy().tolist()
            if hasattr(result, 'boxes') and result.boxes is not None else [],
        'boxes_cls': result.boxes.cls.cpu().numpy().tolist()
            if hasattr(result, 'boxes') and result.boxes is not None else []
    }
    ultralytics_results.append(frame_result)
    
    # ── Build SLEAP labeled frame ──
    labeled_frame = sio.LabeledFrame(video=video_obj, frame_idx=frame_idx)
    
    detections = result.keypoints.xy.cpu().numpy() \
        if hasattr(result, 'keypoints') and result.keypoints is not None else []
    
    if len(detections) == 0:
        continue
    
    for person_idx, keypoints in enumerate(detections):
        # Build points array: NaN for invisible keypoints, (x,y) for visible
        points_arr = np.zeros((len(keypoints), 2))
        for kp_idx, kp in enumerate(keypoints):
            x, y = float(kp[0]), float(kp[1])
            if x == 0 and y == 0:
                points_arr[kp_idx] = [np.nan, np.nan]  # not visible
            else:
                points_arr[kp_idx] = [x, y]
        
        instance = sio.Instance.from_numpy(points_arr, skeleton)
        labeled_frame.instances.append(instance)
    
    labels.append(labeled_frame)

# ── Save all output formats ──
slp_path = f"{output_dir}/{video_name}_pose.slp"
json_path = f"{output_dir}/{video_name}_ultralytics_results.json"
pkl_path = f"{output_dir}/{video_name}_ultralytics_results.pkl"

labels.save(slp_path)
with open(json_path, 'w') as f:
    json.dump(ultralytics_results, f, indent=2)
with open(pkl_path, 'wb') as f:
    pickle.dump(ultralytics_results, f)

print(f"\nDone! Outputs saved:")
print(f"  SLEAP:  {slp_path}")
print(f"  JSON:   {json_path}")
print(f"  Pickle: {pkl_path}")

In [None]:
# ============================================================
# STEP 9b: Visualize sample frames from the processed video
# ============================================================
import matplotlib.pyplot as plt

# Pick a few evenly-spaced frames to visualize
n_samples = 4
sample_indices = np.linspace(0, len(ultralytics_results) - 1, n_samples, dtype=int)

cap = cv2.VideoCapture(video_path)
model_viz = YOLO("yolov8n-pose.pt")

fig, axes = plt.subplots(1, n_samples, figsize=(20, 5))

for i, fidx in enumerate(sample_indices):
    cap.set(cv2.CAP_PROP_POS_FRAMES, fidx)
    ret, frame = cap.read()
    if not ret:
        continue
    
    # Run inference on this frame (single frame, no stream)
    res = model_viz.predict(frame, verbose=False)[0]
    annotated = cv2.cvtColor(res.plot(), cv2.COLOR_BGR2RGB)
    
    axes[i].imshow(annotated)
    n_people = len(res.boxes)
    axes[i].set_title(f"Frame {fidx} — {n_people} people", fontsize=10)
    axes[i].axis("off")

cap.release()

fig.suptitle(f"Sample detections from: {video_name}", fontsize=13, y=1.02)
plt.tight_layout()
plt.show()

---

## Part 6: Handling Portrait (Rotated) Cameras

Some cameras may be mounted in **portrait orientation** (e.g., CAM4 in our setup).
YOLO performs best on landscape-oriented images, so we need to:

1. **Rotate the frame 90° clockwise** before inference (portrait → landscape)
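2. **Leave padding to Ultralytics**: `predict` letterboxes the frame to a stride-32 multiple internally and returns keypoints in the coordinates of the frame you passed in, so no manual padding is needed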
3. **Run inference** on the rotated frame
4. **Map keypoints back** to original portrait coordinates

### Coordinate Mapping

When rotating 90° CW:
```
Portrait (x_p, y_p)  ──rotate 90° CW──►  Landscape (x_r, y_r)
  x_r = H_portrait - y_p
  y_r = x_p
  (H_portrait = W_rotated, the width of the rotated frame)

To reverse (landscape → portrait):
  x_p = y_r
  y_p = W_rotated - x_r
```
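
The clockwise mapping can be checked on a tiny grid without OpenCV. The sketch below is an illustration with hypothetical helper names; it uses zero-based pixel indices, which is why a `- 1` appears that is absent from the continuous-coordinate formulas. Each cell stores its own portrait coordinates, so the round trip can be verified directly.

```python
def rotate_cw(grid):
    """Rotate a 2D list 90° clockwise (the first row becomes the last column)."""
    return [list(row) for row in zip(*grid[::-1])]


def portrait_to_rotated(x_p, y_p, h_portrait):
    """Pixel index in the portrait frame -> pixel index in the rotated frame."""
    return h_portrait - 1 - y_p, x_p


def rotated_to_portrait(x_r, y_r, w_rotated):
    """Pixel index in the rotated frame -> pixel index in the portrait frame."""
    return y_r, w_rotated - 1 - x_r


W_P, H_P = 3, 5  # a tiny portrait "image": 3 wide, 5 tall
portrait = [[(x, y) for x in range(W_P)] for y in range(H_P)]
rotated = rotate_cw(portrait)
W_R = len(rotated[0])  # width of the rotated frame, equal to H_P

for y_p in range(H_P):
    for x_p in range(W_P):
        x_r, y_r = portrait_to_rotated(x_p, y_p, H_P)
        # The rotated grid holds the original coordinates at the mapped position
        assert rotated[y_r][x_r] == (x_p, y_p)
        # Mapping back recovers the original portrait position
        assert rotated_to_portrait(x_r, y_r, W_R) == (x_p, y_p)

print("mapping verified")
```

The one-pixel difference between the two conventions is usually negligible for pose keypoints, but it is worth keeping consistent if the coordinates feed into triangulation later.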

In [None]:
# ============================================================
# STEP 10: Process a portrait/rotated camera video
# ============================================================
import numpy as np

def rotate_back_cw90(kpts_xy, rot_shape):
    """
    Convert keypoints detected on a 90°-CW-rotated frame
    back to the original portrait coordinate system.
    
    Parameters:
        kpts_xy:   array of shape (N, 2) — keypoint (x, y) in rotated frame
        rot_shape: tuple (h, w) of the rotated frame (before padding)
    
    Returns:
        array of shape (N, 2) — keypoints in original portrait coordinates
    """
    h, w = rot_shape
    x_r, y_r = kpts_xy[:, 0], kpts_xy[:, 1]
    x_p = y_r           # portrait x = rotated y
    y_p = w - x_r       # portrait y = rotated width - rotated x
    return np.stack([x_p, y_p], axis=-1)


# ── CONFIGURE ──
video_path = "PUT_YOUR_PORTRAIT_VIDEO_HERE.mp4"  # e.g. "/path/to/CAM4_portrait_video.mp4"
output_dir = "pose_results"
# ───────────────

os.makedirs(output_dir, exist_ok=True)
video_name = Path(video_path).stem

model = YOLO("yolov8n-pose.pt")
video_obj = sio.Video(filename=video_path)
labels = sio.Labels(videos=[video_obj], skeletons=[skeleton])
ultra_results = []

cap = cv2.VideoCapture(video_path)
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

print(f"Processing portrait camera: {video_name}")
print(f"Total frames: {total_frames:,}")

for idx in tqdm(range(total_frames), desc="Processing"):
    ret, frame_portrait = cap.read()
    if not ret:
        break
    
    # 1. Rotate 90° CW: portrait → landscape
    frame_rotated = cv2.rotate(frame_portrait, cv2.ROTATE_90_CLOCKWISE)
    rh, rw = frame_rotated.shape[:2]
    
    # 2. Run inference on rotated frame
    result = model.predict(frame_rotated, verbose=False)[0]
    
    # Store raw results (note: these stay in rotated-frame coordinates)
    ultra_results.append({
        "frame_idx": idx,
        "keypoints": result.keypoints.xy.cpu().numpy().tolist()
            if hasattr(result, 'keypoints') and result.keypoints is not None else [],
        "keypoints_conf": result.keypoints.conf.cpu().numpy().tolist()
            if hasattr(result, 'keypoints') and result.keypoints is not None
            and getattr(result.keypoints, 'conf', None) is not None else [],
        "boxes": result.boxes.xyxy.cpu().numpy().tolist()
            if hasattr(result, 'boxes') and result.boxes is not None else [],
        "boxes_conf": result.boxes.conf.cpu().numpy().tolist()
            if hasattr(result, 'boxes') and result.boxes is not None else [],
        "boxes_cls": result.boxes.cls.cpu().numpy().tolist()
            if hasattr(result, 'boxes') and result.boxes is not None else []
    })
    
    # 3. Map keypoints back to portrait coordinates
    dets = (result.keypoints.xy.cpu().numpy()
            if hasattr(result, 'keypoints') and result.keypoints is not None else [])
    
    if len(dets):
        lf = sio.LabeledFrame(video=video_obj, frame_idx=idx)
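        for kpts in dets:
            # Missing keypoints are (0, 0) in the ROTATED frame, so build the
            # mask before mapping back (after rotation they are no longer zero)
            missing = (kpts == 0).all(axis=1, keepdims=True)
            kpts_portrait = rotate_back_cw90(kpts, (rh, rw))
            pts = np.where(missing, np.nan, kpts_portrait)
            lf.instances.append(sio.Instance.from_numpy(pts, skeleton))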
        labels.append(lf)

cap.release()

# Save outputs
slp_path = f"{output_dir}/{video_name}_pose.slp"
json_path = f"{output_dir}/{video_name}_ultralytics_results.json"
pkl_path = f"{output_dir}/{video_name}_ultralytics_results.pkl"

labels.save(slp_path)
with open(json_path, 'w') as f:
    json.dump(ultra_results, f, indent=2)
with open(pkl_path, 'wb') as f:
    pickle.dump(ultra_results, f)

print(f"\nDone! Portrait camera outputs saved:")
print(f"  SLEAP:  {slp_path}")
print(f"  JSON:   {json_path}")
print(f"  Pickle: {pkl_path}")

In [None]:
ultra_results[0]

In [None]:
# ============================================================
# STEP 11: Batch process multiple camera videos
# ============================================================

# Define all your video paths (landscape cameras only — process portrait cameras with Step 10)
video_paths = [
    "PUT_CAM_A_VIDEO_HERE.mp4",
    "PUT_CAM_B_VIDEO_HERE.mp4",
    "PUT_CAM_C_VIDEO_HERE.mp4",
    # "PUT_CAM_D_VIDEO_HERE.mp4",  # portrait — use Step 10 instead
    "PUT_CAM_E_VIDEO_HERE.mp4",
    "PUT_CAM_F_VIDEO_HERE.mp4",
]

output_dir = "pose_results"
os.makedirs(output_dir, exist_ok=True)

model = YOLO("yolov8n-pose.pt")

for video_path in video_paths:
    video_name = Path(video_path).stem
    print(f"\n{'='*60}")
    print(f"Processing: {video_name}")
    print(f"{'='*60}")
    
    video_obj = sio.Video(filename=video_path)
    labels = sio.Labels(videos=[video_obj], skeletons=[skeleton])
    ultralytics_results = []
    
    cap = cv2.VideoCapture(video_path)
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    vid_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    vid_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    cap.release()
    
    print(f"  Resolution: {vid_w}x{vid_h}")
    
    results = model.predict(video_path, stream=True, verbose=False)
    
    for frame_idx, result in tqdm(enumerate(results), total=total_frames):
        frame_result = {
            'frame_idx': frame_idx,
            'keypoints': result.keypoints.xy.cpu().numpy().tolist()
                if hasattr(result, 'keypoints') and result.keypoints is not None else [],
            'keypoints_conf': result.keypoints.conf.cpu().numpy().tolist()
                if hasattr(result, 'keypoints') and result.keypoints is not None
                and getattr(result.keypoints, 'conf', None) is not None else [],
            'boxes': result.boxes.xyxy.cpu().numpy().tolist()
                if hasattr(result, 'boxes') and result.boxes is not None else [],
            'boxes_conf': result.boxes.conf.cpu().numpy().tolist()
                if hasattr(result, 'boxes') and result.boxes is not None else [],
            'boxes_cls': result.boxes.cls.cpu().numpy().tolist()
                if hasattr(result, 'boxes') and result.boxes is not None else []
        }
        ultralytics_results.append(frame_result)
        
        labeled_frame = sio.LabeledFrame(video=video_obj, frame_idx=frame_idx)
        detections = result.keypoints.xy.cpu().numpy() \
            if hasattr(result, 'keypoints') and result.keypoints is not None else []
        
        if len(detections) > 0:
            for keypoints in detections:
                points_arr = np.zeros((len(keypoints), 2))
                for kp_idx, kp in enumerate(keypoints):
                    x, y = float(kp[0]), float(kp[1])
                    points_arr[kp_idx] = [np.nan, np.nan] if (x == 0 and y == 0) else [x, y]
                labeled_frame.instances.append(sio.Instance.from_numpy(points_arr, skeleton))
            labels.append(labeled_frame)
    
    # Save
    labels.save(f"{output_dir}/{video_name}_pose.slp")
    with open(f"{output_dir}/{video_name}_ultralytics_results.json", 'w') as f:
        json.dump(ultralytics_results, f, indent=2)
    with open(f"{output_dir}/{video_name}_ultralytics_results.pkl", 'wb') as f:
        pickle.dump(ultralytics_results, f)
    
    print(f"  Saved {len(ultralytics_results)} frames")

print("\nAll cameras processed!")

In [None]:
# ============================================================
# STEP 12: Visualize one synchronized frame across all cameras
# ============================================================
# Since the videos are time-aligned, frame 0 in each video
# corresponds to the same moment in time. This grid shows what
# all 6 cameras see simultaneously.

import matplotlib.pyplot as plt
from ultralytics import YOLO
import cv2

model_viz = YOLO("yolov8n-pose.pt")

# All camera videos (update these to match your video_paths from Steps 9-11)
all_camera_paths = [
    "PUT_CAM_A_VIDEO_HERE.mp4",
    "PUT_CAM_B_VIDEO_HERE.mp4",
    "PUT_CAM_C_VIDEO_HERE.mp4",
    "PUT_CAM_D_VIDEO_HERE.mp4",
    "PUT_CAM_E_VIDEO_HERE.mp4",
    "PUT_CAM_F_VIDEO_HERE.mp4",
]
camera_labels = ["CAM A", "CAM B", "CAM C", "CAM D (portrait)", "CAM E", "CAM F"]
portrait_indices = {3}  # set of indices for portrait-oriented cameras

target_frame = 0  # all videos are aligned, so frame 0 = same moment

fig, axes = plt.subplots(2, 3, figsize=(24, 12))
axes = axes.flatten()

for i, (vpath, label) in enumerate(zip(all_camera_paths, camera_labels)):
    cap = cv2.VideoCapture(vpath)
    cap.set(cv2.CAP_PROP_POS_FRAMES, target_frame)
    ret, frame = cap.read()
    cap.release()
    
    if not ret:
        axes[i].set_title(f"{label} — FAILED TO READ", fontsize=12)
        axes[i].axis("off")
        continue
    
    # For portrait cameras, rotate before inference
    if i in portrait_indices:
        frame_for_model = cv2.rotate(frame, cv2.ROTATE_90_CLOCKWISE)
    else:
        frame_for_model = frame
    
    res = model_viz.predict(frame_for_model, verbose=False)[0]
    annotated = cv2.cvtColor(res.plot(), cv2.COLOR_BGR2RGB)
    
    # For portrait cameras, rotate the annotated image back for display
    if i in portrait_indices:
        annotated = cv2.rotate(annotated, cv2.ROTATE_90_COUNTERCLOCKWISE)
    
    axes[i].imshow(annotated)
    n_people = len(res.boxes)
    axes[i].set_title(f"{label} — {n_people} people", fontsize=12)
    axes[i].axis("off")

fig.suptitle("Synchronized Frame 0 — All Cameras with Pose Detection", fontsize=15, y=1.01)
plt.tight_layout()
plt.show()