CS 585 — Final Project
Sean Tomany, Jonah Rothman, Jigar Kanakhara, Bhavya Bavishi, Harsha Basavaraj Beth
Multi-view 3D pose tracking system that reconstructs 3D human joint positions from calibrated cameras using only classical methods, with no neural networks as core contributions. We fork the Varun-Tandon14 reimplementation of Chen et al. (2020) and add four key upgrades:
- IRLS Robust Triangulation — Iteratively Reweighted Least Squares with Huber/Tukey loss on reprojection residuals, incorporating per-joint confidence scores as prior weights
- Uncertainty-Aware Matching Affinity — Mahalanobis-like cost that maps detection confidence to per-joint variances, so high-confidence joints dominate the matching
- Kalman Filter Temporal Smoothing — Constant-velocity Kalman filter per joint with measurement covariances from IRLS residuals
- Skeleton Visualization + Exercise Feedback — Bone connections, joint angle computation (knee flexion, hip hinge, trunk tilt), and threshold-based form deviation flags
The repository also includes a single-camera exercise feedback demo that uses MediaPipe for real-time pose estimation on video files or a webcam.
All configurations achieve at least 99.9% PCP on clean Campus data (0.999 for the full pipeline, 1.000 for the others). The improvements show under stress:
| Method | Clean | Outlier 10% | Outlier 20% | Occlusion 20% | Limb Drop 20% |
|---|---|---|---|---|---|
| Baseline | 1.000 | 0.663 | 0.429 | 0.254 | 0.753 |
| IRLS (Huber) | 1.000 | 0.701 | 0.475 | 0.758 | 0.982 |
| IRLS + Uncertainty | 1.000 | 0.701 | 0.475 | 0.756 | 0.984 |
| Full (+ Kalman) | 0.999 | 0.724 | 0.497 | 0.827 | 0.989 |
Values are average PCP across all bone groups. Full per-bone breakdowns are in results/full_comparison.json.
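For readers unfamiliar with the metric, here is a minimal sketch of PCP (Percentage of Correct Parts) under its commonly used 3D definition: a bone counts as correct when the mean endpoint error is at most half the ground-truth bone length. The function names, the `alpha` parameter, and the array layout are illustrative assumptions; the repository's `evaluate.py` may differ in details.

```python
import numpy as np

def limb_correct(pred_start, pred_end, gt_start, gt_end, alpha=0.5):
    """Return True if one bone passes the PCP test (assumed standard definition)."""
    pred_start, pred_end, gt_start, gt_end = (
        np.asarray(p, dtype=float) for p in (pred_start, pred_end, gt_start, gt_end)
    )
    # Mean distance of the two predicted endpoints from their ground-truth positions.
    endpoint_err = 0.5 * (
        np.linalg.norm(pred_start - gt_start) + np.linalg.norm(pred_end - gt_end)
    )
    limb_length = np.linalg.norm(gt_start - gt_end)
    return bool(endpoint_err <= alpha * limb_length)

def pcp(pred_joints, gt_joints, bones, alpha=0.5):
    """Fraction of bones judged correct; `bones` is a list of (joint_a, joint_b) indices."""
    pred_joints = np.asarray(pred_joints, dtype=float)
    gt_joints = np.asarray(gt_joints, dtype=float)
    hits = [
        limb_correct(pred_joints[a], pred_joints[b], gt_joints[a], gt_joints[b], alpha)
        for a, b in bones
    ]
    return float(np.mean(hits))
```

Averaging this score over bone groups and frames would give numbers comparable in spirit to the table above.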
Key takeaways:
- IRLS provides massive gains under occlusion (25% → 76% PCP) by down-weighting outlier cameras
- Kalman smoothing further improves occlusion robustness (76% → 83%) through temporal consistency
- Limb drop resilience jumps from 75% to 98% with IRLS
- Python 3.10+
- macOS / Linux (Windows should work, but paths are untested)
git clone https://github.com/seantomany/585Project.git
cd 585Project
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Download the Campus dataset and extract it:
- Download from Google Drive
- Extract into dataset/Campus_Seq1/
The directory should contain:
dataset/Campus_Seq1/
├── calibration.json
├── annotation_2d.json
├── annotation_3d.json
├── detection.json
└── frames/
├── Camera0/
├── Camera1/
└── Camera2/
# Baseline (linear triangulation, uniform affinity)
python -m src.run_pipeline --dataset dataset/Campus_Seq1 --method baseline
# IRLS robust triangulation (Huber loss)
python -m src.run_pipeline --dataset dataset/Campus_Seq1 --method irls_huber
# IRLS + uncertainty-aware affinity
python -m src.run_pipeline --dataset dataset/Campus_Seq1 --method irls_uncertainty
# Full pipeline (IRLS + uncertainty + Kalman filter)
python -m src.run_pipeline --dataset dataset/Campus_Seq1 --method full
# Run ALL methods
python -m src.run_pipeline --dataset dataset/Campus_Seq1 --method all
# Run all methods under all perturbations
python -m src.run_pipeline --dataset dataset/Campus_Seq1 --method all --robustness all
# Single perturbation
python -m src.run_pipeline --dataset dataset/Campus_Seq1 --method full --robustness outlier_10pct
Available perturbations: outlier_5pct, outlier_10pct, outlier_20pct, occlusion_10pct, occlusion_20pct, limb_drop_20pct, time_delay
# Save a 3D skeleton plot
python -m src.run_pipeline --dataset dataset/Campus_Seq1 --method full --visualize
# Save 3D animation
python -m src.run_pipeline --dataset dataset/Campus_Seq1 --method full --animate
Uses MediaPipe Pose to analyze exercise form from a video file or webcam:
# Process a video file (model downloads automatically on first run)
python -m src.demo_single_cam --input my_squat.mp4 --output results/annotated.mp4
# Live webcam
python -m src.demo_single_cam --input webcam --display
# Both save and display
python -m src.demo_single_cam --input my_squat.mp4 --output results/annotated.mp4 --display
The demo overlays skeleton bone connections, joint angle readouts (knee flexion, hip hinge, trunk tilt), and real-time form deviation warnings.
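As an illustration of how a joint angle readout and a threshold-based flag can be computed, here is a minimal sketch. It is not the code in `src/visualization.py`: the function names and the threshold values in the usage note are hypothetical. The angle returned is the interior angle at the middle joint (180° for a fully straight limb).

```python
import numpy as np

def joint_angle_deg(a, b, c):
    """Interior angle at joint b (degrees) formed by the segments b->a and b->c."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    v1, v2 = a - b, c - b
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip guards against round-off pushing the cosine slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def flag_deviation(angle_deg, lo, hi):
    """Return True when the angle falls outside the allowed [lo, hi] range."""
    return not (lo <= angle_deg <= hi)
```

For a knee, one would pass the hip, knee, and ankle positions, for example `flag_deviation(joint_angle_deg(hip, knee, ankle), 60.0, 180.0)`, where the 60° lower bound is only a placeholder.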
585Project/
├── src/ # Modular pipeline code
│ ├── config.py # All constants and hyperparameters
│ ├── triangulation.py # Linear baseline + IRLS robust triangulation
│ ├── affinity.py # Uniform + uncertainty-aware matching affinity
│ ├── tracking.py # Main pipeline loop (iterative camera processing)
│ ├── kalman.py # Constant-velocity Kalman filter per joint
│ ├── helpers.py # Velocity estimation, utility functions
│ ├── visualization.py # Skeleton rendering, joint angles, form flags
│ ├── evaluation.py # PCP evaluation + tracking metrics (MOTA)
│ ├── robustness.py # Outlier injection, occlusion sim, time delay
│ ├── run_pipeline.py # CLI entry point for the full pipeline
│ └── demo_single_cam.py # Single-camera exercise feedback (MediaPipe)
├── dataset/
│ └── crossview_dataset/ # Camera, calibration, and data loading code
│ ├── calib/
│ │ ├── camera.py # Camera class (projection, backprojection)
│ │ └── calibration.py # Calibration (triangulation, epipolar geometry)
│ ├── data_utils.py # FrameLoader, Pose2DLoader, Pose3DLoader
│ └── visualization/ # Original OpenCV/vispy visualization
├── bip_solver.py # GLPK graph partitioning for clustering
├── evaluate.py # Official PCP evaluation script (longcw)
├── display.py # Official visualization script (longcw)
├── cross_view_tracking_for_3d_pose_estimation.ipynb # Original baseline notebook
├── results/ # Output: comparison tables, plots, videos
│ ├── full_comparison.json # PCP results across all methods/perturbations
│ ├── summary.json # Latest run summary
│ ├── squat_feedback.mp4 # Exercise feedback demo output
│ └── *.png # Skeleton visualization frames
└── requirements.txt
Replaces the baseline SVD-on-DLT triangulation with iteratively reweighted least squares:
- Initialize with standard linear triangulation
- Compute per-camera reprojection residuals
- Assign robust weights via Huber or Tukey bisquare loss function
- Incorporate detection confidence scores as prior weights:
  w̃ ∝ (s + ε) × w_irls
- Re-solve the weighted system
- Repeat until convergence
Every decision is explainable — for any joint, you can inspect which cameras were trusted vs. down-weighted and the exact residual that triggered it.
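The loop described above can be sketched for a single joint as follows. This is a minimal illustration with Huber weights, not the code in `src/triangulation.py`: the function names, the `delta`, `eps`, and iteration defaults, and the use of square-root weights on the DLT rows are all assumptions.

```python
import numpy as np

def triangulate_dlt(points_2d, proj_mats, weights=None):
    """Weighted linear (DLT) triangulation of one joint from N views."""
    w = np.ones(len(proj_mats)) if weights is None else np.asarray(weights, dtype=float)
    rows = []
    for (u, v), P, wi in zip(points_2d, proj_mats, w):
        s = np.sqrt(wi)  # least squares weights the squared residual, so rows get sqrt(w)
        rows.append(s * (u * P[2] - P[0]))
        rows.append(s * (v * P[2] - P[1]))
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]  # right singular vector of the smallest singular value
    return X[:3] / X[3]

def reprojection_residuals(X, points_2d, proj_mats):
    """Pixel distance between each observation and the reprojection of X."""
    Xh = np.append(X, 1.0)
    res = []
    for uv, P in zip(points_2d, proj_mats):
        p = P @ Xh
        res.append(np.linalg.norm(p[:2] / p[2] - uv))
    return np.array(res)

def huber_weights(r, delta):
    """IRLS weights for the Huber loss: 1 inside delta, delta / r outside."""
    return np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))

def triangulate_irls(points_2d, proj_mats, scores, delta=5.0, eps=1e-3,
                     max_iter=20, tol=1e-6):
    """Return the robust 3D point and the final per-camera weights."""
    points_2d = np.asarray(points_2d, dtype=float)
    prior = np.asarray(scores, dtype=float) + eps  # confidence prior (s + eps)
    X = triangulate_dlt(points_2d, proj_mats)      # linear initialization
    w = prior.copy()
    for _ in range(max_iter):
        r = reprojection_residuals(X, points_2d, proj_mats)
        w = prior * huber_weights(r, delta)        # combined weight per camera
        X_new = triangulate_dlt(points_2d, proj_mats, w)
        converged = np.linalg.norm(X_new - X) < tol
        X = X_new
        if converged:
            break
    return X, w
```

The returned weights are what makes the result inspectable: a camera with a small final weight is one whose residual exceeded `delta` or whose detection score was low.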
Maps detection confidence scores to per-joint variances and uses Mahalanobis-like distances:
- Variance mapping: σ²_k = 1/(s_k + ε) — low confidence → high variance
- Matching cost: C(i,j) = Σ_k ||u_ik - u_jk||² / (σ²_ik + σ²_jk), summed over joints k of detections i and j
- High-confidence joints dominate; uncertain joints (occluded wrists, elbows) contribute less
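A minimal sketch of the variance mapping and matching cost described above, assuming both poses are given as `(K, 2)` arrays of 2D joint positions in the same image plane with matching `(K,)` score arrays. The function names and the `eps` default are illustrative, not taken from `src/affinity.py`.

```python
import numpy as np

def confidence_to_variance(scores, eps=1e-3):
    """Map detection confidence to a per-joint variance: low score, high variance."""
    return 1.0 / (np.asarray(scores, dtype=float) + eps)

def uncertainty_cost(pose_i, pose_j, scores_i, scores_j, eps=1e-3):
    """Mahalanobis-like cost between two 2D poses with per-joint variances."""
    pose_i = np.asarray(pose_i, dtype=float)
    pose_j = np.asarray(pose_j, dtype=float)
    sq_dist = np.sum((pose_i - pose_j) ** 2, axis=1)  # squared distance per joint
    var = confidence_to_variance(scores_i, eps) + confidence_to_variance(scores_j, eps)
    return float(np.sum(sq_dist / var))
```

With this form, the same pixel disagreement costs less when either detection is uncertain, which is how occluded wrists and elbows end up contributing less to the match.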
Constant-velocity Kalman filter per 3D joint:
- State: [x, y, z, vx, vy, vz]
- Measurement noise scaled by IRLS residual magnitude (noisy triangulations trusted less)
- Replaces the baseline two-point velocity difference
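The filter described above can be sketched as follows for one joint. This is an illustrative version, not the code in `src/kalman.py`: the class name, the noise defaults, and the specific way the measurement variance grows with the squared IRLS residual are assumptions.

```python
import numpy as np

class JointKalmanCV:
    """Constant-velocity Kalman filter with state [x, y, z, vx, vy, vz]."""

    def __init__(self, x0, dt=1.0, process_var=1e-2, init_var=1.0):
        self.x = np.concatenate([np.asarray(x0, dtype=float), np.zeros(3)])
        self.P = np.eye(6) * init_var
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)          # position += velocity * dt
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])  # only position is observed
        self.Q = process_var * np.eye(6)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]

    def update(self, z, residual=0.0, base_var=1e-2, scale=1e-2):
        # Assumed mapping: measurement variance grows with the squared IRLS residual,
        # so poorly triangulated joints pull the state less.
        R = (base_var + scale * residual ** 2) * np.eye(3)
        y = np.asarray(z, dtype=float) - self.H @ self.x   # innovation
        S = self.H @ self.P @ self.H.T + R
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]
```

Per frame, one would call `predict()` and then `update()` with the triangulated joint and its residual; the velocity components of the state take the place of a two-point finite difference.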
Controlled perturbations applied to 2D detection data:
- Outlier noise: Gaussian pixel perturbations applied to a random fraction of joints
- Occlusion simulation: Zero out a random subset of joints and set their score to 0
- Limb drop: Drop entire limb groups (arm/leg) with configurable probability
- Time delay: Shift per-camera frame data to simulate unsynchronized streams
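Two of the perturbations above can be sketched as follows, assuming each detection is a `(K, 3)` array of `[x, y, score]` rows. That layout, the function names, and the parameters are illustrative assumptions rather than the interface of `src/robustness.py`.

```python
import numpy as np

def inject_outliers(joints, fraction, sigma_px, rng):
    """Add Gaussian pixel noise to a random fraction of joints; scores are untouched."""
    joints = np.array(joints, dtype=float)  # work on a copy
    mask = rng.random(joints.shape[0]) < fraction
    joints[mask, :2] += rng.normal(0.0, sigma_px, size=(int(mask.sum()), 2))
    return joints, mask

def simulate_occlusion(joints, fraction, rng):
    """Zero the coordinates and score of a random fraction of joints."""
    joints = np.array(joints, dtype=float)  # work on a copy
    mask = rng.random(joints.shape[0]) < fraction
    joints[mask] = 0.0
    return joints, mask
```

Passing a seeded generator such as `np.random.default_rng(0)` keeps a perturbed run reproducible, and the returned mask records which joints were altered.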
numpy, scipy, opencv-python, matplotlib, pandas, tqdm, cvxopt, prettytable, ipywidgets, ipympl, jupyter
For the single-camera demo: mediapipe (installed separately: pip install mediapipe)
- Chen et al. (2020) — Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS
- longcw/crossview_3d_pose_tracking — Official dataset and evaluation
- Varun-Tandon14 — Baseline reimplementation