Implementation of the GCPR19 Paper "Iterative Greedy Matching for 3D Human Pose Tracking from Multiple Views"
mv3dpose

Off-the-shelf Multiple Person Multiple View 3D Pose Estimation.


Abstract

In this work we propose an approach for estimating 3D human poses of multiple people from a set of calibrated cameras. Estimating 3D human poses from multiple views has several compelling properties: human poses are estimated within a global coordinate space, and multiple cameras provide an extended field of view which helps in resolving ambiguities, occlusions and motion blur. Our approach builds upon a real-time 2D multi-person pose estimation system and greedily solves the association problem between multiple views. We utilize bipartite matching to track multiple people over multiple frames. This proves to be especially efficient, as problems associated with greedy matching such as occlusion can be easily resolved in 3D. Our approach achieves state-of-the-art results on popular benchmarks and may serve as a baseline for future work.

Install

This project requires nvidia-docker and drivers that support CUDA 10.

Clone this repository with its submodules as follows:

git clone --recursive https://github.com/jutanke/mv3dpose.git

Usage

Your dataset must reside in a pre-defined folder structure:

  • dataset
    • dataset.json
    • cameras
      • camera00
        • frame00xxxxxxm.json
      • camera01
        • frame00xxxxxxm.json
      • ...
      • camera_n
        • frame00xxxxxxm.json
    • videos
      • camera00
        • frame00xxxxxxm.png
      • camera01
        • frame00xxxxxxm.png
      • ...
      • camera_n
        • frame00xxxxxxm.png

The file names per frame utilize the following schema:

"frame%09d.{png/json}"
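The schema above zero-pads the frame index to nine digits. A minimal sketch (the helper name is an assumption, not part of the repository) of how indices map to file names:

```python
# Sketch: mapping frame indices to file names via the "frame%09d" schema.
def frame_name(frame, ext):
    """Build a per-frame file name such as 'frame000000042.json'."""
    return "frame%09d.%s" % (frame, ext)

print(frame_name(0, "json"))   # frame000000000.json
print(frame_name(42, "png"))   # frame000000042.png
```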

The camera JSON files follow one of two structures: a simple camera with only the projection matrix plus image width and height:

{
  "P" : [ 3 x 4 ],
  "w" : int(width),
  "h" : int(height)
}
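To make the role of P concrete, here is a small sketch (not part of the repository; names are illustrative) of how a 3x4 projection matrix maps a world point to pixel coordinates:

```python
# Sketch: projecting a 3D world point with a 3x4 projection matrix P.
def project(P, point3d):
    """Project a 3D point [x, y, z] to pixel coordinates using P (3x4)."""
    x, y, z = point3d
    hom = [x, y, z, 1.0]  # homogeneous coordinates
    u, v, w = (sum(P[r][c] * hom[c] for c in range(4)) for r in range(3))
    return (u / w, v / w)  # dehomogenize

# Trivial camera: projects (x, y, z) to (x/z, y/z)
P = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0]]
print(project(P, (2.0, 4.0, 2.0)))  # (1.0, 2.0)
```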

or a more complex camera with distortion coefficients, based on the OpenCV camera model:

{
  "K" : [ 3 x 3 ], /* intrinsic parameters */
  "rvec": [ 1 x 3 ], /* rotation vector */
  "tvec": [ 1 x 3 ], /* translation vector */
  "discCoef": [ 1 x 5 ], /* distortion coefficients */
  "w" : int(width),
  "h" : int(height)
}
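Ignoring distortion, the OpenCV-style parameters relate to the simple camera via P = K [R|t], where R is obtained from the rotation vector by Rodrigues' formula. A sketch of that conversion, assuming only numpy (in practice cv2.Rodrigues and cv2.projectPoints handle this, including distortion; these helper names are assumptions, not repository code):

```python
import numpy as np

def rodrigues(rvec):
    """Rotation vector -> rotation matrix via Rodrigues' formula."""
    rvec = np.asarray(rvec, dtype=np.float64).reshape(3)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta  # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def to_projection_matrix(K, rvec, tvec):
    """Compose the 3x4 projection matrix P = K [R|t] (distortion ignored)."""
    R = rodrigues(rvec)
    t = np.asarray(tvec, dtype=np.float64).reshape(3, 1)
    return np.asarray(K, dtype=np.float64) @ np.hstack([R, t])
```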

The system expects a camera file for each view at each point in time. If your dataset uses fixed cameras, simply repeat the same calibration for all frames.
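Repeating a fixed calibration can be scripted in a few lines. A sketch (the helper is hypothetical, not part of the repository):

```python
import json
import os

def replicate_camera(camera, out_dir, frames):
    """Write the same camera calibration once per frame, as the system expects."""
    os.makedirs(out_dir, exist_ok=True)
    for f in frames:
        with open(os.path.join(out_dir, "frame%09d.json" % f), "w") as fp:
            json.dump(camera, fp)

cam = {"P": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]], "w": 1920, "h": 1080}
# replicate_camera(cam, "dataset/cameras/camera00", range(100))
```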

The dataset.json file contains general information for the model:

{
  "n_cameras": int(#cameras), /* number of cameras */
  "scale_to_mm": 1 /* scales the calibration to mm */
}

The variable scale_to_mm is needed because we operate in [mm] but calibrations might be in other units. For example, when the calibration is done in meters, scale_to_mm must be set to 1000.

Optional parameters

  • valid_frames: if frames do not start at 0 and/or are not continuous, provide an explicit list of frame numbers here
  • epi_threshold: epipolar line distance threshold in pixels
  • max_distance_between_tracks: maximum distance in [mm] between tracks for them to be associated
  • min_track_length: drop any track shorter than min_track_length frames
  • last_seen_delay: number of frames a lost track may skip before being reconnected
  • smoothing_sigma: sigma value for Gaussian smoothing of tracks
  • smoothing_interpolation_range: how far fill-ins (interpolated poses) may reach
  • do_smoothing: whether smoothing is applied at all (default: True)
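Putting the required and optional parameters together, a dataset.json might look as follows (the optional values shown here are illustrative, not recommendations):

{
  "n_cameras": 4,
  "scale_to_mm": 1000,
  "valid_frames": [100, 101, 102, 103],
  "epi_threshold": 40,
  "min_track_length": 10,
  "do_smoothing": true
}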

Run the system

./mvpose.sh /path/to/your/dataset

The resulting tracks will be written to the tracks3d folder inside your dataset directory; each track represents a single person. The files are organised as follows:

{
  "J": int(joint number), /* number of joints */
  "frames": [int, int, ...], /* ordered list of frames in which this track is present */
  "poses": [ n_frames x J x 3 ] /* 3D poses; each joint is a 3D location or None if missing */
}
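A small sketch of consuming that output (the reader below is hypothetical and assumes the structure shown above; note that missing joints are stored as null in JSON, which loads as None in Python):

```python
import glob
import json
import os

def load_tracks(dataset_dir):
    """Load all per-person track files from <dataset_dir>/tracks3d."""
    tracks = []
    for path in sorted(glob.glob(os.path.join(dataset_dir, "tracks3d", "*.json"))):
        with open(path) as fp:
            tracks.append(json.load(fp))
    return tracks

def valid_joints(track):
    """Count joints that were actually reconstructed (not None)."""
    return sum(1 for pose in track["poses"]
               for joint in pose if joint is not None)
```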