This repository was archived by the owner on Oct 31, 2023 and is now read-only.

We create D3D-HOI, a dataset of monocular videos with ground-truth annotations of 3D object pose and part motion during human-object interactions.





D3D-HOI: Dynamic 3D Human-Object Interactions from Videos

Xiang Xu, Hanbyul Joo, Greg Mori, Manolis Savva.

[arXiv] [Bibtex]

D3D-HOI Video Dataset

Data Statistics

The D3D-HOI video dataset contains a total of 256 videos across 8 categories. Every frame is annotated with the object rotation, translation, size, and part motion in 3D. The per-category breakdown is given below.

Download D3D-HOI dataset

Download original videos

Data Layout

The structure of the D3D-HOI video dataset is as follows:


The file 3d_info.txt contains the object's global rotation, global translation, and real-world dimensions (in cm) in the rest state. It also provides the ground-truth CAD model ID, the interacting part ID, and the estimated camera focal lengths.
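For illustration, an estimated focal length such as the one stored in 3d_info.txt can be assembled into a pinhole intrinsics matrix. This is a minimal sketch: the function name, the principal-point-at-image-center assumption, and the placeholder values below are ours, not part of the dataset format.

```python
import numpy as np

def intrinsics_from_focal(focal_length, img_w, img_h):
    """Build a simple pinhole intrinsics matrix K, assuming the
    principal point lies at the image center."""
    return np.array([
        [focal_length, 0.0, img_w / 2.0],
        [0.0, focal_length, img_h / 2.0],
        [0.0, 0.0, 1.0],
    ])

# Placeholder values for illustration only.
K = intrinsics_from_focal(1000.0, 1920, 1080)
```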

The file jointstate.txt stores the ground-truth per-frame part motion, measured in degrees for revolute joints or in centimeters for prismatic joints.
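A file like this can be read with a few lines of Python. Note this is a sketch under an assumed layout (one numeric value per line, one line per frame); the actual formatting of jointstate.txt may differ.

```python
def load_joint_states(path):
    """Read per-frame part-motion values, assuming one number per line
    (degrees for a revolute joint, cm for a prismatic joint)."""
    values = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                values.append(float(line))
    return values
```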

Original images are in the frame folder. Per-frame object masks are in the gt_mask folder. Estimated 2D SMPL vertex coordinates are in the smplv2d folder. Estimated 3D SMPL vertices (after orthographic projection) are in the smplmesh folder. Estimated 3D SMPL joints are in the joints3d folder. We use a pretrained model from EFT to estimate the SMPL parameters.
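The orthographic projection mentioned above can be sketched as follows, assuming a weak-perspective camera with a uniform scale and a 2D translation; the actual EFT camera parameters and conventions are not reproduced here.

```python
import numpy as np

def orthographic_project(verts3d, scale=1.0, trans=(0.0, 0.0)):
    """Project Nx3 vertices to Nx2 by dropping depth, then applying
    a weak-perspective scale and 2D translation."""
    verts3d = np.asarray(verts3d, dtype=float)
    xy = verts3d[:, :2]          # drop the z (depth) coordinate
    return scale * xy + np.asarray(trans, dtype=float)

pts = orthographic_project([[1.0, 2.0, 5.0], [0.0, -1.0, 3.0]],
                           scale=2.0, trans=(0.5, 0.5))
```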

SAPIEN PartNet-Mobility Models

Pre-processing the Data

Due to legal issues, we cannot directly redistribute the post-processed data. Instead, we provide the 24 CAD IDs together with the motion parameters used in our paper here. Please refer to the preprocess folder for instructions on running the CAD processing code for PartNet-Mobility models.

Data Layout

After post-processing, the structure of the CAD model folder should be as follows:


The file motion.json contains the ground-truth rotation or translation axis origin and direction (in canonical space). It also provides the motion range and motion type (revolute or prismatic). The commonly interacted object vertices are also stored in the contact list.
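A revolute motion of the kind described by motion.json can be sketched with Rodrigues' rotation formula. The function and argument names below are ours for illustration; the actual JSON field names are not reproduced here.

```python
import numpy as np

def rotate_about_axis(verts, origin, direction, angle_deg):
    """Rotate Nx3 vertices about an axis given by an origin point and a
    direction vector, by angle_deg degrees (Rodrigues' rotation formula)."""
    verts = np.asarray(verts, dtype=float)
    origin = np.asarray(origin, dtype=float)
    k = np.asarray(direction, dtype=float)
    k = k / np.linalg.norm(k)          # ensure unit axis direction
    theta = np.deg2rad(angle_deg)
    v = verts - origin                 # move axis origin to (0, 0, 0)
    rotated = (v * np.cos(theta)
               + np.cross(k, v) * np.sin(theta)
               + np.outer(v @ k, k) * (1.0 - np.cos(theta)))
    return rotated + origin            # move back to the original frame
```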

The file motion.gif visualizes all possible motions of the model.

Canonical object part meshes are stored in the final folder.



Installation

We recommend using a conda environment:

conda create -n d3dhoi python=3.8
conda activate d3dhoi
pip install -r requirements.txt

Install the PyTorch version that corresponds to your version of CUDA; e.g., for CUDA 11.0, use:

conda install -c pytorch pytorch=1.7.0 torchvision cudatoolkit=11.0

To install PyTorch3D, use:

conda install -c conda-forge -c fvcore fvcore
conda install pytorch3d=0.3.0 -c pytorch3d

Setting up External Dependencies

Compile the mesh-fusion libraries before running the preprocessing code.

Running the Code

Refer to the optimization folder for instructions on running the optimization and evaluation code. You can download our optimized results; the zip file also provides scripts for reproducing the results for each category.

More optimization results are available here.

You can also use the code in the visualization folder to explore the annotated dataset.


License

Our code is released under CC-BY-NC 4.0; see the LICENSE file. However, our code depends on other libraries, including SMPL, each of which has its own license that must also be followed.


Acknowledgements

We use mesh-fusion to process the PartNet-Mobility dataset, and EFT to estimate the SMPL parameters.

Citing D3D-HOI

If you find this code helpful, please consider citing:

@article{xu2021d3dhoi,
  title={D3D-HOI: Dynamic 3D Human-Object Interactions from Videos},
  author={Xiang Xu and Hanbyul Joo and Greg Mori and Manolis Savva},
  journal={arXiv preprint arXiv:2108.08420},
  year={2021}
}

