Learning 3D Dynamic Scene Representations for Robot Manipulation

Zhenjia Xu1*, Zhanpeng He1*, Jiajun Wu2, Shuran Song1
1Columbia University, 2Stanford University
CoRL 2020

Project Page | Video | arXiv


This repo contains the PyTorch implementation for the paper "Learning 3D Dynamic Scene Representations for Robot Manipulation".



The code is built with Python 3.6. Libraries are listed in requirements.txt.

Data Preparation

Download Testing Data

The following two testing datasets can be downloaded.

  • Sim: 400 sequences, generated in pybullet.
  • Real: 150 sequences, with full annotations.

Generate Training Data

Download the object meshes: shapenet and ycb.

To generate data in simulation, one can run

python --data_path [path to data] --train_num [number of training sequences] --test_num [number of testing sequences] --object_type [type of objects]

where object_type can be cube, shapenet, or ycb. The training data in the paper can be generated with the following scripts:

# cube
python --data_path data/cube_train --train_num 4000 --test_num 400 --object_type cube

# shapenet
python --data_path data/shapenet_train --train_num 4000 --test_num 400 --object_type shapenet

Pretrained Models

Some of the pretrained models can be downloaded from pretrained_models. To evaluate the pretrained models, one can run

python --resume [path to model] --data_path [path to data] --model_type [type of model] --test_type [type of test]

where model_type can be one of the following:

  • dsr: DSR-Net introduced in the paper.
  • single: It does not use any history aggregation.
  • nowarp: It does not warp the representation before aggregation.
  • gtwarp: It warps the representation with ground truth motion (i.e., performance oracle).
  • 3dflow: It predicts per-voxel scene flow for the entire 3D volume.
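The difference between the aggregation variants can be illustrated with a toy example. The sketch below is not the DSR-Net implementation: it assumes a 1-D occupancy grid, an integer shift as the "motion", and an element-wise max as the aggregation, all of which are simplifications chosen for illustration (the names `warp`, `aggregate`, and `step` are hypothetical). The 3dflow variant is left out because, per the list above, it differs in how motion is predicted rather than in how history is aggregated.

```python
def warp(history, motion):
    """Shift a 1-D grid by an integer motion; cells shifted out of range are dropped."""
    n = len(history)
    warped = [0] * n
    for i, value in enumerate(history):
        j = i + motion
        if 0 <= j < n:
            warped[j] = value
    return warped


def aggregate(observation, history):
    """Toy aggregation (assumption): element-wise max of observation and history."""
    return [max(o, h) for o, h in zip(observation, history)]


def step(model_type, observation, history, predicted_motion, gt_motion):
    """Return the updated representation for one time step under each variant."""
    if model_type == "single":
        # No history aggregation: only the current observation is used.
        return list(observation)
    if model_type == "nowarp":
        # History is aggregated without being warped first.
        return aggregate(observation, history)
    if model_type == "dsr":
        # History is warped with the predicted motion before aggregation.
        return aggregate(observation, warp(history, predicted_motion))
    if model_type == "gtwarp":
        # History is warped with the ground truth motion (oracle).
        return aggregate(observation, warp(history, gt_motion))
    raise ValueError("unsupported model_type: %s" % model_type)


# An object occupied cells 1-2, then moved by +2; only cell 3 is visible now.
history = [0, 1, 1, 0, 0]
observation = [0, 0, 0, 1, 0]
for name in ("single", "nowarp", "dsr", "gtwarp"):
    print(name, step(name, observation, history, predicted_motion=2, gt_motion=2))
```

In this toy setting, only the variants that warp the history recover the occluded cell at its new location, while nowarp leaves stale occupancy at the old location.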

Both motion prediction and mask prediction can be evaluated by choosing different test_type:

  • motion prediction: motion_visible or motion_full
  • mask prediction: mask_ordered or mask_unordered

(Please refer to our paper for a detailed explanation of each type of evaluation.)

Here are several examples:

# evaluate mask prediction (ordered) of DSR-Net using real data:
python --resume [path to dsr model] --data_path [path to real data] --model_type dsr --test_type mask_ordered

# evaluate mask prediction (unordered) of DSR-Net (finetuned) using real data:
python --resume [path to dsr_ft model] --data_path [path to real data] --model_type dsr --test_type mask_unordered

# evaluate motion prediction (visible surface) of NoWarp model using sim data:
python --resume [path to nowarp model] --data_path [path to sim data] --model_type nowarp --test_type motion_visible

# evaluate motion prediction (full volume) of SingleStep model using sim data:
python --resume [path to single model] --data_path [path to sim data] --model_type single --test_type motion_full
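When evaluating several models, it may help to assemble the command lines programmatically and reject unsupported values early. The following is a minimal sketch: the allowed `model_type` and `test_type` values are taken from the lists above, while the helper name `build_eval_command`, the script name `eval_script.py`, and the paths in the usage lines are hypothetical placeholders, not names from this repository.

```python
import shlex

# Allowed values as listed in this README.
MODEL_TYPES = {"dsr", "single", "nowarp", "gtwarp", "3dflow"}
TEST_TYPES = {"motion_visible", "motion_full", "mask_ordered", "mask_unordered"}


def build_eval_command(script, resume, data_path, model_type, test_type):
    """Return the evaluation command as an argument list, validating the choices."""
    if model_type not in MODEL_TYPES:
        raise ValueError("unknown model_type: %s" % model_type)
    if test_type not in TEST_TYPES:
        raise ValueError("unknown test_type: %s" % test_type)
    return [
        "python", script,
        "--resume", resume,
        "--data_path", data_path,
        "--model_type", model_type,
        "--test_type", test_type,
    ]


# Hypothetical script name and paths, for illustration only.
cmd = build_eval_command(
    "eval_script.py", "ckpt/dsr.pth", "data/real", "dsr", "mask_ordered"
)
print(" ".join(shlex.quote(part) for part in cmd))
```

The argument list can be passed to `subprocess.run` directly, which avoids shell quoting issues with paths that contain spaces.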


Various training options can be modified or toggled on/off with different flags (run python -h to see all options):

usage: [-h] [--exp EXP] [--gpus GPUS [GPUS ...]] [--resume RESUME]
                [--data_path DATA_PATH] [--object_num OBJECT_NUM]
                [--seq_len SEQ_LEN] [--batch BATCH] [--workers WORKERS]
                [--model_type {dsr,single,nowarp,gtwarp,3dflow}]
                [--transform_type {affine,se3euler,se3aa,se3spquat,se3quat}]
                [--alpha_motion ALPHA_MOTION] [--alpha_mask ALPHA_MASK]
                [--snapshot_freq SNAPSHOT_FREQ] [--epoch EPOCH] [--finetune]
                [--seed SEED] [--dist_backend DIST_BACKEND]
                [--dist_url DIST_URL]
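For readers who want to see how such an interface is typically declared, here is a sketch of an `argparse` parser reconstructed from the usage text above. Only the flag names and the two choice lists come from that text. The argument types are assumptions, and no defaults are set because the usage text does not show them, so the actual training script may differ.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags in the usage text (types are assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--exp", type=str)
    parser.add_argument("--gpus", type=int, nargs="+")
    parser.add_argument("--resume", type=str)
    parser.add_argument("--data_path", type=str)
    parser.add_argument("--object_num", type=int)
    parser.add_argument("--seq_len", type=int)
    parser.add_argument("--batch", type=int)
    parser.add_argument("--workers", type=int)
    parser.add_argument(
        "--model_type",
        choices=["dsr", "single", "nowarp", "gtwarp", "3dflow"],
    )
    parser.add_argument(
        "--transform_type",
        choices=["affine", "se3euler", "se3aa", "se3spquat", "se3quat"],
    )
    parser.add_argument("--alpha_motion", type=float)
    parser.add_argument("--alpha_mask", type=float)
    parser.add_argument("--snapshot_freq", type=int)
    parser.add_argument("--epoch", type=int)
    parser.add_argument("--finetune", action="store_true")
    parser.add_argument("--seed", type=int)
    parser.add_argument("--dist_backend", type=str)
    parser.add_argument("--dist_url", type=str)
    return parser


# Parse an argument list shaped like the Stage 1 command in this README.
args = build_parser().parse_args(
    ["--exp", "dsr_stage1", "--seq_len", "1", "--model_type", "dsr", "--epoch", "30"]
)
print(args.exp, args.seq_len, args.model_type, args.epoch, args.finetune)
```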

Training of DSR-Net

Since the aggregation ability depends on the accuracy of motion prediction, we split the training process into three stages from easy to hard: (1) single-step on cube dataset; (2) multi-step on cube dataset; (3) multi-step on ShapeNet dataset.

# Stage 1 (single-step on cube dataset)
python --exp dsr_stage1 --data_path [path to cube dataset] --seq_len 1 --model_type dsr --epoch 30

# Stage 2 (multi-step on cube dataset)
python --exp dsr_stage2 --resume [path to stage1] --data_path [path to cube dataset] --seq_len 10 --model_type dsr --epoch 20 --finetune

# Stage 3 (multi-step on shapenet dataset)
python --exp dsr_stage3 --resume [path to stage2] --data_path [path to shapenet dataset] --seq_len 10 --model_type dsr --epoch 20 --finetune
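The three stages form a chain in which each stage resumes from the checkpoint of the previous one. A minimal sketch of that chain is shown below, with the flag values copied from the commands above. The helper name `stage_commands`, the script name `train_script.py`, and all paths are hypothetical placeholders, since the script name and checkpoint locations are not spelled out here.

```python
import shlex


def stage_commands(script, cube_path, shapenet_path, stage1_ckpt, stage2_ckpt):
    """Return the three training commands as argument lists, in order."""
    # (exp name, checkpoint to resume from, dataset, seq_len, epochs, finetune)
    stages = [
        ("dsr_stage1", None, cube_path, 1, 30, False),
        ("dsr_stage2", stage1_ckpt, cube_path, 10, 20, True),
        ("dsr_stage3", stage2_ckpt, shapenet_path, 10, 20, True),
    ]
    commands = []
    for exp, resume, data_path, seq_len, epoch, finetune in stages:
        cmd = ["python", script, "--exp", exp]
        if resume is not None:
            cmd += ["--resume", resume]
        cmd += [
            "--data_path", data_path,
            "--seq_len", str(seq_len),
            "--model_type", "dsr",
            "--epoch", str(epoch),
        ]
        if finetune:
            cmd.append("--finetune")
        commands.append(cmd)
    return commands


# Hypothetical script name and paths, for illustration only.
for cmd in stage_commands(
    "train_script.py",
    "data/cube_train",
    "data/shapenet_train",
    "ckpt/stage1.pth",
    "ckpt/stage2.pth",
):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the stage settings in one table makes it easy to check that only the dataset, sequence length, and epoch count change between stages.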

Training of Baselines

  • nowarp and gtwarp. Use the same scripts as DSR-Net with the corresponding model_type.

  • single and 3dflow. Two-stage training: (1) single-step on cube dataset; (2) single-step on ShapeNet dataset.


    title={Learning 3D Dynamic Scene Representations for Robot Manipulation},
    author={Xu, Zhenjia and He, Zhanpeng and Wu, Jiajun and Song, Shuran},
    booktitle={Conference on Robot Learning (CoRL)},


This repository is released under the MIT license. See LICENSE for additional details.