


ShapeFormer: Transformer-based Shape Completion via Sparse Representation


This repository is the official PyTorch implementation of our paper, ShapeFormer: Transformer-based Shape Completion via Sparse Representation.

Xingguang Yan¹, Liqiang Lin¹, Niloy Mitra², Dani Lischinski³, Danny Cohen-Or⁴, Hui Huang¹†
¹Shenzhen University, ²University College London, ³Hebrew University of Jerusalem, ⁴Tel Aviv University

News

  • Core model code is released. Please check core_code/
  • The complete code is released! Please give it a try!
  • (DFAUST) The data preprocessing code for D-FAUST human shapes is released!
  • Added a Google Colab notebook.

Installation

The code is tested in the Docker environment pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel. The following instructions set up the environment on a Linux system from scratch. You can also directly pull our provided Docker environment:

  sudo docker pull qheldiv/shapeformer

or build the Docker environment yourself with the setup files in the Docker folder.

First, clone this repository together with the submodule xgutils. xgutils contains various useful system/NumPy/PyTorch/3D-rendering functions used by ShapeFormer.

  git clone --recursive

Then, create a conda environment from the YAML file. (Conda can be very slow to resolve the complex dependencies of this environment, so mamba is highly recommended.)

  conda env create -f environment.yaml
  conda activate shapeformer

Next, install torch_scatter with this command:

  pip install torch-scatter==2.0.7 -f
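A quick way to confirm that the steps above worked is to check that torch and torch_scatter import cleanly inside the activated shapeformer environment. The helper below is a hypothetical sketch (check_environment is not part of the repository) that relies only on the standard library; it reports a version per module, None if the module is missing, or the import error if the module is installed but broken (for example, a torch_scatter wheel built for a different torch/CUDA combination).

```python
import importlib
import importlib.util


def check_environment(modules=("torch", "torch_scatter")):
    """Map each module name to its version string, or None if it is missing.

    If a module is installed but fails to import, the error message is
    returned instead of a version.
    """
    report = {}
    for name in modules:
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            module = importlib.import_module(name)
        except (ImportError, OSError) as exc:
            report[name] = f"import failed: {exc}"
            continue
        # Most packages expose __version__; fall back to "unknown" otherwise.
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```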

Demo

First, download the pretrained model from this Google Drive URL and extract the content to experiments/

Then run the following command to test VQDIF. The results are in experiments/demo_vqdif/results

  python -m shapeformer.trainer --opts configs/demo/demo_vqdif.yaml --gpu 0 --mode "run"

Run the following command to test ShapeFormer for shape completion. The results are in experiments/demo_shapeformer/results

  python -m shapeformer.trainer --opts configs/demo/demo_shapeformer.yaml --gpu 0 --mode "run"

Dataset

We use the dataset from IMNet, which is obtained from HSP.

The dataset we adopted is a 64^3 downsampled version of these datasets (which are at 256^3 resolution). Please download our processed dataset from this Google Drive URL, then extract the data to datasets/IMNet2_64/.
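Going from 256^3 to 64^3 is a factor of 4 per axis, so each coarse voxel covers a 4×4×4 block of fine voxels. The README does not say which downsampling rule was used to build IMNet2_64; the sketch below assumes a simple "any fine voxel occupied" (max-pooling) rule on boolean grids and is only meant to illustrate the index arithmetic. For reproducing results, use the provided processed data.

```python
import numpy as np


def downsample_occupancy(vox, factor=4):
    """Downsample a cubic boolean occupancy grid by an integer factor.

    A coarse voxel is occupied if any of the factor**3 fine voxels it covers is
    occupied (max pooling). This rule is an assumption for illustration only.
    """
    vox = np.asarray(vox, dtype=bool)
    res = vox.shape[0]
    if vox.shape != (res, res, res) or res % factor != 0:
        raise ValueError("expected a cubic grid whose size is divisible by factor")
    coarse = res // factor
    # Split each axis into (coarse, factor) blocks, then reduce over the block axes.
    blocks = vox.reshape(coarse, factor, coarse, factor, coarse, factor)
    return blocks.any(axis=(1, 3, 5))


if __name__ == "__main__":
    fine = np.zeros((256, 256, 256), dtype=bool)
    fine[100, 50, 255] = True            # a single occupied voxel at 256^3
    coarse = downsample_occupancy(fine)  # 256 / 4 = 64 voxels per axis
    print(coarse.shape)                  # (64, 64, 64)
    print(np.argwhere(coarse).tolist())  # [[25, 12, 63]], i.e. each index // 4
```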

To use the full-resolution dataset, first download the original IMNet and HSP datasets, then run the make_imnet_dataset function in shapeformer/data/imnet_datasets/

D-FAUST Human Dataset

We also provide scripts for processing the D-FAUST human shapes. First, download the official D-FAUST dataset from this link and extract it to datasets/DFAUST. Then execute the following lines to generate obj files and SDF samples for the human meshes.

  cd shapeformer/data/dfaust_datasets/datagen

Usage

First, train VQDIF-16 with

  python -m shapeformer.trainer --opts configs/vqdif/shapenet_res16.yaml --gpu 0

After VQDIF is trained, train ShapeFormer with

  python -m shapeformer.trainer --opts configs/shapeformer/shapenet_scale.yaml --gpu 0

For testing, just append --mode test to the above commands. If you only want to run the callbacks (such as visualization/generation), set the mode to run (--mode run).
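The three modes differ only in the --mode flag: training omits it, --mode test evaluates, and --mode run only executes the callbacks. The hypothetical helper below (build_trainer_cmd and run_trainer are illustrative names, not part of the repository) assembles the same command lines shown above so they can be scripted, for example to chain the VQDIF and ShapeFormer stages.

```python
import shlex
import subprocess
import sys


def build_trainer_cmd(config, gpus=(0,), mode=None):
    """Return the argv list for `python -m shapeformer.trainer`.

    mode=None trains (no --mode flag), "test" evaluates, and "run" only runs
    the callbacks, as described in the README.
    """
    if mode not in (None, "test", "run"):
        raise ValueError(f"unsupported mode: {mode!r}")
    cmd = [sys.executable, "-m", "shapeformer.trainer", "--opts", config, "--gpu"]
    cmd += [str(g) for g in gpus]
    if mode is not None:
        cmd += ["--mode", mode]
    return cmd


def run_trainer(config, gpus=(0,), mode=None):
    """Launch the trainer and raise CalledProcessError if it fails."""
    subprocess.run(build_trainer_cmd(config, gpus, mode), check=True)


if __name__ == "__main__":
    # Print (rather than launch) both training stages and their test runs.
    for cfg in ("configs/vqdif/shapenet_res16.yaml",
                "configs/shapeformer/shapenet_scale.yaml"):
        for mode in (None, "test"):
            print(shlex.join(build_trainer_cmd(cfg, gpus=(0,), mode=mode)))
```

Calling run_trainer stage by stage with check=True stops the script if VQDIF training fails, so ShapeFormer is never started on a missing checkpoint.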

There is a visualization callback for both VQDIF and ShapeFormer, which calls the model to obtain 3D meshes and renders them to images. The results are saved in experiments/$exp_name$/results/$callback_name$/. The callbacks are called automatically during training and testing, so to get the generation results you just need to test the model.
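To collect the callback outputs programmatically, it is enough to follow the directory layout above. The helper below is a hypothetical sketch; it only assumes that layout and makes no assumption about file names or formats.

```python
from pathlib import Path


def callback_results(exp_name, callback_name, root="experiments", pattern="*"):
    """List files a callback wrote to <root>/<exp_name>/results/<callback_name>/.

    Returns a sorted list of Paths, or an empty list if the directory does
    not exist yet (e.g. before the first test/run pass).
    """
    out_dir = Path(root) / exp_name / "results" / callback_name
    if not out_dir.is_dir():
        return []
    return sorted(p for p in out_dir.glob(pattern) if p.is_file())
```

For example, callback_results("demo_shapeformer", "<callback_name>", pattern="*.png") would list the renders of the ShapeFormer demo, where the callback name is a placeholder and the PNG extension is an assumption about the image format.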

Also note that the batch sizes in the configuration files are set very small so that the model can run on a GPU with 12 GB of memory. You can increase them if your GPU has more memory.
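As a rough starting point when increasing the batch size, one can scale it in proportion to the available GPU memory. This is a back-of-the-envelope sketch, not a rule from the authors: it assumes memory use grows linearly with batch size and that the shipped configs fit in 12 GB. The function name and the example batch size of 4 are hypothetical, and the exact batch-size key in the YAML configs is not stated here.

```python
import math


def scale_batch_size(base_batch_size, gpu_mem_gb, base_mem_gb=12.0, reserve_gb=0.0):
    """Estimate a batch size for a GPU with gpu_mem_gb of memory.

    Assumes memory use is linear in batch size and that base_batch_size fits
    in base_mem_gb. reserve_gb is memory to keep free (e.g. for other
    processes). Always returns at least 1.
    """
    usable_gb = max(gpu_mem_gb - reserve_gb, 0.0)
    return max(1, math.floor(base_batch_size * usable_gb / base_mem_gb))


if __name__ == "__main__":
    # Hypothetical config batch size of 4, moved from a 12 GB to a 24 GB GPU.
    print(scale_batch_size(4, 24))                 # 8
    print(scale_batch_size(4, 24, reserve_gb=2))   # 7
```

Because model weights and optimizer state do not grow with the batch, linear scaling tends to underestimate what fits on a larger GPU (and overestimate on a smaller one), so treat the result as a first guess and confirm it does not run out of memory.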

Multi-GPU Training

To use multiple GPUs, just specify the GPU ids. For example, --gpu 0 1 2 4 uses the 0th, 1st, 2nd, and 4th GPUs for training. Inside the program, their indices are mapped to 0 1 2 3 for simplicity.
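The README does not say how this remapping is implemented. One common mechanism that produces exactly this behavior is the CUDA_VISIBLE_DEVICES environment variable: CUDA exposes only the listed devices and renumbers them from 0. The hypothetical helper select_gpus below sketches that idea; it is an assumption, not code taken from ShapeFormer.

```python
import os


def select_gpus(gpu_ids):
    """Expose only the given physical GPUs to CUDA in this process.

    Must be called before CUDA is initialized (safest: before importing
    torch). Returns a dict mapping each physical id to the local index that
    CUDA frameworks will see afterwards.
    """
    ids = [int(g) for g in gpu_ids]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate GPU ids")
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in ids)
    # Local indices follow the order in which the ids were listed.
    return {physical: local for local, physical in enumerate(ids)}
```

With select_gpus([0, 1, 2, 4]), physical GPU 4 becomes local device 3, matching the 0 1 2 3 mapping described above.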

Frequently Asked Questions

What is the meaning of the variables Xbd, Xtg, Ytg... ?

Here is a brief description of the variable names:

tg stands for target: the samples (probes) of the target occupancy field. bd, or boundary, stands for the points sampled from the shape surface. ct stands for context: the partial point cloud that we want to complete. X stands for point coordinates, and Y stands for the occupancy values at those coordinates.
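Putting the prefixes and suffixes together gives names such as Xbd, Xct, Xtg and Ytg. The dataclass below is a hypothetical illustration of that convention with the shapes implied by the description (coordinates as (N, 3) arrays, one occupancy value per target probe); it is not a class from the ShapeFormer code base, and Xct is simply the name the convention yields for context coordinates.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ShapeSample:
    """Hypothetical container illustrating the variable-naming convention."""

    Xbd: np.ndarray  # points sampled on the shape surface (boundary)
    Xct: np.ndarray  # context: the partial point cloud to be completed
    Xtg: np.ndarray  # target probes where the occupancy field is queried
    Ytg: np.ndarray  # occupancy value at each probe in Xtg

    def __post_init__(self):
        for name in ("Xbd", "Xct", "Xtg", "Ytg"):
            setattr(self, name, np.asarray(getattr(self, name)))
        # Every X* array holds 3D coordinates, one point per row.
        for name in ("Xbd", "Xct", "Xtg"):
            arr = getattr(self, name)
            if arr.ndim != 2 or arr.shape[1] != 3:
                raise ValueError(f"{name} must have shape (N, 3), got {arr.shape}")
        if self.Ytg.shape[0] != self.Xtg.shape[0]:
            raise ValueError("Ytg needs one occupancy value per probe in Xtg")
```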

The target and context names come from the field of meta-learning.

Note that Ytg in the hdf5 file stands for the occupancy values of the probes Xtg. In the case of IMNet2_64, Xtg is the collection of 64-grid coordinates, with shape (64**3, 3), and Ytg holds the corresponding occupancy values. When Xtg lies on a regular grid, the shape is easy to visualize with marching cubes, but you can use arbitrarily sampled points as Xtg and Ytg for training.
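When Xtg is a regular grid, visualizing the shape amounts to reshaping Ytg back into a 3D volume and running marching cubes on it. The sketch below assumes the grid spans [-1, 1] with voxel-centre samples stored in C ("ij") order; none of this is confirmed by the README, so compare against an actual Xtg array first. grid_coords and occupancy_volume are hypothetical helpers, the sphere is a toy stand-in for real data, and the mesh is extracted with scikit-image's skimage.measure.marching_cubes only if that package is installed.

```python
import numpy as np


def grid_coords(res=64, lo=-1.0, hi=1.0):
    """Return the (res**3, 3) voxel-centre coordinates of a regular grid."""
    step = (hi - lo) / res
    axis = lo + step * (np.arange(res) + 0.5)  # voxel centres along one axis
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)


def occupancy_volume(Ytg, res=64):
    """Reshape flat occupancy values into a (res, res, res) volume.

    Valid only if Xtg uses the same "ij" (C) ordering as grid_coords.
    """
    return np.asarray(Ytg, dtype=np.float32).reshape(res, res, res)


if __name__ == "__main__":
    Xtg = grid_coords(64)
    # Toy occupancy field: points inside a sphere of radius 0.5 are occupied.
    Ytg = (np.linalg.norm(Xtg, axis=1) < 0.5).astype(np.float32)
    vol = occupancy_volume(Ytg, 64)
    try:
        from skimage import measure
        verts, faces, _, _ = measure.marching_cubes(vol, level=0.5)
        # verts are in voxel-index units; map them back to the [-1, 1] box.
        world = -1.0 + (2.0 / 64) * (verts + 0.5)
        print(world.shape, faces.shape)
    except ImportError:
        print("install scikit-image to extract a mesh with marching cubes")
```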

How can I evaluate ShapeFormer?

Here is an incomplete collection of evaluation code for ShapeFormer.

📔 Citation

If you find our work useful for your research, please consider citing the following paper :)

      title={ShapeFormer: Transformer-based Shape Completion via Sparse Representation}, 
      author={Xingguang Yan and Liqiang Lin and Niloy J. Mitra and Dani Lischinski and Danny Cohen-Or and Hui Huang},

📢 Shout-outs

The architecture of our method is inspired by ConvONet, Taming-transformers and DCTransformer. Thanks to the authors.

Also, make sure to check out this amazing transformer-based image completion project (ICT)!

📧 Contact

This repo is currently maintained by Xingguang (@qheldiv) and is for academic research use only. Discussions and questions are welcome via