Skip to content


Repository files navigation


Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution

Haotian Tang*, Zhijian Liu*, Shengyu Zhao, Yujun Lin, Ji Lin, Hanrui Wang, Song Han

ECCV 2020

SPVNAS achieves state-of-the-art performance on the SemanticKITTI leaderboard (as of July 2020) and outperforms MinkowskiNet with 3x speedup, 8x MACs reduction.


[2020-09] We release the baseline training code for SPVCNN and MinkowskiNet.

[2020-08] Please check out our ECCV 2020 tutorial on AutoML for Efficient 3D Deep Learning, which summarizes the algorithm in this codebase.

[2020-07] Our paper is accepted to ECCV 2020.



The code is built with following libraries:

Recommended Installation

For easy installation, use conda:

conda create -n torch python=3.7
conda activate torch
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
conda install numba opencv
pip install torchpack
pip install --upgrade git+

Data Preparation


Please follow the instructions from here to download the SemanticKITTI dataset (both KITTI Odometry dataset and SemanticKITTI labels) and extract all the files in the sequences folder to /dataset/semantic-kitti. You shall see 22 folders 00, 01, …, 21; each with subfolders named velodyne and labels.

Model Zoo


We share the pretrained models for MinkowskiNets, our manually designed SPVCNN models and also SPVNAS models found by our 3D-NAS pipeline. All the pretrained models are available in the model zoo. Currently, we release the models trained on sequences 00-07 and 09-10 and evaluated on sequence 08.

#Params (M) MACs (G) mIoU (paper) mIoU (reprod.)
SemanticKITTI_val_MinkUNet@29GMACs 5.5 28.5 58.9 59.3
SemanticKITTI_val_SPVCNN@30GMACs 5.5 30.0 60.7 60.8 ± 0.5
SemanticKITTI_val_SPVNAS@20GMACs 3.3 20.0 61.5 -
SemanticKITTI_val_SPVNAS@25GMACs 4.5 24.6 62.9 -
SemanticKITTI_val_MinkUNet@46GMACs 8.8 45.9 60.3 60.0
SemanticKITTI_val_SPVCNN@47GMACs 8.8 47.4 61.4 61.5 ± 0.2
SemanticKITTI_val_SPVNAS@35GMACs 7.0 34.7 63.5 -
SemanticKITTI_val_MinkUNet@114GMACs 21.7 113.9 61.1 61.9
SemanticKITTI_val_SPVCNN@119GMACs 21.8 118.6 63.8 63.7 ± 0.4
SemanticKITTI_val_SPVNAS@65GMACs 10.8 64.5 64.7 -

Here, the results are reproduced using 8 NVIDIA RTX 2080Ti GPUs. Result variation for each single model is due to the existence of floating point atomic addition operation in our TorchSparse CUDA backend.

Testing Pretrained Models

You can run the following command to test the performance of SPVNAS / SPVCNN / MinkUNet models.

torchpack dist-run -np [num_of_gpus] python configs/semantic_kitti/default.yaml --name [num_of_net]

For example, to test the model SemanticKITTI_val_SPVNAS@65GMACs on one GPU, you may run

torchpack dist-run -np 1 python configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs


You can run the following command (on a headless server) to visualize the predictions of SPVNAS / SPVCNN / MinkUNet models:

xvfb-run --server-args="-screen 0 1024x768x24" python

If you are running the code on a computer with monitor, you may also directly run


The visualizations will be generated in assets/.



We currently release the training code for manually-designed baseline models (SPVCNN and MinkowskiNets). You may run the following code to train the model from scratch:

torchpack dist-run -np [num_of_gpus] python configs/semantic_kitti/[model name]/[config name].yaml

For example, to train the model SemanticKITTI_val_SPVCNN@30GMACs, you may run

torchpack dist-run -np [num_of_gpus] python configs/semantic_kitti/spvcnn/cr0p5.yaml

To train the model in a non-distributed environment without MPI, i.e. on a single GPU, you may directly call with the --distributed False argument:

python configs/semantic_kitti/spvcnn/cr0p5.yaml --distributed False


The code related to architecture search will be coming soon!


If you use this code for your research, please cite our paper.

  title = {Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution},
  author = {Tang, Haotian* and Liu, Zhijian* and Zhao, Shengyu and Lin, Yujun and Lin, Ji and Wang, Hanrui and Han, Song},
  booktitle = {European Conference on Computer Vision},
  year = {2020}