
getterupper/DiScene


DiScene

Enhancing Indoor Occupancy Prediction via Sparse Query-Based Multi-Level Consistent Knowledge Distillation [paper]

RA-L 2025

TODO

  • Initial commit
  • Model zoo
  • arXiv version

Introduction

Occupancy prediction provides critical geometric and semantic understanding for robotics but faces an efficiency-accuracy trade-off. Current dense methods waste computation on empty voxels, while sparse query-based approaches lack robustness in diverse and complex indoor scenes. In this paper, we propose DiScene, a novel sparse query-based framework that leverages multi-level distillation to achieve efficient and robust occupancy prediction. Our method incorporates two key innovations: (1) a Multi-level Consistent Knowledge Distillation strategy, which transfers hierarchical representations from large teacher models to lightweight students through coordinated alignment at four levels (encoder-level feature alignment, query-level feature matching, prior-level spatial guidance, and anchor-level high-confidence knowledge transfer); and (2) a Teacher-Guided Initialization policy, which uses optimized parameter warm-up to accelerate model convergence. On the Occ-ScanNet benchmark, DiScene runs at 23.2 FPS without depth priors while outperforming our baseline method, OPUS, by 36.1%, and it even outperforms the depth-enhanced version, OPUS†. With depth integration, DiScene† achieves new state-of-the-art performance, surpassing EmbodiedOcc by 3.7% with 1.62× faster inference. Furthermore, experiments on the Occ3D-nuScenes benchmark and in-the-wild scenarios demonstrate the versatility of our approach across environments.
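The four distillation levels can be pictured as one weighted training objective. The sketch below is a hypothetical illustration, not DiScene's implementation: it assumes every level is scored with a plain mean-squared error on flat feature lists, that student and teacher queries are already paired one-to-one (so no matching step is shown), and that the anchor-level term keeps only entries whose teacher confidence exceeds a placeholder threshold of 0.5. The function names, level weights, and threshold are all illustrative.

```python
# Hypothetical sketch of a weighted multi-level distillation objective.
# Not the DiScene code: every level uses a plain MSE on flat feature lists,
# queries are assumed to be already paired, and weights/threshold are placeholders.
from typing import Dict, List, Sequence

LEVELS = ("encoder", "query", "prior", "anchor")


def mse(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean-squared error between two equally sized feature vectors."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher features must have the same length")
    if not student:
        return 0.0
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def confident_mse(
    student: Sequence[float],
    teacher: Sequence[float],
    teacher_conf: Sequence[float],
    threshold: float = 0.5,
) -> float:
    """MSE over entries whose teacher confidence exceeds `threshold` (assumed anchor-level rule)."""
    kept = [(s, t) for s, t, c in zip(student, teacher, teacher_conf) if c > threshold]
    if not kept:
        return 0.0
    return sum((s - t) ** 2 for s, t in kept) / len(kept)


def multi_level_distill_loss(
    student: Dict[str, List[float]],
    teacher: Dict[str, List[float]],
    teacher_conf: List[float],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> float:
    """Weighted sum of the four per-level terms."""
    per_level = {
        "encoder": mse(student["encoder"], teacher["encoder"]),  # encoder-level feature alignment
        "query": mse(student["query"], teacher["query"]),  # query-level matching (pre-paired queries)
        "prior": mse(student["prior"], teacher["prior"]),  # prior-level spatial guidance
        "anchor": confident_mse(student["anchor"], teacher["anchor"], teacher_conf, threshold),
    }
    return sum(weights[level] * per_level[level] for level in LEVELS)


if __name__ == "__main__":
    s = {"encoder": [0.0, 1.0], "query": [1.0, 1.0], "prior": [0.5], "anchor": [0.0, 2.0]}
    t = {"encoder": [0.0, 0.0], "query": [1.0, 1.0], "prior": [0.5], "anchor": [1.0, 0.0]}
    w = {"encoder": 1.0, "query": 1.0, "prior": 1.0, "anchor": 2.0}
    # Only the first anchor passes the 0.5 confidence threshold.
    print(multi_level_distill_loss(s, t, [0.9, 0.1], w))  # → 2.5
```

Keeping each level as a separate term with its own weight makes it easy to ablate one level at a time; a real implementation would operate on tensors and would likely need an explicit query-matching step.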

Getting Started

Installation

Follow the instructions HERE to prepare the environment.

Data Preparation

Download posed_images and gathered_data from the Occ-ScanNet benchmark, extract the zip files, and move the extracted folders to data/occscannet.

Folder structure

DiScene
├── ...
├── data/
│   ├── occscannet/
│   │   ├── gathered_data/
│   │   ├── posed_images/
│   │   ├── train.txt
│   │   ├── test.txt
├── ...
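Before launching training, it can help to confirm that the extracted data matches the tree above. The sketch below is a hypothetical helper, not part of the DiScene repository; the expected entry names are copied from the folder structure, and the default path data/occscannet comes from the data preparation step. It also warns about any .zip files left in that folder, on the assumption that an archive still sitting there was probably not extracted.

```python
# Hypothetical helper (not part of the DiScene repository): checks that
# data/occscannet matches the documented folder tree before training starts.
import sys
from pathlib import Path
from typing import List

EXPECTED_DIRS = ("gathered_data", "posed_images")
EXPECTED_FILES = ("train.txt", "test.txt")


def missing_entries(root: Path) -> List[str]:
    """Return the expected directories and files that are absent under `root`."""
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing


def leftover_archives(root: Path) -> List[str]:
    """Return .zip files still sitting in `root` (a hint that one was not extracted)."""
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("*.zip"))


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("data/occscannet")
    for name in leftover_archives(root):
        print(f"warning: {root / name} may still need to be extracted")
    missing = missing_entries(root)
    if missing:
        print(f"missing under {root}: {', '.join(missing)}")
        sys.exit(1)
    print(f"{root} matches the expected layout")
```

Run it from the repository root; a non-zero exit code makes it easy to chain in front of the training scripts.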

Train and Eval

  1. Train the models with 8 GPUs on the Occ-ScanNet benchmark:

    # train student model
    bash dist_train.sh 8 configs/occscannet/r50/discene_960x16_student_r50.py
    
    # train teacher model
    bash dist_train.sh 8 configs/occscannet/internxl/discene_960x16_teacher_internxl.py
    
    # train distilled model (DiScene†)
    bash dist_train.sh 8 configs/occscannet/r50/discene_960x16_guided_distill_r50.py  # first set 'teacher_weight' in this config to your trained teacher checkpoint
    
    # train without a pre-trained depth model
    # student model
    bash dist_train.sh 8 configs/occscannet/r50/discene_960x16_vanilla_r50.py
    # teacher model
    bash dist_train.sh 8 configs/occscannet/internxl/discene_960x16_teacher_vanilla_internxl.py
    # distilled model (DiScene)
    bash dist_train.sh 8 configs/occscannet/r50/discene_960x16_guided_distill_vanilla_r50.py  # first set 'teacher_weight' in this config to your trained teacher checkpoint
  2. Evaluate a model with 8 GPUs on the Occ-ScanNet benchmark:

    # evaluate distilled model (DiScene†)
    bash dist_val.sh 8 configs/occscannet/r50/discene_960x16_guided_distill_r50.py /path/to/checkpoints
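Both distillation configs need 'teacher_weight' set before training. Editing the file by hand works fine; the sketch below automates the edit under a loud assumption: that the config is a Python file (as the .py extension suggests) containing exactly one top-level line of the form teacher_weight = .... That layout is not confirmed here, and the helper name set_teacher_weight, the script name, and the checkpoint path in the usage line are hypothetical.

```python
# Hypothetical helper: rewrite the top-level `teacher_weight = ...` line of a config.
# ASSUMPTION: the config is a Python file with exactly one such line at column 0.
import re
import sys
from pathlib import Path

# [ \t]* instead of \s* so the match never spills onto the next line.
_PATTERN = re.compile(r"^(teacher_weight[ \t]*=[ \t]*).*$", re.MULTILINE)


def set_teacher_weight(config_text: str, checkpoint: str) -> str:
    """Return `config_text` with teacher_weight pointing at `checkpoint`."""
    # A function replacement avoids backslash-escaping issues in Windows paths.
    new_text, count = _PATTERN.subn(lambda m: m.group(1) + repr(checkpoint), config_text)
    if count != 1:
        raise ValueError(f"expected exactly one top-level teacher_weight, found {count}")
    return new_text


if __name__ == "__main__":
    config_path, checkpoint = Path(sys.argv[1]), sys.argv[2]
    config_path.write_text(set_teacher_weight(config_path.read_text(), checkpoint))
```

Usage (hypothetical script name and placeholder checkpoint path): python set_teacher_weight.py configs/occscannet/r50/discene_960x16_guided_distill_r50.py /path/to/teacher_checkpoint.pth. Failing loudly when the key is missing or duplicated is deliberate, since silently training a student without its teacher would waste a full 8-GPU run.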

Model Zoo

3D Occupancy Prediction (Occ-ScanNet Benchmark)

Method   | mIoU  | Config | Checkpoints
DiScene† | 47.17 | config | model
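The mIoU column follows the usual definition: per-class IoU = TP / (TP + FP + FN), averaged over the evaluated classes. The sketch below computes it from flat per-voxel labels and, since the table value appears to be in percent, scales the result by 100. Which classes the Occ-ScanNet evaluation averages over (for example, whether empty space is excluded) and how invalid voxels are masked are assumptions here, exposed through the hypothetical ignore_classes argument; the repository's own evaluation code is authoritative.

```python
# Textbook mean IoU from flat per-voxel labels (a sketch, not the repository's evaluator).
# Assumptions: labels are ints in [0, num_classes), invalid voxels were already
# filtered out, and `ignore_classes` stands in for whatever classes the benchmark excludes.
from typing import Iterable, List, Sequence


def mean_iou(
    pred: Sequence[int],
    gt: Sequence[int],
    num_classes: int,
    ignore_classes: Iterable[int] = (),
) -> float:
    """Return mIoU in percent: mean over classes of TP / (TP + FP + FN)."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    ignored = set(ignore_classes)
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, g in zip(pred, gt):
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1  # voxel wrongly assigned to class p
            fn[g] += 1  # voxel of class g that was missed
    ious: List[float] = []
    for c in range(num_classes):
        if c in ignored:
            continue
        denom = tp[c] + fp[c] + fn[c]
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp[c] / denom)
    return 100.0 * sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Class 0 excluded; classes 1 and 2 each reach IoU 0.5.
    print(mean_iou([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3, ignore_classes=(0,)))  # → 50.0
```

Accumulating TP/FP/FN counts over the whole test set before dividing (rather than averaging per-scene IoUs) is the common convention for this metric, and the sketch follows it by counting over all voxels passed in.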

Acknowledgement

Our code is developed on top of OPUS. We sincerely appreciate their excellent work.

Also, we would like to thank these excellent open source projects:

BibTeX

If you find this work useful, please consider citing:

@article{li2025enhancing,
  title={Enhancing Indoor Occupancy Prediction Via Sparse Query-Based Multi-Level Consistent Knowledge Distillation},
  author={Li, Xiang and Zheng, Yupeng and Li, Pengfei and Chen, Yilun and Zhang, Ya-Qin and Ding, Wenchao},
  journal={IEEE Robotics and Automation Letters},
  year={2025},
  volume={10},
  number={11},
  pages={11690-11697},
  publisher={IEEE},
  doi={10.1109/LRA.2025.3615532}
}
