3DRefTR

This is a PyTorch implementation of 3DRefTR, proposed in our paper "A Unified Framework for 3D Point Cloud Visual Grounding".

0. Installation

  • (1) Install the environment from the environment.yml file:
    conda env create -f environment.yml --name 3dreftr
    
    • or you can install manually:
      conda create -n 3dreftr python=3.7
      conda activate 3dreftr
      conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia
      pip install numpy ipython psutil traitlets transformers termcolor ipdb scipy tensorboardX h5py wandb plyfile tabulate
      
  • (2) Install spacy for text parsing
    pip install spacy
    # install the matching en_core_web_sm model (version 3.3.0)
    pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0.tar.gz
    
  • (3) Compile pointnet++
    cd ~/3DRefTR
    sh init.sh
    
  • (4) Install segmentator from https://github.com/Karbo123/segmentator (a quick environment check is sketched after this list)
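
If the environment is set up correctly, a short check like the following should run without errors (a minimal sketch; it assumes only the packages installed above):

    # sanity-check the core dependencies of the 3dreftr environment
    import torch
    import spacy

    print(torch.__version__)            # expect 1.9.0
    print(torch.cuda.is_available())    # expect True with cudatoolkit 11.1

    nlp = spacy.load("en_core_web_sm")  # the 3.3.0 model from step (2)
    print([token.pos_ for token in nlp("the chair next to the table")])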

1. Quick visualization demo

We provide visualization via wandb for superpoints, kps points, bad-case analysis, and predicted/ground-truth masks and boxes.

  • superpoints: 'src/joint_det_dataset.py', line 71
    self.visualization_superpoint = False
  • the others: 'src/groungd_evaluation.py', lines 66-70
    self.visualization_pred = False
    self.visualization_gt = False
    self.bad_case_visualization = False
    self.kps_points_visualization = False
    self.bad_case_threshold = 0.15
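
To enable any of these, flip the corresponding flag above to True. The visualizations are logged through wandb, so authenticate once on the machine first:

    # one-time wandb authentication; required before any runs can log
    wandb login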

2. Data preparation

The final required files are as follows:

├── [DATA_ROOT]
│   ├── [1] train_v3scans.pkl        # packaged ScanNet training set
│   ├── [2] val_v3scans.pkl          # packaged ScanNet validation set
│   ├── [3] ScanRefer/               # ScanRefer utterance data
│   │   ├── ScanRefer_filtered_train.json
│   │   ├── ScanRefer_filtered_val.json
│   │   └── ...
│   ├── [4] ReferIt3D/               # NR3D/SR3D utterance data
│   │   ├── nr3d.csv
│   │   ├── sr3d.csv
│   │   └── ...
│   ├── [5] group_free_pred_bboxes/  # detected boxes (optional)
│   ├── [6] gf_detector_l6o256.pth   # PointNet++ checkpoint (optional)
│   ├── [7] roberta-base/            # RoBERTa pretrained language model
│   └── [8] checkpoints/             # 3DRefTR pretrained models
  • [1] [2] Prepare the ScanNet point cloud data
    • 1) Download ScanNet v2 data. Follow the ScanNet instructions to apply for dataset permission; you will receive the official download script download-scannet.py. Then use the following commands to download the necessary files:
      python2 download-scannet.py -o [SCANNET_PATH] --type _vh_clean_2.ply
      python2 download-scannet.py -o [SCANNET_PATH] --type _vh_clean_2.labels.ply
      python2 download-scannet.py -o [SCANNET_PATH] --type .aggregation.json
      python2 download-scannet.py -o [SCANNET_PATH] --type _vh_clean_2.0.010000.segs.json
      python2 download-scannet.py -o [SCANNET_PATH] --type .txt
      
      where [SCANNET_PATH] is the output folder. The ScanNet dataset structure should look like this:
      ├── [SCANNET_PATH]
      │   ├── scans
      │   │   ├── scene0000_00
      │   │   │   ├── scene0000_00.txt
      │   │   │   ├── scene0000_00.aggregation.json
      │   │   │   ├── scene0000_00_vh_clean_2.ply
      │   │   │   ├── scene0000_00_vh_clean_2.labels.ply
      │   │   │   ├── scene0000_00_vh_clean_2.0.010000.segs.json
      │   │   ├── scene.......
      
    • 2) Package the above files into two .pkl files (train_v3scans.pkl and val_v3scans.pkl):
      python Pack_scan_files.py --scannet_data [SCANNET_PATH] --data_root [DATA_ROOT]
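
      You can sanity-check the packaged files with a quick load (a hypothetical sketch; the assumption that the .pkl holds one entry per scene may not match the actual schema):

      # inspect the packaged scans; the exact schema is an assumption
      import pickle

      with open('[DATA_ROOT]/val_v3scans.pkl', 'rb') as f:
          scans = pickle.load(f)
      print(type(scans), len(scans))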
      
  • [3] ScanRefer: Download ScanRefer annotations following the instructions HERE. Unzip inside [DATA_ROOT].
  • [4] ReferIt3D: Download ReferIt3D annotations following the instructions HERE. Unzip inside [DATA_ROOT].
  • [5] group_free_pred_bboxes: Download the object detector's outputs. Unzip inside [DATA_ROOT]. (Not used by the single-stage variants.)
  • [6] gf_detector_l6o256.pth: Download PointNet++ checkpoint into [DATA_ROOT].
  • [7] roberta-base: Download the RoBERTa PyTorch model:
    cd [DATA_ROOT]
    git clone https://huggingface.co/roberta-base
    cd roberta-base
    rm -rf pytorch_model.bin
    wget https://huggingface.co/roberta-base/resolve/main/pytorch_model.bin
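
    To confirm the download, load the model from the local folder using the standard transformers API (hidden size should print 768 for roberta-base):

    # load the local roberta-base copy to verify all files are in place
    from transformers import RobertaModel, RobertaTokenizerFast

    tokenizer = RobertaTokenizerFast.from_pretrained('[DATA_ROOT]/roberta-base')
    model = RobertaModel.from_pretrained('[DATA_ROOT]/roberta-base')
    print(model.config.hidden_size)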
    
  • [8] checkpoints: Our pre-trained models (see 3. Models).
  • [9] ScanNetv2: Prepare the preprocessed ScanNetv2 dataset following the "Data Preparation" section of https://github.com/sunjiahao1999/SPFormer, obtaining a dataset directory with the following structure:
ScanNetv2
├── data
│   ├── scannetv2
│   │   ├── scans
│   │   ├── scans_test
│   │   ├── train
│   │   ├── val
│   │   ├── test
│   │   ├── val_gt
  • [10] superpoints: Generate superpoints for each scene preprocessed in step [9].
    cd [DATA_ROOT]
    python superpoint_maker.py  # modify data_root & split
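
    For reference, segmentator computes per-vertex superpoint labels from a scene mesh roughly as below (a sketch following segmentator's documented usage; superpoint_maker.py itself may differ):

    # compute superpoint labels for one ScanNet scene mesh
    import numpy as np
    import torch
    import open3d as o3d
    import segmentator

    mesh = o3d.io.read_triangle_mesh('scene0000_00_vh_clean_2.ply')
    vertices = torch.from_numpy(np.asarray(mesh.vertices).astype(np.float32))
    faces = torch.from_numpy(np.asarray(mesh.triangles).astype(np.int64))
    superpoint = segmentator.segment_mesh(vertices, faces).numpy()
    np.save('scene0000_00_superpoint.npy', superpoint)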
    

3. Models

Dataset/Model | REC mAP@0.25 | RES mIoU | Model
--- | --- | --- | ---
ScanRefer/3DRefTR-SP | 55.45 | 40.76 | GoogleDrive
ScanRefer/3DRefTR-SP (Single-Stage) | 54.43 | 40.23 | GoogleDrive
ScanRefer/3DRefTR-HR | 55.04 | 41.24 | GoogleDrive
ScanRefer/3DRefTR-HR (Single-Stage) | 54.40 | 40.75 | GoogleDrive
SR3D/3DRefTR-SP | 68.45 | 44.61 | GoogleDrive
NR3D/3DRefTR-SP | 52.55 | 36.17 | GoogleDrive

4. Training

  • Please specify the paths of --data_root, --log_dir, and --pp_checkpoint in the train_*.sh scripts first (see the sketch after this list).
  • For ScanRefer training
    sh scripts/train_scanrefer_3dreftr_hr.sh
    sh scripts/train_scanrefer_3dreftr_sp.sh
    
  • For ScanRefer (single stage) training
    sh scripts/train_scanrefer_3dreftr_hr_single.sh
    sh scripts/train_scanrefer_3dreftr_sp_single.sh
    
  • For SR3D training
    sh scripts/train_sr3d_3dreftr_hr.sh
    sh scripts/train_sr3d_3dreftr_sp.sh
    
  • For NR3D training
    sh scripts/train_nr3d_3dreftr_hr.sh
    sh scripts/train_nr3d_3dreftr_sp.sh
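
Each train_*.sh script reduces to a single launch command along the lines of the sketch below (hypothetical: the flag names come from the note above, but the entry-point file name and remaining flags are assumptions to check against the actual scripts):

    # hypothetical skeleton of a training script; verify against scripts/train_*.sh
    python train_dist_mod.py \
        --data_root /path/to/DATA_ROOT \
        --log_dir logs/3dreftr_sp_scanrefer \
        --pp_checkpoint /path/to/DATA_ROOT/gf_detector_l6o256.pth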
    

5. Evaluation

  • Please specify the paths of --data_root, --log_dir, and --checkpoint_path in the test_*.sh scripts first (see the sketch after this list).
  • For ScanRefer evaluation
    sh scripts/test_scanrefer_3dreftr_hr.sh
    sh scripts/test_scanrefer_3dreftr_sp.sh
    
  • For ScanRefer (single stage) evaluation
    sh scripts/test_scanrefer_3dreftr_hr_single.sh
    sh scripts/test_scanrefer_3dreftr_sp_single.sh
    
  • For SR3D evaluation
    sh scripts/test_sr3d_3dreftr_hr.sh
    sh scripts/test_sr3d_3dreftr_sp.sh
    
  • For NR3D evaluation
    sh scripts/test_nr3d_3dreftr_hr.sh
    sh scripts/test_nr3d_3dreftr_sp.sh
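
The test_*.sh scripts follow the same pattern, with --checkpoint_path pointing at a model from section 3 (again hypothetical; any switch beyond the three documented flags is an assumption):

    # hypothetical skeleton of an evaluation script; verify against scripts/test_*.sh
    python train_dist_mod.py \
        --data_root /path/to/DATA_ROOT \
        --log_dir logs/eval_scanrefer \
        --checkpoint_path /path/to/DATA_ROOT/checkpoints/3dreftr_sp.pth \
        --eval  # assumed flag for evaluation-only mode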
    

6. Acknowledgements

This repository builds on the codebase of EDA. We recommend using their repository in your research and reading the related paper. We are also grateful to SPFormer, BUTD-DETR, GroupFree, ScanRefer, and SceneGraphParser.

7. Citation

If you find our work useful in your research, please consider citing:

@misc{lin2023unified,
      title={A Unified Framework for 3D Point Cloud Visual Grounding}, 
      author={Haojia Lin and Yongdong Luo and Xiawu Zheng and Lijiang Li and Fei Chao and Taisong Jin and Donghao Luo and Chengjie Wang and Yan Wang and Liujuan Cao},
      year={2023},
      eprint={2308.11887},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
