The source code of paper Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention

onmyoji-xiao/3dvg_SeCG

SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention

image

Environment

Requirements

  • CUDA: >=11.3
  • Python: >=3.8
  • PyTorch: >=1.12.0
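
The minimum versions above can be checked before installing anything else. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the script only reports what it finds.

```python
import re
import sys


def parse_version(text):
    """Turn a string such as '1.12.1+cu113' into a 3-tuple like (1, 12, 1)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that '3.8' and '3.8.0' compare as equal.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def meets_minimum(found, required):
    """Return True when version string `found` is at least `required`."""
    return parse_version(found) >= parse_version(required)


def check_environment():
    """Compare the local Python, PyTorch, and CUDA versions with the README minimums."""
    report = {}
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    report["python"] = (python_version, meets_minimum(python_version, "3.8"))
    try:
        import torch
    except ImportError:
        report["torch"] = (None, False)
        report["cuda"] = (None, False)
        return report
    report["torch"] = (torch.__version__, meets_minimum(torch.__version__, "1.12.0"))
    cuda_version = torch.version.cuda  # None for CPU-only builds
    report["cuda"] = (
        cuda_version,
        cuda_version is not None and meets_minimum(cuda_version, "11.3"),
    )
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        print(f"{name}: found {found}, {'ok' if ok else 'below minimum or missing'}")
```

Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the toolkit used to compile `external_tools/pointnet2`.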

Installation

pip install h5py
pip install transformers
# pickle is part of the Python standard library and needs no installation
pip install tensorboardX

cd external_tools/pointnet2
python setup.py install

Data Preparation

ScanNet v2

Download the ScanNet v2 dataset.
Prepare the ScanNet data and package it into "scannet_00_views.pkl" or "scannet_0x_views.pkl":

cd data
python prepare_scannet_data.py --process-only-zero-view [1/0]
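
After the preparation script finishes, it can help to confirm that the resulting file is readable. The sketch below makes no assumption about what the file contains: it reads every pickled object in the file in order and reports each object's type and length. The function name `summarize_pickle` is hypothetical. If the file stores instances of classes defined in this repository, run the snippet from the repository root so those classes can be imported, and only unpickle files you generated yourself.

```python
import pickle


def summarize_pickle(path, max_objects=10):
    """List (type name, length) for each object stored in a pickle file.

    A file may hold several objects written by consecutive dump calls, so
    loading repeats until the end of the file or until max_objects is reached.
    """
    summaries = []
    with open(path, "rb") as handle:
        while len(summaries) < max_objects:
            try:
                obj = pickle.load(handle)
            except EOFError:
                break
            size = len(obj) if hasattr(obj, "__len__") else None
            summaries.append((type(obj).__name__, size))
    return summaries


if __name__ == "__main__":
    import sys

    # Usage: python summarize_pickle.py scannet_00_views.pkl
    for type_name, size in summarize_pickle(sys.argv[1]):
        print(type_name, size)
```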

Pretrained Model

Download the BERT files from Hugging Face or from our drive.
Download the first encoder checkpoint "ckpt_cls40.pth" from our drive.

Evaluation

Download the SeCG models "ckpt_nr3d.pth" and "ckpt_sr3d.pth" from our drive and put them into "./checkpoints".

# nr3d
python evaluation.py \
    --scannet-file ./scannet/scannet_00_views.pkl \
    --refer_test_file ./data/referit3d/nr3d_test.csv \
    --weight ./checkpoints/ckpt_nr3d.pth \
    --bert-pretrain-path /pretrained/bert

# sr3d (quote the value: an unquoted ";" ends the shell command)
python evaluation.py \
    --scannet-file "./scannet/scannet_00_views.pkl;./scannet/scannet_0x_views.pkl" \
    --refer_test_file ./data/referit3d/sr3d_test.csv \
    --weight ./checkpoints/ckpt_sr3d.pth \
    --bert-pretrain-path /pretrained/bert
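
The Sr3D `--scannet-file` value joins two paths with `;`, which a shell treats as a command separator unless the value is quoted. One way to sidestep quoting is to launch the script from Python with an argument list, which is passed to the process without shell interpretation. This is a sketch only: `build_command` is a hypothetical helper, the flags and paths are copied from the commands above, and joining the files with `;` assumes the scripts split the value on that character.

```python
import shlex


def build_command(script, scannet_files, options):
    """Build an argv list for evaluation.py or train.py.

    scannet_files is joined with ';' into a single argument, so no shell
    ever sees the separator.
    """
    argv = ["python", script, "--scannet-file", ";".join(scannet_files)]
    for flag, value in options.items():
        argv += [flag, str(value)]
    return argv


sr3d_eval = build_command(
    "evaluation.py",
    ["./scannet/scannet_00_views.pkl", "./scannet/scannet_0x_views.pkl"],
    {
        "--refer_test_file": "./data/referit3d/sr3d_test.csv",
        "--weight": "./checkpoints/ckpt_sr3d.pth",
        "--bert-pretrain-path": "/pretrained/bert",
    },
)

# Print a copy-pasteable, correctly quoted shell command.
print(shlex.join(sr3d_eval))

# To actually run it:
# import subprocess
# subprocess.run(sr3d_eval, check=True)
```

The same helper covers the training commands by passing `"train.py"` and the training flags in `options`.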

Training

# nr3d
python train.py \
    --scannet-file ./scannet/scannet_00_views.pkl \
    --refer_train_file ./data/referit3d/nr3d_train.csv \
    --refer_val_file ./data/referit3d/nr3d_test.csv \
    --pn-path ./pretrained/ckpt_cls40.pth \
    --n-workers 8 \
    --batch-size 36 \
    --bert-pretrain-path /pretrained/bert

# sr3d (quote the value: an unquoted ";" ends the shell command)
python train.py \
    --scannet-file "./scannet/scannet_00_views.pkl;./scannet/scannet_0x_views.pkl" \
    --refer_train_file ./data/referit3d/sr3d_train.csv \
    --refer_val_file ./data/referit3d/sr3d_test.csv \
    --pn-path ./pretrained/ckpt_cls40.pth \
    --n-workers 8 \
    --batch-size 36 \
    --bert-pretrain-path /pretrained/bert

Acknowledgment

Our code references the following codebases. We gratefully thank the authors for their wonderful work.
referit3d, ScanRefer, MVT-3DVG, VQA_ReGAT
