Official implementation of OpenMap: A zero-shot open-vocabulary visual-language mapping framework for instruction grounding (ACM MM 2025)
Clone our project:
git clone https://github.com/openmap-project/OpenMap.git
conda create --name openmap python=3.8
conda activate openmap
conda install -y pytorch==2.0.0 torchvision==0.15.0 pytorch-cuda=11.8 -c pytorch -c nvidia
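To verify that the CUDA build of PyTorch is active, here is a quick optional sanity check (our addition, not part of the official setup):

# Sanity check: the CUDA build of PyTorch should report version 2.0.0 and CUDA 11.8.
import torch

print(torch.__version__)          # expected: 2.0.0
print(torch.version.cuda)         # expected: 11.8
print(torch.cuda.is_available())  # expected: True on a machine with a working NVIDIA driver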
mmengine installation:
pip install -U openmim
python -m pip install https://github.com/openmmlab/mmengine/archive/refs/tags/v0.8.5.zip
mmcv installation (note that mmcv versions older than the commit pinned below may cause bugs):
TORCH_CUDA_ARCH_LIST="{COMCAP}" TORCH_NVCC_FLAGS="-Xfatbin -compress-all" CUDA_HOME=$(dirname $(dirname $(which nvcc))) LD_LIBRARY_PATH=$(dirname $(dirname $(which nvcc)))/lib MMCV_WITH_OPS=1 FORCE_CUDA=1 python -m pip install git+https://github.com/open-mmlab/mmcv.git@4f65f91db6502d990ce2ee5de0337441fb69dd10
Please determine the Compute Capability (COMCAP) of your GPU and substitute it for {COMCAP} in the command above. For example, you can ask an LLM assistant or search with the following query:
What is the `Compute Capability` of NVIDIA {YOUR GPU MODEL}? Please only output the number, without text.
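Alternatively, if PyTorch is already installed, you can query the value locally; a minimal sketch:

# Print the compute capability of GPU 0 in the "major.minor" form expected by TORCH_CUDA_ARCH_LIST.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"{major}.{minor}")  # e.g. 8.6 for an RTX 3090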
Other relevant packages:
python -m pip install \
  https://github.com/open-mmlab/mmdetection/archive/refs/tags/v3.1.0.zip \
  https://github.com/open-mmlab/mmsegmentation/archive/refs/tags/v1.1.1.zip \
  https://github.com/open-mmlab/mmpretrain/archive/refs/tags/v1.0.1.zip
python -m pip install \
  git+https://github.com/cocodataset/panopticapi.git \
  git+https://github.com/HarborYuan/lvis-api.git \
  tqdm terminaltables pycocotools scipy ftfy regex timm scikit-image kornia
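After installation, the following optional check (our addition, not part of the original instructions) confirms that the OpenMMLab packages import correctly and that the mmcv CUDA ops were compiled:

# Verify the OpenMMLab stack; the mmcv.ops import fails if mmcv was built without MMCV_WITH_OPS=1.
import mmengine, mmcv, mmdet, mmseg, mmpretrain
from mmcv.ops import nms

print(mmengine.__version__)    # expected: 0.8.5
print(mmcv.__version__)
print(mmdet.__version__)       # expected: 3.1.0
print(mmseg.__version__)       # expected: 1.1.1
print(mmpretrain.__version__)  # expected: 1.0.1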
The official CropFormer installation consists of two steps: installing detectron2 and then CropFormer. For convenience, we have combined the two steps into the following script. If you run into any problems, please refer to the original CropFormer installation guide.
cd third_party
git clone git@github.com:facebookresearch/detectron2.git
cd detectron2
pip install -e .
cd ../
git clone git@github.com:qqlu/Entity.git
cp -r Entity/Entityv2/CropFormer detectron2/projects
cd detectron2/projects/CropFormer/entity_api/PythonAPI
make
cd ../..
cd mask2former/modeling/pixel_decoder/ops
sh make.sh
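As a quick sanity check (our addition), verify that detectron2 and the compiled pixel-decoder op can be imported; the module name MultiScaleDeformableAttention is an assumption based on the upstream Mask2Former setup script:

# Check detectron2 and the CUDA op built by make.sh.
import detectron2
print(detectron2.__version__)

# Assumption: make.sh installs the deformable-attention extension under this name,
# as in upstream Mask2Former; adjust if your build differs.
import MultiScaleDeformableAttention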
We provide an additional script in CropFormer to sequentially process all input sequences.
# Modify the relative imports in the following files:
#   third_party/detectron2/projects/CropFormer/demo_cropformer/predictor.py (the `..mask2former` import)
#   third_party/detectron2/projects/CropFormer/mask2former/modeling/criterion_view.py (the `..util` import)
Finally, download the CropFormer checkpoint and modify the 'cropformer_path' variable in script.py.
Install the OpenCLIP library:
pip install open_clip_torch
The CLIP checkpoint is downloaded automatically when you run the script. If you prefer to download it manually, you can download it from here and pass its local path when loading the CLIP model with the 'create_model_and_transforms' function.
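A minimal sketch of loading a manually downloaded checkpoint with OpenCLIP; the model name 'ViT-H-14' and the file path below are placeholders, so match them to the checkpoint actually configured for OpenMap:

# Load an OpenCLIP model from a local checkpoint file instead of auto-downloading.
# The model name and checkpoint path are placeholders.
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14",
    pretrained="/path/to/downloaded/open_clip_checkpoint.bin",
)
tokenizer = open_clip.get_tokenizer("ViT-H-14")
model.eval()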
pip install -r requirements.txt
pip install timm==1.0.14
2.1 ScanNet
Please follow the official ScanNet guide to sign the agreement and send it to scannet@googlegroups.com. After receiving a response, you can download the data. We use only the ScanNet validation split, which contains 312 scenes. You only need to download the following file types: ['.aggregation.json', '.sens', '.txt', '_vh_clean_2.0.010000.segs.json', '_vh_clean_2.ply', '_vh_clean_2.labels.ply']. Please also enable the 'label_map' option to download the 'scannetv2-labels.combined.tsv' file. After downloading, you will have the following folder structure:
data/scannet
├── raw
│   ├── scans
│   │   ├── scene0011_00
│   │   │   ├── scene0011_00.sens
│   │   │   ├── scene0011_00_vh_clean_2.ply
│   │   │   ├── scene0011_00_vh_clean_2.0.010000.segs.json
│   │   │   ├── scene0011_00.aggregation.json
│   │   │   ├── scene0011_00_vh_clean_2.labels.ply
│   │   │   └── scene0011_00.txt
│   │   ├── scene0011_01
│   │   └── ...
│   └── scannetv2-labels.combined.tsv
After downloading the data, run the following scripts to preprocess it. Please update the 'raw_data_dir', 'target_data_dir', 'split_file_path', 'label_map_file', and 'gt_dir' variables before running.
cd dataset/preprocess/scannet
python process_val.py
python prepare_gt.py
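For reference, a hypothetical example of how these variables might be set at the top of the preprocessing scripts (the variable names are from above; all path values are placeholders, so adapt them to your machine):

# Example path configuration for dataset/preprocess/scannet (placeholder values).
raw_data_dir    = "data/scannet/raw/scans"                          # downloaded ScanNet scans
target_data_dir = "data/scannet/processed"                          # output of process_val.py
split_file_path = "splits/scannet_val.txt"                          # hypothetical name of the validation split file
label_map_file  = "data/scannet/raw/scannetv2-labels.combined.tsv"  # label map downloaded above
gt_dir          = "data/scannet/gt"                                 # output of prepare_gt.py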
After running the scripts, the resulting directory structure will be:
data/scannet
├── processed
│   ├── scene0011_00
│   │   ├── pose                          <- folder with camera poses
│   │   │   ├── 0.txt
│   │   │   ├── 10.txt
│   │   │   └── ...
│   │   ├── color                         <- folder with RGB images
│   │   │   ├── 0.jpg (or .png/.jpeg)
│   │   │   ├── 10.jpg (or .png/.jpeg)
│   │   │   └── ...
│   │   ├── depth                         <- folder with depth images
│   │   │   ├── 0.png (or .jpg/.jpeg)
│   │   │   ├── 10.png (or .jpg/.jpeg)
│   │   │   └── ...
│   │   ├── intrinsic
│   │   │   └── intrinsic_depth.txt       <- camera intrinsics
│   │   └── scene0011_00_vh_clean_2.ply   <- point cloud of the scene
│   └── ...
└── gt                                    <- folder with ground truth 3D instance masks
    ├── scene0011_00.txt
    └── ...
2.2 Matterport3D
Please follow the official Matterport3D guide to sign the agreement and download the data. We use a subset of its test scenes to keep Mask3D within memory constraints. The list of scenes we use can be found in splits/mpr2r.txt. Download only the following: ['undistorted_color_images', 'undistorted_depth_images', 'undistorted_camera_parameters', 'house_segmentations']. After downloading, unzip the files. The expected directory structure is as follows (the paths in 'dataset/preprocess/matterport3d/process.py' and 'dataset/matterport.py' can be modified if needed):
data/matterport3d/scans
├── 2t7WUuJeko7
│   └── 2t7WUuJeko7
│       ├── house_segmentations
│       │   ├── 2t7WUuJeko7.ply
│       │   └── ...
│       ├── undistorted_camera_parameters
│       │   └── 2t7WUuJeko7.conf
│       ├── undistorted_color_images
│       │   ├── xxx_i0_0.jpg
│       │   └── ...
│       └── undistorted_depth_images
│           ├── xxx_d0_0.png
│           └── ...
├── ARNzJeq3xxb
├── ...
└── YVUC4YcDtcY
Please change the category_mapping path in dataset/preprocess/matterport3d/constants.py, and the raw_data_dir, gt_dir, and split_file_path variables in dataset/preprocess/matterport3d/process.py, then run the following script to prepare the ground-truth annotations:
cd dataset/preprocess/matterport3d
python process.py
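For reference, a hypothetical sketch of the paths involved (the variable names come from the scripts above; the values are placeholders):

# dataset/preprocess/matterport3d/constants.py (placeholder value)
category_mapping = "data/matterport3d/category_mapping.tsv"  # hypothetical location of the Matterport3D category mapping file

# dataset/preprocess/matterport3d/process.py (placeholder values)
raw_data_dir    = "data/matterport3d/scans"
gt_dir          = "data/matterport3d/gt"
split_file_path = "splits/mpr2r.txt"  # scene list mentioned above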
Before running experiments, please modify the project_root field in conf/base_config.yaml and the checkpoint file path in third_party/ovsam/sam_r50x16_fpn.py.
Download all checkpoint files from here and place them under ${project_root}/third_party/ovsam/ckpt/.
You can run the entire pipeline with a single command:
bash run.sh
Alternatively, you can run the pipeline step by step (make sure to adjust the configuration files according to your directory structure):
#!/bin/bash
python -m apps.get_segmentation
python -m third_party.ovsam.semantic_handler
python -m apps.setup_graph
python -m apps.get_clip_feature
python -m apps.export_instances
python -m evaluation.evaluate