ISAR

Open-Vocabulary Instance Segmentation and Re-identification Benchmark

Reliably detecting, segmenting, and re-identifying objects is crucial for effective 3D perception and object-level SLAM systems, and achieving it should not require millions of training examples. We created this benchmark to highlight this issue and to expedite research into single-shot and few-shot object instance segmentation and re-identification algorithms.

Getting started

This work has been tested in a Python 3.9 environment.

  1. Install dependencies

    pip install cython
    pip install -r requirements.txt
    
  2. Download the dataset (>70 GB)

dataset release pending

Replicate results

  1. Download model weights

    python3 ./isar/util/download_model_weights.py
    
  2. Run the benchmark

    python3 benchmark.py (...)
    

Benchmark

CLI arguments:

  • -mc, --method_config - path to method config file
  • -d, --datadir - path to directory of dataset
  • -o, --outdir - path to output directory
  • -dev, --device - device to use. Choices: ["cpu", "cuda"]
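The flags above map onto a standard argparse interface. A minimal sketch of how such a parser might look (flag names are taken from the list above; the defaults and help strings are assumptions, not the actual benchmark.py source):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of benchmark.py's CLI; defaults are assumptions.
    parser = argparse.ArgumentParser(description="Run the ISAR benchmark")
    parser.add_argument("-mc", "--method_config",
                        help="path to method config file")
    parser.add_argument("-d", "--datadir",
                        help="path to directory of dataset")
    parser.add_argument("-o", "--outdir",
                        help="path to output directory")
    parser.add_argument("-dev", "--device", choices=["cpu", "cuda"],
                        default="cpu", help="device to use")
    return parser

args = build_parser().parse_args(["-d", "data/ISAR", "-o", "out", "-dev", "cuda"])
print(args.device)  # cuda
```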

Implement new method:

The easiest way to test a new method on the dataset and receive results in the same format as the baseline method is:

  1. Create new class inheriting from detector.GenericDetector
  2. Implement all member functions
  3. Replace the detector in benchmark.Benchmark with your own implementation
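The steps above can be sketched as follows. The interface of `detector.GenericDetector` is not reproduced here, so the stand-in base class and its method names (`train`, `inference`) are assumptions for illustration only:

```python
import numpy as np

class GenericDetector:
    """Stand-in for isar's detector.GenericDetector (interface assumed)."""
    def train(self, prompts: dict) -> None:
        raise NotImplementedError
    def inference(self, image: np.ndarray) -> np.ndarray:
        raise NotImplementedError

class MyDetector(GenericDetector):
    """Example custom method: remembers the prompted instance ids and
    returns an all-background mask (replace with real logic)."""
    def train(self, prompts: dict) -> None:
        # Store the single-/multi-shot prompts for later matching.
        self.instance_ids = sorted(prompts)
    def inference(self, image: np.ndarray) -> np.ndarray:
        # Return a per-pixel semantic-id map; 0 means background.
        return np.zeros(image.shape[:2], dtype=np.uint16)

detector = MyDetector()
detector.train({3: "cup", 7: "chair"})
mask = detector.inference(np.zeros((480, 640, 3), dtype=np.uint8))
```

An instance of such a class would then be swapped in for the baseline detector inside benchmark.Benchmark.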

Folder structure:

The datasets are structured as follows:

Dataset_name
|--multi_object
--|--task_name
----|--info.json (task info)
----|--train
------|--scene_name
--------|--attributes.json (scene attributes)
--------|--camera_poses.json (6DOF camera pose of each frame)
--------|--color_map.json (unique mapping: semantic_id->rgb_color of scene)
--------|--prompts_single.json (prompts for single-shot case)
--------|--prompts_multi.json (prompts for multi-shot case)
--------|--rgb
----------|--xxxxxxx.jpg
----------|-- ...
--------|--(optional: depth)
----------|--xxxxxxx.png
----------|-- ...
------|-- ...
----|--test
------|--scene_name
--------|--attributes.json (scene attributes)
--------|--camera_poses.json (6DOF camera pose of each frame)
--------|--color_map.json (unique mapping: semantic_id->rgb_color of scene)
--------|--rgb
----------|--xxxxxxx.jpg
----------|-- ...
--------|--(optional: depth)
----------|--xxxxxxx.png
----------|-- ...
--------|--semantic (this is used for visualization)
----------|--xxxxxxx.png
----------|-- ...
--------|--semantic_raw (this is used for eval)
----------|--xxxxxxx.png
----------|-- ...
------|-- ...
--|-- ...
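Given that layout, the scenes of a task can be enumerated with a few lines of `pathlib`. This is a sketch written against the tree above, not a utility shipped with the repository; the demo builds a matching dummy tree in a temporary directory:

```python
import json
import tempfile
from pathlib import Path

def list_scenes(dataset_root, task, split="train"):
    """Return the scene directories of one task/split, following the tree above."""
    task_dir = Path(dataset_root) / "multi_object" / task
    info = json.loads((task_dir / "info.json").read_text())  # task info
    return sorted(p for p in (task_dir / split).iterdir() if p.is_dir())

# Tiny self-contained demo on a dummy dataset tree.
root = Path(tempfile.mkdtemp())
scene = root / "multi_object" / "task_a" / "train" / "scene_0"
(scene / "rgb").mkdir(parents=True)
(root / "multi_object" / "task_a" / "info.json").write_text('{"name": "task_a"}')
print([s.name for s in list_scenes(root, "task_a")])  # ['scene_0']
```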

Acknowledgements

The ISAR benchmark dataset is a synthetic dataset built with the AI-Habitat simulator using data from the Replica Dataset, the Habitat-Matterport 3D Dataset, and the YCB Object and Model Set. It additionally uses the objects mentioned in ./isar/attribution/README.md .

The baseline method of ISAR builds on and utilizes previous works such as Segment Anything and DINOv2. Legacy versions of the method build on OW-DETR (which builds on Deformable DETR, Detreg, and OWOD) and CLIP.

When using the dataset in your research, please also cite:
