ImageBind with SAM Simple Demo: Segment with Different Modalities

Thanks a lot for releasing such amazing work!

We implemented a simple and interesting demo by combining ImageBind with SAM: [ImageBind-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/playground/ImageBind_SAM). It can segment things using different modalities, and the project is still under development.

The basic idea follows [IEA: Image Editing Anything](https://github.com/feizc/IEA) and [CLIP-SAM](https://github.com/maxi-w/CLIP-SAM), which generate the referring mask with the following steps:

- Step 1: Generate automatic masks with `SamAutomaticMaskGenerator`
- Step 2: Crop the box region of every mask
- Step 3: Compute the similarity between the cropped images and the input from the other modality
- Step 4: Merge the mask regions with the highest similarity
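Steps 2 to 4 can be sketched roughly as follows. This is a minimal NumPy illustration, not the demo's actual code: it assumes the masks are dicts with `segmentation` and `bbox` (x, y, w, h) keys, as `SamAutomaticMaskGenerator` returns them, and that the embeddings of each crop and of the query (audio or text) have already been computed with ImageBind. The helper names `crop_boxes`, `crop_scores`, and `merge_masks`, the softmax scoring, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def crop_boxes(image, sam_masks):
    """Step 2: crop the bounding box of every mask (bbox is x, y, w, h)."""
    crops = []
    for m in sam_masks:
        x, y, w, h = (int(v) for v in m["bbox"])
        crops.append(image[y:y + h, x:x + w])
    return crops

def crop_scores(crop_embs, query_emb):
    """Step 3: softmax over dot products of crop and query embeddings.
    (Assumed scoring rule; the real demo may differ.)"""
    logits = crop_embs @ query_emb
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

def merge_masks(sam_masks, scores, threshold):
    """Step 4: union of all masks scoring above the threshold.
    Falls back to the single best mask if none passes."""
    keep = np.flatnonzero(scores > threshold)
    if keep.size == 0:
        keep = np.array([int(np.argmax(scores))])
    merged = np.zeros_like(sam_masks[0]["segmentation"], dtype=bool)
    for i in keep:
        merged |= sam_masks[i]["segmentation"].astype(bool)
    return merged

# Toy stand-ins: a 4x6 image split into a left and a right mask.
image = np.zeros((4, 6, 3), dtype=np.uint8)
left = np.zeros((4, 6), dtype=bool)
left[:, :3] = True
sam_masks = [
    {"segmentation": left, "bbox": [0, 0, 3, 4]},
    {"segmentation": ~left, "bbox": [3, 0, 3, 4]},
]
crops = crop_boxes(image, sam_masks)

# Made-up embeddings standing in for ImageBind outputs.
crop_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
query_emb = np.array([1.0, 0.0])

scores = crop_scores(crop_embs, query_emb)
merged = merge_masks(sam_masks, scores, threshold=0.5)
```

In a real run, `crops` would be passed through ImageBind's vision encoder to obtain `crop_embs`, and `query_emb` would come from the audio or text encoder.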

The results are shown below:

<div align="center">

| Input Image | Modality | Generated Mask |
|:----:|:----:|:----:|
| ![](https://github.com/IDEA-Research/Grounded-Segment-Anything/blob/main/playground/ImageBind_SAM/.assets/car_image.jpg?raw=true) | [car audio](https://github.com/IDEA-Research/Grounded-Segment-Anything/blob/main/playground/ImageBind_SAM/.assets/car_audio.wav) | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/imagebind_sam/audio_sam_merged_mask_new.jpg?raw=true) |
| ![](https://github.com/IDEA-Research/Grounded-Segment-Anything/blob/main/playground/ImageBind_SAM/.assets/car_image.jpg?raw=true) | "A car" | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/imagebind_sam/text_sam_merged_mask.jpg?raw=true) |

</div>

The threshold applied to each box strongly influences the final result; we will run more tests on it!
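To illustrate why the threshold matters, here is a small sketch with made-up similarity scores. The `select_masks` helper and the score values are hypothetical and are not taken from the demo; they only show how the set of masks that gets merged shrinks as the threshold rises.

```python
def select_masks(scores, threshold):
    """Return the indices of masks whose similarity score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]

# Made-up per-crop similarity scores, for illustration only.
scores = [0.55, 0.30, 0.10, 0.05]

for threshold in (0.05, 0.2, 0.5):
    print(threshold, select_masks(scores, threshold))
```

A low threshold merges many loosely related regions into the final mask, while a high one may keep only a single region or none at all.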

