-
Notifications
You must be signed in to change notification settings - Fork 827
Open
Description
Thanks a lot for release such an amazing work!
We implement a simple and interesting demo by combing ImageBind with SAM here: ImageBind-SAM which can segment things with different modalities, and the project is still under develop
This basic idea is followed with IEA: Image Editing Anything and CLIP-SAM which generate the referring mask with the following steps:
- Step 1: Generate auto masks with
SamAutomaticMaskGenerator - Step 2: Crop all the box region from the masks
- Step 3: Compute the similarity with cropped images and different modalities
- Step 4: Merge the highest similarity mask region
And the result is shown as:
| Input Model | Modality | Generate Mask |
|---|---|---|
![]() |
car audio | ![]() |
![]() |
"A car" | ![]() |
And the threshold for each box will influence a lot on the final result, we will do more test on it!
TLi347, juandavidGF, missflash, FrancescoSaverioZuppichini, vasylcf and 1 morerentainhe
Metadata
Metadata
Assignees
Labels
No labels


