This repository has been archived by the owner on Jul 1, 2024. It is now read-only.

Text as prompts #93

Open
peiwang062 opened this issue Apr 8, 2023 · 31 comments
Labels
enhancement (New feature or request)

Comments

@peiwang062

Thanks for releasing this wonderful work!
I saw the demo shows examples of using points and boxes as input prompts. Does the demo support text as a prompt?

@stefanjaspers

Following! Text prompting has been mentioned in the research paper but hasn't been released yet. Really looking forward to this feature because I need it for a specific use case.

@darvilabtech

Exactly, waiting for it to be released.

@HaoZhang990127

Thank you for your exciting work!

I also want to use text as a prompt to generate masks in my project. Right now I am using CLIPSeg to generate the masks, but it does not perform well on fine-grained semantics.

When do you plan to open-source the text-as-prompt code? What is the approximate timeline? Waiting for this amazing work.

@jy00161yang

following

1 similar comment
@eware-godaddy

following

@0xbitches

The paper mentions they used CLIP to handle text prompts:

> We represent points and boxes by positional encodings [95] summed with learned embeddings for each prompt type and free-form text with an off-the-shelf text encoder from CLIP [82].

The demo does not seem to allow textual inputs, though.
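For context, the "off-the-shelf text encoder" piece is easy to try on its own with open_clip; what was never released is how the resulting embedding gets mapped into SAM's prompt-embedding space, so the projection layer in this sketch is a purely hypothetical stand-in:

```python
import torch
import open_clip

# Off-the-shelf CLIP text encoder (the paper does not say which variant was used).
model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

with torch.no_grad():
    tokens = tokenizer(["a photo of a cat"])
    text_emb = model.encode_text(tokens)      # shape (1, 512)

# Hypothetical: SAM's released prompt encoder works with 256-d sparse prompt tokens,
# and the learned text-to-prompt mapping was not published, so this projection is
# only a placeholder for illustration.
to_prompt_dim = torch.nn.Linear(text_emb.shape[-1], 256)
sparse_text_prompt = to_prompt_dim(text_emb)  # would be fed to the mask decoder
```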

@darvilabtech

darvilabtech commented Apr 9, 2023

@peiwang062 @stefanjaspers @HaoZhang990127 @eware-godaddy @jy00161yang
https://github.com/IDEA-Research/Grounded-Segment-Anything does what we are all looking for.

@peiwang062
Author

> @peiwang062 @stefanjaspers @HaoZhang990127 @eware-godaddy @jy00161yang https://github.com/IDEA-Research/Grounded-Segment-Anything does what we are all looking for.

Yes, we could simply combine the two, but if SAM can do it better on its own, why would we need two models? We also don't know whether Grounding DINO is the bottleneck if we just feed its output to SAM.

@alexw994

> @peiwang062 @stefanjaspers @HaoZhang990127 @eware-godaddy @jy00161yang https://github.com/IDEA-Research/Grounded-Segment-Anything does what we are all looking for.

Why not use the output of SAM as a bounding box?

@narbhar

narbhar commented Apr 10, 2023

> > @peiwang062 @stefanjaspers @HaoZhang990127 @eware-godaddy @jy00161yang https://github.com/IDEA-Research/Grounded-Segment-Anything does what we are all looking for.
>
> Why not use the output of SAM as a bounding box?

The current version of SAM, without the CLIP text encoder, only produces instances from point or bounding-box prompts. Thus SAM's output is just instances, with no semantic information attached to the segmentation. With a text encoder you could correlate SAM's output with text, such as an object of interest in an image. In the meantime, if you have a promptable text-based object detector you can bridge this gap yourself, so SAM's output is no longer generic instance segmentation; that is what Grounded Segment Anything helps to do.
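A minimal sketch of that detector-to-SAM bridge, assuming locally downloaded Grounding DINO and SAM checkpoints (the config/checkpoint file names below are placeholders for your local paths):

```python
import numpy as np
import torch
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

dino = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image_source, image = load_image("example.jpg")   # image_source: HxWx3 RGB numpy array
boxes, logits, phrases = predict(
    model=dino, image=image, caption="a dog",
    box_threshold=0.35, text_threshold=0.25,
)

# Grounding DINO returns normalized cxcywh boxes; convert to absolute xyxy for SAM.
h, w, _ = image_source.shape
boxes = boxes * torch.tensor([w, h, w, h])
xyxy = torch.stack(
    [boxes[:, 0] - boxes[:, 2] / 2, boxes[:, 1] - boxes[:, 3] / 2,
     boxes[:, 0] + boxes[:, 2] / 2, boxes[:, 1] + boxes[:, 3] / 2], dim=1
).numpy()

predictor.set_image(image_source)
masks = [
    predictor.predict(box=box, multimask_output=False)[0]  # (1, H, W) bool mask per box
    for box in xyxy
]
```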

@luca-medeiros

Put together a demo of grounded-segment-anything with Gradio for easier testing.
I tested using CLIP, OpenCLIP, and Grounding DINO. Grounding DINO performs much better. Less than 1 sec on an A100 for DINO+SAM. Maybe I'll add the CLIP versions as well.
https://github.com/luca-medeiros/lang-segment-anything

@alexw994

alexw994 commented Apr 10, 2023

> > > @peiwang062 @stefanjaspers @HaoZhang990127 @eware-godaddy @jy00161yang https://github.com/IDEA-Research/Grounded-Segment-Anything does what we are all looking for.
> >
> > Why not use the output of SAM as a bounding box?
>
> The current version of SAM, without the CLIP text encoder, only produces instances from point or bounding-box prompts. Thus SAM's output is just instances, with no semantic information attached to the segmentation. With a text encoder you could correlate SAM's output with text, such as an object of interest in an image. In the meantime, if you have a promptable text-based object detector you can bridge this gap yourself, so SAM's output is no longer generic instance segmentation; that is what Grounded Segment Anything helps to do.

If SAM just segments within a bounding box, I think many other methods could be used for this as well, like BoxInstSeg: https://github.com/LiWentomng/BoxInstSeg

@peiwang062
Author

> > > > @peiwang062 @stefanjaspers @HaoZhang990127 @eware-godaddy @jy00161yang https://github.com/IDEA-Research/Grounded-Segment-Anything does what we are all looking for.
> > >
> > > Why not use the output of SAM as a bounding box?
> >
> > The current version of SAM, without the CLIP text encoder, only produces instances from point or bounding-box prompts. Thus SAM's output is just instances, with no semantic information attached to the segmentation. With a text encoder you could correlate SAM's output with text, such as an object of interest in an image. In the meantime, if you have a promptable text-based object detector you can bridge this gap yourself, so SAM's output is no longer generic instance segmentation; that is what Grounded Segment Anything helps to do.
>
> If SAM just segments within a bounding box, I think many other methods could be used for this as well, like BoxInstSeg: https://github.com/LiWentomng/BoxInstSeg

It should be able to support boxes, points, masks and text as prompts as the paper mentions, no?

@nikolausWest

following

1 similar comment
@yash0307

following

@9p15p

9p15p commented Apr 11, 2023

Following

1 similar comment
@fyuf

fyuf commented Apr 11, 2023

Following

@nikhilaravi added the enhancement (New feature or request) label on Apr 12, 2023
@Zhangwenyao1

following

@Eli-YiLi

Our work can achieve text-to-mask with SAM:
https://github.com/xmed-lab/CLIP_Surgery

This is our work on CLIP's explainability. It is able to guide SAM to achieve text-to-mask without manual points.

Besides, it is very simple, requiring no fine-tuning and using only the CLIP model itself.

Furthermore, it enhances many open-vocabulary tasks, such as segmentation, multi-label classification, and multimodal visualization.

This is the Jupyter demo:
https://github.com/xmed-lab/CLIP_Surgery/blob/master/demo.ipynb

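For anyone who wants to prototype the same idea without the repo, here is a rough sketch: take a text-image similarity heatmap (CLIP Surgery derives one from CLIP itself; the helper below is just a placeholder), pick the hottest locations as point prompts, and hand them to SAM. Only the segment_anything calls are real API; the helper, image, and checkpoint path are assumptions.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def text_similarity_heatmap(image_rgb: np.ndarray, text: str) -> np.ndarray:
    # Placeholder: CLIP Surgery computes an HxW text-image similarity map from CLIP;
    # substitute that (or any dense text-image similarity method) here.
    h, w, _ = image_rgb.shape
    return np.random.rand(h, w)

image_rgb = np.zeros((512, 512, 3), dtype=np.uint8)   # replace with a real RGB image
heatmap = text_similarity_heatmap(image_rgb, "a cat")

# Pick the top-k hottest locations as foreground point prompts for SAM.
k = 5
idx = np.argpartition(heatmap.ravel(), -k)[-k:]
ys, xs = np.unravel_index(idx, heatmap.shape)
point_coords = np.stack([xs, ys], axis=1).astype(np.float32)  # SAM expects (x, y)
point_labels = np.ones(len(point_coords), dtype=int)          # 1 = foreground

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")  # local checkpoint
predictor = SamPredictor(sam)
predictor.set_image(image_rgb)
masks, scores, _ = predictor.predict(point_coords=point_coords,
                                     point_labels=point_labels,
                                     multimask_output=False)
```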

@zaojiahua

following

@FrancisDacian

@ignoHH

ignoHH commented Apr 21, 2023

following

2 similar comments
@bjccdsrlcr

following

@mydcxiao

following

@xuxiaoxxxx

+1

@daminnock

following

2 similar comments
@Alice1820

following

@freshman97

following

@zhangjingxian1998

waiting for it

@N-one

N-one commented Sep 15, 2023

following

1 similar comment
@moktsuiqin

following
