Is there a way to do multi-label classification with CLIP? #334

justlike-prog · 2023-01-02T10:34:55Z

The concrete use case is as follows. I have the classes baby, child, teen, and adult. My idea was to use the similarity between text and image features (for the text features I used the prompt 'there is at least one (c) in the photo', where c is one of the 4 classes).

I went through quite a lot of examples, but I am running into two issues: the similarity scores often vary widely within a fixed class, and/or classes that look alike (like baby and child) end up with very similar thresholds. For the similarity scores I use the cosine similarity multiplied by 2.5 to stretch the score into the interval [0, 1], as is done in the CLIPScore paper.

Setting a threshold in that sense doesn't seem possible.

Does anyone have an idea for that? I feel quite stuck on how to proceed.
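For readers following along, the scoring described in this question can be sketched roughly as below. This is only an illustration: the 3-dimensional vectors are made-up stand-ins for real CLIP image and text features, the function names are hypothetical, and the threshold of 0.7 is arbitrary. The rescaling follows the CLIPScore form w * max(cos, 0) with w = 2.5.

```python
import math

CLASSES = ["baby", "child", "teen", "adult"]
PROMPT = "there is at least one {} in the photo"  # prompt template from the question


def cosine(a, b):
    """Cosine similarity between two plain Python vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_score(image_feat, text_feat, w=2.5):
    """CLIPScore-style rescaling: w * max(cos, 0)."""
    return w * max(cosine(image_feat, text_feat), 0.0)


def predict_labels(image_feat, text_feats, thresholds):
    """Keep every class whose rescaled score reaches its own threshold."""
    scores = {c: clip_score(image_feat, text_feats[c]) for c in CLASSES}
    labels = [c for c in CLASSES if scores[c] >= thresholds[c]]
    return labels, scores


# Made-up features (assumption): in practice these would come from encoding
# the image and PROMPT.format(c) for each class with a CLIP model.
image_feat = [1.0, 0.0, 0.0]
text_feats = {
    "baby": [0.32, 0.95, 0.0],
    "child": [0.30, 0.95, 0.1],
    "teen": [0.15, 0.5, 0.85],
    "adult": [-0.1, 0.7, 0.7],
}
thresholds = {c: 0.7 for c in CLASSES}  # arbitrary per-class thresholds

labels, scores = predict_labels(image_feat, text_feats, thresholds)
print(labels, scores)
```

With these toy numbers, baby and child land within a few hundredths of each other, which is the kind of overlap that makes a single fixed threshold hard to pick.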

mitchellnw · 2023-01-02T17:37:12Z

Not sure if it would work, but have you by any chance looked at using captions like "this is a photo of a ','.join(subset)", where subset iterates over all subsets of your current classes? Then you'd have 2^4 classes instead of 4.
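A minimal sketch of that subset enumeration, under some assumptions: the caption for the empty subset is hypothetical (the suggestion above does not say what to use when no class is present), the separator is ", " rather than ",", and `predict_subset` simply takes the best-scoring caption from similarity scores that would have to be computed separately with CLIP.

```python
from itertools import combinations

CLASSES = ["baby", "child", "teen", "adult"]


def subset_captions(classes):
    """Map every subset of the classes (as a tuple) to one caption."""
    captions = {}
    for size in range(len(classes) + 1):
        for subset in combinations(classes, size):
            if subset:
                captions[subset] = "this is a photo of a " + ", ".join(subset)
            else:
                # Hypothetical caption for the empty subset (assumption).
                captions[subset] = "this is a photo with no people"
    return captions


def predict_subset(scores):
    """Return the subset whose caption has the highest similarity score."""
    return max(scores, key=scores.get)


captions = subset_captions(CLASSES)
print(len(captions))  # 2**4 subsets, including the empty one

# Made-up similarity scores, one per caption (assumption).
scores = {subset: 0.0 for subset in captions}
scores[("baby", "child")] = 1.0
predicted = predict_subset(scores)
print(predicted)
```

This turns the multi-label problem into a single-label choice among 16 captions, so the number of captions doubles with every added class.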

AmericanPresidentJimmyCarter · 2023-03-22T13:59:23Z

I am attempting this now, training on captions with multiple labels and then querying with single labels, and it works pretty badly compared to any normal multi-label classifier.

{'f1': 0.08291136675917679, 'precision': 0.07481833065257353, 'recall': 0.10588978264912757}

If I figure this out I will let you know.
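For context on how numbers like these are typically obtained, here is a small sketch of multi-label precision, recall, and F1. The comment above does not say which averaging was used; micro-averaging over all (image, label) pairs is assumed here, and the label sets are made up.

```python
def micro_prf(y_true, y_pred):
    """Micro-averaged precision, recall, and F1 over per-image label sets."""
    tp = fp = fn = 0
    for true, pred in zip(y_true, y_pred):
        true, pred = set(true), set(pred)
        tp += len(true & pred)   # labels predicted and correct
        fp += len(pred - true)   # labels predicted but wrong
        fn += len(true - pred)   # labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"f1": f1, "precision": precision, "recall": recall}


# Made-up ground truth and predictions for three images (assumption).
y_true = [{"baby", "adult"}, {"teen"}, {"child"}]
y_pred = [{"baby"}, {"teen", "adult"}, set()]

metrics = micro_prf(y_true, y_pred)
print(metrics)
```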

Msalehi237 · 2023-06-23T05:36:18Z

Take a look at this paper:
"DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations"

I struggled with this problem for a while and this approach is working for me.

travellingsasa · 2024-03-28T19:29:33Z

@AmericanPresidentJimmyCarter did you find a way to improve the multi-labelling performance?

AmericanPresidentJimmyCarter · 2024-03-31T18:15:59Z

No, I just trained multilabel classifiers instead and those worked.

miguelalba96 · 2024-04-19T12:04:26Z

@travellingsasa

You can use some sort of anti-text or placeholder text to do multi-label classification. For example:

If your objective is checking whether "red" is present in an image of a dress, then use:

["a red dress", "a dress"]

That will give you a probability distribution, and you take index zero.
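The prompt-pair idea above can be sketched as a two-way softmax. The similarity values are made up, and the scale of 100 is an assumption standing in for CLIP's learned logit scale; in a real setup both would come from the model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


LOGIT_SCALE = 100.0  # assumed temperature, not read from a real model

prompts = ["a red dress", "a dress"]
similarities = [0.30, 0.25]  # made-up image-text cosine similarities

probs = softmax([LOGIT_SCALE * s for s in similarities])
p_red = probs[0]  # index zero: probability assigned to the "red" prompt
print(dict(zip(prompts, probs)))
```

Each attribute gets its own prompt pair, so several attributes can be checked independently for the same image.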

AmericanPresidentJimmyCarter · 2024-04-20T20:21:45Z

@travellingsasa

> You can use some sort of anti-text or placeholder text to do multi-label classification. For example:
>
> If your objective is checking whether "red" is present in an image of a dress, then use:
> ["a red dress", "a dress"]
> That will give you a probability distribution, and you take index zero.

How does that work? If the image contains neither, your result will be essentially random. I think it only works if you have a multi-label classifier to identify a dress in the first place.
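The concern can be illustrated with a small numeric sketch. All similarity values are made up, the scale of 100 is an assumed stand-in for CLIP's logit scale, and `gated_presence` is a hypothetical helper that expects the "is there a dress at all" decision to come from some other classifier.

```python
import math


def pair_probability(sim_a, sim_b, logit_scale=100.0):
    """Two-way softmax over scaled similarities, returned for the first prompt."""
    return 1.0 / (1.0 + math.exp(-logit_scale * (sim_a - sim_b)))


# Image with a red dress: both prompts match, the "red" prompt a bit better.
p_dress = pair_probability(0.30, 0.27)

# Image with no dress at all: both prompts match weakly, and a tiny gap
# between them is still turned into a fairly confident-looking probability.
p_no_dress = pair_probability(0.12, 0.11)


def gated_presence(has_dress, sim_a, sim_b):
    """Only compare the prompt pair when a dress was detected upstream."""
    if not has_dress:
        return None  # the pairwise comparison is not meaningful here
    return pair_probability(sim_a, sim_b)


print(p_dress, p_no_dress, gated_presence(False, 0.12, 0.11))
```

In this toy case a similarity gap of only 0.01 on an unrelated image already yields a probability above 0.7, which is why a separate presence check is suggested before trusting the pairwise result.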
