Is there a way to do multi-label classification with CLIP? #334

Open
justlike-prog opened this issue Jan 2, 2023 · 7 comments
Comments

@justlike-prog

The concrete use case is as follows. I have the classes baby, child, teen, adult. My idea was to use the similarity between text and image features (for the text features I used the prompt 'there is at least one (c) in the photo', c being one of the 4 classes).

I went through quite a lot of examples, but I am running into the issue that the similarity scores often vary widely for a fixed class, and/or classes that look alike (like baby and child) end up with very similar thresholds. For similarity scores I use the cosine similarity multiplied by 2.5 to stretch the score into the interval [0, 1], as is done in the CLIP Score paper.

Setting a threshold in that sense doesn't seem possible.

Does anyone have an idea for that? I feel quite stuck on how I should proceed.
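
For readers following along, here is a minimal sketch of the setup described above: one prompt per class, a cosine similarity rescaled as `w * max(cos, 0)` with `w = 2.5`, and a cutoff per class. The embeddings are plain lists of floats standing in for CLIP image and text features, and `predict_labels` and its `thresholds` argument are hypothetical names introduced only for illustration. It shows the mechanics, not a way around the thresholding problem.

```python
import math

CLASSES = ["baby", "child", "teen", "adult"]
# One prompt per class, following the template quoted in the post.
PROMPTS = {c: f"there is at least one {c} in the photo" for c in CLASSES}


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_score(image_emb, text_emb, w=2.5):
    """Rescaled similarity: negative cosines are clipped to 0, then scaled by w."""
    return w * max(cosine(image_emb, text_emb), 0.0)


def predict_labels(image_emb, text_embs, thresholds):
    """Hypothetical helper: keep every class whose score reaches its own threshold.

    text_embs maps class name -> text embedding of that class's prompt;
    thresholds maps class name -> cutoff (assumed to be chosen by the user).
    """
    scores = {c: clip_score(image_emb, emb) for c, emb in text_embs.items()}
    labels = {c for c, s in scores.items() if s >= thresholds[c]}
    return labels, scores
```

Because each class gets its own entry in `thresholds`, any number of labels (including none) can fire for one image, which is what makes this multi-label rather than a single softmax over the four prompts.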

@mitchellnw
Contributor

Not sure if it would work, but have you by any chance looked at using captions like "this is a photo of a " + ','.join(subset), where subset iterates over all subsets of your current classes? Then you'd have 2^4 = 16 classes instead of 4.
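
A rough sketch of what that enumeration could look like, assuming the four classes from the original post. The caption for the empty subset is my own placeholder, since the suggestion does not say how to word it, and `predict_subset` is a hypothetical helper that just picks the best-scoring caption.

```python
from itertools import combinations

CLASSES = ["baby", "child", "teen", "adult"]


def subset_captions(classes):
    """Build one caption per subset of classes (2^n captions in total)."""
    captions = {}
    for size in range(len(classes) + 1):
        for subset in combinations(classes, size):
            if subset:
                captions[subset] = "this is a photo of a " + ",".join(subset)
            else:
                # Assumed wording for "none of the classes are present".
                captions[subset] = "this is a photo"
    return captions


def predict_subset(scores):
    """Hypothetical helper: scores maps subset -> similarity; return the best subset."""
    return max(scores, key=scores.get)
```

This turns the multi-label problem into a single-label one over 16 captions, so a plain argmax or softmax applies, but the number of captions doubles with every class added.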

@AmericanPresidentJimmyCarter

I am attempting this now, training on captions with multiple labels and then querying with single labels, and it works pretty badly compared to any normal multi-label classifier.

{'f1': 0.08291136675917679, 'precision': 0.07481833065257353, 'recall': 0.10588978264912757}

If I figure this out I will let you know.

@Msalehi237
Contributor

Take a look at this paper:
"DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations"

I struggled with this problem for a while and this approach is working for me.

@travellingsasa

@AmericanPresidentJimmyCarter did you find a way to improve the multi-labelling performance?

@AmericanPresidentJimmyCarter

No, I just trained multilabel classifiers instead and those worked.

@miguelalba96

@travellingsasa

You can use some sort of anti-text or placeholder text to do multi-label classification. For example:

if your objective is to check for the presence of "red" in an image of a dress, then use:

["a red dress", "a dress"]

That will give you a probability distribution over the two prompts, and you take the probability at index 0.
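
A small sketch of that idea, assuming you already have the cosine similarity of the image with each of the two prompts. The `logit_scale` of 100 is an assumption (roughly the temperature CLIP applies before its softmax), and `presence_probability` is a hypothetical name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def presence_probability(sim_with_attr, sim_without_attr, logit_scale=100.0):
    """Probability of the attribute prompt versus the placeholder prompt.

    sim_with_attr:    cosine similarity with e.g. "a red dress"
    sim_without_attr: cosine similarity with e.g. "a dress"
    logit_scale:      assumed temperature applied before the softmax
    """
    probs = softmax([logit_scale * sim_with_attr, logit_scale * sim_without_attr])
    return probs[0]  # index 0 is the prompt that contains the attribute
```

Running one such two-way softmax per attribute gives an independent yes/no score for each label, rather than one distribution shared across all labels.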

@AmericanPresidentJimmyCarter

> @travellingsasa
>
> You can use some sort of anti-text or placeholder text to do multi-label classification. For example:
>
> if your objective is to check for the presence of "red" in an image of a dress, then use:
>
> ["a red dress", "a dress"]
>
> That will give you a probability distribution over the two prompts, and you take the probability at index 0.

How does that work? If the image contains neither, your result will be essentially random. I think it only works if you have a multi-label classifier to identify a dress in the first place.
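
To make the concern concrete, here is a toy sketch with made-up similarity values. A two-way softmax depends only on the difference between the two similarities, so when both prompts match the image poorly, a tiny difference still produces a confident-looking score. The gate on `object_threshold` is a hypothetical stand-in for a separate "is there a dress at all" check, not something from the comments above, and the scale of 100 is an assumed temperature.

```python
import math


def pair_probability(sim_pos, sim_neutral, logit_scale=100.0):
    """Two-way softmax, index 0, written as a sigmoid of the scaled difference."""
    return 1.0 / (1.0 + math.exp(-logit_scale * (sim_pos - sim_neutral)))


def gated_presence(sim_pos, sim_neutral, object_threshold, logit_scale=100.0):
    """Hypothetical gate: abstain (None) unless at least one prompt matches well."""
    if max(sim_pos, sim_neutral) < object_threshold:
        return None  # neither prompt fits the image, so the pair score is not meaningful
    return pair_probability(sim_pos, sim_neutral, logit_scale)


# Illustrative values only: an image with no dress, both similarities low.
no_dress_a = pair_probability(0.12, 0.11)  # looks like "probably red"
no_dress_b = pair_probability(0.11, 0.12)  # looks like "probably not red"
```

Swapping two nearly equal similarities moves the score from about 0.73 to about 0.27, which is the "essentially random" behaviour described above; with the gate in place, the same inputs return `None` instead.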
