Is there a way to do multi-label classification with CLIP? #334
Not sure if it would work, but have you by any chance looked at using captions like
I am attempting this now, training on captions with multiple labels and then querying with single labels, and it works pretty badly compared to any normal multi-label classifier.
If I figure this out I will let you know.
Take a look at this paper: I struggled with this problem for a while, and this approach is working for me.
@AmericanPresidentJimmyCarter did you find a way to improve the multi-label performance?
No, I just trained multi-label classifiers instead, and those worked.
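The comment doesn't say how those classifiers were built. As a minimal sketch, assuming you already have one fixed embedding per image (for example frozen CLIP image features) and a 0/1 label vector per image, a multi-label classifier can be one independent logistic regression per label. Each label then gets its own sigmoid probability instead of competing with the others in a softmax. The function names `train_multilabel_probe` and `predict` are hypothetical, and pure Python is used only to keep the example self-contained.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_multilabel_probe(
    features: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    epochs: int = 500,
    lr: float = 0.5,
) -> List[List[float]]:
    """Fit one independent logistic regression per label (full-batch gradient descent).

    features: one fixed embedding per image (assumed, e.g. frozen CLIP image features).
    labels:   one 0/1 vector per image; any number of entries may be 1.
    Returns, for each label, `dim` weights followed by a bias term.
    """
    n, dim, n_labels = len(features), len(features[0]), len(labels[0])
    weights = [[0.0] * (dim + 1) for _ in range(n_labels)]
    for _ in range(epochs):
        for k in range(n_labels):
            wk = weights[k]
            grad = [0.0] * (dim + 1)
            for x, y in zip(features, labels):
                z = sum(wk[j] * x[j] for j in range(dim)) + wk[dim]
                err = sigmoid(z) - y[k]  # d(binary cross-entropy)/dz for label k
                for j in range(dim):
                    grad[j] += err * x[j]
                grad[dim] += err
            for j in range(dim + 1):
                wk[j] -= lr * grad[j] / n
    return weights


def predict(weights: List[List[float]], x: Sequence[float], threshold: float = 0.5) -> List[int]:
    # Each label is thresholded on its own sigmoid output, so several can be 1 at once.
    dim = len(x)
    probs = [sigmoid(sum(wk[j] * x[j] for j in range(dim)) + wk[dim]) for wk in weights]
    return [int(p >= threshold) for p in probs]


# Toy data: label 0 means "first feature positive", label 1 means "second feature positive".
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
Y = [[1, 1], [1, 0], [0, 1], [0, 0]]
W = train_multilabel_probe(X, Y)
print(predict(W, [1.0, 1.0]))  # → [1, 1]
```

With real data you would typically use a library loss such as a sigmoid plus binary cross-entropy, but the structure is the same: one output and one threshold per label.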
You can use some sort of anti-text or placeholder text to do multi-label classification. For example, if your objective is to check for the presence of "red" in an image of a dress, then use:
That will give you a probability distribution, and you take the zero index.
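The exact prompts from this comment did not survive. As a sketch, assuming index 0 is a positive prompt for the attribute and index 1 is a placeholder (anti-text) prompt, the "zero index" is the probability of the positive prompt in a two-way softmax over scaled cosine similarities. In practice the embeddings would come from CLIP's image and text encoders; here they are plain toy vectors, the function name `presence_probability` is hypothetical, and the scale of 100 mirrors the zero-shot example in the CLIP README. For several labels, you would run one such two-way comparison per label.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def presence_probability(image_emb, positive_emb, placeholder_emb, scale: float = 100.0) -> float:
    """Probability of the positive prompt in a two-way softmax against a placeholder prompt.

    Index 0 is the positive prompt, index 1 the placeholder (anti-text) prompt.
    """
    logits = [scale * cosine(image_emb, positive_emb),
              scale * cosine(image_emb, placeholder_emb)]
    return softmax(logits)[0]


img = [1.0, 0.2]
pos = [1.0, 0.0]  # stands in for the positive prompt's text embedding
neg = [0.0, 1.0]  # stands in for the placeholder prompt's text embedding
print(round(presence_probability(img, pos, neg), 3))  # → 1.0
```

Because the two probabilities must sum to 1, an image that matches neither prompt equally well still lands near 0.5 rather than near 0, which is the objection raised in the next comment.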
How does that work? If the image contains neither, your result will be essentially random. I think it only works if you have a multi-label classifier to identify a dress in the first place.
My concrete use case is as follows. I have the classes baby, child, teen, and adult. My idea was to use the similarity between text and image features (for the text features I used the prompt 'there is at least one {c} in the photo', with c being one of the four classes).
I went through quite a lot of examples, but I keep running into the issue that the similarity scores for a fixed class often vary a lot between images, and/or the classes that do appear may have very similar thresholds (like baby and child). For the similarity score I use the cosine similarity multiplied by 2.5, to stretch the score roughly into the interval [0, 1], as is done in the CLIPScore paper.
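For reference, the CLIPScore paper defines the score for a caption embedding c and an image embedding v as below. Negative cosines are clipped to 0, and w = 2.5 acts mainly as a rescaling constant, so the result is only roughly in [0, 1] and is not a calibrated probability.

```latex
\mathrm{CLIP\text{-}S}(c, v) = w \cdot \max\bigl(\cos(c, v),\, 0\bigr), \qquad w = 2.5
```

For example, a cosine of 0.30 gives 2.5 × 0.30 = 0.75, and a cosine of 0.28 gives 0.70. Multiplying by w is monotone, so it changes the scale but not which images score above which; on its own it cannot make one global threshold work across classes.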
Because of this, setting a fixed threshold doesn't seem possible.
Does anyone have an idea how to handle this? I feel quite stuck on how to proceed.
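One option, not something proposed in this thread, is to stop looking for a single threshold and instead fit one threshold per class on a small labeled validation set, for example the score cut-off that maximizes F1 for that class. The sketch below assumes you already have CLIPScore-style scores per image and class; the function names are hypothetical and the numbers are invented toy values, not measurements. It does not solve the case where two classes such as baby and child receive nearly identical scores for the same image.

```python
from typing import Dict, List, Sequence


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the observed score that, used as a cut-off, gives the highest F1 for one class."""
    best_t, best_f1 = float("inf"), -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


def calibrate(val_scores: Dict[str, List[float]], val_labels: Dict[str, List[int]]) -> Dict[str, float]:
    # One threshold per class, fitted on a labeled validation set.
    return {c: best_threshold(val_scores[c], val_labels[c]) for c in val_scores}


def predict_labels(scores: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    # Each class is decided independently, so zero, one, or several classes can fire.
    return [c for c, s in scores.items() if s >= thresholds[c]]


# Toy validation data: scores per class for four images, and whether each class is present.
val_scores = {"baby": [0.80, 0.75, 0.40, 0.35], "adult": [0.55, 0.30, 0.60, 0.25]}
val_labels = {"baby": [1, 1, 0, 0], "adult": [1, 0, 1, 0]}
thresholds = calibrate(val_scores, val_labels)
print(thresholds)  # → {'baby': 0.75, 'adult': 0.55}
print(predict_labels({"baby": 0.78, "adult": 0.58}, thresholds))  # → ['baby', 'adult']
```

The per-class threshold absorbs the fact that some prompts simply produce higher similarities than others, at the cost of needing labeled examples for every class.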