
[ZSL] Results don't match Hugging Face demo #44

Closed
ismailmaj opened this issue Aug 6, 2023 · 5 comments

Comments


ismailmaj commented Aug 6, 2023

./bin/zsl -m ../../laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.f16.bin --image  ../pic.png --text "playing music" --text "playing sports"
clip_model_load: loading model from '../../laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.f16.bin' - please wait....................................................clip_model_load: model size =   288.93 MB / num tensors = 397
clip_model_load: model loaded

playing music = 0.5308
playing sports = 0.4692

Expected results:
playing music = 1.000
playing sports = 0.000
https://huggingface.co/laion/CLIP-ViT-B-32-laion2B-s34B-b79K
[screenshot: Hugging Face demo results]
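
For reference, the demo's scores should be closely reproducible with the transformers pipeline. A minimal sketch (assuming the transformers and Pillow packages are installed; pic.png stands in for the test image):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("pic.png")
labels = ["playing music", "playing sports"]

# Tokenize the labels and preprocess the image the same way the hosted demo does.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds the scaled image-text similarities; a softmax over
# the labels yields the probabilities shown on the model page.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for label, p in zip(labels, probs):
    print(f"{label} = {p:.4f}")
```

If this prints roughly 1.000 / 0.000 while zsl prints 0.53 / 0.47 on the same image, the divergence is somewhere in clip.cpp's preprocessing or tokenization rather than in the weights.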

@mhmdjaouhari

Seconded. I was about to post a similar issue.

The results are often inaccurate. On some images it even inverts the labels, classifying X as Y and Y as X; it's not clear why this happens.

This library has great potential, so any help is much appreciated, @monatis. Thank you!

@ismailmaj (Author)

I think it's because the tokenization strategy differs from the HuggingFace CLIP tokenizer.
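
A quick way to test that hypothesis is to dump the reference token ids from transformers and compare them against what clip.cpp produces for the same string. A sketch (assuming the transformers package; the ids in the comment are illustrative):

```python
from transformers import CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

# CLIP's BPE tokenizer lowercases the input, applies the merge rules, and
# wraps the sequence in <|startoftext|>/<|endoftext|> special tokens.
# A difference in any of these steps (merges, special tokens, padding to the
# context length) would change the text embedding and therefore the scores.
ids = tok("playing music")["input_ids"]
print(ids)                            # e.g. [49406, ..., 49407]
print(tok.convert_ids_to_tokens(ids))
```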

@monatis (Owner) commented Sep 14, 2023

Fixed in #56

monatis closed this as completed Sep 14, 2023
@mhmdjaouhari

Thanks for the fix, @monatis! However, I'm still getting inaccurate results. For example, when trying to determine whether a photo shows a man or a woman, it almost always classifies women as men. Stranger still, in some cases the score for the text "man" is higher for some images of women than for some images of men! Please take a look at the examples below:

Expectation:

[screenshot: expected result for img/27.jpg]

Result:

$ ./build/bin/zsl -m ./ggml_openai_clip-vit-base-patch32/openai_clip-vit-base-patch32.ggmlv0.f16.bin --text woman --text man --image ./img/27.jpg

man = 0.9785
woman = 0.0215

Expectation:

[screenshot: expected result for img/29.jpg]

Result:

$ ./build/bin/zsl -m ./ggml_openai_clip-vit-base-patch32/openai_clip-vit-base-patch32.ggmlv0.f16.bin --text woman --text man --image ./img/29.jpg

man = 0.9889
woman = 0.0111

Expectation:

[screenshot: expected result for img/32.jpg]

Result:

$ ./build/bin/zsl -m ./ggml_openai_clip-vit-base-patch32/openai_clip-vit-base-patch32.ggmlv0.f16.bin --text woman --text man --image ./img/32.jpg

man = 0.9860
woman = 0.0140

As you can see, the photo of the man got 0.9785 as the score for the text "man", while the two photos of women got 0.9889 and 0.9860, which is very weird.
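
One thing to keep in mind when reading these numbers: the zsl output is a softmax over the scaled image-text similarities of just the two prompts, so a high "man" score only means "man" beat "woman" by a wide margin for that image; it says nothing about absolute similarity, and scores aren't comparable across images. A toy illustration with made-up similarity values:

```python
import math

def zsl_scores(similarities):
    # Softmax over the per-label similarities, as in zero-shot labeling.
    exps = [math.exp(s) for s in similarities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scaled similarities for ("man", "woman") on two images:
print(zsl_scores([24.0, 19.5]))  # -> [0.989, 0.011]
print(zsl_scores([22.0, 17.0]))  # -> [0.993, 0.007]: a higher "man" score
                                 #    despite a lower absolute similarity
```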


z3ugma commented Oct 30, 2023

I'll echo that I'm experiencing the same kind of bias when calling zsl with the classes "man" and "woman": it overwhelmingly predicts man, using CLIP-ViT-B-32-laion2B-s34B-b79K f16.
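
Before digging further into the implementation, it may be worth checking whether bare one-word labels are part of the problem: the CLIP paper found single words brittle for zero-shot classification and used prompt templates like "a photo of a {label}". A sketch comparing both phrasings in the reference transformers pipeline (assuming transformers and Pillow; test.jpg is a stand-in path):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)
image = Image.open("test.jpg")

# Compare bare labels against the prompt-template phrasing.
for texts in (["man", "woman"],
              ["a photo of a man", "a photo of a woman"]):
    inputs = processor(text=texts, images=image,
                       return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
    print({t: round(p.item(), 4) for t, p in zip(texts, probs)})
```

If the bias persists even in the reference implementation, it's a property of the model rather than a clip.cpp bug.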
