
[ZSL] Results don't match Hugging Face demo #44

Closed
ismailmaj opened this issue Aug 6, 2023 · 5 comments

Comments


ismailmaj commented Aug 6, 2023

./bin/zsl -m ../../laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.f16.bin --image  ../pic.png --text "playing music" --text "playing sports"
clip_model_load: loading model from '../../laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.f16.bin' - please wait....................................................clip_model_load: model size =   288.93 MB / num tensors = 397
clip_model_load: model loaded

playing music = 0.5308
playing sports = 0.4692

Expected results:
playing music = 1.000
playing sports = 0.000
https://huggingface.co/laion/CLIP-ViT-B-32-laion2B-s34B-b79K
[screenshot: Hugging Face demo results]
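
For reference, the demo's scores should be closely reproducible with the transformers pipeline. A minimal sketch (assuming the transformers and Pillow packages are installed; pic.png stands in for the test image):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("pic.png")
labels = ["playing music", "playing sports"]

# Tokenize the labels and preprocess the image the same way the hosted demo does.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds the scaled image-text similarities; a softmax over
# the labels yields the probabilities shown on the model page.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for label, p in zip(labels, probs):
    print(f"{label} = {p:.4f}")
```

If this prints roughly 1.000 / 0.000 while zsl prints 0.53 / 0.47 on the same image, the divergence is somewhere in clip.cpp's preprocessing or tokenization rather than in the weights.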

@mhmdjaouhari

Seconded. I was about to post a similar issue.

The results are often inaccurate. On some images it even inverts the labels, classifying X as Y and Y as X; it's not clear why this happens.

This library has great potential, so any help is much appreciated, @monatis. Thank you!

@ismailmaj (Author)

I think it's because the tokenization strategy differs from the HuggingFace CLIP tokenizer.
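
A quick way to test that hypothesis is to dump the reference token ids from transformers and compare them against what clip.cpp produces for the same string. A sketch (assuming the transformers package; the ids in the comment are illustrative):

```python
from transformers import CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

# CLIP's BPE tokenizer lowercases the input, applies the merge rules, and
# wraps the sequence in <|startoftext|>/<|endoftext|> special tokens.
# A difference in any of these steps (merges, special tokens, padding to the
# context length) would change the text embedding and therefore the scores.
ids = tok("playing music")["input_ids"]
print(ids)                            # e.g. [49406, ..., 49407]
print(tok.convert_ids_to_tokens(ids))
```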

@monatis (Owner) commented Sep 14, 2023

Fixed in #56

monatis closed this as completed Sep 14, 2023
@mhmdjaouhari

Thanks for the fix, @monatis! However, I'm still getting inaccurate results. For example, when trying to determine whether a photo shows a man or a woman, it almost always classifies women as men. Stranger still, in some cases the score for the text "man" is higher for some images of women than for some images of men! Please take a look at the examples below:

Expectation:

[screenshot: expected result for img/27.jpg]

Result:

$ ./build/bin/zsl -m ./ggml_openai_clip-vit-base-patch32/openai_clip-vit-base-patch32.ggmlv0.f16.bin --text woman --text man --image ./img/27.jpg

man = 0.9785
woman = 0.0215

Expectation:

[screenshot: expected result for img/29.jpg]

Result:

$ ./build/bin/zsl -m ./ggml_openai_clip-vit-base-patch32/openai_clip-vit-base-patch32.ggmlv0.f16.bin --text woman --text man --image ./img/29.jpg

man = 0.9889
woman = 0.0111

Expectation:

[screenshot: expected result for img/32.jpg]

Result:

$ ./build/bin/zsl -m ./ggml_openai_clip-vit-base-patch32/openai_clip-vit-base-patch32.ggmlv0.f16.bin --text woman --text man --image ./img/32.jpg

man = 0.9860
woman = 0.0140

As you can see, the photo of the man got 0.9785 as the score for the text "man", while the two photos of women got 0.9889 and 0.9860, which is very weird.
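
One thing to keep in mind when reading these numbers: the zsl output is a softmax over the scaled image-text similarities of just the two prompts, so a high "man" score only means "man" beat "woman" by a wide margin for that image; it says nothing about absolute similarity, and scores aren't comparable across images. A toy illustration with made-up similarity values:

```python
import math

def zsl_scores(similarities):
    # Softmax over the per-label similarities, as in zero-shot labeling.
    exps = [math.exp(s) for s in similarities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scaled similarities for ("man", "woman") on two images:
print(zsl_scores([24.0, 19.5]))  # -> [0.989, 0.011]
print(zsl_scores([22.0, 17.0]))  # -> [0.993, 0.007]: a higher "man" score
                                 #    despite a lower absolute similarity
```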


z3ugma commented Oct 30, 2023

I'll echo that I'm experiencing the same kind of bias when calling zsl with the classes "man" and "woman": it overwhelmingly predicts man, using CLIP-ViT-B-32-laion2B-s34B-b79K f16.
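
Before digging further into the implementation, it may be worth checking whether bare one-word labels are part of the problem: the CLIP paper found single words brittle for zero-shot classification and used prompt templates like "a photo of a {label}". A sketch comparing both phrasings in the reference transformers pipeline (assuming transformers and Pillow; test.jpg is a stand-in path):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)
image = Image.open("test.jpg")

# Compare bare labels against the prompt-template phrasing.
for texts in (["man", "woman"],
              ["a photo of a man", "a photo of a woman"]):
    inputs = processor(text=texts, images=image,
                       return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
    print({t: round(p.item(), 4) for t, p in zip(texts, probs)})
```

If the bias persists even in the reference implementation, it's a property of the model rather than a clip.cpp bug.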
