Evaluation bug when using GELU vs QuickGELU -- changes the results for some benchmarks #35

bryant1410 opened this issue Dec 18, 2023 · 1 comment

bryant1410 commented Dec 18, 2023

Hey, I believe there's a bug in the evaluation. The NegCLIP code is based on open_clip, which in turn supports both the original ViT-B/32 architecture, which uses QuickGELU (they name it ViT-B-32-quickgelu), and their "standard" one (ViT-B-32).
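
For context, the two activations are close but not identical: QuickGELU is the sigmoid-based approximation used in OpenAI's CLIP code, x * sigmoid(1.702 * x), whereas GELU is x * Φ(x). A minimal PyTorch sketch of the difference:

```python
import torch
import torch.nn as nn

class QuickGELU(nn.Module):
    """QuickGELU as used in OpenAI's CLIP / open_clip: x * sigmoid(1.702 * x)."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(1.702 * x)

x = torch.linspace(-3, 3, 7)
print(nn.GELU()(x))    # exact GELU: x * Phi(x)
print(QuickGELU()(x))  # sigmoid approximation -- small differences, but enough
                       # to shift benchmark numbers when it mismatches training
```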

When you use their code, you need to specify the model (config) and the pretrained checkpoint, where the pretrained checkpoint is either a name supported for the given model or a path. They support the "openai" checkpoint for both ViT-B-32 and ViT-B-32-quickgelu because they hardcode this pretrained checkpoint name to switch to a QuickGELU implementation, regardless of which of the two model names was used.

The problem is that ViT-B-32 also seems to have been used for the evaluation (by specifying a path as the pretrained checkpoint instead of "openai"). However, this makes the model use GELU instead of QuickGELU, because the hardcoded code path isn't triggered, and this affects the results. This is error-prone behavior from open_clip, in my humble opinion. The fix would be to use ViT-B-32-quickgelu in the evaluation, or to pass the flag --force-quick-gelu.
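
Concretely, if the evaluation loads the checkpoint through open_clip's create_model_and_transforms, either of the following should restore QuickGELU (a sketch based on my reading of the open_clip API; the checkpoint path is a placeholder):

```python
import open_clip

# Option 1: use the model config that already specifies QuickGELU
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32-quickgelu", pretrained="/path/to/negclip.pth")

# Option 2: keep ViT-B-32 but force QuickGELU
# (the Python counterpart of the --force-quick-gelu CLI flag)
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="/path/to/negclip.pth", force_quick_gelu=True)
```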

How do I know that you ran the evaluation this way (i.e., that you ran into this bug)? Because when I use GELU, I can reproduce your numbers from Table 6, but when I use QuickGELU, I get different numbers. I reproduced the numbers using a fork of open_clip and running my own evaluation of ARO, using the checkpoint you shared.

Numbers for NegCLIP:

| | VG-Attribution | VG-Relation | Flickr30k-Order | COCO-Order |
|---|---|---|---|---|
| Reported results from Table 6 | 71.0 | 81.0 | 91.0 | 86.0 |
| My evaluation using ViT-B-32 | 70.5 | 80.1 | 90.1 | 87.0 |
| My evaluation using ViT-B-32-quickgelu | 72.0 | 81.8 | 86.4 | 82.7 |

On top of this, the zero-shot results on other benchmarks significantly improve after fixing this bug:

| | CIFAR10 | CIFAR100 | ImageNet |
|---|---|---|---|
| My evaluation using ViT-B-32 | 85.9 | 60.9 | 55.8 |
| My evaluation using ViT-B-32-quickgelu | 88.9 | 62.9 | 61.0 |
| My evaluation for CLIP | 89.8 | 64.2 | 63.4 |

BTW, these results also match those reproduced by someone who commented in OpenReview and by #29.

As we can see, the numbers are much closer to the original pre-trained CLIP numbers when fixing this bug.


For the sake of completeness (and to check that my evaluation is consistent), I also reproduced the results for the OpenAI-pretrained CLIP ViT-B/32 on ARO:

| | VG-Attribution | VG-Relation | Flickr30k-Order | COCO-Order |
|---|---|---|---|---|
| Reported results from Table 6 for CLIP | 62.0 | 59.0 | 59.0 | 46.0 |
| My evaluation for CLIP | 63.4 | 59.9 | 59.2 | 47.7 |

ytaek-oh commented Mar 12, 2024

From the author's NegCLIP fork, I found that the NegCLIP model was fine-tuned from the openai:ViT-B-32 model, where QuickGELU is applied by default. So, in my opinion, to maintain consistency between training and evaluation, the NegCLIP model in this repo needs to be initialized with quick_gelu=True during evaluation, which is not set to True due to open_clip's behavior.
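
A quick way to check which activation a loaded checkpoint actually ends up with (the attribute names below are assumed from open_clip's transformer implementation, so adjust them if your version differs; the checkpoint path is a placeholder):

```python
import open_clip

model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="/path/to/negclip.pth")

# In open_clip's ResidualAttentionBlock, the MLP is a Sequential whose "gelu"
# entry is either nn.GELU or QuickGELU, depending on how the model was built.
first_block = model.visual.transformer.resblocks[0]
print(type(first_block.mlp.gelu))
# Prints nn.GELU when the quickgelu path was not triggered (the mismatch
# described above); it should be QuickGELU to match training.
```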

I also include my own evaluation results for NegCLIP:

| Model | quick_gelu | VG_rel | VG_attr | Flickr30k_Order | COCO_Order | Avg |
|---|---|---|---|---|---|---|
| NegCLIP-ViT-B-32 | False | 80.2 | 70.5 | 91.0 | 86.8 | 82.1 |
| NegCLIP-ViT-B-32 | True | 81.8 | 72.1 | 87.1 | 82.6 | 80.9 |

In addition, in line with your observation, setting quick_gelu=True is especially critical for zero-shot recognition performance, as my results below show:

| Model | quick_gelu | ImageNet | CIFAR10 | CIFAR100 |
|---|---|---|---|---|
| NegCLIP-ViT-B-32 | False | 55.7 | 85.9 | 61.2 |
| NegCLIP-ViT-B-32 | True | 60.9 | 88.9 | 63.2 |
