
Linear probing or KNN evaluation of BEiT #419

Closed
JegZheng opened this issue Aug 30, 2021 · 11 comments
@JegZheng

Hello,

Thanks for the great work!
When I read the paper, I found that the reported results are fine-tuned end-to-end, and I am curious how BEiT performs under kNN evaluation or linear probing, as is done with other pretraining methods such as contrastive learning.
I have recently been playing with the BEiT models (both BEiT-base-16patch and BEiT-large-16patch) provided here, and built kNN and linear classification scripts following the experiment settings in the paper. I got the following results, which look a bit weird to me:

For linear probing:
BEiT-large-16patch-224: 18.28% (50 epochs, with mean pooling) | 18.19% (50 epochs, w/o mean pooling)
BEiT-base-16patch-224: 20.84% (50 epochs, with mean pooling) | 2.0% (100 epochs, w/o mean pooling)

For KNN evaluation:
BEiT-large-16patch-224 top1/top5: 6.826/13.744 (with mean pooling) | 2.804/6.658 (w/o mean pooling)
BEiT-base-16patch-224 top1/top5: 4.492/9.88 (with mean pooling) | 2.236/5.632 (w/o mean pooling)

I guess there might be something wrong. Could you please help check whether there is an issue and, if possible, provide some results or scripts for kNN and linear probing evaluation? Thanks!
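For concreteness, here is a minimal sketch of the kind of kNN evaluation I mean, assuming features have already been extracted from the backbone (the cosine-similarity voting scheme is the one commonly used for self-supervised evaluation, not necessarily the exact scheme in my scripts):

```python
import numpy as np

def knn_classify(train_feats, train_labels, test_feats, k=20):
    """Nearest-neighbour classification on L2-normalised features,
    so the dot product equals cosine similarity."""
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T                       # (n_test, n_train) similarities
    topk = np.argsort(-sims, axis=1)[:, :k]     # indices of k nearest neighbours
    preds = []
    for row in topk:
        votes = np.bincount(train_labels[row])  # majority vote among neighbours
        preds.append(np.argmax(votes))
    return np.array(preds)

# Toy check with two well-separated clusters.
rng = np.random.default_rng(0)
train = np.concatenate([rng.normal(0, 0.1, (50, 8)) + 1,
                        rng.normal(0, 0.1, (50, 8)) - 1])
labels = np.array([0] * 50 + [1] * 50)
test = np.array([[1.0] * 8, [-1.0] * 8])
print(knn_classify(train, labels, test, k=5))  # → [0 1]
```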

@donglixp donglixp self-assigned this Aug 31, 2021
@donglixp
Contributor

Table 9 in the BEiT paper reports these numbers: https://openreview.net/pdf?id=p-BhZSz59o4

@woctezuma

woctezuma commented Oct 26, 2021

For reference: Table 9

I think DINO concatenates features for this task. So I would expect its results to have the "*" at the end of the line. Maybe I am missing something here.

@donglixp
Contributor

> For reference: Table 9
>
> I think DINO concatenates features for this task. So I would expect its results to have the "*" at the end of the line. Maybe I am missing something here.

@woctezuma DINO concatenates the [CLS] and averaged feature vectors, while iGPT concatenates features from different layers. In the table, we use "*" to denote the concatenation of multiple layers.
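A small sketch of that distinction, with hypothetical shapes (197 tokens of dimension 768; this is illustrative, not BEiT's actual extraction code):

```python
import numpy as np

# Hypothetical last-layer token features: (seq_len, dim); token 0 is [CLS].
tokens = np.random.default_rng(1).normal(size=(197, 768))

# DINO-style: concatenate [CLS] with the mean-pooled patch tokens
# of the same layer -> a 2*dim vector.
dino_feat = np.concatenate([tokens[0], tokens[1:].mean(axis=0)])

# iGPT-style "*": concatenate features taken from several different layers
# (here, the [CLS] token of three made-up layers).
layers = [np.random.default_rng(i).normal(size=(197, 768)) for i in range(3)]
igpt_feat = np.concatenate([layer[0] for layer in layers])

print(dino_feat.shape, igpt_feat.shape)  # → (1536,) (2304,)
```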

@LiweiPeng

@JegZheng Regarding your original post where linear probing accuracy is low (see below), did you end up getting significantly higher numbers? If so, could you share what fixes you made? I am using my own dataset for pretraining + linear probing, and the linear probing accuracy is below my expectations.

> For linear probing:
> BEiT-large-16patch-224 18.28% (50 epochs, with mean pooling) | 18.19% (50 epochs, w/o mean pooling)
> BEiT-base-16patch-224 20.84% (50 epochs, with mean pooling) | 2.0% (100 epochs, w/o mean pooling)

@JegZheng
Author

> @JegZheng Regarding your original post where linear probing accuracy is low (see below), did you end up getting significantly higher numbers? If so, could you share what fixes you made? I am using my own dataset for pretraining + linear probing, and the linear probing accuracy is below my expectations.
>
> For linear probing: BEiT-large-16patch-224 18.28% (50 epochs, with mean pooling) | 18.19% (50 epochs, w/o mean pooling) BEiT-base-16patch-224 20.84% (50 epochs, with mean pooling) | 2.0% (100 epochs, w/o mean pooling)

@LiweiPeng Sorry, I did not fix that issue. The linear probing setup can vary across different pretraining methods and datasets. Maybe we can ask the authors whether they have any suggestions.

@LiweiPeng

@JegZheng Thanks for the quick response. Based on the paper's results above, it seems that low linear probing results are expected.

One technique the paper uses to improve linear probing is selecting the best layer (e.g., the 9th layer for BEiT-base in the paper). How much difference could there be between the best layer and the last layer?

@addf400
Contributor

addf400 commented Mar 26, 2022

@LiweiPeng, @JegZheng We have uploaded our linear probing code; you can give it a try via this link.
The code trains a linear classifier for each transformer layer in a single forward pass, which saves a lot of compute.
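A rough numpy sketch of the idea (not the uploaded code itself): collect features from every layer during one forward pass, then fit an independent linear head per layer on those cached features. Here ridge regression stands in for whatever classifier training the actual script uses, and the per-layer features are made-up.

```python
import numpy as np

def probe_all_layers(layer_feats, labels, num_classes, reg=1e-3):
    """Fit one ridge-regression linear head per layer on features
    cached from a single forward pass, instead of re-running the
    backbone once per probed layer."""
    onehot = np.eye(num_classes)[labels]  # (n, num_classes) regression targets
    heads = []
    for feats in layer_feats:             # feats: (n, dim) for one layer
        d = feats.shape[1]
        # Closed-form ridge solution: W = (X^T X + reg*I)^-1 X^T Y.
        w = np.linalg.solve(feats.T @ feats + reg * np.eye(d), feats.T @ onehot)
        heads.append(w)
    return heads

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
# Two hypothetical "layers": one pure noise, one with a predictive feature.
uninformative = rng.normal(size=(200, 16))
informative = np.concatenate([rng.normal(size=(200, 15)),
                              labels[:, None] * 2.0 - 1.0], axis=1)
heads = probe_all_layers([uninformative, informative], labels, num_classes=2)
for i, (feats, w) in enumerate(zip([uninformative, informative], heads)):
    acc = ((feats @ w).argmax(1) == labels).mean()
    print(f"layer {i}: train acc {acc:.2f}")
```

This also makes it cheap to compare every layer and pick the best one, as discussed above.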

@LiweiPeng

@addf400 Thanks for the quick response. This is very helpful.

@donglixp
Contributor

@addf400 Awesome!

@JegZheng
Author

@addf400 Thanks!

@leonsick

Hey!
It seems the linear probing problem has been solved, but has anyone managed to solve the k-NN classification problem?
I'm also getting very low numbers, similar to @JegZheng's.
Could you maybe help here as well, @addf400? It would be much appreciated!

Thanks in advance :)


6 participants