
Linear probing or KNN evaluation of BEiT #419

Closed
JegZheng opened this issue Aug 30, 2021 · 11 comments
@JegZheng

Hello,

Thanks for the great work!
When I read the paper, I found that the reported results are fine-tuned end-to-end, and I am curious how BEiT performs under kNN evaluation or linear probing, as is done with other pretraining methods such as contrastive learning.
I have recently been playing with the BEiT models (both BEiT-base-16patch and BEiT-large-16patch) provided here, and built kNN and linear classification scripts following the experiment settings in the paper. I got the following results, which look a bit weird to me:

For linear probing:
BEiT-large-16patch-224: 18.28% (50 epochs, with mean pooling) | 18.19% (50 epochs, w/o mean pooling)
BEiT-base-16patch-224: 20.84% (50 epochs, with mean pooling) | 2.0% (100 epochs, w/o mean pooling)

For KNN evaluation:
BEiT-large-16patch-224 top1/top5: 6.826/13.744 (with mean pooling) | 2.804/6.658 (w/o mean pooling)
BEiT-base-16patch-224 top1/top5: 4.492/9.88 (with mean pooling) | 2.236/5.632 (w/o mean pooling)

I guess there might be something wrong. Could you please help check whether there is an issue and, if possible, provide some results or scripts for kNN and linear probing evaluation? Thanks!
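For concreteness, here is a minimal sketch of the kind of kNN evaluation I mean, assuming features have already been extracted from the backbone (the cosine-similarity voting scheme is the one commonly used for self-supervised evaluation, not necessarily the exact scheme in my scripts):

```python
import numpy as np

def knn_classify(train_feats, train_labels, test_feats, k=20):
    """Nearest-neighbour classification on L2-normalised features,
    so the dot product equals cosine similarity."""
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T                       # (n_test, n_train) similarities
    topk = np.argsort(-sims, axis=1)[:, :k]     # indices of k nearest neighbours
    preds = []
    for row in topk:
        votes = np.bincount(train_labels[row])  # majority vote among neighbours
        preds.append(np.argmax(votes))
    return np.array(preds)

# Toy check with two well-separated clusters.
rng = np.random.default_rng(0)
train = np.concatenate([rng.normal(0, 0.1, (50, 8)) + 1,
                        rng.normal(0, 0.1, (50, 8)) - 1])
labels = np.array([0] * 50 + [1] * 50)
test = np.array([[1.0] * 8, [-1.0] * 8])
print(knn_classify(train, labels, test, k=5))  # → [0 1]
```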

@donglixp donglixp self-assigned this Aug 31, 2021
@donglixp
Contributor

Table 9 in the BEiT paper reports these numbers: https://openreview.net/pdf?id=p-BhZSz59o4

@woctezuma

woctezuma commented Oct 26, 2021

For reference: Table 9

I think DINO concatenates features for this task. So I would expect its results to have the "*" at the end of the line. Maybe I am missing something here.

@donglixp
Contributor

> For reference: Table 9
>
> I think DINO concatenates features for this task. So I would expect its results to have the "*" at the end of the line. Maybe I am missing something here.

@woctezuma DINO concatenates the [CLS] and averaged feature vectors, while iGPT concatenates features from different layers. In the table, we use "*" to denote the concatenation of multiple layers.
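A small sketch of that distinction, with hypothetical shapes (197 tokens of dimension 768; this is illustrative, not BEiT's actual extraction code):

```python
import numpy as np

# Hypothetical last-layer token features: (seq_len, dim); token 0 is [CLS].
tokens = np.random.default_rng(1).normal(size=(197, 768))

# DINO-style: concatenate [CLS] with the mean-pooled patch tokens
# of the same layer -> a 2*dim vector.
dino_feat = np.concatenate([tokens[0], tokens[1:].mean(axis=0)])

# iGPT-style "*": concatenate features taken from several different layers
# (here, the [CLS] token of three made-up layers).
layers = [np.random.default_rng(i).normal(size=(197, 768)) for i in range(3)]
igpt_feat = np.concatenate([layer[0] for layer in layers])

print(dino_feat.shape, igpt_feat.shape)  # → (1536,) (2304,)
```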

@LiweiPeng

@JegZheng Regarding your original post where linear probing accuracy is low (see below), did you end up getting significantly higher numbers? If so, could you share what fixes you made? I am using my own dataset for pretraining + linear probing, and the linear probing accuracy is below my expectations.

> For linear probing:
> BEiT-large-16patch-224 18.28% (50 epochs, with mean pooling) | 18.19% (50 epochs, w/o mean pooling)
> BEiT-base-16patch-224 20.84% (50 epochs, with mean pooling) | 2.0% (100 epochs, w/o mean pooling)

@JegZheng
Author

> @JegZheng Regarding your original post where linear probing accuracy is low (see below), did you end up getting significantly higher numbers? If so, could you share what fixes you made? I am using my own dataset for pretraining + linear probing, and the linear probing accuracy is below my expectations.
>
> For linear probing: BEiT-large-16patch-224 18.28% (50 epochs, with mean pooling) | 18.19% (50 epochs, w/o mean pooling) BEiT-base-16patch-224 20.84% (50 epochs, with mean pooling) | 2.0% (100 epochs, w/o mean pooling)

@LiweiPeng Sorry, I did not fix that issue. The linear probing setup can vary across different pretraining methods and datasets. Maybe we can ask the authors whether they have any suggestions.

@LiweiPeng

@JegZheng Thanks for the quick response. Based on the paper's results above, it seems that low linear probing results are expected.

One technique the paper uses to improve linear probing is selecting the best layer (e.g., the 9th layer for BEiT-base in the paper). How much difference could there be between the best layer and the last layer?

@addf400
Contributor

addf400 commented Mar 26, 2022

@LiweiPeng, @JegZheng We have uploaded our linear probing code; you can give it a try via this link.
The code trains a linear classifier for each transformer layer in a single forward pass, which saves a lot of compute.
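A rough numpy sketch of the idea (not the uploaded code itself): collect features from every layer during one forward pass, then fit an independent linear head per layer on those cached features. Here ridge regression stands in for whatever classifier training the actual script uses, and the per-layer features are made-up.

```python
import numpy as np

def probe_all_layers(layer_feats, labels, num_classes, reg=1e-3):
    """Fit one ridge-regression linear head per layer on features
    cached from a single forward pass, instead of re-running the
    backbone once per probed layer."""
    onehot = np.eye(num_classes)[labels]  # (n, num_classes) regression targets
    heads = []
    for feats in layer_feats:             # feats: (n, dim) for one layer
        d = feats.shape[1]
        # Closed-form ridge solution: W = (X^T X + reg*I)^-1 X^T Y.
        w = np.linalg.solve(feats.T @ feats + reg * np.eye(d), feats.T @ onehot)
        heads.append(w)
    return heads

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
# Two hypothetical "layers": one pure noise, one with a predictive feature.
uninformative = rng.normal(size=(200, 16))
informative = np.concatenate([rng.normal(size=(200, 15)),
                              labels[:, None] * 2.0 - 1.0], axis=1)
heads = probe_all_layers([uninformative, informative], labels, num_classes=2)
for i, (feats, w) in enumerate(zip([uninformative, informative], heads)):
    acc = ((feats @ w).argmax(1) == labels).mean()
    print(f"layer {i}: train acc {acc:.2f}")
```

This also makes it cheap to compare every layer and pick the best one, as discussed above.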

@LiweiPeng

@addf400 Thanks for the quick response. This is very helpful.

@donglixp
Contributor

@addf400 Awesome!

@JegZheng
Author

@addf400 Thanks!

@leonsick

Hey!
It seems the linear probing problem has been solved, but has anyone managed to solve the k-NN classification problem?
I'm also getting very low numbers, similar to @JegZheng's.
Could you maybe help here as well, @addf400? It would be much appreciated!

Thanks in advance :)


6 participants