What makes pooling competitive with, or even better than, attention? #43

Closed
sudo1609 opened this issue Dec 13, 2022 · 1 comment


@sudo1609

In this paper, you show that the success of ViT comes not from the attention token mixer but from the general architecture, which you call MetaFormer. Strikingly, simply replacing attention with an extremely simple pooling operator still gives SOTA performance. So the question is: what makes pooling competitive with, or even better than, attention?

@yuweihao
Collaborator

Hi @TheK2NumberOne, thanks for your attention.

| Model | MetaFormer | Token mixing ability | Local inductive bias | Params | MACs | Top-1 Acc (%) |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | No | Strong | More | 26M | 4.1G | 79.8 |
| PoolFormer-S24 | Yes | Weak | More | 21M | 3.4G | 80.3 |
| DeiT-S (Transformer) | Yes | Strong | Less | 22M | 4.6G | 79.8 |
1. Compared with ResNet-50: the local spatial modeling ability of pooling is much weaker than that of ResNet's convolutions, so PoolFormer's competitive performance can only be attributed to its general architecture, MetaFormer.
2. Compared with DeiT-S: PoolFormer's better performance may result from the stronger local inductive bias of pooling (the operator itself is sketched below).
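
For concreteness, the pooling token mixer is essentially parameter-free: each token is replaced by the average of its local spatial neighborhood, with the input subtracted so that, together with the block's residual connection, only the neighborhood aggregation remains. A minimal PyTorch sketch of this idea (`pool_size=3` follows the paper's default; the class name here is illustrative):

```python
import torch
import torch.nn as nn

class Pooling(nn.Module):
    """Pooling token mixer sketch: local average pooling minus identity,
    so the block's residual connection does not double-count the input."""
    def __init__(self, pool_size=3):
        super().__init__()
        self.pool = nn.AvgPool2d(
            pool_size, stride=1, padding=pool_size // 2,
            count_include_pad=False)

    def forward(self, x):  # x: (B, C, H, W)
        # No learned parameters: pure local averaging over each window.
        return self.pool(x) - x

# Token mixing over a 3x3 neighborhood; the spatial shape is preserved.
x = torch.randn(1, 64, 14, 14)
print(Pooling()(x).shape)  # torch.Size([1, 64, 14, 14])
```

Because the mixing step is a fixed local average, it carries a strong locality prior while adding no parameters and very few MACs, which is consistent with the comparison in the table above.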
