In this paper, you show that the success of ViT comes not from the attention token mixer but from the general architecture you call MetaFormer. Notably, simply replacing attention with an extremely simple pooling operator still yields SOTA performance. So the question is: what makes pooling competitive with, or even better than, attention?
Compared with ResNet: since the local spatial modeling ability of pooling is much weaker than that of ResNet's convolutions, the competitive performance of PoolFormer can only be attributed to its general MetaFormer architecture.
Compared with DeiT: the better performance of PoolFormer may result from the stronger local inductive bias of pooling.
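For concreteness, the pooling token mixer discussed above can be sketched as below. This is a minimal PyTorch sketch based on the description in the PoolFormer paper (average pooling with the input subtracted, so that the mixer cancels against the MetaFormer block's residual connection); the exact hyperparameters here (pool size 3, stride 1) are illustrative assumptions, not necessarily the released configuration.

```python
import torch
import torch.nn as nn

class Pooling(nn.Module):
    """Pooling token mixer (sketch): local average pooling minus identity.

    Subtracting the input means the mixer only propagates the difference
    between each token and its local neighborhood mean, which combines
    cleanly with the residual connection in a MetaFormer block.
    """
    def __init__(self, pool_size: int = 3):
        super().__init__()
        self.pool = nn.AvgPool2d(
            pool_size, stride=1, padding=pool_size // 2,
            count_include_pad=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) — token mixing is purely
        # spatial and parameter-free, unlike attention.
        return self.pool(x) - x

mixer = Pooling(pool_size=3)
x = torch.randn(1, 8, 14, 14)
out = mixer(x)
assert out.shape == x.shape  # mixing preserves the spatial shape
```

Note that this mixer has no learnable parameters at all, which is what makes the result striking: the heavy lifting must come from the surrounding MetaFormer structure (normalization, channel MLP, residual connections) rather than from the token mixer itself.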