In this paper, you show that the success of ViT comes not from the attention token mixer but from the general architecture you call MetaFormer. Notably, simply replacing attention with an extremely simple pooling operator still yields SOTA performance. So the question is: what makes pooling competitive with, or even better than, attention?
Compared with ResNet: since the local spatial modeling ability of pooling is much weaker than that of ResNet's convolutions, the competitive performance of PoolFormer can only be attributed to its general MetaFormer architecture.
Compared with DeiT: the better performance of PoolFormer may result from the stronger local inductive bias of pooling.
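For concreteness, the pooling token mixer discussed above can be sketched as below. This is a minimal PyTorch sketch based on the description in the PoolFormer paper (average pooling with the input subtracted, so that the mixer cancels against the MetaFormer block's residual connection); the exact hyperparameters here (pool size 3, stride 1) are illustrative assumptions, not necessarily the released configuration.

```python
import torch
import torch.nn as nn

class Pooling(nn.Module):
    """Pooling token mixer (sketch): local average pooling minus identity.

    Subtracting the input means the mixer only propagates the difference
    between each token and its local neighborhood mean, which combines
    cleanly with the residual connection in a MetaFormer block.
    """
    def __init__(self, pool_size: int = 3):
        super().__init__()
        self.pool = nn.AvgPool2d(
            pool_size, stride=1, padding=pool_size // 2,
            count_include_pad=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) — token mixing is purely
        # spatial and parameter-free, unlike attention.
        return self.pool(x) - x

mixer = Pooling(pool_size=3)
x = torch.randn(1, 8, 14, 14)
out = mixer(x)
assert out.shape == x.shape  # mixing preserves the spatial shape
```

Note that this mixer has no learnable parameters at all, which is what makes the result striking: the heavy lifting must come from the surrounding MetaFormer structure (normalization, channel MLP, residual connections) rather than from the token mixer itself.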