Interaction between patches through a transpose may have a stronger role to play? #2

rakshith291 · 2021-06-27T04:37:23Z

Hi, I was going through your experiment report. You make the point that, since you get good performance without any attention layer, the good performance of ViT may have more to do with its embedding layer than with attention.

But I believe it may also have to do with how you establish an interaction between patches through a transpose, very similar to what is done in MLP-Mixer.
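
To make the point concrete, here is a minimal NumPy sketch of what I mean by patches interacting through a transpose. It is not the code from your repo: the shapes, the weights `w_channel` and `w_token`, and the helper names `channel_mix` and `token_mix` are all made up for illustration. A linear layer applied per patch cannot move information between patches, but the same kind of layer applied after a transpose can.

```python
import numpy as np

num_patches, channels = 4, 3

# One toy "image": each row is a patch embedding (hypothetical values).
x = np.arange(num_patches * channels, dtype=float).reshape(num_patches, channels)

# Hypothetical fixed weights, chosen only so the example is deterministic.
w_channel = np.eye(channels) + 0.5        # mixes features within a patch
w_token = np.eye(num_patches) + 0.5       # mixes values across patches

def channel_mix(x):
    # Per-patch linear layer: output row i depends only on input row i.
    return x @ w_channel

def token_mix(x):
    # Transpose so patches sit on the last axis, mix them, transpose back.
    return (x.T @ w_token).T

# Perturb only patch 0, then check which output rows (patches) change.
x_perturbed = x.copy()
x_perturbed[0] += 1.0

changed_channel = np.any(
    ~np.isclose(channel_mix(x), channel_mix(x_perturbed)), axis=1
)
changed_token = np.any(
    ~np.isclose(token_mix(x), token_mix(x_perturbed)), axis=1
)

print("rows changed by channel mixing:", changed_channel)
print("rows changed by token mixing:  ", changed_token)
```

With the per-patch layer only patch 0's output moves, while with the transposed layer every patch's output moves. That cross-patch path exists without any attention, which is why I suspect it matters alongside the embedding layer.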

Would love to know your thoughts on this.
