Performer Pytorch Slower than Expected and Please Help with Understanding Parameter Count #92

Open
michaelweihaosong opened this issue Dec 12, 2022 · 1 comment

Comments

@michaelweihaosong

Hi,

First of all, this is a great package from lucidrains and I find it very helpful in my research.

A quick question: I noticed that ViT-performer is slower than the regular ViT from lucidrains. For example, training on MNIST from PyTorch takes 15 sec/epoch for the regular ViT with the configuration below, while ViT-performer takes 23 sec/epoch.

Checking the parameter count also shows that ViT-performer has twice as many parameters as the regular ViT.

Screen Shot 2022-12-12 at 11 32 41 PM

Screen Shot 2022-12-12 at 11 28 50 PM
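
For anyone who wants to reproduce the comparison, one way to see where the extra parameters sit is to group the counts by top-level submodule. This is a minimal sketch, assuming both models are `torch.nn.Module` instances (anything exposing `named_parameters()` works); `params_by_submodule` is a hypothetical helper and not part of either library.

```python
from collections import defaultdict


def params_by_submodule(model):
    """Group trainable parameter counts by top-level submodule name.

    `model` is assumed to expose `named_parameters()` the way a
    torch.nn.Module does, yielding (dotted_name, tensor) pairs.
    """
    counts = defaultdict(int)
    for name, param in model.named_parameters():
        if param.requires_grad:
            # "transformer.layers.0.weight" is counted under "transformer"
            counts[name.split(".")[0]] += param.numel()
    return dict(counts)


def total_params(model):
    """Total number of trainable parameters in the model."""
    return sum(params_by_submodule(model).values())
```

Calling `params_by_submodule` on both models and comparing the two dictionaries side by side should show which block accounts for the difference.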

I am hoping that someone has intuition about the speed of ViT performer vs regular ViT and their parameter counts.


Thank you very much in advance!

@michaelweihaosong
Author

Just found out why the model size is twice as big.

The feed-forward layer has a dimension multiplier of 4; after setting ff_mult=1, the two models are the same size.

Screen Shot 2022-12-13 at 12 15 29 AM
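
To make the effect of the multiplier concrete, here is a rough parameter count for a single feed-forward block. It is a sketch under two assumptions: the block is two linear layers with biases (dim to dim * mult, then back to dim), and `dim=128` is an illustrative value rather than the setting from the screenshots.

```python
def feedforward_params(dim, mult):
    """Parameter count of a two-layer feed-forward block with biases.

    Assumed layout: Linear(dim, dim * mult) followed by Linear(dim * mult, dim).
    """
    hidden = dim * mult
    first = dim * hidden + hidden   # weights + bias of the expanding layer
    second = hidden * dim + dim     # weights + bias of the projecting layer
    return first + second


dim = 128  # illustrative value, not taken from the issue
print(feedforward_params(dim, mult=4))  # 131712
print(feedforward_params(dim, mult=1))  # 33024
```

The feed-forward block is roughly four times larger with a multiplier of 4 than with a multiplier of 1, which is enough to change the total model size noticeably when the attention layers are small.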

However, the Performer is still slower than the regular ViT when training on the torchvision.datasets.MNIST training set on an RTX 3090:

Regular ViT:
Average seconds for training 1 epoch: 15.101385951042175
Average seconds for testing: 0.6326647281646729

Performer ViT:
Average seconds for training 1 epoch: 28.795904541015624
Average seconds for testing: 0.9286866903305053
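
For reference, epoch timings like these can be collected with a small helper along the following lines. This is a sketch, not the exact script used for the numbers above: `average_epoch_seconds` and its arguments are hypothetical names, and `run_epoch` stands for any zero-argument callable that runs one full training or testing pass. Because CUDA kernels launch asynchronously, passing `torch.cuda.synchronize` as `sync` keeps the clock from being read before the GPU has finished.

```python
import time
from statistics import mean


def average_epoch_seconds(run_epoch, n_epochs=5, warmup=1, sync=None):
    """Average wall-clock seconds per call to `run_epoch`.

    `sync`, if given, is called right before each clock reading.
    For a model on a GPU, pass torch.cuda.synchronize.
    """
    # Warm-up passes are run but not timed (cuDNN autotuning, caches, etc.).
    for _ in range(warmup):
        run_epoch()

    durations = []
    for _ in range(n_epochs):
        if sync is not None:
            sync()
        start = time.perf_counter()
        run_epoch()
        if sync is not None:
            sync()
        durations.append(time.perf_counter() - start)
    return mean(durations)
```

Usage would look like `average_epoch_seconds(lambda: train_one_epoch(model, loader), sync=torch.cuda.synchronize)`, where `train_one_epoch` is whatever training loop is already in place.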
