expansion_factor on tokens is actually a bottleneck in original codebase #11

Closed
chazzmoney opened this issue Feb 14, 2022 · 1 comment

chazzmoney commented Feb 14, 2022

Thanks for your implementation. In comparing your codebase to the authors' implementation, I discovered that while you have a single expansion factor in your configuration, the authors have separate values: one for tokens and one for channels.

Specifically, their channels expansion factor is 4, but their tokens expansion factor is 0.5 (both relative to hidden_dim, the base projection size). Note that they actually specify a feature count rather than a factor; I'm translating to the mechanism you use in this codebase.

Thus, when executing the MixerBlock, the tokens "expansion" is actually a bottleneck.
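To make the arithmetic concrete, here is a minimal sketch that follows the convention described above, where each factor multiplies hidden_dim. The value 768 and the helper name `inner_width` are illustrative assumptions and are not taken from either codebase.

```python
def inner_width(hidden_dim: int, factor: float) -> int:
    """Inner-layer width of an MLP when the factor scales hidden_dim (assumed convention)."""
    return int(hidden_dim * factor)


hidden_dim = 768  # illustrative value, not read from either codebase

channels_width = inner_width(hidden_dim, 4)    # wider than hidden_dim: an expansion
tokens_width = inner_width(hidden_dim, 0.5)    # narrower than hidden_dim: a bottleneck

print(channels_width, tokens_width)  # 3072 384
```

With a single shared factor of 4, both inner layers would be 3072 wide, which is where the two setups diverge.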

The parameters can also be verified in Table 1 ("Specifications of Mixer Architectures") at the top of page 4 in version 4 (the current version as of Feb 14, 2022) of their paper.

I'm not suggesting that anything necessarily needs to change in your implementation. However, if you want to align your codebase so that it can fully replicate the authors' work, you might consider allowing for two separate parameters: token_expansion_factor and channels_expansion_factor.
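One possible shape for such a configuration is sketched below. The class `MixerConfig` and the `mlp_widths` helper are hypothetical names chosen for illustration; they are not the API of this repository or of the authors' code. The defaults mirror the 0.5 and 4 values quoted above, and both factors are assumed to scale hidden_dim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MixerConfig:
    """Hypothetical config with one factor per MLP; names are illustrative only."""
    hidden_dim: int
    token_expansion_factor: float = 0.5
    channels_expansion_factor: float = 4.0

    def mlp_widths(self) -> dict:
        """Inner widths of the two MLPs, each scaled from hidden_dim (assumed convention)."""
        return {
            "tokens": int(self.hidden_dim * self.token_expansion_factor),
            "channels": int(self.hidden_dim * self.channels_expansion_factor),
        }


# A single shared factor is still expressible by giving both fields the same value.
shared = MixerConfig(hidden_dim=512, token_expansion_factor=4.0, channels_expansion_factor=4.0)
split = MixerConfig(hidden_dim=512)

print(shared.mlp_widths())  # {'tokens': 2048, 'channels': 2048}
print(split.mlp_widths())   # {'tokens': 256, 'channels': 2048}
```

Keeping the two factors as separate fields means the existing single-factor behaviour remains a special case rather than a breaking change.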

Thank you again for this work, and for all your contributions generally. You are an incredible asset to the community.

@lucidrains
Owner

@chazzmoney Hi Charles! You are correct and I have made the change in 0.1.0! https://github.com/lucidrains/mlp-mixer-pytorch/releases/tag/0.1.0
