expansion_factor on tokens is actually a bottleneck in original codebase #11

chazzmoney · 2022-02-14T19:52:00Z

Thanks for your implementation. In comparing your codebase to the authors' implementation, I discovered that while you have a single expansion factor in your configuration, the authors use two separate values: one for tokens and one for channels.

Specifically, their channels expansion factor is 4, but their tokens expansion factor is 0.5, both taken relative to hidden_dim (the base projection size). Note that they actually specify a feature count for each MLP rather than a factor; I'm translating to the mechanism you use in this codebase.

Thus, when executing the MixerBlock, the tokens "expansion" is actually a bottleneck.

The parameters can also be verified in Table 1 ("Specifications of Mixer Architectures") at the top of page 4 in version 4 (the current version as of Feb 14, 2022) of their paper.

I'm not suggesting that anything necessarily needs to change in your implementation. However, if you want your codebase to be able to fully replicate the authors' work, you may consider allowing two separate parameters: token_expansion_factor and channels_expansion_factor.
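To make the arithmetic concrete, here is a minimal sketch of how two separate factors could translate into inner MLP widths. It assumes, as described above, that both factors are applied to hidden_dim. The `MixerWidths` class, its method names, and the hidden_dim value of 512 are hypothetical and chosen only for illustration; they are not taken from this repository or from the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MixerWidths:
    """Hypothetical config holding one expansion factor per MLP."""

    hidden_dim: int                         # base projection size
    token_expansion_factor: float = 0.5     # below 1, so a bottleneck
    channels_expansion_factor: float = 4.0  # above 1, so a true expansion

    def token_inner_dim(self) -> int:
        # Inner width of the token-mixing MLP (assumed relative to hidden_dim).
        return int(self.hidden_dim * self.token_expansion_factor)

    def channels_inner_dim(self) -> int:
        # Inner width of the channel-mixing MLP.
        return int(self.hidden_dim * self.channels_expansion_factor)


def is_bottleneck(factor: float) -> bool:
    """A factor below 1 narrows the MLP instead of widening it."""
    return factor < 1.0


# Illustrative value only; substitute the hidden_dim of the model in question.
widths = MixerWidths(hidden_dim=512)
print(widths.token_inner_dim())     # 256
print(widths.channels_inner_dim())  # 2048
print(is_bottleneck(widths.token_expansion_factor))  # True
```

With a single shared factor of 4, the token-mixing MLP in this example would be eight times wider than with the 0.5 factor, which is the mismatch this issue is about.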

Thank you again for this work, and for all your contributions generally. You are an incredible asset to the community.

lucidrains · 2022-02-17T05:07:58Z

@chazzmoney Hi Charles! You are correct and I have made the change in 0.1.0! https://github.com/lucidrains/mlp-mixer-pytorch/releases/tag/0.1.0

lucidrains closed this as completed Mar 15, 2022

zhaoyanlyu mentioned this issue Sep 11, 2023: Expansion factor choices #14 (Open)
