Hi! I'm training a small Mamba2 model (~60m non-embedding parameters to start), and I'm doing some benchmarks before committing to larger runs. Any ideas why I'm seeing slow speeds with higher d_state values? I'm importing the Mamba2 blocks directly from this repo, and I've tested Mamba2 vs Mamba2Simple and haven't noticed any differences there.
With different d_state values, I get:
d_state=128: 65k tokens/device/second (58m non-embedding parameters total)
d_state=64: 100k tokens/device/second (56m non-embedding parameters total)
d_state=32: 150k tokens/device/second (53.7m non-embedding parameters total)
d_state=16: 195k tokens/device/second (52.9m non-embedding parameters total)
All experiments are running on H100s using DDP, and I'm using these other parameters (a minimal single-block timing sketch with these settings appears at the end of this post):
n_layers: 16
vocab_size: 50280
embedding_size: 50304
d_conv: 4
expand: 2
ngroups: 1 (have also tried 8)
headdim: 64
For comparison, we get ~350k tokens/device/second with a 60m transformer model in the same codebase.
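For anyone trying to narrow this down, here is a minimal single-block timing sketch, not the actual training code from this issue: `d_model=768`, the batch size, and the sequence length are guesses (the issue doesn't state them), and the import assumes the packaged `mamba_ssm` module from this repo. Sweeping `d_state` over 16/32/64/128 in isolation should show whether the slowdown lives in the Mamba2 block itself or elsewhere in the training loop.

```python
# Hypothetical micro-benchmark for a single Mamba2 block; d_model, batch size,
# and sequence length below are assumptions, not values from this issue.
import time

import torch
from mamba_ssm import Mamba2

device = "cuda"
d_model = 768          # assumed; not stated in the issue

block = Mamba2(
    d_model=d_model,
    d_state=128,       # try 16 / 32 / 64 / 128 to reproduce the comparison
    d_conv=4,
    expand=2,
    ngroups=1,
    headdim=64,
).to(device=device, dtype=torch.bfloat16)

x = torch.randn(8, 2048, d_model, device=device, dtype=torch.bfloat16)

# Warm up so the Triton kernels are compiled before timing.
for _ in range(3):
    block(x)
torch.cuda.synchronize()

n_iters = 20
t0 = time.time()
for _ in range(n_iters):
    block(x)
torch.cuda.synchronize()
elapsed = time.time() - t0
tokens = n_iters * x.shape[0] * x.shape[1]
print(f"~{tokens / elapsed:,.0f} tokens/s through one block (forward only)")
```

If the single-block throughput falls off with `d_state` the same way the end-to-end numbers above do, that would point at the SSD kernel itself rather than DDP or the data pipeline; swapping in Mamba2Simple in the same harness would mirror the comparison mentioned above.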
I guess the higher the d_state, the larger the model's parameter count. Happy to see that you got Mamba2 running successfully! I ran into many errors when installing Mamba2. Would you mind telling me your torch version, triton version, and causal_conv1d version?
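Not sure if it helps, but here is one quick way to print the versions being asked about (this assumes `causal-conv1d` was installed from PyPI under that distribution name; adjust if it was built from source):

```python
# Print the versions asked about above. "causal-conv1d" is the PyPI
# distribution name; change it if the package was installed differently.
import torch
import triton
from importlib.metadata import version

print("torch:        ", torch.__version__)
print("triton:       ", triton.__version__)
print("causal_conv1d:", version("causal-conv1d"))
```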