
Using mamba as a headless "vanilla RNN" #38

Closed
theodorblackbird opened this issue Dec 8, 2023 · 4 comments
Comments

@theodorblackbird

Thank you for your amazing work,

I'm trying to use Mamba as a drop-in replacement for an RNN in an encoder/decoder architecture, and for that I've turned off the logits head. How should I properly get the hidden state during inference? I'm a bit lost in the decoding logic.

@theodorblackbird
Author

OK, I was looking in the wrong place.

@JVP15

JVP15 commented Dec 29, 2023

I am also looking to use Mamba as a vanilla RNN, but I think I've been looking in the wrong places too for the decoding/hidden state logic. Where did you find it?

@theodorblackbird
Author

theodorblackbird commented Dec 31, 2023

@JVP15
I ended up using it like this:

1. Instantiate an `InferenceParams` from `utils.generation` with `seqlen_offset=1`.
2. Pass it to your `MixerModel` as the keyword argument `inference_params`; it caches everything automatically.
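The stepwise caching pattern above can be sketched without the library itself. The `InferenceParams` stand-in below mirrors the fields of `mamba_ssm`'s `utils.generation.InferenceParams` (`max_seqlen`, `max_batch_size`, `seqlen_offset`, `key_value_memory_dict`), but the per-layer "mixer" is a toy stand-in, not the real Mamba block:

```python
from dataclasses import dataclass, field

@dataclass
class InferenceParams:
    """Stand-in mirroring mamba_ssm's utils.generation.InferenceParams.

    key_value_memory_dict maps layer index -> that layer's cached state,
    which is how the real model carries hidden state between decode steps.
    """
    max_seqlen: int
    max_batch_size: int
    seqlen_offset: int = 0
    key_value_memory_dict: dict = field(default_factory=dict)

def mixer_step(x: float, layer_idx: int, inference_params: InferenceParams) -> float:
    # Toy stateful layer: reads its cached state by layer index, updates it,
    # and writes it back -- the same pattern each Mamba layer uses for its
    # conv/SSM state when inference_params is passed in.
    state = inference_params.key_value_memory_dict.get(layer_idx, 0.0)
    state = 0.5 * state + x
    inference_params.key_value_memory_dict[layer_idx] = state
    return state

# Decode one token at a time, RNN-style; the cache persists across calls.
params = InferenceParams(max_seqlen=16, max_batch_size=1, seqlen_offset=1)
outputs = []
for token in [1.0, 2.0, 3.0]:
    outputs.append(mixer_step(token, layer_idx=0, inference_params=params))
    params.seqlen_offset += 1  # advance the position, as the generation loop does

print(outputs)  # each output depends on the cached state from prior steps
```

With the real library, the same shape of loop passes `inference_params` to `MixerModel.forward` for each single-token input; the hidden state then lives in `inference_params.key_value_memory_dict` rather than in a return value.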

@JVP15

JVP15 commented Dec 31, 2023

`utils.generation`, that is what I was missing, thank you!
