
Using mamba as a headless "vanilla RNN" #38

Closed
theodorblackbird opened this issue Dec 8, 2023 · 4 comments
Comments

@theodorblackbird

Thank you for your amazing work,

I'm trying to use Mamba as a drop-in replacement for an RNN in an encoder/decoder architecture, and for that I've turned off the logits head. How should I properly get the hidden state during inference? I'm a bit lost in the decoding logic.

@theodorblackbird
Author

OK, I was looking in the wrong place.

@JVP15

JVP15 commented Dec 29, 2023

I am also looking to use Mamba as a vanilla RNN, but I think I've been looking in the wrong places too for the decoding/hidden state logic. Where did you find it?

@theodorblackbird
Author

theodorblackbird commented Dec 31, 2023

@JVP15
I ended up using it like this:

1. Instantiate an `InferenceParams` from `utils.generation` with `seqlen_offset=1`.
2. Pass it to your `MixerModel` as the keyword argument `inference_params`; it caches everything automatically.
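The stepwise caching pattern above can be sketched without the library itself. The `InferenceParams` stand-in below mirrors the fields of `mamba_ssm`'s `utils.generation.InferenceParams` (`max_seqlen`, `max_batch_size`, `seqlen_offset`, `key_value_memory_dict`), but the per-layer "mixer" is a toy stand-in, not the real Mamba block:

```python
from dataclasses import dataclass, field

@dataclass
class InferenceParams:
    """Stand-in mirroring mamba_ssm's utils.generation.InferenceParams.

    key_value_memory_dict maps layer index -> that layer's cached state,
    which is how the real model carries hidden state between decode steps.
    """
    max_seqlen: int
    max_batch_size: int
    seqlen_offset: int = 0
    key_value_memory_dict: dict = field(default_factory=dict)

def mixer_step(x: float, layer_idx: int, inference_params: InferenceParams) -> float:
    # Toy stateful layer: reads its cached state by layer index, updates it,
    # and writes it back -- the same pattern each Mamba layer uses for its
    # conv/SSM state when inference_params is passed in.
    state = inference_params.key_value_memory_dict.get(layer_idx, 0.0)
    state = 0.5 * state + x
    inference_params.key_value_memory_dict[layer_idx] = state
    return state

# Decode one token at a time, RNN-style; the cache persists across calls.
params = InferenceParams(max_seqlen=16, max_batch_size=1, seqlen_offset=1)
outputs = []
for token in [1.0, 2.0, 3.0]:
    outputs.append(mixer_step(token, layer_idx=0, inference_params=params))
    params.seqlen_offset += 1  # advance the position, as the generation loop does

print(outputs)  # each output depends on the cached state from prior steps
```

With the real library, the same shape of loop passes `inference_params` to `MixerModel.forward` for each single-token input; the hidden state then lives in `inference_params.key_value_memory_dict` rather than in a return value.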

@JVP15

JVP15 commented Dec 31, 2023

`utils.generation`, that is what I was missing, thank you!
