
Add mamba-ssm support #4830

Closed
Maykeye opened this issue Dec 6, 2023 · 6 comments
Labels
enhancement New feature or request stale

Comments

@Maykeye

Maykeye commented Dec 6, 2023

Description

Recently a new SSM-based model, Mamba, was released, trained on 300B tokens. It already has weights on HF. The issue is that it's weights only: the repo has no tokenizer (it uses the GPT-NeoX tokenizer) and no custom modeling_mamba, so trust_remote_code with the standard loader doesn't work.

So the request is to add a new mamba-ssm loader to be able to use it.

Additional Context

Example of generation from the official repo

@Maykeye Maykeye added the enhancement New feature or request label Dec 6, 2023
@IggoOnCode
Contributor

I found a repo with basic support here: https://github.com/trap20/text-generation-webui/tree/mamba-ssm

Since the owner didn't make a PR here and the branch had merge conflicts, I just took the changes and manually merged them into the recent main.

Here is the pull request: #5228

@IggoOnCode
Contributor

Training support has been added too.

@hchasens

hchasens commented Mar 6, 2024

@IggoOnCode, do you have any plans to support this? I'd love to see Mamba make its way to text-gen-webui, or just a fork with support. With all the benefits it has over Transformers, I suspect it'll grow in popularity. Especially considering its better efficiency in both space and compute at low parameter counts, it'll likely be the best option for local LLMs and edge AI.

@IggoOnCode
Contributor

@hchasens Luckily, I don't need to. At least not as a stand-alone solution.

Mamba support got merged into transformers two days ago. I just tried the transformers main branch in text-generation-webui, and inference with the demo Mamba models from ArthurZ works out of the box. For the original Mamba model from state-spaces I'm trying to find the correct config now. Then I'll try training.

After the update to the next transformers release, text-generation-webui will get Mamba support.

@hchasens

hchasens commented Mar 8, 2024

This is awesome news! Do you know whether there were any API changes, or will it work out of the box with text-gen?


github-actions bot commented May 7, 2024

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

@github-actions github-actions bot closed this as completed May 7, 2024