
[QUESTION]Mamba-2-hybrid Weights #864

Closed
Mooler0410 opened this issue Jun 13, 2024 · 4 comments
@Mooler0410

Your question
[An Empirical Study of Mamba-based Language Models](https://github.com/NVIDIA/Megatron-LM/tree/ssm/examples/mamba)
Hi! I'm impressed by this work and can't wait to try the new mamba-2-hybrid. The paper mentions that the weights are released on Hugging Face, but I cannot find them. Have they been released? If so, where can I download them?

Thanks a lot for your folks' contribution to the community!

@ruipeterpan

I think the model weights are released here: https://huggingface.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c

@Mooler0410
Author

> I think the model weights are released here: https://huggingface.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c

Thanks! I've found it now. When this question was posted, the weights had not yet been made public.

Now I'm looking for the tokenizer 🤣. A tokenizer is required to run the example, but I cannot find one. Any idea where it is?

@ruipeterpan

I think the tokenizer path should point to the .model file in the Hugging Face repos. For example, I downloaded the mamba2-hybrid-8b-3t-4k repo from Hugging Face, and mamba2-hybrid-8b-3t-4k/mt_nlg_plus_multilingual_ja_zh_the_stack_frac_015_256k.model is the tokenizer. I'm running inference with run_text_gen_server_8b.sh, and the checkpoint/tokenizer paths are

CHECKPOINT_PATH="/workspace/checkpoints/mamba2-hybrid-8b-3t-4k/"
TOKENIZER_PATH="/workspace/checkpoints/mamba2-hybrid-8b-3t-4k/mt_nlg_plus_multilingual_ja_zh_the_stack_frac_015_256k.model"

respectively.
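For readers automating this step: since the tokenizer ships as the only `.model` (SentencePiece) file inside the checkpoint directory, a small helper can locate it instead of hard-coding the long filename. This is a minimal sketch under that assumption; the function name and directory layout are illustrative, not part of Megatron-LM.

```python
# Hypothetical helper: locate the SentencePiece tokenizer (.model file)
# inside a downloaded checkpoint directory. Assumes exactly the layout
# described above (tokenizer stored as a *.model file at the top level).
from pathlib import Path


def find_tokenizer(checkpoint_dir: str) -> str:
    """Return the path to the first *.model file under checkpoint_dir."""
    matches = sorted(Path(checkpoint_dir).glob("*.model"))
    if not matches:
        raise FileNotFoundError(f"no .model tokenizer found in {checkpoint_dir}")
    return str(matches[0])


if __name__ == "__main__":
    # Example path from the comment above; adjust to your download location.
    print(find_tokenizer("/workspace/checkpoints/mamba2-hybrid-8b-3t-4k/"))
```

The resulting path can then be exported as TOKENIZER_PATH before launching run_text_gen_server_8b.sh.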

@Mooler0410
Author

Wow, thank you so much for your guidance! I spent hours trying to find the tokenizer.

I've never used Megatron before 🙃. You really saved my life!!
