Mamba-Chat 🐍

Mamba-Chat is the first chat language model based on a state-space model architecture, not a transformer.

The model is based on Albert Gu's and Tri Dao's work Mamba: Linear-Time Sequence Modeling with Selective State Spaces (paper) as well as their model implementation. This repository provides training / fine-tuning code for the model based on some modifications of the Huggingface Trainer class.

Mamba-Chat is based on Mamba-2.8B and was fine-tuned on 16,000 samples of the HuggingFaceH4/ultrachat_200k dataset. To learn more, you can:

Take a look at the model on Huggingface 🤗
Talk to us on the Haven Community Discord 🧑‍🤝‍🧑
Talk to Mamba-Chat on Google Colab

Run Mamba-Chat

We provide code for testing and fine-tuning our model. Here's how to get started and what you can do with it:

Clone repository and install dependencies:

git clone https://github.com/havenhq/mamba-chat.git
cd mamba-chat
pip install -r requirements.txt

Talk to Mamba-Chat (CLI chatbot):

python chat.py

Talk to Mamba-Chat (gradio app):

pip install gradio==4.8.0
python app.py --share

Fine-Tune Mamba (the base model) on a subset of the Ultrachat dataset:

python train_mamba.py --model state-spaces/mamba-2.8b --tokenizer EleutherAI/gpt-neox-20b --learning_rate 5e-5 --batch_size 4 --data_path ./data/ultrachat_small.jsonl --num_epochs 3

If you have a 24GB card (3090, 4090, etc.) you can use these settings:

python train_mamba.py --model state-spaces/mamba-2.8b --tokenizer EleutherAI/gpt-neox-20b --learning_rate 5e-5 --batch_size 1 --gradient_accumulation_steps 4 --optim paged_adamw_8bit --data_path ./data/ultrachat_small.jsonl --num_epochs 3

Citation

bibtex
@misc{haven2023mambachat,
  title        = {Mamba-Chat},
  author       = {Justus Mattern and Konstantin Hohr},
  year         = {2023},
  howpublished = {GitHub},
  url          = {https://github.com/havenhq/mamba-chat}
}

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
data		data
scripts		scripts
trainer		trainer
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
chat.py		chat.py
requirements.txt		requirements.txt
train_mamba.py		train_mamba.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Mamba-Chat 🐍

Run Mamba-Chat

Citation

About

Releases

Packages

Contributors 4

Languages

License

redotvideo/mamba-chat

Folders and files

Latest commit

History

Repository files navigation

Mamba-Chat 🐍

Run Mamba-Chat

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages