This repo is an unofficial implementation of the Jamba model as introduced in Lieber et al. (2024). The official project webpage can be found here. This repo is developed mainly for didactic purposes, to spell out the details of how to hybridize SSMs with Transformers.
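For orientation, here is a minimal sketch of what such a hybrid layer stack can look like, assuming the `mamba-ssm` package for the SSM layer. The names (`HybridStack`, `AttentionLayer`, `MambaLayer`, `attn_every`) are illustrative only, not this repo's actual API; the one-attention-layer-in-eight default mirrors the 1:7 attention-to-Mamba ratio described in the paper.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the `mamba-ssm` package (CUDA required)


class AttentionLayer(nn.Module):
    """Pre-norm self-attention layer (causal masking omitted for brevity)."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out


class MambaLayer(nn.Module):
    """Pre-norm Mamba (SSM) layer."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mamba = Mamba(d_model=d_model)

    def forward(self, x):
        return x + self.mamba(self.norm(x))


class HybridStack(nn.Module):
    """Interleaves one attention layer per `attn_every` layers; the rest are Mamba."""

    def __init__(self, d_model: int, n_layers: int = 8, attn_every: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            [
                AttentionLayer(d_model) if i % attn_every == 0 else MambaLayer(d_model)
                for i in range(n_layers)
            ]
        )

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        for layer in self.layers:
            x = layer(x)
        return x
```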
- Put all the essential pieces together: Mamba, MoE (see the MoE sketch after this list).
- Add a functioning training script (Lightning).
- Show some results.
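For the MoE piece, the paper replaces the MLP in some layers with a routed mixture of experts (16 experts, top-2 routing). Below is a minimal sketch of such a top-k routed expert MLP; `MoEMLP` and its hyperparameters are illustrative assumptions, not necessarily what this repo ends up using.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEMLP(nn.Module):
    """Token-level top-k routing over a set of expert MLPs (illustrative sketch)."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 16, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            ]
        )

    def forward(self, x):
        # x: (batch, seq_len, d_model) -> route each token independently
        b, s, d = x.shape
        flat = x.reshape(-1, d)
        gate_logits = self.router(flat)                      # (tokens, n_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(flat)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(flat[mask])
        return out.reshape(b, s, d)
```

A production implementation would also add a load-balancing auxiliary loss and batched expert dispatch; the double loop here is kept only for readability.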
@article{lieber2024jamba,
title={Jamba: A Hybrid Transformer-Mamba Language Model},
author={Lieber, Opher and Lenz, Barak and Bata, Hofit and Cohen, Gal and Osin, Jhonathan and Dalmedigos, Itay and Safahi, Erez and Meirom, Shaked and Belinkov, Yonatan and Shalev-Shwartz, Shai and others},
journal={arXiv preprint arXiv:2403.19887},
year={2024}
}