Mamba
Some of us have started digging into Mamba (and its predecessors and successors).
BeeGass has started a series of presentations on the Discord.
See https://github.com/sap-ient-ai/ssm
The model in that repo has fewer than 300 weights, precompiles in 7 minutes on an A6000, and produces good output within 10 epochs (3 minutes) when trained on TinyShakespeare.
That's pretty amazing.
The summary below was generated by GPT-4. I don't like it and will manually redo it at some point. (pi)
Mamba is a significant advance in sequence modeling built on structured state spaces, offering notable efficiency and performance improvements. It is the latest step in a line of innovations, each contributing to its current capabilities. Below is an outline of that development, the key innovations, and useful resources.
The development of Mamba is built upon several key innovations in sequence modeling:
- Based (newer; TODO: research this): https://hazyresearch.stanford.edu/blog/2023-12-11-zoology2-based
- Mamba (S6): Current version, emphasizing linear-time processing and efficiency.
- S4 (Structured State Space sequence model): Predecessor to Mamba, focusing on structured state spaces for sequence modeling.
- H3 (Hungry Hungry Hippos): A significant step towards language modeling with state space models.
- HiPPO: Foundational theory for continuous-time memory and sequence modeling (the underlying state-space recurrence is sketched after this list).
- LMU (Legendre Memory Units): Improved efficiency and scaling compared to traditional models.
- Voelker's Contributions: Early work on improving spiking dynamical networks and introducing higher-order synapses.
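The common thread running from HiPPO and the LMU through S4 to Mamba is the linear state-space model x'(t) = A x(t) + B u(t), y(t) = C x(t), discretized so it can be run as a recurrence over a token sequence. Below is a minimal toy sketch of that idea in plain NumPy (bilinear discretization, as used in S4). It is my own illustration, not any of the reference implementations, and the structured HiPPO matrix is replaced by a random stand-in.

```python
import numpy as np

# Toy sketch of a discretized linear state-space model (not reference code).
#   x'(t) = A x(t) + B u(t),    y(t) = C x(t)
# Discretize (A, B) and run the resulting recurrence over a 1-D input signal.

def discretize(A, B, dt):
    """Bilinear (Tustin) discretization of (A, B) with step size dt."""
    n = A.shape[0]
    I = np.eye(n)
    inv = np.linalg.inv(I - (dt / 2) * A)
    A_bar = inv @ (I + (dt / 2) * A)
    B_bar = inv @ (dt * B)
    return A_bar, B_bar

def run_ssm(A_bar, B_bar, C, u):
    """Linear-time scan: x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_k in u:                       # one state update per input sample
        x = A_bar @ x + B_bar[:, 0] * u_k
        ys.append(C @ x)
    return np.array(ys)

if __name__ == "__main__":
    N, L = 8, 64                        # state size, sequence length (made-up numbers)
    rng = np.random.default_rng(0)
    A = -np.abs(rng.standard_normal((N, N)))   # random stand-in for a structured HiPPO matrix
    B = rng.standard_normal((N, 1))
    C = rng.standard_normal((1, N))
    A_bar, B_bar = discretize(A, B, dt=0.1)
    y = run_ssm(A_bar, B_bar, C, rng.standard_normal(L))
    print(y.shape)  # (64, 1)
```

The models above differ mainly in how A is structured (HiPPO derives it from polynomial projection) and in how the recurrence is computed efficiently.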
- Contribution: Mamba introduces a linear-time algorithm for sequence modeling, significantly outperforming previous models in efficiency and scalability (a toy sketch of the selective scan follows the links below).
- Mamba on Arxiv
- GitHub Repository
- Training Code
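To make the "linear-time, selective" idea concrete, here is a rough toy sketch in the spirit of Mamba's selective scan (S6): the step size and input matrix are computed from the input at every timestep, A is kept diagonal, and the sequence is processed in a single linear-time loop. The names (selective_scan, W_delta, W_B) and the simplified discretization of B are my own placeholders, not the paper's or the official repo's API.

```python
import numpy as np

# Toy selective scan: SSM parameters depend on the input at each step,
# so the recurrence can choose what to remember or forget.

def selective_scan(u, A_diag, W_delta, W_B, C):
    """u: (L, d) inputs; A_diag: (d, n) negative values; returns y: (L, d)."""
    L, d = u.shape
    n = A_diag.shape[1]
    x = np.zeros((d, n))                          # hidden state per channel
    ys = np.zeros((L, d))
    for t in range(L):
        delta = np.log1p(np.exp(u[t] @ W_delta))  # softplus: input-dependent step size
        B_t = u[t] @ W_B                          # input-dependent B, shape (n,)
        A_bar = np.exp(delta[:, None] * A_diag)   # ZOH-style discretization, per channel
        B_bar = delta[:, None] * B_t[None, :]     # simplified Euler-style discretization of B
        x = A_bar * x + B_bar * u[t][:, None]     # selective recurrence, O(d * n) per step
        ys[t] = (x * C).sum(axis=1)               # readout, C: (d, n)
    return ys

# Tiny usage example with made-up sizes.
rng = np.random.default_rng(0)
L, d, n = 32, 4, 8
u = rng.standard_normal((L, d))
y = selective_scan(
    u,
    A_diag=-np.abs(rng.standard_normal((d, n))),
    W_delta=rng.standard_normal((d, d)),
    W_B=rng.standard_normal((d, n)),
    C=rng.standard_normal((d, n)),
)
print(y.shape)  # (32, 4)
```

Because the parameters change with the input, the convolution trick used by S4 no longer applies; Mamba instead relies on a hardware-aware parallel scan, which this plain Python loop does not attempt to reproduce.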
S4 stands as a direct precursor to Mamba, introducing the concept of structured state spaces.
- Contribution: S4's structured approach allows for more efficient and effective sequence modeling, directly influencing Mamba's development (the convolution view behind this efficiency is sketched after the links below).
- Annotated S4
- S4 GitHub Repository
- Gu's Dissertation on S4
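The structural trick that makes S4 fast to train is that, with A̅, B̅, C fixed across timesteps, the recurrence unrolls into a convolution y_k = Σ_j C A̅^j B̅ u_{k-j}, so the whole sequence can be computed at once with an FFT instead of a step-by-step scan. A small toy illustration under those assumptions (my own code, not the annotated-S4 implementation, and it materializes the kernel naively rather than using S4's structured parameterization):

```python
import numpy as np

def ssm_kernel(A_bar, B_bar, C, L):
    """Materialize K = (C B_bar, C A_bar B_bar, ..., C A_bar^{L-1} B_bar)."""
    K, x = [], B_bar[:, 0]
    for _ in range(L):
        K.append(float(C @ x))
        x = A_bar @ x
    return np.array(K)

def ssm_conv(u, K):
    """Causal convolution of input u with kernel K via FFT (zero-padded to 2L)."""
    L = len(u)
    U = np.fft.rfft(u, n=2 * L)
    Kf = np.fft.rfft(K, n=2 * L)
    return np.fft.irfft(U * Kf, n=2 * L)[:L]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, L = 4, 16
    A_bar = np.eye(N) * 0.9 + 0.01 * rng.standard_normal((N, N))  # toy stable transition
    B_bar = rng.standard_normal((N, 1))
    C = rng.standard_normal((1, N))
    u = rng.standard_normal(L)
    K = ssm_kernel(A_bar, B_bar, C, L)
    # Cross-check the convolution against the naive recurrence.
    x, y_rec = np.zeros(N), []
    for u_k in u:
        x = A_bar @ x + B_bar[:, 0] * u_k
        y_rec.append(float(C @ x))
    print(np.allclose(ssm_conv(u, K), y_rec))  # True
```

At inference time the same model can still be run as a constant-memory recurrence, which is why S4 (and Mamba after it) is attractive for long sequences.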
- Review and summarize the key points from the "Beyond Attention" and "Mamba AI" videos for a deeper understanding of the current state and future potential of sequence modeling technologies.