🤖 MegaBlocks

MegaBlocks is a light-weight library for mixture-of-experts (MoE) training. The core of the system is efficient "dropless-MoE" (dMoE, paper) and standard MoE layers.

MegaBlocks is built on top of Megatron-LM, where we support data, expert and pipeline parallel training of MoEs. We're working on extending more frameworks to support MegaBlocks.

🚀 Performance

MegaBlocks dMoEs outperform MoEs trained with Tutel by up to 40% compared to Tutel's best performing capacity_factor configuration. MegaBlocks dMoEs use a reformulation of MoEs in terms of block-sparse operations, which allows us to avoid token dropping without sacrificing hardware efficiency. In addition to being faster, MegaBlocks simplifies MoE training by removing the capacity_factor hyperparameter alltogether. Compared to dense Transformers trained with Megatron-LM, MegaBlocks dMoEs can accelerate training by as much as 2.4x. Check out our paper for more details!

🏗️ Installation

Training models with Megatron-LM: We recommend using NGC's nvcr.io/nvidia/pytorch:23.01-py3 PyTorch container. The Dockerfile builds on this image with additional dependencies. To build the image, run docker build . -t megablocks-dev and then bash docker.sh to launch the container. Once inside the container, install MegaBlocks with pip install .. See Usage for instructions on training MoEs with MegaBlocks + Megatron-LM.

Using MegaBlocks in other packages: To install the MegaBlocks package for use in other frameworks, run pip install megablocks.

Note that the block-sparse kernels used to implement dMoE are currently limited to A100 GPUs.

🚂 Usage

We provide scripts for pre-training Transformer MoE and dMoE language models under the top-level directory. The quickest way to get started is to use one of the experiment launch scripts. These scripts require a dataset in Megatron-LM's format, which can be created by following their instructions.

✍️ Citation

@article{megablocks-arxiv,
  author    = {Trevor Gale and Deepak Narayanan and Cliff Young and Matei Zaharia},
  title     = {MegaBlocks: Efficient Sparse Training with Mixture-of-Experts},
  journal   = {CoRR},
  volume    = {abs/2211.15841},
  year      = {2022},
}

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
csrc		csrc
exp		exp
media		media
megablocks		megablocks
third_party		third_party
.gitignore		.gitignore
.gitmodules		.gitmodules
Dockerfile		Dockerfile
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
docker.sh		docker.sh
requirements.txt		requirements.txt
setup.py		setup.py

License

EricSteinberger/megablocks

Folders and files

Latest commit

History

Repository files navigation

🤖 MegaBlocks

🚀 Performance

🏗️ Installation

🚂 Usage

✍️ Citation

About

Resources

License

Stars

Watchers

Forks

Languages