Code for the ACL-2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts"
This project is built on the fairseq codebase. Before running the code, install our customized fairseq by running the following commands in the root directory of this project:
pip install --user --editable .
python setup.py build_ext --inplace
For language modeling, you can preprocess your own corpus with the fairseq-preprocess
command. Please refer to the official fairseq language modeling example for more details. If the corpus is too large, you may want to split it into shards for better efficiency.
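As a concrete sketch (the corpus paths and output directory below are placeholders, not names used by this repo), a typical fairseq preprocessing invocation for a tokenized language modeling corpus looks like:

```shell
# Binarize a pre-tokenized corpus for the language_modeling task.
# --only-source: language modeling has no target side.
fairseq-preprocess \
    --only-source \
    --trainpref corpus/train.tokens \
    --validpref corpus/valid.tokens \
    --testpref corpus/test.tokens \
    --destdir data-bin/my-corpus \
    --workers 20
```

The resulting data-bin/my-corpus directory is what you would pass as DATADIR below. If you shard a large corpus, fairseq generally accepts several binarized directories joined by colons (e.g. data-bin/shard0:data-bin/shard1) as the data argument, cycling through one shard per epoch.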
An example command to train a 16-expert StableMoE model on 16 GPUs:
DATADIR=(the directory that saves the preprocessed data)
jobname=(the job name)
mkdir -p ../checkpoints/$jobname
python -m torch.distributed.launch \
--nproc_per_node=16 \
train.py $DATADIR \
--task language_modeling \
--save-dir ../checkpoints/$jobname \
--arch transformer_lm_BaseGPT_x1_small \
--moe-type base_layer \
--two-stage-updates 6000 \
--distill-assignment \
--distilled-model wordemb \
--distill-factor 0.3 \
--criterion xentropy_aux \
--balance-loss balance \
--balance-factor 0.3 \
--capacity-factor 2 \
--assignment-algorithm GA \
--share-decoder-input-output-embed \
--optimizer adam \
--adam-betas '(0.9, 0.98)' \
--clip-norm 0.1 \
--lr 0.0006 \
--lr-scheduler polynomial_decay \
--total-num-update 60000 \
--warmup-updates 2000 \
--tokens-per-sample 1024 \
--sample-break-mode none \
--batch-size 8 \
--pad-to-fixed-length \
--pad-to-fixed-bsz \
--update-freq 4 \
--max-update 60000 \
--ddp-backend=legacy_ddp \
--log-interval 100 \
--log-file ../checkpoints/$jobname/log.txt \
--log-format tqdm \
--validate-interval-updates 500 \
--save-interval 5 \
--tensorboard-logdir ../tblogs/$jobname \
--distributed-no-spawn \
--fp16-no-flatten-grads \
--fp16
After training, an example command to evaluate the 16-expert StableMoE model:
python -m torch.distributed.launch \
--nproc_per_node=16 \
fairseq_cli/eval_lm.py $DATADIR \
--path ../checkpoints/$jobname/checkpoint_best.pt \
--tokens-per-sample 1024 \
--batch-size 8 \
--ddp-backend=legacy_ddp \
--distributed-no-spawn \
--fp16-no-flatten-grads \
--fp16 \
--save-each-rank
If you use this code for your research, please cite our ACL-2022 paper:
@inproceedings{dai2022stable,
author = {Damai Dai and
Li Dong and
Shuming Ma and
Bo Zheng and
Zhifang Sui and
Baobao Chang and
Furu Wei},
title = {StableMoE: Stable Routing Strategy for Mixture of Experts},
booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), {ACL} 2022, Dublin, Ireland,
May 22-27, 2022},
pages = {7085--7095},
year = {2022},
}
Contact: Damai Dai (daidamai@pku.edu.cn), Li Dong (lidong1@microsoft.com)