v0.1.6 Released!
Main features
- ColoTensor supports hybrid parallelism (tensor parallelism and data parallelism)
- ColoTensor supports ZeRO (with chunk)
- Configure tensor parallelism per module via ColoTensor
- ZeroInitContext and ShardedModelV2 support loading checkpoints and Hugging Face from_pretrained()
What's Changed
ColoTensor
- [tensor] refactor colo-tensor by @ver217 in #992
- [tensor] refactor parallel action by @ver217 in #1007
- [tensor] impl ColoDDP for ColoTensor by @ver217 in #1009
- [Tensor] add module handler for linear by @Wesley-Jzy in #1021
- [Tensor] add module check and bert test by @Wesley-Jzy in #1031
- [Tensor] add Parameter inheritance for ColoParameter by @Wesley-Jzy in #1041
- [tensor] ColoTensor supports ZeRo by @ver217 in #1015
- [zero] add chunk size search for chunk manager by @ver217 in #1052
Zero
- [zero] add load_state_dict for sharded model by @ver217 in #894
- [zero] add zero optimizer for ColoTensor by @ver217 in #1046
Hotfix
- [hotfix] fix colo init context by @ver217 in #1026
- [hotfix] fix some bugs caused by size mismatch. by @YuliangLiu0306 in #1011
- [kernel] fixed the include bug in dropout kernel by @FrankLeeeee in #999
- fix typo in constants by @ryanrussell in #1027
- [engine] fixed bug in gradient accumulation dataloader to keep the last step by @FrankLeeeee in #1030
- [hotfix] fix dist spec mgr by @ver217 in #1045
- [hotfix] fix import error in sharded model v2 by @ver217 in #1053
CI
- [ci] update the docker image name by @FrankLeeeee in #1017
- [ci] added nightly build (#1018) by @FrankLeeeee in #1019
- [ci] fixed nightly build workflow by @FrankLeeeee in #1022
- [ci] fixed nightly build workflow by @FrankLeeeee in #1029
- [ci] fixed nightly build workflow by @FrankLeeeee in #1040
CLI
- [cli] remove unused imports by @FrankLeeeee in #1001
Documentation
- Hotfix/format by @binmakeswell in #987
- [doc] update docker instruction by @FrankLeeeee in #1020
Misc
- [NFC] Hotfix/format by @binmakeswell in #984
- Revert "[NFC] Hotfix/format" by @ver217 in #986
- remove useless import in tensor dir by @feifeibear in #997
- [NFC] fix download link by @binmakeswell in #998
- [Bot] Synchronize Submodule References by @github-actions in #1003
- [NFC] polish colossalai/kernel/cuda_native/csrc/colossal_C_frontend.c… by @zhengzangw in #1010
- [NFC] fix paper link by @binmakeswell in #1012
- [p2p]add object list send/recv by @YuliangLiu0306 in #1024
- [Bot] Synchronize Submodule References by @github-actions in #1034
- [NFC] add inference by @binmakeswell in #1044
- [titans]remove model zoo by @YuliangLiu0306 in #1042
- [NFC] add inference submodule in path by @binmakeswell in #1047
- [release] update version.txt by @FrankLeeeee in #1048
- [Bot] Synchronize Submodule References by @github-actions in #1049
- updated collective ops api by @kurisusnowdeng in #1054
- [pipeline]refactor ppschedule to support tensor list by @YuliangLiu0306 in #1050
New Contributors
- @ryanrussell made their first contribution in #1027
Full Changelog: v0.1.5...v0.1.6