
Releases: hpcaitech/ColossalAI

Version v0.1.12 Release Today!

09 Dec 17:59
63fbba3

What's Changed

Zero

  • [zero] add L2 gradient clipping for ZeRO (#2112) by HELSON
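The L2 gradient clipping added here follows the usual global-norm recipe: compute the L2 norm over all gradients, then rescale everything if that norm exceeds the limit. A minimal pure-Python sketch of the recipe (illustrative only, not ColossalAI's ZeRO-aware implementation, which also reduces partial norms across ranks):

```python
import math

def clip_grad_l2_(grads, max_norm, eps=1e-6):
    """Rescale gradients in place so their global L2 norm is at most max_norm.

    `grads` is a list of flat gradient lists; a ZeRO version would
    all-reduce the per-rank partial norms before computing the scale.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:  # only shrink, never amplify
        for grad in grads:
            for i, g in enumerate(grad):
                grad[i] = g * clip_coef
    return total_norm
```

The same idea underlies `torch.nn.utils.clip_grad_norm_`; the ZeRO twist is purely in how the norm is assembled from sharded gradients.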

Gemini

Hotfix

ColoTensor

  • [ColoTensor] throw error when ColoInitContext meets meta parameter. (#2105) by Jiarui Fang

Autoparallel

Version

Pipeline middleware

  • [Pipeline Middleware] fix data race in Pipeline Scheduler for DAG (#2087) by Ziyue Jiang
  • [Pipeline Middleware] Adapt scheduler for Topo (#2066) by Ziyue Jiang

FX

Example

Device

Test

Pipeline

Examples

Full Changelog: v0.1.11rc5...v0.1.12

Version v0.1.11rc5 Release Today!

30 Nov 16:26
d3499c9

What's Changed

Release

CLI

  • [cli] updated installation check with more information (#2050) by Frank Lee

Gemini

Setup

Test

Hotfix

Zero

Testing

RPC

Autoparallel

FX

GitHub

Workflow

  • [workflow] removed unused pypi release workflow (#2022) by Frank Lee

Full Changelog: v0.1.11rc4...v0.1.11rc5

Version v0.1.11rc4 Release Today!

23 Nov 09:26
7242bff

What's Changed

Workflow

  • [workflow] fixed the python and cpu arch mismatch (#2010) by Frank Lee
  • [workflow] fixed the typo in condarc (#2006) by Frank Lee
  • [workflow] added conda cache and fixed no-compilation bug in release (#2005) by Frank Lee

Gemini

Autoparallel

FX

Hotfix

Example

Kernel

  • [kernel] move all symlinks of kernel to colossalai._C (#1971) by ver217

Polish

Zero

ColoTensor

  • [ColoTensor] reconfig ColoInitContext, decouple default_pg and default_dist_spec. (#1953) by Jiarui Fang
  • [ColoTensor] ColoInitContext initialize parameters in shard mode. (#1937) by Jiarui Fang

Tutorial

Tensorparallel

SC demo

SC

Full Changelog: v0.1.11rc3...v0.1.11rc4

Version v0.1.11rc3 Release Today!

13 Nov 07:37
b42b672

What's Changed

Release

Tutorial

Example

SC

NFC

  • [NFC] polish colossalai/amp/naive_amp/__init__.py code style (#1905) by Junming Wu
  • [NFC] remove redundant dependency (#1869) by binmakeswell
  • [NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style (#1856) by yuxuan-lou
  • [NFC] polish .github/workflows/scripts/generate_release_draft.py code style (#1855) by Ofey Chan
  • [NFC] polish workflows code style (#1854) by Kai Wang (Victor Kai)
  • [NFC] polish colossalai/amp/apex_amp/__init__.py code style (#1853) by LuGY
  • [NFC] polish .readthedocs.yaml code style (#1852) by nuszzh
  • [NFC] polish <.github/workflows/release_nightly.yml> code style (#1851) by RichardoLuo
  • [NFC] polish amp.naive_amp.grad_scaler code style by zbian
  • [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/operator_handler.py code style (#1845) by HELSON
  • [NFC] polish ./colossalai/amp/torch_amp/__init__.py code style (#1836) by Genghan Zhang
  • [NFC] polish .github/workflows/build.yml code style (#1837) by xyupeng
  • [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/conv_handler.py code style (#1829) by Sze-qq
  • [NFC] polish colossalai/amp/torch_amp/_grad_scaler.py code style (#1823) by Ziyue Jiang
  • [NFC] polish .github/workflows/release_docker.yml code style by Maruyama_Aya
  • [NFC] polish .github/workflows/submodule.yml code style (#1822) by shenggan
  • [NFC] polish .github/workflows/draft_github_release_post.yml code style (#1820) by Arsmart1
  • [NFC] polish colossalai/amp/naive_amp/_fp16_optimizer.py code style (#1819) by Fazzie-Maqianli
  • [NFC] polish colossalai/amp/naive_amp/_utils.py code style (#1816) by CsRic
  • [NFC] polish .github/workflows/build_gpu_8.yml code style (#1813) by Zangwei Zheng
  • [NFC] polish MANIFEST.in code style (#1814) by Zirui Zhu
  • [NFC] polish strategies_constructor.py code style (#1806) by binmakeswell

Doc

Zero

Autoparallel

FX

Hotfix

Inference

  • [inference] overlap comm and compute in Linear1D_Row when stream_chunk_num > 1 (#1876) by Jiarui Fang
  • [inference] streaming Linear 1D Row inference (#1874) by Jiarui Fang

AMP

Diffusion

Utils

Full Changelog: v0.1.11rc2...v0.1.11rc3

Version v0.1.11rc2 Release Today!

08 Nov 14:44
4ac7d3e

What's Changed

Autoparallel

Kernel

Gemini

CheckpointIO

  • [CheckpointIO] a uniform checkpoint I/O module (#1689) by ver217

Doc

Example

NFC

  • [NFC] update gitignore remove DS_Store (#1830) by Jiarui Fang
  • [NFC] polish type hint for shape consistency (#1801) by Jiarui Fang
  • [NFC] polish tests/test_layers/test_3d/test_3d.py code style (#1740) by Ziheng Qin
  • [NFC] polish tests/test_layers/test_3d/checks_3d/common.py code style (#1733) by lucasliunju
  • [NFC] polish colossalai/nn/metric/_utils.py code style (#1727) by Sze-qq
  • [NFC] polish tests/test_layers/test_3d/checks_3d/check_layer_3d.py code style (#1731) by Xue Fuzhao
  • [NFC] polish tests/test_layers/test_sequence/checks_seq/check_layer_seq.py code style (#1723) by xyupeng
  • [NFC] polish accuracy_2d.py code style (#1719) by Ofey Chan
  • [NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style (#1721) by Arsmart1
  • [NFC] polish _checkpoint_hook.py code style (#1722) by LuGY
  • [NFC] polish test_2p5d/checks_2p5d/check_operation_2p5d.py code style (#1718) by Kai Wang (Victor Kai)
  • [NFC] polish colossalai/zero/sharded_param/__init__.py code style (#1717) by CsRic
  • [NFC] polish colossalai/nn/lr_scheduler/linear.py code style (#1716) by yuxuan-lou
  • [NFC] polish tests/test_layers/test_2d/checks_2d/check_operation_2d.py code style (#1715) by binmakeswell
  • [NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style (#1714) by shenggan

FX

Hotfix

Pipeline

CI

Compatibility

Feat

FX/profiler

  • [fx/profiler] debug the fx.profiler / add an example test script for fx.profiler (#1730) by Super Daniel

Workflow

  • [workflow] handled the git directory ownership error (#1741) by Frank Lee

Full Changelog: v0.1.11rc1...v0.1.11rc2

Version v0.1.11rc1 Release Today!

19 Oct 03:49
d373e67

What's Changed

Hotfix

Release

Doc

Zero

  • [zero] add chunk init function for users (#1729) by HELSON
  • [zero] add constant placement policy (#1705) by HELSON
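A placement policy decides which chunks of parameters stay in accelerator memory and which are evicted to CPU. As a loose illustration of the "constant" idea (a fixed device-memory budget rather than one derived from runtime memory tracking), here is a hypothetical sketch, not the Gemini API:

```python
def constant_placement(chunk_sizes, device_budget):
    """Keep chunks on the device until a fixed budget is exhausted;
    spill the rest to CPU. Returns a "cuda"/"cpu" label per chunk."""
    placements, used = [], 0
    for size in chunk_sizes:
        if used + size <= device_budget:
            placements.append("cuda")
            used += size
        else:
            placements.append("cpu")
    return placements
```

The appeal of a constant policy is predictability: the device footprint of parameters never exceeds the budget, regardless of how activation memory fluctuates.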

Pre-commit

Autoparallel

FX/meta/RPC

  • [fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710) by Super Daniel
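Meta registration lets operators run on shape-only "meta" tensors, so a model can be traced and its memory planned without allocating real storage. The idea in miniature, as a hypothetical stand-in class rather than torch's actual meta device:

```python
class MetaTensor:
    """A shape/dtype-only stand-in: ops propagate metadata, never data."""

    def __init__(self, shape, dtype="float32"):
        self.shape, self.dtype = tuple(shape), dtype

    def matmul(self, other):
        # Only shape arithmetic happens here; no storage is ever touched.
        assert self.shape[-1] == other.shape[0], "inner dims must match"
        return MetaTensor(self.shape[:-1] + other.shape[1:], self.dtype)
```

Because every op is pure shape arithmetic, tracing a billion-parameter model this way costs kilobytes, which is what makes meta-mode profiling practical.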

Embeddings

Unittest

  • [unittest] added doc for the pytest wrapper (#1704) by Frank Lee
  • [unittest] supported conditional testing based on env var (#1701) by Frank Lee

Embedding

FX/profiler

  • [fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 (#1679) by Super Daniel
  • [fx/profiler] provide a table of sum...

Version v0.1.10 Release Today!

08 Sep 10:03
b0f4c0b

What's Changed

Embedding

  • [embedding] cache_embedding small improvement (#1564) by CsRic
  • [embedding] polish parallel embedding tablewise (#1545) by Jiarui Fang
  • [embedding] freq_aware_embedding: add small functions for caller application (#1537) by CsRic
  • [embedding] fix a bug in table wise sharding (#1538) by Jiarui Fang
  • [embedding] tablewise sharding polish (#1535) by Jiarui Fang
  • [embedding] add tablewise sharding for FAW (#1526) by CsRic
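Tablewise sharding places each embedding table whole on a single rank, rather than splitting one table's rows across ranks. A greedy load-balancing sketch of the assignment step, illustrative only and not the FAW implementation:

```python
def tablewise_shard(table_sizes, world_size):
    """Assign each embedding table to the currently least-loaded rank."""
    loads = [0] * world_size
    assignment = []
    for size in table_sizes:
        rank = loads.index(min(loads))  # least-loaded rank wins
        assignment.append(rank)
        loads[rank] += size
    return assignment
```

Keeping tables whole means a lookup touches exactly one rank, trading some load imbalance for much simpler communication.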

NFC

Pipeline/tuning

  • [pipeline/tuning] improve dispatch performance both time and space cost (#1544) by Kirigaya Kazuto

FX

Autoparallel

Utils

  • [utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548) by ver217
  • [utils] optimize partition_tensor_parallel_state_dict (#1546) by ver217
  • [utils] Add use_reentrant=False in utils.activation_checkpoint (#1460) by Boyuan Yao
  • [utils] Impl clip_grad_norm for ColoTensor and ZeroOptimizer (#1442) by ver217
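Activation checkpointing, which the `use_reentrant=False` change above configures, trades compute for memory: instead of keeping an activation alive for the backward pass, keep the (function, input) pair and recompute on demand. A toy sketch of that trade with a hypothetical class (the PR itself just switches `torch.utils.checkpoint` to its non-reentrant mode):

```python
class CheckpointedActivation:
    """Store fn and its input instead of fn(input); recompute when needed."""

    def __init__(self, fn, x):
        self.fn, self.x = fn, x  # O(|x|) memory instead of O(|fn(x)|)

    def materialize(self):
        return self.fn(self.x)   # the forward cost is paid again in backward
```

The non-reentrant mode performs this recomputation inside autograd's normal graph machinery, which is why it composes better with features like gradient hooks than the reentrant variant.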

Hotfix

Pipeline/pipeline_process_group

  • [pipeline/pipeline_process_group] finish PipelineProcessGroup to manage local and global rank in TP, DP and PP (#1508) by Kirigaya Kazuto

Doc

Autoparallel

FAW

Pipeline/rpc

  • [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy (#1497) by Kirigaya Kazuto
  • [pipeline/rpc] implement distributed optimizer | test with assert_close (#1486) by Kirigaya Kazuto
  • [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B (#1483) by Kirigaya Kazuto
  • [pipeline/rpc] implement a demo for PP with cuda rpc framework (#1470) by Kirigaya Kazuto
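The "steady 1F1B" mentioned above is the pipeline schedule in which, after a short warm-up of forwards, each stage strictly alternates one forward with one backward, bounding the number of in-flight microbatches. A sketch of the per-stage step order (illustrative, not the RPC scheduler):

```python
def one_f_one_b(stage, num_stages, num_microbatches):
    """Return the (phase, microbatch) step order for one pipeline stage."""
    # Earlier stages warm up with more forwards so later stages can start.
    warmup = min(num_stages - stage - 1, num_microbatches)
    steps = []
    f = b = 0
    for _ in range(warmup):
        steps.append(("F", f))
        f += 1
    while f < num_microbatches:      # steady state: strictly alternate
        steps.append(("F", f))
        f += 1
        steps.append(("B", b))
        b += 1
    while b < num_microbatches:      # drain remaining backwards
        steps.append(("B", b))
        b += 1
    return steps
```

The steady alternation is what caps activation memory at roughly one activation set per in-flight microbatch, instead of one per microbatch as in GPipe-style all-forward-then-all-backward.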

Tensor

FCE

  • [FCE] update interface for frequency statistics in FreqCacheEmbedding (#1462) by Geng Zhang

Workflow

  • [workflow] added TensorNVMe to compatibility test (#1449) by Frank Lee

Test

Engine/schedule

  • [engine/schedule] use p2p_v2 to ...

Version v0.1.9 Release Today!

11 Aug 13:16
74bee5f

What's Changed

Zero

  • [zero] add chunk_managerV2 for all-gather chunk (#1441) by HELSON
  • [zero] add chunk size searching algorithm for parameters in different groups (#1436) by HELSON
  • [zero] add has_inf_or_nan in AgChunk; enhance the unit test of AgChunk (#1426) by HELSON
  • [zero] add unit test for AgChunk's append, close, access (#1423) by HELSON
  • [zero] add AgChunk (#1417) by HELSON
  • [zero] ZeroDDP supports controlling outputs' dtype (#1399) by ver217
  • [zero] alleviate memory usage in ZeRODDP state_dict (#1398) by HELSON
  • [zero] chunk manager allows filtering ex-large params (#1393) by ver217
  • [zero] zero optim state_dict takes only_rank_0 (#1384) by ver217
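The chunk machinery in the entries above groups many small parameter tensors into fixed-size chunks so they can be gathered with one collective instead of many. A greedy packing sketch (hypothetical helper, not the ChunkManager API):

```python
def pack_into_chunks(param_sizes, chunk_size):
    """Greedily pack parameter sizes into chunks of at most chunk_size
    elements; an oversized parameter gets a chunk of its own."""
    chunks, current, used = [], [], 0
    for size in param_sizes:
        if current and used + size > chunk_size:
            chunks.append(current)   # close the full chunk
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        chunks.append(current)
    return chunks
```

The chunk-size searching described in #1436 tunes `chunk_size` so that padding waste stays low while each all-gather still moves enough data to saturate the interconnect.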

FX

Recommendation System

Global Tensor

Hotfix

  • [hotfix] zero optim prevents calling inner optim.zero_grad (#1422) by ver217
  • [hotfix] fix CPUAdam kernel nullptr (#1410) by ver217
  • [hotfix] adapt ProcessGroup and Optimizer to ColoTensor (#1388) by HELSON
  • [hotfix] fix a running error in test_colo_checkpoint.py (#1387) by HELSON
  • [hotfix] fix some bugs during gpt2 testing (#1379) by YuliangLiu0306
  • [hotfix] fix zero optim save/load state dict (#1381) by ver217
  • [hotfix] fix zero ddp buffer cast (#1376) by ver217
  • [hotfix] fix no optimizer in save/load (#1363) by HELSON
  • [hotfix] fix megatron_init in test_gpt2.py (#1357) by HELSON
  • [hotfix] ZeroDDP use new process group (#1333) by ver217
  • [hotfix] shared model returns cpu state_dict (#1328) by ver217
  • [hotfix] fix ddp for unit test test_gpt2 (#1326) by HELSON
  • [hotfix] fix unit test test_module_spec (#1321) by HELSON
  • [hotfix] fix PipelineSharedModuleGradientHandler (#1314) by ver217
  • [hotfix] fix ColoTensor GPT2 unitest (#1309) by HELSON
  • [hotfix] add missing file (#1308) by Jiarui Fang
  • [hotfix] remove potential circular import (#1307) by Jiarui Fang
  • [hotfix] skip some unittest due to CI environment. (#1301) by YuliangLiu0306
  • [hotfix] fix shape error in backward when using ColoTensor (#1298) by HELSON
  • [hotfix] Dist Mgr gather torch version (#1284) by Jiarui Fang

Communication

Device

Chunk

DDP

  • [DDP] test ddp state dict uses more strict threshold (#1382) by ver217

Checkpoint

  • [checkpoint] add kwargs for load_state_dict (#1374) by HELSON
  • [checkpoint] use args, kwargs in save_checkpoint, load_checkpoint (#1368) by HELSON
  • [checkpoint] sharded optim save/load grad scaler (#1350) by ver217
  • [checkpoint] use gather_tensor in checkpoint and update its unit test (#1339) by HELSON
  • [checkpoint] add ColoOptimizer checkpointing (#1316) by Jiarui Fang
  • [checkpoint] add test for bert and hotfix save bugs (#1297) by Jiarui Fang

Util

Nvme

  • [nvme] CPUAdam and HybridAdam support NVMe offload (#1360) by ver217

ColoTensor

  • [colotensor] use cpu memory to store state_dict (#1367) by HELSON
  • [colotensor] add Tensor.view op and its unit test (#1343) by HELSON

Unit test

  • [unit test] add megatron init test in zero_optim (#1358) by HELSON

Docker

Doc

Refactor

  • [refactor] refactor ColoTensor's unit tests (#1340) by HELSON

Workflow

  • [workflow] update docker build workflow to use proxy (#1334) by Frank Lee
  • [workflow] update 8-gpu test to use torch 1.11 (#1332) by Frank Lee
  • [workflow] roll back to use torch 1.11 for unit testing (#1325) by Frank Lee
  • [workflow] fixed trigger condition for 8-gpu unit test (#1323) by Frank Lee
  • [workflow] updated release bdist workflow (#1318) by Frank Lee
  • [workflow] disable SHM for compatibility CI on rtx3080 (#1315) by Frank Lee
  • [workflow] updated pytorch compatibility test (#1311) by Frank Lee

Test


Version v0.1.8 Release Today!

12 Jul 16:10
7e8114a

What's Changed

Hotfix

Tensor

FX

Rename

Checkpoint

Polish

  • [polish] polish repr for ColoTensor, DistSpec, ProcessGroup (#1235) by HELSON

Refactor

Context

DDP

ColoTensor

Zero

  • [zero] sharded optim supports loading local state dict (#1170) by ver217
  • [zero] zero optim supports loading local state dict (#1171) by ver217
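A "local" state dict lets each rank save and load only the shard of optimizer state it owns, instead of materializing the full state on rank 0. The slicing idea in sketch form (hypothetical helper, not the ColossalAI API):

```python
def local_slice(flat_state, rank, world_size):
    """Return the contiguous shard of a flat state owned by `rank`."""
    per_rank = (len(flat_state) + world_size - 1) // world_size  # ceil-divide
    return flat_state[rank * per_rank : (rank + 1) * per_rank]
```

Loading locally this way keeps peak host memory at O(state / world_size) per rank, which matters when the full optimizer state would not fit on a single node.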

Workflow

Gemini

Pipeline

CI

  • [ci] added scripts to auto-generate release post text (#1142) by Frank Lee

Full Changelog: v0.1.7...v0.1.8

Version v0.1.7 Released Today

21 Jun 04:10
6690a61


Highlights

  • Started supporting torch.fx for auto-parallel training
  • Updated the ZeRO mechanism with ColoTensor
  • Fixed various bugs

What's Changed

Hotfix

Zero

  • [zero] avoid zero hook spam by changing log to debug level (#1137) by Frank Lee
  • [zero] added error message to handle on-the-fly import of torch Module class (#1135) by Frank Lee
  • [zero] fixed api consistency (#1098) by Frank Lee
  • [zero] zero optim copy chunk rather than copy tensor (#1070) by ver217

Optim

DDP

  • [ddp] add save/load state dict for ColoDDP (#1127) by ver217
  • [ddp] add set_params_to_ignore for ColoDDP (#1122) by ver217
  • [ddp] supported customized torch ddp configuration (#1123) by Frank Lee

Pipeline

FX

Gemini

  • [gemini] gemini mgr supports "cpu" placement policy (#1118) by ver217
  • [gemini] zero supports gemini (#1093) by ver217

Test

Release

Tensor

AMP

  • [amp] included dict for type casting of model output (#1102) by Frank Lee

Workflow

Engine

Doc

  • [doc] added documentation to chunk and chunk manager (#1094) by Frank Lee

Context

Refactor

Cudnn

  • [cudnn] set False to cudnn benchmark by default (#1063) by Frank Lee

Full Changelog: v0.1.6...v0.1.7