
Releases: hpcaitech/ColossalAI

Version v0.1.12 Release Today!

09 Dec 17:59
63fbba3

What's Changed

Zero

  • [zero] add L2 gradient clipping for ZeRO (#2112) by HELSON
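The L2 gradient clipping added here follows the usual global-norm recipe: compute the L2 norm over all gradients, then rescale everything if that norm exceeds the limit. A minimal pure-Python sketch of the recipe (illustrative only, not ColossalAI's ZeRO-aware implementation, which also reduces partial norms across ranks):

```python
import math

def clip_grad_l2_(grads, max_norm, eps=1e-6):
    """Rescale gradients in place so their global L2 norm is at most max_norm.

    `grads` is a list of flat gradient lists; a ZeRO version would
    all-reduce the per-rank partial norms before computing the scale.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:  # only shrink, never amplify
        for grad in grads:
            for i, g in enumerate(grad):
                grad[i] = g * clip_coef
    return total_norm
```

The same idea underlies `torch.nn.utils.clip_grad_norm_`; the ZeRO twist is purely in how the norm is assembled from sharded gradients.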

Gemini

Hotfix

ColoTensor

  • [ColoTensor] throw error when ColoInitContext meets meta parameter. (#2105) by Jiarui Fang

Autoparallel

Version

Pipeline middleware

  • [Pipeline Middleware] fix data race in Pipeline Scheduler for DAG (#2087) by Ziyue Jiang
  • [Pipeline Middleware] Adapt scheduler for Topo (#2066) by Ziyue Jiang

FX

Example

Device

Test

Pipeline

Examples

Full Changelog: v0.1.11rc5...v0.1.12

Version v0.1.11rc5 Release Today!

30 Nov 16:26
d3499c9

What's Changed

Release

CLI

  • [cli] updated installation check with more information (#2050) by Frank Lee

Gemini

Setup

Test

Hotfix

Zero

Testing

RPC

Autoparallel

FX

GitHub

Workflow

  • [workflow] removed unused pypi release workflow (#2022) by Frank Lee

Full Changelog: v0.1.11rc4...v0.1.11rc5

Version v0.1.11rc4 Release Today!

23 Nov 09:26
7242bff

What's Changed

Workflow

  • [workflow] fixed the python and cpu arch mismatch (#2010) by Frank Lee
  • [workflow] fixed the typo in condarc (#2006) by Frank Lee
  • [workflow] added conda cache and fixed no-compilation bug in release (#2005) by Frank Lee

Gemini

Autoparallel

FX

Hotfix

Example

Kernel

  • [kernel] move all symlinks of kernel to colossalai._C (#1971) by ver217

Polish

Zero

ColoTensor

  • [ColoTensor] reconfig ColoInitContext, decouple default_pg and default_dist_spec. (#1953) by Jiarui Fang
  • [ColoTensor] ColoInitContext initialize parameters in shard mode. (#1937) by Jiarui Fang

Tutorial

Tensorparallel

SC demo

SC

Full Changelog: v0.1.11rc3...v0.1.11rc4

Version v0.1.11rc3 Release Today!

13 Nov 07:37
b42b672

What's Changed

Release

Tutorial

Example

SC

NFC

  • [NFC] polish colossalai/amp/naive_amp/__init__.py code style (#1905) by Junming Wu
  • [NFC] remove redundant dependency (#1869) by binmakeswell
  • [NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style (#1856) by yuxuan-lou
  • [NFC] polish .github/workflows/scripts/generate_release_draft.py code style (#1855) by Ofey Chan
  • [NFC] polish workflows code style (#1854) by Kai Wang (Victor Kai)
  • [NFC] polish colossalai/amp/apex_amp/__init__.py code style (#1853) by LuGY
  • [NFC] polish .readthedocs.yaml code style (#1852) by nuszzh
  • [NFC] polish <.github/workflows/release_nightly.yml> code style (#1851) by RichardoLuo
  • [NFC] polish amp.naive_amp.grad_scaler code style by zbian
  • [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/operator_handler.py code style (#1845) by HELSON
  • [NFC] polish ./colossalai/amp/torch_amp/__init__.py code style (#1836) by Genghan Zhang
  • [NFC] polish .github/workflows/build.yml code style (#1837) by xyupeng
  • [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/conv_handler.py code style (#1829) by Sze-qq
  • [NFC] polish colossalai/amp/torch_amp/_grad_scaler.py code style (#1823) by Ziyue Jiang
  • [NFC] polish .github/workflows/release_docker.yml code style by Maruyama_Aya
  • [NFC] polish .github/workflows/submodule.yml code style (#1822) by shenggan
  • [NFC] polish .github/workflows/draft_github_release_post.yml code style (#1820) by Arsmart1
  • [NFC] polish colossalai/amp/naive_amp/_fp16_optimizer.py code style (#1819) by Fazzie-Maqianli
  • [NFC] polish colossalai/amp/naive_amp/_utils.py code style (#1816) by CsRic
  • [NFC] polish .github/workflows/build_gpu_8.yml code style (#1813) by Zangwei Zheng
  • [NFC] polish MANIFEST.in code style (#1814) by Zirui Zhu
  • [NFC] polish strategies_constructor.py code style (#1806) by binmakeswell

Doc

Zero

Autoparallel

FX

Hotfix

Inference

  • [inference] overlap comm and compute in Linear1D_Row when stream_chunk_num > 1 (#1876) by Jiarui Fang
  • [inference] streaming Linear 1D Row inference (#1874) by Jiarui Fang

AMP

Diffusion

Utils

Full Changelog: v0.1.11rc2...v0.1.11rc3

Version v0.1.11rc2 Release Today!

08 Nov 14:44
4ac7d3e

What's Changed

Autoparallel

Kernel

Gemini

CheckpointIO

  • [CheckpointIO] a uniform checkpoint I/O module (#1689) by ver217

Doc

Example

NFC

  • [NFC] update gitignore remove DS_Store (#1830) by Jiarui Fang
  • [NFC] polish type hint for shape consistency (#1801) by Jiarui Fang
  • [NFC] polish tests/test_layers/test_3d/test_3d.py code style (#1740) by Ziheng Qin
  • [NFC] polish tests/test_layers/test_3d/checks_3d/common.py code style (#1733) by lucasliunju
  • [NFC] polish colossalai/nn/metric/_utils.py code style (#1727) by Sze-qq
  • [NFC] polish tests/test_layers/test_3d/checks_3d/check_layer_3d.py code style (#1731) by Xue Fuzhao
  • [NFC] polish tests/test_layers/test_sequence/checks_seq/check_layer_seq.py code style (#1723) by xyupeng
  • [NFC] polish accuracy_2d.py code style (#1719) by Ofey Chan
  • [NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style (#1721) by Arsmart1
  • [NFC] polish _checkpoint_hook.py code style (#1722) by LuGY
  • [NFC] polish test_2p5d/checks_2p5d/check_operation_2p5d.py code style (#1718) by Kai Wang (Victor Kai)
  • [NFC] polish colossalai/zero/sharded_param/__init__.py code style (#1717) by CsRic
  • [NFC] polish colossalai/nn/lr_scheduler/linear.py code style (#1716) by yuxuan-lou
  • [NFC] polish tests/test_layers/test_2d/checks_2d/check_operation_2d.py code style (#1715) by binmakeswell
  • [NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style (#1714) by shenggan

FX

Hotfix

Pipeline

CI

Compatibility

Feat

FX/profiler

  • [fx/profiler] debug the fx.profiler / add an example test script for fx.profiler (#1730) by Super Daniel

Workflow

  • [workflow] handled the git directory ownership error (#1741) by Frank Lee

Full Changelog: v0.1.11rc1...v0.1.11rc2

Version v0.1.11rc1 Release Today!

19 Oct 03:49
d373e67

What's Changed

Hotfix

Release

Doc

Zero

  • [zero] add chunk init function for users (#1729) by HELSON
  • [zero] add constant placement policy (#1705) by HELSON
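A placement policy decides which chunks of parameters stay in accelerator memory and which are evicted to CPU. As a loose illustration of the "constant" idea (a fixed device-memory budget rather than one derived from runtime memory tracking), here is a hypothetical sketch, not the Gemini API:

```python
def constant_placement(chunk_sizes, device_budget):
    """Keep chunks on the device until a fixed budget is exhausted;
    spill the rest to CPU. Returns a "cuda"/"cpu" label per chunk."""
    placements, used = [], 0
    for size in chunk_sizes:
        if used + size <= device_budget:
            placements.append("cuda")
            used += size
        else:
            placements.append("cpu")
    return placements
```

The appeal of a constant policy is predictability: the device footprint of parameters never exceeds the budget, regardless of how activation memory fluctuates.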

Pre-commit

Autoparallel

FX/meta/RPC

  • [fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710) by Super Daniel
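Meta registration lets operators run on shape-only "meta" tensors, so a model can be traced and its memory planned without allocating real storage. The idea in miniature, as a hypothetical stand-in class rather than torch's actual meta device:

```python
class MetaTensor:
    """A shape/dtype-only stand-in: ops propagate metadata, never data."""

    def __init__(self, shape, dtype="float32"):
        self.shape, self.dtype = tuple(shape), dtype

    def matmul(self, other):
        # Only shape arithmetic happens here; no storage is ever touched.
        assert self.shape[-1] == other.shape[0], "inner dims must match"
        return MetaTensor(self.shape[:-1] + other.shape[1:], self.dtype)
```

Because every op is pure shape arithmetic, tracing a billion-parameter model this way costs kilobytes, which is what makes meta-mode profiling practical.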

Embeddings

Unittest

  • [unittest] added doc for the pytest wrapper (#1704) by Frank Lee
  • [unittest] supported conditional testing based on env var (#1701) by Frank Lee

Embedding

FX/profiler

  • [fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 (#1679) by Super Daniel
  • [fx/profiler] provide a table of sum...

Version v0.1.10 Release Today!

08 Sep 10:03
b0f4c0b

What's Changed

Embedding

  • [embedding] cache_embedding small improvement (#1564) by CsRic
  • [embedding] polish parallel embedding tablewise (#1545) by Jiarui Fang
  • [embedding] freq_aware_embedding: add small functions for caller application (#1537) by CsRic
  • [embedding] fix a bug in table wise sharding (#1538) by Jiarui Fang
  • [embedding] tablewise sharding polish (#1535) by Jiarui Fang
  • [embedding] add tablewise sharding for FAW (#1526) by CsRic
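Tablewise sharding places each embedding table whole on a single rank, rather than splitting one table's rows across ranks. A greedy load-balancing sketch of the assignment step, illustrative only and not the FAW implementation:

```python
def tablewise_shard(table_sizes, world_size):
    """Assign each embedding table to the currently least-loaded rank."""
    loads = [0] * world_size
    assignment = []
    for size in table_sizes:
        rank = loads.index(min(loads))  # least-loaded rank wins
        assignment.append(rank)
        loads[rank] += size
    return assignment
```

Keeping tables whole means a lookup touches exactly one rank, trading some load imbalance for much simpler communication.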

NFC

Pipeline/tuning

  • [pipeline/tuning] improve dispatch performance both time and space cost (#1544) by Kirigaya Kazuto

FX

Autoparallel

Utils

  • [utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548) by ver217
  • [utils] optimize partition_tensor_parallel_state_dict (#1546) by ver217
  • [utils] Add use_reentrant=False in utils.activation_checkpoint (#1460) by Boyuan Yao
  • [utils] Impl clip_grad_norm for ColoTensor and ZeroOptimizer (#1442) by ver217
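Activation checkpointing, which the `use_reentrant=False` change above configures, trades compute for memory: instead of keeping an activation alive for the backward pass, keep the (function, input) pair and recompute on demand. A toy sketch of that trade with a hypothetical class (the PR itself just switches `torch.utils.checkpoint` to its non-reentrant mode):

```python
class CheckpointedActivation:
    """Store fn and its input instead of fn(input); recompute when needed."""

    def __init__(self, fn, x):
        self.fn, self.x = fn, x  # O(|x|) memory instead of O(|fn(x)|)

    def materialize(self):
        return self.fn(self.x)   # the forward cost is paid again in backward
```

The non-reentrant mode performs this recomputation inside autograd's normal graph machinery, which is why it composes better with features like gradient hooks than the reentrant variant.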

Hotfix

Pipeline/pipeline_process_group

  • [pipeline/pipeline_process_group] finish PipelineProcessGroup to manage local and global rank in TP, DP and PP (#1508) by Kirigaya Kazuto

Doc

Autoparallel

FAW

Pipeline/rpc

  • [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy (#1497) by Kirigaya Kazuto
  • [pipeline/rpc] implement distributed optimizer | test with assert_close (#1486) by Kirigaya Kazuto
  • [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B (#1483) by Kirigaya Kazuto
  • [pipeline/rpc] implement a demo for PP with cuda rpc framework (#1470) by Kirigaya Kazuto
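The "steady 1F1B" mentioned above is the pipeline schedule in which, after a short warm-up of forwards, each stage strictly alternates one forward with one backward, bounding the number of in-flight microbatches. A sketch of the per-stage step order (illustrative, not the RPC scheduler):

```python
def one_f_one_b(stage, num_stages, num_microbatches):
    """Return the (phase, microbatch) step order for one pipeline stage."""
    # Earlier stages warm up with more forwards so later stages can start.
    warmup = min(num_stages - stage - 1, num_microbatches)
    steps = []
    f = b = 0
    for _ in range(warmup):
        steps.append(("F", f))
        f += 1
    while f < num_microbatches:      # steady state: strictly alternate
        steps.append(("F", f))
        f += 1
        steps.append(("B", b))
        b += 1
    while b < num_microbatches:      # drain remaining backwards
        steps.append(("B", b))
        b += 1
    return steps
```

The steady alternation is what caps activation memory at roughly one activation set per in-flight microbatch, instead of one per microbatch as in GPipe-style all-forward-then-all-backward.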

Tensor

FCE

  • [FCE] update interface for frequency statistics in FreqCacheEmbedding (#1462) by Geng Zhang

Workflow

  • [workflow] added TensorNVMe to compatibility test (#1449) by Frank Lee

Test

Engine/schedule

  • [engine/schedule] use p2p_v2 to ...

Version v0.1.9 Release Today!

11 Aug 13:16
74bee5f

What's Changed

Zero

  • [zero] add chunk_managerV2 for all-gather chunk (#1441) by HELSON
  • [zero] add chunk size searching algorithm for parameters in different groups (#1436) by HELSON
  • [zero] add has_inf_or_nan in AgChunk; enhance the unit test of AgChunk (#1426) by HELSON
  • [zero] add unit test for AgChunk's append, close, access (#1423) by HELSON
  • [zero] add AgChunk (#1417) by HELSON
  • [zero] ZeroDDP supports controlling outputs' dtype (#1399) by ver217
  • [zero] alleviate memory usage in ZeRODDP state_dict (#1398) by HELSON
  • [zero] chunk manager allows filtering ex-large params (#1393) by ver217
  • [zero] zero optim state_dict takes only_rank_0 (#1384) by ver217
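The chunk machinery in the entries above groups many small parameter tensors into fixed-size chunks so they can be gathered with one collective instead of many. A greedy packing sketch (hypothetical helper, not the ChunkManager API):

```python
def pack_into_chunks(param_sizes, chunk_size):
    """Greedily pack parameter sizes into chunks of at most chunk_size
    elements; an oversized parameter gets a chunk of its own."""
    chunks, current, used = [], [], 0
    for size in param_sizes:
        if current and used + size > chunk_size:
            chunks.append(current)   # close the full chunk
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        chunks.append(current)
    return chunks
```

The chunk-size searching described in #1436 tunes `chunk_size` so that padding waste stays low while each all-gather still moves enough data to saturate the interconnect.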

FX

Recommendation System

Global Tensor

Hotfix

  • [hotfix] zero optim prevents calling inner optim.zero_grad (#1422) by ver217
  • [hotfix] fix CPUAdam kernel nullptr (#1410) by ver217
  • [hotfix] adapt ProcessGroup and Optimizer to ColoTensor (#1388) by HELSON
  • [hotfix] fix a running error in test_colo_checkpoint.py (#1387) by HELSON
  • [hotfix] fix some bugs during gpt2 testing (#1379) by YuliangLiu0306
  • [hotfix] fix zero optim save/load state dict (#1381) by ver217
  • [hotfix] fix zero ddp buffer cast (#1376) by ver217
  • [hotfix] fix no optimizer in save/load (#1363) by HELSON
  • [hotfix] fix megatron_init in test_gpt2.py (#1357) by HELSON
  • [hotfix] ZeroDDP use new process group (#1333) by ver217
  • [hotfix] shared model returns cpu state_dict (#1328) by ver217
  • [hotfix] fix ddp for unit test test_gpt2 (#1326) by HELSON
  • [hotfix] fix unit test test_module_spec (#1321) by HELSON
  • [hotfix] fix PipelineSharedModuleGradientHandler (#1314) by ver217
  • [hotfix] fix ColoTensor GPT2 unitest (#1309) by HELSON
  • [hotfix] add missing file (#1308) by Jiarui Fang
  • [hotfix] remove potential circular import (#1307) by Jiarui Fang
  • [hotfix] skip some unittest due to CI environment. (#1301) by YuliangLiu0306
  • [hotfix] fix shape error in backward when using ColoTensor (#1298) by HELSON
  • [hotfix] Dist Mgr gather torch version (#1284) by Jiarui Fang

Communication

Device

Chunk

DDP

  • [DDP] test ddp state dict uses more strict threshold (#1382) by ver217

Checkpoint

  • [checkpoint] add kwargs for load_state_dict (#1374) by HELSON
  • [checkpoint] use args, kwargs in save_checkpoint, load_checkpoint (#1368) by HELSON
  • [checkpoint] sharded optim save/load grad scaler (#1350) by ver217
  • [checkpoint] use gather_tensor in checkpoint and update its unit test (#1339) by HELSON
  • [checkpoint] add ColoOptimizer checkpointing (#1316) by Jiarui Fang
  • [checkpoint] add test for bert and hotfix save bugs (#1297) by Jiarui Fang

Util

Nvme

  • [nvme] CPUAdam and HybridAdam support NVMe offload (#1360) by ver217

ColoTensor

  • [colotensor] use cpu memory to store state_dict (#1367) by HELSON
  • [colotensor] add Tensor.view op and its unit test (#1343) by HELSON

Unit test

  • [unit test] add megatron init test in zero_optim (#1358) by HELSON

Docker

Doc

Refactor

  • [refactor] refactor ColoTensor's unit tests (#1340) by HELSON

Workflow

  • [workflow] update docker build workflow to use proxy (#1334) by Frank Lee
  • [workflow] update 8-gpu test to use torch 1.11 (#1332) by Frank Lee
  • [workflow] roll back to use torch 1.11 for unit testing (#1325) by Frank Lee
  • [workflow] fixed trigger condition for 8-gpu unit test (#1323) by Frank Lee
  • [workflow] updated release bdist workflow (#1318) by Frank Lee
  • [workflow] disable SHM for compatibility CI on rtx3080 (#1315) by Frank Lee
  • [workflow] updated pytorch compatibility test (#1311) by Frank Lee

Test


Version v0.1.8 Release Today!

12 Jul 16:10
7e8114a

What's Changed

Hotfix

Tensor

FX

Rename

Checkpoint

Polish

  • [polish] polish repr for ColoTensor, DistSpec, ProcessGroup (#1235) by HELSON

Refactor

Context

DDP

ColoTensor

Zero

  • [zero] sharded optim supports loading local state dict (#1170) by ver217
  • [zero] zero optim supports loading local state dict (#1171) by ver217
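A "local" state dict lets each rank save and load only the shard of optimizer state it owns, instead of materializing the full state on rank 0. The slicing idea in sketch form (hypothetical helper, not the ColossalAI API):

```python
def local_slice(flat_state, rank, world_size):
    """Return the contiguous shard of a flat state owned by `rank`."""
    per_rank = (len(flat_state) + world_size - 1) // world_size  # ceil-divide
    return flat_state[rank * per_rank : (rank + 1) * per_rank]
```

Loading locally this way keeps peak host memory at O(state / world_size) per rank, which matters when the full optimizer state would not fit on a single node.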

Workflow

Gemini

Pipeline

CI

  • [ci] added scripts to auto-generate release post text (#1142) by Frank Lee

Full Changelog: v0.1.7...v0.1.8

Version v0.1.7 Released Today

21 Jun 04:10
6690a61


Highlights

  • Started supporting torch.fx for auto-parallel training
  • Updated the ZeRO mechanism with ColoTensor
  • Fixed various bugs

What's Changed

Hotfix

Zero

  • [zero] avoid zero hook spam by changing log to debug level (#1137) by Frank Lee
  • [zero] added error message to handle on-the-fly import of torch Module class (#1135) by Frank Lee
  • [zero] fixed api consistency (#1098) by Frank Lee
  • [zero] zero optim copy chunk rather than copy tensor (#1070) by ver217

Optim

DDP

  • [ddp] add save/load state dict for ColoDDP (#1127) by ver217
  • [ddp] add set_params_to_ignore for ColoDDP (#1122) by ver217
  • [ddp] supported customized torch ddp configuration (#1123) by Frank Lee

Pipeline

FX

Gemini

  • [gemini] gemini mgr supports "cpu" placement policy (#1118) by ver217
  • [gemini] zero supports gemini (#1093) by ver217

Test

Release

Tensor

AMP

  • [amp] included dict for type casting of model output (#1102) by Frank Lee

Workflow

Engine

Doc

  • [doc] added documentation to chunk and chunk manager (#1094) by Frank Lee

Context

Refactor

Cudnn

  • [cudnn] set False to cudnn benchmark by default (#1063) by Frank Lee

Full Changelog: v0.1.6...v0.1.7