[gptq] rebase to main #4695

Merged: 166 commits, Sep 12, 2023

Commits (166)
6ccecc0
[gemini] fix tensor storage cleaning in state dict collection (#4396)
Fridge003 Aug 10, 2023
d86ddd9
[hotfix] fix unsafe async comm in zero (#4404)
Gy-Lu Aug 11, 2023
6d41c3f
[doc] update Coati README (#4405)
CWHer Aug 14, 2023
ff83679
[doc] fix a typo in examples/tutorial/auto_parallel/README.md (#4430)
tiansiyuan Aug 14, 2023
5e1a9d4
[cluster] add process group mesh (#4039)
ver217 Jun 20, 2023
4225442
[pipeline] add stage manager (#4093)
ver217 Jun 27, 2023
45fdc9b
[pipeline] implement p2p communication (#4100)
ver217 Jun 28, 2023
f51ce1b
[pipeline] refactor 1f1b schedule (#4115)
ver217 Jun 29, 2023
e8e7e49
[pipeline]add pipeline policy and bert forward (#4130)
CjhHa1 Jul 4, 2023
5c897dd
[pipeline] add stage manager (#4093)
ver217 Jun 27, 2023
c552cef
[pipeline]add pipeline policy and bert forward (#4130)
CjhHa1 Jul 4, 2023
90a65ea
[pipeline] build bloom model and policy , revise the base class of po…
CjhHa1 Jul 5, 2023
59f6f57
[pipeline] update shardformer policy
ver217 Jul 5, 2023
b0b8ad2
[pipeline] update shardformer docstring
ver217 Jul 5, 2023
2d6cc07
[test] update shardformer tests
ver217 Jul 5, 2023
5fc60a3
[test] add shard util tests
ver217 Jul 5, 2023
1ed3f8a
[shardformer] rename policy file name
ver217 Jul 5, 2023
d35bd7d
[shardformer] fix type hint
ver217 Jul 5, 2023
c5ea728
[pipeline] add bert_for_pretraining bert_lmhead forward and policy (#…
CjhHa1 Jul 6, 2023
f3bcc29
[pipeline] move bert related pipeline components to shardformer (#4187)
CjhHa1 Jul 7, 2023
890774b
[shardformer] support lazy init (#4202)
ver217 Jul 10, 2023
1094e0f
[pipeline] Bert pipeline for shardformer and its tests (#4197)
CjhHa1 Jul 10, 2023
1622031
[pipeline] Llama pipeline (#4205)
CjhHa1 Jul 11, 2023
31bcf86
[pipeline] Llama causal lm and llama for sequence classification pipe…
CjhHa1 Jul 11, 2023
37d22f6
[pipeline] add bloom model pipeline (#4210)
CjhHa1 Jul 13, 2023
208ac8f
[pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224)
Fridge003 Jul 13, 2023
7e4de52
[shardformer] fix base policy (#4229)
ver217 Jul 14, 2023
a14d352
[pipeline] add pipeline forward for variants of gpt2 (#4238)
Fridge003 Jul 17, 2023
e7cc62d
[pipeline] All bert models (#4233)
CjhHa1 Jul 17, 2023
34f0e34
[pipeline] finish bloom models pipeline and tests (#4223)
CjhHa1 Jul 17, 2023
d9be047
[bugs] hot fix some testing bugs for new models (#4268)
CjhHa1 Jul 18, 2023
2a2eacf
[pipeline] support shardformer for GPT2ForQuestionAnswering & complet…
Fridge003 Jul 19, 2023
d921ce8
[shardformer] support inplace sharding (#4251)
ver217 Jul 20, 2023
b774d5e
[pipeline] refactor gpt2 pipeline forwards (#4287)
Fridge003 Jul 20, 2023
d8408d1
[pipeline] OPT model pipeline (#4258)
CjhHa1 Jul 20, 2023
0a8f3c8
[hotfix] fix opt pipeline (#4293)
CjhHa1 Jul 20, 2023
18ebcf4
[pipeline] reformat for unified design (#4283)
CjhHa1 Jul 21, 2023
36e546b
[pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300)
Fridge003 Jul 21, 2023
d080712
[pipeline] test pure pipeline process using llama (#4218)
CjhHa1 Jul 25, 2023
083d7da
[pipeline] add pipeline support for all T5 models (#4310)
Fridge003 Jul 25, 2023
b3f5d7a
[shardformer] support pipeline base vit model (#4284)
FoolPlayer Jul 25, 2023
261eab0
[plugin] add 3d parallel plugin (#4295)
ver217 Jul 25, 2023
411cf1d
[hotfix] fix gemini and zero test (#4333)
ver217 Jul 27, 2023
da3cef2
[pipeline] fix return_dict/fix pure_pipeline_test (#4331)
Fridge003 Jul 27, 2023
d3c6cd6
[pipeline] add unit test for 1f1b (#4303)
Gy-Lu Jul 31, 2023
f13954c
[pipeline] refactor test pipeline and remove useless utils in pipelin…
CjhHa1 Aug 1, 2023
0ceec8f
[pipeline] support fp32 for HybridPlugin/merge shardformer test and p…
Fridge003 Aug 1, 2023
c59d7ac
Feature/vit support (#4182)
klhhhhh Jul 7, 2023
dd2bf02
[shardformer] support SAM (#4231)
FoolPlayer Jul 14, 2023
9ee4ebe
[shardformer] support whisper (#4212)
FoolPlayer Jul 17, 2023
ed34bb1
Feature/chatglm (#4240)
klhhhhh Jul 20, 2023
f60162b
[shardformer] added tests
klhhhhh Jul 4, 2023
c492869
[shardformer] vit test finish and support
klhhhhh Jul 6, 2023
7377be7
import chatglm
klhhhhh Jul 7, 2023
6ee4c9e
[shardformer] add test kit in model zoo for chatglm
klhhhhh Jul 7, 2023
8620009
[sharformer] add first version of policy of chatglm
klhhhhh Jul 10, 2023
1a29e8f
[shardformer] polish chatglm code
klhhhhh Jul 12, 2023
cbb54d3
[shardformer] polish code
klhhhhh Jul 13, 2023
dad00c4
[shardformer] support chatglm without layernorm
klhhhhh Jul 14, 2023
00f6ef1
[shardformer] delete some file
klhhhhh Jul 17, 2023
f155ae8
[shardformer] ChatGLM support layernorm sharding
klhhhhh Jul 17, 2023
91850fe
[shardformer] register without auto policy
klhhhhh Jul 18, 2023
4da0505
[shardformer] pre-commit check files
klhhhhh Jul 19, 2023
8120eca
[shardformer] support ChatGLMForConditionalGeneration & add fusedlaye…
klhhhhh Jul 20, 2023
879301d
[shardformer] support Blip2 (#4243)
FoolPlayer Jul 25, 2023
726541a
update some module with new api version
FoolPlayer Aug 1, 2023
c3ca53c
[test] skip some not compatible models
FoolPlayer Aug 2, 2023
5c6f183
[test] Hotfix/fix some model test and refactor check util api (#4369)
FoolPlayer Aug 3, 2023
b1feece
[shardformer] add util functions for shardformer tests/fix sync_share…
Fridge003 Aug 3, 2023
a88e922
[pipeline] add chatglm (#4363)
CjhHa1 Aug 4, 2023
906426c
[Shardformer] Merge flash attention branch to pipeline branch (#4362)
flybird11111 Aug 7, 2023
ed4c448
[pipeline] rewrite t5 tests & support multi-tensor transmitting in pi…
Fridge003 Aug 8, 2023
7a3dfd0
[shardformer] update shardformer to use flash attention 2 (#4392)
flybird11111 Aug 9, 2023
d2cd48e
[shardformer] test all optimizations (#4399)
flybird11111 Aug 10, 2023
7596e9a
[pipeline] rewrite bert tests and fix some bugs (#4409)
CjhHa1 Aug 11, 2023
21e0a42
[shardformer]fix, test gpt2 for AMP+TP (#4403)
flybird11111 Aug 11, 2023
7711bd5
[shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (#4395)
Fridge003 Aug 11, 2023
1edc9b5
[shardformer] update tests for all optimization (#4413)
flybird11111 Aug 11, 2023
108e54a
[shardformer]update t5 tests for using all optimizations. (#4407)
flybird11111 Aug 14, 2023
328a791
[shardformer] update bloom/llama/vit/chatglm tests (#4420)
flybird11111 Aug 14, 2023
172f7fa
[misc] resolve code factor issues (#4433)
ver217 Aug 14, 2023
9223022
[misc] update requirements
ver217 Aug 15, 2023
73a4144
[shardformer] fix embedding
ver217 Aug 15, 2023
5d4efdf
[shardformer] fix import
ver217 Aug 15, 2023
d20dceb
[format] applied code formatting on changed files in pull request 444…
github-actions[bot] Aug 16, 2023
424629f
[shardformer/sequence parallel] Cherry pick commit to new branch (#4450)
FoolPlayer Aug 16, 2023
6ef33f7
[shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446)
Fridge003 Aug 16, 2023
26e29d5
[devops] add large-scale distributed test marker (#4452)
ver217 Aug 16, 2023
a78daf6
[shardformer] support interleaved pipeline (#4448)
Gy-Lu Aug 16, 2023
7c8be77
[shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/…
FoolPlayer Aug 18, 2023
0ecd71e
[shardformer] bloom support sequence parallel (#4465)
flybird11111 Aug 18, 2023
a27e0bb
[shardformer] bert support sequence parallel. (#4455)
flybird11111 Aug 18, 2023
8739aa7
[shardformer] Pipeline/whisper (#4456)
CjhHa1 Aug 18, 2023
1c7df56
[shardformer] support tp+zero for shardformer (#4472)
Fridge003 Aug 21, 2023
285fe7b
[chat] update config and prompt (#4139)
MichelleMa8 Aug 21, 2023
5545114
rename chatglm to chatglm2 (#4484)
CjhHa1 Aug 22, 2023
351351a
[shardformer/sequence parallel] not support opt of seq-parallel, add …
FoolPlayer Aug 22, 2023
59e252e
[shardformer] chatglm support sequence parallel (#4482)
flybird11111 Aug 22, 2023
e04436a
[shardformer] tests for 3d parallel (#4493)
CjhHa1 Aug 23, 2023
2706142
[gemini] improve compatibility and add static placement policy (#4479)
ver217 Aug 24, 2023
3353e55
[shardformer] vit/llama/t5 ignore the sequence parallelism flag and s…
flybird11111 Aug 24, 2023
c0efc3e
[format] applied code formatting on changed files in pull request 447…
github-actions[bot] Aug 25, 2023
839847b
[zero]support zero2 with gradient accumulation (#4511)
Gy-Lu Aug 25, 2023
de8a65b
[shardformer] opt fix. (#4514)
flybird11111 Aug 25, 2023
44eab2b
[shardformer] support sharded checkpoint IO for models of HybridParal…
Fridge003 Aug 25, 2023
376533a
[shardformer] zero1+pp and the corresponding tests (#4517)
CjhHa1 Aug 28, 2023
c554b7f
[shardformer/fix overlap bug] fix overlap bug, add overlap as an opti…
FoolPlayer Aug 28, 2023
0b00def
[example] add llama2 example (#4527)
ver217 Aug 28, 2023
0387a47
[shardformer] fix emerged bugs after updating transformers (#4526)
Fridge003 Aug 29, 2023
1467e3b
[coati] add chatglm model (#4539)
yingliu-hpc Aug 29, 2023
e241b74
[shardformer] Add overlap support for gpt2 (#4535)
FoolPlayer Aug 29, 2023
1c43bfd
[coati] update ci
ver217 Aug 30, 2023
661a1ef
Merge pull request #4541 from ver217/coati/chatglm
yingliu-hpc Aug 30, 2023
c648dc0
fix colossalai version in coati examples
yingliu-hpc Aug 30, 2023
d367b88
[shardformer] fix opt test hanging (#4521)
flybird11111 Aug 30, 2023
9f852f2
keep requirements same with main branch
yingliu-hpc Aug 30, 2023
12c95a9
fix runtime prepare pass (#4502)
vincentccc Aug 30, 2023
ec18fc7
[shardformer] support pp+tp+zero1 tests (#4531)
flybird11111 Aug 30, 2023
8e2e199
[example] update streamlit 0.73.1 to 1.11.1 (#4386)
ChengDaqi2023 Aug 30, 2023
f1ae8c9
[example] change accelerate version (#4431)
tiansiyuan Aug 30, 2023
c7b60f7
[devops] cancel previous runs in the PR (#4546)
ver217 Aug 30, 2023
2c787d7
[shardformer] fix submodule replacement bug when enabling pp (#4544)
Fridge003 Aug 31, 2023
c9625db
[shardformer] support sharded optimizer checkpointIO of HybridParalle…
Fridge003 Aug 31, 2023
38ccb8b
[shardformer] support from_pretrained when loading model with HybridP…
Fridge003 Sep 1, 2023
cbac782
[zero]fix zero ckptIO with offload (#4529)
Gy-Lu Sep 1, 2023
eb952ea
Update Dockerfile (#4499)
data-infra Sep 1, 2023
cfa6070
[Fix] Fix compile error (#4357)
HAOCHENYE Sep 1, 2023
508ca36
[pipeline] 1f1b schedule receive microbatch size (#4589)
ver217 Sep 1, 2023
63ecafb
[checkpointio] optimize zero optim checkpoint io (#4591)
ver217 Sep 4, 2023
7a978eb
[DOC] hotfix/llama2news (#4595)
binmakeswell Sep 4, 2023
8d7b022
[doc] add llama2 benchmark (#4604)
binmakeswell Sep 4, 2023
aaeb520
Merge pull request #4542 from hpcaitech/chatglm
yingliu-hpc Sep 4, 2023
24c0768
[shardformer] Pytree fix (#4533)
CjhHa1 Sep 4, 2023
0a94fcd
[shardformer] update bert finetune example with HybridParallelPlugin …
flybird11111 Sep 4, 2023
e79b1e8
[checkpointio] support huggingface from_pretrained for all plugins (#…
Fridge003 Sep 4, 2023
a39a5c6
Merge branch 'main' into feature/shardformer
ver217 Sep 4, 2023
86d2258
[shardformer] Add overlap optional for HybridParallelPlugin (#4615)
FoolPlayer Sep 5, 2023
ec08668
[shardformer] update shardformer readme (#4617)
flybird11111 Sep 5, 2023
e71d245
[test] ignore gpt2 shardformer test (#4619)
ver217 Sep 5, 2023
807e01a
[zero] hotfix master param sync (#4618)
ver217 Sep 5, 2023
bd18678
[test] fix gemini checkpoint and gpt test (#4620)
ver217 Sep 5, 2023
89fe027
[legacy] move trainer to legacy (#4545)
ver217 Aug 31, 2023
8accecd
[legacy] move engine to legacy (#4560)
ver217 Sep 4, 2023
ac178ca
[legacy] move builder and registry to legacy (#4603)
ver217 Sep 4, 2023
fae6c92
Merge branch 'main' into feature/shardformer
ver217 Sep 5, 2023
efba0f4
Merge pull request #4612 from hpcaitech/feature/shardformer
ver217 Sep 5, 2023
9709b8f
[release] update version (#4623)
ver217 Sep 6, 2023
c3d5fa3
[shardformer] Support customized policy for llamav2 based model with …
eric8607242 Sep 7, 2023
660eed9
[pipeline] set optimizer to optional in execute_pipeline (#4630)
Fridge003 Sep 7, 2023
295b38f
[example] update vit example for hybrid parallel plugin (#4641)
Fridge003 Sep 7, 2023
a686f9d
[devops] fix concurrency group and compatibility test (#4665)
ver217 Sep 8, 2023
7486ed7
[shardformer] update llama2/opt finetune example and fix llama2 polic…
flybird11111 Sep 9, 2023
536397c
[devops] fix concurrency group (#4667)
ver217 Sep 11, 2023
554aa95
[legacy] move communication and nn to legacy and refactor logger (#4671)
ver217 Sep 11, 2023
eedaa3e
[shardformer]fix gpt2 double head (#4663)
flybird11111 Sep 11, 2023
bce0f16
[Feature] The first PR to Add TP inference engine, kv-cache manager a…
tiandiao123 Sep 11, 2023
1d45473
[doc] Update booster user documents. (#4669)
Fridge003 Sep 12, 2023
8844691
[shardformer] update shardformer readme (#4689)
flybird11111 Sep 12, 2023
564f54d
[gptq] add gptq kernel (#4416)
Xu-Kai Aug 21, 2023
bdcb1dd
[gptq] faster gptq cuda kernel (#4494)
Xu-Kai Aug 23, 2023
1753bdc
add gptq tensor parallel
Xu-Kai Aug 29, 2023
880ef70
add gptq tp
Xu-Kai Aug 29, 2023
6b14822
delete print
Xu-Kai Aug 29, 2023
4b0f7d5
add test gptq check
Xu-Kai Aug 30, 2023
ddb3c54
add test auto gptq check
Xu-Kai Aug 30, 2023
ee944de
Merge branch 'feature/quant-gptq' into gptq_infer
Xu-Kai Sep 12, 2023
14 changes: 7 additions & 7 deletions .github/workflows/build_on_pr.yml
@@ -61,8 +61,8 @@ jobs:
     run:
       shell: bash
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-repare-cache
+      cancel-in-progress: true
     steps:
       - name: Copy testmon cache
         run: | # branch name may contain slash, we need to replace it with space
@@ -87,8 +87,8 @@ jobs:
       anyLibraryFileChanged: ${{ steps.find-lib-change.outputs.any_changed }}
     runs-on: ubuntu-latest
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-detect-change
+      cancel-in-progress: true
     steps:
       - uses: actions/checkout@v2
         with:
@@ -147,8 +147,8 @@ jobs:
     run:
       shell: bash
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-run-test
+      cancel-in-progress: true
     steps:
       - name: Checkout TensorNVMe
         uses: actions/checkout@v2
@@ -208,7 +208,7 @@ jobs:

       - name: Execute Unit Testing
         run: |
-          CURL_CA_BUNDLE="" PYTHONPATH=$PWD pytest --testmon --testmon-cov=. --durations=10 tests/
+          CURL_CA_BUNDLE="" PYTHONPATH=$PWD pytest -m "not largedist" --testmon --testmon-forceselect --testmon-cov=. --durations=10 tests/
         env:
           DATA: /data/scratch/cifar-10
           NCCL_SHM_DISABLE: 1
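A minimal sketch of the concurrency pattern adopted above, under a hypothetical workflow (the name and echo step are placeholders, not from this PR). The new group key combines the workflow name with the PR number (falling back to the ref for non-PR events), so pushing a new commit cancels only the superseded run of the same workflow for the same PR; the old github.head_ref key shared one group across every workflow triggered by a branch and, with cancel-in-progress disabled, let stale runs pile up.

name: example-workflow        # placeholder name
on: pull_request
concurrency:
  # one group per workflow per PR; fall back to the ref for non-PR events
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true    # a newer run cancels an older in-flight run
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - run: echo "only the newest run of this workflow survives per PR"

The pytest change pairs with the new large-scale distributed test marker (#4452): -m "not largedist" deselects those tests on PR builds, and --testmon-forceselect appears intended to let testmon's change-based selection coexist with that explicit -m filter.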
7 changes: 3 additions & 4 deletions .github/workflows/compatiblity_test_on_dispatch.yml
@@ -44,7 +44,7 @@ jobs:
     name: Test for PyTorch Compatibility
     needs: matrix_preparation
     if: github.repository == 'hpcaitech/ColossalAI'
-    runs-on: [self-hosted, gpu]
+    runs-on: [self-hosted, 8-gpu]
     strategy:
       fail-fast: false
       matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
@@ -64,7 +64,7 @@
       - name: Install tensornvme
         run: |
           cd TensorNVMe
-          conda install cmake
+          apt update && apt install -y cmake
           pip install -r requirements.txt
           pip install -v .
       - uses: actions/checkout@v2
@@ -83,8 +83,7 @@
           fi
       - name: Install Colossal-AI
         run: |
-          pip install -r requirements/requirements.txt
-          pip install -v --no-cache-dir .
+          CUDA_EXT=1 pip install -v .
           pip install -r requirements/requirements-test.txt
       - name: Unit Testing
         run: |
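A hedged reading of the install change repeated across the compatibility workflows: CUDA_EXT=1 was ColossalAI's documented switch at the time for pre-building its CUDA kernel extensions during pip install rather than JIT-compiling them on first use, which suits runners that reuse a container image; dropping the separate requirements install presumably relies on pip resolving runtime dependencies from the package metadata (an assumption on my part, not stated in the PR). A sketch of the resulting step:

- name: Install Colossal-AI
  run: |
    # CUDA_EXT=1 pre-builds the CUDA kernels at install time
    # (assumes a CUDA-enabled self-hosted runner image)
    CUDA_EXT=1 pip install -v .
    pip install -r requirements/requirements-test.txt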
14 changes: 7 additions & 7 deletions .github/workflows/compatiblity_test_on_pr.yml
@@ -13,8 +13,8 @@ jobs:
     outputs:
       matrix: ${{ steps.set-matrix.outputs.matrix }}
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-prepare-matrix
+      cancel-in-progress: true
     steps:
       - uses: actions/checkout@v3
       - id: set-matrix
@@ -35,7 +35,7 @@
     name: Test for PyTorch Compatibility
     needs: matrix_preparation
     if: github.repository == 'hpcaitech/ColossalAI'
-    runs-on: [self-hosted, gpu]
+    runs-on: [self-hosted, 8-gpu]
     strategy:
       fail-fast: false
       matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
@@ -44,8 +44,8 @@
       options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10
     timeout-minutes: 120
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-run-test-${{ matrix.container }}
+      cancel-in-progress: true
     steps:
       - name: Install dependencies
         run: |
@@ -58,7 +58,7 @@
       - name: Install tensornvme
         run: |
           cd TensorNVMe
-          conda install cmake
+          apt update && apt install -y cmake
           pip install -r requirements.txt
           pip install -v .
       - uses: actions/checkout@v2
@@ -78,7 +78,7 @@

       - name: Install Colossal-AI
         run: |
-          pip install -v --no-cache-dir .
+          CUDA_EXT=1 pip install -v .
           pip install -r requirements/requirements-test.txt
       - name: Unit Testing
         run: |
6 changes: 3 additions & 3 deletions .github/workflows/compatiblity_test_on_schedule.yml
@@ -32,7 +32,7 @@ jobs:
     name: Test for PyTorch Compatibility
     needs: matrix_preparation
     if: github.repository == 'hpcaitech/ColossalAI'
-    runs-on: [self-hosted, gpu]
+    runs-on: [self-hosted, 8-gpu]
     strategy:
       fail-fast: false
       matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
@@ -54,7 +54,7 @@
       - name: Install tensornvme
         run: |
           cd TensorNVMe
-          conda install cmake
+          apt update && apt install -y cmake
           pip install -r requirements.txt
           pip install -v .
       - uses: actions/checkout@v2
@@ -75,7 +75,7 @@

       - name: Install Colossal-AI
         run: |
-          pip install -v --no-cache-dir .
+          CUDA_EXT=1 pip install -v .
           pip install -r requirements/requirements-test.txt

       - name: Unit Testing
8 changes: 4 additions & 4 deletions .github/workflows/doc_check_on_pr.yml
@@ -17,8 +17,8 @@ jobs:
       github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI'
     runs-on: ubuntu-latest
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-check-i18n
+      cancel-in-progress: true
     steps:
       - uses: actions/checkout@v2

@@ -35,8 +35,8 @@
       github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI'
     runs-on: ubuntu-latest
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-check-doc
+      cancel-in-progress: true
     steps:
       - uses: actions/checkout@v2
         with:
8 changes: 4 additions & 4 deletions .github/workflows/doc_test_on_pr.yml
@@ -20,8 +20,8 @@ jobs:
       any_changed: ${{ steps.changed-files.outputs.any_changed }}
       changed_files: ${{ steps.changed-files.outputs.all_changed_files }}
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-detect-change
+      cancel-in-progress: true
     name: Detect changed example files
     steps:
       - uses: actions/checkout@v3
@@ -63,8 +63,8 @@ jobs:
     run:
       shell: bash
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-run-doctest
+      cancel-in-progress: true
     steps:
       - name: Checkout ColossalAI-Documentation
         uses: actions/checkout@v2
8 changes: 4 additions & 4 deletions .github/workflows/example_check_on_pr.yml
@@ -21,8 +21,8 @@ jobs:
       anyChanged: ${{ steps.setup-matrix.outputs.anyChanged }}
     name: Detect changed example files
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-detect-change
+      cancel-in-progress: true
     steps:
       - uses: actions/checkout@v3
         with:
@@ -81,8 +81,8 @@ jobs:
       options: --gpus all --rm -v /data/scratch/examples-data:/data/
     timeout-minutes: 10
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-run-example-${{ matrix.directory }}
+      cancel-in-progress: true
     steps:
       - uses: actions/checkout@v3

3 changes: 1 addition & 2 deletions .github/workflows/run_chatgpt_examples.yml
@@ -28,9 +28,8 @@ jobs:
       - name: Checkout ColossalAI
         uses: actions/checkout@v2

-      - name: Install ColossalAI and ChatGPT
+      - name: Install ChatGPT
         run: |
-          pip install -e .
           cd applications/Chat
           pip install -v .
           pip install -r examples/requirements.txt
3 changes: 1 addition & 2 deletions .github/workflows/run_chatgpt_unit_tests.yml
@@ -30,9 +30,8 @@ jobs:
       - name: Checkout ColossalAI
         uses: actions/checkout@v2

-      - name: Install ColossalAI and ChatGPT
+      - name: Install ChatGPT
         run: |
-          pip install -e .
           cd applications/Chat
           pip install -v .
           pip install -r requirements-test.txt
33 changes: 33 additions & 0 deletions LICENSE
@@ -397,6 +397,39 @@ Copyright 2021- HPC-AI Technology Inc. All rights reserved.
 ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGE.

+
+---------------- LICENSE FOR VLLM TEAM ----------------
+
+from VLLM TEAM:
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    https://github.com/vllm-project/vllm/blob/main/LICENSE
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+---------------- LICENSE FOR LIGHTLLM TEAM ----------------
+
+from LIGHTLLM TEAM:
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    https://github.com/ModelTC/lightllm/blob/main/LICENSE
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
 ---------------- LICENSE FOR AutoGPTQ ----------------

 From AutoGPTQ:
13 changes: 11 additions & 2 deletions README.md
@@ -25,6 +25,7 @@
 </div>

 ## Latest News
+* [2023/09] [70 Billion Parameter LLaMA2 Model Training Accelerated by 195%](https://www.hpc-ai.tech/blog/70b-llama2-training)
 * [2023/07] [HPC-AI Tech Raises 22 Million USD in Series A Funding](https://www.hpc-ai.tech/blog/hpc-ai-tech-raises-22-million-usd-in-series-a-funding-to-fuel-team-expansion-and-business-growth)
 * [2023/07] [65B Model Pretraining Accelerated by 38%, Best Practices for Building LLaMA-Like Base Models Open-Source](https://www.hpc-ai.tech/blog/large-model-pretraining)
 * [2023/03] [ColossalChat: An Open-Source Solution for Cloning ChatGPT With a Complete RLHF Pipeline](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
@@ -50,7 +51,7 @@
     <li>
       <a href="#Parallel-Training-Demo">Parallel Training Demo</a>
       <ul>
-        <li><a href="#LLaMA">LLaMA</a></li>
+        <li><a href="#LLaMA2">LLaMA 1/2</a></li>
         <li><a href="#GPT-3">GPT-3</a></li>
         <li><a href="#GPT-2">GPT-2</a></li>
         <li><a href="#BERT">BERT</a></li>
@@ -217,8 +218,16 @@ Acceleration of [AlphaFold Protein Structure](https://alphafold.ebi.ac.uk/)
 <p align="right">(<a href="#top">back to top</a>)</p>

 ## Parallel Training Demo
+### LLaMA2
+<p align="center">
+<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/llama2_pretraining.png" width=600/>
+</p>
+
+- 70 billion parameter LLaMA2 model training accelerated by 195%
+[[code]](https://github.com/hpcaitech/ColossalAI/tree/example/llama/examples/language/llama)
+[[blog]](https://www.hpc-ai.tech/blog/70b-llama2-training)
+
-### LLaMA
+### LLaMA1
 <p align="center">
 <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/images/LLaMA_pretraining.png" width=600/>
 </p>