Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add deepspeed chat arxiv report #4110

Merged
merged 6 commits into from
Aug 9, 2023
Merged

add deepspeed chat arxiv report #4110

merged 6 commits into from
Aug 9, 2023

Conversation

conglongli
Copy link
Contributor

No description provided.

@conglongli conglongli added this pull request to the merge queue Aug 8, 2023
@yaozhewei yaozhewei removed this pull request from the merge queue due to a manual request Aug 8, 2023
@yaozhewei yaozhewei added this pull request to the merge queue Aug 9, 2023
Merged via the queue into master with commit 78d985a Aug 9, 2023
16 checks passed
@conglongli conglongli deleted the add_dschat_arxiv branch August 9, 2023 03:08
hughpu pushed a commit to hughpu/DeepSpeed that referenced this pull request Aug 9, 2023
* add deepspeed chat arxiv report

* add zeroquant v2 and fp

* add selective enhencement

* add ignore for 'Youn' in spell checker

---------

Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
github-merge-queue bot pushed a commit that referenced this pull request Aug 29, 2023
…port inputs require no grad (#4118)

* feat: add `non_reentrant_checkpoint`

* feat: add missing output postprocess and change the hook to record leaf forward tensor refs

* fix: make the multi_grad_hook registered after graph construction

* fix: backward compatibility for multi_tensor_hook

* fix: nonlocal reference error of deepspeed_saved_tensors

* fix: reduce repeating hook registration

* test: add test for `activation_checkpointing.checkpointing.non_reentrant_checkpoint`

* Pass correct node size for ZeRO++ (#4085)

* Pass correct node size

* formatting

---------

Co-authored-by: Connor Holmes <development@cmikeh2.me>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* add deepspeed chat arxiv report (#4110)

* add deepspeed chat arxiv report

* add zeroquant v2 and fp

* add selective enhencement

* add ignore for 'Youn' in spell checker

---------

Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* style: change flake8 detected style missmatch

* test: hack to clone the `test_activation_checkpointing` module for reuse and add regression tests

* doc: explain the introduction of `non_reentrant_checkpoint`

* doc: explain the test of `non_reentrant_checkpoint`

---------

Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Connor Holmes <development@cmikeh2.me>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
github-merge-queue bot pushed a commit that referenced this pull request Sep 6, 2023
… true for activation checkpoint layer in pipeline train. (#4224)

* feat: add `non_reentrant_checkpoint`

* feat: add missing output postprocess and change the hook to record leaf forward tensor refs

* fix: make the multi_grad_hook registered after graph construction

* fix: backward compatibility for multi_tensor_hook

* fix: nonlocal reference error of deepspeed_saved_tensors

* fix: reduce repeating hook registration

* test: add test for `activation_checkpointing.checkpointing.non_reentrant_checkpoint`

* Pass correct node size for ZeRO++ (#4085)

* Pass correct node size

* formatting

---------

Co-authored-by: Connor Holmes <development@cmikeh2.me>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* add deepspeed chat arxiv report (#4110)

* add deepspeed chat arxiv report

* add zeroquant v2 and fp

* add selective enhencement

* add ignore for 'Youn' in spell checker

---------

Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* style: change flake8 detected style missmatch

* test: hack to clone the `test_activation_checkpointing` module for reuse and add regression tests

* doc: explain the introduction of `non_reentrant_checkpoint`

* doc: explain the test of `non_reentrant_checkpoint`

* apply non_reentrant_checkpoint in pipeline parallel training

* ut pass

* fix ci

* reduce check level for ci

---------

Co-authored-by: hughpu <hughpu@hotmail.com>
Co-authored-by: Hugh Pu <31498041+hughpu@users.noreply.github.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Connor Holmes <development@cmikeh2.me>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants