Commit
* deepspeed
* shard
* full param deepspeed works by this commit
* offload optimizer & documentation
* format & fix save deepspeed weight
* format & update save_checkpoint
* update pipfile
* update pipfile
* zero init for transformers
* add some new config
* fix bug
* min 1e6
* update deepspeed config
* Update requirements.txt
* remove duplicate code
* throw warning when compile w/ deepspeed
* black
* integrate deepspeed into wrap_model_distributed
* remove unuse code
* style
* fix bug
* fix bug
* max token len to 16k
* deepspeed save lora
* update get optimizer
* fix check disk
* comment out offload CPU
* Pipfile.lock
* Update requirements.txt
* make black
* add default
* minor fix
* minor fix
* minor fix
* fix val loader
* potential val loader fix
* update
* lock
* Update requirements.txt
* improve model saving for deepspeed
* solved INFLIGHT problem
* update doc
* deepspeed default push to hub by cpu
* Revert "improve model saving for deepspeed" This reverts commit 62fc9c5.
* remove unuse code
* Update requirements.txt
* deepspeed==0.11.1
* Update requirements.txt
* temp fix for deepspeed slow gen
* style
* style
* fix

---------

Co-authored-by: haqishen <haqishen@gmail.com>
Co-authored-by: Philipp Singer <killver@gmail.com>
Co-authored-by: psinger <psinger@users.noreply.github.com>
1 parent 08475e3, commit 67d3a3c
Showing 15 changed files with 814 additions and 494 deletions.
1 change: 1 addition & 0 deletions
documentation/docs/tooltips/experiments/_deepspeed-offload-optimizer.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Whether to offload the optimizer state to the CPU to save more GPU memory during training. Note that turning on offload_optimizer further slows down training. |
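For illustration, here is a minimal sketch of how such a toggle could translate into a DeepSpeed configuration fragment. The helper name `zero_config` is hypothetical and not taken from this commit; the `zero_optimization`, `offload_optimizer`, `device`, and `pin_memory` keys are standard DeepSpeed config keys. ZeRO stage 3 is assumed here only because the other tooltips in this commit describe stage 3 settings.

```python
from typing import Any, Dict


def zero_config(offload_optimizer: bool) -> Dict[str, Any]:
    """Build a ZeRO config fragment (hypothetical helper, for illustration)."""
    # Assumption: ZeRO stage 3, matching the stage3_* tooltips in this commit.
    zero: Dict[str, Any] = {"stage": 3}
    if offload_optimizer:
        # Keep optimizer states in host memory instead of GPU memory.
        # Pinned (page-locked) memory can speed up host/device copies.
        zero["offload_optimizer"] = {"device": "cpu", "pin_memory": True}
    return {"zero_optimization": zero}


config = zero_config(offload_optimizer=True)
print(config)
```

With the toggle off, the fragment simply omits the `offload_optimizer` section, so optimizer states stay on the GPU.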
1 change: 1 addition & 0 deletions
documentation/docs/tooltips/experiments/_deepspeed-reduce-bucket-size.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Number of elements reduced/allreduced at a time. Limits the memory required for the allgather for large model sizes. Smaller values use less memory, but slow down training. |
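To give a rough sense of the memory trade-off, the sketch below converts a bucket size in elements into bytes. It assumes 2 bytes per element (fp16 or bf16); the actual buffer dtype depends on the training configuration, and the sizes shown are illustrative values, not the defaults of this project.

```python
def bucket_bytes(num_elements: int, bytes_per_element: int = 2) -> int:
    """Approximate size of one communication bucket in bytes.

    Assumption: 2 bytes per element (fp16/bf16) unless stated otherwise.
    """
    return num_elements * bytes_per_element


# Illustrative bucket sizes only.
for size in (500_000_000, 100_000_000, 1_000_000):
    mib = bucket_bytes(size) / 2**20
    print(f"{size:>11,d} elements -> {mib:,.1f} MiB per bucket")
```

A smaller bucket needs a smaller buffer but requires more communication rounds to cover the same number of elements, which is where the slowdown comes from.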
1 change: 1 addition & 0 deletions
documentation/docs/tooltips/experiments/_deepspeed-stage3-max-live-parameters.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
The maximum number of parameters resident per GPU before releasing. Smaller values use less memory, but slow down training. |
1 change: 1 addition & 0 deletions
documentation/docs/tooltips/experiments/_deepspeed-stage3-max-reuse-distance.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Do not release a parameter if it will be reused within this threshold of parameters. Smaller values use less memory, but slow down training. |
1 change: 1 addition & 0 deletions
...ion/docs/tooltips/experiments/_deepspeed-stage3-param-persistence-threshold.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Do not partition parameters smaller than this threshold. Smaller values use less memory, but can greatly increase communication (especially latency-bound messages) and slow down training. |
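The effect of the threshold can be sketched with a toy classifier that splits parameters by element count. This is a simplified model of the rule as the tooltip states it, not DeepSpeed's internal logic; the function name and the parameter names and sizes are made up for illustration.

```python
from typing import Dict, List, Tuple


def split_by_threshold(
    param_sizes: Dict[str, int], threshold: int
) -> Tuple[List[str], List[str]]:
    """Split parameter names into (kept whole, partitioned).

    Assumption: a parameter is kept whole on every GPU when its element
    count is strictly below the threshold, as the tooltip describes.
    """
    kept_whole = [name for name, n in param_sizes.items() if n < threshold]
    partitioned = [name for name, n in param_sizes.items() if n >= threshold]
    return kept_whole, partitioned


# Hypothetical parameter sizes (number of elements).
sizes = {
    "layernorm.weight": 4_096,
    "attention.bias": 4_096,
    "mlp.weight": 4_096 * 11_008,
}
kept, sharded = split_by_threshold(sizes, threshold=100_000)
print("kept whole:", kept)
print("partitioned:", sharded)
```

Lowering the threshold moves small tensors such as biases and norm weights into the partitioned group, which saves a little memory but adds many small gather operations.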
1 change: 1 addition & 0 deletions
documentation/docs/tooltips/experiments/_deepspeed-stage3-prefetch-bucket-size.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Maximum number of parameter elements to fetch ahead of use. Smaller values use less memory, but slow down training. |
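The settings described in the tooltips above correspond to keys in the `zero_optimization` section of a DeepSpeed config. The sketch below collects them into one fragment; the helper name `stage3_section` and all numeric values are illustrative assumptions, not the defaults introduced by this commit, while the key names are standard DeepSpeed config keys.

```python
import json
from typing import Any, Dict


def stage3_section(
    reduce_bucket_size: int,
    max_live_parameters: int,
    max_reuse_distance: int,
    param_persistence_threshold: int,
    prefetch_bucket_size: int,
) -> Dict[str, Any]:
    """Assemble a ZeRO stage 3 config fragment (hypothetical helper)."""
    return {
        "zero_optimization": {
            "stage": 3,
            "reduce_bucket_size": reduce_bucket_size,
            "stage3_max_live_parameters": max_live_parameters,
            "stage3_max_reuse_distance": max_reuse_distance,
            "stage3_param_persistence_threshold": param_persistence_threshold,
            "stage3_prefetch_bucket_size": prefetch_bucket_size,
        }
    }


# Illustrative values only; tune them against available GPU memory.
fragment = stage3_section(
    reduce_bucket_size=1_000_000,
    max_live_parameters=10_000_000,
    max_reuse_distance=10_000_000,
    param_persistence_threshold=1_000_000,
    prefetch_bucket_size=1_000_000,
)
print(json.dumps(fragment, indent=2))
```

All five values trade memory for speed in the same direction: lowering any of them reduces peak GPU memory at the cost of more communication or recomputation of gathers.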
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Whether to use DeepSpeed to save GPU memory during training. Note that turning on DeepSpeed can slow down training. |