add accelerate user guides
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
woshiyyya committed Aug 17, 2023
1 parent 47fa502 commit c30c8fd
Showing 11 changed files with 306 additions and 46 deletions.
13 changes: 0 additions & 13 deletions doc/BUILD
@@ -225,8 +225,6 @@ py_test_run_all_subdirectory(
     include = ["source/train/doc_code/*.py"],
     exclude = [
         "source/train/doc_code/hf_trainer.py", # Too large
-        "source/train/doc_code/accelerate_torch_trainer.py", # GPU test
-        "source/train/doc_code/deepspeed_torch_trainer.py", # GPU test
     ],
     extra_srcs = [],
     tags = ["exclusive", "team:ml"],
@@ -272,17 +270,6 @@ py_test(
     args = ["--path", "doc/source/train/examples/pytorch/pytorch_resnet_finetune.ipynb"]
 )
 
-py_test_run_all_subdirectory(
-    size = "large",
-    include = [
-        "source/train/doc_code/accelerate_torch_trainer.py", # GPU test
-        "source/train/doc_code/deepspeed_torch_trainer.py", # GPU test
-    ],
-    exclude = [],
-    extra_srcs = [],
-    tags = ["exclusive", "team:ml", "gpu"],
-)
-
 # --------------------------------------------------------------------
 # Test all doc/external code
 # --------------------------------------------------------------------
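The first hunk drops the two `# GPU test` entries from the CPU target's `exclude` list, and the second deletes the separate GPU-tagged target that listed them in `include`. As a rough illustration of how an include/exclude pair narrows a file set, here is a minimal stdlib sketch. `select_tests` and the file list are hypothetical, and `fnmatch` only approximates Bazel's `glob` (its `*` also matches `/`), so this is not the real `py_test_run_all_subdirectory` macro.

```python
from fnmatch import fnmatch
from typing import List


def select_tests(files: List[str], include: List[str], exclude: List[str]) -> List[str]:
    """Keep files matching some include pattern and no exclude pattern.

    Hypothetical model only: fnmatch's '*' also crosses '/', unlike Bazel's glob.
    """
    return [
        f
        for f in files
        if any(fnmatch(f, pat) for pat in include)
        and not any(fnmatch(f, pat) for pat in exclude)
    ]


# Hypothetical contents of source/train/doc_code/, used only for illustration.
files = [
    "source/train/doc_code/hf_trainer.py",
    "source/train/doc_code/accelerate_trainer.py",
    "source/train/doc_code/accelerate_torch_trainer.py",
    "source/train/doc_code/deepspeed_torch_trainer.py",
]
include = ["source/train/doc_code/*.py"]

# Exclude list before this commit (old side of the hunk) and after it (new side).
exclude_before = [
    "source/train/doc_code/hf_trainer.py",
    "source/train/doc_code/accelerate_torch_trainer.py",
    "source/train/doc_code/deepspeed_torch_trainer.py",
]
exclude_after = ["source/train/doc_code/hf_trainer.py"]

print(select_tests(files, include, exclude_before))
print(select_tests(files, include, exclude_after))
```

With only `hf_trainer.py` left in `exclude`, any other `*.py` file that remains under `doc_code/` is picked up by the glob-based CPU target.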
24 changes: 24 additions & 0 deletions doc/source/images/deepspeed_logo.svg
12 changes: 12 additions & 0 deletions doc/source/train/deepspeed.rst
@@ -0,0 +1,12 @@
+.. _train-deepspeed:
+
+Training with DeepSpeed
+=======================
+
+
+.. dropdown:: Code example
+
+    .. literalinclude:: ./doc_code/deepspeed_torch_trainer.py
+        :language: python
+        :start-after: __deepspeed_torch_basic_example_start__
+        :end-before: __deepspeed_torch_basic_example_end__
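The `:start-after:` and `:end-before:` options tell Sphinx to include only the lines between two marker strings, so the included script presumably carries comments containing `__deepspeed_torch_basic_example_start__` and `__deepspeed_torch_basic_example_end__`. The hypothetical `extract_snippet` below is a minimal sketch that approximates this line-based selection (lines after the first line containing the start marker, up to but not including the next line containing the end marker). The sample file contents are invented, and the error handling is this sketch's own choice rather than Sphinx's.

```python
from typing import List


def extract_snippet(source: str, start_after: str, end_before: str) -> str:
    """Return the lines strictly between the start and end marker lines (hypothetical helper)."""
    lines: List[str] = source.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if start_after in line:
            lines = lines[i + 1:]  # drop the start marker line and everything before it
            break
    else:
        raise ValueError(f"start marker {start_after!r} not found")
    for i, line in enumerate(lines):
        if end_before in line:
            return "".join(lines[:i])  # keep only what precedes the end marker line
    raise ValueError(f"end marker {end_before!r} not found")


# Invented file contents showing the marker-comment convention.
sample = """import os

# __deepspeed_torch_basic_example_start__
def train_func():
    pass
# __deepspeed_torch_basic_example_end__

train_func()
"""

print(extract_snippet(
    sample,
    "__deepspeed_torch_basic_example_start__",
    "__deepspeed_torch_basic_example_end__",
))
```

Keeping the markers in comments lets the same file run as a test while the docs show only the annotated region.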
2 changes: 1 addition & 1 deletion doc/source/train/doc_code/accelerate_trainer.py
@@ -52,7 +52,7 @@ def train_loop_per_worker():
         print(f"epoch: {epoch}, loss: {loss.item()}")
 
         train.report(
-            {},
+            metrics={"epoch": epoch, "loss": loss.item()},
             checkpoint=Checkpoint.from_dict(
                 dict(epoch=epoch, model=accelerator.unwrap_model(model).state_dict())
             ),
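This hunk replaces the empty `{}` passed as the first argument with a `metrics=` dict, so each epoch's loss is reported together with its checkpoint. The stdlib-only sketch below mimics that call shape without Ray. `fake_report` is a hypothetical stand-in, assumed to take `metrics` first and `checkpoint` as a keyword-only argument as `ray.train.report` does in recent Ray releases; the halving loss and the empty model dict are invented placeholders.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for ray.train.report: metrics first, checkpoint keyword-only.
REPORTED: List[Dict[str, Any]] = []


def fake_report(metrics: Dict[str, Any], *, checkpoint: Optional[dict] = None) -> None:
    """Record what a worker would report for one epoch (no Ray involved)."""
    REPORTED.append({"metrics": dict(metrics), "has_checkpoint": checkpoint is not None})


def train_loop(num_epochs: int = 3) -> None:
    loss = 1.0
    for epoch in range(num_epochs):
        loss *= 0.5  # pretend the loss halves every epoch
        fake_report(
            metrics={"epoch": epoch, "loss": loss},
            checkpoint={"epoch": epoch, "model": {}},  # placeholder for the unwrapped state_dict
        )


train_loop()
print(REPORTED[-1]["metrics"])  # {'epoch': 2, 'loss': 0.125}

# Supplying the metrics dict both positionally and by keyword is rejected by Python itself.
try:
    fake_report({}, metrics={"epoch": 0})
except TypeError as err:
    print("TypeError:", err)
```

Because the `TypeError` is raised before `fake_report` runs its body, the rejected call adds nothing to `REPORTED`.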
8 changes: 8 additions & 0 deletions doc/source/train/examples/accelerate/accelerate_example.rst
@@ -0,0 +1,8 @@
+:orphan:
+
+.. _accelerate_example:
+
+Hugging Face Accelerate Distributed Training Example with Ray Train
+===================================================================
+
+.. literalinclude:: /../../python/ray/train/examples/accelerate/accelerate_torch_trainer.py
8 changes: 8 additions & 0 deletions doc/source/train/examples/deepspeed/deepspeed_example.rst
@@ -0,0 +1,8 @@
+:orphan:
+
+.. _deepspeed_example:
+
+DeepSpeed ZeRO-3 Distributed Training Example with Ray Train
+============================================================
+
+.. literalinclude:: /../../python/ray/train/examples/deepspeed/deepspeed_torch_trainer.py
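Both example pages pull in a script from outside the docs tree using a path that starts with `/`. Sphinx documents that such an absolute `literalinclude` path is taken relative to the top source directory, while a relative path is taken relative to the including document's directory. The sketch below models that rule with `posixpath`; `resolve_include` is a hypothetical helper, and it assumes the Sphinx source directory is `doc/source`, as the `doc/source/...` paths in this commit suggest.

```python
import posixpath


def resolve_include(srcdir: str, doc_dir: str, path: str) -> str:
    """Resolve a literalinclude path following the documented Sphinx rule (hypothetical helper).

    Paths starting with "/" are relative to the top source directory;
    other paths are relative to the including document's directory.
    """
    if path.startswith("/"):
        return posixpath.normpath(posixpath.join(srcdir, path[1:]))
    return posixpath.normpath(posixpath.join(srcdir, doc_dir, path))


SRCDIR = "doc/source"  # assumed Sphinx source directory

# accelerate_example.rst sits in train/examples/accelerate/ and uses an absolute path.
print(resolve_include(
    SRCDIR,
    "train/examples/accelerate",
    "/../../python/ray/train/examples/accelerate/accelerate_torch_trainer.py",
))
# deepspeed.rst sits in train/ and uses a relative path.
print(resolve_include(SRCDIR, "train", "./doc_code/deepspeed_torch_trainer.py"))
```

Under these assumptions, `/../../` climbs out of `doc/source` to the repository root, so the example pages render the scripts under `python/ray/train/examples/` directly instead of keeping copies in the docs tree.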