[Train] Update docstring and user guides for train_loop_config (#43691)
@@ -23,7 +23,7 @@ For reference, the final code is as follows:
 from ray.train.torch import TorchTrainer
 from ray.train import ScalingConfig

-def train_func(config):
+def train_func():
     # Your PyTorch Lightning training code here.

 scaling_config = ScalingConfig(num_workers=2, use_gpu=True)
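For readers of this diff, a minimal sketch of what the placeholder comment typically expands to, assuming the `ray.train.lightning` utilities (`RayDDPStrategy`, `RayLightningEnvironment`, `RayTrainReportCallback`, `prepare_trainer`); the `ToyModule` and its random data are hypothetical, included only to keep the sketch self-contained:

import torch
import pytorch_lightning as pl
from ray.train import ScalingConfig
from ray.train.lightning import (
    RayDDPStrategy,
    RayLightningEnvironment,
    RayTrainReportCallback,
    prepare_trainer,
)
from ray.train.torch import TorchTrainer


class ToyModule(pl.LightningModule):
    """A hypothetical toy LightningModule, only to keep the sketch runnable."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


def train_func():
    # Ray Train has already set up the distributed process group, so the
    # Lightning Trainer only needs the Ray-aware strategy and environment.
    dataset = torch.utils.data.TensorDataset(torch.randn(64, 8), torch.randn(64, 1))
    train_loader = torch.utils.data.DataLoader(dataset, batch_size=8)
    trainer = pl.Trainer(
        max_epochs=1,
        devices="auto",
        accelerator="auto",
        strategy=RayDDPStrategy(),
        plugins=[RayLightningEnvironment()],
        callbacks=[RayTrainReportCallback()],
        enable_checkpointing=False,
    )
    trainer = prepare_trainer(trainer)
    trainer.fit(ToyModule(), train_dataloaders=train_loader)


scaling_config = ScalingConfig(num_workers=2, use_gpu=True)
trainer = TorchTrainer(train_func, scaling_config=scaling_config)
result = trainer.fit()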
@@ -190,6 +190,13 @@ Begin by wrapping your code in a :ref:`training function <train-overview-trainin

 Each distributed training worker executes this function.

+You can specify the input argument for `train_func` via the Trainer's `train_loop_config` parameter.
+
+.. note::
+
+    Avoid passing large data objects through `train_loop_config` to reduce the
+    serialization and deserialization overhead. Instead, it's preferred to
+    initialize large objects (e.g. datasets, models) directly in `train_func`.

 Ray Train sets up your distributed process group on each worker. You only need to
 make a few changes to your Lightning Trainer definition.

Review comment on the added `train_loop_config` paragraph: Optionally, we can extract this section out to a separate file and include it, similar to what's being done here. In the future we may just have a full separate user guide for this.

Author reply: Good idea. I've extracted the common paragraph into a separate doc.
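To make the added paragraph concrete, a minimal sketch of how `train_loop_config` reaches `train_func`; the `lr` and `batch_size` keys are hypothetical, chosen only for illustration:

from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_func(config):
    # Ray Train passes the dict supplied as `train_loop_config` to this
    # single `config` argument on every worker.
    lr = config["lr"]
    batch_size = config["batch_size"]
    # Per the note above, build large objects (datasets, models) here on
    # the worker instead of shipping them through `train_loop_config`.
    ...


trainer = TorchTrainer(
    train_func,
    train_loop_config={"lr": 1e-3, "batch_size": 32},
    scaling_config=ScalingConfig(num_workers=2, use_gpu=True),
)
result = trainer.fit()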
Review comment on the `def train_func(config):` change: Not showing the `config` argument in the first place, since we didn't specify `train_loop_config` in `TorchTrainer` in this code snippet. Users will be confused about where to put the `train_func` arguments.
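To make the reviewer's point concrete, a short sketch contrasting the two `train_func` signatures (function names and values are illustrative):

from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

scaling_config = ScalingConfig(num_workers=2, use_gpu=True)


# Without `train_loop_config`, `train_func` takes no arguments:
# the form the updated snippet now shows.
def train_func_no_args():
    ...


trainer = TorchTrainer(train_func_no_args, scaling_config=scaling_config)


# With `train_loop_config`, `train_func` takes a single `config` argument.
def train_func_with_config(config):
    lr = config["lr"]  # hypothetical hyperparameter
    ...


trainer = TorchTrainer(
    train_func_with_config,
    train_loop_config={"lr": 1e-3},
    scaling_config=scaling_config,
)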