[train] simplify TorchTrainer docstring #38049
Conversation
Related to your comment (#33429): the fix is to not override `__new__` in the base trainer.
Yeah... not clear to me how to, though. 😢
Would it be helpful if I made a draft/closed PR, and perhaps we can find someone to help shepherd a proper fix into 2.7?
Yeah, that would be great if you have an idea how to - I tried doing something in
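To make this concrete, here's a hypothetical sketch (not Ray's actual implementation) of why a `__new__` override in a base trainer class is awkward for subclasses, and the general direction of the suggested fix:

```python
# Hypothetical sketch only; class names and attributes are illustrative.

class BaseTrainer:
    """Base class that overrides __new__ to capture constructor arguments."""

    def __new__(cls, *args, **kwargs):
        obj = super().__new__(cls)
        # Recording args here couples every subclass's signature to this hook,
        # which is what makes the override hard to remove or work around.
        obj._init_args = args
        obj._init_kwargs = kwargs
        return obj


class FixedBaseTrainer:
    """Suggested direction: record arguments in __init__ instead."""

    def __init__(self, *args, **kwargs):
        self._init_args = args
        self._init_kwargs = kwargs
```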
Thank you Matt. This looks great!
Really, really great improvement. Thank you!
@@ -16,217 +16,121 @@
class TorchTrainer(DataParallelTrainer):
    """A Trainer for data parallel PyTorch training.
Is this the same for all torch-based trainers? Is it appropriate to mention that Lightning and HF also follow this pattern?
Yes, but will make this change in a future PR when this becomes the standard way!
import ray
from ray import train
from ray.train import Checkpoint, CheckpointConfig, RunConfig, ScalingConfig
from ray.train.torch import TorchTrainer

# If using GPUs, set this to True.
If we're setting it to True by default, does it make sense to make the comment: "If not using GPUs, set this to False."?
Good catch.
num_epochs = 20
- num_workers = 3
use_gpu = True
+ num_workers = 4
Does everyone know what a worker is, or should we add a comment about what this is for?
Added a comment
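For context, a minimal runnable sketch (not part of this diff) of how these settings feed into the trainer; the loop body is illustrative only:

```python
from ray import train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

use_gpu = True  # If not using GPUs, set this to False.
num_workers = 4  # Number of distributed training worker processes.


def train_loop_per_worker(config):
    # User-defined function; runs once on each of the `num_workers` workers.
    for epoch in range(config["num_epochs"]):
        train.report({"epoch": epoch})


trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"num_epochs": 20},
    scaling_config=ScalingConfig(num_workers=num_workers, use_gpu=use_gpu),
)
result = trainer.fit()
```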
2. Sets up a PyTorch Distributed environment
   on these workers as defined by the ``torch_config``.
3. Ingests the input ``datasets`` based on the ``dataset_config``.
4. Runs the input ``train_loop_per_worker(train_loop_config)``
Is this a user-defined function?
Yeah, it is.
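Since the question came up: a hedged sketch of a user-defined `train_loop_per_worker` that consumes a shard of the input `datasets` (the data and batch size here are illustrative):

```python
import ray
from ray import train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop_per_worker(config):
    # Each worker iterates over its own shard of the "train" dataset.
    shard = train.get_dataset_shard("train")
    for batch in shard.iter_batches(batch_size=32):
        pass  # Forward/backward pass would go here.


trainer = TorchTrainer(
    train_loop_per_worker,
    datasets={"train": ray.data.from_items([{"x": i} for i in range(100)])},
    scaling_config=ScalingConfig(num_workers=2),
)
```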
Why are these changes needed?
Generated API Reference: TorchTrainer
This PR simplifies the `TorchTrainer` docstring with the following changes:

- Moves `dataset_config` immediately after `datasets`.

Note: One very unfortunate thing is that the arguments do not get rendered properly, so it's not possible to see the type hints right now. I would like to follow up on this and get this fixed in the future...
[Screenshot: generated TorchTrainer API reference]
Related issue number
Checks

- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.