-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Unified TorchTrainer] Add PyTorch Lightning Trainer Utilities #37989
[Unified TorchTrainer] Add PyTorch Lightning Trainer Utilities #37989
Conversation
…/add_lightning_utilities
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really clean. I think we should showcase one example with the previous API compare it with the new API.
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com>
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
…/add_lightning_utilities
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
If the user chooses not to report to train, we won't be able to print any related metrics in the train output right? What will happen to this training result table?
|
…/add_lightning_utilities
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
…/add_lightning_utilities
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read half of it, will continue later today. Looks great so far! Would be great to get a version that renders in RTD for preview.
datamodule = MyLightningDataModule(...) | ||
|
||
trainer = pl.Trainer( | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: let's use comments here so that the python code at least would parse correctly (even if it isn't executed)
... | |
# ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this actually works in python!
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
…/add_lightning_utilities
@krfricke Updated rendered doc: https://anyscale-ray--37989.com.readthedocs.build/en/37989/ |
…/add_lightning_utilities
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks great to me, I think that we can polish and shift around some of the documentation content in a follow up PR.
doc/source/train/distributed-pytorch/converting-existing-training-loop.rst
Outdated
Show resolved
Hide resolved
doc/source/train/distributed-pytorch/converting-existing-training-loop.rst
Outdated
Show resolved
Hide resolved
doc/source/train/distributed-pytorch/converting-existing-training-loop.rst
Outdated
Show resolved
Hide resolved
doc/source/train/distributed-pytorch/converting-existing-training-loop.rst
Show resolved
Hide resolved
doc/source/train/distributed-pytorch/converting-existing-training-loop.rst
Outdated
Show resolved
Hide resolved
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, minor nits
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
…roject#37989) Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com> Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
…roject#37989) Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com> Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Signed-off-by: NripeshN <nn2012@hw.ac.uk>
…roject#37989) Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com> Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Signed-off-by: harborn <gangsheng.wu@intel.com>
…roject#37989) Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com> Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
…roject#37989) Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com> Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
…roject#37989) Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com> Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Signed-off-by: Victor <vctr.y.m@example.com>
Why are these changes needed?
Add Lightning related utilities for Unified TorchTrainer.
POC example: Text Classification with Ray Data + Lightning
Rendered Doc: https://anyscale-ray--37989.com.readthedocs.build/en/37989/
New utilities
RayDDPStrategy
,RayFSDPStrategy
,RayDeepSpeedStrategy
RayLightningEnvironment
RayTrainReportCallback
prepare_trainer()
: To check if the users correctly configured thepl.Trainer
Users can inject these utilities to run Lightning code in TorchTrainer.
Metrics and checkpoint reporting
We propose to let the users define the logs in a lightning callback themselves.
The users can choose whether to report the metrics to Ray Train or not.
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.