
[Ray Train] Explain how to set timeout when using PyTorch Lightning Trainer #36315

Closed
scottsun94 opened this issue Jun 12, 2023 · 5 comments · Fixed by #40376
Labels
docs An issue or change related to documentation train Ray Train Related Issue

Comments

@scottsun94
Contributor

Description

It seems that users need to set the timeout using TorchConfig.

It's not clear whether I should set it in ray.train.lightning.LightningConfigBuilder.strategy or in TorchConfig.

Link

No response

@scottsun94 scottsun94 added triage Needs triage (eg: priority, bug/not-bug, and owning component) docs An issue or change related to documentation labels Jun 12, 2023
@scottsun94
Contributor Author

cc: @woshiyyya @matthewdeng

@woshiyyya
Member

woshiyyya commented Jun 12, 2023

Yeah, that's a problem, and I think this is one of the action items for unifying configurations between Lightning and AIR.

Currently we have two sets of configurations for Lightning and Ray AIR (checkpoint configs, backend configs, and scaling configs), which makes it hard for users to figure out the right place to provide them.

The ideal state, in my mind, is that the user only needs to provide a Lightning config, and we create the corresponding AIR config for them. But there are still many details to consider. I can draft a proposal on this topic later.

@woshiyyya woshiyyya self-assigned this Jun 12, 2023
@woshiyyya woshiyyya added train Ray Train Related Issue and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jun 12, 2023
@stale

stale bot commented Oct 15, 2023

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention on the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Oct 15, 2023
@scottsun94
Contributor Author

This should be straightforward in 2.7? @woshiyyya

@scottsun94 scottsun94 removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Oct 16, 2023
@woshiyyya
Member

woshiyyya commented Oct 16, 2023

Actually, no. With the new API, users still need to specify timeouts in TorchConfig rather than in RayDDPStrategy.

I think we still need to update the docstring for RayDDPStrategy to clarify this.
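To illustrate the comment above: the process-group timeout is set via the `timeout_s` parameter of `ray.train.torch.TorchConfig` and passed to the `TorchTrainer`, while `RayDDPStrategy` takes no timeout at all. A minimal sketch, assuming the Ray 2.7+ module layout; `MyLightningModule` is a hypothetical `LightningModule`, and the worker-loop body is illustrative, not a complete training script:

```python
import lightning.pytorch as pl
from ray.train import ScalingConfig
from ray.train.torch import TorchConfig, TorchTrainer
from ray.train.lightning import (
    RayDDPStrategy,
    RayLightningEnvironment,
    prepare_trainer,
)

def train_loop_per_worker(config):
    model = MyLightningModule()  # hypothetical LightningModule
    trainer = pl.Trainer(
        strategy=RayDDPStrategy(),            # note: no timeout argument here
        plugins=[RayLightningEnvironment()],
        devices="auto",
        accelerator="auto",
    )
    trainer = prepare_trainer(trainer)
    trainer.fit(model)

trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    # The collective-communication timeout belongs here, not in RayDDPStrategy:
    torch_config=TorchConfig(timeout_s=3600),
)
result = trainer.fit()
```

This runs only on a machine with Ray and Lightning installed and enough resources for the requested workers; the point is just where `timeout_s` lives in the configuration.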
