Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Train] Support Accelerator Type in
ScalingConfig
#43090[Train] Support Accelerator Type in
ScalingConfig
#43090Changes from 20 commits
567690c
6d5456a
e046b40
cf8e4a4
096f0c3
f0da060
620f69f
7daad93
f1ac1e0
7339b97
3fdef03
ddab5f3
49eacc6
be081d4
d098c9c
d0566d8
482e5e2
c16b336
3842c89
3f19960
e227213
bbc3d62
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think we can clean up this branching logic a bit now since we don't return early anymore, but we can do it in a separate PR...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let's merge it first. I'll post a followup PR to remove the branchings like colab & num_workers=None, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use
setdefault
similar to GPU in the line above? In case the user manually setsresources_per_worker
. Or validate that it's not set in the resources dict if we want this to be the only interface where the user can specify.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
"accelerator_type:{type}"
key should only be used as internal api, users should not specify it inresource_per_worker
, as we provided theScalingConfig(accelerator_type=)
api as the only entry.For example, if a user needs two A100 GPUs, they can do:
But I agree to change it to
setdefault
to provide more flexibility for advanced use cases.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to explicitly set it here too? Won't it get merged with rank 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch. This shouldn't be here after we merged the previous colocate pr.