Add DTensor layout map class method for OPT #1000
/gcbrun
e5702c9 to 73442f2
/gcbrun
73442f2 to f268aec
/gcbrun
/gcbrun
These failures look like upstream issues with DTensor (repro). Checking with the DTensor folks now!
/gcbrun
ea2fcc7 to 894a5f8
/gcbrun
Looks great, thanks!
@classmethod
def create_layout_map(cls, mesh):
    """Create a DTensor layout map for an OPTCausalLM.
We could consider parameterizing the docstrings if there's a lot of copypasta going forward
Yeah, I wasn't taking on the "base version" of this quite yet, but it would be very valid to. There are probably very few changes that need to be made here per backbone/task.
894a5f8 to 836b142
/gcbrun
836b142 to 8fb9904
/gcbrun
This drafts the DTensor API we want to add for the OPT backbone and task models. Alternatively, we could start adding these to the base Backbone and Task methods, but this keeps it simple with just the OPT classes for now.
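To make the idea concrete, here is a minimal pure-Python sketch of what a layout map expresses: a table from variable-path patterns to a per-dimension sharding spec. The variable paths, the "model" axis name, and the helpers `make_layout_map` and `lookup_layout` are hypothetical and chosen for illustration. This is not the code in this PR, and it does not call the DTensor API itself.

```python
import re

# None marks a dimension that is replicated rather than sharded.
UNSHARDED = None


def make_layout_map():
    """Return a hypothetical pattern -> sharding-spec table.

    Each spec has one entry per tensor dimension: the name of the mesh
    axis to shard along ("model" is an assumed axis name), or UNSHARDED.
    """
    return {
        r".*token_embedding/embeddings": ("model", UNSHARDED),
        r".*(query|key|value)/kernel": (UNSHARDED, "model", UNSHARDED),
        r".*attention_output/kernel": ("model", UNSHARDED, UNSHARDED),
        r".*intermediate_dense/kernel": (UNSHARDED, "model"),
        r".*output_dense/kernel": ("model", UNSHARDED),
    }


def lookup_layout(layout_map, variable_path):
    """Return the spec of the first pattern matching the whole path.

    Returns None when nothing matches; a caller would typically treat
    such a variable as fully replicated.
    """
    for pattern, spec in layout_map.items():
        if re.fullmatch(pattern, variable_path):
            return spec
    return None


if __name__ == "__main__":
    layouts = make_layout_map()
    # Hypothetical variable paths, for illustration only.
    for path in (
        "transformer_layer_0/self_attention/query/kernel",
        "layer_norm/gamma",
    ):
        print(path, "->", lookup_layout(layouts, path))
```

A class method like `create_layout_map(cls, mesh)` would presumably build a table of this kind against real DTensor layouts for the given mesh, so that users do not have to write the patterns by hand for each model.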
/gcbrun
/gcbrun
Thank you! Do we have any test runs on GCP using this code?
No great automated test coverage yet. The best way to test this would be an actual multi-worker setup, which would take some playing around with resources. We could also probably fake a multi-device setup on a single CPU, but even then we would probably need to run this in a separate process from our main test. Let's do that as a follow-up, as it is fairly involved.
Sorry, I was actually just asking if we've ever tried a manual run with this code as a sanity check.
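One way the separate-process idea could look is sketched below, using only the standard library. The helper `run_isolated` and the constant `FAKE_MULTI_DEVICE_SNIPPET` are hypothetical names, not part of this PR. The child snippet assumes TensorFlow is installed and splits one physical CPU into several logical devices; that configuration has to happen before the runtime initializes, which is why a fresh interpreter helps.

```python
import subprocess
import sys
import textwrap


def run_isolated(snippet: str, timeout: float = 300.0) -> str:
    """Run a Python snippet in a fresh interpreter and return its stdout.

    A fresh process means device configuration done by the snippet
    cannot leak into (or be blocked by) the main test process.
    Raises subprocess.CalledProcessError if the child exits non-zero.
    """
    result = subprocess.run(
        [sys.executable, "-c", textwrap.dedent(snippet)],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


# Hypothetical child snippet (requires TensorFlow): split the single
# physical CPU into 8 logical devices and report how many are visible.
FAKE_MULTI_DEVICE_SNIPPET = """
import tensorflow as tf

cpus = tf.config.list_physical_devices("CPU")
tf.config.set_logical_device_configuration(
    cpus[0], [tf.config.LogicalDeviceConfiguration()] * 8
)
print(len(tf.config.list_logical_devices("CPU")))
"""

if __name__ == "__main__":
    # Sanity check that does not need TensorFlow.
    print(run_isolated("print(6 * 7)").strip())
```

A test could then call `run_isolated(FAKE_MULTI_DEVICE_SNIPPET)` and extend the snippet to build a mesh and a layout map on the logical devices. This only approximates a real multi-worker run.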
* Add DTensor layout map class method for OPT

  This drafts the dtensor API we want to add for the OPT backbone and task models. Alternately, we could start adding these to the base Backbone and Task methods, just keeping it simple with just the OPT classes for now.

* Workaround for DTensor failures
* Copy edits