Create distribution_lib for TF backend #890
Conversation
hubingallin commented on Sep 14, 2023
* Add Mesh and Layout helper functions
    device_type = (
        device_type.lower() if device_type else dtensor.preferred_device_type()
    )
    return dtensor.local_devices(device_type=device_type)
This seems to only return the local devices, which is different from the docstring.
Seems that this is not addressed, did I miss anything?
Codecov Report

Additional details and impacted files:

@@            Coverage Diff            @@
##              main     #890       +/-  ##
===========================================
- Coverage    76.56%   60.47%   -16.09%
===========================================
  Files          329      320        -9
  Lines        31422    28893     -2529
  Branches      6113     5531      -582
===========================================
- Hits         24057    17473     -6584
- Misses        5786    10078     +4292
+ Partials      1579     1342      -237
Please also fix the code format via keras_core/shell/{lint|format}.sh
    device_type = (
        device_type.upper() if device_type else dtensor.preferred_device_type()
    )
    return tf.config.list_logical_devices(device_type=device_type)
I think this might not be correct. It probably should map to https://www.tensorflow.org/api_docs/python/tf/experimental/dtensor/Mesh#global_devices.
Chatted offline. Let's include some context here, as a comment, explaining why we use local devices here.
        A `tf.dtensor.Mesh` instance.
    """
    mesh_dims = list(zip(device_mesh.axis_names, device_mesh.shape))
    return dtensor.create_mesh(
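The `mesh_dims` construction pairs each mesh axis name with its size, which is the `(name, size)` format DTensor mesh dimensions use. A quick pure-Python illustration with hypothetical axis names and shape:

```python
# Hypothetical values standing in for device_mesh.axis_names / .shape.
axis_names = ["batch", "model"]
shape = (2, 4)

# Same pairing as in the snippet above.
mesh_dims = list(zip(axis_names, shape))
assert mesh_dims == [("batch", 2), ("model", 4)]
```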
I think the usage of create_mesh will limit it to a single-worker mesh. We might want to use create_distributed_mesh instead: https://www.tensorflow.org/api_docs/python/tf/experimental/dtensor/create_distributed_mesh
The unit tests are failing. Please check.
Please fix the unit test.
Keras Core is becoming Keras 3, and we're switching development to the main repository! Please reopen this PR in the keras-team/keras repository. Unfortunately we aren't able to automatically transfer PRs (but we have transferred all issues).