Create distribution_lib for TF backend #890

Closed
wants to merge 8 commits into from

hubingallin

* Add Mesh and Layout helper functions
@hubingallin hubingallin marked this pull request as ready for review September 14, 2023 23:55
@qlzh727 qlzh727 self-requested a review September 14, 2023 23:58
device_type = (
    device_type.lower() if device_type else dtensor.preferred_device_type()
)
return dtensor.local_devices(device_type=device_type)
Member

This seems to return only the local devices, which differs from what the docstring describes.

Member

It seems this has not been addressed yet. Did I miss anything?
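
To illustrate the gap this thread is about, here is a minimal pure-Python sketch; it is not code from this PR. It assumes a hypothetical job with two workers and two GPUs per worker, and it builds TensorFlow-style device strings by hand. `global_device_names` and `local_device_names` are illustrative helper names, not TensorFlow APIs.

```python
def global_device_names(num_workers, devices_per_worker, device_type="GPU"):
    """Return device strings for every worker in a hypothetical job."""
    return [
        f"/job:worker/replica:0/task:{task}/device:{device_type}:{index}"
        for task in range(num_workers)
        for index in range(devices_per_worker)
    ]


def local_device_names(task_id, num_workers, devices_per_worker, device_type="GPU"):
    """Return only the devices attached to the worker with the given task id."""
    prefix = f"/job:worker/replica:0/task:{task_id}/"
    return [
        name
        for name in global_device_names(num_workers, devices_per_worker, device_type)
        if name.startswith(prefix)
    ]


all_devices = global_device_names(num_workers=2, devices_per_worker=2)
worker0_devices = local_device_names(0, num_workers=2, devices_per_worker=2)

print(len(all_devices))      # 4 devices across the whole job
print(len(worker0_devices))  # 2 devices visible to worker 0
```

On a single worker the two lists coincide, which may be why the difference is easy to miss until the docstring promises every device in the job.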

keras_core/backend/tensorflow/distribution_lib.py (outdated, resolved)
keras_core/backend/tensorflow/distribution_lib.py (outdated, resolved)
@codecov

codecov bot commented Sep 18, 2023

Codecov Report

Patch coverage: 40.00% and project coverage change: -16.09% ⚠️

Comparison is base (e8db3b6) 76.56% compared to head (88d3fa2) 60.47%.
Report is 72 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #890       +/-   ##
===========================================
- Coverage   76.56%   60.47%   -16.09%     
===========================================
  Files         329      320        -9     
  Lines       31422    28893     -2529     
  Branches     6113     5531      -582     
===========================================
- Hits        24057    17473     -6584     
- Misses       5786    10078     +4292     
+ Partials     1579     1342      -237     
Flag Coverage Δ
keras_core 60.47% <40.00%> (-16.00%) ⬇️
keras_core-numpy 60.47% <40.00%> (?)
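
The percentages in this report can be reproduced from the raw counts in the diff table. The sketch below assumes coverage is computed as hits divided by the sum of hits, misses, and partials, which is consistent with the `Lines` row above; `coverage_percent` is an illustrative helper name, not part of Codecov.

```python
def coverage_percent(hits, misses, partials):
    """Coverage as hits over all tracked lines, in percent."""
    total = hits + misses + partials
    return 100.0 * hits / total


# Counts copied from the base (main) and head (#890) columns of the table.
base = coverage_percent(hits=24057, misses=5786, partials=1579)
head = coverage_percent(hits=17473, misses=10078, partials=1342)

print(f"base={base:.2f}% head={head:.2f}% change={head - base:+.2f}%")
```

The drop is driven mostly by the roughly 6,500 fewer hits on head, which fits the note that the report is 72 commits behind main rather than pointing at the patch itself.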

Files Changed Coverage Δ
keras_core/backend/__init__.py 64.10% <ø> (-30.90%) ⬇️
keras_core/backend/tensorflow/distribution_lib.py 35.71% <35.71%> (ø)
keras_core/backend/tensorflow/__init__.py 100.00% <100.00%> (ø)

... and 201 files with indirect coverage changes

@qlzh727
Member

qlzh727 commented Sep 18, 2023

Please also fix the code format via keras_core/shell/{lint|format}.sh

device_type = (
    device_type.upper() if device_type else dtensor.preferred_device_type()
)
return tf.config.list_logical_devices(device_type=device_type)
Member

I think this might not be correct. It probably should map to https://www.tensorflow.org/api_docs/python/tf/experimental/dtensor/Mesh#global_devices.

Member

Chatted offline. Let's add a code comment here explaining why we return the local devices.
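
For context on the snippet under review: it upper-cases the requested device type before listing devices, since TensorFlow reports logical device types as strings such as "CPU" and "GPU". A minimal pure-Python sketch of just that normalization step follows. `normalize_device_type` is a hypothetical helper, and its `preferred` argument is a stand-in for whatever `dtensor.preferred_device_type()` would return on the machine.

```python
def normalize_device_type(device_type=None, preferred="CPU"):
    """Upper-case an explicit device type, or fall back to a preferred one.

    `preferred` is an assumption standing in for the value of
    dtensor.preferred_device_type(); it is not queried from TensorFlow here.
    """
    return device_type.upper() if device_type else preferred


print(normalize_device_type("gpu"))  # explicit type, normalized
print(normalize_device_type(None))   # no type given, falls back to preferred
```

Keeping the normalization in one place would also give the requested explanatory comment a natural home next to the device lookup.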

A `tf.dtensor.Mesh` instance.
"""
mesh_dims = list(zip(device_mesh.axis_names, device_mesh.shape))
return dtensor.create_mesh(
Member

I think using create_mesh will limit this to a single-worker mesh. We might want to use create_distributed_mesh instead: https://www.tensorflow.org/api_docs/python/tf/experimental/dtensor/create_distributed_mesh
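
As a rough illustration of what the snippet passes along, here is a pure-Python sketch of the `mesh_dims` construction. The `DeviceMesh` namedtuple is a hypothetical stand-in for the Keras Core object, with only the two attributes the snippet reads, and the axis names and sizes are made up. To my understanding of the TensorFlow documentation, both `dtensor.create_mesh` and `dtensor.create_distributed_mesh` accept mesh dimensions as (name, size) pairs, so this part would stay the same under either call.

```python
from collections import namedtuple
from math import prod

# Hypothetical stand-in for the device mesh object passed to the helper.
DeviceMesh = namedtuple("DeviceMesh", ["shape", "axis_names"])


def to_mesh_dims(device_mesh):
    """Pair each axis name with its size, as in the snippet under review."""
    return list(zip(device_mesh.axis_names, device_mesh.shape))


device_mesh = DeviceMesh(shape=(2, 4), axis_names=("batch", "model"))
mesh_dims = to_mesh_dims(device_mesh)

# A mesh needs one device per cell, i.e. the product of the axis sizes.
num_devices = prod(device_mesh.shape)

print(mesh_dims)    # [('batch', 2), ('model', 4)]
print(num_devices)  # 8
```

If the required device count exceeds what one worker has, a single-worker mesh cannot satisfy it, which is the situation the suggestion above is meant to cover.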

@qlzh727
Member

qlzh727 commented Sep 22, 2023

The unit tests are failing. Please check.

@qlzh727 qlzh727 self-requested a review September 22, 2023 16:32
Member

@qlzh727 qlzh727 left a comment

Please fix the unit test.

@fchollet
Member

Keras Core is becoming Keras 3, and we're switching development to the main repository! Please reopen this PR in the keras-team/keras repository. Unfortunately we aren't able to automatically transfer PRs (but we have transferred all issues).

@fchollet fchollet closed this Sep 22, 2023
3 participants