Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use dest_offsets directly in LoadPlanner #7155

Merged
merged 3 commits into from
Jun 11, 2024
Merged

Use dest_offsets directly in LoadPlanner #7155

merged 3 commits into from
Jun 11, 2024

Conversation

jonb377
Copy link
Collaborator

@jonb377 jonb377 commented May 30, 2024

In create_read_items_for_chunk_list, the ReadItem's dest_offsets are set to index into the local shard. Despite this, we performed an translation from global to local indices. This caused issues restoring checkpoints with padded tensors, i.e. where the global tensor size is not evenly divisible by the shard size.

We should instead rely on dest_offsets directly to narrow the local shards.

@jonb377 jonb377 requested a review from alanwaketan May 30, 2024 17:33
@jonb377 jonb377 changed the title Jonbolin/dcp Use dest_offsets directly May 30, 2024
@jonb377 jonb377 changed the title Use dest_offsets directly Use dest_offsets directly in LoadPlanner May 30, 2024
@dasoto
Copy link

dasoto commented Jun 11, 2024

Any reason this has not been merged?

As context this also fixed an issue we have by saving a distributed checkpoint on a 8x16 and then loading it in other topology.

@jonb377
Copy link
Collaborator Author

jonb377 commented Jun 11, 2024

@alanwaketan Do you have a sec to review? I'll also add @JackCaoG

@jonb377 jonb377 requested a review from JackCaoG June 11, 2024 18:15
@JackCaoG
Copy link
Collaborator

does this change needs to go into 2.4 release(branch cut was yesterday)?

@jonb377 jonb377 merged commit 7be1d3d into master Jun 11, 2024
20 checks passed
@jonb377
Copy link
Collaborator Author

jonb377 commented Jun 11, 2024

Yes, we should cherry-pick this. I'll open a PR.

Copy link
Collaborator

@alanwaketan alanwaketan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Sorry for missing this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants