Refactor deepcopy logic to improve planning speed #665

Closed
wants to merge 1 commit into from

ge0405 (Contributor) commented Sep 27, 2022

Summary

When there are many candidate proposals to plan with (e.g. >1000 proposals from GreedyProposer for ads models in [fused, fused_uvm_caching] modes), the planner can take more than 20 min to find the best plan. The bottleneck is deepcopy(List[ShardingOption]) in the proposers and the partitioner. This diff reorganizes the deepcopy operations, which brings planning time down to ~2.5 min.

Idea

(1) Reduce the number of deepcopy calls
When List[ShardingOption] is passed as a plan/proposal between the proposer, partitioner, and planner, deepcopy(List[ShardingOption]) is performed twice (once in the proposer, once in the partitioner) to isolate its state between planning steps. Because deepcopy is expensive, only the deepcopy in the partitioner is kept. This reduces planning time from ~20 min to ~10 min.

(2) Custom deepcopy
During partitioning, the only field updated in ShardingOption is shard.ranks. To avoid spending time copying large objects (e.g. tensor, module), a custom __deepcopy__ function is added to ShardingOption that deep-copies only the shards and passes everything else through as the original object. This reduces planning time from ~10 min to ~2.5 min.

(3) Freeze _tensor and _module
Because of idea (2), copies share the tensor and module objects with the original, so the tensor and module fields in ShardingOption are made read-only to prevent modification during planning.

Differential Revision: D39822605

facebook-github-bot added the CLA Signed and fb-exported labels Sep 27, 2022

facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D39822605

Summary:
Pull Request resolved: #665

# Summary
When there are many candidate proposals to plan with (e.g. >1000 proposals from GreedyProposer for ads models in [fused, fused_uvm_caching] modes), the planner can take more than 20 min to find the best plan. The bottleneck is `deepcopy(List[ShardingOption])` in the proposers and the partitioner. This diff reorganizes the deepcopy operations, which brings planning time down to ~2.5 min.
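As a rough way to see why copying large fields dominates, the sketch below times `copy.deepcopy` on a plan-like structure with and without copying its large "tensor" payloads. Everything here is a hypothetical stand-in (plain dicts and lists, not TorchRec's `ShardingOption` or `torch.Tensor`), and the payloads are shared by pre-seeding deepcopy's `memo` dict, a standard-library trick that is different from the custom `__deepcopy__` this diff adds. Timings depend on the machine and are not the numbers reported above.

```python
import time
from copy import deepcopy
from typing import Any, Dict, List, Optional


def make_plan(num_options: int, tensor_len: int) -> List[Dict[str, Any]]:
    # Hypothetical stand-in for List[ShardingOption]: each option owns one
    # large "tensor" (a plain list here) and a small list of per-shard ranks.
    return [
        {"name": f"table_{i}", "tensor": list(range(tensor_len)), "ranks": [None, None]}
        for i in range(num_options)
    ]


def avg_deepcopy_seconds(
    obj: Any, shared: Optional[List[Any]] = None, repeat: int = 3
) -> float:
    # Pre-seeding deepcopy's memo with {id(x): x} makes deepcopy reuse each x
    # instead of copying it, so the large payloads are shared, not copied.
    start = time.perf_counter()
    for _ in range(repeat):
        memo: Optional[Dict[int, Any]] = {id(x): x for x in shared} if shared else None
        deepcopy(obj, memo)
    return (time.perf_counter() - start) / repeat


plan = make_plan(num_options=10, tensor_len=20_000)
tensors = [option["tensor"] for option in plan]
full = avg_deepcopy_seconds(plan)
sharing_tensors = avg_deepcopy_seconds(plan, shared=tensors)
print(f"full deepcopy: {full:.4f}s, sharing tensors: {sharing_tensors:.4f}s")
```

Pre-seeding `memo` works because `deepcopy` returns any object already recorded in `memo` instead of copying it again.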

# Idea
(1) Reduce the number of deepcopy calls
When `List[ShardingOption]` is passed as a plan/proposal between the proposer, partitioner, and planner, `deepcopy(List[ShardingOption])` is performed twice (once in the proposer, once in the partitioner) to isolate the state of `List[ShardingOption]` between planning steps. Because deepcopy is expensive, only the deepcopy in the partitioner is kept. This reduces planning time from ~20 min to ~10 min.
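A minimal sketch of the reorganized flow, assuming heavily simplified hypothetical `Proposer`, `Partitioner`, `ShardingOption`, and `Shard` classes (not TorchRec's actual APIs or placement logic): the proposer hands out proposals without copying them, and the partitioner makes the one remaining deepcopy before it writes shard ranks.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Shard:
    # Hypothetical stand-in; the partitioner only writes `rank`.
    size: int
    rank: Optional[int] = None


@dataclass
class ShardingOption:
    # Hypothetical, heavily simplified stand-in for TorchRec's ShardingOption.
    name: str
    shards: List[Shard] = field(default_factory=list)


class Proposer:
    """Hands out candidate proposals without deep-copying them."""

    def __init__(self, candidates: List[List[ShardingOption]]) -> None:
        self._candidates = candidates

    def propose(self) -> Iterator[List[ShardingOption]]:
        for proposal in self._candidates:
            yield proposal  # before this change, a deepcopy was made here too


class Partitioner:
    """Keeps the single deepcopy so that assigning ranks never mutates the proposal."""

    def __init__(self, world_size: int) -> None:
        self._world_size = world_size

    def partition(self, proposal: List[ShardingOption]) -> List[ShardingOption]:
        plan = deepcopy(proposal)  # the only deepcopy left in the pipeline
        next_rank = 0
        for option in plan:
            for shard in option.shards:
                # Toy round-robin placement, purely for illustration.
                shard.rank = next_rank % self._world_size
                next_rank += 1
        return plan


proposal = [ShardingOption("t0", [Shard(10), Shard(10)]), ShardingOption("t1", [Shard(5)])]
plans = [Partitioner(world_size=2).partition(p) for p in Proposer([proposal]).propose()]
```

Because the partitioner copies before writing, later planning steps still see the unmodified proposal.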

(2) Custom `__deepcopy__` function in ShardingOption
Even with a single deepcopy in the partitioner, that copy still dominates the overall planning time. Looking into the partitioner code, we found that the only field it updates in ShardingOption is `shard.ranks`. To avoid spending time copying large objects (e.g. tensor, module), a custom `__deepcopy__` function is added to ShardingOption that deep-copies every field except `tensor` and `module`, which are shared with the original object. With this custom deepcopy function, planning time further decreases from ~10 min to ~2.5 min.
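The custom `__deepcopy__` can be sketched as follows, assuming a simplified hypothetical `ShardingOption` that stores the large objects in `_tensor` and `_module` (mirroring the field names in the PR description), with plain Python objects standing in for the tensor and module. Every other attribute, including the shards whose ranks the partitioner updates, is deep-copied through the shared `memo`.

```python
from copy import deepcopy
from typing import Any, Dict, List, Optional


class Shard:
    def __init__(self, size: int) -> None:
        self.size = size
        self.rank: Optional[int] = None  # written by the partitioner


class ShardingOption:
    # Hypothetical simplified stand-in; `_tensor` and `_module` hold the large
    # objects that should be shared rather than copied.
    def __init__(self, name: str, tensor: Any, module: Any, shards: List[Shard]) -> None:
        self.name = name
        self._tensor = tensor
        self._module = module
        self.shards = shards

    def __deepcopy__(self, memo: Dict[int, Any]) -> "ShardingOption":
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result  # register first so cycles resolve to this copy
        for key, value in self.__dict__.items():
            if key in ("_tensor", "_module"):
                setattr(result, key, value)  # share the original object, no copy
            else:
                setattr(result, key, deepcopy(value, memo))  # e.g. shards
        return result


original = ShardingOption("t0", [0.0] * 100_000, object(), [Shard(10), Shard(10)])
copied = deepcopy(original)
copied.shards[0].rank = 1  # only the copy's shard changes
```

Registering `result` in `memo` before copying the attributes follows the usual `__deepcopy__` convention, so any reference cycle back to the option resolves to the new copy instead of recursing.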

(3) Freeze `tensor` and `module` in ShardingOption
Because of idea (2), copies share `tensor` and `module` with the original object, so these fields in ShardingOption are made read-only to prevent modification during planning.
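Read-only fields can be sketched with getter-only properties over the private attributes, again on a hypothetical simplified `ShardingOption` rather than TorchRec's class.

```python
from typing import Any


class ShardingOption:
    # Hypothetical simplified stand-in exposing read-only `tensor` and `module`.
    def __init__(self, tensor: Any, module: Any) -> None:
        self._tensor = tensor
        self._module = module

    @property
    def tensor(self) -> Any:
        # No setter is defined, so `option.tensor = ...` raises AttributeError.
        return self._tensor

    @property
    def module(self) -> Any:
        return self._module


option = ShardingOption([1.0, 2.0], object())
try:
    option.tensor = [3.0]
except AttributeError:
    print("tensor is read-only")
```

A getter-only property only prevents rebinding the attribute; in-place changes to the shared object itself would still be visible to every copy, so planning code must also avoid mutating it.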

Reviewed By: bigning

Differential Revision: D39822605

fbshipit-source-id: eb99a8efe9ff580a88ebd6fa446059f223337e3a