
Refactor deepcopy logic to improve planning speed #665

Closed

ge0405 wants to merge 1 commit into pytorch:main from ge0405:export-D39822605

Contributor

ge0405 commented Sep 27, 2022

Summary:

Summary

When there are many candidate proposals, e.g. more than 1000 proposals from GreedyProposer for ads models in [fused, fused_uvm_caching] modes, the planner can take more than 20 min to find the best plan. The bottleneck is deepcopy(List[ShardingOption]) in the proposers and the partitioner. Reorganizing these deepcopy operations reduces the planning time to ~2.5 min.

Idea

(1) Reduce the number of deepcopies
When List[ShardingOption] is passed as a plan/proposal between the proposer, partitioner and planner, deepcopy(List[ShardingOption]) is called twice (once in the proposer, once in the partitioner) to isolate the state of List[ShardingOption] between planning steps. Because deepcopy is expensive, only the one in the partitioner is kept. This reduces the planning time from ~20 min to ~10 min.

(2) Custom deepcopy
During partitioning, the only field updated in ShardingOption is shard.ranks. To save time by not copying big objects (e.g. tensor, module), a custom __deepcopy__ function is added to ShardingOption that deep-copies only the shards; everything else is passed through as the original object. This reduces the planning time from ~10 min to ~2.5 min.

(3) Freeze _tensor and _module
Because with idea (2) every copy shares the original tensor and module objects, these fields in ShardingOption are made read-only so they cannot be modified during planning.

Differential Revision: D39822605

facebook-github-bot added CLA Signed fb-exported labels

Contributor

facebook-github-bot commented Sep 27, 2022

This pull request was exported from Phabricator. Differential Revision: D39822605



Refactor deepcopy logic to improve planning speed (#665)

3ff47f8

Summary:
Pull Request resolved: #665

# Summary
When there are many candidate proposals, e.g. more than 1000 proposals from GreedyProposer for ads models in [fused, fused_uvm_caching] modes, the planner can take more than 20 min to find the best plan. The bottleneck is `deepcopy(List[ShardingOption])` in the proposers and the partitioner. Reorganizing these deepcopy operations reduces the planning time to ~2.5 min.

# Idea
(1) Reduce the number of deepcopies
When `List[ShardingOption]` is passed as a plan/proposal between the proposer, partitioner and planner, `deepcopy(List[ShardingOption])` is called twice (once in the proposer, once in the partitioner) to isolate the state of `List[ShardingOption]` between planning steps. Because deepcopy is expensive, only the one in the partitioner is kept. This reduces the planning time from ~20 min to ~10 min.
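A minimal sketch of the single-deepcopy flow, using hypothetical simplified `Shard` and `ShardingOption` dataclasses and hypothetical `propose`/`partition` functions (this is not the TorchRec API). The proposer returns its options without copying; the partitioner makes the one deepcopy and assigns ranks on that copy, so the proposal it received stays unchanged for the next iteration. The round-robin rank assignment is only a placeholder for real partitioning logic.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Shard:
    size: List[int]
    rank: Optional[int] = None  # None means "not placed yet"


@dataclass
class ShardingOption:
    # Hypothetical, simplified stand-in for the planner's ShardingOption.
    name: str
    shards: List[Shard] = field(default_factory=list)


def propose(options: List[ShardingOption]) -> List[ShardingOption]:
    # No deepcopy here: the proposer hands out its options as-is.
    return options


def partition(proposal: List[ShardingOption], world_size: int) -> List[ShardingOption]:
    # The single deepcopy: ranks are assigned on the copy only,
    # so the caller's proposal is never mutated.
    plan = copy.deepcopy(proposal)
    next_rank = 0
    for option in plan:
        for shard in option.shards:
            shard.rank = next_rank % world_size  # placeholder round-robin placement
            next_rank += 1
    return plan


options = [ShardingOption("t0", [Shard([10, 4]), Shard([10, 4])])]
plan = partition(propose(options), world_size=2)
print([s.rank for s in plan[0].shards])     # [0, 1]
print([s.rank for s in options[0].shards])  # [None, None]
```

Keeping the copy in the partitioner, the step that actually mutates state, is enough to isolate planning iterations; copying again in the proposer only duplicates work.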

(2) Custom `__deepcopy__` function in ShardingOption
Even with a single deepcopy in the partitioner, that deepcopy still dominates the overall planning time. The partitioner code shows that the only field it updates in ShardingOption is `shard.ranks`. To avoid copying big objects (e.g. tensor, module), a custom `__deepcopy__` function is added to ShardingOption that deep-copies everything except `tensor` and `module`, which are passed through as the original objects. With this custom deepcopy function, the planning time drops further from ~10 min to ~2.5 min.
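A minimal sketch of such a `__deepcopy__`, assuming a hypothetical simplified `ShardingOption` that stores its large objects in `_tensor` and `_module` (plain Python objects stand in for the real tensor and module here). Those two attributes are passed through by reference; every other attribute, including `shards`, is deep-copied with the shared `memo` dict.

```python
import copy
from typing import Any, Dict, List, Optional


class Shard:
    def __init__(self, size: List[int], rank: Optional[int] = None) -> None:
        self.size = size
        self.rank = rank


class ShardingOption:
    # Hypothetical, simplified stand-in for the planner's ShardingOption.
    def __init__(self, name: str, tensor: Any, module: Any, shards: List[Shard]) -> None:
        self.name = name
        self._tensor = tensor  # large object: shared, never copied
        self._module = module  # large object: shared, never copied
        self.shards = shards   # small mutable state: deep-copied

    def __deepcopy__(self, memo: Dict[int, Any]) -> "ShardingOption":
        cls = self.__class__
        result = cls.__new__(cls)  # skip __init__ and fill attributes directly
        memo[id(self)] = result    # register early so cycles resolve to this copy
        for key, value in self.__dict__.items():
            if key in ("_tensor", "_module"):
                setattr(result, key, value)  # pass the original object through
            else:
                setattr(result, key, copy.deepcopy(value, memo))
        return result


original = ShardingOption("t0", tensor=[0.0] * 1_000_000, module=object(), shards=[Shard([10, 4])])
clone = copy.deepcopy(original)
clone.shards[0].rank = 3
print(clone._tensor is original._tensor)  # True: the big object is shared
print(original.shards[0].rank)            # None: the shards were copied
```

Sharing is only safe as long as no planning step mutates `_tensor` or `_module`, which is why the PR also makes those fields read-only.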

(3) Freeze `tensor` and `module` in ShardingOption
Because with idea (2) every copy shares the original `tensor` and `module` objects, these fields in ShardingOption are made read-only so they cannot be modified during planning.
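One common way to make such fields read-only in Python is a property without a setter over a private attribute, which matches the `_tensor`/`_module` naming used in this PR. The sketch below uses a hypothetical simplified class. Note that it prevents rebinding the attribute, not in-place mutation of the object it refers to.

```python
from typing import Any


class ShardingOption:
    # Hypothetical, simplified class: tensor and module are exposed through
    # read-only properties backed by private attributes.
    def __init__(self, tensor: Any, module: Any) -> None:
        self._tensor = tensor
        self._module = module

    @property
    def tensor(self) -> Any:
        return self._tensor

    @property
    def module(self) -> Any:
        return self._module


option = ShardingOption(tensor=[1.0, 2.0], module="ebc")
try:
    option.tensor = [3.0]  # no setter is defined, so this raises AttributeError
except AttributeError:
    print("tensor is read-only")
print(option.tensor)  # [1.0, 2.0]
```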

Reviewed By: bigning

Differential Revision: D39822605

fbshipit-source-id: eb99a8efe9ff580a88ebd6fa446059f223337e3a


facebook-github-bot closed this in 77db94c
