Dynamic Sharding API + Test for EBC, TW, ShardedTensor #2852
This pull request was exported from Phabricator. Differential Revision: D69095169
Summary:
Add an initial dynamic sharding API and test. This version supports EBC (`EmbeddingBagCollection`), TW (table-wise) sharding, and `ShardedTensor`. Other configurations (e.g. CW, RW, DTensor) will be added in the next few diffs.
Motivation for Dynamic Sharding: Doc [Work in Progress]
Design: [WIP]
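With table-wise (TW) sharding, each table lives entirely on one rank, so resharding a TW-sharded EBC amounts to moving whole tables between ranks. The sketch below shows how the tables that changed owner, and each rank's send/receive lists, could be derived before an all-to-all exchange (e.g. `torch.distributed.all_to_all`). It assumes a hypothetical plan format that maps each table name to its owning rank; this is not TorchRec's actual sharding-plan type, and `changed_tables` / `all_to_all_plan` are illustrative names, not functions from this diff.

```python
from typing import Dict, List, Tuple

# Hypothetical TW plan: table name -> rank that owns the whole table.
TWPlan = Dict[str, int]


def changed_tables(old: TWPlan, new: TWPlan) -> Dict[str, Tuple[int, int]]:
    """Return {table: (src_rank, dst_rank)} for tables whose owner changed."""
    return {
        table: (old[table], new[table])
        for table in sorted(old)
        if table in new and old[table] != new[table]
    }


def all_to_all_plan(
    old: TWPlan, new: TWPlan, world_size: int
) -> Tuple[List[List[str]], List[List[str]]]:
    """List, per rank, the tables it must send and the tables it will receive.

    send[r] holds tables leaving rank r; recv[r] holds tables arriving at rank r.
    Unchanged tables stay put and never enter the collective.
    """
    send: List[List[str]] = [[] for _ in range(world_size)]
    recv: List[List[str]] = [[] for _ in range(world_size)]
    for table, (src, dst) in changed_tables(old, new).items():
        send[src].append(table)
        recv[dst].append(table)
    return send, recv


if __name__ == "__main__":
    old_plan = {"table_0": 0, "table_1": 1, "table_2": 1}
    new_plan = {"table_0": 0, "table_1": 0, "table_2": 1}
    print(changed_tables(old_plan, new_plan))
    print(all_to_all_plan(old_plan, new_plan, world_size=2))
```

Only the moved tables participate in the exchange, which is the point of passing a diff (like `changed_sharding_params`) rather than the full plan.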
What's added here:
1. A `reshard` API, which implements the `update_shards` API for `ShardedEmbeddingBagCollection`.
2. Util functions for dynamic sharding, used by the `update_shards` API:
   1. `extend_shard_name`: extends `table_i` to `embedding_bags.table_i.weight`.
   2. `shards_all_to_all`: contains the all-to-all collective call that redistributes shards in a distributed environment, based on `changed_sharding_params`.
   3. `update_state_dict_post_resharding`: updates a given `state_dict` with the new shard `placements` and `local_shards`.
3. A multi-process unit test, `test_dynamic_sharding_ebc_tw`, that tests TW-sharded EBCs calling the `reshard` API, sampling from various `world_sizes`, `num_tables`, and `data_types`.
   1. This unit test also uses a few utils to generate random inputs and rank placements. A future TODO is to merge this input generation with the generate call in D71703434.

Future work items (features not yet supported in this diff):
* CW, RW, and many other sharding types
* Optimizer saving
* DTensor implementation
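For reference, the key naming done by `extend_shard_name` and the placement bookkeeping done by `update_state_dict_post_resharding` can be illustrated with plain dictionaries. This is a minimal sketch, not the code in this diff: the real utilities operate on `ShardedTensor` metadata and also swap `local_shards`, whereas here a hypothetical `placements` dict simply maps each `state_dict` key to a rank, and `update_placements_post_resharding` is an illustrative name.

```python
from typing import Dict


def extend_shard_name(shard_name: str) -> str:
    """Map a table name such as 'table_0' to its EBC state_dict key."""
    return f"embedding_bags.{shard_name}.weight"


def update_placements_post_resharding(
    placements: Dict[str, int], new_plan: Dict[str, int]
) -> Dict[str, int]:
    """Return a copy of `placements` (state_dict key -> rank) in which every
    table listed in `new_plan` (table name -> new rank) points at its new rank.

    Hypothetical simplification: only ranks are tracked; no tensors move.
    """
    updated = dict(placements)  # leave the caller's mapping untouched
    for table, rank in new_plan.items():
        key = extend_shard_name(table)
        if key in updated:
            updated[key] = rank
    return updated


if __name__ == "__main__":
    before = {
        "embedding_bags.table_0.weight": 0,
        "embedding_bags.table_1.weight": 1,
    }
    print(update_placements_post_resharding(before, {"table_1": 0}))
```

Returning a new mapping instead of mutating in place keeps the pre-reshard view available, e.g. for comparing outputs before and after resharding in a test.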
Differential Revision: D69095169