Add variable length (batch size) support to TBE training #1653
Conversation
✅ Deploy Preview for pytorch-fbgemm-docs canceled.
This pull request was exported from Phabricator. Differential Revision: D43259020
Summary: Pull Request resolved: pytorch#1653

This diff adds support for variable batch size (or variable length) in split TBE training on GPU (the extension is called "VBE").

VBE is enabled for the following use case:
- split (`SplitTableBatchedEmbeddingBagsCodegen`), and
- pooled (`pooling_mode != PoolingMode.NONE`), and
- weighted/unweighted, and
- rowwise Adagrad optimizer (`optimizer == OptimType.EXACT_ROWWISE_ADAGRAD`)

Important note: This feature is enabled only for this specific use case in order to keep the binary size of the FBGEMM library within limits. This diff adds ~40 MB to the library size.

**Usage:**

```
# Initialize TBE as before
emb_op = SplitTableBatchedEmbeddingBagsCodegen(
    embedding_specs=[...],
    ...  # other params
)

# Batch sizes (one for each FEATURE and each RANK).
# Example: num_features = 2, num_ranks = 4
batch_size_per_feature_per_rank = [
    [1, 2, 8, 3],   # batch sizes for [Rank 0, Rank 1, Rank 2, Rank 3] in Feature 0
    [6, 10, 3, 5],  # batch sizes for [Rank 0, Rank 1, Rank 2, Rank 3] in Feature 1
]

# Pass batch_size_per_feature_per_rank to forward.
# !! Make sure to pass batch_size_per_feature_per_rank as a keyword arg
# because there can be other keyword args in forward. !!
output = emb_op(
    indices, offsets, batch_size_per_feature_per_rank=batch_size_per_feature_per_rank
)
```

**Output format**

{F982891369}

**Limitation:** `T` and `max_B` have to fit in 32 bits.
- We use the lower `info_B_num_bits` bits to store `b` (bag ID; `b < max_B`). Supported `max_B` = `2^info_B_num_bits`.
- We use the upper `32 - info_B_num_bits` bits to store `t` (table ID; `t < T`). Supported `T` = `2^(32 - info_B_num_bits)`.

Note that we adjust `info_B_num_bits` automatically at runtime based on `max_B` and `T`. If they cannot fit into 32 bits, it will abort.

Reviewed By: jianyuh
Differential Revision: D43259020
fbshipit-source-id: 7a635d25962dd33fe7a52767b64978850d696380
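As a quick sanity check on the usage example above: the per-feature, per-rank batch sizes determine the total number of pooling bags. A minimal sketch, assuming the usual TBE convention that `offsets` has one entry per bag plus a trailing end offset (this PR does not spell that convention out):

```
# Sketch: total bags implied by the example batch sizes above.
# Assumes the usual TBE offsets convention (one entry per bag plus a
# trailing end offset); not stated explicitly in this PR.
batch_size_per_feature_per_rank = [
    [1, 2, 8, 3],   # Feature 0 -> 14 bags
    [6, 10, 3, 5],  # Feature 1 -> 24 bags
]
total_bags = sum(sum(bs) for bs in batch_size_per_feature_per_rank)
print(total_bags)                      # 38
expected_offsets_len = total_bags + 1  # `offsets` would then have 39 entries
```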
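To make the 32-bit limitation above concrete, here is a minimal sketch of the bag-ID/table-ID packing the bullets describe. The helper names (`choose_info_B_num_bits`, `pack_info`, `unpack_info`) are illustrative, not FBGEMM APIs, and the bit-width selection here is only one plausible realization of the automatic runtime adjustment:

```
# Illustrative sketch of the 32-bit info packing described above --
# not FBGEMM's actual kernel code.

def choose_info_B_num_bits(max_B: int, T: int, total_bits: int = 32) -> int:
    # Smallest number of low bits that can hold every bag ID b < max_B.
    b_bits = max(1, (max_B - 1).bit_length())
    t_bits = max(1, (T - 1).bit_length())
    if b_bits + t_bits > total_bits:
        # Mirrors the runtime abort when T and max_B cannot share 32 bits.
        raise RuntimeError("T and max_B do not fit in 32 bits")
    return b_bits

def pack_info(t: int, b: int, info_B_num_bits: int) -> int:
    # Upper 32 - info_B_num_bits bits hold the table ID t;
    # lower info_B_num_bits bits hold the bag ID b.
    return (t << info_B_num_bits) | b

def unpack_info(info: int, info_B_num_bits: int) -> tuple:
    b = info & ((1 << info_B_num_bits) - 1)
    t = info >> info_B_num_bits
    return t, b

# Example: T = 4 tables, max_B = 10 -> 4 bag-ID bits are enough.
bits = choose_info_B_num_bits(max_B=10, T=4)
assert unpack_info(pack_info(t=3, b=9, info_B_num_bits=bits), bits) == (3, 9)
```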
This pull request has been merged in f46904e.
Summary:
This diff adds the variable length (or variable batch size) support in split TBE training on GPU.

Usage:

```
# Initialize TBE as before
emb_op = SplitTableBatchedEmbeddingBagsCodegen(
    embedding_specs=[...],
    ...  # other params
)

# Batch sizes (one for each FEATURE).
# If `feature_table_map` is None, `len(Bs)` must be the same as `len(embedding_specs)`.
# If `feature_table_map` is not None, `len(Bs)` must be the same as `len(feature_table_map)`.
Bs = [2, 3, 4, 5]

# Pass a list of batch sizes to forward.
# !! Make sure to pass batch_sizes as a keyword arg because there can be
# other keyword args in forward. !!
output = emb_op(indices, offsets, batch_sizes=Bs)
```

Output

{F854479754}

Limitation: `T` and `max_B` have to fit in 32 bits.
- We use the lower `info_B_num_bits` bits to store `b` (bag ID; `b < max_B`). Supported `max_B` = `2^info_B_num_bits`.
- We use the upper `32 - info_B_num_bits` bits to store `t` (table ID; `t < T`). Supported `T` = `2^(32 - info_B_num_bits)`.

Note that we adjust `info_B_num_bits` automatically at runtime based on `max_B` and `T`. If they cannot fit into 32 bits, it will abort.

Differential Revision: D43259020
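A quick illustration of the `Bs` length rule from the usage comments above (a sketch with hypothetical values; `num_tables` stands in for `len(embedding_specs)`):

```
# Sketch of the length rule for Bs (hypothetical values).
num_tables = 3                    # stands in for len(embedding_specs)
feature_table_map = [0, 0, 1, 2]  # 4 features sharing 3 tables
Bs = [2, 3, 4, 5]                 # one batch size per FEATURE

# feature_table_map is not None -> len(Bs) must equal len(feature_table_map).
assert len(Bs) == len(feature_table_map)
# If feature_table_map were None, len(Bs) would have to equal num_tables.
```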