Add SplitTBE optimizer (defuse bwd and optim) #1821
Conversation
This pull request was exported from Phabricator. Differential Revision: D44772326
Summary: Pull Request resolved: pytorch#1821

This diff adds sparse optimizer op support to FBGEMM GPU. Before this diff, FBGEMM GPU provided optimizer support only through the TBE backward pass (i.e., TBE's backward was fused with the optimizer step). However, the fused backward-and-optimizer path blocked many exploration use cases, so this diff provides standalone sparse optimizer operators. We call them "`SplitTBE` optimizers" because they apply only to `SplitTBE`'s parameters.

**Limitations:**

- Only supports `SplitTBE`'s parameters
- Only supports `rowwise_adagrad`
- All embedding tables must have the same embedding dimension

**Usage:**

```
from fbgemm_gpu.split_embedding_optimizer_ops import (
    SplitEmbeddingArgs,
    SplitEmbeddingOptimizerParams,
    SplitEmbeddingRowwiseAdagrad,
)

# Init SplitTBE with optimizer=NONE so backward is not fused with an update
split_tbe = SplitTableBatchedEmbeddingBagsCodegen(
    embedding_specs=embedding_specs,
    optimizer=OptimType.NONE,
    feature_table_map=feature_table_map,
)

# Create arguments for the SplitTBE optimizer
params = SplitEmbeddingOptimizerParams(weights_dev=split_tbe.weights_dev)
embedding_args = SplitEmbeddingArgs(
    weights_placements=split_tbe.weights_placements,
    weights_offsets=split_tbe.weights_offsets,
    max_D=split_tbe.max_D,
)

# Init the SplitTBE optimizer
optim = SplitEmbeddingRowwiseAdagrad(
    params,
    embedding_args,
    embedding_specs,
    feature_table_map,
    learning_rate=lr,
    eps=eps,
    stochastic_rounding=stochastic_rounding,
)

# Invoke the optimizer's step
optim.step()
```

Reviewed By: jianyuh

Differential Revision: D44772326

fbshipit-source-id: 95adb292ee3a248c540b51f6ca3686dfb461c0a6
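For context on what "defuse bwd and optim" changes in practice: with the fused path, the weight update happened inside `loss.backward()`; with `optimizer=OptimType.NONE`, backward only produces gradients and the update moves into `optim.step()`. Below is a minimal sketch of a training iteration continuing the snippet above. `batches` and `compute_loss` are illustrative placeholders, and it assumes the backward leaves a gradient on `weights_dev` for `optim.step()` to consume:

```
for indices, offsets in batches:        # standard TBE lookup inputs
    out = split_tbe(indices, offsets)   # embedding forward
    loss = compute_loss(out)            # placeholder loss
    loss.backward()                     # gradients only; no fused weight update,
                                        # since optimizer=OptimType.NONE
    optim.step()                        # rowwise_adagrad update applied here
```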
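For reference, `rowwise_adagrad` keeps a single Adagrad accumulator per embedding row (the mean of the squared gradient across the row) instead of one per element, shrinking optimizer state from O(rows × D) to O(rows). A dense PyTorch sketch of the update rule, as an illustration of the math rather than the FBGEMM kernel (the function and argument names here are made up):

```
import torch

def rowwise_adagrad_reference(weights, grads, momentum, lr, eps=1.0e-8):
    # momentum holds one accumulator per row: a running sum of the
    # per-row mean of squared gradients.
    momentum += grads.pow(2).mean(dim=1)
    # Every element in a row shares the same adaptive step size.
    weights -= lr * grads / (momentum.sqrt() + eps).unsqueeze(1)
    return weights, momentum
```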
This pull request has been merged in 96c3711.