Add SplitTBE optimizer (defuse bwd and optim) #1821
Conversation
This pull request was exported from Phabricator. Differential Revision: D44772326
Summary: Pull Request resolved: pytorch#1821

This diff adds sparse optimizer op support to FBGEMM GPU. Before this diff, FBGEMM GPU provided optimizer support only through the TBE backward pass (i.e., TBE's backward was fused with the optimizer step). However, the fused backward-and-optimizer path blocked many exploration use cases, so this diff provides standalone sparse optimizer operators. We call them "`SplitTBE` optimizers" because they apply only to `SplitTBE`'s parameters.

**Limitations:**

- Only supports `SplitTBE`'s parameters
- Only supports `rowwise_adagrad`
- All embedding tables must have the same embedding dimension

**Usage:**

```
from fbgemm_gpu.split_embedding_optimizer_ops import (
    SplitEmbeddingArgs,
    SplitEmbeddingOptimizerParams,
    SplitEmbeddingRowwiseAdagrad,
)

# Init SplitTBE with optimizer=NONE so backward is not fused with an update
split_tbe = SplitTableBatchedEmbeddingBagsCodegen(
    embedding_specs=embedding_specs,
    optimizer=OptimType.NONE,
    feature_table_map=feature_table_map,
)

# Create arguments for the SplitTBE optimizer
params = SplitEmbeddingOptimizerParams(weights_dev=split_tbe.weights_dev)
embedding_args = SplitEmbeddingArgs(
    weights_placements=split_tbe.weights_placements,
    weights_offsets=split_tbe.weights_offsets,
    max_D=split_tbe.max_D,
)

# Init the SplitTBE optimizer
optim = SplitEmbeddingRowwiseAdagrad(
    params,
    embedding_args,
    embedding_specs,
    feature_table_map,
    learning_rate=lr,
    eps=eps,
    stochastic_rounding=stochastic_rounding,
)

# Invoke the optimizer's step
optim.step()
```

Reviewed By: jianyuh

Differential Revision: D44772326

fbshipit-source-id: 95adb292ee3a248c540b51f6ca3686dfb461c0a6
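For context on what "defuse bwd and optim" changes in practice: with the fused path, the weight update happened inside `loss.backward()`; with `optimizer=OptimType.NONE`, backward only produces gradients and the update moves into `optim.step()`. Below is a minimal sketch of a training iteration continuing the snippet above. `batches` and `compute_loss` are illustrative placeholders, and it assumes the backward leaves a gradient on `weights_dev` for `optim.step()` to consume:

```
for indices, offsets in batches:        # standard TBE lookup inputs
    out = split_tbe(indices, offsets)   # embedding forward
    loss = compute_loss(out)            # placeholder loss
    loss.backward()                     # gradients only; no fused weight update,
                                        # since optimizer=OptimType.NONE
    optim.step()                        # rowwise_adagrad update applied here
```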
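For reference, `rowwise_adagrad` keeps a single Adagrad accumulator per embedding row (the mean of the squared gradient across the row) instead of one per element, shrinking optimizer state from O(rows × D) to O(rows). A dense PyTorch sketch of the update rule, as an illustration of the math rather than the FBGEMM kernel (the function and argument names here are made up):

```
import torch

def rowwise_adagrad_reference(weights, grads, momentum, lr, eps=1.0e-8):
    # momentum holds one accumulator per row: a running sum of the
    # per-row mean of squared gradients.
    momentum += grads.pow(2).mean(dim=1)
    # Every element in a row shares the same adaptive step size.
    weights -= lr * grads / (momentum.sqrt() + eps).unsqueeze(1)
    return weights, momentum
```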
This pull request has been merged in 96c3711.