Add SplitTBE optimizer (defuse bwd and optim) #1821

Closed
wants to merge 1 commit

Conversation

sryap (Contributor) commented Jun 12, 2023

Differential Revision: D44772326

netlify bot commented Jun 12, 2023

Deploy Preview for pytorch-fbgemm-docs canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | b47dd83 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6493a9d11ea21900076a66d7 |

facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D44772326

sryap added a commit to sryap/FBGEMM that referenced this pull request Jun 12, 2023
Summary: Pull Request resolved: pytorch#1821

Differential Revision: D44772326

fbshipit-source-id: 268d468d731a6c8e629fa4c54d79860d4fe10a79

sryap added a commit to sryap/FBGEMM that referenced this pull request Jun 12, 2023
Summary:
Pull Request resolved: pytorch#1821

This diff adds sparse optimizer op support to FBGEMM GPU.  Before this
diff, FBGEMM GPU provided optimizer support only through the TBE
backward (i.e., TBE's backward pass was fused with the optimizer step).
However, fusing the backward and the optimizer step prevented many
exploration use cases.  Thus, this diff provides standalone sparse
optimizer operators.  We call them "`SplitTBE` optimizers" because they
are applicable only to `SplitTBE`'s parameters.

**Limitations**:
- Supports only `SplitTBE`'s parameters
- Supports only `rowwise_adagrad` (see the reference sketch below)
- All embedding tables must have the same embedding dimension
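
For reference, `rowwise_adagrad` keeps a single accumulator value per
embedding row.  A simplified sketch of the update rule (illustrative
names and shapes, not the actual FBGEMM kernel):

```
import torch

def rowwise_adagrad_update(weight_row, grad_row, state_row, lr, eps):
    # One accumulator per row: running sum of the row's mean squared gradient.
    state_row += grad_row.pow(2).mean()
    # Scale the row's gradient by the rowwise statistic and update the weights.
    weight_row -= lr * grad_row / (state_row.sqrt() + eps)
    return weight_row, state_row

# Toy example: one embedding row of dimension 4.
w = torch.zeros(4)
g = torch.ones(4)
s = torch.zeros(())
w, s = rowwise_adagrad_update(w, g, s, lr=0.1, eps=1.0e-8)
```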

**Usage**:

```
# Imports for the TBE and OptimType used below (import paths may vary
# across FBGEMM versions).
from fbgemm_gpu.split_embedding_configs import EmbOptimType as OptimType
from fbgemm_gpu.split_embedding_optimizer_ops import (
    SplitEmbeddingArgs,
    SplitEmbeddingOptimizerParams,
    SplitEmbeddingRowwiseAdagrad,
)
from fbgemm_gpu.split_table_batched_embeddings_ops import (
    SplitTableBatchedEmbeddingBagsCodegen,
)

# Init SplitTBE
split_tbe = SplitTableBatchedEmbeddingBagsCodegen(
    embedding_specs=embedding_specs,
    optimizer=OptimType.NONE,
    feature_table_map=feature_table_map,
)

# Create arguments for SplitTBE optimizer
params = SplitEmbeddingOptimizerParams(weights_dev=split_tbe.weights_dev)
embedding_args = SplitEmbeddingArgs(
    weights_placements=split_tbe.weights_placements,
    weights_offsets=split_tbe.weights_offsets,
    max_D=split_tbe.max_D,
)

# Init SplitTBE optimizer
optim = SplitEmbeddingRowwiseAdagrad(
    params,
    embedding_args,
    embedding_specs,
    feature_table_map,
    learning_rate=lr,
    eps=eps,
    stochastic_rounding=stochastic_rounding,
)

# Invoke optimizer's step
optim.step()
```
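
For context, a minimal sketch of how the defused flow fits into a training
step.  This assumes hypothetical `indices` and `offsets` index tensors and a
placeholder loss; with `OptimType.NONE`, the embedding update is left to the
separate `optim.step()` call rather than being fused into the backward:

```
# Minimal training-step sketch (hypothetical `indices`/`offsets` tensors and
# a stand-in loss, for illustration only).
output = split_tbe(indices, offsets)  # TBE forward; no optimizer is fused
loss = output.sum()                   # placeholder loss
loss.backward()                       # backward computes embedding gradients only
optim.step()                          # SplitTBE optimizer applies rowwise_adagrad
```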

Differential Revision: D44772326

fbshipit-source-id: fbe8c873c88a7783ab1341ab26f16142afea4ee7

sryap added a commit to sryap/FBGEMM that referenced this pull request Jun 12, 2023

sryap added a commit to sryap/FBGEMM that referenced this pull request Jun 20, 2023

sryap added a commit to sryap/FBGEMM that referenced this pull request Jun 20, 2023

sryap added a commit to sryap/FBGEMM that referenced this pull request Jun 21, 2023

sryap added a commit to sryap/FBGEMM that referenced this pull request Jun 21, 2023

sryap added a commit to sryap/FBGEMM that referenced this pull request Jun 21, 2023

sryap added a commit to sryap/FBGEMM that referenced this pull request Jun 21, 2023

sryap added a commit to sryap/FBGEMM that referenced this pull request Jun 22, 2023

facebook-github-bot (Contributor)

This pull request has been merged in 96c3711.
