Enable Global Weight Decay for VBE #2507

Closed · wants to merge 1 commit

Conversation

spcyppt (Contributor) commented Apr 17, 2024

Summary:
Enable Global weight decay for VBE

Usage:
Set:

optimizer = OptimType.EXACT_ROWWISE_ADAGRAD
weight_decay_mode = WeightDecayMode.DECOUPLE_GLOBAL

# For VBE, pass batch_size_per_feature_per_rank.
# Example: num_features = 2, num_ranks = 4
batch_size_per_feature_per_rank = [
    [1,  2, 8, 3],  # batch sizes for [Rank 0, Rank 1, Rank 2, Rank 3] in Feature 0
    [6, 10, 3, 5],  # batch sizes for [Rank 0, Rank 1, Rank 2, Rank 3] in Feature 1
]

For example:

tbe = SplitTableBatchedEmbeddingBagsCodegen(
    embedding_specs=[
        (E, D, managed_option, ComputeDevice.CUDA) for (E, D) in zip(Es, Ds)
    ],
    optimizer=OptimType.EXACT_ROWWISE_ADAGRAD,
    learning_rate=0.1,
    eps=0.1,
    output_dtype=output_dtype,
    pooling_mode=pooling_mode,
    weight_decay_mode=WeightDecayMode.DECOUPLE_GLOBAL,
)
output = tbe(indices, offsets, batch_size_per_feature_per_rank=batch_size_per_feature_per_rank)
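For reference, each feature's effective total batch size under VBE is simply the sum of its per-rank batch sizes. A minimal plain-Python sketch of that arithmetic (illustration only, not FBGEMM API):

```python
# Per-feature, per-rank batch sizes from the example above:
# 2 features, 4 ranks.
batch_size_per_feature_per_rank = [
    [1, 2, 8, 3],   # Feature 0
    [6, 10, 3, 5],  # Feature 1
]

# Each feature's total batch size is the sum across its ranks.
totals = [sum(per_rank) for per_rank in batch_size_per_feature_per_rank]
print(totals)  # [14, 24]
```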

Relevant diffs:
D53866750
D55660277
D55660762

Differential Revision: D56200676

netlify bot commented Apr 17, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

Latest commit: 7ec38f6
Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66845e471c5d0c0008b93f83
Deploy Preview: https://deploy-preview-2507--pytorch-fbgemm-docs.netlify.app

@facebook-github-bot (Contributor):

This pull request was exported from Phabricator. Differential Revision: D56200676


spcyppt added a commit to spcyppt/FBGEMM that referenced this pull request Apr 17, 2024
spcyppt added a commit to spcyppt/FBGEMM that referenced this pull request May 2, 2024
spcyppt added a commit to spcyppt/FBGEMM that referenced this pull request Jul 2, 2024
Reviewed By: sryap

Differential Revision: D56200676

@facebook-github-bot (Contributor):

This pull request has been merged in 114bb0d.
