
Conversation

lanlanfb (Contributor)

Summary:

  1. Fix the position_weighted optimizer: the position-weighted layer uses the default optimizer, but its gradient is actually a gradient_slice, which will cause problems if we do not handle it properly in the new optimizer. The solution is to use SparseAdagrad when the gradient is a gradient_slice (see the dispatch sketch after this list).
  2. Implement optimizer versions v1 and v2: first momentum, with and without bias correction (see the update-rule sketch below).
  3. Also implement decoupled weight decay in the new optimizer.
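
For item 1, a minimal sketch of the intended dispatch, assuming caffe2-style builder callables; `dense_builder` and `sparse_builder` are illustrative parameters, not the actual dper/optimizer API:

```python
from caffe2.python import core

def run_on_param(net, param_info, dense_builder, sparse_builder):
    # Position-weighted layers pick up the default optimizer but actually
    # produce sparse GradientSlice gradients (indices + values). A dense
    # update would mishandle those, so route them to SparseAdagrad instead.
    if isinstance(param_info.grad, core.GradientSlice):
        return sparse_builder(net, param_info)
    return dense_builder(net, param_info)
```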

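For items 2 and 3, a NumPy sketch of the update rule as described above, assuming an Adagrad-style second-moment accumulator; the names, defaults, and exact semantics of the DecayAdagrad operator are assumptions, not the operator's verified implementation:

```python
import numpy as np

def decay_adagrad_step(param, moment1, moment2, grad, t,
                       lr=0.01, beta1=0.9, epsilon=1e-8,
                       weight_decay=0.01, bias_correction=True):
    """One sketched DecayAdagrad update at iteration t (1-indexed)."""
    moment1 = beta1 * moment1 + (1.0 - beta1) * grad   # first momentum (EMA)
    moment2 = moment2 + grad * grad                    # Adagrad accumulator
    c = 1.0 - beta1 ** t if bias_correction else 1.0  # optional bias correction
    step = (moment1 / c) / (np.sqrt(moment2) + epsilon)
    # Decoupled weight decay: applied to the parameter directly rather than
    # folded into the gradient (AdamW-style).
    param = param - lr * (step + weight_decay * param)
    return param, moment1, moment2
```

Toggling `bias_correction` switches between the two variants (v1/v2) described above.
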
Test Plan:
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_2 -- test_mlp_optimization

buck test //caffe2/caffe2/python:optimizer_test -- TestDecayAdagrad

buck test //caffe2/caffe2/python/operator_test:decay_adagrad_test

ctr_mbl_feed workflow: f255731660
oc workflow: f255739503

Reviewed By: 0x10cxR1

Differential Revision: D26839668

… decoupled weight decay

fbshipit-source-id: 3e0a3646d8459c769caea19658217f1a32d539bb
facebook-github-bot (Contributor) commented Mar 12, 2021

💊 CI failures summary and remediations

As of commit 929ae4b (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-scanned failure(s)

ci.pytorch.org: 1 failed



facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D26839668

codecov bot commented Mar 12, 2021

Codecov Report

Merging #53881 (929ae4b) into master (8737c2a) will decrease coverage by 0.01%.
The diff coverage is n/a.

```diff
@@            Coverage Diff             @@
##           master   #53881      +/-   ##
==========================================
- Coverage   77.30%   77.30%   -0.01%     
==========================================
  Files        1888     1888              
  Lines      183589   183589              
==========================================
- Hits       141923   141918       -5     
- Misses      41666    41671       +5     
```

lanlanfb closed this Mar 13, 2021
facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D26839668

lanlanfb added a commit to lanlanfb/pytorch that referenced this pull request Mar 25, 2021
… decoupled weight decay (pytorch#54042)

fbshipit-source-id: 8a2170e317e695b861b1b1e566beb82ae0f08836
facebook-github-bot pushed a commit that referenced this pull request Mar 28, 2021
… decoupled weight decay (#54042)

fbshipit-source-id: 2b6881c1a88540ef5766be40f5e80001257e2199