
Conversation

@stmcgovern
Contributor

@stmcgovern stmcgovern commented Aug 29, 2025

Fixes #141884

This fixes the issue for all optimizers and parameter options.
A member function overwrite_from is added to the optimizer base class. Each optimizer then implements this function to compare its accepted parameters against its defaults. An SFINAE approach that handles the different optimizer parameters generically (in optimizer.h only) was evaluated, but I think this version is easier to review and maintain.

This mirrors the Python API up to one edge case. An example of the edge case is provided below.

Python can distinguish between (1) a key not present in the dict, meaning "not specified", and (2) a key present in the dict, meaning "explicitly set". The C++ implementation cannot.
The issue hinges on whether to track if a particular parameter was explicitly set by the user (the discrepancy arises when a constructor default is explicitly passed in).

Tracking this seems like it would take more intervention than it is worth (modifying TORCH_ARG to keep track, using std::optional for the parameter types, or bitset tracking) and was not pursued in the current PR. I'm happy to alter the design if appropriate.
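For illustration, here is a minimal sketch of the comparison-based merge idea, written as a free function over AdamOptions (the function name and exact logic are assumptions for illustration, not the PR's actual implementation):

```cpp
#include <torch/torch.h>

// Sketch only: a field still equal to its AdamOptions() constructor default is
// treated as "not specified by the user" and inherits the optimizer-level
// default; any other value is kept as the user's explicit choice.
void overwrite_from_defaults(torch::optim::AdamOptions& target,
                             const torch::optim::AdamOptions& optimizer_defaults) {
  const torch::optim::AdamOptions ctor_defaults{};  // constructor defaults
  if (target.lr() == ctor_defaults.lr()) {
    target.lr(optimizer_defaults.lr());
  }
  if (target.weight_decay() == ctor_defaults.weight_decay()) {
    target.weight_decay(optimizer_defaults.weight_decay());
  }
  if (target.eps() == ctor_defaults.eps()) {
    target.eps(optimizer_defaults.eps());
  }
}
```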

Example of edge case hinging on CONSTRUCTOR DEFAULTS vs OPTIMIZER DEFAULTS

  1. CONSTRUCTOR DEFAULTS:
    These are the values you get when calling AdamOptions()
    AdamOptions().lr() = 0.001
    AdamOptions().weight_decay() = 0
    AdamOptions().eps() = 1e-08

  2. OPTIMIZER DEFAULTS:
    These are the values the user chose when creating the optimizer
    User's optimizer defaults:
    optimizer.lr() = 0.005
    optimizer.weight_decay() = 0.1
    optimizer.eps() = 1e-07

  3. THE PROBLEM SCENARIO:
    User wants to add a parameter group with explicit weight_decay=0.0
    User sets: weight_decay(0)

  4. THE CONFUSION:
    Constructor default weight_decay: 0
    User's explicit weight_decay: 0
    Are they equal? YES

    Since they're equal, our overwrite_from() logic thinks:
    "User didn't set weight_decay explicitly, use optimizer default"

  5. CURRENT BEHAVIOR:
    Final weight_decay: 0.1
    User expected: 0
    Match? ❌ NO

=== KEY INSIGHT ===
Constructor defaults are built into the C++ class definition.
Optimizer defaults are chosen by the user at runtime. We want to respect the user's intention.
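For concreteness, the scenario above can be reproduced with the C++ API roughly as follows (a sketch; tensor shapes and hyperparameter values are placeholders):

```cpp
#include <torch/torch.h>
#include <memory>
#include <vector>

int main() {
  auto p1 = torch::randn({2, 2}, torch::requires_grad());
  auto p2 = torch::randn({2, 2}, torch::requires_grad());

  // Optimizer defaults chosen by the user: lr = 0.005, weight_decay = 0.1, eps = 1e-7.
  std::vector<torch::Tensor> params1{p1};
  torch::optim::Adam optim(
      params1, torch::optim::AdamOptions(0.005).weight_decay(0.1).eps(1e-7));

  // New param group with weight_decay explicitly set to 0, which happens to equal
  // the AdamOptions() constructor default, so a comparison-based merge cannot tell
  // "explicitly 0" apart from "not specified".
  optim.add_param_group(torch::optim::OptimizerParamGroup(
      {p2},
      std::make_unique<torch::optim::AdamOptions>(
          torch::optim::AdamOptions().weight_decay(0.0))));

  // Per the scenario above, the new group ends up with weight_decay = 0.1
  // (the optimizer default), while the Python API would keep the explicit 0.
  return 0;
}
```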

@pytorch-bot

pytorch-bot bot commented Aug 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161825

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 12 Pending

As of commit f5c263c with merge base 322091d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@stmcgovern stmcgovern changed the title from "141884 C++ API handle optimizer defaults" to "#141884 C++ API handle optimizer defaults" Sep 3, 2025
@stmcgovern stmcgovern changed the title from "#141884 C++ API handle optimizer defaults" to "C++ API handle optimizer defaults" Sep 3, 2025
@stmcgovern
Contributor Author

Hi @janeyx99, this is the PR I mentioned in issue #141884. How can I link the PR to the issue? I thought "Fixes #number" does that...

Contributor

@janeyx99 janeyx99 left a comment

Thanks for working on the fix, this is an interesting bug indeed. I left a comment on the overall approach. Furthermore, since these tests don't run on CI, could you post a paste of the C++ test results from a local run?

ASSERT_NEAR(group1_opts.lr(), 0.002, 1e-6); // Inherited
ASSERT_EQ(group1_opts.betas(), std::make_tuple(0.8, 0.88)); // Inherited
ASSERT_NEAR(group1_opts.eps(), 1e-12, 1e-15); // Inherited
ASSERT_NEAR(group1_opts.weight_decay(), 0.11, 1e-6); // Preserved
Contributor

How come these can't be ASSERT_EQ?

Contributor Author

Changed; the remaining ASSERT_NEAR usage is left in the serialization tests.

}

TEST(OptimTest, MergeWithDefaultOptions_AdamW) {
torch::manual_seed(0);
Contributor

Is this important to the test? The actual params won't matter, right?

Contributor Author

Right, removed.

"You must override it in your subclass of torch::optim::OptimizerCloneableOptions<YourOptimizerOptions>.");
}

void OptimizerOptions::overwrite_from(const OptimizerOptions& source) {
Contributor

Hi! Some high-level questions:
How come we need a whole new overwrite_from API?
From the top level, I would expect us to fix the base class so that the user-specified defaults override the original defaults and are then used in add_param_group, without the need for adding a new API.

Contributor Author

@stmcgovern stmcgovern Sep 20, 2025


Maybe we don't :) I've tried to provide a fix in the base class without adding a new API, but I don't see a way to do it without at least one new virtual function call.

@janeyx99 janeyx99 added the triaged label Sep 11, 2025
@stmcgovern stmcgovern force-pushed the 141884-optimizer-defaults-clean branch from a041b6f to d205fa6 Compare September 20, 2025 18:31
@stmcgovern
Contributor Author

stmcgovern commented Sep 20, 2025

Hi @janeyx99, thanks very much for your feedback. I've taken each one of your comments into consideration.
I've tried to find a complete fix in the base class without adding a new API. The core problem is that we need a way to track whether a parameter field was explicitly set or not (to mimic the Python dict behavior). I started with a function-pointer registry and string-based checking in the merge. Trying to move as much as possible to compile time led to (1) the CRTP pattern for static dispatch and (2) bitset tracking instead of string-hash checks, adopted for performance and ease of serialization.
This uses a new macro (which I also don't like), but it seems to solve the problem efficiently and completely (Python API parity) without introducing a new API or much runtime overhead.
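As a rough illustration of the bitset-tracking idea (struct and field names here are hypothetical, not the PR's implementation), each setter records that its field was touched:

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical sketch: track which fields the user explicitly set, so the
// merge can tell "explicitly set to the default value" apart from "never set".
struct TrackedAdamOptions {
  enum Field : std::size_t { kLr = 0, kWeightDecay, kEps, kNumFields };

  double lr_ = 1e-3;
  double weight_decay_ = 0.0;
  double eps_ = 1e-8;
  std::bitset<kNumFields> set_fields_;

  TrackedAdamOptions& lr(double v) { lr_ = v; set_fields_.set(kLr); return *this; }
  TrackedAdamOptions& weight_decay(double v) { weight_decay_ = v; set_fields_.set(kWeightDecay); return *this; }
  TrackedAdamOptions& eps(double v) { eps_ = v; set_fields_.set(kEps); return *this; }

  // Any field the user never set inherits the optimizer-level default.
  void merge_unset_from(const TrackedAdamOptions& optimizer_defaults) {
    if (!set_fields_.test(kLr)) lr_ = optimizer_defaults.lr_;
    if (!set_fields_.test(kWeightDecay)) weight_decay_ = optimizer_defaults.weight_decay_;
    if (!set_fields_.test(kEps)) eps_ = optimizer_defaults.eps_;
  }
};
```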

Here are the local optimizer tests (including the new ones). Is there a better way with gtest to see what each test actually ran (not just pass/fail)? Dropping the filter and running all 1020 tests passes too.

(base) [root@49023e5b8e19 pytorch]# ./build/bin/test_api --gtest_filter="*Optim*" -v
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = *Optim*-*_CUDA:*_MultiCUDA
[==========] Running 51 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 44 tests from OptimTest
[ RUN      ] OptimTest.OptimizerAccessors
[       OK ] OptimTest.OptimizerAccessors (1 ms)
[ RUN      ] OptimTest.OldInterface
[       OK ] OptimTest.OldInterface (0 ms)
[ RUN      ] OptimTest.XORConvergence_SGD
[       OK ] OptimTest.XORConvergence_SGD (655 ms)
[ RUN      ] OptimTest.XORConvergence_LBFGS
[       OK ] OptimTest.XORConvergence_LBFGS (434 ms)
[ RUN      ] OptimTest.XORConvergence_Adagrad
[       OK ] OptimTest.XORConvergence_Adagrad (250 ms)
[ RUN      ] OptimTest.XORConvergence_RMSprop
[       OK ] OptimTest.XORConvergence_RMSprop (236 ms)
[ RUN      ] OptimTest.XORConvergence_RMSpropWithMomentum
[       OK ] OptimTest.XORConvergence_RMSpropWithMomentum (703 ms)
[ RUN      ] OptimTest.XORConvergence_Adam
[       OK ] OptimTest.XORConvergence_Adam (264 ms)
[ RUN      ] OptimTest.XORConvergence_AdamWithAmsgrad
[       OK ] OptimTest.XORConvergence_AdamWithAmsgrad (278 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_Adam
[       OK ] OptimTest.ProducesPyTorchValues_Adam (92 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_AdamWithWeightDecay
[       OK ] OptimTest.ProducesPyTorchValues_AdamWithWeightDecay (96 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_AdamWithWeightDecayAndAMSGrad
[       OK ] OptimTest.ProducesPyTorchValues_AdamWithWeightDecayAndAMSGrad (99 ms)
[ RUN      ] OptimTest.XORConvergence_AdamW
[       OK ] OptimTest.XORConvergence_AdamW (267 ms)
[ RUN      ] OptimTest.XORConvergence_AdamWWithAmsgrad
[       OK ] OptimTest.XORConvergence_AdamWWithAmsgrad (266 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_AdamW
[       OK ] OptimTest.ProducesPyTorchValues_AdamW (94 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_AdamWWithoutWeightDecay
[       OK ] OptimTest.ProducesPyTorchValues_AdamWWithoutWeightDecay (95 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_AdamWWithAMSGrad
[       OK ] OptimTest.ProducesPyTorchValues_AdamWWithAMSGrad (99 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_Adagrad
[       OK ] OptimTest.ProducesPyTorchValues_Adagrad (78 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_AdagradWithWeightDecay
[       OK ] OptimTest.ProducesPyTorchValues_AdagradWithWeightDecay (83 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_AdagradWithWeightDecayAndLRDecay
[       OK ] OptimTest.ProducesPyTorchValues_AdagradWithWeightDecayAndLRDecay (83 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_RMSprop
[       OK ] OptimTest.ProducesPyTorchValues_RMSprop (82 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecay
[       OK ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecay (86 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecayAndCentered
[       OK ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecayAndCentered (94 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecayAndCenteredAndMomentum
[       OK ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecayAndCenteredAndMomentum (99 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_SGD
[       OK ] OptimTest.ProducesPyTorchValues_SGD (69 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecay
[       OK ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecay (71 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecayAndMomentum
[       OK ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecayAndMomentum (72 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecayAndNesterovMomentum
[       OK ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecayAndNesterovMomentum (80 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_LBFGS
[       OK ] OptimTest.ProducesPyTorchValues_LBFGS (68 ms)
[ RUN      ] OptimTest.ProducesPyTorchValues_LBFGS_with_line_search
[       OK ] OptimTest.ProducesPyTorchValues_LBFGS_with_line_search (298 ms)
[ RUN      ] OptimTest.ZeroGrad
[       OK ] OptimTest.ZeroGrad (0 ms)
[ RUN      ] OptimTest.ExternalVectorOfParameters
[       OK ] OptimTest.ExternalVectorOfParameters (0 ms)
[ RUN      ] OptimTest.AddParameter_LBFGS
[       OK ] OptimTest.AddParameter_LBFGS (0 ms)
[ RUN      ] OptimTest.CheckLRChange_StepLR_Adam
[       OK ] OptimTest.CheckLRChange_StepLR_Adam (0 ms)
[ RUN      ] OptimTest.CheckLRChange_ReduceLROnPlateau_Adam
[       OK ] OptimTest.CheckLRChange_ReduceLROnPlateau_Adam (0 ms)
[ RUN      ] OptimTest.MergeWithDefaultOptions_Adam
[       OK ] OptimTest.MergeWithDefaultOptions_Adam (0 ms)
[ RUN      ] OptimTest.MergeWithDefaultOptions_SGD
[       OK ] OptimTest.MergeWithDefaultOptions_SGD (0 ms)
[ RUN      ] OptimTest.MergeWithDefaultOptions_AdamW
[       OK ] OptimTest.MergeWithDefaultOptions_AdamW (0 ms)
[ RUN      ] OptimTest.MergeWithDefaultOptions_Adagrad
[       OK ] OptimTest.MergeWithDefaultOptions_Adagrad (0 ms)
[ RUN      ] OptimTest.MergeWithDefaultOptions_RMSprop
[       OK ] OptimTest.MergeWithDefaultOptions_RMSprop (0 ms)
[ RUN      ] OptimTest.MergeWithDefaultOptions_LBFGS
[       OK ] OptimTest.MergeWithDefaultOptions_LBFGS (0 ms)
[ RUN      ] OptimTest.MergeWithDefaultOptions_NoOptionsInheritance
[       OK ] OptimTest.MergeWithDefaultOptions_NoOptionsInheritance (0 ms)
[ RUN      ] OptimTest.SerializationPreservesFieldTracking_Adam
[       OK ] OptimTest.SerializationPreservesFieldTracking_Adam (9 ms)
[ RUN      ] OptimTest.SerializationPreservesFieldTracking_SGD
[       OK ] OptimTest.SerializationPreservesFieldTracking_SGD (0 ms)
[----------] 44 tests from OptimTest (5217 ms total)

[----------] 7 tests from SerializeTest
[ RUN      ] SerializeTest.Optim
[       OK ] SerializeTest.Optim (1 ms)
[ RUN      ] SerializeTest.Optim_Adagrad
[       OK ] SerializeTest.Optim_Adagrad (1 ms)
[ RUN      ] SerializeTest.Optim_SGD
[       OK ] SerializeTest.Optim_SGD (1 ms)
[ RUN      ] SerializeTest.Optim_Adam
[       OK ] SerializeTest.Optim_Adam (1 ms)
[ RUN      ] SerializeTest.Optim_AdamW
[       OK ] SerializeTest.Optim_AdamW (1 ms)
[ RUN      ] SerializeTest.Optim_RMSprop
[       OK ] SerializeTest.Optim_RMSprop (1 ms)
[ RUN      ] SerializeTest.Optim_LBFGS
[       OK ] SerializeTest.Optim_LBFGS (1 ms)
[----------] 7 tests from SerializeTest (12 ms total)

[----------] Global test environment tear-down
[==========] 51 tests from 2 test suites ran. (5230 ms total)
[  PASSED  ] 51 tests.
(base) [root@49023e5b8e19 pytorch]# 

@janeyx99
Contributor

Hmm, the reason I was hesitant about the first approach was that it required modifying every optimizer, which this new approach unfortunately still requires. If that is unavoidable, I think it is okay to have as simple a solution as possible, but keep it an internal detail rather than something users can see.

To that effect, I would prefer the solution with the fewest additions to the public API surface and the lowest complexity. If the original approach was cleaner, then we can have a private _override_defaults API that the constructors call and that users shouldn't have to care about, plus maybe some documentation for why the helper is necessary. What do you think?

@stmcgovern
Contributor Author

Thanks for your comments @janeyx99. That makes sense! I certainly agree that users should not have to be aware of these implementation details. I wasn't sure how much the runtime performance cost impacted your review; I will revisit and simplify.

@stmcgovern stmcgovern force-pushed the 141884-optimizer-defaults-clean branch from d205fa6 to 0c5cf7f Compare October 1, 2025 19:16
@stmcgovern
Contributor Author

Hi @janeyx99, here is my preferred approach. It does everything in optimizer.h/cpp, doesn't introduce any new API, and does most of the work at compile time. I've tried to follow the template metaprogramming style used in c10. C++20 concepts could smooth out some of the boilerplate once they become available in PyTorch. Local tests are passing. Please let me know what you think. Thanks!

Contributor

@janeyx99 janeyx99 left a comment

Much nicer! Can we privatize all the helpers?

@stmcgovern stmcgovern force-pushed the 141884-optimizer-defaults-clean branch from 0c5cf7f to 1f7aed3 Compare October 2, 2025 21:06
@stmcgovern
Contributor Author

Hi @janeyx99, I've changed the helpers. I hope I've answered your questions. The proposed approach is to use constructor defaults as a comparison baseline to detect user intent, then inherit from the optimizer defaults for unspecified fields.
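As an illustrative sketch of that comparison-baseline idea combined with SFINAE field detection (simplified, hypothetical helpers; not the PR's actual code):

```cpp
#include <type_traits>
#include <utility>

// Detect whether an options type exposes a weight_decay() accessor.
template <class T, class = void>
struct has_weight_decay : std::false_type {};

template <class T>
struct has_weight_decay<
    T,
    std::void_t<decltype(std::declval<T&>().weight_decay())>>
    : std::true_type {};

// If the field still equals its constructor default, treat it as "unspecified"
// and inherit the optimizer-level default; otherwise keep the user's value.
template <class Options>
void merge_weight_decay(Options& target, const Options& optimizer_defaults) {
  if constexpr (has_weight_decay<Options>::value) {
    const Options ctor_defaults{};
    if (target.weight_decay() == ctor_defaults.weight_decay()) {
      target.weight_decay(optimizer_defaults.weight_decay());
    }
  }
}
```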

};

template <typename Derived>
// Forward declarations for optimizer option types
Contributor

Can we make the following classes and structs private as well?

Contributor Author

Since these are forward declarations, I'm inclined not to change the style to use the prefix (it would require touching all optimizer files). I do have to change them to struct, which resolves the class/struct inconsistency causing the clang build failure.

janeyx99
janeyx99 previously approved these changes Oct 6, 2025
Contributor

@janeyx99 janeyx99 left a comment

Please do privatize as much as possible so we are not inadvertently growing our API surface.


// SFINAE field detection - detects optimizer fields using public accessor methods
template <class T, class Enable = void>
struct has_lr : std::false_type {};
Contributor

These structs too

Contributor Author

These helper structs are in the private part of the class OptimizerCloneableOptions.
Do you just want me to prefix all the implementation details I've added with an underscore? Is that just a style convention (happy to follow), or is there some codegen- or Python-binding-specific transformation?

@janeyx99
Contributor

janeyx99 commented Oct 6, 2025

Thank you so much for following through this change! We are very close to the end!

@stmcgovern stmcgovern force-pushed the 141884-optimizer-defaults-clean branch from 1f7aed3 to 41731f4 Compare October 7, 2025 23:00
@stmcgovern
Contributor Author

OK I think this is ready. Thanks for your feedback and help @janeyx99 !

janeyx99
janeyx99 previously approved these changes Oct 8, 2025
@janeyx99
Contributor

janeyx99 commented Oct 8, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Oct 8, 2025
@pytorchmergebot
Collaborator

PR targets viable/strict rather than main, refusing merge request

@amjames
Collaborator

amjames commented Oct 8, 2025

@pytorchbot merge -r main

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased 141884-optimizer-defaults-clean onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout 141884-optimizer-defaults-clean && git pull --rebase)

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Oct 8, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@izaitsevfb
Contributor

Hey, apologies for the revert, but your PR is causing undefined symbol errors when linking the cross-platform build targets internally at Meta:

ld.lld: error: undefined symbol: typeinfo for torch::optim::AdamOptions
ld.lld: error: undefined symbol: torch::optim::AdamOptions::AdamOptions(double)
ld.lld: error: undefined symbol: vtable for torch::optim::LBFGSOptions

(Similar errors for AdamW, Adagrad, RMSprop, LBFGS)

You probably need to move the _merge_by_comparison() implementation from the header to optimizer.cpp, where all optimizer option types are fully defined.
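For reference, a generic single-file sketch of the pattern being suggested (hypothetical names; not the actual PyTorch sources): declare the helper in the header, and define it where the concrete option types are complete.

```cpp
// --- conceptually: optimizer.h ---
struct OptionsBase {
  virtual ~OptionsBase() = default;
};
struct ConcreteOptions;  // concrete option types need not be complete in the header

// Only the declaration lives in the header...
void merge_by_comparison(OptionsBase& target, const OptionsBase& defaults);

// --- conceptually: optimizer.cpp ---
struct ConcreteOptions : OptionsBase {
  double weight_decay = 0.0;
};

void merge_by_comparison(OptionsBase& target, const OptionsBase& defaults) {
  // The dynamic_casts and the ConcreteOptions{} construction below need
  // ConcreteOptions' typeinfo, vtable, and constructor; defining this function
  // in the .cpp keeps those references in a translation unit that provides them.
  auto* t = dynamic_cast<ConcreteOptions*>(&target);
  const auto* d = dynamic_cast<const ConcreteOptions*>(&defaults);
  if (t != nullptr && d != nullptr &&
      t->weight_decay == ConcreteOptions{}.weight_decay) {
    t->weight_decay = d->weight_decay;  // unchanged field inherits the optimizer default
  }
}
```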

@facebook-github-bot
Contributor

@pytorchbot revert -m="Diff reverted internally" -c="ghfirst"

This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Oct 10, 2025
This reverts commit f332017.

Reverted #161825 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
@pytorchmergebot
Collaborator

@stmcgovern your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added the Reverted and ci-no-td labels Oct 10, 2025
@pytorch-bot pytorch-bot bot dismissed stale reviews from janeyx99 October 10, 2025 17:56

This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.

@stmcgovern
Contributor Author

Hey, apologies for the revert, but your PR is causing undefined symbol errors when linking the cross-platform build targets internally at Meta:

ld.lld: error: undefined symbol: typeinfo for torch::optim::AdamOptions
ld.lld: error: undefined symbol: torch::optim::AdamOptions::AdamOptions(double)
ld.lld: error: undefined symbol: vtable for torch::optim::LBFGSOptions

(Similar errors for AdamW, Adagrad, RMSprop, LBFGS)

You probably need to move the _merge_by_comparison() implementation from the header to optimizer.cpp, where all optimizer option types are fully defined.

Thanks for the information @izaitsevfb. I'll move the function as you suggest and open another PR.

Addresses PyTorch issue pytorch#141884 by implementing automatic parameter group
inheritance that achieves Python/C++ API parity without breaking changes.

- Uses comparison-based merging to infer user intent vs. default inheritance
- Uses C++17 SFINAE patterns following PyTorch conventions (matches c10/util/TypeTraits.h)
- Adds comprehensive tests for optimizer parameter group inheritance
@stmcgovern stmcgovern force-pushed the 141884-optimizer-defaults-clean branch from f5c263c to 5ae6650 Compare October 10, 2025 20:32
@pytorch-bot pytorch-bot bot removed the ciflow/trunk label Oct 10, 2025
@stmcgovern stmcgovern closed this Oct 10, 2025
@stmcgovern
Contributor Author

Follow-on PR is #165182

Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025

Labels

ci-no-td, Merged, open source, release notes: optim, Reverted, triaged
