fix: SGD results now reproducible #1541

Merged: thinkall merged 7 commits into microsoft:main from immu4989:flaml-fix-sgd-reproducibility on May 10, 2026

Conversation

immu4989 (Contributor) commented May 2, 2026

Why are these changes needed?

Summary

  • Seed random_state on SGDEstimator so SGDClassifier / SGDRegressor produce deterministic results (uses config.get("random_seed", 10242048), matching the LinearSVC and ElasticNet fixes).
  • Add "sgd" to test_reproducibility_of_classification_models and test_reproducibility_of_regression_models.
  • Intentionally do not add "sgd" to the *_underlying_* variants: SGDEstimator wraps the sklearn model with a Normalizer preprocessing step that the test helper does not replicate, so a bare-sklearn refit cannot match the wrapper's CV result. A short comment in each list documents this.
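
To illustrate the first bullet, here is a minimal sketch of the seeding pattern using plain scikit-learn. It is not the code in flaml/automl/model.py: the helper `fit_sgd` and the toy dataset are hypothetical, and only the `config.get("random_seed", 10242048)` default comes from the description above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

DEFAULT_SEED = 10242048  # default quoted in the PR description


def fit_sgd(config, X, y):
    """Hypothetical helper: fit an SGDClassifier with a seeded random_state."""
    params = dict(config)
    # Fall back to the fixed default when the config carries no seed.
    seed = params.pop("random_seed", DEFAULT_SEED)
    return SGDClassifier(random_state=seed, **params).fit(X, y)


X, y = make_classification(n_samples=200, n_features=10, random_state=0)
config = {"max_iter": 50, "tol": None}

first = fit_sgd(config, X, y)
second = fit_sgd(config, X, y)

# With the same seed, the per-epoch shuffling is identical, so the learned
# weights should match exactly across the two fits.
same_weights = np.array_equal(first.coef_, second.coef_)
print(same_weights)
```

Without a fixed `random_state`, SGD shuffles the training data differently on each fit, which is the source of the run-to-run variation this PR targets.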

Related issue number

Follows: #1369 (LGBM), #1374 (ElasticNet), #1376 (LinearSVC), #1364 (CatBoost).

Test plan

  • pytest test/automl/test_classification.py -k "reproducibility and sgd" — passes
  • pytest test/automl/test_regression.py -k "reproducibility and sgd" — passes

immu4989 (Contributor, Author) commented May 2, 2026

@microsoft-github-policy-service agree

Copilot AI left a comment

Pull request overview

This PR aims to make FLAML's SGDEstimator reproducible, aligning it with recent reproducibility fixes for other estimators in flaml/automl. It updates the SGD wrapper to seed the underlying sklearn model and extends the existing reproducibility test suites to cover SGD behavior.

Changes:

  • Seed SGDEstimator with a default random_state so SGDClassifier and SGDRegressor train deterministically.
  • Add "sgd" to the wrapper-level reproducibility tests for classification and regression.
  • Document why SGD is still excluded from the underlying-model parity tests due to its extra Normalizer preprocessing layer.
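
The last point can be sketched with plain scikit-learn. This is an illustration under assumptions, not FLAML's wrapper: modelling the Normalizer step as a `Pipeline` is a guess at the general shape, and `make_sgd` and the toy dataset are hypothetical names introduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

SEED = 10242048  # default seed quoted in the PR description


def make_sgd():
    """Hypothetical factory: a seeded SGDClassifier with a fixed epoch budget."""
    return SGDClassifier(random_state=SEED, max_iter=50, tol=None)


X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Bare model: trained on the raw features.
bare = make_sgd().fit(X, y)

# Wrapped model: each row is rescaled to unit norm before SGD sees it.
wrapped_a = make_pipeline(Normalizer(), make_sgd()).fit(X, y)
wrapped_b = make_pipeline(Normalizer(), make_sgd()).fit(X, y)

coef_a = wrapped_a.steps[-1][1].coef_
coef_b = wrapped_b.steps[-1][1].coef_

# The wrapped model is reproducible against itself ...
wrapper_is_reproducible = np.array_equal(coef_a, coef_b)
# ... but a bare refit learns different weights, so it cannot reproduce
# the wrapped model's scores.
bare_matches_wrapper = np.allclose(bare.coef_, coef_a)
print(wrapper_is_reproducible, bare_matches_wrapper)
```

A parity test that refits only the bare model would therefore fail even though the wrapper itself is deterministic, which is consistent with keeping "sgd" out of the underlying-model lists.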

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| flaml/automl/model.py | Adds deterministic seeding to SGDEstimator initialization. |
| test/automl/test_classification.py | Extends classification reproducibility coverage to SGD and documents the underlying-model test omission. |
| test/automl/test_regression.py | Extends regression reproducibility coverage to SGD and documents the underlying-model test omission. |

Comment thread flaml/automl/model.py Outdated
Comment thread test/automl/test_classification.py Outdated
Comment thread test/automl/test_regression.py Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread flaml/automl/model.py Outdated
Comment thread test/automl/test_classification.py Outdated
Comment thread test/automl/test_regression.py Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Comment thread test/automl/test_classification.py Outdated
Comment thread test/automl/test_regression.py Outdated
thinkall (Collaborator) left a comment

LGTM! Thanks.

immu4989 (Contributor, Author) commented May 9, 2026

Thanks for the review! All comments have been addressed and resolved. Ready to merge whenever convenient.

thinkall merged commit 1959d2f into microsoft:main on May 10, 2026 (13 checks passed)
immu4989 deleted the flaml-fix-sgd-reproducibility branch on May 11, 2026 17:09
immu4989 mentioned this pull request on May 11, 2026
7 tasks
3 participants