Is your feature request related to a problem? Please describe.
No response
Describe the solution you'd like
Background
@dannycg1996 has merged a series of reproducibility fixes for FLAML estimators:
These follow up on the long-standing reproducibility concern raised in #151.
Proposal
I'd like to extend that work and audit the remaining estimators in
`flaml/automl/model.py` that accept a `random_state`. After reading the
merged PRs above to confirm the pattern, my plan is to submit one focused
PR per estimator (or per class family where inheritance makes a single PR
appropriate), each with:
- the `__init__`/`config2params` change required to ensure a seeded run produces identical results across repeated fits
Estimators not yet covered
| Estimator | Notes |
| --- | --- |
| `RandomForestEstimator` | Has had a fixed default `random_state` since v1.1.0; want to verify end-to-end reproducibility |
| `ExtraTreesEstimator` | Inherits from `RandomForestEstimator` |
| `XGBoostEstimator` | Non-sklearn API path |
| `XGBoostSklearnEstimator` | sklearn API path |
| `XGBoostLimitDepthEstimator` | Inherits from `XGBoostSklearnEstimator` |
| `LRL1Classifier` | LR with L1 (relevant for the `liblinear`/`saga` solvers) |
| `LRL2Classifier` | LR with L2 (relevant for the `sag`/`saga` solvers) |
| `SGDEstimator` | Stochastic gradient descent |
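
To make the intended check concrete, here is a minimal sketch of the kind of reproducibility test each PR could carry: fit several fresh, identically seeded models and confirm their predictions match exactly. Everything named here is an assumption for illustration. `ToySeededModel` and `is_reproducible` are hypothetical and are not FLAML classes or helpers; the toy model only stands in for an estimator that accepts `random_state`.

```python
import random


class ToySeededModel:
    """Hypothetical stand-in for an estimator that accepts random_state."""

    def __init__(self, random_state=None):
        self.random_state = random_state
        self.weights_ = None

    def fit(self, X, y):
        # Use a private RNG so the result depends only on random_state.
        # With random_state=None the RNG is seeded from OS entropy instead.
        rng = random.Random(self.random_state)
        n_features = len(X[0])
        self.weights_ = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        return self

    def predict(self, X):
        return [sum(w * v for w, v in zip(self.weights_, row)) for row in X]


def is_reproducible(make_model, X, y, n_repeats=3):
    """Fit n_repeats fresh models and report whether all predictions match exactly."""
    runs = [make_model().fit(X, y).predict(X) for _ in range(n_repeats)]
    return all(run == runs[0] for run in runs[1:])


X = [[0.0, 1.0], [1.0, 0.5], [2.0, -1.0]]
y = [0, 1, 1]

# Seeded: every repeated fit should give the same predictions.
seeded_ok = is_reproducible(lambda: ToySeededModel(random_state=42), X, y)
# Unseeded: repeated fits are expected to differ.
unseeded_ok = is_reproducible(lambda: ToySeededModel(random_state=None), X, y)
print(seeded_ok, unseeded_ok)
```

In an actual PR the factory passed to `is_reproducible` would build the estimator under audit rather than the toy model, and the comparison might need to cover the fitted configuration as well as the predictions.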
Questions for maintainers
- Is this audit welcome, and is the per-estimator-PR approach preferred
over a single combined PR?
- Should the audit also cover the corresponding "flamlized" zero-shot
estimators in `flaml/default/estimator.py`, or are those out of scope?
- Are there estimators on the list above that you'd like me to skip
(e.g., known-deprecated paths)?
I plan to start with `RandomForestEstimator` once you confirm the
direction. Happy to adjust scope based on your guidance.
Additional context
## Why this matters to me
In production ML for HR risk modeling, reproducibility is a regulatory
and audit requirement — being able to re-derive the same model from the
same data and config is non-negotiable. FLAML is increasingly used in
those settings (including via Microsoft Fabric), so closing this gap
across the estimator family seems worthwhile.