Reproducibility audit: remaining estimators (follow-up to #1364, #1369, #1374, #1376) #1540

@immu4989

Description

Is your feature request related to a problem? Please describe.

No response

Describe the solution you'd like

Background

@dannycg1996 has merged a series of reproducibility fixes for FLAML estimators:

These follow up on the long-standing reproducibility concern raised in #151.

Proposal

I'd like to extend that work and audit the remaining estimators in
flaml/automl/model.py that accept a random_state. After reading the
merged PRs above to confirm the pattern, my plan is to submit one focused
PR per estimator (or per class family where inheritance makes a single PR
appropriate), each with:

Estimators not yet covered

| Estimator | Notes |
| --- | --- |
| RandomForestEstimator | Has fixed default random_state since v1.1.0; want to verify end-to-end reproducibility |
| ExtraTreesEstimator | Inherits from RandomForestEstimator |
| XGBoostEstimator | Non-sklearn API path |
| XGBoostSklearnEstimator | sklearn API path |
| XGBoostLimitDepthEstimator | Inherits from XGBoostSklearnEstimator |
| LRL1Classifier | LR with L1; relevant for liblinear/saga |
| LRL2Classifier | LR with L2; relevant for sag/saga |
| SGDEstimator | Stochastic gradient descent |
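
A minimal sketch of the kind of end-to-end check such an audit could run per estimator: fit several fresh instances with the same seed and require identical predictions. This is only an illustration under stated assumptions. `ToyStochasticEstimator` and `is_reproducible` are hypothetical names that are not part of FLAML, and the toy class merely stands in for any estimator that accepts a random_state and exposes fit/predict.

```python
import random
from typing import Callable, List, Optional, Sequence


class ToyStochasticEstimator:
    """Hypothetical stand-in for an estimator that accepts random_state."""

    def __init__(self, random_state: Optional[int] = None):
        self.random_state = random_state
        self.weights_: List[float] = []

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]):
        # All randomness flows through one generator seeded by random_state.
        rng = random.Random(self.random_state)
        self.weights_ = [rng.uniform(-1.0, 1.0) for _ in range(len(X[0]))]
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        # Linear score per row; exact float equality is what the check compares.
        return [sum(w * v for w, v in zip(self.weights_, row)) for row in X]


def is_reproducible(
    make_estimator: Callable[[int], ToyStochasticEstimator],
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    seed: int,
    n_runs: int = 3,
) -> bool:
    """Fit n_runs fresh estimators with the same seed and compare predictions."""
    reference = make_estimator(seed).fit(X, y).predict(X)
    return all(
        make_estimator(seed).fit(X, y).predict(X) == reference
        for _ in range(n_runs - 1)
    )


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.5, 2.0], [-1.0, 3.0]]
    y = [0.0, 1.0, 2.0]
    ok = is_reproducible(lambda s: ToyStochasticEstimator(random_state=s), X, y, seed=0)
    print("reproducible with fixed seed:", ok)
```

For a real audit, the factory passed as `make_estimator` would build the estimator under test instead of the toy class, and the comparison might also cover fitted attributes rather than predictions alone.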

Questions for maintainers

  1. Is this audit welcome, and is the per-estimator-PR approach preferred
    over a single combined PR?
  2. Should the audit also cover the corresponding "flamlized" zero-shot
    estimators in flaml/default/estimator.py, or are those out of scope?
  3. Are there estimators on the list above that you'd like me to skip
    (e.g., known-deprecated paths)?

I plan to start with RandomForestEstimator once you confirm the
direction. Happy to adjust scope based on your guidance.

Additional context

Why this matters to me

In production ML for HR risk modeling, reproducibility is a regulatory
and audit requirement — being able to re-derive the same model from the
same data and config is non-negotiable. FLAML is increasingly used in
those settings (including via Microsoft Fabric), so closing this gap
across the estimator family seems worthwhile.
