Reproducibility audit: remaining estimators (follow-up to #1364, #1369, #1374, #1376)

### Is your feature request related to a problem? Please describe.

_No response_

### Describe the solution you'd like

## Background

@dannycg1996 has merged a series of reproducibility fixes for FLAML estimators:

- #1364 — CatBoost metrics reproducibility
- #1369 — LGBM reproducibility
- #1374 — ElasticNetEstimator reproducibility
- #1376 — LinearSVC reproducibility

These follow up on the long-standing reproducibility concern raised in #151.

## Proposal

I'd like to extend that work and audit the remaining estimators in
`flaml/automl/model.py` that accept a `random_state`. Having read the
merged PRs above to confirm the pattern, I plan to submit one focused
PR per estimator (or per class family where inheritance makes a single PR
appropriate), each with:

- A reproducibility unit test in the style of the tests added in #1369 / #1374
- The minimal config / `__init__` / `config2params` change required to
  ensure a seeded run produces identical results across repeated fits
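
To make the test shape concrete, here is a minimal sketch of the kind of check I have in mind. It is not taken from the merged PRs: `check_reproducible` and `ToyEstimator` are hypothetical names used only for illustration, and the toy class stands in for a real FLAML estimator so the snippet runs with the standard library alone.

```python
import random
from typing import Callable, Sequence


def check_reproducible(make_estimator: Callable[[], object],
                       X: Sequence, y: Sequence, n_runs: int = 3) -> bool:
    """Fit a fresh estimator n_runs times; return True if all predictions match."""
    baseline = None
    for _ in range(n_runs):
        est = make_estimator()       # fresh instance each run, identical config
        est.fit(X, y)
        preds = list(est.predict(X))
        if baseline is None:
            baseline = preds         # first run defines the reference output
        elif preds != baseline:
            return False
    return True


class ToyEstimator:
    """Hypothetical stand-in for an estimator with a random_state; not part of FLAML."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        # random.Random(None) seeds from OS entropy, so unseeded fits differ.
        rng = random.Random(self.random_state)
        self.offset_ = rng.random()
        return self

    def predict(self, X):
        return [x + self.offset_ for x in X]


X, y = [0.0, 1.0, 2.0], [0, 1, 1]
seeded_ok = check_reproducible(lambda: ToyEstimator(random_state=0), X, y)
unseeded_ok = check_reproducible(lambda: ToyEstimator(), X, y)
print(seeded_ok, unseeded_ok)
```

In an actual PR the factory would presumably construct the estimator under audit with a fixed config instead of `ToyEstimator`, and the comparison might also cover the validation loss, following whatever the tests in #1369 / #1374 already assert.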

## Estimators not yet covered

| Estimator                    | Notes                                              |
|------------------------------|----------------------------------------------------|
| `RandomForestEstimator`      | Has fixed default `random_state` since v1.1.0; want to verify end-to-end reproducibility |
| `ExtraTreesEstimator`        | Inherits from `RandomForestEstimator`              |
| `XGBoostEstimator`           | Non-sklearn API path                               |
| `XGBoostSklearnEstimator`    | sklearn API path                                   |
| `XGBoostLimitDepthEstimator` | Inherits from `XGBoostSklearnEstimator`            |
| `LRL1Classifier`             | LR with L1 — relevant for `liblinear`/`saga`       |
| `LRL2Classifier`             | LR with L2 — relevant for `sag`/`saga`             |
| `SGDEstimator`               | Stochastic gradient descent                        |

## Questions for maintainers

1. Is this audit welcome, and is the per-estimator-PR approach preferred
   over a single combined PR?
2. Should the audit also cover the corresponding "flamlized" zero-shot
   estimators in `flaml/default/estimator.py`, or are those out of scope?
3. Are there estimators on the list above that you'd like me to skip
   (e.g., known-deprecated paths)?

I plan to start with `RandomForestEstimator` once you confirm the
direction. Happy to adjust scope based on your guidance.



### Additional context

## Why this matters to me

In production ML for HR risk modeling, reproducibility is a regulatory
and audit requirement — being able to re-derive the same model from the
same data and config is non-negotiable. FLAML is increasingly used in
those settings (including via Microsoft Fabric), so closing this gap
across the estimator family seems worthwhile.

