RFC SLEP006: verbose vs non-verbose declaration in meta-estimator

As the proposal and the implementation of meta-estimator routing (SLEP006) currently stand, a user who wants to use `sample_weight` has to be quite verbose when declaring the estimators. Take `AdaBoostClassifier` as an example, and imagine that `AdaBoostClassifier` used the sub-estimator's `score` method. The user would have to write:

``` python
est = AdaBoostClassifier(
    LogisticRegression()
    .set_fit_request(sample_weight=True)
    .set_score_request(sample_weight=True)
).fit(X, y, sample_weight=sw)
```

which is considerably more verbose than what users need to write today:

``` python
est = AdaBoostClassifier(LogisticRegression()).fit(X, y, sample_weight=sw)
```

There have been concerns about forcing users to write verbose code in cases where the current pattern seems perfectly reasonable.

Without overhauling SLEP006, there are three paths we can take:

#### Option 1: Helper function
We can introduce helper functions to simplify the above code. For instance, a `weighted` function could request `sample_weight` on every method of a given estimator that accepts `sample_weight`. The above code would then look like:

``` python
est = AdaBoostClassifier(weighted(LogisticRegression())).fit(X, y, sample_weight=sw)
```

and if the sub-estimator is a pipeline:

``` python
est = AdaBoostClassifier(
    make_pipeline(weighted(StandardScaler()), weighted(LogisticRegression()))
).fit(X, y, sample_weight=sw)
```

Making `weighted` work directly on a `Pipeline` (or other meta-estimators) would be tricky, since `set_fit_request` is only available on consumers and not on routers that do not consume metadata themselves. The user therefore needs to repeat the `weighted` call for every sub-estimator.
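A minimal sketch of what `weighted` could look like, under stated assumptions: `ToyConsumer` and `ToyRouter` are hypothetical plain-Python stand-ins that only mimic the `set_<method>_request` naming from SLEP006, and the `METHODS` tuple is an illustrative list, not scikit-learn's. The helper inspects each method's signature and calls the matching setter when `sample_weight` is a parameter. A router with no setters of its own is returned untouched, which is exactly the `Pipeline` limitation described above.

```python
import inspect


class ToyConsumer:
    """Hypothetical stand-in for a consumer such as LogisticRegression.

    It only mimics the ``set_<method>_request`` naming used by SLEP006;
    it is not scikit-learn's implementation.
    """

    def __init__(self):
        # method name -> {metadata name: requested?}
        self.requests = {}

    def set_fit_request(self, *, sample_weight=None):
        self.requests.setdefault("fit", {})["sample_weight"] = sample_weight
        return self

    def set_score_request(self, *, sample_weight=None):
        self.requests.setdefault("score", {})["sample_weight"] = sample_weight
        return self

    def fit(self, X, y, sample_weight=None):
        return self

    def score(self, X, y, sample_weight=None):
        return 0.0

    def predict(self, X):
        return X


class ToyRouter:
    """Hypothetical non-consuming router (think Pipeline): it has no
    set_<method>_request methods of its own."""

    def __init__(self, *steps):
        self.steps = steps

    def fit(self, X, y, **fit_params):
        return self


# Methods a meta-estimator might call on a sub-estimator (illustrative list).
METHODS = ("fit", "partial_fit", "predict", "predict_proba", "transform", "score")


def weighted(estimator):
    """Hypothetical helper: request sample_weight on every method that accepts it."""
    for name in METHODS:
        method = getattr(estimator, name, None)
        setter = getattr(estimator, f"set_{name}_request", None)
        if method is None or setter is None:
            continue  # method missing, or no request setter for it
        if "sample_weight" in inspect.signature(method).parameters:
            setter(sample_weight=True)
    return estimator


est = weighted(ToyConsumer())
print(est.requests)
# {'fit': {'sample_weight': True}, 'score': {'sample_weight': True}}

# A router is returned unchanged, so its steps still need their own weighted() call.
router = weighted(ToyRouter(ToyConsumer()))
print(router.steps[0].requests)  # {}
```

Duck typing on the setter names keeps the helper independent of any particular estimator class. Whether the same approach would work unchanged against real scikit-learn estimators with metadata routing enabled is an assumption this sketch does not test.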

#### Option 2: Different meta-estimators
Have two classes of meta-estimators (or, more precisely, routers).

In this scenario, we divide meta-estimators into two classes, _simple_ and _complex_. Simple routers are the ones that simply forward `**kwargs` to sub-estimators, and by default they assume the sub-estimators have requested that metadata. This simplifies users' code and keeps existing code that uses simple meta-estimators working, but it raises a few issues.

First, there would be two classes of meta-estimators, and the user would need to know which class each meta-estimator belongs to. It's also not clear what we should do if the user explicitly sets request values for metadata (we can probably respect those if present).

Another issue is that if a meta-estimator's behavior changes, it would need to become a _complex_ meta-estimator to stay backward compatible. This doesn't seem like a good pattern.
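To make the difference between the two classes concrete, here is a toy sketch. `ToyConsumer` and `ToyMetaEstimator` are hypothetical, and the `simple` flag stands in for the two router classes. An explicit request (`True` or `False`) is respected in both modes, following the suggestion above. An unset request is treated as "requested" by a _simple_ router, while the SLEP006-style router raises an error when metadata is passed without a request being set. A plain `ValueError` is used here for illustration only.

```python
class ToyConsumer:
    """Hypothetical consumer that records the sample_weight it receives."""

    def __init__(self):
        self.requests = {}  # method name -> {metadata name: True/False/None}
        self.seen = None

    def set_fit_request(self, *, sample_weight=None):
        self.requests.setdefault("fit", {})["sample_weight"] = sample_weight
        return self

    def fit(self, X, y, sample_weight=None):
        self.seen = sample_weight
        return self


class ToyMetaEstimator:
    """Hypothetical router; simple=True models the proposed 'simple' class."""

    def __init__(self, estimator, simple):
        self.estimator = estimator
        self.simple = simple

    def fit(self, X, y, **fit_params):
        requests = self.estimator.requests.get("fit", {})
        routed = {}
        for key, value in fit_params.items():
            request = requests.get(key)  # True, False, or None (unset)
            if request is None:
                if not self.simple:
                    # SLEP006-style: metadata passed without a request is an error.
                    raise ValueError(f"{key!r} was passed but its fit request is not set")
                request = True  # simple router: assume the sub-estimator wants it
            if request:
                routed[key] = value
        self.estimator.fit(X, y, **routed)
        return self


X, y, sw = [[0.0], [1.0]], [0, 1], [1.0, 2.0]

# Simple router: no declaration needed, as in today's code.
simple = ToyMetaEstimator(ToyConsumer(), simple=True).fit(X, y, sample_weight=sw)
print(simple.estimator.seen)  # [1.0, 2.0]

# An explicit False is still respected by the simple router.
declined = ToyConsumer().set_fit_request(sample_weight=False)
ToyMetaEstimator(declined, simple=True).fit(X, y, sample_weight=sw)
print(declined.seen)  # None

# SLEP006-style router: the request must be declared explicitly.
try:
    ToyMetaEstimator(ToyConsumer(), simple=False).fit(X, y, sample_weight=sw)
except ValueError as exc:
    print(exc)  # 'sample_weight' was passed but its fit request is not set
```

The two modes differ only in how an unset request is interpreted. That single difference is what users would have to keep track of for every meta-estimator they use.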

#### Option 3: Keep as is
Do nothing and keep the API as it is.

I'm in favor of option 1 because:
- with the helper function, user code doesn't look too verbose
- using metadata is not a beginner-level feature, so this API does not hamper beginners' experience with the library
- it keeps meta-estimators and consumers consistent

xref: https://github.com/scikit-learn/scikit-learn/pull/22986#pullrequestreview-1005144559

RFC SLEP006: verbose vs non-verbose declaration in meta-estimator #23928