Description
As the proposal and the implementation of meta-estimator routing (SLEP006) stand, a user who wants to use sample_weight needs to be quite verbose in how they declare the estimators. Taking AdaBoostClassifier as an example, and supposing AdaBoostClassifier also used the sub-estimator's score method, the user would have to write:
est = (
    AdaBoostClassifier(
        LogisticRegression()
        .set_fit_request(sample_weight=True)
        .set_score_request(sample_weight=True)
    )
    .fit(X, y, sample_weight=sw)
)
which is considerably more verbose than the code users currently need to write:
est = AdaBoostClassifier(LogisticRegression()).fit(X, y, sample_weight=sw)
There have been concerns about making users write quite verbose code in cases where the current pattern seems quite reasonable.
Without changing everything related to SLEP006, there are three paths we can take:
Option 1: Helper function
We can introduce helper functions to make the above code simpler. For instance, a weighted function could request sample_weight on all methods which accept sample_weight for a given estimator. Then the above code would look like:
est = AdaBoostClassifier(weighted(LogisticRegression())).fit(X, y, sample_weight=sw)
and if the sub-estimator is a pipeline:
est = AdaBoostClassifier(
    make_pipeline(weighted(StandardScaler()), weighted(LogisticRegression()))
).fit(X, y, sample_weight=sw)
Implementing weighted for a Pipeline (or other meta-estimators) would be tricky since set_fit_request is only available for consumers and not non-consumer routers; therefore the user needs to repeat the weighted call for all sub-estimators.
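To make the idea concrete, here is a minimal sketch of what such a helper might look like. It is not an existing scikit-learn function: the name weighted comes from this proposal, the METHODS tuple is an assumed, non-exhaustive list, and ToyConsumer is a hypothetical stand-in for a consumer estimator so the example runs without any dependencies. The only thing it relies on is the set_{method}_request naming pattern shown above.

```python
import inspect

# Assumed, non-exhaustive list of methods to inspect for a sample_weight parameter.
METHODS = ("fit", "partial_fit", "score")


def weighted(estimator):
    """Request sample_weight on every listed method of `estimator` that accepts it."""
    for name in METHODS:
        method = getattr(estimator, name, None)
        setter = getattr(estimator, f"set_{name}_request", None)
        if method is None or setter is None:
            # The estimator lacks this method or cannot set a request for it.
            continue
        if "sample_weight" in inspect.signature(method).parameters:
            setter(sample_weight=True)
    return estimator


class ToyConsumer:
    """Hypothetical stand-in for a consumer; records requests in a plain dict."""

    def __init__(self):
        self.requests = {}

    def fit(self, X, y, sample_weight=None):
        return self

    def predict(self, X):
        # No sample_weight here, and no request setter either.
        return [0 for _ in X]

    def score(self, X, y, sample_weight=None):
        return 1.0

    def set_fit_request(self, *, sample_weight=None):
        self.requests["fit"] = sample_weight
        return self

    def set_score_request(self, *, sample_weight=None):
        self.requests["score"] = sample_weight
        return self


est = weighted(ToyConsumer())
print(est.requests)  # {'fit': True, 'score': True}
```

Because the helper only looks for request setters on the object it is given, it does nothing useful when handed a router that has no such setters, which is the Pipeline limitation described above.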
Option 2: Different meta-estimators
Have two classes of meta-estimators (or routers to be specific).
In this scenario, we divide meta-estimators into two classes, simple and complex. Simple routers are the ones which simply forward **kwargs to sub-estimators, and by default they assume sub-estimators have requested that metadata. This simplifies users' code and lets existing code for simple meta-estimators keep working, but it raises a few issues.
First is that there will be two classes of meta-estimators, and the user would need to know which estimator is of which class. It's also not clear what we should do if the user explicitly sets request values for metadata (we can probably respect those if present).
Another issue is that if a meta-estimator changes behavior, it needs to become a complex meta-estimator if we want to keep backward compatibility for it. This doesn't seem like a good pattern.
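As a rough illustration of the "simple router" behaviour, the sketch below forwards every keyword argument to the sub-estimator unless the user has explicitly opted out. Everything in it is hypothetical: SimpleRouter, RecordingEstimator, and the requests dict are toy names invented for this example, and respecting an explicit False is just one possible answer to the open question above, not a decided design.

```python
class RecordingEstimator:
    """Hypothetical sub-estimator that remembers the sample_weight it received."""

    def __init__(self, requests=None):
        # Explicit per-metadata requests set by the user; empty means "not set".
        self.requests = requests or {}
        self.seen = None

    def fit(self, X, y, sample_weight=None):
        self.seen = sample_weight
        return self


class SimpleRouter:
    """Hypothetical 'simple' meta-estimator: forwards **kwargs by default."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, **kwargs):
        explicit = getattr(self.estimator, "requests", {})
        # Assume metadata is requested unless the user explicitly said False.
        routed = {
            key: value
            for key, value in kwargs.items()
            if explicit.get(key, True) is not False
        }
        self.estimator.fit(X, y, **routed)
        return self


X, y, sw = [[0], [1]], [0, 1], [1.0, 2.0]

default = RecordingEstimator()
SimpleRouter(default).fit(X, y, sample_weight=sw)

opted_out = RecordingEstimator(requests={"sample_weight": False})
SimpleRouter(opted_out).fit(X, y, sample_weight=sw)

print(default.seen, opted_out.seen)
```

A complex router would instead forward nothing until a request is set, so the same user code can behave differently depending on which class the meta-estimator falls in.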
Option 3: Keep as is
Do nothing and leave things as they are.
I'm in favor of option 1 because:
- with the helper function the user code doesn't look too verbose
- using metadata is not a beginner kinda thing and therefore this API is not hampering beginners' experience with the library
- it keeps consistency among meta-estimators/consumers
xref: #22986 (review)