
[BUG] make_forecasting_scorer fails for wrapped sklearn metrics on newer sklearn versions #5715

Closed
tvdboom opened this issue Jan 9, 2024 · 8 comments · Fixed by #5717
Labels: bug (Something isn't working), module:metrics&benchmarking (metrics and benchmarking modules)


tvdboom commented Jan 9, 2024

Describe the bug
The evaluate function fails when using a custom metric converted from sklearn via make_forecasting_scorer.

To Reproduce
(I know mean_absolute_error is also present in sktime's API, but that's beside the point)

from sktime.datasets import load_airline
from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.naive import NaiveForecaster
from sktime.split import ExpandingWindowSplitter
from sktime.performance_metrics.forecasting import make_forecasting_scorer
from sklearn.metrics import mean_absolute_error

y = load_airline()

print(evaluate(
    forecaster=NaiveForecaster(),
    cv=ExpandingWindowSplitter(),
    y=y,
    scoring=make_forecasting_scorer(mean_absolute_error, name="mae", greater_is_better=False),
    error_score="raise",
))

Results in:

    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:\repos\ATOM\test_ts.py", line 39, in <module>
    print(evaluate(
          ^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\forecasting\model_evaluation\_functions.py", line 623, in evaluate
    results = parallelize(
              ^^^^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\utils\parallel.py", line 72, in parallelize
    ret = para_fun(
          ^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\utils\parallel.py", line 92, in _parallelize_none
    ret = [fun(x, meta=meta) for x in iter]
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\utils\parallel.py", line 92, in <listcomp>
    ret = [fun(x, meta=meta) for x in iter]
           ^^^^^^^^^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\forecasting\model_evaluation\_functions.py", line 270, in _evaluate_window
    raise e
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\forecasting\model_evaluation\_functions.py", line 262, in _evaluate_window
    score = metric(y_test, y_pred, y_train=y_train)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\performance_metrics\forecasting\_classes.py", line 169, in __call__
    return self.evaluate(y_true, y_pred, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\performance_metrics\forecasting\_classes.py", line 212, in evaluate
    out_df = self._evaluate(y_true=y_true_inner, y_pred=y_pred_inner, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\performance_metrics\forecasting\_classes.py", line 587, in _evaluate
    res = func(y_true=y_true, y_pred=y_pred, **params)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sklearn\utils\_param_validation.py", line 192, in wrapper
    params = func_sig.bind(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ROB6874\AppData\Local\Programs\Python\Python311\Lib\inspect.py", line 3212, in bind
    return self._bind(args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ROB6874\AppData\Local\Programs\Python\Python311\Lib\inspect.py", line 3201, in _bind
    raise TypeError(
TypeError: got an unexpected keyword argument 'func'

Expected behavior

No error, and results for the mean_absolute_error metric.

Additional context

Versions

System:
python: 3.11.6 (tags/v3.11.6:8b6ee5b, Oct 2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)]
executable: C:\repos\ATOM\venv311\Scripts\python.exe
machine: Windows-10-10.0.22621-SP0
Python dependencies:
pip: 23.3.2
sktime: 0.25.0
sklearn: 1.3.2
skbase: 0.6.1
numpy: 1.26.3
scipy: 1.11.4
pandas: 2.1.4
matplotlib: 3.8.2
joblib: 1.3.2
numba: 0.58.1
statsmodels: 0.14.0
pmdarima: 2.0.4
statsforecast: 1.6.0
tsfresh: None
tslearn: None
torch: 2.1.1+cpu
tensorflow: 2.15.0
tensorflow_probability: None

@tvdboom tvdboom added the bug Something isn't working label Jan 9, 2024
@fkiraly fkiraly added this to Needs triage & validation in Bugfixing via automation Jan 9, 2024

fkiraly commented Jan 9, 2024

Confirmed on current main, python 3.11, Windows.

It seems like the bug is caused by sklearn's recently introduced parameter validation for performance metrics.

In custom metrics, arguments of the class are passed on to the metric if and only if the signature of the metric accepts them.

The problem is that the signature of the decorated mean_absolute_error now accepts kwargs, but the decorator raises an error on kwarg keys that are not allowed. It is an odd design choice imo that this does not use python's own parameter validation, but it is one we have to cope with.

The solution would be to control more stringently what _DynamicForecastingErrorMetric passes on to func.
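
For illustration only, a minimal sketch of the kind of stricter filtering meant here; the helper name _filter_kwargs and the reliance on __wrapped__ (set by functools.wraps-style decorators) are assumptions for this sketch, not the actual #5717 implementation:

import inspect


def _filter_kwargs(func, kwargs):
    """Keep only the keyword arguments the underlying function names explicitly."""
    inner = inspect.unwrap(func)  # follows __wrapped__ chains left by functools.wraps
    sig = inspect.signature(inner)
    allowed = {
        name
        for name, p in sig.parameters.items()
        if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
    }
    return {k: v for k, v in kwargs.items() if k in allowed}

The metric class would then call func(y_true=y_true, y_pred=y_pred, **_filter_kwargs(func, params)) instead of forwarding **params unconditionally.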


fkiraly commented Jan 9, 2024

The horror comes in through this decorator in sklearn; the MAE is now decorated:

@validate_params(
    {
        "y_true": ["array-like"],
        "y_pred": ["array-like"],
        "sample_weight": ["array-like", None],
        "multioutput": [StrOptions({"raw_values", "uniform_average"}), "array-like"],
    },
    prefer_skip_nested_validation=True,
)
def mean_absolute_error(
    y_true, y_pred, *, sample_weight=None, multioutput="uniform_average"
):

Due to the decorator, it is no longer easy to parse which arguments the function accepts: python native inspection will not produce the signature of the function, but that of the decorator's wrapper, which, unlike in earlier sklearn versions, has **kwargs.
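
For reference, a small inspection experiment on the reported sklearn version (1.3.2); the printed results are what I would expect given a (*args, **kwargs) wrapper applied with functools.wraps, so treat them as assumptions rather than verified output:

import inspect

from sklearn.metrics import mean_absolute_error

# getfullargspec does not follow __wrapped__, so it reports the wrapper itself
print(inspect.getfullargspec(mean_absolute_error))
# expected: FullArgSpec(args=[], varargs='args', varkw='kwargs', ...)

# inspect.signature follows __wrapped__ by default, so it may still recover the
# original signature if the decorator applies functools.wraps
print(inspect.signature(mean_absolute_error))
# expected: (y_true, y_pred, *, sample_weight=None, multioutput='uniform_average')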


fkiraly commented Jan 9, 2024

minimal example:

import pandas as pd

from sklearn.metrics import mean_absolute_error

from sktime.performance_metrics.forecasting import make_forecasting_scorer

scorer = make_forecasting_scorer(mean_absolute_error, name="RMSLE")

scorer.evaluate(pd.Series([1, 2, 3]), pd.Series([1, 2, 4]))
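
On the reported versions, this call should fail with the same TypeError as in the traceback above (got an unexpected keyword argument 'func'), since the decorated metric appears to accept **kwargs and therefore receives the scorer's own parameters.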

@fkiraly fkiraly changed the title [BUG] The evaluate function fails for custom metrics [BUG] make_forecasting_scorer fails for wrapped sklearn metrics on newer sklearn versions Jan 9, 2024
@fkiraly fkiraly added the module:metrics&benchmarking metrics and benchmarking modules label Jan 9, 2024
@fkiraly fkiraly moved this from Needs triage & validation to Under review in Bugfixing Jan 9, 2024

fkiraly commented Jan 9, 2024

Fixed here:
#5717

I consider the fix problematic, as it reads out private attributes of the sklearn decorator, but I cannot find a clear alternative.

A big question would be: is there a programmatic, public, stable way to ask an sklearn function "which arguments do you take"? Their new decorator destroys the standard, python native way without replacing it with a clear alternative, which seems like problematic design to me.


tvdboom commented Jan 9, 2024

I see. Thanks for working so quickly on the fix.


fkiraly commented Jan 9, 2024

Understanding the bug also suggests a temporary workaround: use a wrapped ("de-decorated") mean_absolute_error instead of the original:

def mean_absolute_error(y_true, y_pred, sample_weight=None, multioutput="uniform_average"):
    # plain wrapper with an inspectable signature and no **kwargs
    from sklearn.metrics import mean_absolute_error as _mae

    return _mae(y_true, y_pred, sample_weight=sample_weight, multioutput=multioutput)
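
As a sketch (not verified end to end), the de-decorated wrapper above can then be handed to make_forecasting_scorer exactly as in the minimal example; since its signature has no **kwargs, only explicitly named arguments should be forwarded to it:

import pandas as pd

from sktime.performance_metrics.forecasting import make_forecasting_scorer

scorer = make_forecasting_scorer(mean_absolute_error, name="mae", greater_is_better=False)

# expected to return the MAE (about 0.33 here) instead of raising a TypeError
print(scorer.evaluate(pd.Series([1, 2, 3]), pd.Series([1, 2, 4])))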

@yarnabrina (Collaborator) commented, quoting the comment above:
> The horror comes in through this decorator in sklearn; the MAE is now decorated: [...]
> Due to the decorator, it is no longer easy to parse which arguments the function accepts [...]

If I may ask, can you please explain what the issue is here? The keys of the decorator's first argument and the parameters in the function definition look identical to me. There is no **kwargs in the signature, so why are we expecting that?

(I haven't seen sktime or sklearn code, asking just from noticing this issue.)


fkiraly commented Jan 10, 2024

@yarnabrina, the decorator validate_params accepts **kwargs, but fails if any are passed other than those of mean_absolute_error.

Naive inspection of the decorated mean_absolute_error (via inspect.signature or similar), asking the question "which parameters do you accept", therefore fails.

For interface uniformity, we require that all metrics potentially accept things like y_train; if a metric does not use it, it is simply ignored (without an exception being raised). The sktime class must decide whether or not to pass y_train on to the wrapped metric, and so far that was done via inspection, under the assumption that an argument should be passed iff it appears in the signature, or **kwargs is present.

But that condition is no longer valid: with the decorator, **kwargs is present, yet passing y_train still fails.
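
A compact illustration of the broken condition (the heuristic is paraphrased here, not quoted from sktime, and the exact behavior depends on the sklearn version in the report):

import inspect

from sklearn.metrics import mean_absolute_error

spec = inspect.getfullargspec(mean_absolute_error)

# old heuristic: pass y_train iff it is a named argument or **kwargs exists
would_pass_y_train = "y_train" in spec.args or spec.varkw is not None
print(would_pass_y_train)  # True on sklearn 1.3.x: the validation wrapper has **kwargs

# ...but validate_params rejects unknown keys, so actually passing it raises
try:
    mean_absolute_error([1, 2], [1, 2], y_train=[0, 1])
except TypeError as err:
    print(err)  # got an unexpected keyword argument 'y_train'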

fkiraly added a commit that referenced this issue Jan 16, 2024
…cs (#5717)

Fixes #5715.

The issue with `sklearn` metrics is coming from their decorator which no
longer allows inspection of admissible arguments via `python` native
`signature`.

Question for reviewers: is there an `sklearn` native way to inspect
permissible signature, without reading private attributes?
Bugfixing automation moved this from Under review to Fixed/resolved Jan 16, 2024