
[BUG] make_forecasting_scorer fails for wrapped sklearn metrics on newer sklearn versions #5715

Closed
tvdboom opened this issue Jan 9, 2024 · 8 comments · Fixed by #5717
Labels: bug (Something isn't working), module:metrics&benchmarking (metrics and benchmarking modules)


tvdboom commented Jan 9, 2024

Describe the bug
The evaluate function fails when using a custom metric converted from sklearn via make_forecasting_scorer.

To Reproduce
(I know mean_absolute_error is also present in sktime's API, but that's beside the point)

from sktime.datasets import load_airline
from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.naive import NaiveForecaster
from sktime.split import ExpandingWindowSplitter
from sktime.performance_metrics.forecasting import make_forecasting_scorer
from sklearn.metrics import mean_absolute_error

y = load_airline()

print(evaluate(
    forecaster=NaiveForecaster(),
    cv=ExpandingWindowSplitter(),
    y=y,
    scoring=make_forecasting_scorer(mean_absolute_error, name="mae", greater_is_better=False),
    error_score="raise",
))

Results in:

    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:\repos\ATOM\test_ts.py", line 39, in <module>
    print(evaluate(
          ^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\forecasting\model_evaluation\_functions.py", line 623, in evaluate
    results = parallelize(
              ^^^^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\utils\parallel.py", line 72, in parallelize
    ret = para_fun(
          ^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\utils\parallel.py", line 92, in _parallelize_none
    ret = [fun(x, meta=meta) for x in iter]
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\utils\parallel.py", line 92, in <listcomp>
    ret = [fun(x, meta=meta) for x in iter]
           ^^^^^^^^^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\forecasting\model_evaluation\_functions.py", line 270, in _evaluate_window
    raise e
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\forecasting\model_evaluation\_functions.py", line 262, in _evaluate_window
    score = metric(y_test, y_pred, y_train=y_train)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\performance_metrics\forecasting\_classes.py", line 169, in __call__
    return self.evaluate(y_true, y_pred, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\performance_metrics\forecasting\_classes.py", line 212, in evaluate
    out_df = self._evaluate(y_true=y_true_inner, y_pred=y_pred_inner, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sktime\performance_metrics\forecasting\_classes.py", line 587, in _evaluate
    res = func(y_true=y_true, y_pred=y_pred, **params)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\repos\ATOM\venv311\Lib\site-packages\sklearn\utils\_param_validation.py", line 192, in wrapper
    params = func_sig.bind(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ROB6874\AppData\Local\Programs\Python\Python311\Lib\inspect.py", line 3212, in bind
    return self._bind(args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ROB6874\AppData\Local\Programs\Python\Python311\Lib\inspect.py", line 3201, in _bind
    raise TypeError(
TypeError: got an unexpected keyword argument 'func'

Expected behavior

No error, and results for the mean_absolute_error metric.

Additional context

Versions

System:
python: 3.11.6 (tags/v3.11.6:8b6ee5b, Oct 2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)]
executable: C:\repos\ATOM\venv311\Scripts\python.exe
machine: Windows-10-10.0.22621-SP0
Python dependencies:
pip: 23.3.2
sktime: 0.25.0
sklearn: 1.3.2
skbase: 0.6.1
numpy: 1.26.3
scipy: 1.11.4
pandas: 2.1.4
matplotlib: 3.8.2
joblib: 1.3.2
numba: 0.58.1
statsmodels: 0.14.0
pmdarima: 2.0.4
statsforecast: 1.6.0
tsfresh: None
tslearn: None
torch: 2.1.1+cpu
tensorflow: 2.15.0
tensorflow_probability: None

@tvdboom tvdboom added the bug Something isn't working label Jan 9, 2024
@fkiraly fkiraly added this to Needs triage & validation in Bugfixing via automation Jan 9, 2024

fkiraly commented Jan 9, 2024

Confirmed on current main, python 3.11, Windows.

It seems like the bug is caused by sklearn's recently introduced parameter validation for performance metrics.

In custom metrics, arguments of the class are passed on to the metric if and only if the signature of the metric accepts them.

The problem is that the signature of the decorated mean_absolute_error now accepts kwargs, but the decorator raises an error on kwarg keys that are not allowed. It is an odd design choice imo that this does not use python's own parameter validation, but it is one we have to cope with.

The solution would be to control more stringently what _DynamicForecastingErrorMetric passes on to func.
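
For illustration only, a minimal sketch of the kind of stricter filtering meant here; the helper name _filter_kwargs and the reliance on __wrapped__ (set by functools.wraps-style decorators) are assumptions for this sketch, not the actual #5717 implementation:

import inspect


def _filter_kwargs(func, kwargs):
    """Keep only the keyword arguments the underlying function names explicitly."""
    inner = inspect.unwrap(func)  # follows __wrapped__ chains left by functools.wraps
    sig = inspect.signature(inner)
    allowed = {
        name
        for name, p in sig.parameters.items()
        if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
    }
    return {k: v for k, v in kwargs.items() if k in allowed}

The metric class would then call func(y_true=y_true, y_pred=y_pred, **_filter_kwargs(func, params)) instead of forwarding **params unconditionally.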


fkiraly commented Jan 9, 2024

The horror comes in through this decorator in sklearn; the MAE is now decorated:

@validate_params(
    {
        "y_true": ["array-like"],
        "y_pred": ["array-like"],
        "sample_weight": ["array-like", None],
        "multioutput": [StrOptions({"raw_values", "uniform_average"}), "array-like"],
    },
    prefer_skip_nested_validation=True,
)
def mean_absolute_error(
    y_true, y_pred, *, sample_weight=None, multioutput="uniform_average"
):

Due to the decorator, it is no longer easy to parse which arguments the function accepts: python native inspection will not produce the signature of the function, but that of the decorator's wrapper, which, unlike in earlier sklearn versions, has **kwargs.
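
For reference, a small inspection experiment on the reported sklearn version (1.3.2); the printed results are what I would expect given a (*args, **kwargs) wrapper applied with functools.wraps, so treat them as assumptions rather than verified output:

import inspect

from sklearn.metrics import mean_absolute_error

# getfullargspec does not follow __wrapped__, so it reports the wrapper itself
print(inspect.getfullargspec(mean_absolute_error))
# expected: FullArgSpec(args=[], varargs='args', varkw='kwargs', ...)

# inspect.signature follows __wrapped__ by default, so it may still recover the
# original signature if the decorator applies functools.wraps
print(inspect.signature(mean_absolute_error))
# expected: (y_true, y_pred, *, sample_weight=None, multioutput='uniform_average')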


fkiraly commented Jan 9, 2024

minimal example:

import pandas as pd

from sklearn.metrics import mean_absolute_error

from sktime.performance_metrics.forecasting import make_forecasting_scorer

scorer = make_forecasting_scorer(mean_absolute_error, name="RMSLE")

scorer.evaluate(pd.Series([1, 2, 3]), pd.Series([1, 2, 4]))
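
On the reported versions, this call should fail with the same TypeError as in the traceback above (got an unexpected keyword argument 'func'), since the decorated metric appears to accept **kwargs and therefore receives the scorer's own parameters.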

@fkiraly fkiraly changed the title [BUG] The evaluate function fails for custom metrics [BUG] make_forecasting_scorer fails for wrapped sklearn metrics on newer sklearn versions Jan 9, 2024
@fkiraly fkiraly added the module:metrics&benchmarking metrics and benchmarking modules label Jan 9, 2024
@fkiraly fkiraly moved this from Needs triage & validation to Under review in Bugfixing Jan 9, 2024

fkiraly commented Jan 9, 2024

Fixed here:
#5717

I consider the fix problematic, as it reads out private attributes of the sklearn decorator, but I cannot find a clear alternative.

A big question would be: is there a programmatic, public, stable way to ask an sklearn function "which arguments do you take"? Their new decorator destroys the standard, python native way without replacing it with a clear alternative, which seems like problematic design to me.


tvdboom commented Jan 9, 2024

I see. Thanks for working so quickly on the fix.


fkiraly commented Jan 9, 2024

Understanding the bug also suggests a temporary workaround: use a wrapped ("de-decorated") mean_absolute_error instead of the original:

def mean_absolute_error(y_true, y_pred, sample_weight=None, multioutput="uniform_average"):
    # plain wrapper with an inspectable signature and no **kwargs
    from sklearn.metrics import mean_absolute_error as _mae

    return _mae(y_true, y_pred, sample_weight=sample_weight, multioutput=multioutput)
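
As a sketch (not verified end to end), the de-decorated wrapper above can then be handed to make_forecasting_scorer exactly as in the minimal example; since its signature has no **kwargs, only explicitly named arguments should be forwarded to it:

import pandas as pd

from sktime.performance_metrics.forecasting import make_forecasting_scorer

scorer = make_forecasting_scorer(mean_absolute_error, name="mae", greater_is_better=False)

# expected to return the MAE (about 0.33 here) instead of raising a TypeError
print(scorer.evaluate(pd.Series([1, 2, 3]), pd.Series([1, 2, 4])))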

@yarnabrina (Collaborator) commented, quoting the comment above:
> The horror comes in through this decorator in sklearn; the MAE is now decorated: [...]
> Due to the decorator, it is no longer easy to parse which arguments the function accepts [...]

If I may ask, can you please explain what the issue is here? The keys of the decorator's first argument and the parameters in the function definition look identical to me. There is no **kwargs in the signature, so why are we expecting that?

(I haven't seen sktime or sklearn code, asking just from noticing this issue.)


fkiraly commented Jan 10, 2024

@yarnabrina, the decorator validate_params accepts **kwargs, but fails if any are passed other than those of mean_absolute_error.

Naive inspection of the decorated mean_absolute_error (via inspect.signature or similar), asking the question "which parameters do you accept", therefore fails.

For interface uniformity, we require that all metrics potentially accept things like y_train; if a metric does not use it, it is simply ignored (without an exception being raised). The sktime class must decide whether or not to pass y_train on to the wrapped metric, and so far that was done via inspection, under the assumption that an argument should be passed iff it appears in the signature, or **kwargs is present.

But that condition is no longer valid: with the decorator, **kwargs is present, yet passing y_train still fails.
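
A compact illustration of the broken condition (the heuristic is paraphrased here, not quoted from sktime, and the exact behavior depends on the sklearn version in the report):

import inspect

from sklearn.metrics import mean_absolute_error

spec = inspect.getfullargspec(mean_absolute_error)

# old heuristic: pass y_train iff it is a named argument or **kwargs exists
would_pass_y_train = "y_train" in spec.args or spec.varkw is not None
print(would_pass_y_train)  # True on sklearn 1.3.x: the validation wrapper has **kwargs

# ...but validate_params rejects unknown keys, so actually passing it raises
try:
    mean_absolute_error([1, 2], [1, 2], y_train=[0, 1])
except TypeError as err:
    print(err)  # got an unexpected keyword argument 'y_train'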

fkiraly added a commit that referenced this issue Jan 16, 2024
…cs (#5717)

Fixes #5715.

The issue with `sklearn` metrics is coming from their decorator which no
longer allows inspection of admissible arguments via `python` native
`signature`.

Question for reviewers: is there an `sklearn` native way to inspect
permissible signature, without reading private attributes?
Bugfixing automation moved this from Under review to Fixed/resolved Jan 16, 2024