
[BUG] SettingWithCopyWarning when using custom error metric with evaluate #4292

Closed
marrov opened this issue Mar 6, 2023 · 7 comments · Fixed by #4294
Labels
bug (Something isn't working)
module:forecasting (forecasting module: forecasting, incl probabilistic and hierarchical forecasting)
module:metrics&benchmarking (metrics and benchmarking modules)
Projects
Bugfixing (Fixed/resolved)

Comments

@marrov
Collaborator

marrov commented Mar 6, 2023

Describe the bug

When using the evaluate function with a custom metric, I get a SettingWithCopyWarning. For large runs, the warning is printed so many times that I have to resort to manually suppressing it before running evaluate.

To Reproduce

In this example, I make a custom scorer to keep track of the bias error. As far as I know, there is currently no BiasError or PercentageError metric in sktime, and this cannot be done directly by passing the _percentage_error function to make_forecasting_scorer, because the symmetric keyword needs to be set. Therefore, I had to define a custom wrapper function for this purpose.

import numpy as np
import pandas as pd

from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.performance_metrics.forecasting import make_forecasting_scorer, MeanAbsolutePercentageError
from sktime.performance_metrics.forecasting._functions import _percentage_error
from sktime.utils._testing.hierarchical import _make_hierarchical

# Define a custom metric
def custom_nonsymm_percentage_error(
    y_true: pd.DataFrame, y_pred: pd.DataFrame
) -> float:
    return _percentage_error(y_true=y_true, y_pred=y_pred, symmetric=False)

# Define a custom scorer based on the previous function
percentage_error = make_forecasting_scorer(
    func=custom_nonsymm_percentage_error,
    name="PercentageErrorBias",
    greater_is_better=False,
)

# Make a list of error metrics
error_metrics = [
    MeanAbsolutePercentageError(),
    percentage_error, # If the custom scorer is commented out the warning disappears
]

# Define the target variable
y = _make_hierarchical()

# Define the CV strategy
cv = ExpandingWindowSplitter(initial_window=6, fh=ForecastingHorizon([1]), step_length=1)

# Run the evaluate function
results = evaluate(
    forecaster=NaiveForecaster(strategy="last"),
    y=y,
    cv=cv,
    return_data=True,
    scoring=error_metrics,
    error_score=np.nan,
)

Expected behavior
No warning should appear when using a custom scorer. The culprit appears to be this line in the evaluate function:
results[f"test_{s.name}"].iloc[row] =

Versions

System:
python: 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
executable: /local_disk0/.ephemeral_nfs/envs/pythonEnv-2c84d41e-41f3-4d35-8db5-5c0b4d380a2a/bin/python
machine: Linux-5.4.0-1100-azure-x86_64-with-glibc2.29

Python dependencies:
pip: 21.0.1
setuptools: 52.0.0
sklearn: 0.24.1
sktime: 0.16.1
statsmodels: 0.13.5
numpy: 1.22.4
scipy: 1.10.1
pandas: 1.5.3
matplotlib: 3.4.2
joblib: 1.0.1
numba: 0.56.4
pmdarima: 2.0.2
tsfresh: 0.20.0

@marrov marrov added the bug Something isn't working label Mar 6, 2023
@fkiraly fkiraly added the module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting label Mar 6, 2023
@fkiraly fkiraly added this to Needs triage & validation in Bugfixing via automation Mar 6, 2023
@fkiraly
Collaborator

fkiraly commented Mar 6, 2023

reproduced on windows, python 3.10

@fkiraly fkiraly moved this from Needs triage & validation to Under review in Bugfixing Mar 6, 2023
@fkiraly fkiraly added the module:metrics&benchmarking metrics and benchmarking modules label Mar 6, 2023
@fkiraly
Collaborator

fkiraly commented Mar 6, 2023

Thanks for diagnosing the bug. I think #4294 fixes the SettingWithCopyWarning problem.

I think there are two more problems here for which issues (and contributions) would be appreciated:

  • make_forecasting_scorer does not properly work with _percentage_error. I've tried and verified this too, this seems like a bug. FYI @RNKuhns (original author of the function)
  • bias error and percentage error (that you could not find) could be added as metrics

Question: is percentage error not the same as mean absolute percentage error ("mean" just refers to sample averaging), or do you mean something different?

@marrov
Collaborator Author

marrov commented Mar 7, 2023

I believe bias error = percentage error, it's just different nomenclature. Hence, the issue with using MAPE without the averaging as a proxy for bias/percentage error comes from the absolute value part. The main design difference that I see is that bias/percentage error cannot/should not be averaged (differently signed errors cancel out), so it should always produce an output at a very granular/disaggregated level.

@fkiraly
Collaborator

fkiraly commented Mar 7, 2023

ahhhh, you want it without the absolute value.

Got it, that would be a nice "new issue".
It's probably worth writing down the formula to avoid confusion.

At some point, we also wanted to write all the formulae in the metrics.
I think this would be a good first issue.
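
For reference, a sketch of the formulae under discussion (the sign convention here is an assumption, not fixed by this thread):

\mathrm{PE}_t = \frac{y_t - \hat{y}_t}{y_t},
\qquad
\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \lvert \mathrm{PE}_t \rvert

The bias/percentage error keeps the signed PE_t per index instead of averaging its absolute value, so over- and under-forecasts remain visible rather than cancelling into a single score.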

@RNKuhns
Contributor

RNKuhns commented Mar 8, 2023

One thing to consider is whether bias is a performance metric or a statistical test.

A large negative error and a large positive error can balance each other out when calculating the average error (used in a bias test). This is not good for selecting/tuning models, since a zero or small average error may not actually signal a small divergence between actuals and predictions.

It is a good statistical test of model performance (just like checking that errors aren't overly autocorrelated).
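
As a toy illustration of that cancellation (hypothetical numbers): two forecasts that are each 50% off in opposite directions average to zero bias.

# hypothetical signed percentage errors for two periods
errors = [0.5, -0.5]

mean_bias = sum(errors) / len(errors)                        # 0.0 -- looks perfect
mean_abs_error = sum(abs(e) for e in errors) / len(errors)   # 0.5 -- far from perfect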

@marrov
Collaborator Author

marrov commented Mar 8, 2023

I believe not all error metrics have to be used for (or useful in) model selection/tuning to be relevant. In this case, the request comes from a business use case. I agree that bias errors should not be aggregated, but bias per time index is a KPI in many solutions for keeping track of tendencies to over- or under-forecast in practice. Hence, I think this is a good reason to have a class for this use case rather than having to import the "private" _percentage_error function.
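
As a rough sketch of that per-index KPI in plain pandas (hypothetical numbers, not the sktime API):

import pandas as pd

# hypothetical actuals and forecasts indexed by month
y_true = pd.Series([100.0, 120.0, 90.0],
                   index=pd.period_range("2023-01", periods=3, freq="M"))
y_pred = pd.Series([110.0, 115.0, 95.0], index=y_true.index)

# signed percentage error per time index; positive = under-forecast,
# negative = over-forecast (sign convention is an assumption)
bias_per_index = (y_true - y_pred) / y_true
print(bias_per_index)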

fkiraly added a commit that referenced this issue Mar 14, 2023
…tric in `evaluate` (#4294)

Fixes #4292

@marrov, thanks, I think you've correctly identified the source location of the bug!

I'm not quite clear why we are dealing with a `DataFrame` that has a repeated index (zero??), but I've fixed it now by changing the chained double access to a single `loc` access with two indices, and ensuring the multiple-index problem does not occur.
Bugfixing automation moved this from Under review to Fixed/resolved Mar 14, 2023
@RNKuhns
Contributor

RNKuhns commented Mar 14, 2023

@fkiraly the use case totally makes sense. It is just a matter of organization.

Bias is one of several post hoc statistical tests that aren't really for tuning model performance, but are used in evaluating model quality.

It may make sense to delineate these separately, or otherwise communicate the difference to users, so people don't mistakenly tune a model based on bias or other such tests which may be added in the future.
