
[BUG] SettingWithCopyWarning when using custom error metric with evaluate #4292

Closed
marrov opened this issue Mar 6, 2023 · 7 comments · Fixed by #4294
Labels
bug (Something isn't working)
module:forecasting (forecasting module: forecasting, incl probabilistic and hierarchical forecasting)
module:metrics&benchmarking (metrics and benchmarking modules)
Projects
Bugfixing (Fixed/resolved)

Comments

@marrov
Collaborator

marrov commented Mar 6, 2023

Describe the bug

When using the evaluate function with a custom metric, I get a SettingWithCopyWarning. For large runs, the warning is printed so many times that I have to resort to manually suppressing it before running evaluate.

To Reproduce

In this example, I make a custom scorer to keep track of the bias error. As far as I know, there is currently no BiasError or PercentageError metric in sktime, and this cannot be done directly by passing the _percentage_error function to make_forecasting_scorer, because the symmetric keyword needs to be set. Therefore, I had to define a custom wrapper function for this purpose.

import numpy as np
import pandas as pd

from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.performance_metrics.forecasting import make_forecasting_scorer, MeanAbsolutePercentageError
from sktime.performance_metrics.forecasting._functions import _percentage_error
from sktime.utils._testing.hierarchical import _make_hierarchical

# Define a custom metric
def custom_nonsymm_percentage_error(
    y_true: pd.DataFrame, y_pred: pd.DataFrame
) -> float:
    return _percentage_error(y_true=y_true, y_pred=y_pred, symmetric=False)

# Define a custom scorer based on the previous function
percentage_error = make_forecasting_scorer(
    func=custom_nonsymm_percentage_error,
    name="PercentageErrorBias",
    greater_is_better=False,
)

# Make a list of error metrics
error_metrics = [
    MeanAbsolutePercentageError(),
    percentage_error, # If the custom scorer is commented out the warning disappears
]

# Define the target variable
y = _make_hierarchical()

# Define the CV strategy
cv = ExpandingWindowSplitter(initial_window=6, fh=ForecastingHorizon([1]), step_length=1)

# Run the evaluate function
results = evaluate(
    forecaster=NaiveForecaster(strategy="last"),
    y=y,
    cv=cv,
    return_data=True,
    scoring=error_metrics,
    error_score=np.nan,
)

Expected behavior
No warning should appear when using a custom scorer. The culprit appears to be this line in the evaluate function:
results[f"test_{s.name}"].iloc[row] =

Versions

System:
python: 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
executable: /local_disk0/.ephemeral_nfs/envs/pythonEnv-2c84d41e-41f3-4d35-8db5-5c0b4d380a2a/bin/python
machine: Linux-5.4.0-1100-azure-x86_64-with-glibc2.29

Python dependencies:
pip: 21.0.1
setuptools: 52.0.0
sklearn: 0.24.1
sktime: 0.16.1
statsmodels: 0.13.5
numpy: 1.22.4
scipy: 1.10.1
pandas: 1.5.3
matplotlib: 3.4.2
joblib: 1.0.1
numba: 0.56.4
pmdarima: 2.0.2
tsfresh: 0.20.0

@marrov marrov added the bug Something isn't working label Mar 6, 2023
@fkiraly fkiraly added the module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting label Mar 6, 2023
@fkiraly fkiraly added this to Needs triage & validation in Bugfixing via automation Mar 6, 2023
@fkiraly
Collaborator

fkiraly commented Mar 6, 2023

reproduced on windows, python 3.10

@fkiraly fkiraly moved this from Needs triage & validation to Under review in Bugfixing Mar 6, 2023
@fkiraly fkiraly added the module:metrics&benchmarking metrics and benchmarking modules label Mar 6, 2023
@fkiraly
Collaborator

fkiraly commented Mar 6, 2023

Thanks for diagnosing the bug. I think #4294 fixes the SettingWithCopyWarning problem.

I think there are two more problems here for which issues (and contributions) would be appreciated:

  • make_forecasting_scorer does not properly work with _percentage_error. I've tried and verified this too, this seems like a bug. FYI @RNKuhns (original author of the function)
  • bias error and percentage error (that you could not find) could be added as metrics

Question: is percentage error not the same as mean absolute percentage error ("mean" just refers to sample averaging), or do you mean something different?

@marrov
Collaborator Author

marrov commented Mar 7, 2023

I believe bias error = percentage error, it's just different nomenclature. Hence, the issue with using MAPE without the averaging as a proxy for bias/percentage error comes from the absolute value part. The main design difference that I see is that bias/percentage error cannot/should not be averaged (differently signed errors cancel out), so it should always produce an output at a very granular/disaggregated level.

@fkiraly
Collaborator

fkiraly commented Mar 7, 2023

ahhhh, you want it without the absolute value.

Got it, that would be a nice "new issue".
It's probably worth writing down the formula to avoid confusion.

At some point, we also wanted to write all the formulae in the metrics.
I think this would be a good first issue.
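
For reference, a sketch of the formulae under discussion (the sign convention here is an assumption, not fixed by this thread):

\mathrm{PE}_t = \frac{y_t - \hat{y}_t}{y_t},
\qquad
\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \lvert \mathrm{PE}_t \rvert

The bias/percentage error keeps the signed PE_t per index instead of averaging its absolute value, so over- and under-forecasts remain visible rather than cancelling into a single score.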

@RNKuhns
Contributor

RNKuhns commented Mar 8, 2023

One thing to consider is whether bias is a performance metric or a statistical test.

A large negative error and a large positive error can balance each other out when calculating the average error (used in a bias test). This is not good for selecting/tuning models, since a zero or small average error may not actually signal a small divergence between actuals and predictions.

It is a good statistical test of model performance (just like checking that errors aren't overly autocorrelated).
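
As a toy illustration of that cancellation (hypothetical numbers): two forecasts that are each 50% off in opposite directions average to zero bias.

# hypothetical signed percentage errors for two periods
errors = [0.5, -0.5]

mean_bias = sum(errors) / len(errors)                        # 0.0 -- looks perfect
mean_abs_error = sum(abs(e) for e in errors) / len(errors)   # 0.5 -- far from perfect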

@marrov
Collaborator Author

marrov commented Mar 8, 2023

I believe not all error metrics have to be used for (or useful in) model selection/tuning to be relevant. In this case, the request comes from a business use case. I agree that bias errors should not be aggregated, but bias per time index is a KPI in many solutions for keeping track of tendencies to over- or under-forecast in practice. Hence, I think this is a good reason to have a class for this use case rather than having to import the "private" _percentage_error function.
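
As a rough sketch of that per-index KPI in plain pandas (hypothetical numbers, not the sktime API):

import pandas as pd

# hypothetical actuals and forecasts indexed by month
y_true = pd.Series([100.0, 120.0, 90.0],
                   index=pd.period_range("2023-01", periods=3, freq="M"))
y_pred = pd.Series([110.0, 115.0, 95.0], index=y_true.index)

# signed percentage error per time index; positive = under-forecast,
# negative = over-forecast (sign convention is an assumption)
bias_per_index = (y_true - y_pred) / y_true
print(bias_per_index)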

fkiraly added a commit that referenced this issue Mar 14, 2023
…tric in `evaluate` (#4294)

Fixes #4292

@marrov, thanks, I think you've correctly identified the source location of the bug!

I'm not quite clear why we are dealing with a `DataFrame` that has a repeated index (zero??), but I've fixed it now by changing the chained double access to a single `loc` access with two indices, and ensuring the multiple-index problem does not occur.
Bugfixing automation moved this from Under review to Fixed/resolved Mar 14, 2023
@RNKuhns
Contributor

RNKuhns commented Mar 14, 2023

@fkiraly the use case totally makes sense. It is just a matter of organization.

Bias is one of several post hoc statistical tests that aren't really for tuning model performance, but are used in evaluating model quality.

It may make sense to delineate these separately, or otherwise communicate the difference to users, so people don't mistakenly tune a model based on bias or other such tests which may be added in the future.
