Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Detrender returns incorrect result when using inverse_transform #4144

Closed
KishManani opened this issue Jan 24, 2023 · 8 comments · Fixed by #4150
Closed

[BUG] Detrender returns incorrect result when using inverse_transform #4144

KishManani opened this issue Jan 24, 2023 · 8 comments · Fixed by #4150
Labels
bug Something isn't working module:datatypes datatypes module: data containers, checkers & converters module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing
Projects

Comments

@KishManani
Copy link
Contributor

KishManani commented Jan 24, 2023

Describe the bug
The inverse_transform is not returned correctly when passing a Series type which has a different name from the DataFrame or Series used during fit. Instead two columns which are null are returned. Another condition to trigger this bug appears to be when the user supplies data with a different time index during fit and inverse_transform.

To Reproduce

from sktime.datasets import load_airline
from sktime.transformations.series.detrend import Detrender
df = load_airline().rename("y").to_frame()

detrender = Detrender()
# Fit on some time period
detrender.fit(df.loc[:"1949-03", "y"])
# De-trend on a different time period
y_detrended = detrender.transform(df.loc["1949-04":, "y"])
# Inverse transform (i.e., add the trend back)
detrender.inverse_transform(y_detrended)

image

This behaviour only incurs when the input is a Series. If I were to wrap "y" above in a list, ["y"], so that a DataFrame is instead passed to fit and transform, then we do not get this bug.

Expected behavior
A single column with the trend added back resulting in the same data as the original dataframe.

Versions

System:
python: 3.8.7 (default, May 6 2021, 21:53:45) [Clang 12.0.0 (clang-1200.0.32.29)]
executable: /Users/kishan_manani/.pyenv/versions/3.8.7/envs/udemy-ts/bin/python3.8
machine: macOS-11.6.6-x86_64-i386-64bit

Python dependencies:
pip: 22.3.1
setuptools: 49.2.1
sklearn: 1.2.0
sktime: 0.15.1
statsmodels: 0.13.2
numpy: 1.21.6
scipy: 1.8.1
pandas: 1.5.2
matplotlib: 3.6.2
joblib: 1.2.0
numba: 0.56.4
pmdarima: 1.8.5
tsfresh: 0.19.0

@KishManani KishManani added the bug Something isn't working label Jan 24, 2023
@fkiraly
Copy link
Collaborator

fkiraly commented Jan 26, 2023

Confirmed on python 3.10, windows.

I think it is unclear whether there is expected behaviour - i.e., whether a series of different name should be, by default treated as a different column.

The "logical root case" imo is that the name gets lost after transform - is it?

@fkiraly fkiraly added this to Needs triage & validation in Bugfixing via automation Jan 26, 2023
@fkiraly
Copy link
Collaborator

fkiraly commented Jan 26, 2023

Also, slightly confused - I thought we test for all forecasters that they preserve the name attribute for pd.Series - this has been a source of problems previously:
#3323

@fkiraly
Copy link
Collaborator

fkiraly commented Jan 26, 2023

hm, reason is that convert_MvS_to_UvS_as_Series, in line 89, deletes the name of any pd.Series output after conversion brom pd.DataFrame.
That's interesting, and I vaguely remember this had to be added due to some conversion breakage.

Let's see what happens if I remove this...
#4150
(fixes the problem in this issue, but may have other side effects, e.g., when converting pd.DataFrame to pd.Series and back)

@fkiraly fkiraly added module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing module:datatypes datatypes module: data containers, checkers & converters labels Jan 26, 2023
@fkiraly fkiraly moved this from Needs triage & validation to Investigating in Bugfixing Jan 26, 2023
@fkiraly
Copy link
Collaborator

fkiraly commented Jan 26, 2023

Another condition for trigger this bug appears to be when the user supplies data with a different time index during fit and inverse_transform.

Can you give an example for this? This should not happen, as the Detrender is tested indirectly via pipelines.

@fkiraly
Copy link
Collaborator

fkiraly commented Jan 27, 2023

Another attempted fix is to remove the name at the start and potentially add it back at the end, so internal usage never sees the pd.Series name attr:
#4155

@fkiraly
Copy link
Collaborator

fkiraly commented Jan 27, 2023

You've opened a can of worms @KishManani.
Looks like the name attribute is inconsistently handled, the problem is that tests did not cover this properly (but I thought they were).

See here for an explanation why it looked like it was covered but wasn't, and a test that adds this coverage:
#4157

See here for various related fixes:
#4150
#4155
#4161

if you have time, would be interested to hear which approach looks more promising.

And, whether there will be unintended side effects of the fix...

@KishManani
Copy link
Contributor Author

Can you give an example for this? This should not happen, as the Detrender is tested indirectly via pipelines.

I'm unable to reproduce the error so this is likely a mistake on my end.

@fkiraly
Copy link
Collaborator

fkiraly commented Jan 30, 2023

I think the combination of PR merged together in #4150 now fixes this.

fkiraly added a commit that referenced this issue Feb 1, 2023
Adds a test, `test_predict_series_name_preserved`, to the forecasting test suite, to test that the `name` attribute of `pd.Series` is preserved in `fit`/`predict`.

A test for detecting the general issue at the root of #4144, which currently was not covered by the forecasting test suite, as the scenarios containing `pd.Series` with non-None `name` attribute did not interact with the tests in which the time index and name were checked for consistency via `_assert_correct_pred_time_index`.

If this works, this should surface a number of cases with the problem, among them `NaiveForecaster` (where this surfaced through manual testing in the context of #4150 failures)
fkiraly added a commit that referenced this issue Feb 1, 2023
…4161)

This PR collects fixes for a number of forecasters to preserve the `name` attribute in `predict`.
See #4144. The issues were found by #4157.

Does *not* contain changes to the conversion logic, or tests (e.g., #4157).
Testing is via merging this PR into the options dealing with the `name` attribute.

Contains fixes for:

* `ARDL`
* `ARIMA` and `AutoARIMA`
* `AutoEnsembleForecaster`
* `AutoETS`
* `Croston`
* `NaiveForecaster`
* `OnlineEnsembleForecaster`
* reducers
* `SquaringResiduals`
* `StackingForecaster`
* `StatsForecastAutoARIMA`
* `STLForecaster`
* `ThetaModularForecaster`
* `TrendForecaster` and `PolynomialTrendForecaster`

As tempting as it might be to add the fix to `BaseForecaster` somewhere in `predict`: whilt it might be DRY-er, I think it would muddle concerns. If at all, this should go in `convert`-like boilerplate rather than in the base class itself, but unclear how that would look like.
fkiraly added a commit that referenced this issue Feb 1, 2023
…to/from `pd.DataFrame` and `np.ndarray`, as `Series` scitype (#4150)

This PR ensures that the `pd.Series` `name` attribute is preserved when converting to and back from `pd.DataFrame` or `np.ndarray` under the `Series` scitype. Fixes #4144.

Currently, the back-conversion intentionally always reset the `name` attribute to `None`, which could lead to unexpected behaviour of some estimators, and user frustration, see #4144

Simply removing the "set to None" breaks other things, in the case of conversion `pd.DataFrame` to `pd.Series` and back.

(experimental PR until tests are added and it is ensured that nothing else breaks)

Depends on:

* #4157 for testing
* #4161 for fixing uncovered bugs via #4157
Bugfixing automation moved this from Investigating to Fixed/resolved Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working module:datatypes datatypes module: data containers, checkers & converters module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing
Projects
Bugfixing
Fixed/resolved
2 participants