Fix fh.to_relative() bug for DatetimeIndex #582

aiwalter · 2020-12-29T14:18:15Z

Reference Issues/PRs

Fixes #534.
Does not fix #561, which comes from AutoETS and is a different bug.

What does this implement/fix? Explain your changes.

Bug fix

mloning · 2020-12-29T15:44:53Z

@aiwalter thanks! We're working on the failing test on master in #581.

aiwalter · 2020-12-29T16:07:10Z

Good to know, I was already wondering 😁

mloning

Thanks @aiwalter. What about going from relative to absolute in to_absolute? Do we also have to make changes there if the time index/cutoff is pd.DatetimeIndex/pd.Timestamp?

With this we should also be able to simplify _coerce_duration_to_int and the other utility functions in utils/datetime, which are currently a mess, but we can do that in a separate PR.
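For reference, the relative-to-absolute direction raised above can be sketched with period arithmetic. This is a minimal illustration in plain pandas, not sktime's to_absolute; the helper name steps_to_absolute and the monthly default frequency are assumptions made for the example.

```python
import pandas as pd


def steps_to_absolute(steps, cutoff, freq="M"):
    """Hypothetical helper: shift a cutoff by integer steps at a period frequency."""
    cutoff_period = pd.Timestamp(cutoff).to_period(freq)
    # Adding an integer to a Period moves it forward by that many periods.
    return pd.PeriodIndex([cutoff_period + int(step) for step in steps], freq=freq)


absolute = steps_to_absolute([1, 2, 3], cutoff="1950-12-31")
print(absolute)
```

Staying in periods avoids having to decide whether a month is represented by its first or last day until the result is converted back to timestamps.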

mloning · 2020-12-30T12:33:50Z

sktime/forecasting/base/_fh.py

+
+            # Bug fix for DatetimeIndex delta calculation (see #534)
+            if isinstance(index, pd.DatetimeIndex):
+                try:


mloning · 2020-12-30

What's the error that we need to catch here? Can we not check if either index or freq has a freq and raise an error otherwise?

aiwalter · 2020-12-30

In the tests, there are some indices with only one value and freq=None, which is what causes the error. We also have to try the conversion first so that, for example, the MS of a DatetimeIndex is automatically converted into the M of a PeriodIndex. Calling y.index.to_period(freq="MS") directly results in the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-21-3c5e56bae74c> in <module>
----> 1 y.index.to_period(freq="MS")

c:\users\martin\miniconda3\envs\sktime37\lib\site-packages\pandas\core\indexes\extension.py in method(self, *args, **kwargs)
     76 
     77         def method(self, *args, **kwargs):
---> 78             result = attr(self._data, *args, **kwargs)
     79             if wrap:
     80                 if isinstance(result, type(self._data)):

c:\users\martin\miniconda3\envs\sktime37\lib\site-packages\pandas\core\arrays\datetimes.py in to_period(self, freq)
   1109             freq = res
   1110 
-> 1111         return PeriodArray._from_datetime64(self._data, freq, tz=self.tz)
   1112 
   1113     def to_perioddelta(self, freq):

c:\users\martin\miniconda3\envs\sktime37\lib\site-packages\pandas\core\arrays\period.py in _from_datetime64(cls, data, freq, tz)
    228         PeriodArray[freq]
    229         """
--> 230         data, freq = dt64arr_to_periodarr(data, freq, tz)
    231         return cls(data, freq=freq)
    232 

c:\users\martin\miniconda3\envs\sktime37\lib\site-packages\pandas\core\arrays\period.py in dt64arr_to_periodarr(data, freq, tz)
    957         data = data._values
    958 
--> 959     base = freq._period_dtype_code
    960     return c_dt64arr_to_periodarr(data.view("i8"), base, tz), freq
    961 

AttributeError: 'pandas._libs.tslibs.offsets.MonthBegin' object has no attribute '_period_dtype_code'

aiwalter · 2020-12-30

"Can we not check if either index or freq has a freq and raise an error otherwise?"

Then we would have to rewrite some tests. I also think it is OK to accept freq=None, because an index like pd.DatetimeIndex(['2000-02-10 00:00:00']) simply results in DatetimeIndex(['2000-02-10'], dtype='datetime64[ns]', freq=None).
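The conversion behaviour described in this thread can be reproduced with plain pandas. A small sketch, assuming a month-start index: calling to_period() without a freq lets pandas map the offset alias MS to the period alias M, while passing freq="MS" explicitly fails. The traceback above shows an AttributeError; other pandas versions may raise a different exception type, so the sketch only prints it.

```python
import pandas as pd

idx = pd.date_range("2000-01-01", periods=3, freq="MS")

# Without an explicit freq, pandas maps the offset alias "MS" to the period alias "M".
periods = idx.to_period()
print(periods.freqstr)

# Passing the offset alias directly fails; the exception type is version dependent.
try:
    idx.to_period(freq="MS")
except Exception as err:
    print(type(err).__name__)

# A single-value index built from a list carries no freq at all.
single = pd.DatetimeIndex(["2000-02-10 00:00:00"])
print(single.freq)
```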

aiwalter · 2020-12-30T16:46:01Z

@mloning I am having a look at to_absolute() to see if we can improve things there as well. I already found something related to the freq, so let's keep this PR open a bit longer.

mloning · 2020-12-30T18:10:49Z

Hi @aiwalter - I had a look at this as well and reworked your initial solution a little bit, shall I just push it to this branch or create a separate PR?

aiwalter · 2020-12-30T18:12:59Z

Ok feel free to push it here :)

mloning · 2020-12-30T19:46:41Z

@aiwalter Sorry if I accidentally overwrote any of your most recent stuff! Feel free to revert my changes if you don't like them. There's another issue with pd.DatetimeIndex which I mentioned in #534

aiwalter · 2020-12-30T20:32:04Z

@mloning no problem, not sure which commit is failing now?

mloning · 2020-12-31T01:45:27Z

So with these changes now, this one still fails but raises a more helpful error:

from sktime.forecasting.all import *
y = load_airline()
y.index = y.index.to_timestamp()
y_train, y_test = temporal_train_test_split(y, test_size=120)
fh = ForecastingHorizon(y_test.index[0:5], is_relative=False)
fh.to_relative(cutoff=y_train.index[-1])

And this one works:

from sktime.forecasting.all import *
y = load_airline()
y.index = y.index.to_timestamp("M")
y_train, y_test = temporal_train_test_split(y, test_size=120)
fh = ForecastingHorizon(y_test.index[0:5], is_relative=False)
fh.to_relative(cutoff=y_train.index[-1])
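The commit history for this PR describes the fix as coercing to periods before doing arithmetic. A minimal sketch of that idea in plain pandas follows. It is not sktime's implementation: the helper name relative_steps and the explicit freq argument are assumptions for illustration.

```python
import pandas as pd


def relative_steps(absolute, cutoff, freq="M"):
    """Hypothetical helper: integer steps between each time point and the cutoff."""
    periods = pd.DatetimeIndex(absolute).to_period(freq)
    cutoff_period = pd.Timestamp(cutoff).to_period(freq)
    # Period ordinals count periods since the epoch, so their difference is the step count.
    return [int(p.ordinal - cutoff_period.ordinal) for p in periods]


month_ends = pd.DatetimeIndex(["1951-01-31", "1951-02-28", "1951-03-31"])
steps = relative_steps(month_ends, cutoff="1950-12-31")
print(steps)
```

Because the subtraction happens on period ordinals, the result does not depend on the differing lengths of calendar months.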

mloning · 2020-12-31T11:56:01Z

@aiwalter Any thoughts on these changes? I'm happy to merge them as they are, but agree that we need to further try to simplify things for the handling of time indices and the forecasting horizon.

aiwalter · 2020-12-31T17:41:26Z

@mloning I am not sure it is the best UX to exclude some indices based on their freq values. The problem with values like freq="MS" is that the freq becomes None after .to_period().to_timestamp():

>>> pd.date_range(start='1/1/2018 00:00:00', end='2/1/2018 00:01:00', freq="MS")
DatetimeIndex(['2018-01-01', '2018-02-01'], dtype='datetime64[ns]', freq='MS')

>>> pd.date_range(start='1/1/2018 00:00:00', end='2/1/2018 00:01:00', freq="MS").to_period().to_timestamp()
DatetimeIndex(['2018-01-01', '2018-02-01'], dtype='datetime64[ns]', freq=None)

I am also posting this in the pandas Gitter to ask whether there is another way to solve this.

Actually my solution with try/except was working well because I was able to read the freq from cutoff and assign it back. We could just test all freq values from here to make sure it is working for all? What do you think?
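The "read the freq and assign it back" idea can be sketched in plain pandas as follows. This is an illustration under assumptions, not the code from this PR: it stores the freq of the original index rather than reading it from a cutoff, and it uses four monthly values for the example.

```python
import pandas as pd

original = pd.date_range("2018-01-01", periods=4, freq="MS")

# Remember the frequency before converting, since the round trip may drop it.
saved_freq = original.freq
roundtrip = original.to_period().to_timestamp()

# Re-attach the saved frequency; pandas validates that it matches the values.
restored = pd.DatetimeIndex(roundtrip, freq=saved_freq)
print(restored.freqstr)
```

Passing the freq through the constructor has the side benefit that pandas raises an error if the values do not actually conform to that frequency.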

Further there is now an issue here:

from sktime.forecasting.all import *
y = load_airline()
y.index = y.index.to_timestamp("M")
y_train, y_test = temporal_train_test_split(y, test_size=10)

from sktime.forecasting.ets import AutoETS
from sktime.forecasting.fbprophet import Prophet
from sktime.forecasting.arima import AutoARIMA

forecaster = EnsembleForecaster([
    ("autoARIMA", AutoARIMA(sp=12)),
    ("autoETS", AutoETS(auto=True, sp=12, n_jobs=-1)),
    ("fbprophet", Prophet(seasonality_mode="multiplicative", add_country_holidays={"country_name": "US"}))
])
%time forecaster.fit(y_train)

forecaster.predict([1,2,3])

which results in this error:
ValueError: Index type not supported. Please consider using pd.PeriodIndex.

So that is actually something related to #561.

aiwalter · 2021-01-01T22:35:24Z

@mloning: See pandas issue. I didn't know about freq="infer"; not sure how reliably this works 🤔
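A quick way to probe that reliability question, as a sketch with made-up dates: inference recovers MS for a regular monthly index, but returns None as soon as a month is missing. Note also that pd.infer_freq needs at least three dates.

```python
import pandas as pd

# Regular month-start values: the frequency can be inferred.
regular = pd.DatetimeIndex(
    ["2018-01-01", "2018-02-01", "2018-03-01", "2018-04-01"], freq="infer"
)
print(regular.freqstr)

# With a missing month there is no single frequency to infer.
gappy = pd.DatetimeIndex(["2018-01-01", "2018-02-01", "2018-04-01", "2018-05-01"])
print(pd.infer_freq(gappy))
```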

mloning · 2021-01-04T19:42:32Z

"which results in this error:
ValueError: Index type not supported. Please consider using pd.PeriodIndex.

So that is actually something related to #561."

I think this is fine for now because "M" cannot reliably be handled otherwise, so better to raise an informative error.

Re pandas: I think it shouldn't forget freq even if values are missing. To me, having a freq is not the same as being complete, but that's a different issue.

I'm not a fan of the try-except workflow if it can be avoided; I'd prefer to handle a known set of cases and raise informative errors otherwise.
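As a rough sketch of that preference, an explicit check could look like the following. The function name require_freq, the fallback argument, and the error wording are assumptions for illustration and are not taken from sktime.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def require_freq(index, fallback=None):
    """Hypothetical check: return the index frequency or fail with an informative error."""
    if not isinstance(index, pd.DatetimeIndex):
        raise TypeError(f"Expected pd.DatetimeIndex, got {type(index).__name__}.")
    # Prefer the freq stored on the index, then the caller-supplied fallback.
    freq = index.freq if index.freq is not None else fallback
    if freq is None:
        raise ValueError(
            "Index has no freq and no fallback was given. "
            "Please set a freq or consider using pd.PeriodIndex."
        )
    return to_offset(freq)


print(require_freq(pd.date_range("2000-01-01", periods=3, freq="MS")))
```

Compared with a try/except around the conversion, every accepted case is spelled out here, and anything else fails early with a message the user can act on.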

I'd suggest merging this PR first since it's already an improvement, even though it doesn't fix all the issues. Then you can have another go at including more cases if you like?

aiwalter · 2021-01-04T23:05:13Z

@mloning it's fine for me to merge this already. I was also thinking about opening our fbprophet for other index types by wrapping them around DatetimeIndex, but might do that in a separate PR. This will, however, make the index handling even a bit more complex.

mloning · 2021-01-05T12:33:14Z

Alright great @aiwalter - I'll merge this one now and then we can tackle the remaining issues.

A lot of the complications for us seem to stem from incomplete pandas functionality - perhaps it's best to contribute to pandas, but that will take some more time to get it approved and merged.

Fix fh.to_relative() bug for DatetimeIndex

02d70c9

aiwalter requested a review from mloning as a code owner December 29, 2020 14:18

Markus Löning and others added 4 commits December 29, 2020 16:34

Merge branch 'master' into datetimeindex

dab5e16

Merge branch 'master' of https://github.com/alan-turing-institute/sktime into datetimeindex

8550aa3

Fixing test issues with freq=None

b3c8bea

Merge branch 'datetimeindex' of https://github.com/aiwalter/sktime into datetimeindex

00ec72f

mloning reviewed Dec 30, 2020

Martin Walter and others added 4 commits December 30, 2020 20:11

Added freq value in fh.to_absolute() from cutoff.freq

bd6b525

cutoff as required argument in to_absolute() as it is already implicitly required in _check_cutoff

ab5ae97

Coerce to period for arithmetic operations on time index or fh

a24d4d6

Coerce to period for arithmetic operations on time index or fh

0c1f6f5

Minor changes

2b3129c

Changed freq in fbprophet example to freq=M

b683376

mloning merged commit c57d882 into sktime:master Jan 5, 2021

yarnabrina mentioned this pull request Aug 19, 2023

[BUG] Month frequency data leads to invalid frequency #5131

Open
