-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix fh.to_relative() bug for DatetimeIndex #582
Conversation
Good to know, I was already wondering 😁 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aiwalter, what about going from relative to absolute in to_absolute
, do we also have to make changes there if the time index/cutoff is pd.DatetimeIndex
/pd.Timestamp
?
With this we should also be able to simplify _coerce_duration_to_int
and other utility functions in utils/datetime
which are a mess currently, but can do that in a separate PR.
sktime/forecasting/base/_fh.py
Outdated
|
||
# Bug fix for DatetimeIndex delta calculation (see #534) | ||
if isinstance(index, pd.DatetimeIndex): | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the error that we need to catch here? Can we not check if either index or freq has a freq
and raise an error otherwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the tests, there are some indices with only one value and freq=None
, this is causing the error. Also we have to try
first in order to convert e.g. the MS
from DatetimeIndex
into M
of PeriodIndex
automatically. If doing y.index.to_period(freq="MS")
will result in the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-21-3c5e56bae74c> in <module>
----> 1 y.index.to_period(freq="MS")
c:\users\martin\miniconda3\envs\sktime37\lib\site-packages\pandas\core\indexes\extension.py in method(self, *args, **kwargs)
76
77 def method(self, *args, **kwargs):
---> 78 result = attr(self._data, *args, **kwargs)
79 if wrap:
80 if isinstance(result, type(self._data)):
c:\users\martin\miniconda3\envs\sktime37\lib\site-packages\pandas\core\arrays\datetimes.py in to_period(self, freq)
1109 freq = res
1110
-> 1111 return PeriodArray._from_datetime64(self._data, freq, tz=self.tz)
1112
1113 def to_perioddelta(self, freq):
c:\users\martin\miniconda3\envs\sktime37\lib\site-packages\pandas\core\arrays\period.py in _from_datetime64(cls, data, freq, tz)
228 PeriodArray[freq]
229 """
--> 230 data, freq = dt64arr_to_periodarr(data, freq, tz)
231 return cls(data, freq=freq)
232
c:\users\martin\miniconda3\envs\sktime37\lib\site-packages\pandas\core\arrays\period.py in dt64arr_to_periodarr(data, freq, tz)
957 data = data._values
958
--> 959 base = freq._period_dtype_code
960 return c_dt64arr_to_periodarr(data.view("i8"), base, tz), freq
961
AttributeError: 'pandas._libs.tslibs.offsets.MonthBegin' object has no attribute '_period_dtype_code'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not check if either index or freq has a freq and raise an error otherwise?
Then we have to rewrite some tests. And I think it is ok to accept also freq=None
as e.g. an index like pd.DatetimeIndex(['2000-02-10 00:00:00'])
is just resulting in DatetimeIndex(['2000-02-10'], dtype='datetime64[ns]', freq=None)
@mloning I am having a look into |
Hi @aiwalter - I had a look at this as well and reworked your initial solution a little bit, shall I just push it to this branch or create a separate PR? |
Ok feel free to push it here :) |
…tly required in _check_cutoff
@mloning no problem, not sure which commit is failing now? |
So with these changes now, this one still fails but raises a more helpful error: from sktime.forecasting.all import *
y = load_airline()
y.index = y.index.to_timestamp()
y_train, y_test = temporal_train_test_split(y, test_size=120)
fh = ForecastingHorizon(y_test.index[0:5], is_relative=False)
fh.to_relative(cutoff=y_train.index[-1]) And this one works: from sktime.forecasting.all import *
y = load_airline()
y.index = y.index.to_timestamp("M")
y_train, y_test = temporal_train_test_split(y, test_size=120)
fh = ForecastingHorizon(y_test.index[0:5], is_relative=False)
fh.to_relative(cutoff=y_train.index[-1]) |
@aiwalter Any thoughts on these changes? I'm happy to merge them as they are, but agree that we need to further try to simplify things for the handling of time indices and the forecasting horizon. |
@mloning I am not sure if it is the best UX when excluding some indices based on some >>> pd.date_range(start='1/1/2018 00:00:00', end='2/1/2018 00:01:00', freq="MS")
DatetimeIndex(['2018-01-01', '2018-02-01'], dtype='datetime64[ns]', freq='MS')
>>> pd.date_range(start='1/1/2018 00:00:00', end='2/1/2018 00:01:00', freq="MS").to_period().to_timestamp()
DatetimeIndex(['2018-01-01', '2018-02-01'], dtype='datetime64[ns]', freq=None) I am posting this in the pandas gitter as well if there is any other way to solve this. Actually my solution with try/except was working well because I was able to read the Further there is now an issue here: from sktime.forecasting.all import *
y = load_airline()
y.index = y.index.to_timestamp("M")
y_train, y_test = temporal_train_test_split(y, test_size=10)
from sktime.forecasting.ets import AutoETS
from sktime.forecasting.fbprophet import Prophet
from sktime.forecasting.arima import AutoARIMA
forecaster = EnsembleForecaster([
("autoARIMA", AutoARIMA(sp=12)),
("autoETS", AutoETS(auto=True, sp=12, n_jobs=-1)),
("fbprophet", Prophet(seasonality_mode="multiplicative", add_country_holidays={"country_name": "US"}))
])
%time forecaster.fit(y_train)
forecaster.predict([1,2,3]) which results in this error: So that is actually sth related to #561 . |
I think this is fine for now because "M" cannot reliably be handled otherwise, so better to raise an informative error. Re pandas, I think it shouldn't forget I'm not a fan of the try-except workflow if it can be avoided, I'd prefer to handle a known set of cases and raise informative errors otherwise. I'd suggest to merge this PR first since it's already an improvement, even though it doesn't fix all the issues, and then you can have another go at including more cases if you like? |
@mloning its fine for me to merge this already. I was also thinking about opening our |
Alright great @aiwalter - I'll merge this one now and then we can tackle the remaining issues. A lot of the complications for us seem to stem from incomplete pandas functionality - perhaps it's best to contribute to pandas, but that will take some more time to get it approved and merged. |
Reference Issues/PRs
Solved #534
Not solved #561 as this comes from
AutoETS
and is a different bug.What does this implement/fix? Explain your changes.
Bug fix