New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] NeuralForecastRNN
should auto-detect freq
#6039
Conversation
Enhances `NeuralForecastRNN` to interpret `freq` from `ForecastingHorizon` when passed as `"auto"`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution. Ive made a few comments, please take a look.
Also, can you please add a test for this case? We should test for few standard datasets, and see what happens with bad frequencies (e.g. W or MS) or with RangeIndex.
If all passes, we should also update the get test params to use auto instead of D.
I have updated the code amid requested changes except for this #6039 (comment)
Works fine with freq = 'W'
data = pd.Series(
data=np.random.randn(10),
index=pd.date_range('2022-01-01', periods=10, freq=freq)
)
model = NeuralForecastRNN("auto", max_steps=2)
model.fit(data, fh=[1, 2, 3, 4])
pred = model.predict()
print("Correctly Interpreted:", pred.index.freq == freq)
print(pred)
# Correctly Interpreted: True
# 2022-03-13 -0.519103
# 2022-03-20 -0.556819
# 2022-03-27 -0.517274
# 2022-04-03 -0.551678
# Freq: W-SUN, dtype: float64 with freq = 'MS'
data = pd.Series(
data=np.random.randn(10),
index=pd.date_range('2022-01-01', periods=10, freq=freq)
)
model = NeuralForecastRNN("auto", max_steps=2)
model.fit(data, fh=[1, 2, 3, 4])
pred = model.predict()
# ValueError: Invalid frequency. Please select a frequency that can be converted to a regular `pd.PeriodIndex`. For other frequencies, basic arithmetic operation to compute durations currently do not work reliably.
Test cases using |
I'd say, yes - why does it not detect |
Can you check where is this error (corresponding to MS) originating from in the traceback? Is it from somewhere in sktime code or somewhere from neuralforecast code? If we use neuralforecast directly without using sktime adapter, does it work? If auto fails in test params, does it fail with the above period index error or with the error you generate? If it's the first case, I'll say it's an issue. If the second case, this is probably failing dor range index or index cases. |
I have tested all the frequencies found in the documentation here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html All the frequencies that have been working before are still working (correctly interpreted) with Some freqs raise error with and without These frequencies raise this error originating from
|
That traceback looks very different to what I am used to, just curious which OS+IDE you are using? Is it a normal traceback or with some specific debugger? Regarding your question, I am very surprised. Based on what @fkiraly said in the issue, I thought automatic frequency inference will not raise error. And even if it does, I'm not following why it'll fail in predict and not fit, because that's where you are trying to access freq attribute. Let's wait for @fkiraly to comment, and in the meanwhile can you please share me one code snippet from python console showing exactly where it falls and that full traceback? One final thing, I really appreciate your testing for all supported frequencies. But there even more and they keep changing this quite frequently in different releases. So maybe we have to reduce what all frequencies to test with. |
Oh dear, do you have This has a similar problem internally, when testing with |
|
Thanks for sharing @geetu040. So sktime is trying to do something in predict and not in fit, even though you are not even passing anything in predict. @fkiraly so it's pandas 2.1 only, this is very interesting (and unknown to me). This seems different from this PR's goal, although it's being affected. |
Do you mean, the issue is only 2.1 and above? |
No I meant just in the sense it's not 2.2 as you thought may be the reason in previous comment. I'm not sure myself if pandas version is affecting here or not. |
I see - there have been oddities around |
This one #5131? With my own office hack/validation to ensure exclusive use of range index, I didn't even remember I created this. |
yes, exactly, #5131. Your hack makes me think, perhaps that's exactly what we should do internally, in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added few more comments. One more thing to note is that #6047 is now merged, so it may be better if you pull the latest main
branch and add similar changes to LSTM one as you did for RNN one. After that, you can also consider to parametrise your last test over different classes.
After you do these, let us know and let's run CI. If all tests pass, we can merge it soon. But probably it'll fail few due to freq things, so may be @fkiraly can suggest what's the next course of action here.
Thanks for a detailed Review. I am on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re tests: could you kindly make sure that a test case with freq="auto"
is added to get_test_params
? I suppose this ought to be added both on RNN and LSTM.
I tried and discussed it here: #6039 (comment)
and this
@fkiraly what are your thoughts? |
Does the suggestion of passing Maybe to clarify, my suggestion was that the adapter should do that, not the user. |
Before applying this, I wanted to see the results with other index options supported by pandas. So, I've tried all the indexes that pandas support. Here is the list: https://pandas.pydata.org/docs/reference/api/pandas.Index.html
Using them will raise this Exception and this Exception is not limited to just y = pd.Series(
data=range(10),
index=pd.CategoricalIndex(np.random.randint(0, 3, size=10))
)
NaiveForecaster().fit(y, fh=[1, 2, 3])
# TypeError: Unsupported input data type in NaiveForecaster, input y must be in an sktime compatible format. Allowed scitypes for y in forecasting are Series, Panel, Hierarchical, for instance a pandas.DataFrame with sktime compatible time indices, or with MultiIndex and last(-1) level an sktime compatible time index. See the forecasting tutorial examples/01_forecasting.ipynb, or the data format tutorial examples/AA_datatypes_and_datasets.ipynb See the data format tutorial examples/AA_datatypes_and_datasets.ipynb. If you think the data is already in an sktime supported input format, run sktime.datatypes.check_raise(data, mtype) to diagnose the error, where mtype is the string of the type specification you want. Error message for checked mtypes, in format [mtype: message], as follows: [pd.DataFrame: y must be a pandas.DataFrame, found <class 'pandas.core.series.Series'>] [pd.Series: <class 'pandas.core.indexes.category.CategoricalIndex'> is not supported for y, use one of (<class 'pandas.core.indexes.range.RangeIndex'>, <class 'pandas.core.indexes.period.PeriodIndex'>, <class 'pandas.core.indexes.datetimes.DatetimeIndex'>) or integer index instead.] [np.ndarray: y must be a numpy.ndarray, found <class 'pandas.core.series.Series'>] [df-list: y must be list of pd.DataFrame, found <class 'pandas.core.series.Series'>] [numpy3D: y must be a numpy.ndarray, found <class 'pandas.core.series.Series'>] [pd-multiindex: y must be a pd.DataFrame, found <class 'pandas.core.series.Series'>] [nested_univ: y must be a pd.DataFrame, found <class 'pandas.core.series.Series'>] [pd_multiindex_hier: y must be a pd.DataFrame, found <class 'pandas.core.series.Series'>]
when
Their predictions are regardless of the given freq #6039 (comment) From the above observations, I think we are safe to set freq as "D" when freq is set to "auto" and cannot be interpreted from ForecastingHorizon because the index is either if self.freq == "auto" and fh.freq is None:
# when freq cannot be interpreted from ForecastingHorizon
raise ValueError(
f"Error in {self.__class__.__name__}, "
f"could not interpret freq, "
f"try passing freq in model initialization"
)
self._freq = fh.freq if self.freq == "auto" else self.freq to self._freq = (fh.freq or "D") if (self.freq == "auto") else (self.freq) After applying the new changes locally I have run all the test cases, including test_params set to "auto" instead of "D" as discussed here:
All the test were successful. One last caveat is that during the process I realized index = pd.timedelta_range(start='1 days', periods=10)
y = pd.Series(data=range(10), index=index)
model = NeuralForecastRNN("D", max_steps=1, trainer_kwargs={"logger": False})
model.fit(y, fh=[1, 2, 3])
# ValueError: The time column ('ds') should have either timestamps or integers, got 'timedelta64[ns]'. |
Hm, I think there are three cases we need to cover, not just two:
I would have thought we treat things as follows: A passes Currntly, we always pass Or, is an informative error still raised in case B? |
@geetu040 first of all, thank you for the detailed analysis you've done for both offset aliases and index choices, this is great.
I am somehow doubtful of this statement. If you create your DatetimeIndex using
@fkiraly do you mean you want to pass D for both B and C cases, and pass indices as range index for B?
I don't have a strong opinion, but I will prefer if we don't do this. It's one more condition based on how pandas index frequencies work. Also |
@yarnabrina, typo, and my brain's error correction is not good enough to decode this |
Hm, no, that's not what I meant, but this could be a better idea! What I meant is that B should raise an informative error, but your case handling logic for B might be better.
Can you clarify what you are referring to by "this"? It seems there was slight miscommunication on our expectations what happens in case B. As said, they way how you understood it was not how I meant it afaik, but I also think your version is better. |
About this
But I'll try this in code |
I would try irregular spaced time stamps, e.g., 9:00:00, 10:00:00, 12:00:00, 13:42:42, etc. |
I meant missing dates, e.g. 1st jan, 3rd Jan, 4th Jan, 5th Jan, etc.
I meant passing "D" if user do not pass any frequency and data has range index. So this is same as your original suggestion for C case. |
I see, what would be your preference in case C then, @yarnabrina? Always raising an exception? |
Yes. Unless user passes freq or fh specifically is able to detect one, I'll prefer to raise. Basically I don't want auto case to take benefit of a specific (somewhat questionable) design choice of neuralforecast in case it backfires in future (some change in logic in neuralforecast). If you want to add that option, to make it simpler for users, another argument or force-auto type thing can be done. But of course this is just my personal opinion/preference. |
@yarnabrina, this solution might be better in isolation, but my concern is that the estimator would not satisfy the basic contract then that every forecaster runs with any |
RangeIndex is still supported, users just need to give a freq argument themselves. As I said I don't have a strong opinion. I'm already happy with this PR as it is currently, and if this is added, all of working cases will remain unmodified so I don't have any feedback. |
If we don't raise an error here, another Exception I have tried with this scenario
|
The above comment will be more understandable from this For Reference self._freq = (fh.freq or "D") if (self.freq == "auto") else (self.freq) For Reference A-B-C conditions No Error on these indexes from both codes
Behavior for both code differs here
To simplify above table Behavior for both code differs here
|
I see - so from this, are you making the case that code 1 would be preferable due to consistency? Since code 1 is current, and @yarnabrina says he is fine with current, and tests pass at current - shall we just leave it at current? I would be happy with that, too. |
There are just some changes yet to make
I have these changes locally and all the test cases run fine. I'll commit and push them soon as @yarnabrina approves |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since code 1 is current, and @yarnabrina says he is fine with current, and tests pass at current - shall we just leave it at current? I would be happy with that, too.
Approved on principle from my side.
@geetu040 I'm not sure why you need to change that code. I thought we all agreed that we'll not make a change to pass D in case of fh.freq fails with RangeIndex?
Please let me know if I misunderstood the above discussion.
Ok ok I misunderstood actually, but now I'm clear |
@fkiraly tagging in case you missed this PR before release. |
I did not miss, I simply assumed you were going to merge. |
Enhances
NeuralForecastRNN
to interpretfreq
fromForecastingHorizon
when passed as"auto"
Reference Issues/PRs
Fixes #6003.
What does this implement/fix? Explain your changes.
The
NeuralForecastRNN
constructor previously required afreq
argument, which is now proposed to default to"auto"
in which case it interpretsfreq
fromForecastingHorizon
, leveragingfh.freq
in thefit
method.What should a reviewer concentrate their feedback on?
I have run the tests with the updated estimator
freq
can now be passed like this:PR checklist
For all contributions
How to: add yourself to the all-contributors file in the
sktime
root directory (not theCONTRIBUTORS.md
). Common badges:code
- fixing a bug, or adding code logic.doc
- writing or improving documentation or docstrings.bug
- reporting or diagnosing a bug (get this pluscode
if you also fixed the bug in the PR).maintenance
- CI, test framework, release.See here for full badge reference
See here for further details on the algorithm maintainer role.