[ENH] `NeuralForecastRNN` should auto-detect `freq` #6039

geetu040 · 2024-03-01T03:44:12Z

Enhances NeuralForecastRNN to interpret freq from ForecastingHorizon when passed as "auto"

Reference Issues/PRs

What does this implement/fix? Explain your changes.

The NeuralForecastRNN constructor previously required a freq argument. This PR proposes defaulting it to "auto", in which case freq is interpreted from the ForecastingHorizon, using fh.freq in the fit method.

What should a reviewer concentrate their feedback on?

I have run the tests with the updated estimator

results = check_estimator(NeuralForecastRNN) # All tests PASSED!

freq can now be passed like this:

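from sktime.datasets import load_longley
from sktime.forecasting.neuralforecast import NeuralForecastRNN
from sktime.split import temporal_train_test_split

y, X = load_longley()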
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X, test_size=4)

model = NeuralForecastRNN(
	"auto",	# interprets to be "A-DEC"
	futr_exog_list=["ARMED", "POP"], max_steps=5)

model.fit(y_train, X=X_train, fh=[1, 2, 3, 4])

model.predict(X=X_test)
# Seed set to 1
# 1959    66241.984375
# 1960    66700.132812
# 1961    66550.195312
# 1962    67310.007812
# Freq: A-DEC, Name: TOTEMP, dtype: float64

yarnabrina

Thanks for your contribution. I've made a few comments, please take a look.

Also, can you please add a test for this case? We should test on a few standard datasets, and see what happens with bad frequencies (e.g. W or MS) or with RangeIndex.

If all passes, we should also update the get test params to use auto instead of D.

sktime/forecasting/base/adapters/_neuralforecast.py

geetu040 · 2024-03-03T18:47:12Z

I have updated the code per the requested changes, except for this one: #6039 (comment)

#6039 (review)

see what happens with bad frequencies (e.g. W or MS)

Works fine with freq="W"

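import numpy as np
import pandas as pd
from sktime.forecasting.neuralforecast import NeuralForecastRNN

freq = 'W'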
data = pd.Series(
	data=np.random.randn(10),
	index=pd.date_range('2022-01-01', periods=10, freq=freq)
)
model = NeuralForecastRNN("auto", max_steps=2)
model.fit(data, fh=[1, 2, 3, 4])
pred = model.predict()
print("Correctly Interpreted:", pred.index.freq == freq)
print(pred)
# Correctly Interpreted: True
# 2022-03-13   -0.519103
# 2022-03-20   -0.556819
# 2022-03-27   -0.517274
# 2022-04-03   -0.551678
# Freq: W-SUN, dtype: float64

With freq="MS", the same error persists before and after the changes:

freq = 'MS'
data = pd.Series(
	data=np.random.randn(10),
	index=pd.date_range('2022-01-01', periods=10, freq=freq)
)
model = NeuralForecastRNN("auto", max_steps=2)
model.fit(data, fh=[1, 2, 3, 4])
pred = model.predict()
# ValueError: Invalid frequency. Please select a frequency that can be converted to a regular `pd.PeriodIndex`. For other frequencies, basic arithmetic operation to compute durations currently do not work reliably.

#6039 (review)

If all passes, we should also update the get test params to use auto instead of D.

Test cases using check_estimator fail when auto is used instead of D. Do I need to do something about this?

fkiraly · 2024-03-03T19:14:12Z

Test cases using check_estimator fail when auto is used instead of D. Do I need to do something about this?

I'd say, yes - why does it not detect D, in the test cases? This hints at "auto" not working as intended.

yarnabrina · 2024-03-04T03:40:12Z

Can you check where this error (corresponding to MS) originates in the traceback? Is it from sktime code or from neuralforecast code? If we use neuralforecast directly, without the sktime adapter, does it work?

If auto fails in test params, does it fail with the above period index error or with the error you generate? If it's the first case, I'd say it's an issue. If it's the second, it is probably failing for the range index or plain index cases.

geetu040 · 2024-03-04T08:59:52Z

I have tested all the frequencies found in the documentation here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
and this is what I've found:

All the frequencies that worked before still work (and are correctly interpreted) with freq="auto". This suggests that there is nothing wrong with freq="auto" itself. I reached this conclusion using the same method I introduced in the test case.
However, I have yet to find out why it cannot interpret the freq in test_params when we replace D with auto.
These are the freqs I'm talking about: B C W D h min s ms us ns M Q Y W-SUN W-MON W-TUE W-WED W-THU W-FRI W-SAT

Some freqs raise an error both with and without freq="auto"; the behaviour is the same before and after the changes.
These are the freqs I am talking about: MS BMS CBMS SMS QS BQS YS BYS bh cbh

These frequencies raise the following error, originating from _predict:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/work/os/sktime/sktime/forecasting/base/_fh.py:953, in _coerce_to_period(x, freq)
    952     print("Yoooo ->", freq)
--> 953     return x.to_period(freq)
    954 except (ValueError, AttributeError) as e:

File ~/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/indexes/extension.py:98, in _inherit_from_data.<locals>.method(self, *args, **kwargs)
     97     raise ValueError(f"cannot use inplace with {type(self).__name__}")
---> 98 result = attr(self._data, *args, **kwargs)
     99 if wrap:

File ~/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/arrays/datetimes.py:1190, in DatetimeArray.to_period(self, freq)
   1188     freq = res
-> 1190 return PeriodArray._from_datetime64(self._ndarray, freq, tz=self.tz)

File ~/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/arrays/period.py:297, in PeriodArray._from_datetime64(cls, data, freq, tz)
    284 """
    285 Construct a PeriodArray from a datetime64 array
    286 
   (...)
    295 PeriodArray[freq]
    296 """
--> 297 data, freq = dt64arr_to_periodarr(data, freq, tz)
    298 return cls(data, freq=freq)

File ~/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/arrays/period.py:1032, in dt64arr_to_periodarr(data, freq, tz)
   1031 freq = Period._maybe_convert_freq(freq)
-> 1032 base = freq._period_dtype_code
   1033 return c_dt64arr_to_periodarr(data.view("i8"), base, tz, reso=reso), freq

AttributeError: 'pandas._libs.tslibs.offsets.MonthBegin' object has no attribute '_period_dtype_code'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[4], line 10
      8 # attempt train
      9 model.fit(y, fh=list(range(1, 100)))
---> 10 pred = model.predict()
     11 pred

File ~/work/os/sktime/sktime/forecasting/base/_base.py:444, in BaseForecaster.predict(self, fh, X)
    442 # we call the ordinary _predict if no looping/vectorization needed
    443 if not self._is_vectorized:
--> 444     y_pred = self._predict(fh=fh, X=X_inner)
    445 else:
    446     # otherwise we call the vectorized version of predict
    447     y_pred = self._vectorize("predict", X=X_inner, fh=fh)

File ~/work/os/sktime/sktime/forecasting/base/adapters/_neuralforecast.py:301, in _NeuralForecastAdapter._predict(***failed resolving arguments***)
    297     raise NotImplementedError("Multiple prediction columns are not supported.")
    299 model_point_predictions = model_forecasts[prediction_column_names[0]].to_numpy()
--> 301 absolute_horizons = self.fh.to_absolute_index(self.cutoff)
    302 horizon_positions = self.fh.to_indexer(self.cutoff)
    304 final_predictions = pandas.Series(
    305     model_point_predictions[horizon_positions],
    306     index=absolute_horizons,
    307     name=self._y.name,
    308 )

File ~/work/os/sktime/sktime/forecasting/base/_fh.py:512, in ForecastingHorizon.to_absolute_index(self, cutoff)
    492 """Return absolute values of the horizon as a pandas.Index.
    493 
    494 For a forecaster `f` that has `fh` being `self`,
   (...)
    509     Absolute representation of forecasting horizon.
    510 """
    511 cutoff = self._coerce_cutoff_to_index(cutoff)
--> 512 fh_abs = _to_absolute(fh=self, cutoff=_HashIndex(cutoff))
    513 return fh_abs.to_pandas()

File ~/work/os/sktime/sktime/forecasting/base/_fh.py:885, in _to_absolute(fh, cutoff)
    880     old_tz = None
    882 if is_timestamp:
    883     # coerce to pd.Period for reliable arithmetic operations and
    884     # computations of time deltas
--> 885     cutoff = _coerce_to_period(cutoff, freq=fh.freq)
    887 if isinstance(cutoff, pd.Index):
    888     cutoff = cutoff[[0] * len(relative)]

File ~/work/os/sktime/sktime/forecasting/base/_fh.py:957, in _coerce_to_period(x, freq)
    955 msg = str(e)
    956 if "Invalid frequency" in msg or "_period_dtype_code" in msg:
--> 957     raise ValueError(
    958         "Invalid frequency. Please select a frequency that can "
    959         "be converted to a regular `pd.PeriodIndex`. For other "
    960         "frequencies, basic arithmetic operation to compute "
    961         "durations currently do not work reliably."
    962     )
    963 else:
    964     raise

ValueError: Invalid frequency. Please select a frequency that can be converted to a regular `pd.PeriodIndex`. For other frequencies, basic arithmetic operation to compute durations currently do not work reliably.
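
The innermost failing call in this traceback is `x.to_period(freq)`, so it may be possible to probe individual frequency aliases with pandas alone, without fitting a forecaster. The following is a minimal sketch under that assumption. `period_convertible` is a hypothetical helper, not part of sktime, and it catches the same two exception types as the `except` clause visible in `_coerce_to_period`.

```python
import pandas as pd


def period_convertible(freq: str, periods: int = 3) -> bool:
    """Return True if a DatetimeIndex with this freq converts to a PeriodIndex.

    Hypothetical helper for illustration only; it calls the pandas method
    that appears in the traceback (``to_period``).
    """
    index = pd.date_range("2022-01-01", periods=periods, freq=freq)
    try:
        # Use the index's own offset object as the target period frequency.
        index.to_period(index.freq)
    except (ValueError, AttributeError):
        return False
    return True


print(period_convertible("D"))
print(period_convertible("W"))
```

In the environment that produced the traceback, `period_convertible("MS")` would be expected to return `False`; results for other aliases may differ between pandas versions.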

yarnabrina · 2024-03-04T10:30:43Z

That traceback looks very different from what I am used to. Just curious, which OS and IDE are you using? Is it a normal traceback or from some specific debugger?

Regarding your question, I am very surprised. Based on what @fkiraly said in the issue, I thought automatic frequency inference would not raise an error. And even if it does, I don't follow why it would fail in predict and not in fit, because fit is where you are trying to access the freq attribute. Let's wait for @fkiraly to comment; in the meanwhile, can you please share a code snippet from the Python console showing exactly where it fails, with the full traceback?

One final thing: I really appreciate your testing all supported frequencies. But there are even more, and they change quite frequently between releases. So maybe we have to reduce the set of frequencies we test with.

fkiraly · 2024-03-04T10:33:28Z

Oh dear, do you have pandas 2.2.X installed perchance?
Is this perhaps related to the recent changes in freq handling and inference in pandas, see #6057 and #5841? FYI @MCRE-BE

This has a similar problem internally, when testing with freq="M". Internally, this converts to "MS" and then DatetimeIndex.to_period complains - only at 2.2.0 or higher.

geetu040 · 2024-03-04T11:16:10Z

Ubuntu 22.04.4 LTS & VS Code Terminal
Here is the code snippet with full traceback

>>> import pandas as pd
>>> from sktime.forecasting.neuralforecast import NeuralForecastRNN
>>> pd.__version__
'2.1.4'
>>> y = pd.Series(data=range(10), index=pd.date_range(start='2022-01-01', periods=10, freq='MS'))
>>> model = NeuralForecastRNN('MS', max_steps=1)
>>> model.fit(y, fh=[1, 2, 3, 4, 5])
Seed set to 1
Epoch 0: 100%|█████████████████████████████████████| 1/1 [00:00<00:00, 27.92it/s, v_num=1, train_loss_step=1.160, train_loss_epoch=1.160]
NeuralForecastRNN(freq='MS', max_steps=1)                                                                                                
>>> model.predict()
Predicting DataLoader 0: 100%|████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 268.21it/s]
Traceback (most recent call last):
  File "/home/geetu/work/os/sktime/sktime/forecasting/base/_fh.py", line 952, in _coerce_to_period
    return x.to_period(freq)
           ^^^^^^^^^^^^^^^^^
  File "/home/geetu/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/indexes/extension.py", line 95, in method
    result = attr(self._data, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geetu/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/arrays/datetimes.py", line 1224, in to_period
    return PeriodArray._from_datetime64(self._ndarray, freq, tz=self.tz)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geetu/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/arrays/period.py", line 322, in _from_datetime64
    data, freq = dt64arr_to_periodarr(data, freq, tz)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geetu/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/arrays/period.py", line 1167, in dt64arr_to_periodarr
    base = freq._period_dtype_code
           ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'pandas._libs.tslibs.offsets.MonthBegin' object has no attribute '_period_dtype_code'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/geetu/work/os/sktime/sktime/forecasting/base/_base.py", line 444, in predict
    y_pred = self._predict(fh=fh, X=X_inner)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geetu/work/os/sktime/sktime/forecasting/base/adapters/_neuralforecast.py", line 281, in _predict
    absolute_horizons = self.fh.to_absolute_index(self.cutoff)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geetu/work/os/sktime/sktime/forecasting/base/_fh.py", line 512, in to_absolute_index
    fh_abs = _to_absolute(fh=self, cutoff=_HashIndex(cutoff))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geetu/work/os/sktime/sktime/forecasting/base/_fh.py", line 885, in _to_absolute
    cutoff = _coerce_to_period(cutoff, freq=fh.freq)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geetu/work/os/sktime/sktime/forecasting/base/_fh.py", line 956, in _coerce_to_period
    raise ValueError(
ValueError: Invalid frequency. Please select a frequency that can be converted to a regular `pd.PeriodIndex`. For other frequencies, basic arithmetic operation to compute durations currently do not work reliably.
>>>

yarnabrina · 2024-03-04T13:32:38Z

Thanks for sharing @geetu040. So sktime is trying to do something in predict and not in fit, even though you are not even passing anything in predict.

@fkiraly so it's pandas 2.1 only, this is very interesting (and unknown to me). This seems different from this PR's goal, although it's being affected.

fkiraly · 2024-03-04T13:34:27Z

@fkiraly so it's pandas 2.1 only

Do you mean, the issue is only 2.1 and above?

yarnabrina · 2024-03-04T13:35:49Z

No, I meant only that it's not 2.2, which you suggested might be the reason in your previous comment. I'm not sure myself whether the pandas version has an effect here or not.

fkiraly · 2024-03-04T14:38:31Z

I see - there have been oddities around M frequency for a while. Do you remember the other issue? I thought it might have even been opened by you.

yarnabrina · 2024-03-04T15:04:14Z

This one #5131?

With my own office hack/validation to ensure exclusive use of range index, I didn't even remember I created this.

fkiraly · 2024-03-04T15:09:51Z

yes, exactly, #5131. Your hack makes me think, perhaps that's exactly what we should do internally, in ForecastingHorizon. There are too many issues and too frequent changes on how pandas handles frequencies, so internally we should not rely on it too much, imo.

yarnabrina

Added a few more comments. One more thing to note is that #6047 is now merged, so it may be better if you pull the latest main branch and make the same changes to the LSTM one as you did for the RNN one. After that, you can also consider parametrising your last test over the different classes.

After you do these, let us know and let's run CI. If all tests pass, we can merge it soon. But a few will probably fail due to freq things, so maybe @fkiraly can suggest the next course of action here.

sktime/forecasting/base/adapters/_neuralforecast.py

sktime/forecasting/tests/test_neuralforecast.py

geetu040 · 2024-03-05T03:22:08Z

Thanks for the detailed review. I am on it.

fkiraly

Re tests: could you kindly make sure that a test case with freq="auto" is added to get_test_params? I suppose this ought to be added both on RNN and LSTM.

geetu040 · 2024-03-06T19:33:40Z

Re tests: could you kindly make sure that a test case with freq="auto" is added to get_test_params? I suppose this ought to be added both on RNN and LSTM.

I tried and discussed it here: #6039 (comment)

We still cannot replace D with auto in freq for get_test_params, because the tests then fail due to the Exception I raise here

and this

So my thought is that we should work on the test cases that are failing because of this newly introduced Exception. Maybe we should alter them to expect this Exception when ForecastingHorizon cannot interpret the freq because of invalid indexing.

@fkiraly what are your thoughts?

fkiraly · 2024-03-06T19:51:05Z

@fkiraly what are your thoughts?

Does the suggestion of passing D if a RangeIndex is encountered not work? See here: #6039 (comment)

Maybe to clarify, my suggestion was that the adapter should do that, not the user.

geetu040 · 2024-03-07T11:06:37Z

So once we tested and in case we find that freq is irrelevant for RangeIndex, can't we do as follows: if index is RangeIndex, just pass "D" so neuralforecast is happy?

Before applying this, I wanted to see the results with the other index options supported by pandas. So, I've tried all the index types that pandas supports. Here is the list: https://pandas.pydata.org/docs/reference/api/pandas.Index.html

Index
DatetimeIndex
PeriodIndex
RangeIndex
CategoricalIndex
IntervalIndex
MultiIndex
TimedeltaIndex

sktime does not support series with these indexes:

CategoricalIndex
IntervalIndex
MultiIndex

Using them raises the Exception below. It is not specific to neuralforecast; it is raised by all forecasters.

import numpy as np
import pandas as pd
from sktime.forecasting.naive import NaiveForecaster

y = pd.Series(
	data=range(10),
	index=pd.CategoricalIndex(np.random.randint(0, 3, size=10))
)
NaiveForecaster().fit(y, fh=[1, 2, 3])
# TypeError: Unsupported input data type in NaiveForecaster, input y must be in an sktime compatible format. Allowed scitypes for y in forecasting are Series, Panel, Hierarchical, for instance a pandas.DataFrame with sktime compatible time indices, or with MultiIndex and last(-1) level an sktime compatible time index. See the forecasting tutorial examples/01_forecasting.ipynb, or the data format tutorial examples/AA_datatypes_and_datasets.ipynb See the data format tutorial examples/AA_datatypes_and_datasets.ipynb. If you think the data is already in an sktime supported input format, run sktime.datatypes.check_raise(data, mtype) to diagnose the error, where mtype is the string of the type specification you want. Error message for checked mtypes, in format [mtype: message], as follows: [pd.DataFrame: y must be a pandas.DataFrame, found <class 'pandas.core.series.Series'>]  [pd.Series: <class 'pandas.core.indexes.category.CategoricalIndex'> is not supported for y, use one of (<class 'pandas.core.indexes.range.RangeIndex'>, <class 'pandas.core.indexes.period.PeriodIndex'>, <class 'pandas.core.indexes.datetimes.DatetimeIndex'>) or integer index instead.]  [np.ndarray: y must be a numpy.ndarray, found <class 'pandas.core.series.Series'>]  [df-list: y must be list of pd.DataFrame, found <class 'pandas.core.series.Series'>]  [numpy3D: y must be a numpy.ndarray, found <class 'pandas.core.series.Series'>]  [pd-multiindex: y must be a pd.DataFrame, found <class 'pandas.core.series.Series'>]  [nested_univ: y must be a pd.DataFrame, found <class 'pandas.core.series.Series'>]  [pd_multiindex_hier: y must be a pd.DataFrame, found <class 'pandas.core.series.Series'>]

sktime supports series with these indexes:

DatetimeIndex
PeriodIndex
RangeIndex
Index

when freq="auto"

DatetimeIndex and PeriodIndex work as usual as their freq is interpreted
For RangeIndex and Index, freq plays no role

Their predictions are the same regardless of the given freq: #6039 (comment)
but they do raise an Exception when freq is set to "auto". I will now change this to pass "D" as freq instead of raising an Exception, as suggested here: #6039 (comment)

From the above observations, I think it is safe to set freq to "D" when freq is "auto" and cannot be interpreted from ForecastingHorizon, because in that case the index is either RangeIndex or Index, where an arbitrary freq works fine.
For this purpose, I'll have to change the code from

if self.freq == "auto" and fh.freq is None:
	# when freq cannot be interpreted from ForecastingHorizon
	raise ValueError(
		f"Error in {self.__class__.__name__}, "
		f"could not interpret freq, "
		f"try passing freq in model initialization"
	)

self._freq = fh.freq if self.freq == "auto" else self.freq

to

self._freq = (fh.freq or "D") if (self.freq == "auto") else (self.freq)
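
The one-line form above can be read as the following standalone function. This is only a sketch for discussion: `resolve_freq` and its parameter names are hypothetical, and plain strings stand in for whatever `fh.freq` actually returns.

```python
from typing import Optional


def resolve_freq(user_freq: str, fh_freq: Optional[str]) -> str:
    """Sketch of the proposed fallback; all names here are hypothetical.

    - an explicit user value always wins
    - "auto" uses the horizon's freq when one is available
    - "auto" falls back to "D" when the horizon has no freq
    """
    if user_freq != "auto":
        return user_freq
    return fh_freq or "D"


print(resolve_freq("auto", "W"))   # W
print(resolve_freq("auto", None))  # D
print(resolve_freq("MS", "W"))     # MS
```

Note that the fallback to "D" is taken whenever no freq is available, regardless of why it is missing.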

After applying the new changes locally I have run all the test cases, including test_params set to "auto" instead of "D" as discussed here:

Re tests: could you kindly make sure that a test case with freq="auto" is added to get_test_params? I suppose this ought to be added both on RNN and LSTM.

All the test were successful.
If everything looks fine, I'll commit my changes and push. Also I think I should remove test_neural_forecast_fail_with_auto_freq_on_range_index as it will not be relevant anymore and we have freq="auto" being tested in other test cases and get_test_params

One last caveat: during the process I realized that TimedeltaIndex behaves differently. I am not sure whether it is supposed to behave like this or whether something is wrong. If there is a problem, let me know if I should investigate it further. In any case, I am pretty sure it is unrelated to this PR.
Here is the code to reproduce it:

import pandas as pd
from sktime.forecasting.neuralforecast import NeuralForecastRNN

# a series indexed by a TimedeltaIndex with daily steps
index = pd.timedelta_range(start='1 days', periods=10)
y = pd.Series(data=range(10), index=index)

model = NeuralForecastRNN("D", max_steps=1, trainer_kwargs={"logger": False})
model.fit(y, fh=[1, 2, 3])
# ValueError: The time column ('ds') should have either timestamps or integers, got 'timedelta64[ns]'.

fkiraly · 2024-03-07T11:50:09Z

Hm, I think there are three cases we need to cover, not just two:

A. the index is time-like (period, timedelta) and has an inferrable freq
B. the index is time-like and does not have inferrable freq
C. the index is int-like, e.g., RangeIndex

I would have thought we treat things as follows: A passes freq as inferred from fh; C passes "D"; B raises the error.
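
The three-case treatment proposed above can be sketched as a small standalone function. This is only an illustration of the policy under discussion, not the adapter's actual code: the function name resolve_freq, the index_kind labels, and the fh_freq argument are hypothetical, and the index kind is passed as a plain string so the sketch stays independent of pandas.

```python
from typing import Optional

# Hypothetical labels for the kind of index; these are not sktime names.
TIME_LIKE = {"datetime", "period", "timedelta"}
INT_LIKE = {"range", "int"}


def resolve_freq(freq: str, index_kind: str, fh_freq: Optional[str]) -> str:
    """Pick the freq to hand to the wrapped model (illustrative only)."""
    if freq != "auto":
        # An explicitly passed freq is always used as-is.
        return freq
    if fh_freq is not None:
        # Case A: a freq could be inferred from the forecasting horizon.
        return fh_freq
    if index_kind in INT_LIKE:
        # Case C: integer-like index, so a placeholder freq is passed.
        return "D"
    if index_kind in TIME_LIKE:
        # Case B: time-like index without an inferrable freq.
        raise ValueError(
            "could not interpret freq, try passing freq in model initialization"
        )
    raise ValueError(f"unknown index kind: {index_kind!r}")
```

For example, resolve_freq("auto", "range", None) returns "D", while resolve_freq("auto", "datetime", None) raises the ValueError of case B.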

Currently, we always pass "D" if we cannot infer it.

Or, is an informative error still raised in case B?

yarnabrina · 2024-03-07T14:01:50Z

@geetu040 first of all, thank you for the detailed analysis you've done for both offset aliases and index choices, this is great.

DatetimeIndex and PeriodIndex work as usual as their freq is interpreted

I am somehow doubtful of this statement. If you create your DatetimeIndex using pandas.date_range, of course it's fine. But if you have a list of dates or strings and then use pandas.to_datetime, does it work always, regardless of ~~missifates~~ missing dates or gap between dates? I think this may be same as option B of @fkiraly above.

Currently, we always pass "D" if we cannot infer it.

@fkiraly do you mean you want to pass D for both B and C cases, and pass indices as range index for B?

Maybe to clarify, my suggestion was that the adapter should do that, not the user.

I don't have a strong opinion, but I will prefer if we don't do this. It's one more condition based on how pandas index frequencies work. Also neuralforecast may change their behaviour in future as you said it's an odd design choice. Or, if you want to have it, we can try to have a "force-auto" option where we pass "D" always for B+C cases you told above.

fkiraly · 2024-03-07T14:12:01Z

missifates

@yarnabrina, typo, and my brain's error correction is not good enough to decode this

fkiraly · 2024-03-07T14:13:02Z

@fkiraly do you mean you want to pass D for both B and C cases, and pass indices as range index for B?

Hm, no, that's not what I meant, but this could be a better idea!

What I meant is that B should raise an informative error, but your case handling logic for B might be better.

I don't have a strong opinion, but I will prefer if we don't do this.

Can you clarify what you are referring to by "this"? It seems there was slight miscommunication on our expectations what happens in case B.

As said, the way you understood it was not how I meant it, afaik, but I also think your version is better.

geetu040 · 2024-03-07T14:17:09Z

About case B, I am unable to create such an example.

If you create your DatetimeIndex using pandas.date_range, of course it's fine. But if you have a list of dates or strings and then use pandas.to_datetime, does it work always, regardless of missifates or gap between dates?

But I'll try this in code

fkiraly · 2024-03-07T14:28:28Z

I am unable to create such an example.

I would try irregular spaced time stamps, e.g., 9:00:00, 10:00:00, 12:00:00, 13:42:42, etc.
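
A minimal sketch of such an index, assuming pandas is available; the date 2024-01-01 is an arbitrary choice for illustration. It shows that an index built with pandas.to_datetime from irregularly spaced time stamps carries no freq, and that pandas.infer_freq cannot recover one either, in contrast to an index built with pandas.date_range.

```python
import pandas as pd

# Irregularly spaced time stamps, as in the suggestion above (the date is arbitrary).
stamps = [
    "2024-01-01 09:00:00",
    "2024-01-01 10:00:00",
    "2024-01-01 12:00:00",
    "2024-01-01 13:42:42",
]
irregular = pd.to_datetime(stamps)

# A regular daily index for contrast.
regular = pd.date_range("2024-01-01", periods=4, freq="D")

print(irregular.freq)             # None: no freq is attached to the index
print(pd.infer_freq(irregular))   # None: no regular spacing can be inferred
print(regular.freq is not None)   # True: date_range attaches a freq
```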

yarnabrina · 2024-03-07T14:34:28Z

missifates

@yarnabrina, typo, and my brain's error correction is not good enough to decode this

I meant missing dates, e.g. 1st jan, 3rd Jan, 4th Jan, 5th Jan, etc.

I don't have a strong opinion, but I will prefer if we don't do this.

Can you clarify what you are referring to by "this"?

I meant passing "D" if the user does not pass any frequency and the data has a range index. So this is the same as your original suggestion for case C.

fkiraly · 2024-03-07T15:33:02Z

I meant passing "D" if the user does not pass any frequency and the data has a range index. So this is the same as your original suggestion for case C.

I see, what would be your preference in case C then, @yarnabrina? Always raising an exception?

yarnabrina · 2024-03-07T15:49:04Z

Yes. Unless the user passes freq, or fh is able to detect one, I would prefer to raise. Basically, I don't want the "auto" case to rely on a specific (somewhat questionable) design choice of neuralforecast, in case it backfires in the future (some change in logic in neuralforecast).

If you want to add that option, to make it simpler for users, another argument or force-auto type thing can be done. But of course this is just my personal opinion/preference.

fkiraly · 2024-03-07T15:56:09Z

@yarnabrina, this solution might be better in isolation, but my concern is that the estimator would then not satisfy the basic contract that every forecaster runs with any RangeIndex series. How do we deal with that?

yarnabrina · 2024-03-07T17:36:05Z

RangeIndex is still supported, users just need to give a freq argument themselves.

As I said, I don't have a strong opinion. I'm already happy with this PR as it is currently, and if this is added, all of the working cases will remain unmodified, so I don't have any further feedback.

geetu040 · 2024-03-07T19:10:07Z

B. the index is time-like and does not have inferrable freq

If we don't raise an error here, another Exception, "You must pass a freq argument as current index has none.", will be raised anyway.

I have tried this scenario:

If you create your DatetimeIndex using pandas.date_range, of course it's fine. But if you have a list of dates or strings and then use pandas.to_datetime, does it work always, regardless of ~~missifates~~ missing dates or gap between dates? I think this may be same as option B of @fkiraly above.

geetu040 · 2024-03-07T19:25:51Z

The above comment will be more understandable from this

For Reference Code 1

self._freq = (fh.freq or "D") if (self.freq == "auto") else (self.freq)

For Reference Code 2

A-B-C conditions

Neither code raises an error on these indexes:

DatetimeIndex
PeriodIndex
PeriodIndex (Missing Days)
RangeIndex
RangeIndex (Missing Days)
Index
Index (Missing Days)

The behavior of the two codes differs here: DatetimeIndex (Missing Days)

Model	Exception as result of Code 1	Exception as result of Code 2
NeuralForecastRNN("auto")	(predict) You must pass a freq argument as current index has none.	(fit) Error in NeuralForecastRNN, could not interpret freq, try passing freq in model initialization
NeuralForecastRNN("D")	(predict) You must pass a freq argument as current index has none.	(predict) You must pass a freq argument as current index has none.
NaiveForecaster()	(predict) You must pass a freq argument as current index has none.	(predict) You must pass a freq argument as current index has none.

To simplify above table

The behavior of the two codes differs here: DatetimeIndex (Missing Days)

NeuralForecastRNN("D") and NaiveForecaster() (for reference) raise this Exception on predict(): You must pass a freq argument as current index has none.
NeuralForecastRNN("auto")
- Code 1 raises this Exception on predict(): You must pass a freq argument as current index has none.
- Code 2 raises this Exception on fit(): Error in NeuralForecastRNN, could not interpret freq, try passing freq in model initialization

fkiraly · 2024-03-07T22:52:09Z

I see - so from this, are you making the case that code 1 would be preferable due to consistency?

Since code 1 is current, and @yarnabrina says he is fine with current, and tests pass at current - shall we just leave it at current? I would be happy with that, too.

geetu040 · 2024-03-08T04:17:22Z

There are just a few changes left to make

Update Code

I'll have to change the code from

if self.freq == "auto" and fh.freq is None:
	# when freq cannot be interpreted from ForecastingHorizon
	raise ValueError(
		f"Error in {self.__class__.__name__}, "
		f"could not interpret freq, "
		f"try passing freq in model initialization"
	)

self._freq = fh.freq if self.freq == "auto" else self.freq

to

self._freq = (fh.freq or "D") if (self.freq == "auto") else (self.freq)

Remove test case for the ValueError Exception
update get_test_params to have freq="auto" at one place instead of freq="D"

I have these changes locally and all the test cases run fine. I'll commit and push them as soon as @yarnabrina approves.

yarnabrina

Since code 1 is current, and @yarnabrina says he is fine with current, and tests pass at current - shall we just leave it at current? I would be happy with that, too.

Approved in principle from my side.

@geetu040 I'm not sure why you need to change that code. I thought we all agreed that we'll not make a change to pass D in case fh.freq fails with RangeIndex?

Please let me know if I misunderstood the above discussion.

geetu040 · 2024-03-08T15:01:33Z

@geetu040 I'm not sure why you need to change that code. I thought we all agreed that we'll not make a change to pass D in case fh.freq fails with RangeIndex?

Ok ok I misunderstood actually, but now I'm clear

yarnabrina · 2024-03-20T15:02:58Z

@fkiraly tagging in case you missed this PR before release.

fkiraly · 2024-03-20T16:37:10Z

I did not miss, I simply assumed you were going to merge.

[ENH] NeuralForecastRNN should auto-detect freq (sktime#6003)

ad840de

Enhances `NeuralForecastRNN` to interpret `freq` from `ForecastingHorizon` when passed as `"auto"`

geetu040 requested review from achieveordie, benHeid, fkiraly and yarnabrina as code owners March 1, 2024 03:44

fkiraly added the module:forecasting and enhancement labels Mar 1, 2024

yarnabrina requested changes Mar 1, 2024

yarnabrina mentioned this pull request Mar 2, 2024

[ENH] neuralforecast based LSTM model #6047

Merged

geetu040 added 2 commits March 3, 2024 23:40

minor fixes amid requested changes

96b1124

added test cases

46ecb35

geetu040 added 2 commits March 4, 2024 10:42

init attribute fix and document ValueError on freq

0f4559e

add test case to check all supported freqs

76863d3

yarnabrina requested changes Mar 4, 2024

geetu040 added 2 commits March 5, 2024 09:28

Merge branch 'main' into auto-detect-freq

bc4dc75

minor fixes and enhances upon review and mimic LSTM with RNN

413736e

fkiraly requested changes Mar 6, 2024

geetu040 mentioned this pull request Mar 8, 2024

Add a test_mstl module checking if transform returns desired components #6084

Merged

yarnabrina approved these changes Mar 8, 2024

geetu040 requested a review from fkiraly March 8, 2024 17:57

tunglinwood mentioned this pull request Mar 18, 2024

[BUG]NaiveForecaster.fit(y) and NaiveForecaster.predict(fh) can work separately but NaiveForecaster.fit_predict(y, fh) raises TypeError #6157

Open

fkiraly approved these changes Mar 20, 2024

fkiraly merged commit d19fda1 into sktime:main Mar 20, 2024
54 checks passed

yarnabrina mentioned this pull request Mar 28, 2024

[ENH] Add neuralforecast auto adapter and an example with AutoTFT #6124

Draft

geetu040 mentioned this pull request Mar 31, 2024

[ENH] Update doc and behavior of freq="auto" in neuralforecast #6237

Merged
