[ENH] NeuralForecastRNN should auto-detect freq #6039

Merged
merged 8 commits into from Mar 20, 2024

Conversation

geetu040
Contributor

@geetu040 geetu040 commented Mar 1, 2024

Enhances NeuralForecastRNN to interpret freq from ForecastingHorizon when passed as "auto"

Reference Issues/PRs

Fixes #6003.

What does this implement/fix? Explain your changes.

The NeuralForecastRNN constructor previously required a freq argument. This PR proposes that freq default to "auto", in which case it is inferred from the ForecastingHorizon, using fh.freq in the fit method.

What should a reviewer concentrate their feedback on?

I have run the tests with the updated estimator

results = check_estimator(NeuralForecastRNN) # All tests PASSED!

freq can now be passed like this:

y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X, test_size=4)

model = NeuralForecastRNN(
	"auto",	# interprets to be "A-DEC"
	futr_exog_list=["ARMED", "POP"], max_steps=5)

model.fit(y_train, X=X_train, fh=[1, 2, 3, 4])

model.predict(X=X_test)
# Seed set to 1
# 1959    66241.984375
# 1960    66700.132812
# 1961    66550.195312
# 1962    67310.007812
# Freq: A-DEC, Name: TOTEMP, dtype: float64

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the sktime root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • Optionally, I've added myself and possibly others to the CODEOWNERS file - do this if you want to become the owner or maintainer of an estimator you added.
    See here for further details on the algorithm maintainer role.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.

Enhances `NeuralForecastRNN` to interpret `freq` from `ForecastingHorizon` when passed as `"auto"`
@fkiraly fkiraly added the module:forecasting and enhancement labels on Mar 1, 2024
Collaborator

@yarnabrina yarnabrina left a comment

Thanks for your contribution. I've made a few comments, please take a look.

Also, can you please add a test for this case? We should test on a few standard datasets, and see what happens with bad frequencies (e.g. W or MS) or with RangeIndex.

If all passes, we should also update the get test params to use auto instead of D.

@geetu040
Contributor Author

geetu040 commented Mar 3, 2024

I have updated the code as per the requested changes, except for this one: #6039 (comment)


#6039 (review)

see what happens with bad frequencies (e.g. W or MS)

Works fine with freq="W"

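import numpy as np
import pandas as pd
from sktime.forecasting.neuralforecast import NeuralForecastRNN

freq = 'W'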
data = pd.Series(
	data=np.random.randn(10),
	index=pd.date_range('2022-01-01', periods=10, freq=freq)
)
model = NeuralForecastRNN("auto", max_steps=2)
model.fit(data, fh=[1, 2, 3, 4])
pred = model.predict()
print("Correctly Interpreted:", pred.index.freq == freq)
print(pred)
# Correctly Interpreted: True
# 2022-03-13   -0.519103
# 2022-03-20   -0.556819
# 2022-03-27   -0.517274
# 2022-04-03   -0.551678
# Freq: W-SUN, dtype: float64

With freq="MS", the same error occurs both before and after the changes:

freq = 'MS'
data = pd.Series(
	data=np.random.randn(10),
	index=pd.date_range('2022-01-01', periods=10, freq=freq)
)
model = NeuralForecastRNN("auto", max_steps=2)
model.fit(data, fh=[1, 2, 3, 4])
pred = model.predict()
# ValueError: Invalid frequency. Please select a frequency that can be converted to a regular `pd.PeriodIndex`. For other frequencies, basic arithmetic operation to compute durations currently do not work reliably.

#6039 (review)

If all passes, we should also update the get test params to use auto instead of D.

Test cases using check_estimator fail when auto is used instead of D. Do I need to do something about this?

@fkiraly
Collaborator

fkiraly commented Mar 3, 2024

Test cases using check_estimator fail when auto is used instead of D. Do I need to do something about this?

I'd say, yes - why does it not detect D, in the test cases? This hints at "auto" not working as intended.

@yarnabrina
Collaborator

Can you check where this error (corresponding to MS) originates in the traceback? Is it from somewhere in sktime code or from neuralforecast code? If we use neuralforecast directly, without the sktime adapter, does it work?

If auto fails in test params, does it fail with the above period index error or with the error you generate? In the first case, I'd say it's an issue. In the second case, it is probably failing for range index or index cases.
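
One rough way to tell the two cases apart when going through failing tests is to look at the exception message. The sketch below assumes the two message fragments quoted in this thread ("Invalid frequency" from the period coercion in sktime, and "could not interpret freq" from the check added in this PR). `classify_freq_error` is a hypothetical helper for illustration, not part of sktime.

```python
def classify_freq_error(exc):
    """Guess where a frequency-related ValueError came from.

    Relies only on message fragments quoted in this thread, so treat the
    result as a hint rather than a definitive diagnosis.
    """
    msg = str(exc)
    if "Invalid frequency" in msg:
        # raised when a DatetimeIndex cannot be converted to a PeriodIndex
        return "period-index coercion"
    if "could not interpret freq" in msg:
        # raised by the adapter when freq="auto" and fh.freq is None
        return "adapter auto-freq check"
    return "unknown"


if __name__ == "__main__":
    print(classify_freq_error(ValueError("Invalid frequency. Please select ...")))
    print(classify_freq_error(ValueError("could not interpret freq")))
```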

@geetu040
Contributor Author

geetu040 commented Mar 4, 2024

I have tested all the frequencies found in the documentation here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
and this is what I've found


All the frequencies that worked before still work (and are correctly interpreted) with freq="auto". This suggests that there is nothing wrong with freq="auto" itself. I reached this conclusion using the same method I introduced in the test case.
However, I have yet to find out why it cannot interpret the freq in test_params when we replace D with auto.
For reference, these are the freqs I'm talking about: B C W D h min s ms us ns M Q Y W-SUN W-MON W-TUE W-WED W-THU W-FRI W-SAT


Some freqs raise an error both with and without freq="auto", and the behaviour is the same before and after the changes.
These are the freqs I am talking about: MS BMS CBMS SMS QS BQS YS BYS bh cbh

These frequencies raise the following error, originating from _predict:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/work/os/sktime/sktime/forecasting/base/_fh.py:953, in _coerce_to_period(x, freq)
    952     print("Yoooo ->", freq)
--> 953     return x.to_period(freq)
    954 except (ValueError, AttributeError) as e:

File ~/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/indexes/extension.py:98, in _inherit_from_data.<locals>.method(self, *args, **kwargs)
     97     raise ValueError(f"cannot use inplace with {type(self).__name__}")
---> 98 result = attr(self._data, *args, **kwargs)
     99 if wrap:

File ~/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/arrays/datetimes.py:1190, in DatetimeArray.to_period(self, freq)
   1188     freq = res
-> 1190 return PeriodArray._from_datetime64(self._ndarray, freq, tz=self.tz)

File ~/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/arrays/period.py:297, in PeriodArray._from_datetime64(cls, data, freq, tz)
    284 """
    285 Construct a PeriodArray from a datetime64 array
    286 
   (...)
    295 PeriodArray[freq]
    296 """
--> 297 data, freq = dt64arr_to_periodarr(data, freq, tz)
    298 return cls(data, freq=freq)

File ~/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/arrays/period.py:1032, in dt64arr_to_periodarr(data, freq, tz)
   1031 freq = Period._maybe_convert_freq(freq)
-> 1032 base = freq._period_dtype_code
   1033 return c_dt64arr_to_periodarr(data.view("i8"), base, tz, reso=reso), freq

AttributeError: 'pandas._libs.tslibs.offsets.MonthBegin' object has no attribute '_period_dtype_code'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[4], line 10
      8 # attempt train
      9 model.fit(y, fh=list(range(1, 100)))
---> 10 pred = model.predict()
     11 pred

File ~/work/os/sktime/sktime/forecasting/base/_base.py:444, in BaseForecaster.predict(self, fh, X)
    442 # we call the ordinary _predict if no looping/vectorization needed
    443 if not self._is_vectorized:
--> 444     y_pred = self._predict(fh=fh, X=X_inner)
    445 else:
    446     # otherwise we call the vectorized version of predict
    447     y_pred = self._vectorize("predict", X=X_inner, fh=fh)

File ~/work/os/sktime/sktime/forecasting/base/adapters/_neuralforecast.py:301, in _NeuralForecastAdapter._predict(***failed resolving arguments***)
    297     raise NotImplementedError("Multiple prediction columns are not supported.")
    299 model_point_predictions = model_forecasts[prediction_column_names[0]].to_numpy()
--> 301 absolute_horizons = self.fh.to_absolute_index(self.cutoff)
    302 horizon_positions = self.fh.to_indexer(self.cutoff)
    304 final_predictions = pandas.Series(
    305     model_point_predictions[horizon_positions],
    306     index=absolute_horizons,
    307     name=self._y.name,
    308 )

File ~/work/os/sktime/sktime/forecasting/base/_fh.py:512, in ForecastingHorizon.to_absolute_index(self, cutoff)
    492 """Return absolute values of the horizon as a pandas.Index.
    493 
    494 For a forecaster `f` that has `fh` being `self`,
   (...)
    509     Absolute representation of forecasting horizon.
    510 """
    511 cutoff = self._coerce_cutoff_to_index(cutoff)
--> 512 fh_abs = _to_absolute(fh=self, cutoff=_HashIndex(cutoff))
    513 return fh_abs.to_pandas()

File ~/work/os/sktime/sktime/forecasting/base/_fh.py:885, in _to_absolute(fh, cutoff)
    880     old_tz = None
    882 if is_timestamp:
    883     # coerce to pd.Period for reliable arithmetic operations and
    884     # computations of time deltas
--> 885     cutoff = _coerce_to_period(cutoff, freq=fh.freq)
    887 if isinstance(cutoff, pd.Index):
    888     cutoff = cutoff[[0] * len(relative)]

File ~/work/os/sktime/sktime/forecasting/base/_fh.py:957, in _coerce_to_period(x, freq)
    955 msg = str(e)
    956 if "Invalid frequency" in msg or "_period_dtype_code" in msg:
--> 957     raise ValueError(
    958         "Invalid frequency. Please select a frequency that can "
    959         "be converted to a regular `pd.PeriodIndex`. For other "
    960         "frequencies, basic arithmetic operation to compute "
    961         "durations currently do not work reliably."
    962     )
    963 else:
    964     raise

ValueError: Invalid frequency. Please select a frequency that can be converted to a regular `pd.PeriodIndex`. For other frequencies, basic arithmetic operation to compute durations currently do not work reliably.
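
The two groups of frequencies listed earlier in this comment can be written down as a small lookup, for example to pick a reduced set of aliases for parametrised tests. This is only a sketch: the names `REPORTED_WORKING`, `REPORTED_FAILING`, and `reported_status` are hypothetical, and the lists simply restate what was observed here in one environment, not guaranteed pandas behaviour.

```python
# Frequency aliases reported in this comment as working with freq="auto".
REPORTED_WORKING = (
    "B C W D h min s ms us ns M Q Y "
    "W-SUN W-MON W-TUE W-WED W-THU W-FRI W-SAT"
).split()

# Aliases reported as raising "Invalid frequency", with or without "auto".
REPORTED_FAILING = "MS BMS CBMS SMS QS BQS YS BYS bh cbh".split()


def reported_status(freq):
    """Return "working", "failing", or "untested" for a frequency alias."""
    if freq in REPORTED_WORKING:
        return "working"
    if freq in REPORTED_FAILING:
        return "failing"
    # anything not covered by the manual checks in this comment
    return "untested"


if __name__ == "__main__":
    for alias in ("W", "MS", "BQ"):
        print(alias, reported_status(alias))
```

A test could then be parametrised over a handful of entries from each list instead of every alias pandas supports.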

@yarnabrina
Collaborator

That traceback looks very different from what I am used to. Just curious: which OS and IDE are you using? Is it a normal traceback, or one produced by a specific debugger?

Regarding your question, I am very surprised. Based on what @fkiraly said in the issue, I thought automatic frequency inference would not raise an error. And even if it does, I don't follow why it would fail in predict and not in fit, because fit is where you are trying to access the freq attribute. Let's wait for @fkiraly to comment; in the meantime, can you please share a code snippet from the Python console showing exactly where it fails, along with the full traceback?

One final thing: I really appreciate your testing all supported frequencies. But there are even more, and they change quite frequently between releases. So maybe we have to reduce the set of frequencies we test with.

@fkiraly
Collaborator

fkiraly commented Mar 4, 2024

Oh dear, do you have pandas 2.2.X installed perchance?
Is this perhaps related to the recent changes in freq handling and inference in pandas, see #6057 and #5841? FYI @MCRE-BE

This has a similar problem internally, when testing with freq="M". Internally, this converts to "MS" and then DatetimeIndex.to_period complains, but only on 2.2.0 or higher.

@geetu040
Contributor Author

geetu040 commented Mar 4, 2024

Ubuntu 22.04.4 LTS & VS Code Terminal
Here is the code snippet with full traceback

>>> import pandas as pd
>>> from sktime.forecasting.neuralforecast import NeuralForecastRNN
>>> pd.__version__
'2.1.4'
>>> y = pd.Series(data=range(10), index=pd.date_range(start='2022-01-01', periods=10, freq='MS'))
>>> model = NeuralForecastRNN('MS', max_steps=1)
>>> model.fit(y, fh=[1, 2, 3, 4, 5])
Seed set to 1
Epoch 0: 100%|█████████████████████████████████████| 1/1 [00:00<00:00, 27.92it/s, v_num=1, train_loss_step=1.160, train_loss_epoch=1.160]
NeuralForecastRNN(freq='MS', max_steps=1)                                                                                                
>>> model.predict()
Predicting DataLoader 0: 100%|████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 268.21it/s]
Traceback (most recent call last):
  File "/home/geetu/work/os/sktime/sktime/forecasting/base/_fh.py", line 952, in _coerce_to_period
    return x.to_period(freq)
           ^^^^^^^^^^^^^^^^^
  File "/home/geetu/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/indexes/extension.py", line 95, in method
    result = attr(self._data, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geetu/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/arrays/datetimes.py", line 1224, in to_period
    return PeriodArray._from_datetime64(self._ndarray, freq, tz=self.tz)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geetu/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/arrays/period.py", line 322, in _from_datetime64
    data, freq = dt64arr_to_periodarr(data, freq, tz)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geetu/miniconda3/envs/ai/lib/python3.11/site-packages/pandas/core/arrays/period.py", line 1167, in dt64arr_to_periodarr
    base = freq._period_dtype_code
           ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'pandas._libs.tslibs.offsets.MonthBegin' object has no attribute '_period_dtype_code'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/geetu/work/os/sktime/sktime/forecasting/base/_base.py", line 444, in predict
    y_pred = self._predict(fh=fh, X=X_inner)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geetu/work/os/sktime/sktime/forecasting/base/adapters/_neuralforecast.py", line 281, in _predict
    absolute_horizons = self.fh.to_absolute_index(self.cutoff)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geetu/work/os/sktime/sktime/forecasting/base/_fh.py", line 512, in to_absolute_index
    fh_abs = _to_absolute(fh=self, cutoff=_HashIndex(cutoff))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geetu/work/os/sktime/sktime/forecasting/base/_fh.py", line 885, in _to_absolute
    cutoff = _coerce_to_period(cutoff, freq=fh.freq)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/geetu/work/os/sktime/sktime/forecasting/base/_fh.py", line 956, in _coerce_to_period
    raise ValueError(
ValueError: Invalid frequency. Please select a frequency that can be converted to a regular `pd.PeriodIndex`. For other frequencies, basic arithmetic operation to compute durations currently do not work reliably.
>>> 

@yarnabrina
Collaborator

Thanks for sharing @geetu040. So sktime is trying to do something in predict and not in fit, even though you are not even passing anything in predict.

@fkiraly so it's pandas 2.1 only, this is very interesting (and unknown to me). This seems different from this PR's goal, although it's being affected.

@fkiraly
Collaborator

fkiraly commented Mar 4, 2024

@fkiraly so it's pandas 2.1 only

Do you mean, the issue is only 2.1 and above?

@yarnabrina
Collaborator

No, I only meant that it's not 2.2, which you suggested might be the reason in your previous comment. I'm not sure myself whether the pandas version has any effect here.

@fkiraly
Collaborator

fkiraly commented Mar 4, 2024

I see - there have been oddities around M frequency for a while. Do you remember the other issue? I thought it might have even been opened by you.

@yarnabrina
Collaborator

This one #5131?

With my own office hack/validation to ensure exclusive use of range index, I didn't even remember I created this.

@fkiraly
Collaborator

fkiraly commented Mar 4, 2024

yes, exactly, #5131. Your hack makes me think, perhaps that's exactly what we should do internally, in ForecastingHorizon. There are too many issues and too frequent changes on how pandas handles frequencies, so internally we should not rely on it too much, imo.
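
One way to read this suggestion: do the horizon arithmetic on plain integer positions and map back to time labels only at the very end, so that no pandas frequency arithmetic is involved. The sketch below is purely illustrative and is not how ForecastingHorizon is implemented; the function name `relative_fh_to_positions` is hypothetical.

```python
def relative_fh_to_positions(n_train, fh):
    """Map relative horizon steps to integer positions on a range index.

    n_train : number of training observations (positions 0 .. n_train - 1)
    fh      : iterable of relative steps, where 1 means one step past cutoff
    """
    cutoff = n_train - 1  # position of the last observed value
    return [cutoff + step for step in fh]


if __name__ == "__main__":
    # 10 observations and fh=[1, 2, 3, 4] give the four positions after cutoff
    print(relative_fh_to_positions(10, [1, 2, 3, 4]))  # [10, 11, 12, 13]
```

Labels for these positions would then be looked up once, at output time, from whatever index the user supplied.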

Collaborator

@yarnabrina yarnabrina left a comment

Added a few more comments. One more thing to note is that #6047 is now merged, so it may be better to pull the latest main branch and make similar changes to the LSTM one as you did for the RNN one. After that, you can also consider parametrising your last test over the different classes.

After you do these, let us know and let's run CI. If all tests pass, we can merge it soon. But a few will probably fail due to freq issues, so maybe @fkiraly can suggest the next course of action here.

@geetu040
Contributor Author

geetu040 commented Mar 5, 2024

Thanks for the detailed review. I am on it.

Collaborator

@fkiraly fkiraly left a comment

Re tests: could you kindly make sure that a test case with freq="auto" is added to get_test_params? I suppose this ought to be added both on RNN and LSTM.

@geetu040
Contributor Author

geetu040 commented Mar 6, 2024

Re tests: could you kindly make sure that a test case with freq="auto" is added to get_test_params? I suppose this ought to be added both on RNN and LSTM.

I tried and discussed it here: #6039 (comment)

We still cannot replace D with auto in freq for get_test_params, because the test cases then fail due to the Exception I raise here

and this

So my thoughts are that we should work on test cases that are failing because of this newly introduced Exception. Maybe we should alter them to check for this Exception when ForecastingHorizon cannot interpret the freq because of invalid indexing.

@fkiraly what are your thoughts?

@fkiraly
Collaborator

fkiraly commented Mar 6, 2024

@fkiraly what are your thoughts?

Does the suggestion of passing D if a RangeIndex is encountered not work? See here: #6039 (comment)

Maybe to clarify, my suggestion was that the adapter should do that, not the user.

@geetu040
Contributor Author

geetu040 commented Mar 7, 2024

So once we tested and in case we find that freq is irrelevant for RangeIndex, can't we do as follows: if index is RangeIndex, just pass "D" so neuralforecast is happy?

Before applying this, I wanted to see the results with the other index types supported by pandas. So I've tried all the indexes that pandas supports. Here is the list: https://pandas.pydata.org/docs/reference/api/pandas.Index.html

  1. Index
  2. DatetimeIndex
  3. PeriodIndex
  4. RangeIndex
  5. CategoricalIndex
  6. IntervalIndex
  7. MultiIndex
  8. TimedeltaIndex

sktime does not support series with these indexes:

  1. CategoricalIndex
  2. IntervalIndex
  3. MultiIndex

Using them raises the Exception below. This is not limited to neuralforecast; it is raised by all forecasters.

import numpy as np
import pandas as pd
from sktime.forecasting.naive import NaiveForecaster

y = pd.Series(
	data=range(10),
	index=pd.CategoricalIndex(np.random.randint(0, 3, size=10))
)
NaiveForecaster().fit(y, fh=[1, 2, 3])
# TypeError: Unsupported input data type in NaiveForecaster, input y must be in an sktime compatible format. Allowed scitypes for y in forecasting are Series, Panel, Hierarchical, for instance a pandas.DataFrame with sktime compatible time indices, or with MultiIndex and last(-1) level an sktime compatible time index. See the forecasting tutorial examples/01_forecasting.ipynb, or the data format tutorial examples/AA_datatypes_and_datasets.ipynb See the data format tutorial examples/AA_datatypes_and_datasets.ipynb. If you think the data is already in an sktime supported input format, run sktime.datatypes.check_raise(data, mtype) to diagnose the error, where mtype is the string of the type specification you want. Error message for checked mtypes, in format [mtype: message], as follows: [pd.DataFrame: y must be a pandas.DataFrame, found <class 'pandas.core.series.Series'>]  [pd.Series: <class 'pandas.core.indexes.category.CategoricalIndex'> is not supported for y, use one of (<class 'pandas.core.indexes.range.RangeIndex'>, <class 'pandas.core.indexes.period.PeriodIndex'>, <class 'pandas.core.indexes.datetimes.DatetimeIndex'>) or integer index instead.]  [np.ndarray: y must be a numpy.ndarray, found <class 'pandas.core.series.Series'>]  [df-list: y must be list of pd.DataFrame, found <class 'pandas.core.series.Series'>]  [numpy3D: y must be a numpy.ndarray, found <class 'pandas.core.series.Series'>]  [pd-multiindex: y must be a pd.DataFrame, found <class 'pandas.core.series.Series'>]  [nested_univ: y must be a pd.DataFrame, found <class 'pandas.core.series.Series'>]  [pd_multiindex_hier: y must be a pd.DataFrame, found <class 'pandas.core.series.Series'>] 

sktime supports series with these indexes:

  1. DatetimeIndex
  2. PeriodIndex
  3. RangeIndex
  4. Index

When freq="auto":

  • DatetimeIndex and PeriodIndex work as usual, as their freq is interpreted
  • RangeIndex and Index carry no freq information

Their predictions are the same regardless of the given freq: #6039 (comment)
However, they do raise an Exception when freq is set to "auto". I will now change this to pass "D" as freq instead of raising an Exception, as suggested here: #6039 (comment)


From the above observations, I think we are safe to set freq to "D" when freq is "auto" and cannot be interpreted from ForecastingHorizon, because the index is then either RangeIndex or Index, where an arbitrary freq works fine.
For this purpose, I'll have to change the code from

if self.freq == "auto" and fh.freq is None:
	# when freq cannot be interpreted from ForecastingHorizon
	raise ValueError(
		f"Error in {self.__class__.__name__}, "
		f"could not interpret freq, "
		f"try passing freq in model initialization"
	)

self._freq = fh.freq if self.freq == "auto" else self.freq

to

self._freq = (fh.freq or "D") if (self.freq == "auto") else (self.freq)
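The one-liner above can be read as a small decision function. This is a minimal sketch only: resolve_freq, user_freq and fh_freq are hypothetical names used for illustration and are not part of sktime.

```python
from typing import Optional


def resolve_freq(user_freq: str, fh_freq: Optional[str], default: str = "D") -> str:
    """Hypothetical helper mirroring the proposed one-liner.

    user_freq: the freq passed to the constructor (may be "auto").
    fh_freq:   the freq carried by the forecasting horizon, or None.
    """
    if user_freq == "auto":
        # fall back to the default only when nothing could be inferred
        return fh_freq or default
    # an explicitly passed freq always wins
    return user_freq


print(resolve_freq("auto", "A-DEC"))  # freq taken from the horizon
print(resolve_freq("auto", None))     # nothing to infer, falls back to "D"
print(resolve_freq("M", None))        # explicit freq is kept
```

Note that the fallback applies to every case where nothing can be inferred, not only to RangeIndex and Index.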

After applying the new changes locally I have run all the test cases, including test_params set to "auto" instead of "D" as discussed here:

Re tests: could you kindly make sure that a test case with freq="auto" is added to get_test_params? I suppose this ought to be added both on RNN and LSTM.

All the tests were successful.
If everything looks fine, I'll commit my changes and push. Also, I think I should remove test_neural_forecast_fail_with_auto_freq_on_range_index, as it will no longer be relevant and freq="auto" is already tested in other test cases and in get_test_params.


One last caveat: during the process I realized that TimedeltaIndex behaves differently. I am not sure whether it is supposed to behave like this or whether something is wrong. If there is a problem, let me know if I should investigate it further. In any case, I am pretty sure it is irrelevant to this PR.
Here is the code to reproduce it:

import pandas as pd
from sktime.forecasting.neuralforecast import NeuralForecastRNN

index = pd.timedelta_range(start='1 days', periods=10)
y = pd.Series(data=range(10), index=index)

model = NeuralForecastRNN("D", max_steps=1, trainer_kwargs={"logger": False})
model.fit(y, fh=[1, 2, 3])
# ValueError: The time column ('ds') should have either timestamps or integers, got 'timedelta64[ns]'.

@fkiraly
Collaborator

fkiraly commented Mar 7, 2024

Hm, I think there are three cases we need to cover, not just two:

  • A. the index is time-like (period, timedelta) and has an inferrable freq
  • B. the index is time-like and does not have inferrable freq
  • C. the index is int-like, e.g., RangeIndex

I would have thought we treat things as follows: A passes freq as inferred from fh; C passes "D"; B raises the error.

Currently, we always pass "D" if we cannot infer it.

Or, is an informative error still raised in case B?
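The A/B/C handling described above could be sketched as follows. This is an illustration under stated assumptions, not the adapter's actual code: resolve_freq_abc and its parameters are hypothetical names, and the caller is assumed to already know whether the index is time-like and what freq, if any, was inferred.

```python
from typing import Optional


def resolve_freq_abc(
    user_freq: str,
    inferred_freq: Optional[str],
    index_is_time_like: bool,
) -> str:
    """Hypothetical sketch of the three-case policy for freq="auto"."""
    if user_freq != "auto":
        # an explicitly passed freq is used as-is
        return user_freq
    if inferred_freq is not None:
        # case A: time-like index with an inferrable freq
        return inferred_freq
    if index_is_time_like:
        # case B: time-like index without an inferrable freq
        raise ValueError(
            "could not interpret freq, try passing freq in model initialization"
        )
    # case C: int-like index such as RangeIndex, any freq will do
    return "D"
```

Under this policy only case C receives the arbitrary "D", so a time-like index with gaps fails early with an informative message.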

@yarnabrina
Collaborator

yarnabrina commented Mar 7, 2024

@geetu040 first of all, thank you for the detailed analysis you've done for both offset aliases and index choices, this is great.

DatetimeIndex and PeriodIndex work as usual as their freq is interpreted

I am somehow doubtful of this statement. If you create your DatetimeIndex using pandas.date_range, of course it's fine. But if you have a list of dates or strings and then use pandas.to_datetime, does it work always, regardless of missifates missing dates or gap between dates? I think this may be same as option B of @fkiraly above.

Currently, we always pass "D" if we cannot infer it.

@fkiraly do you mean you want to pass D for both B and C cases, and pass indices as range index for B?

Maybe to clarify, my suggestion was that the adapter should do that, not the user.

I don't have a strong opinion, but I will prefer if we don't do this. It's one more condition based on how pandas index frequencies work. Also neuralforecast may change their behaviour in future as you said it's an odd design choice. Or, if you want to have it, we can try to have a "force-auto" option where we pass "D" always for B+C cases you told above.

@fkiraly
Collaborator

fkiraly commented Mar 7, 2024

missifates

@yarnabrina, typo, and my brain's error correction is not good enough to decode this

@fkiraly
Collaborator

fkiraly commented Mar 7, 2024

@fkiraly do you mean you want to pass D for both B and C cases, and pass indices as range index for B?

Hm, no, that's not what I meant, but this could be a better idea!

What I meant is that B should raise an informative error, but your case handling logic for B might be better.

I don't have a strong opinion, but I will prefer if we don't do this.

Can you clarify what you are referring to by "this"? It seems there was slight miscommunication on our expectations what happens in case B.

As said, the way you understood it was not how I meant it afaik, but I also think your version is better.

@geetu040
Contributor Author

geetu040 commented Mar 7, 2024

About this B case, I am unable to create such an example.

If you create your DatetimeIndex using pandas.date_range, of course it's fine. But if you have a list of dates or strings and then use pandas.to_datetime, does it work always, regardless of missifates or gap between dates?

But I'll try this in code

@fkiraly
Collaborator

fkiraly commented Mar 7, 2024

I am unable to create such an example.

I would try irregular spaced time stamps, e.g., 9:00:00, 10:00:00, 12:00:00, 13:42:42, etc.
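To make the idea concrete, here is a small standard-library sketch of why such time stamps have no single frequency. It is a simplified stand-in for frequency inference, not how pandas or sktime implement it; constant_step is a hypothetical helper, and the dates are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def constant_step(stamps: Sequence[datetime]) -> Optional[timedelta]:
    """Return the common gap between consecutive stamps, or None if gaps differ."""
    if len(stamps) < 2:
        return None
    gaps = {later - earlier for earlier, later in zip(stamps, stamps[1:])}
    return gaps.pop() if len(gaps) == 1 else None


# evenly spaced daily stamps: one common step exists
regular = [datetime(2024, 1, day) for day in (1, 2, 3, 4)]

# a missing day (1st, 3rd, 4th, 5th): gaps of 2 days and 1 day
missing_day = [datetime(2024, 1, day) for day in (1, 3, 4, 5)]

# irregularly spaced stamps like the ones suggested above
irregular = [
    datetime(2024, 1, 1, 9, 0, 0),
    datetime(2024, 1, 1, 10, 0, 0),
    datetime(2024, 1, 1, 12, 0, 0),
    datetime(2024, 1, 1, 13, 42, 42),
]

print(constant_step(regular))
print(constant_step(missing_day))
print(constant_step(irregular))
```

Only the first series has a single step; the other two are candidates for case B.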

@yarnabrina
Collaborator

missifates

@yarnabrina, typo, and my brain's error correction is not good enough to decode this

I meant missing dates, e.g. 1st jan, 3rd Jan, 4th Jan, 5th Jan, etc.

I don't have a strong opinion, but I will prefer if we don't do this.

Can you clarify what you are referring to by "this"?

I meant passing "D" if the user does not pass any frequency and the data has a range index. So this is the same as your original suggestion for case C.

@fkiraly
Collaborator

fkiraly commented Mar 7, 2024

I meant passing "D" if the user does not pass any frequency and the data has a range index. So this is the same as your original suggestion for case C.

I see, what would be your preference in case C then, @yarnabrina? Always raising an exception?

@yarnabrina
Collaborator

yarnabrina commented Mar 7, 2024

Yes. Unless user passes freq or fh specifically is able to detect one, I'll prefer to raise. Basically I don't want auto case to take benefit of a specific (somewhat questionable) design choice of neuralforecast in case it backfires in future (some change in logic in neuralforecast).

If you want to add that option, to make it simpler for users, another argument or force-auto type thing can be done. But of course this is just my personal opinion/preference.

@fkiraly
Collaborator

fkiraly commented Mar 7, 2024

@yarnabrina, this solution might be better in isolation, but my concern is that the estimator would not satisfy the basic contract then that every forecaster runs with any RangeIndex series. How do we deal with that?

@yarnabrina
Collaborator

RangeIndex is still supported, users just need to give a freq argument themselves.

As I said I don't have a strong opinion. I'm already happy with this PR as it is currently, and if this is added, all of working cases will remain unmodified so I don't have any feedback.

@geetu040
Contributor Author

geetu040 commented Mar 7, 2024

  • B. the index is time-like and does not have inferrable freq

If we don't raise an error here, another Exception, "You must pass a freq argument as current index has none.", will be raised anyway.

I have tried with this scenario

If you create your DatetimeIndex using pandas.date_range, of course it's fine. But if you have a list of dates or strings and then use pandas.to_datetime, does it work always, regardless of missifates missing dates or gap between dates? I think this may be same as option B of @fkiraly above.

@geetu040
Contributor Author

geetu040 commented Mar 7, 2024

The above comment will be more understandable from this


For Reference Code 1

self._freq = (fh.freq or "D") if (self.freq == "auto") else (self.freq)

For Reference Code 2

A-B-C conditions

No Error on these indexes from both codes

  • DatetimeIndex
  • PeriodIndex
  • PeriodIndex (Missing Days)
  • RangeIndex
  • RangeIndex (Missing Days)
  • Index
  • Index (Missing Days)

The behavior of the two codes differs here: DatetimeIndex (Missing Days)

| Model | Exception as result of Code 1 | Exception as result of Code 2 |
|---|---|---|
| NeuralForecastRNN("auto") | (predict) You must pass a freq argument as current index has none. | (fit) Error in NeuralForecastRNN, could not interpret freq, try passing freq in model initialization |
| NeuralForecastRNN("D") | (predict) You must pass a freq argument as current index has none. | (predict) You must pass a freq argument as current index has none. |
| NaiveForecaster() | (predict) You must pass a freq argument as current index has none. | (predict) You must pass a freq argument as current index has none. |

To simplify the above table:

For DatetimeIndex (Missing Days), the two codes behave as follows:

  • NeuralForecastRNN("D") and NaiveForecaster() (for reference) raise this Exception on predict(): You must pass a freq argument as current index has none.
  • NeuralForecastRNN("auto")
    • Code 1 raises this Exception on predict(): You must pass a freq argument as current index has none.
    • Code 2 raises this Exception on fit(): Error in NeuralForecastRNN, could not interpret freq, try passing freq in model initialization

@fkiraly
Collaborator

fkiraly commented Mar 7, 2024

I see - so from this, are you making the case that code 1 would be preferable due to consistency?

Since code 1 is current, and @yarnabrina says he is fine with current, and tests pass at current - shall we just leave it at current? I would be happy with that, too.

@geetu040
Contributor Author

geetu040 commented Mar 8, 2024

There are just a few changes left to make:

  1. Update Code

I'll have to change the code from

if self.freq == "auto" and fh.freq is None:
	# when freq cannot be interpreted from ForecastingHorizon
	raise ValueError(
		f"Error in {self.__class__.__name__}, "
		f"could not interpret freq, "
		f"try passing freq in model initialization"
	)

self._freq = fh.freq if self.freq == "auto" else self.freq

to

self._freq = (fh.freq or "D") if (self.freq == "auto") else (self.freq)
  2. Remove the test case for the ValueError Exception
  3. Update get_test_params to use freq="auto" in one place instead of freq="D"

I have these changes locally and all the test cases run fine. I'll commit and push them as soon as @yarnabrina approves.

Collaborator

@yarnabrina yarnabrina left a comment

Since code 1 is current, and @yarnabrina says he is fine with current, and tests pass at current - shall we just leave it at current? I would be happy with that, too.

Approved on principle from my side.


@geetu040 I'm not sure why you need to change that code. I thought we all agreed that we'll not make a change to pass D in case of fh.freq fails with RangeIndex?

Please let me know if I misunderstood the above discussion.

@geetu040
Contributor Author

geetu040 commented Mar 8, 2024

@geetu040 I'm not sure why you need to change that code. I thought we all agreed that we'll not make a change to pass D in case of fh.freq fails with RangeIndex?

Ok ok I misunderstood actually, but now I'm clear

@yarnabrina
Collaborator

@fkiraly tagging in case you missed this PR before release.

@fkiraly
Collaborator

fkiraly commented Mar 20, 2024

I did not miss, I simply assumed you were going to merge.

@fkiraly fkiraly merged commit d19fda1 into sktime:main Mar 20, 2024
54 checks passed
Labels
enhancement Adding new functionality module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting