Issue with forecaster.update on multi indexed data frame #5833

ManuB68 · 2024-01-25T15:24:13Z

I am trying to update a forecaster, but it doesn't seem to work on my multi-indexed DataFrame:

Python code:

import pandas as pd
from sktime.forecasting.var import VAR
from sktime.split import temporal_train_test_split

# Three patients, four daily observations each, two measurements (M1, M2)
df = pd.DataFrame(
    [
        ['A', '2024-01-01', 1, 11], ['A', '2024-01-02', 2, 12], ['A', '2024-01-03', 3, 13], ['A', '2024-01-04', 4, 14],
        ['B', '2024-01-01', 101, 1], ['B', '2024-01-02', 102, 2], ['B', '2024-01-03', 103, 3], ['B', '2024-01-04', 104, 4],
        ['C', '2024-01-01', 10000, 10], ['C', '2024-01-02', 20000, 20], ['C', '2024-01-03', 30000, 30], ['C', '2024-01-04', 40000, 43],
    ],
    columns=['Patients', 'Dates', 'M1', 'M2'],
)

df['Dates'] = pd.to_datetime(df['Dates'])
df = df.set_index(["Patients", "Dates"])

# Hold out the last date of each patient as the test set
y_train, y_test = temporal_train_test_split(df, test_size=1)

fh = 1

forecaster = VAR()
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=fh)

# Select the most recent day only (one row per patient)
day = y_test.index.get_level_values("Dates")[-1]
y_update = y_test[y_test.index.get_level_values("Dates") == day]

forecaster.update(y_update)

y_pred_updated = forecaster.predict(fh=fh)

The code might look overcomplicated because it was extracted from a loop that iterates over dates, and because of my limited knowledge of Python.
Execution fails on this line:
forecaster.update(y_update)

with this message:
ValueError: Need at least 3 dates to infer frequency

fkiraly · 2024-01-28T16:01:16Z

Maintainer

This looks like a genuine bug; I am moving it to the bug tracker.

2 replies

ManuB68 Jan 29, 2024
Author

Thank you

fkiraly Jan 30, 2024
Maintainer

issue is here: #5853

yarnabrina · 2024-01-29T16:05:45Z

Collaborator

@ManuB68 This is a bug/issue for sure, but it is not caused by the multi-index. I use multi-index data with update in our production work as well, and it works fine. The error occurs because y_test in your example is very small, which is why pandas complains about frequency inference.

Whether it's a bug that we can handle is something we can discuss in the issue @fkiraly created. To work around your problem in the meantime, consider using a range index instead of a datetime index. I use that to handle some other month/week seasonality issues as well, and it does not run into this problem. Here's an example:

>>> 
>>> import pandas
>>> from sktime.forecasting.naive import NaiveForecaster
>>> from sktime.split import temporal_train_test_split
>>> 
>>> model = NaiveForecaster()
>>> 
>>> # datetime index
>>> sample_data = pandas.DataFrame(
...     data={"M1": [1, 2, 3, 4], "M2": [11, 12, 13, 14]},
...     index=pandas.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
... )
>>> 
>>> y_train, y_test = temporal_train_test_split(sample_data, test_size=1)
>>> 
>>> # fails even with non-multi-index data
>>> model_1 = model.clone()
>>> 
>>> model_1.fit(y_train)
NaiveForecaster()
>>> model_1.update(y_test)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/anirban/sktime-fork/sktime/forecasting/base/_base.py", line 886, in update
    self._update_y_X(y_inner, X_inner)
  File "/home/anirban/sktime-fork/sktime/forecasting/base/_base.py", line 1588, in _update_y_X
    self._set_cutoff_from_y(y)
  File "/home/anirban/sktime-fork/sktime/forecasting/base/_base.py", line 1643, in _set_cutoff_from_y
    cutoff_idx = get_cutoff(y, self.cutoff, return_index=True)
  File "/home/anirban/sktime-fork/sktime/datatypes/_utilities.py", line 296, in get_cutoff
    return sub_idx(obj.index, ix) if return_index else obj.index[ix]
  File "/home/anirban/sktime-fork/sktime/datatypes/_utilities.py", line 282, in sub_idx
    res.freq = pd.infer_freq(idx)
  File "/home/anirban/conda-environments/sktime/lib/python3.10/site-packages/pandas/tseries/frequencies.py", line 155, in infer_freq
    inferer = _FrequencyInferer(index)
  File "/home/anirban/conda-environments/sktime/lib/python3.10/site-packages/pandas/tseries/frequencies.py", line 189, in __init__
    raise ValueError("Need at least 3 dates to infer frequency")
ValueError: Need at least 3 dates to infer frequency
>>> 
>>> # fails even during fit
>>> model_2 = model.clone()
>>> 
>>> model_2.fit(y_test)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/anirban/sktime-fork/sktime/forecasting/base/_base.py", line 369, in fit
    self._update_y_X(y_inner, X_inner)
  File "/home/anirban/sktime-fork/sktime/forecasting/base/_base.py", line 1588, in _update_y_X
    self._set_cutoff_from_y(y)
  File "/home/anirban/sktime-fork/sktime/forecasting/base/_base.py", line 1643, in _set_cutoff_from_y
    cutoff_idx = get_cutoff(y, self.cutoff, return_index=True)
  File "/home/anirban/sktime-fork/sktime/datatypes/_utilities.py", line 296, in get_cutoff
    return sub_idx(obj.index, ix) if return_index else obj.index[ix]
  File "/home/anirban/sktime-fork/sktime/datatypes/_utilities.py", line 282, in sub_idx
    res.freq = pd.infer_freq(idx)
  File "/home/anirban/conda-environments/sktime/lib/python3.10/site-packages/pandas/tseries/frequencies.py", line 155, in infer_freq
    inferer = _FrequencyInferer(index)
  File "/home/anirban/conda-environments/sktime/lib/python3.10/site-packages/pandas/tseries/frequencies.py", line 189, in __init__
    raise ValueError("Need at least 3 dates to infer frequency")
ValueError: Need at least 3 dates to infer frequency
>>> 
>>> # range index
>>> sample_data = pandas.DataFrame(data={"M1": [1, 2, 3, 4], "M2": [11, 12, 13, 14]})
>>> 
>>> y_train, y_test = temporal_train_test_split(sample_data, test_size=1)
>>> 
>>> # works
>>> model_3 = model.clone()
>>> 
>>> model_3.fit(y_train)
NaiveForecaster()
>>> model_3.update(y_test)
/home/anirban/sktime-fork/sktime/forecasting/base/_base.py:1928: UserWarning: NotImplementedWarning: NaiveForecaster does not have a custom `update` method implemented. NaiveForecaster will be refit each time `update` is called with update_params=True. To refit less often, use the wrappers in the forecasting.stream module, e.g., UpdateEvery.
  warn(
/home/anirban/sktime-fork/sktime/forecasting/base/_base.py:1928: UserWarning: NotImplementedWarning: NaiveForecaster does not have a custom `update` method implemented. NaiveForecaster will be refit each time `update` is called with update_params=True. To refit less often, use the wrappers in the forecasting.stream module, e.g., UpdateEvery.
  warn(
NaiveForecaster()
>>> 
>>> # works
>>> model_4 = model.clone()
>>> 
>>> model_4.fit(y_test)
NaiveForecaster()
>>>

3 replies

ManuB68 Jan 29, 2024
Author

Hi @yarnabrina,

Thank you for your reply.

What do you mean by a small y?

Only 1 day?
Only 4 days per index?
In the latter case: I reported a very small example here since I can't share patient data, but it also failed with 1500 patients over more than 1000 days, so I don't think that's the issue.

I think you are right when you say it's linked to the date index; I'll try to replace the Date with a timestamp then.
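
A minimal sketch of the range-index workaround suggested earlier in the thread, adapted to the multi-indexed layout from the question. It assumes each patient's dates are regularly spaced with no gaps. The helper name `to_integer_time_index` and the level name `step` are hypothetical, and whether a particular forecaster accepts the resulting index is not checked here.

```python
import pandas as pd


def to_integer_time_index(df, series_level="Patients", time_level="Dates"):
    """Replace the datetime level by a per-series integer counter (hypothetical helper)."""
    df = df.sort_index()
    # 0, 1, 2, ... within each series, in date order
    step = df.groupby(level=series_level).cumcount().to_numpy()
    flat = df.reset_index(time_level)
    flat["step"] = step
    out = flat.set_index("step", append=True)
    # Keep the original dates so forecasts can be mapped back later
    dates = out.pop(time_level)
    return out, dates


index = pd.MultiIndex.from_product(
    [["A", "B"], pd.date_range("2024-01-01", periods=3, freq="D")],
    names=["Patients", "Dates"],
)
df = pd.DataFrame({"M1": [1, 2, 3, 101, 102, 103]}, index=index)

y_int, date_lookup = to_integer_time_index(df)
print(y_int)
print(date_lookup.loc[("B", 2)])
```

The returned `date_lookup` series maps each (patient, step) pair back to its original date, so calendar information is not lost even though the time level is now an integer.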

yarnabrina Jan 30, 2024
Collaborator

Hi @ManuB68, in your example, y_test has only 1 row for each patient (coming from test_size=1). That is fewer than the 3 dates pandas needs for frequency inference. sktime fits the model separately for each patient (that is, for each series at the lowest level of the hierarchy), so the relevant count is 1 row per patient, not the number of rows in the entire y_test.

I also use a similar format in my own work at the office, with hundreds of series identifiers (e.g. store number), and update works. If you find a specific example where it works normally but fails just by adding a multi-index, please provide a minimal reproducible example with dummy values in issue #5853.
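
The point about per-patient row counts can be illustrated with pandas alone. The sketch below is not sktime's internal code; it only calls `pd.infer_freq`, the function that appears at the bottom of the traceback above, once per patient. The helper name `freq_per_series` is made up for this illustration.

```python
import pandas as pd


def freq_per_series(y, series_level="Patients"):
    """Try to infer the date frequency separately for each series."""
    result = {}
    for key, group in y.groupby(level=series_level):
        dates = group.index.droplevel(series_level)
        try:
            result[key] = pd.infer_freq(dates)
        except ValueError as err:
            result[key] = f"ValueError: {err}"
    return result


dates = pd.date_range("2024-01-02", periods=3, freq="D")
index = pd.MultiIndex.from_product(
    [["A", "B", "C"], dates], names=["Patients", "Dates"]
)
y = pd.DataFrame({"M1": range(9)}, index=index)

# Keep only the last day: 3 rows in total, but only 1 row per patient
y_last = y[y.index.get_level_values("Dates") == dates[-1]]

print(len(y_last))
print(freq_per_series(y_last))
print(freq_per_series(y))
```

With three dates per patient the frequency is inferred for every series, while the single-day slice fails for every patient even though the frame as a whole has three rows.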

Answer selected by yarnabrina

ManuB68 Jan 30, 2024
Author

Hi @yarnabrina,
You are right indeed. It works when the update contains at least 3 rows (dates), with either a single index or a multi-index.
@fkiraly I am surprised anyway. I thought the update method would allow updating the model on a day-to-day basis, which would also allow recursive evaluation!

I resolved the issue by extending y_train progressively and refitting the whole model.

Thank you both for your help.
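
The refit workaround described above might look roughly like the following sketch. It is an assumption about the general shape of such a loop, not the code actually used. `LastValueForecaster` is a hypothetical stand-in so that the example runs with pandas only, and `expanding_refit` and `min_train_dates` are illustrative names.

```python
import pandas as pd


class LastValueForecaster:
    """Hypothetical stand-in: predicts each patient's last observed row."""

    def fit(self, y):
        self._last = y.groupby(level="Patients").last()
        return self

    def predict(self, fh=1):
        return self._last


def expanding_refit(make_forecaster, y, min_train_dates=3):
    """Refit from scratch on all data up to each date, then forecast one step."""
    all_dates = y.index.get_level_values("Dates").unique().sort_values()
    forecasts = {}
    for i in range(min_train_dates, len(all_dates)):
        cutoff = all_dates[i - 1]
        # Training window grows by one date per iteration
        y_train = y[y.index.get_level_values("Dates") <= cutoff]
        forecaster = make_forecaster()
        forecaster.fit(y_train)
        forecasts[all_dates[i]] = forecaster.predict(fh=1)
    return forecasts


dates = pd.date_range("2024-01-01", periods=5, freq="D")
index = pd.MultiIndex.from_product([["A", "B"], dates], names=["Patients", "Dates"])
y = pd.DataFrame({"M1": [1, 2, 3, 4, 5, 101, 102, 103, 104, 105]}, index=index)

forecasts = expanding_refit(LastValueForecaster, y)
for target_date, y_pred in forecasts.items():
    print(target_date.date(), y_pred["M1"].to_dict())
```

Because sktime forecasters also expose `fit(y)` and `predict(fh=...)`, a class such as `VAR` could in principle be passed as `make_forecaster`, although that combination is not exercised here. Starting with at least 3 dates keeps every training window above the minimum that pandas needs for frequency inference.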
