[BUG] Timeseries Setup fails with some datasets #2068

Closed
Rjotsin opened this issue Jan 17, 2022 · 7 comments · Fixed by #2076
Labels: bug, setup, time_series

Comments

@Rjotsin

Rjotsin commented Jan 17, 2022

Describe the bug

Hi team, I discovered this bug while setting up a dataset. Setup errors out in the pycaret-ts-alpha package, on both Jupyter and Databricks.

A link to the dataset is included at the bottom; it is a monthly time series three years long. My goal was to use the theta model with it, but setup seems to be erroring out in the arima package.

ValueError: There are no more samples after a first-order seasonal differencing. See http://alkaline-ml.com/pmdarima/seasonal-differencing-issues.html for a more in-depth explanation and potential work-arounds.

To Reproduce

import pandas as pd
from pycaret.time_series import setup

df = pd.read_csv('pycaret_error_data.csv')

# Parse the dates, index by them, and aggregate to month-end totals
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
df = df.set_index('Date').resample('M').sum()

# Forecast horizon of 3 months with 3 cross-validation folds
setup(data=df, fh=3, fold=3, session_id=123, n_jobs=1, seasonal_period='M', system_log=False)

Expected behavior

Note: setup states that it was successful, but errors out right after.


Additional context

csv file: https://drive.google.com/file/d/171fezHPPrtavyJ71hihINV4bRhIL3GEl/view?usp=sharing

Rjotsin added the bug label Jan 17, 2022
@ngupta23
Collaborator

You have very few data points and are trying to model ARIMA with a seasonality of 12 as well as 3-fold cross-validation. There are not enough data points to model it like this. Try disabling cross-validation to see if it helps.
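To make the data budget concrete, here is a rough sketch of the arithmetic. It assumes 36 monthly observations (three years), an expanding-window split in which each fold reserves `fh` points, and one extra hold-out block of `fh` points; these are illustrative assumptions, not PyCaret's actual splitting logic. The helper names are hypothetical. The only firm fact used is that one seasonal difference with period m removes m observations.

```python
def smallest_training_window(n_obs: int, fh: int, folds: int, holdout: bool = True) -> int:
    """Observations left for the earliest training fold.

    Assumes (for illustration only) that every fold reserves `fh` points
    and that one extra block of `fh` points is held out as a test set.
    """
    reserved = fh * folds + (fh if holdout else 0)
    return max(n_obs - reserved, 0)


def samples_after_seasonal_diff(n_obs: int, seasonal_period: int) -> int:
    """One seasonal difference y[t] - y[t - m] drops the first m observations."""
    return max(n_obs - seasonal_period, 0)


n_obs = 36            # assumed: three years of monthly data
seasonal_period = 12  # monthly seasonality

window = smallest_training_window(n_obs, fh=3, folds=3)
remaining = samples_after_seasonal_diff(window, seasonal_period)
print(f"smallest training window: {window}, after seasonal differencing: {remaining}")
```

Under these assumptions, only a dozen points survive a single seasonal difference in the earliest fold, and any series no longer than the seasonal period is left with none.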

ngupta23 added the time_series label and removed the bug label Jan 17, 2022
@Rjotsin
Author

Rjotsin commented Jan 18, 2022

I did try many combinations of fh and folds, but sadly with no results. I was planning on just using theta, not ARIMA. Is there any way to avoid the ARIMA validation steps?

@moezali1
Collaborator

@ngupta23 In this case, why would setup fail? No model training is done at this stage, no?

@ngupta23
Collaborator

ngupta23 commented Jan 18, 2022

It does some checks in setup for the value of d and D. I am guessing it is failing there. I will need to look into it in detail and get back.

@Rjotsin Can you provide me the complete error message?
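As an illustration of how such a check could be made tolerant of short series, here is a minimal sketch. It is not the change that was eventually merged and not PyCaret's code: `recommend_seasonal_d`, its parameters, and the stand-in estimator are all hypothetical. The idea is simply to skip the seasonal-differencing test when the series is no longer than the seasonal period, and to fall back to a default if the underlying test still raises a `ValueError`.

```python
from typing import Callable, Sequence


def recommend_seasonal_d(
    y: Sequence[float],
    seasonal_period: int,
    estimator: Callable[[Sequence[float], int], int],
    default: int = 0,
) -> int:
    """Hypothetical guard around a seasonal-differencing test.

    `estimator` stands in for whatever routine estimates D; it is assumed
    to raise ValueError when too few samples remain after differencing.
    """
    # A series no longer than one season has nothing left after differencing.
    if len(y) <= seasonal_period:
        return default
    try:
        return estimator(y, seasonal_period)
    except ValueError:
        # The test could not run on this data; do not fail the whole setup.
        return default


def always_fails(y: Sequence[float], m: int) -> int:
    """Stand-in estimator that mimics the error reported in this issue."""
    raise ValueError("There are no more samples after a first-order seasonal differencing.")


short_series = list(range(10))
longer_series = list(range(24))

print(recommend_seasonal_d(short_series, 12, always_fails))     # skipped: too short
print(recommend_seasonal_d(longer_series, 12, always_fails))    # estimator raised
print(recommend_seasonal_d(longer_series, 12, lambda y, m: 1))  # estimator succeeded
```

With a guard like this, a user who only wants a non-ARIMA model such as theta would not be blocked by a differencing test that cannot run on their data.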

ngupta23 added the bug and setup labels Jan 18, 2022
ngupta23 added this to To do in Time Series Forecasting via automation Jan 18, 2022
ngupta23 added this to the Time Series Gamma Release milestone Jan 18, 2022
@Rjotsin
Author

Rjotsin commented Jan 18, 2022

@ngupta23 Here is the complete output:
timeseries_pycaret_error_output.txt

@ngupta23
Collaborator

@Rjotsin I have made a fix and submitted a PR to the local development branch time_series. The fix will go out in the next release of the time series module. In the meantime, you can install from Git directly (once the associated PR is approved and merged to the time_series branch).

Instructions for installing the time series branch.

# Create a Conda env (you can name it whatever you want)
conda create -n pycaret_ts_git_install python=3.8
conda activate pycaret_ts_git_install

# Install packages
pip install -U git+https://github.com/pycaret/pycaret.git@time_series
pip install -U sktime
pip install -U pmdarima

@moezali1
Collaborator

Thank you so much @ngupta23

ngupta23 added a commit that referenced this issue Jan 21, 2022
Fix for short time series - recommended_D test failure  closes #2068
Time Series Forecasting automation moved this from To do to Done Jan 22, 2022
github-actions bot locked as resolved and limited conversation to collaborators May 4, 2022