pmdarima interface comparison #104

Closed
mloning opened this issue Jul 31, 2019 · 2 comments · Fixed by #218
Assignees
Labels
interfacing algorithms Interfacing existing algorithms/estimators from third party packages

Comments

@mloning
Contributor

mloning commented Jul 31, 2019

Data container

  • pmdarima expects y (target) as an array of shape=(n_obs,) and X (exogenous) as an array of shape=(n_obs, n_features), crucially with time series observations in rows.
  • by contrast, we use a nested pandas Series or DataFrame. Rows represent iid samples (a single row in the classical forecasting setting) and columns represent different kinds of measurements (i.e. features). The format is nested because a cell may hold not only primitive types but also arrays or pandas Series of shape=(n_obs,).
  • we chose this data container because it allows a unified interface across different time series learning settings: forecasting, but also panel/longitudinal data settings such as time series classification/regression or supervised/panel forecasting.
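
To make the two layouts concrete, here is a minimal sketch using only numpy and pandas. The variable names and the `feature_0` column are made up for illustration; they are not taken from either library.

```python
import numpy as np
import pandas as pd

# One univariate series with 5 observations (illustrative values).
values = np.array([10.0, 12.0, 13.0, 12.5, 14.0])

# pmdarima-style layout: time series observations in rows.
y_flat = values                                   # shape (n_obs,)
X_flat = np.arange(len(values), dtype=float).reshape(-1, 1)  # (n_obs, n_features)

# Nested layout: one row per iid sample, each cell holds a whole series.
# In the classical forecasting setting there is a single row.
y_nested = pd.Series([pd.Series(values)])
X_nested = pd.DataFrame({"feature_0": [pd.Series(X_flat[:, 0])]})

print(y_flat.shape, X_flat.shape)        # flat shapes
print(y_nested.shape, X_nested.shape)    # one row each
print(y_nested.iloc[0].shape)            # the series inside the cell
```

Converting between the two is then a matter of unwrapping the single cell (`y_nested.iloc[0].to_numpy()`) before handing the data to an estimator that expects the flat layout.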

Forecasting horizon

  • we allow the forecasting horizon to be an array of steps ahead, not only the number of steps ahead (n_periods). This allows for gaps, e.g. predicting only the third and fifth step ahead with fh = np.array([3, 5]). This may not make a difference for ARIMA, but it does for reduction strategies.
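
As a rough illustration of the difference, an array-valued horizon can be served by any estimator that only understands n_periods: forecast up to the largest step and select the requested ones. `drift_forecast` and `predict_at` are hypothetical helpers, with a simple drift forecast standing in for a real model.

```python
import numpy as np

def drift_forecast(y, n_periods):
    """Stand-in model: extrapolate the average slope for n_periods steps."""
    y = np.asarray(y, dtype=float)
    slope = (y[-1] - y[0]) / (len(y) - 1)
    steps = np.arange(1, n_periods + 1)
    return y[-1] + slope * steps

def predict_at(y, fh):
    """Forecast only the steps listed in fh (1-based steps ahead)."""
    fh = np.asarray(fh, dtype=int)
    full = drift_forecast(y, n_periods=fh.max())
    return full[fh - 1]  # step h sits at position h - 1

y = np.array([1.0, 2.0, 3.0, 4.0])
fh = np.array([3, 5])
print(predict_at(y, fh))  # [7. 9.]
```

For a model like ARIMA this selection is all there is to it. A reduction strategy, by contrast, could avoid fitting anything for steps 1, 2 and 4.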

Fit method

  • some methods already require the forecasting horizon in fit (e.g. non-dynamic reduction to time series regression, where one time series regressor is fitted for each step of the forecasting horizon)
  • pmdarima currently takes the forecasting horizon only in predict
  • should we allow the forecasting horizon when passed in fit to be then changed in predict?
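
The sketch below shows why a direct (non-dynamic) reduction needs the horizon at fit time: it fits one regressor per requested step. `DirectReductionForecaster` is a hypothetical class written for this comment, using plain least squares on sliding windows. Its `predict` takes one possible stance on the open question above, accepting only a subset of the horizon seen in fit; that is an assumption, not a settled design.

```python
import numpy as np

class DirectReductionForecaster:
    """Hypothetical sketch: one least-squares regressor per step in fh."""

    def __init__(self, window_length=3):
        self.window_length = window_length

    def fit(self, y, fh):
        y = np.asarray(y, dtype=float)
        w = self.window_length
        self.fh_ = np.asarray(fh, dtype=int)
        self.coefs_ = {}
        for h in self.fh_:
            # Window i covers y[i : i + w]; its target lies h steps after the window end.
            n_rows = len(y) - w - h + 1
            windows = np.array([y[i:i + w] for i in range(n_rows)])
            X = np.column_stack([np.ones(n_rows), windows])  # add intercept
            target = y[w + h - 1 : w + h - 1 + n_rows]
            coef, *_ = np.linalg.lstsq(X, target, rcond=None)
            self.coefs_[int(h)] = coef
        # Only the last window is needed at predict time.
        self.last_window_ = np.concatenate([[1.0], y[-w:]])
        return self

    def predict(self, fh=None):
        fh = self.fh_ if fh is None else np.asarray(fh, dtype=int)
        missing = sorted(set(fh.tolist()) - set(self.coefs_))
        if missing:
            raise ValueError(f"no regressor was fitted for steps {missing}")
        return np.array([self.last_window_ @ self.coefs_[int(h)] for h in fh])

y = np.arange(20, dtype=float)
forecaster = DirectReductionForecaster(window_length=3).fit(y, fh=[2, 4])
print(forecaster.predict())      # steps 2 and 4 ahead
print(forecaster.predict([4]))   # a subset of the fitted horizon is fine
```

Asking for a step that was not passed to fit, e.g. `forecaster.predict([3])`, raises a ValueError here because no regressor exists for it.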

Update method

  • pmdarima adds the new data to the old data (kept in the model) and refits the model using the previously fitted params as start params,
  • is the new data always in the future?
  • alternatively, one could use the new data to update the already fitted parameters using Kalman smoothing/filtering. Ideally only the new data would be used, so that the data seen in training does not have to be stored in the model.
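
A minimal sketch of the filtering idea, assuming a local level model (random walk plus noise) with fixed, known variances. `LocalLevelFilter` is a hypothetical class, not part of pmdarima or statsmodels. Note that it updates the filtered state rather than re-estimating the variance parameters, which is a simplification of what the bullet above asks for.

```python
import numpy as np

class LocalLevelFilter:
    """Hypothetical sketch: Kalman filter for a local level model.

    obs_var and level_var are assumed known and are not re-estimated.
    """

    def __init__(self, obs_var=1.0, level_var=0.1):
        self.obs_var = obs_var
        self.level_var = level_var

    def fit(self, y):
        y = np.asarray(y, dtype=float)
        # Initialise the state from the first observation, then filter the rest.
        self.level_ = y[0]
        self.state_var_ = self.obs_var
        return self.update(y[1:])

    def update(self, y_new):
        """Fold new observations into the state; no past data is stored."""
        for obs in np.asarray(y_new, dtype=float):
            prior_var = self.state_var_ + self.level_var      # predict step
            gain = prior_var / (prior_var + self.obs_var)     # Kalman gain
            self.level_ = self.level_ + gain * (obs - self.level_)
            self.state_var_ = (1.0 - gain) * prior_var
        return self

    def predict(self, n_periods):
        # A local level model forecasts the current level for every step.
        return np.full(n_periods, self.level_)

y = np.array([5.0, 5.5, 4.8, 5.2, 6.0, 6.1, 5.9, 6.3])
incremental = LocalLevelFilter().fit(y[:5]).update(y[5:])
batch = LocalLevelFilter().fit(y)
print(incremental.predict(2), batch.predict(2))
```

Because the filter is recursive, fitting on the first part and then updating with the rest gives the same state as fitting on everything at once, without keeping the training series in the model. This only holds if the new data follows the old data in time, which ties back to the question of whether new data is always in the future.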

In-sample forecasts

  • pmdarima has a separate method for in-sample forecasts
  • currently, we don't have a good way to do in-sample forecasts. Our forecasting horizon is relative to the end of the series seen in training, so allowing negative values would perhaps be one option?
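
One way the negative-values option could look, as a sketch: map the relative horizon to absolute positions in the series and flag which steps are in-sample. The helper `to_absolute` and the convention that step 0 denotes the last training observation are assumptions made for this example.

```python
import numpy as np

def to_absolute(fh, n_train):
    """Map a relative horizon to 0-based positions in the series.

    Assumed convention: step 0 is the last training observation,
    negative steps are in-sample, positive steps are out-of-sample.
    """
    fh = np.asarray(fh, dtype=int)
    absolute = (n_train - 1) + fh
    if absolute.min() < 0:
        raise ValueError("horizon reaches before the start of the training series")
    in_sample = fh <= 0
    return absolute, in_sample

positions, in_sample = to_absolute([-2, 0, 3], n_train=10)
print(positions)   # [ 7  9 12]
print(in_sample)   # [ True  True False]
```

A forecaster could then dispatch in-sample positions to fitted values and out-of-sample positions to the usual forecast, keeping a single predict method.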

Forecast confidence intervals

  • pmdarima returns intervals from the predict method via an optional kwarg that changes the output of predict
  • should we return intervals from a separate method instead, to have a stable method signature for predict?
  • related to the design discussion on probabilistic forecasting: Design/implement forecast interval predictions #97
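
A sketch of the separate-method option. `NaiveForecaster` and `predict_interval` are hypothetical names for this example. The intervals assume a random walk with normal increments, which is a modelling assumption chosen only to keep the code short.

```python
from statistics import NormalDist

import numpy as np

class NaiveForecaster:
    """Hypothetical sketch: predict always returns point forecasts only."""

    def fit(self, y):
        y = np.asarray(y, dtype=float)
        self.last_ = y[-1]
        # Std of one-step increments under a driftless random walk assumption.
        self.sigma_ = np.sqrt(np.mean(np.diff(y) ** 2))
        return self

    def predict(self, fh):
        fh = np.asarray(fh, dtype=int)
        return np.full(len(fh), self.last_)

    def predict_interval(self, fh, alpha=0.05):
        """Return (lower, upper) bounds; the output of predict is unchanged."""
        fh = np.asarray(fh, dtype=int)
        z = NormalDist().inv_cdf(1 - alpha / 2)
        half_width = z * self.sigma_ * np.sqrt(fh)  # variance grows with the step
        point = self.predict(fh)
        return point - half_width, point + half_width

y = np.array([10.0, 11.0, 10.5, 11.5, 12.0])
forecaster = NaiveForecaster().fit(y)
point = forecaster.predict([1, 4])
lower, upper = forecaster.predict_interval([1, 4], alpha=0.05)
print(point, lower, upper)
```

With this split, callers of predict always get a single array back, whereas a kwarg-based design returns either an array or a tuple depending on the arguments.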

Composition

  • Pipelines for target variable vs TransformedTargetForecaster
@mloning mloning added the interfacing algorithms Interfacing existing algorithms/estimators from third party packages label Jul 31, 2019
@mloning mloning self-assigned this Jul 31, 2019
@tgsmith61591

Great discussion points, definitely some we need to consider on the pmdarima side to unify interfaces, though I'm not sure how we'd handle panel data.

pmdarima adds the new data to the old data (kept in the model) and refits the model using the previously fitted params as start params,

FWIW a lot of the design decisions we've made are due to statsmodels' idiosyncrasies. This is a design decision I wish we could have made differently, but unless we move off of statsmodels as the underlying ARIMA implementation (we've had that discussion), I'm not sure if we can do anything about it ...

@mloning mloning linked a pull request Apr 18, 2020 that will close this issue
2 tasks
@mloning
Contributor Author

mloning commented Apr 25, 2020

Closed by #218
