Same model from multiple libraries #2323

Open
ngupta23 opened this issue Mar 23, 2022 · 4 comments · Fixed by #2552
Labels
enhancement, models, time_series
Comments

@ngupta23
Collaborator

Is your feature request related to a problem? Please describe.

In time series, many libraries provide the same models. For example:

AutoARIMA is available from pmdarima and a faster variant is available through statsforecast that now has a plugin in sktime.
Similarly, there are plans to probably add AutoARIMA from statsforecast to darts as well.

from sktime.forecasting.arima import AutoARIMA  # backed by pmdarima
from sktime.forecasting.statsforecast import StatsForecastAutoARIMA  # backed by statsforecast

Describe the solution you'd like

Provide a way for users to choose the engine they would like to use for the model. One example could be

create_model("auto_arima", aggregator="sktime", engine="pmdarima")
create_model("auto_arima", aggregator="sktime", engine="statsforecast")

In the future, if darts becomes available as a plugin in sktime, we could do something like this:

create_model("auto_arima", aggregator="darts", engine="pmdarima")
create_model("auto_arima", aggregator="darts", engine="statsforecast")

NOTE: The aggregator has to be available as an sktime extension for this to work.
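
To make the proposal concrete, here is a minimal sketch of how a lookup keyed on model, aggregator, and engine might work. It is not pycaret's implementation: `_REGISTRY`, `register`, and `create_model_sketch` are hypothetical names, and the factories return placeholder strings instead of real sktime estimators.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: (model_id, aggregator, engine) -> factory function.
_REGISTRY: Dict[Tuple[str, str, str], Callable[[], object]] = {}


def register(model_id: str, aggregator: str, engine: str):
    """Decorator that records a factory under its (model, aggregator, engine) key."""
    def decorator(factory: Callable[[], object]) -> Callable[[], object]:
        _REGISTRY[(model_id, aggregator, engine)] = factory
        return factory
    return decorator


@register("auto_arima", "sktime", "pmdarima")
def _pmdarima_auto_arima() -> object:
    # Placeholder for constructing the pmdarima-backed sktime estimator.
    return "sktime/pmdarima AutoARIMA"


@register("auto_arima", "sktime", "statsforecast")
def _statsforecast_auto_arima() -> object:
    # Placeholder for constructing the statsforecast-backed sktime estimator.
    return "sktime/statsforecast AutoARIMA"


def create_model_sketch(model_id: str, aggregator: str = "sktime",
                        engine: str = "pmdarima") -> object:
    """Build the model registered for the requested combination."""
    key = (model_id, aggregator, engine)
    if key not in _REGISTRY:
        available = sorted(k for k in _REGISTRY if k[0] == model_id)
        raise ValueError(f"No implementation for {key}; available: {available}")
    return _REGISTRY[key]()
```

Requesting an unregistered combination, such as `aggregator="darts"` here, raises a `ValueError` that lists the combinations that do exist for that model.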

Additional context
This is similar to the engine concept in R/tidymodels, although we are separating the aggregator from the engine.

However, now that the type of model has been specified, a method for fitting or training the model can be stated using the engine. The engine value is often a mash-up of the software that can be used to fit or train the model as well as the estimation method. For example, to use ordinary least squares, we can set the engine to be lm

ngupta23 added the enhancement, time_series, and models labels Mar 23, 2022
ngupta23 added this to the pycaret 3.1.0 milestone Mar 23, 2022
ngupta23 added this to To do in Time Series Forecasting via automation Mar 23, 2022
@ngupta23
Collaborator Author

@moezali1 @Yard1 here are my thoughts about this topic. Feedback is welcome.

@Yard1
Member

Yard1 commented Apr 2, 2022

This sounds good, though I would be leaning towards making that a setup option.

@ngupta23
Collaborator Author

ngupta23 commented Apr 2, 2022

I was thinking that we will have a default priority list of aggregators and engines, but users will be able to provide their own priority list to override it. The default engine priorities would be stored in the model container itself. The default aggregator priorities would be an input to setup and could possibly be overridden in compare_models through a user-supplied argument.

I will draft the interface for review.
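
One way to read the priority-list idea is sketched below. The names `DEFAULT_ENGINE_PRIORITY` and `resolve_engine`, the default ordering, and the notion of an `installed` set are all assumptions made for illustration; the actual interface was still to be drafted at this point.

```python
from typing import Dict, List, Optional, Set

# Hypothetical defaults; in the proposal these would live in the model container.
DEFAULT_ENGINE_PRIORITY: Dict[str, List[str]] = {
    "auto_arima": ["pmdarima", "statsforecast"],
}


def resolve_engine(model_id: str, installed: Set[str],
                   user_priority: Optional[List[str]] = None) -> str:
    """Return the first engine in the priority list that is installed.

    A user-supplied priority list, if given, replaces the default one.
    """
    priority = user_priority or DEFAULT_ENGINE_PRIORITY[model_id]
    for engine in priority:
        if engine in installed:
            return engine
    raise LookupError(f"None of {priority} is installed for {model_id!r}")
```

With both libraries installed the default order picks `pmdarima`; passing `user_priority=["statsforecast", "pmdarima"]` flips that choice, and a missing library is skipped in favor of the next entry.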

@ngupta23
Collaborator Author

ngupta23 commented May 14, 2022

@moezali1 @Yard1 @FedericoGarza PR has been submitted for implementing this without the aggregator (since we only have sktime for now)

Usage

Global settings (using setup)

from sktime.forecasting.arima import AutoARIMA as PmdAutoARIMA
from sktime.forecasting.statsforecast import StatsForecastAutoARIMA

#### Set engine globally ----
exp.setup(data=data, engines={"auto_arima": "statsforecast"})

#### Produces a statsforecast model ----
model = exp.create_model("auto_arima")
assert isinstance(model, StatsForecastAutoARIMA)

model = exp.compare_models(include=["auto_arima"])
assert isinstance(model, StatsForecastAutoARIMA)

Local Changes

#### default auto_arima engine is pmdarima for now ----
exp.setup(data=data)

#### Produces a pmdarima model ----
model = exp.create_model("auto_arima")
assert isinstance(model, PmdAutoARIMA)

#### Override model engine locally ----
model = exp.create_model("auto_arima", engine="statsforecast")
assert isinstance(model, StatsForecastAutoARIMA)
# Original engine should remain the same (since the above change was local to the call only)
assert exp.get_engine("auto_arima") == "pmdarima"
model = exp.create_model("auto_arima")
assert isinstance(model, PmdAutoARIMA)

# You can repeat the same for compare_models as well
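
The global-versus-local behavior shown above can be illustrated with a small stand-in class. This is a toy sketch, not pycaret's code: `ToyExperiment` is a hypothetical name, and its `create_model` returns a `(model_id, engine)` tuple instead of a fitted estimator.

```python
from typing import Dict, Optional, Tuple


class ToyExperiment:
    """Toy stand-in that tracks one engine per model id."""

    def __init__(self) -> None:
        # Assumed default, mirroring "default auto_arima engine is pmdarima for now".
        self._engines: Dict[str, str] = {"auto_arima": "pmdarima"}

    def setup(self, engines: Optional[Dict[str, str]] = None) -> "ToyExperiment":
        # Global setting: persists for every later call on this experiment.
        self._engines.update(engines or {})
        return self

    def get_engine(self, model_id: str) -> str:
        return self._engines[model_id]

    def create_model(self, model_id: str,
                     engine: Optional[str] = None) -> Tuple[str, str]:
        # Local override: used for this call only, never written back.
        chosen = engine if engine is not None else self._engines[model_id]
        return (model_id, chosen)
```

The key design choice is that the per-call `engine` argument is never stored, so `get_engine` keeps reporting the value configured in `setup`.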

Notes:

  1. Currently statsforecast is an optional library. Once we have thoroughly vetted functionality in pycaret, we can make it a required dependency and switch this to be the default for auto_arima.
  2. For some reason, on the airline dataset, pmdarima produces better results than statsforecast and also takes less time. More vetting is needed. The time issue can be tracked here: "Slow first training with cross-validation" (Nixtla/statsforecast#177).
  3. There seems to be an issue in statsforecast when uninstalled, so we can potentially see issues with the optional dependency checks in pycaret and sktime. See "Uninstalling statsforecast does not completely uninstall module?" (Nixtla/statsforecast#118).
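
Regarding note 3, a more defensive optional-dependency check could look like the sketch below. It assumes, without confirmation from the linked issue, that an incomplete uninstall leaves behind a package that can still be found but no longer exposes its public API. `is_usable` is a hypothetical helper, not the check that pycaret or sktime actually uses.

```python
import importlib
import importlib.util
from typing import Optional


def is_usable(module_name: str, required_attr: Optional[str] = None) -> bool:
    """Return True only if a top-level module imports and exposes the expected attribute."""
    # Step 1: is the module discoverable at all?
    if importlib.util.find_spec(module_name) is None:
        return False
    # Step 2: does it actually import without errors?
    try:
        module = importlib.import_module(module_name)
    except Exception:
        return False
    # Step 3: a leftover, empty package would pass steps 1 and 2 but fail here.
    return required_attr is None or hasattr(module, required_attr)
```

Checking for a known attribute is what distinguishes a working install from a leftover directory that only looks like a package.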

ngupta23 linked a pull request May 15, 2022 that will close this issue
13 tasks
ngupta23 added a commit that referenced this issue May 22, 2022
Implements engines per #2323 (without aggregators)
goodwanghan pushed a commit to goodwanghan/pycaret that referenced this issue May 29, 2022
goodwanghan pushed a commit to goodwanghan/pycaret that referenced this issue May 29, 2022
ngupta23 modified the milestones: pycaret 3.1.0, pycaret 3.0.0 Aug 2, 2022
ngupta23 modified the milestones: pycaret 3.0.0, pycaret 3.1.0 Nov 9, 2022
ngupta23 modified the milestones: 3.1.2, 3.2.1 Feb 2, 2024