Request to provide a tutorial of some sort to implement AutoML variants of ETS and prophet for univariate forecasting #40

Closed
riteshchhetri10 opened this issue Oct 29, 2021 · 9 comments

@riteshchhetri10

riteshchhetri10 commented Oct 29, 2021

Hey, I have been going through the paper "Merlion: A Machine Learning Library for Time Series" and came across the AutoML variants of the ETS and Prophet models for univariate forecasting. It would be a great help if you could provide a tutorial on implementing them on a simple univariate dataset such as the air-passengers dataset.
I have also tried AutoSarima from the merlion.models.automl module on the same dataset, but it gives much larger errors than the auto_arima model from the pmdarima library and even the basic statsmodels.tsa SARIMAX methods.
What could be the reason, given that the air-passengers dataset isn't very complicated to forecast?

My code for AutoSarima:

from merlion.models.automl.autosarima import AutoSarima, AutoSarimaConfig
from merlion.models.automl.seasonality import SeasonalityLayer
from merlion.models.forecast.sarima import Sarima

max_iter = [10, 20, 50, 100, 200, 400, 1000]
list_autosarima_merlion_models = []        # one trained model per maxiter value
parameters_autosarima_merlion_models = []  # description of the parameters used for each model

for mi in max_iter:
    config1 = AutoSarimaConfig(max_forecast_steps=len(test_df), order=("auto", "auto", "auto"),
                               seasonal_order=("auto", "auto", "auto", 12), approximation=True, maxiter=mi)
    model1 = SeasonalityLayer(model=AutoSarima(model=Sarima(config1)))
    train_pred, train_err = model1.train(train_df_merlion, train_config={"enforce_stationarity": True, "enforce_invertibility": True})
    list_autosarima_merlion_models.append(model1)
    parameters_autosarima_merlion_models.append(f'{mi} maximum iterations')

Link to the paper that I had gone through.
https://arxiv.org/abs/2109.09265

@aadyotb
Contributor

aadyotb commented Oct 29, 2021

@chenghaoliu89, can you investigate this issue with AutoSARIMA?

@riteshchhetri10 re: AutoProphet and AutoETS, you may see the linked API docs. Prophet has a parameter add_seasonality which can be set to "auto", and ETS has parameter seasonal_periods which can be set to "auto" as well to enable auto seasonality detection. We plan to integrate these more directly into the AutoML framework and add the tutorials you requested in a future release.

@aadyotb
Contributor

aadyotb commented Oct 29, 2021

@riteshchhetri10, would you mind including a code snippet of how you are computing the error as well?

@riteshchhetri10
Author

from merlion.evaluate.forecast import ForecastMetric
from merlion.utils import TimeSeries

test_df_merlion = TimeSeries.from_pd(test_df)
test_pred, test_err = model.forecast(len(test_df))
rmse = ForecastMetric.RMSE.value(ground_truth=test_df_merlion, predict=test_pred)
mae = ForecastMetric.MAE.value(ground_truth=test_df_merlion, predict=test_pred)

# Here test_df is just a pandas DataFrame containing the univariate air-passengers dataset (~44 rows).
# The RMSE values were used for comparison between the pmdarima auto_arima, statsmodels.tsa SARIMAX, and kats.models.sarima models.

Merlion AutoSarima did not give good results based on RMSE values for the air-passengers dataset.
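
For reference, the two metrics used above reduce to simple formulas. The following is a minimal pure-Python sketch that is independent of Merlion's ForecastMetric; the function names `rmse` and `mae` are illustrative only, and it assumes both inputs are plain lists of equal length that are already aligned in time.

```python
import math


def rmse(ground_truth, predict):
    """Root mean squared error between two equally long sequences."""
    if len(ground_truth) != len(predict):
        raise ValueError("sequences must have the same length")
    squared = [(g - p) ** 2 for g, p in zip(ground_truth, predict)]
    return math.sqrt(sum(squared) / len(squared))


def mae(ground_truth, predict):
    """Mean absolute error between two equally long sequences."""
    if len(ground_truth) != len(predict):
        raise ValueError("sequences must have the same length")
    absolute = [abs(g - p) for g, p in zip(ground_truth, predict)]
    return sum(absolute) / len(absolute)
```

Because RMSE squares each error before averaging, a forecast that misses the seasonal peaks is penalized much more heavily by RMSE than by MAE, so comparing both can hint at where a model goes wrong.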

@chenghaoliu89
Contributor

@riteshchhetri10 please remove SeasonalityLayer as follows:

    config1 = AutoSarimaConfig(max_forecast_steps=len(test_df), order=("auto", "auto", "auto"),
                           seasonal_order=("auto", "auto", "auto", 12), approximation=True, maxiter=mi)
    model1  =  AutoSarima(model = Sarima(config1))

If you wrap the model in SeasonalityLayer, it automatically detects the periodicity, even if you have specified a periodicity of 12. The periodicity detection module uses a stricter confidence level of 0.975, which leads to a detected periodicity of 1 on this dataset. The results are therefore poor, because the predictions do not include any periodic pattern. If we relax the confidence level to 0.95, it outputs 12, as expected.
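
To illustrate why the confidence level matters, here is a minimal sketch of autocorrelation-based periodicity detection. It is not Merlion's implementation: the helper names (`autocorr`, `is_significant`, `detect_period`) and the significance rule (an autocorrelation counts as significant when it exceeds z / sqrt(n), with z the normal quantile of the confidence level) are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


def is_significant(acf_value, n, confidence):
    """Assumed rule: significant if above the normal quantile scaled by 1/sqrt(n)."""
    threshold = NormalDist().inv_cdf(confidence) / math.sqrt(n)
    return acf_value > threshold


def detect_period(x, max_lag, confidence):
    """Return the lag of the strongest significant ACF peak, or 1 if there is none."""
    n = len(x)
    acf = [autocorr(x, k) for k in range(max_lag + 2)]
    peaks = [
        k for k in range(2, max_lag + 1)
        if acf[k] >= acf[k - 1] and acf[k] >= acf[k + 1]
        and is_significant(acf[k], n, confidence)
    ]
    return max(peaks, key=lambda k: acf[k]) if peaks else 1


# A clean 12-step cycle is detected at either confidence level.
series = [math.sin(2 * math.pi * t / 12) for t in range(144)]
print(detect_period(series, max_lag=30, confidence=0.975))
```

Under this assumed rule, a borderline autocorrelation of 0.15 on 144 observations passes at confidence 0.95 but fails at 0.975, in which case the detector falls back to a periodicity of 1. That is the same kind of flip described above.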

@aadyotb I think we can change the API and expose the confidence level parameter of the periodicity detection to the user. What do you think?

@riteshchhetri10
Author

@chenghaoliu89 You are right: I removed the SeasonalityLayer and the RMSE dropped significantly. However, the RMSE values are still higher than those of the kats.models.prophet or statsmodels.tsa.Sarima models. Thanks for the help. I am closing this issue.

@aadyotb
Contributor

aadyotb commented Nov 2, 2021

@chenghaoliu89 sounds good. Can you create a PR adding the confidence parameter to the SeasonalityLayer API?

@chenghaoliu89
Contributor

@riteshchhetri10 Would you mind sharing the code for the comparison with statsmodels and kats? If I can reproduce the RMSE results, I may be able to help you diagnose the cause.

@riteshchhetri10
Author

@chenghaoliu89

Please see the kaggle notebook containing the entire code which builds the following models and tracks the RMSE for them.
a) statsmodel.tsa SARIMAX model
b) kats.prophet model
c) merlion autosarima model
d) pmdautoarima model

You can simply upload the air passengers dataset available on Kaggle and run all cells to get a data frame with the name of the model and RMSE values along with the parameters used.

Link to kaggle notebook
https://www.kaggle.com/ritesh11/statsmodel-kats-merlion-pmdautoarima

resulting table

df_scores

What could be the reason that Merlion's AutoSarima does not achieve similar scores on the RMSE metric?

Also, could you please explain how to obtain the parameters of the best model chosen by Merlion's AutoSarima?
Basically, how to get the (p,d,q) values of the best model that is obtained.

@chenghaoliu89
Contributor

@riteshchhetri10
You can enable debug-level logging with logging.basicConfig(level=logging.DEBUG) to display the details of the hyper-parameter search. In your example, the AutoML procedure for Merlion is as follows:
[image: Merlion AutoSarima search log]
For comparison, you can print the same details for pmdarima with model = pm.auto_arima(train_df, seasonal=True, m=12, trace=True). Its AutoML procedure is as follows:
[image: pmdarima auto_arima search trace]
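
As a small sketch of the logging setup mentioned above, using only the Python standard library: the format string and the choice to quiet the `matplotlib` logger are assumptions for illustration, not something Merlion requires.

```python
import logging

# force=True (Python 3.8+) replaces handlers that a notebook or another
# library may already have installed, so the DEBUG level takes effect.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(name)s %(levelname)s %(message)s",
    force=True,
)

# Optional and purely illustrative: keep an unrelated chatty library quiet
# so the model-search messages are easier to read.
logging.getLogger("matplotlib").setLevel(logging.WARNING)
```

Run this before training the model so that the search messages are emitted from the start.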

The key difference is the differencing order that each library detects. I checked the implementations: both Merlion and pmdarima use the KPSS test to choose the differencing order. In Merlion, we call it directly from statsmodels (https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.kpss.html). However, Merlion currently defaults to a trend regression for the KPSS test by setting regression="ct", while pmdarima defaults to a level regression. Even with the trend regression, the confidence for setting the differencing order to zero is not high. I think we can change the default to level regression, since that is the default in the original R implementation (https://rdrr.io/cran/forecast/src/R/unitRoot.R). That said, it is hard to say which one is better in general, so we can expose the choice to the user.
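
To make the effect of the regression type concrete, here is a simplified pure-Python sketch of choosing the differencing order with a KPSS-style test. It is not the statsmodels, Merlion, or pmdarima code: the helper names (`kpss_stat`, `choose_d`) are hypothetical, the lag truncation defaults to 0 for simplicity (statsmodels selects lags differently), and the synthetic series is made up. The 5% critical values (0.463 for level, 0.146 for trend) are the standard KPSS table values.

```python
def kpss_stat(x, regression="c", lags=0):
    """KPSS statistic with residuals from a level ("c") or trend ("ct") fit."""
    n = len(x)
    t = list(range(1, n + 1))
    mean_x = sum(x) / n
    if regression == "c":
        resid = [v - mean_x for v in x]
    elif regression == "ct":
        mean_t = sum(t) / n
        slope = (sum((ti - mean_t) * (v - mean_x) for ti, v in zip(t, x))
                 / sum((ti - mean_t) ** 2 for ti in t))
        resid = [v - mean_x - slope * (ti - mean_t) for ti, v in zip(t, x)]
    else:
        raise ValueError("regression must be 'c' or 'ct'")
    # Sum of squared partial sums of the residuals, scaled by n^2.
    running, eta = 0.0, 0.0
    for e in resid:
        running += e
        eta += running * running
    eta /= n ** 2
    # Long-run variance estimate with Bartlett weights (plain variance if lags=0).
    s2 = sum(e * e for e in resid) / n
    for j in range(1, lags + 1):
        gamma = sum(resid[i] * resid[i - j] for i in range(j, n)) / n
        s2 += 2 * (1 - j / (lags + 1)) * gamma
    return eta / s2


CRIT_5PCT = {"c": 0.463, "ct": 0.146}


def choose_d(x, regression, max_d=2):
    """Difference until the test no longer rejects stationarity at the 5% level."""
    d = 0
    while d < max_d and kpss_stat(x, regression) > CRIT_5PCT[regression]:
        x = [b - a for a, b in zip(x, x[1:])]
        d += 1
    return d


# Synthetic example: linear trend plus a small deterministic wiggle.
series = [0.5 * t + ((t * 0.6180339887) % 1.0) - 0.5 for t in range(1, 121)]
print(choose_d(series, "c"), choose_d(series, "ct"))
```

On this synthetic series, the level test treats the trend as non-stationarity and asks for one difference, while the trend test absorbs the trend and keeps the order at zero. The same data can therefore lead the two defaults to different ARIMA orders.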
