Basic Hyperparameter Tuning for Time Series Models in PyCaret #1791

ngupta23 · 2021-11-01T12:59:38Z

ngupta23
Nov 1, 2021
Maintainer

Step 1: Create a simple baseline model

Before tuning a model, we first need a simple baseline model. Please refer to this post on how you can create one. Below, we show a simple reduced regression model that uses linear regression to model the monthly airline dataset. A complete notebook example for tuning can be found here.

#### Create the baseline model (exp is the experiment object created during setup) ----
model = exp.create_model("lr_cds_dt")
exp.plot_model(model)

#### What are the model hyper-parameters? ----
model

The model uses a degree of 1 to detrend the data set before fitting. By default, it also uses a seasonal_period of 1 and a window length of 10. The window length is used to extract lagged features, i.e. the model uses up to 10 lags from the past.
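
To make the window length concrete, here is a minimal sketch in plain Python (not PyCaret's internal implementation) of how a reduced regression model can turn a series into a table of lagged features. The function name make_lagged_table and the toy values are made up for illustration.

```python
def make_lagged_table(y, window_length):
    """Build (features, targets) where each feature row holds the
    `window_length` values that immediately precede the target."""
    features, targets = [], []
    for t in range(window_length, len(y)):
        features.append(y[t - window_length:t])  # the lags
        targets.append(y[t])                     # the value to predict
    return features, targets


# Toy series (illustrative values only, not the airline data)
series = [10, 12, 15, 14, 18, 21, 20, 25]

X, target = make_lagged_table(series, window_length=3)
for row, value in zip(X, target):
    print(row, "->", value)
```

A regressor such as linear regression can then be fit on these rows. A larger window length gives the regressor more past values per row, but leaves fewer rows to train on.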

Step 2: Tune the model

Random Grid Search

By default, pycaret performs a random grid search using a predefined internal grid. This internal grid is built from sensible values (such as the seasonal period) obtained during the setup.

#### Tune the model (Default = Random Grid Search) ----
tuned_model_random = exp.tune_model(model)
exp.plot_model(tuned_model_random)

#### What are the tuned model hyper-parameters? ----
tuned_model_random

As we can see, the tuned model performs much better than the original model (e.g. MAE reduced substantially). The tuned model uses a degree of 2 to detrend the data, a seasonal period of 12, which seems reasonable for monthly airline data, and a window length (number of lags) of 23.

You might be curious about the search space that was used. It can be obtained easily using the models() call with internal=True. The column Tune Distributions provides the default random grid used for tuning. In this case, six (6) hyper-parameters were tuned in a random manner. Note that:

- The limits of the search space are chosen carefully based on the time series characteristics. For example,
  - The seasonal period sp is a choice between 12 and 24. This is based on the seasonality of 12 that is detected during the setup process. You may wonder why we also add a seasonality of 24 to the search space. This is because in some cases, a seasonal period that is a multiple (harmonic) of the detected one can lead to better modeling results.
  - The same rationale goes into choosing the window_length (number of lags used for modeling) search space as well.
- The default number of searches is 10 hyper-parameter combinations.
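
The idea behind the random grid can be sketched in plain Python as follows. This is only an illustration, not pycaret's actual grid: the ranges for window_length and degree, and the helper name sample_candidates, are assumptions. The real search space is the one returned by the models() call.

```python
import random

detected_sp = 12  # seasonal period detected during setup (from the text above)

# Hypothetical search space built around the detected seasonal period
search_space = {
    "sp": [detected_sp, 2 * detected_sp],                           # 12 and its multiple 24
    "window_length": list(range(detected_sp, 2 * detected_sp + 1)), # assumed range
    "degree": [1, 2, 3],                                            # assumed range
}


def sample_candidates(space, n_iter, seed):
    """Draw n_iter random hyper-parameter combinations from the space."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_iter)
    ]


# 10 combinations, matching the default number of searches mentioned above
candidates = sample_candidates(search_space, n_iter=10, seed=42)
for candidate in candidates:
    print(candidate)
```

Each candidate would then be evaluated with cross-validation, and the best-scoring combination kept.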

#### OK, so what search space was used? ----
random_grid = exp.models(internal=True).loc["lr_cds_dt", "Tune Distributions"]
random_grid

Fixed Grid Search

Alternatively, instead of performing a random grid search, one may choose to tune hyper-parameters using a "fixed" grid search. This can be done in pycaret by setting the search_algorithm argument.

#### Tune the model (Fixed Grid Search) ----
tuned_model_grid = exp.tune_model(model, search_algorithm="grid")
exp.plot_model(tuned_model_grid)

Again, you might be interested in learning about the search space used for this tuning. This can be obtained easily using the models() call with internal=True. The column Tune Grid provides the default fixed grid used for tuning.

#### What search space was used? ----
fixed_grid = exp.models(internal=True).loc["lr_cds_dt", "Tune Grid"]
fixed_grid

Reasons for preferring Random Grid Search over a Fixed Grid Search

In this case too, six (6) hyper-parameters were tuned. The difference is that with a fixed grid, all combinations of the hyper-parameter values are tried (vs. sampling from a search space in a random grid search). Because an exhaustive search can be time consuming, the fixed grid generally covers a more limited space than the random grid. As a result, a random grid search often (though not always) gives better results than a fixed grid search.
In addition, random grid search is preferred because it does not waste valuable time when a certain hyper-parameter does not help improve performance. This is explained in this wonderful video by Andrew Ng.
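
The second point can be illustrated with a small plain-Python sketch (it does not use pycaret). The two hyper-parameter names and their values are hypothetical. With the same budget of 9 trials, a fixed grid only ever tries 3 distinct values of the hyper-parameter that matters, while random sampling tries a new value on every trial.

```python
import itertools
import random

# Hypothetical fixed grid: 3 values for each of two hyper-parameters
grid_values = {
    "important": [0.0, 0.5, 1.0],
    "unimportant": [0.0, 0.5, 1.0],
}

# Fixed grid search: every combination is tried (3 x 3 = 9 trials)
grid_trials = [
    dict(zip(grid_values, combo))
    for combo in itertools.product(*grid_values.values())
]

# Random search with the same budget: each trial samples both values afresh
rng = random.Random(0)
random_trials = [
    {"important": rng.random(), "unimportant": rng.random()}
    for _ in range(len(grid_trials))
]

distinct_grid = len({trial["important"] for trial in grid_trials})
distinct_random = len({trial["important"] for trial in random_trials})

print("trials:", len(grid_trials))
print("distinct 'important' values, fixed grid:", distinct_grid)
print("distinct 'important' values, random:", distinct_random)
```

If the unimportant hyper-parameter has no effect on the score, two thirds of the fixed grid trials repeat information that was already known, whereas every random trial explores a new value of the important one.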

For these reasons, "random grid search" is the default option in the pycaret time series module.
