Future regressor data #15

bash-j · 2021-05-27T01:04:18Z

Hi,

I can't find in the documentation how to pass future regressor data to the forecaster. I can specify the regressor column names in ModelComponentsParam, but when I call run_forecast_config I get an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-267-d126d5cc7a8a> in <module>
      6         coverage=0.95,         # 95% prediction intervals
      7         metadata_param=metadata,
----> 8         model_components_param=model_components
      9     )
     10 )

~\Anaconda3\envs\gkite\lib\site-packages\greykite\framework\templates\forecaster.py in run_forecast_config(self, df, config)
    287             df=df,
    288             config=config)
--> 289         self.forecast_result = forecast_pipeline(**pipeline_parameters)
    290         return self.forecast_result
    291 

~\Anaconda3\envs\gkite\lib\site-packages\greykite\framework\pipeline\pipeline.py in pipeline_wrapper(df, time_col, value_col, date_format, tz, freq, train_end_date, anomaly_info, pipeline, regressor_cols, estimator, hyperparameter_grid, hyperparameter_budget, n_jobs, verbose, forecast_horizon, coverage, test_horizon, periods_between_train_test, agg_periods, agg_func, score_func, score_func_greater_is_better, cv_report_metrics, null_model_params, relative_error_tolerance, cv_horizon, cv_min_train_periods, cv_expanding_window, cv_use_most_recent_splits, cv_periods_between_splits, cv_periods_between_train_test, cv_max_splits)
    240             cv_periods_between_splits=cv_periods_between_splits,
    241             cv_periods_between_train_test=cv_periods_between_train_test,
--> 242             cv_max_splits=cv_max_splits
    243         )
    244     return pipeline_wrapper

~\Anaconda3\envs\gkite\lib\site-packages\greykite\framework\pipeline\pipeline.py in forecast_pipeline(df, time_col, value_col, date_format, tz, freq, train_end_date, anomaly_info, pipeline, regressor_cols, estimator, hyperparameter_grid, hyperparameter_budget, n_jobs, verbose, forecast_horizon, coverage, test_horizon, periods_between_train_test, agg_periods, agg_func, score_func, score_func_greater_is_better, cv_report_metrics, null_model_params, relative_error_tolerance, cv_horizon, cv_min_train_periods, cv_expanding_window, cv_use_most_recent_splits, cv_periods_between_splits, cv_periods_between_train_test, cv_max_splits)
    740         xlabel=time_col,
    741         ylabel=value_col,
--> 742         relative_error_tolerance=relative_error_tolerance)
    743 
    744     result = ForecastResult(

~\Anaconda3\envs\gkite\lib\site-packages\greykite\framework\pipeline\utils.py in get_forecast(df, trained_model, train_end_date, test_start_date, forecast_horizon, xlabel, ylabel, relative_error_tolerance)
    758         Forecasts represented as a ``UnivariateForecast`` object.
    759     """
--> 760     predicted_df = trained_model.predict(df)
    761     # This is more robust than using trained_model.named_steps["estimator"] e.g.
    762     # if the user calls forecast_pipeline with a custom pipeline, where the last

~\Anaconda3\envs\gkite\lib\site-packages\sklearn\utils\metaestimators.py in <lambda>(*args, **kwargs)
    118 
    119         # lambda, but not partial, allows help() to work with update_wrapper
--> 120         out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
    121         # update the docstring of the returned function
    122         update_wrapper(out, self.fn)

~\Anaconda3\envs\gkite\lib\site-packages\sklearn\pipeline.py in predict(self, X, **predict_params)
    417         for _, name, transform in self._iter(with_final=False):
    418             Xt = transform.transform(Xt)
--> 419         return self.steps[-1][-1].predict(Xt, **predict_params)
    420 
    421     @if_delegate_has_method(delegate='_final_estimator')

~\Anaconda3\envs\gkite\lib\site-packages\greykite\sklearn\estimator\base_silverkite_estimator.py in predict(self, X, y)
    346             trained_model=self.model_dict,
    347             past_df=None,
--> 348             new_external_regressor_df=None)["fut_df"]  # regressors are included in X
    349 
    350         self.forecast = pred_df

~\Anaconda3\envs\gkite\lib\site-packages\greykite\algo\forecast\silverkite\forecast_silverkite.py in predict(self, fut_df, trained_model, freq, past_df, new_external_regressor_df, sim_num, include_err, force_no_sim, na_fill_func)
   1464 
   1465         if fut_df.shape[0] <= 0:
-> 1466             raise ValueError("``fut_df`` must be a dataframe of non-zero size.")
   1467 
   1468         if time_col not in fut_df.columns:

ValueError: ``fut_df`` must be a dataframe of non-zero size.


al-bert · 2021-05-27T23:15:59Z

Could you share what your dataframe looks like? It should include future values for the regressors (but not for the variable you want to forecast).

Relevant docs:

bash-j · 2021-05-28T10:35:36Z

Thanks for pointing me in the right direction. I missed that the df has to contain future dates with no values populated for the target. I used this code to extend my original data, and the forecaster then ran.

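from datetime import timedelta

import pandas as pd

# Weekly future dates covering the 52 weeks after the last observed date
future = pd.DataFrame({'ds': pd.date_range(start=(data['ds'].max() + timedelta(weeks=1)),
                                           end=(data['ds'].max() + timedelta(weeks=52)),
                                           freq='W')})
# Carry the last observed values of 'a' and 'b' forward
future['a'] = data['a'].iloc[-1]
future['b'] = data['b'].iloc[-1]
future['c'] = 0

# Append the future rows to the original data
data = pd.concat([data, future], ignore_index=True)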
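
For reference, here is a minimal sketch of the same idea with the target left empty (NaN) in the future rows, as suggested earlier in the thread. It uses only pandas and NumPy. The helper name `extend_with_future` and the sample values are made up for illustration, and treating `c` as the target column is an assumption, since the thread does not say which column is being forecast.

```python
import numpy as np
import pandas as pd


def extend_with_future(data, time_col, value_col, regressor_cols, periods, freq):
    """Append `periods` future rows: regressors carried forward, target left as NaN.

    Hypothetical helper for illustration only; it is not part of greykite.
    Assumes `data` is sorted by `time_col`.
    """
    # First future timestamp: one frequency step after the last observed one
    start = data[time_col].max() + pd.tseries.frequencies.to_offset(freq)
    future = pd.DataFrame(
        {time_col: pd.date_range(start=start, periods=periods, freq=freq)}
    )
    for col in regressor_cols:
        # Naive placeholder: repeat the last observed regressor value
        future[col] = data[col].iloc[-1]
    # The target is unknown in the future, so leave it empty
    future[value_col] = np.nan
    return pd.concat([data, future], ignore_index=True)


# Made-up sample data; 'c' is assumed to be the target column
data = pd.DataFrame({
    "ds": pd.date_range("2021-01-03", periods=4, freq="W"),
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 9.0, 8.0, 7.0],
    "c": [5.0, 6.0, 7.0, 8.0],
})
extended = extend_with_future(data, "ds", "c", ["a", "b"], periods=3, freq="W")
```

Building the start date from the frequency offset keeps the future dates on the same weekly anchor as the history, which a plain `timedelta` does not guarantee when the last date does not fall on the anchor day.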

al-bert · 2021-05-28T16:26:03Z

Great to hear! I know this is simplified code, but it's worth noting that you may opt to forecast 'a' and 'b' (using greykite, or any other method) first and use those as inputs to forecast 'c'.
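
As a rough illustration of the "any other method" route, the sketch below projects each regressor with a least-squares straight line instead of repeating its last value. This is only one possible stand-in and is not taken from the greykite docs; the function name `project_linear` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def project_linear(series, periods):
    """Extrapolate a series `periods` steps ahead with a least-squares line.

    Hypothetical helper for illustration; any forecasting method could be used here.
    """
    x = np.arange(len(series))
    slope, intercept = np.polyfit(x, series.to_numpy(dtype=float), deg=1)
    x_future = np.arange(len(series), len(series) + periods)
    return slope * x_future + intercept


# Made-up regressor history
history = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 9.0, 8.0, 7.0]})

# Projected regressor values for the next 3 periods
future_regressors = pd.DataFrame(
    {col: project_linear(history[col], periods=3) for col in ["a", "b"]}
)
```

The projected values would then fill the regressor columns of the future rows, with the target column still left empty.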

al-bert closed this as completed May 28, 2021
