Future regressor data #15

Closed
bash-j opened this issue May 27, 2021 · 3 comments

bash-j commented May 27, 2021

Hi,

I can't find in the documentation how to pass future regressor data to the forecaster. I can specify the regressor column names in ModelComponentsParam, but when I run run_forecast_config I get an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-267-d126d5cc7a8a> in <module>
      6         coverage=0.95,         # 95% prediction intervals
      7         metadata_param=metadata,
----> 8         model_components_param=model_components
      9     )
     10 )

~\Anaconda3\envs\gkite\lib\site-packages\greykite\framework\templates\forecaster.py in run_forecast_config(self, df, config)
    287             df=df,
    288             config=config)
--> 289         self.forecast_result = forecast_pipeline(**pipeline_parameters)
    290         return self.forecast_result
    291 

~\Anaconda3\envs\gkite\lib\site-packages\greykite\framework\pipeline\pipeline.py in pipeline_wrapper(df, time_col, value_col, date_format, tz, freq, train_end_date, anomaly_info, pipeline, regressor_cols, estimator, hyperparameter_grid, hyperparameter_budget, n_jobs, verbose, forecast_horizon, coverage, test_horizon, periods_between_train_test, agg_periods, agg_func, score_func, score_func_greater_is_better, cv_report_metrics, null_model_params, relative_error_tolerance, cv_horizon, cv_min_train_periods, cv_expanding_window, cv_use_most_recent_splits, cv_periods_between_splits, cv_periods_between_train_test, cv_max_splits)
    240             cv_periods_between_splits=cv_periods_between_splits,
    241             cv_periods_between_train_test=cv_periods_between_train_test,
--> 242             cv_max_splits=cv_max_splits
    243         )
    244     return pipeline_wrapper

~\Anaconda3\envs\gkite\lib\site-packages\greykite\framework\pipeline\pipeline.py in forecast_pipeline(df, time_col, value_col, date_format, tz, freq, train_end_date, anomaly_info, pipeline, regressor_cols, estimator, hyperparameter_grid, hyperparameter_budget, n_jobs, verbose, forecast_horizon, coverage, test_horizon, periods_between_train_test, agg_periods, agg_func, score_func, score_func_greater_is_better, cv_report_metrics, null_model_params, relative_error_tolerance, cv_horizon, cv_min_train_periods, cv_expanding_window, cv_use_most_recent_splits, cv_periods_between_splits, cv_periods_between_train_test, cv_max_splits)
    740         xlabel=time_col,
    741         ylabel=value_col,
--> 742         relative_error_tolerance=relative_error_tolerance)
    743 
    744     result = ForecastResult(

~\Anaconda3\envs\gkite\lib\site-packages\greykite\framework\pipeline\utils.py in get_forecast(df, trained_model, train_end_date, test_start_date, forecast_horizon, xlabel, ylabel, relative_error_tolerance)
    758         Forecasts represented as a ``UnivariateForecast`` object.
    759     """
--> 760     predicted_df = trained_model.predict(df)
    761     # This is more robust than using trained_model.named_steps["estimator"] e.g.
    762     # if the user calls forecast_pipeline with a custom pipeline, where the last

~\Anaconda3\envs\gkite\lib\site-packages\sklearn\utils\metaestimators.py in <lambda>(*args, **kwargs)
    118 
    119         # lambda, but not partial, allows help() to work with update_wrapper
--> 120         out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
    121         # update the docstring of the returned function
    122         update_wrapper(out, self.fn)

~\Anaconda3\envs\gkite\lib\site-packages\sklearn\pipeline.py in predict(self, X, **predict_params)
    417         for _, name, transform in self._iter(with_final=False):
    418             Xt = transform.transform(Xt)
--> 419         return self.steps[-1][-1].predict(Xt, **predict_params)
    420 
    421     @if_delegate_has_method(delegate='_final_estimator')

~\Anaconda3\envs\gkite\lib\site-packages\greykite\sklearn\estimator\base_silverkite_estimator.py in predict(self, X, y)
    346             trained_model=self.model_dict,
    347             past_df=None,
--> 348             new_external_regressor_df=None)["fut_df"]  # regressors are included in X
    349 
    350         self.forecast = pred_df

~\Anaconda3\envs\gkite\lib\site-packages\greykite\algo\forecast\silverkite\forecast_silverkite.py in predict(self, fut_df, trained_model, freq, past_df, new_external_regressor_df, sim_num, include_err, force_no_sim, na_fill_func)
   1464 
   1465         if fut_df.shape[0] <= 0:
-> 1466             raise ValueError("``fut_df`` must be a dataframe of non-zero size.")
   1467 
   1468         if time_col not in fut_df.columns:

ValueError: ``fut_df`` must be a dataframe of non-zero size.

Contributor

al-bert commented May 27, 2021

Could you share what your dataframe looks like? It should include future values for the regressors (but not for the variable you want to forecast).
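To illustrate the layout described above, here is a minimal pandas sketch. It assumes 'a' and 'b' are the regressors, 'c' is the target, and 'ds' is the time column; the helper name `extend_with_future_regressors` and the toy values are hypothetical and not part of greykite.

```python
import pandas as pd


def extend_with_future_regressors(data, future_regressors, time_col="ds", value_col="c"):
    """Append future rows that carry regressor values but no target values."""
    if value_col in future_regressors.columns:
        raise ValueError("future rows must not contain target values")
    if future_regressors[time_col].min() <= data[time_col].max():
        raise ValueError("future rows must start after the last observed timestamp")
    # pd.concat aligns on column names, so the target column that is missing
    # from the future rows is filled with NaN there.
    return pd.concat([data, future_regressors], ignore_index=True)


# Toy weekly history (assumed roles: 'a' and 'b' regressors, 'c' target).
history = pd.DataFrame({
    "ds": pd.date_range("2021-01-03", periods=4, freq="W"),
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 11.0, 12.0, 13.0],
    "c": [5.0, 6.0, 7.0, 8.0],
})

# Regressor values for the next 3 weeks (made-up numbers for illustration).
future_regressors = pd.DataFrame({
    "ds": pd.date_range(history["ds"].max() + pd.Timedelta(weeks=1), periods=3, freq="W"),
    "a": [5.0, 6.0, 7.0],
    "b": [14.0, 15.0, 16.0],
})

extended = extend_with_future_regressors(history, future_regressors)
print(extended)
```

The extended frame keeps the observed rows unchanged and adds rows where only the time column and the regressors are populated.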

Relevant docs:

Author

bash-j commented May 28, 2021

Thanks for pointing me in the right direction. I missed that the df had to contain future dates with no values populated for the target value. I used this code to extend my original data, and the forecaster then ran successfully.

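import pandas as pd
from datetime import timedelta

# Weekly future dates covering the year after the last date in the data
future = pd.DataFrame({'ds': pd.date_range(start=(data['ds'].max() + timedelta(weeks=1)),
                                           end=(data['ds'].max() + timedelta(weeks=52)),
                                           freq='W')})
# Carry the last observed values of 'a' and 'b' forward
future['a'] = data['a'].iloc[-1]
future['b'] = data['b'].iloc[-1]
future['c'] = 0

# Append the future rows to the original data
data = pd.concat([data, future], ignore_index=True)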

Contributor

al-bert commented May 28, 2021

Great to hear! I know this is simplified code, but it's worth noting that you may opt to forecast 'a' and 'b' (using greykite, or any other method) first and use those as inputs to forecast 'c'.
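A rough sketch of that two-stage idea, under stated assumptions: the regressors 'a' and 'b' are each projected with a deliberately simple stand-in (the mean of the last few observations) in place of a real model, and the projections are then placed in the future rows while 'c' stays empty. The function `naive_regressor_forecast` and the toy data are hypothetical; any forecasting method could take its place.

```python
import pandas as pd


def naive_regressor_forecast(series, horizon, window=4):
    """Stand-in forecast: repeat the mean of the last `window` observations."""
    level = float(series.tail(window).mean())
    return [level] * horizon


# Toy weekly history (assumed roles: 'a' and 'b' regressors, 'c' target).
history = pd.DataFrame({
    "ds": pd.date_range("2021-01-03", periods=4, freq="W"),
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 11.0, 12.0, 13.0],
    "c": [5.0, 6.0, 7.0, 8.0],
})

horizon = 3
future = pd.DataFrame({
    "ds": pd.date_range(history["ds"].max() + pd.Timedelta(weeks=1),
                        periods=horizon, freq="W"),
})

# Stage 1: forecast each regressor on its own.
for col in ["a", "b"]:
    future[col] = naive_regressor_forecast(history[col], horizon)

# Stage 2 input: future rows hold the forecasted regressors; 'c' is absent
# from `future`, so concat leaves it as NaN for the forecaster to predict.
extended = pd.concat([history, future], ignore_index=True)
print(extended.tail(horizon))
```

The resulting frame has the same shape as one built from known regressor values, so it can be passed to the forecaster in the same way.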

al-bert closed this as completed May 28, 2021