This notebook demonstrates how to apply PredictHQ’s model in real forecasting scenarios, guiding users through feature engineering, model evaluation, and forecasting. It compares a baseline model (using only time trends) with a PredictHQ-enhanced model to quantify the value of event intelligence.

A sample demand dataset and configuration file are provided.

Please refer to the `README.md` file for more details.

# Contents

* [Settings](#Settings)
* [Load Demand and Configuration Files](#Load-Demand-and-Configuration-Files)
* [Beam Analysis and Feature Engineering](#Beam-Analysis-and-Feature-Engineering)
* [Model Evaluation and Comparison](#Model-Evaluation-and-Comparison)
* [Model Creation](#Model-Creation)
* [Forecasting](#Forecasting)

In [None]:
# Install dependencies if not already installed
# !pip install -r requirements.txt

In [None]:
import os
import pandas as pd
import json
import cloudpickle
import plotly.graph_objects as go

from phq import (
    run_beam_analysis,
    process_demand_data,
    prepare_event_features,
    prepare_time_trend_features,
    prepare_forecast_features,
    evaluate_forecast_model,
    PhqForecastModel
)

# Settings

The notebook supports two execution modes, controlled by the `RUN_SETTING` parameter:

1. `RUN_SETTING = "CSV_EVENT_FEATURES"` – Runs without a `PHQ_ACCESS_TOKEN`, using pre-generated event feature files for the provided sample demand dataset and configuration file.
2. `RUN_SETTING = "API_EVENT_FEATURES"` – Runs with a `PHQ_ACCESS_TOKEN`, using PredictHQ APIs to generate PredictHQ Event Features for the provided demand dataset and configuration file. The demand dataset can be either the sample demand data or the user’s own dataset.

Ensure that `PHQ_ACCESS_TOKEN` is set when selecting `API_EVENT_FEATURES`.
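
The token requirement can also be checked up front so that a missing token fails early instead of partway through the notebook. The following is a minimal sketch; `resolve_access_token` and `VALID_RUN_SETTINGS` are hypothetical names used for illustration and are not part of the `phq` package.

```python
import os

# The two execution modes described above
VALID_RUN_SETTINGS = {"CSV_EVENT_FEATURES", "API_EVENT_FEATURES"}


def resolve_access_token(run_setting, environ=os.environ):
    """Hypothetical helper: return the token for API mode, None for CSV mode."""
    if run_setting not in VALID_RUN_SETTINGS:
        raise ValueError(f"Unknown RUN_SETTING: {run_setting!r}")
    if run_setting == "CSV_EVENT_FEATURES":
        return None  # pre-generated CSV features need no token
    token = environ.get("PHQ_ACCESS_TOKEN")
    if not token:
        raise RuntimeError("PHQ_ACCESS_TOKEN must be set for API_EVENT_FEATURES")
    return token


# Example with an explicit mapping instead of the real environment
print(resolve_access_token("CSV_EVENT_FEATURES", {}))  # prints None
```

Passing the environment as an argument keeps the check easy to try out without changing real environment variables.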



In [None]:
# Set RUN_SETTING to "CSV_EVENT_FEATURES" to use the sample CSV files in the data folder,
# or to "API_EVENT_FEATURES" to use the PredictHQ APIs
RUN_SETTING = "CSV_EVENT_FEATURES"

if RUN_SETTING == "CSV_EVENT_FEATURES":
    print("Running the notebook with provided sample demand and PredictHQ Event Features")
elif RUN_SETTING == "API_EVENT_FEATURES":
    # Set PHQ_ACCESS_TOKEN as an environment variable, or replace "XXXXXX" with your access token
    PHQ_ACCESS_TOKEN = os.environ.get("PHQ_ACCESS_TOKEN") or "XXXXXX"
    print("Running with PHQ_ACCESS_TOKEN")
else:
    # Fail early on a mistyped setting instead of hitting a NameError later
    raise ValueError(f"Unsupported RUN_SETTING: {RUN_SETTING!r}")

os.makedirs("results/models", exist_ok=True)

# Load Demand and Configuration Files

The demand dataset is stored as a CSV file with columns for `date` and `demand`. The configuration file provides metadata such as `lat`, `lon`, `industry`, and `name`, which are required in the feature engineering step.
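
When using your own data, it can help to confirm that both files have the expected shape before running the later steps. The sketch below uses only the standard library; the helper names and the example values (store name, coordinates, industry) are made up for illustration and are not taken from the sample files.

```python
import csv
import io
import json

# Field names taken from the description above
REQUIRED_CONFIG_KEYS = {"name", "lat", "lon", "industry"}
REQUIRED_DEMAND_COLUMNS = {"date", "demand"}


def missing_config_keys(config):
    """Return the required configuration keys that are absent."""
    return sorted(REQUIRED_CONFIG_KEYS - set(config))


def missing_demand_columns(csv_text):
    """Return the required columns that are absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return sorted(REQUIRED_DEMAND_COLUMNS - set(header))


# Made-up example inputs, not the contents of the sample files
example_config = json.loads(
    '{"name": "Example store", "lat": 40.0, "lon": -74.0, "industry": "retail"}'
)
example_csv = "date,demand\n2024-01-01,120\n2024-01-02,135\n"

print(missing_config_keys(example_config))  # prints []
print(missing_demand_columns(example_csv))  # prints []
```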

In [None]:
# Read the sample daily demand data
sample_demand_df = pd.read_csv("data/sample_demand.csv")

# Read the configuration file
with open("data/sample_config.json", "r") as json_file:
    config = json.load(json_file)

# Beam Analysis and Feature Engineering

The PredictHQ Event Features are prepared based on Feature Importance results from a Beam Analysis. The time trend features are derived from the calendar dates and historical demand values. Key steps include:
- Running a **Beam Analysis** using the PredictHQ API if `RUN_SETTING = "API_EVENT_FEATURES"`; otherwise a placeholder analysis ID is used.
- Preparing demand data to handle missing values.
- Preparing **PredictHQ Event Features**.
- Preparing **time trend features**.
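
How `process_demand_data` treats gaps is not described in this notebook. As an illustration of the kind of issue the demand preparation step addresses, the sketch below lists calendar days that are absent from a daily series. `find_missing_dates` is a hypothetical helper and the dates are made up.

```python
from datetime import date, timedelta


def find_missing_dates(observed_dates):
    """Return calendar days absent between the first and last observed date."""
    observed = set(observed_dates)
    start, end = min(observed), max(observed)
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(n_days))
    return [day for day in all_days if day not in observed]


# Made-up daily series with a two-day gap
dates = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 5)]
print(find_missing_dates(dates))  # 3 and 4 January are missing
```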

In [None]:
if RUN_SETTING == "API_EVENT_FEATURES":
    # create an analysis and wait for it to complete
    sample_beam_analysis_result = run_beam_analysis(
        config["name"],
        config["lat"],
        config["lon"],
        sample_demand_df,
        PHQ_ACCESS_TOKEN,
        industry=config["industry"],
    )
    sample_beam_analysis_id = sample_beam_analysis_result.analysis_id
else:
    sample_beam_analysis_id = "csv_sample_beam_analysis_id"

In [None]:
# Process demand dataset
demand_df = process_demand_data(sample_demand_df)
# Prepare PredictHQ Event Features
if RUN_SETTING == "API_EVENT_FEATURES":
    event_features_df = prepare_event_features(
        sample_beam_analysis_id, PHQ_ACCESS_TOKEN
    )
else:
    event_features_df = pd.read_csv("data/sample_event_features.csv")
    event_features_df["date"] = pd.to_datetime(event_features_df["date"])
# Prepare time trend features
time_trend_features_df = prepare_time_trend_features(demand_df)
# Combine PredictHQ Event Features and time trend features
combined_features_df = time_trend_features_df.merge(event_features_df, on="date")

# Model Evaluation and Comparison

Model performance is evaluated with and without PredictHQ Event Features and compared using **Mean Absolute Percentage Error (MAPE)**. The model evaluation might take a few minutes depending on the size of the demand dataset.
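
For reference, MAPE is conventionally the mean of $|a_t - p_t| / |a_t|$ over all periods, where $a_t$ is the actual value and $p_t$ the prediction. The sketch below implements that conventional definition together with the relative improvement used in the comparison. It is not the `phq` implementation, which may differ, for example in scaling or in how zero demand is handled. Skipping zero actuals is an assumption made here to avoid division by zero.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.1 means 10%)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Assumption: periods with zero actual demand are skipped
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to evaluate")
    return sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def relative_improvement(baseline_mape, enhanced_mape):
    """Percentage reduction in MAPE relative to the baseline."""
    return 100 * (baseline_mape - enhanced_mape) / baseline_mape


# Made-up numbers for illustration
actual = [100, 200, 400]
baseline_pred = [110, 180, 400]  # errors of 10%, 10%, 0%
enhanced_pred = [105, 190, 400]  # errors of 5%, 5%, 0%

baseline = mape(actual, baseline_pred)
enhanced = mape(actual, enhanced_pred)
print(f"Relative improvement: {relative_improvement(baseline, enhanced):.1f}%")
```

A lower MAPE is better, so a positive relative improvement means the event-enhanced model outperformed the baseline.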

In [None]:
# Forecast model evaluation with PredictHQ Event Features
evaluation_results_phq = evaluate_forecast_model(combined_features_df, demand_df)

# Forecast model evaluation without PredictHQ Event Features
evaluation_results_baseline = evaluate_forecast_model(time_trend_features_df, demand_df)

# Model performance comparison
phq_mape = evaluation_results_phq["mape"]
baseline_mape = evaluation_results_baseline["mape"]
print(f"MAPE for forecast model with PredictHQ Event Features: {phq_mape:.2f}")
print(f"MAPE for forecast model without PredictHQ Event Features: {baseline_mape:.2f}")
print(f"Relative MAPE improvement: {100 * (baseline_mape - phq_mape) / baseline_mape:.2f}%")

# Model Creation

The model is trained using event and time trend features and then saved.
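
A file written with `cloudpickle.dump` can be read back with the standard `pickle.load`. The sketch below shows the save and reload round trip using a plain dictionary as a stand-in for the fitted model; the dictionary contents and file name are made up. Reloading a real model most likely also requires the `phq` package to be importable in the loading environment, since the class definition is needed to rebuild the object.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model object
stand_in_model = {"kind": "example", "coefficients": [0.5, 1.5]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_example.pkl")
    # Save, as the notebook does with cloudpickle.dump
    with open(path, "wb") as f:
        pickle.dump(stand_in_model, f)
    # Reload in a later session
    with open(path, "rb") as f:
        restored_model = pickle.load(f)

print(restored_model == stand_in_model)  # prints True
```

Only load pickle files from sources you trust, because unpickling can execute arbitrary code.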

In [None]:
# Create forecast model
forecast_model = PhqForecastModel()
forecast_model.fit(combined_features_df, demand_df)
# Save forecast model
with open(f"results/models/model_{sample_beam_analysis_id}.pkl", "wb") as f:
    cloudpickle.dump(forecast_model, f)

# Forecasting

Prepare forecasting features and apply the trained model to predict future demand.
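
The forecasting features cover one row per future date. Assuming the horizon starts on the day after the last observed demand date, the forecast dates can be sketched as follows. `forecast_dates` is a hypothetical helper and the date is made up.

```python
from datetime import date, timedelta


def forecast_dates(last_observed, horizon):
    """Return the `horizon` consecutive days following the last observed date."""
    return [last_observed + timedelta(days=i) for i in range(1, horizon + 1)]


# Made-up last observation date with a 7-day horizon
future = forecast_dates(date(2024, 1, 31), 7)
print(future[0], future[-1])  # prints 2024-02-01 2024-02-07
```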

In [None]:
# Prepare features for forecasting
FORECAST_HORIZON = 7

if RUN_SETTING == "API_EVENT_FEATURES":
    forecasting_features_df = prepare_forecast_features(
        demand_df, sample_beam_analysis_id, FORECAST_HORIZON, PHQ_ACCESS_TOKEN
    )
else:
    forecasting_features_df = pd.read_csv("data/sample_forecasting_features.csv")
    forecasting_features_df["date"] = pd.to_datetime(forecasting_features_df["date"])
# Forecast demand values
predictions = forecast_model.predict(forecasting_features_df)

## Visualize Forecasting Results

Visualize and compare historical demand with forecasted demand values.

In [None]:
fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=sample_demand_df["date"],
        y=sample_demand_df["demand"],
        mode="lines+markers",
        name="Actual",
    )
)
fig.add_trace(
    go.Scatter(
        x=forecasting_features_df["date"],
        y=predictions,
        mode="lines+markers",
        name="Forecast",
    )
)
# Customize layout
fig.update_layout(
    title=f'Forecast results for the next {FORECAST_HORIZON} days',
    xaxis_title="Date",
    yaxis_title="Demand",
)

fig