<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Applied%20ML/Forecasting/BigQuery%20ML%20For%20Hierarchical%20Forecasting.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FApplied%2520ML%2FForecasting%2FBigQuery%2520ML%2520For%2520Hierarchical%2520Forecasting.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20ML/Forecasting/BigQuery%20ML%20For%20Hierarchical%20Forecasting.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/Applied%20ML/Forecasting/BigQuery%20ML%20For%20Hierarchical%20Forecasting.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# BigQuery ML For Hierarchical Forecasting


- Introduce hierarchical forecasting topic
- Data description and prep
- Forecasting the lowest level series
- Bottom-Up Hierarchical Forecasting with built-in functionality
    - Detailed review of the bottom-up approach
- Top-Down Hierarchical Forecasting: Step-by-step instructions
- Top-Down Hierarchical Forecasting: Workflow With BigQuery Procedural Language


Key Links To Document:
- https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-time-series
- https://cloud.google.com/bigquery/docs/arima-time-series-forecasting-with-hierarchical-time-series
- https://cloud.google.com/bigquery/docs/e2e-journey


**Data Source:**

This workflow uses data collected from products being sold in stores.  It could be used to forecast product demand at the store level as well as at each level of the hierarchy `State | County | City | Store`.  The data are found in the BigQuery public dataset at:

- `bigquery-public-data.iowa_liquor_sales.sales`

---
## Colab Setup

To run this notebook in Colab, run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')
    pass

Not a Colab Environment


---
## Installs

The list `packages` contains tuples of package import names and install names.  If an import name is not found, the install name is used to install the package quietly for the current user.

In [3]:
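# tuples of (import name, install name, min_version); min_version is optional
packages = [
    ('google.cloud.bigquery', 'google-cloud-bigquery'),
    ('plotly', 'plotly'),
    ('kaleido', 'kaleido')
]

# import the submodules explicitly: `import importlib` alone does not guarantee they are loaded
import importlib.util
import importlib.metadata
install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # compare parsed versions (assumes `packaging` is available) and look up by install name
        from packaging.version import Version
        if Version(importlib.metadata.version(package[1])) < Version(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user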

### Restart Kernel (If Installs Occurred)

After a kernel restart, continue running from the cell after this one.

In [4]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

Inputs

In [5]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [6]:
REGION = 'us-central1'
SERIES = 'applied-ml-forecasting'
EXPERIMENT = 'bqml-hierarchical'

# gcs bucket
GCS_BUCKET = PROJECT_ID

Packages

In [7]:


from google.cloud import bigquery

Clients

In [8]:
# bigquery client
bq = bigquery.Client(project = PROJECT_ID)

# bigquery cell magics
%load_ext google.cloud.bigquery

The google.cloud.bigquery extension is already loaded. To reload it, use:
  %reload_ext google.cloud.bigquery


Prepare the code below for your environment.

This notebook takes advantage of the [BigQuery IPython magic](https://cloud.google.com/python/docs/reference/bigquery/latest/magics) for legibility and ease of copy/pasting to the BigQuery SQL editor. If this notebook is run from a notebook environment (Colab, Colab Enterprise, Vertex AI Workbench Instances, or BigQuery Studio with a Python Notebook), it needs further preparation. The SQL code in these cells uses fully qualified [BigQuery table](https://cloud.google.com/bigquery/docs/tables-intro) names in the form `projectname.datasetname.tablename`. Prepare for your environment by:

- Edit > Find
    - Find: `statmike-mlops-349915`
    - Replace: `<your project id>`
    - Replace All
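
As an alternative to find-and-replace, the fully qualified names can be built from variables when queries are submitted from Python rather than through the cell magic. The following is a minimal sketch only: `qualified_name` is a hypothetical helper (not part of the BigQuery client), and the project ID shown is a placeholder.

```python
def qualified_name(project: str, dataset: str, name: str) -> str:
    """Return a backtick-quoted `project.dataset.name` identifier (hypothetical helper)."""
    return f"`{project}.{dataset}.{name}`"


example_project = 'my-project-id'           # placeholder: replace with your project ID
example_dataset = 'applied_ml_forecasting'  # dataset name used in this notebook

# Build the SQL text with the identifier substituted in one place.
query = f"""
SELECT county, city, store_number, COUNT(*) AS row_count
FROM {qualified_name(example_project, example_dataset, 'source')}
GROUP BY county, city, store_number
"""
print(query)
```

The resulting string could then be submitted with the client created above, for example `bq.query(query).to_dataframe()`.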

---
## BigQuery Source Data

This workflow uses data collected from products being sold in stores.  It could be used to forecast product demand at the store level as well as at each level of the hierarchy `State | County | City | Store`.  The data are found in the BigQuery public dataset at:

- `bigquery-public-data.iowa_liquor_sales.sales`

In [9]:
%%bigquery
SELECT *
FROM `bigquery-public-data.iowa_liquor_sales.sales`
LIMIT 5

Unnamed: 0,invoice_and_item_number,date,store_number,store_name,address,city,zip_code,store_location,county_number,county,...,item_number,item_description,pack,bottle_volume_ml,state_bottle_cost,state_bottle_retail,bottles_sold,sale_dollars,volume_sold_liters,volume_sold_gallons
0,RINV-05110400027,2024-01-30,2647,HY-VEE #7 / CEDAR RAPIDS,5050 EDGEWOOD RD,CEDAR RAPIDS,52411.0,POINT(-91.698522983 42.029484381),,LINN,...,43120,BACARDI SUPERIOR PET,6,1750,15.5,23.25,-48,-1116.0,-84.0,-22.19
1,RINV-05297200096,2024-06-11,2621,HY-VEE FOOD STORE #3 / SIOUX CITY,3301 GORDON DR,SIOUX CITY,51105.0,POINT(-96.362866022 42.488984001),,WOODBURY,...,36908,MCCORMICK 80PRF VODKA PET,6,1750,8.24,12.36,-12,-148.32,-21.0,-5.54
2,RINV-05455500010,2024-10-03,2643,HY-VEE WINE AND SPIRITS / WATERLOO,2126 KIMBALL AVE,WATERLOO,50701.0,POINT(-92.35698 42.47029),,BLACK HAWK,...,64865,FIREBALL CINNAMON WHISKEY PET,12,750,9.0,13.5,-24,-324.0,-18.0,-4.75
3,RINV-04806800054,2023-08-16,5145,SOUTH SIDE FOOD MART,1101 ARMY POST RD. SUITE A & B,DES MOINES,50315.0,POINT(-93.628625001 41.526920009),,POLK,...,31470,NEW AMSTERDAM GIN,12,1000,7.83,11.75,-12,-141.0,-12.0,-3.17
4,RINV-05394100050,2024-08-20,2606,HY-VEE WINE AND SPIRITS / HUMBOLDT,1011 13TH ST NORTH,HUMBOLDT,50548.0,POINT(-94.226730035 42.733207011),,HUMBOLDT,...,35917,FIVE O'CLOCK VODKA,12,1000,4.66,6.99,-12,-83.88,-12.0,-3.17


### Describe Data with `ML.DESCRIBE_DATA`

Reviewing a few records, like above, gives a good sense of how the data is arranged. Before proceeding with machine learning techniques it is important to understand more about these raw columns.  Are they ready to use as features in a model or is some form of feature engineering needed first?  For this, the distribution of values is an important starting point.  

While SQL could be used to look at the distribution, it would be a time-consuming process that requires different techniques for different data types like numerical, string, boolean, dates, times, array and struct versions of these, and arrays of structs.

To make this process fast and simple, the [`ML.DESCRIBE_DATA`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-describe-data) function is used to get a single row for each column that describes the data distribution:
- `top_k`: get the top 3 most frequent categories for string columns (default = 1)
- `num_quantiles`: get 4 quantiles for numerical columns (default = 2)
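
To make the two options concrete, the following plain-Python sketch computes comparable summaries for a small in-memory column. It only illustrates what `top_k` and `num_quantiles` ask for: the helper names are hypothetical, the sample values are made up, and BigQuery's own quantile computation may differ (for example, by being approximate).

```python
from collections import Counter
from statistics import quantiles


def top_values(values, top_k=1):
    """Most frequent non-null values with their counts, like the `top_k` option."""
    return Counter(v for v in values if v is not None).most_common(top_k)


def quantile_boundaries(values, num_quantiles=2):
    """Minimum, the interior cut points, and maximum: num_quantiles + 1 numbers."""
    cuts = quantiles(values, n=num_quantiles, method='inclusive')
    return [min(values)] + cuts + [max(values)]


cities = ['DES MOINES', 'ACKLEY', 'DES MOINES', None, 'ZWINGLE', 'DES MOINES', 'ACKLEY']
bottles = [1, 2, 3, 4, 5]

print(top_values(cities, top_k=3))
print(quantile_boundaries(bottles, num_quantiles=4))
```

Requesting 4 quantiles yields five boundary values, which matches the five-element `quantiles` arrays in the query output for the numerical columns.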

In [10]:
%%bigquery
SELECT *
FROM ML.DESCRIBE_DATA(
    (SELECT * EXCEPT(store_location) FROM `bigquery-public-data.iowa_liquor_sales.sales`),
    STRUCT(3 AS top_k, 4 AS num_quantiles)
)

Unnamed: 0,name,num_rows,num_nulls,num_zeros,min,max,mean,stddev,median,quantiles,unique,avg_string_length,num_values,top_values,min_array_length,max_array_length,avg_array_length,total_array_length,array_length_quantiles,dimension
0,address,31339341,83720,,1 E MAIN ST,PO BOX 261 310 W DILLON,,,,[],3199.0,16.002626,31255621,"[{'value': '3221 SE 14TH ST', 'count': 345286}...",,,,,[],
1,bottle_volume_ml,31339341,0,10.0,0,378000,870.840142,619.444568,750.0,"[0.0, 750.0, 750.0, 1000.0, 378000.0]",,,31339341,[],,,,,[],
2,bottles_sold,31339341,0,9.0,-768,15000,10.915819,30.769464,6.0,"[-768.0, 3.0, 6.0, 12.0, 15000.0]",,,31339341,[],,,,,[],
3,category,31339341,16974,,1011000.0,1901200.0,,,,[],114.0,9.0,31322367,"[{'value': '1031100.0', 'count': 3344482}, {'v...",,,,,[],
4,category_name,31339341,25040,,100 PROOF VODKA,WHITE RUM,,,,[],103.0,17.457523,31314301,"[{'value': 'AMERICAN VODKAS', 'count': 3250690...",,,,,[],
5,city,31339341,83719,,ACKLEY,ZWINGLE,,,,[],501.0,9.193062,31255622,"[{'value': 'DES MOINES', 'count': 2664255}, {'...",,,,,[],
6,county,31339341,160522,,ADAIR,WRIGHT,,,,[],100.0,6.428723,31178819,"[{'value': 'POLK', 'count': 5831419}, {'value'...",,,,,[],
7,county_number,31339341,7206498,,1,99,,,,[],99.0,1.912535,24132843,"[{'value': None, 'count': 7206498}, {'value': ...",,,,,[],
8,date,31339341,0,,2012-01-03,2025-03-31,,,,[],3492.0,10.0,31339341,"[{'value': '2015-04-29', 'count': 35482}, {'va...",,,,,[],
9,invoice_and_item_number,31339341,0,,306831300001,S444400083,,,,[],31649878.0,14.054386,31339341,"[{'value': 'INV-60784400016', 'count': 324}, {...",,,,,[],


### Create A BigQuery Dataset

Create a new [BigQuery Dataset](https://cloud.google.com/bigquery/docs/datasets) as a working location for this workflow:

In [11]:
%%bigquery
CREATE SCHEMA IF NOT EXISTS `statmike-mlops-349915.applied_ml_forecasting`
    OPTIONS(
        location = 'US'
    )

### Create A Source Table or View

Create a source table from a subset of the data, three counties (`POLK`, `LINN`, and `SCOTT`), with the overall demand for each store on each day (`date`):


In [12]:
%%bigquery
CREATE OR REPLACE TABLE `statmike-mlops-349915.applied_ml_forecasting.source` AS
SELECT
    store_number,
    city,
    county,
    date,
    SUM(bottles_sold) AS total_units_sold
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE county IN ('POLK', 'LINN', 'SCOTT')
GROUP BY store_number, date, city, county;

In [13]:
%%bigquery
SELECT
    county,
    city,
    store_number,
    COUNT(*) AS row_count
FROM `statmike-mlops-349915.applied_ml_forecasting.source`
GROUP BY county, city, store_number;

Unnamed: 0,county,city,store_number,row_count
0,POLK,BONDURANT,10021,52
1,POLK,WEST DES MOINES,10023,61
2,POLK,WEST DES MOINES,10024,63
3,POLK,URBANDALE,10025,79
4,POLK,DES MOINES,10029,149
...,...,...,...,...
779,SCOTT,DAVENPORT,9922,3
780,POLK,DES MOINES,9925,9
781,LINN,CEDAR RAPIDS,9928,5
782,SCOTT,BETTENDORF,9934,2


### Prepare Source Table for Forecasting

Add a `splits` column that differentiates the last 30 days of data as 'TEST' and the history before that as 'TRAIN'. Only stores with sales on the most recent date in the source table are kept.
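
The date logic of the split can be sketched in plain Python. This is an illustration only: `assign_split` is a hypothetical helper that mirrors the `CASE` expression used in the query for this step, and the dates are example values.

```python
from datetime import date, timedelta


def assign_split(row_date: date, max_date: date, test_days: int = 30) -> str:
    """Label a row 'TEST' if it is on or after (max_date - test_days), else 'TRAIN'."""
    cutoff = max_date - timedelta(days=test_days)
    return 'TEST' if row_date >= cutoff else 'TRAIN'


example_max_date = date(2025, 3, 31)  # example value for the latest date in the table
for d in (date(2025, 3, 31), date(2025, 3, 1), date(2025, 2, 28)):
    print(d, assign_split(d, example_max_date))
```

Because the comparison is inclusive, the 'TEST' window covers 31 calendar dates: the latest date plus the 30 dates before it.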

In [14]:
%%bigquery
CREATE OR REPLACE TABLE `statmike-mlops-349915.applied_ml_forecasting.prepped` AS
WITH DateRange AS (
    SELECT
        MAX(date) AS max_date
    FROM `statmike-mlops-349915.applied_ml_forecasting.source`
),
ActiveStores AS (
    SELECT DISTINCT store_number
    FROM `statmike-mlops-349915.applied_ml_forecasting.source`
    WHERE date = (SELECT max_date FROM DateRange)
)
SELECT
    s.*,
    CASE
        WHEN s.date >= DATE_SUB((SELECT max_date FROM DateRange), INTERVAL 30 DAY) THEN 'TEST'
        ELSE 'TRAIN'
    END AS splits
FROM `statmike-mlops-349915.applied_ml_forecasting.source` s
WHERE s.store_number IN (SELECT store_number FROM ActiveStores)

---
## Base-Level Forecasting With BigQuery ML's ARIMA+

Start by creating a forecast directly for the lowest level of the hierarchy, the stores.

In [15]:
%%bigquery
CREATE OR REPLACE MODEL `statmike-mlops-349915.applied_ml_forecasting.base_forecast`
OPTIONS (
    model_type = 'ARIMA_PLUS',
    time_series_timestamp_col = 'date',
    time_series_data_col = 'total_units_sold',
    time_series_id_col = ['county', 'city', 'store_number'],
    holiday_region = 'US',
    data_frequency = 'DAILY',
    horizon = 90
) AS
SELECT * EXCEPT(splits)
FROM `statmike-mlops-349915.applied_ml_forecasting.prepped`
WHERE splits = 'TRAIN'

### Review Forecast Values

In [21]:
%%bigquery
SELECT *
FROM ML.FORECAST(
    MODEL `statmike-mlops-349915.applied_ml_forecasting.base_forecast`,
    STRUCT(1 AS horizon, 0.95 AS confidence_level)
)
ORDER BY county, city, store_number, forecast_timestamp

Unnamed: 0,store_number,city,county,forecast_timestamp,forecast_value,standard_error,confidence_level,prediction_interval_lower_bound,prediction_interval_upper_bound,confidence_interval_lower_bound,confidence_interval_upper_bound
0,10523,CEDAR RAPIDS,LINN,2025-02-25 00:00:00+00:00,702.975335,62.055257,0.95,581.566825,824.383845,581.566825,824.383845
1,2508,CEDAR RAPIDS,LINN,2025-02-25 00:00:00+00:00,1175.648520,89.508670,0.95,1000.528557,1350.768482,1000.528557,1350.768482
2,3912,CEDAR RAPIDS,LINN,2025-02-25 00:00:00+00:00,266.897361,20.646982,0.95,226.502406,307.292317,226.502406,307.292317
3,4162,CEDAR RAPIDS,LINN,2025-02-25 00:00:00+00:00,548.600611,34.888406,0.95,480.342907,616.858314,480.342907,616.858314
4,4489,CEDAR RAPIDS,LINN,2025-02-26 00:00:00+00:00,-18.023300,9.963337,0.95,-37.516151,1.469551,-37.516151,1.469551
...,...,...,...,...,...,...,...,...,...,...,...
103,5846,DAVENPORT,SCOTT,2025-02-25 00:00:00+00:00,726.942220,45.711598,0.95,637.509394,816.375047,637.509394,816.375047
104,5847,DAVENPORT,SCOTT,2025-02-25 00:00:00+00:00,376.444669,13.620911,0.95,349.795929,403.093410,349.795929,403.093410
105,6092,DAVENPORT,SCOTT,2025-02-25 00:00:00+00:00,382.205971,16.968819,0.95,349.007189,415.404754,349.007189,415.404754
106,5725,PLEASANT VALLEY,SCOTT,2025-02-25 00:00:00+00:00,102.803880,2.388106,0.95,98.131652,107.476109,98.131652,107.476109


### Forecast Evaluation

Reference for [`ML.EVALUATE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate) for `model_type = 'ARIMA_PLUS'`

The metrics returned depend on whether input (test) data is provided and whether `perform_aggregation` is `True` or `False`.  If `False`, metrics per timestamp are provided; if `True`, metrics per `time_series_id_col` are provided.

In [17]:
%%bigquery
SELECT *
FROM ML.EVALUATE(
    MODEL `statmike-mlops-349915.applied_ml_forecasting.base_forecast`,
    (
        SELECT *
        FROM `statmike-mlops-349915.applied_ml_forecasting.prepped`
        WHERE splits = 'TEST'
        ORDER BY county, city, store_number
    ),
    STRUCT(TRUE AS perform_aggregation)
)
ORDER BY county, city, store_number

Unnamed: 0,store_number,city,county,mean_absolute_error,mean_squared_error,root_mean_squared_error,mean_absolute_percentage_error,symmetric_mean_absolute_percentage_error
0,10523,CEDAR RAPIDS,LINN,101.358366,17912.200096,133.836468,13.375811,13.808762
1,2508,CEDAR RAPIDS,LINN,409.162327,220690.794969,469.777389,64.784294,42.586377
2,3912,CEDAR RAPIDS,LINN,138.140721,23949.874336,154.757469,59.352770,52.511813
3,4162,CEDAR RAPIDS,LINN,287.439707,85060.609190,291.651520,118.548085,72.218421
4,4489,CEDAR RAPIDS,LINN,427.407907,186983.053092,432.415371,148.653140,200.000000
...,...,...,...,...,...,...,...,...
106,5846,DAVENPORT,SCOTT,115.016578,15088.327162,122.834552,15.847650,15.667139
107,5847,DAVENPORT,SCOTT,189.386667,44494.007471,210.936027,106.672151,61.702804
108,6092,DAVENPORT,SCOTT,79.944033,8387.574810,91.583704,29.060930,24.241486
109,5725,PLEASANT VALLEY,SCOTT,55.588745,4602.770710,67.843723,242.202090,93.228738
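
The metric columns above can be related to their conventional definitions with a short plain-Python sketch. This assumes the textbook formulas, with the percentage metrics reported in percent; it is not taken from the BigQuery ML implementation, `forecast_metrics` is a hypothetical helper, and the input values are made up.

```python
import math


def forecast_metrics(actual, forecast):
    """Conventional point-forecast error metrics for paired actual/forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # Percentage metrics skip points whose denominator would be zero.
    ape = [abs(e) / abs(a) for e, a in zip(errors, actual) if a != 0]
    sape = [2 * abs(e) / (abs(a) + abs(f))
            for e, a, f in zip(errors, actual, forecast) if abs(a) + abs(f) != 0]
    return {
        'mean_absolute_error': mae,
        'mean_squared_error': mse,
        'root_mean_squared_error': math.sqrt(mse),
        'mean_absolute_percentage_error': 100 * sum(ape) / len(ape) if ape else float('nan'),
        'symmetric_mean_absolute_percentage_error': 100 * sum(sape) / len(sape) if sape else float('nan'),
    }


# Made-up example: every forecast has the opposite sign from its actual value.
metrics = forecast_metrics(actual=[100, 200], forecast=[-10, -20])
print(metrics)
```

Under these definitions the symmetric percentage error reaches its maximum of 200 when a forecast and its actual value have opposite signs (or one of them is zero), which is consistent with store `4489`, whose forecast values shown earlier are negative.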


---
## Hierarchical Forecasting With BigQuery ML's ARIMA+

Provide the hierarchy with the `hierarchical_time_series_cols` option when creating the forecast:

In [18]:
%%bigquery
CREATE OR REPLACE MODEL `statmike-mlops-349915.applied_ml_forecasting.hierarchical_forecast`
OPTIONS (
    model_type = 'ARIMA_PLUS',
    time_series_timestamp_col = 'date',
    time_series_data_col = 'total_units_sold',
    time_series_id_col = ['county', 'city', 'store_number'],
    hierarchical_time_series_cols = ['county', 'city', 'store_number'],
    holiday_region = 'US',
    data_frequency = 'DAILY',
    horizon = 90
) AS
SELECT * EXCEPT(splits)
FROM `statmike-mlops-349915.applied_ml_forecasting.prepped`
WHERE splits = 'TRAIN'

### Review Forecast Values

In [19]:
%%bigquery
SELECT *
FROM ML.FORECAST(
    MODEL `statmike-mlops-349915.applied_ml_forecasting.hierarchical_forecast`,
    STRUCT(1 AS horizon, 0.95 AS confidence_level)
)
ORDER BY county, city, store_number, forecast_timestamp

Unnamed: 0,store_number,city,county,forecast_timestamp,forecast_value,standard_error,confidence_level,prediction_interval_lower_bound,prediction_interval_upper_bound,confidence_interval_lower_bound,confidence_interval_upper_bound
0,,,,2025-01-22 00:00:00+00:00,41941.363422,7148.310758,0.95,27955.992997,55926.733848,27955.992997,55926.733848
1,,,LINN,2025-01-28 00:00:00+00:00,4822.720659,465.524959,0.95,3911.940585,5733.500732,3911.940585,5733.500732
2,,CEDAR RAPIDS,LINN,2025-01-28 00:00:00+00:00,4470.863516,455.689780,0.95,3579.325558,5362.401473,3579.325558,5362.401473
3,10523,CEDAR RAPIDS,LINN,2025-02-25 00:00:00+00:00,702.975335,62.055257,0.95,581.566825,824.383845,581.566825,824.383845
4,2508,CEDAR RAPIDS,LINN,2025-02-25 00:00:00+00:00,1175.648520,89.508670,0.95,1000.528557,1350.768482,1000.528557,1350.768482
...,...,...,...,...,...,...,...,...,...,...,...
121,6092,DAVENPORT,SCOTT,2025-02-25 00:00:00+00:00,382.205971,16.968819,0.95,349.007189,415.404754,349.007189,415.404754
122,,PLEASANT VALLEY,SCOTT,2025-02-25 00:00:00+00:00,102.803880,2.388106,0.95,98.131652,107.476109,98.131652,107.476109
123,5725,PLEASANT VALLEY,SCOTT,2025-02-25 00:00:00+00:00,102.803880,2.388106,0.95,98.131652,107.476109,98.131652,107.476109
124,,WALCOTT,SCOTT,2025-02-25 00:00:00+00:00,111.598288,9.731738,0.95,92.558552,130.638025,92.558552,130.638025
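
In the output above, rows with empty id columns are the aggregated series: an empty `store_number` marks a city-level series, an empty `city` as well marks a county-level series, and a row with all three empty is the overall total. A minimal sketch of reading that pattern in Python, where `hierarchy_level` is a hypothetical helper and `None` stands in for the empty (NULL) values:

```python
def hierarchy_level(county, city, store_number):
    """Name the hierarchy level of a forecast row from which id columns are null."""
    if store_number is not None:
        return 'store'
    if city is not None:
        return 'city'
    if county is not None:
        return 'county'
    return 'total'  # all id columns null: the sum over every county in the training data


# Id columns (county, city, store_number) copied from the first rows of the output above.
rows = [
    (None, None, None),
    ('LINN', None, None),
    ('LINN', 'CEDAR RAPIDS', None),
    ('LINN', 'CEDAR RAPIDS', '10523'),
]
for row in rows:
    print(row, '->', hierarchy_level(*row))
```

The comparison queries later in this notebook rely on the same pattern through their `IS null` and `IS NOT null` filters.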


### Forecast Evaluation

Reference for [`ML.EVALUATE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate) for `model_type = 'ARIMA_PLUS'`

The metrics returned depend on whether input (test) data is provided and whether `perform_aggregation` is `True` or `False`.  If `False`, metrics per timestamp are provided; if `True`, metrics per `time_series_id_col` are provided.

In [20]:
%%bigquery
SELECT *
FROM ML.EVALUATE(
    MODEL `statmike-mlops-349915.applied_ml_forecasting.hierarchical_forecast`,
    (
        SELECT *
        FROM `statmike-mlops-349915.applied_ml_forecasting.prepped`
        WHERE splits = 'TEST'
        ORDER BY county, city, store_number
    ),
    STRUCT(TRUE AS perform_aggregation)
)
ORDER BY county, city, store_number

Unnamed: 0,store_number,city,county,mean_absolute_error,mean_squared_error,root_mean_squared_error,mean_absolute_percentage_error,symmetric_mean_absolute_percentage_error
0,10523,CEDAR RAPIDS,LINN,101.358366,17912.200096,133.836468,13.375811,13.808762
1,2508,CEDAR RAPIDS,LINN,409.162327,220690.794969,469.777389,64.784294,42.586377
2,3912,CEDAR RAPIDS,LINN,138.140721,23949.874336,154.757469,59.352770,52.511813
3,4162,CEDAR RAPIDS,LINN,287.439707,85060.609190,291.651520,118.548085,72.218421
4,4489,CEDAR RAPIDS,LINN,427.407907,186983.053092,432.415371,148.653140,200.000000
...,...,...,...,...,...,...,...,...
106,5846,DAVENPORT,SCOTT,115.016578,15088.327162,122.834552,15.847650,15.667139
107,5847,DAVENPORT,SCOTT,189.386667,44494.007471,210.936027,106.672151,61.702804
108,6092,DAVENPORT,SCOTT,79.944033,8387.574810,91.583704,29.060930,24.241486
109,5725,PLEASANT VALLEY,SCOTT,55.588745,4602.770710,67.843723,242.202090,93.228738


---
## Compare Base-Level Forecasting To Hierarchical Forecasting


### Review Base-Level: `store_number`

The lowest levels are identical.  This is verified here by choosing a single time series, one of the `store_number` values, and comparing the forecasted horizon from both the `base_forecast` model and the `hierarchical_forecast` model created above.

In [22]:
%%bigquery
WITH
baseForecast AS (
    SELECT store_number, city, county, forecast_timestamp, forecast_value AS base_forecast_value
    FROM ML.FORECAST(
        MODEL `statmike-mlops-349915.applied_ml_forecasting.base_forecast`,
        STRUCT(10 AS horizon, 0.95 AS confidence_level)
    )
    WHERE store_number = '10523'
),
hierarchicalForecast AS (
    SELECT store_number, city, county, forecast_timestamp, forecast_value AS hierarchical_forecast_value
    FROM ML.FORECAST(
        MODEL `statmike-mlops-349915.applied_ml_forecasting.hierarchical_forecast`,
        STRUCT(10 AS horizon, 0.95 AS confidence_level)
    )
    WHERE store_number = '10523'
)
SELECT bf.*, hf.hierarchical_forecast_value
FROM baseForecast bf
JOIN hierarchicalForecast hf
ON bf.store_number = hf.store_number AND bf.forecast_timestamp = hf.forecast_timestamp

Unnamed: 0,store_number,city,county,forecast_timestamp,base_forecast_value,hierarchical_forecast_value
0,10523,CEDAR RAPIDS,LINN,2025-02-25 00:00:00+00:00,702.975335,702.975335
1,10523,CEDAR RAPIDS,LINN,2025-02-26 00:00:00+00:00,694.467176,694.467176
2,10523,CEDAR RAPIDS,LINN,2025-02-27 00:00:00+00:00,691.624712,691.624712
3,10523,CEDAR RAPIDS,LINN,2025-02-28 00:00:00+00:00,690.886527,690.886527
4,10523,CEDAR RAPIDS,LINN,2025-03-01 00:00:00+00:00,690.858837,690.858837
5,10523,CEDAR RAPIDS,LINN,2025-03-02 00:00:00+00:00,691.021467,691.021467
6,10523,CEDAR RAPIDS,LINN,2025-03-03 00:00:00+00:00,691.197027,691.197027
7,10523,CEDAR RAPIDS,LINN,2025-03-04 00:00:00+00:00,691.336656,691.336656
8,10523,CEDAR RAPIDS,LINN,2025-03-05 00:00:00+00:00,691.43573,691.43573
9,10523,CEDAR RAPIDS,LINN,2025-03-06 00:00:00+00:00,691.502125,691.502125


### Review **city** from: state > county > city > store_number

In [23]:
%%bigquery
WITH
getDate AS (
    SELECT MAX(forecast_timestamp) AS maxDate
    FROM ML.FORECAST(
        MODEL `statmike-mlops-349915.applied_ml_forecasting.hierarchical_forecast`,
        STRUCT(1 AS horizon, 0.95 AS confidence_level)
    )
    WHERE city = 'CEDAR RAPIDS' AND store_number IS NOT null
),
hierarchicalForecast AS (
    SELECT store_number, city, county, forecast_timestamp, forecast_value AS hierarchical_forecast_value
    FROM ML.FORECAST(
        MODEL `statmike-mlops-349915.applied_ml_forecasting.hierarchical_forecast`,
        STRUCT(60 AS horizon, 0.95 AS confidence_level)
    )
    WHERE city = 'CEDAR RAPIDS'
        AND store_number IS null
        AND forecast_timestamp >= (SELECT maxDate FROM getDate)
),
sumStore AS (
    SELECT forecast_timestamp, SUM(forecast_value) AS sum_hierarchical_forecast_value
    FROM ML.FORECAST(
        MODEL `statmike-mlops-349915.applied_ml_forecasting.hierarchical_forecast`,
        STRUCT(60 AS horizon, 0.95 AS confidence_level)
    )
    WHERE city = 'CEDAR RAPIDS'
        AND store_number IS NOT null
        AND forecast_timestamp >= (SELECT maxDate FROM getDate)
    GROUP BY forecast_timestamp
)
SELECT hf.*, ss.sum_hierarchical_forecast_value
FROM hierarchicalForecast hf
JOIN sumStore ss
on hf.forecast_timestamp = ss.forecast_timestamp
ORDER BY hf.forecast_timestamp
LIMIT 10

Unnamed: 0,store_number,city,county,forecast_timestamp,hierarchical_forecast_value,sum_hierarchical_forecast_value
0,,CEDAR RAPIDS,LINN,2025-02-26 00:00:00+00:00,5415.189411,5415.189411
1,,CEDAR RAPIDS,LINN,2025-02-27 00:00:00+00:00,5380.736415,5380.736415
2,,CEDAR RAPIDS,LINN,2025-02-28 00:00:00+00:00,5368.338455,5368.338455
3,,CEDAR RAPIDS,LINN,2025-03-01 00:00:00+00:00,5358.385042,5358.385042
4,,CEDAR RAPIDS,LINN,2025-03-02 00:00:00+00:00,5346.868044,5346.868044
5,,CEDAR RAPIDS,LINN,2025-03-03 00:00:00+00:00,5325.221178,5325.221178
6,,CEDAR RAPIDS,LINN,2025-03-04 00:00:00+00:00,5084.537395,5084.537395
7,,CEDAR RAPIDS,LINN,2025-03-05 00:00:00+00:00,5281.324812,5281.324812
8,,CEDAR RAPIDS,LINN,2025-03-06 00:00:00+00:00,5273.640368,5273.640368
9,,CEDAR RAPIDS,LINN,2025-03-07 00:00:00+00:00,5278.108566,5278.108566


### Review **county** from: state > county > city > store_number

In [24]:
%%bigquery
WITH
getDate AS (
    SELECT MAX(forecast_timestamp) AS maxDate
    FROM ML.FORECAST(
        MODEL `statmike-mlops-349915.applied_ml_forecasting.hierarchical_forecast`,
        STRUCT(1 AS horizon, 0.95 AS confidence_level)
    )
    WHERE county = 'LINN' AND store_number IS NOT null
),
hierarchicalForecast AS (
    SELECT store_number, city, county, forecast_timestamp, forecast_value AS hierarchical_forecast_value
    FROM ML.FORECAST(
        MODEL `statmike-mlops-349915.applied_ml_forecasting.hierarchical_forecast`,
        STRUCT(60 AS horizon, 0.95 AS confidence_level)
    )
    WHERE county = 'LINN'
        AND city IS null
        AND forecast_timestamp >= (SELECT maxDate FROM getDate)
),
sumCity AS (
    SELECT forecast_timestamp, SUM(forecast_value) AS sum_hierarchical_forecast_value
    FROM ML.FORECAST(
        MODEL `statmike-mlops-349915.applied_ml_forecasting.hierarchical_forecast`,
        STRUCT(60 AS horizon, 0.95 AS confidence_level)
    )
    WHERE county = 'LINN'
        AND city IS NOT null
        AND store_number IS null
        AND forecast_timestamp >= (SELECT maxDate FROM getDate)
    GROUP BY forecast_timestamp
)
SELECT hf.*, ss.sum_hierarchical_forecast_value
FROM hierarchicalForecast hf
JOIN sumCity ss
on hf.forecast_timestamp = ss.forecast_timestamp
ORDER BY hf.forecast_timestamp
LIMIT 10

Unnamed: 0,store_number,city,county,forecast_timestamp,hierarchical_forecast_value,sum_hierarchical_forecast_value
0,,,LINN,2025-02-26 00:00:00+00:00,5924.673178,5924.673178
1,,,LINN,2025-02-27 00:00:00+00:00,5900.450941,5900.450941
2,,,LINN,2025-02-28 00:00:00+00:00,5894.762931,5894.762931
3,,,LINN,2025-03-01 00:00:00+00:00,5889.370467,5889.370467
4,,,LINN,2025-03-02 00:00:00+00:00,5880.782748,5880.782748
5,,,LINN,2025-03-03 00:00:00+00:00,5859.47192,5859.47192
6,,,LINN,2025-03-04 00:00:00+00:00,5621.433112,5621.433112
7,,,LINN,2025-03-05 00:00:00+00:00,5820.958917,5820.958917
8,,,LINN,2025-03-06 00:00:00+00:00,5815.743691,5815.743691
9,,,LINN,2025-03-07 00:00:00+00:00,5821.161979,5821.161979


### Review **state** from: state > county > city > store_number

In [26]:
%%bigquery
WITH
getDate AS (
    SELECT MAX(forecast_timestamp) AS maxDate
    FROM ML.FORECAST(
        MODEL `statmike-mlops-349915.applied_ml_forecasting.hierarchical_forecast`,
        STRUCT(1 AS horizon, 0.95 AS confidence_level)
    )
    WHERE store_number IS NOT null
),
hierarchicalForecast AS (
    SELECT store_number, city, county, forecast_timestamp, forecast_value AS hierarchical_forecast_value
    FROM ML.FORECAST(
        MODEL `statmike-mlops-349915.applied_ml_forecasting.hierarchical_forecast`,
        STRUCT(60 AS horizon, 0.95 AS confidence_level)
    )
    WHERE county IS null
        AND forecast_timestamp >= (SELECT maxDate FROM getDate)
),
sumCounty AS (
    SELECT forecast_timestamp, SUM(forecast_value) AS sum_hierarchical_forecast_value
    FROM ML.FORECAST(
        MODEL `statmike-mlops-349915.applied_ml_forecasting.hierarchical_forecast`,
        STRUCT(60 AS horizon, 0.95 AS confidence_level)
    )
    WHERE county IS NOT null
        AND city IS null
        AND store_number IS null
        AND forecast_timestamp >= (SELECT maxDate FROM getDate)
    GROUP BY forecast_timestamp
)
SELECT hf.*, ss.sum_hierarchical_forecast_value
FROM hierarchicalForecast hf
JOIN sumCounty ss
on hf.forecast_timestamp = ss.forecast_timestamp
ORDER BY hf.forecast_timestamp
LIMIT 10

Unnamed: 0,store_number,city,county,forecast_timestamp,hierarchical_forecast_value,sum_hierarchical_forecast_value
0,,,,2025-03-01 00:00:00+00:00,44553.67266,44553.67266
1,,,,2025-03-02 00:00:00+00:00,49360.761687,49360.761687
2,,,,2025-03-03 00:00:00+00:00,54797.748108,54797.748108
3,,,,2025-03-04 00:00:00+00:00,42102.678653,42102.678653
4,,,,2025-03-05 00:00:00+00:00,40320.447117,40320.447117
5,,,,2025-03-06 00:00:00+00:00,52924.571389,52924.571389
6,,,,2025-03-07 00:00:00+00:00,43418.883669,43418.883669
7,,,,2025-03-08 00:00:00+00:00,46727.577635,46727.577635
8,,,,2025-03-09 00:00:00+00:00,49675.367616,49675.367616
9,,,,2025-03-10 00:00:00+00:00,53676.853414,53676.853414
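
Each of the three reviews above checks the same property: at every timestamp, the forecast at a parent level equals the sum of its children's forecasts. A minimal sketch of that check in plain Python, using hypothetical values rather than query results (`is_coherent` is an illustrative helper, not a BigQuery ML function):

```python
import math
from collections import defaultdict


def is_coherent(parent, children, rel_tol=1e-9):
    """Check that each parent value equals the sum of child values at the same timestamp.

    parent:   dict mapping timestamp -> forecast value
    children: iterable of (timestamp, forecast value) pairs across all child series
    """
    sums = defaultdict(float)
    for ts, value in children:
        sums[ts] += value
    return all(
        ts in sums and math.isclose(value, sums[ts], rel_tol=rel_tol)
        for ts, value in parent.items()
    )


# Hypothetical values: one parent series and two child series over two timestamps.
parent = {'2025-03-01': 300.0, '2025-03-02': 330.0}
children = [
    ('2025-03-01', 100.0), ('2025-03-01', 200.0),
    ('2025-03-02', 110.0), ('2025-03-02', 220.0),
]
print(is_coherent(parent, children))
```

A tolerance is used instead of exact equality because sums of floating-point values can differ in the last digits depending on the order of addition.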


---
## Top-Down Hierarchical Forecasting - Custom Approach

As seen above, the built-in hierarchical forecasting is a bottom-up aggregation of base-level forecast values.