# Forecasting with Snowflake Cortex ML-Based Functions

## Overview 

One of the most critical activities that a Data/Business Analyst performs is producing recommendations for business stakeholders based on the insights gleaned from their data. In practice, this often means building models to make forecasts. However, Analysts are often impeded from creating the best models possible by the depth of statistical and machine learning knowledge required to implement them. Further, Python or other programming frameworks may be unfamiliar to Analysts who write SQL, and the nuances of fine-tuning a model may require expert knowledge that is out of reach.

For these use cases, Snowflake has developed a set of SQL-based ML Functions that implement machine learning models on the user's behalf. As of December 2023, three ML Functions are available for time-series based data. This notebook focuses on one of them:

1. Forecasting: enables users to forecast a metric based on past values. Common use cases for forecasting include predicting future sales, demand for particular SKUs of an item, or the volume of traffic into a website over a period of time.

For further details on ML Functions, please refer to the [Snowflake documentation](https://docs.snowflake.com/guides-overview-analysis).

### Prerequisites
- Working knowledge of SQL
- A Snowflake account login with the ACCOUNTADMIN role. If you do not have it, you will need a different role that can create databases, schemas, tables, stages, tasks, email integrations, and stored procedures.

### What You’ll Learn 
- How to use the Forecasting ML Function to create models and generate predictions

This notebook was written using Snowflake Notebooks. To use it outside of Snowflake, you will need to connect to your Snowflake instance (see my other notebooks for that).

In [None]:
# Import python packages
import streamlit as st
import pandas as pd

# We can also use Snowpark for our analyses!
from snowflake.snowpark.context import get_active_session
session = get_active_session()


The data for this notebook comes from the Quickstart, which you can find here: (https://quickstarts.snowflake.com/guide/ml_forecasting_ad/index.html?index=..%2F..index#1)


## Forecasting Demand for Lobster Mac & Cheese

We will start off by first building a forecasting model to predict the demand for Lobster Mac & Cheese in Vancouver.


### Step 1: Visualize Daily Sales on Snowsight

Before building our model, let's first visualize our data to get a feel for what daily sales look like. Run the following SQL command in your Snowsight UI, and toggle to the chart at the bottom.


In [None]:
-- query a sample of the ingested data
SELECT *
    FROM tasty_byte_sales
    WHERE menu_item_name LIKE 'Lobster Mac & Cheese';

We can plot the daily sales for the item Lobster Mac & Cheese going back all the way to 2014.

In [None]:

# TODO: CELL REFERENCE REPLACE
df = cells.cell2.to_pandas()
import altair as alt
chart = alt.Chart(df).mark_line().encode(
    x='DATE',
    y='TOTAL_SOLD'
).properties(
    width=700  # Set the width of the chart
)

chart

Observing the chart, we can see what appears to be a yearly seasonal trend in sales. This is an important consideration for building robust forecasting models: we want to feed in enough training data to represent one full cycle of the time series we are modeling. The forecasting ML function is able to automatically identify and handle multiple seasonality patterns, so we will go ahead and use the latest year's worth of data as input to our model. In the query below, we will also convert the date column using the `to_timestamp_ntz` function, so that it can be used in the forecasting function.
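The "latest year's worth of data" window described above can be sketched in plain Python, outside of Snowflake. This is only an illustration of the filter logic (`date > max(date) - 1 year`); the helper names and the sample rows are hypothetical, and the leap-day fallback to February 28 is an assumption of this sketch.

```python
from datetime import date


def one_year_before(d: date) -> date:
    """Return the same calendar day one year earlier.

    Assumption for this sketch: Feb 29 falls back to Feb 28.
    """
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)


def latest_year(rows):
    """Keep only rows dated strictly after (max date - 1 year)."""
    cutoff = one_year_before(max(r[0] for r in rows))
    return [r for r in rows if r[0] > cutoff]


# Hypothetical sample rows: (date, total_sold)
sample = [
    (date(2022, 5, 27), 40),
    (date(2022, 5, 28), 42),
    (date(2022, 5, 29), 45),
    (date(2023, 5, 28), 51),
]

recent = latest_year(sample)
print(recent)
```

Because the comparison is strict, the row dated exactly one year before the latest date is excluded, which matches the `>` in the SQL filter.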

### Step 2: Creating our First Forecasting Model: Lobster Mac & Cheese

We can use SQL to directly call the forecasting ML function. Under the hood, the forecasting ML function automatically takes care of many of the data science best practices that are required to build good models. This includes performing hyper-parameter tuning, adjusting for missing data, and creating new features. We will build our first forecasting model below, for only the Lobster Mac & Cheese menu item. 

In [None]:
-- Create a table containing the latest year's worth of sales data:
CREATE OR REPLACE TABLE vancouver_sales AS (
    SELECT
        to_timestamp_ntz(date) as timestamp,
        primary_city,
        menu_item_name,
        total_sold
    FROM
        tasty_byte_sales
    WHERE
        date > (SELECT max(date) - interval '1 year' FROM tasty_byte_sales)
    GROUP BY
        all
);

SELECT * FROM vancouver_sales LIMIT 100;

In [None]:

-- Create view for lobster sales
CREATE OR REPLACE VIEW lobster_sales AS (
    SELECT
        timestamp,
        total_sold
    FROM
        vancouver_sales
    WHERE
        menu_item_name LIKE 'Lobster Mac & Cheese'
);


In [None]:
SELECT * FROM LOBSTER_SALES LIMIT 100;

In [None]:

-- Build Forecasting model; this could take ~15-25 secs; please be patient
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST lobstermac_forecast (
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'lobster_sales'),
    TIMESTAMP_COLNAME => 'TIMESTAMP',
    TARGET_COLNAME => 'TOTAL_SOLD'
);

In [None]:
-- Show models to confirm training has completed
SHOW SNOWFLAKE.ML.FORECAST;
     

In the steps above, we create a view containing the relevant daily sales for our Lobster Mac & Cheese item, which we then pass to the forecast function. The last step should confirm that the model has been created and is ready to create predictions.
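If you want to repeat the same pattern for other menu items, the model-creation statement can be generated from Python. The sketch below only assembles the SQL text used earlier in this notebook; `build_forecast_sql` is a hypothetical helper, not part of any Snowflake library, and the identifier check is a deliberately simple guard.

```python
def build_forecast_sql(model_name: str, view_name: str,
                       timestamp_col: str = "TIMESTAMP",
                       target_col: str = "TOTAL_SOLD") -> str:
    """Return a CREATE ... FORECAST statement for a single-series view."""
    for identifier in (model_name, view_name, timestamp_col, target_col):
        # Accept only plain identifiers made of letters, digits, and underscores.
        if not identifier.replace("_", "").isalnum():
            raise ValueError(f"unexpected identifier: {identifier!r}")
    return (
        f"CREATE OR REPLACE SNOWFLAKE.ML.FORECAST {model_name} (\n"
        f"    INPUT_DATA => SYSTEM$REFERENCE('VIEW', '{view_name}'),\n"
        f"    TIMESTAMP_COLNAME => '{timestamp_col}',\n"
        f"    TARGET_COLNAME => '{target_col}'\n"
        f");"
    )


stmt = build_forecast_sql("lobstermac_forecast", "lobster_sales")
print(stmt)
```

Inside a Snowflake notebook, the resulting string could then be run with `session.sql(stmt).collect()`, using the `session` object created in the first cell.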

In [None]:
-- Create predictions for the next 10 periods (the results are saved to a table in a later cell)
CALL lobstermac_forecast!FORECAST(FORECASTING_PERIODS => 10);

### Step 3: Creating and Visualizing Predictions

Let's now use our trained `lobstermac_forecast` model to create predictions for the demand for the next 10 days. 


In [None]:

-- Store the results of the cell above as a table
CREATE OR REPLACE TABLE macncheese_predictions AS (
    SELECT * FROM {{cell8}}
);

In [None]:

-- Visualize the results, overlaid on top of one another: 
SELECT
    timestamp,
    total_sold,
    NULL AS forecast
FROM
    lobster_sales
WHERE
    timestamp > '2023-03-01'
UNION
SELECT
    TS AS timestamp,
    NULL AS total_sold,
    forecast
FROM
    macncheese_predictions
ORDER BY
    timestamp asc;

In [None]:
import pandas as pd
df = cells.cell10.to_pandas()
df = pd.melt(df,id_vars=["TIMESTAMP"],value_vars=["TOTAL_SOLD","FORECAST"])
df = df.replace({"TOTAL_SOLD":"ACTUAL"})
df.columns = ["TIMESTAMP","TYPE", "AMOUNT SOLD"]

import altair as alt
alt.Chart(df).mark_line().encode(
    x = "TIMESTAMP",
    y = "AMOUNT SOLD",
    color = "TYPE"
).properties(
    width=800  # Set the width of the chart
)



There we have it! We just created our first set of predictions for the next 10 days' worth of demand, which can be used to inform how much inventory of raw ingredients we may need. As the visualization above shows, there also seems to be a weekly trend in the items sold, which the model was able to pick up on.
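As a rough illustration of how the predictions might feed an inventory estimate, the sketch below sums the forecast values over the horizon and rounds up to whole units. The rows are shaped like `macncheese_predictions` (`TS`, `FORECAST`), but the numbers are made up for illustration and are not model output; `total_expected_demand` is a hypothetical helper.

```python
import math
from datetime import date

# Hypothetical forecast rows: (TS, FORECAST). Values are invented for this sketch.
predictions = [
    (date(2023, 5, 29), 48.5),
    (date(2023, 5, 30), 51.25),
    (date(2023, 5, 31), 50.0),
]


def total_expected_demand(rows):
    """Sum the forecast column and round up to whole units."""
    return math.ceil(sum(forecast for _, forecast in rows))


units_needed = total_expected_demand(predictions)
print(units_needed)
```

Rounding up is a simple, conservative choice here; a real inventory plan would likely also account for the uncertainty around each forecast value.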

**Note:** You may notice that your chart represents the nulls as 0's. Make sure to select the 'none' aggregation for each of the columns, as shown on the right-hand side of the image above, to reproduce the image. Additionally, your visualization may look different depending on which version of the ML forecast function you call. The above image was created with **version 7.0**.
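For the chart drawn in Python, a related option is to drop the null rows before plotting, so they cannot be rendered as zeros. The sketch below mirrors the wide-to-long reshaping done in the pandas cell above, using plain Python; the sample rows are hypothetical and `to_long_format` is an illustrative helper.

```python
def to_long_format(rows):
    """Turn (timestamp, total_sold, forecast) rows into
    (timestamp, type, amount) rows, skipping null values."""
    long_rows = []
    for ts, total_sold, forecast in rows:
        if total_sold is not None:
            long_rows.append((ts, "ACTUAL", total_sold))
        if forecast is not None:
            long_rows.append((ts, "FORECAST", forecast))
    return long_rows


# Hypothetical rows shaped like the UNION query result.
wide = [
    ("2023-05-27", 44, None),
    ("2023-05-28", 51, None),
    ("2023-05-29", None, 48.5),
]

long_rows = to_long_format(wide)
for row in long_rows:
    print(row)
```

With pandas, calling `df.dropna(subset=["AMOUNT SOLD"])` after the melt step should have a similar effect.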


In [None]:
CALL lobstermac_forecast!show_evaluation_metrics();
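
The call above reports error metrics for the trained model. This notebook does not list which metrics are returned, so the sketch below simply shows how two commonly used forecast error metrics, mean absolute error (MAE) and mean absolute percentage error (MAPE), are computed. The sample values are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error as a fraction of the actual value.

    Assumes no actual value is zero.
    """
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical held-out actuals and the corresponding predictions.
actual = [50, 40, 60]
predicted = [45, 44, 60]

mae = mean_absolute_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
print(f"MAE={mae:.2f}, MAPE={mape:.2%}")
```

Lower values indicate a closer fit; MAPE is scale-free, which makes it easier to compare across menu items with different sales volumes.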