# Snowflake Cortex Forecasting

# Documentation

> Note: This is a proof of concept (POC) and is not production ready. This repository wraps the Cortex ML forecasting function. It needs further work to be robust enough for production, but it should get you 80-90% of the way there for simple forecasting use cases. It is meant to get you started on your forecasting journey and then let you adjust it to fit your specific use case.

## Overview

The `SnowflakeMLForecast` class is a flexible tool designed for creating, managing, and analyzing forecast models within Snowflake using the `CREATE SNOWFLAKE.ML.FORECAST` functionality. It allows users to define models, configure inputs, generate forecasts, and visualize results seamlessly.

## Features

- Dynamic Forecast Model Creation: Automatically generates SQL queries to create forecast models based on configuration files.
- Visualization: Integrates with both Streamlit and standard Python environments to display forecast results and key data aspects.
- Tag Management: Handles the creation of tags in Snowflake and ensures smooth operation even if tags already exist.
- Configurable: Supports YAML configuration files for easy setup and flexibility.
- Error Handling: Robust error handling and user feedback for a seamless experience.
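
To make the "Dynamic Forecast Model Creation" feature concrete, here is a minimal sketch of how a configuration dictionary could be turned into a `CREATE SNOWFLAKE.ML.FORECAST` statement. It is not the class's actual implementation: `build_create_forecast_sql` and `to_sql_literal` are hypothetical helpers written for illustration, and the sketch ignores `exogenous_columns`.

```python
def to_sql_literal(value) -> str:
    """Render a Python value as a Snowflake SQL literal (illustrative only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, dict):
        items = ", ".join(f"'{k}': {to_sql_literal(v)}" for k, v in value.items())
        return "{" + items + "}"
    return "'" + str(value).replace("'", "''") + "'"


def build_create_forecast_sql(config: dict) -> str:
    """Hypothetical helper: build the CREATE statement from a config dict."""
    model = config["model"]
    data = config["input_data"]
    table_type = data.get("table_type", "table").upper()

    args = [f"INPUT_DATA => SYSTEM$REFERENCE('{table_type}', '{data['table']}')"]
    if data.get("series_column"):  # only needed for multiple time series
        args.append(f"SERIES_COLNAME => '{data['series_column']}'")
    args.append(f"TIMESTAMP_COLNAME => '{data['timestamp_column']}'")
    args.append(f"TARGET_COLNAME => '{data['target_column']}'")

    config_object = config.get("forecast_config", {}).get("config_object")
    if config_object:
        args.append(f"CONFIG_OBJECT => {to_sql_literal(config_object)}")

    sql = f"CREATE OR REPLACE SNOWFLAKE.ML.FORECAST {model['name']}(\n    "
    sql += ",\n    ".join(args) + "\n)"
    if model.get("tags"):
        tags = ", ".join(f"{k} = {to_sql_literal(v)}" for k, v in model["tags"].items())
        sql += f"\nWITH TAG ({tags})"
    if model.get("comment"):
        sql += f"\nCOMMENT = {to_sql_literal(model['comment'])}"
    return sql


example_config = {
    "model": {
        "name": "my_forecast_model",
        "tags": {"environment": "production", "team": "data_science"},
        "comment": "Forecast model for predicting trends.",
    },
    "input_data": {
        "table": "storage_usage_train",
        "table_type": "table",
        "timestamp_column": "usage_date",
        "target_column": "storage_gb",
        "series_column": None,
    },
    "forecast_config": {"config_object": {"on_error": "skip", "evaluate": True}},
}
print(build_create_forecast_sql(example_config))
```

Tags referenced in `WITH TAG` must already exist in Snowflake, which is why the class creates them first. The sketch builds SQL by string formatting, so it should only be fed trusted configuration values.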


## Installation

### Requirements

- Python 3.7+
- Snowflake Connector
- Pandas
- Altair (for visualizations)
- Streamlit (optional, for UI)


### Setup

1. Clone the Repository:

2. Install Dependencies:

3. Set Up Configuration:
   Create a YAML file with your configuration settings (explained below).


## Configuration

The `SnowflakeMLForecast` class relies on a YAML configuration file to define the input data, forecast settings, and other options. Below is an example configuration:

> This is the configuration used in the storage example to forecast your Snowflake storage usage.

```yaml
model:
  name: my_forecast_model
  tags:
    environment: production
    team: data_science
  comment: "Forecast model for predicting trends."

input_data:
  table: storage_usage_train
  table_type: table  # Options: 'table', 'view'
  timestamp_column: usage_date
  target_column: storage_gb
  series_column: null  # Set to column name for multiple time series
  exogenous_columns:  # Or [column1, column2]. If no columns are listed, all columns in the table or view are used.
    - column1
    - column2

forecast_config:
  training_days: 180
  forecast_days: 30
  config_object:
    on_error: skip
    evaluate: true
    evaluation_config:
      n_splits: 2 # Default is 2
      gap: 0 # Default is 0
      prediction_interval: 0.95

output:
  table: storage_forecast_results
```
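
Before handing a configuration to the class, it can help to check it for obvious gaps. The sketch below validates an already-parsed configuration (a plain dictionary with the same shape as the YAML above). Which keys count as required is an assumption made for this example, and `find_config_problems` is a hypothetical helper, not part of `SnowflakeMLForecast`.

```python
from typing import List

# Assumption for this sketch: these keys must be present and non-empty.
REQUIRED_KEYS = {
    "model": ["name"],
    "input_data": ["table", "timestamp_column", "target_column"],
    "forecast_config": ["forecast_days"],
    "output": ["table"],
}


def find_config_problems(config: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        body = config.get(section)
        if not isinstance(body, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            if body.get(key) in (None, ""):
                problems.append(f"missing key: {section}.{key}")

    # The example configuration documents 'table' and 'view' as the options.
    table_type = (config.get("input_data") or {}).get("table_type", "table")
    if table_type not in ("table", "view"):
        problems.append(f"invalid input_data.table_type: {table_type!r}")

    # Day counts only make sense as positive whole numbers.
    forecast_config = config.get("forecast_config") or {}
    for key in ("training_days", "forecast_days"):
        value = forecast_config.get(key)
        if value is not None and (not isinstance(value, int) or value <= 0):
            problems.append(f"forecast_config.{key} must be a positive integer")
    return problems


config = {
    "model": {"name": "my_forecast_model"},
    "input_data": {
        "table": "storage_usage_train",
        "table_type": "table",
        "timestamp_column": "usage_date",
        "target_column": "storage_gb",
    },
    "forecast_config": {"training_days": 180, "forecast_days": 30},
    "output": {"table": "storage_forecast_results"},
}
print(find_config_problems(config))
```

To use this with a file, parse the YAML into a dictionary first (for example with PyYAML's `yaml.safe_load`) and pass the result to the function.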

## Usage

### Creating a SnowflakeMLForecast Instance


```python
# Step 1: Create a Forecast Model Instance
from cortex_forecast.forecast import SnowflakeMLForecast

# Define your connection configuration
connection_config = {
    'user': 'your_user',
    'password': 'your_password',
    'account': 'your_account',
    'database': 'your_database',
    'warehouse': 'your_warehouse',
    'schema': 'your_schema',
    'role': 'your_role'
}

# Create an instance of SnowflakeMLForecast
forecast_model = SnowflakeMLForecast(
    config_file='path/to/your/config.yaml',
    connection_config=connection_config
)

# Step 2: Run Forecast and Visualize Results
forecast_data = forecast_model.create_and_run_forecast()
forecast_model.generate_forecast_and_visualization()

# Step 3: Clean Up
forecast_model.cleanup()
```
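
Rather than hard-coding credentials, the connection dictionary can be assembled from environment variables. The sketch below is one way to do that and to fail early when a variable is missing. The variable names `SNOWFLAKE_USER`, `SNOWFLAKE_PASSWORD`, and `SNOWFLAKE_ACCOUNT` follow the full example in this README; `connection_config_from_env` is a hypothetical helper.

```python
import os

REQUIRED_ENV_VARS = ("SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD", "SNOWFLAKE_ACCOUNT")


def connection_config_from_env(database, warehouse, schema, role, environ=None):
    """Build a connection_config dict, reading credentials from the environment."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_ENV_VARS if not environ.get(name)]
    if missing:
        # Fail before any connection attempt, naming exactly what is absent.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "user": environ["SNOWFLAKE_USER"],
        "password": environ["SNOWFLAKE_PASSWORD"],
        "account": environ["SNOWFLAKE_ACCOUNT"],
        "database": database,
        "warehouse": warehouse,
        "schema": schema,
        "role": role,
    }


# Demo with a stand-in mapping so the example runs without real credentials.
demo_env = {
    "SNOWFLAKE_USER": "your_user",
    "SNOWFLAKE_PASSWORD": "your_password",
    "SNOWFLAKE_ACCOUNT": "your_account",
}
connection_config = connection_config_from_env(
    "your_database", "your_warehouse", "your_schema", "your_role", environ=demo_env
)
print(sorted(connection_config))
```

In real use, omit the `environ` argument so the helper reads from `os.environ`, then pass the result as `connection_config` when creating the instance.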


# Full Example

> See the Docs/ folder for two examples of this in action: one forecasts storage usage and the other forecasts taxi pickups in NYC.

```python
from snowflake.snowpark.version import VERSION
from cortex_forecast.forecast import SnowflakeMLForecast
import os
```

## Create Snowflake Connection Using SnowflakeMLForecast


> Note: Create a YAML configuration file first so that `SnowflakeMLForecast` can read the information it needs from it and build your forecast.

```python
forecast_model = SnowflakeMLForecast(
    config='./cortex_forecast/files/yaml/storage_forecast_config.yaml',
    connection_config={
        'user': os.getenv('SNOWFLAKE_USER'),
        'password': os.getenv('SNOWFLAKE_PASSWORD'),
        'account': os.getenv('SNOWFLAKE_ACCOUNT'),
        'database': 'CORTEX',
        'warehouse': 'CORTEX_WH',
        'schema': 'DEV',
        'role': 'CORTEX_USER_ROLE'  # Use the desired role
    },
    is_streamlit=False
)

snowflake_environment = forecast_model.session.sql('SELECT current_user(), current_version()').collect()
snowpark_version = VERSION
print('\nConnection Established with the following parameters:')
print('Snowflake version           : {}'.format(snowflake_environment[0][1]))
print('Snowpark for Python version : {}.{}.{}'.format(snowpark_version[0], snowpark_version[1], snowpark_version[2]))
```

```python
# Create Training Data
training_days = 365

forecast_model.session.sql(f'''CREATE OR REPLACE TABLE storage_usage_train AS
    SELECT 
        TO_TIMESTAMP_NTZ(usage_date) AS usage_date,
        storage_bytes / POWER(1024, 3) AS storage_gb
    FROM 
    (
        SELECT * 
            FROM snowflake.account_usage.storage_usage
            WHERE usage_date < CURRENT_DATE()
    )
    WHERE TO_TIMESTAMP_NTZ(usage_date) > DATEADD(day, -{training_days}, CURRENT_DATE())
''').collect()
forecast_model.session.sql('SELECT * FROM storage_usage_train ORDER BY usage_date DESC LIMIT 10').show()
```
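
The training query does two things: it converts `storage_bytes` to gigabytes by dividing by 1024^3, and it keeps only rows dated before today and within the last `training_days` days. The pure-Python sketch below mirrors that logic on a few made-up rows so the filter boundaries are easy to see. `to_training_rows` is a hypothetical helper and the sample values are invented for illustration.

```python
from datetime import date, timedelta

BYTES_PER_GB = 1024 ** 3  # same divisor as POWER(1024, 3) in the query


def to_training_rows(rows, training_days, today):
    """Keep rows with cutoff < usage_date < today and convert bytes to GB."""
    cutoff = today - timedelta(days=training_days)
    return [
        (usage_date, storage_bytes / BYTES_PER_GB)
        for usage_date, storage_bytes in rows
        if cutoff < usage_date < today
    ]


# Invented sample data: (usage_date, storage_bytes)
sample_rows = [
    (date(2024, 1, 1), 1 * BYTES_PER_GB),   # older than the training window
    (date(2024, 6, 1), 2 * BYTES_PER_GB),   # inside the window
    (date(2024, 6, 10), 3 * BYTES_PER_GB),  # today's partial data is excluded
]
training_rows = to_training_rows(sample_rows, training_days=30, today=date(2024, 6, 10))
print(training_rows)
```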

```python
import pandas as pd
import matplotlib.pyplot as plt
```

```python
df = forecast_model.session.sql('SELECT * FROM storage_usage_train ORDER BY usage_date').to_pandas()
df.head()
df = df.set_index('USAGE_DATE')
df['STORAGE_GB'].plot(figsize=(10, 6), title='Storage GB Over Time')

# Show the plot
plt.xlabel('Date')
plt.ylabel('Storage GB')
plt.grid(True)
plt.show()
```


### Train a Model

> This uses the settings in the YAML file that you passed to the `SnowflakeMLForecast` object.


```python
# Run Forecast
forecast_data = forecast_model.create_and_run_forecast()
forecast_data.head()
```

### Visualize Forecast

```python
forecast_model.generate_forecast_and_visualization(show_historical=True)
```