# Source data setup

Before you begin this lab, please make sure you've gone through this [additional README](test_integration/README.md).

This demo uses a free dataset available on Snowflake Marketplace called ['United States Retail Foot Traffic Data'](https://app.snowflake.com/marketplace/listing/GZT1ZVTYF7/constellation-network-united-states-retail-foot-traffic-data?search=us%20foot%20traffic).
Please ensure that you have installed that dataset within your account before proceeding.

Also make sure your role has the grants needed to create permanent and/or temporary tables in the database and schema you will use for this lab.
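If you administer the role you run the lab with, the grants might look something like the statements built below. This is a minimal sketch: the database, schema, and role names are placeholders, and the exact privileges your setup needs may differ, so check with your account administrator. `lab_grant_statements` is a hypothetical helper. The last statement covers creating the anomaly detection model object itself, which has its own schema-level privilege.

```python
def lab_grant_statements(database: str, schema: str, role: str) -> list[str]:
    """Build GRANT statements for the lab's working schema.

    Hypothetical helper: names are placeholders and are interpolated as-is,
    so only pass trusted identifiers.
    """
    fq_schema = f"{database}.{schema}"
    return [
        # Let the role see the database and the schema
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role}",
        f"GRANT USAGE ON SCHEMA {fq_schema} TO ROLE {role}",
        # Create permanent tables in the schema
        f"GRANT CREATE TABLE ON SCHEMA {fq_schema} TO ROLE {role}",
        # Create the anomaly detection model object in the schema
        f"GRANT CREATE SNOWFLAKE.ML.ANOMALY_DETECTION ON SCHEMA {fq_schema} TO ROLE {role}",
    ]


if __name__ == "__main__":
    for stmt in lab_grant_statements("LAB_DB", "FOOT_TRAFFIC", "LAB_ROLE"):
        print(stmt + ";")
```

With a Snowpark session open under a sufficiently privileged role, each statement could be run with `session.sql(stmt).collect()`.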

# The Scenario

In this demo we build an anomaly detection model using a sample time series from a Marketplace provider called ['United States Retail Foot Traffic Data'](https://app.snowflake.com/marketplace/listing/GZT1ZVTYF7/constellation-network-united-states-retail-foot-traffic-data?search=us%20foot%20traffic). This time series represents average footfall across US states in regions of interest to retail stores.

Our aim is to find out whether there is any abnormal footfall behaviour in the state of CA (this demo focuses on a single series) that might warrant closer analysis or exploration.

The data arrives as JSON, so it needs some pre-processing before it can be passed to SNOWFLAKE.ML.ANOMALY_DETECTION to train a model. The model trained on this pre-processed data can then be used to identify anomalies in a test dataset.
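The lab performs this reshaping with Snowpark DataFrame operations. As a plain-Python illustration of the idea only, the sketch below turns JSON records into the two columns the training step works with, a timestamp and a numeric target, for a single state. The record shape and the field names `date`, `state`, and `avg_foot_traffic` are assumptions, not the provider's actual schema, and `to_single_series` is a hypothetical helper.

```python
import json
from datetime import datetime


def to_single_series(raw_records: list[str], state: str = "CA") -> list[tuple[datetime, float]]:
    """Flatten JSON records into (timestamp, value) rows for one state.

    Assumed (hypothetical) record shape:
    {"date": "2023-06-01", "state": "CA", "avg_foot_traffic": 412.5}
    """
    rows = []
    for raw in raw_records:
        record = json.loads(raw)
        if record.get("state") != state:
            continue  # keep a single series, as this demo does
        ts = datetime.fromisoformat(record["date"])
        rows.append((ts, float(record["avg_foot_traffic"])))
    # Return the rows in chronological order
    rows.sort(key=lambda row: row[0])
    return rows
```

Reducing each record to one timestamp column and one value column mirrors the `TIMESTAMP_COLNAME` / `TARGET_COLNAME` pair that an anomaly detection model is trained on.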

## Application structure
The entire application is written with the Snowpark API and can be run from client-side orchestration tooling. The data prep transformations and the model training and inference code each live in their own Python module:

```
 test_integration/
 |--foot_traffic_data_prep.py
 |--foot_traffic_anomaly_detection.py
```
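The module source isn't reproduced here, but in Snowflake the training and inference steps come down to two statements. The sketch below shows hypothetical helpers that build them, assuming a prepared training table and test table; the model, table, and column names are placeholders. An empty `LABEL_COLNAME` trains the model without labelled anomalies.

```python
def create_model_sql(model: str, training_table: str, ts_col: str, target_col: str) -> str:
    """CREATE statement for an unsupervised SNOWFLAKE.ML.ANOMALY_DETECTION model.

    Hypothetical helper: names are interpolated as-is; pass trusted identifiers only.
    """
    return (
        f"CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION {model}("
        f"INPUT_DATA => SYSTEM$REFERENCE('TABLE', '{training_table}'), "
        f"TIMESTAMP_COLNAME => '{ts_col}', "
        f"TARGET_COLNAME => '{target_col}', "
        # An empty label column means training without labelled anomalies
        "LABEL_COLNAME => '')"
    )


def detect_anomalies_sql(model: str, test_table: str, ts_col: str, target_col: str) -> str:
    """CALL statement that scores a test table with a trained model (hypothetical helper)."""
    return (
        f"CALL {model}!DETECT_ANOMALIES("
        f"INPUT_DATA => SYSTEM$REFERENCE('TABLE', '{test_table}'), "
        f"TIMESTAMP_COLNAME => '{ts_col}', "
        f"TARGET_COLNAME => '{target_col}')"
    )
```

From Snowpark, these strings would be executed with `session.sql(...).collect()`. The `DETECT_ANOMALIES` output includes an `IS_ANOMALY` column for each scored timestamp.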

To these modules we add some basic tests using the pytest framework: the data prep transformers are tested locally, while model training and inference are tested on Snowflake, since the Anomaly Detection function has to run there. The complete code base looks like this:

```
 test_integration/
 |--foot_traffic_data_prep.py
 |--foot_traffic_anomaly_detection.py
 |--conftest.py
 |--test_foot_traffic_data_prep.py
 |--test_foot_traffic_anomaly_detection.py
 ```

## Local Testing Framework

We can use the Snowpark local testing framework to validate these data prep transformers quickly, without spending compute to test against Snowflake. The tests use a representative set of JSON records based on an actual data sample from the chosen provider.

The local testing framework adds a config parameter, `local_testing`, that you set when creating a new Session. Supported APIs used inside the transformation functions can then be tested locally; for APIs that aren't supported, you can write a mock patch that simulates their behaviour.
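The sketch below uses the standard library's `unittest.mock`, rather than any Snowpark-specific mechanism, to show the general idea of simulating an API that can't run locally. `count_flagged` and the `AD_RESULTS` table are hypothetical, and the fake session only imitates the `session.sql(...).collect()` call chain.

```python
from unittest import mock


def count_flagged(session, results_table: str) -> int:
    """Hypothetical helper: count rows flagged in a DETECT_ANOMALIES result table.

    It goes through session.sql, the kind of call a local session may not emulate.
    """
    rows = session.sql(
        f"SELECT COUNT(*) FROM {results_table} WHERE IS_ANOMALY"
    ).collect()
    return rows[0][0]


def test_count_flagged_with_mocked_session():
    # Stand-in for a Snowpark Session: session.sql(...).collect() returns canned rows
    fake_session = mock.MagicMock()
    fake_session.sql.return_value.collect.return_value = [(3,)]

    assert count_flagged(fake_session, "AD_RESULTS") == 3
    # Check that exactly one query was issued, with the expected text
    fake_session.sql.assert_called_once_with(
        "SELECT COUNT(*) FROM AD_RESULTS WHERE IS_ANOMALY"
    )
```

Patching at the session boundary keeps the function under test unchanged; only the object it talks to is replaced.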

This test suite covers the following:
- [x] Local test configuration
- [x] Fixture for Snowpark Session
- [x] Fixture for sample data creation
- [x] Skip tests

```bash
cat test_integration/conftest.py
```

```python
def pytest_addoption(parser):
    parser.addoption("--snowflake-session", action="store", default="live", help="--snowflake-session [local|live]")  # "live" represents a Snowflake connection
```
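The listing above shows only the option hook. As a hedged sketch of how the session fixture and the skip behaviour from the checklist could be wired to that option, a fuller `conftest.py` might look like this. The `live_only` marker name and the `SNOWFLAKE_*` environment variables are assumptions, not part of the lab's code.

```python
# conftest.py (sketch): session fixture plus skipping of live-only tests
import os

import pytest


def pytest_addoption(parser):
    # "live" connects to Snowflake; "local" uses the Snowpark local testing framework
    parser.addoption(
        "--snowflake-session",
        action="store",
        default="live",
        help="--snowflake-session [local|live]",
    )


def pytest_configure(config):
    # Register the custom marker so pytest does not warn about an unknown mark
    config.addinivalue_line("markers", "live_only: test needs a live Snowflake session")


def pytest_collection_modifyitems(config, items):
    # In local mode, skip tests that need real Snowflake compute (e.g. anomaly detection)
    if config.getoption("--snowflake-session") != "local":
        return
    skip_live = pytest.mark.skip(reason="needs a live Snowflake session")
    for item in items:
        if "live_only" in item.keywords:
            item.add_marker(skip_live)


@pytest.fixture(scope="session")
def session(request):
    # Imported here so only tests that request this fixture need Snowpark installed
    from snowflake.snowpark import Session

    if request.config.getoption("--snowflake-session") == "local":
        sess = Session.builder.config("local_testing", True).create()
    else:
        # Hypothetical: connection parameters read from SNOWFLAKE_* environment variables
        params = {
            key: os.environ[f"SNOWFLAKE_{key.upper()}"]
            for key in ("account", "user", "password", "role", "warehouse", "database", "schema")
            if f"SNOWFLAKE_{key.upper()}" in os.environ
        }
        sess = Session.builder.configs(params).create()
    yield sess
    sess.close()
```

A test would opt in with `@pytest.mark.live_only` and take `session` as an argument; in a local run it then shows up as `s` (skipped).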


Run the test suite against a local session:

```bash
pytest test_integration/ --disable-warnings --snowflake-session local
```

```
platform darwin -- Python 3.11.9, pytest-7.4.4, pluggy-1.0.0
rootdir: /Users/hayan/Documents/GitHub/snowpark-best-practices-v2-main
plugins: anyio-4.2.0
collected 3 items

test_integration/test_foot_traffic_anomaly_detection.py s                [ 33%]
test_integration/test_foot_traffic_data_prep.py ..                       [100%]
```

In local mode the anomaly detection test is skipped (`s`), while both data prep tests pass (`.`).


Running against a live session helps catch issues that only surface on Snowflake with a larger sample of data, and also confirms that any objects the code needs to create (for example, through a stored procedure) are created as expected.

```bash
pytest test_integration/ --disable-warnings --snowflake-session live
```

```
platform darwin -- Python 3.11.9, pytest-7.4.4, pluggy-1.0.0
rootdir: /Users/hayan/Documents/GitHub/snowpark-best-practices-v2-main
plugins: anyio-4.2.0
collected 3 items

test_integration/test_foot_traffic_anomaly_detection.py .                [ 33%]
test_integration/test_foot_traffic_data_prep.py ..                       [100%]
```

This time the anomaly detection test runs on Snowflake and passes along with the data prep tests.

