# Module 3: Scheduling batch transformations with dbt, Airflow, and Feast

## 1. Overview
In this notebook, we use Airflow to schedule batch transformations defined in dbt, and to run Feast materialization once dbt has finished running its incremental model.

<img src="../architecture.png" width="750"/>
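
The ordering described above (dbt first, then materialization) can be sketched without Airflow as two shell commands run in sequence. This is only an illustration under assumptions: the helper name `build_pipeline_commands` is hypothetical, the commands assume the `dbt` and `feast` CLIs are on the PATH, and the Airflow DAG used in this module may be structured differently.

```python
from datetime import datetime, timezone


def build_pipeline_commands(end_ts: datetime) -> list[list[str]]:
    """Return the shell commands for one scheduled run, in execution order.

    Hypothetical helper: an Airflow DAG would typically wrap each command
    in its own task and make the second task depend on the first.
    """
    end = end_ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return [
        # Step 1: run the dbt models (including the incremental one).
        ["dbt", "run"],
        # Step 2: load features written since the last materialization
        # into the online store, up to `end`.
        ["feast", "materialize-incremental", end],
    ]


commands = build_pipeline_commands(datetime(2021, 7, 16, tzinfo=timezone.utc))
for cmd in commands:
    print(" ".join(cmd))
```

The function only builds the commands and does not execute them, so the ordering can be inspected before anything touches the warehouse or the online store.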

## 2. Set up the feature store

In [8]:
%env SNOWFLAKE_DEPLOYMENT_URL="[YOUR DEPLOYMENT]"
%env SNOWFLAKE_USER="[YOUR USER]"
%env SNOWFLAKE_PASSWORD="[YOUR PASSWORD]"
%env SNOWFLAKE_ROLE="[YOUR ROLE]"
%env SNOWFLAKE_WAREHOUSE="[YOUR WAREHOUSE]"
%env SNOWFLAKE_DATABASE="[YOUR DATABASE]"
%env USAGE=False

env: SNOWFLAKE_DEPLOYMENT_URL="[YOUR DEPLOYMENT]"
env: SNOWFLAKE_USER="[YOUR USER]"
env: SNOWFLAKE_PASSWORD="[YOUR PASSWORD]"
env: SNOWFLAKE_ROLE="[YOUR ROLE]"
env: SNOWFLAKE_WAREHOUSE="[YOUR WAREHOUSE]"
env: SNOWFLAKE_DATABASE="[YOUR DATABASE]"
env: USAGE=False


In [2]:
from feast import FeatureStore
from datetime import datetime

store = FeatureStore(repo_path=".")



### Fetch training data from offline store
We fetch training data here only to verify that the features are present in the batch sources.

In [3]:
entity_sql = f"""
    SELECT
        NAMEORIG as USER_ID,
        TIMESTAMP as "event_timestamp"
    FROM {store.get_data_source("transactions_source").get_table_query_string()}
    WHERE TIMESTAMP BETWEEN '2021-07-14' and '2021-07-16'
"""
training_df = store.get_historical_features(
    entity_df=entity_sql,
    features=store.get_feature_service("model_v2"),
).to_df()
print(training_df.head(20))

        USER_ID            event_timestamp CREDIT_SCORE     7D_AVG_AMT
0   C1619346615 2021-07-14 03:41:33.973401          637  146595.646667
1   C1894613709 2021-07-14 14:54:24.478003          573   72593.000000
2    C938481695 2021-07-14 03:37:41.862083          550   81486.760000
3   C1539734700 2021-07-14 03:38:02.177158          675   79342.800000
4    C545219707 2021-07-15 03:50:16.217478          709  145267.880000
5   C1248916744 2021-07-14 10:16:12.381417          690  403257.450000
6    C715411011 2021-07-14 10:49:44.439814          670  229318.560000
7   C1979950617 2021-07-14 14:58:14.942241          704    4010.840000
8    C145981125 2021-07-15 03:49:26.870783          724    8006.030000
9   C1017336142 2021-07-14 14:55:05.654883          550   78008.880000
10   C939911592 2021-07-14 18:32:48.978282          720  159968.610000
11     C6106605 2021-07-14 10:13:32.477179          645     474.280000
12  C1498664405 2021-07-14 06:02:44.138693          550  133530.183333
13   C

## Materialize batch features & fetch online features from Redis
To save time and money, we didn't materialize the full dataset with Airflow. Here we materialize only a narrow date range so that the online store holds the data we want to fetch.

In [4]:
!feast materialize 2021-07-14 2021-07-16

Materializing [1m[32m2[0m feature views from [1m[32m2021-07-13 20:00:00-04:00[0m to [1m[32m2021-07-15 20:00:00-04:00[0m into the [1m[32mredis[0m online store.

[1m[32mcredit_scores_features[0m:
100%|████████████████████████████████████████████████████| 654482/654482 [00:37<00:00, 17375.42it/s]
[1m[32maggregate_transactions_features[0m:
100%|██████████████████████████████████████████████████████| 54991/54991 [00:03<00:00, 17412.26it/s]
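
The log reports a window starting at `2021-07-13 20:00:00-04:00` even though the command was given `2021-07-14`. The two agree if the CLI dates are read as midnight UTC and printed in a UTC-4 local timezone, which is what the offsets in the log suggest. A minimal sketch of that conversion, assuming a fixed UTC-4 offset (the variable names are illustrative):

```python
from datetime import datetime, timedelta, timezone

# Assumption: the machine that produced the log was at UTC-4.
local_tz = timezone(timedelta(hours=-4))

# The dates passed to `feast materialize`, read as midnight UTC.
start_utc = datetime(2021, 7, 14, tzinfo=timezone.utc)
end_utc = datetime(2021, 7, 16, tzinfo=timezone.utc)

# Same instants, expressed in the assumed local timezone.
start_local = start_utc.astimezone(local_tz)
end_local = end_utc.astimezone(local_tz)

print(start_local)
print(end_local)
```

Under these assumptions the printed values match the window shown in the log, so the materialized range covers 2021-07-14 and 2021-07-15 in UTC.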


#### SDK-based online retrieval
Now we can retrieve these materialized features from Redis directly through the SDK. This is one of the most popular ways to retrieve features with Feast, since it lets you integrate with an existing service (e.g. a Flask app) that also handles model inference or pre/post-processing.

In [5]:
features = store.get_online_features(
    features=store.get_feature_service("model_v2"),
    entity_rows=[
        {
            "USER_ID": "C1835422371",
        }
    ],
).to_dict()

def print_online_features(features):
    for key, value in sorted(features.items()):
        print(key, " : ", value)

print_online_features(features)

7D_AVG_AMT  :  [332090.0]
CREDIT_SCORE  :  [680]
USER_ID  :  ['C1835422371']
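
`to_dict()` returns a column-oriented mapping: each feature name maps to a list with one value per entity row. When a downstream model expects one record per entity, it can help to pivot that mapping into rows. A minimal sketch using the output above as sample data (`columns_to_rows` is a hypothetical helper, not part of Feast):

```python
def columns_to_rows(features: dict) -> list[dict]:
    """Pivot {name: [v0, v1, ...]} into [{name: v0, ...}, {name: v1, ...}]."""
    names = sorted(features)
    # zip(*...) walks the value lists in lockstep, one tuple per entity row.
    return [dict(zip(names, values)) for values in zip(*(features[n] for n in names))]


# Sample data copied from the printed output above.
sample = {
    "7D_AVG_AMT": [332090.0],
    "CREDIT_SCORE": [680],
    "USER_ID": ["C1835422371"],
}
rows = columns_to_rows(sample)
print(rows)
```

With several entries in `entity_rows`, each list holds one value per entity and the helper returns one dictionary per entity, in request order.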


#### HTTP-based online retrieval
We can also retrieve features from a deployed feature server. We previously deployed this server with Docker Compose (see [docker-compose.yml](../docker-compose.yml)).

This can be preferable for many reasons. If you want to build an in-memory cache, caching on a central feature server can allow more effective caching across teams. You can also more centrally manage rate-limiting / access control, upgrade Feast versions independently, etc.

In [7]:
import requests
import json

online_request = {
  "feature_service": "model_v2",
  "entities": {
    "USER_ID": ["C1570470538"]
  }
}
r = requests.post('http://localhost:6566/get-online-features', data=json.dumps(online_request))
print(json.dumps(r.json(), indent=4, sort_keys=True))

{
    "metadata": {
        "feature_names": [
            "USER_ID",
            "CREDIT_SCORE",
            "7D_AVG_AMT"
        ]
    },
    "results": [
        {
            "event_timestamps": [
                "1970-01-01T00:00:00Z"
            ],
            "statuses": [
                "PRESENT"
            ],
            "values": [
                "C1570470538"
            ]
        },
        {
            "event_timestamps": [
                "1970-01-01T00:00:00Z"
            ],
            "statuses": [
                "PRESENT"
            ],
            "values": [
                null
            ]
        },
        {
            "event_timestamps": [
                "2021-07-14T09:54:26Z"
            ],
            "statuses": [
                "PRESENT"
            ],
            "values": [
                32451.919921875
            ]
        }
    ]
}
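
In this response, `metadata.feature_names` and `results` are parallel lists: the i-th result belongs to the i-th feature name. Note that `CREDIT_SCORE` comes back as `null` for this user even though its status is `PRESENT`, so checking the value itself may be safer than relying on the status alone. A minimal sketch that pairs names with values for a single-entity request and lists the missing ones (`flatten_response` is a hypothetical helper; the payload is an abridged copy of the response above, without the timestamps):

```python
def flatten_response(payload: dict) -> dict:
    """Map each feature name to its first (and here only) returned value."""
    names = payload["metadata"]["feature_names"]
    return {name: result["values"][0] for name, result in zip(names, payload["results"])}


# Abridged copy of the response printed above (event_timestamps omitted).
payload = {
    "metadata": {"feature_names": ["USER_ID", "CREDIT_SCORE", "7D_AVG_AMT"]},
    "results": [
        {"statuses": ["PRESENT"], "values": ["C1570470538"]},
        {"statuses": ["PRESENT"], "values": [None]},
        {"statuses": ["PRESENT"], "values": [32451.919921875]},
    ],
}

flat = flatten_response(payload)
# Features whose value is missing, regardless of the reported status.
missing = [name for name, value in flat.items() if value is None]
print(flat)
print("missing:", missing)
```

A caller could use the `missing` list to decide whether to fall back to a default value before passing the features to a model.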
