# Module 3: Scheduling batch transformations with dbt, Airflow, and Feast

## 1. Overview
In this notebook, we use Airflow to schedule dbt batch transformations, and then run Feast materialization once dbt has finished building its incremental model.

<img src="../architecture.png" width="750"/>
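
The ordering described in the overview (transform first, then materialize) can be sketched as two CLI invocations run back to back. This is a minimal illustration, not the workshop's actual Airflow DAG. The helper names `build_pipeline_commands` and `run_pipeline` are hypothetical, and the sketch assumes that `dbt run` and `feast materialize-incremental` are invoked from directories where the dbt project and the Feast repository can be found.

```python
import subprocess
from datetime import datetime, timezone
from typing import Callable, List


def build_pipeline_commands(run_end: datetime) -> List[List[str]]:
    """Return the CLI commands for one scheduled run, in execution order."""
    if run_end.tzinfo is None:
        # Assumption for this sketch: naive datetimes are treated as UTC.
        run_end = run_end.replace(tzinfo=timezone.utc)
    end_ts = run_end.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return [
        # Step 1: build the dbt models (including the incremental one).
        ["dbt", "run"],
        # Step 2: load everything not yet materialized, up to end_ts.
        ["feast", "materialize-incremental", end_ts],
    ]


def run_pipeline(commands: List[List[str]], runner: Callable = subprocess.run) -> None:
    """Run the commands in order; check=True stops at the first failure."""
    for command in commands:
        runner(command, check=True)


# Only print the commands here; run_pipeline would actually execute them.
print(build_pipeline_commands(datetime(2021, 7, 16)))
```

In an Airflow deployment, each command would typically become its own task, with the Feast task placed downstream of the dbt task so that a failed transformation blocks materialization.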

## 2. Set up the feature store

### Apply feature repository
We first run `feast apply` to register the data sources and features, and to set up Redis.

In [26]:
%env SNOWFLAKE_DEPLOYMENT_URL="[YOUR DEPLOYMENT]"
%env SNOWFLAKE_USER="[YOUR USER]"
%env SNOWFLAKE_PASSWORD="[YOUR PASSWORD]"
%env SNOWFLAKE_ROLE="[YOUR ROLE]"
%env SNOWFLAKE_WAREHOUSE="[YOUR WAREHOUSE]"
%env SNOWFLAKE_DATABASE="[YOUR DATABASE]"
%env USAGE=False

env: SNOWFLAKE_DEPLOYMENT_URL="[YOUR DEPLOYMENT]"
env: SNOWFLAKE_USER="[YOUR USER]"
env: SNOWFLAKE_PASSWORD="[YOUR PASSWORD]"
env: SNOWFLAKE_ROLE="[YOUR ROLE]"
env: SNOWFLAKE_WAREHOUSE="[YOUR WAREHOUSE]"
env: SNOWFLAKE_DATABASE="[YOUR DATABASE]"
env: USAGE=False


In [3]:
!feast apply

  CREDIT_SCORE
0          905
object
Created feature service model_v1

Deploying infrastructure for credit_scores_features
Deploying infrastructure for aggregate_transactions_features


In [4]:
from feast import FeatureStore
from datetime import datetime

store = FeatureStore(repo_path=".")



## 3. Fetch training data from the offline store
We fetch training data here just to verify that the features are present in the batch sources.

In [20]:
entity_sql = f"""
    SELECT
        NAMEORIG as USER_ID,
        TIMESTAMP as "event_timestamp"
    FROM {store.get_data_source("transactions_source").get_table_query_string()}
    WHERE TIMESTAMP BETWEEN '2021-07-14' and '2021-07-16'
"""
training_df = store.get_historical_features(
    entity_df=entity_sql,
    features=store.get_feature_service("model_v2"),
).to_df()
print(training_df.head(20))

        USER_ID            event_timestamp CREDIT_SCORE     7D_AVG_AMT
0    C249180629 2021-07-14 09:58:08.149794          645   50057.237500
1   C1280683177 2021-07-14 13:53:44.646282          678  509711.838571
2   C2110692114 2021-07-14 09:37:42.499745          627  107384.993333
3   C2028855118 2021-07-15 08:32:44.047911          694   91279.846000
4   C1098256092 2021-07-14 21:19:47.547929          653  203728.274286
5    C151864295 2021-07-15 12:02:10.888576          602  152704.650000
6    C453965153 2021-07-14 07:45:56.422663          608   55838.750000
7    C453965153 2021-07-14 08:30:18.698001          608   56424.637143
8   C1538941588 2021-07-14 04:46:01.147905          664   53359.743333
9   C2088453634 2021-07-14 12:32:55.807549          630  142781.994286
10   C938678606 2021-07-14 20:23:41.073121          735  200350.285714
11  C1090163421 2021-07-14 13:29:22.567975          708  140612.327143
12  C1664422545 2021-07-14 21:59:44.553418          649  190196.110000
13   C

## 4. Materialize batch features & fetch online features from Redis
First, we materialize features into the online store (Redis). Materialization loads the latest values for each entity key from the batch sources.

In [19]:
!feast materialize 2021-07-14 2021-07-16

Materializing 2 feature views from 2021-07-13 20:00:00-04:00 to 2021-07-15 20:00:00-04:00 into the redis online store.

credit_scores_features:
100%|████████████████████████████████████████████████████| 654482/654482 [00:23<00:00, 27441.97it/s]
aggregate_transactions_features:
100%|██████████████████████████████████████████████████████| 54991/54991 [00:02<00:00, 18921.10it/s]
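
To make "the latest values for each entity key" concrete, here is a small pure-Python sketch of that selection step. It is a conceptual illustration only, not Feast's implementation. The function name `latest_values_per_entity` and the sample rows (`user_a`, `user_b`) are made up for this example.

```python
from datetime import datetime
from typing import Any, Dict, List


def latest_values_per_entity(
    rows: List[Dict[str, Any]],
    start: datetime,
    end: datetime,
    entity_key: str = "USER_ID",
) -> Dict[str, Dict[str, Any]]:
    """Keep, for each entity, the newest row whose timestamp is in [start, end]."""
    latest: Dict[str, Dict[str, Any]] = {}
    for row in rows:
        ts = row["event_timestamp"]
        if not (start <= ts <= end):
            continue  # outside the materialization window
        key = row[entity_key]
        if key not in latest or ts > latest[key]["event_timestamp"]:
            latest[key] = row
    return latest


# Hypothetical batch rows, not taken from the workshop dataset.
rows = [
    {"USER_ID": "user_a", "event_timestamp": datetime(2021, 7, 14, 8), "CREDIT_SCORE": 600},
    {"USER_ID": "user_a", "event_timestamp": datetime(2021, 7, 15, 8), "CREDIT_SCORE": 610},
    {"USER_ID": "user_b", "event_timestamp": datetime(2021, 7, 14, 9), "CREDIT_SCORE": 700},
    {"USER_ID": "user_b", "event_timestamp": datetime(2021, 7, 17, 9), "CREDIT_SCORE": 720},
]
online = latest_values_per_entity(rows, datetime(2021, 7, 14), datetime(2021, 7, 16))
for user_id, row in sorted(online.items()):
    print(user_id, row["CREDIT_SCORE"])
```

The row for `user_b` on 2021-07-17 falls outside the window, so the online value stays at the older score until a later materialization run covers that date.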


Feast records in the registry which time intervals have already been materialized. So if you schedule materialization every hour, you can run `feast materialize-incremental` and Feast will know that the previous hours have already been processed.
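
The bookkeeping behind incremental materialization can be pictured with a toy tracker that remembers the end of the last materialized interval for each feature view. This is a simplified model for intuition, not Feast's registry code. The class name `MaterializationTracker` and the `default_lookback` fallback for the first run are assumptions of this sketch.

```python
from datetime import datetime, timedelta
from typing import Dict, Tuple


class MaterializationTracker:
    """Toy stand-in for the registry's record of materialized intervals."""

    def __init__(self, default_lookback: timedelta) -> None:
        # Assumption: how far back the first run reaches when nothing is recorded.
        self.default_lookback = default_lookback
        self.last_end: Dict[str, datetime] = {}

    def materialize_incremental(self, feature_view: str, end: datetime) -> Tuple[datetime, datetime]:
        """Return the interval to process and record its end for the next run."""
        start = self.last_end.get(feature_view, end - self.default_lookback)
        self.last_end[feature_view] = end
        return start, end


tracker = MaterializationTracker(default_lookback=timedelta(days=1))
first = tracker.materialize_incremental("credit_scores_features", datetime(2021, 7, 16, 10))
second = tracker.materialize_incremental("credit_scores_features", datetime(2021, 7, 16, 11))
print(first)
print(second)
```

The second call starts where the first one ended, which is why an hourly schedule only processes one new hour of data per run.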

#### SDK based online retrieval
Now we can retrieve these materialized features from Redis directly through the SDK. This is one of the most popular ways to retrieve features with Feast, since it lets you integrate with an existing service (e.g. a Flask app) that also handles model inference or pre/post-processing.

In [25]:
features = store.get_online_features(
    features=store.get_feature_service("model_v2"),
    entity_rows=[
        {
            "USER_ID": "C1835422371",
        }
    ],
).to_dict()

def print_online_features(features):
    for key, value in sorted(features.items()):
        print(key, " : ", value)

print_online_features(features)

7D_AVG_AMT  :  [298976.46875]
CREDIT_SCORE  :  [680]
USER_ID  :  ['C1835422371']
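
The dictionary returned by `to_dict()` is column-oriented: each feature name maps to a list with one value per requested entity. A service that runs inference usually wants one record per entity instead. The helper below is a minimal sketch of that reshaping. The name `to_entity_rows` is hypothetical, and the input simply mirrors the output printed above.

```python
from typing import Any, Dict, List


def to_entity_rows(features: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn {feature: [v0, v1, ...]} into [{feature: v0}, {feature: v1}, ...]."""
    names = sorted(features)
    columns = [features[name] for name in names]
    # zip(*columns) walks the columns in parallel, one tuple per entity.
    return [dict(zip(names, values)) for values in zip(*columns)]


# Same shape and values as the SDK output shown above.
example_features = {
    "7D_AVG_AMT": [298976.46875],
    "CREDIT_SCORE": [680],
    "USER_ID": ["C1835422371"],
}
print(to_entity_rows(example_features))
```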


#### HTTP based online retrieval
We can also retrieve features from a deployed feature server. We previously deployed this with Docker Compose (see [docker-compose.yml](../docker-compose.yml)).

This can be preferable for several reasons. If you want an in-memory cache, caching on a central feature server can be more effective because it is shared across teams. A central server also lets you manage rate limiting and access control in one place, upgrade Feast versions independently of the services that call it, and so on.

In [19]:
import requests
import json

online_request = {
  "feature_service": "model_v1",
  "entities": {
    "USER_ID": ["C1570470538"]
  }
}
r = requests.post('http://localhost:6566/get-online-features', data=json.dumps(online_request))
print(json.dumps(r.json(), indent=4, sort_keys=True))

{
    "metadata": {
        "feature_names": [
            "USER_ID",
            "CREDIT_SCORE"
        ]
    },
    "results": [
        {
            "event_timestamps": [
                "1970-01-01T00:00:00Z"
            ],
            "statuses": [
                "PRESENT"
            ],
            "values": [
                "C1570470538"
            ]
        },
        {
            "event_timestamps": [
                "2021-07-13T22:32:18Z"
            ],
            "statuses": [
                "PRESENT"
            ],
            "values": [
                570
            ]
        }
    ]
}
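
The response pairs `metadata.feature_names` with the entries of `results` by position, so it takes a little unpacking before the values can be used. The sketch below assumes exactly the response layout printed above, which may differ in other Feast versions. The function name `parse_online_response` is hypothetical, and mapping any status other than `PRESENT` to `None` is a choice made for this example.

```python
from typing import Any, Dict, List


def parse_online_response(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the feature server response into one dict per requested entity."""
    names = payload["metadata"]["feature_names"]
    results = payload["results"]
    num_rows = len(results[0]["values"]) if results else 0
    rows = []
    for i in range(num_rows):
        row = {}
        for name, column in zip(names, results):
            # Assumption: treat anything other than PRESENT as a missing value.
            present = column["statuses"][i] == "PRESENT"
            row[name] = column["values"][i] if present else None
        rows.append(row)
    return rows


# Same structure and values as the response shown above.
example_payload = {
    "metadata": {"feature_names": ["USER_ID", "CREDIT_SCORE"]},
    "results": [
        {
            "event_timestamps": ["1970-01-01T00:00:00Z"],
            "statuses": ["PRESENT"],
            "values": ["C1570470538"],
        },
        {
            "event_timestamps": ["2021-07-13T22:32:18Z"],
            "statuses": ["PRESENT"],
            "values": [570],
        },
    ],
}
print(parse_online_response(example_payload))
```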
