# Feast Batch Serving
This notebook extends the `feast-quickstart` notebook to demonstrate the batch serving capability of Feast.

## Prerequisite
- A running Feast Serving service whose store configuration supports batch retrieval (e.g., a BigQuery store).

## Data Preparation


In [1]:
import feast
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
from feast.feature_set import FeatureSet

In [2]:
client = feast.Client(core_url="core:6565", serving_url="batch-serving:6567")

In [3]:
cust_trans_fs = FeatureSet.from_yaml("../features/cust_trans_fs.yaml")

In [4]:
client.apply(cust_trans_fs)

Feature set updated/created: "customer_transactions:1".


In [5]:
offset = 10000
nr_of_customers = 5
customer_df = pd.DataFrame(
    {
        "datetime": [datetime.utcnow() for _ in range(nr_of_customers)],
        "customer_id": [offset + inc for inc in range(nr_of_customers)],
        "daily_transactions": [np.random.uniform(0, 10) for _ in range(nr_of_customers)],
        "total_transactions": [np.random.uniform(100, 200) for _ in range(nr_of_customers)],
    }
)
customer_df

| | datetime | customer_id | daily_transactions | total_transactions |
|---|---|---|---|---|
| 0 | 2019-12-06 02:17:46.899904 | 10000 | 2.797627 | 175.978266 |
| 1 | 2019-12-06 02:17:46.899915 | 10001 | 4.931632 | 153.871975 |
| 2 | 2019-12-06 02:17:46.899922 | 10002 | 0.206628 | 108.558844 |
| 3 | 2019-12-06 02:17:46.899929 | 10003 | 2.354937 | 119.549455 |
| 4 | 2019-12-06 02:17:46.899937 | 10004 | 7.171423 | 115.345183 |


In [6]:
client.ingest(cust_trans_fs, dataframe=customer_df)

100%|██████████| 5/5 [00:00<00:00,  7.24rows/s]


Ingested 5 rows into customer_transactions:1





## Batch Retrieval
Batch retrieval takes as input a dataframe containing the entity columns and an event timestamp column. The timestamp column must be named `datetime` in the input; it appears as `event_timestamp` in the result. The result is the outer join of the input and the features, so every input row appears in the output. If the difference between the feature timestamp and the event timestamp is greater than the `maxAge` parameter specified in the feature set, no feature values are returned for that row: its feature columns come back empty, as the second example below shows.
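The join behaviour described above can be approximated in plain pandas. The following is a minimal sketch, not Feast's actual implementation: it uses `pandas.merge_asof` to attach, for each entity row, the most recent feature row that is no older than a given age. The function name `point_in_time_join`, the one-day `max_age`, and the sample values are assumptions made for illustration only.

```python
import pandas as pd


def point_in_time_join(entity_df, feature_df, max_age):
    """Attach to each entity row the latest feature row that is not newer
    than the entity's event timestamp and not older than max_age.
    Rows without a match keep empty (NaN) feature columns."""
    left = entity_df.sort_values("event_timestamp")
    right = feature_df.sort_values("feature_timestamp")
    return pd.merge_asof(
        left,
        right,
        left_on="event_timestamp",
        right_on="feature_timestamp",
        by="customer_id",
        direction="backward",
        tolerance=max_age,
    )


# Hypothetical data loosely mirroring this notebook.
features = pd.DataFrame(
    {
        "feature_timestamp": pd.to_datetime(["2019-12-06 02:17:46"] * 2),
        "customer_id": [10000, 10001],
        "daily_transactions": [2.8, 4.9],
    }
)
fresh = pd.DataFrame(
    {
        "customer_id": [10000, 10001],
        "event_timestamp": pd.to_datetime(["2019-12-06 02:17:55"] * 2),
    }
)
# Same entities, queried 30 days later.
stale = fresh.assign(event_timestamp=fresh["event_timestamp"] + pd.Timedelta(days=30))

max_age = pd.Timedelta(days=1)  # assumed value, not taken from the feature set
fresh_result = point_in_time_join(fresh, features, max_age)
stale_result = point_in_time_join(stale, features, max_age)
```

In this sketch `fresh_result` carries feature values for both customers, while `stale_result` keeps both rows but leaves the feature columns empty, which mirrors the two retrievals shown below.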

In [7]:
# Request the features as of "now", a few seconds after ingestion.
entity_df = customer_df[["customer_id"]].assign(datetime=datetime.utcnow())
feature_ids = [
    "customer_transactions:1:daily_transactions",
    "customer_transactions:1:total_transactions",
]
batch_job = client.get_batch_features(feature_ids, entity_df)
batch_job.to_dataframe()

| | customer_transactions_v1_feature_timestamp | customer_id | event_timestamp | customer_transactions_v1_daily_transactions | customer_transactions_v1_total_transactions |
|---|---|---|---|---|---|
| 0 | 2019-12-06 02:17:46+00:00 | 10001 | 2019-12-06 02:17:55.612449+00:00 | 4.931632 | 153.87198 |
| 1 | 2019-12-06 02:17:46+00:00 | 10004 | 2019-12-06 02:17:55.612449+00:00 | 7.171423 | 115.345184 |
| 2 | 2019-12-06 02:17:46+00:00 | 10000 | 2019-12-06 02:17:55.612449+00:00 | 2.797627 | 175.97827 |
| 3 | 2019-12-06 02:17:46+00:00 | 10002 | 2019-12-06 02:17:55.612449+00:00 | 0.206628 | 108.558846 |
| 4 | 2019-12-06 02:17:46+00:00 | 10003 | 2019-12-06 02:17:55.612449+00:00 | 2.354937 | 119.54945 |


In [8]:
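# Request the features as of 30 days from now, which is expected to exceed the feature set's maxAge.
stale_entity_df = customer_df[["customer_id"]].assign(
    datetime=datetime.utcnow() + timedelta(days=30)
)
feature_ids = [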
    "customer_transactions:1:daily_transactions",
    "customer_transactions:1:total_transactions",
]
batch_job = client.get_batch_features(feature_ids, stale_entity_df)
batch_job.to_dataframe()

| | customer_transactions_v1_feature_timestamp | customer_id | event_timestamp | customer_transactions_v1_daily_transactions | customer_transactions_v1_total_transactions |
|---|---|---|---|---|---|
| 0 | | 10000 | 2020-01-05 02:18:43.900732+00:00 | | |
| 1 | | 10001 | 2020-01-05 02:18:43.900732+00:00 | | |
| 2 | | 10002 | 2020-01-05 02:18:43.900732+00:00 | | |
| 3 | | 10003 | 2020-01-05 02:18:43.900732+00:00 | | |
| 4 | | 10004 | 2020-01-05 02:18:43.900732+00:00 | | |
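
Because rows that fall outside `maxAge` come back with empty feature columns rather than being dropped, it can help to separate them before using the result downstream. Below is a minimal pandas sketch, assuming the result column names shown in the outputs above. `split_by_feature_availability` is a hypothetical helper, not part of the Feast client, and the toy dataframe stands in for `batch_job.to_dataframe()`.

```python
import pandas as pd

# Feature column names as they appear in the retrieval output above.
FEATURE_COLUMNS = [
    "customer_transactions_v1_daily_transactions",
    "customer_transactions_v1_total_transactions",
]


def split_by_feature_availability(result_df, feature_columns=FEATURE_COLUMNS):
    """Return (rows_with_features, rows_without_features).

    A row counts as "without features" when every feature column is empty.
    """
    missing = result_df[feature_columns].isna().all(axis=1)
    return result_df[~missing], result_df[missing]


# Toy frame standing in for batch_job.to_dataframe().
result_df = pd.DataFrame(
    {
        "customer_id": [10000, 10001],
        "customer_transactions_v1_daily_transactions": [2.797627, None],
        "customer_transactions_v1_total_transactions": [175.97827, None],
    }
)
with_features, without_features = split_by_feature_availability(result_df)
```

Applied to the stale retrieval above, all five rows would be expected to land in the second dataframe.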
