# Feast Basic CustomerTransactions Example

This is a minimal example of using Feast. In this example we will:
1. Create a synthetic customer feature dataset
1. Register a feature set to represent these features in Feast
1. Ingest these features into Feast
1. Create a feature query and retrieve historical feature data

### 1. Install Feast and dependencies

If you are using `gcr.io/konnekt-core/jupyter-feast`, it should already be available in the `feast-venv` kernel.

Otherwise, use the following command (ask in `#ask-konnekt` for PyPI credentials if you don't have any).

In [None]:
!pip install --quiet --upgrade pip pandas numpy protobuf \
  && pip install -i "https://pypi.prod.konnekt.us/simple/" --extra-index-url https://pypi.org/simple/ "feast==0.4.3.dev68+g1feb425.d20200108"

### 2. Import necessary modules

In [None]:
import pandas as pd
import numpy as np
from pytz import utc
from feast import Client, FeatureSet, Entity, ValueType
from google.protobuf.duration_pb2 import Duration
from datetime import datetime, timedelta
from random import randrange, randint
import os

### 3. Configure Feast services and connect the Feast client

If you have injected the environment variables into your notebook, you can use the following code. Otherwise, configure the URLs as necessary.

Note that we will not be using online serving for the time being, so only the core and batch serving URLs are passed to the client.

In [None]:
CORE_URL = os.getenv('FEAST_CORE_URL')
BATCH_SERVING_URL = os.getenv('FEAST_BATCH_SERVING_URL')
ONLINE_SERVING_URL = os.getenv('FEAST_ONLINE_SERVING_URL')
print(CORE_URL, BATCH_SERVING_URL, ONLINE_SERVING_URL)
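
If a variable has not been injected, `os.getenv` returns `None` and the problem only surfaces later, when the client tries to connect. The following is a minimal sketch of failing fast instead. The helper `require_env` is hypothetical (it is not part of Feast), and `localhost:6565` is a placeholder address, not a real deployment.

```python
import os


def require_env(name, default=None):
    """Return the value of environment variable `name`, or `default` if unset.

    Hypothetical helper, not part of Feast. Raises if no value is available.
    """
    value = os.getenv(name, default)
    if not value:
        raise RuntimeError(f"{name} is not set; export it or pass a default")
    return value


# "localhost:6565" is a placeholder used only when the variable is missing.
core_url = require_env("FEAST_CORE_URL", default="localhost:6565")
print(core_url)
```

The same helper could be used for `FEAST_BATCH_SERVING_URL`, so that a missing value is reported by name before the client is created.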

In [None]:
PROJECT = 'proj_' + str(randint(1000, 9999))
print(PROJECT)
client = Client(core_url=CORE_URL, serving_url=BATCH_SERVING_URL, project=PROJECT)
if PROJECT not in client.list_projects():
    client.create_project(PROJECT)

### 4. Create synthetic customer features

Here we will create customer features for 5 customers over the last 31 days. Each customer has one row of feature values for every day.

In [None]:
# Midnight (UTC) for today and for each of the previous 30 days.
today = datetime.now(utc).replace(hour=0, minute=0, second=0, microsecond=0)
days = [today - timedelta(days=day) for day in range(31)]

customers = [1001, 1002, 1003, 1004, 1005]

In [None]:
customer_features = pd.DataFrame(
    {
        "datetime": [day for day in days for customer in customers],
        "customer_id": [customer for day in days for customer in customers],
        "daily_transactions": [np.random.rand() * 10 for _ in range(len(days) * len(customers))],
        "total_transactions": [np.random.randint(100) for _ in range(len(days) * len(customers))],
    }
)

print(customer_features.head(10))
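
The comprehensions above iterate over days in the outer loop and customers in the inner loop, so every (day, customer) pair should appear exactly once. As a sanity check, here is a small self-contained sketch of the same pattern using `itertools.product`. The inputs are made up for illustration (3 days, 2 customers), and it uses the standard library's `timezone.utc` rather than `pytz`.

```python
from datetime import datetime, timedelta, timezone
from itertools import product

import pandas as pd

# Illustrative inputs only; the notebook itself uses 31 days and 5 customers.
start = datetime(2020, 1, 3, tzinfo=timezone.utc)
days = [start - timedelta(days=n) for n in range(3)]
customers = [1001, 1002]

# product(days, customers) yields pairs in the same order as
# `for day in days for customer in customers`.
pairs = pd.DataFrame(list(product(days, customers)), columns=["datetime", "customer_id"])

# Every (datetime, customer_id) pair should appear exactly once.
assert len(pairs) == len(days) * len(customers)
assert not pairs.duplicated(["datetime", "customer_id"]).any()
print(pairs)
```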

### 5. Create feature set for customer features

Now we will create a feature set for these features. Feature sets are essentially schemas that represent
feature values. They allow Feast to identify both feature values and their structure.

In this case we need to define the entity columns as well as the maximum age. The entity column in this case is `customer_id`. Max age is set to 1 day (defined in seconds). This means that during retrieval, for each provided timestamp and entity combination, the serving API will only return feature values that are at most 1 day older than the provided timestamp.

In [None]:
customer_fs = FeatureSet(
    "customer_transactions",
    max_age=Duration(seconds=86400),
    entities=[Entity(name='customer_id', dtype=ValueType.INT64)]
)
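
To make the max age rule concrete, the sketch below reproduces the same idea in plain pandas with `merge_asof`: each query row picks up the most recent feature row for the same customer, but only if that row is at most 1 day old. This illustrates the semantics only and is not how Feast implements retrieval. All values and timestamps are made up.

```python
import pandas as pd

# Made-up feature rows: one value per customer per timestamp.
features = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2020-01-01 00:00", "2020-01-02 00:00", "2020-01-01 00:00"], utc=True
        ),
        "customer_id": [1001, 1001, 1002],
        "daily_transactions": [3.0, 5.0, 7.0],
    }
)

# Made-up query rows: timestamp and entity combinations to look up.
queries = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2020-01-02 12:00", "2020-01-04 00:00", "2020-01-01 06:00"], utc=True
        ),
        "customer_id": [1001, 1001, 1002],
    }
)

# For each query row, take the latest feature row at or before the query
# timestamp for the same customer, ignoring rows older than 1 day.
joined = pd.merge_asof(
    queries.sort_values("datetime"),
    features.sort_values("datetime"),
    on="datetime",
    by="customer_id",
    direction="backward",
    tolerance=pd.Timedelta(days=1),
)
print(joined)
```

In this sketch the query for customer 1001 at `2020-01-04 00:00` gets a missing value, because the newest feature row for that customer is 2 days old. The other two queries are matched.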

Here we automatically infer the schema from the provided dataset.

In [None]:
customer_fs.infer_fields_from_df(customer_features, replace_existing_features=True)

### 6. Register feature set with Feast Core

The `apply()` method registers the provided feature set with Feast Core, allowing users to retrieve features from this feature set.

In [None]:
client.apply(customer_fs)

We then retrieve this feature set object (not its data) to ensure that we have the latest version.

In [None]:
customer_fs = client.get_feature_set("customer_transactions")
print(customer_fs)

### 7. Ingest data into Feast for a feature set

In [None]:
client.ingest("customer_transactions", customer_features)

### 8. Create a batch retrieval query

To retrieve historical feature data, the user must provide an `entity_rows` dataframe. This dataframe contains combinations of timestamps and entities. In this case, the user must provide both customer IDs and timestamps.

We will randomly generate 30 timestamps over the last 15 days and assign customer IDs to them. When these entity rows are sent to the Feast Serving API, along with a list of feature references, Feast attaches the correct feature values to each entity row. The one exception is when the feature values fall outside of the maximum age window, in which case no value is attached.

In [None]:
# 30 random timestamps, each up to 15 days before now (UTC).
event_timestamps = [
    datetime.now(utc)
    - timedelta(days=randrange(15), hours=randrange(24), minutes=randrange(60))
    for _ in range(30)
]

entity_rows = pd.DataFrame(
    {
        "datetime": event_timestamps,
        "customer_id": [customers[idx % len(customers)] for idx in range(len(event_timestamps))],
    }
)

print(entity_rows.head(10))
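
Because the timestamps are random and relative to the current time, each run of the cell above produces different entity rows. When reproducible rows are useful, for example while debugging a retrieval, a seeded generator and a fixed reference time can be used. The sketch below assumes a hypothetical helper `make_entity_rows` (not part of Feast) and uses the standard library's `timezone.utc` instead of `pytz`.

```python
from datetime import datetime, timedelta, timezone
from random import Random

import pandas as pd


def make_entity_rows(customers, n_rows, now, seed=0):
    """Hypothetical helper: build entity rows with seeded random timestamps."""
    rng = Random(seed)
    # Each timestamp lies less than 15 days before `now`.
    timestamps = [
        now - timedelta(days=rng.randrange(15), hours=rng.randrange(24), minutes=rng.randrange(60))
        for _ in range(n_rows)
    ]
    return pd.DataFrame(
        {
            "datetime": timestamps,
            # Cycle through the customers so each gets a similar number of rows.
            "customer_id": [customers[i % len(customers)] for i in range(n_rows)],
        }
    )


now = datetime(2020, 1, 31, tzinfo=timezone.utc)  # fixed reference time for the sketch
rows_a = make_entity_rows([1001, 1002, 1003], n_rows=6, now=now, seed=42)
rows_b = make_entity_rows([1001, 1002, 1003], n_rows=6, now=now, seed=42)
assert rows_a.equals(rows_b)  # same seed and reference time give the same rows
print(rows_a)
```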

### 9. Retrieve historical/batch features

In [None]:
job = client.get_batch_features(
    feature_refs=[
        f"{PROJECT}/daily_transactions:{customer_fs.version}",
        f"{PROJECT}/total_transactions:{customer_fs.version}",
    ],
    entity_rows=entity_rows,
    default_project=PROJECT,
)
df = job.to_dataframe()
print(df.head(10))