<center><img src=https://raw.githubusercontent.com/feast-dev/feast/master/docs/assets/feast_logo.png width=400/></center>

# Deploying the Feature Store

### Introduction

Feast enables AI/ML teams to serve (and consume) features via feature stores. In this notebook, we will configure the feature stores and feature definitions, and deploy a Feast feature store server. We will also materialize (move) data from the offline store to the online store.

In Feast, offline stores support pulling large amounts of data for model training using tools like Redshift, Snowflake, BigQuery, and Spark. In contrast, Feast online stores focus on feature serving in support of model inference, using tools like Redis, Snowflake, PostgreSQL, and SQLite.

In this notebook, we will set up a file-based (Dask) offline store and a SQLite online store. The online store will be made available through the Feast server.

This notebook assumes that you have prepared the data by running the notebook [01_Credit_Risk_Data_Prep.ipynb](01_Credit_Risk_Data_Prep.ipynb). 

### Setup

*The following code assumes that you have read the example README.md file, and that you have set up an environment where the code can be run. Please make sure you have addressed the prerequisites.*

In [1]:
# Imports
import re
import sys
import time
import signal
import sqlite3
import subprocess
import datetime as dt
from feast import FeatureStore

### Feast Feature Store Configuration

For model training, we usually don't need (or want) a constantly running feature server. All we need is the ability to efficiently query and pull all of the training data at training time. In contrast, during model serving we need servers that are always ready to supply feature records in response to application requests. 

This training-serving dichotomy is reflected in Feast using "offline" and "online" stores. Offline stores are configured to work with database technologies typically used for training, while online stores are configured to use storage and streaming technologies that are popular for feature serving.

We need to create a `feature_store.yaml` config file to tell Feast the structure we want in our offline and online feature stores. Below, we write the configuration for a local "Dask" offline store and a local SQLite online store. We give the feature store a project name of `loan_applications`, and provider `local`. The registry is where the feature store will keep track of feature definitions and online store updates; we choose a file location in this case.

See the [feature_store.yaml](https://docs.feast.dev/reference/feature-repository/feature-store-yaml) documentation for further details. 

In [2]:
%%writefile Feature_Store/feature_store.yaml

project: loan_applications
registry: data/registry.db
provider: local
offline_store:
    type: dask
online_store:
    type: sqlite
    path: data/online_store.db
entity_key_serialization_version: 2

Writing Feature_Store/feature_store.yaml
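
The relative paths in this file (`data/registry.db` and `data/online_store.db`) are interpreted relative to the feature repository directory, which is why later cells look for these files under `Feature_Store/data/`. A minimal sketch of that resolution using only `pathlib`; the helper name `resolve_repo_path` is hypothetical and not part of Feast:

```python
from pathlib import Path


def resolve_repo_path(repo_dir, configured):
    """Return `configured` unchanged if it is absolute, else join it to `repo_dir`."""
    path = Path(configured)
    return path if path.is_absolute() else Path(repo_dir) / path


# The two relative paths from feature_store.yaml, resolved against the repo directory
print(resolve_repo_path("Feature_Store", "data/registry.db"))
print(resolve_repo_path("Feature_Store", "data/online_store.db"))
```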


### Feature Definitions

We also need to create feature definitions and other feature constructs in a Python file, which we name `feature_definitions.py`. For our purposes, we define the following:

- Data Source: connections to data storage or data-producing endpoints
- Entity: primary key fields which can be used for joining data
- FeatureView: collections of features from a data source
- FeatureService: collection of FeatureViews

For more information on these, see the [Concepts](https://docs.feast.dev/getting-started/concepts) section of the Feast documentation.
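
To make the role of the entity concrete: because the feature views share the same join key, rows from different sources can be combined per key. The toy sketch below uses plain dictionaries with made-up values and a hypothetical helper, `join_on_key`. It only illustrates the idea and is not how Feast implements the join.

```python
def join_on_key(key, *tables):
    """Merge rows from several tables that share the same value of `key`."""
    merged = {}
    for table in tables:
        for row in table:
            # Start a new merged row the first time a key value is seen
            merged.setdefault(row[key], {key: row[key]}).update(row)
    return list(merged.values())


# Made-up rows standing in for two feature views keyed by "ID"
view_a = [{"ID": 1, "duration": 24.0}, {"ID": 2, "duration": 12.0}]
view_b = [{"ID": 1, "age": 36.0}, {"ID": 2, "age": 51.0}]

for row in join_on_key("ID", view_a, view_b):
    print(row)
```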

In [3]:
%%writefile Feature_Store/feature_definitions.py

# Imports
import os
from pathlib import Path
from feast import (
    FileSource,
    Entity,
    FeatureView,
    Field,
    FeatureService
)
from feast.types import Float32
from feast.data_format import ParquetFormat

CURRENT_DIR = os.path.abspath(os.curdir)

# Data Sources
# A data source tells Feast where the data lives
data_a = FileSource(
    file_format=ParquetFormat(),
    path=Path(CURRENT_DIR,"data/train_a.parquet").as_uri()
)
data_b = FileSource(
    file_format=ParquetFormat(),
    path=Path(CURRENT_DIR,"data/train_b.parquet").as_uri()
)
data_test = FileSource(
    file_format=ParquetFormat(),
    path=Path(CURRENT_DIR,"data/test.parquet").as_uri()
)   

# Entity
# An entity tells Feast the column it can use to join tables
loan_id = Entity(
    name="loan_id",
    join_keys=["ID"]
)

# Feature views
# A feature view is how Feast groups features
features_a = FeatureView(
    name="data_a",
    entities=[loan_id],
    schema=[
        Field(name="duration", dtype=Float32),
        Field(name="credit_amount", dtype=Float32),
        Field(name="installment_commitment", dtype=Float32),
        Field(name="checking_status_ord", dtype=Float32)
    ],
    source=data_a
)
features_b = FeatureView(
    name="data_b",
    entities=[loan_id],
    schema=[
        Field(name="residence_since", dtype=Float32),
        Field(name="age", dtype=Float32),
        Field(name="existing_credits", dtype=Float32),
        Field(name="num_dependents", dtype=Float32),
        Field(name="housing_ord", dtype=Float32)
    ],
    source=data_b
)
features_test = FeatureView(
    name="data_test",
    entities=[loan_id],
    schema=[
        Field(name="duration", dtype=Float32),
        Field(name="credit_amount", dtype=Float32),
        Field(name="installment_commitment", dtype=Float32),
        Field(name="checking_status_ord", dtype=Float32),
        Field(name="residence_since", dtype=Float32),
        Field(name="age", dtype=Float32),
        Field(name="existing_credits", dtype=Float32),
        Field(name="num_dependents", dtype=Float32),
        Field(name="housing_ord", dtype=Float32)
    ],
    source=data_test    
)

# Feature Service
# A feature service in Feast represents a logical group of features
loan_fs = FeatureService(
    name="loan_fs",
    features=[features_a, features_b]
)

Writing Feature_Store/feature_definitions.py


### Applying the Configuration and Definitions

Now that we have our feature store configuration (`feature_store.yaml`) and feature definitions (`feature_definitions.py`), we are ready to "apply" them. The `feast apply` command creates a registry file (`Feature_Store/data/registry.db`) and sets up data connections; in this case, it creates a SQLite database (`Feature_Store/data/online_store.db`).

In [4]:
# Run 'feast apply' in the Feature_Store directory
!feast --chdir ./Feature_Store apply

Created entity loan_id
Created feature view data_b
Created feature view data_test
Created feature view data_a
Created feature service loan_fs

Created sqlite table loan_applications_data_a
Created sqlite table loan_applications_data_b
Created sqlite table loan_applications_data_test



In [5]:
# List the Feature_Store/data/ directory to see newly created files
!ls -nlh Feature_Store/data/

total 256
-rw-r--r--  1 501  20    40K Oct 19 09:21 online_store.db
-rw-r--r--  1 501  20   3.4K Oct 19 09:21 registry.db
-rw-r--r--  1 501  20    13K Oct 18 17:41 test.parquet
-rw-r--r--  1 501  20   5.8K Oct 18 17:41 test_y.parquet
-rw-r--r--  1 501  20    22K Oct 18 17:41 train_a.parquet
-rw-r--r--  1 501  20    19K Oct 18 17:41 train_b.parquet
-rw-r--r--  1 501  20    14K Oct 18 17:41 train_y.parquet


Note that while `feast apply` set up the SQLite online database, `online_store.db`, no data has been added to it yet. We can verify this by connecting with the `sqlite3` library.

In [6]:
# Connect to sqlite database
conn = sqlite3.connect("Feature_Store/data/online_store.db")
cursor = conn.cursor()
# Query table data (3 tables)
print(
    "Online Store Tables:           ",
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall()
)
print(
    "loan_applications_data_a data: ",
    cursor.execute("SELECT * FROM loan_applications_data_a").fetchall()
)
print(
    "loan_applications_data_b data: ",
    cursor.execute("SELECT * FROM loan_applications_data_b").fetchall()
)
conn.close()

Online Store Tables:            [('loan_applications_data_a',), ('loan_applications_data_b',), ('loan_applications_data_test',)]
loan_applications_data_a data:  []
loan_applications_data_b data:  []
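
The same check can be written more compactly by counting rows per table instead of fetching them. A minimal sketch; the helper name `table_row_counts` is hypothetical, and it takes an open `sqlite3` connection so that it works on any SQLite database:

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every table in the connected database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )
    ]
    # Table names are read from sqlite_master itself and quoted as identifiers
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Usage against the online store created above (left commented so the sketch
# runs without the file being present):
# conn = sqlite3.connect("Feature_Store/data/online_store.db")
# print(table_row_counts(conn))
# conn.close()
```

Before materialization, we would expect every count to be zero.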


Since we have used `feast apply` to create the registry, we can now use the Feast Python SDK to interact with our new feature store. To see other possible commands, see the [Feast Python SDK documentation](https://rtd.feast.dev/en/master/).

In [7]:
# Get feature store config
store = FeatureStore(repo_path="./Feature_Store")
store.config

RepoConfig(project='loan_applications', provider='local', registry_config='data/registry.db', online_config={'type': 'sqlite', 'path': 'data/online_store.db'}, offline_config={'type': 'dask'}, batch_engine_config='local', feature_server=None, flags=None, repo_path=PosixPath('Feature_Store'), entity_key_serialization_version=2, coerce_tz_aware=True)

In [8]:
# List feature views
feature_views = store.list_batch_feature_views()
for fv in feature_views:
    print(f"Feature view: {fv.name}  |  Features: {fv.features}")

Feature view: data_b  |  Features: [residence_since-Float32, age-Float32, existing_credits-Float32, num_dependents-Float32, housing_ord-Float32]
Feature view: data_test  |  Features: [duration-Float32, credit_amount-Float32, installment_commitment-Float32, checking_status_ord-Float32, residence_since-Float32, age-Float32, existing_credits-Float32, num_dependents-Float32, housing_ord-Float32]
Feature view: data_a  |  Features: [duration-Float32, credit_amount-Float32, installment_commitment-Float32, checking_status_ord-Float32]


### Deploying the Feature Store Servers

If you wish to share a feature store with your team, Feast provides feature servers. To spin up an offline feature server process, we can use the `feast serve_offline` command, while to spin up a Feast online feature server, we use the `feast serve` command.

Let's spin up an offline and an online server that we can use in the subsequent notebooks to get features during model training and model serving. We will run both servers as background processes that we can communicate with from the other notebooks.

First, we write a helper function to extract the first few printed log lines (so we can print them in the notebook cell output).

In [9]:
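# Timeout exception (named so it does not shadow the built-in TimeoutError)
class ReadTimeout(Exception):
    pass

# SIGALRM handler; signal handlers are called with (signum, frame)
def timeout(signum, frame):
    raise ReadTimeout("timeout")

# Get first few log lines function
def print_first_proc_lines(proc, wait):
    '''Given a process, `proc`, read and print output lines until they stop 
    coming (waiting up to `wait` seconds for new lines to appear)'''
    lines = ""
    signal.signal(signal.SIGALRM, timeout)
    try:
        while True:
            signal.alarm(wait)  # (re)start the countdown for the next line
            line = proc.stderr.readline()
            if not line:  # EOF: the pipe was closed
                break
            lines += line
    except ReadTimeout:
        pass  # no new line arrived within `wait` seconds
    finally:
        signal.alarm(0)  # cancel any pending alarm
    if lines:
        print(lines, file=sys.stderr)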

Launch the offline server with the command `feast --chdir ./Feature_Store serve_offline`.

In [10]:
# Feast offline server process
offline_server_proc = subprocess.Popen(
    "feast --chdir ./Feature_Store serve_offline 2>&2 & echo $! > server_proc.txt",
    shell=True,
    text=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    bufsize=0
)
print_first_proc_lines(offline_server_proc, 2)

The tail end of the command above, `2>&2 & echo $! > server_proc.txt`, runs the server in the background (`&`) and writes its PID (`$!`) to the file `server_proc.txt` (we will use this in the cleanup notebook, [05_Credit_Risk_Cleanup.ipynb](05_Credit_Risk_Cleanup.ipynb)). The `2>&2` redirection leaves stderr where it is, so log messages go to the pipe that `print_first_proc_lines` reads (in the offline case there are none).

Next, launch the online server with the command `feast --chdir ./Feature_Store serve`.

In [11]:
# Feast online server (master and worker) processes
online_server_proc = subprocess.Popen(
    "feast --chdir ./Feature_Store serve 2>&2 & echo $! >> server_proc.txt",
    shell=True,
    text=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    bufsize=0
)
print_first_proc_lines(online_server_proc, 3)

For more details, see https://github.com/Kludex/uvicorn-worker.
[2024-10-19 09:21:25 -0600] [74004] [INFO] Starting gunicorn 23.0.0
[2024-10-19 09:21:25 -0600] [74004] [INFO] Listening at: http://127.0.0.1:6566 (74004)
[2024-10-19 09:21:25 -0600] [74004] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2024-10-19 09:21:25 -0600] [74030] [INFO] Booting worker with pid: 74030
[2024-10-19 09:21:25 -0600] [74030] [INFO] Started server process [74030]
[2024-10-19 09:21:25 -0600] [74030] [INFO] Waiting for application startup.
[2024-10-19 09:21:25 -0600] [74030] [INFO] Application startup complete.



Note that the output helpfully lets us know that the online server is "Listening at: http://127.0.0.1:6566" (the default host:port).
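
If we want to confirm programmatically that something is accepting connections on that address, a plain TCP connection attempt is enough. A minimal sketch using the standard library; the helper name `port_is_open` is hypothetical, and a successful connection only shows that the port is open, not that the Feast server is healthy:

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


print("online server reachable:", port_is_open("127.0.0.1", 6566))
```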

List the running processes to verify they are up.

In [12]:
# List running Feast processes (paths redacted)
running_procs = !ps -ef | grep feast | grep serve

for line in running_procs:
    redacted = re.sub(r'/*[^\s]*(?P<cmd>(python )|(feast ))', r'**/\g<cmd>', line)
    print(redacted)

  501 74000     1   0  9:21AM ??         0:03.61 **/python **/feast --chdir ./Feature_Store serve_offline
  501 74004     1   0  9:21AM ??         0:03.65 **/python **/feast --chdir ./Feature_Store serve
  501 74030 74004   0  9:21AM ??         0:00.03 **/python **/feast --chdir ./Feature_Store serve
  501 74045 73952   0  9:21AM ??         0:00.01 /bin/zsh -c ps -ef | grep **/feast | grep serve


Note that there are two processes for the online server (master and worker).
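
The PIDs recorded in `server_proc.txt` give us another way to check on the servers without parsing `ps` output. A minimal sketch for POSIX systems, assuming the file holds one PID per line as written by the `echo $!` commands above; the helper names are hypothetical:

```python
import os
from pathlib import Path


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_pids(path):
    """Read whitespace-separated integer PIDs; return [] if the file is missing."""
    pid_file = Path(path)
    if not pid_file.exists():
        return []
    return [int(token) for token in pid_file.read_text().split()]


for pid in read_pids("server_proc.txt"):
    print(pid, "running" if pid_is_running(pid) else "not running")
```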

### Materialize Features to the Online Store

At this point, there is no data in the online store yet. Let's use the SDK feature store object (that we created above) to "materialize" data; this is Feast lingo for moving/updating data from the offline store to the online store.
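
Conceptually, materialization over a time window keeps the most recent feature values per entity key and writes them to the online store. The toy sketch below pictures that with made-up rows and a hypothetical helper, `materialize_latest`; it is a simplified mental model, not Feast's implementation.

```python
import datetime as dt


def materialize_latest(rows, start, end):
    """Keep, per entity key, the newest row whose timestamp lies in [start, end]."""
    latest = {}
    for key, ts, features in rows:
        if start <= ts <= end and (key not in latest or ts > latest[key][0]):
            latest[key] = (ts, features)
    # Drop the timestamps; the "online store" maps key -> feature values
    return {key: features for key, (ts, features) in latest.items()}


# Made-up offline rows: (entity key, event timestamp, feature values)
offline_rows = [
    (18, dt.datetime(2023, 9, 25), {"duration": 24.0}),
    (18, dt.datetime(2023, 9, 30), {"duration": 30.0}),
    (764, dt.datetime(2022, 1, 1), {"duration": 12.0}),  # outside the window
]

online = materialize_latest(
    offline_rows, dt.datetime(2023, 9, 24), dt.datetime(2024, 1, 7)
)
print(online)  # {18: {'duration': 30.0}}
```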

In [13]:
# Materialize
# Recall that we mocked the outcome data to have timestamps from 
# 2023-09-24 12:00:00 out to 2023-10-09 12:00:00.
# The loan outcome timestamps were then lagged by 30-90 days (so the latest is 2024-01-07 12:00:00)
res = store.materialize(
    start_date=dt.datetime(2023,9,24,12,0,0),
    end_date=dt.datetime(2024,1,7,12,0,0)
)



Materializing 3 feature views from 2023-09-24 12:00:00-06:00 to 2024-01-07 12:00:00-07:00 into the sqlite online store.

data_b:

100%|██████████████████████████████████████████████████████████| 800/800 [00:00<00:00, 25119.35it/s]

data_test:

100%|██████████████████████████████████████████████████████████| 200/200 [00:00<00:00, 15921.29it/s]

data_a:


100%|██████████████████████████████████████████████████████████| 800/800 [00:00<00:00, 36734.14it/s]


Now, we can query the SQLite database again and see data in the response!

In [14]:
# Query the online store database to verify materialized data
conn = sqlite3.connect("Feature_Store/data/online_store.db")
cursor = conn.cursor()
print(
    "loan_applications_data_a data: ",
    cursor.execute("SELECT * FROM loan_applications_data_a LIMIT 2").fetchall()
)
print(
    "loan_applications_data_b data: ",
    cursor.execute("SELECT * FROM loan_applications_data_b LIMIT 2").fetchall()
)
conn.close()

loan_applications_data_a data:  [(b'\x02\x00\x00\x00ID\x04\x00\x00\x00\x08\x00\x00\x00\xa4\x00\x00\x00\x00\x00\x00\x00', 'duration', b'5\x00\x00\x10B', None, '2023-09-24 12:00:43', None), (b'\x02\x00\x00\x00ID\x04\x00\x00\x00\x08\x00\x00\x00\xa4\x00\x00\x00\x00\x00\x00\x00', 'credit_amount', b'5\x00@cD', None, '2023-09-24 12:00:43', None)]
loan_applications_data_b data:  [(b'\x02\x00\x00\x00ID\x04\x00\x00\x00\x08\x00\x00\x00\xa4\x00\x00\x00\x00\x00\x00\x00', 'residence_since', b'5\x00\x00\x80@', None, '2023-09-24 12:00:43', None), (b'\x02\x00\x00\x00ID\x04\x00\x00\x00\x08\x00\x00\x00\xa4\x00\x00\x00\x00\x00\x00\x00', 'age', b'5\x00\x00\x10B', None, '2023-09-24 12:00:43', None)]


Note that the data is stored as serialized bytes, which is part of Feast's optimization for online queries. To get human-readable data, use the `get-online-features` REST API endpoint, which returns a JSON response.
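
As an aside, the bytes printed above can be unpacked by hand. The sketch below assumes that each value blob is a one-byte tag (`0x35`) followed by a little-endian 32-bit float, and that the entity key ends with the ID as an 8-byte little-endian integer. Both assumptions are read off the printed rows rather than taken from a documented format guarantee, and the helper names are hypothetical.

```python
import struct


def decode_float_value(blob):
    """Decode a blob made of the tag byte 0x35 followed by a little-endian float32."""
    tag, payload = blob[0], blob[1:]
    if tag != 0x35 or len(payload) != 4:
        raise ValueError("not a single float32 field with tag 0x35")
    return struct.unpack("<f", payload)[0]


def decode_entity_id(key):
    """Read the trailing 8 bytes of the entity key as a little-endian integer."""
    return int.from_bytes(key[-8:], "little")


# Bytes copied from the first rows printed above
key = b"\x02\x00\x00\x00ID\x04\x00\x00\x00\x08\x00\x00\x00\xa4\x00\x00\x00\x00\x00\x00\x00"
print(decode_entity_id(key))                   # 164
print(decode_float_value(b"5\x00\x00\x10B"))   # 36.0 (duration)
print(decode_float_value(b"5\x00@cD"))         # 909.0 (credit_amount)
```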

In [15]:
# curl command to online server to get data from the online store
cmd = """http://localhost:6566/get-online-features \
    -d '{ 
            "feature_service": "loan_fs",
            "entities": {"ID": [18, 764]}
        }'
"""

response = !curl -X POST {cmd}

In [16]:
response

['  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current',
 '                                 Dload  Upload   Total   Spent    Left  Speed',
 '',
 '  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0',
 '100  1513  100  1417  100    96   138k   9576 --:--:-- --:--:-- --:--:--  147k',
 '{"metadata":{"feature_names":["ID","credit_amount","installment_commitment","duration","checking_status_ord","existing_credits","num_dependents","housing_ord","residence_since","age"]},"results":[{"values":[18,764],"statuses":["PRESENT","PRESENT"],"event_timestamps":["1970-01-01T00:00:00Z","1970-01-01T00:00:00Z"]},{"values":[12579.0,2463.0],"statuses":["PRESENT","PRESENT"],"event_timestamps":["2023-09-25T03:43:38Z","2023-09-26T14:23:53Z"]},{"values":[4.0,4.0],"statuses":["PRESENT","PRESENT"],"event_timestamps":["2023-09-25T03:43:38Z","2023-09-26T14:23:53Z"]},{"values":[24.0,24.0],"statuses":["PRESENT","PRESENT"],"event_timestamps":["2023-09-25T03:43:38Z

The `curl` command gave us a quick validation. In the [04_Credit_Risk_Model_Serving.ipynb](04_Credit_Risk_Model_Serving.ipynb) notebook, we'll use the Python `requests` library to handle the query better.
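
The same call can also be made from Python. Below is a minimal sketch that uses only the standard library (`urllib` rather than `requests`). The helper names are hypothetical, and the response shape is taken from the `curl` output above. The function that pivots the response assumes every entry in `results` lines up with the corresponding name in `feature_names`.

```python
import json
import urllib.request


def build_request(feature_service, ids, base_url="http://localhost:6566"):
    """Build (but do not send) the POST request for /get-online-features."""
    payload = {"feature_service": feature_service, "entities": {"ID": ids}}
    return urllib.request.Request(
        f"{base_url}/get-online-features",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_from_response(body):
    """Pivot the column-oriented response into one dict per entity."""
    names = body["metadata"]["feature_names"]
    columns = [result["values"] for result in body["results"]]
    return [dict(zip(names, values)) for values in zip(*columns)]


# Trimmed sample with the same shape as the response shown above
sample = {
    "metadata": {"feature_names": ["ID", "credit_amount"]},
    "results": [{"values": [18, 764]}, {"values": [12579.0, 2463.0]}],
}
print(rows_from_response(sample))

# To send the request to the running server:
# with urllib.request.urlopen(build_request("loan_fs", [18, 764])) as resp:
#     print(rows_from_response(json.load(resp)))
```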

Now that the feature stores and their respective servers have been configured and deployed, we can proceed to train an AI model in [03_Credit_Risk_Model_Training.ipynb](03_Credit_Risk_Model_Training.ipynb).