## Installing Feast
Feast is a Python package, so we install it with `pip`.

In [1]:
# Keep the Python version consistent between this notebook and the Feast servers.
# Launch this notebook from a clean Python environment (3.9 or later).
%pip install feast==0.40.1


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.2.2[0m[39;49m -> [0m[32;49m24.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.
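
Because the notebook and the Feast servers need to run on consistent versions, it can be useful to print what the kernel actually sees after the install. The following is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper name, not part of Feast.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the interpreter version and the Feast version visible to this kernel.
print("Python:", ".".join(str(part) for part in sys.version_info[:3]))
print("feast :", installed_version("feast"))
```

If the second line prints `None`, the install most likely went into a different environment than the one the kernel is using.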


In [2]:
# Display the current directory so that we know where the Feast files will be created and can review them using the Jupyter console or file explorer.
%pwd

'/opt/app-root/src/feast/examples/rhoai-quickstart'

In [3]:
# Create the Feast repository. If one already exists, remove it first.
!rm -rf my_feast_project
!feast init my_feast_project


Creating a new Feast repository in /opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project.



The output above shows where the Feast repository was created. The location depends on your environment configuration; in this example it is `/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project`.

In [4]:
# Change the current directory to feature_repo so that we can run Feast CLI commands.
%cd my_feast_project/feature_repo

/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project/feature_repo




In [5]:
# Inspect the files in the Feast repository. The purpose of each file and folder is described below.
!ls -lah

total 36K
drwxr-xr-x. 4 1001130000 1001130000 4.0K Sep 24 03:07 .
drwxr-xr-x. 3 1001130000 1001130000 4.0K Sep 24 03:07 ..
drwxr-xr-x. 2 1001130000 root       4.0K Sep 24 03:07 data
-rw-r--r--. 1 1001130000 1001130000 5.2K Sep 24 03:07 example_repo.py
-rw-r--r--. 1 1001130000 1001130000  372 Sep 24 03:07 feature_store.yaml
-rw-r--r--. 1 1001130000 1001130000    0 Sep 23 16:09 __init__.py
drwxr-xr-x. 2 1001130000 1001130000 4.0K Sep 23 16:09 __pycache__
-rw-r--r--. 1 1001130000 1001130000 4.3K Sep 23 16:09 test_workflow.py


The Feast repository has now been created. Inspect the files below.

`data` contains the Parquet data used in this example.

`example_repo.py` contains the code that defines the Feast objects used in this example, such as feature views, feature services, and on-demand feature views.
[my_feast_project/feature_repo/example_repo.py](./my_feast_project/feature_repo/example_repo.py)

`feature_store.yaml` contains the Feast configuration.
[my_feast_project/feature_repo/feature_store.yaml](./my_feast_project/feature_repo/feature_store.yaml)

`test_workflow.py` contains Python code that runs all the key Feast commands, including defining, retrieving, and pushing features.
[my_feast_project/feature_repo/test_workflow.py](./my_feast_project/feature_repo/test_workflow.py)

In [6]:
more feature_store.yaml

project: my_feast_project
# By default, the registry is a file (but can be turned into a more scalable SQL-backed registry)
registry: data/registry.db
# The provider primarily specifies default offline / online stores & storing the registry in a given cloud
provider: local
online_store:
    type: sqlite
    path: data/online_store.db
entity_key_serialization_version: 2


The default feature store configuration uses a local file (`data/registry.db`) as the registry and SQLite as the online store.

In [7]:
import pandas as pd
pd.read_parquet("data/driver_stats.parquet")

Unnamed: 0,event_timestamp,driver_id,conv_rate,acc_rate,avg_daily_trips,created
0,2024-09-09 03:00:00+00:00,1005,0.136662,0.043697,22,2024-09-24 03:07:44.487
1,2024-09-09 04:00:00+00:00,1005,0.027620,0.808205,953,2024-09-24 03:07:44.487
2,2024-09-09 05:00:00+00:00,1005,0.523759,0.807032,739,2024-09-24 03:07:44.487
3,2024-09-09 06:00:00+00:00,1005,0.795781,0.674151,743,2024-09-24 03:07:44.487
4,2024-09-09 07:00:00+00:00,1005,0.075056,0.902140,724,2024-09-24 03:07:44.487
...,...,...,...,...,...,...
1802,2024-09-24 01:00:00+00:00,1001,0.163166,0.803832,648,2024-09-24 03:07:44.487
1803,2024-09-24 02:00:00+00:00,1001,0.102868,0.695487,361,2024-09-24 03:07:44.487
1804,2021-04-12 07:00:00+00:00,1001,0.039615,0.870356,271,2024-09-24 03:07:44.487
1805,2024-09-16 15:00:00+00:00,1003,0.506131,0.342303,799,2024-09-24 03:07:44.487


No Feast objects have been created yet. To create them, run `feast apply` in the directory that contains `feature_store.yaml`. Let's do that now.

In [8]:
# The .ipynb_checkpoints folder interferes with `feast apply`, so delete it if it exists.
!rm -rf .ipynb_checkpoints/

In [9]:
# This command creates the Feast objects defined in `example_repo.py`.
!feast apply

Created entity driver
Created feature view driver_hourly_stats
Created feature view driver_hourly_stats_fresh
Created on demand feature view transformed_conv_rate
Created on demand feature view transformed_conv_rate_fresh
Created feature service driver_activity_v2
Created feature service driver_activity_v1
Created feature service driver_activity_v3

Created sqlite table my_feast_project_driver_hourly_stats_fresh
Created sqlite table my_feast_project_driver_hourly_stats
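
The last two lines of the output report SQLite tables created for the online store. As a quick sanity check, the tables can be listed with Python's built-in `sqlite3` module. This is a sketch under the assumption that the online store lives at `data/online_store.db`, as configured in `feature_store.yaml`; `list_sqlite_tables` is a hypothetical helper name.

```python
import os
import sqlite3
from typing import List


def list_sqlite_tables(db_path: str) -> List[str]:
    """Return the names of all tables in the SQLite database at `db_path`."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Only inspect the file if it exists; sqlite3.connect would otherwise create it.
online_store_path = "data/online_store.db"  # path taken from feature_store.yaml
if os.path.exists(online_store_path):
    print(list_sqlite_tables(online_store_path))
```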



## Generating the training data

To train a model, we need features and labels. Often, this label data is stored separately (e.g. you have one table storing user survey results and another set of tables with feature values). Feast can help generate the features that map to these labels.

In [10]:
from datetime import datetime
import pandas as pd

from feast import FeatureStore

# Note: see https://docs.feast.dev/getting-started/concepts/feature-retrieval for 
# more details on how to retrieve for all entities in the offline store instead
entity_df = pd.DataFrame.from_dict(
    {
        # entity's join key -> entity values
        "driver_id": [1001, 1002, 1003],
        # "event_timestamp" (reserved key) -> timestamps
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
        ],
        # (optional) label name -> label values. Feast does not process these
        "label_driver_reported_satisfaction": [1, 5, 3],
        # values we're using for an on-demand transformation
        "val_to_add": [1, 2, 3],
        "val_to_add_2": [10, 20, 30],
    }
)

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
        "transformed_conv_rate:conv_rate_plus_val1",
        "transformed_conv_rate:conv_rate_plus_val2",
    ],
).to_df()

print("----- Feature schema -----\n")
print(training_df.info())

print()
print("----- Example features -----\n")
print(training_df.head())



----- Feature schema -----

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 10 columns):
 #   Column                              Non-Null Count  Dtype              
---  ------                              --------------  -----              
 0   driver_id                           3 non-null      int64              
 1   event_timestamp                     3 non-null      datetime64[ns, UTC]
 2   label_driver_reported_satisfaction  3 non-null      int64              
 3   val_to_add                          3 non-null      int64              
 4   val_to_add_2                        3 non-null      int64              
 5   conv_rate                           3 non-null      float32            
 6   acc_rate                            3 non-null      float32            
 7   avg_daily_trips                     3 non-null      int32              
 8   conv_rate_plus_val1                 3 non-null      float64            
 9   conv_rate_plus_val2
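
Conceptually, `get_historical_features` joins each entity row with the feature values that were valid at that row's `event_timestamp`. The sketch below illustrates the idea in plain Python. It is a simplified illustration, not Feast's implementation: the helper name `point_in_time_lookup` and the sample rows are made up, and the one-day TTL mirrors the `ttl = 1 day` reported for `driver_hourly_stats` in the registry listing later in this notebook.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def point_in_time_lookup(
    feature_rows: List[Dict],
    driver_id: int,
    as_of: datetime,
    ttl: timedelta,
) -> Optional[Dict]:
    """Return the newest row for `driver_id` whose timestamp lies in [as_of - ttl, as_of]."""
    candidates = [
        row
        for row in feature_rows
        if row["driver_id"] == driver_id
        and as_of - ttl <= row["event_timestamp"] <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row["event_timestamp"])


# Made-up feature rows, for illustration only.
rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 7, 0), "conv_rate": 0.10},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 10, 0), "conv_rate": 0.20},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 12, 0), "conv_rate": 0.30},
]

match = point_in_time_lookup(
    rows, 1001, datetime(2021, 4, 12, 10, 59, 42), timedelta(days=1)
)
# The 12:00 row lies in the future relative to the entity timestamp,
# so the 10:00 row is the newest valid one.
print(match["conv_rate"])
```

Excluding rows newer than the entity timestamp is what prevents future information from leaking into the training data.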

## Run offline inference (batch scoring)
To power a batch model, we primarily need to generate features with the `get_historical_features` call, but using the current timestamp.

In [11]:
entity_df["event_timestamp"] = pd.to_datetime("now", utc=True)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
        "transformed_conv_rate:conv_rate_plus_val1",
        "transformed_conv_rate:conv_rate_plus_val2",
    ],
).to_df()

print("\n----- Example features -----\n")
print(training_df.head())




----- Example features -----

   driver_id                  event_timestamp  \
0       1001 2024-09-24 03:08:48.536177+00:00   
1       1002 2024-09-24 03:08:48.536177+00:00   
2       1003 2024-09-24 03:08:48.536177+00:00   

   label_driver_reported_satisfaction  val_to_add  val_to_add_2  conv_rate  \
0                                   1           1            10   0.102868   
1                                   5           2            20   0.691400   
2                                   3           3            30   0.914943   

   acc_rate  avg_daily_trips  conv_rate_plus_val1  conv_rate_plus_val2  
0  0.695487              361             1.102868            10.102868  
1  0.798216               18             2.691400            20.691400  
2  0.274195              368             3.914943            30.914943  
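
The two `conv_rate_plus_val*` columns come from the `transformed_conv_rate` on-demand feature view. Judging from the output above, each is `conv_rate` plus one of the request-time values. The sketch below reproduces that arithmetic for a single row in plain Python; it is an illustration inferred from the output, not the code in `example_repo.py`.

```python
from typing import Dict


def transformed_conv_rate(row: Dict[str, float]) -> Dict[str, float]:
    """Compute the two derived columns for one row of input values."""
    return {
        "conv_rate_plus_val1": row["conv_rate"] + row["val_to_add"],
        "conv_rate_plus_val2": row["conv_rate"] + row["val_to_add_2"],
    }


# Input values taken from the first row of the output above.
first_row = {"conv_rate": 0.102868, "val_to_add": 1, "val_to_add_2": 10}
print(transformed_conv_rate(first_row))
```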


## Ingest batch features into your online store

In [12]:
!feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

Materializing 2 feature views to 2024-09-24 03:09:03+00:00 into the sqlite online store.

driver_hourly_stats from 2024-09-23 03:09:06+00:00 to 2024-09-24 03:09:03+00:00:
100%|████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 677.51it/s]
driver_hourly_stats_fresh from 2024-09-23 03:09:06+00:00 to 2024-09-24 03:09:03+00:00:
100%|████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 885.40it/s]
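
The same step can be driven from Python instead of the shell. The sketch below shows how the timestamp argument could be built and how the SDK's `materialize_incremental` method might be called. The helper names `utc_timestamp` and `materialize_until_now` are hypothetical, and passing a timezone-aware `end_date` is an assumption worth checking against the Feast version in use.

```python
from datetime import datetime, timezone


def utc_timestamp(now: datetime) -> str:
    """Format a timezone-aware datetime like `date -u +"%Y-%m-%dT%H:%M:%S"`."""
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


def materialize_until_now(repo_path: str = ".") -> None:
    """Load features up to the current time through the Feast Python SDK."""
    # Imported lazily so that utc_timestamp can be used without Feast installed.
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    store.materialize_incremental(end_date=datetime.now(timezone.utc))


# Print the value that the shell command above would pass to the CLI.
print(utc_timestamp(datetime.now(timezone.utc)))
```

Calling `materialize_until_now()` from the `feature_repo` directory should have the same effect as the CLI command in the cell above.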


## Fetching feature vectors for inference
At inference time, we need to quickly read the latest feature values for different drivers (which otherwise might have existed only in batch sources) from the online feature store using `get_online_features()`. These feature vectors can then be fed to the model.

In [13]:
from pprint import pprint
from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()

pprint(feature_vector)



{'acc_rate': [0.6362705826759338, 0.9071609377861023],
 'avg_daily_trips': [846, 810],
 'conv_rate': [0.35948044061660767, 0.6279279589653015],
 'driver_id': [1004, 1005]}
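
`to_dict()` returns a column-oriented dictionary: one list per feature, aligned by position with the `driver_id` list. Models usually consume one row per entity, so a small conversion helps. The sketch below uses the values printed above; `columns_to_rows` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def columns_to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn {"col": [v0, v1]} into [{"col": v0}, {"col": v1}]."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# Values copied from the output above.
sample_vector = {
    "acc_rate": [0.6362705826759338, 0.9071609377861023],
    "avg_daily_trips": [846, 810],
    "conv_rate": [0.35948044061660767, 0.6279279589653015],
    "driver_id": [1004, 1005],
}

for row in columns_to_rows(sample_vector):
    print(row["driver_id"], row["avg_daily_trips"])
```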


## Using a feature service to fetch online features instead
You can also use feature services to manage multiple features, and decouple feature view definitions and the features needed by end applications. The feature store can also be used to fetch either online or historical features using the same API below. 

The `driver_activity_v4` feature service pulls all features from the `driver_hourly_stats_fresh` feature view (`driver_stats_fresh_fv` in `example_repo.py`):

In [14]:
import example_repo
from feast import FeatureService, FeatureStore

# Define a feature service on top of the feature view from example_repo.py.
driver_activity_v4 = FeatureService(
    name="driver_activity_v4",
    features=[example_repo.driver_stats_fresh_fv],
)

feature_store = FeatureStore('.')  # Initialize the feature store

# Register the new feature service in the registry.
feature_store.apply([driver_activity_v4])

print("FeatureService driver_activity_v4 created.")



FeatureService driver_activity_v4 created.


In [15]:
from pprint import pprint

# Alternatively, look the feature service up by name:
# feature_service = feature_store.get_feature_service("driver_activity_v4")
feature_vector = feature_store.get_online_features(
    features=driver_activity_v4,
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()
pprint(feature_vector)



{'acc_rate': [0.6362705826759338, 0.9071609377861023],
 'avg_daily_trips': [846, 810],
 'conv_rate': [0.35948044061660767, 0.6279279589653015],
 'driver_id': [1004, 1005]}


## Accessing Features using remote online store

In this section we run Feast in client-server mode: we start the Feast online server and retrieve online features through the `remote` online store.

By default, the online server listens on port `6566`. To keep this example simple, the client still refers to the same registry file as `my_feast_project` instead of also starting a registry server. You can review the client feature store configuration [here](./remote-online/feature_store.yaml).


In a production environment you can run the registry, online, and offline servers and access them remotely using feature store clients.

### Starting the Feast online server

In [16]:
import subprocess

# Run feast serve in the background
feast_online_server_process = subprocess.Popen(["feast", "serve"])

For more details, see https://github.com/Kludex/uvicorn-worker.
[2024-09-24 03:09:58 +0000] [11973] [INFO] Starting gunicorn 23.0.0
[2024-09-24 03:09:58 +0000] [11973] [INFO] Listening at: http://127.0.0.1:6566 (11973)
[2024-09-24 03:09:58 +0000] [11973] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2024-09-24 03:09:58 +0000] [12007] [INFO] Booting worker with pid: 12007
[2024-09-24 03:09:58 +0000] [12007] [INFO] Started server process [12007]
[2024-09-24 03:09:58 +0000] [12007] [INFO] Waiting for application startup.
[2024-09-24 03:09:58 +0000] [12007] [INFO] Application startup complete.


In [17]:
%%sh
# Check whether the online server process has started.
ps -ef | grep 'feast serve'

1001130+   11973   11692 51 03:09 ?        00:00:04 /opt/app-root/bin/python3.9 /opt/app-root/bin/feast serve
1001130+   12007   11973  0 03:09 ?        00:00:00 /opt/app-root/bin/python3.9 /opt/app-root/bin/feast serve
1001130+   12014   12011  0 03:10 ?        00:00:00 grep feast serve
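
Checking the process list shows that the server was launched, but not that it is ready to accept requests. A more direct check is to wait until the port accepts TCP connections. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper name, and the host and port in the usage note are the defaults shown in the server log above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not listening yet; retry shortly.
            time.sleep(0.5)
    return False
```

In this notebook the call would be `wait_for_port("127.0.0.1", 6566)` right after starting `feast serve`.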


### Retrieving the features using the remote online client

In [19]:
import os
import yaml

directory = os.path.abspath("./../../remote-online")
os.makedirs(directory, exist_ok=True)

data = {
    'project': 'my_feast_project',
    'registry': './../my_feast_project/feature_repo/data/registry.db',
    'provider': 'local',
    'online_store': {
        'type': 'remote',
        'path': 'http://127.0.0.1:6566'
    },
    'entity_key_serialization_version': 2
}

file_path = os.path.join(directory, 'feature_store.yaml')

# Write to a YAML file
with open(file_path, 'w') as file:
    yaml.dump(data, file, default_flow_style=False)

print("remote-online feature_store.yaml file has been created.")


remote-online feature_store.yaml file has been created.


In [22]:
%cd ./../../remote-online

/opt/app-root/src/feast/examples/rhoai-quickstart/remote-online




In [23]:
online_feature_store_client = FeatureStore('.')
online_feature_store_client.apply([])
print("remote online feature store client has been initialized.")

remote online feature store client has been initialized.


Now we retrieve the same features as in the previous section. This time the client feature store fetches them from the remote online server.

In [24]:
online_features_stores_client = online_feature_store_client.get_online_features(
    features=driver_activity_v4,
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()
pprint(online_features_stores_client)



127.0.0.1:40596 - "POST /get-online-features HTTP/1.1" 200
{'acc_rate': [0.6362705826759338, 0.9071609377861023],
 'avg_daily_trips': [846, 810],
 'conv_rate': [0.35948044061660767, 0.6279279589653015],
 'driver_id': [1004, 1005]}
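
The first line of the output shows that the client issued `POST /get-online-features` against the online server. The same endpoint can be called without the Feast SDK. The sketch below only builds the HTTP request with the standard library. The JSON body shape (`feature_service` plus `entities`) follows the Feast Python feature server documentation as I understand it and should be treated as an assumption to verify against your Feast version; `build_online_features_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Dict, List


def build_online_features_request(
    base_url: str, feature_service: str, entities: Dict[str, List[int]]
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /get-online-features endpoint."""
    # Assumed body shape: a feature service name plus column-oriented entity values.
    payload = {"feature_service": feature_service, "entities": entities}
    return urllib.request.Request(
        url=f"{base_url}/get-online-features",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_online_features_request(
    "http://127.0.0.1:6566", "driver_activity_v4", {"driver_id": [1004, 1005]}
)
print(request.get_method(), request.full_url)

# With the online server running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```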




## Accessing Feast Registry metadata using remote registry store
The registry holds the metadata of all Feast objects, such as feature services and feature views. You can access this metadata directly, as in the sections above.

Alternatively, you can use the client-server model: start the registry server and access it through a remote registry client, as shown in this section.

The default port for the registry server is `6570`.


### Starting the registry server as remote

In [26]:
import subprocess

# Run feast serve_registry in the background
feast_remote_registry_server_process = subprocess.Popen(["feast", "serve_registry"])
print("Registry server started on the default port 6570. Go to next cell and check if the process is available.")

Registry server started on the default port 6570. Go to next cell.


In [31]:
%%sh
# Check whether the registry server process has started.
pwd
ps -ef | grep 'feast serve_registry'

/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project/feature_repo
1001130+   12533   11692  0 03:18 ?        00:00:04 /opt/app-root/bin/python3.9 /opt/app-root/bin/feast serve_registry
1001130+   13097   13094  0 03:28 ?        00:00:00 grep feast serve_registry


### Initializing the remote registry client and retrieving the feast metadata

In [32]:
import os
import yaml

directory = os.path.abspath("./../../remote-registry")
os.makedirs(directory, exist_ok=True)

data = {
    'project': 'my_feast_project',
    'registry': {
        'type': 'remote',
        'path': 'localhost:6570'
    },
    'provider': 'local',
    'online_store': {
        'type': 'remote',
        'path': 'http://127.0.0.1:6566'
    },
    'entity_key_serialization_version': 2
}

file_path = os.path.join(directory, 'feature_store.yaml')

# Write to a YAML file
with open(file_path, 'w') as file:
    yaml.dump(data, file, default_flow_style=False)

print("remote-registry feature_store.yaml file has been created.")

remote-registry feature_store.yaml file has been created.


In [33]:
%cd ./../../remote-registry

/opt/app-root/src/feast/examples/rhoai-quickstart/remote-registry




In [36]:
registry_feature_store_client = FeatureStore('.')
registry_feature_store_client.apply([])
print("Remote registry feature store client has been initialized.")

Remote registry feature store client has been initialized.


In [37]:
# Listing all feature views using the remote registry client
registry_feature_store_client.list_all_feature_views(allow_cache=False)



[<FeatureView(name = driver_hourly_stats, entities = ['driver'], ttl = 1 day, 0:00:00, stream_source = None, batch_source = {
   "type": "BATCH_FILE",
   "timestampField": "event_timestamp",
   "createdTimestampColumn": "created",
   "fileOptions": {
     "uri": "/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project/feature_repo/data/driver_stats.parquet"
   },
   "name": "driver_hourly_stats_source"
 }, entity_columns = [driver_id-Int64], features = [conv_rate-Float32, acc_rate-Float32, avg_daily_trips-Int64], description = , tags = {'team': 'driver_performance'}, owner = , projection = FeatureViewProjection(name='driver_hourly_stats', name_alias=None, desired_features=[], features=[conv_rate-Float32, acc_rate-Float32, avg_daily_trips-Int64], join_key_map={}), created_timestamp = 2024-09-24 03:08:32.697894, last_updated_timestamp = 2024-09-24 03:09:06.081507, online = True, materialization_intervals = [(datetime.datetime(2024, 9, 23, 3, 9, 6, 17933, tzinfo=<UTC>), datetim

In [38]:
# Listing all feature services using the remote registry client
registry_feature_store_client.list_feature_services()

[<FeatureService(name = driver_activity_v2, _features = [], feature_view_projections = [FeatureViewProjection(name='driver_hourly_stats', name_alias=None, desired_features=[], features=[conv_rate-Float32, acc_rate-Float32, avg_daily_trips-Int64], join_key_map={}), FeatureViewProjection(name='transformed_conv_rate', name_alias=None, desired_features=[], features=[conv_rate_plus_val1-Float64, conv_rate_plus_val2-Float64], join_key_map={})], description = , tags = {}, owner = , created_timestamp = 2024-09-24 03:08:32.699926, last_updated_timestamp = 2024-09-24 03:08:32.699926, logging_config = None)>,
 <FeatureService(name = driver_activity_v1, _features = [], feature_view_projections = [FeatureViewProjection(name='driver_hourly_stats', name_alias=None, desired_features=[], features=[conv_rate-Float32], join_key_map={}), FeatureViewProjection(name='transformed_conv_rate', name_alias=None, desired_features=[], features=[conv_rate_plus_val1-Float64, conv_rate_plus_val2-Float64], join_key_ma
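
The full object representations are long and hard to scan. When only an overview is needed, printing the names is often enough. The sketch below assumes that each returned object exposes a `name` attribute, as the representations above suggest. `names_of` is a hypothetical helper, and the stand-in objects exist only so the block runs without a registry server.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def names_of(objects: Iterable[Any]) -> List[str]:
    """Return the sorted `name` attribute of each object."""
    return sorted(obj.name for obj in objects)


# Stand-in objects; in the notebook, pass the client results instead, e.g.
# names_of(registry_feature_store_client.list_feature_services()).
stand_ins = [
    SimpleNamespace(name="driver_hourly_stats_fresh"),
    SimpleNamespace(name="driver_hourly_stats"),
]
print(names_of(stand_ins))
```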

## Stopping the online and registry servers

In [39]:
%%sh
# Check whether the registry server and online server processes are running.
pwd
ps -ef | grep 'feast serve'

/opt/app-root/src/feast/examples/rhoai-quickstart/remote-registry
1001130+   11973   11692  0 03:09 ?        00:00:04 /opt/app-root/bin/python3.9 /opt/app-root/bin/feast serve
1001130+   12007   11973  0 03:09 ?        00:00:00 /opt/app-root/bin/python3.9 /opt/app-root/bin/feast serve
1001130+   12533   11692  0 03:18 ?        00:00:04 /opt/app-root/bin/python3.9 /opt/app-root/bin/feast serve_registry
1001130+   13663   13620  0 03:36 ?        00:00:00 grep feast serve


In [40]:
feast_online_server_process.terminate()  # Stop the remote online server
feast_remote_registry_server_process.terminate()  # Stop the remote registry server
print("Remote online and registry servers have been stopped.")


Remote online and registry servers have been stopped.


[2024-09-24 03:36:24 +0000] [11973] [INFO] Handling signal: term
[2024-09-24 03:36:24 +0000] [12007] [INFO] Shutting down
[2024-09-24 03:36:24 +0000] [12007] [INFO] Error while closing socket [Errno 9] Bad file descriptor
[2024-09-24 03:36:24 +0000] [12007] [INFO] Waiting for application shutdown.
[2024-09-24 03:36:24 +0000] [12007] [INFO] Application shutdown complete.
[2024-09-24 03:36:24 +0000] [12007] [INFO] Finished server process [12007]
[2024-09-24 03:36:24 +0000] [11973] [ERROR] Worker (pid:12007) was sent SIGTERM!
[2024-09-24 03:36:24 +0000] [11973] [INFO] Shutting down: Master


In [41]:
%%sh
# Check whether the registry server and online server processes have stopped. It may take a moment for them to exit.
pwd
ps -ef | grep 'feast serve'

/opt/app-root/src/feast/examples/rhoai-quickstart/remote-registry
1001130+   13848   13805  0 03:37 ?        00:00:00 grep feast serve
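
`terminate()` only requests a shutdown, so a process can still be running for a short while afterwards. A slightly more defensive pattern is to wait for the process to exit and to force-kill it if it does not stop in time. The sketch below demonstrates this with a harmless stand-in process rather than a Feast server; `stop_process` is a hypothetical helper name.

```python
import subprocess
import sys


def stop_process(process: subprocess.Popen, timeout: float = 10.0) -> int:
    """Ask the process to terminate, wait up to `timeout` seconds, then kill it if needed."""
    process.terminate()
    try:
        return process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The process ignored the request; force it to stop.
        process.kill()
        return process.wait()


# Stand-in child process that would otherwise sleep for a minute.
sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(sleeper)
print("stopped:", sleeper.poll() is not None)
```

In this notebook the helper could be applied to `feast_online_server_process` and `feast_remote_registry_server_process` in place of the bare `terminate()` calls.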
