## Installing Feast
Feast is distributed as a Python package, so we install it with `pip`.

In [2]:
# The notebook and the Feast servers must use the same Python version.
# Launch this notebook from a clean Python environment (3.9 or newer).
%pip install -q feast==0.40.1
# grpcio is needed later in this example to run the Feast registry server.
%pip install -q grpcio



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.2.2[0m[39;49m -> [0m[32;49m24.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.
feast package is installed
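
Since the servers started later must run on the same interpreter as the notebook, it can help to confirm the environment right after installing. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `installed_version` are hypothetical, and the 3.9 minimum is taken from the comment in the install cell.

```python
import sys
from importlib import metadata


def python_is_supported(minimum=(3, 9)):
    """Return True when the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= tuple(minimum)


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python OK:", python_is_supported())
print("feast:", installed_version("feast"))
print("grpcio:", installed_version("grpcio"))
```

If `feast` prints as `None`, the install cell ran against a different environment than the kernel, and restarting the kernel is usually the first thing to try.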


## Creating and initializing a Feast project

In [3]:
# Display the current directory so we know where the Feast files will be created and can review them in the Jupyter console or file explorer.
%pwd

'/opt/app-root/src/feast/examples/rhoai-quickstart'

In [4]:
# Create the Feast repository, removing any existing one first.
!rm -rf my_feast_project
!feast init my_feast_project


Creating a new Feast repository in /opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project.



The output above shows where the Feast repository was created. The path may differ depending on your environment.

In [5]:
# Change into feature_repo so that we can run Feast CLI commands.
%cd my_feast_project/feature_repo

/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project/feature_repo




In [10]:
# Inspect the files in the Feast repo, displayed as a tree. Each file and folder is described below.
!find . | sed -e 's/[^-][^\/]*\// |-- /g' -e 's/|-- \(.*\)/+-- \1/'

.
 +-- data
 +--  |-- driver_stats.parquet
 +-- __init__.py
 +-- __pycache__
 +--  |-- __init__.cpython-39.pyc
 +--  |-- test_workflow.cpython-39.pyc
 +--  |-- example_repo.cpython-39.pyc
 +-- feature_store.yaml
 +-- example_repo.py
 +-- test_workflow.py


Now the feast repo has been created for you. Running the `feast init` command populated the directory with an example feature store structure, complete with example data.

This example defines an entity for the driver. You can think of an entity as a primary key used to fetch features. The rest of the example works with driver data, all of which comes from the `data/driver_stats.parquet` file. That file acts as the offline store in this example.

Inspect the files below before going further.

`data` contains the Parquet data file used in this example.

`example_repo.py` contains the code that creates the Feast objects used in this example, such as FeatureViews, FeatureServices, and OnDemandFeatureViews.
[my_feast_project/feature_repo/example_repo.py](./my_feast_project/feature_repo/example_repo.py)

`feature_store.yaml` contains the Feast configuration.
[my_feast_project/feature_repo/feature_store.yaml](./my_feast_project/feature_repo/feature_store.yaml)

`test_workflow.py` contains Python code that runs all the key Feast commands, including defining, retrieving, and pushing features.
[my_feast_project/feature_repo/test_workflow.py](./my_feast_project/feature_repo/test_workflow.py)

In [13]:
!cat feature_store.yaml

project: my_feast_project
# By default, the registry is a file (but can be turned into a more scalable SQL-backed registry)
registry: data/registry.db
# The provider primarily specifies default offline / online stores & storing the registry in a given cloud
provider: local
online_store:
    type: sqlite
    path: data/online_store.db
entity_key_serialization_version: 2


The `feast init` command generates the file `data/driver_stats.parquet`, which acts as the historical data source for this example. The source is defined in the [my_feast_project/feature_repo/example_repo.py](./my_feast_project/feature_repo/example_repo.py) file.

```python
driver_stats_source = FileSource(
    name="driver_hourly_stats_source",
    path="/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project/feature_repo/data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
```


In [16]:
import pandas as pd
pd.read_parquet("data/driver_stats.parquet")

Unnamed: 0,event_timestamp,driver_id,conv_rate,acc_rate,avg_daily_trips,created
0,2024-09-09 17:00:00+00:00,1005,0.373758,0.475401,354,2024-09-24 17:07:41.972
1,2024-09-09 18:00:00+00:00,1005,0.057971,0.375569,517,2024-09-24 17:07:41.972
2,2024-09-09 19:00:00+00:00,1005,0.383832,0.323274,484,2024-09-24 17:07:41.972
3,2024-09-09 20:00:00+00:00,1005,0.403390,0.570664,634,2024-09-24 17:07:41.972
4,2024-09-09 21:00:00+00:00,1005,0.536741,0.645107,128,2024-09-24 17:07:41.972
...,...,...,...,...,...,...
1802,2024-09-24 15:00:00+00:00,1001,0.534048,0.621612,511,2024-09-24 17:07:41.972
1803,2024-09-24 16:00:00+00:00,1001,0.776248,0.120384,311,2024-09-24 17:07:41.972
1804,2021-04-12 07:00:00+00:00,1001,0.058821,0.109781,581,2024-09-24 17:07:41.972
1805,2024-09-17 05:00:00+00:00,1003,0.297863,0.940503,13,2024-09-24 17:07:41.972


You have not created any Feast objects yet. To create them, run `feast apply` in the directory that contains `feature_store.yaml`. Let's do that now.

In [19]:
# The .ipynb_checkpoints folder interferes with the feast apply command, so delete it if it exists.
!rm -rf .ipynb_checkpoints/

In [20]:
# This command creates the Feast objects defined in `example_repo.py`.
!feast apply

Created entity driver
Created feature view driver_hourly_stats
Created feature view driver_hourly_stats_fresh
Created on demand feature view transformed_conv_rate
Created on demand feature view transformed_conv_rate_fresh
Created feature service driver_activity_v2
Created feature service driver_activity_v1
Created feature service driver_activity_v3

Created sqlite table my_feast_project_driver_hourly_stats_fresh
Created sqlite table my_feast_project_driver_hourly_stats



## Generating the training data

To train a model, we need features and labels. Often, this label data is stored separately (e.g. you have one table storing user survey results and another set of tables with feature values). Feast can help generate the features that map to these labels.


Feast needs a list of entities (e.g. driver ids) and timestamps. Feast will join relevant tables to create the relevant feature vectors. There are two ways to generate this list:

* The user can query the table of labels with timestamps and pass the result into Feast as an entity dataframe for training data generation.

* The user can also supply a SQL query that pulls the entities. See the [documentation](https://docs.feast.dev/getting-started/concepts/feature-retrieval) on feature retrieval for details.

Note: we include timestamps because we want the features for the same driver at various timestamps to be used in a model.
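
The role of the timestamps can be pictured with a small plain-Python sketch of a point-in-time lookup: for each entity row, take the newest feature row observed at or before the entity timestamp, and ignore rows that are too old. This is only an illustration of the idea, not Feast's implementation. The function name `point_in_time_lookup`, the sample rows, and the inclusive one-day TTL boundary are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def point_in_time_lookup(feature_rows, driver_id, as_of, ttl=timedelta(days=1)):
    """Return the newest feature row for `driver_id` observed at or before `as_of`.

    Rows older than `as_of - ttl` are ignored, so stale values are never joined.
    """
    candidates = [
        row for row in feature_rows
        if row["driver_id"] == driver_id
        and as_of - ttl <= row["event_timestamp"] <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row["event_timestamp"])


# Hypothetical feature rows (values invented for illustration).
feature_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 7, 0), "conv_rate": 0.10},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 10, 0), "conv_rate": 0.20},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 11, 0), "conv_rate": 0.30},
]

row = point_in_time_lookup(feature_rows, 1001, datetime(2021, 4, 12, 10, 59, 42))
print(row["conv_rate"])  # 0.2: the 11:00 row is in the future relative to the label
```

Excluding the 11:00 row is what prevents future information from leaking into a training example labeled at 10:59:42.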

In [21]:
from datetime import datetime
import pandas as pd

from feast import FeatureStore

# Note: see https://docs.feast.dev/getting-started/concepts/feature-retrieval for 
# more details on how to retrieve for all entities in the offline store instead
entity_df = pd.DataFrame.from_dict(
    {
        # entity's join key -> entity values
        "driver_id": [1001, 1002, 1003],
        # "event_timestamp" (reserved key) -> timestamps
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
        ],
        # (optional) label name -> label values. Feast does not process these
        "label_driver_reported_satisfaction": [1, 5, 3],
        # values we're using for an on-demand transformation
        "val_to_add": [1, 2, 3],
        "val_to_add_2": [10, 20, 30],
    }
)

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
        "transformed_conv_rate:conv_rate_plus_val1",
        "transformed_conv_rate:conv_rate_plus_val2",
    ],
).to_df()

print("----- Feature schema -----\n")
print(training_df.info())

print()
print("----- Example features -----\n")
print(training_df.head())



----- Feature schema -----

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 10 columns):
 #   Column                              Non-Null Count  Dtype              
---  ------                              --------------  -----              
 0   driver_id                           3 non-null      int64              
 1   event_timestamp                     3 non-null      datetime64[ns, UTC]
 2   label_driver_reported_satisfaction  3 non-null      int64              
 3   val_to_add                          3 non-null      int64              
 4   val_to_add_2                        3 non-null      int64              
 5   conv_rate                           3 non-null      float32            
 6   acc_rate                            3 non-null      float32            
 7   avg_daily_trips                     3 non-null      int32              
 8   conv_rate_plus_val1                 3 non-null      float64            
 9   conv_rate_plus_val2

## Run offline inference (batch scoring)
To power a batch model, we primarily need to pull features with the `get_historical_features` call, but using the current timestamp.

In [22]:
entity_df["event_timestamp"] = pd.to_datetime("now", utc=True)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
        "transformed_conv_rate:conv_rate_plus_val1",
        "transformed_conv_rate:conv_rate_plus_val2",
    ],
).to_df()

print("\n----- Example features -----\n")
print(training_df.head())




----- Example features -----

   driver_id                  event_timestamp  \
0       1002 2024-09-24 18:01:58.027897+00:00   
1       1001 2024-09-24 18:01:58.027897+00:00   
2       1003 2024-09-24 18:01:58.027897+00:00   

   label_driver_reported_satisfaction  val_to_add  val_to_add_2  conv_rate  \
0                                   5           2            20   0.311688   
1                                   1           1            10   0.776248   
2                                   3           3            30   0.235401   

   acc_rate  avg_daily_trips  conv_rate_plus_val1  conv_rate_plus_val2  
0  0.991556              579             2.311688            20.311688  
1  0.120384              311             1.776248            10.776248  
2  0.644993              381             3.235401            30.235401  
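
The two `conv_rate_plus_val*` columns come from the on-demand feature view, which combines a stored feature with values supplied in the request. Judging from the output above, each derived value is `conv_rate` plus the corresponding `val_to_add` column. The sketch below reproduces that arithmetic in plain Python; the function is a hypothetical stand-in, not the definition from `example_repo.py`.

```python
def transformed_conv_rate(row):
    """Derive request-time features by adding request values to a stored feature."""
    return {
        "conv_rate_plus_val1": row["conv_rate"] + row["val_to_add"],
        "conv_rate_plus_val2": row["conv_rate"] + row["val_to_add_2"],
    }


# First row of the batch-scoring output above (driver 1002).
row = {"conv_rate": 0.311688, "val_to_add": 2, "val_to_add_2": 20}
derived = transformed_conv_rate(row)
print(derived)
```

The results match the `2.311688` and `20.311688` shown for driver 1002, up to floating-point rounding.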


## Ingest batch features into your online store

This command reads the latest feature values from the offline store and writes them to the online store, up to the timestamp passed on the command line. The output shows the time window that was materialized for each feature view.

In [23]:
!feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

Materializing 2 feature views to 2024-09-24 18:02:06+00:00 into the sqlite online store.

driver_hourly_stats from 2024-09-23 18:02:09+00:00 to 2024-09-24 18:02:06+00:00:
100%|████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 670.38it/s]
driver_hourly_stats_fresh from 2024-09-23 18:02:09+00:00 to 2024-09-24 18:02:06+00:00:
100%|████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 807.56it/s]
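
Conceptually, materialization keeps only the newest row per entity inside the time window and stores it under the entity key. The following plain-Python sketch illustrates that idea with a dictionary standing in for the online store. It is not how Feast implements the step, and the `materialize` function and sample rows are invented for illustration.

```python
from datetime import datetime


def materialize(offline_rows, online_store, start, end):
    """Copy the newest row per driver within [start, end] into `online_store`."""
    for row in sorted(offline_rows, key=lambda r: r["event_timestamp"]):
        if start <= row["event_timestamp"] <= end:
            # Later rows overwrite earlier ones, so only the latest value survives.
            online_store[row["driver_id"]] = {
                k: v for k, v in row.items() if k not in ("driver_id", "event_timestamp")
            }
    return online_store


# Hypothetical offline rows (values invented for illustration).
offline_rows = [
    {"driver_id": 1004, "event_timestamp": datetime(2024, 9, 24, 15), "conv_rate": 0.4},
    {"driver_id": 1004, "event_timestamp": datetime(2024, 9, 24, 16), "conv_rate": 0.5},
    {"driver_id": 1005, "event_timestamp": datetime(2024, 9, 22, 16), "conv_rate": 0.9},
]

online = materialize(offline_rows, {}, datetime(2024, 9, 23, 18), datetime(2024, 9, 24, 18))
print(online)  # {1004: {'conv_rate': 0.5}}
```

Driver 1005 is absent from the result because its only row falls before the start of the window.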


## Fetching feature vectors for inference
At inference time, we need to quickly read the latest feature values for different drivers (which otherwise might have existed only in batch sources) from the online feature store using `get_online_features()`. These feature vectors can then be fed to the model.

In [24]:
from pprint import pprint
from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()

pprint(feature_vector)



{'acc_rate': [0.49898454546928406, 0.2943153381347656],
 'avg_daily_trips': [178, 74],
 'conv_rate': [0.19129787385463715, 0.5790505409240723],
 'driver_id': [1004, 1005]}
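
The dictionary returned by `to_dict()` is column-oriented: each key maps to a list with one value per requested entity. A model usually wants one record per entity instead, so a small reshaping step may be useful. The helper `columns_to_rows` below is a hypothetical convenience, shown on the values printed above.

```python
def columns_to_rows(columnar):
    """Turn {'feature': [v1, v2, ...]} into one dict per entity."""
    names = list(columnar)
    return [dict(zip(names, values)) for values in zip(*(columnar[n] for n in names))]


# The dictionary printed above.
feature_vector = {
    "acc_rate": [0.49898454546928406, 0.2943153381347656],
    "avg_daily_trips": [178, 74],
    "conv_rate": [0.19129787385463715, 0.5790505409240723],
    "driver_id": [1004, 1005],
}

for row in columns_to_rows(feature_vector):
    print(row["driver_id"], row["conv_rate"])
```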


## Using a feature service to fetch online features instead
You can also use feature services to manage multiple features, and decouple feature view definitions and the features needed by end applications. The feature store can also be used to fetch either online or historical features using the same API below.

The `driver_activity_v4` feature service defined below pulls all features from the `driver_hourly_stats_fresh` feature view (`driver_stats_fresh_fv` in `example_repo.py`):

In [25]:
import example_repo
from feast import FeatureService, FeatureStore

# Define a feature service over the driver_stats_fresh_fv feature view from example_repo.py.
driver_activity_v4 = FeatureService(
    name="driver_activity_v4",
    features=[example_repo.driver_stats_fresh_fv],
)

feature_store = FeatureStore('.')  # Initialize the feature store

# Register the new feature service in the registry.
feature_store.apply([driver_activity_v4])

print("FeatureService driver_activity_v4 created.")



FeatureService driver_activity_v4 created.


In [26]:
import example_repo
from pprint import pprint


#feature_service = feature_store.get_feature_service("driver_activity_v4")
feature_vector = feature_store.get_online_features(
    features=driver_activity_v4,
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()
pprint(feature_vector)



{'acc_rate': [0.49898454546928406, 0.2943153381347656],
 'avg_daily_trips': [178, 74],
 'conv_rate': [0.19129787385463715, 0.5790505409240723],
 'driver_id': [1004, 1005]}
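
One way to think about a feature service is as a named list of `view:feature` references, the same strings that were passed to `get_online_features` earlier. The sketch below expands a plain-dict stand-in into such references. The `feature_refs` helper is hypothetical, and the view name `driver_hourly_stats_fresh` is assumed from the `feast apply` output rather than read from the registry.

```python
def feature_refs(projections):
    """Expand {view_name: [feature, ...]} into 'view:feature' reference strings."""
    return [
        f"{view}:{feature}"
        for view, features in projections.items()
        for feature in features
    ]


# Plain-dict stand-in for the driver_activity_v4 feature service (assumed contents).
driver_activity_v4_refs = feature_refs(
    {"driver_hourly_stats_fresh": ["conv_rate", "acc_rate", "avg_daily_trips"]}
)
print(driver_activity_v4_refs)
```

Keeping this list behind a service name means applications can request `driver_activity_v4` without knowing which feature views back it.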


## Accessing Features using remote online store

In this section we run Feast in client-server mode: we start the Feast online feature server and retrieve online features through the `remote` online store.

By default, the online server listens on port `6566`. To keep this example simple, the client still reads the registry file of `my_feast_project` directly, so we do not need to run a registry server alongside the online server. You can review the client feature store configuration [here](./remote-online/feature_store.yaml).


In a production environment you can run the registry, online, and offline servers and access them remotely using feature store clients.

### Starting feast online feature server

In [28]:
import subprocess

# Run feast serve in the background
feast_online_server_process = subprocess.Popen(["feast", "serve"])

[2024-09-24 18:03:30 +0000] [17522] [INFO] Starting gunicorn 23.0.0
[2024-09-24 18:03:30 +0000] [17522] [INFO] Listening at: http://127.0.0.1:6566 (17522)
[2024-09-24 18:03:30 +0000] [17522] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2024-09-24 18:03:30 +0000] [17570] [INFO] Booting worker with pid: 17570
[2024-09-24 18:03:30 +0000] [17570] [INFO] Started server process [17570]
[2024-09-24 18:03:30 +0000] [17570] [INFO] Waiting for application startup.
[2024-09-24 18:03:30 +0000] [17570] [INFO] Application startup complete.


In [29]:
%%sh
# Check that the online server process has started.
ps -ef | grep 'feast serve'

1001130+   17522   16173 32 18:03 ?        00:00:04 /opt/app-root/bin/python3.9 /opt/app-root/bin/feast serve
1001130+   17570   17522  0 18:03 ?        00:00:00 /opt/app-root/bin/python3.9 /opt/app-root/bin/feast serve
1001130+   17578   17575  0 18:03 ?        00:00:00 grep feast serve
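
Because `subprocess.Popen` returns immediately, a process listing only shows that the server was launched, not that it is ready to accept requests. A small polling helper can close that gap. The sketch below uses only the standard library, and `wait_for_port` is a hypothetical name rather than a Feast utility.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Server is not accepting connections yet; retry after a short pause.
            time.sleep(interval)
    return False
```

Calling `wait_for_port("127.0.0.1", 6566)` after starting `feast serve` would block until the online server accepts connections or the timeout expires.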


### Retrieving the features using online remote client 

In [30]:
import os
import yaml

directory = os.path.abspath("./../../remote-online")
os.makedirs(directory, exist_ok=True)

data = {
    'project': 'my_feast_project',
    'registry': './../my_feast_project/feature_repo/data/registry.db',
    'provider': 'local',
    'online_store': {
        'type': 'remote',
        'path': 'http://127.0.0.1:6566'
    },
    'entity_key_serialization_version': 2
}

file_path = os.path.join(directory, 'feature_store.yaml')

# Write to a YAML file
with open(file_path, 'w') as file:
    yaml.dump(data, file, default_flow_style=False)

print("remote-online feature_store.yaml file has been created.")


remote-online feature_store.yaml file has been created.


In [31]:
%cd ./../../remote-online

/opt/app-root/src/feast/examples/rhoai-quickstart/remote-online




In [32]:
online_feature_store_client = FeatureStore('.')
online_feature_store_client.apply([])
print("remote online feature store client has been initialized.")

remote online feature store client has been initialized.


Now we retrieve the same features as in the previous section, this time through the client feature store, which reads them from the remote online feature server.

In [33]:
online_features_stores_client = online_feature_store_client.get_online_features(
    features=driver_activity_v4,
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()
pprint(online_features_stores_client)



127.0.0.1:55838 - "POST /get-online-features HTTP/1.1" 200
{'acc_rate': [0.49898454546928406, 0.2943153381347656],
 'avg_daily_trips': [178, 74],
 'conv_rate': [0.19129787385463715, 0.5790505409240723],
 'driver_id': [1004, 1005]}
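
The server log line above shows that the client issues a `POST /get-online-features` request. The same endpoint can be called without the Feast SDK. The sketch below only builds the request so that it can be inspected without a running server. The JSON body shape (`features` and `entities` keys) is an assumption based on the Feast feature server documentation, and `build_online_features_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_online_features_request(base_url, features, entities):
    """Build (but do not send) a POST request for the /get-online-features endpoint."""
    body = json.dumps({"features": features, "entities": entities}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/get-online-features",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_online_features_request(
    "http://127.0.0.1:6566",
    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
    entities={"driver_id": [1004, 1005]},
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

With the online server running, passing `request` to `urllib.request.urlopen` would send it. The response format is not shown here because it is not part of this notebook's output.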




## Accessing Feast Registry metadata using remote registry store
The registry holds the metadata of all Feast objects, such as FeatureServices and FeatureViews. You can access this information directly from the registry file, as in the sections above.

Alternatively, you can use the client-server model: start the registry server and query it with a remote registry client, as shown in this section.

The default port for the registry server is `6570`.


### Starting the registry server as remote

Change the current directory back to the initial feature repository so that we can start the registry server.

In [52]:
%cd ./../my_feast_project/feature_repo

/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project/feature_repo


In [35]:
import subprocess

# Run feast serve_registry in the background
feast_remote_registry_server_process = subprocess.Popen(["feast", "serve_registry"])
print("Registry server started on the default port 6570. Go to next cell and check if the process is available.")

Registry server started on the default port 6570. Go to next cell and check if the process is available.


In [45]:
%%sh
# checking if the registry server process started.
pwd
ps -ef | grep 'feast serve_registry'

/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project/feature_repo
1001130+   17902   16173  1 18:07 ?        00:00:04 /opt/app-root/bin/python3.9 /opt/app-root/bin/feast serve_registry
1001130+   18249   18246  0 18:13 ?        00:00:00 grep feast serve_registry


### Initializing the remote registry client and retrieving the feast metadata

In [46]:
import os
import yaml

directory = os.path.abspath("./../../remote-registry")
os.makedirs(directory, exist_ok=True)

data = {
    'project': 'my_feast_project',
    'registry': {
        'registry_type': 'remote',
        'path': 'localhost:6570'
    },
    'provider': 'local',
    'online_store': {
        'type': 'remote',
        'path': 'http://127.0.0.1:6566'
    },
    'entity_key_serialization_version': 2
}

file_path = os.path.join(directory, 'feature_store.yaml')

# Write to a YAML file
with open(file_path, 'w') as file:
    yaml.dump(data, file, default_flow_style=False)

print("remote-registry feature_store.yaml file has been created.")

remote-registry feature_store.yaml file has been created.


In [48]:
%cd ./../../remote-registry

/opt/app-root/src/feast/examples/rhoai-quickstart/remote-registry




In [49]:
registry_feature_store_client = FeatureStore('.')
registry_feature_store_client.apply([])
print("Remote registry feature store client has been initialized.")

Remote registry feature store client has been initialized.


In [50]:
# Listing all feature views using remote registry client
registry_feature_store_client.list_all_feature_views(allow_cache=False)



[<FeatureView(name = driver_hourly_stats, entities = ['driver'], ttl = 1 day, 0:00:00, stream_source = None, batch_source = {
   "type": "BATCH_FILE",
   "timestampField": "event_timestamp",
   "createdTimestampColumn": "created",
   "fileOptions": {
     "uri": "/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project/feature_repo/data/driver_stats.parquet"
   },
   "name": "driver_hourly_stats_source"
 }, entity_columns = [driver_id-Int64], features = [conv_rate-Float32, acc_rate-Float32, avg_daily_trips-Int64], description = , tags = {'team': 'driver_performance'}, owner = , projection = FeatureViewProjection(name='driver_hourly_stats', name_alias=None, desired_features=[], features=[conv_rate-Float32, acc_rate-Float32, avg_daily_trips-Int64], join_key_map={}), created_timestamp = 2024-09-24 18:01:41.750517, last_updated_timestamp = 2024-09-24 18:02:09.176556, online = True, materialization_intervals = [(datetime.datetime(2024, 9, 23, 18, 2, 9, 112629, tzinfo=<UTC>), datet

In [51]:
# Listing all feature services using remote registry client
registry_feature_store_client.list_feature_services()

[<FeatureService(name = driver_activity_v2, _features = [], feature_view_projections = [FeatureViewProjection(name='driver_hourly_stats', name_alias=None, desired_features=[], features=[conv_rate-Float32, acc_rate-Float32, avg_daily_trips-Int64], join_key_map={}), FeatureViewProjection(name='transformed_conv_rate', name_alias=None, desired_features=[], features=[conv_rate_plus_val1-Float64, conv_rate_plus_val2-Float64], join_key_map={})], description = , tags = {}, owner = , created_timestamp = 2024-09-24 18:01:41.752541, last_updated_timestamp = 2024-09-24 18:01:41.752541, logging_config = None)>,
 <FeatureService(name = driver_activity_v1, _features = [], feature_view_projections = [FeatureViewProjection(name='driver_hourly_stats', name_alias=None, desired_features=[], features=[conv_rate-Float32], join_key_map={}), FeatureViewProjection(name='transformed_conv_rate', name_alias=None, desired_features=[], features=[conv_rate_plus_val1-Float64, conv_rate_plus_val2-Float64], join_key_ma
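
The raw object representations above are long, so a compact summary of names and features may be easier to read. The sketch below assumes each object exposes a `name` attribute and a `features` list whose items have a `name`, which matches the feature view output shown above. It is demonstrated on stand-in objects so that it runs without a registry server, and `summarize` is a hypothetical helper.

```python
from types import SimpleNamespace


def summarize(objects):
    """Map each object's name to the names of its features."""
    return {obj.name: [feature.name for feature in obj.features] for obj in objects}


# Stand-in object mirroring the name/features attributes seen in the output above.
fake_view = SimpleNamespace(
    name="driver_hourly_stats",
    features=[SimpleNamespace(name=n) for n in ("conv_rate", "acc_rate", "avg_daily_trips")],
)
print(summarize([fake_view]))
# {'driver_hourly_stats': ['conv_rate', 'acc_rate', 'avg_daily_trips']}
```

In the notebook, the same function could be applied to the result of `registry_feature_store_client.list_all_feature_views(allow_cache=False)`, assuming the returned objects expose those attributes.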

## Stopping the online and registry servers

In [54]:
%%sh
# Check whether the registry server and online server processes are still running.
pwd
ps -ef | grep 'feast serve'

/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project/feature_repo
1001130+   17522   16173  0 18:03 ?        00:00:04 /opt/app-root/bin/python3.9 /opt/app-root/bin/feast serve
1001130+   17570   17522  0 18:03 ?        00:00:00 /opt/app-root/bin/python3.9 /opt/app-root/bin/feast serve
1001130+   17902   16173  0 18:07 ?        00:00:04 /opt/app-root/bin/python3.9 /opt/app-root/bin/feast serve_registry
1001130+   18481   18438  0 18:15 ?        00:00:00 grep feast serve


In [55]:
feast_online_server_process.terminate()  # Stop the remote Feast online server
feast_remote_registry_server_process.terminate() # stops the remote registry server
print("remote online and registry server has been stopped.")


remote online and registry server has been stopped.


[2024-09-24 18:16:00 +0000] [17522] [INFO] Handling signal: term
[2024-09-24 18:16:00 +0000] [17570] [INFO] Shutting down
[2024-09-24 18:16:00 +0000] [17570] [INFO] Error while closing socket [Errno 9] Bad file descriptor
[2024-09-24 18:16:00 +0000] [17570] [INFO] Waiting for application shutdown.
[2024-09-24 18:16:00 +0000] [17570] [INFO] Application shutdown complete.
[2024-09-24 18:16:00 +0000] [17570] [INFO] Finished server process [17570]
[2024-09-24 18:16:00 +0000] [17522] [ERROR] Worker (pid:17570) was sent SIGTERM!
[2024-09-24 18:16:00 +0000] [17522] [INFO] Shutting down: Master


In [56]:
%%sh
# Check that the registry server and online server processes have stopped. They may take a moment to exit.
pwd
ps -ef | grep 'feast serve'

/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project/feature_repo
1001130+   18542   18499  0 18:16 ?        00:00:00 grep feast serve
