# Feast Demo

In [1]:
import pandas as pd

In [2]:
df = pd.read_parquet('../feature_repo/data/driver_stats.parquet')
df.head()

Unnamed: 0,event_timestamp,driver_id,conv_rate,acc_rate,avg_daily_trips,created
0,2024-03-06 16:00:00+00:00,1005,0.098983,0.229224,560,2024-03-21 16:00:07.931
1,2024-03-06 17:00:00+00:00,1005,0.597186,0.596457,785,2024-03-21 16:00:07.931
2,2024-03-06 18:00:00+00:00,1005,0.460126,0.218102,413,2024-03-21 16:00:07.931
3,2024-03-06 19:00:00+00:00,1005,0.738934,0.810678,374,2024-03-21 16:00:07.931
4,2024-03-06 20:00:00+00:00,1005,0.792706,0.913296,801,2024-03-21 16:00:07.931


## Feast apply (from feast repo directory)

When executing feast apply:
- Feast will scan Python files in your feature repository and find all Feast object definitions, such as feature views, entities, and data sources.
- Feast will validate your feature definitions.
- Feast will sync the metadata about Feast objects to the registry. If a registry does not exist, then it will be instantiated.
- Feast will create all necessary feature store infrastructure.

```bash
feast apply

Created entity driver
Created feature view driver_hourly_stats_fresh
Created feature view driver_hourly_stats
Created feature service driver_activity_v2
Created feature service driver_activity_v1

Created sqlite table feast_demo_driver_hourly_stats_fresh
Created sqlite table feast_demo_driver_hourly_stats
```

We have created:

- the entity driver,
- two feature views in the offline and online stores (because online=True), and
- two feature services, so that the feature views can be used at inference time.

## Retrieve offline features for training/batch inference

In [3]:
from feast import FeatureStore
from datetime import datetime

In [4]:
feature_store = FeatureStore(repo_path="../feature_repo")

In [5]:
entity_df = pd.DataFrame.from_dict(
    {
        "driver_id": [1001, 1002, 1003, 1004, 1001],
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
            datetime(2021, 4, 12, 15, 1, 12),
            datetime.now()
        ]
    }
)

In [6]:
entity_df.head()

Unnamed: 0,driver_id,event_timestamp
0,1001,2021-04-12 10:59:42.000000
1,1002,2021-04-12 08:12:10.000000
2,1003,2021-04-12 16:40:26.000000
3,1004,2021-04-12 15:01:12.000000
4,1001,2024-03-21 16:33:22.939165


In [7]:
training_df = feature_store.get_historical_features(
    entity_df=entity_df,
    features=feature_store.get_feature_service("driver_activity_v1"),  # features retrieved via the feature service
).to_df()

Each driver_id appears many times in the source data behind driver_activity_v1 (one row per event_timestamp), but get_historical_features() performs a point-in-time join: for each row of entity_df, it retrieves the most recent feature values available at or before that row's event_timestamp.

In [8]:
training_df

Unnamed: 0,driver_id,event_timestamp,conv_rate,acc_rate,avg_daily_trips
0,1001,2021-04-12 10:59:42+00:00,0.31346,0.967728,821
1,1002,2021-04-12 08:12:10+00:00,0.235467,0.025087,39
2,1003,2021-04-12 16:40:26+00:00,0.55694,0.157237,335
3,1004,2021-04-12 15:01:12+00:00,0.960231,0.239735,309
4,1001,2024-03-21 16:33:22.939165+00:00,0.119581,0.766169,185
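
The point-in-time join can be illustrated with a small standard-library sketch. This is not Feast's implementation: the function `point_in_time_join` and the sample rows are hypothetical, and details such as a feature view's ttl are ignored. It only shows the core rule: for each entity row, take the latest feature row whose timestamp is at or before the entity timestamp.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature rows: (driver_id, event_timestamp, conv_rate).
feature_rows = [
    (1001, datetime(2021, 4, 12, 9, 0), 0.10),
    (1001, datetime(2021, 4, 12, 10, 0), 0.20),
    (1001, datetime(2021, 4, 12, 11, 0), 0.30),
    (1002, datetime(2021, 4, 12, 8, 0), 0.40),
]


def point_in_time_join(entity_rows, feature_rows):
    """For each (driver_id, timestamp), pick the latest feature value at or before it."""
    # Group feature rows per driver, ordered by timestamp.
    by_driver = {}
    for driver_id, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        by_driver.setdefault(driver_id, []).append((ts, value))

    joined = []
    for driver_id, ts in entity_rows:
        history = by_driver.get(driver_id, [])
        timestamps = [t for t, _ in history]
        # Number of feature rows with timestamp <= ts.
        idx = bisect_right(timestamps, ts)
        value = history[idx - 1][1] if idx > 0 else None
        joined.append((driver_id, ts, value))
    return joined


entity_rows = [
    (1001, datetime(2021, 4, 12, 10, 59, 42)),  # sees the 10:00 row, not the 11:00 one
    (1002, datetime(2021, 4, 12, 7, 0, 0)),     # no feature row exists yet at 07:00
]
result = point_in_time_join(entity_rows, feature_rows)
```

The 11:00 row for driver 1001 is never used for the 10:59:42 entity row, which is what prevents future values from leaking into training data.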


## Retrieve online features for online inference

### Materialization

For online inference, we want to retrieve features very quickly via our online store, as opposed to fetching them from slow joins. However, the features are not in our online store just yet, so we'll need to materialize them first.

```bash
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
```


In production, incremental materialization can be handled via Airflow operators or other job orchestrators.
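
To make the idea of incremental materialization concrete, here is a conceptual sketch in plain Python. It is an assumption-laden model, not Feast's code: the online store is a plain dict, and the function `materialize_window` and the sample rows are hypothetical. Each run copies only rows from the window since the previous run and keeps the latest value per driver.

```python
from datetime import datetime


def materialize_window(offline_rows, online_store, start, end):
    """Copy the latest value per driver with start < timestamp <= end into online_store."""
    for driver_id, ts, value in sorted(offline_rows, key=lambda r: r[1]):
        if start < ts <= end:
            current = online_store.get(driver_id)
            # Only overwrite when the incoming row is at least as recent.
            if current is None or ts >= current[0]:
                online_store[driver_id] = (ts, value)
    return end  # the end of this window is the start of the next run


# Hypothetical offline rows: (driver_id, event_timestamp, conv_rate).
offline_rows = [
    (1001, datetime(2024, 3, 20, 10), 0.1),
    (1001, datetime(2024, 3, 21, 10), 0.2),
    (1004, datetime(2024, 3, 21, 9), 0.4),
    (1001, datetime(2024, 3, 22, 10), 0.3),
]

online_store = {}
watermark = datetime(2024, 3, 19)

# First run: loads everything up to 2024-03-21 12:00.
watermark = materialize_window(offline_rows, online_store, watermark, datetime(2024, 3, 21, 12))
first_run = dict(online_store)

# Second run: only the row after the previous watermark is considered.
watermark = materialize_window(offline_rows, online_store, watermark, datetime(2024, 3, 22, 12))
```

An online lookup is then a single key access such as `online_store[1001]`, which is why no timestamp or join is needed at inference time.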

In [9]:
feature_store.materialize_incremental(end_date=datetime.now())

Materializing 2 feature views to 2024-03-21 16:33:23+01:00 into the sqlite online store.

driver_hourly_stats from 2023-03-22 15:33:23+01:00 to 2024-03-21 16:33:23+01:00:

100%|███████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 1540.55it/s]

driver_hourly_stats_fresh from 2023-03-22 15:33:23+01:00 to 2024-03-21 17:33:23+01:00:

100%|███████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 2855.99it/s]


### Online Features

In [10]:
feature_store.get_online_features(
    entity_rows=[  # no timestamp required because only the most recent features are retrieved for the following entity values
        {"driver_id": 1001},
        {"driver_id": 1004},
    ],
    features=feature_store.get_feature_service("driver_activity_v1"),
).to_df()

Unnamed: 0,driver_id,conv_rate,acc_rate,avg_daily_trips
0,1001,0.119581,0.766169,185
1,1004,0.403795,0.815309,519
