## Step 1: Install Feast

Install Feast using pip:


In [1]:
pip install feast -U -q

Note: you may need to restart the kernel to use updated packages.


## Step 2: Create a feature repository

A feature repository is a directory that contains the configuration of the feature store and individual features. This configuration is written as code (Python/YAML) and it's highly recommended that teams track it centrally using git. See [Feature Repository](https://docs.feast.dev/reference/feature-repository) for a detailed explanation of feature repositories.

The easiest way to create a new feature repository is to use the `feast init` command:

In [1]:
%%sh
feast init feature_repo


Creating a new Feast repository in /home/willem/Projects/feast/examples/quickstart/feature_repo.



### Inspecting the feature repository

Let's take a look at the repo itself. It breaks down into:


*   `data/` contains the raw parquet data
*   `example.py` contains demo feature definitions
*   `feature_store.yaml` contains a demo setup configuring where data sources are
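
As a quick sanity check, the layout listed above can be verified from Python. This is a minimal sketch using only the standard library; the helper name `missing_repo_entries` is hypothetical and not part of Feast, and the expected entries are simply the three items from the list above.

```python
from pathlib import Path

# Entries that `feast init` created in this walkthrough (taken from the list above).
EXPECTED_ENTRIES = ("data", "example.py", "feature_store.yaml")


def missing_repo_entries(repo_path):
    """Return the expected entries that are absent from repo_path."""
    root = Path(repo_path)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


# An empty list means the directory has the layout described above.
print(missing_repo_entries("feature_repo"))
```

Other Feast versions may generate a different set of files, so treat a non-empty result as a prompt to look at the directory rather than as an error.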



In [2]:
%cd feature_repo

/home/willem/Projects/feast/examples/quickstart/feature_repo


In [3]:
!ls

data  example.py  feature_store.yaml


## Step 3: Applying and deploying feature definitions

`feast apply` scans Python files in the current directory for feature definitions and deploys infrastructure according to `feature_store.yaml`.

In [4]:
%%sh
feast apply

Registered entity driver_id
Registered feature view driver_hourly_stats
Deploying infrastructure for driver_hourly_stats


## Step 4: Materialize features

We now serialize these features, from the beginning of time up to now, to prepare them for model training and serving. Note that `materialize-incremental` serializes all new features since the last `materialize` call.

In [5]:
from datetime import datetime

In [6]:
!feast materialize-incremental {datetime.now().isoformat()}

Materializing 1 feature views to 2021-08-08 11:53:37-07:00 into the sqlite online store.

driver_hourly_stats from 2021-08-08 01:53:37-07:00 to 2021-08-08 11:53:37-07:00:
100%|███████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 1033.44it/s]
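
Outside a notebook, the same CLI call can be made from a plain Python script. The sketch below only wraps the `feast materialize-incremental <timestamp>` command shown above with the standard `subprocess` module; the helper names and the `repo_dir` parameter are hypothetical, and it assumes the `feast` executable is on the `PATH`.

```python
import subprocess
from datetime import datetime


def materialize_incremental_command(end_time):
    """Build the argument list for `feast materialize-incremental <end_time>`."""
    return ["feast", "materialize-incremental", end_time.isoformat()]


def run_materialize_incremental(repo_dir, end_time=None):
    """Run the command inside repo_dir; raises CalledProcessError on failure."""
    end_time = end_time or datetime.now()
    subprocess.run(materialize_incremental_command(end_time), cwd=repo_dir, check=True)
```

For example, `run_materialize_incremental("feature_repo")` would materialize up to the current time, which could be scheduled to run periodically.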


### Inspect materialized features

Note that `data/` now also contains `online_store.db` and `registry.db`, which store the materialized features and schema information, respectively.

In [7]:
!ls data

driver_stats.parquet  online_store.db  registry.db
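
Because the materialization output names `sqlite` as the online store, `online_store.db` can presumably be opened with Python's built-in `sqlite3` module to see what was written. The helper below is a hypothetical sketch, not a Feast API, and makes no assumption about the table names Feast creates.

```python
import sqlite3


def list_sqlite_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

Calling `list_sqlite_tables("data/online_store.db")` from inside the feature repository lists the tables. Note that `sqlite3.connect` creates an empty file if the path does not exist, so double-check the path first.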


## Step 5: Fetch features for offline / online cases

### Read some historical data for model training

We pass in the entities (the unique identifiers of the examples we want features for) along with the names of the features we want to extract.

Note that we include timestamps because we want the features for the same driver at various timestamps to be used in a model.
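
To make the role of the timestamps concrete, here is a conceptual sketch of a point-in-time lookup: for a given timestamp, take the most recent feature value recorded at or before it, so no future data leaks into training. This is an illustration only and not Feast's implementation (which runs in the offline store and also handles details such as feature expiry); the function name and the sample history are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_value(feature_rows, as_of):
    """Return the value of the latest row with timestamp <= as_of, else None.

    feature_rows is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in feature_rows]
    index = bisect_right(timestamps, as_of)
    return feature_rows[index - 1][1] if index else None


# Hypothetical hourly conv_rate history for a single driver.
history = [
    (datetime(2021, 8, 8, 9), 0.31),
    (datetime(2021, 8, 8, 10), 0.57),
    (datetime(2021, 8, 8, 11), 0.42),
]

print(point_in_time_value(history, datetime(2021, 8, 8, 10, 30)))  # 0.57
print(point_in_time_value(history, datetime(2021, 8, 8, 8, 0)))  # None
```

The second call returns `None` because no value had been recorded yet at that time.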

In [8]:
from datetime import datetime, timedelta
import pandas as pd

from feast import FeatureStore

# The entity dataframe is the dataframe we want to enrich with feature values
entity_df = pd.DataFrame.from_dict(
    {
        "driver_id": [1001, 1002, 1003, 1004, 1005],
        "event_timestamp": [
            datetime.now() - timedelta(minutes=11),
            datetime.now() - timedelta(minutes=36),
            datetime.now() - timedelta(minutes=73),
            datetime.now() - timedelta(minutes=124),
            datetime.now() - timedelta(minutes=235),
        ],
    }
)

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

print("----- Feature schema -----\n")
print(training_df.info())

print()
print("----- Example features -----\n")
print(training_df.head())

----- Feature schema -----

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype              
---  ------           --------------  -----              
 0   event_timestamp  5 non-null      datetime64[ns, UTC]
 1   driver_id        5 non-null      int64              
 2   conv_rate        5 non-null      float32            
 3   acc_rate         5 non-null      float32            
 4   avg_daily_trips  5 non-null      int32              
dtypes: datetime64[ns, UTC](1), float32(2), int32(1), int64(1)
memory usage: 180.0 bytes
None

----- Example features -----

                   event_timestamp  driver_id  conv_rate  acc_rate  \
0 2021-08-08 14:59:19.238912+00:00       1005   0.730167  0.866257   
1 2021-08-08 16:50:19.238910+00:00       1004   0.313922  0.563453   
2 2021-08-08 17:41:19.238909+00:00       1003   0.630335  0.210270   
3 2021-08-08 18:18:19.238907+00:00       1002   0.341512  0.8

### Read features at serving time


We can also read the latest feature values from the online feature store using `get_online_features()`.

In [9]:
from pprint import pprint
from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[
        {"driver_id": 1001},
        {"driver_id": 1002},
    ],
).to_dict()

pprint(feature_vector)

{'acc_rate': [0.899538516998291, 0.8277764916419983],
 'avg_daily_trips': [979, 492],
 'conv_rate': [0.5747835636138916, 0.34151211380958557],
 'driver_id': [1001, 1002]}
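
The dictionary above is column-oriented: each key maps to a list with one value per requested entity. If a model server expects one record per driver, the columns can be regrouped into rows. The helper `columns_to_rows` below is a hypothetical sketch in plain Python, not a Feast API, and it assumes all lists have the same length.

```python
def columns_to_rows(feature_dict):
    """Turn {"name": [v1, v2, ...]} into [{"name": v1}, {"name": v2}, ...]."""
    names = list(feature_dict)
    return [dict(zip(names, values)) for values in zip(*feature_dict.values())]


# Values copied from the output above.
feature_vector = {
    "acc_rate": [0.899538516998291, 0.8277764916419983],
    "avg_daily_trips": [979, 492],
    "conv_rate": [0.5747835636138916, 0.34151211380958557],
    "driver_id": [1001, 1002],
}

for row in columns_to_rows(feature_vector):
    print(row)
```

Each printed row holds the three features for a single `driver_id`.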
