# FEAST Get Started

Use a separate virtual environment (venv) that contains only FEAST-related installations. Otherwise, expect many dependency errors, e.g. [Expected 96 from C header, got 88 from PyObject](https://github.com/sinaptik-ai/pandas-ai/issues/1251).
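
A minimal sketch of creating such an isolated environment from Python, using the standard-library `venv` module. The helper name `create_feast_venv` and the directory name `.venv-feast` are made up for illustration; the only assumption about Feast is that it is installed with `pip install feast`.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_feast_venv(env_dir, install=True):
    """Create an isolated virtual environment and optionally install Feast into it."""
    env_path = Path(env_dir)
    # with_pip bootstraps pip inside the new environment (needed for the install step).
    venv.EnvBuilder(with_pip=install).create(env_path)
    is_windows = sys.platform == "win32"
    python = env_path / ("Scripts" if is_windows else "bin") / ("python.exe" if is_windows else "python")
    if install:
        # Install only Feast and its dependencies; nothing is shared with other projects.
        subprocess.run([str(python), "-m", "pip", "install", "feast"], check=True)
    return python


# Example (creates ./.venv-feast and downloads packages, so it is left commented out):
# create_feast_venv(".venv-feast")
```

Run the notebook kernel from this environment so that Feast's pinned dependencies do not collide with packages installed elsewhere.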

* [FEAST Quickstart](https://docs.feast.dev/getting-started/quickstart)
* [quickstart.ipynb](https://github.com/feast-dev/feast/blob/master/examples/quickstart/quickstart.ipynb).

The FEAST documentation and samples are poorly written. Better tutorials:

* [Creating a Feature Store with Feast - Part 1: Building a Local Feature Store for ML Training and Prediction](https://kedion.medium.com/creating-a-feature-store-with-feast-part-1-37c380223e2f)
* [Creating a Feature Store with Feast - Part 2: Validating Data with Feast and Great Expectations](https://kedion.medium.com/feature-storage-for-ml-with-feast-part-2-34df1971a8d3)
* [Creating a Feature Store with Feast - Part 3: Building An API and React App for Feast](https://kedion.medium.com/feature-storage-for-ml-with-feast-a061899fc4a2)

* [Streamlining ML development with Feast](https://cloud.google.com/blog/products/databases/how-feast-feature-store-streamlines-ml-development)
* [MLOps 03: Feast Feature Store — An In-depth Overview Experimentation and Application in Tabular data](https://medium.com/@ongxuanhong/mlops-03-feast-feature-store-an-in-depth-overview-experimentation-and-application-in-tabular-b9d1c5376483)

* [Feast: The open source feature store for AI May 16, 2025](https://www.redhat.com/en/blog/feast-open-source-feature-store-ai)

## Warning

The FEAST documentation and sample code in [FEAST Quickstart](https://docs.feast.dev/getting-started/quickstart) are poorly written and contain inconsistencies. Do not rely on them as-is; use the quickstart as a reference only.


In [1]:
%%html
<style>
table {float:left}
</style>

In [2]:
import subprocess
from datetime import (
    datetime,
    timedelta
)

import pandas as pd
from feast import (
    Entity,
    FeatureService,
    FeatureView,
    Field,
    FileSource,
    Project,
    PushSource,
    RequestSource,
)
from feast.feature_logging import LoggingConfig
from feast.infra.offline_stores.file_source import FileLoggingDestination
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float32, Float64, Int64

from feast import FeatureStore
from feast.data_source import PushMode

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# FEAST Project

## Create Project 

Like ```git init <directory>```, ```feast init <project_directory>``` creates the blueprint (skeleton) of your feature store.

```
my_project/feature_repo
├── data
│   └── driver_stats.parquet
├── example_repo.py
└── feature_store.yaml
```


In [3]:
# !feast init my_project
%cd my_project/feature_repo
%pwd

/Users/oonisim/home/repository/git/oonisim/feast/my_project/feature_repo




'/Users/oonisim/home/repository/git/oonisim/feast/my_project/feature_repo'

### Project as Namespace

* [FEAST Project](https://docs.feast.dev/getting-started/concepts/project)

> Projects provide complete isolation of feature stores at the infrastructure level. This is accomplished through **resource namespacing, e.g., prefixing table names with the associated project**. Each project should be considered a completely separate universe of entities and features. 
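
A minimal sketch of what "prefixing table names with the associated project" means, using an in-memory SQLite database. The helper `online_table_name`, the second project `other_project`, and the two-column schema are made up for illustration and are not Feast's real implementation. The resulting name for `my_project` matches the table that `feast apply` reports creating later in this notebook.

```python
import sqlite3


def online_table_name(project, feature_view):
    """Hypothetical helper: prefix the feature view name with the project name."""
    return f"{project}_{feature_view}"


conn = sqlite3.connect(":memory:")
for project in ("my_project", "other_project"):
    table = online_table_name(project, "driver_hourly_stats")
    # Simplified schema for illustration only; not the schema Feast actually uses.
    conn.execute(f'CREATE TABLE "{table}" (entity_key BLOB, value BLOB)')

# Two projects with the same feature view name end up in separate tables.
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```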

### Project Configuration

```feast configuration``` shows the project configurations defined in ```feature_store.yaml```.

The following top-level configuration options exist in the feature_store.yaml file.

| Item          | Description                                                                                      | Example values                                             |
|---------------|--------------------------------------------------------------------------------------------------|------------------------------------------------------------|
| project       | Namespace for the entire feature store.                                                          | `my_project`                                               |
| provider      | Implementation of the feature store for a target environment, similar to a Terraform provider.  | `local`, `aws`, `gcp`                                      |
| registry      | Central catalog of all feature definitions and their related metadata.                          | `data/registry.db`, `s3://feast-test-s3-bucket/registry.pb` |
| online_store  | Low-latency store that serves the latest feature values.                                         | ```type: dynamodb```                                       |
| offline_store | Storage and compute layer for historical data, used for transformation and materialisation.     | ```type: redshift```                                       |

In [4]:
! feast configuration

project: my_project
provider: local
registry: data/registry.db
online_store:
  type: sqlite
  path: data/online_store.db
auth:
  type: no_auth
offline_store:
  type: file
batch_engine: local
entity_key_serialization_version: 3
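
The same settings can also be read from Python. The sketch below assumes that a `FeatureStore` exposes its parsed `feature_store.yaml` as `store.config`, with `project`, `provider`, `online_store.type` and `offline_store.type` attributes. The helper name `summarize_config` is made up, and the exact `type` strings reported may differ from what the YAML file shows.

```python
def summarize_config(store):
    """Return the main feature_store.yaml settings as seen by a FeatureStore-like object."""
    config = store.config  # assumption: the parsed repo configuration
    return {
        "project": config.project,
        "provider": config.provider,
        "online_store": config.online_store.type,
        "offline_store": config.offline_store.type,
    }


# Usage in this notebook, once the FeatureStore object has been created:
# summarize_config(feature_store)
```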



## Deploy Project



In [5]:
!feast teardown
!feast apply

Applying changes for project my_project
Created project my_project
Created entity driver
Created feature view driver_hourly_stats
Created feature service driver_activity_v1

Created sqlite table my_project_driver_hourly_stats



In [6]:
!feast feature-views list

NAME                 ENTITIES    TYPE
driver_hourly_stats  {'driver'}  FeatureView


In [7]:
!feast entities list

NAME    DESCRIPTION    TYPE
driver                 ValueType.UNKNOWN


---

# Data

In [8]:
driver_stats_df = pd.read_parquet("data/driver_stats.parquet")

print(driver_stats_df.info())
driver_stats_df.head(5)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1807 entries, 0 to 1806
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype              
---  ------           --------------  -----              
 0   event_timestamp  1807 non-null   datetime64[ns, UTC]
 1   driver_id        1807 non-null   int64              
 2   conv_rate        1807 non-null   float32            
 3   acc_rate         1807 non-null   float32            
 4   avg_daily_trips  1807 non-null   int32              
 5   created          1807 non-null   datetime64[us]     
dtypes: datetime64[ns, UTC](1), datetime64[us](1), float32(2), int32(1), int64(1)
memory usage: 63.7 KB
None


|   | event_timestamp           | driver_id | conv_rate | acc_rate | avg_daily_trips | created                 |
|---|---------------------------|-----------|-----------|----------|-----------------|-------------------------|
| 0 | 2025-07-22 14:00:00+00:00 | 1005      | 0.80306   | 0.44058  | 397             | 2025-08-06 14:28:04.645 |
| 1 | 2025-07-22 15:00:00+00:00 | 1005      | 0.247837  | 0.249946 | 313             | 2025-08-06 14:28:04.645 |
| 2 | 2025-07-22 16:00:00+00:00 | 1005      | 0.63339   | 0.618245 | 206             | 2025-08-06 14:28:04.645 |
| 3 | 2025-07-22 17:00:00+00:00 | 1005      | 0.227286  | 0.701076 | 600             | 2025-08-06 14:28:04.645 |
| 4 | 2025-07-22 18:00:00+00:00 | 1005      | 0.595457  | 0.991147 | 545             | 2025-08-06 14:28:04.645 |


---
# Feature View

The core idea of FEAST is that it does NOT store the raw data itself; it **manages how this data is accessed and interpreted**.

A Feature View is a separation of concerns: it decouples feature definitions from the storage technology and location of the raw data.

## Definition

Feature View is defined in ```feature_repo/example_repo.py```.


* [Feature view](https://docs.feast.dev/master/getting-started/concepts/feature-view)

> In the offline setting, Feature View is a stateless collection of features that are created when the [get_historical_features](https://rtd.feast.dev/en/master/#feast.feature_store.FeatureStore.get_historical_features) method is called.



### Data Source (FileSource)

The FEAST way of encapsulating where the data is and how to access it.

### Raw Features (FeatureView)

The FEAST way of selecting raw features from a data source.
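
A sketch of how the data source and feature view for this project might be declared. The names `driver`, `driver_hourly_stats_source`, `driver_hourly_stats` and the column names come from this notebook's outputs. The `ttl` of one day and the field dtypes are assumptions based on the default `feast init` template, so check them against your own `example_repo.py`. The wrapper function `define_driver_hourly_stats` is made up for illustration.

```python
from datetime import timedelta


def define_driver_hourly_stats():
    """Build the entity, data source and feature view for the driver statistics."""
    # Imported here so the sketch only needs Feast when it is actually called.
    from feast import Entity, FeatureView, Field, FileSource
    from feast.types import Float32, Int64

    driver = Entity(name="driver", join_keys=["driver_id"])

    # Data source: where the raw data lives and which columns carry timestamps.
    source = FileSource(
        name="driver_hourly_stats_source",
        path="data/driver_stats.parquet",
        timestamp_field="event_timestamp",
        created_timestamp_column="created",
    )

    # Feature view: which columns of the source are exposed as features.
    view = FeatureView(
        name="driver_hourly_stats",
        entities=[driver],
        ttl=timedelta(days=1),  # assumption: value used by the default template
        schema=[
            Field(name="conv_rate", dtype=Float32),
            Field(name="acc_rate", dtype=Float32),
            Field(name="avg_daily_trips", dtype=Int64),
        ],
        source=source,
    )
    return driver, source, view
```

In a real feature repository these objects are normally defined at module level in `example_repo.py` so that `feast apply` can discover them.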

In [9]:
#driver_hourly_stats_view

# Feature Store

In [10]:
feature_store = FeatureStore(repo_path=".")

## Query Columns from Offline Store

* [get_historical_features](https://rtd.feast.dev/en/master/#feast.feature_store.FeatureStore.get_historical_features)

> This method joins historical feature data from one or more feature views to an entity dataframe by using a time travel join. Each feature view is joined to the entity dataframe using all entities configured for the respective feature view.
>
> **Parameters**  
> * ```entity_df```: a collection of rows containing all entity columns (e.g., driver_id) on which features need to be joined, as well as a event_timestamp column used to ensure point-in-time correctness.
> 
> **Returns**: RetrievalJob which can be used to materialize the results.

* [RetrievalJob](https://rtd.feast.dev/en/master/#feast.infra.offline_stores.offline_store.RetrievalJob)

> A RetrievalJob manages the execution of a query to retrieve data from the offline store.  
> **Methods**  
> * [to_df](https://rtd.feast.dev/en/master/#feast.infra.offline_stores.offline_store.RetrievalJob.to_df): 
> Synchronously executes the underlying query and returns the result as a pandas dataframe. On demand transformations will be executed. 

What is ```event_timestamp``` in ```entity_df```? There is no record in the data source that matches ```(driver_id, event_timestamp)==(1001, datetime(2021, 4, 12, 10, 59, 42))```.

* [FEAST Feature Store - What is event_timestamp in entity_df parameter of FeatureStore.get_historical_features method](https://stackoverflow.com/q/79714277/4281353)
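
To make the "time travel join" concrete, here is a pure-Python sketch of point-in-time lookup semantics for a single entity: take the latest feature row at or before the requested timestamp, and discard it if it is older than the feature view's `ttl`. The function name, the feature values and the exact treatment of the `ttl` boundary are illustrative assumptions, not Feast's implementation.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone


def point_in_time_lookup(feature_rows, entity_ts, ttl):
    """feature_rows: list of (event_timestamp, value) sorted by timestamp, for one entity."""
    timestamps = [ts for ts, _ in feature_rows]
    # Index of the last row whose timestamp is <= entity_ts.
    idx = bisect_right(timestamps, entity_ts) - 1
    if idx < 0:
        return None  # no feature row exists yet at that time
    ts, value = feature_rows[idx]
    if entity_ts - ts > ttl:
        return None  # the latest row is too old to be used
    return value


utc = timezone.utc
rows = [  # made-up feature values for one driver
    (datetime(2025, 7, 22, 14, tzinfo=utc), 0.45),
    (datetime(2025, 7, 22, 15, tzinfo=utc), 0.61),
]
ttl = timedelta(days=1)

exact = point_in_time_lookup(rows, datetime(2025, 7, 22, 14, tzinfo=utc), ttl)
between = point_in_time_lookup(rows, datetime(2025, 7, 22, 14, 30, tzinfo=utc), ttl)
before = point_in_time_lookup(rows, datetime(2025, 7, 22, 13, tzinfo=utc), ttl)
stale = point_in_time_lookup(rows, datetime(2025, 7, 24, 16, tzinfo=utc), ttl)
print(exact, between, before, stale)
```

Under these semantics `event_timestamp` in `entity_df` is the "as of" time of the query, so it does not have to equal any timestamp stored in the data source.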

In [11]:
entity_df = pd.DataFrame.from_dict(
    {
        "driver_id": [1001],
        "event_timestamp": [
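            # Point-in-time join: Feast uses the latest feature row at or before this
            # timestamp (within the feature view's ttl); an exact match is not required.
            datetime(2025, 7, 22, 14, 0, 0),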
            #datetime.now()
        ],
    }
)

In [12]:
#entity_df["event_timestamp"] = pd.to_datetime("now", utc=True)
training_df = feature_store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

In [13]:
training_df

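|   | driver_id | event_timestamp           | conv_rate | acc_rate | avg_daily_trips |
|---|-----------|---------------------------|-----------|----------|-----------------|
| 0 | 1001      | 2025-07-22 14:00:00+00:00 | 0.453962  | 0.325967 | 502             |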


### Use SQL as entity_df

* [Example: entity SQL query for generating training data](https://docs.feast.dev/getting-started/concepts/feature-retrieval#example-entity-sql-query-for-generating-training-data)

It looks like this is not implemented for ```FileSource``` in FEAST. See the inquiry on [Feast Slack](https://feastopensource.slack.com/archives/C01M2GYP0UC/p1754792525332979).

```
File feast/infra/offline_stores/file_source.py:228, in FileSource.get_table_query_string(self)
    227 def get_table_query_string(self) -> str:
--> 228     raise NotImplementedError
```


In [17]:

# SQL query for entity_df (example using DuckDB or BigQuery as the offline store)
entity_df_sql = f"""
SELECT
    driver_id,
    event_timestamp
FROM {feature_store.get_data_source("driver_hourly_stats_source").get_table_query_string()}
WHERE driver_id IS NOT NULL
LIMIT 100
"""

training_df = feature_store.get_historical_features(
    entity_df=entity_df_sql,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

print(training_df.head())


NotImplementedError: 
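
Since the file-based source raises `NotImplementedError` here, one possible workaround is to build `entity_df` as a pandas DataFrame from the same data instead of passing SQL. The helper `build_entity_df` below is made up for illustration; it mirrors the `SELECT ... WHERE driver_id IS NOT NULL LIMIT 100` query on a DataFrame that is assumed to have `driver_id` and `event_timestamp` columns.

```python
def build_entity_df(source_df, limit=100):
    """Keep rows that have a driver_id, select the entity key and timestamp, cap the rows."""
    has_driver = source_df["driver_id"].notna()
    entity_df = source_df.loc[has_driver, ["driver_id", "event_timestamp"]]
    return entity_df.head(limit).reset_index(drop=True)


# Usage in this notebook (assumes driver_stats_df and feature_store from earlier cells):
# entity_df = build_entity_df(driver_stats_df)
# training_df = feature_store.get_historical_features(
#     entity_df=entity_df,
#     features=["driver_hourly_stats:conv_rate"],
# ).to_df()
```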

## Query Columns from Online Store

In [14]:
driver_stats_fs = feature_store.get_feature_service("driver_activity_v1")

feature_vector = feature_store.get_online_features(
    features=driver_stats_fs,
    entity_rows=[
        {
            "driver_id": 1001,
        }
    ]
).to_df()

In [15]:
feature_vector

|   | driver_id | conv_rate | acc_rate | avg_daily_trips |
|---|-----------|-----------|----------|-----------------|
| 0 | 1001      |           |          |                 |
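
The feature values above are empty. One likely explanation (an assumption, not verified against this run) is that nothing has been materialised into the online store yet: `feast apply` creates the online table but does not load data into it. The sketch below loads the latest values with `materialize_incremental` and then repeats the lookup. The helper name `materialize_and_fetch` is made up, and `store` is expected to be a `FeatureStore`.

```python
from datetime import datetime, timezone


def materialize_and_fetch(store, features, entity_rows):
    """Load the latest feature values into the online store, then read them back."""
    # Copy feature values up to "now" from the offline store into the online store.
    store.materialize_incremental(end_date=datetime.now(timezone.utc))
    response = store.get_online_features(features=features, entity_rows=entity_rows)
    return response.to_dict()


# Usage in this notebook (assumes feature_store and driver_stats_fs from earlier cells):
# materialize_and_fetch(feature_store, driver_stats_fs, [{"driver_id": 1001}])
```

Values may still come back empty if the newest rows in the source are older than the feature view's `ttl` at query time.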
