# FEAST - How to initialise programmatically

Instead of using feature_store.yaml and feast CLI, use SDK.


In [1]:
%%html
<style>
table {float:left}
</style>

In [3]:
import subprocess
from datetime import (
    datetime,
    timedelta
)

import pandas as pd
from feast import (
    Entity,
    FeatureService,
    FeatureView,
    Field,
    FileSource,
    Project,
    PushSource,
    RequestSource,
    RepoConfig,
)
from feast.repo_config import (
    RegistryConfig
)
from feast.feature_logging import LoggingConfig
from feast.infra.offline_stores.file_source import FileLoggingDestination
from feast.infra.online_stores.sqlite import (
    SqliteOnlineStoreConfig
)
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import (
    Float32, Float64, Int64
)
from feast import FeatureStore
from feast.data_source import PushMode

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# FEAST Project

## Create Project 

Like ```git init <directory>```, ```feast init <project_directory<>``` creates the blueprint or skeleton of your feature store.

```
my_project/feature_repo
├── data
│   └── driver_stats.parquet
├── example_repo.py
└── feature_store.yaml
```


In [4]:
# !feast init my_project
%cd my_project/feature_repo
%pwd

/Users/onishima/Documents/home/repository/git/FEAST/feast/my_project/feature_repo


'/Users/onishima/Documents/home/repository/git/FEAST/feast/my_project/feature_repo'

### Project as Namespace

* [FEAST Project](https://docs.feast.dev/getting-started/concepts/project)

> Projects provide complete isolation of feature stores at the infrastructure level. This is accomplished through **resource namespacing, e.g., prefixing table names with the associated project**. Each project should be considered a completely separate universe of entities and features. 

### Project Configuration

```feast configuration``` shows the project configurations defined in ```feature_store.yaml```.

The following top-level configuration options exist in the feature_store.yaml file.

| Item          | Description                                                                | Value                                                  |
|---------------|----------------------------------------------------------------------------|--------------------------------------------------------|
| project       | a namespace for the entire feature store.                                  |                                                        |
| provider      | provider is an implementation of a feature store, like Terraform provider. | local aws gcp                                          |
| registry      | central catalog of all feature definitions and their related metadata.     | data/registry.db s3://feast-test-s3-bucket/registry.pb |
| online_store  | Low latency feature server implementation.                                 | ```type: dynamodb ```               |
| offline_store | Computation Engine for Transformation and Materialisation.                 | ```type: redshift ```              |

## Deploy Project



In [5]:
!feast teardown
# !feast apply

## Repository

In [6]:
repo_config = RepoConfig(
    registry=RegistryConfig(path="data/registry.db"),
    project="my_project",
    provider="local",
    offline_store="file",  # Could also be the OfflineStoreConfig e.g. FileOfflineStoreConfig
    online_store=SqliteOnlineStoreConfig(path='data/online_store.db')
)

## File Data Source

In [7]:
# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_hourly_stats = FileSource(
    name="driver_hourly_stats_source",
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

## Entity

In [None]:
# Define an entity for the driver. You can think of an entity as a primary key used to
# fetch features.
driver = Entity(name="driver", join_keys=["driver_id"], value_type=Int64)

## Feature View

In [8]:
# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature column. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
    # The unique name of this feature view. Two feature views in a single
    # project cannot have the same name
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    # The list of features defined below act as a schema to both define features
    # for both materialization of features into a store, and are used as references
    # during retrieval for building a training dataset or serving features
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64, description="Average daily trips"),
    ],
    online=True,
    source=driver_hourly_stats,           # <--- Link to the raw data storage technology
    # Tags are user defined key/value pairs that are attached to each
    # feature view
    tags={"team": "driver_performance"},
)

  driver = Entity(name="driver", join_keys=["driver_id"])


## Feature Service

In [9]:
driver_activity_fs_v1 = FeatureService(
    name="driver_activity_v1",
    features=[
        # driver_hourly_stats_view[["conv_rate", "acc_rate"]],  # Sub-selects a feature from a feature view
        driver_hourly_stats_view,
    ],
    logging_config=LoggingConfig(
        destination=FileLoggingDestination(path="data")
    ),
)

## Feature Store

In [None]:
feature_store = FeatureStore(config=repo_config)

## Registration

In [11]:
feature_store.apply(objects=[
    driver_hourly_stats,       # Data source
    driver,                    # Entity
    driver_hourly_stats_view,  # Feature View
    driver_activity_fs_v1      # Feature Service
])

## Result

In [12]:
!feast configuration

project: my_project
provider: local
registry: data/registry.db
online_store:
  type: sqlite
  path: data/online_store.db
auth:
  type: no_auth
offline_store:
  type: file
batch_engine: local
entity_key_serialization_version: 3



In [13]:
!feast feature-views list

NAME                 ENTITIES    TYPE
driver_hourly_stats  {'driver'}  FeatureView


In [14]:
!feast entities list

NAME    DESCRIPTION    TYPE
driver                 ValueType.UNKNOWN


---

# Data

In [15]:
driver_stats_df = pd.read_parquet("data/driver_stats.parquet")

print(driver_stats_df.info())
driver_stats_df.head(5)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1807 entries, 0 to 1806
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype              
---  ------           --------------  -----              
 0   event_timestamp  1807 non-null   datetime64[ns, UTC]
 1   driver_id        1807 non-null   int64              
 2   conv_rate        1807 non-null   float32            
 3   acc_rate         1807 non-null   float32            
 4   avg_daily_trips  1807 non-null   int32              
 5   created          1807 non-null   datetime64[us]     
dtypes: datetime64[ns, UTC](1), datetime64[us](1), float32(2), int32(1), int64(1)
memory usage: 63.7 KB
None


Unnamed: 0,event_timestamp,driver_id,conv_rate,acc_rate,avg_daily_trips,created
0,2025-07-22 14:00:00+00:00,1005,0.80306,0.44058,397,2025-08-06 14:28:04.645
1,2025-07-22 15:00:00+00:00,1005,0.247837,0.249946,313,2025-08-06 14:28:04.645
2,2025-07-22 16:00:00+00:00,1005,0.63339,0.618245,206,2025-08-06 14:28:04.645
3,2025-07-22 17:00:00+00:00,1005,0.227286,0.701076,600,2025-08-06 14:28:04.645
4,2025-07-22 18:00:00+00:00,1005,0.595457,0.991147,545,2025-08-06 14:28:04.645


## Query Columns from Offline Store

* [get_historical_features](https://rtd.feast.dev/en/master/#feast.feature_store.FeatureStore.get_historical_features)

> This method joins historical feature data from one or more feature views to an entity dataframe by using a time travel join. Each feature view is joined to the entity dataframe using all entities configured for the respective feature view.
>
> **Parameters**  
> * ```entity_df```: a collection of rows containing all entity columns (e.g., driver_id) on which features need to be joined, as well as a event_timestamp column used to ensure point-in-time correctness.
> 
> **Returns**: RetrievalJob which can be used to materialize the results.

* [RetrievalJob](https://rtd.feast.dev/en/master/#feast.infra.offline_stores.offline_store.RetrievalJob)

> A RetrievalJob manages the execution of a query to retrieve data from the offline store.  
> **Methods**  
> * [to_df](https://rtd.feast.dev/en/master/#feast.infra.offline_stores.offline_store.RetrievalJob.to_df): 
> Synchronously executes the underlying query and returns the result as a pandas dataframe. On demand transformations will be executed. 

* [FEAST Feature Store - What is event_timestamp in entity_df parameter of FeatureStore.get_historical_features method](https://stackoverflow.com/q/79714277/4281353)

In [25]:
entity_df = pd.DataFrame.from_dict(
    {
        "driver_id": [1001],
        "event_timestamp": [
            datetime(2025, 7, 22, 14, 00, 00),   # Need to be exact value match
            # datetime.now()
        ],
    }
)

In [26]:
#entity_df["event_timestamp"] = pd.to_datetime("now", utc=True)
training_df = feature_store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

In [27]:
training_df

Unnamed: 0,driver_id,event_timestamp,conv_rate,acc_rate,avg_daily_trips
0,1001,2025-07-22 14:00:00+00:00,0.453962,0.325967,502


### Use SQL as entity_df

* [Example: entity SQL query for generating training data](https://docs.feast.dev/getting-started/concepts/feature-retrieval#example-entity-sql-query-for-generating-training-data)

It looks the function is not implemented in FEAST. Inquiry [Feaset Slack Question](https://feastopensource.slack.com/archives/C01M2GYP0UC/p1754792525332979).

```
File feast/infra/offline_stores/file_source.py:228, in FileSource.get_table_query_string(self)
    227 def get_table_query_string(self) -> str:
--> 228     raise NotImplementedError
```


In [30]:

# SQL query for entity_df (example using DuckDB or BigQuery as the offline store)
# entity_df_sql = f"""
# SELECT
#     driver_id,
#     event_timestamp
# FROM {feature_store.get_data_source("driver_hourly_stats_source").get_table_query_string()}
# WHERE driver_id IS NOT NULL
# LIMIT 100
# """
entity_df_sql = f"""
SELECT
    driver_id,
    event_timestamp
FROM driver_hourly_stats
WHERE driver_id IS NOT NULL
LIMIT 100
"""
training_df = feature_store.get_historical_features(
    entity_df=entity_df_sql,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

print(training_df.head())


ValueError: Please provide an entity_df of type <class 'type'> instead of type <class 'str'>

## Query Columns from Online Store

Need to run the materialization command to get columns from the online store.

### Materialise Online Store

In [44]:
#!feast materialize-incremental 2025-08-22T14:00:00
feature_store.materialize(
    start_date=datetime.strptime("2000-08-22T14:00:00", "%Y-%m-%dT%H:%M:%S"),
    end_date=datetime.strptime("2025-08-22T14:00:00", "%Y-%m-%dT%H:%M:%S")
)

Materializing [1m[32m1[0m feature views from [1m[32m2000-08-22 14:00:00+00:00[0m to [1m[32m2025-08-22 14:00:00+00:00[0m into the [1m[32msqlite[0m online store.

[1m[32mdriver_hourly_stats[0m:


In [45]:
driver_stats_fs = feature_store.get_feature_service("driver_activity_v1")

feature_vector = feature_store.get_online_features(
    features=driver_stats_fs,
    entity_rows=[
        {
            "driver_id": 1001,
        }
    ]
).to_df()

In [46]:
feature_vector

Unnamed: 0,driver_id,avg_daily_trips,conv_rate,acc_rate
0,1001,828,0.86841,0.784907


In [47]:
features = feature_store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[
        {
            "driver_id": 1001,
        }
    ],
).to_df()

In [48]:
features

Unnamed: 0,driver_id,avg_daily_trips,conv_rate,acc_rate
0,1001,828,0.86841,0.784907
