# FEAST Get Started

* [FEAST Quickstart](https://docs.feast.dev/getting-started/quickstart)

Same as [quickstart.ipynb](https://github.com/feast-dev/feast/blob/master/examples/quickstart/quickstart.ipynb).

# Data

In [34]:
import subprocess
from datetime import datetime

import pandas as pd

from feast import FeatureStore
from feast.data_source import PushMode

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [26]:
driver_stats_df = pd.read_parquet("data/driver_stats.parquet")

print(driver_stats_df.info())
driver_stats_df.head(5)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1807 entries, 0 to 1806
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype              
---  ------           --------------  -----              
 0   event_timestamp  1807 non-null   datetime64[ns, UTC]
 1   driver_id        1807 non-null   int64              
 2   conv_rate        1807 non-null   float32            
 3   acc_rate         1807 non-null   float32            
 4   avg_daily_trips  1807 non-null   int32              
 5   created          1807 non-null   datetime64[us]     
dtypes: datetime64[ns, UTC](1), datetime64[us](1), float32(2), int32(1), int64(1)
memory usage: 63.7 KB
None


| | event_timestamp | driver_id | conv_rate | acc_rate | avg_daily_trips | created |
|---|---|---|---|---|---|---|
| 0 | 2025-07-10 12:00:00+00:00 | 1005 | 0.332461 | 0.62452 | 163 | 2025-07-25 12:53:26.095 |
| 1 | 2025-07-10 13:00:00+00:00 | 1005 | 0.980694 | 0.55647 | 474 | 2025-07-25 12:53:26.095 |
| 2 | 2025-07-10 14:00:00+00:00 | 1005 | 0.895391 | 0.477705 | 306 | 2025-07-25 12:53:26.095 |
| 3 | 2025-07-10 15:00:00+00:00 | 1005 | 0.802549 | 0.86371 | 352 | 2025-07-25 12:53:26.095 |
| 4 | 2025-07-10 16:00:00+00:00 | 1005 | 0.065515 | 0.09165 | 488 | 2025-07-25 12:53:26.095 |


# FEAST Project

In [53]:
! feast configuration

project: my_project
provider: local
registry: data/registry.db
online_store:
  type: sqlite
  path: data/online_store.db
auth:
  type: no_auth
offline_store: dask
batch_engine: local
entity_key_serialization_version: 3



In [54]:
! feast apply

  driver = Entity(name="driver", join_keys=["driver_id"])
Applying changes for project my_project
Created project my_project
Created entity driver
Created feature view driver_hourly_stats_fresh
Created feature view driver_hourly_stats
Created on demand feature view transformed_conv_rate
Created on demand feature view transformed_conv_rate_fresh
Created feature service driver_activity_v2
Created feature service driver_activity_v1
Created feature service driver_activity_v3

Created sqlite table my_project_driver_hourly_stats_fresh
Created sqlite table my_project_driver_hourly_stats



```
Created project my_project
Created entity driver
Created feature view driver_hourly_stats
Created feature view driver_hourly_stats_fresh
Created on demand feature view transformed_conv_rate_fresh
Created on demand feature view transformed_conv_rate
Created feature service driver_activity_v1
Created feature service driver_activity_v3
Created feature service driver_activity_v2

WARNING:root:Cannot use sqlite_vec for vector search
WARNING:root:Cannot use sqlite_vec for vector search
Created sqlite table my_project_driver_hourly_stats_fresh
Created sqlite table my_project_driver_hourly_stats
```

In [55]:
!feast feature-views list

NAME                         ENTITIES    TYPE
driver_hourly_stats_fresh    {'driver'}  FeatureView
driver_hourly_stats          {'driver'}  FeatureView
transformed_conv_rate        {'driver'}  OnDemandFeatureView
transformed_conv_rate_fresh  {'driver'}  OnDemandFeatureView


In [56]:
!feast entities list

NAME    DESCRIPTION    TYPE
driver                 ValueType.UNKNOWN


---
# Feature View

The feature view is defined in ```feature_repo/example_repo.py```.


* [Feature view](https://docs.feast.dev/master/getting-started/concepts/feature-view)

> In the offline setting, Feature View is a stateless collection of features that are created when the [get_historical_features](https://rtd.feast.dev/en/master/#feast.feature_store.FeatureStore.get_historical_features) method is called.



In [None]:
```
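from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Define an entity for the driver. You can think of an entity as a primary key used to
# fetch features.
driver = Entity(name="driver", join_keys=["driver_id"])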

# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_stats_source = FileSource(
    name="driver_hourly_stats_source",
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature columns. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_stats_fv = FeatureView(
    # The unique name of this feature view. Two feature views in a single
    # project cannot have the same name
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    # The list of fields below acts as a schema: it defines the features that are
    # materialized into a store, and it is used as a reference during retrieval
    # when building a training dataset or serving features
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64, description="Average daily trips"),
    ],
    online=True,
    source=driver_stats_source,
    # Tags are user defined key/value pairs that are attached to each
    # feature view
    tags={"team": "driver_performance"},
)
```

NAME                         ENTITIES    TYPE
driver_hourly_stats          {'driver'}  FeatureView
driver_hourly_stats_fresh    {'driver'}  FeatureView
transformed_conv_rate_fresh  {'driver'}  OnDemandFeatureView
transformed_conv_rate        {'driver'}  OnDemandFeatureView


## Select Features



* [get_historical_features](https://rtd.feast.dev/en/master/#feast.feature_store.FeatureStore.get_historical_features)

> This method joins historical feature data from one or more feature views to an entity dataframe by using a time travel join. Each feature view is joined to the entity dataframe using all entities configured for the respective feature view.
>
> **Parameters**  
> * ```entity_df```: a collection of rows containing all entity columns (e.g., driver_id) on which features need to be joined, as well as an event_timestamp column used to ensure point-in-time correctness.
> 
> **Returns**: RetrievalJob which can be used to materialize the results.

* [RetrievalJob](https://rtd.feast.dev/en/master/#feast.infra.offline_stores.offline_store.RetrievalJob)

> A RetrievalJob manages the execution of a query to retrieve data from the offline store.  
> **Methods**  
> * [to_df](https://rtd.feast.dev/en/master/#feast.infra.offline_stores.offline_store.RetrievalJob.to_df): 
> Synchronously executes the underlying query and returns the result as a pandas dataframe. On demand transformations will be executed. 

In [17]:
store = FeatureStore(repo_path=".")

What is ```event_timestamp``` in ```entity_df```? There is no record in the data source that matches ```(driver_id, event_timestamp)==(1001, datetime(2021, 4, 12, 10, 59, 42))```.

* [FEAST Feature Store - What is event_timestamp in entity_df parameter of FeatureStore.get_historical_features method](https://stackoverflow.com/q/79714277/4281353)

In [48]:
driver_stats_df[
    (driver_stats_df['driver_id'] == 1001) &
    (driver_stats_df['event_timestamp'] == pd.to_datetime(datetime(2021, 4, 12, 10, 59, 42), utc=True))
]

| | event_timestamp | driver_id | conv_rate | acc_rate | avg_daily_trips | created |
|---|---|---|---|---|---|---|
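
One way to read the empty result above: ```event_timestamp``` in ```entity_df``` is not an exact-match key. In a point-in-time join, each entity row is matched to the most recent feature row at or before its timestamp, as long as that row is no older than the feature view's ```ttl```. The sketch below illustrates the idea in plain Python. The function ```point_in_time_lookup```, the sample rows, and the inclusive boundary handling are assumptions made for illustration, not Feast's implementation.

```python
from datetime import datetime, timedelta, timezone


def point_in_time_lookup(feature_rows, driver_id, entity_ts, ttl):
    """Return the latest row for driver_id in [entity_ts - ttl, entity_ts], or None."""
    candidates = [
        row
        for row in feature_rows
        if row["driver_id"] == driver_id
        and entity_ts - ttl <= row["event_timestamp"] <= entity_ts
    ]
    return max(candidates, key=lambda row: row["event_timestamp"], default=None)


# Hypothetical hourly feature rows (illustrative values, not from driver_stats.parquet).
feature_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 9, 0, tzinfo=timezone.utc), "conv_rate": 0.1},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 10, 0, tzinfo=timezone.utc), "conv_rate": 0.2},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 11, 0, tzinfo=timezone.utc), "conv_rate": 0.3},
]

# No feature row has exactly this timestamp, yet the 10:00 row is selected.
entity_ts = datetime(2021, 4, 12, 10, 59, 42, tzinfo=timezone.utc)
match = point_in_time_lookup(feature_rows, 1001, entity_ts, ttl=timedelta(days=1))
print(match["event_timestamp"], match["conv_rate"])
```

The 11:00 row is skipped because it lies after the entity timestamp; using it would leak future data into a training set. A timestamp more than one ```ttl``` after the newest feature row yields no match in this sketch.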


In [18]:
def fetch_historical_features_entity_df(store: FeatureStore, for_batch_scoring: bool):
    # Note: see https://docs.feast.dev/getting-started/concepts/feature-retrieval for more details on how to retrieve
    # for all entities in the offline store instead
    entity_df = pd.DataFrame.from_dict(
        {
            # entity's join key -> entity values
            "driver_id": [1001, 1002, 1003],
            # "event_timestamp" (reserved key) -> timestamps
            "event_timestamp": [
                datetime(2021, 4, 12, 10, 59, 42),
                datetime(2021, 4, 12, 8, 12, 10),
                datetime(2021, 4, 12, 16, 40, 26),
            ],
            # (optional) label name -> label values. Feast does not process these
            "label_driver_reported_satisfaction": [1, 5, 3],
            # values we're using for an on-demand transformation
            "val_to_add": [1, 2, 3],
            "val_to_add_2": [10, 20, 30],
        }
    )
    # For batch scoring, we want the latest timestamps
    if for_batch_scoring:
        entity_df["event_timestamp"] = pd.to_datetime("now", utc=True)

    # Join historical feature values onto each entity row and return a pandas DataFrame
    training_df = store.get_historical_features(
        entity_df=entity_df,
        features=[
            "driver_hourly_stats:conv_rate",
            "driver_hourly_stats:acc_rate",
            "driver_hourly_stats:avg_daily_trips",
            "transformed_conv_rate:conv_rate_plus_val1",
            "transformed_conv_rate:conv_rate_plus_val2",
        ],
    ).to_df()
    return training_df

In [20]:
df = fetch_historical_features_entity_df(store=store, for_batch_scoring=False)

In [22]:
df

| | driver_id | event_timestamp | label_driver_reported_satisfaction | val_to_add | val_to_add_2 | conv_rate | acc_rate | avg_daily_trips | conv_rate_plus_val1 | conv_rate_plus_val2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 2021-04-12 10:59:42+00:00 | 1 | 1 | 10 | 0.941475 | 0.381865 | 607 | 1.941475 | 10.941475 |
| 1 | 1002 | 2021-04-12 08:12:10+00:00 | 5 | 2 | 20 | 0.980638 | 0.508142 | 874 | 2.980638 | 20.980638 |
| 2 | 1003 | 2021-04-12 16:40:26+00:00 | 3 | 3 | 30 | 0.815404 | 0.601665 | 100 | 3.815404 | 30.815404 |
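
The last two columns are consistent with the on-demand feature view adding ```val_to_add``` and ```val_to_add_2``` to ```conv_rate```. The check below recomputes them from the values in the output above. Treating the transformation as a plain sum is an assumption inferred from those numbers, and the helper name ```add_on_demand_columns``` is hypothetical.

```python
def add_on_demand_columns(row):
    """Return a copy of row with the two derived columns (assumed to be plain sums)."""
    out = dict(row)
    out["conv_rate_plus_val1"] = row["conv_rate"] + row["val_to_add"]
    out["conv_rate_plus_val2"] = row["conv_rate"] + row["val_to_add_2"]
    return out


# Input values copied from the retrieved DataFrame.
rows = [
    {"driver_id": 1001, "conv_rate": 0.941475, "val_to_add": 1, "val_to_add_2": 10},
    {"driver_id": 1002, "conv_rate": 0.980638, "val_to_add": 2, "val_to_add_2": 20},
    {"driver_id": 1003, "conv_rate": 0.815404, "val_to_add": 3, "val_to_add_2": 30},
]

results = [add_on_demand_columns(row) for row in rows]
for r in results:
    # Round to six decimals to match the precision shown in the DataFrame output.
    print(r["driver_id"], round(r["conv_rate_plus_val1"], 6), round(r["conv_rate_plus_val2"], 6))
```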
