# Quick start

## Step 1: Install Feast
In this tutorial, we focus on a local deployment. For a more in-depth guide on how to use Feast with
Snowflake / GCP / AWS deployments, see [Running Feast with Snowflake/GCP/AWS](https://docs.feast.dev/how-to-guides/feast-snowflake-gcp-aws).

Install the Feast SDK and CLI using pip (or Poetry, if your project uses it):

```shell
pip install feast

# or, in a Poetry-managed project
poetry add feast
```

## Step 2: Create a feature repository
Bootstrap a new feature repository using `feast init` from the command line.

```shell
feast init feature_repo
cd feature_repo
```

This creates a `feature_repo` directory in your current location. In my case, it created the following directory:

```shell
/home/pliu/git/FeatureEngineering/feature_stores/01.Feast/01.QuickStart/feature_repo

# when you open it, it has the following contents
├── data
│   └── driver_stats.parquet
├── example.py
├── feature_store.yaml
└── __init__.py

```

- `data/` contains raw demo parquet data
- `example.py` contains demo feature definitions
- `feature_store.yaml` contains a demo setup configuring where data sources are

### Explore feature_store.yaml
Below is the content of the auto-generated `feature_store.yaml`:

```yaml
project: feature_repo
registry: data/registry.db
provider: local
online_store:
    path: data/online_store.db
```

The most important setting is the **provider**. It defines where the raw data lives
(for generating training data and feature values for serving) and where feature values are materialized to in the
online store (for serving).

Valid values for provider in feature_store.yaml are:
- local: use file source with SQLite/Redis
- gcp: use BigQuery/Snowflake with Google Cloud Datastore/Redis
- aws: use Redshift/Snowflake with DynamoDB/Redis

Note that Feast works with many other data sources, including Azure, Hive, Trino, and PostgreSQL, via community
plugins. See [Third party integrations](https://docs.feast.dev/getting-started/third-party-integrations) for all supported data sources.

A custom setup is also possible by following [adding a custom provider](https://docs.feast.dev/how-to-guides/creating-a-custom-provider).

### Inspecting raw data

The raw feature data we have in this demo is stored in a local parquet file. The dataset captures hourly stats of a driver in a ride-sharing app.

```python
import pandas as pd

data_path = "feature_repo/data/driver_stats.parquet"
df = pd.read_parquet(data_path)

df.head()
```

| | event_timestamp | driver_id | conv_rate | acc_rate | avg_daily_trips | created |
|---|---|---|---|---|---|---|
| 0 | 2022-06-07 13:00:00+00:00 | 1005 | 0.913727 | 0.034992 | 655 | 2022-06-22 13:50:22.688 |
| 1 | 2022-06-07 14:00:00+00:00 | 1005 | 0.508678 | 0.651014 | 38 | 2022-06-22 13:50:22.688 |
| 2 | 2022-06-07 15:00:00+00:00 | 1005 | 0.896986 | 0.741025 | 788 | 2022-06-22 13:50:22.688 |
| 3 | 2022-06-07 16:00:00+00:00 | 1005 | 0.189035 | 0.729997 | 894 | 2022-06-22 13:50:22.688 |
| 4 | 2022-06-07 17:00:00+00:00 | 1005 | 0.27172 | 0.254235 | 91 | 2022-06-22 13:50:22.688 |


## Step 3: Register feature definitions and deploy your feature store

### 3.1 Extract the features from the raw data

Below is the complete Python script that extracts three features from the raw data:

```python
# This is an example feature definition file

from datetime import timedelta
from feast import Entity, FeatureService, FeatureView, Field, FileSource, ValueType
from feast.types import Float32, Int64

#################### Step 1: Define the data source #################################
# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_hourly_stats = FileSource(
    path="/home/pliu/git/FeatureEngineering/feature_stores/01.Feast/01.QuickStart/feature_repo/data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

################## Step 2: Define the data entity ##################################
# Define an entity for the driver. You can think of entity as a primary key used to
# fetch features.
driver = Entity(name="driver", join_keys=["driver_id"], value_type=ValueType.INT64,)


################### Step 3: Define the feature view ###############################
# A feature view contains:
# - name: the name of the view
# - entities: a list of entities whose join keys are used to fetch features
# - schema: a list of fields that describe the feature columns
# - source: the FileSource (raw source data) used to build the feature view

# Our parquet file contains sample data that includes a driver_id column, timestamps, and
# three feature columns. Here we define a feature view that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver"],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    online=True,
    source=driver_hourly_stats,
    tags={},
)

################## Step 4: Define the feature service ############################
# Note that a feature service can contain multiple feature views
driver_stats_fs = FeatureService(
    name="driver_activity", features=[driver_hourly_stats_view]
)


```

### 3.2 Register the features

To register the feature definitions from `example.py` (the Python script above), run the following command from inside the feature repository directory:

```shell
feast apply
```

This command scans the Python files (e.g. `example.py`) in the current directory for feature view and entity definitions, registers the objects, and deploys infrastructure. In this example, it creates SQLite online store tables, because the `local` provider set in `feature_store.yaml` uses SQLite as the default online store.

You should see the following output after running `feast apply`:

```text
Created entity driver
Created feature view driver_hourly_stats
Created feature service driver_activity

Created sqlite table feature_repo_driver_hourly_stats

```
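To give a rough idea of what "scanning Python files for definitions" means, here is a minimal sketch that collects module-level objects by type. The `Entity`, `FeatureView`, and `collect_definitions` names below are stand-ins defined only for this illustration. They are not Feast's classes, and Feast's real discovery and registration logic is more involved.

```python
from dataclasses import dataclass
from types import ModuleType


# Hypothetical stand-ins for illustration; these are NOT Feast's classes.
@dataclass(frozen=True)
class Entity:
    name: str


@dataclass(frozen=True)
class FeatureView:
    name: str


def collect_definitions(module: ModuleType) -> dict:
    """Group module-level objects by the definition type they belong to."""
    found = {"entities": [], "feature_views": []}
    for value in vars(module).values():
        if isinstance(value, Entity):
            found["entities"].append(value.name)
        elif isinstance(value, FeatureView):
            found["feature_views"].append(value.name)
    return found


# Build a throwaway module that plays the role of example.py.
example = ModuleType("example")
example.driver = Entity(name="driver")
example.driver_hourly_stats_view = FeatureView(name="driver_hourly_stats")
example.unrelated_constant = 42  # ignored: not a definition object

definitions = collect_definitions(example)
print(definitions)
```

The point of the sketch is only that definitions are ordinary Python objects living at module level, which is why declaring them in a file inside the repository is enough for them to be picked up.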

## Step 4: Generating training data

To train a model, we need features and labels. Often, this label data is stored separately (e.g. you have one table storing user survey results and another set of tables with feature values).

You can query that table of labels with timestamps and pass the result into Feast as an entity dataframe for training data generation. In many cases, Feast will also intelligently join relevant tables to create the relevant feature vectors.

- Note that we include timestamps because we want the features for the same driver at various timestamps to be used in a model.

```python
from datetime import datetime, timedelta
import pandas as pd

from feast import FeatureStore

# The entity dataframe is the dataframe we want to enrich with feature values
entity_df = pd.DataFrame.from_dict(
    {
        # entity's join key -> entity values
        "driver_id": [1001, 1002, 1003],

        # label name -> label values
        "label_driver_reported_satisfaction": [1, 5, 3],

        # "event_timestamp" (reserved key) -> timestamps
        "event_timestamp": [
            datetime.now() - timedelta(minutes=11),
            datetime.now() - timedelta(minutes=36),
            datetime.now() - timedelta(minutes=73),
        ],
    }
)
store_path = "feature_repo/."
store = FeatureStore(repo_path=store_path)

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

print("----- Feature schema -----\n")
print(training_df.info())

print()
print("----- Example features -----\n")
print(training_df.head())
```



Output of this run (the returned DataFrame is empty here, so only the schema is visible):

```text
----- Feature schema -----

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 0 entries
Data columns (total 6 columns):
 #   Column                              Non-Null Count  Dtype
---  ------                              --------------  -----
 0   driver_id                           0 non-null      int64
 1   label_driver_reported_satisfaction  0 non-null      int64
 2   event_timestamp                     0 non-null      datetime64[ns, UTC]
 3   conv_rate                           0 non-null      float32
 4   acc_rate                            0 non-null      float32
 5   avg_daily_trips                     0 non-null      int32
dtypes: datetime64[ns, UTC](1), float32(2), int32(1), int64(2)
memory usage: 124.0 bytes
None

----- Example features -----

Empty DataFrame
Columns: [driver_id, label_driver_reported_satisfaction, event_timestamp, conv_rate, acc_rate, avg_daily_trips]
Index: []
```
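
The role of the timestamps in the entity dataframe can be illustrated with a small pure-Python sketch of a point-in-time lookup: for each entity row, take the most recent feature value that is not newer than the row's timestamp and not older than the feature view's `ttl`. This is a simplified illustration under assumed semantics, not Feast's implementation. The data, the `point_in_time_lookup` helper, and the choice to return `None` when nothing matches are all made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Toy feature rows: (driver_id, event_timestamp, conv_rate). Values are made up.
T0 = datetime(2022, 6, 22, 12, 0, tzinfo=timezone.utc)
feature_rows = [
    (1001, T0 - timedelta(hours=3), 0.10),
    (1001, T0 - timedelta(hours=1), 0.20),
    (1001, T0 + timedelta(hours=1), 0.30),  # newer than the lookup time T0
    (1002, T0 - timedelta(days=2), 0.40),   # older than the TTL used below
]


def point_in_time_lookup(driver_id, as_of, rows, ttl):
    """Return the latest value with as_of - ttl <= event_timestamp <= as_of, or None."""
    candidates = [
        (ts, value)
        for (row_driver, ts, value) in rows
        if row_driver == driver_id and as_of - ttl <= ts <= as_of
    ]
    if not candidates:
        return None
    # max() on (timestamp, value) tuples picks the most recent timestamp.
    return max(candidates)[1]


ttl = timedelta(days=1)
entity_rows = [(1001, T0), (1002, T0)]
joined = [
    (driver_id, point_in_time_lookup(driver_id, as_of, feature_rows, ttl))
    for driver_id, as_of in entity_rows
]
print(joined)
```

In this toy data, driver 1001 gets the value recorded one hour before the lookup time rather than the later one, which is what prevents future data from leaking into training rows. Driver 1002 gets no value because its only row falls outside the one-day window.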


## Step 5: Load features into your online store

We now serialize the latest values of features since the beginning of time to prepare for serving (note: `materialize-incremental` serializes all new features since the last `materialize` call).

Run the following shell command from inside the feature repository directory:

```shell
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
```

You should see output similar to the following:

```text
Materializing 1 feature views to 2022-06-22 15:45:11+02:00 into the sqlite online store.

driver_hourly_stats from 2022-06-21 13:45:14+02:00 to 2022-06-22 15:45:11+02:00:
100%|█████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 45.08it/s]

```
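
Conceptually, materialization copies the newest feature value per entity from the offline data into a key-value store, and an incremental run only considers rows newer than the end of the previous run. The sketch below mimics that idea with a plain dictionary. It is an illustration under simplifying assumptions (one feature, made-up data, a hypothetical `materialize` helper), not how Feast implements it.

```python
from datetime import datetime, timedelta, timezone

T0 = datetime(2022, 6, 22, 12, 0, tzinfo=timezone.utc)
# Toy offline rows: (driver_id, event_timestamp, conv_rate). Values are made up.
offline_rows = [
    (1004, T0 - timedelta(hours=2), 0.5),
    (1004, T0 - timedelta(hours=1), 0.7),
    (1005, T0 - timedelta(hours=1), 0.8),
    (1005, T0 + timedelta(hours=1), 0.9),
]

online_store = {}  # driver_id -> (event_timestamp, conv_rate)


def materialize(rows, start, end, store):
    """Copy the newest row per driver with start < event_timestamp <= end."""
    for driver_id, ts, value in rows:
        if not (start < ts <= end):
            continue
        current = store.get(driver_id)
        if current is None or ts > current[0]:
            store[driver_id] = (ts, value)
    return end  # the next incremental run starts from here


# First run: everything up to T0.
beginning_of_time = datetime.min.replace(tzinfo=timezone.utc)
last_run = materialize(offline_rows, beginning_of_time, T0, online_store)
first_snapshot = {k: v[1] for k, v in online_store.items()}

# Incremental run: only rows newer than the previous end time.
last_run = materialize(offline_rows, last_run, T0 + timedelta(hours=2), online_store)
second_snapshot = {k: v[1] for k, v in online_store.items()}

print(first_snapshot)
print(second_snapshot)
```

After the first run each driver holds its latest value as of `T0`. The second run only touches driver 1005, whose newer row arrived after the first run's end time.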

## Step 6: Fetching feature vectors for inference

At inference time, we need to quickly read the latest feature values for different drivers (which otherwise might have existed only in batch sources) from the online feature store using `get_online_features()`. These feature vectors can then be fed to the model.

```python
from pprint import pprint
from feast import FeatureStore

store_path = "feature_repo/."  # same repository path as in Step 4
store = FeatureStore(repo_path=store_path)

feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()

pprint(feature_vector)
```

Output:

```text
{'acc_rate': [0.669306755065918, 0.4245249927043915],
 'avg_daily_trips': [147, 578],
 'conv_rate': [0.7321154475212097, 0.7953137755393982],
 'driver_id': [1004, 1005]}
```
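
The shape of this result (one list per feature, aligned with the order of the entity rows) can be reproduced with a toy lookup over a dictionary. The sketch below also shows how a `view:feature` reference splits into a feature view name and a feature name. The store contents, the `lookup` helper, and the `None` returned for an unknown driver are assumptions of this illustration, not a description of Feast internals.

```python
# Toy online store: (feature_view, driver_id) -> latest feature values (made up).
online_store = {
    ("driver_hourly_stats", 1004): {"conv_rate": 0.73, "acc_rate": 0.67},
    ("driver_hourly_stats", 1005): {"conv_rate": 0.80, "acc_rate": 0.42},
}


def lookup(feature_refs, entity_rows, store):
    """Resolve "view:feature" references into a column-oriented dict."""
    result = {"driver_id": [row["driver_id"] for row in entity_rows]}
    for ref in feature_refs:
        view_name, feature_name = ref.split(":", 1)
        result[feature_name] = [
            store.get((view_name, row["driver_id"]), {}).get(feature_name)
            for row in entity_rows
        ]
    return result


vector = lookup(
    ["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
    [{"driver_id": 1004}, {"driver_id": 1005}, {"driver_id": 9999}],
    online_store,
)
print(vector)
```

Each list in the result has one entry per entity row, so position `i` in every column belongs to the same driver.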


## Step 7: Using a feature service to fetch online features instead

You can also use feature services to manage multiple features and to decouple feature view definitions from the features needed by end applications. The feature store can also be used to fetch either online or historical features using the same API below. More information can be found [here](https://docs.feast.dev/getting-started/concepts/feature-retrieval).

```python
from pprint import pprint
from feast import FeatureStore

store_path = "feature_repo/."  # same repository path as in Step 4
feature_store = FeatureStore(store_path)  # Initialize the feature store

feature_service = feature_store.get_feature_service("driver_activity")
features = feature_store.get_online_features(
    features=feature_service,
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()

pprint(features)
```

Output:

```text
{'acc_rate': [0.669306755065918, 0.4245249927043915],
 'avg_daily_trips': [147, 578],
 'conv_rate': [0.7321154475212097, 0.7953137755393982],
 'driver_id': [1004, 1005]}
```
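
One way to think about a feature service is as a named bundle that expands into the individual `view:feature` references an application would otherwise list by hand. The sketch below models that idea with two small dataclasses. `ToyFeatureView` and `ToyFeatureService` are hypothetical names invented for this illustration and are not Feast's classes.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for illustration; these are NOT Feast's classes.
@dataclass(frozen=True)
class ToyFeatureView:
    name: str
    features: tuple


@dataclass(frozen=True)
class ToyFeatureService:
    name: str
    views: tuple

    def feature_refs(self):
        """Expand the service into "view:feature" references."""
        return [
            f"{view.name}:{feature}"
            for view in self.views
            for feature in view.features
        ]


stats_view = ToyFeatureView(
    name="driver_hourly_stats",
    features=("conv_rate", "acc_rate", "avg_daily_trips"),
)
service = ToyFeatureService(name="driver_activity", views=(stats_view,))

refs = service.feature_refs()
print(refs)
```

Because the application only names the service, features can be added to or removed from the bundle in the repository without changing the calling code.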


## Step 8: Browse your features with the Web UI (experimental)

View all registered features, data sources, entities, and feature services with the Web UI.
Run the following command:

```shell
# start web ui on port 8080
feast ui -p 8080
```
