## 🧮 Features Fun with Feast

In machine learning, models rely on well-defined **features**—structured data points that help predict outcomes. But managing these features across different projects and environments can quickly become complex. How do you ensure consistency between training and inference? How do you version and share features across teams?

This is where a **Feature Store** comes in. A Feature Store acts as a **centralized repository** for storing, processing, and serving machine learning features. It ensures that features are consistent, reusable, and efficiently retrieved for both training and real-time inference.

### 🔍 What is Feast?

**Feast (Feature Store)** is an open-source framework that simplifies feature management. It provides a **scalable and structured way** to store, retrieve, and serve features for machine learning models. By using Feast, organizations can:
1. **Manage** feature definitions and metadata.
2. **Store** features in offline and online databases.
3. **Serve** features quickly for real-time predictions.

In this notebook, we will explore how to:
1. Set up **Feast** and define feature sets.
2. Materialize features from **offline** to **online stores**.
3. Understand how we can manage features efficiently for **machine learning workflows**.

## ⚙️ Setting Up Feast

Now that we understand what Feast is, it's time to set it up and explore how we can configure and interact with our **Feature Store**.

Before diving into feature definitions and configurations, we first need to **install and import the necessary libraries and dependencies**. Let's get started!

In [None]:
!pip install -q -r requirements.txt

In [2]:
import os
import sys
from datetime import datetime

import feast
import yaml

### Feature Store configuration files

Inside the `feature_repo` directory, we have several files that contain configurations and feature definitions.

In `features.py`, we define the list of song features that will be used in our **Feature Store**, such as `energy`, `acousticness`, and others.

Feast uses `feature_store.yaml` to configure the **Feature Store** itself. This file must be located at the root of a **feature repository**, which in our case is the `feature_repo` directory.

In [9]:
sys.path.append(os.path.abspath('feature_repo/'))
from features import music, song_properties
from feature_service import song_properties_fs

In [None]:
with open('feature_repo/feature_store.yaml', 'r') as file:
    fs_config_yaml = yaml.safe_load(file)

fs_config = feast.repo_config.RepoConfig(**fs_config_yaml)
fs = feast.FeatureStore(config=fs_config)

### 🏗️ How does Feast work?

Feast organizes features into three key components:
- **📜 Registry:** A metadata store that keeps track of all feature definitions, sources, and entities.
- **📂 Offline Store:** A long-term storage system that holds historical feature data for training models.
- **⚡ Online Store:** A low-latency store optimized for real-time feature retrieval during inference.

In [None]:
# Pretty-print the YAML configuration (yaml was already imported above)
print(yaml.dump(fs_config_yaml, default_flow_style=False, sort_keys=False, indent=2))

As mentioned earlier, Feast uses the `feature_store.yaml` file to [configure the Feature Store](https://docs.feast.dev/reference/feature-repository/feature-store-yaml#overview). If you examine the YAML configuration, you will recognize the three key components we just described.

In our setup:
- The **Registry** and **Online Store** are configured to use a **PostgreSQL database**.
- The **Offline Store** is configured as a **file-based storage system**, which can be backed by an S3 bucket (in `features.py`, see `music_source`, which uses an S3 bucket as its file source).

This configuration ensures that features are efficiently stored, tracked, and served for both training and inference.
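To make the mapping from configuration to components concrete, here is a minimal sketch that reports which backend each component uses. The `sample_config` dictionary is a hypothetical stand-in for the parsed `feature_store.yaml` (its project name and connection values are placeholders, not our real settings), and `summarize_components` is an illustrative helper, not part of Feast.

```python
def summarize_components(config: dict) -> dict:
    """Return the backend type of each Feast component in a parsed config."""
    registry = config.get("registry", {})
    # The registry can be written as a plain path string (a file-based
    # registry) or as a mapping with an explicit registry_type.
    if isinstance(registry, str):
        registry_type = "file"
    else:
        registry_type = registry.get("registry_type", "unspecified")
    return {
        "registry": registry_type,
        "offline_store": config.get("offline_store", {}).get("type", "unspecified"),
        "online_store": config.get("online_store", {}).get("type", "unspecified"),
    }


# Hypothetical configuration, shaped like the setup described above.
sample_config = {
    "project": "example_project",
    "provider": "local",
    "registry": {
        "registry_type": "sql",
        "path": "postgresql://user:password@host:5432/feast",
    },
    "offline_store": {"type": "file"},
    "online_store": {"type": "postgres", "host": "host", "port": 5432},
}

summary = summarize_components(sample_config)
for component, backend in summary.items():
    print(f"{component}: {backend}")
```

In the notebook, the same helper could be called on `fs_config_yaml` to check the real configuration.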

## 🎯 Features and Feature Values

In machine learning, **features** are the key pieces of data used as input signals for predictive models. In the context of a dataset, we distinguish between:

- **Feature:** A complete column in the dataset that represents a measurable property (e.g., "energy" or "speechiness" of a song).
- **Feature Value:** A single data point from that feature column (e.g., the "energy" value for a specific song).

Simply put, features provide the structured information that models use to make predictions. In our case, the **energy** and **speechiness** of a song are examples of features that could help determine whether a track becomes a hit.
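The distinction can be shown with a tiny, made-up dataset. The song IDs and numbers below are invented for illustration only and do not come from our data.

```python
# Each dictionary is one row (one song); each key is a column.
songs = [
    {"song_id": 1, "energy": 0.82, "speechiness": 0.05},
    {"song_id": 2, "energy": 0.41, "speechiness": 0.12},
    {"song_id": 3, "energy": 0.67, "speechiness": 0.33},
]

# A feature is the whole "energy" column across all songs.
energy_feature = [row["energy"] for row in songs]

# A feature value is the "energy" entry for one specific song.
energy_value = next(row["energy"] for row in songs if row["song_id"] == 2)

print("Feature (column):", energy_feature)
print("Feature value for song 2:", energy_value)
```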

## 🚀 Applying Feast

Before using our features, we first need to **apply** them. This step registers all the feature definitions inside the `feature_repo` with our **Feast registry**.

In our setup, the registry is stored in a **PostgreSQL database**, which acts as a central metadata store. By applying Feast, we ensure that all feature definitions are properly cataloged and ready to be fetched from the offline store or served in real-time from the online store.

In [5]:
fs.apply([song_properties_fs, music, song_properties])

## Materialize our features
The next step is to materialize the features, which copies feature values from the offline store into the online store.
Only a subset is copied: the values whose timestamps fall within the given time window, and for each entity only the latest value is kept in the online store.

In [None]:
fs.materialize(start_date=datetime(2023, 1, 1), end_date=datetime.now())
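To build intuition for what materialization keeps, here is a conceptual sketch in plain Python. It is not Feast's implementation: `latest_per_entity`, the `song_id` and `event_timestamp` field names, the sample rows, and the inclusive window boundaries are all assumptions made for illustration.

```python
from datetime import datetime


def latest_per_entity(rows, start_date, end_date):
    """Keep, per song_id, the most recent row inside [start_date, end_date]."""
    latest = {}
    for row in rows:
        ts = row["event_timestamp"]
        # Skip rows outside the materialization window.
        if not (start_date <= ts <= end_date):
            continue
        current = latest.get(row["song_id"])
        # Keep the row only if it is newer than what we have seen so far.
        if current is None or ts > current["event_timestamp"]:
            latest[row["song_id"]] = row
    return latest


# Made-up historical rows, standing in for the offline store.
offline_rows = [
    {"song_id": 1, "event_timestamp": datetime(2022, 12, 1), "energy": 0.50},
    {"song_id": 1, "event_timestamp": datetime(2023, 3, 1), "energy": 0.60},
    {"song_id": 1, "event_timestamp": datetime(2023, 6, 1), "energy": 0.70},
    {"song_id": 2, "event_timestamp": datetime(2022, 11, 1), "energy": 0.30},
]

online_rows = latest_per_entity(
    offline_rows, start_date=datetime(2023, 1, 1), end_date=datetime(2024, 1, 1)
)
for song_id, row in online_rows.items():
    print(song_id, row["event_timestamp"].date(), row["energy"])
```

In this toy example, song 1 ends up with its June 2023 value, while song 2 has no row in the window and therefore nothing in the online store.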

## Next step
Now that we have set up Feast, let's start using it!
Go to the next notebook to see how we can fetch training features: [2-test_load_historitcal_features.ipynb](2-test_load_historitcal_features.ipynb)