## 📂 Offline Features

With our features now registered, we can use them to create a **training dataset** by fetching the defined features from the **Offline Store**. 

In this example, we'll retrieve the **full dataset** with all available features. However, as we’ll explore in this notebook, we can easily:
- Select **specific features** instead of the full set.
- Combine features from **different FeatureViews**.
- Filter data based on a **specific time window**.

This flexibility allows us to tailor the dataset to the needs of our machine learning model.

## 📦 Importing Dependencies
Before working with Feast, we need to import the necessary libraries:

In [None]:
import feast
import pandas as pd
from datetime import datetime
import psycopg2
import yaml
import numpy as np

## ⚙️ Loading the Feature Store Configuration  

As we saw previously, before we can interact with **Feast**, we need to load its configuration from the `feature_store.yaml` file. This file defines how the Feature Store is set up, including connections to the registry, online store, and offline store.

In [None]:
with open('feature_repo/feature_store.yaml', 'r') as file:
    fs_config_yaml = yaml.safe_load(file)

fs_config = feast.repo_config.RepoConfig(**fs_config_yaml)
fs = feast.FeatureStore(config=fs_config)

This code does the following:
- Reads the **feature store configuration** from `feature_store.yaml`.
- Initializes a **FeatureStore** object to enable feature retrieval and management.
- Sets up the **connection** to the PostgreSQL registry and storage locations.

Once this setup is complete, we can start querying feature data from the Feature Store!

### Point-in-time correctness
To retrieve data points from Feast, we provide the entity ID we want features for, together with a timestamp.  
For each selected feature, Feast then finds the most recent value recorded at or before that timestamp. This ensures that our models don't accidentally use future information, which prevents data leakage.  
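
To make the idea concrete, here is a minimal pandas-only sketch of a point-in-time lookup. It is not how Feast implements the join internally, and it ignores details such as a FeatureView's TTL. The column names `song_id` and `event_timestamp` and all values are made up for illustration.

```python
import pandas as pd

# Toy feature table: two versions of the "energy" feature for song "a".
# (Column names and values are hypothetical.)
feature_rows = pd.DataFrame({
    "song_id": ["a", "a"],
    "event_timestamp": pd.to_datetime(["2024-01-01", "2024-03-01"]),
    "energy": [0.5, 0.9],
})

# Entity rows: the IDs and timestamps we want feature values for.
entity_rows = pd.DataFrame({
    "song_id": ["a", "a", "a"],
    "snapshot_date": pd.to_datetime(["2023-12-15", "2024-02-01", "2024-04-01"]),
})

# For each entity row, pick the latest feature row at or before snapshot_date.
joined = pd.merge_asof(
    entity_rows.sort_values("snapshot_date"),
    feature_rows.sort_values("event_timestamp"),
    left_on="snapshot_date",
    right_on="event_timestamp",
    by="song_id",
    direction="backward",
)
print(joined[["song_id", "snapshot_date", "energy"]])
```

The first entity row predates every feature value, so its `energy` is missing. The other two rows pick up the January and March values respectively, never a value from their own future.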

In our case, we use the `song_rankings` dataset as the entity DataFrame and join features from the `song_properties` data onto it.  
Because the rankings dataset contains multiple rows with the same ID and date, we also add a small time delta to each row so that Feast doesn't discard any of them as duplicates.

In [None]:
training_dataset = pd.read_parquet("../99-data_prep/song_rankings.parquet")
# Feast will remove rows with identical id and date so we add a small delta to each
microsecond_deltas = np.arange(0, len(training_dataset))*2
training_dataset['snapshot_date'] = training_dataset['snapshot_date'] + pd.to_timedelta(microsecond_deltas, unit='us')
# Let's wake up with some Espresso ☕ and see how our data changed over time in France
training_dataset[
    (training_dataset['name'] == 'Espresso') &
    (training_dataset['country'] == 'FR')
]

In this cell, we:
- Load the **Parquet dataset** into a Pandas DataFrame.
- Ensure unique **timestamp values** by adding microsecond deltas.
- Prepare the dataset so Feast can match features using **entity keys and timestamps**.
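
The effect of the microsecond deltas can be checked on a small toy DataFrame. This is only a sketch: the `song_id` column name and the rows are hypothetical, and the real dataset may use a different key column.

```python
import numpy as np
import pandas as pd

# Toy rankings: song "a" appears twice with the same date (hypothetical data).
toy = pd.DataFrame({
    "song_id": ["a", "a", "b"],
    "snapshot_date": pd.to_datetime(["2024-05-01", "2024-05-01", "2024-05-01"]),
})
key_columns = ["song_id", "snapshot_date"]
n_dupes_before = int(toy.duplicated(subset=key_columns).sum())

# Same trick as in the cell above: shift row i by 2*i microseconds.
deltas = np.arange(len(toy)) * 2
toy["snapshot_date"] = toy["snapshot_date"] + pd.to_timedelta(deltas, unit="us")
n_dupes_after = int(toy.duplicated(subset=key_columns).sum())

print(n_dupes_before, n_dupes_after)
```

Each (ID, timestamp) pair is unique after the shift, while the calendar date of every row stays the same, so the point-in-time lookup is effectively unchanged.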

## 🏷️ Selecting Features for Training  

Now we can specify which features we want to retrieve. Note that each entry also names the Feature View the feature comes from.

In [None]:
features = [
    "song_properties:is_explicit",
    "song_properties:duration_ms",
    "song_properties:danceability",
    "song_properties:energy",
    "song_properties:key",
    "song_properties:loudness",
    "song_properties:mode",
    "song_properties:speechiness",
    "song_properties:acousticness",
    "song_properties:instrumentalness",
    "song_properties:liveness",
    "song_properties:valence",
    "song_properties:tempo",
]

This list of features does the following:
- Specifies **song-related features** such as `energy`, `danceability`, and `tempo`.
- Uses the **feature naming convention** (`song_properties:<feature_name>`) to retrieve the correct data.
- Allows flexibility in selecting features based on the **model’s needs**.
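
Since each reference is just a `<feature_view>:<feature_name>` string, a subset of features, or features from several Feature Views, can be assembled programmatically. The sketch below uses a hypothetical helper, `build_feature_refs`, and a hypothetical second Feature View, `artist_stats`; neither exists in this repository.

```python
def build_feature_refs(view_name, feature_names):
    """Build '<feature_view>:<feature_name>' strings (hypothetical helper)."""
    return [f"{view_name}:{name}" for name in feature_names]


# A smaller selection from the song_properties Feature View.
selected_refs = build_feature_refs("song_properties", ["energy", "tempo"])

# Combining with a second Feature View. "artist_stats" and "followers" are
# made-up names used purely to illustrate the pattern.
selected_refs += build_feature_refs("artist_stats", ["followers"])

print(selected_refs)
```

A list like this could be passed as the `features` argument in the same way as the hand-written list, provided every referenced Feature View is registered.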

## 🔄 Retrieving Historical Features  

Finally, we retrieve the historical feature values for our dataset from Feast.

In [None]:
training_df = fs.get_historical_features(entity_df=training_dataset, features=features).to_df()
training_df

This call does the following:
- Queries the **Offline Store** to retrieve features for each song and timestamp.
- Matches **entity keys** (e.g., song IDs) with stored features.
- Converts the output into a **Pandas DataFrame** for easy use in training models.
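
Because the entity DataFrame determines which rows are retrieved, one way to restrict the training set to a specific time window is to filter it before calling `get_historical_features`. The sketch below shows only the pandas filtering step on toy data; the `song_id` column name, the dates, and the window bounds are assumptions for illustration.

```python
import pandas as pd

# Toy entity DataFrame (hypothetical IDs and dates).
entity_df = pd.DataFrame({
    "song_id": ["a", "b", "c"],
    "snapshot_date": pd.to_datetime(["2024-01-15", "2024-03-10", "2024-06-01"]),
})

# Half-open window [start, end): start is included, end is excluded.
start = pd.Timestamp("2024-02-01")
end = pd.Timestamp("2024-05-01")

in_window = (entity_df["snapshot_date"] >= start) & (entity_df["snapshot_date"] < end)
windowed_entity_df = entity_df.loc[in_window].reset_index(drop=True)

print(windowed_entity_df)
```

Passing `windowed_entity_df` as `entity_df` would then request features only for the rows inside the window.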

With these steps completed, we now have a **fully-featured dataset**, enriched with historical values, ready for training! 🚀

We now know how to get training data from Feast. Next, let's look at how to use it during inference.  
In the next notebook we will fetch Online Features: [3-test_load_online_features.ipynb](3-test_load_online_features.ipynb)