Copyright (c) Microsoft Corporation.
Licensed under the MIT license.

# Feast Azure Provider Tutorial: Register Features

In this notebook you will connect to your feature store and register features into a central repository hosted on Azure Blob Storage. In production, the recommended practice is to register features through a CI/CD process such as GitHub Actions or Azure DevOps.

## What is Feast?

Feast is an operational data system for managing and serving machine learning features to models in production. Feast is able to serve feature data to models from a low-latency online store (for real-time prediction) or from an offline store (for scale-out batch scoring or model training).

![feast overview](../media/feast-overview.png)

## Configure Feature Repository

The cell below displays the feature_store.yaml file, which holds infrastructure configuration such as the location of the registry file and the connection strings for the data stores.

__There is no need to change the details in this file. When you connect to the feature store in the next section, the credentials are resolved from the Azure ML default Key Vault.__

In [None]:
!cat feature_repo/feature_store.yaml

## Connect to the feature store

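The cell below reads the connection secrets from the workspace's default Key Vault, exposes them as environment variables, and then connects to the feature store.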

In [None]:
import os
from feast import FeatureStore
from azureml.core import Workspace

# Read the connection secrets from the workspace's default Key Vault
ws = Workspace.from_config()
kv = ws.get_default_keyvault()
os.environ['REGISTRY_PATH'] = kv.get_secret("FEAST-REGISTRY-PATH")
os.environ['SQL_CONN'] = kv.get_secret("FEAST-OFFLINE-STORE-CONN")
os.environ['REDIS_CONN'] = kv.get_secret("FEAST-ONLINE-STORE-CONN")

# connect to feature store
fs = FeatureStore("./feature_repo")
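
The cell above exports the Key Vault secrets as environment variables so that the configuration file can refer to them rather than embed credentials. The following is a minimal sketch of that substitution idea using only the standard library. The YAML fragment, the `${...}` placeholder layout, and all values are assumptions for illustration; they may not match your actual feature_store.yaml or the way Feast loads it.

```python
from string import Template

# Hypothetical config fragment; the real feature_store.yaml may differ.
config_template = """\
registry:
  path: ${REGISTRY_PATH}
offline_store:
  connection_string: ${SQL_CONN}
online_store:
  connection_string: ${REDIS_CONN}
"""

# Placeholder values standing in for the secrets read from Key Vault.
# A plain dict is used here so the real environment is left untouched.
fake_env = {
    "REGISTRY_PATH": "https://example.blob.core.windows.net/registry/registry.db",
    "SQL_CONN": "<offline-store-connection-string>",
    "REDIS_CONN": "<online-store-connection-string>",
}

# substitute() raises KeyError if a referenced variable is missing,
# which surfaces a forgotten secret early.
resolved_config = Template(config_template).substitute(fake_env)
print(resolved_config)
```

Keeping secrets out of the file means the same configuration can be committed to source control and reused across environments.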

## Define the data source (offline store)

The data source refers to raw underlying data (a table in Azure SQL DB or Synapse SQL). Feast uses a time-series data model to represent data. It relies on this model to interpret feature data in data sources when building training datasets or materializing features into an online store.

In [None]:
from feast.infra.offline_stores.contrib.mssql_offline_store.mssqlserver_source import MsSqlServerSource

orders_table = "orders"
driver_hourly_table = "driver_hourly"
customer_profile_table = "customer_profile"

driver_source = MsSqlServerSource(
    table_ref=driver_hourly_table,
    event_timestamp_column="datetime",
    created_timestamp_column="created",
)

customer_source = MsSqlServerSource(
    table_ref=customer_profile_table,
    event_timestamp_column="datetime",
    created_timestamp_column="",
)
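
To illustrate the two timestamp columns configured above: the event timestamp says when a feature value was observed, while the created timestamp can be used to pick between several rows that share the same event timestamp. The sketch below shows that idea in plain Python. The rows and the `deduplicate` helper are hypothetical, and this is not the SQL that Feast generates for the MSSQL source.

```python
from datetime import datetime

# Hypothetical rows from a driver_hourly-style table:
# (driver_id, event timestamp, created timestamp, conv_rate)
rows = [
    (1001, datetime(2021, 4, 12, 10), datetime(2021, 4, 12, 10, 5), 0.50),
    (1001, datetime(2021, 4, 12, 10), datetime(2021, 4, 12, 11, 0), 0.55),  # late correction
    (1001, datetime(2021, 4, 12, 11), datetime(2021, 4, 12, 11, 5), 0.60),
]

def deduplicate(rows):
    """Keep one row per (entity, event timestamp): the most recently created."""
    latest = {}
    for driver_id, event_ts, created_ts, value in rows:
        key = (driver_id, event_ts)
        if key not in latest or created_ts > latest[key][2]:
            latest[key] = (driver_id, event_ts, created_ts, value)
    # Sort by entity, then event timestamp, for stable output.
    return sorted(latest.values(), key=lambda r: (r[0], r[1]))

deduped = deduplicate(rows)
for row in deduped:
    print(row)
```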

## Define Feature Views

A feature view is an object that represents a logical group of time-series feature data as it is found in a data source. Feature views consist of one or more entities, features, and a data source. Feature views allow Feast to model your existing feature data in a consistent way in both an offline (training) and online (serving) environment.

Feature views are used during:

- The generation of training datasets by querying the data source of feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views. 
- Loading of feature values into an online store. Feature views determine the storage schema in the online store.
- Retrieval of features from the online store. Feature views provide the schema definition to Feast in order to look up features from the online store.

__NOTE: Feast does not generate feature values. It acts as the ingestion and serving system. The data sources described within feature views should reference feature values in their already computed form.__

In [None]:
from feast import Feature, FeatureView, ValueType
from datetime import timedelta

driver_fv = FeatureView(
    name="driver_stats",
    entities=["driver"],
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT32),
    ],
    batch_source=driver_source,
    ttl=timedelta(hours=2),
)

customer_fv = FeatureView(
    name="customer_profile",
    entities=["customer_id"],
    features=[
        Feature(name="current_balance", dtype=ValueType.FLOAT),
        Feature(name="avg_passenger_count", dtype=ValueType.FLOAT),
        Feature(name="lifetime_trip_count", dtype=ValueType.INT32),
    ],
    batch_source=customer_source,
    ttl=timedelta(days=2),
)
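
The `ttl` on a feature view limits how far back Feast looks for a feature value when building a training dataset. The following is a minimal sketch of a point-in-time lookup with a TTL window, written in plain Python. The `point_in_time_lookup` helper, the sample history, and the inclusive window boundaries are assumptions for illustration, not Feast's implementation.

```python
from datetime import datetime, timedelta

def point_in_time_lookup(feature_rows, entity_ts, ttl):
    """Return the latest value with entity_ts - ttl <= event_ts <= entity_ts, else None."""
    candidates = [
        (event_ts, value)
        for event_ts, value in feature_rows
        if entity_ts - ttl <= event_ts <= entity_ts  # inclusive bounds are an assumption
    ]
    if not candidates:
        return None
    # The most recent observation inside the window wins.
    return max(candidates, key=lambda c: c[0])[1]

# Hypothetical conv_rate history for one driver: (event timestamp, value)
history = [
    (datetime(2021, 4, 12, 8), 0.40),
    (datetime(2021, 4, 12, 10), 0.55),
]
ttl = timedelta(hours=2)  # same TTL as the driver_stats feature view

fresh = point_in_time_lookup(history, datetime(2021, 4, 12, 11), ttl)
stale = point_in_time_lookup(history, datetime(2021, 4, 12, 13), ttl)
print(fresh, stale)
```

A lookup at 11:00 finds the 10:00 value, while a lookup at 13:00 finds nothing because every observation is older than the two-hour window. Values observed after the lookup time are never used, which is what prevents future data from leaking into training rows.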

## Define Entities

An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. For example, a ride-hailing service could have customers and drivers as their entities, which group related features that correspond to these customers and drivers.

Entities are defined as part of feature views. Entities are used to identify the primary key on which feature values should be stored and retrieved. These keys are used during the lookup of feature values from the online store and the join process in point-in-time joins. It is possible to define composite entities (more than one entity object) in a feature view. Entities should be reused across feature views.

## Entity key

A related concept is an entity key. These are one or more entity values that uniquely describe a feature view record. In the case of an entity (like a driver) that only has a single entity field, the entity is an entity key. However, it is also possible for an entity key to consist of multiple entity values. For example, a feature view with the composite entity of (customer, country) might have an entity key of (1001, 5).

Entity keys act as primary keys. They are used during the lookup of features from the online store, and they are also used to match feature rows across feature views during point-in-time joins.

In [None]:
from feast import Entity
driver = Entity(name="driver", join_key="driver_id", value_type=ValueType.INT64)
customer = Entity(name="customer_id", value_type=ValueType.INT64)
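
As a rough illustration of how an entity key acts as a primary key, the sketch below models an online store as a dictionary keyed by entity key tuples and looks up the composite key (1001, 5) from the example above. The store contents, the join key names `customer` and `country`, and the `make_entity_key` helper are hypothetical; real online stores serialize keys differently.

```python
# Hypothetical online-store contents keyed by (customer, country) entity keys.
online_store = {
    (1001, 5): {"current_balance": 120.5, "lifetime_trip_count": 42},
    (1001, 7): {"current_balance": 80.0, "lifetime_trip_count": 3},
}

def make_entity_key(entity_row, join_keys):
    """Build an entity key from a row, using a fixed join-key order."""
    return tuple(entity_row[k] for k in join_keys)

join_keys = ("customer", "country")

# The order of fields in the incoming row does not matter;
# the join-key order determines the key.
entity_key = make_entity_key({"country": 5, "customer": 1001}, join_keys)
features = online_store.get(entity_key)
print(entity_key, features)
```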

## Feast `apply()`

When run, Feast `apply`:

1. Scans Python files in your feature repository and finds all Feast object definitions, such as feature views, entities, and data sources. (In this notebook the objects are instead passed to `fs.apply()` directly.)
1. Validates your feature definitions.
1. Syncs the metadata about Feast objects to the registry. If a registry does not exist, it is instantiated. The standard registry is a simple protobuf binary file that is stored on Azure Blob Storage.
1. Creates all necessary feature store infrastructure. The exact infrastructure that is deployed or configured depends on the provider configuration that you have set in feature_store.yaml.
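
The validate-and-sync steps can be pictured with a toy, dictionary-based registry. The sketch below is only an illustration: `apply_definitions`, the dict layout, and the single validation rule (every feature view must reference a known entity) are assumptions, and Feast's real registry is a protobuf file with its own validation logic.

```python
def apply_definitions(registry, entities, feature_views):
    """Validate definitions, then upsert them into a dict-based registry."""
    known = set(registry["entities"]) | {e["name"] for e in entities}
    # Validation (assumed rule): feature views may only reference known entities.
    for fv in feature_views:
        missing = [name for name in fv["entities"] if name not in known]
        if missing:
            raise ValueError(f"{fv['name']} references unknown entities: {missing}")
    # Sync: upsert by name, so re-applying the same objects changes nothing.
    for e in entities:
        registry["entities"][e["name"]] = e
    for fv in feature_views:
        registry["feature_views"][fv["name"]] = fv
    return registry

registry = {"entities": {}, "feature_views": {}}
entities = [{"name": "driver"}, {"name": "customer_id"}]
feature_views = [
    {"name": "driver_stats", "entities": ["driver"]},
    {"name": "customer_profile", "entities": ["customer_id"]},
]

apply_definitions(registry, entities, feature_views)
apply_definitions(registry, entities, feature_views)  # second call is a no-op
print(sorted(registry["feature_views"]))
```

Keying the registry by object name is what makes repeated applies safe, which matters when registration runs from a CI/CD pipeline on every commit.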

In [None]:
fs.apply([driver, driver_fv, customer, customer_fv])

## What just happened?

If you look in your Feast registry storage account, you will now see a registry.db file that contains the metadata for your registered features. The cell below lists the registered feature views:

In [None]:
from google.protobuf.json_format import MessageToDict

# Convert each registered feature view to a dict and print its key fields
for x in fs.list_feature_views():
    d = MessageToDict(x.to_proto())
    print("🪟 Feature view name:", d['spec']['name'])
    print("🧑 Entities:", d['spec']['entities'])
    print("🧪 Features:", d['spec']['features'])
    print("💾 Batch source type:", d['spec']['batchSource']['dataSourceClassType'])
    print("\n")


## Next Steps

In the [next part of this tutorial](./part3-train-and-deploy-with-feast.ipynb) you will:

- Train a model using features stored in your feature store
- Materialize the data from the offline store to the online store
- Deploy the model to a real-time endpoint that consumes feature vectors from the online store
