# Deep Dive Tutorial: Materializing Features

## Learning Objectives

In this tutorial you will learn:
1. How to construct an observation set
2. How features, entities, and observation sets are used together
3. How to preview features
4. How to get historical values
5. How and why to deploy features
6. How to serve and consume deployed features

## Set up the prerequisites

Learning Objectives

In this section you will:
* start your local featurebyte server
* import libraries
* learn about catalogs
* activate a pre-built catalog

### Load the featurebyte library and connect to the local instance of featurebyte

In [None]:
!pip install featurebyte
!wget https://raw.githubusercontent.com/featurebyte/featurebyte-hosted-tutorials/main/tutorials/notebooks/prebuilt_catalogs.py

In [1]:
# library imports
import pandas as pd
import numpy as np

# load the featurebyte SDK
import featurebyte as fb

print("FeatureByte version " + fb.version)

# inject your API token after registering for the tutorial
fb.register_tutorial_api_token("<api_token>")

2023-03-27 19:34:46.947 | INFO     | featurebyte.docker.manager:start_playground:305 | Starting featurebyte service | {}


FeatureByte version 0.1.4


2023-03-27 19:34:54.829 | INFO     | featurebyte.docker.manager:start_playground:307 | Starting local spark service | {}
2023-03-27 19:35:01.721 | INFO     | featurebyte.docker.manager:start_playground:310 | Starting documentation service | {}
2023-03-27 19:35:07.669 | INFO     | featurebyte.docker.manager:start_playground:314 | Creating local spark feature store | {}
2023-03-27 19:35:08.087 | INFO     | featurebyte.docker.manager:start_playground:336 | Dataset grocery already exists, skipping import | {}
2023-03-27 19:35:08.087 | INFO     | featurebyte.docker.manager:start_playground:336 | Dataset healthcare already exists, skipping import | {}
2023-03-27 19:35:08.088 | INFO     | featurebyte.docker.manager:start_playground:336 | Dataset creditcard already exists, skipping import | {}


### Create a pre-built catalog for this tutorial, with the data, metadata, and features already set up

Note that creating a pre-built catalog is not a step you will do in real life. It is a function specific to this tutorial that skips over many of the preparatory steps and gets you to a point where you can materialize features.

In a real-life project you would do data modeling, declaring the tables, entities, and the associated metadata. This would not be a frequent task, but forms the basis for best-practice feature engineering.

In [2]:
# get the functions to create a pre-built catalog
from prebuilt_catalogs import *

# create a new catalog for this tutorial
catalog_name = create_tutorial_catalog(PrebuiltCatalog.DeepDiveMaterializingFeatures)

Cleaning up any existing tutorial catalogs
Building a deep dive catalog for materializing features named [deep dive materializing features 20230327:1935]
Creating new catalog
Catalog created
Registering the source tables
Registering the entities
Tagging the entities to columns in the data tables
Populating the feature store with example features
Saving Feature(s) |████████████████████████████████████████| 4/4 [100%] in 3.3s (1.20/s)                                
Loading Feature(s) |████████████████████████████████████████| 4/4 [100%] in 0.8s (4.92/s)                               
Saving Feature(s) |████████████████████████████████████████| 1/1 [100%] in 0.6s (1.75/s)                                
Loading Feature(s) |████████████████████████████████████████| 1/1 [100%] in 0.2s (4.55/s)                               
Catalog created and pre-populated with data and features


### Example: Activate an existing catalog

In [3]:
# you can activate an existing catalog
catalog = fb.Catalog.activate(catalog_name)

### Load the tables for this catalog

In [4]:
# get the tables for this catalog
grocery_customer_table = catalog.get_table("GROCERYCUSTOMER")
grocery_items_table = catalog.get_table("INVOICEITEMS")
grocery_invoice_table = catalog.get_table("GROCERYINVOICE")
grocery_product_table = catalog.get_table("GROCERYPRODUCT")

### Create views for the tables in this catalog

In [5]:
# create the views
grocery_customer_view = grocery_customer_table.get_view()
grocery_invoice_view = grocery_invoice_table.get_view()
grocery_items_view = grocery_items_table.get_view()
grocery_product_view = grocery_product_table.get_view()

## How to construct an observation set

Learning Objectives

In this section you will learn:
* the purpose of observation sets
* the relationship between entities, point in time, and observation sets
* how to construct an observation set

### Concept: Materialization

A feature definition is a set of instructions for computing the feature on past or newly available data. The act of computing features is known as Feature Materialization.

### Concept: Observation set

An observation set is a table of entity keys and points in time for which you wish to materialize feature values. The entity keys define which entities a feature is materialized for, and the points in time define the timestamps at which it is materialized.

### Concept: Point in time

A point-in-time for a feature refers to a specific moment in the past with which the feature's values are associated.

It is a crucial aspect of historical feature serving, which allows machine learning models to make predictions based on historical data. By providing a point-in-time, a feature can be used to train and test models on past data, enabling them to make accurate predictions for similar situations in the future.

An observation set is created as a pandas data frame containing the keys for the primary entity, and points in time. The column name for the primary entity must be its serving name, and the column name for the point in time must be "POINT_IN_TIME".
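
As a minimal sketch of the layout described above, the following builds a two-row observation set by hand and checks its columns. The customer IDs and timestamps are taken from the sample output in this tutorial; `check_observation_set` is a hypothetical helper written for illustration, not part of the featurebyte SDK.

```python
import pandas as pd

# serving name of the primary entity, as listed by catalog.list_entities() in this tutorial
SERVING_NAME = "GROCERYCUSTOMERGUID"

# hand-built observation set: one row per (entity key, point in time) pair
observation_set_example = pd.DataFrame({
    SERVING_NAME: [
        "30e3fbe4-3cbe-4d51-b6ca-1f990ef9773d",
        "7484ebd5-ee65-49f8-abce-8becd7af39fb",
    ],
    "POINT_IN_TIME": pd.to_datetime(["2022-12-17 12:12:40", "2022-12-26 18:12:37"]),
})


def check_observation_set(df, serving_name):
    """Hypothetical helper: raise if the required columns are missing or mistyped."""
    missing = {serving_name, "POINT_IN_TIME"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if not pd.api.types.is_datetime64_any_dtype(df["POINT_IN_TIME"]):
        raise TypeError("POINT_IN_TIME must be a datetime column")
    return True


check_observation_set(observation_set_example, SERVING_NAME)
```

Running a check like this before materializing can surface a misnamed column early, for example `GroceryCustomerGuid` instead of the serving name.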

### Example: Create an observation set based upon events

Some use cases are about events, and require predictions to be triggered when a specified event occurs.

For a use case requiring predictions about a grocery customer whenever an invoice event occurs, your observation set may be sampled from historical invoices.

In [6]:
# show the serving name for grocery customer
entity_list = catalog.list_entities()
display(entity_list[entity_list.name == "grocerycustomer"])

Unnamed: 0,name,serving_names,created_at
3,grocerycustomer,[GROCERYCUSTOMERGUID],2023-03-27 11:35:39.817


In [7]:
# get a sample of 200 customer IDs and invoice event timestamps from Q4 2022
filter = (grocery_invoice_view["Timestamp"].dt.year == 2022) & (grocery_invoice_view["Timestamp"].dt.month >= 10)
observation_set = (
    grocery_invoice_view[filter].sample(200)[["GroceryCustomerGuid", "Timestamp"]]
    .rename({
        "Timestamp": "POINT_IN_TIME",
        "GroceryCustomerGuid": "GROCERYCUSTOMERGUID",
    }, axis=1)
)
display(observation_set)

Unnamed: 0,GROCERYCUSTOMERGUID,POINT_IN_TIME
0,30e3fbe4-3cbe-4d51-b6ca-1f990ef9773d,2022-12-17 12:12:40
1,7484ebd5-ee65-49f8-abce-8becd7af39fb,2022-12-26 18:12:37
2,a906b457-33c7-4186-a4a8-77f2ad018c2b,2022-12-04 16:13:10
3,e0453f48-5d57-4681-84b3-0f07b15ab48e,2022-11-05 17:08:56
4,e459196f-bf0a-41a1-a307-a4cfcf41fea9,2022-11-27 17:42:11
...,...,...
195,b3b9a70e-4ec3-4fe2-b563-873899b357b1,2022-10-03 13:37:48
196,1c930774-32aa-4ef7-8ba8-2efc412a4732,2022-12-24 13:24:18
197,9c926395-4a8c-45ad-b938-df427ad1be61,2022-11-20 13:23:07
198,7cd5368e-2152-47cd-8cce-f9f46ab80c2e,2022-10-24 20:20:58


### Example: Create an observation set based upon regularly scheduled batch predictions

Some use cases require predictions to be triggered at regular time periods. Some use cases have conditions for which only a subset of entities require predictions.

A use case requiring monthly predictions for recently active customers may use an observation set containing sample customer IDs combined with predefined timestamps.

In [8]:
# define a function to list a sample of the customers who were active in a given month
def get_recently_active_customers(month_number):
    # filter the invoices by month
    filter = (grocery_invoice_view["Timestamp"].dt.month == month_number) & (grocery_invoice_view["Timestamp"].dt.year == 2022)
    # get a list of customers who made an invoice in the month
    recently_active_customers = grocery_invoice_view[filter].sample(200)["GroceryCustomerGuid"].unique()
    # get the start of the month
    point_in_time = pd.Timestamp(f"2022-{month_number}-01")
    # get the end of the month
    end_of_month = point_in_time + pd.DateOffset(months=1)
    # get the point in time by subtracting 0.001 second from the end of the month
    point_in_time = end_of_month - pd.Timedelta(seconds=0.001)
    # combine the point in time with the customer IDs
    recently_active_customers = pd.DataFrame({
        "GROCERYCUSTOMERGUID": recently_active_customers,
        "POINT_IN_TIME": point_in_time,
    })
    return recently_active_customers

# create an observation set comprised of up to 200 customers per month who were active in that month in the second half of 2022
observation_set = pd.concat([get_recently_active_customers(month_number) for month_number in range(7, 13)], ignore_index=True)
display(observation_set)

Unnamed: 0,GROCERYCUSTOMERGUID,POINT_IN_TIME
0,c46b9d66-24a4-4470-a768-6149288be701,2022-07-31 23:59:59.999
1,a0bd3e53-b133-4cee-b18d-7e8cd0987bad,2022-07-31 23:59:59.999
2,56c2d18d-145a-4c66-ac66-65ccd03e83ce,2022-07-31 23:59:59.999
3,653146bb-f075-4879-a423-bb7296b17d74,2022-07-31 23:59:59.999
4,70261fa4-14c7-49b6-a36b-202eb295706a,2022-07-31 23:59:59.999
...,...,...
855,9a6e097b-5297-4a31-b3a0-1443defb0915,2022-12-31 23:59:59.999
856,d996e271-27b0-4ed6-b982-6e516e5cf449,2022-12-31 23:59:59.999
857,5779bf1c-644d-4143-9d23-a12a5a5f7888,2022-12-31 23:59:59.999
858,ab97a367-3a4c-4b69-a3a8-46cde6064ea9,2022-12-31 23:59:59.999


## Previewing features

Learning Objectives

In this section you will learn:
* how to preview features
* the limitations of previews

### Example: Preview features

During feature prototyping, new features may not yet have been saved to the catalog. A data scientist will want to preview sample feature values to sanity-check the feature declaration.

In [9]:
# create a lookup feature that is the city in which the customer resides
customer_city_lookup = grocery_customer_view.City.as_feature("CustomerCity")

# preview materialized values for the unsaved feature
display(customer_city_lookup.preview(observation_set.sample(5)))

Unnamed: 0,GROCERYCUSTOMERGUID,POINT_IN_TIME,CustomerCity
858,ab97a367-3a4c-4b69-a3a8-46cde6064ea9,2022-12-31 23:59:59.999,LE HAVRE
448,36d37d1d-853c-4c5b-8c0c-37eb8b8fd51d,2022-10-31 23:59:59.999,LYON
78,105c3922-0375-4c87-9c64-1b342b31a029,2022-07-31 23:59:59.999,CLERMONT-FERRAND
56,4edf2666-6da3-42d2-a78b-78b45c23a7fd,2022-07-31 23:59:59.999,FRANCONVILLE-LA-GARENNE
196,5d325f7c-54ee-437f-b668-fc7e09ff8d06,2022-08-31 23:59:59.999,ROMAINVILLE


Feature previews are not suited to creating training files or feature serving. Previews have a limitation of 50 rows and do not create an audit trail.
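
One way to stay within the 50-row preview limit is to cap the sample size before calling `preview`. The sketch below uses a hypothetical helper, `preview_sample`, and placeholder customer IDs; it only prepares the sample and does not call featurebyte.

```python
import pandas as pd

PREVIEW_ROW_LIMIT = 50  # preview limit stated in this tutorial


def preview_sample(observation_set, n=5, seed=0):
    """Hypothetical helper: sample at most PREVIEW_ROW_LIMIT rows for a preview."""
    n = min(n, PREVIEW_ROW_LIMIT, len(observation_set))
    return observation_set.sample(n, random_state=seed)


# placeholder observation set with 120 rows (IDs are illustrative only)
demo = pd.DataFrame({
    "GROCERYCUSTOMERGUID": [f"customer-{i}" for i in range(120)],
    "POINT_IN_TIME": pd.Timestamp("2022-12-31 23:59:59.999"),
})

small = preview_sample(demo, n=200)
print(len(small))  # 50
```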

## Create training data

Learning Objectives

In this section you will learn:
* how to design an observation set suitable for training data
* how to get historical values for a feature list
* how to get historical values for the target
* how to join features and the target to create training data

### Design an Observation Set for Training

A training data observation set should typically meet the following criteria:
* is collected from a time period that starts no earlier than the earliest data availability timestamp plus the longest time window in the features
* is collected from a time period that ends no later than the latest data timestamp minus the time window of the target
* uses points in time that align with the anticipated timing of inference for the use case, whether that is a regular schedule, an event trigger, or another timing mechanism
* does not have duplicate rows
* has a column containing the keys of the primary entity of the use case, named with its serving name
* has a column named "POINT_IN_TIME" containing the points in time
* for any one entity key, has points in time separated by intervals greater than the horizon of the target, to avoid leakage
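
Several of the criteria above can be checked mechanically. The following is a sketch only: `check_training_observation_set` is a hypothetical helper, the customer IDs are placeholders, and the `latest` timestamp is an assumed stand-in for "the present".

```python
import pandas as pd


def check_training_observation_set(df, serving_name, earliest, latest,
                                   longest_feature_window, target_horizon):
    """Hypothetical helper: return a list of violated design criteria."""
    problems = []
    pit = df["POINT_IN_TIME"]
    if df.duplicated(subset=[serving_name, "POINT_IN_TIME"]).any():
        problems.append("duplicate rows")
    if pit.min() < earliest + longest_feature_window:
        problems.append("starts before features have a full window of history")
    if pit.max() > latest - target_horizon:
        problems.append("ends too late for the target to be observed")
    # gaps between consecutive points in time for the same entity key
    gaps = (
        df.sort_values("POINT_IN_TIME")
        .groupby(serving_name)["POINT_IN_TIME"]
        .diff()
        .dropna()
    )
    if (gaps <= target_horizon).any():
        problems.append("points in time for one entity closer than the target horizon")
    return problems


# placeholder data: customer "a" has two points in time only 4 days apart
demo = pd.DataFrame({
    "GROCERYCUSTOMERGUID": ["a", "a", "b"],
    "POINT_IN_TIME": pd.to_datetime(["2022-03-01", "2022-03-05", "2022-06-01"]),
})
problems = check_training_observation_set(
    demo,
    "GROCERYCUSTOMERGUID",
    earliest=pd.Timestamp("2022-01-01"),
    latest=pd.Timestamp("2023-03-27"),  # assumption: stand-in for "the present"
    longest_feature_window=pd.Timedelta(weeks=4),
    target_horizon=pd.Timedelta(days=14),
)
print(problems)
```

Here the only reported problem is the 4-day gap for customer "a", which is shorter than the 14-day target horizon.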

### Case Study: Predicting Customer Spend

Your chain of grocery stores wants to send targeted marketing to customers immediately after each purchase. As one step in this marketing campaign, it wants to predict each customer's spend in the 14 days after a purchase.

### Example: Create an observation set for training data

In [10]:
# describe the customer view
display(grocery_customer_view.describe())

Unnamed: 0,RowID,GroceryCustomerGuid,ValidFrom,Gender,Title,GivenName,MiddleInitial,Surname,StreetAddress,City,State,PostalCode,BrowserUserAgent,DateOfBirth,Latitude,Longitude,CurrentRecord
dtype,VARCHAR,VARCHAR,TIMESTAMP,VARCHAR,VARCHAR,VARCHAR,VARCHAR,VARCHAR,VARCHAR,VARCHAR,VARCHAR,VARCHAR,VARCHAR,DATE,FLOAT,FLOAT,BOOL
unique,500,471,500,2,4,330,26,340,485,293,27,347,84,466,500,499,2
%missing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
%empty,0,0,,0,0,0,0,0,0,0,0,0,0,,,,
entropy,6.214608,6.134203,,0.687304,1.118805,5.688874,2.919004,5.714195,6.171973,5.4313,2.487513,5.747327,3.789222,,,,
top,00cde349-0cd6-4335-8780-c323993b1d36,04337c88-309e-4d6d-b4c2-21e7162a78f2,2019-01-01 10:14:46,male,Mr.,Belda,A,Foucault,13 rue Michel Ange,MARSEILLE,Île-de-France,44100,Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl...,,-12.698172,2.403298,true
freq,1,2,1,277,269,5,68,5,3,18,192,5,51,,1,2,471
mean,,,,,,,,,,,,,,,46.938518,2.473241,
std,,,,,,,,,,,,,,,5.281936,6.739228,
min,,,2019-01-01T10:14:46.000000000,,,,,,,,,,,1937-02-15T00:00:00.000000000,-12.773963,-61.12404,


Note that there are 471 unique customers

In [11]:
# describe the invoice view
display(grocery_invoice_view.describe())

Unnamed: 0,GroceryInvoiceGuid,GroceryCustomerGuid,Timestamp,Amount
dtype,VARCHAR,VARCHAR,TIMESTAMP,FLOAT
unique,31361,471,31328,6805
%missing,0.0,0.0,0.0,0.0
%empty,0,0,,
entropy,6.214608,5.776151,,
top,018f0163-249b-4cbc-ab4d-e933ce3786c1,c5820998-e779-4d62-ab8b-79ef0dfd841b,2022-01-09 10:47:17,1.0
freq,1,908,2,479
mean,,,,19.968375
std,,,,25.000703
min,,,2022-01-01T00:24:14.000000000,0.0


Note that the earliest data timestamp is at the beginning of 2022, and the timestamps end in the present.

In [12]:
# get the customer feature list
customer_feature_list = catalog.get_feature_list("CustomerFeatures")

# display details about the features in the customer feature list
display(customer_feature_list.list_features())

Loading Feature(s) |████████████████████████████████████████| 4/4 [100%] in 0.8s (5.01/s)                               


Unnamed: 0,name,version,dtype,readiness,online_enabled,tables,primary_tables,entities,primary_entities,created_at
0,StateMeanLongitude,V230327,FLOAT,DRAFT,False,[GROCERYCUSTOMER],[GROCERYCUSTOMER],[frenchstate],[frenchstate],2023-03-27 11:35:47.656
1,StateMeanLatitude,V230327,FLOAT,DRAFT,False,[GROCERYCUSTOMER],[GROCERYCUSTOMER],[frenchstate],[frenchstate],2023-03-27 11:35:47.110
2,CustomerInventoryMostFrequent_4w,V230327,VARCHAR,DRAFT,False,"[GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT]",[INVOICEITEMS],[grocerycustomer],[grocerycustomer],2023-03-27 11:35:46.346
3,CustomerInventoryEntropy_4w,V230327,FLOAT,DRAFT,False,"[GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT]",[INVOICEITEMS],[grocerycustomer],[grocerycustomer],2023-03-27 11:35:45.295


Note that the longest time window in the features is 4 weeks.

In [13]:
# get the target
customer_target_list = catalog.get_feature_list("TargetFeature")

# display details about the target feature
display(customer_target_list.list_features())

Loading Feature(s) |████████████████████████████████████████| 1/1 [100%] in 0.4s (2.65/s)                               


Unnamed: 0,name,version,dtype,readiness,online_enabled,tables,primary_tables,entities,primary_entities,created_at
0,Target,V230327,FLOAT,DRAFT,False,[GROCERYINVOICE],[GROCERYINVOICE],[grocerycustomer],[grocerycustomer],2023-03-27 11:35:50.087


Note that the time window for the target is 14 days

We can conclude that it would be safe for the training data observation set's points in time to commence on 29-Jan-2022 (the earliest data timestamp, 1-Jan-2022, plus the 4-week feature window) and end 14 days before the present.
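
The date arithmetic behind this conclusion can be sketched as follows. The earliest timestamp is the minimum shown in the invoice view description; the `latest_data` value is an assumption standing in for "the present".

```python
import pandas as pd

earliest_data = pd.Timestamp("2022-01-01 00:24:14")  # min Timestamp in the invoice view
latest_data = pd.Timestamp("2023-03-27")             # assumption: stand-in for "the present"
longest_feature_window = pd.Timedelta(weeks=4)       # longest window in CustomerFeatures
target_window = pd.Timedelta(days=14)                # window of the target

# features need a full window of history before the first point in time
safe_start = earliest_data + longest_feature_window
# the target needs a full window of future data after the last point in time
safe_end = latest_data - target_window

print(safe_start)  # 2022-01-29 00:24:14
print(safe_end)    # 2023-03-13 00:00:00
```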

We will create an observation set for invoice dates from Feb-22 to Dec-22.

In [14]:
# filter to get Feb-22 to Dec-22
filter = (grocery_invoice_view["Timestamp"].dt.year == 2022) & (grocery_invoice_view["Timestamp"].dt.month >= 2)

# create a pandas data frame containing a sample of the customer IDs and timestamps
observation_set_features = observation_set = (
    grocery_invoice_view[filter].sample(1000)[["GroceryCustomerGuid", "Timestamp"]]
    .rename({
        "Timestamp": "POINT_IN_TIME",
        "GroceryCustomerGuid": "GROCERYCUSTOMERGUID",
    }, axis=1)
)
display(observation_set_features)

Unnamed: 0,GROCERYCUSTOMERGUID,POINT_IN_TIME
0,03732ad7-f757-40dd-a834-9a6654544009,2022-07-15 18:19:20
1,ec0a3d0b-1196-439a-8682-2ad3704db074,2022-09-18 18:55:38
2,091be817-ce2c-4a3b-96e0-526b9b2a33f6,2022-06-30 16:30:35
3,6c6e4d82-0856-4709-8e80-695fe85afebf,2022-08-07 18:43:24
4,ecb0a59a-124e-4ac2-8181-5f6f64b4b5cf,2022-02-15 17:00:36
...,...,...
995,a7ada4a3-fd92-44e6-a232-175c90b1c939,2022-11-20 12:35:31
996,b5423dbc-db1c-4dbf-8721-b9fba6bcdd9b,2022-09-29 13:40:19
997,42d75f67-602c-4fa2-becf-bbc8c1ea5bb2,2022-06-20 20:33:26
998,84e20f8f-27d8-4e90-b6a0-d08122144ab8,2022-06-28 13:53:03


### Example: Get historical values

In [15]:
# use the get historical features function to get the feature values for the observation set
training_data_features = customer_feature_list.get_historical_features(observation_set_features)
display(training_data_features)

Retrieving Historical Feature(s) |████████████████████████████████████████| 1/1 [100%] in 9.9s (0.10/s)                 


Unnamed: 0,GROCERYCUSTOMERGUID,POINT_IN_TIME,StateMeanLongitude,StateMeanLatitude,CustomerInventoryMostFrequent_4w,CustomerInventoryEntropy_4w
0,03732ad7-f757-40dd-a834-9a6654544009,2022-07-15 18:19:20,1.448933,45.424578,"Colas, Thés glacés et Sodas",2.888966
1,ec0a3d0b-1196-439a-8682-2ad3704db074,2022-09-18 18:55:38,2.331067,48.840595,Laits,2.919274
2,091be817-ce2c-4a3b-96e0-526b9b2a33f6,2022-06-30 16:30:35,2.331794,48.840356,"Colas, Thés glacés et Sodas",3.290326
3,6c6e4d82-0856-4709-8e80-695fe85afebf,2022-08-07 18:43:24,5.221111,45.669184,Pizza Surgelées,2.690829
4,ecb0a59a-124e-4ac2-8181-5f6f64b4b5cf,2022-02-15 17:00:36,1.688952,43.654240,Yaourt et Compotes,2.921983
...,...,...,...,...,...,...
995,a7ada4a3-fd92-44e6-a232-175c90b1c939,2022-11-20 12:35:31,2.331087,48.841199,Jus Frais,3.679589
996,b5423dbc-db1c-4dbf-8721-b9fba6bcdd9b,2022-09-29 13:40:19,-0.876788,49.313889,"Colas, Thés glacés et Sodas",2.640947
997,42d75f67-602c-4fa2-becf-bbc8c1ea5bb2,2022-06-20 20:33:26,1.334040,47.452014,Chat,3.459457
998,84e20f8f-27d8-4e90-b6a0-d08122144ab8,2022-06-28 13:53:03,2.331794,48.840356,Eaux,2.484367


### Example: Get target values

When target values use aggregates or time offsets, you first need to shift the point in time forward by the target's time window, so that the window covers the period after the original point in time.

In [16]:
# add 14 days to the timestamps in the observation set
observation_set_target = observation_set_features.copy()
observation_set_target["POINT_IN_TIME"] = observation_set_target["POINT_IN_TIME"] + pd.DateOffset(days=14)
display(observation_set_target)

Unnamed: 0,GROCERYCUSTOMERGUID,POINT_IN_TIME
0,03732ad7-f757-40dd-a834-9a6654544009,2022-07-29 18:19:20
1,ec0a3d0b-1196-439a-8682-2ad3704db074,2022-10-02 18:55:38
2,091be817-ce2c-4a3b-96e0-526b9b2a33f6,2022-07-14 16:30:35
3,6c6e4d82-0856-4709-8e80-695fe85afebf,2022-08-21 18:43:24
4,ecb0a59a-124e-4ac2-8181-5f6f64b4b5cf,2022-03-01 17:00:36
...,...,...
995,a7ada4a3-fd92-44e6-a232-175c90b1c939,2022-12-04 12:35:31
996,b5423dbc-db1c-4dbf-8721-b9fba6bcdd9b,2022-10-13 13:40:19
997,42d75f67-602c-4fa2-becf-bbc8c1ea5bb2,2022-07-04 20:33:26
998,84e20f8f-27d8-4e90-b6a0-d08122144ab8,2022-07-12 13:53:03


In [17]:
# Materialize the target feature using get historical features
training_data_target = customer_target_list.get_historical_features(observation_set_target)

# remove the offset from the point in time column
training_data_target["POINT_IN_TIME"] = training_data_target["POINT_IN_TIME"] - pd.DateOffset(days=14)

display(training_data_target)

Retrieving Historical Feature(s) |████████████████████████████████████████| 1/1 [100%] in 5.9s (0.17/s)                 


Unnamed: 0,GROCERYCUSTOMERGUID,POINT_IN_TIME,Target
0,03732ad7-f757-40dd-a834-9a6654544009,2022-07-15 18:19:20,33.99
1,ec0a3d0b-1196-439a-8682-2ad3704db074,2022-09-18 18:55:38,87.60
2,091be817-ce2c-4a3b-96e0-526b9b2a33f6,2022-06-30 16:30:35,135.00
3,6c6e4d82-0856-4709-8e80-695fe85afebf,2022-08-07 18:43:24,41.58
4,ecb0a59a-124e-4ac2-8181-5f6f64b4b5cf,2022-02-15 17:00:36,107.38
...,...,...,...
995,a7ada4a3-fd92-44e6-a232-175c90b1c939,2022-11-20 12:35:31,114.60
996,b5423dbc-db1c-4dbf-8721-b9fba6bcdd9b,2022-09-29 13:40:19,64.26
997,42d75f67-602c-4fa2-becf-bbc8c1ea5bb2,2022-06-20 20:33:26,128.92
998,84e20f8f-27d8-4e90-b6a0-d08122144ab8,2022-06-28 13:53:03,144.83


### Example: Merging materialized values for features and target

In [18]:
# merge training data features and training data target
training_data = training_data_features.merge(training_data_target, on=["GROCERYCUSTOMERGUID", "POINT_IN_TIME"])
display(training_data)

Unnamed: 0,GROCERYCUSTOMERGUID,POINT_IN_TIME,StateMeanLongitude,StateMeanLatitude,CustomerInventoryMostFrequent_4w,CustomerInventoryEntropy_4w,Target
0,03732ad7-f757-40dd-a834-9a6654544009,2022-07-15 18:19:20,1.448933,45.424578,"Colas, Thés glacés et Sodas",2.888966,33.99
1,ec0a3d0b-1196-439a-8682-2ad3704db074,2022-09-18 18:55:38,2.331067,48.840595,Laits,2.919274,87.60
2,091be817-ce2c-4a3b-96e0-526b9b2a33f6,2022-06-30 16:30:35,2.331794,48.840356,"Colas, Thés glacés et Sodas",3.290326,135.00
3,6c6e4d82-0856-4709-8e80-695fe85afebf,2022-08-07 18:43:24,5.221111,45.669184,Pizza Surgelées,2.690829,41.58
4,ecb0a59a-124e-4ac2-8181-5f6f64b4b5cf,2022-02-15 17:00:36,1.688952,43.654240,Yaourt et Compotes,2.921983,107.38
...,...,...,...,...,...,...,...
995,a7ada4a3-fd92-44e6-a232-175c90b1c939,2022-11-20 12:35:31,2.331087,48.841199,Jus Frais,3.679589,114.60
996,b5423dbc-db1c-4dbf-8721-b9fba6bcdd9b,2022-09-29 13:40:19,-0.876788,49.313889,"Colas, Thés glacés et Sodas",2.640947,64.26
997,42d75f67-602c-4fa2-becf-bbc8c1ea5bb2,2022-06-20 20:33:26,1.334040,47.452014,Chat,3.459457,128.92
998,84e20f8f-27d8-4e90-b6a0-d08122144ab8,2022-06-28 13:53:03,2.331794,48.840356,Eaux,2.484367,144.83
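
As an optional safeguard, the merge can be asked to verify that each (entity key, point in time) pair appears only once on both sides. This sketch uses the `validate` parameter of pandas `DataFrame.merge` on small stand-in frames; the customer IDs are placeholders, and the numeric values are copied from the sample output above.

```python
import pandas as pd

keys = ["GROCERYCUSTOMERGUID", "POINT_IN_TIME"]

# stand-ins for training_data_features and training_data_target
features = pd.DataFrame({
    "GROCERYCUSTOMERGUID": ["a", "b"],  # placeholder IDs
    "POINT_IN_TIME": pd.to_datetime(["2022-07-15 18:19:20", "2022-09-18 18:55:38"]),
    "CustomerInventoryEntropy_4w": [2.888966, 2.919274],
})
target = pd.DataFrame({
    "GROCERYCUSTOMERGUID": ["a", "b"],
    "POINT_IN_TIME": pd.to_datetime(["2022-07-15 18:19:20", "2022-09-18 18:55:38"]),
    "Target": [33.99, 87.60],
})

# validate="one_to_one" raises pandas.errors.MergeError if the keys are duplicated
merged = features.merge(target, on=keys, how="inner", validate="one_to_one")

# every observation should survive the join
assert len(merged) == len(features)
```

If the observation set contained duplicate rows, the merge would fail loudly instead of silently multiplying rows in the training data.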


## Deploying features

Learning Objectives

In this section you will learn:
* feature readiness
* feature list status
* how to deploy a feature list

### Feature readiness

A feature's readiness shows whether it is deprecated, quarantined, a draft, or production ready. These readiness flags indicate a feature's suitability for use in feature lists.

DEPRECATED: The feature is no longer suitable for production use.<br>
QUARANTINE: The feature must not be used.<br>
DRAFT: The feature is being prototyped and is not yet ready for production.<br>
PRODUCTION_READY: The feature has been reviewed and is ready for production.<br>

In [19]:
# view the readiness of the features
catalog.list_features()

Unnamed: 0,name,dtype,readiness,online_enabled,tables,primary_tables,entities,primary_entities,created_at
0,Target,FLOAT,DRAFT,False,[GROCERYINVOICE],[GROCERYINVOICE],[grocerycustomer],[grocerycustomer],2023-03-27 11:35:50.098
1,StateMeanLongitude,FLOAT,DRAFT,False,[GROCERYCUSTOMER],[GROCERYCUSTOMER],[frenchstate],[frenchstate],2023-03-27 11:35:47.669
2,StateMeanLatitude,FLOAT,DRAFT,False,[GROCERYCUSTOMER],[GROCERYCUSTOMER],[frenchstate],[frenchstate],2023-03-27 11:35:47.123
3,CustomerInventoryMostFrequent_4w,VARCHAR,DRAFT,False,"[GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT]",[INVOICEITEMS],[grocerycustomer],[grocerycustomer],2023-03-27 11:35:46.358
4,CustomerInventoryEntropy_4w,FLOAT,DRAFT,False,"[GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT]",[INVOICEITEMS],[grocerycustomer],[grocerycustomer],2023-03-27 11:35:45.306


When a feature has been reviewed and is ready for production, its readiness can be upgraded.

In [20]:
# get CustomerInventoryEntropy_4w
customer_inventory_entropy_4w = catalog.get_feature("CustomerInventoryEntropy_4w")

In [21]:
# check feature definition file
customer_inventory_entropy_4w.definition

In [22]:
# change the readiness to production ready
customer_inventory_entropy_4w.update_readiness("PRODUCTION_READY")

# view the readiness of the features
catalog.list_features()

Unnamed: 0,name,dtype,readiness,online_enabled,tables,primary_tables,entities,primary_entities,created_at
0,Target,FLOAT,DRAFT,False,[GROCERYINVOICE],[GROCERYINVOICE],[grocerycustomer],[grocerycustomer],2023-03-27 11:35:50.098
1,StateMeanLongitude,FLOAT,DRAFT,False,[GROCERYCUSTOMER],[GROCERYCUSTOMER],[frenchstate],[frenchstate],2023-03-27 11:35:47.669
2,StateMeanLatitude,FLOAT,DRAFT,False,[GROCERYCUSTOMER],[GROCERYCUSTOMER],[frenchstate],[frenchstate],2023-03-27 11:35:47.123
3,CustomerInventoryMostFrequent_4w,VARCHAR,DRAFT,False,"[GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT]",[INVOICEITEMS],[grocerycustomer],[grocerycustomer],2023-03-27 11:35:46.358
4,CustomerInventoryEntropy_4w,FLOAT,PRODUCTION_READY,False,"[GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT]",[INVOICEITEMS],[grocerycustomer],[grocerycustomer],2023-03-27 11:35:45.306


### Feature list status

A feature list's status shows whether it is a prototype, shared, production ready, or deprecated. These status flags indicate a feature list's suitability for use in production.

DEPRECATED: The feature list is no longer suitable for production use.<br>
DRAFT: The feature list is being prototyped, and is not yet suitable for production.<br>
PUBLIC_DRAFT: The feature list is ready for review and sharing, but is not yet in production.<br>
PUBLISHED: The feature list has been deployed into production.<br>

In [23]:
# view the status of the feature lists
display(catalog.list_feature_lists())

Unnamed: 0,name,num_features,status,deployed,readiness_frac,online_frac,tables,entities,created_at
0,TargetFeature,1,DRAFT,False,0.0,0.0,[GROCERYINVOICE],[grocerycustomer],2023-03-27 11:35:50.465
1,CustomerFeatures,4,DRAFT,False,0.25,0.0,"[GROCERYCUSTOMER, GROCERYINVOICE, INVOICEITEMS...","[grocerycustomer, frenchstate]",2023-03-27 11:35:48.590


When a feature list is ready for review, its status can be updated.

In [24]:
# get the CustomerFeatures feature list
customer_feature_list = catalog.get_feature_list("CustomerFeatures")

# update the status to PUBLIC_DRAFT
customer_feature_list.update_status("PUBLIC_DRAFT")

# view the status of the feature lists
display(catalog.list_feature_lists())

Loading Feature(s) |████████████████████████████████████████| 4/4 [100%] in 0.8s (5.10/s)                               


Unnamed: 0,name,num_features,status,deployed,readiness_frac,online_frac,tables,entities,created_at
0,TargetFeature,1,DRAFT,False,0.0,0.0,[GROCERYINVOICE],[grocerycustomer],2023-03-27 11:35:50.465
1,CustomerFeatures,4,PUBLIC_DRAFT,False,0.25,0.0,"[GROCERYCUSTOMER, GROCERYINVOICE, INVOICEITEMS...","[grocerycustomer, frenchstate]",2023-03-27 11:35:48.590


### Deploying a feature list

In [25]:
# deploy the customer feature list
customer_feature_list.update_status('PUBLISHED')
customer_feature_list.deploy(enable=True, make_production_ready=True)

# view the status of the feature lists
display(catalog.list_feature_lists())

Loading Feature(s) |████████████████████████████████████████| 4/4 [100%] in 0.7s (5.73/s)                               
Done! |████████████████████████████████████████| 100% in 39.5s (2.53%/s)                                                


Unnamed: 0,name,num_features,status,deployed,readiness_frac,online_frac,tables,entities,created_at
0,TargetFeature,1,DRAFT,False,0.0,0.0,[GROCERYINVOICE],[grocerycustomer],2023-03-27 11:35:50.465
1,CustomerFeatures,4,PUBLISHED,True,1.0,1.0,"[GROCERYCUSTOMER, GROCERYINVOICE, INVOICEITEMS...","[grocerycustomer, frenchstate]",2023-03-27 11:35:48.590


### Why deploy?

When you deploy a feature list, behind the scenes the Feature Store starts regularly pre-calculating and caching feature values. This can significantly reduce the latency of feature serving.

## Serving and consuming features

Learning Objectives

In this section you will learn:
* the point in time used for production serving
* how to create a Python function to consume a feature list
* how to consume a feature list

### Point in time for deployment

The production feature serving API uses the current time as its point in time. To consume the feature list, send only the primary entity via the serving name.
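
Because the point in time is implicit at serving time, the input reduces to the serving-name column. The sketch below only prepares a list of entity records from an observation-set-like frame; how those records are wrapped in a request is defined by the generated serving code, not by this example.

```python
import json

import pandas as pd

# two rows copied from the sample observation set output earlier in this tutorial
observation_set_example = pd.DataFrame({
    "GROCERYCUSTOMERGUID": [
        "30e3fbe4-3cbe-4d51-b6ca-1f990ef9773d",
        "7484ebd5-ee65-49f8-abce-8becd7af39fb",
    ],
    "POINT_IN_TIME": pd.to_datetime(["2022-12-17 12:12:40", "2022-12-26 18:12:37"]),
})

# keep only the serving-name column; no POINT_IN_TIME is sent in production
entity_rows = (
    observation_set_example[["GROCERYCUSTOMERGUID"]]
    .drop_duplicates()
    .to_dict(orient="records")
)
print(json.dumps(entity_rows, indent=2))
```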

### Automatically create a Python function for consuming the API

The SDK can generate client code for the feature serving API, either as a Python template or as a shell script that uses the curl command to send the request.

For the Python template, set the language parameter to 'python'.
For the shell script, set the language parameter to 'sh'.

In [26]:
# get a python template for consuming the feature serving API
customer_feature_list.get_online_serving_code(language="python")

Copy the online serving code generated above, paste it into the cell below, then run it.

In [27]:
# replace the contents of this Python code cell with the output from customer_feature_list.get_online_serving_code(language="python")

### Disable a deployment

In [28]:
# disable the feature list deployment
customer_feature_list.deploy(enable=False)

Loading Feature(s) |████████████████████████████████████████| 4/4 [100%] in 0.9s (4.22/s)                               
Done! |████████████████████████████████████████| 100% in 12.4s (8.08%/s)                                                


## Next Steps

Now that you've completed the deep dive materializing features tutorial, you can put your knowledge into practice or learn more:
1. Put your knowledge into practice by creating features in the "credit card dataset feature engineering playground" or "healthcare dataset feature engineering playground" catalogs
2. Learn more about feature governance via the "Quick Start Feature Governance" tutorial
3. Learn about data modeling via the "Deep Dive Data Modeling" tutorial