# Quick Start Tutorial: Reusing Features

## Learning Objectives

In this tutorial you will learn:
1. How to access catalogs of data, entities, features, and feature lists
2. How to search for features suitable for the unit of analysis
3. How to understand an existing feature
4. How to create new features from existing features
5. How to create a new feature list from existing features

## Set up the prerequisites

Learning Objectives

In this section you will:
* start your local featurebyte server
* import libraries
* learn about catalogs
* activate a pre-built catalog

### Load the featurebyte library and connect to the local instance of featurebyte

In [1]:
# library imports
import pandas as pd
import numpy as np
import random

# load the featurebyte SDK
import featurebyte as fb

# start the local server, then wait for it to be healthy before proceeding
fb.playground()

### Create a pre-built catalog for this tutorial, with the data, metadata, and features already set up

Note that creating a pre-built catalog is not a step you would take in a real-life project. This function exists only for this quick-start tutorial: it skips many of the preparatory steps so that you can get straight to materializing features.

In a real-life project, you would first do data modeling: declaring the tables, entities, and their associated metadata. This is not a frequent task, but it forms the basis of best-practice feature engineering.

In [2]:
# get the functions to create a pre-built catalog
from prebuilt_catalogs import *

# create a new catalog for this tutorial
catalog = create_tutorial_catalog(PrebuiltCatalog.QuickStartReusingFeatures)

### Example: Load the tables and views

In [3]:
# get the tables for this catalog
grocery_customer_table = catalog.get_table("GROCERYCUSTOMER")
grocery_items_table = catalog.get_table("INVOICEITEMS")
grocery_invoice_table = catalog.get_table("GROCERYINVOICE")
grocery_product_table = catalog.get_table("GROCERYPRODUCT")

# create the views
grocery_customer_view = grocery_customer_table.get_view()
grocery_invoice_view = grocery_invoice_table.get_view()
grocery_items_view = grocery_items_table.get_view()
grocery_product_view = grocery_product_table.get_view()

## Accessing Catalogs

Learning Objectives

In this section you will learn how to display catalogs of:
* tables
* entities
* features
* feature lists

### Example: A catalog of tables

In [4]:
# list the tables in the catalog
catalog.list_tables()

In [5]:
# load a table
grocery_customer_table = catalog.get_table("GROCERYCUSTOMER")

# show the metadata
grocery_customer_table.info()

### Example: A catalog of entities

In [6]:
# list the entities in the catalog
catalog.list_entities()

In [7]:
# list the entity relationships in the catalog
catalog.list_relationships()

In [8]:
# load an entity
customer_entity = catalog.get_entity("grocerycustomer")

# show the metadata
customer_entity.info()

### Example: A catalog of features

In [9]:
# list the features in the catalog
catalog.list_features()

In [10]:
# load a feature
state_population = catalog.get_feature("StatePopulation")

# show the metadata
state_population.info()

In [11]:
# show the feature lineage for the state population feature
display(state_population.definition)

### Example: A catalog of feature lists

In [12]:
# list the feature lists in the catalog
catalog.list_feature_lists()

In [13]:
# load the feature list
state_features = catalog.get_feature_list("StateFeatureList")

# show the metadata
state_features.info()

In [14]:
# list the features in the feature list
state_features.list_features()

## Search for Features

Learning Objectives

In this section, you will learn:
* what a primary entity is
* how to search for suitable features

### Concept: Primary entity

<b>Feature primary entity:</b> The primary entity of a feature defines the level of analysis for that feature. 
When a feature is the result of an aggregation grouped by multiple entities, the primary entity is a tuple of those entities. For instance, if a feature quantifies past interactions between a customer and a merchant, such as the sum of transaction amounts grouped by customer and merchant over the past 4 weeks, the primary entity is the tuple of customer and merchant.

When a feature is derived from features with different primary entities, the primary entity is determined by the entity relationships, and the lowest-level entity is selected as the primary entity. If the underlying entities have no relationship, the primary entity becomes a tuple of those entities. For example, if a feature compares the basket of a customer with the average basket of customers in the same city, the primary entity is the customer, since the customer entity is a child of the customer city entity. However, if the feature is the distance between the customer location and the merchant location, the primary entity becomes the tuple of customer and merchant, since these entities do not have a child-parent relationship.

<b>Feature List primary entity:</b> The main focus of a feature list is determined by its primary entity, which typically corresponds to the primary entity of the Use Case that the feature list was created for.

If the features within the list pertain to different primary entities, the primary entity of the feature list is selected based on the entity relationships, with the lowest-level entity chosen as the primary entity. In cases where there are no relationships between entities, the primary entity may become a tuple comprising those entities.
To illustrate, consider a feature list comprising features related to card, customer, and customer city. In this case, the primary entity is the card entity since it is a child of both the customer and customer city entities. However, if the feature list also contains features for merchant and merchant city, the primary entity is a tuple of card and merchant.
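The selection rule described above, for both features and feature lists, can be sketched in plain Python. This is a minimal illustration of the rule as stated in this tutorial, not featurebyte's implementation: the child-to-parents map and entity names (`card`, `customer`, `customercity`, `merchant`, `merchantcity`) are hypothetical, and relationships are assumed to be acyclic. Any entity that is an ancestor of another entity in the set is dropped; one survivor is the primary entity, several survivors form a tuple.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical child -> parents map; entity names are illustrative only.
RELATIONSHIPS: Dict[str, List[str]] = {
    "card": ["customer"],
    "customer": ["customercity"],
    "merchant": ["merchantcity"],
}


def ancestors(entity: str, relationships: Dict[str, List[str]]) -> Set[str]:
    """Return every entity reachable by following child -> parent links."""
    found: Set[str] = set()
    stack = list(relationships.get(entity, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(relationships.get(parent, []))
    return found


def primary_entity(
    entities: Iterable[str], relationships: Dict[str, List[str]]
) -> Tuple[str, ...]:
    """Drop entities that are ancestors of another entity in the set.

    The survivors are the lowest-level entities: a single survivor is the
    primary entity, several unrelated survivors form a tuple.
    """
    entity_set = set(entities)
    covered: Set[str] = set()
    for entity in entity_set:
        covered |= ancestors(entity, relationships)
    return tuple(sorted(entity_set - covered))


# feature examples: customer vs. customer city, customer vs. merchant
print(primary_entity(["customer", "customercity"], RELATIONSHIPS))  # ('customer',)
print(primary_entity(["customer", "merchant"], RELATIONSHIPS))  # ('customer', 'merchant')

# feature list examples: card/customer/customer city, then add merchant entities
print(primary_entity(["card", "customer", "customercity"], RELATIONSHIPS))  # ('card',)
print(
    primary_entity(
        ["card", "customer", "customercity", "merchant", "merchantcity"], RELATIONSHIPS
    )
)  # ('card', 'merchant')
```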

<b>Use Case primary entity:</b> In a Use Case, the primary entity is the object or concept that defines its problem statement. Usually, this entity is singular, but in cases such as the recommendation engine use case, it can be a tuple of entities that interact with each other.

### Case study: Predicting customer spend

Consider a use case to predict customer spend. The unit of analysis, and therefore the primary entity, is the grocery customer. You can use features whose primary entity is either grocery customer or French state, because state is a parent entity of customer.
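The reasoning in this case study can be written as a small check: a feature is usable if every entity in its primary entity is either the use case's entity or one of that entity's ancestors. This sketch assumes each entity has at most one parent and that the invoice entity is a child of the customer entity; `frenchstate` is a hypothetical name for the state entity and may not match the name used in the tutorial catalog.

```python
from typing import Dict, Iterable, Set

# Hypothetical child -> parent links (single parent per entity, no cycles).
# "frenchstate" is an illustrative name for the state entity.
PARENTS: Dict[str, str] = {
    "groceryinvoice": "grocerycustomer",
    "grocerycustomer": "frenchstate",
}


def usable_entities(use_case_entity: str, parents: Dict[str, str]) -> Set[str]:
    """The use case entity plus every ancestor reachable through parent links."""
    usable = {use_case_entity}
    current = use_case_entity
    while current in parents:
        current = parents[current]
        usable.add(current)
    return usable


def is_usable(
    feature_entities: Iterable[str], use_case_entity: str, parents: Dict[str, str]
) -> bool:
    """True if all of the feature's entities are the use case entity or its ancestors."""
    return set(feature_entities) <= usable_entities(use_case_entity, parents)


print(usable_entities("grocerycustomer", PARENTS))
print(is_usable(["frenchstate"], "grocerycustomer", PARENTS))  # True
print(is_usable(["groceryinvoice"], "grocerycustomer", PARENTS))  # False
```

Child entities such as the invoice are excluded because a customer-level prediction has no single invoice to attach an invoice-level value to.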

### Example: Search for suitable features

In [15]:
# get a list of all the features in the catalog
all_features = catalog.list_features()

# keep only features whose primary entity is grocery customer or state,
# by excluding features that use the child invoice entity or the product entity
child_entity = "groceryinvoice"
suitable_features = all_features.loc[[child_entity not in x for x in all_features.entities.values]]
product_entity = "groceryproduct"
suitable_features = suitable_features.loc[
    [product_entity not in x for x in suitable_features.entities.values]
]

# show the features
display(suitable_features)

In [16]:
# find suitable features that use the grocery invoice items table
grocery_items_features = suitable_features.loc[
    ["INVOICEITEMS" in x for x in suitable_features.tables.values]
]

# show the features
display(grocery_items_features)
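The two search cells above exclude features by naming the entities to drop. An alternative is an allow-list that keeps a feature only when all of its entities are suitable. The sketch below shows both filtering steps on hypothetical rows shaped loosely like the output of `catalog.list_features()`; the feature names, entity names (including `frenchstate`), and table assignments are illustrative only, and plain lists stand in for the pandas DataFrame the catalog returns.

```python
from typing import List

# Hypothetical rows shaped loosely like catalog.list_features() output:
# each feature records the entities and source tables it uses.
features: List[dict] = [
    {"name": "customer_feature", "entities": ["grocerycustomer"],
     "tables": ["GROCERYINVOICE", "INVOICEITEMS"]},
    {"name": "state_feature", "entities": ["frenchstate"],
     "tables": ["GROCERYCUSTOMER"]},
    {"name": "invoice_feature", "entities": ["groceryinvoice"],
     "tables": ["INVOICEITEMS"]},
    {"name": "product_feature", "entities": ["groceryproduct"],
     "tables": ["INVOICEITEMS", "GROCERYPRODUCT"]},
]

# allow-list: keep a feature only if every one of its entities is suitable
allowed_entities = {"grocerycustomer", "frenchstate"}
suitable = [f for f in features if set(f["entities"]) <= allowed_entities]

# then narrow down to features built from the invoice items table
items_based = [f for f in suitable if "INVOICEITEMS" in f["tables"]]

print([f["name"] for f in suitable])  # ['customer_feature', 'state_feature']
print([f["name"] for f in items_based])  # ['customer_feature']
```

The design trade-off: a deny-list silently keeps features for any entity added to the catalog later, while an allow-list drops them until you decide they are suitable.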

## Understand an Existing Feature

Learning Objectives

In this section you will learn how to:
* load a feature from the catalog
* view the metadata of a feature
* materialize feature values
* view feature lineage as a definition file

### Example: Load a feature from the catalog

In [17]:
# get the CustomerInventory_28d feature
customer_inventory_28d = catalog.get_feature("CustomerInventory_28d")

### Example: View the metadata of a feature

In [18]:
# get a list of all the features in the catalog
all_features = catalog.list_features()

# display the current feature
display(all_features.loc[all_features.name == customer_inventory_28d.name])

In [19]:
# view the detailed metadata
customer_inventory_28d.info()

### Example: Materialize sample values

In [20]:
# sample customer IDs and invoice timestamps from Q4 2022
# (the mask is named so that it does not shadow Python's built-in filter())
q4_2022_filter = (grocery_invoice_view["Timestamp"].dt.year == 2022) & (
    grocery_invoice_view["Timestamp"].dt.month >= 10
)

observation_set = (
    grocery_invoice_view[q4_2022_filter]
    .sample(10)[["GroceryCustomerGuid", "Timestamp"]]
    .rename(
        {
            "Timestamp": "POINT_IN_TIME",
            "GroceryCustomerGuid": "GROCERYCUSTOMERGUID",
        },
        axis=1,
    )
)
display(observation_set)

In [21]:
# display the feature values
display(customer_inventory_28d.preview(observation_set))
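Each row of the observation set pairs an entity key with a `POINT_IN_TIME`, and a windowed feature is evaluated using only data from the window that ends at that point in time. The sketch below illustrates this point-in-time lookup under a loud assumption: that `CustomerInventory_28d` counts the items a customer bought per product group in the 28 days before the point in time. The actual definition is shown by `customer_inventory_28d.definition` in the next example. The item rows are hypothetical, and treating the window start as inclusive and the point in time as exclusive is also an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict

# Hypothetical invoice-item rows: (customer id, timestamp, product group).
ITEMS = [
    ("c1", datetime(2022, 10, 1), "Fruits"),
    ("c1", datetime(2022, 10, 20), "Fruits"),
    ("c1", datetime(2022, 10, 25), "Dairy"),
    ("c1", datetime(2022, 11, 30), "Bakery"),  # after the point in time: ignored
    ("c2", datetime(2022, 10, 22), "Dairy"),
]


def inventory_as_of(
    customer: str, point_in_time: datetime, window: timedelta = timedelta(days=28)
) -> Dict[str, int]:
    """Count a customer's items per product group in [point_in_time - window, point_in_time)."""
    start = point_in_time - window
    counts = Counter(
        group
        for cust, ts, group in ITEMS
        if cust == customer and start <= ts < point_in_time
    )
    return dict(counts)


# hypothetical observation set using the same column names as the tutorial
observation_set = [
    {"GROCERYCUSTOMERGUID": "c1", "POINT_IN_TIME": datetime(2022, 10, 28)},
    {"GROCERYCUSTOMERGUID": "c2", "POINT_IN_TIME": datetime(2022, 10, 28)},
]
for row in observation_set:
    key, pit = row["GROCERYCUSTOMERGUID"], row["POINT_IN_TIME"]
    print(key, inventory_as_of(key, pit))
```

Because rows after each point in time are ignored, the values match what would have been known at that moment, which avoids leaking future data into training.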

### Example: View the feature lineage

In [22]:
# display the feature lineage for the feature we just loaded from the catalog
display(customer_inventory_28d.definition)

## Create New Features from Existing Features

You can use existing features as inputs to new features.

Learning Objectives

In this section you will learn how to:
* create a new feature from two existing features

### Example: Create a new similarity feature from two existing features

In [23]:
# get the StateInventory_28d feature
state_inventory_28d = catalog.get_feature("StateInventory_28d")

# get the CustomerInventory_28d feature
customer_inventory_28d = catalog.get_feature("CustomerInventory_28d")

# create a new feature that is the cosine similarity of the two features
customer_state_items_similarity_28d = customer_inventory_28d.cd.cosine_similarity(
    state_inventory_28d
)
customer_state_items_similarity_28d.name = "CustomerStateItemsSimilarity_28d"
customer_state_items_similarity_28d.save()

# display the feature lineage for the feature we just created
display(customer_state_items_similarity_28d.definition)
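The `cd.cosine_similarity` call compares two dictionary-valued features, treating each dictionary as a vector indexed by its keys. The sketch below shows the underlying calculation in plain Python, assuming keys missing from one dictionary count as 0; returning 0.0 when either dictionary is empty or all zeros is an assumption of this sketch and may not match featurebyte's behavior. The example dictionaries are hypothetical.

```python
import math
from typing import Dict


def dict_cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as key -> value dictionaries."""
    # keys missing from b contribute 0 to the dot product
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: no similarity is defined for an empty vector
    return dot / (norm_a * norm_b)


# hypothetical customer and state inventories keyed by product group
customer = {"Fruits": 2, "Dairy": 1}
state = {"Fruits": 40, "Dairy": 20, "Bakery": 10}
print(round(dict_cosine_similarity(customer, state), 4))
```

Cosine similarity depends on the mix of product groups rather than on volume, so a customer who buys in the same proportions as their state scores close to 1 even though they buy far fewer items.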

## Create a New Feature List From Existing Features

Learning Objectives

In this section you will learn how to:
* create a feature list with a primary entity suited to your use case

### Example: Create a customer level feature list

In [24]:
# get a list of all the features in the catalog
all_features = catalog.list_features()

# keep only features whose primary entity is grocery customer or state,
# by excluding features that use the child invoice entity or the product entity
child_entity = "groceryinvoice"
suitable_features = all_features.loc[[child_entity not in x for x in all_features.entities.values]]
product_entity = "groceryproduct"
suitable_features = suitable_features.loc[
    [product_entity not in x for x in suitable_features.entities.values]
]

# show the features
display(suitable_features)

In [25]:
# create a new feature list from the 12 features we just searched for
customer_features = fb.FeatureList(
    [catalog.get_feature(x) for x in suitable_features.name.values], name="CustomerFeatures"
)
customer_features.save()

# display a sample of the feature list values
display(customer_features.preview(observation_set))

In [26]:
# list the feature lists in the catalog
catalog.list_feature_lists()

## Next Steps

Now that you've completed the quick-start reusing features tutorial, you can put your knowledge into practice or learn more:
1. Learn more about materializing features via the "Deep Dive Materializing Features" tutorial
2. Learn more about feature engineering via the "Deep Dive Feature Engineering" tutorial
3. Learn about data modeling via the "Deep Dive Data Modeling" tutorial