# ODSC FeatureByte Workshop

Library imports and connection to the FeatureByte server.

In [None]:
import featurebyte as fb
import pandas as pd
from packaging import version

# this script requires version 0.5.0 or higher
print(f"\nFeatureByte Version: {fb.version}\n")
if version.parse(fb.version) < version.parse('0.5.0'):
    raise Exception("Please upgrade your FeatureByte library to version 0.5.0 or higher")
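
The check above compares parsed versions, not raw strings. The snippet below uses the same `packaging` library to show why. Plain string comparison goes character by character, so a hypothetical release "0.10.0" would wrongly sort before "0.5.0".

```python
from packaging import version

# Lexicographic string comparison gets multi-digit components wrong.
print("0.10.0" < "0.5.0")  # True: the character '1' sorts before '5'

# Parsed versions compare component by component (0 == 0, then 10 > 5).
print(version.parse("0.10.0") < version.parse("0.5.0"))  # False
```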

Next, connect to the FeatureByte server.<br>
NOTE: Before you run this cell for the first time, paste your API token into the api_token variable in the script.

In [None]:
#############################################################################################
# paste your API token here
#############################################################################################
api_token = ""

# connect to the FeatureByte server
user_profile_name = 'creditcardworkshop'
try:
    fb.use_profile(user_profile_name)
except Exception:
    # the profile is most likely not registered yet, so register it with your API token
    print("Setting up your user profile...")
    if api_token == "":
        raise Exception("Please paste your API token in the code above")
    server_url = "https://demo.featurebyte.com/api/v1"
    fb.register_profile(user_profile_name, server_url, api_token)
    fb.use_profile(user_profile_name)

catalog_name = 'ODSC West workshop 2023'
catalog = fb.activate_and_get_catalog(catalog_name)

Load an observation set that will be used later to preview the values of features and feature lists.

In [None]:
sample_table_name = 'Workshop Preview Sample Observations'
obs_set = catalog.get_observation_table(sample_table_name).to_pandas()
display(obs_set)

Create views

In [None]:
# create a view for each table
customer_view = catalog.get_view("Customer_Profile")
card_view = catalog.get_view("Card_Details")
state_view = catalog.get_view("State_Details")
transaction_view = catalog.get_view("Transactions")
fraud_view = catalog.get_view("Fraud_Status")
transactiongroup_view = catalog.get_view("Transaction_Types")

## Declaring New Features

We will create a feature with an attribute signal from the state view, using its BelowPovertyLevel column.

In [None]:
#######################################################################
# template python code
#######################################################################

# code template for declaring a lookup feature from a view column
below_poverty_line = state_view.BelowPovertyLevel.as_feature('PopulationBelowPovertyLevel')

# display some sample values
below_poverty_line.preview(obs_set)
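
Conceptually, `as_feature` turns a column into a lookup feature. Each observation row receives the attribute value from the row that matches its entity. The pandas sketch below is only an analogy under stated assumptions. The key column `StateCode` and all values are hypothetical toy data. The observation rows are assumed to carry the state key directly, whereas FeatureByte resolves it through entity relationships. Any time validity of the state table is ignored.

```python
import pandas as pd

# Hypothetical state attribute table (stand-in for state_view).
state = pd.DataFrame({
    "StateCode": ["CA", "NY", "TX"],
    "BelowPovertyLevel": [0.12, 0.13, 0.14],
})

# Hypothetical observation rows that already carry the state key.
obs = pd.DataFrame({
    "POINT_IN_TIME": pd.to_datetime(["2023-01-05", "2023-02-10", "2023-03-15"]),
    "StateCode": ["NY", "CA", "NY"],
})

# A lookup is a left join on the entity key: exactly one value per observation row.
lookup = obs.merge(state, on="StateCode", how="left")
lookup = lookup.rename(columns={"BelowPovertyLevel": "PopulationBelowPovertyLevel"})
print(lookup)
```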

We will join two views to enhance the data.<br>
Join the transaction view (the Python object transaction_view we created earlier) with the transaction groups view (the Python object transactiongroup_view) to create a new view called joined_view.

In [None]:
#######################################################################
# template python code
#######################################################################

# code template for joining two views
joined_view = transaction_view.join(transactiongroup_view)
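
As a rough analogy, the join behaves like a pandas left merge, which is what we understand View.join does by default. Every transaction row is kept and gains the columns of the matching transaction group row. No key is passed in the code above, so FeatureByte infers it from the views' entity annotations. In this sketch the key column `ProductType` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical transaction rows (stand-in for transaction_view).
transactions = pd.DataFrame({
    "CardTransactionID": [1, 2, 3, 4],
    "AccountID": ["A", "A", "B", "B"],
    "ProductType": ["grocery_pos", "gas_transport", "misc_net", "unknown"],
    "Amount": [25.0, 40.0, 12.5, 8.0],
})

# Hypothetical table mapping each product type to a broader group
# (stand-in for transactiongroup_view).
groups = pd.DataFrame({
    "ProductType": ["grocery_pos", "gas_transport", "misc_net"],
    "TransactionGroup": ["Food", "Travel", "Online"],
})

# A left join keeps every transaction; rows without a match get NaN in the new column.
joined = transactions.merge(groups, on="ProductType", how="left")
print(joined)
```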

Maybe customers with diverse shopping habits are more likely to have their credit card details stolen. So let's create a feature with a diversity signal.<br><br>

Create a bucketing feature from the joined view that [aggregates](https://docs.featurebyte.com/0.6/reference/featurebyte.api.groupby.GroupBy.aggregate_over/) the number of transactions in each transaction group over 168 days, grouped by credit card (AccountID). Name the feature "Card Bucketing Product Groups 168 days".

In [None]:
joined_view.preview(5)

In [None]:
#######################################################################
# template python code
#######################################################################

# code template for bucketing features
window_period = "168d"
feature_name = "Card Bucketing Product Groups 168 days"
bucketing_feature = joined_view.groupby("AccountID", "TransactionGroup").aggregate_over(
    None,
    fb.AggFunc.COUNT,
    windows=[window_period],
    feature_names=[feature_name],
)
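
For a single card and point in time, picture the bucketing feature as a dictionary that maps each transaction group to its number of transactions in the 168-day window. The minimal pandas sketch below uses hypothetical data. It treats the window as (point_in_time - 168 days, point_in_time] and does not model FeatureByte's feature job settings, such as window alignment and blind spot.

```python
import pandas as pd

# Hypothetical joined transactions for one card.
joined = pd.DataFrame({
    "AccountID": ["A"] * 5,
    "Timestamp": pd.to_datetime([
        "2022-12-01", "2023-03-01", "2023-04-10", "2023-05-20", "2023-06-01",
    ]),
    "TransactionGroup": ["Food", "Food", "Travel", "Food", "Online"],
})


def bucketed_counts(df: pd.DataFrame, account_id: str,
                    point_in_time: pd.Timestamp, window: str = "168d") -> dict:
    """Count transactions per TransactionGroup in the window ending at point_in_time."""
    start = point_in_time - pd.Timedelta(window)
    in_window = df[
        (df["AccountID"] == account_id)
        & (df["Timestamp"] > start)
        & (df["Timestamp"] <= point_in_time)
    ]
    return in_window["TransactionGroup"].value_counts().to_dict()


counts = bucketed_counts(joined, "A", pd.Timestamp("2023-06-15"))
print(counts)
```

The window starts on 29-Dec-2022, so the December 2022 transaction is excluded. That leaves two Food transactions, one Travel and one Online.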

Create a feature with a diversity signal by calculating the entropy of the bucketing feature for each credit card.<br>
Name the feature "Card purchase diversity 168 days"

In [None]:
#######################################################################
# template python code
#######################################################################

# code template for diversity feature
diversity_feature = bucketing_feature[feature_name].cd.entropy()
diversity_feature.name = "Card purchase diversity 168 days"
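
The entropy turns the per-group counts into a single number. It is 0 when every transaction falls in one group, and it grows as transactions spread more evenly across groups. The sketch below computes Shannon entropy over a counts dictionary. It assumes the natural logarithm, which may differ from the base FeatureByte uses internally.

```python
import math


def entropy(counts: dict) -> float:
    """Shannon entropy (natural log) of the distribution implied by the counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in counts.values() if c > 0]
    return sum(-p * math.log(p) for p in probs)


focused = entropy({"Food": 4})                           # all purchases in one group
diverse = entropy({"Food": 2, "Travel": 1, "Online": 1})  # purchases spread out
print(focused, diverse)
```

For the 2/1/1 split the entropy is 1.5 · ln 2 ≈ 1.04. A card that only buys in one group scores 0.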

Create a new feature list containing our 3 new features

In [None]:
# create a feature list
three_new_features = fb.FeatureList([
    below_poverty_line,
    bucketing_feature,
    diversity_feature,
], name='Workshop 3 new features')

# preview the values of the feature list
three_new_features.preview(obs_set)

## Observation Table

Create an [observation table](https://docs.featurebyte.com/0.6/reference/core/observation_table/) that will be used for creating training data.
1. Use the transactions view
2. Filter the view to keep transactions that occurred from 01-Jul-2022 to 30-Jun-2023
3. Sample 100 rows
4. Name the observation table "100 transactions 01-Jul-2022 to 30-Jun-2023 as at 12-Oct-2023"
5. The entity column is CardTransactionID, and its serving name is CARDTRANSACTIONID
6. Map the column names to serving names (Timestamp becomes POINT_IN_TIME)
7. Run the cell to create the table

In [None]:
transaction_view.preview(5)

In [None]:
#######################################################################
# template python code
#######################################################################
# keep transactions from 01-Jul-2022 up to, but not including, 01-Jul-2023
view_filter = (transaction_view["Timestamp"] >= pd.to_datetime("2022-07-01")) & (
    transaction_view["Timestamp"] < pd.to_datetime("2023-07-01")
)

# observation table template code
observation_table = transaction_view[view_filter].create_observation_table(
    name="100 transactions 01-Jul-2022 to 30-Jun-2023 as at 12-Oct-2023",
    sample_rows=100,
    columns=["Timestamp", "CardTransactionID"],
    columns_rename_mapping={
        "Timestamp": "POINT_IN_TIME",
        "CardTransactionID": "CARDTRANSACTIONID",
    }
)

In [None]:
# display a sample of the observation table rows
display(observation_table.sample(10))

## Declare a Target

Declare a [target](https://docs.featurebyte.com/0.6/reference/core/target/) that:
1. Uses fraud_view
2. Is a lookup attribute of the Status column, 30 days after the point in time of the transaction.
3. Is named "Transaction fraud status after 30 days"
4. Is stored in a variable called new_target

In [None]:
#######################################################################
# template python code
#######################################################################

# template target declaration code here
new_target = fraud_view.Status.as_target(
    target_name="Transaction fraud status after 30 days", 
    offset="30d"
)
new_target.save()
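
A lookup target with `offset="30d"` reads the Status value in effect 30 days after each point in time. That way, fraud reported after the transaction can still be captured. The sketch below assumes the fraud status is stored as a history of status changes. The table layout, the `EffectiveTimestamp` column and the status values are all hypothetical.

```python
import pandas as pd

# Hypothetical status history: one row per status change, keyed by transaction.
status_history = pd.DataFrame({
    "CardTransactionID": [1, 1, 2],
    "EffectiveTimestamp": pd.to_datetime(["2023-01-01", "2023-01-20", "2023-02-01"]),
    "Status": ["not_fraud", "fraud", "not_fraud"],
}).sort_values("EffectiveTimestamp")

# Hypothetical observation rows.
obs = pd.DataFrame({
    "CARDTRANSACTIONID": [1, 2],
    "POINT_IN_TIME": pd.to_datetime(["2023-01-01", "2023-02-01"]),
})

# Look up the status in effect 30 days *after* each point in time.
obs["LOOKUP_TIME"] = obs["POINT_IN_TIME"] + pd.Timedelta("30d")
target = pd.merge_asof(
    obs.sort_values("LOOKUP_TIME"),
    status_history,
    left_on="LOOKUP_TIME",
    right_on="EffectiveTimestamp",
    left_by="CARDTRANSACTIONID",
    right_by="CardTransactionID",
    direction="backward",  # latest status change at or before LOOKUP_TIME
)
print(target[["CARDTRANSACTIONID", "POINT_IN_TIME", "Status"]])
```

Transaction 1 was still "not_fraud" at its point in time. Its status 30 days later is "fraud", and that later status is the label the target records.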

Compute the target values

In [None]:
# Materialize the target
training_data_target_table = new_target.compute_target_table(
    observation_table,
    observation_table_name="Workshop target values",
)

# display a sample of the target table rows
display(training_data_target_table.sample(5))

## Training Data

Load the feature list called "Strong_Features_Selection_for_Fraud"

In [None]:
# code template for loading a feature list
feature_list = catalog.get_feature_list('Strong_Features_Selection_for_Fraud')

Create a historical feature table that will be used for training a machine learning model.
1. Use the target table we just created
2. Use the feature list we just loaded
3. Name the feature table "100 rows of card transaction training data 01-Jul-2022 to 30-Jun-2023 as at 05-Sep-2023"
4. Run the cell to create the table

In [None]:
#######################################################################
# template python code
#######################################################################

# historical feature table template here
training_table = feature_list.compute_historical_feature_table(
    training_data_target_table,
    historical_feature_table_name="100 rows of card transaction training data 01-Jul-2022 to 30-Jun-2023 as at 05-Sep-2023"
)
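
The historical feature table computes each feature as of every row's own POINT_IN_TIME, so no training row sees data from after its transaction. The toy pandas example below illustrates that point-in-time rule. The feature (spend in the 30 days before the point in time), the `target` column and all values are hypothetical. None of them are part of Strong_Features_Selection_for_Fraud.

```python
import pandas as pd

# Hypothetical card transactions used to compute one feature.
transactions = pd.DataFrame({
    "AccountID": ["A", "A", "A", "B"],
    "Timestamp": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-10", "2023-01-15"]),
    "Amount": [10.0, 20.0, 30.0, 5.0],
})

# Hypothetical training rows: observation rows plus a target column.
training_rows = pd.DataFrame({
    "AccountID": ["A", "A", "B"],
    "POINT_IN_TIME": pd.to_datetime(["2023-01-25", "2023-02-15", "2023-01-25"]),
    "target": [0, 1, 0],
})


def spend_last_30d(row: pd.Series) -> float:
    """Sum of Amount in the 30 days up to this row's POINT_IN_TIME, never later."""
    start = row["POINT_IN_TIME"] - pd.Timedelta("30d")
    mask = (
        (transactions["AccountID"] == row["AccountID"])
        & (transactions["Timestamp"] > start)
        & (transactions["Timestamp"] <= row["POINT_IN_TIME"])
    )
    return float(transactions.loc[mask, "Amount"].sum())


training_rows["spend_30d"] = training_rows.apply(spend_last_30d, axis=1)
print(training_rows)
```

Card A appears twice and gets two different values (30.0, then 50.0). Its February transaction does not leak into the January row.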

View a sample of the training data

In [None]:
# code to view a sample of the training data
display(training_table.sample())

## New Use Case

In [None]:
# create a context
context = fb.Context.create(name="Credit Card", entity_names=["Card"])

# create a target
target = transaction_view.groupby("AccountID").forward_aggregate(
    method="sum",
    value_column="Amount",
    window="30d",
    target_name="Total Spend 30 days",
)

target.save()

# create a use case
use_case = fb.UseCase.create(
    name="Credit Card Total Spend 30 days",
    target_name=target.name,
    context_name=context.name,
    description="Predict how much a customer will spend with a specific credit card over the next 30 days",
)
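
A lookup target reads a single value, whereas `forward_aggregate` sums values over a future window. Below is a minimal pandas sketch of "Total Spend 30 days" with hypothetical data. It assumes the window is (point in time, point in time + 30 days], so a transaction exactly at the point in time is excluded. That boundary may not match FeatureByte's convention.

```python
import pandas as pd

# Hypothetical card transactions (stand-in for transaction_view).
transactions = pd.DataFrame({
    "AccountID": ["A", "A", "A", "B"],
    "Timestamp": pd.to_datetime(["2023-01-01", "2023-01-10", "2023-02-15", "2023-01-05"]),
    "Amount": [100.0, 50.0, 70.0, 20.0],
})

# Hypothetical observation rows, one per card.
obs = pd.DataFrame({
    "AccountID": ["A", "B"],
    "POINT_IN_TIME": pd.to_datetime(["2023-01-01", "2023-01-01"]),
})


def forward_sum(row: pd.Series, window: str = "30d") -> float:
    """Sum of Amount strictly after POINT_IN_TIME and within the next `window`."""
    end = row["POINT_IN_TIME"] + pd.Timedelta(window)
    mask = (
        (transactions["AccountID"] == row["AccountID"])
        & (transactions["Timestamp"] > row["POINT_IN_TIME"])
        & (transactions["Timestamp"] <= end)
    )
    return float(transactions.loc[mask, "Amount"].sum())


obs["Total Spend 30 days"] = obs.apply(forward_sum, axis=1)
print(obs)
```

For card A, the 100.0 transaction at the point in time and the 70.0 transaction in mid-February both fall outside the window. Only the 50.0 transaction is counted.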
