<a href="https://colab.research.google.com/github/rwagler/Exercise-Digital-Business-and-Platfroms/blob/main/exercise_07_RFM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install the KumoRFM Python SDK
KumoRFM provides a Python SDK, available for Python 3.9 through 3.13.

In [None]:
!pip install kumoai

In [None]:
import kumoai.experimental.rfm as rfm

You will need an API key to make calls to KumoRFM.
Use the widget below to generate one for free by clicking "Generate API Key".
If you don't have a KumoRFM account, the widget will prompt you to sign up.

You will see the following when your key has been created successfully:

<div align="left">
  <img src="https://kumo-sdk-public.s3.us-west-2.amazonaws.com/rfm-colabs/api-key-created.png" width="300" />
</div>

In [None]:
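import os

from google.colab import userdata

# Copy the key from Colab's secret store into the environment, if it is stored there.
try:
    os.environ["KUMO_API_KEY"] = userdata.get('KUMO_API_KEY')
except Exception:  # secret missing or notebook access not granted
    pass

# Otherwise fall back to the interactive widget.
if not os.environ.get("KUMO_API_KEY"):
    rfm.authenticate()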

In [None]:
# Initialize a Kumo client with your API key:
KUMO_API_KEY = os.environ.get("KUMO_API_KEY")
rfm.init(api_key=KUMO_API_KEY)
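
If the key is missing, `os.environ.get` silently returns `None`, and the problem only surfaces once the client is initialized. A minimal sketch of an explicit check follows; `resolve_api_key` is a hypothetical helper and not part of the Kumo SDK, and the mapping passed in the usage line stands in for the real environment.

```python
import os


def resolve_api_key(env=None, name="KUMO_API_KEY"):
    """Return the key stored under `name`, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; generate a key and store it first.")
    return key


# Usage with a stand-in mapping instead of the real environment
demo_key = resolve_api_key({"KUMO_API_KEY": " demo-key "})
print(demo_key)
```

In the notebook, the result of `resolve_api_key()` could be passed to `rfm.init(api_key=...)` in place of the raw environment lookup.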

## Load the Data

In [None]:
root = 's3://kumo-sdk-public/rfm-datasets/ecom'
path_dict = {
    'users': f'{root}/users.parquet',
    'items': f'{root}/items.parquet',
    'views': f'{root}/views.parquet',
    'orders': f'{root}/orders.parquet',
    'returns': f'{root}/returns.parquet',
}

In [None]:
# Load the data into pandas DataFrames
import pandas as pd

df_users = pd.read_parquet(path_dict['users'])
df_items = pd.read_parquet(path_dict['items'])
df_views = pd.read_parquet(path_dict['views'])
df_orders = pd.read_parquet(path_dict['orders'])
df_returns = pd.read_parquet(path_dict['returns'])

### Familiarize yourself with the data structure

#### User Data

In [None]:
df_users.head(5)

In [None]:
df_users.dtypes

In [None]:
df_users.info()

#### Items

In [None]:
df_items.head(5)

In [None]:
df_items.dtypes

In [None]:
df_items.info()

#### Orders

In [None]:
df_orders.head(5)

In [None]:
df_orders.dtypes

In [None]:
df_orders.info()

#### Returns

In [None]:
df_returns.head(5)

In [None]:
df_returns.dtypes

In [None]:
df_returns.info()

#### Views

In [None]:
df_views.head(5)

In [None]:
df_views.dtypes

In [None]:
df_views.info()
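
Calling `head`, `dtypes`, and `info` table by table gets repetitive. As a rough sketch, the same overview can be collected into a single frame. `summarize_tables` is a hypothetical helper, and the toy frames below use made-up columns and values rather than the real dataset.

```python
import pandas as pd


def summarize_tables(tables):
    """Build one overview row per table: row count, column count, missing cells."""
    rows = []
    for name, df in tables.items():
        rows.append({
            "table": name,
            "rows": len(df),
            "columns": df.shape[1],
            "missing_cells": int(df.isna().sum().sum()),
        })
    return pd.DataFrame(rows).set_index("table")


# Toy stand-ins for the real tables (made-up values)
toy_tables = {
    "users": pd.DataFrame({"user_id": [1, 2, 3], "age": [31, None, 45]}),
    "orders": pd.DataFrame({"order_id": [10, 11], "user_id": [1, 3]}),
}
summary = summarize_tables(toy_tables)
print(summary)
```

In the notebook, the dictionary would instead map the five table names to `df_users`, `df_items`, `df_views`, `df_orders`, and `df_returns`.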

## Turn the data into KumoRFM tables

In [None]:
users = rfm.LocalTable(df_users, name="users").infer_metadata()
orders = rfm.LocalTable(df_orders, name="orders").infer_metadata()
items = rfm.LocalTable(df_items, name="items").infer_metadata()
returns = rfm.LocalTable(df_returns, name="returns").infer_metadata()
views = rfm.LocalTable(df_views, name="views").infer_metadata()

### Inspecting metadata

#### Users

In [None]:
users.print_metadata()

#### Orders

In [None]:
orders.print_metadata()

#### Items

In [None]:
items.print_metadata()

#### Returns

In [None]:
returns.print_metadata()

#### Views

In [None]:
views.print_metadata()
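
Inferred metadata is only as reliable as the underlying columns, so it can help to check the key columns by hand. The sketch below uses plain pandas, independent of the Kumo SDK, to test whether a candidate key is unique in the parent table and how many child rows reference a missing parent. `check_keys` is a hypothetical helper and the toy frames contain made-up values.

```python
import pandas as pd


def check_keys(parent, child, key):
    """Report whether `key` is unique in `parent` and how many `child` rows lack a match."""
    is_unique = bool(parent[key].is_unique)
    orphans = int((~child[key].isin(parent[key])).sum())
    return {"primary_key_unique": is_unique, "orphan_rows": orphans}


# Toy data: user 4 places an order but does not exist in the users table
toy_users = pd.DataFrame({"user_id": [1, 2, 3]})
toy_orders = pd.DataFrame({"user_id": [1, 1, 4], "item_id": [7, 8, 9]})

report = check_keys(toy_users, toy_orders, "user_id")
print(report)
```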

## Create a Graph based on the Data

In [None]:
graph = rfm.LocalGraph.from_data({
    'users': df_users,
    'orders': df_orders,
    'items': df_items,
    'returns': df_returns,
    'views': df_views,
}, infer_metadata=True)

In [None]:
graph.visualize();

## Materialize Graph

In [None]:
model = rfm.KumoRFM(graph)

## Query the graph

In [None]:
def query_graph(query, indices=None):
    df = model.predict(query, indices=indices)
    display(df)
    return df

### Identify Users for churn risk prediction
- KumoRFM can predict a value for at most 1000 entities, so we need to do some preliminary filtering.
- Therefore, we only look at the 100 users with the most recent returns.

In [None]:
df = df_returns.sort_values('date', ascending=False)
target_users = df['user_id'].drop_duplicates().head(100).to_list()
print(len(target_users))
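
When more users need scoring than the 1000-entity limit allows, one option is to split the IDs into batches and run one prediction per batch. The sketch below shows only the splitting step; `chunk_ids` is a hypothetical helper and the IDs are made up.

```python
def chunk_ids(ids, size=1000):
    """Yield consecutive slices of `ids` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


all_ids = list(range(2500))  # made-up user IDs
batches = list(chunk_ids(all_ids))
print([len(batch) for batch in batches])  # [1000, 1000, 500]
```

Each batch could then be passed as `indices` to `query_graph`, and the resulting frames combined with `pd.concat`.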

### Perform Prediction
Predict, for each selected customer, whether they will place at least one order in the next 30 days. The probability of placing no order serves as the churn risk.

In [None]:
query = "PREDICT COUNT(orders.*, 0, 30, days) > 0 FOR EACH users.user_id"
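
Read piece by piece, the query counts each user's orders in the window from 0 to 30 days ahead and asks whether that count is greater than zero. To try other horizons, the string can be assembled from parameters. This is a small sketch: `build_activity_query` is a hypothetical helper that only formats text and does not validate the query against KumoRFM.

```python
def build_activity_query(days, entity="users.user_id", target="orders.*"):
    """Return a query asking whether each entity has at least one target row in the next `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return f"PREDICT COUNT({target}, 0, {days}, days) > 0 FOR EACH {entity}"


query_30 = build_activity_query(30)
print(query_30)
```

With `days=30` and the default arguments, the helper reproduces the query string used in the cell above.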

In [None]:
churn_riks = query_graph(query=query, indices=target_users)

The probabilities indicate how likely each user is to place an order in the next 30 days; a higher `False_PROB` means a higher risk of churn.

In [None]:
churn_riks_sorted = churn_riks.sort_values(by='False_PROB', ascending=False)
display(churn_riks_sorted)
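
Sorting gives a ranking, but it is sometimes more practical to flag users above a fixed risk level. The sketch below applies a cut-off to a toy prediction frame; the values are made up, the 0.5 threshold is an arbitrary assumption, and only the `ENTITY` and `False_PROB` columns from the output above are used.

```python
import pandas as pd

# Toy stand-in for the prediction output (made-up probabilities)
toy_predictions = pd.DataFrame({
    "ENTITY": [101, 102, 103],
    "False_PROB": [0.20, 0.85, 0.60],
})

threshold = 0.5  # arbitrary cut-off, chosen for illustration only
at_risk = (
    toy_predictions[toy_predictions["False_PROB"] >= threshold]
    .sort_values("False_PROB", ascending=False)
)
print(at_risk["ENTITY"].to_list())  # [102, 103]
```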

## Generate product recommendations for the five customers most likely to churn

Generate three product recommendations for each customer.

In [None]:
df_top5 = churn_riks_sorted.head(5)
display(df_top5)

In [None]:
top5 = df_top5['ENTITY'].to_list()

In [None]:
query = "PREDICT LIST_DISTINCT(orders.item_id, 0, 30, days) RANK TOP 3 FOR EACH users.user_id"

In [None]:
recommendation = query_graph(query=query, indices=top5)

In [None]:
recommendation_with_item_details = recommendation.rename(columns={'CLASS': 'item_id'})
recommendation_with_item_details = pd.merge(recommendation_with_item_details, df_items[['item_id', 'prod_name']], on='item_id', how='left')

display(recommendation_with_item_details[['ENTITY', 'item_id', 'SCORE', 'prod_name']])
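
A left merge keeps every recommendation even when the item is missing from the items table, in which case `prod_name` is empty. As a hedged sketch on toy data (made-up IDs and product names), the `validate` and `indicator` options of `merge` can guard against duplicated item IDs and reveal unmatched rows.

```python
import pandas as pd

# Toy recommendations and item catalogue (made-up values)
toy_recs = pd.DataFrame({
    "ENTITY": [101, 101, 102],
    "item_id": [7, 8, 99],
    "SCORE": [0.9, 0.7, 0.6],
})
toy_items = pd.DataFrame({"item_id": [7, 8], "prod_name": ["Socks", "Scarf"]})

merged = toy_recs.merge(
    toy_items,
    on="item_id",
    how="left",
    validate="many_to_one",  # raises if item_id is duplicated in toy_items
    indicator=True,          # adds a _merge column marking unmatched rows
)
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["ENTITY", "item_id"]])
```

Here item 99 has no entry in the catalogue, so it appears in `unmatched` with a missing product name.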