This example shows how to generate recommendations for a user with a given `customer_id`.
You also need to specify the month to generate recommendations for,
since recommendations for winter and summer months should differ greatly.

I wrote this code to sanity-check that everything works end to end.
It should eventually be rewritten as transformer scripts.

In [None]:
import hsfs

conn = hsfs.connection()
fs = conn.get_feature_store()

## Initialize Feature Views

(Skip this part if you've already initialized these feature views.)

In [None]:
customers_fg = fs.get_feature_group("customers")
articles_fg = fs.get_feature_group("articles")

customer_fv = fs.create_feature_view(
    name='customers_fv',
    query=customers_fg.select_all()
)

articles_fv = fs.create_feature_view(
    name='articles_fv',
    query=articles_fg.select_all()
)

customer_fv.init_serving()
articles_fv.init_serving(batch=True)

## Retrieve Candidate Items

We will retrieve 100 candidate items. To do this we must:
1. Generate a query embedding of the user.
2. Find the 100 closest item embeddings to the query embedding.

For the first part we need to:
- Preprocess the "month of purchase" feature.
- Retrieve the "age" feature from `customers_fv`.

We start with an arbitrary user ID:

In [None]:
customer_id = "f6e35e1902674780464e8bc0f809cb5ae14883212b4f68b35b31de2facdb846f"

# Let's say the customer buys something in July.
month_of_purchase = 7

By the end of the notebook we will have recommendations for this customer.

In [None]:
query_features = {"customer_id" : customer_id}

# Retrieve customer features (age, postal_code) of customer with id customer_id.
customer_fv = fs.get_feature_view("customers_fv", 1)

customer_features = customer_fv.get_feature_vector({"customer_id" : customer_id})

# Positional lookup: assumes age is the second value in the returned feature vector.
query_features["age"] = customer_features[1]

In [None]:
# Next we need to preprocess the month of the purchase.

import numpy as np

def month_to_unit_circle(month):
    zero_indexed_month = month - 1
    C = 2*np.pi/12
    month_sin = np.sin(zero_indexed_month*C)
    month_cos = np.cos(zero_indexed_month*C)
    return month_sin, month_cos

query_features["month_sin"], query_features["month_cos"] = month_to_unit_circle(month_of_purchase)

query_features
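
The reason for mapping the month onto the unit circle is that December and January end up close together, which a raw 1-12 integer would not capture. Here is a minimal sketch of that property using only the standard library. The helpers `encode_month` and `circle_distance` are hypothetical; `encode_month` mirrors `month_to_unit_circle` above.

```python
import math

def encode_month(month):
    """Map a 1-indexed month to (sin, cos) coordinates on the unit circle."""
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)

def circle_distance(month_a, month_b):
    """Euclidean distance between two encoded months."""
    return math.dist(encode_month(month_a), encode_month(month_b))

# December and January are neighbours on the circle, just like June and July.
dec_jan = circle_distance(12, 1)
jun_jul = circle_distance(6, 7)
# January and July sit on opposite sides of the circle.
jan_jul = circle_distance(1, 7)
```

With this encoding, adjacent months are always the same distance apart, regardless of where the year boundary falls.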

Now we have all the query features. Let's load our retrieval model to generate the query embedding.

In [None]:
import hsml

# connect with Hopsworks
conn = hsml.connection()

# get Hopsworks Model Serving
ms = conn.get_model_serving()

# get deployment object
deployment = ms.get_deployment("querymodel")

query_emb = deployment.predict({"instances" : [query_features]})["predictions"][0]

query_emb

We'll use this vector to retrieve the 100 closest candidate items. First we'll need to create an OpenSearch client.

In [None]:
# Connect to the project's OpenSearch instance so we can query it with the embedding.

import hopsworks
from opensearchpy import OpenSearch

connection = hopsworks.connection()
project = connection.get_project()
opensearch_api = project.get_opensearch_api()

client = OpenSearch(**opensearch_api.get_default_py_config())

index_name = opensearch_api.get_project_index("candidate_index")

Next we'll do the actual search.

In [None]:
# Search for top 100 closest embeddings.

import pandas as pd

k = 100

query = {
  "size": k,
  "query": {
    "knn": {
      "my_vector1": {
        "vector": query_emb,
        "k": k
      }
    }
  }
}

response = client.search(
    body = query,
    index = index_name
)

hits = response["hits"]["hits"]
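
Conceptually, a k-NN query returns the k stored vectors closest to the query vector. The sketch below reproduces that idea by brute force on made-up data. It assumes Euclidean distance, which may not be the metric configured on the actual index, and the function name `brute_force_knn` is hypothetical.

```python
import heapq
import math

def brute_force_knn(query_vec, items, k):
    """Return the ids of the k items whose vectors are closest to query_vec.

    items maps an item id to its embedding (a list of floats).
    Euclidean distance is an assumption made for this illustration.
    """
    return heapq.nsmallest(
        k, items, key=lambda item_id: math.dist(query_vec, items[item_id])
    )

# Toy data: four 2-dimensional item embeddings.
toy_items = {
    "a": [0.0, 0.0],
    "b": [1.0, 1.0],
    "c": [0.1, 0.0],
    "d": [5.0, 5.0],
}
nearest = brute_force_knn([0.0, 0.1], toy_items, k=2)
```

A real index avoids scanning every vector by using an approximate search structure, so it scales to far more items than this loop would.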

This query returns 100 candidate items. However, the customer may already have bought some of them. Let's find the article IDs of the items the customer has already bought.

In [None]:
already_bought_items_ids = fs.sql(
    f"SELECT transactions_1.article_id FROM transactions_1 WHERE customer_id = '{customer_id}'"
).values.reshape(-1).tolist()

print(already_bought_items_ids)

We'll make sure to exclude these items from the set of candidate items.

In [None]:
item_id_list = []
item_emb_list = []

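# Compare IDs as strings, since the OpenSearch `_id` is cast to str below.
exclude_set = {str(article_id) for article_id in already_bought_items_ids}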

for el in hits:
    item_id = str(el["_id"])
    if item_id in exclude_set:
        continue
    item_emb = el["_source"]["my_vector1"]
    item_id_list.append(item_id)
    item_emb_list.append(item_emb)

item_id_df = pd.DataFrame({"article_id" : item_id_list})
item_emb_df = pd.DataFrame(item_emb_list).add_prefix("item_emb_")
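
To see the filtering step in isolation, here is a small sketch that runs on a hand-written list shaped like `response["hits"]["hits"]`. The article IDs and vectors are made up, and `split_candidates` is a hypothetical helper. IDs are compared as strings on the assumption that `_id` and `article_id` may have different types.

```python
def split_candidates(hits, bought_ids, vector_field="my_vector1"):
    """Return (ids, embeddings) for hits the customer has not bought yet."""
    bought = {str(article_id) for article_id in bought_ids}
    ids, embeddings = [], []
    for hit in hits:
        item_id = str(hit["_id"])
        if item_id in bought:
            continue  # Skip items the customer has already bought.
        ids.append(item_id)
        embeddings.append(hit["_source"][vector_field])
    return ids, embeddings

# Made-up hits in the shape the search response uses.
toy_hits = [
    {"_id": "101", "_source": {"my_vector1": [0.1, 0.2]}},
    {"_id": "102", "_source": {"my_vector1": [0.3, 0.4]}},
    {"_id": "103", "_source": {"my_vector1": [0.5, 0.6]}},
]
toy_ids, toy_embs = split_candidates(toy_hits, bought_ids=[102])
```

Note that filtering can leave fewer than `k` candidates, so the ranking step should not assume exactly 100 rows.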

## Ranking Model

We'll use the ranking model to make fine-grained predictions on these candidate items.
This model uses a lot of features, namely:
- All the features that the retrieval model uses.
- The query embedding generated by the retrieval model.
- The candidate embedding retrieved from OpenSearch.
- Additional item features from the articles feature group/view.

In [None]:
articles_fv = fs.get_feature_view("articles_fv", 1)
articles_features = [feat.name for feat in articles_fv.schema]
articles_data = [articles_fv.get_feature_vector({"article_id" : article_id}) for article_id in item_id_list]
articles_df = pd.DataFrame(data=articles_data, columns=articles_features)
ranking_df = item_id_df.merge(articles_df, on="article_id", how="left")

In [None]:
ranking_model_inputs = ranking_df.copy()

# Add the user features we used with our retrieval model.
ranking_model_inputs["age"] = query_features["age"]
ranking_model_inputs["month_sin"] = query_features["month_sin"]
ranking_model_inputs["month_cos"] = query_features["month_cos"]

# Add query embeddings
user_emb_df = pd.DataFrame([query_emb]).add_prefix("user_emb_")
for col in user_emb_df:
    ranking_model_inputs.loc[:,col] = user_emb_df[col][0]

# Add item embeddings.
for col in item_emb_df:
    ranking_model_inputs[col] = item_emb_df[col]
    
ranking_model_inputs
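
The assembly above boils down to this: every candidate row gets the same user features and user embedding, plus its own item embedding. Here is a sketch of that layout with plain dictionaries and made-up values. The helper `build_ranking_rows` is hypothetical, and the `user_emb_`/`item_emb_` prefixes follow the column names used above.

```python
def build_ranking_rows(user_features, user_emb, item_ids, item_embs):
    """Build one feature dict per candidate item."""
    rows = []
    for item_id, item_emb in zip(item_ids, item_embs):
        # User-level features are identical for every candidate.
        row = {"article_id": item_id, **user_features}
        row.update({f"user_emb_{i}": v for i, v in enumerate(user_emb)})
        # Item embeddings differ per candidate.
        row.update({f"item_emb_{i}": v for i, v in enumerate(item_emb)})
        rows.append(row)
    return rows

toy_rows = build_ranking_rows(
    user_features={"age": 31.0, "month_sin": 0.0, "month_cos": -1.0},
    user_emb=[0.5, 0.5],
    item_ids=["101", "103"],
    item_embs=[[0.1, 0.2], [0.5, 0.6]],
)
```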

Now we have all the features we need for our ranking model. Let's load the ranking model so that we can get the input format correct.

In [None]:
ranking_deployment = ms.get_deployment("rankingdeployment")

mr = conn.get_model_registry()
model = mr.get_model(ranking_deployment.model_name, ranking_deployment.model_version)
input_schema = model.model_schema["input_schema"]["columnar_schema"]
feat_names = [feat["name"] for feat in input_schema]
ranking_model_inputs = ranking_model_inputs[feat_names]
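
Selecting columns by `feat_names` matters because the model receives a plain list of values per row, so the column order has to match the schema exactly. Here is a minimal sketch of that idea with plain dictionaries. The feature names and the helper `order_by_schema` are made up for illustration.

```python
def order_by_schema(row, feature_names):
    """Return the values of row in the order given by feature_names."""
    missing = [name for name in feature_names if name not in row]
    if missing:
        # Fail early rather than send a misaligned row to the model.
        raise KeyError(f"Missing features: {missing}")
    return [row[name] for name in feature_names]

toy_schema = ["age", "month_sin", "month_cos"]
toy_row = {"month_cos": -1.0, "age": 31.0, "month_sin": 0.0, "article_id": "101"}
ordered = order_by_schema(toy_row, toy_schema)
```

Columns that are not in the schema, such as `article_id` here, are dropped, which is also what indexing the DataFrame with `feat_names` does.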

Finally we can make our predictions. The ranking model gives each candidate a score in the range [0, 1] (higher is better), and we'll print the 10 highest-scoring items.

In [None]:
# Make the actual predictions.
ranking_predictions = ranking_deployment.predict({"inputs" : ranking_model_inputs.values.tolist()})["predictions"]
ranking_scores = np.asarray(ranking_predictions)[:,1] # Scores of the positive class.
ranking_df["ranking_score"] = ranking_scores
ranking_df.sort_values("ranking_score", inplace=True, ascending=False)
ranking_df.head(10)
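
The final step amounts to picking the positive-class score from each prediction and sorting by it. Here is a stdlib-only sketch on made-up predictions. It assumes each prediction is a `[p_negative, p_positive]` pair, as the `[:,1]` indexing above implies, and `top_k` is a hypothetical helper.

```python
def top_k(article_ids, predictions, k):
    """Return the k (article_id, score) pairs with the highest positive-class score."""
    scored = [
        (article_id, pred[1])  # Index 1 holds the positive-class score.
        for article_id, pred in zip(article_ids, predictions)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

toy_predictions = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
best = top_k(["101", "102", "103"], toy_predictions, k=2)
```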