This example shows how to generate recommendations for a user with a given `customer_id`.
You also need to specify the month you want to generate recommendations for
(recommendations for winter months and summer months should differ greatly).

I wrote this code to sanity-check that everything works.
The code should later be rewritten as transformer scripts.

In [1]:
import hsfs

conn = hsfs.connection()
fs = conn.get_feature_store()

Connected. Call `.close()` to terminate connection gracefully.


## Initialize Feature Views

(Skip this part if you've already initialized these feature views.)

In [2]:
customers_fg = fs.get_feature_group("customers")
articles_fg = fs.get_feature_group("articles")

# Create one feature view per feature group, exposing all of its features.
customer_fv = fs.create_feature_view(
    name='customers_fv',
    query=customers_fg.select_all()
)

articles_fv = fs.create_feature_view(
    name='articles_fv',
    query=articles_fg.select_all()
)

# Prepare the feature views for online feature-vector lookups.
customer_fv.init_serving()
articles_fv.init_serving(batch=True)



Feature view created successfully, explore it at https://35.205.254.211/p/120/fs/68/fv/customers_fv/version/5
Feature view created successfully, explore it at https://35.205.254.211/p/120/fs/68/fv/articles_fv/version/5


## Retrieve Candidate Items

We will retrieve 100 candidate items. To do this, we must:
1. Generate a query embedding of the user.
2. Find the 100 closest item embeddings to the query embedding.
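Step 2 is a nearest-neighbor search. As a plain-Python illustration of what the OpenSearch k-NN query does, here is an exact brute-force version, assuming inner-product similarity between the query and item embeddings (the real metric is set by how `candidate_index` was created, which this notebook doesn't show). OpenSearch uses an approximate index instead so that the search scales to the whole catalogue; `top_k_candidates` and the toy embeddings are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_candidates(
    query_emb: Sequence[float],
    item_embs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return (item_id, score) for the k items scoring highest against the query.

    Exact search: every item is scored, so the cost is O(n_items * dim).
    """
    scored = ((item_id, dot(query_emb, emb)) for item_id, emb in item_embs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 2-D embeddings, made up for illustration.
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print([item_id for item_id, _ in top_k_candidates([1.0, 0.2], items, k=2)])  # -> ['a', 'c']
```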

For the first part, we need to:
- Preprocess the "month of purchase" feature.
- Retrieve the "age" feature from `customers_fv`.

We start with an arbitrary user ID:

In [3]:
customer_id = "f6e35e1902674780464e8bc0f809cb5ae14883212b4f68b35b31de2facdb846f"

# Let's say the customer buys something in July.
month_of_purchase = 7

By the end of the notebook, we will have some recommendations for this customer.

In [4]:
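query_features = {"customer_id": customer_id}

# Retrieve the customer features (age, postal_code) of the customer with id customer_id.
# Note: the cell above creates a new feature view version each time it runs
# (version 5 in its output); this reads version 1, so pick the version you intend to serve.
customer_fv = fs.get_feature_view("customers_fv", 1)

customer_features = customer_fv.get_feature_vector({"customer_id": customer_id})

# Positional lookup: assumes "age" is the second value in the feature vector.
query_features["age"] = customer_features[1]
print(query_features)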

{'customer_id': 'f6e35e1902674780464e8bc0f809cb5ae14883212b4f68b35b31de2facdb846f', 'age': 22.0}
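`customer_features[1]` works only while `age` is the second value of the feature vector. Looking features up by name is sturdier. A minimal sketch, assuming you have the feature names in the same order as the vector (for example from the feature view's schema, as the notebook does for `articles_fv` later); the helper, the column list, and the values below are hypothetical.

```python
from typing import Any, Dict, Sequence


def vector_to_dict(feature_names: Sequence[str], vector: Sequence[Any]) -> Dict[str, Any]:
    """Pair each value of a feature vector with its feature name."""
    if len(feature_names) != len(vector):
        raise ValueError("feature names and vector have different lengths")
    return dict(zip(feature_names, vector))


# Hypothetical schema and values, for illustration only.
names = ["customer_id", "age", "postal_code"]
vector = ["some-customer-id", 22.0, "some-postal-code"]
print(vector_to_dict(names, vector)["age"])  # -> 22.0
```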




In [5]:
# Next we need to preprocess the month of the purchase.
# Encode it as a point on the unit circle so that the encoding wraps around
# (December ends up next to January).

import numpy as np

def month_to_unit_circle(month):
    zero_indexed_month = month - 1  # January -> 0, ..., December -> 11
    C = 2*np.pi/12                  # angle between consecutive months
    month_sin = np.sin(zero_indexed_month*C)
    month_cos = np.cos(zero_indexed_month*C)
    return month_sin, month_cos

query_features["month_sin"], query_features["month_cos"] = month_to_unit_circle(month_of_purchase)

query_features

{'customer_id': 'f6e35e1902674780464e8bc0f809cb5ae14883212b4f68b35b31de2facdb846f',
 'age': 22.0,
 'month_sin': 1.2246467991473532e-16,
 'month_cos': -1.0}
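Why a circle rather than the raw month number? With raw numbers, December (12) and January (1) are 11 apart even though they are neighboring months. The stdlib sketch below re-implements the same formula with `math` (an illustrative copy named `encode_month`, not part of the pipeline) and compares distances between encoded months.

```python
import math
from typing import Tuple


def encode_month(month: int) -> Tuple[float, float]:
    """Same mapping as above: month 1..12 -> (sin, cos) of (month - 1) * 2*pi/12."""
    angle = (month - 1) * 2 * math.pi / 12
    return math.sin(angle), math.cos(angle)


def month_distance(m1: int, m2: int) -> float:
    """Euclidean distance between two months after the circular encoding."""
    (s1, c1), (s2, c2) = encode_month(m1), encode_month(m2)
    return math.hypot(s1 - s2, c1 - c2)


print(abs(12 - 1), abs(6 - 5))  # raw gaps -> 11 1
print(round(month_distance(12, 1), 6), round(month_distance(5, 6), 6))  # -> 0.517638 0.517638
print(round(month_distance(1, 7), 6))  # opposite months -> 2.0
```

Every pair of adjacent months is the same distance apart, and July (the month used here) lands at (0, -1), matching the output above up to floating-point error.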

Now we have all the query features. Let's send them to our deployed query model to generate the query embedding.

In [None]:
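import hsml

# Connect to Hopsworks model management.
# Note: this rebinds `conn` (previously the hsfs connection); later cells use it
# to access the model registry.
conn = hsml.connection()

# Get Hopsworks Model Serving.
ms = conn.get_model_serving()

# Get the deployment that serves the query model.
deployment = ms.get_deployment("querymodel")

# Send the query features and keep the embedding of our single instance.
query_emb = deployment.predict({"instances" : [query_features]})["predictions"][0]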

query_emb

We'll use this vector to retrieve the 100 closest candidate items. First, we need to create an OpenSearch client.

In [7]:
# Next we need to feed this embedding to OpenSearch, so create a client first.

import hopsworks
from opensearchpy import OpenSearch

connection = hopsworks.connection()
project = connection.get_project()
opensearch_api = project.get_opensearch_api()

# Build the client from the project's default OpenSearch configuration.
client = OpenSearch(**opensearch_api.get_default_py_config())

# Index names are project-scoped: this prefixes "candidate_index" with the project name.
index_name = opensearch_api.get_project_index("candidate_index")



Connected. Call `.close()` to terminate connection gracefully.


Next we'll do the actual search.

In [None]:
# Search for the 100 closest item embeddings.

import pandas as pd

k = 100

# k-NN query on "my_vector1", the index field that stores the item embeddings.
query = {
  "size": k,
  "query": {
    "knn": {
      "my_vector1": {
        "vector": query_emb,
        "k": k
      }
    }
  }
}

response = client.search(
    body = query,
    index = index_name
)

hits = response["hits"]["hits"]
print(hits)

In [9]:
import os
# Inspect the OpenSearch endpoint this environment is configured with.
os.environ["ELASTIC_ENDPOINT"]

'10.132.0.40:9300'

This query returns 100 candidate items. Note, however, that the customer may already have bought some of them. Let's find the article IDs of the items that the customer has already bought.

In [13]:
# Article IDs of all items this customer has already bought.
already_bought_items_ids = fs.sql(
    f"SELECT transactions_2.article_id from transactions_2 WHERE customer_id = '{customer_id}'"
).values.reshape(-1).tolist()

print(already_bought_items_ids)

2022-06-07 13:39:39,366 INFO: USE `rec_featurestore`
2022-06-07 13:39:40,086 INFO: SELECT transactions_2.article_id from transactions_2 WHERE customer_id = 'f6e35e1902674780464e8bc0f809cb5ae14883212b4f68b35b31de2facdb846f'
['694589005', '840351003', '782616029', '630696004', '356289075', '860334001', '804612002', '554479005', '853612001', '729928001', '556625006', '552473021', '717490008', '554479001', '819259001', '823685002', '640727001', '804612001', '678342005', '682672001', '855080001', '539209001', '877069002', '640716001', '662257006']


We'll make sure to exclude these items from the set of candidate items.

In [14]:
item_id_list = []
item_emb_list = []

exclude_set = set(already_bought_items_ids)

for el in hits:
    item_id = str(el["_id"])
    if item_id in exclude_set:
        continue
    item_emb = el["_source"]["my_vector1"]
    item_id_list.append(item_id)
    item_emb_list.append(item_emb)

item_id_df = pd.DataFrame({"article_id" : item_id_list})
item_emb_df = pd.DataFrame(item_emb_list).add_prefix("item_emb_")
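Because the filter runs after the search, the candidate set can end up with fewer than `k` items: every already-bought item that OpenSearch returns is dropped. One simple remedy, sketched here as an assumption rather than what this notebook does, is to request `k + len(already_bought_items_ids)` hits and keep the first `k` that survive; assuming the index holds enough items, at least `k` remain. `filter_bought` is a hypothetical helper.

```python
from typing import Iterable, List, Sequence


def filter_bought(ranked_ids: Sequence[str], bought: Iterable[str], k: int) -> List[str]:
    """Return up to k IDs, in rank order, that are not in `bought`."""
    exclude = set(bought)  # set membership is O(1) on average
    return [item_id for item_id in ranked_ids if item_id not in exclude][:k]


# Hypothetical example: k = 3 and the customer has bought one item ("b").
# Requesting k + len(bought) = 4 hits still leaves 3 candidates after filtering.
bought = ["b"]
hits = ["a", "b", "c", "d"]  # best match first
print(filter_bought(hits, bought, k=3))  # -> ['a', 'c', 'd']
```

In this notebook, that would mean running the `transactions_2` query before the k-NN search so its length is known when building the search body.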

## Ranking Model

We'll use the ranking model to make fine-grained predictions on these candidate items.
This model can use a lot of features, namely:
- All the features that the retrieval model uses.
- The query embedding generated by the retrieval model.
- The candidate embedding retrieved from OpenSearch.
- Additional item features from the articles feature group/view.
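Since all candidates are scored for the same customer, the ranking input has one row per candidate item, with the customer's features repeated on every row. The pandas cell below does this by assigning scalar columns; here is the same layout in plain Python, with a hypothetical helper and two toy rows whose values are copied from the notebook's output.

```python
from typing import Any, Dict, List


def build_ranking_rows(
    user_features: Dict[str, Any], candidates: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """One row per candidate: its item features plus the shared user features.

    If a key appears in both dicts, the user value wins.
    """
    return [{**item, **user_features} for item in candidates]


user = {"age": 22.0, "month_sin": 1.2246467991473532e-16, "month_cos": -1.0}
candidates = [
    {"article_id": "685601034", "product_type_name": "Swimwear bottom"},
    {"article_id": "903486003", "product_type_name": "Shorts"},
]
rows = build_ranking_rows(user, candidates)
print(len(rows), rows[1]["article_id"], rows[1]["age"])  # -> 2 903486003 22.0
```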

In [15]:
articles_fv = fs.get_feature_view("articles_fv", 1)
# Column names for the article feature vectors.
articles_features = [feat.name for feat in articles_fv.schema]
# One online lookup per remaining candidate article.
articles_data = [articles_fv.get_feature_vector({"article_id" : article_id}) for article_id in item_id_list]
articles_df = pd.DataFrame(data=articles_data, columns=articles_features)
# Keep the candidates in retrieval order and attach their item features.
ranking_df = item_id_df.merge(articles_df, on="article_id", how="left")



In [16]:
ranking_model_inputs = ranking_df.copy()

# Add the user features we used with our retrieval model.
ranking_model_inputs["age"] = query_features["age"]
ranking_model_inputs["month_sin"] = query_features["month_sin"]
ranking_model_inputs["month_cos"] = query_features["month_cos"]

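# Add query embeddings (disabled in this run; the ranking model's input schema, fetched below, decides which columns are used).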
# user_emb_df = pd.DataFrame([query_emb]).add_prefix("user_emb_")
# for col in user_emb_df:
#     ranking_model_inputs.loc[:,col] = user_emb_df[col][0]

# # Add item embeddings.
# for col in item_emb_df:
#     ranking_model_inputs[col] = item_emb_df[col]
    
ranking_model_inputs

| | article_id | product_code | prod_name | product_type_no | product_type_name | product_group_name | graphical_appearance_no | graphical_appearance_name | colour_group_code | colour_group_name | ... | index_name | index_group_no | index_group_name | section_no | section_name | garment_group_no | garment_group_name | age | month_sin | month_cos |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 685601034 | 685601 | GREG FANCY | 59 | Swimwear bottom | Swimwear | 1010001 | All over pattern | 93 | Dark Green | ... | Menswear | 3 | Menswear | 26 | Men Underwear | 1018 | Swimwear | 22.0 | 1.224647e-16 | -1.0 |
| 1 | 685604074 | 685604 | TOM FANCY | 59 | Swimwear bottom | Swimwear | 1010001 | All over pattern | 50 | Other Pink | ... | Menswear | 3 | Menswear | 26 | Men Underwear | 1018 | Swimwear | 22.0 | 1.224647e-16 | -1.0 |
| 2 | 903486003 | 903486 | Cycling HW Jersey Short | 274 | Shorts | Garment Lower body | 1010016 | Solid | 30 | Other Orange | ... | Divided | 2 | Divided | 53 | Divided Collection | 1025 | Shorts | 22.0 | 1.224647e-16 | -1.0 |
| 3 | 849217005 | 849217 | Burton pullon | 274 | Shorts | Garment Lower body | 1010016 | Solid | 6 | Light Grey | ... | Menswear | 3 | Menswear | 21 | Contemporary Casual | 1025 | Shorts | 22.0 | 1.224647e-16 | -1.0 |
| 4 | 892280003 | 892280 | San fran HW destroy | 274 | Shorts | Garment Lower body | 1010023 | Denim | 71 | Light Blue | ... | Ladieswear | 1 | Ladieswear | 15 | Womens Everyday Collection | 1025 | Shorts | 22.0 | 1.224647e-16 | -1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 792312001 | 792312 | CS Citron | 274 | Shorts | Garment Lower body | 1010016 | Solid | 22 | Yellow | ... | Ladieswear | 1 | Ladieswear | 15 | Womens Everyday Collection | 1025 | Shorts | 22.0 | 1.224647e-16 | -1.0 |
| 96 | 817416005 | 817416 | Sunset HW utility | 274 | Shorts | Garment Lower body | 1010016 | Solid | 12 | Light Beige | ... | Ladieswear | 1 | Ladieswear | 15 | Womens Everyday Collection | 1025 | Shorts | 22.0 | 1.224647e-16 | -1.0 |
| 97 | 872618001 | 872618 | Britney rib 2p shorts (J) | 296 | Pyjama bottom | Nightwear | 1010001 | All over pattern | 9 | Black | ... | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1017 | Under-, Nightwear | 22.0 | 1.224647e-16 | -1.0 |
| 98 | 453445034 | 453445 | BB ASTON | 59 | Swimwear bottom | Swimwear | 1010014 | Placement print | 82 | Turquoise | ... | Children Accessories, Swimwear | 4 | Baby/Children | 43 | Kids Accessories, Swimwear & D | 1018 | Swimwear | 22.0 | 1.224647e-16 | -1.0 |


Now we have all the features we need for our ranking model. Let's fetch the ranking model's input schema so that we pass exactly the columns it expects, in the expected order.

In [17]:
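# Deployment serving the ranking model.
ranking_deployment = ms.get_deployment("rankingdeployment")

# Look up the registered model behind the deployment to read its input schema.
mr = conn.get_model_registry()
model = mr.get_model(ranking_deployment.model_name, ranking_deployment.model_version)
input_schema = model.model_schema["input_schema"]["columnar_schema"]
feat_names = [feat["name"] for feat in input_schema]
# Keep only the model's input columns, in schema order.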
ranking_model_inputs = ranking_model_inputs[feat_names]
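The request payload is a list of lists (`ranking_model_inputs.values.tolist()`), so the model sees values only by position, not by column name. Selecting `ranking_model_inputs[feat_names]` therefore does two jobs: it drops columns the model doesn't take and puts the rest in schema order. The same idea in plain Python, with a hypothetical helper and a hypothetical three-feature schema:

```python
from typing import Any, Dict, List, Sequence


def to_model_inputs(
    rows: List[Dict[str, Any]], feature_order: Sequence[str]
) -> List[List[Any]]:
    """Turn named rows into positional lists in the order the model expects.

    Extra keys are ignored; a missing feature raises KeyError.
    """
    return [[row[name] for name in feature_order] for row in rows]


feature_order = ["age", "month_sin", "product_type_name"]  # hypothetical schema
rows = [{"article_id": "903486003", "product_type_name": "Shorts", "month_sin": 0.0, "age": 22.0}]
print({"inputs": to_model_inputs(rows, feature_order)})
# -> {'inputs': [[22.0, 0.0, 'Shorts']]}
```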



Finally, we can make our predictions. The ranking model returns a score in the range [0, 1] (higher is better), and we print the 10 best-scoring items.

In [18]:
# Make the actual predictions.
ranking_predictions = ranking_deployment.predict({"inputs" : ranking_model_inputs.values.tolist()})["predictions"]
ranking_scores = np.asarray(ranking_predictions)[:,1] # Scores of the positive class.
ranking_df["ranking_score"] = ranking_scores
ranking_df.sort_values("ranking_score", inplace=True, ascending=False)
ranking_df.head(10)

| | article_id | product_code | prod_name | product_type_no | product_type_name | product_group_name | graphical_appearance_no | graphical_appearance_name | colour_group_code | colour_group_name | ... | department_name | index_code | index_name | index_group_no | index_group_name | section_no | section_name | garment_group_no | garment_group_name | ranking_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 765086003 | 765086 | Cycling Shorts | 274 | Shorts | Garment Lower body | 1010001 | All over pattern | 12 | Light Beige | ... | Shorts | D | Divided | 2 | Divided | 53 | Divided Collection | 1025 | Shorts | 0.743986 |
| 24 | 883068003 | 883068 | Margarita RW Pull-On Shorts | 274 | Shorts | Garment Lower body | 1010016 | Solid | 51 | Light Pink | ... | Shorts | D | Divided | 2 | Divided | 53 | Divided Collection | 1025 | Shorts | 0.735152 |
| 2 | 903486003 | 903486 | Cycling HW Jersey Short | 274 | Shorts | Garment Lower body | 1010016 | Solid | 30 | Other Orange | ... | Shorts | D | Divided | 2 | Divided | 53 | Divided Collection | 1025 | Shorts | 0.732437 |
| 75 | 884405002 | 884405 | Lola ol RW Denim Shorts | 274 | Shorts | Garment Lower body | 1010023 | Denim | 72 | Blue | ... | Shorts | D | Divided | 2 | Divided | 53 | Divided Collection | 1025 | Shorts | 0.731983 |
| 4 | 892280003 | 892280 | San fran HW destroy | 274 | Shorts | Garment Lower body | 1010023 | Denim | 71 | Light Blue | ... | Shorts | A | Ladieswear | 1 | Ladieswear | 15 | Womens Everyday Collection | 1025 | Shorts | 0.731327 |
| 31 | 603584002 | 603584 | ED Hotpants (1) | 274 | Shorts | Garment Lower body | 1010023 | Denim | 71 | Light Blue | ... | Shorts | A | Ladieswear | 1 | Ladieswear | 15 | Womens Everyday Collection | 1025 | Shorts | 0.731327 |
| 65 | 865073001 | 865073 | San fran HW | 274 | Shorts | Garment Lower body | 1010023 | Denim | 71 | Light Blue | ... | Shorts | A | Ladieswear | 1 | Ladieswear | 15 | Womens Everyday Collection | 1025 | Shorts | 0.731327 |
| 93 | 490113020 | 490113 | Lola RW Denim Shorts | 274 | Shorts | Garment Lower body | 1010016 | Solid | 72 | Blue | ... | Shorts | D | Divided | 2 | Divided | 53 | Divided Collection | 1025 | Shorts | 0.725912 |
| 34 | 779725002 | 779725 | Timeless Cheeky V- Brief | 59 | Swimwear bottom | Swimwear | 1010017 | Stripe | 10 | White | ... | Swimwear | B | Lingeries/Tights | 1 | Ladieswear | 60 | Womens Swimwear, beachwear | 1018 | Swimwear | 0.723124 |
| 69 | 653664008 | 653664 | Bootylicious Top | 298 | Bikini top | Swimwear | 1010017 | Stripe | 10 | White | ... | Swimwear | B | Lingeries/Tights | 1 | Ladieswear | 60 | Womens Swimwear, beachwear | 1018 | Swimwear | 0.723124 |
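Each prediction holds two class probabilities, and the notebook keeps column 1 (the positive class) as the ranking score. A plain-Python sketch of that final step, assuming the `[P(negative), P(positive)]` layout implied by `[:,1]`; the helper and toy values are hypothetical.

```python
from typing import List, Sequence, Tuple


def top_n_by_positive_score(
    article_ids: Sequence[str],
    predictions: Sequence[Sequence[float]],
    n: int,
) -> List[Tuple[str, float]]:
    """Rank candidates by positive-class probability and return the n best."""
    scores = [p[1] for p in predictions]  # column 1 = positive class
    # sorted() is stable (also with reverse=True), so tied scores keep their retrieval order.
    ranked = sorted(zip(article_ids, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


ids = ["a", "b", "c"]
preds = [[0.6, 0.4], [0.1, 0.9], [0.3, 0.7]]
print(top_n_by_positive_score(ids, preds, n=2))  # -> [('b', 0.9), ('c', 0.7)]
```

Several items in the output above share a score (for example 0.731327). `DataFrame.sort_values` defaults to `kind="quicksort"`, which is not stable, so the order among tied items isn't guaranteed; passing `kind="stable"` keeps them in retrieval order.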
