## <span style="color:#ff5f27">👨🏻‍🏫 Train Ranking Model </span>

In this notebook, you will train a ranking model using gradient boosted trees (CatBoost).

In [1]:
import time

# Start the timer
notebook_start_time = time.time()

## <span style="color:#ff5f27">📝 Imports </span>

In [2]:
!pip install -r requirements.txt


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [3]:
import pandas as pd
from catboost import CatBoostClassifier, Pool
from sklearn.metrics import classification_report, precision_recall_fscore_support
import joblib

## <span style="color:#ff5f27">🔮 Connect to Hopsworks Feature Store </span>

In [4]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/15551


In [5]:
customers_fg = fs.get_feature_group(
    name="customers",
    version=1,
)

articles_fg = fs.get_feature_group(
    name="articles",
    version=2,
)

trans_fg = fs.get_feature_group(
    name="transactions",
    version=1,
)

rank_fg = fs.get_feature_group(
    name="ranking",
    version=1,
)

## <span style="color:#ff5f27">⚙️ Feature View Creation </span>

In [8]:
# Select features
selected_features_customers = customers_fg.select_all()

fs.get_or_create_feature_view( 
    name='customers',
    query=selected_features_customers,
    version=1,
)

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/15551/fs/15471/fv/customers/version/1


<hsfs.feature_view.FeatureView at 0x31b7c8f10>

In [9]:
# Select features
selected_features_articles = articles_fg.select_except(['embeddings']) 

fs.get_or_create_feature_view(
    name='articles',
    query=selected_features_articles,
    version=1,
)

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/15551/fs/15471/fv/articles/version/1


<hsfs.feature_view.FeatureView at 0x31b7c0e50>

In [10]:
# Select features
selected_features_ranking = rank_fg.select_except(["customer_id", "article_id"])

feature_view_ranking = fs.get_or_create_feature_view(
    name='ranking',
    query=selected_features_ranking,
    labels=["label"],
    version=1,
)

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/15551/fs/15471/fv/ranking/version/1


## <span style="color:#ff5f27">🗄️ Train Data loading </span>

In [11]:
X_train, X_val, y_train, y_val = feature_view_ranking.train_test_split(
    test_size=0.1,
    description='Ranking training dataset',
)

X_train.head(3)

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (206.51s) 



| | age | month_sin | month_cos | product_type_name | product_group_name | graphical_appearance_name | colour_group_name | perceived_colour_value_name | perceived_colour_master_name | department_name | index_name | index_group_name | section_name | garment_group_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19.0 | -0.866025 | 0.5 | Trousers | Garment Lower body | Melange | Grey | Dusty Light | Grey | Basic 1 | Divided | Divided | Divided Basics | Jersey Basic |
| 2 | 33.0 | 0.866025 | -0.5 | Belt | Accessories | Solid | Black | Dark | Black | Belts | Ladies Accessories | Ladieswear | Womens Big accessories | Accessories |
| 3 | 50.0 | -0.5 | 0.866025 | Dress | Garment Full body | Solid | Dark Beige | Dark | Beige | Knitwear | Ladieswear | Ladieswear | Womens Everyday Collection | Knitwear |


In [12]:
y_train.head(3)

| | label |
|---|---|
| 0 | 0 |
| 2 | 0 |
| 3 | 1 |


## <span style="color:#ff5f27">🏃🏻‍♂️ Model Training </span>

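Train a CatBoost classifier on the training set, treating the string columns as categorical features and using the validation set for early stopping.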

In [13]:
cat_features = list(
    X_train.select_dtypes(include=['string', 'object']).columns
)

pool_train = Pool(X_train, y_train, cat_features=cat_features)
pool_val = Pool(X_val, y_val, cat_features=cat_features)

model = CatBoostClassifier(
    learning_rate=0.2,
    iterations=100,
    depth=10,
    scale_pos_weight=10,
    early_stopping_rounds=5,
    use_best_model=True,
)

model.fit(
    pool_train, 
    eval_set=pool_val,
)

0:	learn: 0.6852808	test: 0.6856750	best: 0.6856750 (0)	total: 184ms	remaining: 18.2s
1:	learn: 0.6772188	test: 0.6778851	best: 0.6778851 (1)	total: 336ms	remaining: 16.5s
2:	learn: 0.6712416	test: 0.6721928	best: 0.6721928 (2)	total: 449ms	remaining: 14.5s
3:	learn: 0.6668033	test: 0.6679448	best: 0.6679448 (3)	total: 575ms	remaining: 13.8s
4:	learn: 0.6632542	test: 0.6645623	best: 0.6645623 (4)	total: 675ms	remaining: 12.8s
5:	learn: 0.6596857	test: 0.6612205	best: 0.6612205 (5)	total: 797ms	remaining: 12.5s
6:	learn: 0.6592134	test: 0.6608429	best: 0.6608429 (6)	total: 850ms	remaining: 11.3s
7:	learn: 0.6569223	test: 0.6588199	best: 0.6588199 (7)	total: 964ms	remaining: 11.1s
8:	learn: 0.6526590	test: 0.6546167	best: 0.6546167 (8)	total: 1.09s	remaining: 11s
9:	learn: 0.6491842	test: 0.6511218	best: 0.6511218 (9)	total: 1.19s	remaining: 10.7s
10:	learn: 0.6471611	test: 0.6492965	best: 0.6492965 (10)	total: 1.29s	remaining: 10.4s
11:	learn: 0.6457556	test: 0.6481438	best: 0.6481438 (

<catboost.core.CatBoostClassifier at 0x31b695990>
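The `early_stopping_rounds=5` and `use_best_model=True` settings can be pictured with a small stand-alone sketch. This is a simplified illustration of patience-based early stopping, not CatBoost's actual implementation; the `early_stopping` helper and the loss values are hypothetical.

```python
def early_stopping(val_losses, patience):
    """Return (best_iteration, stop_iteration) for a sequence of validation losses.

    Training stops once `patience` iterations have passed without the
    validation loss improving on the best value seen so far.
    """
    best_loss = float("inf")
    best_iter = -1
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember where it happened
            best_loss, best_iter = loss, i
        elif i - best_iter >= patience:
            # No improvement for `patience` iterations: stop here
            return best_iter, i
    # Never triggered: training ran through all iterations
    return best_iter, len(val_losses) - 1


# Hypothetical validation losses, for illustration only
losses = [0.69, 0.68, 0.67, 0.66, 0.665, 0.67, 0.668, 0.67, 0.671, 0.672, 0.65]
best_iter, stop_iter = early_stopping(losses, patience=5)
print(f"best iteration: {best_iter}, stopped at iteration: {stop_iter}")
```

In this sketch, keeping only the model state from `best_iter` plays the role that `use_best_model=True` plays in the training cell above.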

## <span style="color:#ff5f27">👮🏻‍♂️ Model Validation </span>

Next, you'll evaluate how well the model performs on the validation data.

In [14]:
preds = model.predict(pool_val)

precision, recall, fscore, _ = precision_recall_fscore_support(y_val, preds, average="binary")

metrics = {
    "precision" : precision,
    "recall" : recall,
    "fscore" : fscore,
}
print(classification_report(y_val, preds))

              precision    recall  f1-score   support

           0       0.95      0.66      0.78    109103
           1       0.15      0.64      0.25     10267

    accuracy                           0.66    119370
   macro avg       0.55      0.65      0.51    119370
weighted avg       0.88      0.66      0.74    119370



The model has a low F1-score on the positive class (0.25; higher is better): recall is reasonable at 0.64, but precision is only 0.15, so most predicted positives are false positives. Performance could potentially be improved by adding more features to the dataset, e.g. image embeddings.
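As a rough check on the numbers in the report, the sketch below recomputes the positive-class F1-score from the rounded precision and recall, and the class imbalance from the support column. The helper name `f1_score_from` is hypothetical, and because the inputs are rounded to two decimals the result differs slightly from the reported 0.25.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded positive-class values taken from the classification report above
positive_f1 = f1_score_from(precision=0.15, recall=0.64)

# Support column of the report: validation rows per class
negatives, positives = 109103, 10267
imbalance_ratio = negatives / positives

print(f"F1 from rounded precision/recall: {positive_f1:.3f}")
print(f"negatives per positive: {imbalance_ratio:.2f}")
```

The validation set holds roughly ten negatives per positive, which is in the same range as the `scale_pos_weight=10` used in training. That weighting favours recall on the positive class, and lower precision is the usual trade-off.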

Let's see which features your model considers important.

In [15]:
feat_to_score = {
    feature: score 
    for feature, score 
    in zip(
        X_train.columns, 
        model.feature_importances_,
    )
}

feat_to_score = dict(
    sorted(
        feat_to_score.items(),
        key=lambda item: item[1],
        reverse=True,
    )
)
feat_to_score

{'month_cos': 16.26058805492722,
 'section_name': 12.448872241732783,
 'product_type_name': 11.02442694398654,
 'product_group_name': 8.649829404999933,
 'age': 8.635678193738908,
 'index_name': 6.610860268903256,
 'garment_group_name': 6.176435577741327,
 'department_name': 5.633598814176692,
 'month_sin': 5.443556811474526,
 'graphical_appearance_name': 4.393307941356726,
 'perceived_colour_master_name': 4.119596374216312,
 'index_group_name': 3.7732437756750663,
 'perceived_colour_value_name': 3.5972351610381046,
 'colour_group_name': 3.232770436032624}

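The model places the highest importance on `month_cos`, followed by article attributes such as `section_name` and `product_type_name`; the customer's `age` ranks fifth. The training data contains no user or item embedding features, so adding well-trained user and item embeddings could yield a better ranking model.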
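To see how importance is spread across kinds of features, the scores printed above can be summed per source. This is a minimal sketch: the grouping in `feature_source` is an assumption for illustration (`age` as a customer feature, `month_sin`/`month_cos` as transaction-date features, everything else as article attributes), and it is not part of the original notebook.

```python
from collections import defaultdict

# Importance scores copied from the output above
importances = {
    "month_cos": 16.26058805492722,
    "section_name": 12.448872241732783,
    "product_type_name": 11.02442694398654,
    "product_group_name": 8.649829404999933,
    "age": 8.635678193738908,
    "index_name": 6.610860268903256,
    "garment_group_name": 6.176435577741327,
    "department_name": 5.633598814176692,
    "month_sin": 5.443556811474526,
    "graphical_appearance_name": 4.393307941356726,
    "perceived_colour_master_name": 4.119596374216312,
    "index_group_name": 3.7732437756750663,
    "perceived_colour_value_name": 3.5972351610381046,
    "colour_group_name": 3.232770436032624,
}

# Assumed mapping from feature to source; anything not listed counts as "article"
feature_source = {
    "age": "customer",
    "month_sin": "transaction_date",
    "month_cos": "transaction_date",
}

by_source = defaultdict(float)
for feature, score in importances.items():
    by_source[feature_source.get(feature, "article")] += score

for source, total in sorted(by_source.items(), key=lambda item: item[1], reverse=True):
    print(f"{source}: {total:.2f}")
```

The scores sum to about 100, so each group total can be read as a percentage share of the total importance.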

Finally, you'll save your model.

In [16]:
joblib.dump(model, 'ranking_model.pkl')

['ranking_model.pkl']

### <span style="color:#ff5f27">💾  Upload Model to Model Registry </span>

You'll upload the model to the Hopsworks Model Registry.

In [17]:
# Connect to Hopsworks Model Registry
mr = project.get_model_registry()

Connected. Call `.close()` to terminate connection gracefully.


In [18]:
from hsml.schema import Schema
from hsml.model_schema import ModelSchema

input_example = X_train.sample().to_dict("records")
input_schema = Schema(X_train)
output_schema = Schema(y_train)
model_schema = ModelSchema(input_schema, output_schema)

ranking_model = mr.python.create_model(
    name="ranking_model", 
    metrics=metrics,
    model_schema=model_schema,
    input_example=input_example,
    description="Ranking model that scores item candidates",
)
ranking_model.save("ranking_model.pkl")


Model created, explore it at https://c.app.hopsworks.ai:443/p/15551/models/ranking_model/1


Model(name: 'ranking_model', version: 1)

---

In [19]:
# End the timer
notebook_end_time = time.time()

# Calculate and print the execution time
notebook_execution_time = notebook_end_time - notebook_start_time
print(f"⌛️ Notebook Execution time: {notebook_execution_time:.2f} seconds")

⌛️ Notebook Execution time: 509.24 seconds


---
## <span style="color:#ff5f27">⏩️ Next Steps </span>

You have now trained both a retrieval model and a ranking model, which together allow you to generate recommendations for users. In the next notebook, you'll take a look at how you can deploy these models with the `HSML` library.
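As a preview of how a ranking model's scores turn into recommendations, here is a minimal sketch that sorts candidate items by score and keeps the top `k`. The `top_k` helper, the article IDs and the scores are all hypothetical; in practice the scores could be positive-class probabilities, for example from `model.predict_proba(...)[:, 1]`.

```python
def top_k(candidate_scores, k):
    """Return the IDs of the k highest-scoring candidates, best first."""
    ranked = sorted(
        candidate_scores.items(),
        key=lambda item: item[1],
        reverse=True,
    )
    return [article_id for article_id, _ in ranked[:k]]


# Hypothetical candidate scores for one customer
scores = {
    "article_a": 0.12,
    "article_b": 0.81,
    "article_c": 0.47,
    "article_d": 0.66,
}

recommended = top_k(scores, k=3)
print(recommended)
```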