## <span style="color:#ff5f27">👨🏻‍🏫 Train Ranking Model </span>

In this notebook, you will train a ranking model: a CatBoost gradient-boosted tree classifier that predicts the binary `label` for each user-video pair.

## <span style="color:#ff5f27">📝 Imports </span>

In [1]:
import pandas as pd
from catboost import CatBoostClassifier, Pool
from sklearn.metrics import classification_report, precision_recall_fscore_support
import joblib

## <span style="color:#ff5f27">🔮 Connect to Hopsworks Feature Store </span>

In [2]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()


Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/398


In [3]:
users_fg = fs.get_feature_group(
    name="users",
    version=1,
)

videos_fg = fs.get_feature_group(
    name="videos",
    version=1,
)

rank_fg = fs.get_feature_group(
    name="ranking",
    version=1,
)

## <span style="color:#ff5f27">⚙️ Feature View Creation </span>

In [4]:
# Select all user features
selected_features_users = users_fg.select_all()

fs.get_or_create_feature_view(
    name='users',
    query=selected_features_users,
    version=1,
)

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/398/fs/335/fv/users/version/1


<hsfs.feature_view.FeatureView at 0x7efda585f290>

In [5]:
# Select all video features
selected_features_videos = videos_fg.select_all()

fs.get_or_create_feature_view(
    name='videos',
    query=selected_features_videos,
    version=1,
)

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/398/fs/335/fv/videos/version/1


<hsfs.feature_view.FeatureView at 0x7efda4b14150>

In [6]:
# Select all ranking features except the ID columns
selected_features_ranking = rank_fg.select_except(["user_id", "video_id"])

feature_view_ranking = fs.get_or_create_feature_view(
    name='ranking',
    query=selected_features_ranking,
    labels=["label"],
    version=1,
)

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/398/fs/335/fv/ranking/version/1


## <span style="color:#ff5f27">🗄️ Training Data Loading </span>

In [7]:
X_train, X_val, y_train, y_val = feature_view_ranking.train_test_split(
    test_size=0.1,
    description='Ranking training dataset',
)

X_train.head(3)

Finished: Reading data from Hopsworks, using ArrowFlight (19.09s) 




| | category | views | likes | video_length | upload_date | gender | age | country |
|---|---|---|---|---|---|---|---|---|
| 0 | Entertainment | 241125 | 125268 | 58 | 2023-08-25 | Female | 33 | U.S. Outlying Islands |
| 1 | Music | 5990 | 4262 | 192 | 2022-10-18 | Male | 57 | Luxembourg |
| 2 | Cooking | 3180 | 1002 | 40 | 2023-02-04 | Male | 41 | Jamaica |


In [8]:
y_train.head(3)

| | label |
|---|---|
| 0 | 1 |
| 1 | 0 |
| 2 | 0 |
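To make `test_size=0.1` concrete, here is a minimal sketch of a random 90/10 hold-out split in plain Python. It is an illustration only, not how Hopsworks implements `train_test_split`; the helper `random_split` and the toy rows are hypothetical.

```python
import random


def random_split(rows, test_size=0.1, seed=42):
    """Shuffle rows and hold out a `test_size` fraction (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = rows[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    # The first n_test shuffled rows become the held-out set.
    return shuffled[n_test:], shuffled[:n_test]


# Toy (features, label) rows standing in for the ranking dataset.
rows = [({"views": i * 10}, i % 2) for i in range(100)]
train_rows, val_rows = random_split(rows, test_size=0.1)

print(len(train_rows), len(val_rows))
```

Fixing the seed keeps the split reproducible, which makes metrics comparable between runs.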


## <span style="color:#ff5f27">🏃🏻‍♂️ Model Training </span>

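Next, you'll train a `CatBoostClassifier`. String and object columns are passed as categorical features, `scale_pos_weight=10` upweights the positive class, and training stops early if the validation loss has not improved for 5 rounds.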

In [9]:
cat_features = list(
    X_train.select_dtypes(include=['string', 'object']).columns
)

pool_train = Pool(X_train, y_train, cat_features=cat_features)
pool_val = Pool(X_val, y_val, cat_features=cat_features)

model = CatBoostClassifier(
    learning_rate=0.2,
    iterations=100,
    depth=10,
    scale_pos_weight=10,
    early_stopping_rounds=5,
    use_best_model=True,
)

model.fit(
    pool_train, 
    eval_set=pool_val,
)

0:	learn: 0.6048009	test: 0.6050235	best: 0.6050235 (0)	total: 195ms	remaining: 19.3s
1:	learn: 0.5468945	test: 0.5472990	best: 0.5472990 (1)	total: 328ms	remaining: 16.1s
2:	learn: 0.5077420	test: 0.5082995	best: 0.5082995 (2)	total: 515ms	remaining: 16.6s
3:	learn: 0.4808692	test: 0.4815537	best: 0.4815537 (3)	total: 563ms	remaining: 13.5s
4:	learn: 0.4623195	test: 0.4631163	best: 0.4631163 (4)	total: 688ms	remaining: 13.1s
5:	learn: 0.4495122	test: 0.4504052	best: 0.4504052 (5)	total: 745ms	remaining: 11.7s
6:	learn: 0.4406975	test: 0.4416729	best: 0.4416729 (6)	total: 782ms	remaining: 10.4s
7:	learn: 0.4346626	test: 0.4357079	best: 0.4357079 (7)	total: 838ms	remaining: 9.63s
8:	learn: 0.4305576	test: 0.4316620	best: 0.4316620 (8)	total: 871ms	remaining: 8.81s
9:	learn: 0.4277847	test: 0.4289386	best: 0.4289386 (9)	total: 902ms	remaining: 8.11s
10:	learn: 0.4259234	test: 0.4271217	best: 0.4271217 (10)	total: 954ms	remaining: 7.72s
11:	learn: 0.4246832	test: 0.4259156	best: 0.4259156

<catboost.core.CatBoostClassifier at 0x7efda4b5a350>
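The `early_stopping_rounds=5` and `use_best_model=True` arguments can be pictured with a small patience-based loop. The sketch below is plain Python, not CatBoost's implementation; the helper `best_iteration_with_patience` and the loss values are made up for illustration.

```python
def best_iteration_with_patience(val_losses, patience=5):
    """Return (best_index, stop_index) for patience-based early stopping.

    Training stops once `patience` iterations have passed without
    improving on the best validation loss seen so far.
    """
    best_idx, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_idx, best_loss = i, loss
        elif i - best_idx >= patience:
            return best_idx, i  # stopped early at iteration i
    return best_idx, len(val_losses) - 1  # patience never ran out


# Made-up validation losses: they improve until index 3, then plateau.
losses = [0.60, 0.55, 0.51, 0.48, 0.49, 0.50, 0.49, 0.50, 0.51, 0.52]
best, stopped = best_iteration_with_patience(losses, patience=5)
print(f"best iteration: {best}, stopped at: {stopped}")
# → best iteration: 3, stopped at: 8
```

Keeping the model from the best iteration rather than the last one is the idea behind `use_best_model=True`.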

## <span style="color:#ff5f27">👮🏻‍♂️ Model Validation </span>

Next, you'll evaluate how well the model performs on the validation data.

In [10]:
preds = model.predict(pool_val)

precision, recall, fscore, _ = precision_recall_fscore_support(y_val, preds, average="binary")

metrics = {
    "precision" : precision,
    "recall" : recall,
    "fscore" : fscore,
}
print(classification_report(y_val, preds))

              precision    recall  f1-score   support

           0       0.00      0.00      0.00     63877
           1       0.36      1.00      0.53     36037

    accuracy                           0.36     99914
   macro avg       0.18      0.50      0.27     99914
weighted avg       0.13      0.36      0.19     99914
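The report shows recall of 1.00 for class 1 and 0.00 for class 0, which means the model predicts class 1 for every validation row. The sketch below reproduces the class 1 row from the support counts alone under that assumption. A plausible contributor is `scale_pos_weight=10`, which is much larger than the class ratio in the data; the negative-to-positive ratio printed at the end is only a common starting heuristic for that weight, not a tuned value.

```python
# Support counts taken from the classification report above.
n_negative = 63877  # validation rows with label 0
n_positive = 36037  # validation rows with label 1

# Assume every row is predicted as class 1.
tp = n_positive  # every positive is caught
fp = n_negative  # every negative is mislabelled as positive
fn = 0           # no positive is missed

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# → precision=0.36 recall=1.00 f1=0.53

# Negative-to-positive ratio: a heuristic starting point for the
# positive-class weight (an assumption here, not a tuned value).
print(f"neg/pos ratio: {n_negative / n_positive:.2f}")
```

Because accuracy here equals the share of positives (0.36), the metrics logged with the model describe a constant predictor; lowering the class weight or tuning the decision threshold is worth trying before relying on them.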





In [11]:
feat_to_score = {
    feature: score 
    for feature, score 
    in zip(
        X_train.columns, 
        model.feature_importances_,
    )
}

feat_to_score = dict(
    sorted(
        feat_to_score.items(),
        key=lambda item: item[1],
        reverse=True,
    )
)
feat_to_score

{'video_length': 21.614086566171583,
 'likes': 18.48926306069348,
 'category': 17.317338298151636,
 'age': 11.869562955866614,
 'views': 11.271824597482247,
 'gender': 11.052552315887501,
 'country': 8.385372205746922,
 'upload_date': 0.0}

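The model relies most on the video features `video_length`, `likes`, and `category`, while the user features `age`, `gender`, and `country` contribute less and `upload_date` has zero importance. This feature view contains no embedding features, so adding richer inputs, such as the user and item embeddings from the retrieval model, could yield a better ranking model.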
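As a quick check on the scores printed above, the following sketch sums them, lists the cumulative share per feature, and flags features the model never used. The values are copied from the output and rounded to three decimals; nothing here calls CatBoost.

```python
# Importances copied from the output above (rounded for brevity).
feat_to_score = {
    "video_length": 21.614, "likes": 18.489, "category": 17.317, "age": 11.870,
    "views": 11.272, "gender": 11.053, "country": 8.385, "upload_date": 0.0,
}

total = sum(feat_to_score.values())
unused = [name for name, score in feat_to_score.items() if score == 0.0]

# Cumulative share of importance, most important feature first.
running = 0.0
for name, score in feat_to_score.items():
    running += score
    print(f"{name:<13} {score:6.2f}  cumulative {running / total:6.1%}")

print("total:", round(total, 1))
print("zero-importance features:", unused)
```

The scores add up to roughly 100, so each value can be read as a percentage share. A feature with zero importance, such as `upload_date` here, is a candidate for removal or for re-encoding (for example as video age in days).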

Finally, you'll save your model.

In [12]:
joblib.dump(model, 'ranking_model.pkl')

['ranking_model.pkl']

### <span style="color:#ff5f27">💾  Upload Model to Model Registry </span>

You'll upload the model to the Hopsworks Model Registry.

In [13]:
# Connect to Hopsworks Model Registry
mr = project.get_model_registry()

Connected. Call `.close()` to terminate connection gracefully.


In [14]:
from hsml.schema import Schema
from hsml.model_schema import ModelSchema

input_example = X_train.sample().to_dict("records")
input_schema = Schema(X_train)
output_schema = Schema(y_train)
model_schema = ModelSchema(input_schema, output_schema)

ranking_model = mr.python.create_model(
    name="ranking_model", 
    metrics=metrics,
    model_schema=model_schema,
    input_example=input_example,
    description="Ranking model that scores item candidates",
)
ranking_model.save("ranking_model.pkl")


Model created, explore it at https://c.app.hopsworks.ai:443/p/398/models/ranking_model/1





Model(name: 'ranking_model', version: 1)

---
## <span style="color:#ff5f27">⏩️ Next Steps </span>

Now you have trained both a retrieval and a ranking model, which will allow you to generate recommendations for users. In the next notebook, you'll take a look at how you can deploy these models with the `HSML` library.
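As a rough picture of how the two models fit together, the retrieval model proposes candidate videos and the ranking model scores them so that the highest-scoring ones are recommended. The sketch below shows only the final sorting step; `rank_candidates`, the video IDs, and the probabilities are hypothetical stand-ins. In practice the scores could come from something like the classifier's positive-class probabilities.

```python
def rank_candidates(candidates, scores, top_k=3):
    """Return the `top_k` candidate IDs ordered by descending score."""
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [candidate for candidate, _ in ranked[:top_k]]


# Hypothetical candidate video IDs from the retrieval step, with made-up
# positive-class probabilities standing in for the ranking model's output.
candidate_ids = ["video_a", "video_b", "video_c", "video_d"]
probabilities = [0.12, 0.87, 0.45, 0.66]

print(rank_candidates(candidate_ids, probabilities, top_k=3))
# → ['video_b', 'video_d', 'video_c']
```

Sorting by probability rather than by the hard 0/1 prediction keeps a usable ordering even when the classifier assigns most candidates to the same class.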