## <span style="color:#ff5f27">👨🏻‍🏫 Train Ranking Model </span>

In this notebook, you will train a ranking model using gradient-boosted trees (CatBoost).

## <span style="color:#ff5f27">📝 Imports </span>

In [1]:
import pandas as pd
from catboost import CatBoostClassifier, Pool
from sklearn.metrics import classification_report, precision_recall_fscore_support
import joblib

## <span style="color:#ff5f27">🔮 Connect to Hopsworks Feature Store </span>

In [2]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()


Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/17565
Connected. Call `.close()` to terminate connection gracefully.


In [3]:
users_fg = fs.get_feature_group(
    name="users",
    version=1,
)

videos_fg = fs.get_feature_group(
    name="videos",
    version=1,
)

rank_fg = fs.get_feature_group(
    name="ranking",
    version=1,
)

## <span style="color:#ff5f27">⚙️ Feature View Creation </span>

In [4]:
# Select features
selected_features_users = users_fg.select_all()

fs.get_or_create_feature_view(
    name='users',
    query=selected_features_users,
    version=1,
)

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/17565/fs/17485/fv/users/version/1


<hsfs.feature_view.FeatureView at 0x7fb0602779d0>

In [5]:
# Select features
selected_features_videos = videos_fg.select_all()

fs.get_or_create_feature_view(
    name='videos',
    query=selected_features_videos,
    version=1,
)

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/17565/fs/17485/fv/videos/version/1


<hsfs.feature_view.FeatureView at 0x7fb059c32890>

In [6]:
# Select features
selected_features_ranking = rank_fg.select_except(["user_id", "video_id"])

feature_view_ranking = fs.get_or_create_feature_view(
    name='ranking',
    query=selected_features_ranking,
    labels=["label"],
    version=1,
)

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/17565/fs/17485/fv/ranking/version/1
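Conceptually, `select_except(["user_id", "video_id"])` combined with `labels=["label"]` means the feature view serves every column of the `ranking` feature group except the two ID columns, with `label` split off as the target. The pandas sketch below mimics that on a tiny frame with made-up values; it only illustrates the column selection, since Hopsworks performs it on the platform rather than with this code.

```python
import pandas as pd

# Hypothetical rows shaped like the "ranking" feature group (made-up values).
ranking_df = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "video_id": ["v1", "v2"],
    "category": ["Cooking", "Travel"],
    "views": [119073, 108802],
    "label": [0, 1],
})

# Mimic select_except(["user_id", "video_id"]): drop the ID columns.
selected = ranking_df.drop(columns=["user_id", "video_id"])

# Mimic labels=["label"]: separate the target column from the features.
features = selected.drop(columns=["label"])
labels = selected[["label"]]

print(list(features.columns))  # ['category', 'views']
print(list(labels.columns))    # ['label']
```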


## <span style="color:#ff5f27">🗄️ Train Data loading </span>

In [7]:
X_train, X_val, y_train, y_val = feature_view_ranking.train_test_split(
    test_size=0.1,
    description='Ranking training dataset',
)

X_train.head(3)


Finished: Reading data from Hopsworks, using ArrowFlight (20.78s) 




|   | category | views | likes | video_length | upload_date | gender | age | country |
|---|---|---|---|---|---|---|---|---|
| 0 | Cooking | 119073 | 7089 | 97 | 2023-01-19 | Other | 25 | Algeria |
| 1 | Comedy | 60725 | 4806 | 227 | 2023-10-30 | Other | 81 | Zimbabwe |
| 2 | Travel | 108802 | 28057 | 125 | 2022-12-10 | Other | 66 | Angola |
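`train_test_split(test_size=0.1)` holds out roughly 10% of the rows for validation and returns features and labels separately. The pandas sketch below reproduces that proportion locally on a hypothetical 100-row frame. It is only an illustration: the Hopsworks call creates and reads the split on the platform, and its exact sampling may differ.

```python
import pandas as pd

# Hypothetical dataset: 100 rows with one feature and a binary label.
df = pd.DataFrame({
    "views": range(100),
    "label": [int(i % 3 == 0) for i in range(100)],
})

# Hold out 10% of the rows as a validation set, mirroring test_size=0.1.
val = df.sample(frac=0.1, random_state=42)
train = df.drop(val.index)

# Split each part into features (X) and the label column (y).
X_train, y_train = train.drop(columns=["label"]), train[["label"]]
X_val, y_val = val.drop(columns=["label"]), val[["label"]]

print(len(X_train), len(X_val))  # 90 10
```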


In [8]:
y_train.head(3)

|   | label |
|---|---|
| 0 | 0 |
| 1 | 0 |
| 2 | 0 |
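Only the first three labels are shown above, and all of them are 0. Before training, it can help to check the full class distribution, because the balance between 0 and 1 labels matters when choosing class weights such as the `scale_pos_weight` used in the next cell. A minimal sketch with `value_counts`, using a hypothetical label frame in place of `y_train`:

```python
import pandas as pd

# Hypothetical stand-in for y_train: a one-column DataFrame named "label".
y_example = pd.DataFrame({"label": [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]})

# Fraction of rows per class; with the real data, use y_train["label"] instead.
shares = y_example["label"].value_counts(normalize=True)
print(shares)
# expected shares: label 0 -> 0.7, label 1 -> 0.3
```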


## <span style="color:#ff5f27">🏃🏻‍♂️ Model Training </span>

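Next, you'll train a CatBoost classifier. Columns with a string or object dtype are passed to CatBoost as categorical features, and the validation split serves as the evaluation set for early stopping.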

In [9]:
cat_features = list(
    X_train.select_dtypes(include=['string', 'object']).columns
)

pool_train = Pool(X_train, y_train, cat_features=cat_features)
pool_val = Pool(X_val, y_val, cat_features=cat_features)

model = CatBoostClassifier(
    learning_rate=0.2,
    iterations=100,
    depth=10,
    scale_pos_weight=10,
    early_stopping_rounds=5,
    use_best_model=True,
)

model.fit(
    pool_train, 
    eval_set=pool_val,
)

0:	learn: 0.6047280	test: 0.6046217	best: 0.6046217 (0)	total: 181ms	remaining: 17.9s
1:	learn: 0.5467417	test: 0.5465388	best: 0.5465388 (1)	total: 248ms	remaining: 12.2s
2:	learn: 0.5075345	test: 0.5072511	best: 0.5072511 (2)	total: 362ms	remaining: 11.7s
3:	learn: 0.4806208	test: 0.4802679	best: 0.4802679 (3)	total: 414ms	remaining: 9.94s
4:	learn: 0.4620408	test: 0.4616289	best: 0.4616289 (4)	total: 457ms	remaining: 8.68s
5:	learn: 0.4492112	test: 0.4487500	best: 0.4487500 (5)	total: 511ms	remaining: 8.01s
6:	learn: 0.4403797	test: 0.4398751	best: 0.4398751 (6)	total: 557ms	remaining: 7.4s
7:	learn: 0.4343326	test: 0.4337914	best: 0.4337914 (7)	total: 600ms	remaining: 6.89s
8:	learn: 0.4302189	test: 0.4296466	best: 0.4296466 (8)	total: 635ms	remaining: 6.42s
9:	learn: 0.4274398	test: 0.4268414	best: 0.4268414 (9)	total: 682ms	remaining: 6.13s
10:	learn: 0.4255754	test: 0.4249552	best: 0.4249552 (10)	total: 743ms	remaining: 6.01s
11:	learn: 0.4243326	test: 0.4236946	best: 0.4236946 

<catboost.core.CatBoostClassifier at 0x7fb0599e6d70>
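The training cell treats every column with a `string` or `object` dtype as categorical. The sketch below applies the same `select_dtypes` call to the three sample rows shown earlier to illustrate which columns get picked up. One assumption to note: here `upload_date` is a plain string, so it is treated as categorical; whether that also happens in the real `X_train` depends on the dtype Hopsworks returns for that column.

```python
import pandas as pd

# The three sample rows from X_train.head(3) above, rebuilt as a DataFrame.
X_sample = pd.DataFrame({
    "category": ["Cooking", "Comedy", "Travel"],
    "views": [119073, 60725, 108802],
    "likes": [7089, 4806, 28057],
    "video_length": [97, 227, 125],
    "upload_date": ["2023-01-19", "2023-10-30", "2022-12-10"],
    "gender": ["Other", "Other", "Other"],
    "age": [25, 81, 66],
    "country": ["Algeria", "Zimbabwe", "Angola"],
})

# Same selection rule as the training cell: string/object columns are categorical.
cat_features = list(X_sample.select_dtypes(include=["string", "object"]).columns)
print(cat_features)
# ['category', 'upload_date', 'gender', 'country']
```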

## <span style="color:#ff5f27">👮🏻‍♂️ Model Validation </span>

Next, you'll evaluate how well the model performs on the validation data.

In [10]:
preds = model.predict(pool_val)

precision, recall, fscore, _ = precision_recall_fscore_support(y_val, preds, average="binary")

metrics = {
    "precision": precision,
    "recall": recall,
    "fscore": fscore,
}
print(classification_report(y_val, preds))

              precision    recall  f1-score   support

           0       0.00      0.00      0.00     63624
           1       0.36      1.00      0.53     36295

    accuracy                           0.36     99919
   macro avg       0.18      0.50      0.27     99919
weighted avg       0.13      0.36      0.19     99919
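With `average="binary"`, `precision_recall_fscore_support` reports precision, recall, and F1 for the positive class only, so the `metrics` dict holds the class-1 row of this report. The report can also be checked by hand: class 0 has zero recall and class 1 has full recall, which means the model labels every validation example as 1. The arithmetic below uses only the support counts printed above and reproduces the class-1 figures and the accuracy under that reading.

```python
# Support counts from the classification report above.
n_neg, n_pos = 63624, 36295
n_total = n_neg + n_pos  # 99919

# If every example is predicted as 1: all positives are TP, all negatives are FP.
tp, fp, fn = n_pos, n_neg, 0

precision = tp / (tp + fp)  # equals the share of positives in the validation set
recall = tp / (tp + fn)     # every positive is caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp / n_total     # only the positives are classified correctly

print(round(precision, 2), round(recall, 2), round(f1, 2), round(accuracy, 2))
# 0.36 1.0 0.53 0.36
```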





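The training cell sets `scale_pos_weight=10`, which CatBoost uses as a multiplier on the weight of class-1 examples. A common heuristic is to set this weight near the ratio of negative to positive examples. Assuming the training split has roughly the same class balance as the validation supports in the report above, that ratio is about 1.75, far below 10, so over-weighting the positive class is one plausible reason class 0 gets a recall of 0.00. Treat the computed value as a starting point for tuning rather than a guaranteed fix; CatBoost's `auto_class_weights="Balanced"` option is an alternative way to derive class weights from the data.

```python
# Validation support counts from the classification report (label 0 and label 1).
n_neg, n_pos = 63624, 36295

# Heuristic: weight positives by the negative-to-positive ratio.
suggested_scale_pos_weight = n_neg / n_pos
print(round(suggested_scale_pos_weight, 2))  # 1.75

# Hypothetical retraining config (not run here), for example:
# CatBoostClassifier(learning_rate=0.2, iterations=100, depth=10,
#                    scale_pos_weight=suggested_scale_pos_weight,
#                    early_stopping_rounds=5, use_best_model=True)
```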
In [11]:
feat_to_score = {
    feature: score 
    for feature, score 
    in zip(
        X_train.columns, 
        model.feature_importances_,
    )
}

feat_to_score = dict(
    sorted(
        feat_to_score.items(),
        key=lambda item: item[1],
        reverse=True,
    )
)
feat_to_score

{'upload_date': 20.901916787767,
 'age': 17.81349501610978,
 'video_length': 14.853599022593444,
 'country': 11.669956042301889,
 'views': 10.584542237304394,
 'likes': 10.551485001202199,
 'category': 9.153808255766801,
 'gender': 4.471197636954517}

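The model relies most heavily on `upload_date`, `age`, and `video_length`, and least on `gender`. All of these are raw user and video attributes from the `ranking` feature group; the model uses no embedding features, so improvements would have to come from better features or tuning rather than from better-trained embeddings.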
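The importance scores printed above sum to 100 (CatBoost normalizes its default feature importances this way), so they can be read as percentage shares. The sketch below re-enters those scores into a pandas Series, checks the total, and groups them into user-side and video-side features. The grouping is an assumption inferred from the column names (`gender`, `age`, `country` from the users feature group; the rest from videos).

```python
import pandas as pd

# Feature importances copied from the output above.
importances = pd.Series({
    "upload_date": 20.901916787767,
    "age": 17.81349501610978,
    "video_length": 14.853599022593444,
    "country": 11.669956042301889,
    "views": 10.584542237304394,
    "likes": 10.551485001202199,
    "category": 9.153808255766801,
    "gender": 4.471197636954517,
})

print(round(importances.sum(), 6))  # 100.0

# Assumed split by source feature group, inferred from the column names.
user_features = ["gender", "age", "country"]
user_share = importances[user_features].sum()
video_share = importances.drop(user_features).sum()
print(round(user_share, 1), round(video_share, 1))  # 34.0 66.0
```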

Next, you'll save the trained model to a local file.

In [12]:
joblib.dump(model, 'ranking_model.pkl')

['ranking_model.pkl']
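`joblib.dump` writes the fitted model to `ranking_model.pkl`, and the same file can later be loaded with `joblib.load`. The sketch below shows that round trip on a small stand-in dictionary so it runs without CatBoost; the temporary path and the dictionary contents are illustrative only. Loading a pickled model generally requires compatible versions of the libraries it was created with (here, CatBoost).

```python
import os
import tempfile

import joblib

# Illustrative stand-in for the trained model (a real run would dump the CatBoost model).
artifact = {"model_name": "ranking_model", "params": {"depth": 10, "iterations": 100}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ranking_model.pkl")
    joblib.dump(artifact, path)   # serialize the object to disk
    restored = joblib.load(path)  # read it back into memory

print(restored == artifact)  # True
```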

### <span style="color:#ff5f27">💾 Upload Model to Model Registry </span>

You'll upload the model to the Hopsworks Model Registry.

In [13]:
# Connect to Hopsworks Model Registry
mr = project.get_model_registry()

Connected. Call `.close()` to terminate connection gracefully.


In [14]:
from hsml.schema import Schema
from hsml.model_schema import ModelSchema

input_example = X_train.sample().to_dict("records")
input_schema = Schema(X_train)
output_schema = Schema(y_train)
model_schema = ModelSchema(input_schema, output_schema)

ranking_model = mr.python.create_model(
    name="ranking_model", 
    metrics=metrics,
    model_schema=model_schema,
    input_example=input_example,
    description="Ranking model that scores item candidates",
)
ranking_model.save("ranking_model.pkl")


Model created, explore it at https://c.app.hopsworks.ai:443/p/17565/models/ranking_model/1





Model(name: 'ranking_model', version: 1)
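The `input_example` above is built with `X_train.sample().to_dict("records")`, which returns a list containing one dictionary keyed by column name. The sketch below shows that shape on values from the sample rows shown earlier (a subset of columns, for brevity); the example actually stored in the registry is whichever row `sample()` picked at random.

```python
import pandas as pd

# A few of the X_train columns, using values from the sample rows shown earlier.
X_sample = pd.DataFrame({
    "category": ["Cooking", "Comedy", "Travel"],
    "views": [119073, 60725, 108802],
    "age": [25, 81, 66],
})

# Same call as the registry cell: one random row, converted to a list of dicts.
input_example = X_sample.sample(random_state=0).to_dict("records")
print(len(input_example), sorted(input_example[0]))
# 1 ['age', 'category', 'views']
```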

---
## <span style="color:#ff5f27">⏩️ Next Steps </span>

Now you have trained both a retrieval and a ranking model, which will allow you to generate recommendations for users. In the next notebook, you'll take a look at how you can deploy these models with the `HSML` library.
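The two models play different roles when generating recommendations: retrieval narrows the catalog to a candidate set, and the ranking model scores each candidate so the highest-scoring videos come first. The sketch below outlines that flow in plain Python. `recommend`, `retrieve_candidates`, and `score` are hypothetical placeholders; in practice the score would come from something like the ranking model's predicted probability of `label == 1` (for example via CatBoost's `predict_proba`), not from the toy lookup used here.

```python
from typing import Callable, Dict, List


def recommend(
    user_id: str,
    retrieve_candidates: Callable[[str], List[str]],
    score: Callable[[str, str], float],
    k: int = 3,
) -> List[str]:
    """Hypothetical two-stage recommender: retrieve candidates, then rank them."""
    candidates = retrieve_candidates(user_id)  # stage 1: retrieval
    scored = [(video_id, score(user_id, video_id)) for video_id in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stage 2: ranking
    return [video_id for video_id, _ in scored[:k]]


# Toy stand-ins for the two models (made-up IDs and scores).
toy_scores: Dict[str, float] = {"v1": 0.12, "v2": 0.87, "v3": 0.45, "v4": 0.66}
print(recommend("u1", lambda _: list(toy_scores), lambda _u, v: toy_scores[v]))
# ['v2', 'v4', 'v3']
```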