## <span style="color:#ff5f27">👨🏻‍🏫 Create Ranking Dataset </span>

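In this notebook, we'll create a dataset for our ranking model. Our data contains only positive user-item interactions (transactions), so we need negative sampling: pairing customers with articles they did not buy and labelling those pairs as negatives. Without negatives, the model would never see an irrelevant item and could simply score every item as relevant for every user.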
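
To make the idea concrete, here is a minimal sketch of negative sampling on toy data. It is not the implementation of `get_ranking_dataset` used later in this notebook; the helper name `sample_negatives`, the `label` column, the `customer_id` column name, and the uniform sampling strategy are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def sample_negatives(positives, all_items, n_neg=1, seed=0):
    """Hypothetical helper: label observed pairs 1 and add sampled unseen pairs labelled 0."""
    rng = np.random.default_rng(seed)
    # Articles each customer has already bought
    seen = positives.groupby("customer_id")["article_id"].agg(set)
    rows = []
    for customer_id, items in seen.items():
        # Candidate negatives are articles this customer never bought
        candidates = sorted(set(all_items) - items)
        k = min(len(candidates), n_neg * len(items))
        if k == 0:
            continue
        for article_id in rng.choice(candidates, size=k, replace=False):
            rows.append((customer_id, str(article_id), 0))
    negatives = pd.DataFrame(rows, columns=["customer_id", "article_id", "label"])
    return pd.concat([positives.assign(label=1), negatives], ignore_index=True)


# Toy transactions (assumed column names)
positives = pd.DataFrame(
    {"customer_id": ["c1", "c1", "c2"], "article_id": ["a1", "a2", "a3"]}
)
all_items = ["a1", "a2", "a3", "a4", "a5", "a6"]

ranking_df = sample_negatives(positives, all_items, n_neg=1)
print(ranking_df)
```

With `n_neg=1`, each customer gets as many negatives as positives, so the toy result is balanced. A higher ratio is often used in practice, at the cost of a more imbalanced label.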

## <span style="color:#ff5f27">📝 Imports </span>

In [None]:
# Hosted notebook environments may not have the local `functions` package,
# so download the helper module when running on Colab or Hopsworks
import os

def need_download_modules():
    if 'google.colab' in str(get_ipython()):
        return True
    if 'HOPSWORKS_PROJECT_ID' in os.environ:
        return True
    return False

if need_download_modules():
    print("⚙️ Downloading modules...")
    os.system('mkdir -p functions')
    os.system('cd functions && wget https://raw.githubusercontent.com/logicalclocks/hopsworks-tutorials/master/advanced_tutorials/recommender-system/functions/ranking_dataset.py')
    print('✅ Done!')
else:
    print("Local environment")

In [None]:
try:
    from functions.ranking_dataset import get_ranking_dataset
except ImportError:
    print("⚙️ Downloading modules...")
    os.system('mkdir -p functions')
    os.system('cd functions && wget https://raw.githubusercontent.com/logicalclocks/hopsworks-tutorials/master/advanced_tutorials/recommender-system/functions/ranking_dataset.py')
    print('✅ Done!')
    from functions.ranking_dataset import get_ranking_dataset

In [None]:
import pandas as pd

import warnings
warnings.filterwarnings('ignore')

## <span style="color:#ff5f27">🔮 Connect to Hopsworks Feature Store </span>

In [None]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

## <span style="color:#ff5f27">🪝 Retrieve Feature View and Training Dataset</span>


In [None]:
feature_view_retrieval = fs.get_feature_view(
    name="retrieval", 
    version=1,
)
feature_view_articles = fs.get_feature_view(
    name='articles',
    version=1,
)

In [None]:
train_df, val_df, test_df, y_train, y_val, y_test = feature_view_retrieval.train_validation_test_split(
    validation_size=0.1, 
    test_size=0.1,
    description='Retrieval dataset splits',
)

In [None]:
train_df["article_id"] = train_df["article_id"].astype(str)  # cast article IDs to strings in every split
val_df["article_id"] = val_df["article_id"].astype(str)
test_df["article_id"] = test_df["article_id"].astype(str)

In [None]:
ranking_train = get_ranking_dataset(
    train_df, 
    'train', 
    feature_view_articles,
)
ranking_validation = get_ranking_dataset(
    val_df, 
    'validation', 
    feature_view_articles,
)
ranking_train.head(3)

## <span style="color:#ff5f27">🪄 Ranking Feature Group Creation </span>


In [None]:
trans_fg = fs.get_feature_group(
    name="transactions",
    version=1,
)
customers_fg = fs.get_feature_group(
    name="customers",
    version=1,
)
articles_fg = fs.get_feature_group(
    name="articles",
    version=1,
)

In [None]:
ranking_train_fg = fs.get_or_create_feature_group(
    name="ranking_train",
    description="Training Ranking Data",
    version=1,
    primary_key=["index"],
    parents=[articles_fg, customers_fg, trans_fg],
)
ranking_train_fg.insert(ranking_train.reset_index())

In [None]:
ranking_val_fg = fs.get_or_create_feature_group(
    name="ranking_val",
    description="Validation Ranking Data",
    version=1,
    primary_key=["index"],
    parents=[articles_fg, customers_fg, trans_fg],
)
ranking_val_fg.insert(ranking_validation.reset_index())
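
Both feature groups use `index` as their primary key. That column comes from `reset_index()`, which moves an unnamed DataFrame index into a column called `index`. The toy frame below is an assumption for illustration only and stands in for the ranking data:

```python
import pandas as pd

# Toy stand-in for a ranking DataFrame (assumed columns)
df = pd.DataFrame({"customer_id": ["c1", "c2"], "label": [1, 0]})

# reset_index() turns the unnamed index into a regular column named "index"
with_index = df.reset_index()
print(list(with_index.columns))  # ['index', 'customer_id', 'label']
```

Because the index values are unique per row, the resulting column can serve as a simple row identifier.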

## <span style="color:#ff5f27">⚙️ Feature View Creation </span>

In [None]:
# select_except takes a list of feature names to leave out
query_train = ranking_train_fg.select_except(['index'])

ranking_train_fv = fs.get_or_create_feature_view(
    name='ranking_train',
    version=1,
    query=query_train,
)

In [None]:
# select_except takes a list of feature names to leave out
query_val = ranking_val_fg.select_except(['index'])

ranking_val_fv = fs.get_or_create_feature_view(
    name='ranking_val',
    version=1,
    query=query_val,
)

---
## <span style="color:#ff5f27">⏩️ Next Steps </span>

In the next notebook, we'll train a ranking model on the dataset we created in this notebook.