In [1]:
import hopsworks

project = hopsworks.login()  # insert API Key from https://app.hopsworks.ai

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://hopsworks0.logicalclocks.com/p/119


## Create Retrieval Dataset

In this notebook, we'll create a dataset for our retrieval model.

In [2]:
fs = project.get_feature_store()

Connected. Call `.close()` to terminate connection gracefully.


### Feature Selection

First, we'll load the feature groups we created in the previous tutorial.

In [3]:
trans_fg = fs.get_feature_group("transactions", version=1)
customers_fg = fs.get_feature_group("customers", version=1)
articles_fg = fs.get_feature_group("articles", version=1)

We'll need to join these three data sources to make the data compatible with our retrieval model. Recall that each row in the `transactions` feature group records which customer bought which item. We'll join this feature group with the `customers` and `articles` feature groups to add customer and item features to each row.

In [4]:
query = trans_fg.select(["customer_id", "article_id", "t_dat", "month_sin", "month_cos"])\
    .join(customers_fg.select(["age"]), on="customer_id")\
    .join(articles_fg.select(["garment_group_name", "index_group_name"]), on="article_id")
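
To make the effect of the join concrete, here is a minimal sketch using plain pandas on made-up rows. It is not how Hopsworks executes the query; the toy values are invented for illustration, and it assumes the joins behave like inner joins on `customer_id` and `article_id`.

```python
import pandas as pd

# Toy stand-ins for the three feature groups (values are invented for illustration).
transactions = pd.DataFrame({
    "customer_id": ["c1", "c2", "c1"],
    "article_id": ["a1", "a1", "a2"],
    "t_dat": pd.to_datetime(["2020-01-05", "2020-02-10", "2020-03-15"]),
})
customers = pd.DataFrame({"customer_id": ["c1", "c2"], "age": [31, 45]})
articles = pd.DataFrame({
    "article_id": ["a1", "a2"],
    "garment_group_name": ["Jersey Basic", "Shoes"],
    "index_group_name": ["Ladieswear", "Menswear"],
})

# Each transaction row picks up the features of its customer and its article.
joined = (
    transactions
    .merge(customers, on="customer_id", how="inner")
    .merge(articles, on="article_id", how="inner")
)
print(joined)
```

Every transaction keeps its own row; the customer and article columns are simply appended to it.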

### Feature View Creation
In Hopsworks, you write features to feature groups (where the features are stored) and read them from feature views. A feature view is a logical view over features stored in feature groups, and it typically contains the features used by a specific model. This lets features stored in different feature groups be reused across many models.

In [5]:
# explore available transformation functions

print("Transformation functions available:")
for tr_fn in fs.get_transformation_functions():
    print("- " + tr_fn.name + " - version: " + str(tr_fn.version))

Transformation functions available:
- min_max_scaler - version: 1
- standard_scaler - version: 1
- label_encoder - version: 1
- month_sin - version: 1
- robust_scaler - version: 1
- month_cos - version: 1


In [6]:
month_to_sin = fs.get_transformation_function(name="month_sin", version=1)
month_to_cos = fs.get_transformation_function(name="month_cos", version=1)

feature_view = fs.create_feature_view(
    name='retrieval',
    query=query,
    transformation_functions={
        "month_sin": month_to_sin,
        "month_cos": month_to_cos,
    }
)

Feature view created successfully, explore it at 
https://hopsworks0.logicalclocks.com/p/119/fs/67/fv/retrieval/version/1
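
The definitions of the `month_sin` and `month_cos` transformation functions are not shown in this notebook. Assuming they follow the common cyclic encoding of a month number, the idea can be sketched as below; `encode_month` is a hypothetical helper written for this illustration, not the registered function.

```python
import math

def encode_month(month: int) -> tuple[float, float]:
    """Map a month number (1..12) to a point on the unit circle.

    Hypothetical helper: assumes the usual encoding
    sin(2*pi*month/12), cos(2*pi*month/12).
    """
    angle = 2 * math.pi * month / 12
    return math.sin(angle), math.cos(angle)

def circle_distance(month_a: int, month_b: int) -> float:
    """Euclidean distance between two encoded months."""
    sin_a, cos_a = encode_month(month_a)
    sin_b, cos_b = encode_month(month_b)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)

# December and January end up as close as any other pair of neighbouring months,
# which a raw month number (12 vs. 1) would not express.
print(circle_distance(12, 1), circle_distance(1, 2))
```

The benefit of this encoding is that the model sees the year as a cycle rather than a line with a jump between December and January.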


To view and explore the data in the feature view, we can retrieve it as batch data with the `get_batch_data()` method.
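
As a minimal sketch of that exploration step, the helper below fetches the batch data and returns its first rows. `preview_feature_view` and `n_rows` are hypothetical names introduced here, and the sketch assumes `get_batch_data()` returns a pandas DataFrame (or any object with a `head()` method).

```python
def preview_feature_view(feature_view, n_rows=5):
    """Return the first `n_rows` rows of a feature view's batch data.

    Assumes `feature_view.get_batch_data()` returns a pandas DataFrame,
    or at least an object that provides `head(n)`.
    """
    batch_df = feature_view.get_batch_data()
    return batch_df.head(n_rows)

# Usage (requires a live Hopsworks connection and the feature view created above):
# preview_feature_view(feature_view)
```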

### Training Dataset Creation

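Finally, we can create our dataset. We split it into training, validation, and test sets, holding out 10% of the rows for validation and 10% for testing.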

In [7]:
feature_view = fs.get_feature_view("retrieval", version=1)

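td_version, td_job = feature_view.create_train_validation_test_split(
    validation_size=0.1,  # 10% of the rows go to the validation set
    test_size=0.1,        # 10% go to the test set; the rest is training data
    description='Retrieval dataset splits',
    data_format='csv',
    write_options={'wait_for_job': True},  # wait until the job has finished
    coalesce=True,
)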

Training dataset job started successfully, you can follow the progress at 
https://hopsworks0.logicalclocks.com/p/119/jobs/named/retrieval_1_create_fv_td_10072023185611/executions
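
To illustrate what the `validation_size` and `test_size` fractions mean, here is a minimal pandas sketch of a random 80/10/10 split. It is not the logic Hopsworks runs inside the job; `random_split`, the `seed` parameter, and the toy data are assumptions made for this example.

```python
import pandas as pd

def random_split(df, validation_size=0.1, test_size=0.1, seed=42):
    """Shuffle `df` and cut it into train, validation, and test parts.

    Hypothetical helper for illustration only.
    """
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_val = int(round(len(shuffled) * validation_size))
    n_test = int(round(len(shuffled) * test_size))
    n_train = len(shuffled) - n_val - n_test
    train = shuffled.iloc[:n_train]
    val = shuffled.iloc[n_train:n_train + n_val]
    test = shuffled.iloc[n_train + n_val:]
    return train, val, test

toy = pd.DataFrame({"row_id": range(100)})
train_df, val_df, test_df = random_split(toy)
print(len(train_df), len(val_df), len(test_df))  # prints: 80 10 10
```

Each row lands in exactly one of the three parts, so the validation and test sets stay unseen during training.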




### Next Steps

In the next notebook, we'll train a model on the dataset we created in this notebook.