## Create Retrieval Dataset

In this notebook, we'll create a dataset for our retrieval model.

In [1]:
# Fill in these details if you are running Python outside Hopsworks (external Python)
import os

# Read the API key from a local file, stripping the trailing newline
with open("api-key.txt", "r") as f:
    key = f.read().rstrip()

os.environ["HOPSWORKS_PROJECT"] = "hm"
os.environ["HOPSWORKS_HOST"] = "35.240.81.237"
os.environ["HOPSWORKS_API_KEY"] = key

In [2]:
import hopsworks
project = hopsworks.login()
fs = project.get_feature_store()

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://35.240.81.237:443/p/119
Connected. Call `.close()` to terminate connection gracefully.


### Feature Selection

First, we'll load the feature groups we created in the previous tutorial.

In [3]:
trans_fg = fs.get_feature_group("transactions", version=1)
customers_fg = fs.get_feature_group("customers", version=1)
articles_fg = fs.get_feature_group("articles", version=1)

We need to join these three data sources to make the data compatible with our retrieval model. Recall that each row in the `transactions` feature group records which customer bought which item. We'll join this feature group with the `customers` and `articles` feature groups to add customer and item features to each row.

In [4]:
# Start from transactions, then attach customer and article features by key
query = (
    trans_fg.select(["customer_id", "article_id", "month_sin", "month_cos"])
    .join(customers_fg.select(["age"]), on="customer_id")
    .join(articles_fg.select(["garment_group_name", "index_group_name"]), on="article_id")
)
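To make the shape of the joined rows concrete, here is a minimal pandas sketch of the same three-way join on toy data. The rows and values below are invented for illustration and are not taken from the real feature groups, and pandas `merge` only stands in for the Hopsworks query; every key in the toy data has a match, so the result does not depend on the join type.

```python
import pandas as pd

# Toy stand-ins for the three feature groups (values are made up)
transactions = pd.DataFrame({
    "customer_id": ["c1", "c2", "c1"],
    "article_id": ["a1", "a1", "a2"],
    "month_sin": [0.5, 0.5, 1.0],
    "month_cos": [0.866, 0.866, 0.0],
})
customers = pd.DataFrame({"customer_id": ["c1", "c2"], "age": [31, 45]})
articles = pd.DataFrame({
    "article_id": ["a1", "a2"],
    "garment_group_name": ["Jersey Basic", "Knitwear"],
    "index_group_name": ["Ladieswear", "Menswear"],
})

# Each transaction row gains the customer's age and the article's attributes
joined = (
    transactions
    .merge(customers, on="customer_id")
    .merge(articles, on="article_id")
)
print(joined)
```

Each output row still describes one transaction, now with one customer feature and two article features alongside it.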

### Feature View Creation

In Hopsworks, you write features to feature groups, where they are stored, and you read features from feature views. A feature view is a logical view over features stored in one or more feature groups, and it typically contains the features used by a specific model. Because a feature view only references the underlying feature groups, the same stored features can be reused across many different models.

In [5]:
feature_view = fs.create_feature_view(
    name='retrieval',
    query=query
)

Feature view created successfully, explore it at 
https://35.240.81.237:443/p/119/fs/67/fv/retrieval/version/1


To view and explore data in the feature view we can retrieve batch data using the `get_batch_data()` method.
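A minimal sketch of such a preview is shown below. The helper name `preview_feature_view` and the `n_rows` parameter are hypothetical, and the sketch assumes `get_batch_data()` returns a pandas DataFrame, which may not hold for every engine.

```python
import pandas as pd

def preview_feature_view(feature_view, n_rows: int = 5) -> pd.DataFrame:
    """Return the first rows of a feature view's batch data.

    Hypothetical helper: assumes `feature_view.get_batch_data()`
    returns a pandas DataFrame.
    """
    batch_df = feature_view.get_batch_data()
    return batch_df.head(n_rows)
```

With the feature view created above, this would be called as `preview_feature_view(feature_view)`.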

### Training Dataset Creation

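Finally, we can create our dataset, split into training, validation, and test sets. With `val_size` and `test_size` both set to 0.1, the remaining rows go to the training set.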

In [9]:
feature_view = fs.get_feature_view("retrieval", version=1)

td_version, td_job = feature_view.create_train_validation_test_splits(
    description='retrieval_dataset_split',
    data_format='csv',
    val_size=0.1,
    test_size=0.1,
    write_options={'wait_for_job': True},  # block until the job finishes
    coalesce=True,  # coalesce the data into a single partition before writing
)

Training dataset job started successfully, you can follow the progress at 
https://35.240.81.237/p/119/jobs/named/retrieval_1_2_create_fv_td_28062022103918/executions


FeatureStoreException: The Hopsworks Job failed, use the Hopsworks UI to access the job logs
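To make the split sizes concrete, the following is a minimal pandas sketch of a random 80/10/10 split on toy data. It is an illustration only, not Hopsworks' implementation; the function name `random_split` and the `seed` parameter are assumptions introduced here.

```python
import numpy as np
import pandas as pd

def random_split(df, val_size=0.1, test_size=0.1, seed=42):
    """Shuffle row positions, then carve out validation and test sets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(df))
    n_val = int(len(df) * val_size)
    n_test = int(len(df) * test_size)
    val_idx = shuffled[:n_val]
    test_idx = shuffled[n_val:n_val + n_test]
    train_idx = shuffled[n_val + n_test:]  # everything left over
    return df.iloc[train_idx], df.iloc[val_idx], df.iloc[test_idx]

# Toy data: 100 rows, so the split sizes are easy to read off
toy = pd.DataFrame({"row": range(100)})
train, val, test = random_split(toy)
print(len(train), len(val), len(test))
```

The three subsets are disjoint and together cover every row, which is the property a train/validation/test split needs.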

### Next Steps

In the next notebook, we'll train a model on the dataset we created in this notebook.