## Create Retrieval Dataset

In this notebook, we'll create a dataset for our retrieval model.

In [None]:
import hsfs

conn = hsfs.connection()
fs = conn.get_feature_store()

### Feature Selection

First, we'll load the feature groups we created in the previous tutorial.

In [None]:
trans_fg = fs.get_feature_group("transactions")
customers_fg = fs.get_feature_group("customers")
articles_fg = fs.get_feature_group("articles")

We'll need to join these three data sources to make the data compatible with our retrieval model. Recall that each row in the `transactions` feature group records which customer bought which item. We'll join this feature group with the `customers` and `articles` feature groups so that each row also carries customer and item features.

In [None]:
query = trans_fg.select(["customer_id", "article_id", "month_sin", "month_cos"])\
    .join(customers_fg.select(["age"]), on="customer_id")\
    .join(articles_fg.select(["garment_group_name", "index_group_name"]), on="article_id")
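
To make the effect of the join concrete, here is a minimal, dependency-free sketch on toy rows. It is not how Hopsworks executes the query. The sample values and the `join_rows` helper are made up for illustration, and it assumes inner-join semantics (the join type that `join()` applies by default may differ between library versions).

```python
# Toy rows standing in for the three feature groups (all values are made up).
transactions = [
    {"customer_id": "c1", "article_id": "a1", "month_sin": 0.5, "month_cos": 0.87},
    {"customer_id": "c2", "article_id": "a1", "month_sin": 0.5, "month_cos": 0.87},
    {"customer_id": "c1", "article_id": "a2", "month_sin": 1.0, "month_cos": 0.0},
]
# Lookup tables keyed by the join column, holding only the selected features.
customers = {"c1": {"age": 25}, "c2": {"age": 41}}
articles = {
    "a1": {"garment_group_name": "Jersey Basic", "index_group_name": "Ladieswear"},
    "a2": {"garment_group_name": "Knitwear", "index_group_name": "Menswear"},
}


def join_rows(transactions, customers, articles):
    """Hypothetical helper: inner-join each transaction with its customer and article features."""
    joined = []
    for row in transactions:
        customer = customers.get(row["customer_id"])
        article = articles.get(row["article_id"])
        if customer is None or article is None:
            continue  # inner join: drop rows without a match on either side
        joined.append({**row, **customer, **article})
    return joined


joined = join_rows(transactions, customers, articles)
for row in joined:
    print(row)
```

Each output row keeps the four transaction columns and gains `age`, `garment_group_name`, and `index_group_name`, which is the shape the query above describes.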

### Feature View Creation

In Hopsworks, you write features to feature groups (where the features are stored) and read features from feature views. A feature view is a logical view over features stored in one or more feature groups, and it typically contains the features used by a specific model. Because a feature view only references features rather than copying them, features stored in different feature groups can be reused across many different models.

In [None]:
feature_view = fs.create_feature_view(
    name='retrieval_fv',
    query=query
)

To explore the data in the feature view, we can retrieve it as batch data with the `get_batch_data()` method.

In [None]:
feature_view.get_batch_data().head(5)

### Training Dataset Creation

Finally, we can create our training dataset from the feature view.

In [None]:
# TODO: switch to a chronological split.

# Materialize the feature view as CSV files with train and validation splits (80:20).
td = feature_view.create_training_dataset(
    description='retrieval_dataset_splitted',
    data_format='csv',
    splits={'train': 80, 'validation': 20},
    train_split='train',
)
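
The TODO in the cell above mentions a chronological split. As a rough illustration of that idea (not the Hopsworks API), the sketch below sorts toy rows by time and cuts once, so every validation row is at least as recent as every training row. The `chronological_split` helper and the `t_dat` timestamp column are assumptions for this example; the query built earlier does not select a timestamp, so one would have to be added first.

```python
from datetime import date


def chronological_split(rows, time_key, train_fraction=0.8):
    """Hypothetical helper: sort rows by time, then cut once into train and validation."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Ten toy transactions, one per month, deliberately supplied out of order.
rows = [
    {"customer_id": f"c{i}", "article_id": f"a{i % 3}", "t_dat": date(2020, 1 + i, 1)}
    for i in range(10)
]
train, validation = chronological_split(list(reversed(rows)), "t_dat")

print(len(train), len(validation))
print(max(r["t_dat"] for r in train), min(r["t_dat"] for r in validation))
```

Unlike a random split, this keeps future interactions out of the training set, which is closer to how a retrieval model is used once deployed.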

### Next Steps

In the next notebook, we'll train a model on the dataset we created in this notebook.