This notebook walks you through setting up a model for the RentTheRunway dataset, stored in a CSV file, and then fetching ranked items for a specific user.

Let's get started! 🚀

## Setup

Replace `<YOUR_API_KEY>` with your API key below.

*If you don't have an API key, feel free to [sign up on our website](https://www.shaped.ai/#contact-us) :)*

In [4]:
import os

SHAPED_API_KEY = os.getenv('TEST_SHAPED_API_KEY', '<YOUR_API_KEY>')

1. Install `shaped` to leverage the Shaped CLI to create, view, and use your model.
2. Install `pandas` to view and edit the sample dataset.
3. Install `pyyaml` to create Shaped Dataset and Model schema files.

In [None]:
! pip install shaped
! pip install pandas
! pip install pyyaml

Initialize the CLI with your API key.

In [None]:
! shaped init --api-key $SHAPED_API_KEY

## Download Public Dataset

Fetch the publicly hosted RentTheRunway dataset.

In [7]:
! echo "Downloading RentTheRunway data..."

DIR_NAME = "notebook_assets"
! wget https://mcauleylab.ucsd.edu/public_datasets/data/renttherunway/renttherunway_final_data.json.gz --no-check-certificate -P $DIR_NAME
! gunzip $DIR_NAME/renttherunway_final_data.json.gz

Downloading RentTheRunway data...
--2025-03-05 12:27:22--  https://mcauleylab.ucsd.edu/public_datasets/data/renttherunway/renttherunway_final_data.json.gz
Resolving mcauleylab.ucsd.edu (mcauleylab.ucsd.edu)... 169.228.63.88
Connecting to mcauleylab.ucsd.edu (mcauleylab.ucsd.edu)|169.228.63.88|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 30744190 (29M) [application/gzip]
Saving to: ‘notebook_assets/renttherunway_final_data.json.gz’


2025-03-05 12:27:22 (88.8 MB/s) - ‘notebook_assets/renttherunway_final_data.json.gz’ saved [30744190/30744190]



Let's take a look at the downloaded dataset. It is a single JSON file, `renttherunway_final_data.json`, which includes information about events, users and items in one table. We'll convert it into a tab-separated .csv file.

We'll also preprocess the weight and height columns into numeric types. In the raw data, the rating column is on a 1-10 scale; we binarize it so that any value > 8 becomes 1 and any value <= 8 becomes 0.

Finally, let's rename some columns so that they are in a format supported by the Shaped platform (spaces are not allowed in column names).

In [8]:
import pandas as pd
import json
import numpy as np
import re

json_file_path = "notebook_assets/renttherunway_final_data.json"
data = pd.read_json(json_file_path, lines=True)

# Function to convert weight string to numerical format.
def extract_weight(value):
    if pd.isna(value):
        return np.nan

    # Find numeric part using regular expression
    match = re.search(r'(\d+)', str(value))
    if match:
        return int(match.group(1))
    return np.nan

# Function to convert a height string from feet and inches to total inches in numerical format.
def convert_height(value):
    if pd.isna(value):
        return np.nan

    # Extract feet and inches using regex
    match = re.search(r'[-]?(\d+)\'\s*(\d+)"?', str(value))

    if match:
        feet = int(match.group(1))
        inches = int(match.group(2))

        # Calculate total inches
        total_inches = feet * 12 + inches
        return total_inches

    return np.nan


data['weight'] = data['weight'].apply(extract_weight)
data['height'] = data['height'].apply(convert_height)
data['rating'] = data['rating'].apply(lambda x: 1 if x>8 else 0)

data.rename(columns = {'rented for': 'rented_for', 'body type': 'body_type', 'bust size': 'bust_size'}, inplace = True)
data.to_csv('notebook_assets/events.csv', sep='\t', index=False)
display(data.head())

Unnamed: 0,fit,user_id,bust_size,item_id,weight,rating,rented_for,review_text,body_type,review_summary,category,height,size,age,review_date
0,fit,420272,34d,2260466,137.0,1,vacation,An adorable romper! Belt and zipper were a lit...,hourglass,So many compliments!,romper,68.0,14,28.0,"April 20, 2016"
1,fit,273551,34b,153475,132.0,1,other,I rented this dress for a photo shoot. The the...,straight & narrow,I felt so glamourous!!!,gown,66.0,12,36.0,"June 18, 2013"
2,fit,360448,,1063761,,1,party,This hugged in all the right places! It was a ...,,It was a great time to celebrate the (almost) ...,sheath,64.0,4,116.0,"December 14, 2015"
3,fit,909926,34c,126335,135.0,0,formal affair,I rented this for my company's black tie award...,pear,Dress arrived on time and in perfect condition.,dress,65.0,8,34.0,"February 12, 2014"
4,fit,151944,34b,616682,145.0,1,wedding,I have always been petite in my upper body and...,athletic,Was in love with this dress !!!,gown,69.0,12,27.0,"September 26, 2016"
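To make the preprocessing rules above concrete, here is a minimal standard-library sketch that applies the same regular expressions and the same rating threshold to a few sample values. The raw string formats (`"137lbs"`, `5' 8"`) are assumptions about what the source file contains, and the sketch returns `None` instead of `NaN` so that it does not depend on pandas or NumPy.

```python
import re

def extract_weight(value):
    """Return the first integer found in a string such as '137lbs', or None."""
    match = re.search(r'(\d+)', str(value))
    return int(match.group(1)) if match else None

def convert_height(value):
    """Convert a feet-and-inches string such as 5' 8" to total inches, or None."""
    match = re.search(r'[-]?(\d+)\'\s*(\d+)"?', str(value))
    if not match:
        return None
    feet, inches = int(match.group(1)), int(match.group(2))
    return feet * 12 + inches

def binarize_rating(rating, threshold=8):
    """Map a 1-10 rating to 1 (strictly above the threshold) or 0 (otherwise)."""
    return 1 if rating > threshold else 0

# Sample values; the raw formats are assumed, not taken from the file.
print(extract_weight("137lbs"))   # 137
print(convert_height("5' 8\""))   # 68
print(binarize_rating(10))        # 1
print(binarize_rating(8))         # 0
```

Note that a rating of exactly 8 maps to 0, because the comparison is strict.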


## Upload Data to Shaped

Shaped has support for many data connectors! For this tutorial we're going to be using native Shaped Datasets. To do that we need to:
1. Create a .yaml file containing the dataset schema definition.
2. Use Shaped CLI to create the dataset.
3. Use Shaped CLI to upload the .csv file we just created.

In [9]:
"""
Create a Shaped Dataset schema for the events dataset and store it in a .yaml file.
"""

import yaml

dir_path = "notebook_assets"

events_dataset_schema = {
    "name": "rent_runway_events",
    "schema_type": "CUSTOM",
    "column_schema": {
        "fit": "String",
        "user_id": "String",
        "bust_size": "String",
        "item_id": "String",
        "weight": "Int32",
        "rating": "Int32",
        "rented_for": "String",
        "review_text": "String",
        "body_type": "String",
        "review_summary": "String",
        "category": "String",
        "height": "Int32",
        "size": "Int32",
        "age": "Int32",
        "review_date": "DateTime"
    }
}

with open(f'{dir_path}/events_dataset_schema.yaml', 'w') as file:
    yaml.dump(events_dataset_schema, file)
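Before creating the dataset, it can be worth checking that the file header and the schema agree on column names, since a column that was not renamed (for example one that still contains a space) would not match. The sketch below is one way to do that with the standard library only. The helper name `compare_columns` and the tiny in-memory sample are hypothetical, used purely for illustration; in the notebook you would read the header of `notebook_assets/events.csv` and pass `events_dataset_schema["column_schema"]`.

```python
import csv
import io

def compare_columns(header, column_schema):
    """Return (missing, unexpected) column names relative to the schema."""
    header_set = set(header)
    schema_set = set(column_schema)
    missing = sorted(schema_set - header_set)      # in the schema, not in the file
    unexpected = sorted(header_set - schema_set)   # in the file, not in the schema
    return missing, unexpected

# Tiny in-memory stand-in for the tab-separated events file (illustrative only).
sample_tsv = "user_id\titem_id\trented for\trating\n420272\t2260466\tvacation\t1\n"
header = next(csv.reader(io.StringIO(sample_tsv), delimiter="\t"))

column_schema = {
    "user_id": "String",
    "item_id": "String",
    "rented_for": "String",
    "rating": "Int32",
}

missing, unexpected = compare_columns(header, column_schema)
print(missing)     # ['rented_for']
print(unexpected)  # ['rented for']
```

Two empty lists mean the header and the schema use the same column names; the check says nothing about column types.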

In [10]:
"""
Create a Shaped Dataset using the .yaml schema file.
"""
! shaped create-dataset --file $DIR_NAME/events_dataset_schema.yaml

{
  "column_schema": {
    "age": "Int32",
    "body_type": "String",
    "bust_size": "String",
    "category": "String",
    "fit": "String",
    "height": "Int32",
    "item_id": "String",
    "rating": "Int32",
    "rented_for": "String",
    "review_date": "DateTime",
    "review_summary": "String",
    "review_text": "String",
    "size": "Int32",
    "user_id": "String",
    "weight": "Int32"
  },
  "name": "rent_runway_events",
  "schema_type": "CUSTOM"
}
message: Dataset with name 'rent_runway_events' was successfully scheduled for creation



It takes a moment to provision the infrastructure required for the dataset. You can monitor its status using the CLI command:

In [11]:
! shaped list-datasets

datasets:
- dataset_name: amazon_beauty_products
  dataset_uri: https://api.shaped.ai/v1/datasets/amazon_beauty_products
  created_at: 2024-07-16T02:40:32 UTC
  schema_type: CUSTOM
  status: ACTIVE
- dataset_name: amazon_games_items
  dataset_uri: https://api.shaped.ai/v1/datasets/amazon_games_items
  created_at: 2024-07-17T00:08:38 UTC
  schema_type: CUSTOM
  status: ACTIVE
- dataset_name: amazon_beauty_ratings
  dataset_uri: https://api.shaped.ai/v1/datasets/amazon_beauty_ratings
  created_at: 2024-10-17T14:56:58 UTC
  schema_type: CUSTOM
  status: ACTIVE
- dataset_name: amazon_games_ratings
  dataset_uri: https://api.shaped.ai/v1/datasets/amazon_games_ratings
  created_at: 2024-10-29T16:20:36 UTC
  schema_type: CUSTOM
  status: ACTIVE
- dataset_name: h_and_m_transactions
  dataset_uri: https://api.shaped.ai/v1/datasets/h_and_m_transactions
  created_at: 2024-11-18T03:37:01 UTC
  schema_type: CUSTOM
  status: ACTIVE
- dataset_name: h_and_m_articles
  dataset_uri: https://api.shaped.a

In [12]:
"""
Upload the events file (it is tab-separated, hence --type 'tsv'). You'll see the records uploading in batches of 1000. Uploading all 192,544 events takes a few minutes.
"""

! shaped dataset-insert --dataset-name rent_runway_events --file notebook_assets/events.csv --type 'tsv'

192544 Records [03:18, 971.97 Records/s]


## Model Creation

We're now ready to create your Shaped model! To keep things simple, today we're using the rating records to build a collaborative filtering model. Shaped will use these ratings to determine which users like which items, assuming that a user likes an item if the label is 1 and doesn't like it if the label is 0.


1. Create a .yaml file containing the model schema definition.
2. Use Shaped CLI to create the model!

For further details about creating models please refer to the [Create Model](https://docs.shaped.ai/docs/api#tag/Model/operation/post_create_models_post) API reference.

In [13]:
"""
Create a Shaped Model schema and store in a .yaml file.
"""

import yaml

rent_runway_model_schema = {
    "model": {
        "name": "rent_runway_recommendations"
    },
    "connectors": [
        {
            "type": "Dataset",
            "id": "rent_runway_events",
            "name": "rent_runway_events"
        },
    ],
    "fetch": {
        "events": "SELECT user_id, item_id, review_date AS created_at, rating AS label, rented_for, review_text, review_summary FROM rent_runway_events",
        "users": "SELECT user_id, bust_size, weight, body_type, height, age FROM rent_runway_events",
        "items": "SELECT item_id, fit, category, size FROM rent_runway_events"
    }
}

with open(f'{dir_path}/rent_runway_model_schema.yaml', 'w') as file:
    yaml.dump(rent_runway_model_schema, file)

In [None]:
"""
Create a Shaped Model using the .yaml schema file.
"""

! shaped create-model --file $DIR_NAME/rent_runway_model_schema.yaml

Your recommendation model can take up to a few hours to provision its infrastructure and train on your historical events. This time mostly depends on how large your dataset is, i.e., the volume of your users, items and interactions, and the number of attributes you're providing.

While the model is being set up, you can view its status with either the [List Models](https://docs.shaped.ai/docs/api#tag/Model/operation/get_models_models_get) or [View Model](https://docs.shaped.ai/docs/api) endpoints. For example, with the CLI:

In [17]:
! shaped list-models

models:
- model_name: movielens_recommendations
  model_uri: https://api.shaped.ai/v1/models/movielens_recommendations
  created_at: 2024-07-16T17:12:06 UTC
  status: ACTIVE
- model_name: amazon_game_recommendations
  model_uri: https://api.shaped.ai/v1/models/amazon_game_recommendations
  created_at: 2024-07-17T00:23:15 UTC
  status: ACTIVE
- model_name: h_and_m_fashion_recommendations_session
  description: 3M raw events
  model_uri: https://api.shaped.ai/v1/models/h_and_m_fashion_recommendations_session
  created_at: 2025-01-21T20:38:45 UTC
  status: ACTIVE
- model_name: shaped_docs_search_model
  model_uri: https://api.shaped.ai/v1/models/shaped_docs_search_model
  created_at: 2025-02-06T20:44:45 UTC
  status: ACTIVE
- model_name: movielens_valuemodel
  model_uri: https://api.shaped.ai/v1/models/movielens_valuemodel
  created_at: 2025-03-04T18:35:24 UTC
  status: ACTIVE
- model_name: movielens_valuemodel_2
  model_uri: https://api.shaped.ai/v1/models/movielens_valuemodel_2
  create

The initial model creation goes through the following stages in order:

1. `SCHEDULING`<br/>
2. `FETCHING`<br/>
3. `TUNING`<br/>
4. `TRAINING`<br/>
5. `DEPLOYING`<br/>
6. `ACTIVE`

You can periodically poll Shaped to inspect these status changes. Once the model is in the `ACTIVE` state, you can move to the next step and use it to make rank requests.
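The polling described above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not part of the Shaped CLI or SDK: `status_from_listing` and `wait_until_active` are hypothetical helpers, the listing format is taken from the `shaped list-models` output shown earlier, and treating any status outside the six listed stages as an error is a choice made for this example.

```python
import time

STAGES = ["SCHEDULING", "FETCHING", "TUNING", "TRAINING", "DEPLOYING", "ACTIVE"]

def status_from_listing(listing_text, name):
    """Find the `status:` line of the entry that starts with `- model_name: <name>`."""
    in_entry = False
    for raw_line in listing_text.splitlines():
        line = raw_line.strip()
        if line.startswith("- "):
            # A new list entry begins; check whether it is the one we want.
            in_entry = line[2:].strip() == f"model_name: {name}"
        elif in_entry and line.startswith("status:"):
            return line.split(":", 1)[1].strip()
    return None

def wait_until_active(get_status, poll_seconds=60, max_polls=240, sleep=time.sleep):
    """Call get_status() until it returns ACTIVE, sleeping between polls."""
    for _ in range(max_polls):
        status = get_status()
        if status == "ACTIVE":
            return status
        if status is not None and status not in STAGES:
            raise RuntimeError(f"Unexpected status: {status}")
        sleep(poll_seconds)
    raise TimeoutError("Model did not become ACTIVE within the polling budget")

# Illustrative listing text; the second entry is made up for this example.
listing = """models:
- model_name: movielens_recommendations
  model_uri: https://api.shaped.ai/v1/models/movielens_recommendations
  created_at: 2024-07-16T17:12:06 UTC
  status: ACTIVE
- model_name: rent_runway_recommendations
  status: TRAINING
"""
print(status_from_listing(listing, "rent_runway_recommendations"))  # TRAINING

# Simulated status sequence, so the example runs without calling Shaped.
simulated = iter(["TRAINING", "DEPLOYING", "ACTIVE"])
final_status = wait_until_active(lambda: next(simulated), sleep=lambda seconds: None)
print(final_status)  # ACTIVE
```

In the notebook, `get_status` could wrap something like `subprocess.run(["shaped", "list-models"], capture_output=True, text=True).stdout` passed through `status_from_listing`; that wiring is left out here because it needs a configured CLI.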

## Rank

You're now ready to fetch your item recommendations! You can do this with the [Rank endpoint](https://docs.shaped.ai/docs/api#tag/Rank/operation/post_rank_models__model_id__rank_post). Just provide the `user_id` you wish to get recommendations for and the number of recommendations you want returned.

Shaped's CLI provides a convenience rank command to quickly retrieve results from the command line. You can use it as follows:

In [None]:
! shaped rank --model-name rent_runway_recommendations --user-id 1 --limit 5

The response returns two parallel arrays containing the ids and ranking scores for the items that Shaped estimates are most interesting to the given user.

If you want to integrate this endpoint into your website or application you can use the Rank POST REST endpoint directly with the following request:

In [None]:
! curl https://api.prod.shaped.ai/v1/models/rent_runway_recommendations/rank \
  -H "x-api-key: <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{ "user_id": "1", "limit": 5 }'
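If your application is written in Python, the same request can be built with the standard library. The sketch below mirrors the curl command above (same URL, headers and JSON body); `build_rank_request` is a hypothetical helper name, and the response is simply parsed as JSON because its field names are not shown in this notebook.

```python
import json
import urllib.request

def build_rank_request(model_name, user_id, limit, api_key):
    """Build a POST request equivalent to the curl command above."""
    url = f"https://api.prod.shaped.ai/v1/models/{model_name}/rank"
    body = json.dumps({"user_id": user_id, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )

request = build_rank_request("rent_runway_recommendations", "1", 5, "<YOUR_API_KEY>")
print(request.get_method(), request.full_url)

# Sending the request needs a valid API key and a model in the ACTIVE state:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the request separately from sending it makes it easy to inspect or log what will be sent before any network call happens.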

Wow! It was that easy to see the top 5 ranked items for the passed-in `user_id` 🍾. Now let's add ranking to your product :)

## Clean Up

Don't forget to delete your model (and its assets) and the datasets once you're finished with them. You can do it with the following CLI command:

In [None]:
! shaped delete-model --model-name rent_runway_recommendations

! shaped delete-dataset --dataset-name rent_runway_events

! rm -r notebook_assets