This notebook will walk you through an example of:
1. \*Downloading the Amazon Beauty dataset into jsonl files and storing it in a Postgres instance.
2. Setting up a model for the Amazon Beauty dataset from the Postgres instance.
3. Fetching the products that are most likely to be purchased by a specific user.

\*This step has nothing to do with Shaped and is all about prepping the test dataset. If you already have your data in a Postgres instance you can skip to step 2.

Let's get started! 🚀

## Preparing Data
### Setup
1. Install `pandas` to view and edit the sample dataset.
2. Install `sqlalchemy` and `psycopg2` for connecting to your Postgres instance and uploading data to it.

In [29]:
! pip install pandas
! pip install sqlalchemy
! pip install psycopg2



### Download Public Dataset

Fetch the publicly hosted Amazon Beauty dataset.

In [30]:
! echo "Downloading Amazon Beauty data..."

DIR_NAME = "amazon_beauty_model_assets"
! mkdir -p $DIR_NAME

# Beauty product ratings.
! wget https://jmcauley.ucsd.edu/data/amazon_v2/categoryFiles/All_Beauty.json.gz --no-check-certificate -P $DIR_NAME
! gzip -d $DIR_NAME/All_Beauty.json.gz

# Beauty product metadata.
! wget https://jmcauley.ucsd.edu/data/amazon_v2/metaFiles2/meta_All_Beauty.json.gz --no-check-certificate -P $DIR_NAME
! gzip -d $DIR_NAME/meta_All_Beauty.json.gz

Downloading Amazon Beauty data...
--2023-04-17 20:58:05--  https://jmcauley.ucsd.edu/data/amazon_v2/categoryFiles/All_Beauty.json.gz
Resolving jmcauley.ucsd.edu (jmcauley.ucsd.edu)... 137.110.160.73
Connecting to jmcauley.ucsd.edu (jmcauley.ucsd.edu)|137.110.160.73|:443... connected.
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: 47350910 (45M) [application/x-gzip]
Saving to: ‘amazon_beauty_model_assets/All_Beauty.json.gz’


2023-04-17 20:58:07 (27.7 MB/s) - ‘amazon_beauty_model_assets/All_Beauty.json.gz’ saved [47350910/47350910]

--2023-04-17 20:58:09--  https://jmcauley.ucsd.edu/data/amazon_v2/metaFiles2/meta_All_Beauty.json.gz
Resolving jmcauley.ucsd.edu (jmcauley.ucsd.edu)... 137.110.160.73
Connecting to jmcauley.ucsd.edu (jmcauley.ucsd.edu)|137.110.160.73|:443... connected.
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: 10329961 (9.9M) [application/x-gzip]
Saving 

Let's take a look at the downloaded dataset. There are two tables of interest:
- `reviews` which are stored in `All_Beauty.json`
- `products` which are stored in `meta_All_Beauty.json`

Note that the data is actually stored in jsonl files so we need to use `lines=True` when reading the data.
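To make the JSON Lines layout concrete, here is a minimal sketch using only the standard library. The two sample records are made up for illustration (they are not rows from the dataset), and `read_jsonl` is a hypothetical helper, not part of pandas or Shaped. It shows why a plain `json.load` on the whole file would fail: each line is its own JSON document.

```python
import io
import json

# Two made-up records, one JSON object per line (the JSON Lines layout).
jsonl_text = (
    '{"asin": "B000000001", "overall": 5}\n'
    '{"asin": "B000000002", "overall": 3}\n'
)


def read_jsonl(handle):
    """Parse a JSON Lines stream into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in handle if line.strip()]


records = read_jsonl(io.StringIO(jsonl_text))
print(records)
```

`pd.read_json(..., lines=True)` does the equivalent line-by-line parsing and returns a DataFrame instead of a list.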

In [31]:
import pandas as pd

data_dir = "amazon_beauty_model_assets"

events_df = pd.read_json(f'{data_dir}/All_Beauty.json', lines=True)
display(events_df.head())

products_df = pd.read_json(f'{data_dir}/meta_All_Beauty.json', lines=True)
display(products_df.head())

Unnamed: 0,overall,verified,reviewTime,reviewerID,asin,reviewerName,reviewText,summary,unixReviewTime,vote,style,image
0,1,True,"02 19, 2015",A1V6B6TNIC10QE,143026860,theodore j bigham,great,One Star,1424304000,,,
1,4,True,"12 18, 2014",A2F5GHSXFQ0W6J,143026860,Mary K. Byke,My husband wanted to reading about the Negro ...,... to reading about the Negro Baseball and th...,1418860800,,,
2,4,True,"08 10, 2014",A1572GUYS7DGSR,143026860,David G,"This book was very informative, covering all a...",Worth the Read,1407628800,,,
3,5,True,"03 11, 2013",A1PSGLFK1NSVO,143026860,TamB,I am already a baseball fan and knew a bit abo...,Good Read,1362960000,,,
4,5,True,"12 25, 2011",A6IKXKZMTKGSC,143026860,shoecanary,This was a good story of the Black leagues. I ...,"More than facts, a good story read!",1324771200,5.0,,


Unnamed: 0,category,tech1,description,fit,title,also_buy,tech2,brand,feature,rank,also_view,details,main_cat,similar_item,date,price,asin,imageURL,imageURLHighRes
0,[],,[Loud 'N Clear Personal Sound Amplifier allows...,,Loud 'N Clear&trade; Personal Sound Amplifier,[],,idea village,[],"2,938,573 in Beauty & Personal Care (",[],{'ASIN: ': '6546546450'},All Beauty,,,,6546546450,[],[]
1,[],,[No7 Lift & Luminate Triple Action Serum 50ml ...,,No7 Lift &amp; Luminate Triple Action Serum 50...,"[B01E7LCSL6, B008X5RVME]",,,[],"872,854 in Beauty & Personal Care (",[],"{'Shipping Weight:': '0.3 ounces (', 'ASIN: ':...",All Beauty,"class=""a-bordered a-horizontal-stripes a-spa...",,$44.99,7178680776,[],[]
2,[],,[No7 Stay Perfect Foundation now stays perfect...,,No7 Stay Perfect Foundation Cool Vanilla by No7,[],,No7,[],"956,696 in Beauty & Personal Care (","[B01B8BR0O8, B01B8BR0NO, B014MHXXM8]","{'Shipping Weight:': '3.5 ounces (', 'ASIN: ':...",All Beauty,,,$28.76,7250468162,[],[]
3,[],,[],,Wella Koleston Perfect Hair Colour 44/44 Mediu...,[B0041PBXX8],,,[],"1,870,258 in Beauty & Personal Care (",[],"{'  Item Weight: ': '1.76 ounces', 'Sh...",All Beauty,,,,7367905066,[https://images-na.ssl-images-amazon.com/image...,[https://images-na.ssl-images-amazon.com/image...
4,[],,[Lacto Calamine Skin Balance Daily Nourishing ...,,Lacto Calamine Skin Balance Oil control 120 ml...,[],,Pirmal Healthcare,[],"67,701 in Beauty & Personal Care (","[3254895630, B007VL1D9S, B00EH9A0RI, B0773MBG4...","{'Shipping Weight:': '12 ounces (', 'ASIN: ': ...",All Beauty,,,$12.15,7414204790,[https://images-na.ssl-images-amazon.com/image...,[https://images-na.ssl-images-amazon.com/image...


As we can see, there is a lot of data, and it is very noisy. The prices have to be treated as strings because of the '$' sign, and many of the fields contain empty strings or malformed HTML that was misparsed during the crawling process. In most cases you'd have to spend time cleaning all this data. With Shaped, however, you can feed it through in this state and Shaped will do the cleaning for you. We do this by treating all input data as unstructured and using large language models to distill the meaning of each column.
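Shaped takes the raw strings, so no cleaning is needed for this tutorial. If you did want to inspect prices numerically yourself, a minimal sketch could look like the following. The `parse_price` helper is hypothetical, and it assumes simple prices such as `$44.99`; anything else (empty strings, HTML fragments, price ranges) is mapped to `None` rather than guessed at.

```python
import re
from typing import Optional

# Matches "$12", "$12.15", "$1,299.00"; everything else is treated as unparseable.
_PRICE_RE = re.compile(r"^\$(\d{1,3}(?:,\d{3})*|\d+)(\.\d{1,2})?$")


def parse_price(raw: object) -> Optional[float]:
    """Convert a '$'-prefixed price string to a float, or return None."""
    if not isinstance(raw, str):
        return None
    match = _PRICE_RE.match(raw.strip())
    if match is None:
        return None
    # Drop the currency sign and thousands separators before converting.
    return float(match.group(0).lstrip("$").replace(",", ""))


print(parse_price("$44.99"))
print(parse_price(""))
```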

Shaped doesn't require much data to work. At a minimum, the interactions table needs the `user_id`, `item_id`, `label`, and `created_at` columns. If the `users` and `items` tables are provided, the only requirement is that their respective id columns are aliased to `user_id` and `item_id`.

For brevity, we'll only use a subset of the columns. You'll notice we include a couple of feature columns for the `items` (`title` and `price`).

For the interaction data, we look at `events_df` and see that the relevant columns are:
- `reviewerID`: the user who reviewed the item.
- `asin`: a unique identifier for a product. It will be used as the item id when training our models.
- `overall`: the rating the user gave the product.
- `unixReviewTime`: the time the review was given, as a Unix timestamp.
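The mapping from these review columns to the four interaction columns Shaped expects can be sketched in plain Python. The `to_interaction` helper is hypothetical, and converting the Unix timestamp to an ISO string is only for readability here; the notebook itself passes the raw integer through. The sample values are taken from the first row of the reviews preview.

```python
from datetime import datetime, timezone


def to_interaction(review: dict) -> dict:
    """Rename review fields to the user_id/item_id/label/created_at layout."""
    created = datetime.fromtimestamp(review["unixReviewTime"], tz=timezone.utc)
    return {
        "user_id": review["reviewerID"],
        "item_id": review["asin"],
        "label": float(review["overall"]),
        "created_at": created.isoformat(),
    }


sample = {
    "reviewerID": "A1V6B6TNIC10QE",
    "asin": "143026860",
    "overall": 1,
    "unixReviewTime": 1424304000,
}
interaction = to_interaction(sample)
print(interaction)
```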

For the item data, we look at `products_df` and see that the relevant columns are:
- `asin`: a unique identifier for a product.
- `title`: the name of the product.
- `price`: the price in US dollars.

In [32]:
events_df = events_df[["reviewerID","asin","overall","unixReviewTime"]]
products_df = products_df[["asin", "title", "price"]]

display(events_df.head())
display(products_df.head())

Unnamed: 0,reviewerID,asin,overall,unixReviewTime
0,A1V6B6TNIC10QE,143026860,1,1424304000
1,A2F5GHSXFQ0W6J,143026860,4,1418860800
2,A1572GUYS7DGSR,143026860,4,1407628800
3,A1PSGLFK1NSVO,143026860,5,1362960000
4,A6IKXKZMTKGSC,143026860,5,1324771200


Unnamed: 0,asin,title,price
0,6546546450,Loud 'N Clear&trade; Personal Sound Amplifier,
1,7178680776,No7 Lift &amp; Luminate Triple Action Serum 50...,$44.99
2,7250468162,No7 Stay Perfect Foundation Cool Vanilla by No7,$28.76
3,7367905066,Wella Koleston Perfect Hair Colour 44/44 Mediu...,
4,7414204790,Lacto Calamine Skin Balance Oil control 120 ml...,$12.15


### Upload Data to Postgres

Shaped has support for many data connectors! For this tutorial we're going to be using Postgres. To do that we need to:
1. Connect to your Postgres instance. (We're assuming you've already set one up. We did it with [AWS RDS](https://aws.amazon.com/rds/)).
2. Create the tables and upload data to your Postgres instance.

In [33]:
HOST = "amazon-beauty-reviews.clb5z5lddhvn.us-east-2.rds.amazonaws.com"
PORT = "5432"
DATABASE = "postgres"
USER = "shaped"
PASSWORD = "Ht7%$7Lfucew"

from sqlalchemy import create_engine, text

engine = create_engine(f"postgresql://{USER}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}")

events_df.to_sql(name="reviews", con=engine, index=False, if_exists="replace")
products_df.to_sql(name="products", con=engine, index=False, if_exists="replace")

# Check that the data was uploaded correctly.
with engine.connect() as conn:
    display(pd.read_sql_query(text("""SELECT * FROM products LIMIT 5"""), conn))
    display(pd.read_sql_query(text("""SELECT * FROM reviews LIMIT 5"""), conn))


Unnamed: 0,asin,title,price
0,6546546450,Loud 'N Clear&trade; Personal Sound Amplifier,
1,7178680776,No7 Lift &amp; Luminate Triple Action Serum 50...,$44.99
2,7250468162,No7 Stay Perfect Foundation Cool Vanilla by No7,$28.76
3,7367905066,Wella Koleston Perfect Hair Colour 44/44 Mediu...,
4,7414204790,Lacto Calamine Skin Balance Oil control 120 ml...,$12.15


Unnamed: 0,reviewerID,asin,overall,unixReviewTime
0,A1V6B6TNIC10QE,143026860,1,1424304000
1,A2F5GHSXFQ0W6J,143026860,4,1418860800
2,A1572GUYS7DGSR,143026860,4,1407628800
3,A1PSGLFK1NSVO,143026860,5,1362960000
4,A6IKXKZMTKGSC,143026860,5,1324771200
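A note on the connection string used for the upload: interpolating a password directly into a URL can break when it contains characters such as `@`, `/`, or `:`. One common approach is to percent-encode the credentials first. The sketch below uses only the standard library; `build_postgres_url` is a hypothetical helper and the credentials are placeholders, not the ones from this notebook.

```python
from urllib.parse import quote_plus


def build_postgres_url(user: str, password: str, host: str, port: str, database: str) -> str:
    """Build a postgresql:// URL with percent-encoded credentials."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder values; in practice, read secrets from the environment
# (for example via os.environ) instead of hard-coding them.
url = build_postgres_url("shaped", "p@ss/word", "localhost", "5432", "postgres")
print(url)
```

The resulting string can be passed to `create_engine` in place of the f-string.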


## Using Shaped
### Setup

1. Install `shaped` to leverage the Shaped CLI to create, view, and use your model.
2. Install `pyyaml` to create Model schema files.

In [34]:
! pip install shaped
! pip install pyyaml



Replace `<YOUR_API_KEY>` with your API key below.

*If you don't have an API key, feel free to [sign up on our website](https://www.shaped.ai/#contact-us) :)*

In [35]:
SHAPED_API_KEY = "<YOUR_API_KEY>"

Initialize the Shaped CLI with your API key.

In [None]:
! shaped init --api-key $SHAPED_API_KEY

### Model Creation

We're now ready to create your Shaped model! To keep things simple, we'll use the beauty product reviews data to build a collaborative filtering model. Shaped will use these reviews to determine which users like which beauty products, on the assumption that the higher the rating, the more likely a user is to want to purchase that product.


1. Create a .yaml file containing the model schema definition.
2. Use the Shaped CLI to create the model!

For further details about creating models please refer to the [Create Model](https://docs.shaped.ai/docs/api#tag/Model/operation/post_create_models_post) API reference.

In [37]:
"""
Create a Shaped Model schema and store in a .yaml file.
"""

import yaml

amazon_beauty_model_schema = {
    "model": {
        "name": "amazon_beauty_product_recommendations",
    },
    "connectors": [
        {
            "id": "postgres_amazon",
            "type": "Postgres",
            "user": USER,
            "password": PASSWORD,
            "host": HOST,
            "port": PORT,
            "database": DATABASE
        },
    ],
    "fetch": {
        # Mixed-case column names are double-quoted so SQL treats them as
        # identifiers; single quotes would turn them into string literals.
        "events": 'SELECT "reviewerID" AS user_id, asin AS item_id, overall::float AS label, "unixReviewTime" AS created_at FROM postgres_amazon.reviews',
        "items": "SELECT asin AS item_id, title, price FROM postgres_amazon.products"
    }
}

dir_path = "amazon_beauty_model_assets"

with open(f'{dir_path}/amazon_beauty_model_schema.yaml', 'w') as file:
    yaml.dump(amazon_beauty_model_schema, file)
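Before submitting a schema, it can help to sanity-check that the events query exposes the four required columns. The sketch below is a rough, hypothetical check and not part of the Shaped CLI: it only looks for explicit `AS <alias>` clauses with a regular expression and is not a SQL parser, so treat a clean result as a hint rather than a guarantee.

```python
import re

REQUIRED_EVENT_COLUMNS = {"user_id", "item_id", "label", "created_at"}


def missing_aliases(query: str, required: set) -> set:
    """Return the required column names that never appear as an `AS` alias."""
    found = re.findall(r"\bAS\s+([A-Za-z_][A-Za-z0-9_]*)", query, flags=re.IGNORECASE)
    return required - {alias.lower() for alias in found}


# Example query that forgets to alias a created_at column.
incomplete_query = (
    'SELECT "reviewerID" AS user_id, asin AS item_id, '
    "overall::float AS label FROM postgres_amazon.reviews"
)
print(missing_aliases(incomplete_query, REQUIRED_EVENT_COLUMNS))
```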

In [None]:
"""
Create a Shaped Model using the .yaml schema file.
"""

! shaped create-model --file $DIR_NAME/amazon_beauty_model_schema.yaml

It can take up to a few hours for Shaped to provision the infrastructure for your recommendation model and train it on your historic events. The time depends mostly on how large your dataset is, i.e. the volume of users, items, and interactions and the number of attributes you're providing. For the model you just created, it will take no more than 30 minutes.

While the model is being set up, you can view its status with either the [List Models](https://docs.shaped.ai/docs/api#tag/Model/operation/get_models_models_get) or [View Model](https://docs.shaped.ai/docs/api) endpoints. For example, with the CLI:

In [1]:
! shaped list-models

models:
- model_name: amazon_beauty_product_recommendations
  model_uri: https://api.prod.shaped.ai/v1/models/amazon_beauty_product_recommendations
  created_at: 2023-04-17T21:02:34 UTC
  trained_at: 2023-04-17T21:39:46 UTC
  status: ACTIVE



The initial model creation goes through the following stages in order:

1. `SCHEDULING`
2. `FETCHING`
3. `TRAINING`
4. `DEPLOYING`
5. `ACTIVE`

You can periodically poll Shaped to inspect these status changes. Once the model is in the `ACTIVE` state, you can move on to the next step and use it to make rank requests.
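The polling idea can be sketched as a small loop. Everything here is an assumption for illustration: `wait_until_active` is a hypothetical helper, `get_status` stands in for whatever you use to read the status (for example, parsing the output of `shaped list-models`), and treating any status outside the five stages listed above as an error is a conservative choice, not documented Shaped behavior.

```python
import time
from typing import Callable

STAGES = ["SCHEDULING", "FETCHING", "TRAINING", "DEPLOYING", "ACTIVE"]


def wait_until_active(
    get_status: Callable[[], str],
    poll_seconds: float = 60.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status repeatedly until it reports ACTIVE or polls run out."""
    for _ in range(max_polls):
        status = get_status()
        if status == "ACTIVE":
            return status
        if status not in STAGES:
            # Assumption: anything outside the listed stages is unexpected.
            raise RuntimeError(f"Unexpected model status: {status!r}")
        sleep(poll_seconds)
    raise TimeoutError(f"Model not ACTIVE after {max_polls} polls")


# Demo with a fake status source that walks through the stages without waiting.
fake_statuses = iter(STAGES)
final_status = wait_until_active(lambda: next(fake_statuses), sleep=lambda _: None)
print(final_status)
```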

### Rank!

You're now ready to fetch your Amazon beauty product recommendations! You can do this with the [Rank endpoint](https://docs.shaped.ai/docs/api#tag/Rank/operation/post_rank_models__model_id__rank_post). Just provide the `user_id` you want recommendations for and the number of recommendations you want returned.

Shaped's CLI provides a convenience rank command to quickly retrieve results from the command line. You can use it as follows:

In [5]:
! shaped rank --model-name amazon_beauty_product_recommendations --user-id A2F5GHSXFQ0W6J --limit 15

ids:
- B00XGPKC3G
- B00WMP9V5G
- B00A0Q5XQU
- B0107T06KC
- B01GB0WIB6
- B00WTBQRLO
- B00ODEJN66
- B00MJCHR30
- B00SRODEA4
- B01DVG8XHQ
- B00SABI85A
- B00PK1T1WW
- B00QW1W1CQ
- B01ACE0J3Q
- B01CD2K86I
scores:
- 1.0
- 1.0
- 1.0
- 1.0
- 0.0
- 1.0
- 1.0
- 1.0
- 1.0
- 0.0
- 1.0
- 1.0
- 1.0
- 1.0
- 1.0



The response returns 2 parallel arrays containing the ids and ranking scores for the beauty products that Shaped estimates are most relevant to the given user.
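Because the two arrays are parallel, the i-th score belongs to the i-th id. A minimal sketch of pairing them up follows; `pair_results` is a hypothetical helper, and the sample response is truncated to the first three entries of the output above.

```python
def pair_results(response: dict) -> list:
    """Zip the parallel `ids` and `scores` arrays into (id, score) tuples."""
    ids, scores = response["ids"], response["scores"]
    if len(ids) != len(scores):
        raise ValueError("ids and scores must have the same length")
    return list(zip(ids, scores))


# First three entries of the rank output shown above.
response = {
    "ids": ["B00XGPKC3G", "B00WMP9V5G", "B00A0Q5XQU"],
    "scores": [1.0, 1.0, 1.0],
}
for item_id, score in pair_results(response):
    print(item_id, score)
```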

If you want to integrate this endpoint into your website or application you can use the Rank POST REST endpoint directly with the following request:

In [None]:
! curl https://api.prod.shaped.ai/v1/models/amazon_beauty_product_recommendations/rank \
  -H "x-api-key: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{ "user_id": "A2F5GHSXFQ0W6J", "limit": 15 }'
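For a Python application, the same request can be built with the standard library. This is a sketch that mirrors the curl call above (same URL, headers, and JSON body); `build_rank_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_rank_request(model_name: str, api_key: str, user_id: str, limit: int) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl call."""
    url = f"https://api.prod.shaped.ai/v1/models/{model_name}/rank"
    body = json.dumps({"user_id": user_id, "limit": limit}).encode("utf-8")
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_rank_request(
    "amazon_beauty_product_recommendations", "<YOUR_API_KEY>", "A2F5GHSXFQ0W6J", 15
)
print(request.get_method(), request.full_url)

# To actually send it:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```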

Wow! It was that easy to see the top 15 recommended beauty products for the given `user_id` 🍾. Now let's add ranking to your product :)

### Clean Up

Don't forget to delete your model (and its assets) and the datasets once you're finished with them. You can do it with the following CLI command:

In [None]:
! shaped delete-model --model-name amazon_beauty_product_recommendations

! rm -r amazon_beauty_model_assets