This notebook will walk you through: 
1. importing an Amazon review dataset into a PostgreSQL database
2. setting up a model that returns a personalized list of recommended products for a home or discovery page
3. fetching ranked products for a specific user

*For a more in-depth walkthrough of this notebook, check out the [accompanying blog post](blog_post_link).*

Let's get started! 🚀

### Setup

Set the `SHAPED_KEY` environment variable to your API key, or replace the value of `SHAPED_API_KEY` below.

*If you don't have an API key, feel free to [sign up on our website](https://www.shaped.ai/#contact-us) :)*

In [1]:
import os

SHAPED_API_KEY = os.getenv("SHAPED_KEY")

Install the packages needed:
- `requests` is needed for making HTTP requests
- `pandas` is needed for handling the data
- `ipython-sql` is needed for connecting with the database
- `sqlalchemy` is needed for executing database queries through a DB-API driver
- `psycopg2` is needed for the PostgreSQL connection

In [None]:
!pip install requests
!pip install pandas
!pip install ipython-sql
!pip install sqlalchemy
!pip install psycopg2



In [3]:
from urllib.request import urlretrieve
import requests
import pandas as pd
from IPython.display import display
import json
from datetime import datetime
import gzip

### Preview Dataset

[The Amazon dataset](http://jmcauley.ucsd.edu/data/amazon/links.html) has a lot of data! Looking through it, we want the interaction and item data so we'll be using the [ratings only](http://snap.stanford.edu/data/amazon/productGraph/item_dedup.csv) and [metadata](http://snap.stanford.edu/data/amazon/productGraph/metadata.json.gz) datasets respectively.
Let's use the script provided by the website and take a look at the relevant columns:

In [None]:
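def parse(path):
  # Each line of the gzipped file holds one record as a Python dict literal.
  # eval() executes whatever it is given, so only run this on files you trust.
  with gzip.open(path, 'rb') as g:
    for l in g:
      yield eval(l)

def getDF(path):
  # Collect the parsed records keyed by row number, then build a DataFrame.
  df = {}
  for i, d in enumerate(parse(path)):
    df[i] = d
  return pd.DataFrame.from_dict(df, orient='index')

ratings_only = "ratings_only.json.gz"
metadata = "metadata.json.gz"

df = getDF(ratings_only)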

The columns relevant to this walkthrough are:
- `reviewer_id` identifies the user who reviewed the item.
- `asin` is the unique identifier of a product. It is used as the item ID when training our models.
- `overall` stores the rating the user gave the item.
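The preview query in the next cell reads from an `amazon_ratings` table through an `engine` object, so the DataFrame has to be loaded into PostgreSQL first. Here is a minimal sketch of that step, assuming SQLAlchemy with the psycopg2 driver. The helper names `postgres_url` and `load_ratings`, the placeholder credentials, and the choice to replace an existing table are illustrative assumptions, not the notebook's original code.

```python
from urllib.parse import quote_plus


def postgres_url(user, password, host, port, database):
    """Build a SQLAlchemy connection URL for the psycopg2 driver."""
    # quote_plus escapes characters such as '@' or '/' in the credentials.
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def load_ratings(df, engine, table="amazon_ratings", chunksize=10_000):
    """Write the ratings DataFrame to a table, replacing it if it exists."""
    df.to_sql(table, engine, if_exists="replace", index=False, chunksize=chunksize)


# Usage in the notebook (placeholders, not real credentials):
# from sqlalchemy import create_engine
# engine = create_engine(postgres_url("postgres", "<password>", "<host>", 5432, "amazon"))
# load_ratings(df, engine)
```

Writing in chunks keeps memory use bounded for a dataset of this size. List-valued columns such as `helpful` may need converting to strings before the upload.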

In [6]:
review_df = pd.read_sql('select * from amazon_ratings limit 5', engine)
review_df

| Unnamed: 0 | reviewer_id | asin | reviewer_name | helpful | review_text | overall | summary | unit_review_time | review_time |
|---|---|---|---|---|---|---|---|---|---|
| 0 | AYPYL1DQOK9TM | B002RUG7K8 | Randall Benoit "rjb" | [0, 0] | i need a cheep amp to power some pioniers as i... | 5.0 | hey this 1 contradicts itself but in ACTUAL PO... | 1373328000 | 2013-07-09 |
| 1 | AYPYL1DQOK9TM | B002WC7RBO | Randall Benoit "rjb" | [0, 2] | ive got 4 12's in two different bandpass boxs ... | 5.0 | 4 12's | 1375920000 | 2013-08-08 |
| 2 | AYPYL1DQOK9TM | B004AHQP6W | Randall Benoit "rjb" | [0, 0] | in a small box comes many packages of true cop... | 5.0 | i will most defenatly buy again awsome deal | 1394668800 | 2014-03-13 |
| 3 | AYPYL1DQOK9TM | B004AJ4Z62 | Randall Benoit "rjb" | [0, 0] | twintake I love it looks sexy sound awesome a ... | 5.0 | tight | 1394150400 | 2014-03-07 |
| 4 | AYPYM7ITWJ3SF | B001KFUR9I | blackcamel | [0, 0] | Quality is fine as far as the workmanship and ... | 3.0 | ah so so | 1384819200 | 2013-11-19 |


### Setup Endpoint

Once we have all our data prepared, we can upload it using a [`POST` call to the `/models` endpoint](https://docs.shaped.ai/reference/create-model). The body of the request contains all the info needed to set up the model.

*If you try `POST`ing to the `/models` endpoint multiple times with the same `model_name`, you will encounter an error saying `"Model with name: '{model_name}' already exists with status: '{status}'"`. If you would like to update or create a new model with the same `model_name` you must first delete the existing model with `model_name`. You can do that by making a [`DELETE` request to the `/models/{model_name}` endpoint](https://docs.shaped.ai/reference/delete-model). The `DELETE` call can be made from the cell in the Clean Up section at the bottom of this notebook.*

In [7]:
model_name = "amazon_dataset_postgres"

url = "https://api.prod.shaped.ai/v0/models/"

payload = json.dumps({
  "connector_configs": [{
    "id": "postgres",
    "type": "Postgres",
    "user": "postgres",
    "password": "FSUIH6x14hqRM5lDQE0v",
    "host": "amazon-ratings.clb5z5lddhvn.us-east-2.rds.amazonaws.com",
    "port": 5432,
    "database": "amazon"
  }],
  "model_name": model_name,
  "schema": {
    "user": {
      "created_at": "event_timestamp",
      "id": "reviewer_id",
      "source": {
        "connector_id": "postgres",
        "query": "select reviewer_id, reviewer_name, review_time::timestamp as event_timestamp from amazon_ratings limit 1000000"
      }
    },
    "item": {
      "created_at": "event_timestamp",
      "id": "asin",
      "source": {
        "connector_id": "postgres",
        "query": "select asin, helpful, review_text, overall::float, summary, review_time::timestamp as event_timestamp from amazon_ratings limit 1000000"
      }
    },
    "interaction": {
      "created_at": "event_timestamp",
      "label": {
        "name": "overall",
        "type": "Rating"
      },
      "source": {
        "connector_id": "postgres",
        "query": "select reviewer_id, asin, overall::float, review_time::timestamp as event_timestamp from amazon_ratings limit 1000000"
      }
    }
  }
})

headers = {
  'x-api-key': SHAPED_API_KEY,
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(json.dumps(json.loads(response.content), indent=2))
print(response.status_code)
assert response.status_code == 200

{}
200


On success, the `POST` call to the `/models` endpoint returns a JSON object with a status code of 200. On failure, the status code could be 404 or 409, accompanied by an error message.
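Rather than relying on a bare `assert`, the status code and the error message can be checked together. A small sketch, assuming the error body carries its text under an `"error"` key (as in the rank output later in this notebook) or a `"message"` key. `ShapedRequestError` and `check_response` are hypothetical names, not part of any Shaped client.

```python
class ShapedRequestError(RuntimeError):
    """Raised when the API answers with a non-200 status code."""


def check_response(status_code, body):
    """Return the parsed body on HTTP 200, otherwise raise with the API's message."""
    if status_code == 200:
        return body
    # Assumed keys: fall back to a generic note if neither is present.
    detail = body.get("error") or body.get("message") or "no error message returned"
    raise ShapedRequestError(f"HTTP {status_code}: {detail}")


# Usage with the `response` object from the cell above:
# check_response(response.status_code, response.json())
```

Raising an exception with the server's message makes a 404 or 409 easier to diagnose than an `AssertionError` with no context.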

### Rank!

After we make the `POST` call to `/models`, we can make a [`GET` call to `/models`](https://docs.shaped.ai/reference/list-models) to see our newly created model. 

In [8]:
response = requests.get(
    "https://api.prod.shaped.ai/v0/models",
    headers={
        "x-api-key": SHAPED_API_KEY,
        "Content-Type":"application/json"
    }
)
print(json.dumps(json.loads(response.content), indent=2))

{
  "models": [
    {
      "created_at": "2022-09-12T16:57:06 UTC",
      "input_schema": {
        "interaction": {
          "created_at": "event_timestamp",
          "label": {
            "name": "overall",
            "type": "Rating"
          },
          "source": {
            "connector_id": "postgres",
            "query": "select reviewer_id, asin, overall::float, review_time::timestamp as event_timestamp from amazon_ratings limit 1000000"
          }
        },
        "item": {
          "created_at": "event_timestamp",
          "id": "asin",
          "source": {
            "connector_id": "postgres",
            "query": "select asin, helpful, review_text, overall::float, summary, review_time::timestamp as event_timestamp from amazon_ratings limit 1000000"
          }
        },
        "user": {
          "created_at": "event_timestamp",
          "id": "reviewer_id",
          "source": {
            "connector_id": "postgres",
            "query": "select reviewe

You'll notice the `"status"` of the model you just created is most likely `"PREPARING"`. This means that the initial training job hasn't completed yet. How long it takes depends on the amount of data. Feel free to keep querying the `/models` endpoint to check the status of your model. When it is ready, the `"status"` will read `"ACTIVE"`.
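Checking the status by hand gets tedious for larger datasets, so the wait can be scripted. The sketch below polls a listing until the model reads `"ACTIVE"`. It assumes each entry in the `"models"` list exposes its name under a `"model_name"` key next to `"status"`; the function names, the polling interval, and the retry cap are illustrative choices.

```python
import time


def model_status(listing, model_name):
    """Return the status of `model_name` in a parsed /models response, or None."""
    for model in listing.get("models", []):
        # "model_name" as the per-model key is an assumption about the response.
        if model.get("model_name") == model_name:
            return model.get("status")
    return None


def wait_until_active(fetch_listing, model_name, poll_seconds=60, max_polls=60,
                      sleep=time.sleep):
    """Poll until the model is ACTIVE; return False if max_polls is exhausted."""
    for _ in range(max_polls):
        if model_status(fetch_listing(), model_name) == "ACTIVE":
            return True
        sleep(poll_seconds)
    return False


# Usage in the notebook, reusing SHAPED_API_KEY and model_name:
# fetch = lambda: requests.get(
#     "https://api.prod.shaped.ai/v0/models",
#     headers={"x-api-key": SHAPED_API_KEY},
# ).json()
# wait_until_active(fetch, model_name)
```

Passing the fetch step in as a callable keeps the polling logic separate from the HTTP call, which also makes it easy to try out with canned responses.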

Once your model is ready (`"status": "ACTIVE"`), you can hit the [rank endpoint](https://docs.shaped.ai/reference/rank)!

Remember, `user_id` is the ID of the user you want to fetch rankings for. You can also add an optional query parameter, `limit`, which sets how many results to return (the default is 5).
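To show how `user_id` and `limit` fit together, here is a short sketch that builds the rank URL with both query parameters. The helper `rank_url` and the constant `BASE_URL` are illustrative names; the reviewer ID is taken from the preview table earlier in this notebook, where user IDs are strings rather than integers.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.prod.shaped.ai/v0"


def rank_url(model_name, user_id, limit=None):
    """Build the rank URL, adding `limit` only when it is given."""
    params = {"user_id": user_id}
    if limit is not None:
        params["limit"] = limit
    # urlencode escapes the parameter values; quote escapes the path segment.
    return f"{BASE_URL}/models/{quote(model_name, safe='')}/rank?{urlencode(params)}"


print(rank_url("amazon_dataset_postgres", "AYPYL1DQOK9TM", limit=10))
```

The resulting string can be passed to `requests.get` together with the same headers used in the other cells.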

In [9]:
response = requests.get(
    f"https://api.prod.shaped.ai/v0/models/{model_name}/rank?user_id=1",
    headers={
        "x-api-key": SHAPED_API_KEY,
        "Content-Type":"application/json"
    }
)
print(json.dumps(json.loads(response.content), indent=2))

{
  "error": "Model with name 'amazon_dataset_postgres' is not in 'ACTIVE' state, but it has the 'PREPARING' status"
}


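The output above is the error returned while the model is still `PREPARING`. Once the model is `ACTIVE`, the same call returns the top 5 ranked items for the passed-in `user_id`. It's that easy 🍾. Now let's add ranking to your product :)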

### Clean Up

__The code below should ONLY be run if you want to delete the model with `model_name`.__

In [10]:
response = requests.delete(
    f"https://api.prod.shaped.ai/v0/models/{model_name}",
    headers={
        "x-api-key": SHAPED_API_KEY,
        "Content-Type":"application/json"
    }
)
print(json.dumps(json.loads(response.content), indent=2))

{
  "message": "Model with name 'amazon_dataset_postgres' was successfully deleted"
}
