This notebook will walk you through an example of setting up a model for the delicious beverages contained within the RateBeer dataset, fetching ranked beers for a specific user, and finding similar beers.

Let's get started! 🚀

### Setup

Replace `{YOUR_API_KEY}` with your API key below.

*If you don't have an API key, feel free to [sign up on our website](https://www.shaped.ai/#contact-us) :)*

In [None]:
SHAPED_BASE_URL = "https://api.prod.shaped.ai/v0"
SHAPED_API_KEY = "{YOUR_API_KEY}"

Install the packages needed:
- `requests` is needed for making HTTP requests
- `pandas` is needed for handling the data
- `boto3` is needed for making calls to AWS, specifically S3
- `tqdm` is a great library for providing progress bars for long-running commands, in this case, file processing

In [None]:
!pip install requests
!pip install pandas
!pip install boto3
!pip install tqdm

Import the modules needed.

In [None]:
import json
import pandas as pd
import requests
import gzip
import ast
from datetime import datetime
from IPython.display import display
from tqdm import tqdm
from urllib.request import urlretrieve

### Download and Normalize Public Dataset

Fetch the publicly hosted Ratebeer dataset using the cell below.

In [None]:
print("Downloading Ratebeer data.")
RATEBEER_ARCHIVE_URL = "http://jmcauley.ucsd.edu/data/beer/ratebeer.json.gz"
FILENAME_GZ = "ratebeer.json.gz"
FILENAME = "ratebeer.json"

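def tqdm_reporthook(progress_bar):
    """Wrap a `tqdm` progress bar as a `urlretrieve` reporthook."""
    last_block = [0]

    def update_to(block_num=1, block_size=1, total_size=None):
        # `urlretrieve` passes the number of blocks transferred so far, the
        # block size in bytes, and the total size (-1 if the server omits it).
        if total_size is not None and total_size > 0:
            progress_bar.total = total_size
        progress_bar.update((block_num - last_block[0]) * block_size)
        last_block[0] = block_num

    return update_to

with tqdm(unit='B', unit_scale=True, unit_divisor=1024, miniters=1, desc=FILENAME_GZ) as t:
    urlretrieve(RATEBEER_ARCHIVE_URL, filename=FILENAME_GZ, reporthook=tqdm_reporthook(t))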

with gzip.open(FILENAME_GZ) as ratebeer_dict_file:
    line_count = sum(1 for _ in ratebeer_dict_file)

print(f"Ratebeer interactions line count: {line_count}.")
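`urlretrieve` calls its `reporthook` once before reading (block 0) and then once per block read, passing the block count so far, the block size, and the total size. The hook therefore turns the change in block count into a byte delta for `tqdm.update`. The sketch below checks that arithmetic offline using a hypothetical `FakeBar` class standing in for a real `tqdm` bar; it also ignores a total size of -1, which `urlretrieve` reports when the server sends no size.

```python
class FakeBar:
    """Hypothetical stand-in for a tqdm bar: records the total and bytes seen."""

    def __init__(self):
        self.total = None
        self.n = 0

    def update(self, n):
        self.n += n


def tqdm_reporthook(progress_bar):
    """Turn urlretrieve's (block_num, block_size, total_size) calls into byte updates."""
    last_block = [0]

    def update_to(block_num=1, block_size=1, total_size=None):
        if total_size is not None and total_size > 0:
            progress_bar.total = total_size
        # Only report the bytes for blocks read since the previous call.
        progress_bar.update((block_num - last_block[0]) * block_size)
        last_block[0] = block_num

    return update_to


bar = FakeBar()
hook = tqdm_reporthook(bar)
# Simulate urlretrieve: block 0 is reported first, then one call per block read.
for block_num in range(0, 4):
    hook(block_num, 8192, 30000)

print(bar.total, bar.n)
```

Because the hook counts whole blocks, the reported bytes can slightly overshoot the true size on the final, partial block; `tqdm` tolerates this.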

Shaped supports ingesting compressed data files (such as gzip), but because this file is formatted as line-delimited Python dictionaries rather than JSON, we must unzip it and normalize the rows into a line-delimited JSON (JSONL) file.

We also need to normalize the rating columns in the DataFrame to integer values, e.g. "12/20" becomes 12.
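To make the two normalization steps concrete, here is a sketch on a single line. The record is made up for illustration, but it follows the same style as the archive: a Python dict literal with single-quoted strings, which `json.loads` rejects and `ast.literal_eval` accepts. The hypothetical helper `score_numerator` mirrors what the `str.split('/').str[0]` step does column-wise.

```python
import ast
import json

# A made-up line in the same style as ratebeer.json.gz: a Python dict literal
# (single-quoted strings), which is not valid JSON.
raw_line = (
    "{'beer/name': 'Example Ale', 'review/profileName': 'example_user', "
    "'review/overall': '13/20', 'review/aroma': '6/10'}"
)

try:
    json.loads(raw_line)
except json.JSONDecodeError as err:
    print("json.loads fails:", err.msg)

# literal_eval safely parses Python literals without executing arbitrary code.
record = ast.literal_eval(raw_line)


def score_numerator(score):
    """Keep the numerator of an 'X/Y' score string, e.g. '13/20' -> 13."""
    return int(score.split('/')[0])


for key in ('review/overall', 'review/aroma'):
    record[key] = score_numerator(record[key])

# One JSONL line: valid JSON with integer ratings.
print(json.dumps(record))
```

Note that only the numerator is kept, so columns on different scales (for example `/20` and `/10`) are not directly comparable; this matters little here because only `review/overall` is used as the label.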

In [None]:
print("Normalizing Ratebeer interaction dicts.")
with gzip.open(FILENAME_GZ) as ratebeer_dict_file:
    # Evaluate each line in file as a Python dictionary stored as utf-8 string
    dict_list = [
        ast.literal_eval(str(ratebeer_dict_line, 'utf-8'))
        for ratebeer_dict_line in tqdm(ratebeer_dict_file, total=line_count)
    ]

print("Converting rating columns to integers.")
df = pd.DataFrame(dict_list)

def normalize_review_series(*df_keys):
    for df_key in df_keys:
        df[df_key] = df[df_key].str.split('/').str[0].dropna().astype(int)

normalize_review_series(
    'review/overall', 
    'review/appearance',
    'review/aroma',
    'review/palate',
    'review/taste',
)

print(df.head())
print(f"Writing JSONL to file {FILENAME}.")
df.to_json(path_or_buf=FILENAME, orient='records', lines=True)

### Upload Data to Shaped

Once we have all our data prepared, we can upload it using a [`POST` call to the `/models` endpoint](https://docs.shaped.ai/reference/create-model). The body of the request contains all the info needed to set up the model. [Please reference the docs for details on each field and their types](https://docs.shaped.ai/reference/create-model).

For simplicity, the schema identifies User entities by profile name (`review/profileName`) and Item entities by beer name (`beer/name`). In a production environment, it is recommended to use well-known user and item identifiers to integrate your service into the Shaped Model API.
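Before creating the model, it can be worth confirming that every column the schema references actually exists and seeing how many distinct users and items those columns produce. The sketch below does this with a hypothetical helper, `check_schema_columns`, on a small made-up DataFrame; in the notebook you would pass `df` instead of `sample`.

```python
import pandas as pd


def check_schema_columns(frame, user_col, item_col, label_col, time_col):
    """Raise if a schema column is missing; otherwise return basic counts."""
    required = [user_col, item_col, label_col, time_col]
    missing = [col for col in required if col not in frame.columns]
    if missing:
        raise KeyError(f"Columns referenced by the schema are missing: {missing}")
    return {
        "rows": len(frame),
        "users": int(frame[user_col].nunique()),
        "items": int(frame[item_col].nunique()),
        "null_labels": int(frame[label_col].isna().sum()),
    }


# Made-up rows for illustration only; in the notebook, pass `df` instead.
sample = pd.DataFrame({
    "review/profileName": ["alice", "bob", "alice"],
    "beer/name": ["Example Ale", "Example Ale", "Sample Stout"],
    "review/overall": [13, 16, None],
    "review/time": ["1157587200", "1157673600", "1157760000"],
})

print(check_schema_columns(
    sample, "review/profileName", "beer/name", "review/overall", "review/time"
))
```

A noticeably lower item count than expected can hint that names collide, which is one reason well-known identifiers are preferable in production.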

*If you try `POST`ing to the `/models` endpoint multiple times with the same `model_name`, you will encounter an error saying `"Model with name: '{model_name}' already exists with status: '{status}'"`. If you would like to update or create a new model with the same `model_name`, you must first delete the existing model with that `model_name`. You can do that by making a [`DELETE` request to the `/models/{model_name}` endpoint](https://docs.shaped.ai/reference/delete-model). The `DELETE` call can be made from the cell in the Clean Up section at the bottom of this notebook.*

In [None]:
MODEL_NAME = "rating_ratebeer"
response = requests.post(
    f"{SHAPED_BASE_URL}/models",
    headers={
        "x-api-key": SHAPED_API_KEY,
        "Content-Type":"application/json"
    },
    json={
        "model_name": MODEL_NAME,
        "connector_configs": [
            {
                "id": "file",
                "type": "File"
            }
        ],
        "schema": {
            "user": {
                "id": "review/profileName",
            },
            "item": {
                "id": "beer/name"
            },
            "interaction": {
                "label": {
                    "name": "review/overall",
                    "type": "Rating"
                },
                "source": {
                    "connector_id": "file",
                    "path": FILENAME
                },
                "created_at": "review/time"
            }
        }
    }
)
upload_request = json.loads(response.content)
print(json.dumps(upload_request, indent=2))

The `response` from the `POST` call to the `/models` endpoint is a JSON object whose `upload_file_url` field contains the `'url'` and `'fields'` of the S3 bucket to upload your data to. Let's use those to upload our Ratebeer data! This can take some time depending on your upload speed, so feel free to sit back while the data is uploaded and the Shaped model begins to build!

*In this example we use the `File` type in `connector_configs`, which requires explicitly uploading the data to an S3 bucket. With a different supported type, like `Redshift`, the data is pulled directly from your data source and doesn't need to be uploaded manually.*
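The upload is an S3 presigned POST: the presigned `fields` must be sent as form fields, and S3 requires the file to be the last field in the form. `requests` encodes `data` before `files`, which is why passing the fields as `data` works. The sketch below prepares, without sending, the same request using a hypothetical helper `build_upload_request` and a made-up `upload_request` dict that mirrors the keys used in the upload cell, then checks the field order in the encoded body.

```python
import os
import tempfile

import requests


def build_upload_request(upload_request, path):
    """Prepare, without sending, the multipart POST for a presigned S3 upload."""
    target = upload_request["upload_file_url"]
    with open(path, "rb") as file:
        prepared = requests.Request(
            "POST",
            target["url"],
            data=target["fields"],                          # presigned policy fields
            files={"file": (os.path.basename(path), file)},  # encoded after `data`
        ).prepare()  # the file contents are read into the body here
    return prepared


# Made-up response shape mirroring the keys used in the upload cell.
fake_upload_request = {
    "upload_file_url": {
        "url": "https://example-bucket.s3.amazonaws.com/",
        "fields": {"key": "uploads/ratebeer.json", "policy": "example-policy"},
    }
}

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write('{"beer/name": "Example Ale"}\n')

prepared = build_upload_request(fake_upload_request, tmp.name)
body = prepared.body
# True when the presigned "key" field precedes the file part.
print(body.index(b'name="key"') < body.index(b'name="file"'))
os.remove(tmp.name)
```

To actually send a prepared request, you could pass it to `requests.Session().send(prepared)`.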

In [None]:
with open(FILENAME, 'rb') as file:
    files = {'file': (FILENAME, file)}
    upload_response = requests.post(
        upload_request['upload_file_url']['url'], 
        data=upload_request['upload_file_url']['fields'], 
        files=files
    )

print(f"Upload response: {upload_response}.")

### Rank!

After we make the `POST` call to `/models`, we can make a [`GET` call to `/models`](https://shaped.stoplight.io/docs/shaped-api/b3A6NDA5ODEzNjY-list-models) to see our newly created model. 

In [None]:
response = requests.get(
    f"{SHAPED_BASE_URL}/models",
    headers={
        "x-api-key": SHAPED_API_KEY,
        "Content-Type":"application/json"
    }
)
print(json.dumps(json.loads(response.content), indent=2))

You'll notice the `"status"` of the model you just created is most likely `"PREPARING"`. This means that the initial training job hasn't completed yet. Depending on the size of your data this could take up to 30 min. Feel free to keep querying the `/models` endpoint to check the status of your model. When it is ready, the `"status"` will read `"ACTIVE"`.
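Rather than re-running the cell by hand, you could poll until the status becomes `"ACTIVE"`. The sketch below is a minimal polling loop under stated assumptions: `get_status` is a hypothetical callable you supply (for example, one that calls the `/models` endpoint above and picks your model's `"status"` out of the JSON; check the exact response shape against the printed output), and the interval and timeout are arbitrary defaults.

```python
import time


def wait_until_active(get_status, poll_seconds=60, timeout_seconds=3600, sleep=time.sleep):
    """Call `get_status()` until it returns "ACTIVE" or the timeout elapses."""
    waited = 0
    while True:
        status = get_status()
        print(f"Model status: {status}")
        if status == "ACTIVE":
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Model still {status!r} after {waited} seconds")
        sleep(poll_seconds)  # injectable so the loop can be tested without waiting
        waited += poll_seconds


# Usage with a fake status sequence standing in for real API calls.
statuses = iter(["PREPARING", "PREPARING", "ACTIVE"])
slept = []
wait_until_active(lambda: next(statuses), poll_seconds=1, sleep=slept.append)
```

Passing `sleep` as a parameter keeps the default behavior real-time while letting you exercise the loop instantly.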

Once your model is ready (`"status": "ACTIVE"`), you can hit the [`/models/{model_name}/rank?context_id={context_id}` endpoint](https://docs.shaped.ai/reference/rank)!

Remember, `{context_id}` is the ID of the entity you want to fetch rankings for; in this example that is a User, identified by a `review/profileName` value. You can also add an optional query parameter, `limit`, which sets how many results to return (the default is 5).
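Because profile names can contain spaces or characters such as `&`, it is safer to URL-encode query parameters than to format them into the string by hand. The sketch below builds the rank URL with the standard library; `rank_url` is a hypothetical helper, and passing `params=` to `requests.get` performs equivalent encoding.

```python
from urllib.parse import quote, urlencode


def rank_url(base_url, model_name, context_id, limit=None):
    """Build the rank URL, URL-encoding the context_id and optional limit."""
    params = {"context_id": context_id}
    if limit is not None:
        params["limit"] = limit
    return f"{base_url}/models/{quote(model_name)}/rank?{urlencode(params)}"


# "example user" is a made-up profile name containing a space.
print(rank_url("https://api.prod.shaped.ai/v0", "rating_ratebeer", "example user", limit=10))
# https://api.prod.shaped.ai/v0/models/rating_ratebeer/rank?context_id=example+user&limit=10
```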

In [None]:
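# The user id in the schema is `review/profileName`, so context_id must be a
# profile name; here we take the first one in the normalized data as an example.
context_id = df['review/profileName'].iloc[0]
response = requests.get(
    f"{SHAPED_BASE_URL}/models/{MODEL_NAME}/rank",
    params={"context_id": context_id},  # URL-encodes names with spaces or symbols
    headers={
        "x-api-key": SHAPED_API_KEY,
        "Content-Type":"application/json"
    }
)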
print(json.dumps(json.loads(response.content), indent=2))

Wow! It was that easy to see the top-ranked beers for the passed-in `context_id` 🍾. Now let's add ranking to your product :)

### Clean Up

__The below code should ONLY be run if you want to delete the model with `model_name`.__

In [None]:
response = requests.delete(
    f"{SHAPED_BASE_URL}/models/{MODEL_NAME}",
    headers={
        "x-api-key": SHAPED_API_KEY,
        "Content-Type":"application/json"
    }
)
print(json.dumps(json.loads(response.content), indent=2))