# Simple Recommender for B2B-Retail with Amazon Personalize

Building a recommender system with Amazon Personalize using the SDK for Python option (boto3). The data is the same longtail B2B-Retail set that I worked with in the "Association Rules Mining" ML-Project, but this time I don't reduce it to the approx 3'000 most popular items. I upload the full interactions set with roughly 74'000 different items.

I also provided the industry sector of the users as meta-data but this did not improve the solution. I left the choice of algorithm and model tuning to the Personalize Service and it selected a HRNN model.

For more details see the full [Documentation](https://docs.aws.amazon.com/personalize/latest/dg/what-is-personalize.html) for Amazon Personalize.

Note, I learned the hard way:
- For Europe AWS Personalize is only available in Region Ireland (eu-west-1), this is important when configuring the AWSCLI.
- Timestamp col in interactions dataset has to be in int format


**Data Sources:**

- `data/raw/sales_total.csv`: Transaction data ('sales log') for 2017/18, this is the main data file representing the interactions between users and items.
- `data/raw/customers_agg_2018.csv`: (Optional) data containing metadata for the users (meaning their respective business sector).
- `data/raw/artikel_agg_2018.csv`: (Optional) data containing the names of the artikel, only needed for final output.

**Changes**

- 2019-07-18: Start project (in St. Ulrich, IT ;-))
- 2019-07-25: End project (in Copenhagen, DK ;-))


<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Import-libraries,-load-data" data-toc-modified-id="Import-libraries,-load-data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Import libraries, load data</a></span></li><li><span><a href="#Prepare-and-upload-training-data-to-S3-bucket" data-toc-modified-id="Prepare-and-upload-training-data-to-S3-bucket-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Prepare and upload training data to S3 bucket</a></span><ul class="toc-item"><li><span><a href="#Upload-data-to-S3-bucket" data-toc-modified-id="Upload-data-to-S3-bucket-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Upload data to S3 bucket</a></span></li></ul></li><li><span><a href="#Prepare-Data-Structure" data-toc-modified-id="Prepare-Data-Structure-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Prepare Data Structure</a></span><ul class="toc-item"><li><span><a href="#Create-Schemas" data-toc-modified-id="Create-Schemas-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Create Schemas</a></span></li><li><span><a href="#Create-(and-wait-for)-Dataset-Group" data-toc-modified-id="Create-(and-wait-for)-Dataset-Group-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Create (and wait for) Dataset Group</a></span></li><li><span><a href="#Create-Datasets" data-toc-modified-id="Create-Datasets-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Create Datasets</a></span></li></ul></li><li><span><a href="#Prepare,-create,-and-wait-for-Dataset-Import-Job" data-toc-modified-id="Prepare,-create,-and-wait-for-Dataset-Import-Job-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Prepare, create, and wait for Dataset Import Job</a></span></li><li><span><a href="#Select-a-Recipe-(for-demo-only)" data-toc-modified-id="Select-a-Recipe-(for-demo-only)-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Select a Recipe (for demo only)</a></span></li><li><span><a href="#Create-and-Wait-for-Solution-(version)" data-toc-modified-id="Create-and-Wait-for-Solution-(version)-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Create and Wait for Solution (version)</a></span><ul class="toc-item"><li><span><a href="#Get-type-and-metrics-of-solution-version" data-toc-modified-id="Get-type-and-metrics-of-solution-version-6.1"><span class="toc-item-num">6.1&nbsp;&nbsp;</span>Get type and metrics of solution version</a></span></li></ul></li><li><span><a href="#Create-and-wait-for-campaign" data-toc-modified-id="Create-and-wait-for-campaign-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Create and wait for campaign</a></span></li><li><span><a href="#Get-Recommendations" data-toc-modified-id="Get-Recommendations-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Get Recommendations</a></span></li><li><span><a href="#Appendix:-Delete-existing-resources-if-you-want-to-re-run-the-project" data-toc-modified-id="Appendix:-Delete-existing-resources-if-you-want-to-re-run-the-project-9"><span class="toc-item-num">9&nbsp;&nbsp;</span>Appendix: Delete existing resources if you want to re-run the project</a></span></li></ul></div>

---

## Import libraries, load data

In [1]:
# Import libraries, get personalize boto3 client
import numpy as np
import pandas as pd
import json
import time

import boto3
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

# Display settings
from IPython.display import display
pd.options.display.max_columns = 100

# Ignore warnings
import warnings
warnings.simplefilter('ignore')

In [2]:
# Load data
interactions_raw = pd.read_csv('data/raw/sales_total.csv', parse_dates=['Fakturadatum'])
users_raw = pd.read_csv('data/raw/customers_agg_2018.csv')
artikel_raw = pd.read_csv('data/raw/artikel_agg_2018.csv')

## Prepare and upload training data to S3 bucket

Amazon Personalize recognizes three types of historical datasets. Each type has an associated schema (see next section) with a name key whose value matches the dataset type. The three types are: 
- **Users:** This dataset is intended to provide metadata about your users. This includes information such as age, gender, and loyalty membership, among others, which can be important signals in personalization systems. 
- **Items:** This dataset is intended to provide metadata about your items. This includes information such as price, SKU type, and availability, among others. 
- **Interactions:** This dataset is intended to provide historical interaction data between users and items. 

The Users and Items dataset types are known as metadata types and are only used by certain recipes. As we have no relevant metadata for items, we prepare 2 datasets for

- interactions
- users

[docs](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html)

In [3]:
"""Prepare interaction data"""

# Subset data for 2018 data only
interactions_18_full = interactions_raw.loc[interactions_raw['Fakturadatum'].dt.year == 2018]
interactions_18_part = interactions_18_full[['Kunde', 'Artikel', 'Fakturadatum', 'Nettowert']]

# Kick out all artikel that contain str values in their code
interactions_18_part['num'] = pd.to_numeric(interactions_18_part['Artikel'], errors='coerce')
interactions_18 = interactions_18_part.dropna(how='any')
interactions_18.drop(['num'], axis=1, inplace=True)

# Kick-out special customers
interactions = interactions_18.loc[interactions_18['Kunde'] > 700000]

# Set datatypes
interactions['Kunde'] = interactions['Kunde'].astype(str)
interactions['Artikel'] = interactions['Artikel'].astype(str)
interactions['Fakturadatum'] = interactions['Fakturadatum'].apply(lambda x: x.timestamp()).astype(int)
interactions['Nettowert'] = interactions['Nettowert'].astype(float)

# Rename Columns
interactions = interactions.rename(columns={'Kunde': 'USER_ID', 
                                            'Artikel': 'ITEM_ID',
                                            'Fakturadatum': 'TIMESTAMP',
                                            'Nettowert': 'EVENT_VALUE',
                                           })

In [4]:
# Check results
assert interactions.isnull().sum().sum() == 0
print(interactions.shape)
display(interactions.head(2))

(1402641, 4)


Unnamed: 0,USER_ID,ITEM_ID,TIMESTAMP,EVENT_VALUE
1388625,8488019,5171607,1514937600,77.3
1388626,8488019,5171101,1514937600,32.0


In [5]:
# Save to CSV
interactions.to_csv("data/interim/interactions.csv", index=False)

In [6]:
"""Prepare User data"""

users = users_raw[['Unnamed: 0', 'Branche']]
users = users.rename(columns={'Unnamed: 0': 'USER_ID', 
                              'Branche': 'BRANCHE',
                             })

# Set datatypes
users['USER_ID'] = users['USER_ID'].astype(str)
users['BRANCHE'] = users['BRANCHE'].astype(str)

# Check results
print(users.shape)
display(users.head(2))

(18625, 2)


Unnamed: 0,USER_ID,BRANCHE
0,8107232,15.0
1,8155006,10.0


In [7]:
# Save to CSV
users.to_csv("data/interim/users.csv", index=False)

### Upload data to S3 bucket

After you create a CSV file with your data, upload the file to your Amazon S3 bucket. This is the location that Amazon Personalize imports your data from. Amazon Personalize needs permission to access the Amazon S3 bucket, so a policy has to be attached.

[docs](https://docs.aws.amazon.com/personalize/latest/dg/data-prep-upload-s3.html)

In [8]:
# Retrieve the list of existing buckets (optional)
s3 = boto3.client('s3')
response= s3.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])

rbuerki-01-personalize


In [9]:
"""Specify a s3 Bucket and attach policy to it"""

bucket = "rbuerki-01-personalize"  # name of my S3 bucket
policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket),
                "arn:aws:s3:::{}/*".format(bucket)
            ]
        }
    ]
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

{'ResponseMetadata': {'RequestId': '8B7ACD024A1AEB4F',
  'HostId': '1OKTq1uUJE5t0Qjjw2z4O0ac5t4B2GzgMIprw46QhqvoLlewWPXkEs0xQzWn3i/Awl4S/Tc2Exw=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': '1OKTq1uUJE5t0Qjjw2z4O0ac5t4B2GzgMIprw46QhqvoLlewWPXkEs0xQzWn3i/Awl4S/Tc2Exw=',
   'x-amz-request-id': '8B7ACD024A1AEB4F',
   'date': 'Fri, 26 Jul 2019 21:05:31 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 1}}

In [10]:
"""Upload interactions data"""

filename_i = 'interactions.csv' 
boto3.Session().resource('s3').Bucket(bucket).Object(
    filename_i).upload_file("data/interim/{}".format(filename_i))

In [11]:
"""Upload user data"""

filename_u = 'users.csv'
boto3.Session().resource('s3').Bucket(bucket).Object(
    filename_u).upload_file("data/interim/{}".format(filename_u))

## Prepare Data Structure


Import your training data into Amazon Personalize by first creating matching data schemas for your sets, then an empty dataset group and then an empty dataset in that dataset group. Next, create an import job that populates the dataset with data from your Amazon S3 bucket. 

### Create Schemas

Schemas in Amazon Personalize are defined in the Avro format. For more information, see [Apache Avro](https://avro.apache.org/docs/current/). The schema fields can be in any order but must match the order of the corresponding column headers in the data files to be imported. 

[docs](https://docs.aws.amazon.com/personalize/latest/dg/data-prep-formatting.html)

In [12]:
interactions_schema = {"type": "record", 
                       "name": "Interactions",
                       "namespace": "com.amazonaws.personalize.schema",
                       "fields": [
                       {
                           "name": "USER_ID",
                           "type": "string"
                       },
                       {
                           "name": "ITEM_ID",
                           "type": "string"
                       },
                       {
                           "name": "TIMESTAMP",
                           "type": "long"
                       },
                       {
                           "name": "EVENT_VALUE",
                           "type": "float"
                       }
                                  ],
                                  "version": "1.0"
                      }

In [1]:
# Create schema
create_schema_response = personalize.create_schema(
    name = "interactions-schema",
    schema = json.dumps(interactions_schema))

# Get the ARN
interactions_schema_arn = create_schema_response['schemaArn']
print(interactions_schema_arn)

In [14]:
users_schema = {"type": "record", 
                "name": "Users",
                "namespace": "com.amazonaws.personalize.schema",
                "fields": [
                {
                    "name": "USER_ID",
                    "type": "string"
                },
                {
                    "name": "BRANCHE",
                    "type": "string",
                    "categorical": True
                }
                          ],
                          "version": "1.0"
               }

In [2]:
# Create schema
create_schema_response = personalize.create_schema(
    name = "users-schema",
    schema = json.dumps(users_schema))

# Get the ARN
users_schema_arn = create_schema_response['schemaArn']
print(users_schema_arn)

### Create (and wait for) Dataset Group

In [3]:
"""Create the Dataset Group"""

create_dataset_group_response = personalize.create_dataset_group(
    name = "recommender-test-dataset-group")

# Get the ARN
dataset_group_arn = create_dataset_group_response['datasetGroupArn']
print(dataset_group_arn)

'Create the Dataset Group'

In [32]:
"""Wait for Dataset Group to have ACTIVE status"""

max_time = time.time() + 3*60 # 3 minutes
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = dataset_group_arn)
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(30)

DatasetGroup: CREATE PENDING
DatasetGroup: ACTIVE


### Create Datasets

In [4]:
dataset_type = "INTERACTIONS"
create_dataset_response = personalize.create_dataset(
    name = "recommender-test-interactions",
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = interactions_schema_arn)

# Get the ARN
dataset_arn_i = create_dataset_response['datasetArn']
print(dataset_arn_i)

In [5]:
dataset_type = "USERS"
create_dataset_response = personalize.create_dataset(
    name = "recommender-test-users",
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = users_schema_arn)

# Get the ARN
dataset_arn_u = create_dataset_response['datasetArn']
print(dataset_arn_u)

## Prepare, create, and wait for Dataset Import Job

Import your training data into Amazon Personalize by first creating an empty dataset group and then an empty dataset in that dataset group. Next, create an import job that populates the dataset with data from your Amazon S3 bucket. 

The `roleArn` parameter specifies the AWS Identity and Access Management role that gives Amazon Personalize permissions to access your Amazon S3 bucket. (Because I have initially already set up a Personalize role in the console (see [docs](https://docs.aws.amazon.com/personalize/latest/dg/setup.html)), the first code cell is inactive and I simply load the existing roleArn in the second code cell.)

In [21]:
# """Create Personalize role"""

# iam = boto3.client("iam")

# role_name = "PersonalizeRole"
# assume_role_policy_document = {
#     "Version": "2012-10-17",
#     "Statement": [
#         {
#           "Effect": "Allow",
#           "Principal": {
#             "Service": "personalize.amazonaws.com"
#           },
#           "Action": "sts:AssumeRole"
#         }
#     ]
# }

# create_role_response = iam.create_role(
#     RoleName = role_name,
#     AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
# )

# # AmazonPersonalizeFullAccess provides access to any S3 bucket with a name that includes "personalize" or "Personalize" 
# # if you would like to use a bucket with a different name, please consider creating and attaching a new policy
# # that provides read access to your bucket or attaching the AmazonS3ReadOnlyAccess policy to the role
# policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess"
# iam.attach_role_policy(
#     RoleName = role_name,
#     PolicyArn = policy_arn
# )

# time.sleep(60) # wait for a minute to allow IAM role policy attachment to propagate

# role_arn = create_role_response["Role"]["Arn"]
# print(role_arn)

In [43]:
# Load existing role ARN
role_arn = "" # hidden from public

In [21]:
"""Create dataset import job for interactions"""

create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "interactions-dataset-import-job",
    datasetArn = dataset_arn_i,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, filename_i)
    }, roleArn = role_arn)

# Get the ARN
dataset_import_job_arn_i = create_dataset_import_job_response['datasetImportJobArn']
print(dataset_import_job_arn_i)

In [45]:
"""Wait for Dataset Import Job to Have ACTIVE Status"""

max_time = time.time() + 10*60 # 10 minutes
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_import_job_arn_i)
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print("DatasetImportJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: ACTIVE


In [6]:
"""Create dataset import job for users"""

create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "users-dataset-import-job",
    datasetArn = dataset_arn_u,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, filename_u)
    }, roleArn = role_arn)

# Get the ARN
dataset_import_job_arn_u = create_dataset_import_job_response['datasetImportJobArn']
print(dataset_import_job_arn_u)

'Create dataset import job for users'

In [46]:
"""Wait for Dataset Import Job to Have ACTIVE Status"""

max_time = time.time() + 10*60 # 10 minutes
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_import_job_arn_u)
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print("DatasetImportJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetImportJob: ACTIVE


## Select a Recipe (for demo only)

A _recipe_ in Amazon Personalize is made up of an algorithm with hyperparameters, and a feature transformation. Amazon Personalize provides a number of predefined recipes that allow you to make recommendations with no knowledge of machine learning.The predefined recipes are also useful for quick experimentation.

(To customize the training, supply the `solutionConfig` parameter. The SolutionConfig object allows you to override the default solution and recipe parameters. This is not done here.)

**NOTE:** _For this case I won't use a predefined recipe, I will let Personalize choose the optimal algorithm in the next step by calling `createSolution` with param `autoML=True`. Therefore the next codeblock is inactivated. See demo notebook for use of a predefined recipe._

[docs](https://docs.aws.amazon.com/personalize/latest/dg/working-with-predefined-recipes.html) referring to available recipes:
- popularity count (baseline model)
- HRNN
- HRNN-Metadata
- HRNN-Coldstart
- SIMS (based on item similarities, based on collaborative filtering)
- personalized ranking (for search results, curated lists)

In [1]:
"""For demo purpose only: select an AWS HRNN"""

# list_recipes_response = personalize.list_recipes()
# recipe_arn = "arn:aws:personalize:::recipe/aws-hrnn"
# list_recipes_response

## Create and Wait for Solution (version)

Creating a solution entails optimizing the model to deliver the best results for a specific business need. Amazon Personalize uses "recipes" to create these personalized solutions. (Altough in this specific case no pre-definied recipe is used.) A _solution version_ is the term Amazon Personalize uses for a trained machine learning model that makes recommendations to customers. 

A solution is created by calling the `CreateSolution` and `CreateSolutionVersion` operations. CreateSolution creates the configuration for training a model. CreateSolutionVersion starts the training process, which results in a specific version of the solution.

[docs](https://docs.aws.amazon.com/personalize/latest/dg/training-deploying-solutions.html)

In [7]:
"""Create solution"""

response = personalize.create_solution(
    name = "recommender-test-solution",
    datasetGroupArn = dataset_group_arn,
    performAutoML = True)

# Get the ARN
solution_arn = response['solutionArn']
print(solution_arn)

'Create solution'

In [49]:
"""Wait for solution to have ACTIVE status"""

max_time = time.time() + 20*60 # 20 minutes
while time.time() < max_time:
    # Use the solution ARN to get the solution status.
    solution_description = personalize.describe_solution(solutionArn = solution_arn)['solution']
    print('Solution status: ' + solution_description['status'])
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

Solution status: ACTIVE


In [8]:
"""Create solution version"""

print ('Creating solution version')
response = personalize.create_solution_version(solutionArn = solution_arn)
solution_version_arn = response['solutionVersionArn']
print('Solution version ARN: ' + solution_version_arn)

Creating solution version


In [3]:
# Save / load solution Version ARN for convenience

# %store solution_version_arn
%store -r solution_version_arn

In [4]:
"""Check solution version status (manually)"""

solution_version_description = personalize.describe_solution_version(
    solutionVersionArn = solution_version_arn)['solutionVersion']
print('Solution version status: ' + solution_version_description['status'])

Solution version status: ACTIVE


### Get type and metrics of solution version

For each metric (not including coverage), higher numbers are better. 
- **coverage:** The proportion of unique recommended items from all queries out of the total number of unique items in the training data (includes both the Items and Interactions datasets). 
- **mean_reciprocal_rank_at_25:** The mean of the reciprocal ranks of the first relevant recommendation out of the top 25 recommendations over all queries. This metric is appropriate if you're interested in the single highest ranked recommendation. 
- **normalized_discounted_cumulative_gain_at_K:** Discounted gain assumes that recommendations lower on a list of recommendations are less relevant than higher recommendations. NDCG is between 0 - 1. This metric rewards relevant items that appear near the top of the list, because the top of a list usually draws more attention. 
- **precision_at_K:** The number of relevant recommendations out of the top K recommendations divided by K. This metric rewards precise recommendation of the relevant items.

[docs](https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html)

In [9]:
# Get recipe type
personalize.describe_solution_version(solutionVersionArn=solution_version_arn) 

**Findings:** Personalize has chose an HRNN, so the user metadata (industry sector) was of no use for the performance.

In [11]:
# Get metrics
get_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = solution_version_arn)

print(json.dumps(get_solution_metrics_response['metrics'], indent=2))

{
  "coverage": 0.1212,
  "mean_reciprocal_rank_at_25": 0.0566,
  "normalized_discounted_cumulative_gain_at_10": 0.0705,
  "normalized_discounted_cumulative_gain_at_25": 0.0779,
  "normalized_discounted_cumulative_gain_at_5": 0.0677,
  "precision_at_10": 0.0085,
  "precision_at_25": 0.0046,
  "precision_at_5": 0.0153
}


## Create and wait for campaign

You create a campaign by deploying a solution version. [docs](https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html
)

In [10]:
"""Create campaign"""

create_campaign_response = personalize.create_campaign(
    name = "recommender-test-campaign",
    solutionVersionArn = solution_version_arn,
    minProvisionedTPS = 1)

campaign_arn = create_campaign_response['campaignArn']
campaign_description = personalize.describe_campaign(campaignArn = campaign_arn)['campaign']
print('Name: ' + campaign_description['name'])
print('ARN: ' + campaign_description['campaignArn'])
print('Status: ' + campaign_description['status'])

'Create campaign'

In [28]:
# Save / load campaign ARN for convenience

# %store campaign_arn
%store -r campaign_arn

In [13]:
"""Wait for campaign to have ACTIVE status"""

max_time = time.time() + 20*60 # 20 minutes
while time.time() < max_time:
    describe_campaign_response = personalize.describe_campaign(
        campaignArn = campaign_arn)
    status = describe_campaign_response["campaign"]["status"]
    print("Campaign: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

Campaign: CREATE PENDING
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: ACTIVE


## Get Recommendations

To get recommendations, call the `GetRecommendations` API. Supply either the user ID or item ID, dependent on the recipe type used to create the solution the campaign is based on. The solution backing the campaign must have been created using a recipe of type USER_PERSONALIZATION or RELATED_ITEMS. That is the case here. For more information, see Using Predefined Recipes.

(Note: If the solution backing the campaign has been created using a recipe of type PERSONALIZED_RANKING, you can instead of getting recommendations get a personalized ranking - a list of recommended items that are re-ranked for a specific user. To get personalized rankings, call the `GetPersonalizedRanking` API.) 

[docs](https://docs.aws.amazon.com/personalize/latest/dg/getting-recommendations.html)

In [12]:
# Prepare dataframe for recommendations to display the artikel name
artikel = artikel_raw[['id', 'name']]
artikel.columns = ['ITEM_ID', 'TITLE']

In [27]:
"""Select a random user-item combination from transaction set"""

user_id, item_id = interactions.iloc[:,:2].sample().values[0]
item_title = artikel.loc[artikel['ITEM_ID'] == item_id].values[0][-1]

print("USER:", user_id)
print("ITEM:",item_title)

# items

USER: 8826031
ITEM: Deckenklips, Kunststoff


In [38]:
"""Get the recommendations"""

get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = 'User ID')

item_list = get_recommendations_response['itemList']
recommendations = pd.DataFrame(
    [artikel.loc[artikel['ITEM_ID'] == item['itemId']].values[0][-1] for item in item_list], 
    columns=['Artikel'], 
    index = np.arange(1,26,1))

print("Recommended items")
display(recommendations)

Recommended items


Unnamed: 0,Artikel
1,Decoupier-Sägeblätter 1.5 X 130 X 0.48mm
2,Hebel-Revolverlochzange 250 mm mit 6 Loc
3,Spannschrauben M5 x14 zu Laubsägebogen S
4,Nachschlüssel SR100 Kaba star für Schlie
5,"Fugen- und Furnierleim COLLANO FL 330, G"
6,"Silberlot cadmiumfrei 1,5 mm Pk. zu 100g"
7,Möbelgleiter Basis Modul ø20mm Kunst.sch
8,Wendehobelmesser HM Länge 82 mm
9,Bandsägeblatt JET JWBS-14 10 x 2560 x 0.
10,Bandsägeblatt 10 x 1875mm gebrauchsferti


---

## Appendix: Delete existing resources if you want to re-run the project

In [19]:
# First things first: Delete existing resources before (re-)running the project
group = 'recommender-test-dataset-group'
set_list = ['INTERACTIONS', 'USERS']
schema_list = ['interactions-schema', 'users-schema']

try:
    for set in set_list:
        personalize.delete_dataset(
            datasetArn="arn:aws:personalize:eu-west-1:873674308518:dataset/{}/{}".format(group, set))
        print('set deleted')
except Exception:
    pass

In [22]:
try:
    personalize.delete_dataset_group(
        datasetGroupArn="arn:aws:personalize:eu-west-1:873674308518:dataset-group/{}".format(group))
    print('group deleted')
except Exception:
    pass  

In [4]:
try:
    for schema in ['interactions-schema', 'users-schema']:
        personalize.delete_schema(
            schemaArn="arn:aws:personalize:eu-west-1:873674308518:schema/{}".format(schema))
        print('schema deleted')
except Exception:
    pass

schema deleted
schema deleted
