# Simple Recommender for B2B-Retail with AWS Personalize

This notebook builds a recommender system with AWS Personalize using the AWS SDK for Python (boto3). The data is the same long-tail B2B retail set as in the "Association Rules Mining" ML project, but this time I don't reduce it to the approx. 3'000 most popular items; I upload the full set.

[Documentation](https://docs.aws.amazon.com/personalize/latest/dg/what-is-personalize.html) for AWS Personalize.

Lessons I learned the hard way:
- In Europe, AWS Personalize is only available in the Ireland region (eu-west-1) at the time of writing; this matters when configuring the AWS CLI.
- The timestamp column in the interactions dataset has to be in integer format (Unix epoch seconds).
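
The timestamp point can be illustrated with a minimal, standard-library sketch (not part of the original workflow, which does the conversion with pandas further down). `to_epoch_seconds` is a hypothetical helper name, and treating naive datetimes as UTC is an assumption of this sketch. For the region point, one option is to pin the region when creating the client, e.g. `boto3.client('personalize', region_name='eu-west-1')`, rather than relying on the AWS CLI configuration alone.

```python
from datetime import datetime, timezone


def to_epoch_seconds(dt):
    """Return a datetime as whole seconds since the Unix epoch.

    Naive datetimes are assumed to be UTC (an assumption of this sketch).
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


# An invoice dated 3 January 2018 becomes an integer timestamp
print(to_epoch_seconds(datetime(2018, 1, 3)))  # 1514937600
```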


**Data Sources:**

- `data/raw/sales_total.csv`: Transaction data ('sales log') for 2017/18, this is the main data file representing the interactions between users and items.
- `data/raw/customers_agg_2018.csv`: (Optional) data containing metadata for the users (their respective business sector).


**Data Output:**

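- `data/interim/interactions.csv`: Prepared interactions data (2018), ready for upload to S3.
- `data/interim/users.csv`: Prepared user metadata, ready for upload to S3.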

**Changes**

- 2019-07-18: Start project



<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Import-libraries,-load-data" data-toc-modified-id="Import-libraries,-load-data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Import libraries, load data</a></span></li><li><span><a href="#Prepare-and-upload-training-data-to-S3-bucket" data-toc-modified-id="Prepare-and-upload-training-data-to-S3-bucket-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Prepare and upload training data to S3 bucket</a></span><ul class="toc-item"><li><span><a href="#Upload-data-to-S3-bucket" data-toc-modified-id="Upload-data-to-S3-bucket-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Upload data to S3 bucket</a></span></li></ul></li><li><span><a href="#Prepare-Data-Structure" data-toc-modified-id="Prepare-Data-Structure-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Prepare Data Structure</a></span><ul class="toc-item"><li><span><a href="#Create-Schemas" data-toc-modified-id="Create-Schemas-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Create Schemas</a></span></li><li><span><a href="#Create-(and-wait-for)-Dataset-Group" data-toc-modified-id="Create-(and-wait-for)-Dataset-Group-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Create (and wait for) Dataset Group</a></span></li><li><span><a href="#Create-Datasets" data-toc-modified-id="Create-Datasets-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Create Datasets</a></span></li></ul></li><li><span><a href="#Prepare,-create,-and-wait-for-Dataset-Import-Job" data-toc-modified-id="Prepare,-create,-and-wait-for-Dataset-Import-Job-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Prepare, create, and wait for Dataset Import Job</a></span></li><li><span><a href="#Select-a-Recipe-(for-demo-only)" data-toc-modified-id="Select-a-Recipe-(for-demo-only)-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Select a Recipe (for demo only)</a></span></li><li><span><a href="#Create-and-Wait-for-Solution-(version)" 
data-toc-modified-id="Create-and-Wait-for-Solution-(version)-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Create and Wait for Solution (version)</a></span></li></ul></div>

---

## Import libraries, load data

In [1]:
# Import libraries, get personalize boto3 client
import numpy as np
import pandas as pd
import json
import time

import boto3
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

# Display settings
from IPython.display import display
pd.options.display.max_columns = 100

In [2]:
# First things first: delete all existing resources before (re-)running the project

group = 'recommender-test-dataset-group'
set_list = ['INTERACTIONS', 'USERS']
schema_list = ['interactions-schema', 'users-schema']

# Datasets have to be deleted before their dataset group
try:
    for dataset_type in set_list:  # avoid shadowing the built-in `set`
        personalize.delete_dataset(
            datasetArn="arn:aws:personalize:eu-west-1:873674308518:dataset/{}/{}".format(group, dataset_type))
        print('set deleted')
except Exception:  # resource may not exist (yet), skip silently
    pass

try:
    personalize.delete_dataset_group(
        datasetGroupArn="arn:aws:personalize:eu-west-1:873674308518:dataset-group/{}".format(group))
    print('group deleted')
except Exception:
    pass

try:
    for schema in schema_list:
        personalize.delete_schema(
            schemaArn="arn:aws:personalize:eu-west-1:873674308518:schema/{}".format(schema))
        print('schema deleted')
except Exception:
    pass

# Give the asynchronous deletions some time to complete
time.sleep(30)

group deleted
schema deleted


In [3]:
# Load data
interactions_raw = pd.read_csv('data/raw/sales_total.csv', parse_dates=['Fakturadatum'])
users_raw = pd.read_csv('data/raw/customers_agg_2018.csv')

## Prepare and upload training data to S3 bucket

See the documentation for details. As we have no relevant metadata for the items, we prepare two datasets:

- interactions
- users

In [4]:
"""Prepare interaction data"""

# Subset data for 2018 only (.copy() avoids pandas' SettingWithCopyWarning below)
interactions_18_full = interactions_raw.loc[interactions_raw['Fakturadatum'].dt.year == 2018]
interactions_18_part = interactions_18_full[['Kunde', 'Artikel', 'Fakturadatum', 'Nettowert']].copy()

# Kick out all articles whose code contains non-numeric characters
interactions_18_part['num'] = pd.to_numeric(interactions_18_part['Artikel'], errors='coerce')
interactions_18 = interactions_18_part.dropna(how='any').drop(columns=['num'])

# Kick out special customers
interactions = interactions_18.loc[interactions_18['Kunde'] > 700000].copy()

# Set datatypes (TIMESTAMP has to be an integer Unix timestamp)
interactions['Kunde'] = interactions['Kunde'].astype(str)
interactions['Artikel'] = interactions['Artikel'].astype(str)
interactions['Fakturadatum'] = interactions['Fakturadatum'].apply(lambda x: x.timestamp()).astype(int)
interactions['Nettowert'] = interactions['Nettowert'].astype(float)

# Rename columns to the names expected by Personalize
interactions = interactions.rename(columns={'Kunde': 'USER_ID',
                                            'Artikel': 'ITEM_ID',
                                            'Fakturadatum': 'TIMESTAMP',
                                            'Nettowert': 'EVENT_VALUE',
                                           })



In [5]:
# Check results
assert interactions.isnull().sum().sum() == 0
print(interactions.shape)
display(interactions.head(2))

(1402641, 4)


| | USER_ID | ITEM_ID | TIMESTAMP | EVENT_VALUE |
|---|---|---|---|---|
| 1388625 | 8488019 | 5171607 | 1514937600 | 77.3 |
| 1388626 | 8488019 | 5171101 | 1514937600 | 32.0 |


In [8]:
# Save to CSV
interactions.to_csv("data/interim/interactions.csv", index=False)

In [6]:
"""Prepare User data"""

users = users_raw[['Unnamed: 0', 'Branche']]
users = users.rename(columns={'Unnamed: 0': 'USER_ID', 
                              'Branche': 'BRANCHE',
                             })

# Set datatypes
users['USER_ID'] = users['USER_ID'].astype(str)
users['BRANCHE'] = users['BRANCHE'].astype(str)

# Check results
print(users.shape)
display(users.head(2))

(18625, 2)


| | USER_ID | BRANCHE |
|---|---|---|
| 0 | 8107232 | 15.0 |
| 1 | 8155006 | 10.0 |


In [7]:
# Save to CSV
users.to_csv("data/interim/users.csv", index=False)

### Upload data to S3 bucket

In [9]:
# Retrieve the list of existing buckets (optional)
s3 = boto3.client('s3')
response = s3.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])

rbuerki-01-personalize


In [10]:
"""Specify an S3 bucket and attach a policy to it"""

bucket = "rbuerki-01-personalize"  # name of my S3 bucket
policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket),
                "arn:aws:s3:::{}/*".format(bucket)
            ]
        }
    ]
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

{'ResponseMetadata': {'RequestId': '3350DA265FA1CEB4',
  'HostId': 'JBwGOMbCtCpluYN0M0/K3cAxEcIjH4nyVsZmJVsEoYX5bkFjRpFKVLrCncGBesMXlW6PQHk5ah4=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': 'JBwGOMbCtCpluYN0M0/K3cAxEcIjH4nyVsZmJVsEoYX5bkFjRpFKVLrCncGBesMXlW6PQHk5ah4=',
   'x-amz-request-id': '3350DA265FA1CEB4',
   'date': 'Tue, 23 Jul 2019 03:23:29 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 1}}

In [11]:
"""Upload interactions data"""

filename_i = 'interactions.csv' 
# boto3.Session().resource('s3').Bucket(bucket).Object(
#     filename_i).upload_file("data/interim/{}".format(filename_i))

In [12]:
"""Upload user data"""

filename_u = 'users.csv'
# boto3.Session().resource('s3').Bucket(bucket).Object(
#     filename_u).upload_file("data/interim/{}".format(filename_u))

## Prepare Data Structure

### Create Schemas

Schemas in Amazon Personalize are defined in the Avro format. For more information, see [Apache Avro](https://avro.apache.org/docs/current/). The schema fields may be listed in any order, as long as that order matches the order of the corresponding column headers in the data file to be imported.
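
Since a mismatch between schema and file only surfaces once the import job runs, a quick local check can save a round trip. The following is a minimal sketch, assuming a small CSV held in memory; `header_matches_schema`, `example_schema` and `example_csv` are hypothetical names introduced for illustration and are not part of the Personalize API.

```python
import csv
import io


def header_matches_schema(csv_text, schema):
    """Check that the CSV header equals the schema's field names, in order."""
    header = next(csv.reader(io.StringIO(csv_text)))
    expected = [field["name"] for field in schema["fields"]]
    return header == expected


# Hypothetical miniature schema and data, for illustration only
example_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}
example_csv = "USER_ID,ITEM_ID,TIMESTAMP\n8488019,5171607,1514937600\n"

print(header_matches_schema(example_csv, example_schema))  # True
```

For a file on disk, the same idea applies by reading only the first line of the CSV before uploading it.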

In [13]:
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        },
        {
            "name": "EVENT_VALUE",
            "type": "float"
        }
    ],
    "version": "1.0"
}

In [14]:
create_schema_response = personalize.create_schema(
    name = "interactions-schema",
    schema = json.dumps(interactions_schema)
)

interactions_schema_arn = create_schema_response['schemaArn']
print(json.dumps(create_schema_response, indent=2))

{
  "schemaArn": "arn:aws:personalize:eu-west-1:873674308518:schema/interactions-schema",
  "ResponseMetadata": {
    "RequestId": "1f74fbc7-3096-447a-aa12-085bbbf446ad",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Tue, 23 Jul 2019 03:23:34 GMT",
      "x-amzn-requestid": "1f74fbc7-3096-447a-aa12-085bbbf446ad",
      "content-length": "85",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In [15]:
users_schema = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "BRANCHE",
            "type": "string",
            "categorical": True
        }
    ],
    "version": "1.0"
}

In [16]:
create_schema_response = personalize.create_schema(
    name = "users-schema",
    schema = json.dumps(users_schema)
)

users_schema_arn = create_schema_response['schemaArn']
print(json.dumps(create_schema_response, indent=2))

{
  "schemaArn": "arn:aws:personalize:eu-west-1:873674308518:schema/users-schema",
  "ResponseMetadata": {
    "RequestId": "1f9c457f-6a7e-4716-ac6e-b69002ff78c0",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Tue, 23 Jul 2019 03:23:35 GMT",
      "x-amzn-requestid": "1f9c457f-6a7e-4716-ac6e-b69002ff78c0",
      "content-length": "78",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


### Create (and wait for) Dataset Group

In [17]:
create_dataset_group_response = personalize.create_dataset_group(
    name = "recommender-test-dataset-group"
)

dataset_group_arn = create_dataset_group_response['datasetGroupArn']
print(json.dumps(create_dataset_group_response, indent=2))

{
  "datasetGroupArn": "arn:aws:personalize:eu-west-1:873674308518:dataset-group/recommender-test-dataset-group",
  "ResponseMetadata": {
    "RequestId": "df62824c-dec3-430b-8553-0385642a91e9",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Tue, 23 Jul 2019 03:23:37 GMT",
      "x-amzn-requestid": "df62824c-dec3-430b-8553-0385642a91e9",
      "content-length": "109",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In [18]:
"""Wait for Dataset Group to have ACTIVE status"""

max_time = time.time() + 3*60 # 3 minutes
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = dataset_group_arn
    )
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(30)

DatasetGroup: CREATE PENDING
DatasetGroup: ACTIVE


### Create Datasets

In [19]:
dataset_type = "INTERACTIONS"
create_dataset_response = personalize.create_dataset(
    name = "recommender-test-interactions",
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = interactions_schema_arn
)

dataset_arn_i = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

{
  "datasetArn": "arn:aws:personalize:eu-west-1:873674308518:dataset/recommender-test-dataset-group/INTERACTIONS",
  "ResponseMetadata": {
    "RequestId": "e06c5ca8-08c8-412e-a02d-14880c856a05",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Tue, 23 Jul 2019 03:24:14 GMT",
      "x-amzn-requestid": "e06c5ca8-08c8-412e-a02d-14880c856a05",
      "content-length": "111",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In [20]:
dataset_type = "USERS"
create_dataset_response = personalize.create_dataset(
    name = "recommender-test-users",
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = users_schema_arn
)

dataset_arn_u = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

{
  "datasetArn": "arn:aws:personalize:eu-west-1:873674308518:dataset/recommender-test-dataset-group/USERS",
  "ResponseMetadata": {
    "RequestId": "12ec0533-eacf-491c-8749-3fb265df205d",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Tue, 23 Jul 2019 03:24:16 GMT",
      "x-amzn-requestid": "12ec0533-eacf-491c-8749-3fb265df205d",
      "content-length": "104",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


## Prepare, create, and wait for Dataset Import Job

Because I had already set up a Personalize role beforehand (see documentation), the first code cell is commented out and I simply set the role ARN in the second code cell.

In [21]:
# """Create Personalize role"""

# iam = boto3.client("iam")

# role_name = "PersonalizeRole"
# assume_role_policy_document = {
#     "Version": "2012-10-17",
#     "Statement": [
#         {
#           "Effect": "Allow",
#           "Principal": {
#             "Service": "personalize.amazonaws.com"
#           },
#           "Action": "sts:AssumeRole"
#         }
#     ]
# }

# create_role_response = iam.create_role(
#     RoleName = role_name,
#     AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
# )

# # AmazonPersonalizeFullAccess provides access to any S3 bucket with a name that includes "personalize" or "Personalize" 
# # if you would like to use a bucket with a different name, please consider creating and attaching a new policy
# # that provides read access to your bucket or attaching the AmazonS3ReadOnlyAccess policy to the role
# policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess"
# iam.attach_role_policy(
#     RoleName = role_name,
#     PolicyArn = policy_arn
# )

# time.sleep(60) # wait for a minute to allow IAM role policy attachment to propagate

# role_arn = create_role_response["Role"]["Arn"]
# print(role_arn)

In [22]:
role_arn = "arn:aws:iam::873674308518:role/PersonalizeRole"

In [23]:
"""Create dataset import job for interactions"""

create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "interactions-dataset-import-job",
    datasetArn = dataset_arn_i,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, filename_i)
    },
    roleArn = role_arn
)

dataset_import_job_arn_i = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))

{
  "datasetImportJobArn": "arn:aws:personalize:eu-west-1:873674308518:dataset-import-job/interactions-dataset-import-job",
  "ResponseMetadata": {
    "RequestId": "358063ac-614b-4985-a10e-8f9e99653867",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Tue, 23 Jul 2019 03:24:22 GMT",
      "x-amzn-requestid": "358063ac-614b-4985-a10e-8f9e99653867",
      "content-length": "119",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In [24]:
"""Wait for Dataset Import Job to Have ACTIVE Status"""

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_import_job_arn_i
    )
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print("DatasetImportJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: ACTIVE


In [25]:
"""Create dataset import job for users"""

create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "users-dataset-import-job",
    datasetArn = dataset_arn_u,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, filename_u)
    },
    roleArn = role_arn
)

dataset_import_job_arn_u = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))

{
  "datasetImportJobArn": "arn:aws:personalize:eu-west-1:873674308518:dataset-import-job/users-dataset-import-job",
  "ResponseMetadata": {
    "RequestId": "f8ad25c3-c03c-472f-baae-219227e9490f",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Tue, 23 Jul 2019 03:28:42 GMT",
      "x-amzn-requestid": "f8ad25c3-c03c-472f-baae-219227e9490f",
      "content-length": "112",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In [26]:
"""Wait for Dataset Import Job to Have ACTIVE Status"""

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_import_job_arn_u
    )
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print("DatasetImportJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetImportJob: CREATE PENDING
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: ACTIVE


## Select a Recipe (for demo only)

A _recipe_ in Amazon Personalize is made up of an algorithm with hyperparameters, and a feature transformation. Amazon Personalize provides a number of predefined recipes that allow you to make recommendations with no knowledge of machine learning. The predefined recipes are also useful for quick experimentation.

(To customize the training, supply the `solutionConfig` parameter. The `SolutionConfig` object allows you to override the default solution and recipe parameters. This is not done here.)

**NOTE:** _For this case I won't use a predefined recipe; I will let Personalize choose the optimal algorithm in the next step by calling `CreateSolution` (`create_solution` in boto3) with the parameter `performAutoML=True`. Therefore the next code block is commented out. See the demo notebook for the use of a predefined recipe._

[docs](https://docs.aws.amazon.com/personalize/latest/dg/training-deploying-solutions.html)
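
The choice between a predefined recipe and AutoML comes down to which arguments are passed to `create_solution`. The sketch below only builds the keyword arguments and makes no AWS call; `build_solution_request` is a hypothetical helper, and the dataset group ARN with account id `123456789012` is a placeholder, not a real resource.

```python
def build_solution_request(name, dataset_group_arn, recipe_arn=None):
    """Build keyword arguments for `personalize.create_solution`.

    With a recipe ARN, that recipe is requested; without one, AutoML is
    requested and no recipe ARN is passed.
    """
    request = {"name": name, "datasetGroupArn": dataset_group_arn}
    if recipe_arn is None:
        request["performAutoML"] = True
    else:
        request["recipeArn"] = recipe_arn
    return request


# Placeholder ARN for illustration only
example_group_arn = "arn:aws:personalize:eu-west-1:123456789012:dataset-group/example"

print(build_solution_request("example-solution", example_group_arn))
print(build_solution_request("example-solution", example_group_arn,
                             recipe_arn="arn:aws:personalize:::recipe/aws-hrnn"))
```

The resulting dictionary could then be passed on as `personalize.create_solution(**request)`.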

In [1]:
"""For demo purpose only: select an aws-hrnn"""

# list_recipes_response = personalize.list_recipes()
# recipe_arn = "arn:aws:personalize:::recipe/aws-hrnn"
# list_recipes_response

## Create and Wait for Solution (version)

Creating a solution entails optimizing the model to deliver the best results for a specific business need. Amazon Personalize uses "recipes" to create these personalized solutions. (Although in this specific case no predefined recipe is used.) A _solution version_ is the term Amazon Personalize uses for a trained machine learning model that makes recommendations to customers.

A solution is created by calling the `CreateSolution` and `CreateSolutionVersion` operations. `CreateSolution` creates the configuration for training a model. `CreateSolutionVersion` starts the training process, which results in a specific version of the solution.

[docs](https://docs.aws.amazon.com/personalize/latest/dg/training-deploying-solutions.html)
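
Both operations are asynchronous, so each call is followed by polling until the resource reaches a terminal status. As a minimal sketch of that pattern, the hypothetical helper `wait_for_status` below takes any zero-argument callable that returns the current status string; the `fake_statuses` iterator stands in for the real describe calls so the example runs without AWS access.

```python
import time


def wait_for_status(describe, max_seconds=3 * 60 * 60, poll_seconds=60, sleep=time.sleep):
    """Poll `describe()` until it returns a terminal status or time runs out.

    `describe` is any zero-argument callable returning the current status.
    Returns the last status seen.
    """
    deadline = time.time() + max_seconds
    status = describe()
    while status not in ("ACTIVE", "CREATE FAILED") and time.time() < deadline:
        sleep(poll_seconds)
        status = describe()
    return status


# Stand-in for a describe call, so the sketch runs without AWS access
fake_statuses = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
print(wait_for_status(lambda: next(fake_statuses), sleep=lambda seconds: None))  # ACTIVE
```

With a real client, `describe` could be something like `lambda: personalize.describe_solution_version(solutionVersionArn=solution_version_arn)['solutionVersion']['status']`.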

In [None]:
"""Create solution"""

print('Creating solution')
response = personalize.create_solution(
    name = "recommender-test-solution",
    datasetGroupArn = dataset_group_arn,  # the full ARN is required, not just the group name
    performAutoML = True)

# Get the solution ARN.
solution_arn = response['solutionArn']
print('Solution ARN: ' + solution_arn)

In [None]:
"""Wait for solution to have ACTIVE status"""

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    # Use the solution ARN to get the solution status.
    solution_description = personalize.describe_solution(solutionArn = solution_arn)['solution']
    status = solution_description['status']  # refresh status on every iteration
    print('Solution status: ' + status)

    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

In [None]:
"""Create solution version"""

# Use the solution ARN to create a solution version.
print('Creating solution version')
response = personalize.create_solution_version(solutionArn = solution_arn)
solution_version_arn = response['solutionVersionArn']
print('Solution version ARN: ' + solution_version_arn)

In [None]:
# Save / load solution Version ARN for convenience

%store solution_version_arn
# %store -r solution_version_arn

In [None]:
"""Check solution version status (manually)"""

# Use the solution version ARN to get the solution version status.
solution_version_description = personalize.describe_solution_version(
    solutionVersionArn = solution_version_arn)['solutionVersion']
print('Solution version status: ' + solution_version_description['status'])

- Please note in the initial comments that we use the HRNN algorithm.
- Generally update the comments with an overview of what exactly we are doing (boto3 vs. console vs. command prompt).
- For me: check what else would be possible to build a more complex model in the end.