# Amazon Personalize Workshop

## Part 1 - Build your campaign
> This notebook walks you through the steps to build a movie recommendation model from the MovieLens dataset. The goal is to recommend movies that are relevant to a particular user.

### How to Use the Notebook

Code is broken up into cells like the one below. Click the triangular `Run` button at the top of this page to execute a cell and move on to the next, or press `Shift` + `Enter` while in the cell to do the same.

While a cell is executing, the marker beside it shows an `*`. Once all the code in the cell has finished executing, the `*` is replaced by a number indicating the order in which the cell completed.


Simply follow the instructions below and execute the cells to get started with Amazon Personalize.

### Imports 

Python ships with a broad collection of libraries. We need to import some of those, along with installed libraries such as [boto3](https://aws.amazon.com/sdk-for-python/) (the AWS SDK for Python) and [Pandas](https://pandas.pydata.org/)/[NumPy](https://numpy.org/), which are core data science tools.

In [None]:
# Imports
import boto3
import json
import numpy as np
import pandas as pd
import time
!conda install -y -c conda-forge unzip

Collecting package metadata (current_repodata.json): done
Solving environment: | 
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - conda-forge/noarch::seaborn-base==0.11.1=pyhd8ed1ab_1
  - conda-forge/noarch::nbclassic==0.2.6=pyhd8ed1ab_0
  - conda-forge/linux-64::blaze==0.11.3=py36_0
  - conda-forge/linux-64::matplotlib==3.3.4=py36h5fab9bb_0
  - defaults/linux-64::_anaconda_depends==5.1.0=py36_2
  - conda-forge/noarch::jupyterlab==3.0.9=pyhd8ed1ab_0
  - conda-forge/noarch::python-language-server==0.36.2=pyhd8ed1ab_0
  - conda-forge/noarch::jupyterlab_server==2.3.0=pyhd8ed1ab_0
  - conda-forge/noarch::pyls-black==0.4.6=pyh9f0ad1d_0
  - conda-forge/linux-64::scikit-image==0.16.2=py36hb3f55d8_0
  - conda-forge/noarch::black==20.8b1=py_1
  - conda-forge/linux-64::anyio==2.1.0=py36h5fab9bb_0
  - conda-forge/linux-64::jupyter_server==1.4.1=py36h5fab9bb_0
  - conda-forge/linux-64::bokeh==2.2.3=py36h5fab9bb_0
 

Next, validate that your environment can communicate successfully with Amazon Personalize. The lines below do just that.

In [None]:
# Configure the SDK to Personalize:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

In [None]:
aws_account_number = boto3.client('sts').get_caller_identity().get('Account')
aws_account_number

'284105231590'

### Configure the data

Data is imported into Amazon Personalize through Amazon S3. Below we will specify a bucket that you have created within AWS for the purposes of this exercise.

Update the `bucket` variable below to the name of the bucket you created earlier in the CloudFormation steps; this should be in a text file from your earlier work. The `filename` does not need to be changed.

#### Specify a Bucket and Data Output Location
Be sure to update the `bucket` value if you customized it during deployment of the CloudFormation template. Click in the cell below to make changes.

In [None]:
bucket = "personalizedemobucket202106200728"       # replace with the name of your S3 bucket
filename = "movie-lens-100k.csv"

#### Download, Prepare, and Upload Training Data

The MovieLens data is not yet available locally for examination. Execute the lines below to download the latest copy and take a quick look at it.

##### Download and Explore the Dataset

In [None]:
!wget -N http://files.grouplens.org/datasets/movielens/ml-100k.zip
!unzip -o ml-100k.zip
data = pd.read_csv('./ml-100k/u.data', sep='\t', names=['USER_ID', 'ITEM_ID', 'RATING', 'TIMESTAMP'])
pd.set_option('display.max_rows', 5)
data

--2021-06-20 02:18:52--  http://files.grouplens.org/datasets/movielens/ml-100k.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4924029 (4.7M) [application/zip]
Saving to: ‘ml-100k.zip’


2021-06-20 02:18:53 (13.0 MB/s) - ‘ml-100k.zip’ saved [4924029/4924029]

Archive:  ml-100k.zip
   creating: ml-100k/
  inflating: ml-100k/allbut.pl       
  inflating: ml-100k/mku.sh          
  inflating: ml-100k/README          
  inflating: ml-100k/u.data          
  inflating: ml-100k/u.genre         
  inflating: ml-100k/u.info          
  inflating: ml-100k/u.item          
  inflating: ml-100k/u.occupation    
  inflating: ml-100k/u.user          
  inflating: ml-100k/u1.base         
  inflating: ml-100k/u1.test         
  inflating: ml-100k/u2.base         
  inflating: ml-100k/u2.test         
  inflating: ml-100k/u3.base    

Unnamed: 0,USER_ID,ITEM_ID,RATING,TIMESTAMP
0,196,242,3,881250949
1,186,302,3,891717742
...,...,...,...,...
99998,13,225,2,882399156
99999,12,203,3,879959583


##### Prepare and Upload Data

As you can see, the data contains a user ID, item ID, rating, and timestamp.

We will now remove the interactions with low ratings (3 or below) and then drop the RATING column before building the model.

Once that is done, we save the result as a new CSV file and upload it to S3.

All of that is done by executing the lines in the cell below.

In [None]:
data = data[data['RATING'] > 3]                # Keep only movies rated higher than 3 out of 5.
data = data[['USER_ID', 'ITEM_ID', 'TIMESTAMP']] # select columns that match the columns in the schema below
data.to_csv(filename, index=False)
boto3.Session().resource('s3').Bucket(bucket).Object(filename).upload_file(filename)
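
The filter-and-project step above can also be sketched without pandas or AWS access, which makes it easy to see what ends up in the CSV. This is a minimal illustration only: the sample rows, `prepare_interactions`, and `to_csv_text` are made up for this sketch and are not part of the workshop code.

```python
import csv
import io

# Made-up rows in the tab-separated layout of ml-100k/u.data:
# user id, item id, rating, timestamp.
SAMPLE_U_DATA = (
    "1\t10\t3\t881250949\n"
    "2\t20\t5\t891717742\n"
    "3\t30\t1\t878887116\n"
    "4\t40\t4\t880606923\n"
)

def prepare_interactions(raw_text, rating_threshold=3):
    """Keep rows rated above rating_threshold and drop the rating column."""
    rows = []
    reader = csv.reader(io.StringIO(raw_text), delimiter="\t")
    for user_id, item_id, rating, timestamp in reader:
        if int(rating) > rating_threshold:
            rows.append({"USER_ID": user_id, "ITEM_ID": item_id, "TIMESTAMP": timestamp})
    return rows

def to_csv_text(rows):
    """Serialize rows with the three-column header used for the upload."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["USER_ID", "ITEM_ID", "TIMESTAMP"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

interactions = prepare_interactions(SAMPLE_U_DATA)
csv_text = to_csv_text(interactions)
print(csv_text)
```

Only the rows rated 4 and 5 survive, and the rating itself is no longer present in the output, which mirrors what the pandas cell writes to `filename`.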

#### Create Schema

A core component of how Personalize understands your data comes from the Schema that is defined below. This configuration tells the service how to digest the data provided via your CSV file. Note the columns and types align to what was in the file you created above.

In [None]:
schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        }
    ],
    "version": "1.0"
}

try:
    create_schema_response = personalize.create_schema(
        name = "personalize-demo-schema",
        schema = json.dumps(schema)
    )
    schema_arn = create_schema_response['schemaArn']
    print(json.dumps(create_schema_response, indent=2))
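except personalize.exceptions.ResourceAlreadyExistsException:
    # The schema exists from an earlier run; rebuild its ARN (this assumes the us-east-1 region).
    schema_arn = "arn:aws:personalize:us-east-1:{}:schema/personalize-demo-schema".format(aws_account_number)
    print("Resource already exist with ARN :{}".format(schema_arn))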

Resource already exist with ARN :arn:aws:personalize:us-east-1:284105231590:schema/personalize-demo-schema
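
Because the schema fields must line up with the CSV columns, a quick local check before creating the dataset can catch a mismatch early. The sketch below is one possible way to do that; `missing_schema_columns` is a hypothetical helper written for this example, not a Personalize API.

```python
# Same field layout as the Interactions schema defined above.
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}

def missing_schema_columns(csv_header_line, schema):
    """Return the schema field names that are absent from the CSV header line."""
    header = [column.strip() for column in csv_header_line.split(",")]
    expected = [field["name"] for field in schema["fields"]]
    return [name for name in expected if name not in header]

# Header written by the upload cell: nothing is missing.
print(missing_schema_columns("USER_ID,ITEM_ID,TIMESTAMP", interactions_schema))
# A header that forgot the timestamp column: TIMESTAMP is reported.
print(missing_schema_columns("USER_ID,ITEM_ID,RATING", interactions_schema))
```

In the notebook you could pass the first line of the file named by `filename` instead of a literal string.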


#### Create and Wait for Dataset Group

The largest grouping in Personalize is a Dataset Group. It isolates your data, event trackers, solutions, and campaigns, grouping together things that share a common collection of data. Feel free to alter the name below if you'd like.

##### Create Dataset Group

In [None]:
try:
    create_dataset_group_response = personalize.create_dataset_group(
        name = "personalize-launch-demo"
    )
    dataset_group_arn = create_dataset_group_response['datasetGroupArn']
    print(json.dumps(create_dataset_group_response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    # The group exists from an earlier run; rebuild its ARN (this assumes the us-east-1 region).
    dataset_group_arn = "arn:aws:personalize:us-east-1:{}:dataset-group/personalize-launch-demo".format(aws_account_number)
    print("Resource already exist with ARN :{}".format(dataset_group_arn))

{
  "datasetGroupArn": "arn:aws:personalize:us-east-1:284105231590:dataset-group/personalize-launch-demo",
  "ResponseMetadata": {
    "RequestId": "998c048b-22ba-42fc-bb58-c5c6d8c390b6",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Sun, 20 Jun 2021 02:43:40 GMT",
      "x-amzn-requestid": "998c048b-22ba-42fc-bb58-c5c6d8c390b6",
      "content-length": "102",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


##### Wait for Dataset Group to Have ACTIVE Status

Before we can use the Dataset Group in any of the steps below, it must be active. Execute the cell below and wait for it to show ACTIVE.

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = dataset_group_arn
    )
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetGroup: ACTIVE
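
The same polling pattern appears several times in this notebook, so it can be factored into one helper. The sketch below is an assumption about how you might do that, not part of the workshop: `wait_for_status` is a hypothetical name, and unlike the loop above it raises `TimeoutError` when the deadline passes instead of ending silently. A fake status source stands in for the `describe_*` call so the example runs without AWS access.

```python
import time

def wait_for_status(get_status, label, timeout_seconds=3 * 60 * 60,
                    poll_seconds=60, sleep=time.sleep, clock=time.time):
    """Poll get_status() until it returns ACTIVE or CREATE FAILED, or time runs out."""
    deadline = clock() + timeout_seconds
    status = None
    while clock() < deadline:
        status = get_status()
        print("{}: {}".format(label, status))
        if status in ("ACTIVE", "CREATE FAILED"):
            return status
        sleep(poll_seconds)
    raise TimeoutError("{} did not finish; last status: {}".format(label, status))

# Stand-in for a describe_* call; sleeping is disabled so the demo finishes at once.
fake_statuses = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
final_status = wait_for_status(lambda: next(fake_statuses), "DatasetGroup",
                               sleep=lambda seconds: None)
```

With the real client, `get_status` could be `lambda: personalize.describe_dataset_group(datasetGroupArn=dataset_group_arn)["datasetGroup"]["status"]`, leaving `sleep` and `clock` at their defaults.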


##### Create Dataset

After the group, the next thing to create is the actual dataset. In this example we will create only one, for the interactions data. Execute the cell below to create it.

In [None]:
dataset_type = "INTERACTIONS"
create_dataset_response = personalize.create_dataset(
    name = "personalize-launch-interactions",
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = schema_arn
)

dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

{
  "datasetArn": "arn:aws:personalize:us-east-1:284105231590:dataset/personalize-launch-demo/INTERACTIONS",
  "ResponseMetadata": {
    "RequestId": "7a6533d3-4358-4040-aa09-ce251236d3ff",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Sun, 20 Jun 2021 02:45:56 GMT",
      "x-amzn-requestid": "7a6533d3-4358-4040-aa09-ce251236d3ff",
      "content-length": "104",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


##### Attach Policy to S3 Bucket

Amazon Personalize needs to be able to read the contents of the S3 bucket that you created earlier. The lines below attach a bucket policy that grants that access.

In [None]:
s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket),
                "arn:aws:s3:::{}/*".format(bucket)
            ]
        }
    ]
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

{'ResponseMetadata': {'RequestId': 'V0GWAVB6A7MYM4H3',
  'HostId': 'JKRdfmw/tDRvtC3jhfRi62TLCltQTxUUnTV8gQOzkf1cs+IGsxyvc06+wB4g92NUUnORjZEXWNs=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': 'JKRdfmw/tDRvtC3jhfRi62TLCltQTxUUnTV8gQOzkf1cs+IGsxyvc06+wB4g92NUUnORjZEXWNs=',
   'x-amz-request-id': 'V0GWAVB6A7MYM4H3',
   'date': 'Sun, 20 Jun 2021 02:46:28 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}

##### Create Personalize Role

Amazon Personalize also needs the ability to assume roles in AWS in order to have the permissions to execute certain tasks. The lines below create such a role and attach the required policies.

In [None]:
iam = boto3.client("iam")

role_name = "PersonalizeRoleDemo"
assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "personalize.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
    ]
}

create_role_response = iam.create_role(
    RoleName = role_name,
    AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
)

# AmazonPersonalizeFullAccess provides access to any S3 bucket with a name that includes "personalize" or "Personalize" 
# if you would like to use a bucket with a different name, please consider creating and attaching a new policy
# that provides read access to your bucket or attaching the AmazonS3ReadOnlyAccess policy to the role
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess"
iam.attach_role_policy(
    RoleName = role_name,
    PolicyArn = policy_arn
)

# Now add S3 support
iam.attach_role_policy(
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess',
    RoleName=role_name
)
time.sleep(60) # wait for a minute to allow IAM role policy attachment to propagate

role_arn = create_role_response["Role"]["Arn"]
print(role_arn)

arn:aws:iam::284105231590:role/PersonalizeRoleDemo


### Import the data

Earlier you created the Dataset Group and Dataset to house your information. Now you will execute an import job that loads the data from S3 into Amazon Personalize for use in building your model.

##### Create Dataset Import Job

In [None]:
create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "personalize-demo-import1",
    datasetArn = dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, filename)
    },
    roleArn = role_arn
)

dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))

{
  "datasetImportJobArn": "arn:aws:personalize:us-east-1:284105231590:dataset-import-job/personalize-demo-import1",
  "ResponseMetadata": {
    "RequestId": "f766900d-2d98-4791-8788-3b4e4153e932",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Sun, 20 Jun 2021 02:49:04 GMT",
      "x-amzn-requestid": "f766900d-2d98-4791-8788-3b4e4153e932",
      "content-length": "112",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


##### Wait for Dataset Import Job to Have ACTIVE Status

It can take a while before the import job completes. Please wait until you see that it is ACTIVE below.

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_import_job_arn
    )
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print("DatasetImportJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetImportJob: CREATE PENDING
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: ACTIVE


### Create the Solution and Version

In Amazon Personalize a trained model is called a Solution. Each Solution can have many specific versions, each reflecting the data that was available when the model was trained.

To begin, we will list all the recipes that are supported. A recipe is an algorithm that has not been trained on your data yet. After listing them, you'll select one and use it to build your model.

#### Select Recipe

In [None]:
list_recipes_response = personalize.list_recipes()
list_recipes_response

{'recipes': [{'name': 'aws-hrnn',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-hrnn',
   'status': 'ACTIVE',
   'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()),
   'lastUpdatedDateTime': datetime.datetime(2021, 2, 6, 19, 6, 40, 447000, tzinfo=tzlocal())},
  {'name': 'aws-hrnn-coldstart',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-hrnn-coldstart',
   'status': 'ACTIVE',
   'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()),
   'lastUpdatedDateTime': datetime.datetime(2021, 2, 6, 19, 6, 40, 447000, tzinfo=tzlocal())},
  {'name': 'aws-hrnn-metadata',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-hrnn-metadata',
   'status': 'ACTIVE',
   'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()),
   'lastUpdatedDateTime': datetime.datetime(2021, 2, 6, 19, 6, 40, 447000, tzinfo=tzlocal())},
  {'name': 'aws-personalized-ranking',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-personalized-ranking',
   'stat

##### User Personalization
The [User-Personalization](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-new-item-USER_PERSONALIZATION.html) (aws-user-personalization) recipe is optimized for all USER_PERSONALIZATION recommendation scenarios. When recommending items, it uses automatic item exploration.

With automatic exploration, Amazon Personalize automatically tests different item recommendations, learns from how users interact with these recommended items, and boosts recommendations for items that drive better engagement and conversion. This improves item discovery and engagement when you have a fast-changing catalog, or when new items, such as news articles or promotions, are more relevant to users when fresh.

You can balance how much to explore (where items with less interactions data or relevance are recommended more frequently) against how much to exploit (where recommendations are based on what we know or relevance). Amazon Personalize automatically adjusts future recommendations based on implicit user feedback.

First, select the recipe by finding the ARN in the list of recipes above.

In [None]:
recipe_arn = "arn:aws:personalize:::recipe/aws-user-personalization" # aws-user-personalization selected for demo purposes
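
Instead of copying the ARN by hand, you could look it up by recipe name in the `list_recipes` response. The sketch below uses a trimmed, hand-written dictionary in the shape of that response so it runs offline; `find_recipe_arn` is a hypothetical helper, and with the real API the list may span more than one page of results.

```python
# Hand-written stand-in for the shape of a list_recipes() response.
sample_list_recipes_response = {
    "recipes": [
        {
            "name": "aws-hrnn",
            "recipeArn": "arn:aws:personalize:::recipe/aws-hrnn",
            "status": "ACTIVE",
        },
        {
            "name": "aws-user-personalization",
            "recipeArn": "arn:aws:personalize:::recipe/aws-user-personalization",
            "status": "ACTIVE",
        },
    ]
}

def find_recipe_arn(response, recipe_name):
    """Return the ARN of the named recipe, or raise LookupError if it is not listed."""
    for recipe in response["recipes"]:
        if recipe["name"] == recipe_name:
            return recipe["recipeArn"]
    raise LookupError("Recipe not found: {}".format(recipe_name))

selected_recipe_arn = find_recipe_arn(sample_list_recipes_response, "aws-user-personalization")
print(selected_recipe_arn)
```

Raising on a missing name surfaces a typo immediately, rather than passing an empty ARN to `create_solution`.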

#### Create and Wait for Solution

First you will create the solution with the API, then you will create a version. It will take several minutes to train the model and thus create your version of a solution. Once it gets started and you are seeing the in progress notifications it is a good time to take a break, grab a coffee, etc.

##### Create Solution

In [None]:
create_solution_response = personalize.create_solution(
    name = "personalize-demo-soln-user-personalization",
    datasetGroupArn = dataset_group_arn,
    recipeArn = recipe_arn
)

solution_arn = create_solution_response['solutionArn']
print(json.dumps(create_solution_response, indent=2))

{
  "solutionArn": "arn:aws:personalize:us-east-1:284105231590:solution/personalize-demo-soln-user-personalization",
  "ResponseMetadata": {
    "RequestId": "4b7f984a-bd8a-4cd1-b7a5-6a9f578a577c",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Sun, 20 Jun 2021 02:52:04 GMT",
      "x-amzn-requestid": "4b7f984a-bd8a-4cd1-b7a5-6a9f578a577c",
      "content-length": "112",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


##### Create Solution Version

In [None]:
create_solution_version_response = personalize.create_solution_version(
    solutionArn = solution_arn
)

solution_version_arn = create_solution_version_response['solutionVersionArn']
print(json.dumps(create_solution_version_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:284105231590:solution/personalize-demo-soln-user-personalization/c8dd176c",
  "ResponseMetadata": {
    "RequestId": "2613b4c0-67f5-4186-8db3-27aeb17a326f",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Sun, 20 Jun 2021 02:52:05 GMT",
      "x-amzn-requestid": "2613b4c0-67f5-4186-8db3-27aeb17a326f",
      "content-length": "128",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


##### Wait for Solution Version to Have ACTIVE Status

This will take approximately 40-50 minutes.

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_solution_version_response = personalize.describe_solution_version(
        solutionVersionArn = solution_version_arn
    )
    status = describe_solution_version_response["solutionVersion"]["status"]
    print("SolutionVersion: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

SolutionVersion: ACTIVE


##### Get Metrics of Solution Version

Now that your solution and version exist, you can obtain metrics to judge the model's performance. These metrics are not particularly good because this is a demo dataset, but with larger, more complex datasets you should see improvements.

In [None]:
get_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = solution_version_arn
)

print(json.dumps(get_solution_metrics_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:284105231590:solution/personalize-demo-soln-user-personalization/c8dd176c",
  "metrics": {
    "coverage": 0.3184,
    "mean_reciprocal_rank_at_25": 0.1395,
    "normalized_discounted_cumulative_gain_at_10": 0.174,
    "normalized_discounted_cumulative_gain_at_25": 0.2434,
    "normalized_discounted_cumulative_gain_at_5": 0.1339,
    "precision_at_10": 0.0427,
    "precision_at_25": 0.0342,
    "precision_at_5": 0.0449
  },
  "ResponseMetadata": {
    "RequestId": "31f4c894-2e81-400a-aa82-af6c459d0d24",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Sun, 20 Jun 2021 03:42:46 GMT",
      "x-amzn-requestid": "31f4c894-2e81-400a-aa82-af6c459d0d24",
      "content-length": "425",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


We recommend reading [the documentation](https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html) to understand the metrics, but we have also copied parts of the documentation below for convenience.

You need to understand the following terms regarding evaluation in Personalize:

- *Relevant recommendation* refers to a recommendation that matches a value in the testing data for the particular user.
- *Rank* refers to the position of a recommended item in the list of recommendations. Position 1 (the top of the list) is presumed to be the most relevant to the user.
- *Query* refers to the internal equivalent of a GetRecommendations call.

The metrics produced by Personalize are:

- coverage: The proportion of unique recommended items from all queries out of the total number of unique items in the training data (includes both the Items and Interactions datasets).
- mean_reciprocal_rank_at_25: The [mean of the reciprocal ranks](https://en.wikipedia.org/wiki/Mean_reciprocal_rank) of the first relevant recommendation out of the top 25 recommendations over all queries. This metric is appropriate if you're interested in the single highest ranked recommendation.
- normalized_discounted_cumulative_gain_at_K: Discounted gain assumes that recommendations lower on a list of recommendations are less relevant than higher recommendations. Therefore, each recommendation is discounted (given a lower weight) by a factor dependent on its position. To produce the [cumulative discounted gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) (DCG) at K, each relevant discounted recommendation in the top K recommendations is summed together. The normalized discounted cumulative gain (NDCG) is the DCG divided by the ideal DCG, so that NDCG is between 0 and 1. (The ideal DCG is where the top K recommendations are sorted by relevance.) Amazon Personalize uses a weighting factor of 1/log(1 + position), where the top of the list is position 1. This metric rewards relevant items that appear near the top of the list, because the top of a list usually draws more attention.
- precision_at_K: The number of relevant recommendations out of the top K recommendations divided by K. This metric rewards precise recommendation of the relevant items.
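
To make the definitions above concrete, here is a small worked sketch for a single query with binary relevance. It follows the formulas as described in this section; it is not Amazon Personalize's implementation, and the item IDs are made up. The service reports averages over all queries, so the reciprocal rank shown here would be averaged to obtain the mean reciprocal rank.

```python
import math

def precision_at_k(recommended, relevant, k):
    """Relevant items among the top k recommendations, divided by k."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def reciprocal_rank_at_k(recommended, relevant, k):
    """1 / rank of the first relevant item in the top k, or 0.0 if there is none."""
    for position, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0

def ndcg_at_k(recommended, relevant, k):
    """DCG over the top k divided by the ideal DCG, with weights 1 / log(1 + position)."""
    dcg = sum(1.0 / math.log(1 + position)
              for position, item in enumerate(recommended[:k], start=1)
              if item in relevant)
    # Ideal ordering: every relevant item (up to k of them) sits at the top of the list.
    ideal_hits = min(len(relevant), k)
    ideal_dcg = sum(1.0 / math.log(1 + position) for position in range(1, ideal_hits + 1))
    return dcg / ideal_dcg if ideal_dcg else 0.0

recommended = ["A", "B", "C", "D", "E"]  # ranked recommendations for one query
relevant = {"B", "E", "Z"}               # held-out items for that user

print("precision_at_5:", precision_at_k(recommended, relevant, 5))
print("reciprocal_rank_at_5:", reciprocal_rank_at_k(recommended, relevant, 5))
print("ndcg_at_5:", round(ndcg_at_k(recommended, relevant, 5), 4))
```

Two of the five recommendations are relevant, so precision at 5 is 0.4; the first relevant item is at rank 2, so the reciprocal rank is 0.5. The base of the logarithm cancels out in the NDCG ratio, so the natural logarithm is used here for simplicity.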

### Create and Wait for the Campaign

Now that you have a working solution version you will need to create a campaign to use it with your applications. A campaign is a hosted solution version; an endpoint which you can query for recommendations. Pricing is set by estimating throughput capacity (requests from users for personalization per second). When deploying a campaign, you set a minimum transactions per second (TPS) value (`minProvisionedTPS`). This service, like many within AWS, will automatically scale based on demand, but if latency is critical, you may want to provision ahead for larger demand. For this demo, the minimum throughput threshold is set to 1. For more information, see the [pricing](https://aws.amazon.com/personalize/pricing/) page.

As mentioned above, the user-personalization recipe used for our solution supports automatic exploration of "cold" items. You can control how much exploration is performed when creating your campaign. The `itemExplorationConfig` data type supports `explorationWeight` and `explorationItemAgeCutOff` parameters. Exploration weight determines how frequently recommendations include items with less interactions data or relevance. The closer the value is to 1.0, the more exploration. At zero, no exploration occurs and recommendations are based on current data (relevance). Exploration item age cut-off determines items to be explored based on time frame since latest interaction. Provide the maximum item age, in days since the latest interaction, to define the scope of item exploration. The larger the value, the more items are considered during exploration. For our campaign below, we'll specify an exploration weight of 0.5.
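
If you want to experiment with both exploration parameters, a small helper can assemble the `campaignConfig` value. This is a sketch under assumptions: `build_campaign_config` is a hypothetical name, the values are passed as strings to match the way the weight is supplied in the cell below, and the 30-day cut-off is only an illustrative number.

```python
def build_campaign_config(exploration_weight, item_age_cutoff_days=None):
    """Build a campaignConfig dict whose itemExplorationConfig values are strings."""
    if not 0.0 <= exploration_weight <= 1.0:
        raise ValueError("exploration_weight must be between 0 and 1")
    exploration = {"explorationWeight": str(exploration_weight)}
    if item_age_cutoff_days is not None:
        # Assumption: the cut-off is also sent as a string, like the weight.
        exploration["explorationItemAgeCutOff"] = str(item_age_cutoff_days)
    return {"itemExplorationConfig": exploration}

campaign_config = build_campaign_config(0.5, item_age_cutoff_days=30)
print(campaign_config)
```

The result could be passed as the `campaignConfig` argument of `create_campaign`; rejecting weights outside 0 to 1 locally gives a clearer error than waiting for the service to refuse the request.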

##### Create Campaign

In [None]:
create_campaign_response = personalize.create_campaign(
    name = "personalize-demo-camp",
    solutionVersionArn = solution_version_arn,
    minProvisionedTPS = 1,
    campaignConfig = {
        "itemExplorationConfig": {
            "explorationWeight": "0.5"
        }
    }
)

campaign_arn = create_campaign_response['campaignArn']
print(json.dumps(create_campaign_response, indent=2))

{
  "campaignArn": "arn:aws:personalize:us-east-1:284105231590:campaign/personalize-demo-camp",
  "ResponseMetadata": {
    "RequestId": "6f9fbb78-aa32-40f0-9bbb-fef8ce90d612",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Sun, 20 Jun 2021 03:43:29 GMT",
      "x-amzn-requestid": "6f9fbb78-aa32-40f0-9bbb-fef8ce90d612",
      "content-length": "91",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


##### Wait for Campaign to Have ACTIVE Status

This should take about 10 minutes.

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_campaign_response = personalize.describe_campaign(
        campaignArn = campaign_arn
    )
    status = describe_campaign_response["campaign"]["status"]
    print("Campaign: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

Campaign: CREATE PENDING
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: ACTIVE


### Get Sample Recommendations

After the campaign is active you are ready to get recommendations. First we need to select a random user from the collection. Then we will create a few helper functions for getting movie information to show for recommendations instead of just IDs.

In [None]:
# data = pd.read_csv('./ml-100k/u.data', sep='\t', names=['USER_ID', 'ITEM_ID', 'RATING', 'TIMESTAMP'])
# data.head()

Unnamed: 0,USER_ID,ITEM_ID,RATING,TIMESTAMP
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596


In [None]:
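# Getting a random user (select by column name, since RATING was dropped from `data` earlier):
sample_row = data.sample().iloc[0]
user_id, item_id = sample_row['USER_ID'], sample_row['ITEM_ID']
print("USER: {}".format(user_id))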

USER: 839


In [None]:
# First load items into memory
items = pd.read_csv('./ml-100k/u.item', sep='|', usecols=[0,1], encoding='latin-1', names=['ITEM_ID', 'TITLE'], index_col='ITEM_ID')

def get_movie_title(movie_id):
    """
    Takes in an ID, returns a title
    """
    movie_id = int(movie_id)-1
    return items.iloc[movie_id]['TITLE']


##### Call GetRecommendations

Using the user that you obtained above, the lines below will get recommendations for you and return the list of movies that are recommended.


In [None]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = str(user_id),
)
# Update DF rendering
pd.set_option('display.max_rows', 30)

print("Recommendations for user: ", user_id)

item_list = get_recommendations_response['itemList']

recommendation_list = []

for item in item_list:
    title = get_movie_title(item['itemId'])
    recommendation_list.append(title)
    
recommendations_df = pd.DataFrame(recommendation_list, columns = ['OriginalRecs'])
recommendations_df

Recommendations for user:  839


Unnamed: 0,OriginalRecs
0,Schindler's List (1993)
1,Pulp Fiction (1994)
2,"Usual Suspects, The (1995)"
3,Raiders of the Lost Ark (1981)
4,"Silence of the Lambs, The (1991)"
5,Casablanca (1942)
6,"Fugitive, The (1993)"
7,"Shawshank Redemption, The (1994)"
8,Amadeus (1984)
9,Braveheart (1995)
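
The translation from item IDs to titles can be sketched without pandas or a live campaign. Everything in the block below is a stand-in: the three `u.item`-style lines use made-up titles, `sample_response` only imitates the `itemList` part of a `get_recommendations` response, and `load_titles` and `titles_for_response` are hypothetical helpers.

```python
# Made-up lines in the pipe-delimited layout of ml-100k/u.item (id|title|...).
SAMPLE_U_ITEM = (
    "1|Example Movie A (1990)|01-Jan-1990\n"
    "2|Example Movie B (1991)|01-Jan-1991\n"
    "3|Example Movie C (1992)|01-Jan-1992\n"
)

def load_titles(u_item_text):
    """Map integer item IDs to titles from pipe-delimited u.item-style text."""
    titles = {}
    for line in u_item_text.splitlines():
        fields = line.split("|")
        titles[int(fields[0])] = fields[1]
    return titles

def titles_for_response(response, titles):
    """Translate the itemId strings in itemList into titles, keeping rank order."""
    return [
        titles.get(int(item["itemId"]), "<unknown item {}>".format(item["itemId"]))
        for item in response["itemList"]
    ]

sample_response = {"itemList": [{"itemId": "3"}, {"itemId": "1"}, {"itemId": "99"}]}
recommended_titles = titles_for_response(sample_response, load_titles(SAMPLE_U_ITEM))
print(recommended_titles)
```

Looking titles up by ID in a dictionary, rather than by position, keeps the mapping correct even if the item file is not sorted or has gaps in its IDs.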


### Review

Using the code above, you have successfully trained a deep learning model to generate movie recommendations based on prior user behavior. Think about other types of problems where this kind of data is available and what it might look like to build a system that offers those recommendations.

Now you are ready to move onto the next notebook.



### Notes for the Next Notebook:

There are a few values you will need for the next notebook. Execute the cells below to store them so they can be used in the next part of the exercise.

In [None]:
%store campaign_arn

Stored 'campaign_arn' (str)


In [None]:
%store dataset_group_arn

Stored 'dataset_group_arn' (str)


In [None]:
%store solution_version_arn

Stored 'solution_version_arn' (str)


In [None]:
%store solution_arn

Stored 'solution_arn' (str)


In [None]:
%store dataset_arn

Stored 'dataset_arn' (str)


In [None]:
%store schema_arn

Stored 'schema_arn' (str)


In [None]:
%store bucket

Stored 'bucket' (str)


In [None]:
%store filename

Stored 'filename' (str)


In [None]:
%store role_name

Stored 'role_name' (str)


In [None]:
%store recommendations_df

Stored 'recommendations_df' (DataFrame)


In [None]:
%store user_id

Stored 'user_id' (int64)


## Part 2 - View Campaign and Interactions
> In the first part, you successfully built and deployed a recommendation model using deep learning with Amazon Personalize. This notebook will expand on that and will walk you through adding the ability to react to real time behavior of users. If their intent changes while browsing a movie, you will see revised recommendations based on that behavior. It will also showcase demo code for simulating user behavior selecting movies before the recommendations are returned.

Below we start by importing the libraries that we need to interact with Personalize.

In [None]:
# Imports
import boto3
import json
import numpy as np
import pandas as pd
import time
import uuid

The line below will retrieve your shared variables from the first notebook.

In [None]:
%store -r

In [None]:
# Setup and Config
# Recommendations from Event data
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

# Establish a connection to Personalize's Event Streaming
personalize_events = boto3.client(service_name='personalize-events')

### Creating an Event Tracker

Before your recommendation system can respond to real-time events you will need an event tracker. The code below generates one that can be used going forward in this lab. Feel free to name it something more clever.

In [None]:
response = personalize.create_event_tracker(
    name='MovieClickTracker',
    datasetGroupArn=dataset_group_arn
)
print(response['eventTrackerArn'])
print(response['trackingId'])
TRACKING_ID = response['trackingId']

arn:aws:personalize:us-east-1:284105231590:event-tracker/a1b7dce4
f252168d-a73d-467a-b2e7-66071f5d6d78


In [None]:
event_tracker_arn = response['eventTrackerArn']

### Configuring Source Data

Above you'll see your tracking ID. It has already been assigned to a variable, so no further action is needed. The lines below set up the data used for recommendations so you can render the list of movies later.

In [None]:
# First load items into memory
items = pd.read_csv('./ml-100k/u.item', sep='|', usecols=[0,1], encoding='latin-1', names=['ITEM_ID', 'TITLE'], index_col='ITEM_ID')

def get_movie_title(movie_id):
    """
    Takes in an ID, returns a title
    """
    # ITEM_ID values in u.item start at 1 and are used as the index, so look up the ID directly
    movie_id = int(movie_id)
    return items.loc[movie_id]['TITLE']
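
MovieLens item IDs in `u.item` start at 1, so a title lookup should use the ID as-is. As a minimal, pandas-free sketch of the same idea, the snippet below builds an ID-to-title dictionary from pipe-delimited rows. The two sample rows mirror the first two titles in the dataset, their trailing fields are placeholders, and `load_titles` is a hypothetical helper name rather than part of the workshop code.

```python
import csv
import io

# Two sample rows in the u.item layout (ID|title|...); trailing fields are placeholders.
SAMPLE_U_ITEM = "1|Toy Story (1995)|x|x\n2|GoldenEye (1995)|x|x\n"

def load_titles(handle):
    """Return {item_id: title} from pipe-delimited rows (hypothetical helper)."""
    reader = csv.reader(handle, delimiter="|")
    return {int(row[0]): row[1] for row in reader if row}

titles = load_titles(io.StringIO(SAMPLE_U_ITEM))
print(titles[1])
```

Because the dictionary is keyed by the real item ID, there is no ID 0 entry and no offset to remember.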

### Getting Recommendations

First we will render the recommendations again from the previous notebook:

In [None]:
recommendations_df

Unnamed: 0,OriginalRecs
0,Schindler's List (1993)
1,Pulp Fiction (1994)
2,"Usual Suspects, The (1995)"
3,Raiders of the Lost Ark (1981)
4,"Silence of the Lambs, The (1991)"
5,Casablanca (1942)
6,"Fugitive, The (1993)"
7,"Shawshank Redemption, The (1994)"
8,Amadeus (1984)
9,Braveheart (1995)


### Simulating User Behavior

The lines below provide a code sample that simulates a user interacting with a particular item. You will then get recommendations that differ from those you started with.

In [None]:
session_dict = {}

In [None]:
def send_movie_click(USER_ID, ITEM_ID):
    """
    Simulates a click by sending an event
    to Amazon Personalize's Event Tracker
    """
    # Configure Session: reuse the user's session ID, or create one on first use
    if USER_ID not in session_dict:
        session_dict[USER_ID] = str(uuid.uuid1())
    session_ID = session_dict[USER_ID]

    # Configure Properties:
    event = {
        "itemId": str(ITEM_ID),
    }
    event_json = json.dumps(event)

    # Make Call
    personalize_events.put_events(
        trackingId=TRACKING_ID,
        userId=USER_ID,
        sessionId=session_ID,
        eventList=[{
            'sentAt': int(time.time()),
            'eventType': 'EVENT_TYPE',  # placeholder event type name
            'properties': event_json
        }]
    )
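
To make the session handling easier to see in isolation, here is a small sketch that separates it from the network call. `get_session_id` and `build_click_event` are hypothetical helper names, and the event layout simply copies the fields used in the cell above. Nothing here contacts Amazon Personalize.

```python
import json
import time
import uuid

session_ids = {}

def get_session_id(user_id):
    """Return the stored session ID for user_id, creating one on first use."""
    if user_id not in session_ids:
        session_ids[user_id] = str(uuid.uuid1())
    return session_ids[user_id]

def build_click_event(item_id, event_type="EVENT_TYPE"):
    """Build one entry for the eventList argument of put_events."""
    return {
        "sentAt": int(time.time()),
        "eventType": event_type,
        "properties": json.dumps({"itemId": str(item_id)}),
    }

first = get_session_id("839")
event = build_click_event(270)
# The same user keeps the same session ID across calls.
print(first == get_session_id("839"))
```

Keeping one session ID per user means repeated clicks from the same simulated user are grouped into a single session.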

The cell immediately below updates the tracker as if the user had clicked a particular title.


If the table generated by the cells below does not shift the recommendations, try another random 3-digit movie ID in the cell below and run both cells again. You'll then see a third column of recommendations in the table.

In [None]:
# Pick a movie, we will use ID 270 or Gattaca
movie_to_click = 270
movie_title_clicked = get_movie_title(movie_to_click)
send_movie_click(USER_ID=str(user_id), ITEM_ID=movie_to_click)

After executing this block you will see the alterations in the recommendations now that you have event tracking enabled and that you have sent the events to the service.

In [None]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = str(user_id),
)

print("Recommendations for user: ", user_id)

item_list = get_recommendations_response['itemList']

recommendation_list = []

for item in item_list:
    title = get_movie_title(item['itemId'])
    recommendation_list.append(title)
    
new_rec_DF = pd.DataFrame(recommendation_list, columns = [movie_title_clicked])

recommendations_df = recommendations_df.join(new_rec_DF)
recommendations_df

Recommendations for user:  839


Unnamed: 0,OriginalRecs,"Full Monty, The (1997)"
0,Schindler's List (1993),"Professional, The (1994)"
1,Pulp Fiction (1994),unknown
2,"Usual Suspects, The (1995)",Amistad (1997)
3,Raiders of the Lost Ark (1981),Chasing Amy (1997)
4,"Silence of the Lambs, The (1991)",Muppet Treasure Island (1996)
5,Casablanca (1942),"Empire Strikes Back, The (1980)"
6,"Fugitive, The (1993)","River Wild, The (1994)"
7,"Shawshank Redemption, The (1994)",Midnight in the Garden of Good and Evil (1997)
8,Amadeus (1984),"Last of the Mohicans, The (1992)"
9,Braveheart (1995),Marvin's Room (1996)
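
If you want to see exactly which titles changed between two columns, a small comparison helper can do it. This is only a sketch: `diff_recommendations` is a hypothetical name, and the two short lists are illustrative samples built from titles in the table above, not the full columns.

```python
def diff_recommendations(before, after):
    """Return (titles newly recommended, titles that dropped out), keeping list order."""
    before_set, after_set = set(before), set(after)
    added = [title for title in after if title not in before_set]
    dropped = [title for title in before if title not in after_set]
    return added, dropped

# Illustrative sample lists, not the real columns.
before = ["Pulp Fiction (1994)", "Casablanca (1942)", "Amadeus (1984)"]
after = ["Amistad (1997)", "Casablanca (1942)", "Chasing Amy (1997)"]

added, dropped = diff_recommendations(before, after)
print(added)
print(dropped)
```

With the notebook's DataFrame you could pass `recommendations_df['OriginalRecs'].tolist()` and the newest column's values in the same way.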


### Conclusion

You can now see that recommendations change depending on the movie a user interacts with. This system can be adapted to any application where users interact with a collection of items. These tools are available at any time to pull down and start exploring what is possible with the data you have.

Execute the cell below to store values needed for the cleanup notebook.

Finally, when you are ready to remove the items from your account, open the `Cleanup.ipynb` notebook and execute the steps there.


In [None]:
%store event_tracker_arn

Stored 'event_tracker_arn' (str)


## Part 3 - Cleanup the resources
> After building your model you may want to delete your campaign, solutions, and datasets.

In [None]:
# Imports
import boto3
import json
import numpy as np
import pandas as pd
import time

In [None]:
# Configure the SDK to Personalize:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

### Defining the Things to Cleanup

Using the `%store` command, we will retrieve all the values needed to clean up our work.

In [None]:
%store -r

In [None]:
# Delete the campaign:
personalize.delete_campaign(campaignArn=campaign_arn)
time.sleep(60)

In [None]:
# Delete the solution
personalize.delete_solution(solutionArn=solution_arn)
time.sleep(60)

In [None]:
# Delete the event tracker
personalize.delete_event_tracker(eventTrackerArn=event_tracker_arn)
time.sleep(60)

In [None]:
# Delete the interaction dataset
personalize.delete_dataset(datasetArn=dataset_arn)
time.sleep(60)

In [None]:
# Delete the event dataset
event_interactions_dataset_arn = dataset_arn
event_interactions_dataset_arn = event_interactions_dataset_arn.replace("INTERACTIONS", "EVENT_INTERACTIONS")
personalize.delete_dataset(datasetArn=event_interactions_dataset_arn)
time.sleep(60)

In [None]:
# Delete the schema
personalize.delete_schema(schemaArn=schema_arn)
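
The cells above pause for a fixed 60 seconds after each delete. An alternative, sketched here under assumptions, is to poll until the describe call reports that the resource no longer exists. `wait_until_deleted` is a hypothetical helper, and `FakeNotFound` and `fake_describe` are stand-ins so the example runs without AWS access.

```python
import time

def wait_until_deleted(describe, not_found_error, interval=10, timeout=600):
    """Call describe() until it raises not_found_error.

    Returns True if the resource disappeared before the timeout, else False.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            describe()
        except not_found_error:
            return True
        time.sleep(interval)
    return False

class FakeNotFound(Exception):
    """Stand-in for the service's 'resource not found' error."""

calls = {"n": 0}

def fake_describe():
    # Pretend the resource still exists for the first two polls.
    calls["n"] += 1
    if calls["n"] >= 3:
        raise FakeNotFound()

print(wait_until_deleted(fake_describe, FakeNotFound, interval=0))
```

With boto3, the call might look like `wait_until_deleted(lambda: personalize.describe_campaign(campaignArn=campaign_arn), personalize.exceptions.ResourceNotFoundException)`. Treat that as an assumption to verify against your SDK version.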

### Empty Your S3 Bucket

Next, empty your S3 bucket. You uploaded a movie file to it in the first notebook.


In [None]:
boto3.Session().resource('s3').Bucket(bucket).Object(filename).delete()

### IAM Policy Cleanup

The very last step in the notebooks is to remove the policies that were attached to a role and then to delete it. No changes should need to be made here, just execute the cell.

In [None]:
# IAM policies should also be removed
iam = boto3.client("iam")
iam.detach_role_policy(PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess", RoleName=role_name)
iam.detach_role_policy(PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess",RoleName=role_name)

iam.delete_role(RoleName=role_name)
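
A role cannot be deleted while managed policies are still attached to it. If you attached other policies than the two listed above, a sketch like the following could detach whatever is attached before deleting the role. `detach_all_role_policies` is a hypothetical helper, `FakeIam` exists only so the example runs offline, and pagination of the list call is ignored for brevity.

```python
def detach_all_role_policies(iam_client, role_name):
    """Detach every managed policy attached to role_name and return the detached ARNs."""
    response = iam_client.list_attached_role_policies(RoleName=role_name)
    detached = []
    for policy in response["AttachedPolicies"]:
        iam_client.detach_role_policy(RoleName=role_name, PolicyArn=policy["PolicyArn"])
        detached.append(policy["PolicyArn"])
    return detached

class FakeIam:
    """In-memory stand-in for boto3.client("iam"), used only to exercise the helper."""

    def __init__(self, arns):
        self.arns = list(arns)

    def list_attached_role_policies(self, RoleName):
        return {"AttachedPolicies": [{"PolicyArn": arn} for arn in self.arns]}

    def detach_role_policy(self, RoleName, PolicyArn):
        self.arns.remove(PolicyArn)

fake = FakeIam(["arn:aws:iam::aws:policy/AmazonS3FullAccess"])
print(detach_all_role_policies(fake, "PersonalizeRole"))
```

In the notebook you would pass the real `iam` client and `role_name` instead of the fake.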

#### Last Step

After cleaning up all of the resources, you can close this window and go back to the GitHub page you started on. At the bottom of the Readme file are steps to delete the CloudFormation stack you created earlier. Once that is done, you are 100% done with the lab.

Congratulations!

## Part 4 - Security best practices (optional)
> We are using a pre-made dataset that hasn't been encrypted so there is no need to decrypt this dataset. However, it would be a good security practice to store your datasets encrypted.

### Get the Personalize boto3 Client

In [None]:
import boto3

import json
import numpy as np
import pandas as pd
import time

personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')
iam = boto3.client("iam")
s3 = boto3.client("s3")

### Specify a Bucket and Data Output Location

In [None]:
bucket = "personalize-demo"       # replace with the name of your S3 bucket
filename = "movie-lens-100k.csv"  # replace with a name that you want to save the dataset under

### Download, Prepare, and Upload Training Data

#### Download and Explore the Dataset

In [None]:
!wget -N http://files.grouplens.org/datasets/movielens/ml-100k.zip
!unzip -o ml-100k.zip
data = pd.read_csv('./ml-100k/u.data', sep='\t', names=['USER_ID', 'ITEM_ID', 'RATING', 'TIMESTAMP'])
pd.set_option('display.max_rows', 5)
data

--2020-05-05 09:38:29--  http://files.grouplens.org/datasets/movielens/ml-100k.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:80... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘ml-100k.zip’ not modified on server. Omitting download.

Archive:  ml-100k.zip
  inflating: ml-100k/allbut.pl       
  inflating: ml-100k/mku.sh          
  inflating: ml-100k/README          
  inflating: ml-100k/u.data          
  inflating: ml-100k/u.genre         
  inflating: ml-100k/u.info          
  inflating: ml-100k/u.item          
  inflating: ml-100k/u.occupation    
  inflating: ml-100k/u.user          
  inflating: ml-100k/u1.base         
  inflating: ml-100k/u1.test         
  inflating: ml-100k/u2.base         
  inflating: ml-100k/u2.test         
  inflating: ml-100k/u3.base         
  inflating: ml-100k/u3.test         
  inflating: ml-100k/u4.base         
  inflat

Unnamed: 0,USER_ID,ITEM_ID,RATING,TIMESTAMP
0,196,242,3,881250949
1,186,302,3,891717742
...,...,...,...,...
99998,13,225,2,882399156
99999,12,203,3,879959583


#### Optional security practice: Protect data at rest - Encrypt/decrypt your dataset
We are using a pre-made dataset that hasn't been encrypted so there is no need to decrypt this dataset. However, it would be a good security practice to store your datasets encrypted.

For more information on encrypting your data when using S3, visit https://docs.aws.amazon.com/AmazonS3/latest/dev/KMSUsingRESTAPI.html
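
As one possible way to apply this, the upload call used later in this notebook accepts an `ExtraArgs` dictionary that can request server-side encryption with KMS. The sketch below only builds that dictionary. `sse_extra_args` is a hypothetical helper, and the key ID is optional (when omitted, S3 uses the AWS managed key for the account).

```python
def sse_extra_args(kms_key_id=None):
    """Return ExtraArgs for an S3 upload requesting SSE-KMS, optionally with a specific key."""
    args = {"ServerSideEncryption": "aws:kms"}
    if kms_key_id:
        args["SSEKMSKeyId"] = kms_key_id
    return args

print(sse_extra_args())

# Possible use with the upload call from this notebook (not executed here):
# boto3.Session().resource('s3').Bucket(bucket).Object(filename).upload_file(
#     filename, ExtraArgs=sse_extra_args())
```

If you encrypt the file with your own key, the role that reads the file also needs permission to use that key.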

#### Optional security practice: Protect data in transit - SSL access only for S3 bucket

In [None]:
requires_ssl_access_policy = {
    "Version": "2012-10-17",
    "Id": "RequireSSLAccess",
    "Statement": [
        {
            "Sid": "RequireSSLAccess",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "*",
            "Resource": [
                "arn:aws:s3:::{}".format(bucket),
                "arn:aws:s3:::{}/*".format(bucket)
            ],
            "Condition": {
                "Bool": {
                    "aws:SecureTransport": "false"
                }
            }
        }
    ]
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(requires_ssl_access_policy))

{'ResponseMetadata': {'RequestId': 'FA2BA6A24A738415',
  'HostId': 'PWU2HsPCLALBFzYHzVEUK5EuODQkMTrj2l9IuCs3x+GBDlM24tg2k4BF1fMlnmzCAwy/KRVplQw=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': 'PWU2HsPCLALBFzYHzVEUK5EuODQkMTrj2l9IuCs3x+GBDlM24tg2k4BF1fMlnmzCAwy/KRVplQw=',
   'x-amz-request-id': 'FA2BA6A24A738415',
   'date': 'Tue, 05 May 2020 16:38:34 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}

#### Additional security note:
Some users prevent accidental information disclosure by restricting S3 access to requests that come from a VPC, and then validate that restriction as a separate security check. Note that this check will fail for S3 buckets used with Personalize, because Personalize copies data from the user's S3 bucket into its own internal systems during dataset import jobs.

#### Optional security practice: validate bucket owner matches your account canonical id
[More information about canonical ids here](https://docs.aws.amazon.com/general/latest/gr/acct-identifiers.html#FindingCanonicalId)

In [None]:
bucket_owner_id = boto3.client('s3').get_bucket_acl(Bucket=bucket)['Owner']['ID']
print("This bucket belongs to: {} ".format(bucket_owner_id))

This bucket belongs to: 28398dc6b1acac01a4a73b246b5f9c9a688f50b9ce70240f74c0f90ebf5e2c61 
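
The cell above prints the bucket owner but does not compare it with anything. A sketch of the comparison, assuming that `list_buckets()` returns the calling account's canonical ID under `Owner`, could look like this. `bucket_owned_by_caller` is a hypothetical helper, and `FakeS3` with its IDs is made up so the example runs without AWS access.

```python
def bucket_owned_by_caller(s3_client, bucket_name):
    """Return True if the bucket's owner ID equals the caller's canonical ID."""
    caller_id = s3_client.list_buckets()["Owner"]["ID"]
    owner_id = s3_client.get_bucket_acl(Bucket=bucket_name)["Owner"]["ID"]
    return caller_id == owner_id

class FakeS3:
    """In-memory stand-in for boto3.client("s3"), used only to exercise the helper."""

    def __init__(self, caller_id, bucket_owner_id):
        self.caller_id = caller_id
        self.bucket_owner_id = bucket_owner_id

    def list_buckets(self):
        return {"Owner": {"ID": self.caller_id}, "Buckets": []}

    def get_bucket_acl(self, Bucket):
        return {"Owner": {"ID": self.bucket_owner_id}, "Grants": []}

print(bucket_owned_by_caller(FakeS3("abc123", "abc123"), "personalize-demo"))
```

In the notebook you would pass the real `s3` client and `bucket`, and stop if the function returns `False`.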


#### Optional security practice: Protect data integrity - Enable S3 bucket versioning


In [None]:
s3_resource = boto3.resource('s3')
bucket_versioning = s3_resource.BucketVersioning(bucket)
bucket_versioning.enable()

{'ResponseMetadata': {'RequestId': '4C3AA32EF96CA240',
  'HostId': 'HQaKmMvUTbXTRsxLL/9BubrfD09xisEO8x72cpQP0syhAH9dXKRx7gqmzchq3TbERdwprUKcRmQ=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'HQaKmMvUTbXTRsxLL/9BubrfD09xisEO8x72cpQP0syhAH9dXKRx7gqmzchq3TbERdwprUKcRmQ=',
   'x-amz-request-id': '4C3AA32EF96CA240',
   'date': 'Mon, 04 May 2020 20:46:23 GMT',
   'content-length': '0',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}

#### Prepare and Upload Data

In [None]:
data = data[data['RATING'] > 3.6]                # keep only interactions rated above 3.6 (ratings are integers, so 4 and 5)
data = data[['USER_ID', 'ITEM_ID', 'TIMESTAMP']] # select columns that match the columns in the schema below
data.to_csv(filename, index=False)

boto3.Session().resource('s3').Bucket(bucket).Object(filename).upload_file(filename)

### Create Schema

In [None]:
schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        }
    ],
    "version": "1.0"
}

create_schema_response = personalize.create_schema(
    name = "DEMO-schema",
    schema = json.dumps(schema)
)

schema_arn = create_schema_response['schemaArn']
print(json.dumps(create_schema_response, indent=2))

{
  "schemaArn": "arn:aws:personalize:us-west-2:237539672711:schema/DEMO-schema", 
  "ResponseMetadata": {
    "RetryAttempts": 0, 
    "HTTPStatusCode": 200, 
    "RequestId": "12eb7cba-2b64-4be9-9f6e-eeebff7629a5", 
    "HTTPHeaders": {
      "date": "Tue, 04 Dec 2018 05:49:04 GMT", 
      "x-amzn-requestid": "12eb7cba-2b64-4be9-9f6e-eeebff7629a5", 
      "content-length": "79", 
      "content-type": "application/x-amz-json-1.1", 
      "connection": "keep-alive"
    }
  }
}
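
The CSV uploaded earlier has to contain the columns named in this schema. A quick local check before starting an import job can catch a mismatch early. The sketch below uses the same three fields as the schema above, and `missing_schema_fields` is a hypothetical helper, not a Personalize API.

```python
def missing_schema_fields(schema, csv_header):
    """Return schema field names that do not appear in the CSV header line."""
    header = [column.strip() for column in csv_header.split(",")]
    return [field["name"] for field in schema["fields"] if field["name"] not in header]

demo_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}

print(missing_schema_fields(demo_schema, "USER_ID,ITEM_ID,TIMESTAMP"))
print(missing_schema_fields(demo_schema, "USER_ID,ITEM_ID,RATING"))
```

An empty list means every schema field has a matching column.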


#### Optional security practice - Protect data at rest - Encrypt datasets under Personalize
If you skip this step, do not pass kmsKeyArn or roleArn when you create your dataset group.

In [None]:
kmsKeyArn = boto3.client('kms').create_key(Description="personalize-data")['KeyMetadata']['Arn']
print(kmsKeyArn)
key_accessor_policy_name = "AccessPersonalizeDatasetPolicy"
key_accessor_policy = {
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Action": [
      "kms:*"
    ],
    "Resource": [ kmsKeyArn ]
  }
}
key_access_policy = iam.create_policy(
    PolicyName = key_accessor_policy_name,
    PolicyDocument = json.dumps(key_accessor_policy)
)

key_access_role_name = "AccessPersonalizeDatasetRole"
assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "personalize.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
    ]
}

create_role_response = iam.create_role(
    RoleName = key_access_role_name,
    AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
)
iam.attach_role_policy(
    RoleName = create_role_response["Role"]["RoleName"],
    PolicyArn = key_access_policy["Policy"]["Arn"]
)

keyAccessRoleArn = create_role_response["Role"]["Arn"]
print(keyAccessRoleArn)

arn:aws:kms:us-west-2:001513653716:key/f5be82af-a160-4c49-813e-e5448fa95693
arn:aws:iam::001513653716:role/AccessPersonalizeDatasetRole


### Create and Wait for Encrypted Dataset Group

#### Create Encrypted Dataset Group
If you did not create a KMS Key and IAM role from the last step, then do not pass in roleArn or kmsKeyArn.

In [None]:
create_dataset_group_response = personalize.create_dataset_group(
    name = "DEMO-dataset-group",
    roleArn = keyAccessRoleArn,
    kmsKeyArn = kmsKeyArn
)

dataset_group_arn = create_dataset_group_response['datasetGroupArn']
print(json.dumps(create_dataset_group_response, indent=2))

ClientError: An error occurred (AccessDeniedException) when calling the CreateDatasetGroup operation: Cross-account pass role is not allowed.
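
Because `roleArn` and `kmsKeyArn` should only be passed when you created them in the optional step, one way to handle both cases is to build the keyword arguments conditionally. This is a sketch: `dataset_group_kwargs` is a hypothetical helper, and the ARNs in the demo call are dummy strings.

```python
def dataset_group_kwargs(name, role_arn=None, kms_key_arn=None):
    """Return keyword arguments for create_dataset_group, adding encryption settings only if both are given."""
    kwargs = {"name": name}
    if role_arn and kms_key_arn:
        kwargs["roleArn"] = role_arn
        kwargs["kmsKeyArn"] = kms_key_arn
    return kwargs

print(dataset_group_kwargs("DEMO-dataset-group"))
print(dataset_group_kwargs("DEMO-dataset-group", "dummy-role-arn", "dummy-key-arn"))

# Possible use in the notebook (not executed here):
# personalize.create_dataset_group(**dataset_group_kwargs("DEMO-dataset-group"))
```

If only one of the two values is supplied, the helper leaves both out rather than sending a partial configuration.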

#### Wait for Dataset Group to Have ACTIVE Status

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = dataset_group_arn
    )
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetGroup: CREATE PENDING
DatasetGroup: CREATE FAILED
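
The same polling loop appears several times in this notebook (dataset group, import job, solution version, campaign). As a sketch, it could be factored into one helper that takes a function returning the current status. `wait_for_status` is a hypothetical name, and the fake status sequence below replaces the real describe calls so the example runs offline.

```python
import time

def wait_for_status(get_status, done=("ACTIVE", "CREATE FAILED"), interval=60, timeout=3 * 60 * 60):
    """Poll get_status() until it returns a value in done or timeout seconds pass.

    Returns the last status seen (None if the timeout is zero or negative).
    """
    deadline = time.time() + timeout
    status = None
    while time.time() < deadline:
        status = get_status()
        print("Status: {}".format(status))
        if status in done:
            break
        time.sleep(interval)
    return status

# Fake status sequence standing in for repeated describe calls.
fake_statuses = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
final = wait_for_status(lambda: next(fake_statuses), interval=0)
print(final)
```

In the notebook, `get_status` could be `lambda: personalize.describe_dataset_group(datasetGroupArn=dataset_group_arn)["datasetGroup"]["status"]`. Checking the returned value lets you stop early when the result is `CREATE FAILED` instead of continuing to the next step.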


### Create Dataset

In [None]:
dataset_type = "INTERACTIONS"
create_dataset_response = personalize.create_dataset(
    name = "DEMO-dataset",
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = schema_arn
)

dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

{
  "ResponseMetadata": {
    "RetryAttempts": 0, 
    "HTTPStatusCode": 200, 
    "RequestId": "29ab75c8-df6e-4807-943f-1b48014181d1", 
    "HTTPHeaders": {
      "date": "Tue, 04 Dec 2018 05:50:19 GMT", 
      "x-amzn-requestid": "29ab75c8-df6e-4807-943f-1b48014181d1", 
      "content-length": "101", 
      "content-type": "application/x-amz-json-1.1", 
      "connection": "keep-alive"
    }
  }, 
  "datasetArn": "arn:aws:personalize:us-west-2:237539672711:dataset/DEMO-dataset-group/INTERACTIONS"
}


### Prepare, Create, and Wait for Dataset Import Job

#### Attach Policy to S3 Bucket

In [None]:
policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket),
                "arn:aws:s3:::{}/*".format(bucket)
            ]
        }
    ]
}

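# Note: put_bucket_policy replaces the bucket's entire existing policy, including any policy set earlier in this notebook
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))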
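
A bucket holds a single policy document, so each `put_bucket_policy` call overwrites the previous one. If you applied the optional SSL-only policy earlier and want to keep it, one approach is to combine the statements into a single document before uploading. The sketch below uses abbreviated statements, and `merge_policy_statements` and the default policy ID are hypothetical names.

```python
import json

def merge_policy_statements(*policies, policy_id="CombinedBucketPolicy"):
    """Combine the Statement entries of several policy documents into one document."""
    statements = []
    for policy in policies:
        statement = policy["Statement"]
        # A policy may hold a single statement object or a list of them.
        statements.extend(statement if isinstance(statement, list) else [statement])
    return {"Version": "2012-10-17", "Id": policy_id, "Statement": statements}

# Abbreviated stand-ins for the two policies defined in this notebook.
ssl_policy = {"Version": "2012-10-17", "Statement": [{"Sid": "RequireSSLAccess", "Effect": "Deny"}]}
access_policy = {"Version": "2012-10-17", "Statement": [{"Sid": "PersonalizeS3BucketAccessPolicy", "Effect": "Allow"}]}

combined = merge_policy_statements(ssl_policy, access_policy)
print(json.dumps(combined, indent=2))
```

With the full policy dictionaries from the notebook, the combined document could be passed to `s3.put_bucket_policy` in place of `policy`.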

#### Create Personalize Role

In [None]:
role_name = "PersonalizeRole"
assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "personalize.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
    ]
}

create_role_response = iam.create_role(
    RoleName = role_name,
    AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
)

# AmazonPersonalizeFullAccess provides access to any S3 bucket with a name that includes "personalize" or "Personalize" 
# if you would like to use a bucket with a different name, please consider creating and attaching a new policy
# that provides read access to your bucket or attaching the AmazonS3ReadOnlyAccess policy to the role
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess"
iam.attach_role_policy(
    RoleName = role_name,
    PolicyArn = policy_arn
)

time.sleep(60) # wait for a minute to allow IAM role policy attachment to propagate

role_arn = create_role_response["Role"]["Arn"]
print(role_arn)

arn:aws:iam::660166145966:role/PersonalizeRole2


#### Create Dataset Import Job

In [None]:
create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "DEMO-dataset-import-job",
    datasetArn = dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, filename)
    },
    roleArn = role_arn
)

dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))

{
  "datasetImportJobArn": "arn:aws:personalize:us-west-2:237539672711:dataset-import-job/DEMO-dataset-import-job", 
  "ResponseMetadata": {
    "RetryAttempts": 0, 
    "HTTPStatusCode": 200, 
    "RequestId": "3c77fe8d-d9fe-4ca5-ad03-b18e937acbb3", 
    "HTTPHeaders": {
      "date": "Tue, 04 Dec 2018 05:50:55 GMT", 
      "x-amzn-requestid": "3c77fe8d-d9fe-4ca5-ad03-b18e937acbb3", 
      "content-length": "113", 
      "content-type": "application/x-amz-json-1.1", 
      "connection": "keep-alive"
    }
  }
}


#### Wait for Dataset Import Job to Have ACTIVE Status

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_import_job_arn
    )
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print("DatasetImportJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetImportJob: CREATE PENDING
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: ACTIVE


### Select Recipe

In [None]:
list_recipes_response = personalize.list_recipes()
recipe_arn = "arn:aws:personalize:::recipe/aws-hrnn" # aws-hrnn selected for demo purposes
list_recipes_response

{'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
   'content-length': '1287',
   'content-type': 'application/x-amz-json-1.1',
   'date': 'Fri, 22 Mar 2019 19:28:37 GMT',
   'x-amzn-requestid': '275695de-45b2-4077-82d4-c961ceeaf367'},
  'HTTPStatusCode': 200,
  'RequestId': '275695de-45b2-4077-82d4-c961ceeaf367',
  'RetryAttempts': 0},
 u'recipes': [{u'creationDateTime': datetime.datetime(2018, 11, 25, 16, 0, tzinfo=tzlocal()),
   u'lastUpdatedDateTime': datetime.datetime(1969, 12, 31, 16, 0, tzinfo=tzlocal()),
   u'name': u'aws-hrnn',
   u'recipeArn': u'arn:aws:personalize:::recipe/aws-hrnn',
   u'status': u'ACTIVE'},
  {u'creationDateTime': datetime.datetime(2018, 11, 25, 16, 0, tzinfo=tzlocal()),
   u'lastUpdatedDateTime': datetime.datetime(1969, 12, 31, 16, 0, tzinfo=tzlocal()),
   u'name': u'aws-hrnn-coldstart',
   u'recipeArn': u'arn:aws:personalize:::recipe/aws-hrnn-coldstart',
   u'status': u'ACTIVE'},
  {u'creationDateTime': datetime.datetime(2018, 11, 25, 16,

### Create and Wait for Solution

#### Create Solution

In [None]:
create_solution_response = personalize.create_solution(
    name = "DEMO-solution",
    datasetGroupArn = dataset_group_arn,
    recipeArn = recipe_arn
)

solution_arn = create_solution_response['solutionArn']
print(json.dumps(create_solution_response, indent=2))

{
  "solutionArn": "arn:aws:personalize:us-west-2:237539672711:solution/DEMO-solution", 
  "ResponseMetadata": {
    "RetryAttempts": 0, 
    "HTTPStatusCode": 200, 
    "RequestId": "2042832f-0775-43e2-86de-53a061be1f63", 
    "HTTPHeaders": {
      "date": "Mon, 03 Dec 2018 23:55:17 GMT", 
      "x-amzn-requestid": "2042832f-0775-43e2-86de-53a061be1f63", 
      "content-length": "83", 
      "content-type": "application/x-amz-json-1.1", 
      "connection": "keep-alive"
    }
  }
}


#### Create Solution Version

In [None]:
create_solution_version_response = personalize.create_solution_version(
    solutionArn = solution_arn
)

solution_version_arn = create_solution_version_response['solutionVersionArn']
print(json.dumps(create_solution_version_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-west-2:237539672711:solution/DEMO-solution/702e0792", 
  "ResponseMetadata": {
    "RetryAttempts": 0, 
    "HTTPStatusCode": 200, 
    "RequestId": "2042832f-0775-43e2-86de-53a061be1f65", 
    "HTTPHeaders": {
      "date": "Mon, 03 Dec 2018 23:55:17 GMT", 
      "x-amzn-requestid": "2042832f-0775-43e2-86de-53a061be1f65", 
      "content-length": "90", 
      "content-type": "application/x-amz-json-1.1", 
      "connection": "keep-alive"
    }
  }
}


#### Wait for Solution Version to Have ACTIVE Status

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_solution_version_response = personalize.describe_solution_version(
        solutionVersionArn = solution_version_arn
    )
    status = describe_solution_version_response["solutionVersion"]["status"]
    print("SolutionVersion: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

SolutionVersion: CREATE PENDING
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGR

#### Get Metrics of Solution

In [None]:
get_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = solution_version_arn
)

print(json.dumps(get_solution_metrics_response, indent=2))

{
  "metrics": {
    "coverage": 0.2603, 
    "mean_reciprocal_rank_at_25": 0.0539, 
    "normalized_discounted_cumulative_gain_at_5": 0.0486, 
    "normalized_discounted_cumulative_gain_at_10": 0.0649, 
    "normalized_discounted_cumulative_gain_at_25": 0.0918, 
    "precision_at_5": 0.0109, 
    "precision_at_10": 0.0098, 
    "precision_at_25": 0.0083
  }, 
  "solutionVersionArn": "arn:aws:personalize:us-west-2:237539672711:solution/DEMO-solution/702e0792", 
  "ResponseMetadata": {
    "RetryAttempts": 0, 
    "HTTPStatusCode": 200, 
    "RequestId": "5b5f4f4f-5249-4c0e-9f83-45e3fe22f09f", 
    "HTTPHeaders": {
      "date": "Tue, 04 Dec 2018 00:53:54 GMT", 
      "x-amzn-requestid": "5b5f4f4f-5249-4c0e-9f83-45e3fe22f09f", 
      "content-length": "724", 
      "content-type": "application/x-amz-json-1.1", 
      "connection": "keep-alive"
    }
  }
}
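
The metric names in the output above follow common ranking-evaluation terminology. As a rough illustration only, and assuming the textbook definitions (Personalize's exact evaluation procedure is not shown in this notebook and may differ), precision at K and reciprocal rank for a single user could be computed like this. The item IDs and helper names are made up for the example.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k

def reciprocal_rank(recommended, relevant, k):
    """1 / rank of the first relevant item within the top k, or 0.0 if there is none."""
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

# Made-up recommendation list and set of items the user actually interacted with.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"c", "e"}

print(precision_at_k(recommended, relevant, 5))
print(reciprocal_rank(recommended, relevant, 5))
```

Averaging the reciprocal rank over many users gives a mean reciprocal rank, which is why low values like the ones above are common when only a few of the 25 recommended items match held-out interactions.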


### Create and Wait for Campaign

#### Create Campaign

In [None]:
create_campaign_response = personalize.create_campaign(
    name = "DEMO-campaign",
    solutionVersionArn = solution_version_arn,
    minProvisionedTPS = 1
)

campaign_arn = create_campaign_response['campaignArn']
print(json.dumps(create_campaign_response, indent=2))

{
  "campaignArn": "arn:aws:personalize:us-west-2:237539672711:campaign/DEMO-campaign", 
  "ResponseMetadata": {
    "RetryAttempts": 0, 
    "HTTPStatusCode": 200, 
    "RequestId": "527e97ba-683c-4dc7-8218-00716f22c904", 
    "HTTPHeaders": {
      "date": "Tue, 04 Dec 2018 00:54:17 GMT", 
      "x-amzn-requestid": "527e97ba-683c-4dc7-8218-00716f22c904", 
      "content-length": "83", 
      "content-type": "application/x-amz-json-1.1", 
      "connection": "keep-alive"
    }
  }
}


#### Wait for Campaign to Have ACTIVE Status

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_campaign_response = personalize.describe_campaign(
        campaignArn = campaign_arn
    )
    status = describe_campaign_response["campaign"]["status"]
    print("Campaign: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

Campaign: CREATE PENDING
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: ACTIVE


### Get Recommendations

#### Select a User and an Item

In [None]:
items = pd.read_csv('./ml-100k/u.item', sep='|', usecols=[0,1], encoding='latin-1')
items.columns = ['ITEM_ID', 'TITLE']

user_id, item_id, _ = data.sample().values[0]
item_title = items.loc[items['ITEM_ID'] == item_id].values[0][-1]
print("USER: {}".format(user_id))
print("ITEM: {}".format(item_title))

items

USER: 711
ITEM: Silence of the Lambs, The (1991)


Unnamed: 0,ITEM_ID,TITLE
0,1,Toy Story (1995)
1,2,GoldenEye (1995)
...,...,...
1680,1681,You So Crazy (1994)
1681,1682,Scream of Stone (Schrei aus Stein) (1991)


#### Call GetRecommendations

In [None]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = str(user_id),
    itemId = str(item_id)
)

item_list = get_recommendations_response['itemList']
title_list = [items.loc[items['ITEM_ID'] == int(item['itemId'])].values[0][-1] for item in item_list]

print("Recommendations: {}".format(json.dumps(title_list, indent=2)))

Recommendations: [
  "Godfather, The (1972)", 
  "Contact (1997)", 
  "Titanic (1997)", 
  "Star Wars (1977)", 
  "Fargo (1996)", 
  "Liar Liar (1997)", 
  "Evita (1996)", 
  "Jerry Maguire (1996)", 
  "Scream (1996)", 
  "Devil's Advocate, The (1997)", 
  "Full Monty, The (1997)", 
  "Conspiracy Theory (1997)", 
  "Edge, The (1997)", 
  "Sense and Sensibility (1995)", 
  "English Patient, The (1996)", 
  "Twelve Monkeys (1995)", 
  "L.A. Confidential (1997)", 
  "As Good As It Gets (1997)", 
  "In & Out (1997)", 
  "Rock, The (1996)", 
  "Return of the Jedi (1983)", 
  "Amistad (1997)", 
  "Men in Black (1997)", 
  "Truth About Cats & Dogs, The (1996)", 
  "Alien: Resurrection (1997)"
]
