# Make a set of Amazon Personalize solution versions and deploy a campaign
Now that the data has been generated, the datasets have been created, and the
data has been imported, we can create Personalize solutions, solution
versions, and campaigns.

<a id='contents' />

## Table of contents

1. [Loading libraries and data](#loading)
2. [Create solution and solution versions](#solution)
3. [Solution version metrics](#metrics)
4. [Hyperparameter optimization](#hpo)
5. [Create campaigns](#campaigns)

<a id='loading' />

## Loading libraries and data
[(back to top)](#contents)

In [1]:
account_num = '<YOUR_ACCOUNT_NUMBER>'
import json
import boto3
import time
from tqdm import tqdm_notebook
import pandas as pd
import numpy as np
#We import the metrics functions from the metricsPersonalize.py script.
from metricsPersonalize import mean_reciprocal_rank, ndcg_at_k, precision_at_k

region   = boto3.Session().region_name 
print(region)

personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

In [3]:
dataset_group_name = 'video-dataset-group'
dataset_group_arn = 'arn:aws:personalize:{}:{}:dataset-group/{}'.format(region, account_num, dataset_group_name)

MAX_WAIT_TIME = 60*60 
SLEEP_TIME    = 60    

<a id='solution' />

## Create solution and solution versions
[(back to top)](#contents)

**Definition of important functions:**

A trained model is known as a solution version. Each time you train a model for a solution, it is assigned a new solution version.
The helper functions below check whether a solution or campaign already exists, wait for training to finish, and create new solution versions in our dataset group based on a specific Personalize recipe.

In [4]:
#Function to verify if the solution already exists
def solution_exists(solution_arn):
    _exists = False
    try:
        _resp   = personalize.describe_solution(solutionArn = solution_arn)
        _exists = True
    except Exception as e:
        pass
        
    return _exists
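
The check above treats every exception as "the solution does not exist", which also hides permission or throttling errors. A minimal sketch of a narrower check follows. It assumes the client is passed in as a parameter, and the name `solution_exists_strict` is hypothetical. boto3 clients expose modeled service errors under `client.exceptions`, and `describe_solution` documents `ResourceNotFoundException`.

```python
def solution_exists_strict(client, solution_arn):
    """Return True if the solution exists, False only when it is not found.

    Hypothetical helper: `client` is assumed to be a boto3 Personalize client.
    Any other error (permissions, throttling, bad input) is left to propagate.
    """
    try:
        client.describe_solution(solutionArn=solution_arn)
    except client.exceptions.ResourceNotFoundException:
        return False
    return True
```

With the notebook's client this would be called as `solution_exists_strict(personalize, solution_arn)`.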

In [5]:
#If the solution already exists, get the solution version.
def get_existing_solution_version(recipe_arn, name):
    _solution_version_arn = ''
    _solution_arn = 'arn:aws:personalize:{}:{}:solution/{}'.format(region, account_num,name)
    _status = 'UNKNOWN'
    
    try:
        _resp = personalize.describe_solution(solutionArn = _solution_arn)
        _latest_version = _resp['solution']['latestSolutionVersion']
        _status = _latest_version['status']
        if _status in ['CREATE IN_PROGRESS','ACTIVE']:
            _solution_version_arn = _latest_version['solutionVersionArn']
    except Exception as e:
        pass
        
    return _solution_version_arn, _status, _solution_arn

In [6]:
#If campaign is already created: get the campaign ARN and status
def get_existing_campaign(recipe_arn, name):
    _solution_version_arn = ''
    _solution_arn = 'arn:aws:personalize:{}:{}:solution/{}'.format(region, account_num,
                                                                   name)
    _campaign_arn = 'arn:aws:personalize:{}:{}:campaign/{}'.format(region, account_num,
                                                                   name)
    _campaign_status = 'UNKNOWN'
    
    try:
        _resp = personalize.describe_campaign(campaignArn = _campaign_arn)
        _campaign_status = _resp['campaign']['status']
        if _campaign_status in ['CREATE IN_PROGRESS','ACTIVE']:
            _campaign_arn = _resp['campaign']['campaignArn']
            _solution_version_arn = _resp['campaign']['solutionVersionArn']
    except Exception as e:
        _solution_version_arn = ''
        _campaign_arn = ''
        
    return _solution_version_arn, _solution_arn, _campaign_status, _campaign_arn

In [7]:
#Wait for Solution Version to Have Active Status
def wait_for_solution_version(solution_version_arn, name):
    # Stop polling once MAX_WAIT_TIME seconds have elapsed
    _latest_time = time.time() + MAX_WAIT_TIME
    
    while time.time() < _latest_time:
        describe_solution_version_response = personalize.describe_solution_version(
            solutionVersionArn = solution_version_arn
        )
        _status = describe_solution_version_response['solutionVersion']['status']

        if _status in ['ACTIVE', 'CREATE FAILED']:
            if _status == 'CREATE FAILED':
                print('*** Solution version creation failed ***')
            break

        print('SolutionVersion: {} - {}...'.format(name, _status))
        time.sleep(SLEEP_TIME)
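
The polling loop above returns nothing, so the caller cannot tell whether training succeeded, failed, or timed out. A small sketch of the same pattern that returns the last observed status is shown here. The name `wait_for_status` and its parameters are hypothetical, and the status lookup is passed in as a zero-argument callable so the helper does not depend on a particular API call.

```python
import time

def wait_for_status(describe_status, terminal=('ACTIVE', 'CREATE FAILED'),
                    max_wait=60 * 60, sleep_time=60,
                    sleep=time.sleep, clock=time.time):
    """Poll describe_status() until a terminal status or the deadline.

    Returns the last status seen. If that status is not in `terminal`,
    the wait timed out. `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + max_wait
    status = describe_status()
    while status not in terminal and clock() < deadline:
        sleep(sleep_time)
        status = describe_status()
    return status
```

For a solution version, the callable could be `lambda: personalize.describe_solution_version(solutionVersionArn=arn)['solutionVersion']['status']`, where `arn` is the solution version ARN.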

**Create a single solution version**

In [8]:
#Create a solution version: 
def make_solution_version(recipe_arn, name, dataset_group_arn):
    print('Entered make_solution_version for {}'.format(name))
    
    #Create solution
    create_solution_response = personalize.create_solution(
        name = name,
        datasetGroupArn = dataset_group_arn,
        recipeArn       = recipe_arn
    )
    
    _solution_arn = create_solution_response['solutionArn']
    time.sleep(20)
    
    print('created solution: {}'.format(_solution_arn))


    #Create solution version
    print('Creating a new solution version for solution: {}...'.format(_solution_arn))
    create_solution_version_response = personalize.create_solution_version(
        solutionArn = _solution_arn
    )
    
    _solution_version_arn = create_solution_version_response['solutionVersionArn']

    wait_for_solution_version(_solution_version_arn, name)
        
    print('Exiting wait for solution version for {}'.format(name))
    return(_solution_arn, _solution_version_arn)

**Create multiple solution versions in parallel**

In [9]:
from multiprocessing import Process

def make_solution_versions_in_parallel(recipes, dg_arn):
    jobs = []
    for i in recipes:
        p = Process(target = make_solution_version, args=(i[0], i[1], dg_arn))
        jobs.append(p)
        
    for p in jobs:
        p.start()
        
    for p in jobs:
        p.join()
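
Note that a `Process` target's return value is discarded, so the ARNs returned by `make_solution_version` never reach the parent process. Because each call mostly waits on API responses, one alternative is to run the calls in threads and collect the results. The sketch below is an illustration, not the notebook's method: the name `run_in_parallel` is hypothetical, and it assumes the creation function can safely be called from several threads.

```python
from concurrent.futures import ThreadPoolExecutor

def run_in_parallel(make_fn, recipes, dg_arn):
    """Call make_fn(recipe_arn, name, dg_arn) for each [recipe_arn, name] pair.

    Returns a dict mapping each solution name to make_fn's return value.
    An exception raised inside make_fn is re-raised here by result().
    """
    # One worker per recipe; at least one so an empty list is still valid.
    with ThreadPoolExecutor(max_workers=max(1, len(recipes))) as pool:
        futures = {name: pool.submit(make_fn, recipe_arn, name, dg_arn)
                   for recipe_arn, name in recipes}
        return {name: future.result() for name, future in futures.items()}
```

A call such as `run_in_parallel(make_solution_version, recipes, dataset_group_arn)` would then return the `(solution_arn, solution_version_arn)` tuple for each solution name.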


**Tried recipes:**

| Use it? | Recipe | Description |
|-------- | -------- |:------------|
| Y | aws-popularity-count | Calculates the popularity of each item based on the count of events against that item in the user-item interactions dataset. |
| Y | aws-hrnn | Predicts items a user will interact with. A hierarchical recurrent neural network that can model the temporal order of user-item interactions. |
| Y | aws-hrnn-metadata | Predicts items a user will interact with. HRNN with additional features derived from contextual metadata (user-item interactions dataset), user metadata (users dataset), and item metadata (items dataset). |

In [10]:
recipes = [['arn:aws:personalize:::recipe/aws-popularity-count', 'video-popularity-count'],
           ['arn:aws:personalize:::recipe/aws-hrnn',             'video-hrnn'],
           ['arn:aws:personalize:::recipe/aws-hrnn-metadata',    'video-hrnn-metadata']]

In [11]:
make_solution_versions_in_parallel(recipes, dataset_group_arn)

<a id='metrics' />

## Solution version metrics
[(back to top)](#contents)

In [96]:
def display_solution_metrics(recipes):
    print('{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}'.format( 'NDCG@25', 'NDCG@10', 'NDCG@5', 'rank@25', 'prec@25', 'prec@10', 'prec@5','name'))
    
    for c in recipes:
        (_solution_version_arn, _status, _solution_arn) = \
            get_existing_solution_version(c[0], c[1])
        if _status != 'ACTIVE':
            print('Solution version for {} does not exist'.format(c[1]))
        else:
            _get_solution_metrics_response = personalize.get_solution_metrics(
                solutionVersionArn = _solution_version_arn
            )

            print('{:.3f}\t{:.3f}\t{:.3f}\t{:.3f}\t{:.3f}\t{:.3f}\t{:.3f}\t{}'.format(
                                          _get_solution_metrics_response['metrics']['normalized_discounted_cumulative_gain_at_25'],
                                          _get_solution_metrics_response['metrics']['normalized_discounted_cumulative_gain_at_10'],
                                          _get_solution_metrics_response['metrics']['normalized_discounted_cumulative_gain_at_5'],             
                                          _get_solution_metrics_response['metrics']['mean_reciprocal_rank_at_25'],
                                          _get_solution_metrics_response['metrics']['precision_at_25'],
                                          _get_solution_metrics_response['metrics']['precision_at_10'],
                                          _get_solution_metrics_response['metrics']['precision_at_5'],
                                          c[1]
                                         ))

***Metrics documentation***

1. **Precision_at_K**: 
The number of relevant recommendations out of the top K recommendations, divided by K.
Example: the recommendations for user 1 are A, B, C, D, E and the user liked B and E, so the precision at 5 is 2/5 = 0.4.

2. **Mean_reciprocal_rank_at_25**:
The mean, over all users, of the reciprocal rank of the first relevant recommendation out of the top 25 recommendations.

Example for reciprocal rank at 5: 
the recommendations for user 1 are A, B, C, D, E and the user liked B and E. The first relevant item (B) is at position 2, so the reciprocal rank is 1/2.

3. **NDCG (normalized discounted cumulative gain at K)**: DCG / ideal DCG 

*DCG (discounted cumulative gain at K)*: 

Each recommendation is discounted (given a lower weight) by a factor that depends on its position: a weighting factor of 1/log(1 + position).
The discounted values of the **relevant** recommendations in the top K recommendations are summed together.

*Ideal DCG*: 

The value of DCG when the top K recommendations are sorted by relevance, so that the relevant recommendations come first. As before, the discounted values of the **relevant** recommendations in the top K recommendations are summed together.
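
The definitions above can be checked with a short worked example. This is a sketch with hypothetical function names, not the functions imported from `metricsPersonalize.py`. It assumes binary relevance (1 = liked, 0 = not liked) and a base-2 logarithm. The base is an assumption, but it does not change NDCG because the same constant factor appears in both DCG and ideal DCG.

```python
import math

def precision_at_k_example(relevance, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(relevance[:k]) / k

def reciprocal_rank_example(relevance, k):
    """1 / position of the first relevant item in the top k, or 0 if none."""
    for position, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / position
    return 0.0

def dcg_example(relevance, k):
    """Sum of relevance discounted by 1 / log2(1 + position)."""
    return sum(rel / math.log2(1 + position)
               for position, rel in enumerate(relevance[:k], start=1))

def ndcg_example(relevance, k):
    """DCG divided by the DCG of the same items sorted by relevance."""
    ideal = dcg_example(sorted(relevance, reverse=True), k)
    return dcg_example(relevance, k) / ideal if ideal > 0 else 0.0

# Recommendations A, B, C, D, E where the user liked B and E
relevance = [0, 1, 0, 0, 1]
print(precision_at_k_example(relevance, 5))   # 0.4
print(reciprocal_rank_example(relevance, 5))  # 0.5
print(round(ndcg_example(relevance, 5), 3))   # 0.624
```

Averaging the reciprocal rank over all users gives the mean reciprocal rank.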


In [None]:
display_solution_metrics(recipes)

<a id='hpo' />

## Hyperparameter optimization
[(back to top)](#contents)

We will tune the following hyperparameters, which are allowed for HPO, for the aws-hrnn-metadata recipe, which was the one with the best performance.

**recency_mask**:

Determines whether the model should consider the latest popularity trends in the Interactions dataset. Latest popularity trends might include sudden changes in the underlying patterns of interaction events. To train a model that places more weight on recent events, set recency_mask to true. To train a model that equally weighs all past interactions, set recency_mask to false. To get good recommendations using an equal weight, you might need a larger training dataset.

**bptt**:

Determines whether to use the back-propagation through time technique. Back-propagation through time is a technique that updates weights in recurrent neural network-based algorithms. Use bptt for long-term credits to connect delayed rewards to early events. For example, a delayed reward can be a purchase made after several clicks. An early event can be an initial click. Even within the same event types, such as a click, it’s a good idea to consider long-term effects and maximize the total rewards. To consider long-term effects, use larger bptt values. Using a larger bptt value requires larger datasets and more time to process.
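
With `performHPO = True` and no further configuration, Personalize uses the recipe's default search ranges. If you want to restrict the search to the two hyperparameters described above, `create_solution` also accepts a `solutionConfig` argument containing an `hpoConfig`. The sketch below only builds that dictionary. The function name, the range values, and the job counts are assumptions for illustration. Passing `recency_mask` as a categorical range of `'true'`/`'false'` is also an assumption that should be checked against the recipe documentation.

```python
def build_hpo_config(bptt_min=2, bptt_max=32, max_jobs=8, max_parallel=4):
    """Build a solutionConfig dict limiting HPO to bptt and recency_mask.

    Hypothetical helper: all default values here are illustrative assumptions.
    """
    return {
        'hpoConfig': {
            'hpoResourceConfig': {
                # The API expects these two limits as strings.
                'maxNumberOfTrainingJobs': str(max_jobs),
                'maxParallelTrainingJobs': str(max_parallel),
            },
            'algorithmHyperParameterRanges': {
                'integerHyperParameterRanges': [
                    {'name': 'bptt', 'minValue': bptt_min, 'maxValue': bptt_max},
                ],
                # Assumption: the boolean recency_mask is searched as a
                # categorical range over the strings 'true' and 'false'.
                'categoricalHyperParameterRanges': [
                    {'name': 'recency_mask', 'values': ['true', 'false']},
                ],
            },
        }
    }
```

The result would be passed as `solutionConfig = build_hpo_config()` alongside `performHPO = True` in the `create_solution` call.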

In [12]:
#Create solution versions with HPO

def make_solution_version_hpo(recipe_arn, name, dataset_group_arn):
    print('Entered make_solution_version_hpo for {}'.format(name))   
    print('Creating new solution for {}...'.format(name))
    
    create_solution_response = personalize.create_solution(
        name = name,
        datasetGroupArn = dataset_group_arn,
        recipeArn       = recipe_arn,
        performHPO = True,
        performAutoML= False
    )
    
    _solution_arn = create_solution_response['solutionArn']
    time.sleep(20)
    print('created solution: {}'.format(_solution_arn))


    print('Creating a new solution version for solution: {}...'.format(_solution_arn))
    create_solution_version_response = personalize.create_solution_version(
        solutionArn = _solution_arn
    )
    
    _solution_version_arn = create_solution_version_response['solutionVersionArn']
    wait_for_solution_version(_solution_version_arn, name)
        
    print('Exiting wait for solution version for {}'.format(name))
    return(_solution_arn, _solution_version_arn)

In [13]:
from multiprocessing import Process

def make_solution_versions_hpo_in_parallel(recipes, dg_arn):
    jobs = []
    for i in recipes:
        p = Process(target = make_solution_version_hpo, args=(i[0], i[1], dg_arn))
        jobs.append(p)
        
    for p in jobs:
        p.start()
        
    for p in jobs:
        p.join()

In [14]:
recipes_hpo = [['arn:aws:personalize:::recipe/aws-hrnn',          'video-hrnn-hpo'],
               ['arn:aws:personalize:::recipe/aws-hrnn-metadata', 'video-hrnn-metadata-hpo']]

In [15]:
make_solution_versions_hpo_in_parallel(recipes_hpo, dataset_group_arn)

In [None]:
display_solution_metrics(recipes+recipes_hpo)

We observe that the solution with the best metrics is the HRNN-metadata with HPO.
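
Instead of reading the printed table by eye, the comparison can also be made in code. The sketch below assumes you have collected the `'metrics'` dictionary returned by `get_solution_metrics` for each solution into a dict keyed by solution name. The function name `best_solution` is hypothetical, and ranking by NDCG@25 is just one possible choice.

```python
def best_solution(metrics_by_name,
                  metric='normalized_discounted_cumulative_gain_at_25'):
    """Return the solution name with the highest value for `metric`.

    metrics_by_name maps a solution name to its metrics dict.
    Raises ValueError if metrics_by_name is empty.
    """
    return max(metrics_by_name, key=lambda name: metrics_by_name[name][metric])
```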

<a id='campaigns' />

## Create campaigns
[(back to top)](#contents)

We will create a campaign for the best solution: HRNN-metadata with HPO

In [16]:
#Making a single campaign
def make_campaign(recipe_arn, name):
    print('Entered make_campaign for {}'.format(name))
    
    _solution_version_arn=get_existing_solution_version(recipe_arn, name)[0]
    
    create_campaign_response = personalize.create_campaign(
        name = name,
        solutionVersionArn = _solution_version_arn,
        minProvisionedTPS = 1
    )
    
    _campaign_arn = create_campaign_response['campaignArn']
    print('Waiting for campaign to become active : {}...'.format(name))

    latest_time = time.time() + MAX_WAIT_TIME
    while time.time() < latest_time:
        describe_campaign_response = personalize.describe_campaign(
            campaignArn = _campaign_arn
        )
        status = describe_campaign_response['campaign']['status']
        print('Campaign: {} - {}'.format(name, status))

        if status == 'ACTIVE' or status == 'CREATE FAILED':
            break

        time.sleep(SLEEP_TIME)
        
    print('Exiting make_campaign for {}'.format(name))
    return(_solution_version_arn, _campaign_arn)

In [17]:
campaigns = ['arn:aws:personalize:::recipe/aws-hrnn-metadata', 'video-hrnn-metadata-hpo']

In [18]:
make_campaign(campaigns[0], campaigns[1])
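
Once the campaign is active, it can be queried through the `personalize-runtime` client created at the start of the notebook. The following is a minimal sketch, assuming the campaign ARN and a user ID from the interactions dataset are available. The helper name `top_item_ids` is hypothetical, while `get_recommendations` and its `campaignArn`, `userId`, and `numResults` parameters are part of the runtime API.

```python
def top_item_ids(runtime_client, campaign_arn, user_id, num_results=10):
    """Return the recommended item IDs for one user, best first.

    Hypothetical helper: runtime_client is assumed to be a boto3
    'personalize-runtime' client.
    """
    response = runtime_client.get_recommendations(
        campaignArn=campaign_arn,
        userId=str(user_id),   # the API expects the user ID as a string
        numResults=num_results,
    )
    return [item['itemId'] for item in response['itemList']]
```

For example, `top_item_ids(personalize_runtime, campaign_arn, user_id)`, where `campaign_arn` is the second value returned by `make_campaign`.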