# Santander Bank Recommender System Demo
## Creating and Deploying an AWS Personalize Campaign

This notebook demonstrates how to create and deploy an AWS Personalize campaign using the Santander dataset we prepared earlier. After setting up the AWS Personalize client, we'll go through the following steps:

1. Create a dataset group
2. Define schemas and create datasets
3. Create and start import jobs for each dataset
4. Create a solution (train a model)
5. Create a campaign (deploy the model)

This process will result in a deployed recommendation model that we can use to generate personalized product recommendations for Santander Bank customers.

## Setup and Initialization

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
from io import StringIO
import boto3
import json
import time
import os

from sklearn.model_selection import train_test_split
from tqdm import tqdm

In [2]:
# Initialize the Personalize client
personalize = boto3.client('personalize')

In [3]:
# Poll a dataset group until it reaches the desired status.
# Note: this helper calls describe_dataset_group, so it only works for dataset group ARNs.
def wait_for_resource(resource_arn, desired_status):
    while True:
        response = personalize.describe_dataset_group(datasetGroupArn=resource_arn)
        status = response['datasetGroup']['status']
        if status == desired_status:
            print(f"Resource {resource_arn} is now in {desired_status} state")
            break
        elif status == 'CREATE FAILED':
            # Stop here rather than continuing the notebook with an unusable dataset group
            raise RuntimeError(f"Resource {resource_arn} creation failed")
        print(f"Resource {resource_arn} is in {status} state. Waiting...")
        time.sleep(60)  # Wait for 60 seconds before checking again
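
The helper above is tied to dataset groups. As a minimal sketch of a more general poller, the function below accepts any Personalize `describe_*` method together with the top-level response key that holds the resource. The name `wait_until_active` and its parameters are assumptions made for illustration, not part of the original notebook.

```python
import time


def wait_until_active(describe_fn, result_key, poll_seconds=60, max_polls=120,
                      **describe_kwargs):
    """Poll a Personalize describe_* call until the resource reports ACTIVE.

    describe_fn     -- a bound client method, e.g. personalize.describe_dataset_group
    result_key      -- top-level response key for the resource, e.g. "datasetGroup"
    describe_kwargs -- the ARN argument for that call, e.g. datasetGroupArn=...
    """
    for _ in range(max_polls):
        status = describe_fn(**describe_kwargs)[result_key]["status"]
        if status == "ACTIVE":
            return status
        if status.endswith("FAILED"):
            # Personalize reports failures as e.g. "CREATE FAILED"
            raise RuntimeError(f"{result_key} entered status {status!r}")
        print(f"{result_key} is in {status} state. Waiting...")
        time.sleep(poll_seconds)
    raise TimeoutError(f"{result_key} did not become ACTIVE after {max_polls} polls")


# Hypothetical usage with the client created earlier in this notebook:
# wait_until_active(personalize.describe_dataset_group, "datasetGroup",
#                   datasetGroupArn=dataset_group_arn)
```

Passing the describe method as an argument keeps the polling logic in one place, and the `max_polls` limit prevents a stuck resource from blocking the notebook indefinitely.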

## Step 1: Create a Dataset Group

A dataset group is a container for Amazon Personalize components, including datasets, event trackers, solutions, filters, campaigns, and batch inference jobs.

In [4]:
dataset_group_name = "SantanderRecommender2"
response = personalize.create_dataset_group(name=dataset_group_name)
dataset_group_arn = response['datasetGroupArn']
wait_for_resource(dataset_group_arn, 'ACTIVE')

Resource arn:aws:personalize:us-east-1:279988746206:dataset-group/SantanderRecommender2 is in CREATE PENDING state. Waiting...
Resource arn:aws:personalize:us-east-1:279988746206:dataset-group/SantanderRecommender2 is now in ACTIVE state


## Step 2: Define Schemas and Create Datasets

We need to define schemas for our Users, Items, and Interactions datasets. These schemas should match the structure of the CSV files we prepared earlier.

In [5]:
schemas = {
    'Users': {
        "name": "UserSchema",
        "schema": json.dumps({
            "type": "record",
            "name": "Users",
            "namespace": "com.amazonaws.personalize.schema",
            "fields": [
                {"name": "USER_ID", "type": "string"},
                {"name": "AGE", "type": "int"},
                {"name": "CUSTOMER_TENURE", "type": "int"},
                {"name": "INCOME", "type": "float"}
            ],
            "version": "1.0"
        })
    },
    'Items': {
        "name": "ItemSchema",
        "schema": json.dumps({
            "type": "record",
            "name": "Items",
            "namespace": "com.amazonaws.personalize.schema",
            "fields": [
                {"name": "ITEM_ID", "type": "string"},
                {
                    "name": "PRODUCT_DESCRIPTION",
                    "type": "string",
                    "categorical": True
                }
            ],
            "version": "1.0"
        })
    },
    'Interactions': {
        "name": "InteractionSchema",
        "schema": json.dumps({
            "type": "record",
            "name": "Interactions",
            "namespace": "com.amazonaws.personalize.schema",
            "fields": [
                {"name": "USER_ID", "type": "string"},
                {"name": "ITEM_ID", "type": "string"},
                {"name": "TIMESTAMP", "type": "long"},
                {"name": "EVENT_TYPE", "type": "string"}
            ],
            "version": "1.0"
        })
    }
}
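
Because the schemas must match the CSV files, a quick check of the header row can catch a mismatch before an import job is started. The sketch below is one way to do that with the standard library only. The function name `missing_schema_fields` and the sample CSV row are made up for illustration.

```python
import csv
import io
import json


def missing_schema_fields(schema_json, csv_text):
    """Return the schema field names that are absent from the CSV header row."""
    field_names = [field["name"] for field in json.loads(schema_json)["fields"]]
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [name for name in field_names if name not in header]


# Illustrative data only: a schema shaped like the Interactions schema above
# and a made-up CSV sample whose header lacks EVENT_TYPE.
example_schema = json.dumps({
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
        {"name": "EVENT_TYPE", "type": "string"},
    ],
    "version": "1.0",
})
example_csv = "USER_ID,ITEM_ID,TIMESTAMP\nuser-1,item-1,1453766400\n"
print(missing_schema_fields(example_schema, example_csv))
```

With the `schemas` dictionary defined above, the same function could be called as `missing_schema_fields(schemas['Interactions']['schema'], csv_text)`, where `csv_text` holds the first lines of the prepared file.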

In [6]:
datasets = {}
for dataset_type, schema_info in schemas.items():
    try:
        print(f"Creating a schema for {dataset_type}")
        create_schema_response = personalize.create_schema(
            name=schema_info['name'],
            schema=schema_info['schema']
        )
        schema_arn = create_schema_response['schemaArn']
    
        # Create dataset
        create_dataset_response = personalize.create_dataset(
            name=f"{dataset_type.capitalize()}Dataset",
            schemaArn=schema_arn,
            datasetGroupArn=dataset_group_arn,
            datasetType=dataset_type.upper()
        )
        datasets[dataset_type] = create_dataset_response['datasetArn']
    except Exception as e:
        print(f"Error creating schema or dataset for {dataset_type}: {e}")

## Step 3: Create and Start Import Jobs for Each Dataset

Now that we have created our datasets, we need to import the data from our S3 bucket into these datasets.

In [7]:
bucket_name = 'souhail-work-bucket-1'
folder = 'personalize-data'
iam_role_arn = 'arn:aws:iam::279988746206:role/PersonalizeServiceRole'

for dataset_type, dataset_arn in datasets.items():
    try:
        print(f"Creating dataset import job for {dataset_type}")
        job_name = f"{dataset_type.capitalize()}ImportJob"
        data_location = f"s3://{bucket_name}/{folder}/{dataset_type}.csv"
        
        response = personalize.create_dataset_import_job(
            jobName=job_name,
            datasetArn=dataset_arn,
            dataSource={'dataLocation': data_location},
            roleArn=iam_role_arn
        )
        # Note: import jobs run asynchronously. Each job must reach ACTIVE status
        # before a solution version can be trained on the imported data.
    except Exception as e:
        print(f"Error creating import job for {dataset_type}: {e}")

Creating dataset import job for Users
Creating dataset import job for Items
Creating dataset import job for Interactions
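
The import jobs above are only started, not awaited. A minimal sketch of waiting for all of them follows. It assumes the job ARNs were collected from `response['datasetImportJobArn']` inside the loop, which the cell above does not do. The function name `wait_for_import_jobs` is hypothetical.

```python
import time


def wait_for_import_jobs(client, job_arns, poll_seconds=60, max_polls=240):
    """Block until every dataset import job in job_arns is ACTIVE."""
    pending = set(job_arns)
    for _ in range(max_polls):
        # Iterate over a sorted copy so the set can shrink while we loop
        for arn in sorted(pending):
            job = client.describe_dataset_import_job(
                datasetImportJobArn=arn
            )["datasetImportJob"]
            if job["status"] == "ACTIVE":
                pending.discard(arn)
            elif job["status"] == "CREATE FAILED":
                reason = job.get("failureReason", "no reason given")
                raise RuntimeError(f"Import job {arn} failed: {reason}")
        if not pending:
            return
        print(f"{len(pending)} import job(s) still running. Waiting...")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Import jobs still pending: {sorted(pending)}")


# Hypothetical usage, with import_job_arns gathered in the loop above:
# wait_for_import_jobs(personalize, import_job_arns)
```

Polling all jobs in one loop lets the three imports proceed in parallel instead of waiting for each one in turn.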


## Step 4: Create a Solution (Train a Model)

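In Amazon Personalize, a solution combines a recipe with a dataset group, and each solution version is a model trained from that configuration. We'll use the User-Personalization-v2 recipe (aws-user-personalization-v2), which is suited to generating personalized recommendations.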

In [8]:
solution_name = "SantanderSolutionv2"
recipe_arn = "arn:aws:personalize:::recipe/aws-user-personalization-v2"  # User-Personalization-v2 recipe

create_solution_response = personalize.create_solution(
    name=solution_name,
    datasetGroupArn=dataset_group_arn,
    recipeArn=recipe_arn
)
solution_arn = create_solution_response['solutionArn']

In [9]:
# Create a solution version (train the model)
create_solution_version_response = personalize.create_solution_version(solutionArn=solution_arn)
solution_version_arn = create_solution_version_response['solutionVersionArn']

# Note: Training runs asynchronously and can take a significant amount of time.
# The solution version must reach ACTIVE status before a campaign can be created from it.
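
Once training has finished, the offline evaluation metrics of the solution version can be inspected before deploying it. The sketch below checks the status once and fetches the metrics only when the version is ACTIVE. The function name `fetch_offline_metrics` is an assumption for illustration. The two client calls it uses are `describe_solution_version` and `get_solution_metrics`.

```python
def fetch_offline_metrics(client, solution_version_arn):
    """Return the metrics dict for a trained solution version, or None if not ready."""
    status = client.describe_solution_version(
        solutionVersionArn=solution_version_arn
    )["solutionVersion"]["status"]
    if status != "ACTIVE":
        # Training is still running (or has failed), so there are no metrics yet
        print(f"Solution version is in {status} state; metrics are not available yet.")
        return None
    response = client.get_solution_metrics(solutionVersionArn=solution_version_arn)
    return response["metrics"]


# Hypothetical usage with the names from the cells above:
# metrics = fetch_offline_metrics(personalize, solution_version_arn)
# if metrics is not None:
#     for name, value in sorted(metrics.items()):
#         print(f"{name}: {value}")
```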

## Step 5: Create a Campaign (Deploy the Model)

Finally, we'll create a campaign, which deploys our trained model and makes it available for generating recommendations.

In [10]:
campaign_name = "SantanderCampaign"
create_campaign_response = personalize.create_campaign(
    name=campaign_name,
    solutionVersionArn=solution_version_arn,
    minProvisionedTPS=1  # Minimum number of transactions per second that the campaign can support
)
campaign_arn = create_campaign_response['campaignArn']

In [11]:
print(f"Campaign created successfully. Campaign ARN: {campaign_arn}")

Campaign created successfully. Campaign ARN: arn:aws:personalize:us-east-1:279988746206:campaign/SantanderCampaign
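
After the campaign reaches ACTIVE status, recommendations are requested through the separate `personalize-runtime` client rather than the `personalize` client used so far. The following is a minimal sketch. The function name `recommend_products` and the example user ID are hypothetical.

```python
def recommend_products(runtime_client, campaign_arn, user_id, num_results=5):
    """Return the recommended item IDs for one user, best match first."""
    response = runtime_client.get_recommendations(
        campaignArn=campaign_arn,
        userId=str(user_id),  # USER_ID is declared as a string in the schema
        numResults=num_results,
    )
    return [item["itemId"] for item in response["itemList"]]


# Hypothetical usage; "12345" is a placeholder, not a real customer ID:
# import boto3
# personalize_runtime = boto3.client('personalize-runtime')
# print(recommend_products(personalize_runtime, campaign_arn, "12345"))
```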


## Conclusion

We have successfully created and deployed an AWS Personalize campaign for the Santander Bank recommender system. Here's a summary of what we've accomplished:

1. Created a dataset group
2. Defined schemas and created datasets for Users, Items, and Interactions
3. Imported our prepared data into these datasets
4. Created a solution and trained a solution version using the User-Personalization-v2 recipe
5. Deployed the model as a campaign

Remember to keep an eye on your AWS usage and costs, especially when working with large datasets or high-traffic recommendation scenarios.
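
One way to keep costs under control is to delete the resources once the demo is no longer needed. Personalize resources depend on each other, so they have to be removed in reverse order of creation. The sketch below only builds that ordered plan. The function name `teardown_plan` is hypothetical, and it assumes the schema ARNs were saved, which the notebook above does not do.

```python
def teardown_plan(campaign_arn, solution_arn, dataset_arns, schema_arns,
                  dataset_group_arn):
    """Return (client method name, kwargs) pairs in a safe deletion order."""
    steps = [
        ("delete_campaign", {"campaignArn": campaign_arn}),
        ("delete_solution", {"solutionArn": solution_arn}),
    ]
    # Datasets must be deleted before the schemas they reference
    steps += [("delete_dataset", {"datasetArn": arn}) for arn in dataset_arns]
    steps += [("delete_schema", {"schemaArn": arn}) for arn in schema_arns]
    # The dataset group goes last, once it no longer contains anything
    steps.append(("delete_dataset_group", {"datasetGroupArn": dataset_group_arn}))
    return steps


# Hypothetical usage. Deletions are asynchronous, so each step needs to
# finish before the next one is issued:
# for method_name, kwargs in teardown_plan(campaign_arn, solution_arn,
#                                          list(datasets.values()),
#                                          schema_arns, dataset_group_arn):
#     getattr(personalize, method_name)(**kwargs)
#     # ...wait here until the resource is gone before continuing...
```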