# Introduction

Retailing began with shopkeepers who welcomed people from the neighborhood and came to learn their customers’ needs and preferences.

Now, in our constantly connected world, devices have become a proxy for that relationship, but there is still a need to get to know your customers. Devices provide context, helping us learn what matters to a customer in a particular location at a particular time. The right message at the right moment is the next level in customer service: it can quickly and easily turn intent into action.

People are constantly looking online for product information, deals, local availability and local discounts, and retailers who aren’t there to supply the right information when people raise their virtual hand will lose out.

The Retail Recommender Solution Accelerator was developed to use intelligent, automated methods to give users personalized product recommendations based on their purchase history, their product selections in the e-commerce channel, or their activity in a physical store.

The architecture and methodology extend to other key areas where personalization can enhance customer engagement, such as email campaigns, discounts and offers, content suggestions, and even application configuration or menu options: anywhere a choice can be tailored to give your customers a more customized experience.

For this solution accelerator we use the example of a grocery store that wants to suggest items customers might add to their basket or order, based on each customer's preferences and context, while also drawing on what it learns from other customers and previous purchases.




# Prerequisites

Confirm the following **5 steps** are complete prior to proceeding through the notebook.

**1.** These notebooks are intended to be used with Azure Machine Learning. Follow the instructions in [Get Started with Azure Machine Learning Workspace](https://docs.microsoft.com/en-us/azure/machine-learning/quickstart-create-resources) to set this up.


**2.** You will need an Azure Storage Account for managing the files used by the Azure services in the Retail Recommender.  You can use an existing account, or follow these instructions to [Create a storage account](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-create?toc=%2Fazure%2Fstorage%2Fblobs%2Ftoc.json&tabs=azure-portal).  **NOTE:** The default settings and locally-redundant storage are sufficient for running this example, but you might want to modify this based on your organization's data policies.  


**3.** Configure the storage account for use with Intelligent Recommendations. Follow steps 1-3 of [Set up a container, root folder, and log folder](https://docs.microsoft.com/en-us/industry/retail/intelligent-recommendations/deploy-data-lake-storage#set-up-a-container-root-folder-and-log-folder), leaving the default option for the container (Private Access). You only need to add the root and log folders now; we will add the others later.

**NOTE:**  In this example we named our root folder "groceries" rather than "ir_root".  If you change this, you will need to update the references in these notebooks.

**TIP:** Folders in a container can only be created by uploading a file and specifying a folder name. We suggest creating a blank text file, then uploading it using the Azure portal or Azure Storage Explorer. If you use the portal, choose to upload the file, then expand the "Advanced" option and specify the folder name so the folder is created at the same time.


![Uploading a file and creating a folder in the Azure portal](./images/uploadfile_addfolder.png)


**4.** Set up security for the container by following steps 1-5 of [Set up security for the container](https://docs.microsoft.com/en-us/industry/retail/intelligent-recommendations/deploy-data-lake-storage#set-up-security-for-the-container).

**TIP:** If you do not find "Intelligent Recommendations" in steps 2d and 3d, look for "Dynamics 365 Recommendations Service", a legacy name for this application, and use it instead.

**5.** Note the access key for your Blob Storage account. Instructions for obtaining a key are in [View account access keys](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal#view-account-access-keys).

# Setup & Configuration

The following steps set up the environment we will use to prepare our sample data files and load them into the storage account created in the prerequisites.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime
%matplotlib inline

import os

The following variables should be set based on the Azure Storage Account you created in the prerequisites.

In [2]:
account_name=os.getenv("BLOB_ACCOUNTNAME", "Your_Storage_Account_Name") # Storage account name
container_name=os.getenv("BLOB_CONTAINER", "Your_Container_Name") # Name of Azure blob container
account_key=os.getenv("BLOB_ACCOUNT_KEY", "Your_Storage_Account_Key") # Storage account access key

# The root folder you created in the container for Intelligent Recommendations
intelligent_recs_path = 'groceries'
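
If these values are left at their defaults, the datastore registration later in the notebook fails with an authentication error that can be hard to trace back to this cell. A small guard can catch that early. The sketch below uses a hypothetical helper, `find_placeholders`, and assumes that an unset value is either empty or still begins with "Your_", as the defaults above do.

```python
def find_placeholders(settings):
    """Return the names of settings that are empty or still hold a 'Your_...' default.

    Hypothetical helper: it only checks the placeholder convention used in this
    notebook, not whether the account, container or key actually exist.
    """
    return [
        name
        for name, value in settings.items()
        if not value or str(value).startswith("Your_")
    ]
```

In the notebook it could be called as `find_placeholders({"account_name": account_name, "container_name": container_name, "account_key": account_key})`, which should return an empty list once all three values are set.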

# Data Preparation

This notebook takes example grocery purchase data and transforms it into the file formats used to build a recommender system with the Intelligent Recommendations service. We assume the orders are collected from a web or mobile application that customers use to order online from a grocery store. The customer interaction we leverage in this example is the purchase of an item.

We want to suggest other items customers might add to their basket based on their preferences and environmental context (weather, time of day, etc.), while also learning from their purchase history and from other customers' purchases. For example: "customers like you also buy...", or "customers who buy items x, y and z also buy items a and c."

**Details of the dataset**

The dataset has 38,765 rows of items purchased by grocery shoppers. It was sourced from https://www.kaggle.com/datasets/heeraldedhia/groceries-dataset

It consists of 3 columns:
-   Member_number - an identifier for a customer.
-   Date - the date the member purchased the items; each Member_number + Date combination is treated as one order.
-   itemDescription - the textual description of the item purchased.

Customers will have multiple orders on different days, so we get a history of their purchase patterns.
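
To make the "Member_number + Date = order" rule concrete, the sketch below counts orders per customer on a handful of made-up rows shaped like Groceries_dataset.csv. The values are illustrative only; the same expression can be run against `orders_df`.

```python
import pandas as pd

# Made-up rows with the same three columns as Groceries_dataset.csv
toy = pd.DataFrame({
    "Member_number": [1001, 1001, 1001, 1002, 1002],
    "Date": ["21-07-2015", "21-07-2015", "05-01-2015", "21-07-2015", "21-07-2015"],
    "itemDescription": ["whole milk", "rolls/buns", "yogurt", "whole milk", "soda"],
})

# An order is one unique (Member_number, Date) pair, so drop repeated pairs
# before counting how many orders each customer placed
orders_per_customer = (
    toy[["Member_number", "Date"]]
    .drop_duplicates()
    .groupby("Member_number")
    .size()
)
print(orders_per_customer.to_dict())  # {1001: 2, 1002: 1}
```

Customer 1001 bought two items on 21-07-2015 (one order) and one item on 05-01-2015 (a second order), while customer 1002 made a single two-item order.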


## Import Orders Data

In [None]:
# Load the supporting dataset
orders_df = pd.read_csv('data/Groceries_dataset.csv')
print(orders_df.shape)
orders_df.head()

In [None]:
orders_df.nunique()

The nunique() output shows we will be working with 3,898 customers (Member_number) and 167 distinct items (itemDescription), purchased repeatedly across 728 different dates.

Set up the dates: Intelligent Recommendations requires timestamps in a specific format (for example, 2018-05-15T13:30:00.000Z). Since our data has no time of day, we set the time portion to zeros.

In [None]:
# Parse the day-month-year strings, then write them as YYYY-MM-DDThh:mm:ss.sssZ with the time set to midnight
orders_df['Date'] = pd.to_datetime(orders_df['Date'], format='%d-%m-%Y')
orders_df['Date'] = orders_df['Date'].dt.strftime('%Y-%m-%d') + 'T00:00:00.000Z'
orders_df
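
Because a malformed date would only surface once the files reach Intelligent Recommendations, it can help to check the converted column first. The sketch below uses a hypothetical helper, `invalid_timestamps`, with a pattern inferred from the sample timestamps in the documentation tables in this notebook (such as 2018-05-15T13:30:00.000Z). That pattern is an assumption, not an official specification.

```python
import re

# Pattern inferred from the sample timestamps in the IR documentation tables,
# e.g. 2018-05-15T13:30:00.000Z (an assumption, not an official specification)
IR_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z")


def invalid_timestamps(values):
    """Return the values that do not fully match the sample timestamp pattern."""
    return [v for v in values if not IR_TIMESTAMP.fullmatch(str(v))]
```

After the conversion above, `invalid_timestamps(orders_df['Date'])` should return an empty list. Any value still in the original dd-mm-yyyy form would be returned.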

## Catalog - Items & Variants

Items (and Variants) represent the content, products, offers, etc. that we have available to recommend.  Variants are a subset or grouping of items that inherit from a parent/higher level item.  

[Items & Variants Documentation](https://docs.microsoft.com/en-us/industry/retail/intelligent-recommendations/catalog-data-entity#items-and-variants)

The only mandatory field is **ItemId**; everything else is optional. Intelligent Recommendations can use additional catalog features, including item images, availability, filters, and other groupings or categorizations. See the [Catalog Data Entities Documentation](https://docs.microsoft.com/en-us/industry/retail/intelligent-recommendations/catalog-data-entity) for details on setting these up.

Example dataset:

|ItemId|ItemVariantId|Title|Description|Release Date
|--|--|--|--|--|
|Item1||||2018-05-15T13:30:00.000Z| 
|Item1|Item1Var1|Black Sunglasses|Black sunglasses for children|| 
|Item1|Item1Var2|Brown Sunglasses|Brown sunglasses for adults|2018-08-01T10:45:00.000Z| 
|Item2||Glasses Cleaning Cloth||2019-09-20T18:45:00.000Z|


For our scenario, we will set up a simple catalog of items: the products that customers have purchased. We take all the unique itemDescriptions, use the earliest order date for each one as its "Release Date", and add an itemId to serve as the item's identifier.

In [None]:
# Earliest order date per item (the ISO-formatted date strings sort correctly as text)
products_df = orders_df[['itemDescription','Date']].groupby("itemDescription").min()
products_df.reset_index(inplace=True)
products_df.columns = ['itemDescription', 'earliestDate']

# Assign a sequential, 1-based itemId to each item
products_df.reset_index(inplace=True)
products_df = products_df.rename(columns = {'index':'itemId'})
products_df['itemId'] = products_df['itemId'] + 1

print(products_df.shape)
products_df.head(10)


**Format the dataset for use in Intelligent Recommendations**

In [None]:
# .copy() creates an independent frame so the column assignments below do not trigger pandas' SettingWithCopyWarning
ir_reco_item_variants = products_df[['itemId', 'itemDescription', 'earliestDate']].copy()
ir_reco_item_variants['ItemVariantID'] = ''
ir_reco_item_variants['Title'] = ir_reco_item_variants['itemDescription']
ir_reco_item_variants['Description'] = ir_reco_item_variants['itemDescription']  # Same text as Title for now; this could be edited to add more detail
ir_reco_item_variants['ReleaseDate'] = ir_reco_item_variants['earliestDate']
ir_reco_item_variants = ir_reco_item_variants[['itemId', 'ItemVariantID', 'Title', 'Description', 'ReleaseDate']]
ir_reco_item_variants
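
Since **ItemId** is the only mandatory catalog field, a quick sanity check before export can catch empty ids or repeated rows. The sketch below is a hypothetical local check, `check_catalog`, not part of the Intelligent Recommendations service. It assumes that an (ItemId, ItemVariantId) pair should appear only once, which is consistent with the example table above, where Item1 appears once as a parent row and once per variant.

```python
import pandas as pd


def check_catalog(df, item_col="itemId", variant_col="ItemVariantID"):
    """Return a list of problems found in a catalog frame (empty if none).

    Hypothetical local check; the default column names match this notebook.
    """
    problems = []
    # An ItemId counts as missing if it is NaN or blank once converted to text
    blank = df[item_col].astype(str).str.strip() == ""
    if (df[item_col].isna() | blank).any():
        problems.append("missing ItemId")
    # Each (ItemId, ItemVariantId) pair should describe exactly one row
    if df.duplicated(subset=[item_col, variant_col]).any():
        problems.append("duplicate (ItemId, ItemVariantId) pair")
    return problems
```

Running `check_catalog(ir_reco_item_variants)` should return an empty list for the catalog built above.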


## Interactions

Interactions represent the ways a user interacts with the catalog items. Some examples include transactional interactions (purchases), views (click-through), ratings, or any other action that occurs between a user and an item or item variant.

The mandatory fields are **InteractionGroupingId** and **ItemId**; everything else is optional.

[Interactions Documentation](https://docs.microsoft.com/en-us/industry/retail/intelligent-recommendations/interactions-data-entity#introduction-to-interactions-data-entities)

Example dataset:

|InteractionGroupingId|ItemId|ItemVariantId|UserId|InteractionType|Timestamp|FutureAttribute1|FutureAttribute2|Channel|Catalog|Strength|IsPositive
|--|--|--|--|--|--|--|--|--|--|--|--|
|Interaction100|Item1|Item1Var1|User1|Transaction|2020-05-15T13:30:00.000Z|||Mobile|Europe|.77|True|
|Interaction100|Item2||User1|Transaction|2020-05-15T13:30:00.000Z|||Mobile|Europe|.12|True|
|Interaction101|Item2|||Like|2020-05-01T10:17:00.000Z||||||False|
|Interaction102|Item3||User1|Rating|2020-05-01T13:24:00.000Z|||||4.0|True|


Based on the data we have available, we are choosing to use each purchase as an interaction (so the InteractionType will be set to "Transaction").

The InteractionGroupingId groups interactions that involve multiple items; think of it as the grocery basket a customer purchased on a given date. Each customer's purchase date in our dataset is treated as a separate order, so we will construct an id (called basketId, for an individual shopping basket) to use for this. **NOTE:** Another option with this dataset would be to use the customer (User) as the InteractionGroupingId rather than their individual orders.

An optional field we will not use is "Strength", which represents a weighting for the significance of an interaction. In a ratings system, this would commonly be the rating value. In our scenario, with implicit feedback (purchases), we could define strength as the share of a user's orders that include the item, particularly if we decided to group interactions by customer. This is one way to extend this example.
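
One possible Strength definition, sketched under stated assumptions, is the share of a user's baskets that contain an item. The frame below is made up. Its UserId, basketId and ItemId columns are hypothetical stand-ins for Member_number, basketId and itemId in `interactions_df`. This is just one implicit-feedback weighting and not something the service prescribes.

```python
import pandas as pd

# Made-up interactions: one row per (basket, item) purchase
toy = pd.DataFrame({
    "UserId":   [1, 1, 1, 1, 2],
    "basketId": [10, 10, 11, 12, 20],
    "ItemId":   [100, 101, 100, 100, 101],
})

# Number of distinct baskets per user
baskets_per_user = toy.groupby("UserId")["basketId"].nunique()

# Number of distinct baskets per user that contain each item
baskets_with_item = (
    toy.groupby(["UserId", "ItemId"])["basketId"]
    .nunique()
    .rename("basketsWithItem")
    .reset_index()
)

# Strength = share of the user's baskets that include the item (0 < Strength <= 1)
baskets_with_item["Strength"] = (
    baskets_with_item["basketsWithItem"]
    / baskets_with_item["UserId"].map(baskets_per_user)
)
print(baskets_with_item)
```

User 1 bought item 100 in all three of their baskets (Strength 1.0) and item 101 in one of three (about 0.33). If interactions were grouped by customer, this frame would already have the one-row-per-user-item shape such a file would need.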

For our dataset we won't consider returns or exchanges, so all of our interactions are positive: the customer chose to purchase the item. If we tracked returns, we could set the "IsPositive" field to False to indicate that the customer may not actually prefer the item. The same applies to other interaction types, such as a dislike or a low rating.
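
If return data were available, setting IsPositive could be as simple as mapping a flag. The sketch below assumes a hypothetical `returned` column from a returns system, which this dataset does not have. It writes the same 'TRUE'/'FALSE' strings the notebook uses for IsPositive.

```python
import pandas as pd

# Made-up purchases; "returned" is a hypothetical flag from a returns system
log = pd.DataFrame({
    "ItemId": [100, 101, 100],
    "InteractionType": ["Transaction"] * 3,
    "returned": [False, False, True],
})

# A returned purchase becomes a negative signal; everything else stays positive
log["IsPositive"] = log["returned"].map({False: "TRUE", True: "FALSE"})
print(log["IsPositive"].tolist())  # ['TRUE', 'TRUE', 'FALSE']
```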

**Add ItemId to Interactions**

We derived an ItemId when we created the catalog in the previous section.  Now we will add that to our orders data by joining on the item descriptions.

In [None]:
interactions_df = pd.merge(products_df[['itemId', 'itemDescription']], orders_df, on=["itemDescription"])
interactions_df
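
The merge above is an inner join, so an order row whose description is missing from the catalog would be dropped silently, and a duplicated description would multiply rows. pandas can check both cases. The sketch below uses made-up frames to show `validate="one_to_many"`, which raises `pandas.errors.MergeError` if a description appears more than once on the catalog side, and `indicator=True`, which adds a `_merge` column recording whether each row matched.

```python
import pandas as pd

# Made-up catalog and orders in the notebook's shape
products = pd.DataFrame({"itemId": [1, 2], "itemDescription": ["soda", "whole milk"]})
orders = pd.DataFrame({
    "Member_number": [1001, 1001, 1002],
    "Date": ["2015-07-21T00:00:00.000Z"] * 3,
    "itemDescription": ["whole milk", "soda", "whole milk"],
})

# how="right" keeps every order row, so unmatched ones show up as "right_only"
merged = pd.merge(products, orders, on="itemDescription", how="right",
                  validate="one_to_many", indicator=True)
assert (merged["_merge"] == "both").all(), "some order rows have no catalog item"
print(len(merged))  # 3
```

Applied to `products_df` and `orders_df`, the same call should keep all 38,765 rows, because the catalog was built from the orders themselves.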

**Create an ID for each grocery store visit (based on date)**

This is the field we will use to group multiple interactions together: our shopping baskets.

In [None]:
# One row per unique (Member_number, Date) pair, i.e. one row per basket
baskets_df = interactions_df[['Member_number','Date']].groupby(['Member_number','Date']).count()
baskets_df.reset_index(inplace=True)

# Number the baskets sequentially, starting at 1
baskets_df.reset_index(inplace=True)
baskets_df = baskets_df.rename(columns = {'index':'basketId'})
baskets_df['basketId'] = baskets_df['basketId'] + 1
baskets_df
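
A more compact alternative, sketched here on made-up rows, is `GroupBy.ngroup()`. It numbers each (Member_number, Date) group from 0 in sorted key order, which matches the ids produced by the groupby/reset_index approach above. It also assigns them directly on the interaction rows, so the separate merge step would not be needed.

```python
import pandas as pd

# Made-up interaction rows, deliberately out of order
toy = pd.DataFrame({
    "Member_number": [1002, 1001, 1001, 1002],
    "Date": ["2015-07-21T00:00:00.000Z", "2015-07-21T00:00:00.000Z",
             "2015-01-05T00:00:00.000Z", "2015-07-21T00:00:00.000Z"],
    "itemId": [5, 7, 8, 9],
})

# Groups are numbered in sorted key order; add 1 for 1-based basket ids
toy["basketId"] = toy.groupby(["Member_number", "Date"]).ngroup() + 1
print(toy["basketId"].tolist())  # [3, 2, 1, 3]
```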

Now add the basketId to the interactions dataset.

In [None]:
interactions_df = pd.merge(baskets_df, interactions_df, on=['Member_number','Date'])
interactions_df

**Set up the Interactions dataset to be used in Intelligent Recommendations**

In [None]:
# .copy() avoids pandas' SettingWithCopyWarning when columns are added in the next cell
ir_reco_interactions = interactions_df[['basketId', 'itemId', 'Member_number', 'Date']].copy()
ir_reco_interactions.head()


In [None]:
ir_reco_interactions['InteractionGroupId'] = ir_reco_interactions['basketId']
ir_reco_interactions['ItemId'] = ir_reco_interactions['itemId']
ir_reco_interactions['ItemVariantId'] = ''
ir_reco_interactions['UserId'] = ir_reco_interactions['Member_number']
ir_reco_interactions['InteractionType'] = 'Transaction'
ir_reco_interactions['TimeStamp'] = ir_reco_interactions['Date']
ir_reco_interactions['FutureAttribute1'] = ''
ir_reco_interactions['FutureAttribute2'] = ''
ir_reco_interactions['Channel'] = ''
ir_reco_interactions['Catalog'] = ''
ir_reco_interactions['Strength'] = ''
ir_reco_interactions['IsPositive'] = 'TRUE'

In [None]:
ir_reco_interactions = ir_reco_interactions.drop(columns=['basketId', 'itemId', 'Member_number', 'Date'])

In [None]:
ir_reco_interactions
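
The interactions file is exported without a header row, so columns are identified only by position. This makes column order matter more than column names. The sketch below is a hypothetical helper, `column_mismatches`, that compares column positions against the order in the interactions documentation table above. It ignores case and the InteractionGroupId/InteractionGroupingId spelling difference used in this notebook.

```python
# Column order from the interactions documentation table above; positions matter
# because the CSV is exported without a header row
IR_INTERACTION_COLUMNS = [
    "InteractionGroupingId", "ItemId", "ItemVariantId", "UserId",
    "InteractionType", "Timestamp", "FutureAttribute1", "FutureAttribute2",
    "Channel", "Catalog", "Strength", "IsPositive",
]


def _normalize(name):
    # Ignore case and the Group/Grouping spelling difference used in this notebook
    return name.lower().replace("grouping", "group")


def column_mismatches(columns):
    """Return (position, expected, actual) for each column that is out of place."""
    columns = list(columns)
    if len(columns) != len(IR_INTERACTION_COLUMNS):
        raise ValueError(
            f"expected {len(IR_INTERACTION_COLUMNS)} columns, got {len(columns)}"
        )
    return [
        (i, expected, actual)
        for i, (expected, actual) in enumerate(zip(IR_INTERACTION_COLUMNS, columns))
        if _normalize(expected) != _normalize(actual)
    ]
```

`column_mismatches(ir_reco_interactions.columns)` should return an empty list for the frame built above.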

# Export Data Files

## Export locally

We will export the product catalog and interactions to our data folder for later use in analyzing the recommendation results.

**NOTE:** Example output files are included in this repo, but they will be overwritten by the files you are generating.

In [None]:
products_df.to_csv('data/products.csv', index=False, header=True)

In [None]:
interactions_df.to_csv('data/orders.csv', index=False, header=True)

The two formatted files to be used with Intelligent Recommendations will be written using the folder structure and naming required by the IR service.

In [None]:
output_path = 'output'

In [None]:
# Create a folder to use for the output files for items and variants 
items_path = 'Reco_ItemsAndVariants'
os.makedirs(output_path + '/' + items_path, exist_ok=True)

# Create a folder to use for the output files for interactions
interactions_path = 'Reco_Interactions'
os.makedirs(output_path + '/' + interactions_path, exist_ok=True)

In [None]:
# The files must NOT include a header row; Intelligent Recommendations expects headerless CSV files
ir_reco_item_variants.to_csv(output_path + '/' + items_path + '/item_variants.csv', index=False, header=False)

In [None]:
ir_reco_interactions.to_csv(output_path + '/' + interactions_path + '/interactions.csv', index=False, header=False)
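
To confirm what `header=False` produces, the sketch below writes a small made-up frame to a temporary folder, reads the first line back, and reloads the file. When reading a headerless file back for inspection, pass `header=None` and supply the column names yourself. Note that pandas parses the "TRUE" strings as booleans on read by default. That does not matter for inspection, but it is worth knowing.

```python
import os
import tempfile

import pandas as pd

# Small made-up frame standing in for ir_reco_interactions
frame = pd.DataFrame({
    "InteractionGroupId": [1, 1],
    "ItemId": [10, 11],
    "IsPositive": ["TRUE", "TRUE"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "interactions.csv")
    frame.to_csv(path, index=False, header=False)

    # The first line is data, not column names
    with open(path) as fh:
        first_line = fh.readline().strip()

    # A headerless file needs header=None plus explicit names when read back
    back = pd.read_csv(path, header=None, names=list(frame.columns))

print(first_line)  # 1,10,TRUE
```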

Once the files are exported, you can download them and upload them to the Azure Storage account container using the portal or Storage Explorer.

Instructions for setting this up manually are in [Configure the root folder](https://docs.microsoft.com/en-us/industry/retail/intelligent-recommendations/deploy-data-lake-storage#download-the-modeljson-file-and-configure-the-root-folder); use the interactions.csv and item_variants.csv files instead of the two samples provided. **NOTE:** We do not have image data available, so you do not need to set up the Reco_ItemAndVariantImages folder at this time.

Alternatively, the next steps let you access and upload the files through a datastore in Azure Machine Learning.

## (Optional) Add the datasets to IR Storage through AML

We will use the Intelligent Recommendations storage account from within AML. This lets us write out the files we will use for training models, and you could also register the datasets to share them with other users if needed.

[Connect to Storage Services on Azure with datastores](https://docs.microsoft.com/en-us/azure/machine-learning/v1/how-to-access-data)

The first time you use the Azure CLI (command-line interface), you will likely need to authenticate. After executing the next cell, follow the directions: go to the browser link provided, enter the code, log in, and confirm when asked "Are you trying to login as the Azure CLI".

In [None]:
import azureml.core
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()

In [None]:
# Register the storage account you plan to use for Intelligent Recommendations (variables were set in previous steps)

blob_datastore_name='irdatablob' # Name of the datastore in your Azure Machine Learning Workspace

blob_datastore = Datastore.register_azure_blob_container(workspace=ws, 
                                                         datastore_name=blob_datastore_name, 
                                                         container_name=container_name, 
                                                         account_name=account_name,
                                                         account_key=account_key)

In [None]:
# Upload items and variants data
src_path = output_path + '/' + items_path
target_path = intelligent_recs_path + '/' + items_path
blob_datastore.upload(src_dir=src_path, target_path=target_path, overwrite=True)

In [None]:
# Upload interactions data
src_path = output_path + '/' + interactions_path
target_path = intelligent_recs_path + '/' + interactions_path
blob_datastore.upload(src_dir=src_path, target_path=target_path, overwrite=True)

## Add Configuration Files

Intelligent Recommendations requires two configuration files, which are provided and should not be modified. Versions of these are included with this repository, but we recommend checking the documentation and downloading any updated versions.

Additional information is available about [the model.json file](https://docs.microsoft.com/en-us/industry/retail/intelligent-recommendations/deploy-data-lake-storage#download-the-modeljson-file-and-configure-the-root-folder) and [the Reco_Config file](https://docs.microsoft.com/en-us/industry/retail/intelligent-recommendations/deploy-data-lake-storage#download-the-modeljson-file-and-configure-the-root-folder).

The model.json file should be placed in the root folder you are using for the Intelligent Recommendations files ("groceries" in our example). The config.csv file should be placed in a folder called "Reco_Config".


In [None]:
# Upload the model.json file from the local data/root folder into the IR root folder
src_path = 'data/root'
target_path = intelligent_recs_path
blob_datastore.upload(src_dir=src_path, target_path=target_path, overwrite=True)

In [None]:
# Upload the configuration file from the local data/Reco_Config folder into <root>/Reco_Config
src_path = 'data/Reco_Config'
target_path = intelligent_recs_path + '/Reco_Config'
blob_datastore.upload(src_dir=src_path, target_path=target_path, overwrite=True)

## Confirm File Setup

Once you have completed these steps, your storage account should look similar to the following.  
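
If you have a list of blob names from the container, for example copied from Storage Explorer or produced by whatever listing tool you prefer, you can check the layout programmatically. The sketch below is a hypothetical helper, `missing_paths`. It encodes only the layout described in this notebook: model.json in the root folder, plus the Reco_Config, Reco_ItemsAndVariants and Reco_Interactions folders. It checks folders by prefix because blob storage has no real directories.

```python
def missing_paths(blob_names, root="groceries"):
    """Return expected files/folders that are absent from a list of blob names.

    Hypothetical helper; entries ending in "/" are treated as folder prefixes.
    """
    expected = [
        f"{root}/model.json",
        f"{root}/Reco_Config/",
        f"{root}/Reco_ItemsAndVariants/",
        f"{root}/Reco_Interactions/",
    ]
    missing = []
    for path in expected:
        if path.endswith("/"):
            found = any(name.startswith(path) for name in blob_names)
        else:
            found = path in blob_names
        if not found:
            missing.append(path)
    return missing
```

An empty result means each expected folder holds at least one file and model.json sits in the root folder.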

**NOTE:** The .amlignore and .amlignore.amltmp files may appear if you uploaded the files through the AML datastore as described above; Intelligent Recommendations ignores them. If you uploaded the files using Storage Explorer or your browser, you will not see those files.


**Main Folder (groceries) in the root container (ircontainer).**

<img src="./images/ircontainer_root.png" width="500 px"/>

**Reco_Config folder**


![Reco_Config Folder](./images/ircontainer_reco_config.png)



**Reco_ItemsAndVariants folder**

![Reco_ItemsAndVariants Folder](./images/ircontainer_reco_itemsandvariants.png)

**Reco_Interactions folder**

![Reco_Interactions Folder](./images/ircontainer_reco_interactions.png)