<p style="font-size:25px; color:black;"><u><i><b>Product Recommendations</b></i></u></p>
<p style="font-size:16px; color:#117d30;">
    A product recommendation system is a filtering system that predicts which items a user is likely to purchase, based on their purchase history, and shows those items to the user.
</p>

<p style="font-size:15px; color:#318f50;">
Note:
</p>
<p style="font-size:15px; color:#117d30;">
 This notebook's default language is Scala, but Scala and Python code can interoperate in it: each code cell below starts with the <code>%%pyspark</code> magic and runs PySpark.
</p>
<p style="font-size:15px; color:#117d30;">
    <u> Steps: </u>
</p>
<p style="font-size:15px; color:#117d30;">
1) Data is ingested from Azure Synapse Data Warehouse using PySpark.
</p>
<p style="font-size:15px; color:#117d30;">
2) The model is trained using the PySpark MLlib recommendation module (<code>pyspark.mllib.recommendation</code>).
</p>
<p style="font-size:15px; color:#117d30;">
3) Product recommendations are generated: for each product, similar products are found by comparing the product feature vectors learned by the model.
</p>

## *Connecting to Azure Synapse Data Warehouse*
<p style="font-size:16px; color:#117d30;">
    A connection to Azure Synapse Data Warehouse is opened and the required data is ingested for processing.
    Only one line of code is needed: in Synapse Studio, open the actions menu for a table, select "New notebook", and then select "Load to DataFrame".  </p>
   <p style="font-size:16px; color:#117d30;"> After the necessary details are provided, the data is loaded as a Spark DataFrame.
A single line of code then makes a DataFrame loaded in Scala available to Python.
In the cells below, the customer sales data is read directly from a CSV file in Azure Data Lake Storage Gen2 (the <code>abfss://</code> path), with the storage account name left as the placeholder <code>#DATA_LAKE_NAME#</code>.
</p>

In [3]:
%%pyspark
import traceback

from scipy import spatial  # cosine distance between product feature vectors
from pyspark.mllib.recommendation import ALS, Rating  # RDD-based collaborative filtering (MLlib)

In [4]:
%%pyspark
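# Read the customer sales CSV from the data lake; inferSchema parses the numeric
# columns (customer_id, product_id, rating) as numbers instead of strings
customer_data = spark.read.load('abfss://machine-learning@#DATA_LAKE_NAME#.dfs.core.windows.net/customer-sales-latest.csv',
                                format='csv',
                                header=True,
                                inferSchema=True)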
customer_data.show(10)

+-----------+----------+------+--------------------+
|customer_id|product_id|rating|        product_name|
+-----------+----------+------+--------------------+
|       1402|        29|     5|Gray with white s...|
|      51036|         4|     3|     Brown SurfBoard|
|      33662|        23|     5|   Crystal Wineglass|
|      73162|         3|     5|    Blue Surf Board |
|      14164|        17|     5|Wood and Cork Coa...|
|      36731|        30|     5|         Brown Shoes|
|      90545|        14|     1|    Designer Coaster|
|      36574|         2|     5|     Retro surfboard|
|      20246|        26|     3|         Black Shoes|
|      14262|        14|     1|    Designer Coaster|
+-----------+----------+------+--------------------+
only showing top 10 rows

In [5]:
%%pyspark
product_info = customer_data.select('product_id', 'product_name').distinct()
product_info.show(10)

+----------+--------------------+
|product_id|        product_name|
+----------+--------------------+
|        19|Yellow mature Dut...|
|        16|        Turkish Lira|
|        12|        Blue Coaster|
|        25|         White Shoes|
|        31|          Blue Shoes|
|        27|          Pink Shoes|
|        11|     Black corkscrew|
|        28|Black with red so...|
|         8|       Red Corkscrew|
|        29|Gray with white s...|
+----------+--------------------+
only showing top 10 rows

## ***Training the model***
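<p style="font-size:16px; color:#117d30;">
    The model is built with the collaborative-filtering recommendation module in <code>pyspark.mllib</code> (<code>pyspark.mllib.recommendation</code>).
</p>
<p style="font-size:16px; color:#117d30;">
    It is trained with the ALS (alternating least squares) method, which factorizes the sparse customer-by-product rating matrix into low-rank customer and product feature vectors. <code>ALS.train</code> takes the ratings as (user ID, product ID, rating) triples, either as <code>Rating</code> objects or as tuples.
</p>
<p style="font-size:16px; color:#117d30;">
    The parameters passed to <code>ALS.train</code> are the ratings, the rank (the number of latent features per customer and per product), and the number of iterations; a fixed seed makes the run reproducible.
</p>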
<!-- <p style="font-size:16px; color:#117d30;">
    Rank is the no. of features to use while training the model.
</p> -->
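<p style="font-size:16px; color:#117d30;">
    To make the alternation concrete, here is a minimal NumPy sketch of alternating least squares on a small dense matrix. It is not MLlib's implementation (which is distributed and may scale its regularization differently); it only illustrates the idea: with the product factors fixed, each customer's factor vector is a ridge-regression solve over that customer's observed ratings, and then the roles swap. The function name <code>als</code>, the toy matrix, and <code>reg=0.01</code> are illustrative assumptions.
</p>

```python
import numpy as np


def als(R, mask, rank=2, iterations=10, reg=0.01, seed=30):
    """Illustrative dense ALS (hypothetical helper, not the MLlib code).

    R    : (n_users, n_items) rating matrix; unobserved entries are ignored
    mask : boolean array, same shape as R, True where a rating was observed
    Returns U (n_users x rank) and V (n_items x rank) with U @ V.T ~ R on observed cells.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    I = np.eye(rank)
    for _ in range(iterations):
        # Fix V: each user's vector solves (Vu^T Vu + reg*I) u = Vu^T r_u
        for u in range(n_users):
            Vu = V[mask[u]]
            U[u] = np.linalg.solve(Vu.T @ Vu + reg * I, Vu.T @ R[u, mask[u]])
        # Fix U: each item's vector solves the same kind of ridge problem
        for i in range(n_items):
            Ui = U[mask[:, i]]
            V[i] = np.linalg.solve(Ui.T @ Ui + reg * I, Ui.T @ R[mask[:, i], i])
    return U, V


# Toy 4x4 customer-by-product matrix; 0 marks "not rated" (assumed data)
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4]], dtype=float)
U, V = als(R, R > 0, rank=2, iterations=20)
predicted = U @ V.T  # scores for every (customer, product) cell, including unrated ones
```

<p style="font-size:16px; color:#117d30;">
    Because each step solves its subproblem exactly, the regularized error never increases from one iteration to the next. The notebook's <code>rank = 5</code> and <code>numIterations = 10</code> play the roles of <code>rank</code> and <code>iterations</code> here.
</p>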


In [6]:
%%pyspark

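def train_model(rank=5, num_iterations=10, seed=30):
    """Train an ALS matrix-factorization model on (customer, product, rating) triples."""
    try:
        # ALS.train expects Rating(user, product, rating) with integer IDs and a float rating
        ratings = customer_data.rdd.map(
            lambda row: Rating(int(row['customer_id']), int(row['product_id']), float(row['rating'])))
        print("Training model.........")
        # rank = number of latent features; num_iterations = number of ALS sweeps
        model = ALS.train(ratings, rank, num_iterations, seed=seed)
        # model.save(sc, PATH)
        return model
    except Exception:
        traceback.print_exc()
        raise  # re-raise so later cells never receive an error string instead of a model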

In [7]:
%%pyspark
trained_model = train_model()

Training model.........

In [8]:
%%pyspark
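def calculate_similarities(product_id, product_vector, threshold):
    """Return [product_id, other_id, similarity] for every product whose feature vector
    has cosine similarity >= threshold with product_vector (the product itself included)."""
    similarities = trained_model.productFeatures() \
        .map(lambda item: [product_id, item[0], float(1 - spatial.distance.cosine(item[1], product_vector))]) \
        .filter(lambda x: x[2] >= threshold) \
        .collect()
    return similarities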
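<p style="font-size:16px; color:#117d30;">
    <code>calculate_similarities</code> scores two products by the cosine similarity of their ALS feature vectors. SciPy's <code>spatial.distance.cosine</code> returns the cosine <i>distance</i>, so subtracting it from 1 gives:
</p>

$$
\operatorname{sim}(p, q) = 1 - d_{\cos}(\mathbf{v}_p, \mathbf{v}_q) = \frac{\mathbf{v}_p \cdot \mathbf{v}_q}{\lVert \mathbf{v}_p \rVert \, \lVert \mathbf{v}_q \rVert}
$$

<p style="font-size:16px; color:#117d30;">
    Here <b>v</b><sub>p</sub> is product p's vector from <code>productFeatures()</code>. A pair is kept when sim(p, q) is at least the threshold, and every product is kept paired with itself, since sim(p, p) = 1.
</p>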

In [9]:
%%pyspark
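# Compare each product's feature vector with every product's vector (one Spark job per product)
# and keep pairs with cosine similarity >= 0.75; each product also matches itself (similarity 1.0)
product_recommendations = []

for product_id, features in trained_model.productFeatures().collect():
    product_recommendations += calculate_similarities(product_id, features, 0.75)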
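<p style="font-size:16px; color:#117d30;">
    The loop above calls <code>calculate_similarities</code> once per product, and each call launches a separate Spark job over <code>productFeatures()</code>. When the catalogue is small (the sample output shows product IDs around 30), one alternative is to compute all pairwise cosine similarities on the driver with a single matrix product. This sketch assumes the feature vectors fit in driver memory and that none of them is all-zero; <code>similar_product_pairs</code> is a hypothetical helper, not part of the notebook.
</p>

```python
import numpy as np


def similar_product_pairs(product_features, threshold=0.75):
    """Hypothetical driver-side replacement for the per-product loop.

    product_features: list of (product_id, feature_vector) pairs, e.g. the result of
    trained_model.productFeatures().collect() (assumed small enough for the driver).
    Returns [product_id, other_product_id, similarity] lists with similarity >= threshold,
    including each product paired with itself.
    """
    ids = [pid for pid, _ in product_features]
    X = np.array([np.asarray(vec, dtype=float) for _, vec in product_features])
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)  # assumes no all-zero vectors
    sims = unit @ unit.T  # sims[i, j] = cosine similarity of products i and j
    pairs = []
    for i, pid in enumerate(ids):
        for j, other in enumerate(ids):
            if sims[i, j] >= threshold:
                pairs.append([pid, other, float(sims[i, j])])
    return pairs


# Toy 2-dimensional feature vectors (assumed data)
features = [(1, [1.0, 0.0]), (2, [0.9, 0.1]), (3, [0.0, 1.0])]
pairs = similar_product_pairs(features, threshold=0.75)
```

<p style="font-size:16px; color:#117d30;">
    Calling <code>similar_product_pairs(trained_model.productFeatures().collect(), 0.75)</code> would yield the same [ProductId, RecommendedProductId, Similarity] lists as the loop, up to floating-point rounding, with a single Spark job.
</p>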

In [10]:
%%pyspark
from pyspark.sql.functions import col

# Similarity pairs computed above: (ProductId, RecommendedProductId, Similarity)
recommend_df = spark.createDataFrame(product_recommendations, ['ProductId', 'RecommendedProductId', 'Similarity'])

# Two renamed copies of the product lookup table, one per side of each pair,
# so joining product_info twice cannot produce ambiguous column references
source_names = product_info.select(col('product_id').cast('int').alias('ProductId'),
                                   col('product_name').alias('ProductName'))
target_names = product_info.select(col('product_id').cast('int').alias('RecommendedProductId'),
                                   col('product_name').alias('RecommendedProductName'))

result = recommend_df \
    .join(source_names, on='ProductId', how='inner') \
    .join(target_names, on='RecommendedProductId', how='inner') \
    .select('ProductId', 'ProductName', 'RecommendedProductId', 'RecommendedProductName', 'Similarity') \
    .orderBy('ProductId')
result.show(100)

+---------+--------------------+--------------------+----------------------+------------------+
|ProductId|         ProductName|RecommendedProductId|RecommendedProductName|        Similarity|
+---------+--------------------+--------------------+----------------------+------------------+
|        2|     Retro surfboard|                   2|       Retro surfboard|               1.0|
|        2|     Retro surfboard|                  15|      Brown Cupholders|0.9053134954610791|
|        2|     Retro surfboard|                  19|  Yellow mature Dut...| 0.823542595663269|
|        2|     Retro surfboard|                  25|           White Shoes|0.8958373087190166|
|        3|    Blue Surf Board |                  27|            Pink Shoes|0.7562873258519358|
|        3|    Blue Surf Board |                  31|            Blue Shoes|0.7836240001945383|
|        3|    Blue Surf Board |                   3|      Blue Surf Board |               1.0|
|        3|    Blue Surf Board |        
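
<p style="font-size:16px; color:#117d30;">
    The output above includes each product paired with itself (similarity 1.0). A possible post-processing step, sketched here in plain Python under the assumption that the pairs are small enough to collect (for example from <code>recommend_df.collect()</code>), drops these self-matches and keeps the <i>k</i> most similar products per product. <code>top_k_recommendations</code> and <code>k</code> are hypothetical names.
</p>

```python
from collections import defaultdict


def top_k_recommendations(pairs, k=3):
    """Hypothetical helper: drop self-matches, keep the k most similar products per product.

    pairs: iterable of (product_id, recommended_product_id, similarity) triples.
    Returns {product_id: [(recommended_id, similarity), ...]} sorted by similarity, descending.
    """
    grouped = defaultdict(list)
    for pid, rec_id, sim in pairs:
        if pid != rec_id:  # a product is always its own best match (similarity 1.0)
            grouped[pid].append((rec_id, sim))
    return {pid: sorted(recs, key=lambda r: r[1], reverse=True)[:k]
            for pid, recs in grouped.items()}


# Rows taken from the output shown above (ProductId, RecommendedProductId, Similarity)
pairs = [(2, 2, 1.0), (2, 15, 0.9053134954610791), (2, 19, 0.823542595663269),
         (2, 25, 0.8958373087190166), (3, 27, 0.7562873258519358),
         (3, 31, 0.7836240001945383), (3, 3, 1.0)]
top = top_k_recommendations(pairs, k=2)
```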

In [11]:
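%%pyspark
# Spark writes a directory at this path; repartition(1) makes it contain a single part-*.csv file
result \
    .repartition(1) \
    .write.format('csv') \
    .option("header", "true") \
    .mode("overwrite") \
    .save('abfss://machine-learning@#DATA_LAKE_NAME#.dfs.core.windows.net/product-recommendations.csv')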