<p style="font-size:25px; color:black;"><u><i><b>Product Recommendations</b></i></u></p>
<p style="font-size:16px; color:#117d30;">
    A product recommendation system is a filtering system that predicts which items a user is likely to purchase, based on that user's purchase history, and shows those items to the user.
</p>

<p style="font-size:15px; color:#318f50;">
Note:
</p>
<p style="font-size:15px; color:#117d30;">
 This notebook mixes Scala and Python (PySpark) cells; the two languages share data through Spark temporary views.
</p>
<p style="font-size:15px; color:#117d30;">
    <u> Steps: </u>
</p>
<p style="font-size:15px; color:#117d30;">
1) Data is ingested from Azure Synapse Data Warehouse in a Scala cell and shared with PySpark.
</p>
<p style="font-size:15px; color:#117d30;">
2) The model is trained using the PySpark MLlib recommendation module (pyspark.mllib.recommendation).
</p>
<p style="font-size:15px; color:#117d30;">
3) Product recommendations are generated for the user.
</p>

## *Connecting to Azure Synapse Data Warehouse*
<p style="font-size:16px; color:#117d30;">
    A connection to Azure Synapse Data Warehouse is opened and the required data is loaded for processing.
    The connection takes a single line of code: in Synapse Studio, open the actions menu of a table, select "New notebook", and then select "Load to DataFrame".  </p>
   <p style="font-size:16px; color:#117d30;"> Once the necessary details are provided, the table is loaded as a Spark DataFrame.
A single call to "createOrReplaceTempView" then registers that DataFrame as a temporary view, so Python cells can read the DataFrame created in Scala.
</p>

In [65]:
%%pyspark
import os
import sys
import pandas as pd 
import numpy as np
import re
from IPython.display import display
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window
from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
from pyspark import SparkContext

import traceback

In [66]:
%%spark
val df = spark.read.sqlanalytics("SQLPool01.dbo.Customer_SalesLatest")
df.head(10)
// Register a temp view so that PySpark cells can read the same DataFrame
df.createOrReplaceTempView("df")

df: org.apache.spark.sql.DataFrame = [customer_id: int, product_id: int ... 3 more fields]
res10: Array[org.apache.spark.sql.Row] = Array([84793,25,White Shoes,1,0], [43603,5,Orange SurfBoard,2,1], [33925,29,Gray with white sole shoes,3,3], [83108,25,White Shoes,2,1], [34904,19,Yellow mature Dutch cheese ,3,3], [49590,21,Cheese circle,1,0], [95886,27,Pink Shoes,1,0], [39979,2,Retro surfboard,3,3], [58533,16,Turkish Lira,2,1], [61681,17,Wood and Cork Coaster,3,3])

In [67]:
print(df)

DataFrame[customer_id: int, product_id: int, product_name: string, total_quantity: int, rating: int]

In [68]:
display(df)

DataFrame[customer_id: int, product_id: int, product_name: string, total_quantity: int, rating: int]

In [69]:
%%pyspark
import pyspark 
print(pyspark.__version__)

2.4.4.dev0

In [71]:
%%pyspark
# Read the temp view "df" registered by the Scala cell into a PySpark DataFrame
df = spark.table("df")

# (customer_id, product_id, rating) triples used to train ALS
Customer_data = df.select("customer_id", "product_id", "rating")

# Copy the whole table to the driver as pandas for exploration
_toExplore = df.toPandas()

unique_users = _toExplore.customer_id.unique()

In [72]:
%%pyspark
display(_toExplore[['customer_id', 'product_id', 'product_name', 'rating']].sample(n=10))

         customer_id  product_id                product_name  rating
68971          85233          22                  Wineglass        5
776197         43211          18               French cheese       3
444660         60281           3            Blue Surf Board        1
1392226        22685          25                 White Shoes       5
97619           5450          11             Black corkscrew       1
1181750        69360          13               Brown Coaster       1
1183253          400          20                Cheese chunk       5
81326          12576          26                 Black Shoes       5
454659         45165          29  Gray with white sole shoes       3
1410239        30297           7               Red Surfboard       0

## ***Training the model***
<p style="font-size:16px; color:#117d30;">
    The model is built with the collaborative-filtering recommendation module in pyspark.mllib.
</p>
<p style="font-size:16px; color:#117d30;">
    The model is trained with ALS (alternating least squares), which factorizes the user-product rating matrix. It takes an RDD or DataFrame of (user ID, product ID, rating) tuples; here these are the "customer_id", "product_id" and "rating" columns.
</p>
<p style="font-size:16px; color:#117d30;">
    The arguments passed to "ALS.train" are the ratings, the rank (the number of latent features learned for each user and product), and the number of iterations.
</p>
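<p style="font-size:16px; color:#117d30;">
    To make the alternating idea concrete, here is a small NumPy sketch of explicit-feedback ALS; it is not MLlib's implementation. It minimizes the squared error on the observed ratings plus an L2 penalty: with the product factors fixed, each user's factor vector is the solution of a small ridge regression, and then the roles are swapped. The function name "als", the fixed penalty "reg", the 0-based ids and the toy ratings are assumptions of this sketch; "rank", "iterations" and "seed" play the same roles as the arguments passed to "ALS.train" in "train_model", and "reg" is analogous to its "lambda_" parameter.
</p>

```python
import numpy as np

def als(ratings, n_users, n_items, rank=2, iterations=10, reg=0.1, seed=30):
    """Minimal explicit-feedback ALS on (user, item, rating) triples.

    Assumption of this sketch: users and items are 0-based integer indices
    (MLlib accepts arbitrary integer ids).
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, rank))  # user factors
    V = rng.normal(scale=0.1, size=(n_items, rank))  # item factors

    # Group the observed ratings by user and by item
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))

    penalty = reg * np.eye(rank)
    for _ in range(iterations):
        # Fix V: solve (A^T A + reg*I) x = A^T y for each user's vector
        for u, obs in by_user.items():
            A = V[[i for i, _ in obs]]
            y = np.array([r for _, r in obs], dtype=float)
            U[u] = np.linalg.solve(A.T @ A + penalty, A.T @ y)
        # Fix U: the same ridge solve for each item's vector
        for i, obs in by_item.items():
            A = U[[u for u, _ in obs]]
            y = np.array([r for _, r in obs], dtype=float)
            V[i] = np.linalg.solve(A.T @ A + penalty, A.T @ y)
    return U, V

# Toy data: 3 users x 3 items, 6 observed ratings
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U, V = als(ratings, n_users=3, n_items=3, rank=2, iterations=20)
pred = U @ V.T  # predicted rating for every (user, item) pair, including unobserved ones
rmse = np.sqrt(np.mean([(pred[u, i] - r) ** 2 for u, i, r in ratings]))
print(f"training RMSE: {rmse:.3f}")
```

Each half-step is an exact minimization with the other factor matrix held fixed, which is why ALS parallelizes well: every user (or item) vector can be solved independently.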


In [73]:
%%pyspark

def train_model(rank=5, num_iterations=10):
  """
    Train an ALS matrix-factorization model on the (customer_id, product_id, rating)
    triples in Customer_data. Returns the model, or None if training fails.
  """
  try:
    print("Training model.........")
    model = ALS.train(Customer_data, rank, num_iterations, seed=30)
    # To persist the model, define PATH first (it is not set in this notebook):
    # model.save(sc, PATH)
    return model
  except Exception:
    traceback.print_exc()
    return None

In [74]:
%%pyspark
trained_model = train_model()

Training model.........

## ***Loading the model***
<p style="font-size:16px; color:#117d30;">
    Once the model is trained, it can be saved to a path so that its learned weights (the user and product factors) can be reloaded later without retraining.
</p>
<p style="font-size:16px; color:#117d30;">
    Using the loaded model, we can generate product recommendations for customers. 
</p>
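<p style="font-size:16px; color:#117d30;">
    A trained matrix-factorization model is defined by its learned user and product factor vectors, and these are what MLlib's "save" and "MatrixFactorizationModel.load" persist, in MLlib's own directory format. The library-free sketch below illustrates the idea by writing hypothetical factor dictionaries to a JSON file and reading them back; the helper names "save_factors" and "load_factors" and the JSON layout are assumptions of this sketch, not MLlib's format.
</p>

```python
import json
import os
import tempfile

def save_factors(path, user_factors, product_factors):
    """Persist factor vectors as JSON (illustrative layout, not MLlib's)."""
    # JSON object keys must be strings, so integer ids are stringified
    payload = {
        "users": {str(k): list(v) for k, v in user_factors.items()},
        "products": {str(k): list(v) for k, v in product_factors.items()},
    }
    with open(path, "w") as fh:
        json.dump(payload, fh)

def load_factors(path):
    """Reload the factors, restoring integer ids."""
    with open(path) as fh:
        payload = json.load(fh)
    users = {int(k): v for k, v in payload["users"].items()}
    products = {int(k): v for k, v in payload["products"].items()}
    return users, products

# Hypothetical rank-2 factors for two customers and two products
users = {1533: [1.2, 0.4], 84793: [0.3, 1.1]}
products = {5: [1.5, 0.2], 25: [0.9, 1.0]}

path = os.path.join(tempfile.mkdtemp(), "factors.json")
save_factors(path, users, products)
loaded_users, loaded_products = load_factors(path)
print(loaded_users == users and loaded_products == products)
```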


In [75]:
%%pyspark
def load_model(path):
  """
    Load a model previously written with model.save(sc, path). Returns None on failure.
  """
  try:
    return MatrixFactorizationModel.load(sc, path)
  except Exception:
    traceback.print_exc()
    return None

## ***Product Recommender***
<p style="font-size:16px; color:#117d30;">
    The "recommend_products" function is the main wrapper: it checks that the user exists with "validate_user", asks the trained model for the top "num" products, and formats the result with "map_products".
</p>
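<p style="font-size:16px; color:#117d30;">
    In a matrix-factorization model, the predicted rating of a product for a user is the dot product of their latent factor vectors, and "recommendProducts(user_id, num)" returns the "num" products with the highest predicted ratings. The pure-Python sketch below mirrors that ranking step on hypothetical factor values; "recommend_top_n" is an illustrative helper, not part of MLlib.
</p>

```python
def recommend_top_n(user_vec, product_factors, num):
    """Rank products by predicted rating (dot product of factor vectors).

    user_vec        : list[float], the user's latent factors
    product_factors : dict[int, list[float]], product id -> latent factors
    num             : number of products to return
    """
    scores = [
        (pid, sum(u * v for u, v in zip(user_vec, vec)))
        for pid, vec in product_factors.items()
    ]
    # Highest predicted rating first; ties broken by product id for stability
    scores.sort(key=lambda t: (-t[1], t[0]))
    return scores[:num]

# Hypothetical rank-2 factors
products = {5: [1.5, 0.2], 25: [0.9, 1.0], 31: [2.0, 0.5]}
top = recommend_top_n([1.2, 0.4], products, num=2)
print(top)
```

Asking for more products than exist simply returns all of them, ranked.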


In [76]:
%%pyspark
def recommend_products(user_id, num):
  """
    Recommend products to a user.

    Parameters:
      user_id : int, the customer to recommend for
      num     : int, number of products to request from the model
  """
  try:
    user_id = int(user_id)

    # Only users present in the ratings table have learned factors
    check_user = validate_user(user_id)
    if len(check_user) == 0:
      return "User does not exist"

    data = trained_model.recommendProducts(user_id, num)
    return map_products(data)
  except Exception:
    traceback.print_exc()
    return "Error while recommending product"

## ***Validate user***
<p style="font-size:16px; color:#117d30;">
    The "validate_user" method verifies whether a particular user_id exists in the database.
</p>
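<p style="font-size:16px; color:#117d30;">
    The notebook already copies the table to the driver as "_toExplore" and computes "unique_users". A possible alternative to running a Spark filter on every call is to build a Python set of known ids once and test membership against it. The sketch below assumes the table fits in driver memory (as the "toPandas()" call already requires) and uses a few customer ids from the sample printed above; "user_exists" and "known_users" are hypothetical names. The same pattern applies to product ids.
</p>

```python
import pandas as pd

# A few rows copied from the sample printed above; in the notebook this
# would be the full _toExplore DataFrame (or its unique_users array).
_sample = pd.DataFrame({"customer_id": [85233, 43211, 60281, 22685]})

# Build the lookup set once; each membership test is then O(1)
# instead of a Spark filter + collect per call.
known_users = set(_sample["customer_id"].unique().tolist())

def user_exists(user_id):
    """Hypothetical driver-side replacement for validate_user."""
    return int(user_id) in known_users

print(user_exists(85233), user_exists(1))
```

The set is a snapshot: if the warehouse table changes, rebuild it before validating new users.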


In [77]:
%%pyspark
def validate_user(user_id):
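  """
    Check whether a user exists in the table.

    Parameters:
      user_id : int

    Returns a list with at most one matching row (empty if the user is unknown).
  """
  try:
    if user_id is not None:
      # limit(1): one matching row is enough to prove existence
      return df.filter(df.customer_id == user_id).limit(1).collect()
    else:
      return "Please pass user_id"
  except Exception:
    traceback.print_exc()
    return "Error"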

## ***Verify products***
<p style="font-size:16px; color:#117d30;">
    The "verify_product" method is used for checking if a product exists in the database.
</p>

In [78]:
%%pyspark
def verify_product(product):
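  """
    Check whether a product exists in the table.
    Returns a list with at most one matching row (empty if the product is unknown).
  """
  try:
    # limit(1): one matching row is enough to prove existence
    return df.filter(df.product_id == product).limit(1).collect()
  except Exception:
    traceback.print_exc()
    return "Error"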

## ***Map products***
<p style="font-size:16px; color:#117d30;">
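    The "map_products" method turns the model's recommendations into a table of product ids and predicted ratings. Mapping each product id to its product name is sketched in the code but currently commented out.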
</p>
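<p style="font-size:16px; color:#117d30;">
    The commented-out lines in "map_products" build an id-to-name dictionary and call "replace" on the whole DataFrame, which would also rewrite any rating value that happens to equal a product id. A narrower alternative is to "map" only the product column. In the sketch below, the product names come from the samples printed earlier in this notebook and the ratings from the final output; product 31's name is not shown anywhere, so it falls back to a placeholder. "catalogue", "recs" and "id_to_name" are illustrative names.
</p>

```python
import pandas as pd

# Product rows as they might appear in the sales table (one row per purchase,
# so ids repeat); names copied from the samples printed earlier.
catalogue = pd.DataFrame({
    "product_id": [5, 16, 25, 25],
    "product_name": ["Orange SurfBoard", "Turkish Lira", "White Shoes", "White Shoes"],
})

# Shape of map_products' intermediate table: product ids + predicted ratings
recs = pd.DataFrame({"product": [5, 25, 31, 16],
                     "rating": [4.520149, 4.364865, 4.875994, 4.458258]})

# The lookup needs unique ids, so keep one name per product_id
id_to_name = (catalogue.drop_duplicates("product_id")
                       .set_index("product_id")["product_name"])

# map() touches only the product column; unknown ids become NaN, then a placeholder
recs["product_name"] = recs["product"].map(id_to_name).fillna("(unknown)")
print(recs)
```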

In [79]:
%%pyspark
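def map_products(data):
  """
    Convert a list of Rating(user, product, rating) tuples into a table of
    product ids and predicted ratings.
  """
  try:
    dataFrame = pd.DataFrame(data)[['product', 'rating']]
    # Name lookup, currently disabled:
    # temp_dict = _toExplore.set_index('product_id').to_dict()['product_name']
    # mapped_prod = dataFrame.replace(temp_dict)
    dataFrame = dataFrame.rename(columns={'product': 'Recommended-Products', 'rating': 'Rating'})
    # Show a random sample of at most 5 recommendations.
    # (The builtin min is shadowed by "from pyspark.sql.functions import *".)
    n = 5 if len(dataFrame) >= 5 else len(dataFrame)
    return dataFrame.sample(n=n)
  except Exception:
    traceback.print_exc()
    return "Error"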


In [80]:
%%pyspark
new = trained_model.recommendProductsForUsers(5)

In [81]:
%%pyspark
# Each element is (user_id, [Rating(user, product, rating), ...]) with the user's top 5 products
data = new.collect()
alluser = [user for user, _ in data]
allproduct = [[r.product for r in ratings] for _, ratings in data]

recomm_df1 = pd.DataFrame(alluser, columns=['userid'])
recomm_df2 = pd.DataFrame(allproduct, columns=['Recommendation1', 'Recommendation2', 'Recommendation3', 'Recommendation4', 'Recommendation5'])

FinalData = pd.concat([recomm_df1, recomm_df2], axis=1)
FinalData.head(n=25)

userid Recommendation1       ...       Recommendation4 Recommendation5
0   18624              12       ...                     5               9
1   80704              26       ...                    32              12
2    3456              10       ...                     5              31
3    6400              12       ...                     5              10
4   24384              30       ...                    10              15
5   29696              20       ...                     8              13
6   61696              24       ...                    29               4
7   20160              24       ...                    19              22
8   59200              10       ...                    16               9
9   74816               7       ...                    24               4
10  66112              24       ...                    21               4
11  83904              10       ...                    31              30
12  33920              31       ...      

<p style="font-size:16px; color:#117d30;">
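    Finally, call the main function with the two parameters "user_id" and "num" (the number of products to request) to generate product recommendations. The displayed table is a random sample of at most five of those recommendations.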
</p>


In [82]:
%%pyspark
output = recommend_products(user_id=1533, num=7)
output

   Recommended-Products    Rating
4                     5  4.520149
6                    25  4.364865
2                    31  4.875994
5                    16  4.458258
3                    30  4.708153