# **Final Implementation**

---
I have implemented a class, **ProductRecommendationProgram**, that measures how similar products are using the cosine similarity of their TF-IDF vectors (a content-based relation between products) and uses it to recommend products.
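For reference, with scikit-learn's default `TfidfVectorizer` settings (`smooth_idf=True`, `norm='l2'`, raw term counts), the weights and the similarity score are computed as follows, where $n$ is the number of products, $\mathrm{tf}(t, d)$ is the count of term $t$ in product text $d$, and $\mathrm{df}(t)$ is the number of product texts containing $t$:

$$
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1, \qquad
w_{d,t} = \mathrm{tf}(t, d)\,\mathrm{idf}(t), \qquad
\mathbf{v}_d = \frac{\mathbf{w}_d}{\lVert \mathbf{w}_d \rVert_2}
$$

$$
\operatorname{sim}(a, b) = \frac{\mathbf{v}_a \cdot \mathbf{v}_b}{\lVert \mathbf{v}_a \rVert_2\,\lVert \mathbf{v}_b \rVert_2} = \mathbf{v}_a \cdot \mathbf{v}_b
$$

Because every row is already L2-normalised and all weights are non-negative, the score is a plain dot product in $[0, 1]$, which is why an exact text match scores 1.0.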


1. **Initialization**: takes the product data and the site name, and precomputes the most similar products for every product.
2. **updateMatrix**: adds a product that is not listed in the dataset and computes its most similar products.
3. **recommend**: recommends products from the site, handling both listed and unlisted products.
4. **_print_message**: prints the recommended products' information and also returns it as a DataFrame.
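The steps above rest on two scikit-learn calls. A minimal sketch on a hypothetical four-product catalogue (the strings stand in for the real `product_info` column) shows the same TF-IDF plus cosine-similarity pipeline, here picking each product's closest *other* product by masking the diagonal:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalogue standing in for the real product_info column.
product_info = [
    "men's solid polo neck t-shirt cotton",
    "men's embroidered polo t-shirt",
    "women's cycling shorts solid",
    "women's printed basic shorts",
]

tfidf = TfidfVectorizer(analyzer='word', stop_words='english')
tfidf_matrix = tfidf.fit_transform(product_info)   # sparse, shape (n_products, n_terms)
scores = cosine_similarity(tfidf_matrix)            # dense, shape (n_products, n_products)

# Blank out the diagonal (each product's match with itself) before taking the best match.
np.fill_diagonal(scores, -1.0)
best_match = scores.argmax(axis=1)
for i, j in enumerate(best_match):
    print(f"{product_info[i]!r} -> {product_info[j]!r}")
```

The two polo shirts pair with each other and the two pairs of shorts pair with each other, because they share the most high-weight terms.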


In [None]:
import pandas as pd 
import numpy as np 

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

from typing import List, Dict

In [None]:
class ProductRecommendationProgram:
  def __init__(self, products_data, site):
        # products_data needs the columns product_name, product_info,
        # retail_price and discounted_price.
        products_info = products_data['product_info']
        # The TF-IDF vectorizer scores each word of every product's info text,
        # ignoring English stop words.
        tfidf = TfidfVectorizer(analyzer='word', stop_words='english')
        tfidf_matrix = tfidf.fit_transform(products_info)

        # Cosine similarity of every product with every other product (dense n x n array).
        cosine_similarities = cosine_similarity(tfidf_matrix)

        # For each product, store the 10 most similar products with their prices.
        # The product itself is normally the first match (similarity 1.0).
        similarities = {}
        for i in range(len(cosine_similarities)):
          # argsort() sorts ascending, so reverse it and keep the first 10 indices.
          similar_indices = cosine_similarities[i].argsort()[::-1][:10]
          similarities[products_data['product_name'].iloc[i]] = [
              (cosine_similarities[i][x],
               products_data['product_name'].iloc[x],
               products_data['retail_price'].iloc[x],
               products_data['discounted_price'].iloc[x])
              for x in similar_indices]

        self.matrix_similar = similarities
        self.products_data = products_data
        self.site = site

  def _print_message(self, product_name, recom_product):
        rec_items = len(recom_product)
        # Collect the printed rows so they can also be returned as a DataFrame.
        # 'accuracy' holds the cosine similarity score, not a classification accuracy.
        results = {
            'product name': [],
            'retail price': [],
            'discounted price': [],
            'accuracy': [],
        }

        print(f'The {rec_items} searched and recommended products for {product_name} from {self.site} are:')
        for i, (score, name, retail, discounted) in enumerate(recom_product):
            print(f"Number {i+1}:")
            print(f"{name} >> retail_price : {retail} >> discounted_price : {discounted} with {round(score, 3)} similarity score")
            print("--------------------")
            results['product name'].append(name)
            results['retail price'].append(retail)
            results['discounted price'].append(discounted)
            results['accuracy'].append(score)
        df = pd.DataFrame(results)
        df.columns = [f'product_name in {self.site}', f'retail price in {self.site}',
                      f'discounted price in {self.site}', f'accuracy in {self.site}']
        return df

  def updateMatrix(self, product_name, product_info, retail_price=0, discounted_price=0):
      unknown_product = {'product_name': product_name, 'retail_price': retail_price,
                         'discounted_price': discounted_price, 'product_info': product_info}
      # DataFrame.append was removed in pandas 2.0; pd.concat adds the new row instead.
      new_row = pd.DataFrame([unknown_product])
      self.products_data = pd.concat([self.products_data, new_row], ignore_index=True)
      # Refit TF-IDF so the new product's words are part of the vocabulary.
      tfidf = TfidfVectorizer(analyzer='word', stop_words='english')
      tfidf_matrix = tfidf.fit_transform(self.products_data['product_info'])
      # Only the new (last) row is needed, so compare it with every product.
      new_index = len(self.products_data) - 1
      scores = cosine_similarity(tfidf_matrix[new_index], tfidf_matrix)[0]
      # Recommend products similar to the user's info, excluding the query itself.
      similar_indices = [x for x in scores.argsort()[::-1] if x != new_index][:10]
      data = self.products_data
      self.matrix_similar[product_name] = [
          (scores[x], data['product_name'].iloc[x], data['retail_price'].iloc[x],
           data['discounted_price'].iloc[x]) for x in similar_indices]



  def recommend(self, recommendation):
        # Product to find recommendations for
        product_name = recommendation['product_name']
        # Number of products to recommend
        number_products = recommendation['number_products']
        # Any information about the product: size, color, branding, etc.
        product_info = recommendation['product_info']
        # If the product is not in the dataset, add it and find the products most
        # similar to the information supplied by the user.
        if product_name not in self.matrix_similar:
          self.updateMatrix(product_name, product_info)
        # Take the requested number of most similar products and return them as a DataFrame.
        recom_product = self.matrix_similar[product_name][:number_products]
        return self._print_message(product_name=product_name, recom_product=recom_product)
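A common way to take the top-k entries of a similarity row is to reverse an `argsort`. The slice form matters: `argsort()[:-10:-1]` returns 9 indices rather than 10, because the stop position is exclusive. A quick check on a hypothetical score row:

```python
import numpy as np

# Hypothetical similarity row for one product against 12 products.
row = np.array([0.10, 0.95, 0.30, 1.00, 0.20, 0.55, 0.05, 0.80, 0.40, 0.60, 0.15, 0.70])

# Walks backwards from the last index but stops *before* position -10: 9 indices.
nine = row.argsort()[:-10:-1]

# Reverse first, then take the first k: exactly k indices, highest score first.
k = 10
top_k = row.argsort()[::-1][:k]

print(len(nine), len(top_k))   # 9 10
```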

# *Datasets*
---
This task uses the transformed datasets I prepared in **EDA.ipynb**, in which the textual information from all the columns is combined into a single column.

Dataset 1: transformed Amazon Data.csv

Dataset 2: transformed Flipkart Data.csv

The datasets can be found in `Shack Labs Assignment/Assignment-2/transformed data/`.
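The EDA notebook itself is not shown here, so the following is only a minimal sketch of what combining text columns into one `product_info` column could look like. The source columns `brand`, `category` and `description` and their values are hypothetical:

```python
import pandas as pd

# Hypothetical raw row; the real columns used in EDA.ipynb may differ.
raw = pd.DataFrame({
    'product_name': ["Alisha Solid Women's Cycling Shorts"],
    'brand': ['Alisha'],
    'category': ['Clothing >> Women'],
    'description': ['Solid cotton cycling shorts'],
    'retail_price': [999.0],
    'discounted_price': [379.0],
})

text_cols = ['product_name', 'brand', 'category', 'description']  # assumed text columns
transformed = raw[['product_name', 'retail_price', 'discounted_price']].copy()
# Join the text columns into one space-separated string per row, treating NaN as empty.
transformed['product_info'] = raw[text_cols].fillna('').astype(str).apply(' '.join, axis=1)
print(transformed.columns.tolist())
```

The result has the four columns the class reads, which matches the four-column shapes reported below.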

In [None]:
flipkart = pd.read_csv('/content/drive/MyDrive/Shack Labs Assignment/Assignment-2/transformed data/transformed Flipkart Data.csv')
amazon = pd.read_csv('/content/drive/MyDrive/Shack Labs Assignment/Assignment-2/transformed data/transformed Amazon Data.csv',encoding='latin-1')

In [None]:
flipkart.drop('Unnamed: 0',axis=1,inplace=True)

In [None]:
amazon.drop('Unnamed: 0',axis=1,inplace=True)

In [None]:
FK_products = flipkart.drop_duplicates(subset='product_name')
AZ_products = amazon.drop_duplicates(subset='product_name')


Because the dataset is large and many products are repeated with nearly identical features, I keep only the first row for each product name.
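Dropping duplicates leaves gaps in the index, which is why the next cell resets it: row positions in the similarity matrix (0 to n-1) only line up with index labels after `reset_index(drop=True)`. A small illustration on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows with one repeated product name.
df = pd.DataFrame({
    'product_name': ['Polo T-Shirt', 'Polo T-Shirt', 'Cycling Shorts', 'Basic Shorts'],
    'retail_price': [799.0, 799.0, 999.0, 1295.0],
})

deduped = df.drop_duplicates(subset='product_name')   # keep='first' by default
print(deduped.index.tolist())                          # [0, 2, 3]: label 1 is gone

# After resetting, label i is the same as position i again.
deduped = deduped.reset_index(drop=True)
print(deduped.index.tolist())                          # [0, 1, 2]
print(deduped['product_name'][1])                      # Cycling Shorts
```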

In [None]:
FK_products = FK_products.reset_index(drop=True)
AZ_products = AZ_products.reset_index(drop=True)

In [None]:
FK_products.shape

(12676, 4)

In [None]:
AZ_products.shape

(12734, 4)
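`cosine_similarity` returns a dense float64 array by default, so `__init__` briefly holds a full n x n matrix before keeping only the top neighbours. With the shapes above, the arithmetic is:

```python
BYTES_PER_FLOAT64 = 8

def dense_matrix_bytes(n_products):
    """Bytes needed for an n x n float64 similarity matrix."""
    return n_products * n_products * BYTES_PER_FLOAT64

for site, n in [('Flipkart', 12676), ('Amazon', 12734)]:
    print(f"{site}: {n} x {n} matrix -> {dense_matrix_bytes(n) / 1e9:.2f} GB")
# Flipkart: 12676 x 12676 matrix -> 1.29 GB
# Amazon: 12734 x 12734 matrix -> 1.30 GB
```

Roughly 1.3 GB per site is manageable on a typical Colab runtime, but it grows quadratically with the catalogue size.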

In [None]:
# Instantiate one recommender per site: one for Flipkart and one for Amazon.
flipkart_obj = ProductRecommendationProgram(FK_products,'Flipkart')
amazon_obj = ProductRecommendationProgram(AZ_products,'Amazon')
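If memory becomes a concern, one option (not what the class above does) is to compute similarities in row blocks and keep only the top k per row. A minimal sketch, where `top_k_neighbours` and `chunk_size` are hypothetical names:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity


def top_k_neighbours(tfidf_matrix, k=10, chunk_size=1000):
    """Return (indices, scores) of the k most similar rows for every row.

    Only a (chunk_size x n) block of similarities is held in memory at a time.
    """
    n = tfidf_matrix.shape[0]
    k = min(k, n)
    top_idx = np.empty((n, k), dtype=np.int64)
    top_scores = np.empty((n, k), dtype=np.float64)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        block = cosine_similarity(tfidf_matrix[start:stop], tfidf_matrix)
        # argpartition moves the k largest scores per row to the end, unsorted ...
        part = np.argpartition(block, -k, axis=1)[:, -k:]
        part_scores = np.take_along_axis(block, part, axis=1)
        # ... then only those k are sorted, highest score first.
        order = np.argsort(-part_scores, axis=1)
        top_idx[start:stop] = np.take_along_axis(part, order, axis=1)
        top_scores[start:stop] = np.take_along_axis(part_scores, order, axis=1)
    return top_idx, top_scores


# Example on a small random matrix (hypothetical data, not the product TF-IDF).
X = csr_matrix(np.random.default_rng(0).random((50, 20)))
idx, scores = top_k_neighbours(X, k=5, chunk_size=16)
```

Peak memory is then about `chunk_size * n * 8` bytes per block instead of `n * n * 8`, at the cost of a Python-level loop over blocks.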

## Want to try different products?
## Fill in these entries in the cell below:
1. Product name, e.g. iPhone Cover
2. Number of products (fewer than 10), e.g. 7
3. Product info (any information such as product name, brand, color, size, etc.), e.g. iPhone Cover black ...

**All three entries are mandatory.**
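Since every entry is required, a small check before calling `recommend` can catch an incomplete request early. A minimal sketch; `validate_recommendation` is a hypothetical helper, and `max_products=9` follows the "fewer than 10" limit above:

```python
def validate_recommendation(recommendation, max_products=9):
    """Raise ValueError if a recommendation request is incomplete or out of range."""
    required = ('product_name', 'number_products', 'product_info')
    missing = [key for key in required if key not in recommendation]
    if missing:
        raise ValueError(f"missing entries: {', '.join(missing)}")
    # The name and info must be non-empty strings.
    for key in ('product_name', 'product_info'):
        value = recommendation[key]
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{key} must be a non-empty string")
    n = recommendation['number_products']
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(n, bool) or not isinstance(n, int) or not 1 <= n <= max_products:
        raise ValueError(f"number_products must be an integer between 1 and {max_products}")
    return recommendation
```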

In [None]:
# Pick a product and make a recommendation.
# Specify the product name, the number of products and the product information;
# including the product name, brand, color, size, etc. helps produce better matches.
recommendation = {
    "product_name": "T-Shirt",
    "number_products": 4,
    "product_info":"clothing men's T-Shirt Polo Shirt" 
}

In [None]:
fk_data = flipkart_obj.recommend(recommendation)
az_data = amazon_obj.recommend(recommendation)

The 4 searched and recommended products for T-Shirt from Flipkart are:
Number 1:
Candy House Solid Men's Polo Neck T-Shirt >> retail_price : 2499.0 >> discoundted_price : 799.0 with 0.588 similarity score
--------------------
Number 2:
Well on Embroidered Men's Polo Neck T-Shirt >> retail_price : 600.0 >> discoundted_price : 380.0 with 0.572 similarity score
--------------------
Number 3:
Okane Solid Men's Polo T-Shirt >> retail_price : 500.0 >> discoundted_price : 425.0 with 0.542 similarity score
--------------------
Number 4:
Onn Solid Men's Polo Neck T-Shirt >> retail_price : 799.0 >> discoundted_price : 799.0 with 0.531 similarity score
--------------------
The 4 searched and recommended products for T-Shirt from Amazon are:
Number 1:
Candy House Solid Men's Polo Neck T-Shirt >> retail_price : 2498 >> discoundted_price : 971 with 0.588 similarity score
--------------------
Number 2:
Well on Embroidered Men's Polo Neck T-Shirt >> retail_price : 593 >> discoundted_price : 465 with 0

In [None]:
fk_data

|   | product_name in Flipkart | retail price in Flipkart | discounted price in Flipkart | accuracy in Flipkart |
|---|---|---|---|---|
| 0 | Candy House Solid Men's Polo Neck T-Shirt | 2499.0 | 799.0 | 0.587695 |
| 1 | Well on Embroidered Men's Polo Neck T-Shirt | 600.0 | 380.0 | 0.57184 |
| 2 | Okane Solid Men's Polo T-Shirt | 500.0 | 425.0 | 0.541853 |
| 3 | Onn Solid Men's Polo Neck T-Shirt | 799.0 | 799.0 | 0.531372 |


In [None]:
az_data

|   | product_name in Amazon | retail price in Amazon | discounted price in Amazon | accuracy in Amazon |
|---|---|---|---|---|
| 0 | Candy House Solid Men's Polo Neck T-Shirt | 2498 | 971 | 0.58789 |
| 1 | Well on Embroidered Men's Polo Neck T-Shirt | 593 | 465 | 0.572049 |
| 2 | Okane Solid Men's Polo T-Shirt | 490 | 504 | 0.541956 |
| 3 | Onn Solid Men's Polo Neck T-Shirt | 789 | 933 | 0.531481 |


## Result: Final Dataset for Polo T-Shirt

In [None]:
final_data = pd.concat([fk_data,az_data],axis=1)
final_data.head()

|   | product_name in Flipkart | retail price in Flipkart | discounted price in Flipkart | accuracy in Flipkart | product_name in Amazon | retail price in Amazon | discounted price in Amazon | accuracy in Amazon |
|---|---|---|---|---|---|---|---|---|
| 0 | Candy House Solid Men's Polo Neck T-Shirt | 2499.0 | 799.0 | 0.587695 | Candy House Solid Men's Polo Neck T-Shirt | 2498 | 971 | 0.58789 |
| 1 | Well on Embroidered Men's Polo Neck T-Shirt | 600.0 | 380.0 | 0.57184 | Well on Embroidered Men's Polo Neck T-Shirt | 593 | 465 | 0.572049 |
| 2 | Okane Solid Men's Polo T-Shirt | 500.0 | 425.0 | 0.541853 | Okane Solid Men's Polo T-Shirt | 490 | 504 | 0.541956 |
| 3 | Onn Solid Men's Polo Neck T-Shirt | 799.0 | 799.0 | 0.531372 | Onn Solid Men's Polo Neck T-Shirt | 789 | 933 | 0.531481 |


In [None]:
recommendation1 = {
    "product_name": "Alisha Solid Women's Cycling Shorts",
    "number_products": 4,
    "product_info":"Alisha Solid Women's Cycling Shorts" 
}

In [None]:
fk_data = flipkart_obj.recommend(recommendation1)
az_data = amazon_obj.recommend(recommendation1)

The 4 searched and recommended products for Alisha Solid Women's Cycling Shorts from Flipkart are:
Number 1:
Alisha Solid Women's Cycling Shorts >> retail_price : 999.0 >> discoundted_price : 379.0 with 1.0 similarity score
--------------------
Number 2:
Mynte Solid Women's Cycling Shorts, Gym Shorts, Swim Shorts >> retail_price : 1499.0 >> discoundted_price : 649.0 with 0.558 similarity score
--------------------
Number 3:
Ashdan Solid Women's Basic Shorts >> retail_price : 999.0 >> discoundted_price : 549.0 with 0.438 similarity score
--------------------
Number 4:
Only Printed Women's Purple Basic Shorts >> retail_price : 1295.0 >> discoundted_price : 1295.0 with 0.43 similarity score
--------------------
The 4 searched and recommended products for Alisha Solid Women's Cycling Shorts from Amazon are:
Number 1:
Alisha Solid Women's Cycling Shorts >> retail_price : 982 >> discoundted_price : 438 with 1.0 similarity score
--------------------
Number 2:
ALISHA SOLID WOMEN'S CYCLING Shor

## Result: Final Dataset for Alisha Solid Women's Cycling Shorts

In [None]:
final_data1 = pd.concat([fk_data,az_data],axis=1)
final_data1.head()

|   | product_name in Flipkart | retail price in Flipkart | discounted price in Flipkart | accuracy in Flipkart | product_name in Amazon | retail price in Amazon | discounted price in Amazon | accuracy in Amazon |
|---|---|---|---|---|---|---|---|---|
| 0 | Alisha Solid Women's Cycling Shorts | 999.0 | 379.0 | 1.0 | Alisha Solid Women's Cycling Shorts | 982 | 438 | 1.0 |
| 1 | Mynte Solid Women's Cycling Shorts, Gym Shorts... | 1499.0 | 649.0 | 0.558492 | ALISHA SOLID WOMEN'S CYCLING ShorTS | -2 | 0 | 0.90088 |
| 2 | Ashdan Solid Women's Basic Shorts | 999.0 | 549.0 | 0.438338 | Mynte Solid Women's Cycling Shorts, Gym Shorts... | 1489 | 748 | 0.557657 |
| 3 | Only Printed Women's Purple Basic Shorts | 1295.0 | 1295.0 | 0.429641 | Ashdan Solid Women's Basic Shorts | 998 | 706 | 0.443099 |
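`pd.concat(..., axis=1)` pairs rows by position, so row 1 of this result puts Flipkart's Mynte shorts next to Amazon's upper-case ALISHA listing. To compare the same product across the two sites, one option is to merge on the product name. A minimal sketch using a few name and discounted-price values copied from the table above:

```python
import pandas as pd

# A few rows copied from the result table above (name and discounted price only).
fk = pd.DataFrame({
    'product_name in Flipkart': ["Alisha Solid Women's Cycling Shorts",
                                 "Ashdan Solid Women's Basic Shorts"],
    'discounted price in Flipkart': [379.0, 549.0],
})
az = pd.DataFrame({
    'product_name in Amazon': ["Alisha Solid Women's Cycling Shorts",
                               "ALISHA SOLID WOMEN'S CYCLING ShorTS",
                               "Ashdan Solid Women's Basic Shorts"],
    'discounted price in Amazon': [438, 0, 706],
})

# Inner join on the exact product name: only products listed on both sites remain,
# and each row now compares the same product across the two sites.
aligned = fk.merge(az, left_on='product_name in Flipkart',
                   right_on='product_name in Amazon', how='inner')
print(aligned)
```

The exact-match join drops the upper-case ALISHA listing. Normalising names (for example with `.str.lower()`) before merging would catch it, but would then match both Amazon Alisha rows to the single Flipkart row, so that choice depends on how duplicate listings should be treated.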
