# Recommender System for a jewelry company

## Capstone Project 2: Springboard Data Science Career Track 

### Notebook by Rupal Gandhre 

#### Introduction:
The jewelry industry has the potential to benefit from data and advanced analytics, as many other retail sectors already do. With the current impact of COVID-19, most sales have been through e-commerce websites. A recommender engine would help customers by suggesting other available products, instead of leaving them to browse on their own.

#### Goal:
The goal of this project is to build a recommender system for the company. If a customer purchases an 'Earring', the system recommends other jewelry items, such as a 'Bracelet', 'Ring', or 'Necklace', that complement the 'Earring'.

#### The Data:
The data was web-scraped from one of the leading jewelry brands using BeautifulSoup. I am thankful to the web developers for not implementing a script to block my nuisance of an IP address.



### Exploratory Data Analysis 

#### Import the necessary libraries and the data

In [1]:
# Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from numpy import nan
import json
import re
import seaborn as sns

# Display all columns and rows, and do not truncate column contents
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)

#ignore warning messages to ensure clean outputs
import warnings
warnings.filterwarnings('ignore')

In [2]:
# The supplied CSV data file is in the raw_data directory
raw_df = pd.read_csv('/Users/rupalgandhre/SpringBoard/DataScience_Capstone3_Recommender System/data/raw_data/effy_clean_split_data.csv')
#raw_df[raw_df['Stone2_Desc'].isnull()]

In [3]:
df = raw_df.drop(columns=['Item_Number', 'Metal', 'Metal Color', 'Discount_Price', 'Stones',
                          'Stone3_Desc', 'Stone3_Carat', 'Stone3_Stone', 'Stone3_Color', 'Stone3_Cut',
                          'Stone4_Desc', 'Stone4_Carat', 'Stone4_Stone', 'Stone4_Color', 'Stone4_Cut',
                          'Stone5_Desc', 'Stone5_Carat', 'Stone5_Stone', 'Stone5_Color', 'Stone5_Cut',
                          'Stone6_Desc', 'Stone6_Carat', 'Stone6_Stone', 'Stone6_Color', 'Stone6_Cut'])

In [4]:
df.columns

Index(['Description', 'Price', 'Jewelry_Type', 'Product_Carat', 'Stone1_Desc',
       'Stone1_Carat', 'Stone1_Stone', 'Stone1_Color', 'Stone1_Cut',
       'Stone2_Desc', 'Stone2_Carat', 'Stone2_Stone', 'Stone2_Color',
       'Stone2_Cut'],
      dtype='object')

In [5]:
df['Stone1_Cut'] = df['Stone1_Cut'].str.replace('round','Round')
df['Stone1_Desc'] = df['Stone1_Desc'].str.replace('round','Round')
df['Stone1_Stone'] = df['Stone1_Stone'].str.replace('Lazuli','Lapis-Lazuli')

df.loc[(df['Stone1_Stone'] == 'Quartz') & 
       (df['Stone1_Cut'] == 'Smokey'), 'Stone1_Cut'] = 'Oval'

df.loc[(df['Stone1_Stone'] == 'Diamond') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'White'

df.loc[(df['Stone1_Stone'] == 'Diamond') & 
             (df['Stone1_Cut'].isnull()),
             'Stone1_Cut'] = 'Round'



df.loc[((df['Stone1_Stone'] == 'Alexandrite') & 
            (df['Stone1_Color'].isnull())),
            'Stone1_Color'] = 'Bluish-Green'

df.loc[(df['Stone1_Stone'] == 'Amethyst') & 
           (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Green'

df.loc[(df['Stone1_Stone'] == 'Aquamarine') & 
             (df['Stone1_Color'].isnull()), 
             'Stone1_Color'] = 'Greenish-Blue'

df.loc[(df['Stone1_Stone'] == 'Citrine') & 
             (df['Stone1_Color'].isnull()), 
             'Stone1_Color'] = 'Yellow'

df.loc[(df['Stone1_Stone'] == 'Emerald') & 
             (df['Stone1_Color'].isnull()), 
             'Stone1_Color'] = 'Green'

df.loc[(df['Stone1_Stone'] == 'Garnet') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Red'

df.loc[(df['Stone1_Stone'] == 'Jade') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Green'


df.loc[(df['Stone1_Stone'] == 'Lapis-Lazuli') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Blue'


df.loc[(df['Stone1_Stone'] == 'Malachite') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Green'



df.loc[(df['Stone1_Stone'] == 'Morganite') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Pink'


df.loc[(df['Stone1_Stone'] == 'Multi-Color') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Multi-Color'


df.loc[(df['Stone1_Stone'] == 'Multi-Sapphire') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Multi-Sapphire'


df.loc[(df['Stone1_Stone'] == 'Onyx') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Black'


df.loc[(df['Stone1_Stone'] == 'Opal') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'White'

df.loc[(df['Stone1_Stone'] == 'Pearl') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'White'

df.loc[(df['Stone1_Stone'] == 'Quartz') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Smokey'


df.loc[(df['Stone1_Stone'] == 'Ruby') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Red'


df.loc[(df['Stone1_Stone'] == 'Sapphire') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Blue'


df.loc[(df['Stone1_Stone'] == 'Sapphire') & 
             (df['Stone1_Cut'].isnull()),
             'Stone1_Cut'] = 'Round'

df.loc[(df['Stone1_Stone'] == 'Tanzanite') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Blue'


df.loc[(df['Stone1_Stone'] == 'Topaz') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Blue'


df.loc[(df['Stone1_Stone'] == 'Turquoise') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Turquoise'

df.loc[(df['Stone1_Stone'] == 'Tsavorite') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Green'

df.loc[(df['Stone1_Stone'] == 'Coral') & 
             (df['Stone1_Color'].isnull()), 
             'Stone1_Color'] = 'Orange'

df.loc[(df['Stone1_Stone'] == 'Abalone') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Blue-Green'

df.loc[(df['Stone1_Stone'] == 'Tourmaline') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'Blue-Green'


df.loc[(df['Stone1_Desc'] == 'Round Mother of Pearl') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'White'

df.loc[(df['Stone1_Desc'] == 'Round Mother of Pearl') & 
             (df['Stone1_Stone'].isnull()),
             'Stone1_Stone'] = 'Pearl'

df.loc[(df['Stone1_Desc'] == 'Round Mother of Pearl') & 
             (df['Stone1_Cut'].isnull()),
             'Stone1_Cut'] = 'Round'

df.loc[(df['Stone1_Desc'] == 'Multi-Shape Mother of Pearl') & 
             (df['Stone1_Color'].isnull()),
             'Stone1_Color'] = 'White'

df.loc[(df['Stone1_Desc'] == 'Multi-Shape Mother of Pearl') & 
             (df['Stone1_Cut'].isnull()),
             'Stone1_Cut'] = 'Multi-Shape'

df.loc[(df['Stone1_Desc'] == 'Multi-Shape Mother of Pearl') & 
             (df['Stone1_Stone'].isnull()),
             'Stone1_Stone'] = 'Pearl'




In [6]:
#df.drop(df[df['Stone1_Stone']== ' '].index,inplace=True)
df.drop(df[df['Stone2_Stone'] == 'Band'].index, inplace=True)
df.drop(df[df['Stone2_Stone'] == 'Bands'].index, inplace=True)
df.drop(df[df['Stone2_Desc'] == 'Matching band for Style WZ0M486DTR'].index, inplace=True)
df.drop(df[df['Stone2_Desc'] == 'Matching band for Style WZ0P637D26'].index, inplace=True)



In [7]:
df.loc[(df['Stone2_Stone'] == 'Diamond') & 
             (df['Stone2_Color'].isnull()), 
             'Stone2_Color'] = 'White'

df.loc[(df['Stone2_Stone'] == 'Emerald') & 
             (df['Stone2_Color'].isnull()), 
             'Stone2_Color'] = 'Green'

df.loc[(df['Stone2_Stone'] == 'Garnet') & 
             (df['Stone2_Color'].isnull()),
             'Stone2_Color'] = 'Red'

df.loc[(df['Stone2_Stone'] == 'Multi-Sapphire') & 
             (df['Stone2_Color'].isnull()),
             'Stone2_Color'] = 'Multi-Sapphire'

df.loc[(df['Stone2_Stone'] == 'Ruby') & 
             (df['Stone2_Color'].isnull()),
             'Stone2_Color'] = 'Red'


df.loc[(df['Stone2_Stone'] == 'Sapphire') & 
             (df['Stone2_Color'].isnull()),
             'Stone2_Color'] = 'Blue'
df.loc[(df['Stone2_Stone'] == 'Peridot') & 
             (df['Stone2_Color'].isnull()),
             'Stone2_Color'] = 'Green'

df.loc[(df['Stone2_Stone'] == 'Peridot') & 
             (df['Stone2_Cut'].isnull()),
             'Stone2_Cut'] = 'Oval'

df.loc[(df['Stone2_Stone'] == 'Tsavorite') & 
             (df['Stone2_Color'].isnull()),
             'Stone2_Color'] = 'Green'

In [8]:
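# Where both stone carats are recorded as 0, split the product carat evenly between them
both_zero = (df['Stone1_Carat'] == 0) & (df['Stone2_Carat'] == 0)
half_carat = df.loc[both_zero, 'Product_Carat'] / 2
df.loc[both_zero, 'Stone1_Carat'] = half_carat
df.loc[both_zero, 'Stone2_Carat'] = half_carat

# Fill the remaining missing Stone2 values
# (plain assignment avoids calling fillna(inplace=True) on a column selection)
df['Stone2_Carat'] = df['Stone2_Carat'].fillna(0)
df['Stone2_Desc'] = df['Stone2_Desc'].fillna('')
df['Stone2_Stone'] = df['Stone2_Stone'].fillna('')
df['Stone2_Color'] = df['Stone2_Color'].fillna('')
df['Stone2_Cut'] = df['Stone2_Cut'].fillna('')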


In [9]:
df.drop(df[df['Stone1_Cut'].isnull()].index, inplace=True)

In [10]:
df.isnull().any()

Description      False
Price            False
Jewelry_Type     False
Product_Carat    False
Stone1_Desc      False
Stone1_Carat     False
Stone1_Stone     False
Stone1_Color     False
Stone1_Cut       False
Stone2_Desc      False
Stone2_Carat     False
Stone2_Stone     False
Stone2_Color     False
Stone2_Cut       False
dtype: bool

# Recommender System


Recommendation systems use specialized algorithms and machine learning solutions. There are three main types of recommendation systems:

##### 1. Collaborative Filtering: 
This method is based on gathering and analyzing data on users' behavior and predicting what a user will like based on their similarity to other users.

Example:

         User A --> Item A, Item B, Item C

         User B --> Item A, Item B, Item X
          
With collaborative filtering, the following predictions would be made:

         User A --> Item X

         User B --> Item C
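
The example above can be sketched in a few lines of Python. This is a minimal illustration and not part of the project pipeline: the `purchases` dictionary and the `recommend_from_similar_users` helper are hypothetical names, and similarity is measured simply by the number of shared items.

```python
# Toy user-based collaborative filtering (illustrative only).
purchases = {
    "User A": {"Item A", "Item B", "Item C"},
    "User B": {"Item A", "Item B", "Item X"},
}


def recommend_from_similar_users(user, purchases):
    """Suggest items bought by the most similar other user but not by `user`."""
    own_items = purchases[user]
    # Rank the other users by how many items they share with `user`.
    others = [u for u in purchases if u != user]
    most_similar = max(others, key=lambda u: len(own_items & purchases[u]))
    return sorted(purchases[most_similar] - own_items)


print(recommend_from_similar_users("User A", purchases))  # ['Item X']
print(recommend_from_similar_users("User B", purchases))  # ['Item C']
```

This approach needs purchase or rating histories per user, which the scraped product data does not contain.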


##### 2. Content-Based Filtering: 
This method is based on the attributes of a product and the user's preferred choices. In this type of recommendation system, products are described using keywords, and a user profile is built to express the kind of item the user likes.

Example: Movie A (Genre1, Actor1, Director1)

         Movie B (Genre1, Actor2, Director2)
         
         Movie C (Genre2, Actor1, Director3)
         


Say User A likes Movie A. With content-based filtering, Movie B would be recommended because it has the same genre, or Movie C would be recommended because it has the same actor.
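
The movie example can be expressed as a small sketch in which each item is a set of attributes and candidates are found through shared attributes. The `movies` dictionary and the `shared_attributes` helper are hypothetical and only illustrate the idea.

```python
# Toy content-based filtering (illustrative only).
movies = {
    "Movie A": {"Genre1", "Actor1", "Director1"},
    "Movie B": {"Genre1", "Actor2", "Director2"},
    "Movie C": {"Genre2", "Actor1", "Director3"},
}


def shared_attributes(liked, catalog):
    """Return, for every other item, the attributes it shares with `liked`."""
    liked_attrs = catalog[liked]
    return {
        title: sorted(liked_attrs & attrs)
        for title, attrs in catalog.items()
        if title != liked and liked_attrs & attrs
    }


print(shared_attributes("Movie A", movies))
# {'Movie B': ['Genre1'], 'Movie C': ['Actor1']}
```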
 

##### 3. Hybrid Recommendations: 
This method uses both content-based and collaborative filtering simultaneously to recommend a broader range of products to customers.
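
One simple way to combine the two approaches is a weighted blend of their scores. The sketch below is an illustrative assumption and not a method used in this notebook: the `hybrid_score` helper, the weight, and the example scores are all hypothetical.

```python
def hybrid_score(content_score, collaborative_score, weight=0.5):
    """Blend two scores in [0, 1]; `weight` is the share given to the content score."""
    return weight * content_score + (1 - weight) * collaborative_score


# Hypothetical (content score, collaborative score) pairs for two candidates.
scores = {
    "Bracelet": hybrid_score(0.9, 0.2),
    "Necklace": hybrid_score(0.6, 0.7),
}

best = max(scores, key=scores.get)
print(best)  # Necklace
```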

### Cosine Similarity
Cosine similarity is a widely used metric for measuring how similar two documents are, irrespective of their size. Mathematically, it measures the cosine of the angle between two vectors projected in a multi-dimensional space.

I have used cosine similarity to compare the products and recommended the products with the highest scores.
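
As a minimal sketch, the cosine of the angle between two vectors u and v is their dot product divided by the product of their lengths. The `cosine` helper and the toy word-count vectors below are hypothetical. They illustrate why document length does not matter: repeating every word three times leaves the similarity unchanged.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


short_doc = [1, 1, 0]  # word counts for a short description
long_doc = [3, 3, 0]   # same words, each repeated three times
other_doc = [0, 1, 1]  # shares only one word with short_doc

print(round(cosine(short_doc, long_doc), 3))   # 1.0
print(round(cosine(short_doc, other_doc), 3))  # 0.5
```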


### Recommender System for current application

As this application has mostly textual data and no ratings are available for any items, content-based filtering is used to recommend items to customers based on the attributes of the items themselves.


The following three recommender systems are built:
1. Content-based recommender with TF-IDF
2. Content-based recommender with CountVectorizer
3. Content-based recommender with KNN (using TF-IDF features and CountVectorizer features)
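
The difference between count features and TF-IDF features can be sketched without scikit-learn. The toy `descriptions` are hypothetical, and the inverse document frequency uses the smoothed form ln((1 + n) / (1 + df)) + 1 that scikit-learn documents as its default. The final length normalisation that `TfidfVectorizer` applies is omitted here for brevity.

```python
import math
from collections import Counter

descriptions = [
    "gold ruby ring",
    "gold ruby earrings",
    "gold diamond ring",
]

# Count features: how often each word occurs in each description.
counts = [Counter(text.split()) for text in descriptions]

# Document frequency: in how many descriptions each word occurs.
n_docs = len(descriptions)
doc_freq = Counter(word for c in counts for word in c)

# Smoothed inverse document frequency.
idf = {word: math.log((1 + n_docs) / (1 + df)) + 1 for word, df in doc_freq.items()}

# TF-IDF features: counts re-weighted so that rarer words carry more weight.
tfidf = [{word: tf * idf[word] for word, tf in c.items()} for c in counts]

print(counts[0]["gold"], counts[0]["ruby"])        # 1 1
print(idf["gold"] < idf["ruby"] < idf["diamond"])  # True
```

With raw counts, "gold" and "ruby" weigh the same in the first description. With TF-IDF, "gold" is down-weighted because it appears in every description.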


In [11]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [12]:
# Vectorize the product descriptions with the supplied vectorizer
def create_desc_vector(data, vectorizer_method):
    # Use the vectorizer instance passed in by the caller
    vectorizer = vectorizer_method
    
    # Fit and transform
    vectorizer_data = vectorizer.fit_transform(data)
    
    # Look at the features generated
    #print(vectorizer.get_feature_names_out())
    
    # Create a DataFrame from the vectorized array
    # (get_feature_names_out() replaces get_feature_names(), which was removed in scikit-learn 1.2)
    vectorized_df = pd.DataFrame(vectorizer_data.toarray(), columns=vectorizer.get_feature_names_out())

    # Use the descriptions as the index
    vectorized_df.index = data
    
    return vectorized_df

#Function to get the cosine similarity of the products  
def get_cosine_similarity(dataframe,column,current_item):  


    # Create the array of cosine similarity values
    cosine_similarity_array = cosine_similarity(dataframe)

    # Wrap the array in a pandas DataFrame
    cosine_similarity_df = pd.DataFrame(cosine_similarity_array, index=dataframe.index, columns=column)

    cosine_similarity_series = cosine_similarity_df.loc[current_item]

    # Sort these values highest to lowest
    ordered_similarities = cosine_similarity_series.sort_values(ascending=False)

    return ordered_similarities



# Function that applies the recommendation logic to the scored candidates
def recommender_engine(new_df, df):
    # Note: relies on the globals current_type, current_Stone1_Desc and
    # current_Stone2_Desc, which are set in the "current item" cell below.

    # Attach the type and stone descriptions to the scored candidates
    type_df = df[['Description', 'Jewelry_Type', 'Stone1_Desc', 'Stone2_Desc']]
    final_df = new_df.merge(type_df, on='Description', how='inner')
    final_df = final_df.sort_values(by='Score', ascending=False)

    # Candidate jewelry types: every type except that of the current item
    jewel_types = list(df['Jewelry_Type'].unique())
    jewel_types.remove(current_type)

    recommended_list = list()
    # Scan the candidates in descending score order.
    # For each jewelry type other than the current one (e.g. Bracelet, Ring,
    # Necklace when the current item is an Earring), keep the first candidate
    # whose Stone1 and Stone2 descriptions both match the current item.
    for index, row in final_df.iterrows():
        if row['Jewelry_Type'] in jewel_types:
            if (row['Stone1_Desc'] == current_Stone1_Desc):
                if (row['Stone2_Desc'] == current_Stone2_Desc):
                    recommended_list.append(row['Description'])
                    jewel_types.remove(row['Jewelry_Type'])

    # If no candidate matches on both stones, fall back to the five
    # highest-scoring candidates regardless of type
    if len(recommended_list) == 0:
        recommended_list = final_df['Description'][0:5]
    return recommended_list
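
The selection rule in `recommender_engine` can be illustrated on toy data without pandas. The sketch below is a simplification under stated assumptions: the `ranked` list and the `pick_one_per_type` helper are hypothetical, and only one stone description is compared instead of two.

```python
# Candidates already sorted by similarity score, highest first (toy data).
ranked = [
    {"Description": "Ruby Ring B", "Jewelry_Type": "Ring", "Stone1_Desc": "Round Ruby"},
    {"Description": "Ruby Pendant", "Jewelry_Type": "Pendant", "Stone1_Desc": "Round Ruby"},
    {"Description": "Emerald Bangle", "Jewelry_Type": "Bangle", "Stone1_Desc": "Round Emerald"},
    {"Description": "Ruby Pendant 2", "Jewelry_Type": "Pendant", "Stone1_Desc": "Round Ruby"},
    {"Description": "Ruby Earrings", "Jewelry_Type": "Earrings", "Stone1_Desc": "Round Ruby"},
]


def pick_one_per_type(ranked, current_type, current_stone):
    """Keep the best-ranked matching item for each type other than `current_type`."""
    remaining = {row["Jewelry_Type"] for row in ranked} - {current_type}
    picks = []
    for row in ranked:
        if row["Jewelry_Type"] in remaining and row["Stone1_Desc"] == current_stone:
            picks.append(row["Description"])
            # Each jewelry type contributes at most one recommendation.
            remaining.discard(row["Jewelry_Type"])
    return picks


print(pick_one_per_type(ranked, "Ring", "Round Ruby"))
# ['Ruby Pendant', 'Ruby Earrings']
```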



###### Setting the Current Item to find recommendations

In [41]:
#This item does not have any matching item
#current_item = 'Effy Sunset 14K Yellow Gold Citrine and Diamond Ring, 7.02 TCW'

#This item KNN does not match other methods
#current_item = 'Effy Ruby Royale 14K Rose Gold Ruby and Diamond Bow Ring, 1.61 TCW'

#This item KNN does not match other methods
#current_item = 'Effy Ruby Royale 14K White Gold Ruby and Diamond Ring, 2.12 TCW'

#This item KNN does not match other methods
#current_item = 'Effy Ruby Royale 14K White Gold Ruby and Diamond Hoop Earrings, 1.19 TCW'


current_item = '14K Two Tone Gold Blue and White Diamond Crossover Bangle, 2.00 TCW'

# Look up the attributes of the current item (first match if the description is duplicated)
current_row = df.loc[df['Description'] == current_item].iloc[0]
current_type = current_row['Jewelry_Type']
current_Stone1_Desc = current_row['Stone1_Desc']
current_Stone2_Desc = current_row['Stone2_Desc']


### Recommender System using CountVectorizer

In [42]:
# Instantiate the vectorizer object to the vectorizer variable
countvect_df = create_desc_vector(df['Description'],CountVectorizer())

# Get the cosine similarity
countvect_cosine = get_cosine_similarity(countvect_df,df['Description'],current_item)

# Convert the Series to a DataFrame with 'Description' and 'Score' columns
# (the original reset_index() call discarded its result)
cv_df = countvect_cosine.rename_axis('Description').reset_index(name='Score')
    
# Call the recommender engine function to get the recommended list 
recommended_list_countVec = recommender_engine(cv_df,df)

recommended_list_countVec


['14K Two Tone Gold Blue and White Diamond Crossover Ring, 1.00 TCW',
 '14K Two Tone Gold Blue and White Diamond Crossover Pendant, 1.00 TCW',
 '14K 2-Tone Gold Espresso & White Diamond Crossover Hoop Earrings, 1.00 TCW']

### Recommender System using TF-IDF Vectorizer

In [43]:
tdidf_df = create_desc_vector(df['Description'],TfidfVectorizer())

tdidf_cosine = get_cosine_similarity(tdidf_df,df['Description'],current_item)

# Convert the Series to a DataFrame with 'Description' and 'Score' columns
tv_df = tdidf_cosine.rename_axis('Description').reset_index(name='Score')
    
recommended_list_tdidf = recommender_engine(tv_df,df)

recommended_list_tdidf

['14K Two Tone Gold Blue and White Diamond Crossover Ring, 1.00 TCW',
 '14K Two Tone Gold Blue and White Diamond Crossover Pendant, 1.00 TCW',
 '14K 2-Tone Gold Espresso & White Diamond Crossover Hoop Earrings, 1.00 TCW']

### Recommender System using KNN with TF-IDF features

In [44]:
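from sklearn.neighbors import NearestNeighbors

# TF-IDF features for every description, and for the current item
tfidf_vectorizer = TfidfVectorizer()
tfidf_jobid = tfidf_vectorizer.fit_transform(df['Description'])

user_tfidf = tfidf_vectorizer.transform(pd.Series(current_item))

# Find the 50 nearest neighbours by Euclidean distance (p=2)
# (n_neighbors must be passed by keyword in recent scikit-learn versions)
n_neighbors = 50
KNN = NearestNeighbors(n_neighbors=n_neighbors, p=2)
KNN.fit(tfidf_jobid)
NNs = KNN.kneighbors(user_tfidf, return_distance=True)

# Skip the first neighbour, which is expected to be the current item itself
top_desc = NNs[1][0][1:]
score = NNs[0][0][1:]

# Note: 'Score' holds a distance here (smaller means more similar), while
# recommender_engine sorts 'Score' in descending order.
j = 0
knn_df = pd.DataFrame(columns = ['Description', 'Score'])

for i in top_desc:
    # KNN returns row positions, so look up by position rather than by index label
    # (the index labels may have gaps after the earlier row drops)
    knn_df.at[j, 'Description'] = df['Description'].iloc[i]
    knn_df.at[j, 'Score'] = score[j]
    j+=1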
    
recommended_list_knn_tdidf = recommender_engine(knn_df,df)


recommended_list_knn_tdidf



['Effy Canare 14K 2-Tone Gold Yellow and White Diamond Ring, 0.60 TCW',
 'Effy Canare 14K 2-Tone Gold Yellow and White Diamond Earrings, 0.61 TCW',
 'Effy Canare 14K 2-Tone Gold Yellow and White Diamond Pendant, 0.45 TCW']
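
The KNN cell above ranks neighbours by Euclidean distance (`p=2`), where a smaller distance means a closer match. The following sketch illustrates that ordering on hypothetical, already length-normalised toy vectors. The `euclidean` helper and the `items` dictionary are not part of the notebook.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical unit-length feature vectors.
items = {
    "query": [1.0, 0.0],
    "near": [0.8, 0.6],
    "far": [0.0, 1.0],
}

query = items["query"]
# Nearest neighbours first: sort the other items by ascending distance.
neighbours = sorted(
    (name for name in items if name != "query"),
    key=lambda name: euclidean(query, items[name]),
)
print(neighbours)  # ['near', 'far']
```

For unit-length vectors, the squared Euclidean distance equals 2 - 2 * cosine similarity, so ascending distance and descending cosine similarity give the same ranking.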

### Recommender System using KNN Using CountVectorizer features

In [45]:
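from sklearn.neighbors import NearestNeighbors

# Count features for every description, and for the current item
count_vectorizer = CountVectorizer()
count_jobid = count_vectorizer.fit_transform(df['Description'])

user_count = count_vectorizer.transform(pd.Series(current_item))

# Find the 50 nearest neighbours by Euclidean distance (p=2)
# (n_neighbors must be passed by keyword in recent scikit-learn versions)
n_neighbors = 50
KNN = NearestNeighbors(n_neighbors=n_neighbors, p=2)
KNN.fit(count_jobid)
count_NNs = KNN.kneighbors(user_count, return_distance=True)

# Skip the first neighbour, which is expected to be the current item itself
count_top_desc = count_NNs[1][0][1:]
count_score = count_NNs[0][0][1:]

# Note: 'Score' holds a distance here (smaller means more similar), while
# recommender_engine sorts 'Score' in descending order.
j = 0
count_knn_df = pd.DataFrame(columns = ['Description', 'Score'])

for i in count_top_desc:
    # KNN returns row positions, so look up by position rather than by index label
    # (the index labels may have gaps after the earlier row drops)
    count_knn_df.at[j, 'Description'] = df['Description'].iloc[i]
    count_knn_df.at[j, 'Score'] = count_score[j]
    j+=1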
    
recommended_list_knn_countvec = recommender_engine(count_knn_df,df)


recommended_list_knn_countvec



['Effy Bridal 14K White Gold Diamond Solitaire Ring, 0.33 TCW',
 'Effy 14K Two Tone Gold Diamond Earrings, 0.41 TCW',
 '14K White and Yellow Gold Diamond Crossover Pendant, 1.00 TCW']


#### Evaluation of the Recommender Systems

In [22]:
##### 1. current_item = 'Effy Ruby Royale 14K Rose Gold Ruby and Diamond Bow Ring, 1.61 TCW'
# For above item KNN does not match other methods

#Labels = ['CountVectorizer', 'TfidfVectorizer', 'KNN with TfidfVectorizer','KNN with CountVectorizer']

#Matrix = pd.DataFrame(list(zip(recommended_list_countVec,
#                               recommended_list_tdidf,
#                               recommended_list_knn_tdidf,
#                               recommended_list_knn_countvec,
#                               
#                              
#                               )),
#                     columns=Labels)
#Matrix


| | CountVectorizer | TfidfVectorizer | KNN with TfidfVectorizer | KNN with CountVectorizer |
|---|---|---|---|---|
| 0 | Effy Ruby Royale 14K Rose Gold Ruby and Diamond Bow Necklace, 7.25 TCW | Effy Ruby Royale 14K Rose Gold Ruby and Diamond Bow Necklace, 7.25 TCW | Effy Ruby Royale 14K Rose Gold Ruby and Diamond Bow Ring, 1.61 TCW | Effy Ruby Royale 14K White Gold Ruby and Diamond Ring, 1.02 TCW |
| 1 | Effy Ruby Royale 14K Rose Gold Ruby and Diamond Earrings, 1.90 TCW | Effy Ruby Royale 14K Rose Gold Ruby and Diamond Earrings, 1.90 TCW | Effy Ruby Royale 14K Two Tone Gold Ruby and Diamond Pear Shaped Ring, 0.77 TCW | Effy Ruby Royale 14K Rose Gold Ruby and Diamond Drop Earrings, 1.68 TCW |
| 2 | Effy Ruby Royale 14K Rose Gold Ruby and Diamond Bangle, 3.93 TCW | Effy Ruby Royale 14K Rose Gold Ruby and Diamond Bangle, 3.93 TCW | Effy Royale Bleu 14K Yellow Gold Sapphire and Diamond Ring, 13.97 TCW | Effy Ruby Royale 14K White Gold Ruby and Diamond Ring, 2.14 TCW |


##### 1. current_item = 'Effy Ruby Royale 14K Rose Gold Ruby and Diamond Bow Ring, 1.61 TCW'
For the above item, the KNN recommendations do not match those of the other two methods.


In [30]:

#### 2. current_item = 'Effy Sunset 14K Yellow Gold Citrine and Diamond Ring, 7.02 TCW'
# This item does not have any matching item, and each system recommends different products

#Labels = ['CountVectorizer', 'TfidfVectorizer', 'KNN with TfidfVectorizer','KNN with CountVectorizer']

#Matrix = pd.DataFrame(list(zip(recommended_list_countVec,
#                               recommended_list_tdidf,
#                               recommended_list_knn_tdidf,
#                               recommended_list_knn_countvec,                              
#                               )),
#                     columns=Labels)
#Matrix


| | CountVectorizer | TfidfVectorizer | KNN with TfidfVectorizer | KNN with CountVectorizer |
|---|---|---|---|---|
| 0 | Effy Sunset 14K Yellow Gold Citrine and Diamond Ring, 7.02 TCW | Effy Sunset 14K Yellow Gold Citrine and Diamond Ring, 7.02 TCW | Effy Duo 14K Two Tone Gold Diamond Earrings, 0.49 TCW | Effy Brasilica 14K Yellow Gold Emerald and Diamond Pendant, 2.32 TCW |
| 1 | Effy Sunset 14K Yellow Gold Citrine and Diamond Ring, 6.72 TCW | Effy Sunset 14K Yellow Gold Citrine and Diamond Ring, 6.49 TCW | Effy Watercolors 14K Yellow Gold Multi Sapphire and Diamond Bracelet, 28.98 TCW | Effy D'Oro 14K Yellow Gold Diamond Links Necklace, 0.80 TCW |
| 2 | Effy Sunset 14K Yellow Gold Citrine and Diamond Ring, 3.58 TCW | Effy Sunset 14K Yellow Gold Citrine and Diamond Ring, 6.29 TCW | Effy Brasilica 14K Yellow Gold Emerald and Diamond Earrings, 4.62 TCW | Effy Brasilica 14K Yellow Gold Emerald and Diamond Pendant, 2.26 TCW |
| 3 | Effy Sunset 14K Yellow Gold Citrine and Diamond Ring, 9.31 TCW | Effy Sunset 14K Yellow Gold Citrine and Diamond Ring, 9.06 TCW | Effy Brasilica 14K Yellow Gold Emerald and Diamond Pendant, 2.26 TCW | Effy Aurora 14K Rose Gold Opal and Diamond Pendant, 2.57 TCW |
| 4 | Effy Sunset 14K Yellow Gold Citrine and Diamond Ring, 9.06 TCW | Effy Sunset 14K Yellow Gold Citrine and Diamond Ring, 6.72 TCW | Effy Brasilica 14K Yellow Gold Emerald and Diamond Earrings, 2.95 TCW | Effy D'Oro 14K Yellow Gold Diamond Maze Earrings, 1.35 TCW |


##### 2. current_item = 'Effy Sunset 14K Yellow Gold Citrine and Diamond Ring, 7.02 TCW'
The above item does not have any matching item, and each system recommends different products.


In [35]:

#### 3. current_item = 'Effy Ruby Royale 14K White Gold Ruby and Diamond Ring, 2.12 TCW'
# For this item CountVectorizer and TfidfVectorizer recommend the same items, in a different order

#Labels = ['CountVectorizer', 'TfidfVectorizer', 'KNN with TfidfVectorizer','KNN with CountVectorizer']

#Matrix = pd.DataFrame(list(zip(recommended_list_countVec,
#                               recommended_list_tdidf,
#                               recommended_list_knn_tdidf,
#                               recommended_list_knn_countvec,                              
#                               )),
#                     columns=Labels)
#Matrix


| | CountVectorizer | TfidfVectorizer | KNN with TfidfVectorizer | KNN with CountVectorizer |
|---|---|---|---|---|
| 0 | Effy Ruby Royale 14K White Gold Ruby and Diamond Pendant, 2.23 TCW | Effy Ruby Royale 14K Yellow Gold Ruby and Diamond Bracelet, 12.19 TCW | Effy Ruby Royale 14K Yellow Gold Ruby and Diamond Earrings, 2.06 TCW | Effy Brasilica 14K Yellow Gold Emerald and Diamond Pendant, 2.32 TCW |
| 1 | Effy Ruby Royale 14K White Gold Ruby and Diamond Earrings, 3.26 TCW | Effy Ruby Royale 14K White Gold Ruby and Diamond Earrings, 3.26 TCW | Effy Ruby Royale 14K White Gold Ruby and Diamond Pendant, 2.23 TCW | Effy D'Oro 14K Yellow Gold Diamond Links Necklace, 0.80 TCW |
| 2 | Effy Ruby Royale 14K Yellow Gold Ruby and Diamond Bracelet, 12.19 TCW | Effy Ruby Royale 14K White Gold Ruby and Diamond Pendant, 2.23 TCW | Effy Ruby Royale 14K Yellow Gold Ruby and Diamond Bracelet, 12.19 TCW | Effy Brasilica 14K Yellow Gold Emerald and Diamond Pendant, 2.26 TCW |



##### 3. current_item = 'Effy Ruby Royale 14K White Gold Ruby and Diamond Ring, 2.12 TCW'
For the above item, CountVectorizer and TfidfVectorizer recommend the same three items, though in a different order.


In [40]:
#### 4. current_item = 'Effy Ruby Royale 14K White Gold Ruby and Diamond Hoop Earrings, 1.19 TCW'
# For this item CountVectorizer and TfidfVectorizer agree on the first two recommendations

#Labels = ['CountVectorizer', 'TfidfVectorizer', 'KNN with TfidfVectorizer','KNN with CountVectorizer']

#Matrix = pd.DataFrame(list(zip(recommended_list_countVec,
#                               recommended_list_tdidf,
#                               recommended_list_knn_tdidf,
#                               recommended_list_knn_countvec,                              
#                               )),
#                     columns=Labels)
#Matrix


| | CountVectorizer | TfidfVectorizer | KNN with TfidfVectorizer | KNN with CountVectorizer |
|---|---|---|---|---|
| 0 | Effy Ruby Royale 14K White Gold Ruby and Diamond Ring, 1.12 TCW | Effy Ruby Royale 14K White Gold Ruby and Diamond Ring, 1.12 TCW | Effy Ruby Royale 14K Rose Gold Ruby & Diamond Crossover Ring, 0.81 TCW | Effy Brasilica 14K Yellow Gold Emerald and Diamond Pendant, 2.32 TCW |
| 1 | Effy Ruby Royale 14K White Gold Ruby and Diamond Pendant, 1.17 TCW | Effy Ruby Royale 14K White Gold Ruby and Diamond Pendant, 1.17 TCW | Effy Ruby Royale 14K Yellow Gold Ruby and Diamond Pendant, 0.49 TCW | Effy D'Oro 14K Yellow Gold Diamond Links Necklace, 0.80 TCW |
| 2 | Effy Ruby Royale 14K White Gold Ruby and Diamond Tennis Bracelet, 5.22 TCW | Effy Ruby Royale 14K Rose Gold Ruby and Diamond Bangle, 1.16 TCW | Effy Ruby Royale 14K White Gold Ruby and Diamond Tennis Bracelet, 5.22 TCW | Effy Brasilica 14K Yellow Gold Emerald and Diamond Pendant, 2.26 TCW |


##### 4. current_item = 'Effy Ruby Royale 14K White Gold Ruby and Diamond Hoop Earrings, 1.19 TCW'
For the above item, CountVectorizer and TfidfVectorizer agree on the first two recommendations and differ on the third.


In [46]:
#### 5. current_item = '14K Two Tone Gold Blue and White Diamond Crossover Bangle, 2.00 TCW'
# For this item CountVectorizer and TfidfVectorizer show same results


#Labels = ['CountVectorizer', 'TfidfVectorizer', 'KNN with TfidfVectorizer','KNN with CountVectorizer']

#Matrix = pd.DataFrame(list(zip(recommended_list_countVec,
#                               recommended_list_tdidf,
#                               recommended_list_knn_tdidf,
#                               recommended_list_knn_countvec,                              
#                               )),
#                     columns=Labels)
#Matrix


| | CountVectorizer | TfidfVectorizer | KNN with TfidfVectorizer | KNN with CountVectorizer |
|---|---|---|---|---|
| 0 | 14K Two Tone Gold Blue and White Diamond Crossover Ring, 1.00 TCW | 14K Two Tone Gold Blue and White Diamond Crossover Ring, 1.00 TCW | Effy Canare 14K 2-Tone Gold Yellow and White Diamond Ring, 0.60 TCW | Effy Bridal 14K White Gold Diamond Solitaire Ring, 0.33 TCW |
| 1 | 14K Two Tone Gold Blue and White Diamond Crossover Pendant, 1.00 TCW | 14K Two Tone Gold Blue and White Diamond Crossover Pendant, 1.00 TCW | Effy Canare 14K 2-Tone Gold Yellow and White Diamond Earrings, 0.61 TCW | Effy 14K Two Tone Gold Diamond Earrings, 0.41 TCW |
| 2 | 14K 2-Tone Gold Espresso & White Diamond Crossover Hoop Earrings, 1.00 TCW | 14K 2-Tone Gold Espresso & White Diamond Crossover Hoop Earrings, 1.00 TCW | Effy Canare 14K 2-Tone Gold Yellow and White Diamond Pendant, 0.45 TCW | 14K White and Yellow Gold Diamond Crossover Pendant, 1.00 TCW |


##### 5.  current_item = '14K Two Tone Gold Blue and White Diamond Crossover Bangle, 2.00 TCW'

For the above item, CountVectorizer and TfidfVectorizer give the same results.


### Conclusion

Comparing the recommender systems above, the CountVectorizer and TfidfVectorizer recommendations match each other more often than either matches the KNN recommendations.

CountVectorizer would be the most appropriate choice in this case.


### Future Work:

Because the data is available only in text format, content-based recommender systems were used.

If ratings data becomes available, recommendations from other algorithms, such as collaborative filtering, can be evaluated.