# Recommender system - Content Based

This first recommender system was made possible by adapting code from, and with the support of, Firat Soydinc.

In [13]:
import pandas as pd 
from sklearn.metrics.pairwise import linear_kernel
from sklearn.feature_extraction.text import TfidfVectorizer

In [14]:
df = pd.read_csv(r'final.csv')

In [15]:
df.columns


Index(['Unnamed: 0', 'Product name', 'Description', 'Summary', 'Price',
       'Ingredients', 'Quantity', 'key word'],
      dtype='object')

___

### Brief data cleaning to remove rows with empty cells

In [16]:
df = df.dropna()
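`dropna()` keeps the original row labels, so the index can have gaps afterwards, while the TF-IDF and cosine-similarity matrices built below are indexed by row position. A minimal sketch on a hypothetical toy frame (not the real `final.csv` data) shows the gap and one way to close it with `reset_index(drop=True)`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the second product has an empty 'Summary'
toy = pd.DataFrame({
    "Product name": ["A", "B", "C"],
    "Summary": ["beef mince", np.nan, "pork mince"],
})

cleaned = toy.dropna()
print(list(cleaned.index))  # [0, 2]: row labels keep the gap left by the dropped row

# Renumber rows 0..n-1 so labels and positions agree again
cleaned = cleaned.reset_index(drop=True)
print(list(cleaned.index))  # [0, 1]
```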

### Application of Cosine Similarity 

Cosine similarity is used to measure how close the TF-IDF vectors of the products' 'Summary' texts are, so that the recommended products have similar 'Summary' content. Because `TfidfVectorizer` L2-normalizes each row by default, the plain dot product computed by `linear_kernel` equals the cosine similarity. The 'Summary' variable was chosen because it was the most complete piece of product information available for this first RS.
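The cosine-similarity step can be sketched on a few made-up summaries (hypothetical strings, not taken from `final.csv`). The sketch uses TF-IDF settings similar to the cell below and checks that `linear_kernel` gives the same matrix as scikit-learn's `cosine_similarity`, which holds because `TfidfVectorizer` L2-normalizes each row by default:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical summaries, for illustration only
summaries = [
    "lean beef mince for pasta",
    "beef mince for burgers",
    "fresh organic apples",
]

tf = TfidfVectorizer(analyzer="word", ngram_range=(1, 2), stop_words="english")
tfidf = tf.fit_transform(summaries)

# Rows are unit-length (norm="l2" by default), so the dot product is the cosine
dot = linear_kernel(tfidf, tfidf)
cos = cosine_similarity(tfidf, tfidf)
print(np.allclose(dot, cos))  # True

# The two mince summaries share terms; the apples summary shares none with either
print(np.round(dot, 2))
```

`linear_kernel` skips the extra normalization that `cosine_similarity` performs, which is why it is a common shortcut when the vectors are already normalized.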

In [26]:

tf = TfidfVectorizer(analyzer='word',
                     ngram_range=(1, 2),
                     # keep terms appearing in at least one summary (the intent of the
                     # original min_df=0, which newer scikit-learn releases reject)
                     min_df=1,
                     stop_words='english')

# One TF-IDF row per product, built from its 'Summary' text
tfidf_matrix = tf.fit_transform(df['Summary'])

# Rows are L2-normalized, so the dot product (linear_kernel) equals cosine similarity
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

# Map each product name to its row *position*: cosine_sim and df.iloc are positional,
# while df.index may have gaps after dropna()
indices = pd.Series(range(len(df)), index=df['Product name'])


def get_content_recommendations(title):
    idx = indices[title]
    # Duplicate product names (e.g. different pack sizes) return a Series: use the first match
    if isinstance(idx, pd.Series):
        idx = idx.iloc[0]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Skip the top-scoring entry (normally the product itself) and keep the next 15
    sim_scores = sim_scores[1:16]
    product_indices = [i[0] for i in sim_scores]
    return df.iloc[product_indices]
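A self-contained sketch of the same content-based idea on a hypothetical four-product catalogue. `recommend`, `positions` and `top_n` are illustrative names, not part of the original project. It maps names to row positions rather than index labels (since the similarity matrix is positional), and it removes the query row by position instead of assuming it is ranked first, which matters when duplicate products have identical summaries and tie at a score of 1.0:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical mini catalogue; note the duplicated product name
products = pd.DataFrame({
    "Product name": ["Beef mince", "Beef mince", "Lean beef mince", "Apples"],
    "Summary": [
        "beef mince to cook with",
        "beef mince to cook with",
        "lean beef mince with less fat",
        "crisp fresh apples",
    ],
})

tfidf = TfidfVectorizer(stop_words="english").fit_transform(products["Summary"])
sim = linear_kernel(tfidf, tfidf)

# Map each name to its row position, which is how `sim` is indexed
positions = pd.Series(range(len(products)), index=products["Product name"])


def recommend(title, top_n=2):
    pos = positions[title]
    if isinstance(pos, pd.Series):  # duplicated name: take the first match
        pos = pos.iloc[0]
    scores = sorted(enumerate(sim[pos]), key=lambda x: x[1], reverse=True)
    # Drop the query row itself, then keep the top_n best matches
    scores = [s for s in scores if s[0] != pos][:top_n]
    return products.iloc[[i for i, _ in scores]]


print(recommend("Lean beef mince"))
```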



In [27]:
index = 17
name = df.iloc[index]['Product name']
price = df.iloc[index].Price
quantity = df.iloc[index].Quantity
k_word = df.iloc[index]['key word']
print("Name:", name, "\nPrice:", price, "\nQuantity:", quantity, "\nk_word:", k_word)

Name: AH Schnitzel Hongaarse stijl 
Price: 3.79 
Quantity: 2 stuks 
k_word: Meat


___

### Change the product name here to obtain recommendations!

Product names must be taken from the dataframe, since they are matched exactly. As mentioned above, this first RS suggests products whose 'Summary' is similar to that of the selected product. However, it does not yet take into account the diet the product belongs to: this first iteration only builds a working RS.

In [28]:
# This recommendation is based on the 'Summary' only; the next step is to add diet constraints using the 'key word' column.

# To make it work, use product names that appear in the 'final.csv' file.

get_content_recommendations('AH Rundergehakt')

| Unnamed: 0 | Product name | Description | Summary | Price | Ingredients | Quantity | key word |
|---|---|---|---|---|---|---|---|
| 32 | AH Rundergehakt | RundergehaktVerpakt onder beschermende atmosfeer. | Rundergehakt om eindeloos mee te varieren. Nat... | 8.49 | 1 Kilogram | 1000 g | Meat |
| 2 | AH Rundergehakt | RundergehaktVerpakt onder beschermende atmosfeer. | Rundergehakt om eindeloos mee te varieren. Nat... | 3.49 | 300 Gram | 300 g | Meat |
| 11 | AH Mager rundergehakt | Mager rundergehaktVerpakt onder beschermende a... | Mager rundergehakt met minder vet om eindeloos... | 3.59 | 300 Gram | 300 g | Meat |
| 29 | AH Mager rundergehakt | Mager rundergehaktVerpakt onder beschermende a... | Mager rundergehakt met minder vet om eindeloos... | 5.25 | 500 Gram | 500 g | Meat |
| 9 | AH Half-om-half gehakt | Half-om-half gehaktVerpakt onder beschermende ... | Half-om-half gehakt om eindeloos mee te varier... | 3.79 | 500 Gram | 500 g | Meat |
| 25 | AH Half-om-half gehakt | Half-om-half gehaktVerpakt onder beschermende ... | Half-om-half gehakt om eindeloos mee te varier... | 2.49 | 300 Gram | 300 g | Meat |
| 63 | AH Half-om-half gehakt | Half-om-half gehaktVerpakt onder beschermende ... | Half-om-half gehakt om eindeloos mee te varier... | 6.49 | 1 Kilogram | 1 kg | Meat |
| 62 | AH Biologisch Bio hoh gehakt 300 gram | Half-om-half gehaktSkal 001920\nEU-bio-logo\nN... | Half om half gehakt van heerlijk mals, biologi... | 3.89 | 300 Gram | 300 g | Meat |
| 10 | AH Biologisch Rundergehakt | RundergehaktSkal 001920\nEU-bio-logo\nNL-BIO-0... | Rundergehakt van mals, biologisch rundvlees. O... | 3.99 | 300 Gram | 300 g | Meat |
| 26 | AH Biologisch Rundergehakt | RundergehaktSkal 001920\nEU-bio-logo\nNL-BIO-0... | Rundergehakt van mals, biologisch rundvlees. O... | 6.49 | 500 Gram | 500 g | Meat |
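One possible way to add the diet constraint mentioned in the cell above is to filter the ranked recommendations on the 'key word' column. This is a minimal sketch that assumes each product carries a single diet label in 'key word'; the rows and the `filter_by_diet` helper are hypothetical:

```python
import pandas as pd

# Hypothetical recommendation output: rows already sorted by similarity
ranked = pd.DataFrame({
    "Product name": ["Beef mince", "Veggie mince", "Lean beef mince", "Tofu"],
    "key word": ["Meat", "Vegetarian", "Meat", "Vegan"],
})


def filter_by_diet(recs, diet, top_n=10):
    """Keep only rows whose 'key word' matches the requested diet,
    preserving the similarity order, and return at most top_n rows."""
    return recs[recs["key word"] == diet].head(top_n)


print(filter_by_diet(ranked, "Meat"))
```

Filtering after the similarity ranking can leave fewer than `top_n` results when few similar products match the diet; filtering the candidate rows before slicing the ranked list avoids that at the cost of possibly recommending less similar products.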
