# Recommendation system
I evaluated two approaches: collaborative filtering and content-based filtering.
Collaborative filtering would probably suit perfumes better, but without data on users' buying patterns the only option was a content-based system.

This version of the system uses a single feature: the main accords. Two perfumes with identical accords can still smell quite different, because the weights of those accords differ.
The data is filtered by gender.
Future versions will add more filters, based on season, sillage, longevity, etc.
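
The caveat about accord weights can be illustrated with a small sketch. The accord names and weights below are made up for illustration and do not come from the dataset: two perfumes share the same three accords, so a presence-only comparison scores them as identical, while a weight-aware comparison does not.

```python
import numpy as np

def cosine_sim(a, b):
    # cosine similarity between two 1-D vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# hypothetical accord order: woody, citrus, musk
presence_a = np.array([1.0, 1.0, 1.0])   # perfume A has all three accords
presence_b = np.array([1.0, 1.0, 1.0])   # perfume B has the same three accords

# hypothetical weights: A is mostly woody, B is mostly musk
weights_a = np.array([0.9, 0.3, 0.1])
weights_b = np.array([0.1, 0.3, 0.9])

presence_sim = cosine_sim(presence_a, presence_b)
weighted_sim = cosine_sim(weights_a, weights_b)

print("presence only:", round(presence_sim, 2))
print("with weights:", round(weighted_sim, 2))
```

This is why a 100% score in this version means "same accords", not "same perfume".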

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

In [2]:
df = pd.read_csv("data_clean.csv", sep = ";")
df.shape

(24447, 15)

Unpopular or unknown perfumes (10 voters or fewer) are removed to improve performance.

In [3]:
#filtering the dataframe to delete unpopular/unknown perfumes (and to improve performance)
df = df.loc[df["voters"]>10]
df.shape

(9366, 15)

The dataframe is then filtered by gender

In [4]:
#filtering the dataframe based on gender: 0 for men, 1 for women, 2 for unisex, 3 for all
def filtering(data, gender):
    labels = {0: "men", 1: "women", 2: "unisex"}
    #any other value (3 included) keeps every perfume
    if gender in labels:
        data = data.loc[data["gender"] == labels[gender]]
    #the gender column is no longer needed after filtering
    return data[["name", "main_accords"]].copy()

#the menu numbers match the codes expected by filtering()
gender_select = int(input("Select one gender:\n0)Men\n1)Women\n2)Unisex\n3)All perfumes\n"))
df = filtering(df, gender_select)


Select one gender:
0)Men
1)Women
2)Unisex
3)All perfumes
3


In [5]:
#preparing the text inside main_accords: each accord becomes one token, tokens are separated by spaces
df = df.reset_index(drop=True)
df["main_accords"] = df["main_accords"].str.replace(" ", "_", regex=False).str.replace(",", " ", regex=False)
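
As a quick illustration of the preparation step, here is what the two replacements do to a single value. The sample string is an assumption about the raw format (comma-separated accords, some containing spaces); the real column may differ in detail.

```python
# hypothetical raw value, assumed to be comma-separated accords
raw = "woody,fresh spicy,citrus"

# spaces inside an accord become underscores, commas become token separators
prepared = raw.replace(" ", "_").replace(",", " ")

print(prepared)
```

Multi-word accords such as "fresh spicy" stay together as one token, so the vectorizer does not split them into "fresh" and "spicy".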

Next, a TF-IDF vectorizer turns each perfume's accords into a vector in an n-dimensional space (one dimension per accord), and cosine similarity measures how similar two perfumes are.
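
A toy sketch of the same idea, using four made-up accord strings rather than the real data. It also checks that `linear_kernel` on TF-IDF vectors gives the same numbers as `cosine_similarity`, which holds because `TfidfVectorizer` L2-normalizes each row by default.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# made-up accord strings in the same format as the prepared main_accords column
toy_accords = [
    "woody fresh_spicy citrus",
    "woody fresh_spicy citrus",   # same accords as the first perfume
    "sweet vanilla powdery",      # nothing in common with the first perfume
    "woody citrus aquatic",       # partial overlap with the first perfume
]

toy_matrix = TfidfVectorizer().fit_transform(toy_accords)
toy_scores = linear_kernel(toy_matrix, toy_matrix)

# rows have unit length, so the plain dot product equals the cosine similarity
same_as_cosine = np.allclose(toy_scores, cosine_similarity(toy_matrix))

print(toy_scores.round(2))
print("matches cosine_similarity:", same_as_cosine)
```

Identical accord lists score 1, disjoint lists score 0, and partial overlaps fall in between.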

In [6]:
#using cosine similarities to find perfumes with similar accords
#creating TF-IDF vectorizer:
tfidf = TfidfVectorizer(analyzer='word', ngram_range=(1, 1), min_df=1)
tfidf_matrix = tfidf.fit_transform(df['main_accords'])

#TF-IDF rows are L2-normalized by default, so the dot product equals the cosine similarity
cosine = linear_kernel(tfidf_matrix, tfidf_matrix)
results = {}
for index, row in df.iterrows():
    #indices of the 99 highest scores, best first
    sim_indices = cosine[index].argsort()[:-100:-1]
    #skip the perfume itself by index: with ties at 100% it is not always ranked first
    sim_items = [(cosine[index][i], df['name'][i]) for i in sim_indices if i != index]
    results[row['name']] = sim_items


In [7]:
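def recommend(item_id, num):
    #prints the num perfumes most similar to item_id (a perfume name used as a key in results)
    print(str(num) + " perfumes similar to " + item_id + ":")
    print("-------")
    recs = results[item_id][:num]
    for rec in recs:
        #rec is a (score, name) tuple; the score is shown as a truncated percentage
        score_percent = int(rec[0]*100)
        print(rec[1] + " (score:" + str(score_percent) + "%)")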


The recommend function takes the name of a perfume as input and prints the num perfumes most similar to it. A perfume with a 100% score has the same accords as the input, although the weights of the accords can differ.

In [8]:
recommend(item_id="Reaction Kenneth Cole for men", num=20)

20 perfumes similar to Reaction Kenneth Cole for men:
-------
Teatro Olfattivo Di Parma: Mangiami Dopo Teatro Hilde Soliani for women and men (score:100%)
Eternity Summer 2015  Calvin Klein for women (score:96%)
Marina Blue Princesse Marina De Bourbon for women (score:96%)
Speedlife Woman Tom Tailor for women (score:93%)
Eau Mega Viktor&Rolf for women (score:93%)
Exceptional Because You Are For Men Exceptional Parfums for men (score:93%)
Tommy Girl Summer 2011 Tommy Hilfiger for women (score:92%)
Pure Lightness Adidas for women (score:92%)
Davidoff Cool Water Woman Coral Reef Edition Davidoff for women (score:92%)
Neon Blue Superdry for women (score:92%)
OP Juice for Women Ocean Pacific for women (score:92%)
Head Over Heels Revlon for women (score:92%)
Blue Seduction Antonio Banderas for women (score:92%)
Harajuku Lovers Wicked Style G Harajuku Lovers for women (score:92%)
Ice Sheers Refreshing Avon for women (score:92%)
Freedom for Her Tommy Hilfiger for women (score:91%)
Heidi Klum M