# <h1 style="color:#80D7B8; font-size:22px;"><center><strong>🌸 Developing a Recommendation Engine for Skincare Products 🌸</strong></center></h1>

This project involves
<span style="color:#EB91B3;"><strong>the development of a content-based recommendation engine that should take the name of a skincare product as input and return several similar products based on the product's ingredients.</strong></span>


<p style="-moz-border-radius: 6px;
         -webkit-border-radius: 6px;
         background-color: #f0f7fb;
         background-image: url(https://f1.madcapsoftware.com/blogImages/2017/08/css-box-icon-3.png);
         background-position: 2px 0px;
         background-repeat: no-repeat;
         border: solid 1px #3498db;
         border-radius: 10px;
         line-height: 18px;
         overflow: hidden;
         padding: 15px 60px;
         font-size: 14px"><strong>Note:</strong> This was part of my industrial placement with a skincare company for my Master's degree last year. I have condensed and simplified it considerably for this notebook, as the full project was too long to post here.</p>

# <p style="color:#c0a5e3; font-size:20px">Imports</p>

In [1]:
import numpy as np
import pandas as pd
import re

from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE

from bokeh.io import curdoc, push_notebook, output_notebook
from bokeh.layouts import column, layout
from bokeh.models import ColumnDataSource, Div, Select, Slider, TextInput, HoverTool
from bokeh.plotting import figure, show
from ipywidgets import interact, interactive, fixed, interact_manual


# <p style="color:#c0a5e3; font-size:20px">Loading data ⏳</p>

In [None]:
data = pd.read_csv('../input/skincare-products-clean-dataset/skincare_products_clean.csv')
data

In [None]:
data.info()

There are no missing values and the ingredients column was previously thoroughly cleaned. All that remains is for the ingredients to be processed into a different format that can be used in the recommendation engine. One-hot encoding will be used.
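
As a rough illustration of the target format, the sketch below one-hot encodes three made-up ingredient strings with pandas' `Series.str.get_dummies`. The toy data and the names `toy` and `encoded` are assumptions for illustration only; the notebook itself builds the matrix with explicit loops.

```python
import pandas as pd

# Hypothetical ingredient strings, already cleaned and comma-separated
toy = pd.Series([
    "water, glycerin, niacinamide",
    "water, shea butter",
    "glycerin, shea butter, squalane",
])

# One column per distinct ingredient; 1 if the product contains it, else 0
encoded = toy.str.get_dummies(sep=", ")
print(encoded)
```

Each row is a product and each column an ingredient, which is the shape the similarity calculations later in the notebook expect.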

In [None]:
# strip list brackets and quote characters so each row is a plain comma-separated string
data['clean_ingreds'] = data['clean_ingreds'].astype(str).str.replace(r"[\[\]'\"]", '', regex=True)

In [None]:
all_ingreds = []

for i in data['clean_ingreds']:
    ingreds_list = i.split(', ')
    for j in ingreds_list:
        all_ingreds.append(j)

In [None]:
all_ingreds = sorted(set(all_ingreds))
all_ingreds[0:20]

In [None]:
# drop empty entries and trailing spaces, then de-duplicate again
all_ingreds = sorted({ingred.rstrip() for ingred in all_ingreds if ingred.strip()})
all_ingreds[0:20]

In [None]:
# one row per product, one column per ingredient
one_hot_list = []

for i in data['clean_ingreds']:
    # compare whole ingredient names, so that a short name such as 'alcohol'
    # cannot match inside a longer one such as 'cetyl alcohol'
    product_ingreds = {ingred.rstrip() for ingred in i.split(', ')}
    one_hot_list.append([1 if j in product_ingreds else 0 for j in all_ingreds])

ingred_matrix = pd.DataFrame(one_hot_list, columns=all_ingreds)

ingred_matrix

This matrix contains only zeros and ones. Because it is very sparse, the small preview above shows nothing but zeros.

One part of the project that I skipped in this notebook was the weighting of this matrix. Within each product type, some ingredients were very common and defined the product's unique function less well than rarer ingredients did. These common ingredients were given lower weights. Ingredients that were very important for particular functions, such as moisturising or SPF, were given higher weights. So in the full project the matrix wasn't just zeros and ones.

✨ <strong style="color:#F26169"> TASK: Fork this notebook and see if you can improve the results by adding weights to the matrix!</strong> ✨
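
For anyone attempting the task, one possible starting point is an IDF-style weight that shrinks common ingredients and boosts rare ones. This is only a sketch on a made-up matrix: it is not the weighting scheme used in the full project (which is not shown here), and `toy_matrix`, `doc_freq` and `weights` are hypothetical names.

```python
import numpy as np

# Hypothetical 4-product x 3-ingredient one-hot matrix (rows = products)
toy_matrix = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
])

n_products = toy_matrix.shape[0]
doc_freq = toy_matrix.sum(axis=0)  # number of products containing each ingredient

# IDF-style weight: an ingredient found in every product gets log(1) + 1 = 1,
# rarer ingredients get larger weights.
# Assumes every ingredient occurs in at least one product (doc_freq > 0).
weights = np.log(n_products / doc_freq) + 1.0

weighted_matrix = toy_matrix * weights  # broadcast the weights across the rows
print(weights.round(3))
```

Function-specific boosts (for example for SPF filters) would need domain knowledge on top of this and could be applied by multiplying selected columns by an extra factor.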

# <p style="color:#c0a5e3; font-size:20px">Visualising similarities 👀</p>

We will use TruncatedSVD and TSNE to summarise the whole matrix in 2 values for each row. These x and y values can be plotted to visualise the similarities between the products.

In [None]:
svd = TruncatedSVD(n_components=150, n_iter = 1000, random_state = 6) # firstly reduce features to 150 with truncatedSVD - this suppresses some noise
svd_features = svd.fit_transform(ingred_matrix)
tsne = TSNE(n_components = 2, n_iter = 1000000, random_state = 6) # reduce 150 features to 2 using t-SNE (default Barnes-Hut method; newer scikit-learn versions name this argument max_iter)
tsne_features = tsne.fit_transform(svd_features)

data['X'] = tsne_features[:, 0]
data['Y'] = tsne_features[:, 1]

In [None]:
unique_types = ['Moisturiser', 'Serum', 'Oil', 'Mist', 'Balm', 'Mask', 'Peel',
       'Eye Care', 'Cleanser', 'Toner', 'Exfoliator', 'Bath Salts',
       'Body Wash', 'Bath Oil']

source = ColumnDataSource(data)

plot = figure(title = "Mapped Similarities", width = 800, height = 600)
plot.xaxis.axis_label = "t-SNE 1"
plot.yaxis.axis_label = 't-SNE 2'

plot.circle(x = 'X', y = 'Y', source = source, fill_alpha=0.7, size=10,
           color = '#c0a5e3', alpha = 1)

plot.background_fill_color = "#E9E9E9"
plot.background_fill_alpha = 0.3

hover = HoverTool(tooltips=[('Product', '@product_name'), ('Price', '@price')])
plot.add_tools(hover)

def type_updater(product_type = unique_types[0]):
    # show only the products of the selected type
    subset = data[data['product_type'] == product_type]
    source.data = {'X' : subset['X'],
                   'Y' : subset['Y'],
                   'product_name' : subset['product_name'],
                   'price' : subset['price']}
    push_notebook()
  
output_notebook()
show(plot, notebook_handle = True)
interact(type_updater, product_type = unique_types)

# <p style="color:#c0a5e3; font-size:20px">Extracting brand names 🧪</p>

Many of the products in the dataset come from the same brands and share almost all of the same ingredients. It therefore makes sense to use the brand name as a filter in the recommendation function, so that recommendations come from a good variety of brands and give the user more options. (In the full project this was one of the options: the user could choose to include products from the same brand if they wished.)

Below is a list of all the brands included on the website that the data was taken from. We will use this list to extract the brand names from the product names. All product names contain the product's brand name.

In [None]:
brand_list = ["111skin", "a'kin", "acorelle", "adam revolution", "aesop", "ahava", "alchimie forever",
             "algenist", "alpha-h", "ambre solaire", "ameliorate", "american crew", "anthony", "antipodes",
             "apivita", "argentum", "ark skincare", "armani", "aromatherapy associates", "aromaworks", "aromatica",
             "aurelia probiotic skincare", "aurelia skincare",
             "australian bodycare", "avant skincare", "aveda", "aveeno", "avene", "avène",
             "bakel", "balance me", "barber pro", "bareminerals", "barry m cosmetics",
             "baxter of california", "bbb london", "beautypro", "benefit", "benton", "bioderma",
             "bioeffect", "bloom & blossom", "bloom and blossom", "bobbi brown", "bondi sands", "bubble t", "bulldog", "burt's bees",
             "by terry", "carita", "caudalie", "cerave", "chantecaille", "clinique",
             "comfort zone", "connock london", "cosmetics 27", "cosrx", "cowshed", "crystal clear", 
             "cult51", "darphin", "dear, klairs", "decleor", "decléor", "dermalogica", "dhc", "doctors formula",
             "dr. brandt", "dr brandt", "dr. hauschka", "dr hauschka", "dr. jackson's", "dr.jart+", "dr. lipp",
             "dr botanicals", "dr dennis", "dr. pawpaw", "ecooking", "egyptian magic",
             "eisenberg", "elemental herbology", "elemis", "elizabeth arden", "embryolisse",
             "emma hardie", "erno laszlo", "espa", "estée lauder", "estee lauder", "eucerin",
             "eve lom", "eve rebirth", "fade out", "farmacy", "filorga", "first aid beauty", "fit", "foreo",
             "frank body", "freezeframe", "gallinée", "garnier", "gatineau", "glamglow", "goldfaden md",
             "green people", "hawkins and brimble", "holika holika", "house 99", "huxley",
             "ilapothecary", "ila-spa", "indeed labs", "inika", "instant effects", "institut esthederm", "ioma", "klorane",
             "j.one", "jack black", "james read", "jason", "jo malone london", "juice beauty", "jurlique",
             "korres", "l:a bruket", "l'oréal men expert", "l'oreal men expert", "l'oréal paris", "l'oreal paris",
             "l’oréal paris", "lab series skincare for men",
             "lancaster", "lancer skincare", "lancôme", "lancome", "lanolips", "la roche-posay", "laura mercier",
             "liftlab", "little butterfly london", "lixirskin", "liz earle", "love boo",
             "löwengrip", "lowengrip", "lumene", "mac", "madara", "mádara", "magicstripes", "magnitone london",
             "mama mio", "mancave", "manuka doctor", "mauli", "mavala", "maybelline", "medik8", "men-u", "menaji", "molton brown", "moroccanoil",
             "monu", "murad", "naobay", "nars", "natio", "natura bissé", "natura bisse",
             "neal's yard remedies", "neom", "neostrata", "neutrogena", "niod", "nip+fab", "nuxe", "nyx",
             "oh k!", "omorovicza", "origins", "ortigia fico", "oskia", "ouai", "pai ", "paula's choice", "payot",
             "perricone md", "pestle & mortar", "pestle and mortar", "peter thomas roth",
             "philosophy", "pierre fabre", "pixi", "piz buin", "polaar", "prai", "project lip",
             "radical skincare", "rapideye", "rapidlash", "real chemistry", "recipe for men",
             "ren ", "renu", "revolution beauty", "revolution skincare", "rituals", "rmk", "rodial", "roger&gallet", "salcura",
             "sanctuary spa", "sanoflore", "sarah chapman", "sea magik", "sepai",
             "shaveworks", "shea moisture", "shiseido", "skin79", "skin authority", "skinceuticals",
             "skinchemists", "skindoctors", "skin doctors", "skinny tan", "sol de janeiro", "spa magik organiks",
              "st. tropez", "starskin", "strivectin", "sukin",
             "svr", "swiss clinic", "talika", "tan-luxe", "tanorganic", "tanworx", "thalgo", "the chemistry brand",
             "the hero project", "the inkey list", "the jojoba company", "the ordinary",
             "the organic pharmacy", "the ritual of namasté", "this works", "too faced", "trilogy", "triumph and disaster",
             "ultrasun", "uppercut deluxe", "urban decay", "uriage", "verso", "vichy",
             "vida glow", "vita liberata", "wahl", "weleda", "westlab", "wilma schumann", "yes to",
             "ysl", "zelens"]
brand_list = sorted(brand_list, key=len, reverse=True)
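
The list is sorted longest-first because some brand names are contained in others (for example "fit" inside "benefit"), so the longer name has to be tried first. The sketch below illustrates the idea on a tiny subset; `toy_brands`, `find_brand` and the product names are hypothetical and only meant to show the matching order.

```python
# A tiny subset of brand names, for illustration only
toy_brands = ["fit", "benefit", "mac", "farmacy"]

def find_brand(product_name, brands):
    """Return the longest brand name contained in the product name, or None."""
    lowered = product_name.lower()
    for brand in sorted(brands, key=len, reverse=True):
        if brand in lowered:
            return brand.title()
    return None

# Hypothetical product names
print(find_brand("Benefit Example Primer 22ml", toy_brands))
print(find_brand("Farmacy Example Cleansing Balm 100ml", toy_brands))
```

Without the sort, the first product would be labelled with the shorter brand "fit" and the second with "mac".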

In [None]:
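def extract_brand(name):
    # brand_list is sorted longest-first, so the first hit is the most specific brand
    lowered = name.lower()
    for brand in brand_list:
        if brand in lowered:
            return brand.title()
    return lowered  # no known brand found: keep the lower-cased product name

data['brand'] = data['product_name'].apply(extract_brand)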
    
data

In [None]:
sorted(data.brand.unique())

There are slight variations in some of the brand names, which would cause them to be treated as separate brands. This needs to be resolved before continuing, or else the brand filter won't function correctly.

In [None]:
data['brand'] = data['brand'].replace(['Aurelia Probiotic Skincare'],'Aurelia Skincare')
data['brand'] = data['brand'].replace(['Avene'],'Avène')
data['brand'] = data['brand'].replace(['Bloom And Blossom'],'Bloom & Blossom')
data['brand'] = data['brand'].replace(['Dr Brandt'],'Dr. Brandt')
data['brand'] = data['brand'].replace(['Dr Hauschka'],'Dr. Hauschka')
# str.title() also capitalises the letter after an apostrophe, e.g. "l'oreal" becomes "L'Oreal"
data['brand'] = data['brand'].replace(["L'Oreal Paris", 'L’Oréal Paris'], "L'Oréal Paris")

# <p style="color:#c0a5e3; font-size:20px">Creating the recommendation function 👩🏻‍💻</p>

**The function below recommends products by:**

🔍 taking the name of a product as input

🧴 only including products of the same type

📛 not recommending products of the same brand as the searched product, and at most two products from any other brand

➗ calculating cosine similarities and returning top 5 similar products
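
The cosine-similarity step can be illustrated on toy vectors. The similarity is the dot product of two ingredient vectors divided by the product of their lengths, so products sharing more ingredients score closer to 1. The vectors and the helper below are hypothetical, and returning 0.0 for an all-zero vector is an assumption made for this sketch rather than something the notebook does.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Hypothetical one-hot ingredient vectors for three products
product_a = [1, 1, 0, 1]
product_b = [1, 1, 0, 0]  # shares two of product_a's three ingredients
product_c = [0, 0, 1, 0]  # shares none

print(cosine_similarity(product_a, product_b))
print(cosine_similarity(product_a, product_c))
```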

In [None]:
def recommender(search):
    # look up the searched product (assumes unique product names and the default integer index)
    idx = data[data['product_name'] == search].index.item()
    prod_type = data.loc[idx, 'product_type']
    brand_search = data.loc[idx, 'brand']

    # candidates: every product of the same type
    data_by_type = data[data['product_type'] == prod_type].copy()
    rows = data_by_type.index.to_numpy()

    # ingredient vectors (the first ingredient column is skipped)
    point1 = ingred_matrix.iloc[idx, 1:].to_numpy(dtype=float)
    points = ingred_matrix.iloc[rows, 1:].to_numpy(dtype=float)

    # cosine similarity between the searched product and each candidate
    norms = np.linalg.norm(points, axis=1) * np.linalg.norm(point1)
    data_by_type['cos_sim'] = points @ point1 / norms

    data_by_type = data_by_type.sort_values('cos_sim', ascending=False)
    data_by_type = data_by_type[data_by_type.product_name != search]

    # skip the searched brand and allow at most two products per other brand
    brands = []
    output = []
    for _, row in data_by_type.iterrows():
        brand = row['brand']
        if brand != brand_search and brands.count(brand) < 2:
            brands.append(brand)
            output.append(row)

    print('\033[1m', 'Recommending products similar to', search, ':', '\033[0m')
    print(pd.DataFrame(output)[['product_name', 'cos_sim']].head(5))

# <p style="color:#c0a5e3; font-size:20px">Using function to get recommendations 📄</p>

Now we will feed some product names into the above function from a variety of product types to see what recommendations we get!

In [None]:
recommender("Origins GinZing™ Energy-Boosting Tinted Moisturiser SPF40 50ml")

In [None]:
recommender('Avène Antirougeurs Jour Redness Relief Moisturizing Protecting Cream (40ml)')

In [None]:
recommender('Bondi Sands Everyday Liquid Gold Gradual Tanning Oil 270ml')

In [None]:
recommender('Sukin Rose Hip Oil (25ml)')

In [None]:
recommender('La Roche-Posay Anthelios Anti-Shine Sun Protection Invisible SPF50+ Face Mist 75ml')

In [None]:
recommender('Clinique Even Better Clinical Radical Dark Spot Corrector + Interrupter 30ml')

In [None]:
recommender("FOREO 'Serum Serum Serum' Micro-Capsule Youth Preserve")

In [None]:
recommender('Garnier Organic Argan Mist 150ml')

In [None]:
recommender('Shea Moisture 100% Virgin Coconut Oil Daily Hydration Body Wash 384ml')

In [None]:
recommender('JASON Soothing Aloe Vera Body Wash 887ml')

# <p style="color:#c0a5e3; font-size:20px">Conclusion</p>

The content-based recommendation engine was successfully developed using cosine similarity. It can help users decide which product to purchase, as many of the recommendations include products that are better value for money. It also has the potential to improve business for less popular brands by recommending their products.

<span style="color:#EB91B3; font-size:16px;"><strong>Hope you enjoyed! 😊 </strong></span>