# Recommendation System for Amazon Clothing Products
---
## 4. Content Based Recommendation System

*Author*: Mariam Elsayed

*Contact*: mariamkelsayed@gmail.com

*Notebook*: 4 of 5

*Previous Notebook*: `popularity_rec.ipynb`

*Next Notebook*: `colab_rec.ipynb`

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity 

## Table of Contents

* [Introduction](#intro)

* [Loading the Data](#loading)

* [Content Based Recommendation System](#content_rec)

    * [Using Cosine Similarity](#cosine)

    * [Creating a General Function](#function)

* [Conclusion](#conc)

## Introduction <a class="anchor" id="intro"></a>

Next, let's create a content-based recommendation system, which ranks how similar products are to one another based on their descriptions.

## Loading the Data <a class="anchor" id="loading"></a>

This recommendation system will use the products data created in the preprocessing notebooks.

In [2]:
# Reading the data
products_df = pd.read_csv('Data/products_summary.csv')

In [3]:
products_df

Unnamed: 0,category,description,title,brand,rank,asin,imageURL,price,maincat_Luggage & Travel Gear,maincat_Backpacks,...,subcat_Girls,subcat_Boys,"subcat_Shoe, Jewelry & Watch Accessories",subcat_Jewelry Accessories,subcat_Shoe Care & Accessories,subcat_Contemporary & Designer,subcat_Travel Accessories,"subcat_Surf, Skate & Street",average_rating,total_reviews
0,"['Clothing, Shoes & Jewelry', 'Women', 'Import...",Veneziana Sexy Strip 20 Open Crotch Pantyhose ...,Sexystrip,Veneziana,734888,5120053017,['https://images-na.ssl-images-amazon.com/imag...,14.95,False,False,...,False,False,False,False,False,False,False,False,3.7,38
1,"['Clothing, Shoes & Jewelry', 'Women', 'Import...",Veneziana Ar Beautiful - Hold Ups Thigh High S...,Beautiful,Veneziana,551160,5120053351,['https://images-na.ssl-images-amazon.com/imag...,17.70,False,False,...,False,False,False,False,False,False,False,False,3.9,47
2,"['Clothing, Shoes & Jewelry', 'Women', 'Matern...",Dress Length (Neck to Bottom Hem) Small - 40 i...,sofsy Soft-Touch Rayon Blend Tie Front Nursing...,Unknown,740957,5120053890,['https://images-na.ssl-images-amazon.com/imag...,34.99,False,False,...,False,False,False,False,False,False,False,False,4.6,31
3,"['Clothing, Shoes & Jewelry', 'Baby', 'Baby Gi...","Little Brother: Size 70: Length 38 CM, Bust*2 ...",Toddler Girls Big Sister T Shirt Matching Litt...,Kingte,36991,5780122040,['https://images-na.ssl-images-amazon.com/imag...,10.49,False,False,...,False,False,False,False,False,False,False,False,3.5,6
4,"['Clothing, Shoes & Jewelry', 'Women', 'Clothi...","GorgeoUS lightweight cotton dress in red, pink...",Pistachio Women's Sun Flower Flowing Knee Leng...,Pistachio,1131061,6040972467,['https://images-na.ssl-images-amazon.com/imag...,22.99,False,False,...,False,False,False,False,False,False,False,False,4.2,26
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
147467,"['Clothing, Shoes & Jewelry', 'Men', 'Shoes', ...",These men's Tubular shoes honor running-inspir...,adidas Originals Men's Tubular Shadow Running ...,Unknown,20519,B01HJDW6ZC,['https://images-na.ssl-images-amazon.com/imag...,120.08,False,False,...,False,False,False,False,False,False,False,False,4.9,8
147468,"['Clothing, Shoes & Jewelry', 'Men', 'Shoes', ...",An edgy take on Adidas running-inspired herita...,adidas Originals Men's Tubular Shadow Fashion ...,Unknown,74828,B01HJDVCJI,['https://images-na.ssl-images-amazon.com/imag...,123.72,False,False,...,False,False,False,False,False,False,False,False,5.0,3
147469,"['Clothing, Shoes & Jewelry', 'Women', 'Shoes'...",Catalina - where sporty meets sparkly! adorned...,Dansko Women's Catalina Flat Sandal,Unknown,1114685,B01HJCMR4I,['https://images-na.ssl-images-amazon.com/imag...,94.98,False,False,...,False,False,False,False,False,False,False,False,4.1,15
147470,"['Clothing, Shoes & Jewelry', 'Men', 'Shoes', ...",A classic wingtip with a subtle twist that cov...,Deer Stags Men's Hampden Oxford,Unknown,956501,B01HJH7W0W,['https://images-na.ssl-images-amazon.com/imag...,52.38,False,False,...,False,False,False,False,False,False,False,False,4.0,5


In [4]:
products_df['title'].loc[3000]

'Hoodie: Batman - Logo Pullover Hoodie Size M'

In [5]:
# Dropping rows with any missing value (TfidfVectorizer cannot process a missing description)
products_df = products_df.dropna(axis=0)

## Content Based Recommendation System <a class="anchor" id="content_rec"></a>

Content-based recommendation systems use the features of the products to recommend products that are similar. In this case, most of the information describing a product is in its description. To quantify the descriptions, a TF-IDF matrix will be used.

TF-IDF stands for term frequency-inverse document frequency. Term frequency is the number of times a word appears in a document, and inverse document frequency down-weights words that are common across the corpus, so words that are frequent in one description but rare elsewhere get the highest scores.
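
As a small illustration of the weighting described above, here is a minimal sketch on three made-up descriptions (the `toy_descriptions` list is hypothetical and not taken from the dataset). It assumes scikit-learn's default settings, which use a smoothed IDF and L2-normalize each row, so the exact numbers differ slightly from the textbook formula.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy descriptions, not taken from the products data
toy_descriptions = [
    "leather backpack with pockets",
    "canvas school backpack",
    "leather wallet",
]

toy_vectorizer = TfidfVectorizer()
toy_matrix = toy_vectorizer.fit_transform(toy_descriptions)

# One row per description, one column per distinct word
print(toy_matrix.shape)

# Look up the learned IDF weight of a word through the fitted vocabulary
def idf_of(word):
    return toy_vectorizer.idf_[toy_vectorizer.vocabulary_[word]]

# 'backpack' appears in two descriptions, 'canvas' in only one,
# so 'canvas' receives the larger inverse document frequency
print(idf_of("backpack"), idf_of("canvas"))
```

Words shared by many descriptions contribute less to the comparison than words that single out a few products.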

### Using Cosine Similarity <a class="anchor" id="cosine"></a>

Cosine similarity measures the similarity between two vectors as the cosine of the angle between them. For non-negative vectors such as TF-IDF vectors, the score lies between 0 and 1, with 1 being the most similar. Let's use the backpack below as an example of how cosine similarity works.
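
Before turning to the backpack, a quick numeric sketch of the metric itself may help. The `cosine_sim` helper and the three vectors are hypothetical and only illustrate the formula (dot product divided by the product of the vector lengths).

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

u = [1, 2, 0]
v = [2, 4, 0]   # same direction as u, just twice as long
w = [0, 0, 3]   # shares no non-zero component with u

same_direction = cosine_sim(u, v)   # close to 1: length does not matter
no_overlap = cosine_sim(u, w)       # 0: the vectors are orthogonal

print(same_direction, no_overlap)
```

Because only direction matters, a long description and a short one can still score highly if they use the same words in similar proportions.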

In [6]:
products_df.loc[products_df['asin'] == '0204444454', 'title'].iloc[0]

'Japan Anello Backpack Unisex PINK BEIGE LARGE PU LEATHER Rucksack School Bag Campus'

The TF-IDF matrix for the full dataset would be too large to work with, so let's restrict the dataframe to the category the item is in.

In [7]:
products_backpacks = products_df[products_df['maincat_Backpacks'] == True]
products_backpacks

Unnamed: 0,category,description,title,brand,rank,asin,price,maincat_Luggage & Travel Gear,maincat_Backpacks,maincat_Novelty & More,...,subcat_Girls,subcat_Boys,"subcat_Shoe, Jewelry & Watch Accessories",subcat_Jewelry Accessories,subcat_Shoe Care & Accessories,subcat_Contemporary & Designer,subcat_Travel Accessories,"subcat_Surf, Skate & Street",average_rating,total_reviews
0,"['Clothing, Shoes & Jewelry', 'Luggage & Trave...",The Hottest Bag in Town! Brand: Anello Conditi...,Japan Anello Backpack Unisex PINK BEIGE LARGE ...,Anello,3994472,0204444454,70.00,True,True,False,...,False,False,False,False,False,False,False,False,4.5,2
1,"['Clothing, Shoes & Jewelry', 'Luggage & Trave...",The Hottest Bag in Town! Brand: Anello Conditi...,Japan Anello Backpack Unisex BLACK LARGE PU LE...,Anello,635761,0204444403,65.99,True,True,False,...,False,False,False,False,False,False,False,False,5.0,2
146,"['Clothing, Shoes & Jewelry', 'Luggage & Trave...",Carry your essential items around town in this...,AmeriLeather Miles Backpack,Amerileather,2990358,B00065EIT8,75.99,True,True,False,...,False,False,False,False,False,False,False,False,4.5,4
698,"['Clothing, Shoes & Jewelry', 'Luggage & Trave...","Yak Pak's most popular backpack, the Student B...",Yak Pak 635 Basic Student Backpack - Black,Yak Pak,4404877,B00080LNUS,19.52,True,True,False,...,False,False,False,False,False,False,False,False,5.0,1
838,"['Clothing, Shoes & Jewelry', 'Luggage & Trave...",Osgoode Marley knows that backpacks remain a s...,Osgoode Marley Cashmere Large Organizer Backpack,Osgoode Marley,493947,B00097DYSE,14.20,True,True,False,...,False,False,False,False,False,False,False,False,5.0,10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
474165,"['Clothing, Shoes & Jewelry', 'Luggage & Trave...",This cute and fashionable Disney Little Mermai...,"Disney Girl's The Little Mermaid Ariel 16"" Sch...",Unknown,2059273,B01HGS2876,6.58,True,True,False,...,False,False,False,False,False,False,False,False,5.0,1
474174,"['Clothing, Shoes & Jewelry', 'Luggage & Trave...","Specifically designed for girls,lovely pattern...",Dog Pawprint Cat Fingerprint Backpack for Elem...,MIFULGOO,155356,B01HGSLJKI,4.58,True,True,False,...,False,False,False,False,False,False,False,False,4.7,40
474564,"['Clothing, Shoes & Jewelry', 'Luggage & Trave...",Kid's Teenage Mutant Ninja Turtles Out of the ...,Teenage Mutant Ninja Turtles Movie Out of The ...,Unknown,2683371,B01HHDG7L8,13.99,True,True,False,...,False,False,False,False,False,False,False,False,5.0,1
474695,"['Clothing, Shoes & Jewelry', 'Luggage & Trave...",Withmystyle provide the latest trends and styl...,Casual Canvas Fashion Shool Backpack,Generic,1659770,B01HHNCBH2,11.95,True,True,False,...,False,False,False,False,False,False,False,False,5.0,3


Now let's start vectorizing using the products_backpacks description column and finding the cosine similarities.

In [8]:
vectorizer = TfidfVectorizer(stop_words='english', min_df=5, lowercase=True)

TF_IDF_matrix = vectorizer.fit_transform(products_backpacks['description'])

In [9]:
TF_IDF_matrix.shape

(1538, 1663)
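
The shape above corresponds to 1538 backpacks and 1663 terms, so the pairwise similarity matrix for this category has 1538 x 1538 entries. The rough sketch below estimates how much memory such a matrix would need if stored densely as 64-bit floats. The `dense_similarity_bytes` helper is hypothetical, and the full-catalogue row count is an assumption read off the last index displayed for `products_df` (before missing values were dropped).

```python
def dense_similarity_bytes(n_products, bytes_per_value=8):
    """Memory for a dense n x n similarity matrix of 64-bit floats."""
    return n_products * n_products * bytes_per_value

backpack_rows = 1538      # rows in the TF-IDF matrix above
catalogue_rows = 147_471  # assumption: last displayed index of products_df, plus one

for label, n in [("backpacks", backpack_rows), ("full catalogue", catalogue_rows)]:
    print(f"{label}: {dense_similarity_bytes(n) / 1e9:.2f} GB")
```

A sparse output (`dense_output=False`) reduces this only when most pairs share no terms, which is why working within one category at a time is the more dependable way to keep the matrix manageable.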

In [10]:
# Finding the similarities
similarities = cosine_similarity(TF_IDF_matrix, dense_output=False)

Let's create a dataframe that shows the products and their similarity to the first backpack.

In [11]:
# Row position (not index label) of the product, since the similarity
# matrix is indexed by position within products_backpacks
product_position = np.flatnonzero(
    (products_backpacks['title'] == 'Japan Anello Backpack Unisex PINK BEIGE LARGE PU LEATHER Rucksack School Bag Campus')
    .to_numpy()
)[0]

# defining dataframe containing product names and similarities
sim_df = pd.DataFrame(
    {
        'product_name': products_backpacks['title'],
        'similarity': similarities[product_position, :].toarray().ravel()
    }
)

Sorting by the highest similarities, we get our original backpack first. In second place, also with a similarity of 1.0, is the same brand's black version of the bag, followed by backpacks from other brands with much lower scores.

In [12]:
sim_df.sort_values(by='similarity', ascending=False).head(10)

Unnamed: 0,product_name,similarity
0,Japan Anello Backpack Unisex PINK BEIGE LARGE ...,1.0
1,Japan Anello Backpack Unisex BLACK LARGE PU LE...,1.0
440185,Women Girls Ladies Backpack Fashion Shoulder B...,0.331545
286031,New Fashion Gold /Silver School Travel Gym Sho...,0.270979
451082,Handolederco Vintage Bag Leather Handmade Vint...,0.255005
274420,"Peppa Pig Large 16"" School Backpack(purple)",0.252894
286519,Donalworld Women Sequin Backpack Bling Paillet...,0.250486
238514,AmeriBag Small Classic Leather Healthy Back Bag,0.245286
88002,"ANGRY BIRDS 16"" LARGE SCHOOL BACKPACK",0.227965
28794,Leatherbay Leather Backpack with Pockets,0.226652


### Creating a General Function <a class="anchor" id="function"></a>

Let's create a function that takes in a product and returns the most similar items in the products dataframe. Some product categories still yield a TF-IDF matrix that is too large, so if a category has more than 10,000 rows, the function works on a random sample of 500 products plus the input product.

In [13]:
def content_recommender_cosine(title, products, category):
    '''
    Function that recommends similar items using cosine similarity

    INPUT: title (str) - Name of product
           products (df) - Dataframe containing product
           category (str) - Category column of the product, e.g. 'maincat_Backpacks'

    OUTPUT: dataframe containing top 10 most similar items
    '''
    # Keeping only the products in that category
    products = products[products[category] == True]

    # Defining the input product
    product = products[products['title'] == title]

    # Sampling if the dataframe is too large; the input product is
    # excluded from the sample and appended afterwards so it appears once
    if products.shape[0] > 10000:
        sample = products.drop(product.index).sample(n=500)
        products = pd.concat([sample, product], axis=0).reset_index(drop=True)

    vectorizer = TfidfVectorizer(stop_words='english', min_df=30, lowercase=True)

    TF_IDF_matrix = vectorizer.fit_transform(products['description'])

    similarities = cosine_similarity(TF_IDF_matrix, dense_output=False)

    # Row position (not index label) of the first product matching the title
    product_position = np.flatnonzero((products['title'] == title).to_numpy())[0]

    sim_df = pd.DataFrame(
        {
            'product_name': products['title'],
            'asin': products['asin'],
            'similarity': similarities[product_position, :].toarray().ravel()
        }
    )

    top_products = sim_df.sort_values(by='similarity', ascending=False).head(10)

    return top_products

In [14]:
content_recommender_cosine('Japan Anello Backpack Unisex PINK BEIGE LARGE PU LEATHER Rucksack School Bag Campus', 
                            products_df, 'maincat_Backpacks')

Unnamed: 0,product_name,asin,similarity
0,Japan Anello Backpack Unisex PINK BEIGE LARGE ...,0204444454,1.0
1,Japan Anello Backpack Unisex BLACK LARGE PU LE...,0204444403,1.0
440185,Women Girls Ladies Backpack Fashion Shoulder B...,B01E27V99M,0.494976
415526,Goson Genuine Leather Mini Backpack Handbag/Pu...,B01BL4C5HO,0.371702
323015,Batman Acrylic Logo Faux Leather Backpack [App...,B010JAK5NW,0.345792
227391,Floto Toscana Leather Pack,B00MOX5ZRE,0.332771
286519,Donalworld Women Sequin Backpack Bling Paillet...,B00UJK8IG4,0.329615
238514,AmeriBag Small Classic Leather Healthy Back Bag,B00NUGHNME,0.325498
432518,Harvest Label Urban Rolltop Backpack 2.0,B01D3RC9QE,0.32508
147255,Saddleback Leather Co. Full Grain Leather Back...,B00EK25RPM,0.322914


In [15]:
content_recommender_cosine("Deer Stags Men's Hampden Oxford", products_df, 'maincat_Shoes')

Unnamed: 0,product_name,asin,similarity
500,Deer Stags Men's Hampden Oxford,B01HJH7W0W,1.0
477,NIKE Men's Air Max MVP Elite Mid Baseball Cleats,B00OBZVR66,0.619057
102,RIDGEMONT Monty Lo Women's Oiled Suede Walking...,B017UOZFH6,0.573017
491,VANS Unisex Classic Slip-On Shoes,B00RQOMZOW,0.536507
479,adidas Performance Women's Matteo NUA Firm-Gro...,B00DQYP1EM,0.515588
323,Earth Origins London Women's Black 7 B(M) US,B005BB16BK,0.449446
134,Heart's Amanda-02 Women's Pointy Toe Chunky He...,B016VFGK56,0.408132
499,UGG Men's Classic Mini Bomber Winter Boot,B00RM65C7G,0.404158
425,Vans Atwood Unisex Kids' Low-Top Sneakers,B00HRGHTNG,0.392704
104,B.O.C - Mens - Eric,B0093OM0R0,0.392704
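
Because the shoes category is sampled, the recommendations above change from run to run. One possible alternative, sketched below under stated assumptions, is to compute only the single row of similarities that is needed instead of the full pairwise matrix. The `top_similar_single_row` helper and the `toy` dataframe are hypothetical and not part of the notebook; the sketch relies on `cosine_similarity` accepting two inputs and returning one row per row of the first input.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def top_similar_single_row(title, products, n=10):
    """Hypothetical helper: rank products against one title without
    building the full n_products x n_products similarity matrix."""
    products = products.reset_index(drop=True)
    matrix = TfidfVectorizer(stop_words='english').fit_transform(
        products['description']
    )
    # Row position of the first product matching the title
    position = np.flatnonzero((products['title'] == title).to_numpy())[0]
    # 1 x n_products result: the query row compared with every row
    row = cosine_similarity(matrix[position], matrix).ravel()
    ranked = products.assign(similarity=row)
    return ranked[['title', 'asin', 'similarity']].sort_values(
        by='similarity', ascending=False
    ).head(n)

# Made-up rows that mimic the columns used in this notebook
toy = pd.DataFrame({
    'title': ['Bag A', 'Bag B', 'Dress C'],
    'asin': ['X1', 'X2', 'X3'],
    'description': [
        'leather backpack pockets',
        'leather backpack zipper',
        'cotton summer dress',
    ],
})

result = top_similar_single_row('Bag A', toy, n=3)
print(result)
```

The trade-off is that the TF-IDF matrix still has to be fitted on the whole category, so this removes the memory cost of the similarity matrix but not the cost of vectorizing every description.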


## Conclusion <a class="anchor" id="conc"></a>

This notebook built a content-based recommender that finds similar products by computing the cosine similarity between TF-IDF vectors of the product descriptions, and wrapped those steps in a general function.

*Next Notebook*: `colab_rec.ipynb`