# Simple Clothing Recommender

To create a clothing recommender, I used Rent the Runway data from this [website](https://cseweb.ucsd.edu/~jmcauley/datasets.html). In this notebook, I walk through the code I used to build a simple recommender. By simple, I mean that the recommender in this notebook uses only three variables, and only one of them drives the recommendations. The one variable any recommender of this kind needs is rating data. Because this data set has a rating variable, I can make recommendations for users based on the ratings they gave to certain articles of clothing.

I am also using only a very small portion of the available data to practice building the recommender. My computer was not able to handle all of the data, which was 192,544 samples, so the simple recommender in this notebook uses only about 5,000 samples. In another notebook, where I used cloud computing, I was able to build a recommender on all of the data points. The recommender in this notebook of course cannot recommend as well, because it was not exposed to as much data.

While more data can sometimes make a recommender better, this is not always the case: if all products are liked about equally by users, the additional ratings do not add much value. In the end, whether adding a recommender is worth it depends on how many products and users a store or company has.


### Importing Data
Below I imported pandas in order to load the data into a dataframe in the notebook. The file below is in JSON format, with one record per line (hence `lines = True`). When I downloaded it from the website it came as a gzip archive, so I used a file-converting website to unzip the data and save it as JSON.
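As an aside, the manual unzip step may not be strictly necessary, since pandas can read gzip-compressed JSON Lines files directly. The sketch below is only an illustration: the file name `sample_reviews.json.gz` and its two records are made up, not taken from the Rent the Runway data.

```python
import gzip
import json
import os
import tempfile

import pandas as pd

# Hypothetical stand-in records; the real file has many more fields and rows.
records = [
    {"user_id": 1, "category": "dress", "rating": 10},
    {"user_id": 2, "category": "gown", "rating": 8},
]

# Write the records as gzip-compressed JSON Lines (one JSON object per line).
path = os.path.join(tempfile.mkdtemp(), "sample_reviews.json.gz")
with gzip.open(path, "wt", encoding="utf-8") as fh:
    for record in records:
        fh.write(json.dumps(record) + "\n")

# pandas infers gzip compression from the ".gz" suffix (compression="infer"
# is the default), so no separate unzip step is needed.
sample = pd.read_json(path, lines=True)
print(sample.shape)
```

The same call pattern should apply to the downloaded archive, assuming it keeps its `.gz` extension.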

In [1]:
import pandas as pd

In [2]:
rent = pd.read_json('renttherunway_final_data.json', lines = True)

In [3]:
rent.index

RangeIndex(start=0, stop=192544, step=1)

### Inspecting the Data

Below I printed the data frame to get an understanding of the columns and the number of samples I had. After looking at the columns, I knew that the recommender I was building only needed the following three variables:
 - user_id
 - rating
 - category
 
I knew that I needed these because I needed the name of the item being recommended, which was category; a variable that identified the user; and a variable that the recommender would base its recommendations on.
 
I inspected these variables to see how many null values each one had, how many categories there were, and how many distinct rating values there were.

In [4]:
rent

Unnamed: 0,fit,user_id,bust size,item_id,weight,rating,rented for,review_text,body type,review_summary,category,height,size,age,review_date
0,fit,420272,34d,2260466,137lbs,10.0,vacation,An adorable romper! Belt and zipper were a lit...,hourglass,So many compliments!,romper,"5' 8""",14,28.0,"April 20, 2016"
1,fit,273551,34b,153475,132lbs,10.0,other,I rented this dress for a photo shoot. The the...,straight & narrow,I felt so glamourous!!!,gown,"5' 6""",12,36.0,"June 18, 2013"
2,fit,360448,,1063761,,10.0,party,This hugged in all the right places! It was a ...,,It was a great time to celebrate the (almost) ...,sheath,"5' 4""",4,116.0,"December 14, 2015"
3,fit,909926,34c,126335,135lbs,8.0,formal affair,I rented this for my company's black tie award...,pear,Dress arrived on time and in perfect condition.,dress,"5' 5""",8,34.0,"February 12, 2014"
4,fit,151944,34b,616682,145lbs,10.0,wedding,I have always been petite in my upper body and...,athletic,Was in love with this dress !!!,gown,"5' 9""",12,27.0,"September 26, 2016"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
192539,fit,66386,34dd,2252812,140lbs,10.0,work,Fit like a glove!,hourglass,LOVE IT!!! First Item Im thinking of buying!,jumpsuit,"5' 9""",8,42.0,"May 18, 2016"
192540,fit,118398,32c,682043,100lbs,10.0,work,The pattern contrast on this dress is really s...,petite,LOVE it!,dress,"5' 1""",4,29.0,"September 30, 2016"
192541,fit,47002,36a,683251,135lbs,6.0,everyday,"Like the other DVF wraps, the fit on this is f...",straight & narrow,"Loud patterning, flattering fit",dress,"5' 8""",8,31.0,"March 4, 2016"
192542,fit,961120,36c,126335,165lbs,10.0,wedding,This dress was PERFECTION. it looked incredib...,pear,loved this dress it was comfortable and photog...,dress,"5' 6""",16,31.0,"November 25, 2015"


In [5]:
rent.isnull().sum()

fit                   0
user_id               0
bust size         18411
item_id               0
weight            29982
rating               82
rented for           10
review_text           0
body type         14637
review_summary        0
category              0
height              677
size                  0
age                 960
review_date           0
dtype: int64

In [6]:
#Of the variables I was using, rating was the only one with missing values, so I wrote the following code to
#drop the samples with a missing rating. Since this only loses about 0.04% of the samples, dropping was a better
#choice than imputing the values: such a minimal amount of data should not noticeably affect the
#recommender.
rent.dropna(subset = ['rating'], inplace = True)

In [7]:
rent['rating'].isnull().sum()

0
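The 0.04% figure comes from 82 missing ratings out of 192,544 rows. A minimal sketch of how such a share can be checked before dropping rows, using a made-up toy frame (`toy` and its values are illustrative only):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `rent`; the values are made up for illustration.
toy = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "rating": [10.0, np.nan, 8.0, 6.0],
    "category": ["dress", "gown", "top", "dress"],
})

# Share of rows with a missing rating, as a percentage.
missing_pct = toy["rating"].isna().mean() * 100
print(f"{missing_pct:.2f}% of ratings are missing")

# Without inplace=True, dropna returns a new frame and leaves `toy` unchanged.
cleaned = toy.dropna(subset=["rating"])
print(len(toy), len(cleaned))
```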

In [8]:
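#Here I dropped most of the data so that I could practice writing the code I needed to make the recommender, since
#the full data set was too large for me to work with in this notebook. The axis is passed by keyword because newer
#pandas versions no longer accept it as a positional argument.
rent.drop(rent.index[5001:192543], axis = 0, inplace = True)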
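Dropping everything after a given position amounts to keeping the first rows of the frame. The sketch below shows that on a made-up toy frame, alongside a random sample as a possible alternative. The random sample is an assumption for illustration and is not what this notebook does.

```python
import pandas as pd

# Toy frame standing in for `rent`; the values are made up.
toy = pd.DataFrame({"user_id": range(10), "rating": [10, 8] * 5})

# Keep the first n rows by position (what dropping the tail amounts to).
n = 4
head_subset = toy.iloc[:n]

# Alternative (an assumption, not used in this notebook): a reproducible
# random sample, which avoids depending on the order of rows in the file.
random_subset = toy.sample(n=n, random_state=0)

print(len(head_subset), len(random_subset))
```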

In [9]:
#Then I inspected the value counts of the ratings to understand how my recommender might be biased. Looking at the 
#value counts below shows me that my recommender will mostly have similar recommendations to give since most clothes
#are rated as an 8 or 10.
rent['rating'].value_counts()

10.0    3197
8.0     1402
6.0      288
4.0       90
2.0       24
Name: rating, dtype: int64

In [10]:
#In this cell I wanted to see how many items existed within each category, so that I would know which clothing
#categories are more common than others when interpreting the final output of the recommender.
rent['category'].value_counts()

dress         2414
gown          1155
sheath         491
shift          142
jumpsuit       133
top            124
romper          86
maxi            80
jacket          54
skirt           50
mini            48
sweater         36
coat            29
blazer          25
blouse          20
shirtdress      17
down            17
pants            9
culottes         9
shirt            6
vest             6
frock            6
bomber           5
tunic            4
pant             3
cape             3
cardigan         3
tank             3
print            2
leggings         2
suit             2
poncho           2
knit             2
sweatshirt       2
trouser          2
legging          2
culotte          1
midi             1
ballgown         1
peacoat          1
pullover         1
trench           1
duster           1
Name: category, dtype: int64

In [11]:
#I wanted to make sure that I had dropped all the rows I intended to before trying to create the recommender. In the
#dataframe below, you can see that there are only 5,001 samples.
rent

Unnamed: 0,fit,user_id,bust size,item_id,weight,rating,rented for,review_text,body type,review_summary,category,height,size,age,review_date
0,fit,420272,34d,2260466,137lbs,10.0,vacation,An adorable romper! Belt and zipper were a lit...,hourglass,So many compliments!,romper,"5' 8""",14,28.0,"April 20, 2016"
1,fit,273551,34b,153475,132lbs,10.0,other,I rented this dress for a photo shoot. The the...,straight & narrow,I felt so glamourous!!!,gown,"5' 6""",12,36.0,"June 18, 2013"
2,fit,360448,,1063761,,10.0,party,This hugged in all the right places! It was a ...,,It was a great time to celebrate the (almost) ...,sheath,"5' 4""",4,116.0,"December 14, 2015"
3,fit,909926,34c,126335,135lbs,8.0,formal affair,I rented this for my company's black tie award...,pear,Dress arrived on time and in perfect condition.,dress,"5' 5""",8,34.0,"February 12, 2014"
4,fit,151944,34b,616682,145lbs,10.0,wedding,I have always been petite in my upper body and...,athletic,Was in love with this dress !!!,gown,"5' 9""",12,27.0,"September 26, 2016"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4996,fit,655656,36d,259136,140lbs,10.0,other,The dress zipped up smoothly. It fit perfectly...,athletic,I wore this dress to a high school graduation ...,sheath,"5' 5""",12,41.0,"June 5, 2016"
4997,fit,188946,34b,125424,110lbs,10.0,party,This dress fits perfectly! It's simple and def...,petite,This was the perfect dress for our winter enga...,dress,"4' 11""",1,29.0,"March 4, 2016"
4998,large,581325,38d,1614539,180lbs,8.0,formal affair,"This dress was extremely comfortable, very swa...",full bust,The people I shared the event with and the dress!,gown,"5' 6""",32,26.0,"March 1, 2016"
4999,fit,616038,34d,2347208,120lbs,10.0,work,Hung well and didn't look too boxy...even over...,full bust,Perfect for work.,down,"5' 4""",8,36.0,"March 4, 2016"


# Creating the Recommender

In order to create a recommender, you need a dataframe of only the variables your recommender needs, so that you can create a pivot table. Once you have the pivot table, you need to fill all the spaces where there are NAs with zeros (these occur because not every user rates every category of clothing) and convert the result into a sparse matrix. After creating the sparse matrix, we compare each category with every other category using the cosine of the angle between their rating vectors. The closer the cosine similarity is to 1, the more similar those items are, which suggests that a user will like a category because it is similar to another category they like. If the cosine is close to zero, there is not really anything we can infer about whether a user will like it. A cosine close to -1 would indicate that a user would not like the category at all, but because the zero-filled ratings here are never negative, the similarities in this notebook stay between 0 and 1.

Note that `pairwise_distances` with `metric = 'cosine'` returns the cosine distance, which is 1 minus the cosine similarity. In the output below, a value near 0 therefore means two categories are very similar, and a value of 1 means they share no raters.
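The steps above can be sketched on a tiny made-up data set. Everything in the block (the `toy` frame, its ratings, and the variable names) is illustrative only. It shows that the cosine distance from `pairwise_distances` is the complement of the cosine similarity: two categories rated by the same users end up close together, while a category with no shared raters sits at distance 1.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

# Made-up ratings: users 1 and 2 rate both dress and gown; user 3 rates only top.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "category": ["dress", "gown", "dress", "gown", "top"],
    "rating": [10, 8, 8, 10, 6],
})

# Rows are categories, columns are users; unrated cells become 0.
toy_pivot = toy.pivot_table(index="category", columns="user_id", values="rating")
toy_sparse = sparse.csr_matrix(toy_pivot.fillna(0))

similarity = cosine_similarity(toy_sparse)                   # 1 = same direction
distance = pairwise_distances(toy_sparse, metric="cosine")   # 0 = same direction

# The two matrices are complements of each other.
assert np.allclose(distance, 1 - similarity)

print(pd.DataFrame(distance, index=toy_pivot.index, columns=toy_pivot.index).round(3))
```

Whether to work with similarities or distances is a matter of preference; what matters is sorting in the matching direction when ranking.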

In [12]:
#In this cell, I have created the dataframe needed to make the pivot table.
pt_df = rent[['user_id', 'rating', 'category']]

In [13]:
#Here I double-checked that there were no null values one more time just to make sure.
pt_df.isnull().sum()

user_id     0
rating      0
category    0
dtype: int64

In [14]:
#This cell is for creating the pivot table
piovt = pt_df.pivot_table(index = 'category', columns = 'user_id', values = 'rating')

In [15]:
#This cell contains the imports needed to create the recommender.
from scipy import sparse
from sklearn.metrics.pairwise import pairwise_distances

In [16]:
#Here is where the sparse matrix is created.
pivot_sparse = sparse.csr_matrix(piovt.fillna(0))

In [17]:
#This cell is where the cosine distances between the categories are computed (distance = 1 - cosine similarity).
recommender = pairwise_distances(pivot_sparse, metric = 'cosine')

In [18]:
#Checking to make sure the code is giving me the right output.
recommender

array([[0.        , 1.        , 1.        , ..., 1.        , 1.        ,
        1.        ],
       [1.        , 0.        , 0.95858166, ..., 1.        , 1.        ,
        1.        ],
       [1.        , 0.95858166, 0.        , ..., 1.        , 1.        ,
        1.        ],
       ...,
       [1.        , 1.        , 1.        , ..., 0.        , 1.        ,
        1.        ],
       [1.        , 1.        , 1.        , ..., 1.        , 0.        ,
        1.        ],
       [1.        , 1.        , 1.        , ..., 1.        , 1.        ,
        0.        ]])

In [19]:
#Here I created the dataframe of the recommender, in order to see what the recommendations are.
recommender_df = pd.DataFrame(recommender, columns = piovt.index, index = piovt.index)

In [20]:
#This is to see what the dataframe looks like.
recommender_df

category,ballgown,blazer,blouse,bomber,cape,cardigan,coat,culotte,culottes,down,...,skirt,suit,sweater,sweatshirt,tank,top,trench,trouser,tunic,vest
ballgown,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
blazer,1.0,0.0,0.958582,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,0.983212,1.0,1.0,1.0,1.0
blouse,1.0,0.958582,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,0.976488,1.0,1.0,1.0,1.0
bomber,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
cape,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
cardigan,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
coat,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.947473,...,1.0,1.0,1.0,1.0,1.0,0.979956,1.0,1.0,1.0,1.0
culotte,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
culottes,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,...,1.0,1.0,1.0,1.0,1.0,0.965485,1.0,1.0,1.0,1.0
down,1.0,1.0,1.0,1.0,1.0,1.0,0.947473,1.0,1.0,0.0,...,1.0,1.0,1.0,1.0,1.0,0.965036,1.0,1.0,1.0,1.0


In [21]:
#This is where I tested the recommender on one category. Because the values are cosine distances, the ascending sort
#starts with the most related category; [1:11] skips the category itself (distance 0) and shows the next 10.
recommender_df['romper'].sort_values()[1:11]

category
tunic       0.947606
coat        0.956163
pants       0.957088
sweater     0.957231
top         0.965263
blazer      0.974502
dress       0.977808
jacket      0.982427
skirt       0.985281
jumpsuit    0.990944
Name: romper, dtype: float64

In [22]:
recommender_df['cardigan'].sort_values()[1:11]

category
ballgown      1.0
peacoat       1.0
poncho        1.0
print         1.0
pullover      1.0
romper        1.0
sheath        1.0
shift         1.0
shirt         1.0
shirtdress    1.0
Name: cardigan, dtype: float64
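A column of all 1.0 values, as for cardigan above, means that none of the users who rated that category in this sample rated any other category, so the order of the listed categories carries no information. For the lookup itself, a small helper can make the intent explicit. The function name `most_similar` and the toy distance matrix below are hypothetical, a sketch rather than part of the original notebook.

```python
import pandas as pd

def most_similar(distance_df, category, n=10):
    """Return the n categories with the smallest cosine distance to `category`."""
    # Drop the category itself by label rather than relying on it sorting first.
    others = distance_df[category].drop(labels=category)
    return others.sort_values().head(n)

# Made-up distance matrix for three categories.
labels = ["dress", "gown", "top"]
toy_distances = pd.DataFrame(
    [[0.0, 0.02, 1.0],
     [0.02, 0.0, 0.9],
     [1.0, 0.9, 0.0]],
    index=labels, columns=labels,
)

print(most_similar(toy_distances, "dress", n=2))
```

Removing the category by label avoids the `[1:11]` slice, which assumes the category itself always sorts into the first position.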

In [25]:
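#The following code prints 10 categories for each category. Note that with ascending = False the largest distances
#come first, so the output lists the least similar categories; removing ascending = False would list the most
#similar ones instead.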

for col in recommender_df.columns:
    print(col)
    print(recommender_df[col].sort_values(ascending = False)[1:11])
    print()

ballgown
category
dress       1.0
leggings    1.0
legging     1.0
knit        1.0
jumpsuit    1.0
jacket      1.0
gown        1.0
frock       1.0
duster      1.0
down        1.0
Name: ballgown, dtype: float64

blazer
category
frock       1.0
tunic       1.0
midi        1.0
maxi        1.0
leggings    1.0
legging     1.0
knit        1.0
jacket      1.0
duster      1.0
pants       1.0
Name: blazer, dtype: float64

blouse
category
frock       1.0
tunic       1.0
midi        1.0
maxi        1.0
leggings    1.0
legging     1.0
knit        1.0
jacket      1.0
duster      1.0
pants       1.0
Name: blouse, dtype: float64

bomber
category
pant        1.0
midi        1.0
maxi        1.0
leggings    1.0
legging     1.0
knit        1.0
jumpsuit    1.0
gown        1.0
frock       1.0
duster      1.0
Name: bomber, dtype: float64

cape
category
pant        1.0
midi        1.0
leggings    1.0
legging     1.0
knit        1.0
jumpsuit    1.0
gown        1.0
frock       1.0
duster      1.0
dress       1.

# Clothing Recommender

To see the recommender built on all of the samples from the original data, except those missing a rating value, [click here](https://colab.research.google.com/drive/1467ZodIPj_ZBPfjPKWWnuTj3zR--LqB_). It uses the same code as above.