# Decision Exercise: Mods 21 & 22

## Personalizing the Fashion Browsing Experience

**Background**

[Rent the Runway](https://www.renttherunway.com/) lets users browse and rent thousands of designer-quality garments, so they can always wear stylish apparel without breaking the bank. With thousands of options to choose from, in varying sizes and styles, the shopping experience could quickly become overwhelming without some degree of personalization. From a business perspective, revenue can be increased by presenting users with the right offer and the right product at the right time.

**Prompt**

What features of a recommender system make it effective for both the business (increasing revenue) and the user experience (personalizing the information space)? In addition to collaborative filtering approaches, how can other techniques (content-based, rule-based, or hybrid) be used to augment the recommendation engine? How should these methods be measured within the context of the business objectives of Rent the Runway?

**Data Source**

Rishabh Misra, Mengting Wan, Julian McAuley. *Decomposing fit semantics for product size recommendation in metric spaces*. RecSys, 2018. [pdf](http://cseweb.ucsd.edu/~jmcauley/pdfs/recsys18e.pdf)

**Data Description**

```
rows: user_ids; each row corresponds to one user ID

columns: item_ids; each column corresponds to the product number of one product
```

The value at each (row, column) position is the rating that the user gave that particular item. The matrix contains mostly null values, because most users rate only a small percentage of items and most items are rated by only a handful of users.
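
To make the sparsity point concrete, here is a minimal sketch on a hand-made toy matrix. The user labels, item numbers, and ratings are invented for illustration and are not taken from the Rent the Runway data.

```python
import numpy as np
import pandas as pd

# Toy ratings matrix (hypothetical values): rows are users, columns are items.
toy = pd.DataFrame(
    {
        101: [10, np.nan, np.nan],
        102: [np.nan, 8, np.nan],
        103: [np.nan, np.nan, 6],
        104: [np.nan, 10, np.nan],
    },
    index=pd.Index(['u1', 'u2', 'u3'], name='user_id'),
)
toy.columns.name = 'item_id'

# Count the cells that actually hold a rating.
n_rated = int(toy.notna().sum().sum())

# Sparsity is the fraction of (user, item) cells with no rating.
sparsity = 1 - n_rated / toy.size
print(f'{n_rated} ratings in {toy.size} cells, sparsity = {sparsity:.2f}')
```

The same two lines (`notna().sum().sum()` and `size`) could be applied to the full ratings matrix once it has been built.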

Import our packages

In [11]:
import pandas as pd
import gzip
import json

Assign the path of our data file to the variable `DATAFILE`

In [12]:
DATAFILE = './datasets/RentTheRunwayRatings.csv'

Read our data file into a DataFrame and view the head of the DataFrame

In [13]:
# read in prepared ratings matrix
df = pd.read_csv(DATAFILE, index_col='user_id')
df.head()

FileNotFoundError: [Errno 2] No such file or directory: './datasets/RentTheRunwayRatings.csv'

### To view raw data...
Read in the raw data

In [14]:
RAW_DATA = './datasets/renttherunway_final_data.json.gz'
data = []
with gzip.open(RAW_DATA, 'rt', encoding='utf-8') as f:
    # each line of the file is one JSON-encoded review
    for line in f:
        data.append(json.loads(line))
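
As an aside, a gzipped JSON-lines file can also be loaded in one call with `pd.read_json(..., lines=True)`. The sketch below writes a tiny hypothetical file (the path `toy_reviews.json.gz` and its two records are made up) and reads it both ways. Note that `read_json` may infer column dtypes differently from the manual loop, so the two results are not guaranteed to be identical on the real data.

```python
import gzip
import json
import os
import tempfile

import pandas as pd

# Hypothetical records standing in for the raw review data.
records = [
    {"user_id": 1, "item_id": 10, "rating": 8},
    {"user_id": 2, "item_id": 20, "rating": 10},
]

# Write them as gzipped JSON lines into a temporary directory.
tmp_dir = tempfile.mkdtemp()
toy_path = os.path.join(tmp_dir, "toy_reviews.json.gz")
with gzip.open(toy_path, "wt", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Option 1: parse line by line, as in the cell above.
with gzip.open(toy_path, "rt", encoding="utf-8") as f:
    df_loop = pd.DataFrame([json.loads(line) for line in f])

# Option 2: let pandas decompress and parse the file directly.
df_direct = pd.read_json(toy_path, lines=True, compression="gzip")
```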

Convert to `pandas.DataFrame()` object

In [15]:
df = pd.DataFrame(data)
df.head()

| | fit | user_id | bust size | item_id | weight | rating | rented for | review_text | body type | review_summary | category | height | size | age | review_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | fit | 420272 | 34d | 2260466 | 137lbs | 10 | vacation | An adorable romper! Belt and zipper were a lit... | hourglass | So many compliments! | romper | 5' 8" | 14 | 28 | April 20, 2016 |
| 1 | fit | 273551 | 34b | 153475 | 132lbs | 10 | other | I rented this dress for a photo shoot. The the... | straight & narrow | I felt so glamourous!!! | gown | 5' 6" | 12 | 36 | June 18, 2013 |
| 2 | fit | 360448 | | 1063761 | | 10 | party | This hugged in all the right places! It was a ... | | It was a great time to celebrate the (almost) ... | sheath | 5' 4" | 4 | 116 | December 14, 2015 |
| 3 | fit | 909926 | 34c | 126335 | 135lbs | 8 | formal affair | I rented this for my company's black tie award... | pear | Dress arrived on time and in perfect condition. | dress | 5' 5" | 8 | 34 | February 12, 2014 |
| 4 | fit | 151944 | 34b | 616682 | 145lbs | 10 | wedding | I have always been petite in my upper body and... | athletic | Was in love with this dress !!! | gown | 5' 9" | 12 | 27 | September 26, 2016 |


In [16]:
df.to_csv('data/runway.csv', index = False)

In [17]:
df = pd.read_csv('data/runway.csv', index_col = 'user_id')

In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 192544 entries, 420272 to 123612
Data columns (total 14 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   fit             192544 non-null  object 
 1   bust size       174133 non-null  object 
 2   item_id         192544 non-null  int64  
 3   weight          162562 non-null  object 
 4   rating          192462 non-null  float64
 5   rented for      192534 non-null  object 
 6   review_text     192482 non-null  object 
 7   body type       177907 non-null  object 
 8   review_summary  192199 non-null  object 
 9   category        192544 non-null  object 
 10  height          191867 non-null  object 
 11  size            192544 non-null  int64  
 12  age             191584 non-null  float64
 13  review_date     192544 non-null  object 
dtypes: float64(2), int64(2), object(10)
memory usage: 22.0+ MB


Select user-item rating columns

In [6]:
# user_id was loaded as the index above, so restore it as a column before selecting
ratings = df.reset_index()[['user_id', 'item_id', 'rating']].copy()

Drop duplicate ratings on same item from same user

In [7]:
# keep the first rating for each (user_id, item_id) pair and drop later repeats
ratings_nodup = ratings.drop_duplicates(subset=['user_id', 'item_id'])

Pivot these columns to produce ratings matrix

In [None]:
pivot_ratings = (
    ratings_nodup
        .sort_values(by=['user_id', 'item_id'])
        .pivot(index='user_id', columns='item_id', values='rating')
)
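
The duplicate-dropping step matters because `DataFrame.pivot` raises a `ValueError` when the same (index, column) pair appears more than once. A minimal sketch on invented toy ratings (none of these values come from the dataset):

```python
import pandas as pd

# Toy long-format ratings; user 2 rated item 10 twice.
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 2],
    'item_id': [10, 20, 10, 10, 30],
    'rating': [8.0, 10.0, 6.0, 4.0, 10.0],
})

# Pivoting with the duplicate pair still present fails.
try:
    toy.pivot(index='user_id', columns='item_id', values='rating')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# After keeping only the first rating per (user, item) pair, the pivot works.
toy_nodup = toy.drop_duplicates(subset=['user_id', 'item_id'])
toy_matrix = toy_nodup.pivot(index='user_id', columns='item_id', values='rating')
print(toy_matrix)
```

Keeping the first rating is only one possible choice; averaging repeated ratings with `pivot_table(aggfunc='mean')` would be another.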

Save ratings matrix to csv file

In [20]:
# pivot_ratings.to_csv('./datasets/RentTheRunwayRatings.csv')
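
As a starting point for the collaborative filtering approach mentioned in the prompt, item-to-item cosine similarity can be computed directly from a ratings matrix. This is only a sketch under simplifying assumptions: the toy matrix is invented, and missing ratings are filled with 0, which treats "not rated" as a rating of zero and is a crude choice for real data.

```python
import numpy as np
import pandas as pd

# Toy ratings matrix (hypothetical values) shaped like pivot_ratings.
toy_matrix = pd.DataFrame(
    {
        10: [8.0, 6.0, np.nan],
        20: [10.0, np.nan, 8.0],
        30: [np.nan, 10.0, 10.0],
    },
    index=pd.Index([1, 2, 3], name='user_id'),
)
toy_matrix.columns.name = 'item_id'

# Assumption: treat missing ratings as 0 so the dot products are defined.
filled = toy_matrix.fillna(0).to_numpy()

# Cosine similarity between item columns: dot product over product of norms.
norms = np.linalg.norm(filled, axis=0)
item_sim = pd.DataFrame(
    filled.T @ filled / np.outer(norms, norms),
    index=toy_matrix.columns,
    columns=toy_matrix.columns,
)
print(item_sim.round(3))
```

Items with high similarity to those a user already rated highly are candidate recommendations. How well such candidates serve the business objectives is the evaluation question the prompt raises.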