# Decision Exercise: Mods 21 & 22

## Personalizing the Fashion Browsing Experience

**Background**

[Rent the Runway](https://www.renttherunway.com/) lets users browse and rent thousands of designer-quality garments, so they can always wear stylish apparel without breaking the bank. With thousands of options to choose from, in varying sizes and styles, the shopping experience can quickly become overwhelming without some degree of personalization. From a business perspective, presenting users with the right offer and the right product at the right time can increase revenue.

**Prompt**

What features of a recommender system make it effective for both the business (increasing revenue) and the user experience (personalizing the information space)? In addition to collaborative filtering approaches, how can other techniques (content-based, rule-based, or hybrid) be used to augment the recommendation engine? How should these methods be measured within the context of Rent the Runway's business objectives?

**Data Source**

Rishabh Misra, Mengting Wan, Julian McAuley. "Decomposing fit semantics for product size recommendation in metric spaces." RecSys, 2018. [pdf](http://cseweb.ucsd.edu/~jmcauley/pdfs/recsys18e.pdf)

**Data Description**

```
rows: user_id; each row corresponds to one user ID

columns: item_id; each column corresponds to the product number of one product
```

The value at each (row, column) position is the rating that the user gave that particular item. As you might notice, the matrix contains mostly null values, because most users rate only a small percentage of items and most items are rated by only a handful of users.
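To make the sparsity concrete, the share of empty cells can be computed directly from the matrix. The following is a minimal sketch on a small made-up matrix; the `toy` frame, its user IDs, item IDs, and ratings are hypothetical and not taken from the dataset. The same expressions should apply to the prepared ratings matrix once it is loaded.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-user x 4-item ratings matrix; NaN marks "no rating".
toy = pd.DataFrame(
    [[10, np.nan, np.nan, 8],
     [np.nan, np.nan, 6, np.nan],
     [np.nan, 4, np.nan, np.nan]],
    index=pd.Index([1, 2, 3], name='user_id'),
    columns=[101, 102, 103, 104],
)

# Fraction of (user, item) cells that hold no rating.
sparsity = toy.isna().to_numpy().mean()

# Number of items each user actually rated.
ratings_per_user = toy.notna().sum(axis=1)

print(f"sparsity: {sparsity:.2%}")
print(ratings_per_user)
```

A sparsity close to 1 means collaborative filtering has very few overlapping ratings to work with, which is one motivation for the content-based and hybrid techniques mentioned in the prompt.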

Import our packages

In [1]:
import pandas as pd
import gzip
import json

Assign the path to our data file to the variable `DATAFILE`

In [2]:
DATAFILE = './datasets/RentTheRunwayRatings.csv'

Read our data file into a DataFrame and view the head of the DataFrame

In [None]:
# read in prepared ratings matrix
df = pd.read_csv(DATAFILE, index_col='user_id')
df.head()

### To view raw data...
Read in the raw data

In [4]:
RAW_DATA = './datasets/renttherunway_final_data.json.gz'
data = []
with gzip.open(RAW_DATA, 'r') as f:
    # the file holds one JSON record per line
    for line in f:
        data.append(json.loads(line))

Convert to a `pandas.DataFrame` object (note that this overwrites the `df` holding the prepared ratings matrix loaded earlier)

In [5]:
df = pd.DataFrame(data)
df.head()

| | age | body type | bust size | category | fit | height | item_id | rating | rented for | review_date | review_summary | review_text | size | user_id | weight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 28 | hourglass | 34d | romper | fit | 5' 8" | 2260466 | 10 | vacation | April 20, 2016 | So many compliments! | An adorable romper! Belt and zipper were a lit... | 14 | 420272 | 137lbs |
| 1 | 36 | straight & narrow | 34b | gown | fit | 5' 6" | 153475 | 10 | other | June 18, 2013 | I felt so glamourous!!! | I rented this dress for a photo shoot. The the... | 12 | 273551 | 132lbs |
| 2 | 116 | | | sheath | fit | 5' 4" | 1063761 | 10 | party | December 14, 2015 | It was a great time to celebrate the (almost) ... | This hugged in all the right places! It was a ... | 4 | 360448 | |
| 3 | 34 | pear | 34c | dress | fit | 5' 5" | 126335 | 8 | formal affair | February 12, 2014 | Dress arrived on time and in perfect condition. | I rented this for my company's black tie award... | 8 | 909926 | 135lbs |
| 4 | 27 | athletic | 34b | gown | fit | 5' 9" | 616682 | 10 | wedding | September 26, 2016 | Was in love with this dress !!! | I have always been petite in my upper body and... | 12 | 151944 | 145lbs |


Select the user, item, and rating columns

In [6]:
ratings = df[['user_id', 'item_id', 'rating']].copy()

Drop duplicate ratings of the same item by the same user, keeping the first occurrence

In [7]:
# keep the first rating for each (user_id, item_id) pair
ratings_nodup = ratings.drop_duplicates(subset=['user_id', 'item_id'])

Pivot these columns to produce the ratings matrix

In [None]:
pivot_ratings = (
    ratings_nodup
        .sort_values(by=['user_id', 'item_id'])
        .pivot(index='user_id', columns='item_id', values='rating')
)
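The duplicate removal before the pivot is not optional: `DataFrame.pivot` raises a `ValueError` when the same (index, column) pair appears more than once. The sketch below illustrates this on a tiny made-up frame; the `toy` data is hypothetical and only stands in for the real ratings.

```python
import pandas as pd

# Hypothetical long-format ratings; user 1 rated item 101 twice.
toy = pd.DataFrame({
    'user_id': [1, 1, 2],
    'item_id': [101, 101, 102],
    'rating':  [10, 8, 6],
})

# Pivoting with a repeated (user_id, item_id) pair fails.
try:
    toy.pivot(index='user_id', columns='item_id', values='rating')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# After keeping the first rating per pair, the pivot succeeds.
deduped = toy.drop_duplicates(subset=['user_id', 'item_id'])
matrix = deduped.pivot(index='user_id', columns='item_id', values='rating')
print(matrix)
```

Keeping the first rating is only one possible policy. If repeat rentals should count, `pivot_table` with an aggregation such as the mean would be an alternative, at the cost of no longer storing a single raw rating per cell.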

Save the ratings matrix to a CSV file (uncomment the line below to write it)

In [20]:
# pivot_ratings.to_csv('./datasets/RentTheRunwayRatings.csv')
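One thing to watch when reloading a matrix saved this way: `read_csv` returns column labels as strings, so item IDs that were integers before saving come back as text. The sketch below shows the round trip using an in-memory buffer in place of a file; the `matrix` frame is hypothetical, and it assumes the item IDs were integers to begin with.

```python
import io

import pandas as pd

# Hypothetical 2 x 2 ratings matrix with integer item IDs as columns.
matrix = pd.DataFrame(
    {101: [10.0, None], 102: [None, 6.0]},
    index=pd.Index([1, 2], name='user_id'),
)

# Write to an in-memory buffer instead of a file on disk, then read back.
buffer = io.StringIO()
matrix.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer, index_col='user_id')

# Column labels are now strings such as '101'.
labels_after_reload = list(reloaded.columns)

# Convert them back to integers so lookups by item ID work as before.
reloaded.columns = reloaded.columns.astype(int)
```

Without the conversion, a lookup such as `reloaded[101]` would raise a `KeyError`, because the label stored in the reloaded frame is the string `'101'`.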