# Recommendation System
The next task of the project is to build the following types of recommendation systems.
1. User-based recommendation system
2. Item-based recommendation system

Once you have identified the best-suited recommendation system, the next task is to recommend the 20 products that a user is most likely to purchase, based on the ratings. You can use `reviews_username` (one of the columns in the dataset) to identify the user.

Now, the next task is to link this recommendation system with the sentiment analysis model that was built earlier (recall that we asked you to select one ML model out of the four options). Once you recommend 20 products to a particular user using the recommendation engine, you need to filter out the 5 best products based on the sentiments of the 20 recommended product reviews. 
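
The final filtering step described above can be sketched as follows. This is only an illustration under stated assumptions: the product names, the `predicted_sentiment` column (1 = positive, 0 = negative) and the helper `top_products_by_sentiment` are hypothetical and are not part of the notebook; ranking by share of positive reviews is one plausible reading of "best products based on the sentiments".

```python
import pandas as pd

# Hypothetical inputs: product names returned by the recommender, and one
# row per review with the sentiment model's prediction (1 = positive).
recommended = ["A", "B", "C", "D", "E", "F"]
review_sentiment = pd.DataFrame({
    "prod_name": ["A", "A", "B", "B", "C", "D", "D", "E", "F", "F"],
    "predicted_sentiment": [1, 1, 1, 0, 0, 1, 1, 1, 0, 1],
})

def top_products_by_sentiment(recommended, review_sentiment, n=5):
    """Rank the recommended products by their share of positive reviews."""
    subset = review_sentiment[review_sentiment["prod_name"].isin(recommended)]
    positive_share = subset.groupby("prod_name")["predicted_sentiment"].mean()
    return positive_share.sort_values(ascending=False).head(n)

top5 = top_products_by_sentiment(recommended, review_sentiment)
print(top5)
```

In the real pipeline, `recommended` would hold the 20 products from the recommender and `n` would stay at 5.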

### Loading the libraries and reading the data

In [429]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

In [430]:
#read the product reviews
prod_df = pd.read_csv('sample30.csv')
prod_df.head(3)

Unnamed: 0,id,brand,categories,manufacturer,name,reviews_date,reviews_didPurchase,reviews_doRecommend,reviews_rating,reviews_text,reviews_title,reviews_userCity,reviews_userProvince,reviews_username,user_sentiment
0,AV13O1A8GV-KLJ3akUyj,Universal Music,"Movies, Music & Books,Music,R&b,Movies & TV,Mo...",Universal Music Group / Cash Money,Pink Friday: Roman Reloaded Re-Up (w/dvd),2012-11-30T06:21:45.000Z,,,5,i love this album. it's very good. more to the...,Just Awesome,Los Angeles,,joshua,Positive
1,AV14LG0R-jtxr-f38QfS,Lundberg,"Food,Packaged Foods,Snacks,Crackers,Snacks, Co...",Lundberg,Lundberg Organic Cinnamon Toast Rice Cakes,2017-07-09T00:00:00.000Z,True,,5,Good flavor. This review was collected as part...,Good,,,dorothy w,Positive
2,AV14LG0R-jtxr-f38QfS,Lundberg,"Food,Packaged Foods,Snacks,Crackers,Snacks, Co...",Lundberg,Lundberg Organic Cinnamon Toast Rice Cakes,2017-07-09T00:00:00.000Z,True,,5,Good flavor.,Good,,,dorothy w,Positive


In [431]:
prod_df[prod_df['id']=='AV14LG0R-jtxr-f38QfS']['reviews_text']

1    Good flavor. This review was collected as part...
2                                         Good flavor.
Name: reviews_text, dtype: object

In [432]:
# No rows are duplicated when all columns are compared
prod_df[prod_df.duplicated()]

Unnamed: 0,id,brand,categories,manufacturer,name,reviews_date,reviews_didPurchase,reviews_doRecommend,reviews_rating,reviews_text,reviews_title,reviews_userCity,reviews_userProvince,reviews_username,user_sentiment


In [433]:
# id and name have the same number of unique values, so id can be treated as the product id
print(len(np.unique(prod_df.id)))
print(len(np.unique(prod_df.name)))

271
271


In [434]:
# Checking for duplicates on product name, username, review text and rating.
# One user can rate the same product multiple times with multiple reviews.
prod_df[prod_df.duplicated(subset=['name','reviews_username','reviews_text','reviews_rating'], keep=False)]

Unnamed: 0,id,brand,categories,manufacturer,name,reviews_date,reviews_didPurchase,reviews_doRecommend,reviews_rating,reviews_text,reviews_title,reviews_userCity,reviews_userProvince,reviews_username,user_sentiment
43,AV1h6Gu0glJLPUi8IjA_,Johnson's,"Personal Care,Baby Care,Baby Bubble Bath,Baby,...",Johnson's,"Johnson's Baby Bubble Bath and Wash, 15oz",2017-05-09T07:36:44.000Z,,True,4,"Well, Johnson's need I say more I know, right....",2 In 1!!!,Rohnert Park,,solo,Positive
44,AV1h6Gu0glJLPUi8IjA_,Johnson's,"Personal Care,Baby Care,Baby Bubble Bath,Baby,...",Johnson's,"Johnson's Baby Bubble Bath and Wash, 15oz",2017-05-09T00:00:00.000Z,False,True,4,"Well, Johnson's need I say more I know, right....",2 in 1!!!,,,solo,Positive
67,AV1l8zRZvKc47QAVhnAv,Olay,"Personal Care,Skin Care,Anti-Aging,Beauty,Face...",P&G,Olay Regenerist Deep Hydration Regenerating Cream,2016-04-30T03:08:38.000Z,,True,3,Today is my first time using this product. Fel...,Why No Fragrance-Free Formula,Brooklyn,,mylifeinheels,Positive
79,AV1l8zRZvKc47QAVhnAv,Olay,"Personal Care,Skin Care,Anti-Aging,Beauty,Face...",P&G,Olay Regenerist Deep Hydration Regenerating Cream,2016-04-30T00:00:00.000Z,False,True,3,Today is my first time using this product. Fel...,Why No fragrance-free Formula,,,mylifeinheels,Positive
90,AV1l8zRZvKc47QAVhnAv,Olay,"Personal Care,Skin Care,Anti-Aging,Beauty,Face...",P&G,Olay Regenerist Deep Hydration Regenerating Cream,2016-04-30T03:08:38.000Z,,True,3,Today is my first time using this product. Fel...,Why No Fragrance-free Formula,Brooklyn,,mylifeinheels,Positive
191,AV1l8zRZvKc47QAVhnAv,Olay,"Personal Care,Skin Care,Anti-Aging,Beauty,Face...",P&G,Olay Regenerist Deep Hydration Regenerating Cream,2015-06-01T19:55:40.000Z,,True,5,I have used this product for years and my skin...,Wonderful Product,Houston,,ladyjs28,Positive
192,AV1l8zRZvKc47QAVhnAv,Olay,"Personal Care,Skin Care,Anti-Aging,Beauty,Face...",P&G,Olay Regenerist Deep Hydration Regenerating Cream,2015-05-27T22:28:52.000Z,,True,5,I loved this product. My skin was so soft and ...,Regenerating Cream,St. Clair Shores,,cindy95,Positive
193,AV1l8zRZvKc47QAVhnAv,Olay,"Personal Care,Skin Care,Anti-Aging,Beauty,Face...",P&G,Olay Regenerist Deep Hydration Regenerating Cream,2010-05-08T17:46:02.000Z,,,5,This is one of the best moisturizers I have ev...,Love This Product,,,itan,Positive
194,AV1l8zRZvKc47QAVhnAv,Olay,"Personal Care,Skin Care,Anti-Aging,Beauty,Face...",P&G,Olay Regenerist Deep Hydration Regenerating Cream,2015-05-26T18:17:32.000Z,,True,5,The best moisturizer I have found. Non greasy....,Great Moisturizer,,,gtsp,Positive
195,AV1l8zRZvKc47QAVhnAv,Olay,"Personal Care,Skin Care,Anti-Aging,Beauty,Face...",P&G,Olay Regenerist Deep Hydration Regenerating Cream,2016-06-27T18:39:33.000Z,,True,5,This is a really good product. It feels light ...,Great Product For A Reasonable Price,Santa Cruz,,lindylou60,Positive


In [435]:
prod_df.groupby(['id' ,'name']).agg('count')


id,name,brand,categories,manufacturer,reviews_date,reviews_didPurchase,reviews_doRecommend,reviews_rating,reviews_text,reviews_title,reviews_userCity,reviews_userProvince,reviews_username,user_sentiment
AV13O1A8GV-KLJ3akUyj,Pink Friday: Roman Reloaded Re-Up (w/dvd),1,1,1,1,0,0,1,1,1,1,0,1,1
AV14LG0R-jtxr-f38QfS,Lundberg Organic Cinnamon Toast Rice Cakes,2,2,2,2,2,0,2,2,2,0,0,2,2
AV16khLE-jtxr-f38VFn,K-Y Love Sensuality Pleasure Gel,27,27,27,27,27,25,27,27,27,0,0,27,27
AV1YGDqsGV-KLJ3adc-O,Windex Original Glass Cleaner Refill 67.6oz (2 Liter),348,348,348,348,328,341,348,348,348,14,0,348,348
AV1YIch7GV-KLJ3addeG,"Heinz Tomato Ketchup, 38oz",1,1,1,1,0,1,1,1,1,0,0,1,1
AV1YlENIglJLPUi8IHsX,Kind Dark Chocolate Chunk Gluten Free Granola Bars - 5 Count,17,17,17,17,16,17,17,17,17,1,0,17,17
AV1YmBrdGV-KLJ3adewb,"Pantene Color Preserve Volume Shampoo, 25.4oz",18,18,18,18,18,17,18,18,18,0,0,18,18
AV1YmDL9vKc47QAVgr7_,"Aussie Aussome Volume Shampoo, 13.5 Oz",89,89,89,89,85,86,89,89,89,1,0,89,89
AV1Ymf_rglJLPUi8II2v,Cars Toon: Mater's Tall Tales,34,34,34,34,22,33,34,34,34,0,0,34,34
AV1Yn94nvKc47QAVgtst,CeraVe SA Renewing Cream,25,25,25,25,25,25,25,25,25,0,0,25,25


`271 unique rows, which means id can be considered the product id`
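
Equal counts of unique ids and names suggest, but do not strictly prove, a one-to-one mapping. A minimal sketch of a stricter check, on a hypothetical toy frame (the column names `id` and `name` match the dataset, the values are made up):

```python
import pandas as pd

# Toy data standing in for prod_df; values are illustrative only.
toy = pd.DataFrame({
    "id": ["p1", "p1", "p2", "p3"],
    "name": ["Rice Cakes", "Rice Cakes", "Ketchup", "Shampoo"],
})

# Each id should map to exactly one name, and each name to exactly one id.
names_per_id = toy.groupby("id")["name"].nunique()
ids_per_name = toy.groupby("name")["id"].nunique()
is_one_to_one = bool((names_per_id == 1).all() and (ids_per_name == 1).all())
print(is_one_to_one)
```

Running the same two `groupby` calls on `prod_df` would confirm whether `id` can safely stand in for the product name.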

In [436]:
prod_df.shape

(30000, 15)

In [437]:
#removing the duplicate rows
prod_df_final = prod_df.sort_values('reviews_username').drop_duplicates(subset=['name','reviews_username','reviews_text','reviews_rating'], keep='last')

In [438]:
prod_df_final.shape

(28206, 15)

In [440]:
# No duplicates remain on product name, username, review text and rating.
prod_df_final[prod_df_final.duplicated(subset=['name','reviews_username','reviews_text','reviews_rating'], keep=False)]

Unnamed: 0,id,brand,categories,manufacturer,name,reviews_date,reviews_didPurchase,reviews_doRecommend,reviews_rating,reviews_text,reviews_title,reviews_userCity,reviews_userProvince,reviews_username,user_sentiment


In [441]:
# Check for duplicates on product name, username and rating (ignoring review text).
prod_df_final[prod_df_final.duplicated(subset=['name','reviews_username','reviews_rating'], keep=False)]

Unnamed: 0,id,brand,categories,manufacturer,name,reviews_date,reviews_didPurchase,reviews_doRecommend,reviews_rating,reviews_text,reviews_title,reviews_userCity,reviews_userProvince,reviews_username,user_sentiment
5852,AVpf2tw1ilAPnD_xjflC,Summit Entertainment,"Movies & TV Shows,Instawatch Movies By VUDU,Sh...",Summit Entertainment,Red (special Edition) (dvdvideo),2013-09-07T00:00:00.000Z,,True,5,I orderd on line and picked it up at the CS De...,Alot of action Great,,,143st,Positive
5853,AVpf2tw1ilAPnD_xjflC,Summit Entertainment,"Movies & TV Shows,Instawatch Movies By VUDU,Sh...",Summit Entertainment,Red (special Edition) (dvdvideo),2013-09-07T00:00:00.000Z,,True,5,I orderd on line and picked it up at the CS De...,Great Movie,,,143st,Positive
3296,AVpe5JOgilAPnD_xQPfE,Sony Music,"Movies, Music & Books,Music,Rock,Music on CD o...",Columbia,The Script - No Sound Without Silence (cd),2014-11-17T02:08:22.000Z,,True,5,"In my opinion, their best yet! Much more like ...",Their Best Album Yet!,,,abc,Positive
3323,AVpe5JOgilAPnD_xQPfE,Sony Music,"Movies, Music & Books,Music,Rock,Music on CD o...",Columbia,The Script - No Sound Without Silence (cd),2014-11-17T00:00:00Z,,,5,"In my opinion, their best yet! Much more like ...",Their best album yet!,,,abc,Positive
10712,AVpf3VOfilAPnD_xjpun,Clorox,"Household Essentials,Cleaning Supplies,Kitchen...",Clorox,Clorox Disinfecting Wipes Value Pack Scented 1...,2012-02-04T00:00:00.000Z,False,True,5,"I really like to wipe the door knobs and ref, ...",this product is great for a clean ups,,,ac94,Positive
10687,AVpf3VOfilAPnD_xjpun,Clorox,"Household Essentials,Cleaning Supplies,Kitchen...",Clorox,Clorox Disinfecting Wipes Value Pack Scented 1...,2012-01-26T00:00:00.000Z,False,True,5,It like to know the house is clean by using th...,product is very good,,,ac94,Positive
15350,AVpf5CnILJeJML43FjaU,Weather Tech,"Auto & Tires,Automotive Interior,Car Organizer...",WeatherTech,WeatherTech 40647 14-15 Outlander Cargo Liners...,2014-04-04T07:08:00Z,,,5,"I'll admit to being skeptical at first, but th...",,,,aclass,Positive
15351,AVpf5CnILJeJML43FjaU,Weather Tech,"Auto & Tires,Automotive Interior,Car Organizer...",WeatherTech,WeatherTech 40647 14-15 Outlander Cargo Liners...,2014-04-04T07:08:00Z,,,5,"I'll admit to being skeptical at first, but th...",,,,aclass,Positive
25890,AVpfPaoqLJeJML435Xk9,Warner Home Video,"Movies, Music & Books,Movies,Action & Adventur...",Test,Godzilla 3d Includes Digital Copy Ultraviolet ...,2014-12-26T00:00:00.000Z,,True,5,A wonderful creation of the Japanese classic w...,It's godzilla,,,adam,Positive
24686,AVpfPaoqLJeJML435Xk9,Warner Home Video,"Movies, Music & Books,Movies,Action & Adventur...",Test,Godzilla 3d Includes Digital Copy Ultraviolet ...,2016-01-17T00:00:00.000Z,,True,5,Great movie at a great value on sale. Excellen...,All is well,,,adam,Positive


In [442]:
# Duplicates remain on product name, username and rating, so drop them and re-check
prod_df_final = prod_df_final.sort_values('reviews_username').drop_duplicates(subset=['name','reviews_username','reviews_rating'], keep='last')
prod_df_final[prod_df_final.duplicated(subset=['name','reviews_username','reviews_rating'], keep=False)]

Unnamed: 0,id,brand,categories,manufacturer,name,reviews_date,reviews_didPurchase,reviews_doRecommend,reviews_rating,reviews_text,reviews_title,reviews_userCity,reviews_userProvince,reviews_username,user_sentiment


In [444]:
prod_df_final.shape

(27763, 15)

#### For recommending products to users, three columns are important:
- name --> Product Name
- reviews_username --> user Id
- reviews_rating --> rating provided by user

In [449]:
# Keep only these columns in a new DataFrame
recm_df = prod_df_final[['name','reviews_username','reviews_rating']]
recm_df.head(3)

Unnamed: 0,name,reviews_username,reviews_rating
3499,Chex Muddy Buddies Brownie Supreme Snack Mix,00dog3,4
2603,My Big Fat Greek Wedding 2 (blu-Ray + Dvd + Di...,00sab00,3
1804,Mike Dave Need Wedding Dates (dvd + Digital),00sab00,1


In [450]:
# checking the shape 
recm_df.shape

(27763, 3)

In [451]:
# checking null values
recm_df.isnull().sum()

name                 0
reviews_username    24
reviews_rating       0
dtype: int64

In [452]:
# Remove the rows with a null username
recm_df = recm_df[~recm_df.reviews_username.isna()]
print("After excluding the null user id " ,recm_df.shape)
print("=======checking null values=======")
recm_df.isnull().sum()

After excluding the null user id  (27739, 3)


name                0
reviews_username    0
reviews_rating      0
dtype: int64

In [453]:
new_column_name = {'name':'prod_name','reviews_username' :'user_name' ,'reviews_rating':'rating'}
recm_df= recm_df.rename(columns = new_column_name)
recm_df.rating.head(3)

3499    4
2603    3
1804    1
Name: rating, dtype: int64

In [454]:
recm_df.rating.value_counts()

5    19574
4     5504
3     1224
1     1059
2      378
Name: rating, dtype: int64

In [455]:
recm_df.user_name.value_counts()

mike              23
chris             18
lisa              15
rick              14
sandy             13
james             13
tony              11
jenn              11
jojo              11
laura             11
john              11
mary              10
linda             10
cindy             10
patty              9
charlie            9
dave               9
mimi               9
donna              9
brian              9
mark               9
joey               9
happy              8
steve              8
scott              8
matt               8
chrissy            8
rebecca            8
alex               8
angie              8
                  ..
katleo             1
monjack            1
mari0623           1
wyoshopgril        1
emmilou            1
coolbox            1
whythis            1
holly12345         1
saberwulf52        1
jrose              1
jerr88             1
allie32            1
peewee29           1
chunkin            1
mrlambert          1
shemmeter          1
photogeek200 

In [475]:
# Checking for duplicates on product name and username
recm_df[recm_df.duplicated(subset=['prod_name','user_name'], keep=False)]

Unnamed: 0,prod_name,user_name,rating
27254,Planes: Fire Rescue (2 Discs) (includes Digita...,7.87E+11,3
28354,Planes: Fire Rescue (2 Discs) (includes Digita...,7.87E+11,5
2219,Mike Dave Need Wedding Dates (dvd + Digital),aaron,5
2121,Mike Dave Need Wedding Dates (dvd + Digital),aaron,4
25774,Godzilla 3d Includes Digital Copy Ultraviolet ...,adam,5
24077,Godzilla 3d Includes Digital Copy Ultraviolet ...,adam,4
4669,The Resident Evil Collection 5 Discs (blu-Ray),akhan,4
5309,The Resident Evil Collection 5 Discs (blu-Ray),akhan,5
25702,Godzilla 3d Includes Digital Copy Ultraviolet ...,alex,5
23991,Godzilla 3d Includes Digital Copy Ultraviolet ...,alex,4


In [478]:
recm_df.shape

(27739, 3)

In [479]:
# Dropping duplicate (product name, username) pairs; note that after a descending sort, keep='last' retains the lowest rating of each pair (keep='first' would retain the highest)
recm_df = recm_df.sort_values(by='rating' , ascending=False).drop_duplicates(subset=['prod_name','user_name'], keep='last')
recm_df.shape

(27588, 3)
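
As an alternative that does not depend on sort order, one rating per (product, user) pair can be selected explicitly with a `groupby`. The sketch below uses a hypothetical toy frame with the notebook's column names and keeps the highest rating per pair; it is an illustration, not the code that produced the outputs shown here.

```python
import pandas as pd

# Toy data: two users each rated one product twice.
toy = pd.DataFrame({
    "prod_name": ["Godzilla", "Godzilla", "Planes", "Planes", "Ketchup"],
    "user_name": ["adam", "adam", "aaron", "aaron", "solo"],
    "rating": [5, 4, 3, 5, 4],
})

# One row per (product, user) pair, carrying the maximum rating of that pair.
max_rating = toy.groupby(["prod_name", "user_name"], as_index=False)["rating"].max()
print(max_rating)
```

Swapping `.max()` for `.min()` or `.mean()` makes the chosen policy explicit in the code.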

In [481]:
# Checking again: no duplicates on product name and username remain
recm_df[recm_df.duplicated(subset=['prod_name','user_name'], keep=False)]

Unnamed: 0,prod_name,user_name,rating


#### Note: We can't rely on products that have been rated only once, so let's keep only products with more than 50 reviews

In [482]:
#creating counts dataframe
recm_df_counts = recm_df.groupby('prod_name').count()
recm_df_counts = recm_df_counts.reset_index()
recm_df_counts.head()

Unnamed: 0,prod_name,user_name,rating
0,0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. ...,6,6
1,100:Complete First Season (blu-Ray),135,135
2,2017-2018 Brownline174 Duraflex 14-Month Plann...,4,4
3,"2x Ultra Era with Oxi Booster, 50fl oz",5,5
4,"42 Dual Drop Leaf Table with 2 Madrid Chairs""",1,1


In [485]:
#merge counts df with existing df
recm_df_final = recm_df.merge(recm_df_counts , on ='prod_name')
recm_df_final = recm_df_final.rename({'user_name_x': 'user_name', 'rating_x': 'rating','user_name_y' :'review_count'}, axis=1) 
recm_df_final = recm_df_final.drop(['rating_y'],axis=1)
recm_df_final.head()

Unnamed: 0,prod_name,user_name,rating,review_count
0,Godzilla 3d Includes Digital Copy Ultraviolet ...,kharmo88,5,3148
1,Godzilla 3d Includes Digital Copy Ultraviolet ...,kenneth,5,3148
2,Godzilla 3d Includes Digital Copy Ultraviolet ...,kellyk,5,3148
3,Godzilla 3d Includes Digital Copy Ultraviolet ...,kerstenjay,5,3148
4,Godzilla 3d Includes Digital Copy Ultraviolet ...,kbruno,5,3148


In [486]:
recm_df_final.shape

(27588, 4)

In [487]:
# Keep only products with more than 50 reviews
recm_df_final = recm_df_final[recm_df_final['review_count']>50]
recm_df_final.shape

(25419, 4)
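
The count, merge and filter steps above can also be written in a single pass with `groupby(...).transform`. This is a sketch on hypothetical toy data; `min_reviews` is an illustrative name, set to 2 here so the toy frame shows the effect (the notebook uses 50).

```python
import pandas as pd

# Toy ratings: product A has 3 reviews, B has 1, C has 2.
toy = pd.DataFrame({
    "prod_name": ["A", "A", "A", "B", "C", "C"],
    "user_name": ["u1", "u2", "u3", "u1", "u2", "u3"],
    "rating": [5, 4, 5, 3, 4, 2],
})
min_reviews = 2  # illustrative threshold; the notebook uses 50

# transform("count") returns one value per row: the review count of that row's product.
review_count = toy.groupby("prod_name")["user_name"].transform("count")
filtered = toy[review_count > min_reviews]
print(filtered)
```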

### Dividing the dataset into train and test

In [527]:
# Test and Train split of the dataset.
train, test = train_test_split(recm_df_final, test_size=0.30, random_state=31)
print(train.shape)
print(test.shape)

(17793, 4)
(7626, 4)


In [528]:
# Pivot the train ratings dataset into matrix format in which columns are products and rows are user names.
df_pivot = train.pivot(
    index='user_name',
    columns='prod_name',
    values='rating'
    #aggfunc ={ 'rating':'mean'}
).fillna(0)

df_pivot.head(3)

prod_name,100:Complete First Season (blu-Ray),Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",Axe Dry Anti-Perspirant Deodorant Invisible Solid Phoenix,"Banana Boat Sunless Summer Color Self Tanning Lotion, Light To Medium",Bisquick Original Pancake And Baking Mix - 40oz,"Bounce Dryer Sheets, Fresh Linen, 160 sheets","Burt's Bees Lip Shimmer, Raisin",...,"Storkcraft Tuscany Glider and Ottoman, Beige Cushions, Espresso Finish",The Resident Evil Collection 5 Discs (blu-Ray),There's Something About Mary (dvd),Tostitos Bite Size Tortilla Chips,Tresemme Kertatin Smooth Infusing Conditioning,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
01impala,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
02deuce,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
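
`DataFrame.pivot` raises an error if any (user, product) pair appears more than once, which is why the duplicates were removed earlier. If duplicates could still be present, `pivot_table` with an aggregation function is a possible alternative, as the commented-out `aggfunc` line hints. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Toy ratings in which user u1 rated product A twice.
toy = pd.DataFrame({
    "user_name": ["u1", "u1", "u2", "u2"],
    "prod_name": ["A", "A", "A", "B"],
    "rating": [5, 3, 4, 2],
})

# pivot_table averages the duplicate ratings instead of raising an error.
matrix = toy.pivot_table(
    index="user_name", columns="prod_name", values="rating", aggfunc="mean"
).fillna(0)
print(matrix)
```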


In [529]:
df_pivot[df_pivot['100:Complete First Season (blu-Ray)']!=0.0].head(5)

prod_name,100:Complete First Season (blu-Ray),Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",Axe Dry Anti-Perspirant Deodorant Invisible Solid Phoenix,"Banana Boat Sunless Summer Color Self Tanning Lotion, Light To Medium",Bisquick Original Pancake And Baking Mix - 40oz,"Bounce Dryer Sheets, Fresh Linen, 160 sheets","Burt's Bees Lip Shimmer, Raisin",...,"Storkcraft Tuscany Glider and Ottoman, Beige Cushions, Espresso Finish",The Resident Evil Collection 5 Discs (blu-Ray),There's Something About Mary (dvd),Tostitos Bite Size Tortilla Chips,Tresemme Kertatin Smooth Infusing Conditioning,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
08dallas,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
aaronm,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
aechking,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
ald13,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
alex,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Creating dummy train & dummy test dataset
These datasets will be used for prediction and evaluation.
- Dummy train will be used later to predict ratings for products that the user has not rated. To ignore products the user has already rated, they are marked 0 during prediction; products not rated by the user are marked 1 in the dummy train dataset.

- Dummy test will be used for evaluation. To evaluate, we make predictions only for the products the user has rated, so these are marked 1. This is the opposite of dummy_train.
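
The two masks can be illustrated on a small example. The sketch below builds both from a single pivoted toy matrix to show that they are complements of each other; the names `dummy_train_mask` and `dummy_test_mask` are hypothetical, and in the notebook the evaluation mask would be derived from the test split rather than from the train matrix.

```python
import pandas as pd

# Toy training ratings: u1 rated A and B, u2 rated only B.
toy_train = pd.DataFrame({
    "user_name": ["u1", "u1", "u2"],
    "prod_name": ["A", "B", "B"],
    "rating": [5, 3, 4],
})
ratings = toy_train.pivot(index="user_name", columns="prod_name", values="rating")

dummy_train_mask = ratings.isna().astype(int)   # 1 = not rated (candidate for prediction)
dummy_test_mask = ratings.notna().astype(int)   # 1 = rated (used for evaluation)
print(dummy_train_mask)
```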

In [530]:
# Copy the train dataset into dummy_train
dummy_train = train.copy()

In [531]:
# Products already rated by the user are marked 0 here; unrated products become 1 when the pivot is filled below.
dummy_train['rating'] = dummy_train['rating'].apply(lambda x: 0 if x>=1 else 1)

In [532]:
# Pivot the dummy train dataset into matrix format in which columns are products and rows are user names.
dummy_train = dummy_train.pivot(
    index='user_name',
    columns='prod_name',
    values='rating'
    #aggfunc ={ 'rating':'mean'}
).fillna(1)

dummy_train.head(3)

prod_name,100:Complete First Season (blu-Ray),Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",Axe Dry Anti-Perspirant Deodorant Invisible Solid Phoenix,"Banana Boat Sunless Summer Color Self Tanning Lotion, Light To Medium",Bisquick Original Pancake And Baking Mix - 40oz,"Bounce Dryer Sheets, Fresh Linen, 160 sheets","Burt's Bees Lip Shimmer, Raisin",...,"Storkcraft Tuscany Glider and Ottoman, Beige Cushions, Espresso Finish",The Resident Evil Collection 5 Discs (blu-Ray),There's Something About Mary (dvd),Tostitos Bite Size Tortilla Chips,Tresemme Kertatin Smooth Infusing Conditioning,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
01impala,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
02deuce,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


**Cosine Similarity**

Cosine similarity quantifies the similarity between two vectors (here, the rating vectors of two users) as the cosine of the angle between them.
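
For reference, the standard definition for two users $u$ and $v$ is given below. The symbols are introduced here for illustration and do not appear in the notebook: $r_{u,i}$ denotes the rating of user $u$ for product $i$, with unrated products counted as 0.

$$
\mathrm{sim}(u, v) = \frac{\sum_{i} r_{u,i}\, r_{v,i}}{\sqrt{\sum_{i} r_{u,i}^{2}}\;\sqrt{\sum_{i} r_{v,i}^{2}}}
$$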

**Adjusted Cosine**

Adjusted cosine similarity is a modified version of vector-based similarity that accounts for the fact that different users have different rating schemes. In other words, some users tend to rate items highly in general, while others tend to give lower ratings. To correct for this, we subtract each user's average rating from that user's ratings for the different products.
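
The difference between the two measures can be seen on a tiny example. This is a sketch using plain NumPy on made-up ratings; `cosine_sim_matrix` is a hypothetical helper written for this illustration and is not the scikit-learn function used in the notebook.

```python
import numpy as np

def cosine_sim_matrix(m):
    """Row-by-row cosine similarity; all-zero rows get similarity 0."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = m / norms
    return unit @ unit.T

# Made-up ratings: rows are users, columns are products, NaN = not rated.
ratings = np.array([
    [5.0, 4.0, np.nan],   # generous rater
    [2.0, 1.0, np.nan],   # strict rater with the same preference order
    [np.nan, 5.0, 1.0],
])

# Plain cosine: treat missing ratings as 0.
plain = cosine_sim_matrix(np.nan_to_num(ratings))

# Adjusted cosine: subtract each user's mean over rated products, then fill NaN with 0.
user_mean = np.nanmean(ratings, axis=1, keepdims=True)
centered = np.nan_to_num(ratings - user_mean)
adjusted = cosine_sim_matrix(centered)

print(plain.round(3))
print(adjusted.round(3))
```

After mean-centring, the first two users have identical vectors, so their adjusted similarity is exactly 1 even though their raw rating levels differ.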



# User Similarity Matrix
### Using Cosine Similarity

In [533]:
from sklearn.metrics.pairwise import pairwise_distances

# Creating the user similarity matrix using the pairwise_distances function (similarity = 1 - cosine distance).
user_correlation = 1 - pairwise_distances(df_pivot, metric='cosine')
user_correlation[np.isnan(user_correlation)] = 0
print(user_correlation)

[[1. 0. 0. ... 0. 0. 0.]
 [0. 1. 1. ... 0. 0. 0.]
 [0. 1. 1. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 1. 1. 1.]
 [0. 0. 0. ... 1. 1. 1.]
 [0. 0. 0. ... 1. 1. 1.]]


In [534]:
user_correlation.shape

(16595, 16595)

## Using Adjusted Cosine
### Here, NaN values are kept (not filled with 0) so that each user's mean is calculated only over the products rated by that user

In [535]:
# Pivot the train ratings dataset into matrix format in which columns are products and rows are user names.
df_pivot = train.pivot(
    index='user_name',
    columns='prod_name',
    values='rating'
    #aggfunc ={ 'rating':'mean'}
)

df_pivot.head(3)

prod_name,100:Complete First Season (blu-Ray),Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",Axe Dry Anti-Perspirant Deodorant Invisible Solid Phoenix,"Banana Boat Sunless Summer Color Self Tanning Lotion, Light To Medium",Bisquick Original Pancake And Baking Mix - 40oz,"Bounce Dryer Sheets, Fresh Linen, 160 sheets","Burt's Bees Lip Shimmer, Raisin",...,"Storkcraft Tuscany Glider and Ottoman, Beige Cushions, Espresso Finish",The Resident Evil Collection 5 Discs (blu-Ray),There's Something About Mary (dvd),Tostitos Bite Size Tortilla Chips,Tresemme Kertatin Smooth Infusing Conditioning,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,,,,,,,,,,,...,,,,,,,,,,
01impala,,,,,,,,,,,...,,,,,,,,,,
02deuce,,,,,,,,,,,...,,,,,,,,,,


### Normalising each user's ratings around a mean of 0

In [536]:
mean = np.nanmean(df_pivot, axis=1)
df_subtracted = (df_pivot.T-mean).T
df_subtracted.head()

prod_name,100:Complete First Season (blu-Ray),Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",Axe Dry Anti-Perspirant Deodorant Invisible Solid Phoenix,"Banana Boat Sunless Summer Color Self Tanning Lotion, Light To Medium",Bisquick Original Pancake And Baking Mix - 40oz,"Bounce Dryer Sheets, Fresh Linen, 160 sheets","Burt's Bees Lip Shimmer, Raisin",...,"Storkcraft Tuscany Glider and Ottoman, Beige Cushions, Espresso Finish",The Resident Evil Collection 5 Discs (blu-Ray),There's Something About Mary (dvd),Tostitos Bite Size Tortilla Chips,Tresemme Kertatin Smooth Infusing Conditioning,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,,,,,,,,,,,...,,,,,,,,,,
01impala,,,,,,,,,,,...,,,,,,,,,,
02deuce,,,,,,,,,,,...,,,,,,,,,,
06stidriver,,,,,,,,,,,...,,,,,,,,,,
08dallas,0.0,,,,,,,,,,...,,,,,,,,,,


### Finding cosine similarity (on the mean-centred ratings, which gives the adjusted cosine)

In [537]:
from sklearn.metrics.pairwise import pairwise_distances

In [538]:
# Creating the user similarity matrix from the mean-centred ratings using the pairwise_distances function.
user_correlation = 1 - pairwise_distances(df_subtracted.fillna(0), metric='cosine')
user_correlation[np.isnan(user_correlation)] = 0
print(user_correlation)

[[1. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]


## Prediction - User User
We predict only from users who are positively correlated with the current user, since we are interested in users who are similar to them. Negative correlations are therefore set to 0.

In [539]:
user_correlation[user_correlation<0]=0
user_correlation

array([[1., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

The rating predicted for a user (for rated as well as unrated products) is the weighted sum of the product ratings (as present in the rating dataset), weighted by user correlation.
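
A small worked example of this weighted sum, using made-up numbers (the arrays `similarity` and `ratings` are hypothetical stand-ins for `user_correlation` and the filled pivot table):

```python
import numpy as np

# Made-up user-user similarity matrix (3 users), negatives already clipped to 0.
similarity = np.array([
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.2],
    [0.0, 0.2, 1.0],
])
# Made-up ratings matrix (3 users x 2 products), 0 = not rated.
ratings = np.array([
    [5.0, 0.0],
    [4.0, 2.0],
    [0.0, 3.0],
])

# Each predicted score is the sum over users of similarity * rating.
predicted = similarity @ ratings
print(predicted)
# [[7.  1. ]
#  [6.5 2.6]
#  [0.8 3.4]]
```

Because the sum is not divided by the total similarity, the scores are not on the 1 to 5 rating scale; they are useful for ranking products rather than as rating estimates.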

In [540]:
user_predicted_ratings = np.dot(user_correlation, df_pivot.fillna(0))
user_predicted_ratings

array([[5.59732494, 0.        , 0.47245559, ..., 0.        , 0.96824584,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])

In [541]:
user_predicted_ratings.shape

(16595, 49)

Since we are interested only in the products not rated by the user, we ignore the products the user has already rated by setting their predicted scores to zero.

In [542]:
user_final_rating = np.multiply(user_predicted_ratings,dummy_train)
user_final_rating.head()

prod_name,100:Complete First Season (blu-Ray),Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",Axe Dry Anti-Perspirant Deodorant Invisible Solid Phoenix,"Banana Boat Sunless Summer Color Self Tanning Lotion, Light To Medium",Bisquick Original Pancake And Baking Mix - 40oz,"Bounce Dryer Sheets, Fresh Linen, 160 sheets","Burt's Bees Lip Shimmer, Raisin",...,"Storkcraft Tuscany Glider and Ottoman, Beige Cushions, Espresso Finish",The Resident Evil Collection 5 Discs (blu-Ray),There's Something About Mary (dvd),Tostitos Bite Size Tortilla Chips,Tresemme Kertatin Smooth Infusing Conditioning,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,5.597325,0.0,0.472456,0.0,0.0,0.0,0.0,2.5,0.0,4.436492,...,0.968246,13.286947,2.5,3.634733,0.0,2.0,3.162278,0.0,0.968246,0.0
01impala,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
02deuce,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
06stidriver,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
08dallas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [619]:
#shape of this dataframe
user_final_rating.shape

(16595, 49)
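The masking step can be sketched on toy data as follows. This assumes, as the outputs in this notebook suggest, that `dummy_train` holds 1 where a user has not rated a product and 0 where a rating exists. The frames `predicted` and `observed` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical predicted scores for two users and two products.
predicted = pd.DataFrame(
    [[7.0, 1.0], [6.5, 2.6]],
    index=["user_a", "user_b"],
    columns=["prod_x", "prod_y"],
)

# Hypothetical observed ratings; NaN marks a product the user has not rated.
observed = pd.DataFrame(
    [[5.0, np.nan], [4.0, 2.0]],
    index=predicted.index,
    columns=predicted.columns,
)

# 1 where the user has NOT rated the product, 0 where a rating exists.
mask = observed.isna().astype(int)

# The element-wise product zeroes out products the user already rated.
final = predicted * mask
print(final)
```

Only `user_a`'s score for `prod_y` survives here, because that is the only user/product pair without an existing rating.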

In [543]:
# Keep only the users that have at least one non-zero predicted rating
user_final_rating.loc[(user_final_rating.loc[:, user_final_rating.dtypes != object] != 0.0).any(axis=1)]

prod_name,100:Complete First Season (blu-Ray),Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",Axe Dry Anti-Perspirant Deodorant Invisible Solid Phoenix,"Banana Boat Sunless Summer Color Self Tanning Lotion, Light To Medium",Bisquick Original Pancake And Baking Mix - 40oz,"Bounce Dryer Sheets, Fresh Linen, 160 sheets","Burt's Bees Lip Shimmer, Raisin",...,"Storkcraft Tuscany Glider and Ottoman, Beige Cushions, Espresso Finish",The Resident Evil Collection 5 Discs (blu-Ray),There's Something About Mary (dvd),Tostitos Bite Size Tortilla Chips,Tresemme Kertatin Smooth Infusing Conditioning,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,5.597325,0.000000,0.472456,0.000000,0.000000,0.0,0.000000,2.500000,0.0,4.436492,...,0.968246,13.286947,2.500000,3.634733,0.000000,2.000000,3.162278,0.000000,0.968246,0.000000
1943,4.508758,2.500000,0.000000,0.000000,0.000000,0.0,1.982726,2.886751,0.0,8.518975,...,4.579144,39.964263,8.390098,3.254312,0.000000,2.500000,0.000000,2.886751,4.782206,0.000000
aaron,5.597325,0.000000,0.000000,0.000000,1.290994,0.0,0.000000,2.500000,0.0,4.436492,...,0.000000,8.911456,1.666667,4.107189,0.000000,1.333333,3.162278,0.000000,0.000000,0.000000
abbey,0.000000,0.000000,0.000000,0.000000,2.500000,0.0,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
abby,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,2.500000,0.0,1.020621,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,3.061862,0.000000
aimee,0.422577,0.000000,0.472456,3.061862,0.000000,0.0,0.000000,3.061862,0.0,0.000000,...,0.968246,4.798068,0.000000,3.465303,0.000000,0.000000,0.000000,0.000000,3.468246,0.000000
aktcharlotte,4.886751,3.500000,0.000000,0.000000,0.215166,0.0,0.000000,0.000000,0.0,0.000000,...,0.000000,4.859681,3.333333,4.330127,0.000000,0.666667,0.000000,2.500000,4.500000,0.000000
alex,0.000000,0.422577,0.000000,0.517549,1.190187,0.0,0.000000,3.540962,0.0,5.594361,...,0.227838,0.000000,3.505819,5.797812,0.000000,1.549449,3.556284,0.487950,0.830956,0.000000
ammi,5.773503,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,2.000000,...,0.000000,2.689264,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
amy1,5.597325,0.000000,0.472456,0.000000,0.000000,0.0,0.000000,2.500000,0.0,4.436492,...,0.968246,13.286947,2.500000,3.634733,0.000000,2.000000,3.162278,0.000000,0.968246,0.000000


### Finding the top 20 recommendations for the *user*

In [617]:
# Take the user ID as input.
user_input = (input("Enter your user name"))
print(user_input)

Enter your user nametammy
tammy


In [618]:
# Top 20 products are 
d = user_final_rating.loc[user_input].sort_values(ascending=False)[0:20]
d

prod_name
Yes To Carrots Nourishing Body Wash                                                0.0
Hormel Chili, No Beans                                                             0.0
Head & Shoulders Classic Clean Conditioner                                         0.0
Godzilla 3d Includes Digital Copy Ultraviolet 3d/2d Blu-Ray/dvd                    0.0
Dark Shadows (includes Digital Copy) (ultraviolet) (dvdvideo)                      0.0
Cuisinart174 Electric Juicer - Stainless Steel Cje-1000                            0.0
Coty Airspun Face Powder, Translucent Extra Coverage                               0.0
Clorox Disinfecting Wipes Value Pack Scented 150 Ct Total                          0.0
Clorox Disinfecting Bathroom Cleaner                                               0.0
Clear Scalp & Hair Therapy Total Care Nourishing Shampoo                           0.0
Chips Ahoy! Original Chocolate Chip - Cookies - Family Size 18.2oz                 0.0
Chester's Cheese Flavored Puffcor

In [546]:
#Mapping product name with existing dataframe 
reviews_top20_df= pd.merge(d,prod_df,left_on='prod_name',right_on='name', how = 'left')
reviews_top20_df.head(4)

Unnamed: 0,alex,id,brand,categories,manufacturer,name,reviews_date,reviews_didPurchase,reviews_doRecommend,reviews_rating,reviews_text,reviews_title,reviews_userCity,reviews_userProvince,reviews_username,user_sentiment
0,37.311418,AVpfRTh1ilAPnD_xYic2,Disney,"Movies, Music & Books,Movies,Kids' & Family,Wa...",Walt Disney,Planes: Fire Rescue (2 Discs) (includes Digita...,2014-11-07T00:00:00.000Z,,True,1,Would recommend this movie for all families wi...,Great movie to watch with grand children,,,grumps,Negative
1,37.311418,AVpfRTh1ilAPnD_xYic2,Disney,"Movies, Music & Books,Movies,Kids' & Family,Wa...",Walt Disney,Planes: Fire Rescue (2 Discs) (includes Digita...,2014-11-07T00:00:00.000Z,,True,1,My daughter collects animated movies and she l...,cars was cute,,,katj,Positive
2,37.311418,AVpfRTh1ilAPnD_xYic2,Disney,"Movies, Music & Books,Movies,Kids' & Family,Wa...",Walt Disney,Planes: Fire Rescue (2 Discs) (includes Digita...,2014-11-09T00:00:00.000Z,,False,1,We all know that Disney releases great movies ...,Poor 2nd Tier Disney Movie,,,tholly,Positive
3,37.311418,AVpfRTh1ilAPnD_xYic2,Disney,"Movies, Music & Books,Movies,Kids' & Family,Wa...",Walt Disney,Planes: Fire Rescue (2 Discs) (includes Digita...,2014-11-18T00:00:00.000Z,False,True,1,Very good flick.,It's just GOOD! I can't wait for the next one.,,,papierone,Positive
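The top-20 lookup above returns products even when every score for the user is 0.0, in which case the ordering carries no information. One possible guard is sketched below. `top_n_products` is a hypothetical helper that is not part of the notebook, and `toy` is made-up data.

```python
import pandas as pd


def top_n_products(final_rating: pd.DataFrame, user: str, n: int = 20) -> pd.Series:
    """Return up to n products with the highest positive score for the user."""
    if user not in final_rating.index:
        raise KeyError(f"unknown user: {user!r}")
    scores = final_rating.loc[user]
    # Drop zero scores so masked or unscored products are never recommended.
    return scores[scores > 0].nlargest(n)


# Hypothetical final rating matrix: user_b has no positive score at all.
toy = pd.DataFrame(
    [[0.0, 3.2, 1.5], [0.0, 0.0, 0.0]],
    index=["user_a", "user_b"],
    columns=["prod_x", "prod_y", "prod_z"],
)

print(top_n_products(toy, "user_a", n=2))
print(top_n_products(toy, "user_b"))  # empty Series: nothing to recommend
```

An empty result could then trigger a fallback, such as recommending the most reviewed products. That choice is outside the scope of this sketch.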


# Evaluation - User User 
Evaluation follows the same steps as the prediction above. The only difference is that we evaluate on the products the user has already rated, instead of predicting for the products the user has not rated. 

In [547]:
# Find out the common users of test and train dataset.
common = test[test.user_name.isin(train.user_name)]
common.shape

(907, 4)

In [548]:
common.head()

Unnamed: 0,prod_name,user_name,rating,review_count
6875,Clorox Disinfecting Wipes Value Pack Scented 1...,natashavs,5,7786
7073,Clorox Disinfecting Wipes Value Pack Scented 1...,moomoo,5,7786
15233,My Big Fat Greek Wedding 2 (blu-Ray + Dvd + Di...,tommty41,5,650
3745,Clorox Disinfecting Bathroom Cleaner,mommy2three,5,1893
11902,Clorox Disinfecting Wipes Value Pack Scented 1...,drvnsnow,4,7786


In [549]:
# convert into the user-product matrix.
common_user_based_matrix = common.pivot_table(index='user_name', columns='prod_name', values='rating')

In [550]:
common_user_based_matrix.head()

prod_name,100:Complete First Season (blu-Ray),Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",Bisquick Original Pancake And Baking Mix - 40oz,"Bounce Dryer Sheets, Fresh Linen, 160 sheets","Burt's Bees Lip Shimmer, Raisin","Caress Moisturizing Body Bar Natural Silk, 4.75oz",Cheetos Crunchy Flamin' Hot Cheese Flavored Snacks,...,Stargate (ws) (ultimate Edition) (director's Cut) (dvdvideo),"Storkcraft Tuscany Glider and Ottoman, Beige Cushions, Espresso Finish",The Resident Evil Collection 5 Discs (blu-Ray),There's Something About Mary (dvd),Tostitos Bite Size Tortilla Chips,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1234,,,,,,,,,,,...,,,,,,,,,,
aaron,,,,,,,,,,,...,,,,,,,,,,
abbi,,,,,,,,,,,...,,,,,,,,,,
ac94,,,,,,,,,,,...,,,,,,,,,,
acg1,,,,,,,,,,,...,,,,,,,,,,


In [551]:
common_user_based_matrix.shape

(788, 44)

In [552]:
# Convert the user_correlation matrix into dataframe.
user_correlation_df = pd.DataFrame(user_correlation)

In [553]:
user_correlation_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,16585,16586,16587,16588,16589,16590,16591,16592,16593,16594
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [554]:
df_subtracted.head(1)

prod_name,100:Complete First Season (blu-Ray),Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",Axe Dry Anti-Perspirant Deodorant Invisible Solid Phoenix,"Banana Boat Sunless Summer Color Self Tanning Lotion, Light To Medium",Bisquick Original Pancake And Baking Mix - 40oz,"Bounce Dryer Sheets, Fresh Linen, 160 sheets","Burt's Bees Lip Shimmer, Raisin",...,"Storkcraft Tuscany Glider and Ottoman, Beige Cushions, Espresso Finish",The Resident Evil Collection 5 Discs (blu-Ray),There's Something About Mary (dvd),Tostitos Bite Size Tortilla Chips,Tresemme Kertatin Smooth Infusing Conditioning,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,,,,,,,,,,,...,,,,,,,,,,


In [555]:
user_correlation_df['user_name'] = df_subtracted.index
user_correlation_df.set_index('user_name',inplace=True)
user_correlation_df.head()

Unnamed: 0_level_0,0,1,2,3,4,5,6,7,8,9,...,16585,16586,16587,16588,16589,16590,16591,16592,16593,16594
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
01impala,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
02deuce,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
06stidriver,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
08dallas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [556]:
common.head(1)

Unnamed: 0,prod_name,user_name,rating,review_count
6875,Clorox Disinfecting Wipes Value Pack Scented 1...,natashavs,5,7786


In [557]:
list_name = common.user_name.tolist()
list_name[:10]


['natashavs',
 'moomoo',
 'tommty41',
 'mommy2three',
 'drvnsnow',
 'mrsb',
 'tiffy',
 'adelynsmom',
 'luke',
 'xstr8edgex']

In [558]:
df_subtracted.index.tolist()

['00sab00',
 '01impala',
 '02deuce',
 '06stidriver',
 '08dallas',
 '09mommy11',
 '1.11E+24',
 '1085',
 '10ten',
 '1143mom',
 '11677j',
 '1234',
 '1234567',
 '123cat123',
 '123charlie',
 '123numbers',
 '123soccermom',
 '127726',
 '12cass12',
 '12gage',
 '132457',
 '13dani',
 '13ld',
 '1421nikki',
 '143st',
 '148maine',
 '1515',
 '15425shopper',
 '170361eggs',
 '1753',
 '17roses',
 '1943',
 '1950rmm',
 '1968bear',
 '1970',
 '19bubba67',
 '1awesome1',
 '1buzymom',
 '1cadet',
 '1chynna',
 '1clean1',
 '1friendlycat',
 '1gamer',
 '1glenn',
 '1happymom',
 '1hotmama',
 '1izzy1',
 '1jc1',
 '1kindword',
 '1movielover2',
 '1officegal',
 '1okc2thunder3up4',
 '1olaygal',
 '1pleasedclient',
 '1scooby1',
 '1shop',
 '1sonny',
 '1stgrade',
 '1stlady',
 '1sungirl',
 '1texasmom',
 '1tomg',
 '1wildbill2l',
 '1witch',
 '2011mom2b',
 '2013bestbuyer',
 '2014bestbuys',
 '2015mom',
 '2016',
 '2062351337',
 '2175046722',
 '21honey',
 '21please',
 '24hrstoneroses',
 '2532674594',
 '262jennifer',
 '2689',
 '28gre

In [559]:
user_correlation_df.columns = df_subtracted.index.tolist()


user_correlation_df_1 =  user_correlation_df[user_correlation_df.index.isin(list_name)]

In [560]:
user_correlation_df_1.shape

(788, 16595)

In [561]:
user_correlation_df_2 = user_correlation_df_1.T[user_correlation_df_1.T.index.isin(list_name)]

In [562]:
user_correlation_df_3 = user_correlation_df_2.T

In [563]:
user_correlation_df_3.head()

Unnamed: 0_level_0,1234,aaron,abbi,ac94,acg1,actionaction,acv4217,adam,adelynsmom,aep1010,...,wizard,wolfie,wonster67,worm,wvshopaholic,xstr8edgex,yeyo,yummy,yvonne,zach
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1234,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
aaron,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
abbi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
ac94,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
acg1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [564]:
user_correlation_df_3.shape

(788, 788)
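The double transpose used above to keep only the common users on both axes can also be written as a single `.loc` selection with a boolean mask. The sketch below uses a hypothetical 4-user matrix `corr` and a made-up `common_users` list.

```python
import numpy as np
import pandas as pd

# Hypothetical 4 x 4 user-user correlation matrix.
users = ["u1", "u2", "u3", "u4"]
corr = pd.DataFrame(np.eye(4), index=users, columns=users)
corr.loc["u1", "u3"] = 0.4
corr.loc["u3", "u1"] = 0.4

# Hypothetical users present in both the train and test splits.
common_users = ["u3", "u1"]

# A boolean mask keeps the original row/column order, just as isin() does.
keep = corr.index.isin(common_users)
sub = corr.loc[keep, keep]
print(sub)
```

This relies on the rows and columns sharing the same labels in the same order, which holds here because the column names are set from the same index.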

In [565]:
user_correlation_df_3[user_correlation_df_3<0]=0

common_user_predicted_ratings = np.dot(user_correlation_df_3, common_user_based_matrix.fillna(0))
common_user_predicted_ratings

array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 2.        , 0.        , ..., 3.16227766, 1.15470054,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])

In [568]:
dummy_test = common.copy()

dummy_test['rating'] = dummy_test['rating'].apply(lambda x: 1 if x>=1 else 0)

dummy_test = dummy_test.pivot_table(index='user_name', columns='prod_name', values='rating').fillna(0)

In [569]:
dummy_test.shape

(788, 44)

In [570]:
common_user_predicted_ratings = np.multiply(common_user_predicted_ratings,dummy_test)

In [571]:
common_user_predicted_ratings.head(2)

prod_name,100:Complete First Season (blu-Ray),Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",Bisquick Original Pancake And Baking Mix - 40oz,"Bounce Dryer Sheets, Fresh Linen, 160 sheets","Burt's Bees Lip Shimmer, Raisin","Caress Moisturizing Body Bar Natural Silk, 4.75oz",Cheetos Crunchy Flamin' Hot Cheese Flavored Snacks,...,Stargate (ws) (ultimate Edition) (director's Cut) (dvdvideo),"Storkcraft Tuscany Glider and Ottoman, Beige Cushions, Espresso Finish",The Resident Evil Collection 5 Discs (blu-Ray),There's Something About Mary (dvd),Tostitos Bite Size Tortilla Chips,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1234,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
aaron,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Calculating the RMSE only for the products rated by the user. For the RMSE, the predicted ratings are normalised to the (1, 5) range.

In [572]:
from sklearn.preprocessing import MinMaxScaler
from numpy import *

X  = common_user_predicted_ratings.copy() 
X = X[X>0]

scaler = MinMaxScaler(feature_range=(1, 5))
print(scaler.fit(X))
y = (scaler.transform(X))

print(y)

MinMaxScaler(copy=True, feature_range=(1, 5))
[[nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]]


  data_min = np.nanmin(X, axis=0)
  data_max = np.nanmax(X, axis=0)


In [573]:
common_ = common.pivot_table(index='user_name', columns='prod_name', values='rating')

In [574]:
# Finding total non-NaN value
total_non_nan = np.count_nonzero(~np.isnan(y))

In [575]:
# Sum the squared errors over the non-NaN cells explicitly with NumPy
rmse = (np.nansum(((common_ - y) ** 2).to_numpy()) / total_non_nan) ** 0.5
print(rmse)

2.2594021644398103
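`MinMaxScaler` rescales each column separately, so a product column with no positive prediction has nothing to scale and stays NaN. As an alternative (not the method used in this notebook), the sketch below rescales with a single global minimum and maximum and computes the RMSE only over pairs that have both values. The arrays `actual` and `raw_pred` are hypothetical, and the sketch assumes the predictions are not all identical, so that `hi - lo` is non-zero.

```python
import numpy as np

# Hypothetical toy data: NaN marks user/product pairs with no test rating.
actual = np.array([[5.0, np.nan], [3.0, 1.0]])
raw_pred = np.array([[8.0, np.nan], [4.0, 0.5]])

# Rescale all predictions with one global min and max (not per column),
# so that every score lands in the 1-5 rating range.
lo, hi = np.nanmin(raw_pred), np.nanmax(raw_pred)
scaled = 1 + 4 * (raw_pred - lo) / (hi - lo)

# RMSE over the pairs that have both an actual and a predicted value.
valid = ~np.isnan(actual) & ~np.isnan(scaled)
rmse = np.sqrt(np.mean((actual[valid] - scaled[valid]) ** 2))
print(rmse)
```

Global scaling keeps the relative size of scores across products, whereas per-column scaling forces every product to span the full 1-5 range. Which one is more appropriate depends on how the scores are meant to be interpreted.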


## Using Item similarity
# Item Based Similarity
We take the transpose of the rating matrix so that the ratings are normalised around the mean of each product. In the user-based similarity, we took the mean for each user instead of each product. 

In [576]:
df_pivot = train.pivot(
    index='user_name',
    columns='prod_name',
    values='rating'
    #aggfunc ={ 'rating':'mean'} 
).T

df_pivot.head()

user_name,00sab00,01impala,02deuce,06stidriver,08dallas,09mommy11,1.11E+24,1085,10ten,1143mom,...,zookeeper,zpalma,zsarah,zsazsa,zubb,zulaa118,zuttle,zwithanx,zxcsdfd,zyiah4
prod_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
100:Complete First Season (blu-Ray),,,,,5.0,,,,,,...,,,,,,,,,,
Alex Cross (dvdvideo),,,,,,,,,,,...,,,,,,,,,,
"Aussie Aussome Volume Shampoo, 13.5 Oz",,,,,,,,,1.0,,...,,,,,,,,,,
"Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz",,,,,,,,,,,...,,,,,,,,,,
"Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",,,,,,,,,,,...,,,,,,,,,,


Normalising the ratings of each product for use with the Adjusted Cosine similarity

In [577]:
mean = np.nanmean(df_pivot, axis=1)
df_subtracted = (df_pivot.T-mean).T
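The same mean-centring can be written without the two transposes by using `DataFrame.sub` with `axis=0`. The sketch below shows this on a hypothetical item x user frame `item_user`.

```python
import numpy as np
import pandas as pd

# Hypothetical item x user matrix; NaN means "not rated".
item_user = pd.DataFrame(
    [[5.0, np.nan, 3.0], [np.nan, 2.0, 2.0]],
    index=["prod_x", "prod_y"],
    columns=["u1", "u2", "u3"],
)

# Row means skip NaN by default, so each product is centred on the
# average of the ratings it actually received.
item_means = item_user.mean(axis=1)
centered = item_user.sub(item_means, axis=0)
print(centered)
```

Unrated cells stay NaN after centring. They only become 0 later, when `fillna(0)` is applied before computing the similarity.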

In [578]:
df_subtracted.head()

user_name,00sab00,01impala,02deuce,06stidriver,08dallas,09mommy11,1.11E+24,1085,10ten,1143mom,...,zookeeper,zpalma,zsarah,zsazsa,zubb,zulaa118,zuttle,zwithanx,zxcsdfd,zyiah4
prod_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
100:Complete First Season (blu-Ray),,,,,0.244186,,,,,,...,,,,,,,,,,
Alex Cross (dvdvideo),,,,,,,,,,,...,,,,,,,,,,
"Aussie Aussome Volume Shampoo, 13.5 Oz",,,,,,,,,-3.181818,,...,,,,,,,,,,
"Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz",,,,,,,,,,,...,,,,,,,,,,
"Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",,,,,,,,,,,...,,,,,,,,,,


Finding the cosine similarity using the pairwise distances approach

In [579]:
from sklearn.metrics.pairwise import pairwise_distances

In [580]:
# Item Similarity Matrix
item_correlation = 1 - pairwise_distances(df_subtracted.fillna(0), metric='cosine')
item_correlation[np.isnan(item_correlation)] = 0
print(item_correlation)

[[1.        0.        0.        ... 0.        0.        0.       ]
 [0.        1.        0.        ... 0.        0.        0.       ]
 [0.        0.        1.        ... 0.        0.        0.       ]
 ...
 [0.        0.        0.        ... 1.        0.        0.0024206]
 [0.        0.        0.        ... 0.        1.        0.       ]
 [0.        0.        0.        ... 0.0024206 0.        1.       ]]


Keeping only the correlations greater than 0 (positively correlated); negative values are set to 0.

In [581]:
item_correlation[item_correlation<0]=0
item_correlation

array([[1.       , 0.       , 0.       , ..., 0.       , 0.       ,
        0.       ],
       [0.       , 1.       , 0.       , ..., 0.       , 0.       ,
        0.       ],
       [0.       , 0.       , 1.       , ..., 0.       , 0.       ,
        0.       ],
       ...,
       [0.       , 0.       , 0.       , ..., 1.       , 0.       ,
        0.0024206],
       [0.       , 0.       , 0.       , ..., 0.       , 1.       ,
        0.       ],
       [0.       , 0.       , 0.       , ..., 0.0024206, 0.       ,
        1.       ]])

In [582]:
item_correlation.shape

(49, 49)
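For intuition, the similarity and clipping steps can be reproduced with plain NumPy on a small matrix. This is an illustrative sketch on a hypothetical `centered` array. It is not claimed to match `pairwise_distances` in every edge case, for example in how all-zero rows are treated on the diagonal.

```python
import numpy as np

# Hypothetical mean-centred item x user matrix with NaN already set to 0.
centered = np.array([
    [1.0, 0.0, -1.0],
    [1.0, 1.0, 0.0],
    [-1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],  # an item whose ratings all equal its mean
])

norms = np.linalg.norm(centered, axis=1, keepdims=True)
# Avoid dividing by zero for all-zero rows; their similarity stays 0.
safe_norms = np.where(norms == 0, 1.0, norms)
unit = centered / safe_norms

# Cosine similarity is the dot product of the unit-length rows.
similarity = unit @ unit.T

# Keep only positive similarities, as in the cell above.
clipped = np.clip(similarity, 0, None)
print(clipped)
```

Items 0 and 2 are exact opposites here, so their similarity of -1 is clipped to 0 and they do not influence each other's predictions.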

In [583]:
df_pivot.shape

(49, 16595)

# Prediction - Item Item

In [584]:
item_predicted_ratings = np.dot((df_pivot.fillna(0).T),item_correlation)
item_predicted_ratings

array([[0.0386923 , 0.0083131 , 0.00794257, ..., 0.17440336, 0.00280169,
        0.00985624],
       [0.0095759 , 0.        , 0.        , ..., 0.        , 0.        ,
        0.00271577],
       [0.01276786, 0.        , 0.        , ..., 0.        , 0.        ,
        0.00362103],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.00213688,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.00213688,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.00213688,
        0.        ]])

In [585]:
item_predicted_ratings.shape

(16595, 49)
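The item-based prediction is the user's own ratings weighted by item-item similarity. The sketch below shows this on hypothetical `ratings` and `item_sim` arrays. It also shows an optional normalised variant, which this notebook does not use, that divides by the total similarity of the items the user rated so that scores fall back onto the rating scale.

```python
import numpy as np

# Hypothetical user x item ratings (0 = not rated).
ratings = np.array([
    [5.0, 0.0, 0.0],
    [0.0, 4.0, 2.0],
])

# Hypothetical item-item similarity, with negatives clipped to 0.
item_sim = np.array([
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.1],
    [0.0, 0.1, 1.0],
])

# Unnormalised score, the same form as the np.dot call above.
predicted = ratings @ item_sim

# Optional variant (an assumption, not in the notebook): divide by the sum
# of similarities to the items each user actually rated.
weights = (ratings > 0).astype(float) @ item_sim
normalised = np.divide(
    predicted, weights, out=np.zeros_like(predicted), where=weights > 0
)
print(predicted)
print(normalised)
```

Both versions rank products for a single user in a comparable way when the weights are similar, but only the normalised scores can be read as ratings on the 1-5 scale.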

In [586]:
dummy_train.shape

(16595, 49)

In [587]:
dummy_train.head()

prod_name,100:Complete First Season (blu-Ray),Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",Axe Dry Anti-Perspirant Deodorant Invisible Solid Phoenix,"Banana Boat Sunless Summer Color Self Tanning Lotion, Light To Medium",Bisquick Original Pancake And Baking Mix - 40oz,"Bounce Dryer Sheets, Fresh Linen, 160 sheets","Burt's Bees Lip Shimmer, Raisin",...,"Storkcraft Tuscany Glider and Ottoman, Beige Cushions, Espresso Finish",The Resident Evil Collection 5 Discs (blu-Ray),There's Something About Mary (dvd),Tostitos Bite Size Tortilla Chips,Tresemme Kertatin Smooth Infusing Conditioning,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
01impala,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
02deuce,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
06stidriver,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
08dallas,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


### Filtering the ratings to keep only the products not rated by the user for recommendation

In [588]:
item_final_rating = np.multiply(item_predicted_ratings,dummy_train)
item_final_rating.head()

prod_name,100:Complete First Season (blu-Ray),Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",Axe Dry Anti-Perspirant Deodorant Invisible Solid Phoenix,"Banana Boat Sunless Summer Color Self Tanning Lotion, Light To Medium",Bisquick Original Pancake And Baking Mix - 40oz,"Bounce Dryer Sheets, Fresh Linen, 160 sheets","Burt's Bees Lip Shimmer, Raisin",...,"Storkcraft Tuscany Glider and Ottoman, Beige Cushions, Espresso Finish",The Resident Evil Collection 5 Discs (blu-Ray),There's Something About Mary (dvd),Tostitos Bite Size Tortilla Chips,Tresemme Kertatin Smooth Infusing Conditioning,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,0.038692,0.008313,0.007943,0.0058,0.0,0.0,0.0,0.004325,0.001238,0.001438,...,0.004532,0.018343,0.0,0.0,0.0062,0.002869,0.0,0.174403,0.002802,0.009856
01impala,0.009576,0.0,0.0,0.0,0.0,0.0,0.0,0.001268,0.0,0.0,...,0.0,0.00528,0.002919,0.008486,0.007172,0.0,0.002062,0.0,0.0,0.002716
02deuce,0.012768,0.0,0.0,0.0,0.0,0.0,0.0,0.00169,0.0,0.0,...,0.0,0.00704,0.003893,0.011315,0.009563,0.0,0.00275,0.0,0.0,0.003621
06stidriver,0.0,0.0,0.0,0.0,0.000584,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.003852,0.0,0.003911,0.002422,0.0,0.0,0.002137,0.0
08dallas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.020606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [589]:
item_final_rating[item_final_rating['100:Complete First Season (blu-Ray)']!=0.0]

prod_name,100:Complete First Season (blu-Ray),Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",Axe Dry Anti-Perspirant Deodorant Invisible Solid Phoenix,"Banana Boat Sunless Summer Color Self Tanning Lotion, Light To Medium",Bisquick Original Pancake And Baking Mix - 40oz,"Bounce Dryer Sheets, Fresh Linen, 160 sheets","Burt's Bees Lip Shimmer, Raisin",...,"Storkcraft Tuscany Glider and Ottoman, Beige Cushions, Espresso Finish",The Resident Evil Collection 5 Discs (blu-Ray),There's Something About Mary (dvd),Tostitos Bite Size Tortilla Chips,Tresemme Kertatin Smooth Infusing Conditioning,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,0.038692,0.008313,0.007943,0.005800,0.000000,0.0,0.0,0.004325,0.001238,0.001438,...,0.004532,0.018343,0.000000,0.000000,0.006200,0.002869,0.000000,0.174403,0.002802,0.009856
01impala,0.009576,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.001268,0.000000,0.000000,...,0.000000,0.005280,0.002919,0.008486,0.007172,0.000000,0.002062,0.000000,0.000000,0.002716
02deuce,0.012768,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.001690,0.000000,0.000000,...,0.000000,0.007040,0.003893,0.011315,0.009563,0.000000,0.002750,0.000000,0.000000,0.003621
1085,0.064487,0.013855,0.013238,0.009667,0.000000,0.0,0.0,0.007209,0.000000,0.002396,...,0.007554,0.027361,0.000000,0.000000,0.010333,0.000000,0.000000,0.290672,0.000000,0.016427
1234,0.045413,0.000000,0.000000,0.000000,0.004010,0.0,0.0,0.001690,0.000000,0.000000,...,0.000000,0.010792,0.006974,0.016955,0.012692,0.001938,0.002750,0.000000,0.001710,0.003621
123cat123,0.064487,0.013855,0.013238,0.009667,0.000000,0.0,0.0,0.007209,0.000000,0.002396,...,0.007554,0.027361,0.000000,0.000000,0.010333,0.000000,0.000000,0.290672,0.000000,0.016427
123charlie,0.012768,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.001690,0.004951,0.000000,...,0.000000,0.014746,0.003893,0.011315,0.009563,0.011477,0.002750,0.000000,0.011207,0.003621
123numbers,0.015960,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.002113,0.000000,0.000000,...,0.000000,0.008800,0.004866,0.014144,0.011954,0.000000,0.003437,0.000000,0.000000,0.004526
12gage,0.015960,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.002113,0.000000,0.000000,...,0.000000,0.008800,0.004866,0.014144,0.011954,0.000000,0.003437,0.000000,0.000000,0.004526
13ld,0.040807,0.000000,0.000000,0.000000,0.004429,0.0,0.0,0.000000,0.000000,0.000000,...,0.000000,0.004690,0.000000,0.007050,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


### Finding the top 20 recommendations for the *user*

In [591]:
# Take the user name as input
user_input = input("Enter your user name")
print(user_input)

Enter your user namealex
alex


In [593]:
# Recommending the top 20 products to the user.
ditem = item_final_rating.loc[user_input].sort_values(ascending=False)[0:20]
ditem

prod_name
Chester's Cheese Flavored Puffcorn Snacks                                          0.149498
My Big Fat Greek Wedding 2 (blu-Ray + Dvd + Digital)                               0.138592
Jason Aldean - They Don't Know                                                     0.096085
Cuisinart174 Electric Juicer - Stainless Steel Cje-1000                            0.052971
Planes: Fire Rescue (2 Discs) (includes Digital Copy) (blu-Ray/dvd)                0.048179
Dark Shadows (includes Digital Copy) (ultraviolet) (dvdvideo)                      0.038792
Chips Ahoy! Original Chocolate Chip - Cookies - Family Size 18.2oz                 0.034822
Nexxus Exxtra Gel Style Creation Sculptor                                          0.025297
Cheetos Crunchy Flamin' Hot Cheese Flavored Snacks                                 0.024446
Caress Moisturizing Body Bar Natural Silk, 4.75oz                                  0.024237
Lysol Concentrate Deodorizing Cleaner, Original Scent                 
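
The lookup above raises a `KeyError` when the entered name is not in the rating matrix. A minimal sketch of a guarded version follows, using a toy frame in place of `item_final_rating`; the helper name `top_n_for_user` and the toy data are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for item_final_rating: rows are users, columns are products.
ratings = pd.DataFrame(
    {"Product A": [0.10, 0.00], "Product B": [0.30, 0.20], "Product C": [0.05, 0.40]},
    index=["alex", "sam"],
)

def top_n_for_user(final_rating, user_name, n=20):
    """Return the n highest predicted ratings for user_name (hypothetical helper).

    An empty Series is returned when the user is not in the matrix.
    """
    if user_name not in final_rating.index:
        return pd.Series(dtype=float)
    return final_rating.loc[user_name].nlargest(n)

top2 = top_n_for_user(ratings, "alex", n=2)
unknown = top_n_for_user(ratings, "nobody", n=2)
print(top2)
print(unknown.empty)
```

Returning an empty result for unknown users is only one possible choice; falling back to globally popular products would be another.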

In [594]:
#Mapping product name with existing dataframe to get their review text and title 
reviews_item_top20_df= pd.merge(ditem,prod_df,left_on='prod_name',right_on='name', how = 'left')
reviews_item_top20_df.head(4)

Unnamed: 0,alex,id,brand,categories,manufacturer,name,reviews_date,reviews_didPurchase,reviews_doRecommend,reviews_rating,reviews_text,reviews_title,reviews_userCity,reviews_userProvince,reviews_username,user_sentiment
0,0.149498,AVpf5olc1cnluZ0-tPrO,Chester's,"Food,Packaged Foods,Snacks,Chips & Pretzels,Fo...",Frito-Lay,Chester's Cheese Flavored Puffcorn Snacks,2015-10-20T00:00:00.000Z,False,False,1,Decided to try these based on the good ratings...,Overrated salty air puffs,,,jmansinclair,Positive
1,0.149498,AVpf5olc1cnluZ0-tPrO,Chester's,"Food,Packaged Foods,Snacks,Chips & Pretzels,Fo...",Frito-Lay,Chester's Cheese Flavored Puffcorn Snacks,2016-01-31T00:00:00.000Z,False,False,1,bag was open and spilled all over box,bag open,,,jill,Negative
2,0.149498,AVpf5olc1cnluZ0-tPrO,Chester's,"Food,Packaged Foods,Snacks,Chips & Pretzels,Fo...",Frito-Lay,Chester's Cheese Flavored Puffcorn Snacks,2015-09-05T00:00:00.000Z,False,False,1,Half the bag was stale,Stale,,,jackie,Negative
3,0.149498,AVpf5olc1cnluZ0-tPrO,Chester's,"Food,Packaged Foods,Snacks,Chips & Pretzels,Fo...",Frito-Lay,Chester's Cheese Flavored Puffcorn Snacks,2017-03-09T00:00:00.000Z,True,True,2,Bag of flour broke opened the cheese puffs and...,broken cheese puffs and spread,,,sam97,Negative
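
Once the reviews are attached to the 20 recommended products, the 5 best can be picked by the share of positive reviews per product. The sketch below assumes the existing `user_sentiment` labels stand in for the predictions of the sentiment model; the helper name `top_by_positive_share` and the toy data are hypothetical.

```python
import pandas as pd

# Toy stand-in for reviews_item_top20_df with only the columns needed here.
reviews = pd.DataFrame({
    "name": ["P1", "P1", "P2", "P2", "P2", "P3"],
    "user_sentiment": ["Positive", "Negative", "Positive",
                       "Positive", "Negative", "Positive"],
})

def top_by_positive_share(df, k=5):
    """Rank products by the fraction of reviews labelled Positive (hypothetical helper)."""
    positive = df["user_sentiment"].eq("Positive")
    share = positive.groupby(df["name"]).mean()
    return share.sort_values(ascending=False).head(k)

best = top_by_positive_share(reviews, k=2)
print(best)
```

In the full pipeline the labels would come from running the sentiment model on `reviews_text`, which this sketch does not do.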


# Evaluation - Item Item
The evaluation follows the same steps as the prediction above. The only difference is that predictions are evaluated for products the user has already rated, rather than generated for products the user has not rated.

In [595]:
test.columns

Index(['prod_name', 'user_name', 'rating', 'review_count'], dtype='object')

In [597]:
common =  test[test.user_name.isin(train.user_name)]
common.shape

(907, 4)

In [598]:
common.head(4)

Unnamed: 0,prod_name,user_name,rating,review_count
6875,Clorox Disinfecting Wipes Value Pack Scented 1...,natashavs,5,7786
7073,Clorox Disinfecting Wipes Value Pack Scented 1...,moomoo,5,7786
15233,My Big Fat Greek Wedding 2 (blu-Ray + Dvd + Di...,tommty41,5,650
3745,Clorox Disinfecting Bathroom Cleaner,mommy2three,5,1893


In [600]:
common_item_based_matrix = common.pivot_table(index='user_name', columns='prod_name', values='rating').T

In [601]:
common_item_based_matrix.shape

(44, 788)

In [602]:
item_correlation_df = pd.DataFrame(item_correlation)

In [603]:
item_correlation_df.head(1)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,39,40,41,42,43,44,45,46,47,48
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.004121,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [604]:
item_correlation_df['prod_name'] = df_subtracted.index
item_correlation_df.set_index('prod_name',inplace=True)
item_correlation_df.head()

Unnamed: 0_level_0,0,1,2,3,4,5,6,7,8,9,...,39,40,41,42,43,44,45,46,47,48
prod_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
100:Complete First Season (blu-Ray),1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.004121,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Alex Cross (dvdvideo),0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.002104,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Aussie Aussome Volume Shampoo, 13.5 Oz",0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.003022,0.0,0.0,0.0,0.0,0.0,0.0
"Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz",0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.003239,0.0,0.0,...,0.0,0.0,0.0,0.00069,0.0,0.0,0.0,0.0,0.0,0.0
"Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.000344,...,0.0,0.0,0.0,0.000704,0.0,0.0,0.0,0.0,0.0,0.0


In [606]:
list_name = common.prod_name.tolist()

In [607]:
item_correlation_df.columns = df_subtracted.index.tolist()

item_correlation_df_1 =  item_correlation_df[item_correlation_df.index.isin(list_name)]

In [608]:
# The step above filtered the rows; this step applies the same filter to the columns.
# Both are needed because item_correlation is a square item x item matrix.
item_correlation_df_2 = item_correlation_df_1.T[item_correlation_df_1.T.index.isin(list_name)]

item_correlation_df_3 = item_correlation_df_2.T
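
The row filter followed by the transpose-filter-transpose step can also be written as a single `.loc` selection with two boolean masks. The sketch below checks on a toy similarity matrix that both routes give the same result; the names `sim` and `keep` are illustrative assumptions.

```python
import pandas as pd

names = ["A", "B", "C"]
# Toy item x item similarity matrix with matching row and column labels.
sim = pd.DataFrame(
    [[1.0, 0.2, 0.0], [0.2, 1.0, 0.5], [0.0, 0.5, 1.0]],
    index=names, columns=names,
)
keep = ["A", "C", "Z"]  # "Z" is absent from the matrix and is simply ignored

# One-step selection: filter rows and columns together.
sub = sim.loc[sim.index.isin(keep), sim.columns.isin(keep)]

# Two-step route, mirroring the notebook cells above.
step1 = sim[sim.index.isin(keep)]
step2 = step1.T[step1.T.index.isin(keep)].T

print(sub)
```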

In [609]:
item_correlation_df_3.head()

Unnamed: 0_level_0,100:Complete First Season (blu-Ray),Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",Bisquick Original Pancake And Baking Mix - 40oz,"Bounce Dryer Sheets, Fresh Linen, 160 sheets","Burt's Bees Lip Shimmer, Raisin","Caress Moisturizing Body Bar Natural Silk, 4.75oz",Cheetos Crunchy Flamin' Hot Cheese Flavored Snacks,...,Stargate (ws) (ultimate Edition) (director's Cut) (dvdvideo),"Storkcraft Tuscany Glider and Ottoman, Beige Cushions, Espresso Finish",The Resident Evil Collection 5 Discs (blu-Ray),There's Something About Mary (dvd),Tostitos Bite Size Tortilla Chips,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
prod_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
100:Complete First Season (blu-Ray),1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00249,...,0.0,0.0,0.004121,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Alex Cross (dvdvideo),0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.002104,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Aussie Aussome Volume Shampoo, 13.5 Oz",0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00486,...,0.004161,0.0,0.0,0.0,0.003022,0.0,0.0,0.0,0.0,0.0
"Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz",0.0,0.0,0.0,1.0,0.0,0.003239,0.0,0.0,0.0,0.0,...,0.003038,0.0,0.0,0.0,0.00069,0.0,0.0,0.0,0.0,0.0
"Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter",0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.000344,0.001125,0.0,...,0.0,0.0,0.0,0.0,0.000704,0.0,0.0,0.0,0.0,0.0


In [610]:
item_correlation_df_3[item_correlation_df_3<0]=0

common_item_predicted_ratings = np.dot(item_correlation_df_3, common_item_based_matrix.fillna(0))
common_item_predicted_ratings


array([[0.        , 0.01595983, 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.01453388, 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.03334162,
        0.        ],
       ...,
       [0.        , 0.        , 0.00407168, ..., 0.        , 0.        ,
        0.        ],
       [0.01400847, 0.        , 0.08061798, ..., 0.        , 0.        ,
        0.00262829],
       [0.        , 0.00452629, 0.00462532, ..., 0.        , 0.        ,
        0.        ]])

In [611]:
common_item_predicted_ratings.shape

(44, 788)

A dummy test matrix is used for evaluation. Predictions are evaluated only for products the user has already rated, so those entries are marked 1 and all others 0. This is the opposite of dummy_train.
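
The masking idea can be sketched on a toy example: build a 0/1 matrix from the rated pairs, then multiply it element-wise with the predictions so that only rated entries survive. The data and the names `test_ratings`, `mask`, and `predicted` are made up for illustration.

```python
import numpy as np
import pandas as pd

test_ratings = pd.DataFrame({
    "user_name": ["u1", "u1", "u2"],
    "prod_name": ["A", "B", "B"],
    "rating": [5, 3, 4],
})

# 1 where the user rated the product, 0 elsewhere; products are rows, users are columns.
mask = (
    test_ratings.assign(rating=1)
    .pivot_table(index="user_name", columns="prod_name", values="rating")
    .T.fillna(0)
)

# Toy predictions with the same layout (rows A, B; columns u1, u2).
predicted = np.array([[0.8, 0.1], [0.4, 0.6]])

# Element-wise product zeroes out predictions for unrated pairs.
masked = np.multiply(predicted, mask)
print(masked)
```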



In [612]:
dummy_test = common.copy()

dummy_test['rating'] = dummy_test['rating'].apply(lambda x: 1 if x>=1 else 0)

dummy_test = dummy_test.pivot_table(index='user_name', columns='prod_name', values='rating').T.fillna(0)

common_item_predicted_ratings = np.multiply(common_item_predicted_ratings,dummy_test)

Products that were not rated are marked as 0 for evaluation. Next, build the matching item-based matrix of actual ratings.


In [613]:
common_ = common.pivot_table(index='user_name', columns='prod_name', values='rating').T

In [614]:
from sklearn.preprocessing import MinMaxScaler
from numpy import *

X  = common_item_predicted_ratings.copy() 
X = X[X>0]

scaler = MinMaxScaler(feature_range=(1, 5))
print(scaler.fit(X))
y = (scaler.transform(X))

print(y)

MinMaxScaler(copy=True, feature_range=(1, 5))
[[nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]]


In [615]:
# Finding total non-NaN value
total_non_nan = np.count_nonzero(~np.isnan(y))

In [616]:
# Sum the squared errors over entries where both values exist, ignoring NaNs
rmse = (np.nansum((common_.values - y) ** 2) / total_non_nan) ** 0.5
print(rmse)

3.3926154245062774
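
As a sanity check on the calculation, the same kind of RMSE can be computed on a tiny hand-made example where only entries present in both matrices count. This is a sketch with made-up numbers; `masked_rmse` is a hypothetical helper and the values are assumed to be on the 1 to 5 rating scale already.

```python
import numpy as np

# NaN marks "not rated" in actual and "no prediction" in predicted.
actual = np.array([[5.0, np.nan], [3.0, 4.0]])
predicted = np.array([[4.0, 2.0], [np.nan, 2.0]])

def masked_rmse(actual, predicted):
    """RMSE over entries that are non-NaN in both arrays (hypothetical helper)."""
    diff = actual - predicted
    valid = ~np.isnan(diff)
    return float(np.sqrt(np.mean(diff[valid] ** 2)))

rmse = masked_rmse(actual, predicted)
print(rmse)
```

Here two entries are valid, with errors 1 and 2, so the result is the square root of 2.5.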


# Final Recommendation Summary

#### <font color='red'><i>Root Mean Square Error (RMSE) is the square root of the mean squared residual (prediction error). It measures how spread out the residuals are, that is, how far the predicted ratings fall from the actual ratings on average.</i></font>
#### <font color='red'><i>The user-based predictions have an RMSE of 2.2594021644398103, while the item-based predictions have an RMSE of 3.3926154245062774. Because a lower RMSE means a smaller prediction error, the user-user similarity matrix is the better choice for recommending products in this scenario.</i></font>
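
For reference, the RMSE used in both evaluations can be written as follows, where $T$ is the set of evaluated (user, product) pairs, $r_{ui}$ is the actual rating, and $\hat{r}_{ui}$ is the predicted rating rescaled to the 1 to 5 range. The symbols are introduced here for illustration and do not appear in the notebook.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} \left( r_{ui} - \hat{r}_{ui} \right)^{2}}
```

This quantity equals the standard deviation of the residuals only when their mean is zero.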


# Saving the user-user final ratings to a CSV file

In [622]:
user_final_rating.to_csv('recommednation_user_final_rating.csv')
print("Saved Successfully!")

Saved Successfully!
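
To use the saved ratings later, the CSV has to be read back with the user names restored as the index. The sketch below shows the round trip on a toy frame, writing to an in-memory buffer instead of a file; the frame contents are made up for illustration.

```python
import io
import pandas as pd

# Toy stand-in for user_final_rating: users as the index, products as columns.
ratings = pd.DataFrame(
    {"Product A": [0.1, 0.0], "Product B": [0.3, 0.2]},
    index=pd.Index(["alex", "sam"], name="user_name"),
)

# Write to an in-memory buffer (a file path works the same way).
buffer = io.StringIO()
ratings.to_csv(buffer)
buffer.seek(0)

# index_col=0 restores the user names as the row index.
restored = pd.read_csv(buffer, index_col=0)
top = restored.loc["alex"].sort_values(ascending=False).head(1)
print(top)
```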
