#### Background information and Problem statement
Shoppers rely on Amazon’s product authority to find and buy the latest products and to get timely solutions to their needs. 

In this project, I build a recommendation system from Amazon's ratings data to help improve the customer experience by recommending the most relevant products to each shopper.

#### Product Recommendation System for e-commerce businesses
A well-developed recommendation system helps a business improve the shopper's experience on its website, which results in better customer acquisition and retention.

The recommendation system I have designed below follows the journey of a new customer, from the first time he/she lands on the business's website to when he/she makes repeat purchases.

The recommendation system is designed in 3 parts based on the business context:

#### Recommendation system part I:
Product popularity-based system targeted at new customers

#### Recommendation system part II:
Model-based collaborative filtering system based on customer's purchase history and ratings provided by other users who bought similar items

#### Recommendation system part III:
A system for a business that is setting up its e-commerce website for the first time and has no product ratings yet.

When a new customer without any previous purchase history visits the e-commerce website for the first time, he/she is recommended the most popular products sold on the company's website. Once he/she makes a purchase, the recommendation system updates and recommends other products based on the purchase history and the ratings provided by other users on the website. The latter part is done using collaborative filtering techniques.
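
The customer journey described above can be sketched as a small routing function. This is only an illustration under my own assumptions: `recommend_for_customer`, `similar_to` and the toy product IDs are hypothetical names, not part of the notebook's code.

```python
from typing import Callable, List, Sequence


def recommend_for_customer(
    purchased_ids: Sequence[str],
    popular_ids: Sequence[str],
    similar_to: Callable[[str], List[str]],
    top_n: int = 5,
) -> List[str]:
    """Pick a recommendation strategy based on the customer's purchase history."""
    # New customer: no history, so fall back to the most popular products.
    if not purchased_ids:
        return list(popular_ids[:top_n])

    # Returning customer: collect products similar to each purchased one,
    # skipping items already bought and duplicates, and preserving order.
    seen = set(purchased_ids)
    picks: List[str] = []
    for product_id in purchased_ids:
        for candidate in similar_to(product_id):
            if candidate not in seen:
                seen.add(candidate)
                picks.append(candidate)
    return picks[:top_n]


# Toy data standing in for the real popularity list and similarity lookup.
popular = ["P1", "P2", "P3"]
neighbours = {"P1": ["P2", "P4"], "P4": ["P1", "P5"]}


def lookup(product_id: str) -> List[str]:
    """Hypothetical similarity lookup backed by the toy dictionary."""
    return neighbours.get(product_id, [])


print(recommend_for_customer([], popular, lookup))            # new customer
print(recommend_for_customer(["P1", "P4"], popular, lookup))  # returning customer
```

In the notebook, the popularity list would come from part I and the similarity lookup from part II.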

In [2]:
import numpy as np
import pandas as pd
import os
# Raw string so that backslashes in the Windows path are not treated as escapes
os.chdir(r"C:\Users\RM\Desktop\recommendation_system")

In [48]:
products = pd.read_csv('ratings_beauty.csv',encoding="ISO-8859-1")
train = pd.read_csv("train.csv",encoding="ISO-8859-1")
attributes = pd.read_csv('attributes.csv',encoding="ISO-8859-1")

In [50]:
train.head(5)

Unnamed: 0,id,product_uid,product_title,search_term,relevance
0,2,100001,Simpson Strong-Tie 12-Gauge Angle,angle bracket,3.0
1,3,100001,Simpson Strong-Tie 12-Gauge Angle,l bracket,2.5
2,9,100002,BEHR Premium Textured DeckOver 1-gal. #SC-141 ...,deck over,3.0
3,16,100005,Delta Vero 1-Handle Shower Only Faucet Trim Ki...,rain shower head,2.33
4,17,100005,Delta Vero 1-Handle Shower Only Faucet Trim Ki...,shower only faucet,2.67


In [51]:
attributes.head(5)

Unnamed: 0,product_uid,name,value
0,100001.0,Bullet01,Versatile connector for various 90° connectio...
1,100001.0,Bullet02,Stronger than angled nailing or screw fastenin...
2,100001.0,Bullet03,Help ensure joints are consistently straight a...
3,100001.0,Bullet04,Dimensions: 3 in. x 3 in. x 1-1/2 in.
4,100001.0,Bullet05,Made from 12-Gauge steel


In [55]:
products.head().sort_values('ProductId', ascending=False)

Unnamed: 0,UserId,ProductId,Rating,Timestamp
4,A3IAAVS479H7M7,737104473,1.0,1274227200
3,A1WMRR494NWEWV,733001998,4.0,1382572800
1,A3JM6GV9MNOF9X,558925278,3.0,1355443200
2,A1Z513UWSAAO0F,558925278,5.0,1404691200
0,A39HTATAQ9V7YF,205616461,5.0,1369699200


In [56]:

print(products.shape)


(2023070, 4)


In [57]:
products.isnull().sum()

UserId       0
ProductId    0
Rating       0
Timestamp    0
dtype: int64

In [58]:
products.duplicated().sum()

0

### Popularity Based Recommender System

In [59]:
user_ratings_product = products

In [60]:
user_ratings_product.shape

(2023070, 4)

In [168]:
df=user_ratings_product

In [169]:
df=df[df['ProductId'].str.startswith('B')]

In [175]:
#df[df['ProductId'] not in user_ratings_product['ProductId'].isin(df)]

In [61]:
#num_rating_df = ratings_with_name.groupby('Book-Title').count()['Book-Rating'].reset_index()
#num_rating_df.rename(columns={'Book-Rating':'num_ratings'},inplace=True)
#num_rating_df

In [62]:
num_rating_df=user_ratings_product.groupby('ProductId').count()['Rating'].reset_index()
num_rating_df.rename(columns={'Rating':'num_ratings'},inplace=True)
num_rating_df

Unnamed: 0,ProductId,num_ratings
0,0205616461,1
1,0558925278,2
2,0733001998,1
3,0737104473,1
4,0762451459,1
...,...,...
249269,B00LORWRJA,1
249270,B00LOS7MEE,1
249271,B00LP2YB8E,1
249272,B00LPVG6V0,1


In [71]:
# Select 'Rating' before averaging so the non-numeric UserId column is not passed to mean()
avg_rating_df=user_ratings_product.groupby('ProductId')['Rating'].mean().reset_index()
avg_rating_df.rename(columns={'Rating':'avg_ratings'},inplace=True)
avg_rating_df

Unnamed: 0,ProductId,avg_ratings
0,0205616461,5.0
1,0558925278,4.0
2,0733001998,4.0
3,0737104473,1.0
4,0762451459,5.0
...,...,...
249269,B00LORWRJA,5.0
249270,B00LOS7MEE,5.0
249271,B00LP2YB8E,5.0
249272,B00LPVG6V0,5.0


In [72]:
popular_df = num_rating_df.merge(avg_rating_df,on='ProductId')
popular_df

Unnamed: 0,ProductId,num_ratings,avg_ratings
0,0205616461,1,5.0
1,0558925278,2,4.0
2,0733001998,1,4.0
3,0737104473,1,1.0
4,0762451459,1,5.0
...,...,...,...
249269,B00LORWRJA,1,5.0
249270,B00LOS7MEE,1,5.0
249271,B00LP2YB8E,1,5.0
249272,B00LPVG6V0,1,5.0


In [73]:
#popular_df = popular_df[popular_df['num_ratings']>=250].sort_values('avg_rating',ascending=False).head(50)


In [77]:
popular_products=popular_df[popular_df['num_ratings']>=150].sort_values('avg_ratings', ascending= False).head(50)

In [87]:
print(popular_products['ProductId'].to_string(index=False))

B00I46E8DC
B00GJX58PE
B00IBS9QC6
B00F008GFQ
B001FB5NTG
B00HJD8NLY
B00CJ0TZ1E
B0027Z2720
B00A1Y177A
B000127UUA
B008Q0E714
B00KHH2VOY
B00I32AN4K
B00IP42FBA
B000NNDNYY
B00D0ANSAG
B003AY949G
B000PHUKEE
B004N7DQHA
B007O7AZBG
B00KAL5JAU
B00KHGIK54
B00KWFDBKE
B004TSF8R4
B000WZ6P34
B0011DMHZG
B00D9NV20C
B00IT69F62
B000HGIQRG
B00G5WO2VK
B004GC2LLE
B00IT1HKV4
B00GQ9I9IO
B003EMIX8W
B000NSH2L4
B003UNP20W
B0000Y3FRG
B00GMAWI66
B00GQ0GD4A
B006GTKSHY
B001ET77NY
B0001EKWPI
B0006O2IQ4
B003YM5RA4
B003ZTVDBS
B00IDWP4IA
B001ET76E4
B0045W22SW
B000052YMR
B00GS83884
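
The popularity ranking can also be computed in a single pass with named aggregation. The sketch below runs on a small made-up table that uses the same column names as `ratings_beauty.csv`. The helper `top_popular` and its thresholds are my own illustrative choices, not values from the notebook.

```python
import pandas as pd

# Toy ratings with the same column names as the ratings file.
toy = pd.DataFrame({
    "UserId":    ["u1", "u2", "u3", "u1", "u2", "u3"],
    "ProductId": ["A", "A", "A", "B", "B", "C"],
    "Rating":    [5.0, 4.0, 3.0, 5.0, 5.0, 5.0],
})


def top_popular(ratings: pd.DataFrame, min_ratings: int, top_n: int) -> pd.DataFrame:
    """Rank products by average rating, keeping only those with enough ratings."""
    # Count and mean of 'Rating' per product, computed together.
    stats = (
        ratings.groupby("ProductId")["Rating"]
        .agg(num_ratings="count", avg_ratings="mean")
        .reset_index()
    )
    # Drop products with too few ratings, then sort by average rating.
    enough = stats[stats["num_ratings"] >= min_ratings]
    return enough.sort_values("avg_ratings", ascending=False).head(top_n)


popular_toy = top_popular(toy, min_ratings=2, top_n=2)
print(popular_toy)
```

Product `C` has a perfect average but only one rating, so the minimum-count filter removes it. This is the same reason the notebook requires at least 150 ratings.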


#### Collaborative Filtering Based Recommender System

In [149]:
x = user_ratings_product.groupby('UserId').count()['Rating'] > 10
reliable_users = x[x].index

In [150]:
filtered_rating = user_ratings_product[user_ratings_product['UserId'].isin(reliable_users)]

In [151]:
y = filtered_rating.groupby('ProductId').count()['Rating']>=20
famous_products = y[y].index

In [152]:
final_ratings = filtered_rating[filtered_rating['ProductId'].isin(famous_products)]

In [153]:
pt = final_ratings.pivot_table(index='ProductId',columns='UserId',values='Rating')

In [154]:
pt

UserId,A00473363TJ8YSZ3YAGG9,A00700212KB3K0MVESPIY,A0078719IR14X3NNUG0F,A02155413BVL8D0G7X6DN,A029527620Q3SK5XW16RR,A03364251DGXSGA9PSR99,A042274212BJJVOBS4Q85,A0908131Z7BWYSMRQ16T,A09386383518NVR7RSA4F,A099766128UI0NCS98N1E,...,AZPHYNPEZDMIO,AZTZ7SIIRXLXE,AZUH2MX87LX7J,AZUI6YY673GW5,AZV2AG96CRJ26,AZV7CNMMJERKW,AZWJAXOQMB8EK,AZX1JTTIUYZX4,AZY3Z9QI0G8L,AZZT1ERHBSNQ8
ProductId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
B000050B6U,,,,,,,,,,,...,,,,,,,,,,
B000052WYD,,,,,,,,,,,...,,,,,,,,,,
B000052YJM,,,,,,,,,,,...,,,,,,,,,,
B000052YM4,,,,,,,,,,,...,,,,,,,,,,
B000052YM7,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
B00KNIL36E,,,,,,,,,,,...,,,,,,,,,,
B00KQ4PEBU,,,,,,,,,,,...,,,,,,,,,,
B00KQBR9FM,,,,,,,,,,,...,,,,,,,,,,
B00KQVTX06,,,,,,,,,,,...,,,,,,,,,,


In [155]:
pt.fillna(0,inplace=True)

In [156]:
pt

UserId,A00473363TJ8YSZ3YAGG9,A00700212KB3K0MVESPIY,A0078719IR14X3NNUG0F,A02155413BVL8D0G7X6DN,A029527620Q3SK5XW16RR,A03364251DGXSGA9PSR99,A042274212BJJVOBS4Q85,A0908131Z7BWYSMRQ16T,A09386383518NVR7RSA4F,A099766128UI0NCS98N1E,...,AZPHYNPEZDMIO,AZTZ7SIIRXLXE,AZUH2MX87LX7J,AZUI6YY673GW5,AZV2AG96CRJ26,AZV7CNMMJERKW,AZWJAXOQMB8EK,AZX1JTTIUYZX4,AZY3Z9QI0G8L,AZZT1ERHBSNQ8
ProductId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
B000050B6U,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B000052WYD,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B000052YJM,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B000052YM4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B000052YM7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
B00KNIL36E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B00KQ4PEBU,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B00KQBR9FM,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B00KQVTX06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
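
To make the filtering and pivoting steps easier to follow, here is the same pipeline on a tiny made-up table. The function name `build_item_user_matrix` and the thresholds passed to it are assumptions for illustration. The notebook itself uses more than 10 ratings per user and at least 20 per product.

```python
import pandas as pd


def build_item_user_matrix(
    ratings: pd.DataFrame, min_user_ratings: int, min_product_ratings: int
) -> pd.DataFrame:
    """Filter sparse users and products, then pivot to a product x user matrix."""
    # Keep users with strictly more than `min_user_ratings` ratings.
    user_counts = ratings["UserId"].value_counts()
    active_users = user_counts[user_counts > min_user_ratings].index
    filtered = ratings[ratings["UserId"].isin(active_users)]

    # Among the remaining rows, keep products with at least `min_product_ratings`.
    product_counts = filtered["ProductId"].value_counts()
    frequent = product_counts[product_counts >= min_product_ratings].index
    final = filtered[filtered["ProductId"].isin(frequent)]

    # Missing (product, user) pairs become 0, as with fillna(0).
    return final.pivot_table(
        index="ProductId", columns="UserId", values="Rating", fill_value=0
    )


toy = pd.DataFrame({
    "UserId":    ["u1", "u1", "u1", "u2", "u2", "u3", "u3", "u4"],
    "ProductId": ["A", "B", "C", "A", "B", "A", "C", "D"],
    "Rating":    [5.0, 3.0, 4.0, 4.0, 2.0, 1.0, 2.0, 5.0],
})

matrix = build_item_user_matrix(toy, min_user_ratings=1, min_product_ratings=2)
print(matrix)
```

User `u4` rated only one product, so both `u4` and product `D` drop out before the pivot. Note that filling with 0 treats "not rated" the same as a rating of zero, which is a simplification.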


In [157]:
from sklearn.metrics.pairwise import cosine_similarity

In [158]:
similarity_scores = cosine_similarity(pt)

In [159]:
similarity_scores.shape

(1299, 1299)
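
To show what the scores in this matrix mean, here is a minimal NumPy sketch of row-wise cosine similarity on three made-up product rows. The helper `cosine_similarity_rows` is my own illustration of the formula, not the scikit-learn implementation.

```python
import numpy as np


def cosine_similarity_rows(matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity between every pair of rows."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = matrix / norms    # scale each row to unit length
    return unit @ unit.T     # dot products of unit vectors are cosines


toy = np.array([
    [5.0, 4.0, 0.0],  # product rated by users 1 and 2
    [2.5, 2.0, 0.0],  # same pattern at half the magnitude
    [0.0, 0.0, 3.0],  # rated only by user 3
])

scores = cosine_similarity_rows(toy)
print(np.round(scores, 3))
```

The first two rows point in the same direction, so their similarity is 1 even though the magnitudes differ. The third row shares no users with the others, so its similarity to them is 0. The result is square and symmetric, which is why the full matrix has one row and one column per product.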

In [160]:
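def recommend(product_id):
    # Row position of the queried product in the pivot table
    index = np.where(pt.index==product_id)[0][0]
    # Sort all products by similarity (highest first); position 0 is normally
    # the product itself, so skip it and keep the next four
    similar_items = sorted(list(enumerate(similarity_scores[index])),key=lambda x:x[1],reverse=True)[1:5]

    data = []
    for i in similar_items:
        item = []
        # Look up the ProductId of this neighbour
        temp_df = products[products['ProductId'] == pt.index[i[0]]]
        item.extend(list(temp_df.drop_duplicates('ProductId')['ProductId'].values))
        data.append(item)

    return data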

In [176]:
print(recommend('B000052YM7'))

[['B000052YM4'], ['B001E919LU'], ['B000YJ2SKM'], ['B00008CMOQ']]


The call above checks for products similar to the product with ID 'B000052YM7'.

The recommender returns the products nearest to the one searched for.
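
A slightly more defensive variant of the lookup is sketched below on made-up data. The name `recommend_top_n`, the `top_n` parameter and the hand-written score matrix are assumptions for illustration only. It returns an empty list for an unknown product ID and removes the queried product by position instead of assuming it sorts first.

```python
import numpy as np
import pandas as pd


def recommend_top_n(product_id, product_ids: pd.Index, scores: np.ndarray, top_n: int = 4):
    """Return the IDs of the `top_n` products most similar to `product_id`."""
    # Unknown product: nothing to recommend.
    if product_id not in product_ids:
        return []
    row = product_ids.get_loc(product_id)
    # Indices ordered from most to least similar.
    order = np.argsort(scores[row])[::-1]
    # Drop the queried product itself, wherever it landed in the ordering.
    neighbours = [i for i in order if i != row][:top_n]
    return [product_ids[i] for i in neighbours]


# Toy product index and similarity matrix (values chosen by hand).
toy_ids = pd.Index(["A", "B", "C"], name="ProductId")
toy_scores = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.4],
    [0.1, 0.4, 1.0],
])

print(recommend_top_n("A", toy_ids, toy_scores, top_n=2))
print(recommend_top_n("Z", toy_ids, toy_scores))
```

With the real data, `pt.index` and `similarity_scores` would take the place of the toy index and matrix.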