# Rating Product & Sorting Reviews in Amazon

### Business Problem

One of the most important problems in e-commerce is correctly
calculating the ratings that products receive after a sale. Solving
this problem means higher customer satisfaction for the e-commerce
site, better visibility for sellers' products, and a smooth shopping
experience for buyers. A second problem is sorting the reviews given
to products correctly. Because misleading reviews that rise to the
top directly affect a product's sales, they cause both financial
loss and customer loss. Solving these two core problems lets the
e-commerce site and its sellers increase sales, while customers
complete their purchasing journey without friction.

### Dataset Story
This dataset of Amazon product data contains product categories and assorted metadata. It holds the user ratings and reviews for the most-reviewed product in the Electronics category.

#### 12 Variables, 4915 Observations

#### reviewerID: User ID
#### asin: Product ID
#### reviewerName: User name
#### helpful: Helpfulness rating of the review, in the form [helpful_yes, total_vote]
#### reviewText: Review text
#### overall: Product rating
#### summary: Review summary
#### unixReviewTime: Review time (Unix timestamp)
#### reviewTime: Review time (raw)
#### day_diff: Number of days since the review
#### helpful_yes: Number of times the review was found helpful
#### total_vote: Total number of votes the review received

In [1]:
# Import the required libraries and configure the display options
import pandas as pd
import math
import scipy.stats as st
from sklearn.preprocessing import MinMaxScaler
from datetime import datetime as dt

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

In [2]:
df=pd.read_csv("../datasets/amazon_review.csv")
df.head(10)

Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime,day_diff,helpful_yes,total_vote
0,A3SBTW3WS4IQSN,B007WTAJTO,,"[0, 0]",No issues.,4.0,Four Stars,1406073600,2014-07-23,138,0,0
1,A18K1ODH1I2MVB,B007WTAJTO,0mie,"[0, 0]","Purchased this for my device, it worked as adv...",5.0,MOAR SPACE!!!,1382659200,2013-10-25,409,0,0
2,A2FII3I2MBMUIA,B007WTAJTO,1K3,"[0, 0]",it works as expected. I should have sprung for...,4.0,nothing to really say....,1356220800,2012-12-23,715,0,0
3,A3H99DFEG68SR,B007WTAJTO,1m2,"[0, 0]",This think has worked out great.Had a diff. br...,5.0,Great buy at this price!!! *** UPDATE,1384992000,2013-11-21,382,0,0
4,A375ZM4U047O79,B007WTAJTO,2&amp;1/2Men,"[0, 0]","Bought it with Retail Packaging, arrived legit...",5.0,best deal around,1373673600,2013-07-13,513,0,0
5,A2IDCSC6NVONIZ,B007WTAJTO,2Cents!,"[0, 0]",It's mini storage. It doesn't do anything els...,5.0,Not a lot to really be said,1367193600,2013-04-29,588,0,0
6,A26YHXZD5UFPVQ,B007WTAJTO,2K1Toaster,"[0, 0]",I have it in my phone and it never skips a bea...,5.0,Works well,1382140800,2013-10-19,415,0,0
7,A3CW0ZLUO5X2B1,B007WTAJTO,"35-year Technology Consumer ""8-tracks to 802.11""","[0, 0]",It's hard to believe how affordable digital ha...,5.0,32 GB for less than two sawbucks...what's not ...,1404950400,2014-10-07,62,0,0
8,A2CYJO155QP33S,B007WTAJTO,4evryoung,"[1, 1]",Works in a HTC Rezound. Was running short of ...,5.0,Loads of room,1395619200,2014-03-24,259,1,1
9,A2S7XG3ZC4VGOQ,B007WTAJTO,53rdcard,"[0, 0]","in my galaxy s4, super fast card, and am total...",5.0,works great,1381449600,2013-11-10,393,0,0


In [3]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
overall,4915.0,4.58759,0.99685,1.0,5.0,5.0,5.0,5.0
unixReviewTime,4915.0,1379465001.66836,15818574.32275,1339200000.0,1365897600.0,1381276800.0,1392163200.0,1406073600.0
day_diff,4915.0,437.36704,209.43987,1.0,281.0,431.0,601.0,1064.0
helpful_yes,4915.0,1.31109,41.61916,0.0,0.0,0.0,0.0,1952.0
total_vote,4915.0,1.52146,44.12309,0.0,0.0,0.0,0.0,2020.0


### Task 1: Calculate the average rating weighted toward recent reviews and compare it with the existing average rating.

In [4]:
# Look at the product's average rating
df["overall"].mean()


4.587589013224822

In [5]:
# Convert the reviewTime variable from strings to date objects
df["reviewTime"]=df["reviewTime"].apply(lambda x:dt.strptime(x,'%Y-%m-%d').date())

In [6]:
# Use the date of the most recent review as the reference date for the analysis
current_date=df["reviewTime"].max()
current_date

datetime.date(2014, 12, 7)
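
One thing worth checking here: in the sample rows above, `reviewTime` and `unixReviewTime` do not always agree. Row 7 shows `reviewTime` as 2014-10-07 with a `unixReviewTime` of 1404950400, and the maximum `unixReviewTime` in the `describe()` output is 1406073600. The sketch below converts those two timestamps to calendar dates with the standard library. The helper name `date_from_unix` is hypothetical, and interpreting the timestamps as UTC is an assumption.

```python
from datetime import datetime, timezone


def date_from_unix(ts):
    """Convert a Unix timestamp in seconds to a UTC calendar date."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).date()


# Row 7 of the sample output: reviewTime is shown as 2014-10-07
print(date_from_unix(1404950400))
# Maximum unixReviewTime reported by df.describe()
print(date_from_unix(1406073600))
```

Under the UTC assumption these print 2014-07-10 and 2014-07-23, which suggests that day and month may be swapped in some `reviewTime` values. If that turns out to be the case, deriving the date column with `pd.to_datetime(df["unixReviewTime"], unit="s")` would be one way to sidestep the ambiguity.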

In [7]:
# Create a variable showing how many days ago each review was written
df["review_life"]=current_date-df["reviewTime"]

In [8]:
# Split the reviews into 4 segments by age and give each segment's mean rating its own weight
df["Segments"] = pd.qcut(df["review_life"], 4, labels=["A", "B", "C", "D"])
(df.loc[df["Segments"] == "A", "overall"].mean() * 0.4
 + df.loc[df["Segments"] == "B", "overall"].mean() * 0.3
 + df.loc[df["Segments"] == "C", "overall"].mean() * 0.2
 + df.loc[df["Segments"] == "D", "overall"].mean() * 0.1)


4.628116998159475
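
The same quartile weighting can be wrapped in a reusable function. The following is a minimal sketch rather than the notebook's own code: the function name `time_based_weighted_average`, its parameters, and the toy data are hypothetical, and it assumes the existing `day_diff` column is used as the review age.

```python
import pandas as pd


def time_based_weighted_average(df, age_col="day_diff", rating_col="overall",
                                weights=(0.4, 0.3, 0.2, 0.1)):
    """Weighted mean of per-segment mean ratings, newest segment first."""
    # labels=False returns integer segment codes: 0 = newest quantile
    segments = pd.qcut(df[age_col], len(weights), labels=False)
    segment_means = df.groupby(segments)[rating_col].mean()
    return sum(segment_means.loc[i] * w for i, w in enumerate(weights))


# Toy data (hypothetical): newer reviews are rated higher than older ones
toy = pd.DataFrame({
    "day_diff": [10, 20, 30, 40, 50, 60, 70, 80],
    "overall": [5, 5, 4, 4, 3, 3, 1, 1],
})
result = time_based_weighted_average(toy)
plain_mean = toy["overall"].mean()
print(result, plain_mean)
```

On the toy data the weighted score comes out higher than the plain mean, because the newest segments carry the highest ratings and the largest weights. The weights should sum to 1 so that the result stays on the original rating scale.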

In [9]:
# Compute the mean rating of each segment
df.groupby("Segments")["overall"].mean()
# As the output shows, reviewers closer to the current date gave higher ratings
# One possible explanation is that the product has been improved over time


Segments
A   4.69579
B   4.63614
C   4.57166
D   4.44625
Name: overall, dtype: float64

### Task 2: Determine the 20 reviews to display on the product detail page.

In [10]:
df.head()

Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime,day_diff,helpful_yes,total_vote,review_life,Segments
0,A3SBTW3WS4IQSN,B007WTAJTO,,"[0, 0]",No issues.,4.0,Four Stars,1406073600,2014-07-23,138,0,0,137 days,A
1,A18K1ODH1I2MVB,B007WTAJTO,0mie,"[0, 0]","Purchased this for my device, it worked as adv...",5.0,MOAR SPACE!!!,1382659200,2013-10-25,409,0,0,408 days,B
2,A2FII3I2MBMUIA,B007WTAJTO,1K3,"[0, 0]",it works as expected. I should have sprung for...,4.0,nothing to really say....,1356220800,2012-12-23,715,0,0,714 days,D
3,A3H99DFEG68SR,B007WTAJTO,1m2,"[0, 0]",This think has worked out great.Had a diff. br...,5.0,Great buy at this price!!! *** UPDATE,1384992000,2013-11-21,382,0,0,381 days,B
4,A375ZM4U047O79,B007WTAJTO,2&amp;1/2Men,"[0, 0]","Bought it with Retail Packaging, arrived legit...",5.0,best deal around,1373673600,2013-07-13,513,0,0,512 days,C


In [14]:
# Create the helpful_no variable
df["helpful_no"]=df["total_vote"]-df["helpful_yes"]
df.sort_values("helpful_no",ascending=False).head()

Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime,day_diff,helpful_yes,total_vote,review_life,Segments,helpful_no
2909,A10B6G6W3DW1EY,B007WTAJTO,Luopo,"[53, 236]",I know armed with this in my Android tablet an...,4.0,Win Win situation,1393200000,2014-02-24,287,53,236,286 days,B,183
4212,AVBMZZAFEKO58,B007WTAJTO,SkincareCEO,"[1568, 1694]",NOTE: please read the last update (scroll to ...,1.0,1 Star reviews - Micro SDXC card unmounts itse...,1375660800,2013-05-08,579,1568,1694,578 days,C,126
2751,A19R7GVV216QKY,B007WTAJTO,Kunchok,"[8, 118]","If price is also double of 64 GB card, then it...",5.0,Price??,1393286400,2014-02-25,286,8,118,285 days,B,110
3449,AOEAD7DPLZE53,B007WTAJTO,NLee the Engineer,"[1428, 1505]",I have tested dozens of SDHC and micro-SDHC ca...,5.0,Top of the class among all (budget-priced) mic...,1348617600,2012-09-26,803,1428,1505,802 days,D,77
317,A1ZQAQFYSXL5MQ,B007WTAJTO,"Amazon Customer ""Kelly""","[422, 495]","If your card gets hot enough to be painful, it...",1.0,"Warning, read this!",1346544000,2012-02-09,1033,422,495,1032 days,D,73


In [15]:
# Import the sorting functions
from Sorting_Rewievs import score_average_rating,wilson_lower_bound,score_up_down_diff
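
The `Sorting_Rewievs` module itself is not shown in this notebook. The sketch below is one plausible way to write the three imported functions, not the module's actual code. The function bodies, the handling of zero votes, and the use of the standard-library `statistics.NormalDist` for the z-score are all assumptions.

```python
import math
from statistics import NormalDist


def score_up_down_diff(up, down):
    """Difference between helpful and unhelpful votes."""
    return up - down


def score_average_rating(up, down):
    """Share of helpful votes; 0 when the review has no votes."""
    if up + down == 0:
        return 0
    return up / (up + down)


def wilson_lower_bound(up, down, confidence=0.95):
    """Lower bound of the Wilson score interval for the helpful-vote ratio."""
    n = up + down
    if n == 0:
        return 0
    # Two-sided z-score for the confidence level (about 1.96 for 0.95)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = up / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


print(wilson_lower_bound(5, 0))
print(wilson_lower_bound(1952, 68))
```

With these definitions, a review with 5 helpful votes out of 5 scores about 0.566, while one with 1952 out of 2020 scores about 0.958. The average-rating score would rank the 5-out-of-5 review first, whereas the Wilson lower bound penalizes the small sample and ranks the heavily voted review higher.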

In [20]:
# Score = Average rating = (up ratings) / (all ratings)
df["score_average_rating"]=df.apply(lambda x:score_average_rating(x["helpful_yes"],x["helpful_no"]),axis=1)
# Up-Down Diff Score = (up ratings) − (down ratings)
df["score_pos_neg_diff"]=df.apply(lambda x:score_up_down_diff(x["helpful_yes"],x["helpful_no"]),axis=1)
# Wilson Lower Bound Score
df["Wilson_lower_bound_score"]=df.apply(lambda x:wilson_lower_bound(x["helpful_yes"],x["helpful_no"],confidence=0.95),axis=1)
df.head(10)


Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime,day_diff,helpful_yes,total_vote,review_life,Segments,helpful_no,score_average_rating,score_pos_neg_diff,Wilson_lower_bound_score
0,A3SBTW3WS4IQSN,B007WTAJTO,,"[0, 0]",No issues.,4.0,Four Stars,1406073600,2014-07-23,138,0,0,137 days,A,0,0.0,0,0.0
1,A18K1ODH1I2MVB,B007WTAJTO,0mie,"[0, 0]","Purchased this for my device, it worked as adv...",5.0,MOAR SPACE!!!,1382659200,2013-10-25,409,0,0,408 days,B,0,0.0,0,0.0
2,A2FII3I2MBMUIA,B007WTAJTO,1K3,"[0, 0]",it works as expected. I should have sprung for...,4.0,nothing to really say....,1356220800,2012-12-23,715,0,0,714 days,D,0,0.0,0,0.0
3,A3H99DFEG68SR,B007WTAJTO,1m2,"[0, 0]",This think has worked out great.Had a diff. br...,5.0,Great buy at this price!!! *** UPDATE,1384992000,2013-11-21,382,0,0,381 days,B,0,0.0,0,0.0
4,A375ZM4U047O79,B007WTAJTO,2&amp;1/2Men,"[0, 0]","Bought it with Retail Packaging, arrived legit...",5.0,best deal around,1373673600,2013-07-13,513,0,0,512 days,C,0,0.0,0,0.0
5,A2IDCSC6NVONIZ,B007WTAJTO,2Cents!,"[0, 0]",It's mini storage. It doesn't do anything els...,5.0,Not a lot to really be said,1367193600,2013-04-29,588,0,0,587 days,C,0,0.0,0,0.0
6,A26YHXZD5UFPVQ,B007WTAJTO,2K1Toaster,"[0, 0]",I have it in my phone and it never skips a bea...,5.0,Works well,1382140800,2013-10-19,415,0,0,414 days,B,0,0.0,0,0.0
7,A3CW0ZLUO5X2B1,B007WTAJTO,"35-year Technology Consumer ""8-tracks to 802.11""","[0, 0]",It's hard to believe how affordable digital ha...,5.0,32 GB for less than two sawbucks...what's not ...,1404950400,2014-10-07,62,0,0,61 days,A,0,0.0,0,0.0
8,A2CYJO155QP33S,B007WTAJTO,4evryoung,"[1, 1]",Works in a HTC Rezound. Was running short of ...,5.0,Loads of room,1395619200,2014-03-24,259,1,1,258 days,A,0,1.0,1,0.20655
9,A2S7XG3ZC4VGOQ,B007WTAJTO,53rdcard,"[0, 0]","in my galaxy s4, super fast card, and am total...",5.0,works great,1381449600,2013-11-10,393,0,0,392 days,B,0,0.0,0,0.0


In [21]:
# Sort the reviews by Wilson lower bound score and look at the top 20
df.sort_values("Wilson_lower_bound_score",ascending=False).head(20)

Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime,day_diff,helpful_yes,total_vote,review_life,Segments,helpful_no,score_average_rating,score_pos_neg_diff,Wilson_lower_bound_score
2031,A12B7ZMXFI6IXY,B007WTAJTO,"Hyoun Kim ""Faluzure""","[1952, 2020]",[[ UPDATE - 6/19/2014 ]]So my lovely wife boug...,5.00000,UPDATED - Great w/ Galaxy S4 & Galaxy Tab 4 10...,1367366400,2013-01-05,702,1952,2020,701 days,D,68,0.96634,1884,0.95754
3449,AOEAD7DPLZE53,B007WTAJTO,NLee the Engineer,"[1428, 1505]",I have tested dozens of SDHC and micro-SDHC ca...,5.00000,Top of the class among all (budget-priced) mic...,1348617600,2012-09-26,803,1428,1505,802 days,D,77,0.94884,1351,0.93652
4212,AVBMZZAFEKO58,B007WTAJTO,SkincareCEO,"[1568, 1694]",NOTE: please read the last update (scroll to ...,1.00000,1 Star reviews - Micro SDXC card unmounts itse...,1375660800,2013-05-08,579,1568,1694,578 days,C,126,0.92562,1442,0.91214
317,A1ZQAQFYSXL5MQ,B007WTAJTO,"Amazon Customer ""Kelly""","[422, 495]","If your card gets hot enough to be painful, it...",1.00000,"Warning, read this!",1346544000,2012-02-09,1033,422,495,1032 days,D,73,0.85253,349,0.81858
4672,A2DKQQIZ793AV5,B007WTAJTO,Twister,"[45, 49]",Sandisk announcement of the first 128GB micro ...,5.00000,Super high capacity!!! Excellent price (on Am...,1394150400,2014-07-03,158,45,49,157 days,A,4,0.91837,41,0.80811
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1072,A2O96COBMVY9C4,B007WTAJTO,Crysis Complex,"[5, 5]",What more can I say? The 64GB micro SD works f...,5.00000,Works wonders for the Galaxy Note 2!,1349395200,2012-05-10,942,5,5,941 days,D,0,1.00000,5,0.56552
2583,A3MEPYZVTAV90W,B007WTAJTO,J. Wong,"[5, 5]",I bought this Class 10 SD card for my GoPro 3 ...,5.00000,Works Great with a GoPro 3 Black!,1370649600,2013-08-06,489,5,5,488 days,C,0,1.00000,5,0.56552
121,A2Z4VVF1NTJWPB,B007WTAJTO,A. Lee,"[5, 5]",Update: providing an update with regard to San...,5.00000,ready for use on the Galaxy S3,1346803200,2012-05-09,943,5,5,942 days,D,0,1.00000,5,0.56552
1142,A1PLHPPAJ5MUXG,B007WTAJTO,Daniel Pham(Danpham_X @ yahoo. com),"[5, 5]",As soon as I saw that this card was announced ...,5.00000,Great large capacity card,1396396800,2014-02-04,307,5,5,306 days,B,0,1.00000,5,0.56552
