# Rating Product & Sorting Reviews in Amazon

## Business Problem

Calculate product ratings more accurately and rank product reviews more reliably.

## Dataset Story

This dataset of Amazon product data includes product categories and various metadata. It contains the user ratings and reviews for the most-reviewed product in the Electronics category.

## Variables

- **reviewerID** – User ID
- **asin** – Product ID
- **reviewerName** – User name
- **helpful** – Helpfulness votes for the review, stored as `[helpful_yes, total_vote]`
- **reviewText** – Review text
- **overall** – Product rating
- **summary** – Review summary
- **unixReviewTime** – Review time (Unix timestamp)
- **reviewTime** – Review time (date)

## Required Libraries

In [99]:
import re
import pandas as pd
import numpy as np
import math
import scipy.stats as st
from sklearn.preprocessing import MinMaxScaler

# Project Steps

## Task 1: Calculate the average rating based on recent reviews and compare it with the existing average rating.

In [100]:
df = pd.read_csv('amazon_review.csv')
df.head(3)

Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime,day_diff,helpful_yes,total_vote
0,A3SBTW3WS4IQSN,B007WTAJTO,,"[0, 0]",No issues.,4.0,Four Stars,1406073600,2014-07-23,138,0,0
1,A18K1ODH1I2MVB,B007WTAJTO,0mie,"[0, 0]","Purchased this for my device, it worked as adv...",5.0,MOAR SPACE!!!,1382659200,2013-10-25,409,0,0
2,A2FII3I2MBMUIA,B007WTAJTO,1K3,"[0, 0]",it works as expected. I should have sprung for...,4.0,nothing to really say....,1356220800,2012-12-23,715,0,0


In [101]:
df['total_vote'].max()

2020

In [102]:
# Average rating

df['overall'].mean()

4.587589013224822

In [103]:
df['reviewTime'].dtypes #should be datetime

dtype('O')

In [104]:
df['reviewTime'] = pd.to_datetime(df['reviewTime'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4915 entries, 0 to 4914
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   reviewerID      4915 non-null   object        
 1   asin            4915 non-null   object        
 2   reviewerName    4914 non-null   object        
 3   helpful         4915 non-null   object        
 4   reviewText      4914 non-null   object        
 5   overall         4915 non-null   float64       
 6   summary         4915 non-null   object        
 7   unixReviewTime  4915 non-null   int64         
 8   reviewTime      4915 non-null   datetime64[ns]
 9   day_diff        4915 non-null   int64         
 10  helpful_yes     4915 non-null   int64         
 11  total_vote      4915 non-null   int64         
dtypes: datetime64[ns](1), float64(1), int64(4), object(6)
memory usage: 460.9+ KB


In [105]:
df['reviewTime'].max()

Timestamp('2014-12-07 00:00:00')

In [106]:
df['reviewTime'].min()

Timestamp('2012-01-09 00:00:00')

In [107]:
current_date = pd.to_datetime('2014-12-08 00:00:00')

df['days'] = (current_date - df['reviewTime']).dt.days

df.head()

Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime,day_diff,helpful_yes,total_vote,days
0,A3SBTW3WS4IQSN,B007WTAJTO,,"[0, 0]",No issues.,4.0,Four Stars,1406073600,2014-07-23,138,0,0,138
1,A18K1ODH1I2MVB,B007WTAJTO,0mie,"[0, 0]","Purchased this for my device, it worked as adv...",5.0,MOAR SPACE!!!,1382659200,2013-10-25,409,0,0,409
2,A2FII3I2MBMUIA,B007WTAJTO,1K3,"[0, 0]",it works as expected. I should have sprung for...,4.0,nothing to really say....,1356220800,2012-12-23,715,0,0,715
3,A3H99DFEG68SR,B007WTAJTO,1m2,"[0, 0]",This think has worked out great.Had a diff. br...,5.0,Great buy at this price!!! *** UPDATE,1384992000,2013-11-21,382,0,0,382
4,A375ZM4U047O79,B007WTAJTO,2&1/2Men,"[0, 0]","Bought it with Retail Packaging, arrived legit...",5.0,best deal around,1373673600,2013-07-13,513,0,0,513


In [108]:
df['days'].max()

1064

In [109]:
df['days'].min()

1
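As a quick sanity check on the `days` column, the same subtraction can be reproduced with the standard library for the first row of the preview (`reviewTime` 2014-07-23, reference date 2014-12-08). This is only an illustrative sketch, and the variable names are not part of the notebook.

```python
from datetime import date

# Reference date used in the notebook: one day after the latest review.
reference_date = date(2014, 12, 8)
# Review date of the first row in the preview.
review_date = date(2014, 7, 23)

days_since_review = (reference_date - review_date).days
print(days_since_review)  # 138, matching that row's `day_diff` and `days`
```

Using a reference date one day after the latest review keeps the smallest value of `days` at 1 rather than 0.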

## Time Based Weighted Average

In [110]:
df['overall'].mean()

4.587589013224822

In [111]:
# up to 3 months old
# 3 to 6 months old
# 6 to 12 months old
# older than 12 months

In [112]:
df[(df['days'] <= 90)]['overall'].mean()*0.28+ \
df[(df['days'] > 90) & (df['days'] <= 180)]['overall'].mean()*0.26+ \
df[(df['days'] > 180) & (df['days'] <= 360)]['overall'].mean()*0.24+ \
df[(df['days'] > 360)]['overall'].mean()*0.22

4.66657427388032

In [113]:
def time_based_weighted_average(dataframe, w1=28, w2=26, w3=24, w4=22):
    # w1..w4 are percentage weights for reviews aged 0-90, 91-180, 181-360 and 360+ days.
    return dataframe.loc[dataframe["days"] <= 90, "overall"].mean() * w1 / 100 + \
           dataframe.loc[(dataframe["days"] > 90) & (dataframe["days"] <= 180), "overall"].mean() * w2 / 100 + \
           dataframe.loc[(dataframe["days"] > 180) & (dataframe["days"] <= 360), "overall"].mean() * w3 / 100 + \
           dataframe.loc[(dataframe["days"] > 360), "overall"].mean() * w4 / 100

time_based_weighted_average(df)

4.666574273880321
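To make the arithmetic behind the time-based weighted average easier to follow, here is a minimal sketch in plain Python on hypothetical toy data. The function name `time_weighted_mean`, the `edges` and `weights` parameters, and the toy reviews are all assumptions for illustration. Unlike the pandas version, this sketch skips empty buckets and renormalises the remaining weights, which is a design choice of the sketch rather than something the notebook does.

```python
import math

def time_weighted_mean(reviews, edges=(90, 180, 360), weights=(0.28, 0.26, 0.24, 0.22)):
    """Weighted mean of per-bucket average ratings.

    reviews: iterable of (days_since_review, rating) pairs.
    edges:   inclusive upper bounds of every bucket except the last.
    weights: one weight per bucket; empty buckets are skipped and the
             remaining weights are renormalised (an assumption of this sketch).
    """
    buckets = [[] for _ in weights]
    for days, rating in reviews:
        # First bucket whose upper edge covers `days`, otherwise the last bucket.
        idx = next((i for i, edge in enumerate(edges) if days <= edge), len(edges))
        buckets[idx].append(rating)

    total, used_weight = 0.0, 0.0
    for ratings, weight in zip(buckets, weights):
        if ratings:
            total += weight * sum(ratings) / len(ratings)
            used_weight += weight
    return total / used_weight if used_weight else math.nan


# Hypothetical toy data: (days since review, star rating).
toy_reviews = [(10, 5), (40, 4), (120, 4), (200, 3), (500, 5), (700, 4)]
score = time_weighted_mean(toy_reviews)
print(round(score, 4))  # 4.01
```

On this toy data the plain mean is about 4.17, while the weighted score is 4.01, because the low rating in the 181-360 day bucket carries a full bucket weight of 0.24 even though it is a single review.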

## Task 2: Determine the 20 reviews to display on the product detail page.


In [114]:
df = pd.read_csv('amazon_review.csv')

# `helpful` holds strings such as "[1952, 2020]": helpful votes first, then total votes.
df['helpful_yes'] = df.helpful[df.helpful.notna()].apply(lambda x: re.sub(r'[\W_]+', '', x.split()[0])).astype(int)
df['helpful_no'] = df.helpful[df.helpful.notna()].apply(lambda x: int(re.sub(r'[\W_]+', '', x.split()[1])) \
                                                        - int(re.sub(r'[\W_]+', '', x.split()[0])))

df[['reviewText','helpful','helpful_yes','helpful_no']].sort_values('helpful_yes',ascending=False).head(10)

Unnamed: 0,reviewText,helpful,helpful_yes,helpful_no
2031,[[ UPDATE - 6/19/2014 ]]So my lovely wife boug...,"[1952, 2020]",1952,68
4212,NOTE: please read the last update (scroll to ...,"[1568, 1694]",1568,126
3449,I have tested dozens of SDHC and micro-SDHC ca...,"[1428, 1505]",1428,77
317,"If your card gets hot enough to be painful, it...","[422, 495]",422,73
3981,The last few days I have been diligently shopp...,"[112, 139]",112,27
4596,Hi:I ordered two card and they arrived the nex...,"[82, 109]",82,27
1835,Bought from BestBuy online the day it was anno...,"[60, 68]",60,8
2909,I know armed with this in my Android tablet an...,"[53, 236]",53,183
4306,"While I got this card as a ""deal of the day"" o...","[51, 65]",51,14
4672,Sandisk announcement of the first 128GB micro ...,"[45, 49]",45,4
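Since each `helpful` value looks like a Python list literal, one alternative to the regular expressions is `ast.literal_eval` from the standard library. The sketch below assumes every value is a well-formed two-integer list such as `"[1952, 2020]"`; `split_helpful` is a hypothetical helper name, not part of the notebook.

```python
import ast

def split_helpful(raw):
    """Parse a string such as "[1952, 2020]" into (helpful_yes, helpful_no)."""
    helpful_yes, total_vote = ast.literal_eval(raw)
    return helpful_yes, total_vote - helpful_yes

print(split_helpful("[1952, 2020]"))  # (1952, 68)
print(split_helpful("[0, 0]"))        # (0, 0)
```

The two printed pairs agree with the `helpful_yes` and `helpful_no` values shown for those `helpful` strings in the table.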


In [115]:
#average rating = (up ratings) / (all ratings)

def score_average_rating(up,down):
    if up + down == 0:
        return 0
    return up / (up+down)

score_average_rating(1952,68)

0.9663366336633663

In [116]:
df['score_average_rating'] = df.apply(lambda x: score_average_rating(x['helpful_yes'], x['helpful_no']), axis=1)

df[['reviewText','helpful','helpful_yes','helpful_no','score_average_rating']]. \
sort_values('score_average_rating',ascending=False).head(10)

Unnamed: 0,reviewText,helpful,helpful_yes,helpful_no,score_average_rating
4277,I have a galaxy note II and after rooting I no...,"[1, 1]",1,0,1.0
2881,The Nexus One is listed as supporting a maximu...,"[1, 1]",1,0,1.0
1073,I used it with my Samsung S4 and it works grea...,"[1, 1]",1,0,1.0
445,This is exactly what I was looking for to upgr...,"[1, 1]",1,0,1.0
3923,"It's a SanDisk, so what more is there to say? ...","[1, 1]",1,0,1.0
435,This is working great in my AT&T Galaxy Note. ...,"[1, 1]",1,0,1.0
2901,Not a good typer or speller :) here is what I ...,"[1, 1]",1,0,1.0
2204,I just called Sandisk and they say they have a...,"[1, 1]",1,0,1.0
2206,I bought this for my garmin virb action cam. ...,"[1, 1]",1,0,1.0
3408,Very good card and still working now in my car...,"[1, 1]",1,0,1.0


In [117]:
df['score_average_rating'].mean()

0.07546821822485647

In [118]:
def wilson_lower_bound(up, down, confidence=0.95):
    """
    Compute the Wilson Lower Bound score.

    - The lower bound of the confidence interval for the Bernoulli parameter p is taken as the WLB score.
    - The resulting score is used for product ranking.
    - Note:
    If the scores are on a 1-5 scale, 1-3 can be marked as negative and 4-5 as positive to fit the Bernoulli setting.
    This introduces some problems of its own, so a Bayesian average rating is needed in that case.

    Parameters
    ----------
    up: int
        up count
    down: int
        down count
    confidence: float
        confidence

    Returns
    -------
    wilson score: float

    """
    n = up + down
    if n == 0:
        return 0
    z = st.norm.ppf(1 - (1 - confidence) / 2)
    phat = 1.0 * up / n
    return (phat + z * z / (2 * n) - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)
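For reference, the expression returned by `wilson_lower_bound` can be written out as follows, where $\hat{p} = \text{up}/n$, $n = \text{up} + \text{down}$, and $z$ is the standard normal quantile at $1 - (1 - \text{confidence})/2$ (about 1.96 for the default of 0.95). This is a direct transcription of the code above, not an independent derivation.

```latex
\text{WLB} =
\frac{\hat{p} + \dfrac{z^{2}}{2n}
      - z \sqrt{\dfrac{\hat{p}\,(1-\hat{p}) + \dfrac{z^{2}}{4n}}{n}}}
     {1 + \dfrac{z^{2}}{n}}
```

When $n$ is small, the $z^{2}/n$ terms dominate and pull the bound well below $\hat{p}$; as $n$ grows they vanish and the bound approaches $\hat{p}$.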

In [119]:
df['wilson_lower_bound'] = df.apply(lambda x: wilson_lower_bound(x['helpful_yes'], x['helpful_no']), axis=1)

df[['reviewText','helpful','helpful_yes','helpful_no','score_average_rating','wilson_lower_bound']]. \
sort_values('wilson_lower_bound',ascending=False).head(20)

Unnamed: 0,reviewText,helpful,helpful_yes,helpful_no,score_average_rating,wilson_lower_bound
2031,[[ UPDATE - 6/19/2014 ]]So my lovely wife boug...,"[1952, 2020]",1952,68,0.966337,0.957544
3449,I have tested dozens of SDHC and micro-SDHC ca...,"[1428, 1505]",1428,77,0.948837,0.936519
4212,NOTE: please read the last update (scroll to ...,"[1568, 1694]",1568,126,0.92562,0.912139
317,"If your card gets hot enough to be painful, it...","[422, 495]",422,73,0.852525,0.818577
4672,Sandisk announcement of the first 128GB micro ...,"[45, 49]",45,4,0.918367,0.808109
1835,Bought from BestBuy online the day it was anno...,"[60, 68]",60,8,0.882353,0.784651
3981,The last few days I have been diligently shopp...,"[112, 139]",112,27,0.805755,0.732136
3807,I bought this card to replace a lost 16 gig in...,"[22, 25]",22,3,0.88,0.700442
4306,"While I got this card as a ""deal of the day"" o...","[51, 65]",51,14,0.784615,0.670334
4596,Hi:I ordered two card and they arrived the nex...,"[82, 109]",82,27,0.752294,0.663595
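The difference between the two scores is easiest to see side by side. The sketch below recomputes both for three `(helpful_yes, helpful_no)` pairs taken from the tables in this section. It is self-contained and uses `statistics.NormalDist` from the standard library as a stand-in for `st.norm.ppf`; the helper name `up_ratio` is an assumption for illustration.

```python
import math
from statistics import NormalDist

def up_ratio(up, down):
    """Share of helpful votes; 0 when there are no votes."""
    total = up + down
    return up / total if total else 0.0

def wilson_lower_bound(up, down, confidence=0.95):
    """Lower bound of the Wilson score interval for the helpful-vote share."""
    n = up + down
    if n == 0:
        return 0.0
    # Standard-library stand-in for scipy's st.norm.ppf.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = up / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)

# (helpful_yes, helpful_no) pairs taken from the tables in this section.
for up, down in [(1, 0), (45, 4), (1952, 68)]:
    print(up, down, round(up_ratio(up, down), 4), round(wilson_lower_bound(up, down), 4))
```

A review with a single helpful vote has a ratio of 1.0 but a lower bound of only about 0.21, so it no longer outranks the review with 1952 helpful votes out of 2020, whose lower bound is about 0.96.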


## Top 20 Reviews

In [120]:
df[['reviewText','helpful','helpful_yes','helpful_no','score_average_rating','wilson_lower_bound']]. \
sort_values('wilson_lower_bound',ascending=False).head(20)['reviewText']

2031    [[ UPDATE - 6/19/2014 ]]So my lovely wife boug...
3449    I have tested dozens of SDHC and micro-SDHC ca...
4212    NOTE:  please read the last update (scroll to ...
317     If your card gets hot enough to be painful, it...
4672    Sandisk announcement of the first 128GB micro ...
1835    Bought from BestBuy online the day it was anno...
3981    The last few days I have been diligently shopp...
3807    I bought this card to replace a lost 16 gig in...
4306    While I got this card as a "deal of the day" o...
4596    Hi:I ordered two card and they arrived the nex...
315     Bought this card to use with my Samsung Galaxy...
1465    I for one have not bought into Google's, or an...
1609    I have always been a sandisk guy.  This cards ...
4302    So I got this SD specifically for my GoPro Bla...
4072    I used this for my Samsung Galaxy Tab 2 7.0 . ...
1072    What more can I say? The 64GB micro SD works f...
2583    I bought this Class 10 SD card for my GoPro 3 ...
121     Update