<a href="https://www.kaggle.com/code/osmanacar/amazon-rating-products-sorting-reviews?scriptVersionId=205348075" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

**Business Problem**

One of the most important problems in e-commerce is correctly calculating the ratings that products receive after a sale. Solving it leads to greater customer satisfaction, better visibility for sellers' products, a smoother shopping experience, and protection against misleading reviews.

**Dataset Information**

This dataset contains the product that received the most reviews in the Electronics category, together with its reviews.

**Variables**

* reviewerID - ID of the reviewer, e.g. A2SUAM1J3GNN3B
* asin - ID of the product, e.g. 0000013714
* reviewerName - name of the reviewer
* helpful - helpfulness rating of the review, e.g. 2/3
* reviewText - text of the review
* overall - rating of the product
* summary - summary of the review
* unixReviewTime - time of the review (unix time)
* reviewTime - time of the review
* day_diff - number of days since the review was written
* helpful_yes - number of users who found the review helpful
* total_vote - total number of helpfulness votes the review received


In [None]:
import pandas as pd
import math
import scipy.stats as st
from sklearn.preprocessing import MinMaxScaler

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
pd.set_option("display.width", 500)
pd.set_option("display.expand_frame_repr", False)
pd.set_option("display.float_format", lambda x: "%.5f" % x)

In [None]:
df = pd.read_csv("/kaggle/input/amazon/amazon_review.csv")
df.head()

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.describe().T

In [None]:
df["day_diff"].quantile([.20, .40, .60, .80,])

In [None]:
# Mean rating within each day_diff bin, using the quantile values above as thresholds.
# The mean rating decreases as reviews get older.
print(df.loc[(df["day_diff"] <= 248), "overall"].mean())
print(df.loc[(df["day_diff"] > 248) & (df["day_diff"] <= 361), "overall"].mean())
print(df.loc[(df["day_diff"] > 361) & (df["day_diff"] <= 497), "overall"].mean())
print(df.loc[(df["day_diff"] > 497) & (df["day_diff"] <= 638), "overall"].mean())
print(df.loc[(df["day_diff"] > 638), "overall"].mean())
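Hard-coding the thresholds makes it easy to mistype a condition. A minimal sketch of the same per-bin means computed with `pandas.qcut`, which assigns each row to an equal-frequency bin directly. The toy ratings below are invented for illustration and are not taken from the Amazon dataset.

```python
import pandas as pd

# Toy data standing in for the review table (values are invented).
toy = pd.DataFrame({
    "day_diff": [10, 50, 120, 200, 300, 400, 500, 600, 700, 800],
    "overall":  [5, 5, 4, 5, 4, 4, 3, 4, 3, 2],
})

# Split day_diff into 5 equal-frequency bins (quintiles); labels=False gives bin numbers 0..4.
toy["day_bin"] = pd.qcut(toy["day_diff"], q=5, labels=False)

# Average rating per bin, from the most recent reviews (bin 0) to the oldest (bin 4).
bin_means = toy.groupby("day_bin")["overall"].mean()
print(bin_means)
```

`qcut` uses right-closed intervals `(a, b]`, so it matches the `> a` / `<= b` conditions in the cell above; the difference is that the bin edges come from the data itself instead of rounded constants.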

In [None]:
# Time-Based Weighted Average
# More recent reviews get larger weights (w1 for the most recent bin ... w5 for the oldest).
# Weights are percentages and should sum to 100.

def time_based_weighted_average(dataframe, w1=30, w2=25, w3=20, w4=15, w5=10):
    return dataframe.loc[(dataframe["day_diff"] <= 248), "overall"].mean() * w1 / 100 + \
        dataframe.loc[(dataframe["day_diff"] > 248) & (dataframe["day_diff"] <= 361), "overall"].mean() * w2 / 100 + \
        dataframe.loc[(dataframe["day_diff"] > 361) & (dataframe["day_diff"] <= 497), "overall"].mean() * w3 / 100 + \
        dataframe.loc[(dataframe["day_diff"] > 497) & (dataframe["day_diff"] <= 638), "overall"].mean() * w4 / 100 + \
        dataframe.loc[(dataframe["day_diff"] > 638), "overall"].mean() * w5 / 100

time_based_weighted_average(df)
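The function above hard-codes the bin thresholds. A hedged sketch of a more general variant: `time_weighted_rating` is a hypothetical helper (not part of pandas or the original notebook) that takes the bin edges and percentage weights as arguments and checks that there is exactly one weight per bin and that the weights sum to 100. The toy ratings are invented.

```python
import math
import pandas as pd

def time_weighted_rating(dataframe, edges, weights, time_col="day_diff", rating_col="overall"):
    """Weighted mean of per-bin average ratings (hypothetical helper).

    edges   -- increasing bin boundaries; each bin is (low, high]
    weights -- one percentage weight per bin, most recent bin first, summing to 100
    """
    if len(weights) != len(edges) - 1:
        raise ValueError("need exactly one weight per bin")
    if not math.isclose(sum(weights), 100):
        raise ValueError("weights must sum to 100")
    total = 0.0
    for low, high, weight in zip(edges[:-1], edges[1:], weights):
        in_bin = (dataframe[time_col] > low) & (dataframe[time_col] <= high)
        # An empty bin gives NaN here, which makes the result NaN instead of silently skewing it.
        total += dataframe.loc[in_bin, rating_col].mean() * weight / 100
    return total

# Same thresholds and default weights as time_based_weighted_average; toy ratings are invented.
edges = [-math.inf, 248, 361, 497, 638, math.inf]
weights = [30, 25, 20, 15, 10]
toy = pd.DataFrame({"day_diff": [100, 300, 400, 550, 700], "overall": [5, 4, 4, 3, 2]})
result = time_weighted_rating(toy, edges, weights)
print(result)  # roughly 3.95
```

Using `-inf` and `inf` as the outer edges makes the first and last bins cover every row, just like the `<= 248` and `> 638` conditions.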

In [None]:
# Create a new variable, helpful_no: the number of "not helpful" votes.
# Per the dataset description, helpful_no = total_vote - helpful_yes
df["helpful_no"] = df["total_vote"] - df["helpful_yes"]

In [None]:
df.head()

**There are three ways to compute a score for sorting reviews by their helpfulness votes.**

**1- Up-Down Difference Score = (up ratings) - (down ratings)**

**2- Score = Average Rating = (up ratings) / (all ratings)**

**3- Wilson Lower Bound**
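The Wilson lower bound is the lower end of the Wilson score confidence interval for the proportion of up votes. Written out, with $z$ the standard normal quantile for the chosen confidence ($z \approx 1.96$ at 95%), it corresponds to the `wilson_lower_bound` function in the next cell:

```latex
\mathrm{WLB} = \frac{\hat{p} + \dfrac{z^2}{2n} - z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z^2}{4n^2}}}{1 + \dfrac{z^2}{n}},
\qquad n = \text{up} + \text{down}, \quad \hat{p} = \frac{\text{up}}{n}
```

The $z^2$ terms pull the interval center toward 1/2 and widen the interval when $n$ is small, so a review with few votes gets a conservative score until more votes accumulate.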

In [None]:
# Up - Down Difference Score = (up ratings) - (down ratings)

def score_up_down_diff(up, down):
    return up - down

# Score = Average Rating = (up ratings) / (all ratings)

def score_average_rating(up, down):
    if up + down == 0:
        return 0
    return up / (up + down)

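# Wilson Lower Bound Score
# Lower bound of the Wilson score confidence interval for the share of "up" votes.
def wilson_lower_bound(up, down, confidence=0.95):
    n = up + down
    if n == 0:
        return 0
    z = st.norm.ppf(1 - (1 - confidence) / 2)  # about 1.96 for 95% confidence
    phat = 1.0 * up / n  # observed share of up votes
    return (phat + z * z / (2 * n)
            - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)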
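A small stdlib-only sketch comparing the three scores on a few made-up (up, down) vote counts. It re-implements `wilson_lower_bound` with `statistics.NormalDist` (Python 3.8+) instead of `scipy.stats` so it runs without SciPy; the vote counts are illustrative, not taken from the dataset.

```python
import math
from statistics import NormalDist

def wilson_lower_bound(up, down, confidence=0.95):
    # Same formula as the notebook's function, with the z quantile from the standard library.
    n = up + down
    if n == 0:
        return 0.0
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = up / n
    return (phat + z * z / (2 * n)
            - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)

# (up, down) pairs chosen to show where the simpler scores mislead.
examples = [(600, 400), (5500, 4500), (2, 0), (100, 1)]
for up, down in examples:
    diff = up - down
    avg = up / (up + down)
    wlb = wilson_lower_bound(up, down)
    print(f"{up:>5} up {down:>5} down | diff={diff:>5} avg={avg:.3f} wilson={wlb:.3f}")
```

The difference score ranks 5500/4500 above 600/400 even though a smaller share of voters found it helpful, and the average rating ranks 2/0 first on only two votes. The Wilson lower bound ranks 100/1 highest and 2/0 lowest.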

In [None]:
score = pd.DataFrame({"up": df["helpful_yes"], "down": df["helpful_no"]})

score["score_pos_neg_diff"] = score.apply(lambda x: score_up_down_diff(x["up"], x["down"]), axis=1)

score["score_average_rating"] = score.apply(lambda x: score_average_rating(x["up"], x["down"]), axis=1)

score["wilson_lower_bound"] = score.apply(lambda x: wilson_lower_bound(x["up"], x["down"]), axis=1)
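`DataFrame.apply` with `axis=1` calls a Python function once per row. A hedged alternative, assuming NumPy is available: `review_scores` below is a hypothetical helper that computes the same three scores on whole columns at once, giving rows with no votes a score of 0 as the scalar functions do.

```python
import numpy as np
import pandas as pd
from statistics import NormalDist

def review_scores(up, down, confidence=0.95):
    """Vectorised versions of the three scores (hypothetical helper); returns a DataFrame aligned with `up`."""
    up = up.astype(float)
    down = down.astype(float)
    n = up + down
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Avoid division by zero: rows with no votes use n=1 temporarily and are set to 0 below.
    safe_n = n.where(n > 0, 1.0)
    phat = up / safe_n
    wilson = (phat + z * z / (2 * safe_n)
              - z * np.sqrt((phat * (1 - phat) + z * z / (4 * safe_n)) / safe_n)) / (1 + z * z / safe_n)
    return pd.DataFrame({
        "score_pos_neg_diff": up - down,
        "score_average_rating": phat.where(n > 0, 0.0),
        "wilson_lower_bound": wilson.where(n > 0, 0.0),
    })

print(review_scores(pd.Series([600, 2, 0]), pd.Series([400, 0, 0])))
```

For example, `review_scores(df["helpful_yes"], df["helpful_no"])` should produce the same three columns as the `apply`-based cell, and is typically faster on large frames because it avoids a Python call per row.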

In [None]:
new_df = pd.concat([df, score], axis=1)

In [None]:
new_df.sort_values("wilson_lower_bound", ascending=False).head(10)