# Information
- In this study, we look at several methods used to rank comments by their up and down votes and examine where each one falls short.

- **Up-Down Diff Score** = (up ratings) − (down ratings)
- **Score** = Average rating = (up ratings) / (all ratings)
- **Wilson Lower Bound Score**

In [1]:
# Import required libraries

import pandas as pd
import math
import scipy.stats as st

In [2]:
# Adjust pandas display settings (show all columns, no line wrapping, 5-decimal floats)

pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

# Up-Down Diff Score = (up ratings) − (down ratings)
- This method scores a comment only by the difference between its up votes (likes) and its down votes (dislikes).

In [3]:
# Review 1: 600 up 400 down total 1000
# Review 2: 5500 up 4500 down total 10000

In [4]:
def score_up_down_diff(up, down):
    return up - down

In [5]:
# Review 1 Score:
score_up_down_diff(600, 400)

200

In [6]:
# Review 2 Score
score_up_down_diff(5500, 4500)

1000

In [7]:
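# Note: This score reflects how many votes a review received but not the ratio of up to down votes. Review 2 scores higher (1000 vs 200) even though only 55% of its votes are positive, compared with 60% for review 1. The method is therefore biased toward reviews with many votes.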
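A quick way to see this bias is to rank reviews 1 and 2 both ways. The sketch below repeats the two scoring functions from this notebook so it runs on its own; the `reviews` list of `(name, up, down)` tuples is just an illustrative container for the two reviews above.

```python
# Reviews 1 and 2 from above as (name, up, down) tuples
reviews = [("review_1", 600, 400), ("review_2", 5500, 4500)]

def score_up_down_diff(up, down):
    return up - down

def score_average_rating(up, down):
    if up + down == 0:
        return 0
    return up / (up + down)

# Raw difference: the review with more votes wins (1000 > 200)
by_diff = sorted(reviews, key=lambda r: score_up_down_diff(r[1], r[2]), reverse=True)
# Up-vote ratio: review 1 (60%) beats review 2 (55%)
by_ratio = sorted(reviews, key=lambda r: score_average_rating(r[1], r[2]), reverse=True)

print([name for name, _, _ in by_diff])   # ['review_2', 'review_1']
print([name for name, _, _ in by_ratio])  # ['review_1', 'review_2']
```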

# Score = Average rating = (up ratings) / (all ratings)
- This method uses only the ratio of up votes to all votes, i.e. likes / (likes + dislikes).

In [8]:
# Review 1: 600 up 400 down total 1000
# Review 2: 5500 up 4500 down total 10000

In [9]:
def score_average_rating(up, down):
    if up + down == 0:
        return 0
    return up / (up + down)

In [10]:
score_average_rating(600, 400)

0.6

In [11]:
score_average_rating(5500, 4500)

0.55

In [12]:
# Review 3: 2 up 0 down total 2
# Review 4: 100 up 1 down total 101

In [13]:
score_average_rating(2, 0)

1.0

In [14]:
score_average_rating(100, 1)

0.9900990099009901

In [15]:
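# Note: This score captures the ratio of up votes but ignores how many votes that ratio is based on (the frequency information).
# Review 3 (2 up, 0 down) scores 1.0, higher than review 4 (100 up, 1 down), even though review 4 has far more votes behind it. It is therefore biased toward reviews with few votes.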
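Because the average rating depends only on the ratio, very different amounts of evidence collapse to the same score. A minimal check using `score_average_rating` from above, repeated so the snippet is self-contained:

```python
def score_average_rating(up, down):
    if up + down == 0:
        return 0
    return up / (up + down)

# Same ratio, very different numbers of votes -> identical scores
print(score_average_rating(2, 0), score_average_rating(200, 0))    # 1.0 1.0
print(score_average_rating(1, 1), score_average_rating(500, 500))  # 0.5 0.5
```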

# Wilson Lower Bound Score
- This method can score any item, product, or review whose interactions are binary (up/down, like/dislike, helpful/not helpful), which makes it well suited to this kind of ranking problem.
- It computes a confidence interval for the Bernoulli parameter p (the probability that a vote is positive) and takes the lower bound of that interval as the WLB score.
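For reference, with $n$ = up + down, $\hat{p}$ = up / $n$, and $z$ the standard normal quantile for the chosen confidence level (about 1.96 for 95%), the lower bound that the `wilson_lower_bound` function computes is:

$$
\text{WLB} = \frac{\hat{p} + \dfrac{z^2}{2n} - z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z^2}{4n^2}}}{1 + \dfrac{z^2}{n}}
$$

The $z^2/n$ terms shrink as $n$ grows, so with many votes the bound approaches $\hat{p}$, while with few votes it is pulled well below it.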

In [16]:
def wilson_lower_bound(up, down, confidence=0.95):
    """
    Calculate Wilson Lower Bound Score

    - The lower bound of the confidence interval calculated for the Bernoulli parameter p is used as the WLB score.
    - The resulting score is used for product ranking.
    - Note:
    If the ratings are on a 1-5 scale, ratings of 1-3 can be marked as negative and 4-5 as positive to make them Bernoulli compatible.
    This discards information (a 3 and a 1 are treated the same), so for star ratings a Bayesian average rating should be used instead.

    Parameters
    ----------
    up: int
        number of up votes
    down: int
        number of down votes
    confidence: float, default 0.95
        confidence level of the interval

    Returns
    -------
    float: Wilson lower bound score (0 if there are no votes)

    """
    n = up + down
    if n == 0:
        return 0
    z = st.norm.ppf(1 - (1 - confidence) / 2)
    phat = 1.0 * up / n
    return (phat + z * z / (2 * n) - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)
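The docstring above mentions making 1-5 star ratings Bernoulli compatible by treating 1-3 as negative and 4-5 as positive. A minimal sketch of that conversion, assuming the ratings arrive as a plain list of integers; `stars_to_up_down` and its `positive_threshold` parameter are hypothetical names introduced here, and `wilson_lower_bound` is repeated (without its docstring) so the block runs on its own:

```python
import math
import scipy.stats as st

def wilson_lower_bound(up, down, confidence=0.95):
    n = up + down
    if n == 0:
        return 0
    z = st.norm.ppf(1 - (1 - confidence) / 2)
    phat = 1.0 * up / n
    return (phat + z * z / (2 * n) - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)

def stars_to_up_down(ratings, positive_threshold=4):
    """Hypothetical helper: ratings >= threshold count as up, the rest as down."""
    up = sum(1 for r in ratings if r >= positive_threshold)
    down = len(ratings) - up
    return up, down

ratings = [5, 4, 4, 3, 5, 1, 2, 5]  # illustrative 1-5 star ratings
up, down = stars_to_up_down(ratings)
print(up, down)  # 5 3
print(wilson_lower_bound(up, down))
```

Because a 3-star and a 1-star rating both become a single down vote, the conversion throws away information, which is the drawback the docstring points to.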

In [17]:
# Note: The Wilson lower bound accounts for both the ratio of up votes and the number of votes, so reviews with few votes are pulled down. This reduces the bias of the two previous methods.
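To see how the lower bound handles sample size, hold the up-vote ratio fixed at 60% and increase the number of votes. In this sketch (which repeats `wilson_lower_bound` so it is self-contained; `sizes` and `scores` are illustrative names) the score should rise toward 0.6 as votes accumulate, so a review with few votes is ranked more cautiously than one with the same ratio and more votes:

```python
import math
import scipy.stats as st

def wilson_lower_bound(up, down, confidence=0.95):
    n = up + down
    if n == 0:
        return 0
    z = st.norm.ppf(1 - (1 - confidence) / 2)
    phat = 1.0 * up / n
    return (phat + z * z / (2 * n) - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)

# Same 60% up-vote ratio at increasing numbers of votes
sizes = [(6, 4), (60, 40), (600, 400), (6000, 4000)]
scores = [wilson_lower_bound(up, down) for up, down in sizes]
for (up, down), score in zip(sizes, scores):
    print(up + down, round(score, 4))
```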

In [18]:
wilson_lower_bound(0, 100)

3.341099972357286e-18

In [19]:
wilson_lower_bound(0, 0)

0

In [20]:
wilson_lower_bound(78, 36)

0.5940657058910772

In [21]:
wilson_lower_bound(100, 1)

0.9460328420055449

In [22]:
wilson_lower_bound(5500, 4500)

0.5402319557715324

In [23]:
wilson_lower_bound(2, 0)

0.3423802275066531

In [24]:
wilson_lower_bound(100, 1)

0.9460328420055449

# Case Study

In [25]:
up = [15, 70, 14, 4, 2, 5, 8, 37, 21, 52, 28, 147, 61, 30, 23, 40, 37, 61, 54, 18, 12, 68]
down = [0, 2, 2, 2, 15, 2, 6, 5, 23, 8, 12, 2, 1, 1, 5, 1, 2, 6, 2, 0, 2, 2]
comments = pd.DataFrame({"up": up, "down": down})

In [26]:
comments

|   | up | down |
|---|----|------|
| 0 | 15 | 0 |
| 1 | 70 | 2 |
| 2 | 14 | 2 |
| 3 | 4 | 2 |
| 4 | 2 | 15 |
| 5 | 5 | 2 |
| 6 | 8 | 6 |
| 7 | 37 | 5 |
| 8 | 21 | 23 |
| 9 | 52 | 8 |


# score_pos_neg_diff

In [27]:
comments["score_pos_neg_diff"] = comments.apply(lambda x: score_up_down_diff(x["up"],
                                                                             x["down"]), axis=1)

In [28]:
comments

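|   | up | down | score_pos_neg_diff |
|---|----|------|--------------------|
| 0 | 15 | 0 | 15 |
| 1 | 70 | 2 | 68 |
| 2 | 14 | 2 | 12 |
| 3 | 4 | 2 | 2 |
| 4 | 2 | 15 | -13 |
| 5 | 5 | 2 | 3 |
| 6 | 8 | 6 | 2 |
| 7 | 37 | 5 | 32 |
| 8 | 21 | 23 | -2 |
| 9 | 52 | 8 | 44 |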


# score_average_rating

In [29]:
comments["score_average_rating"] = comments.apply(lambda x: score_average_rating(x["up"], x["down"]), axis=1)

In [30]:
comments

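|   | up | down | score_pos_neg_diff | score_average_rating |
|---|----|------|--------------------|----------------------|
| 0 | 15 | 0 | 15 | 1.0 |
| 1 | 70 | 2 | 68 | 0.97222 |
| 2 | 14 | 2 | 12 | 0.875 |
| 3 | 4 | 2 | 2 | 0.66667 |
| 4 | 2 | 15 | -13 | 0.11765 |
| 5 | 5 | 2 | 3 | 0.71429 |
| 6 | 8 | 6 | 2 | 0.57143 |
| 7 | 37 | 5 | 32 | 0.88095 |
| 8 | 21 | 23 | -2 | 0.47727 |
| 9 | 52 | 8 | 44 | 0.86667 |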


# wilson_lower_bound

In [31]:
comments["wilson_lower_bound"] = comments.apply(lambda x: wilson_lower_bound(x["up"], x["down"]), axis=1)
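The three `apply(..., axis=1)` calls in this case study call a Python function once per row. A vectorized sketch that computes the same three columns with pandas column arithmetic, assuming the same `up`/`down` data; the Wilson expression is the one in `wilson_lower_bound` at 95% confidence, and rows with no votes are set to 0 as the row-wise functions do:

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# Same vote counts as the case study
up = [15, 70, 14, 4, 2, 5, 8, 37, 21, 52, 28, 147, 61, 30, 23, 40, 37, 61, 54, 18, 12, 68]
down = [0, 2, 2, 2, 15, 2, 6, 5, 23, 8, 12, 2, 1, 1, 5, 1, 2, 6, 2, 0, 2, 2]
comments = pd.DataFrame({"up": up, "down": down})

n = comments["up"] + comments["down"]
z = st.norm.ppf(1 - (1 - 0.95) / 2)  # two-sided 95% normal quantile, about 1.96

# Column-wise versions of the three scoring functions
phat = comments["up"] / n  # NaN where n == 0, replaced below
wlb = (phat + z * z / (2 * n) - z * np.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)

comments["score_pos_neg_diff"] = comments["up"] - comments["down"]
# Rows without votes score 0, matching the row-wise functions
comments["score_average_rating"] = phat.where(n > 0, 0)
comments["wilson_lower_bound"] = wlb.where(n > 0, 0)
```

On larger tables this avoids the per-row Python overhead of `apply`; the row-wise version remains easier to read and reuse on single reviews.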

In [32]:
comments

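|   | up | down | score_pos_neg_diff | score_average_rating | wilson_lower_bound |
|---|----|------|--------------------|----------------------|--------------------|
| 0 | 15 | 0 | 15 | 1.0 | 0.79612 |
| 1 | 70 | 2 | 68 | 0.97222 | 0.90426 |
| 2 | 14 | 2 | 12 | 0.875 | 0.63977 |
| 3 | 4 | 2 | 2 | 0.66667 | 0.29999 |
| 4 | 2 | 15 | -13 | 0.11765 | 0.03288 |
| 5 | 5 | 2 | 3 | 0.71429 | 0.35893 |
| 6 | 8 | 6 | 2 | 0.57143 | 0.32591 |
| 7 | 37 | 5 | 32 | 0.88095 | 0.75 |
| 8 | 21 | 23 | -2 | 0.47727 | 0.33755 |
| 9 | 52 | 8 | 44 | 0.86667 | 0.75835 |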


In [33]:
comments.sort_values("wilson_lower_bound", ascending=False)

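|    | up | down | score_pos_neg_diff | score_average_rating | wilson_lower_bound |
|----|----|------|--------------------|----------------------|--------------------|
| 11 | 147 | 2 | 145 | 0.98658 | 0.95238 |
| 12 | 61 | 1 | 60 | 0.98387 | 0.91413 |
| 1 | 70 | 2 | 68 | 0.97222 | 0.90426 |
| 21 | 68 | 2 | 66 | 0.97143 | 0.90168 |
| 18 | 54 | 2 | 52 | 0.96429 | 0.87881 |
| 15 | 40 | 1 | 39 | 0.97561 | 0.87405 |
| 13 | 30 | 1 | 29 | 0.96774 | 0.83806 |
| 16 | 37 | 2 | 35 | 0.94872 | 0.83114 |
| 19 | 18 | 0 | 18 | 1.0 | 0.82412 |
| 17 | 61 | 6 | 55 | 0.91045 | 0.81807 |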
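Sorting by each score in turn shows where the three methods disagree on this data. A self-contained sketch that rebuilds the case-study table and lists the top three row indices per method; `top3` is an illustrative name, and `kind="mergesort"` is used only so that tied scores keep their original row order:

```python
import math
import pandas as pd
import scipy.stats as st

def wilson_lower_bound(up, down, confidence=0.95):
    n = up + down
    if n == 0:
        return 0
    z = st.norm.ppf(1 - (1 - confidence) / 2)
    phat = 1.0 * up / n
    return (phat + z * z / (2 * n) - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)

up = [15, 70, 14, 4, 2, 5, 8, 37, 21, 52, 28, 147, 61, 30, 23, 40, 37, 61, 54, 18, 12, 68]
down = [0, 2, 2, 2, 15, 2, 6, 5, 23, 8, 12, 2, 1, 1, 5, 1, 2, 6, 2, 0, 2, 2]
comments = pd.DataFrame({"up": up, "down": down})

comments["score_pos_neg_diff"] = comments["up"] - comments["down"]
comments["score_average_rating"] = comments["up"] / (comments["up"] + comments["down"])  # no zero-vote rows here
comments["wilson_lower_bound"] = [wilson_lower_bound(u, d) for u, d in zip(comments["up"], comments["down"])]

# Top 3 row indices under each scoring method
top3 = {
    col: comments.sort_values(col, ascending=False, kind="mergesort").index[:3].tolist()
    for col in ["score_pos_neg_diff", "score_average_rating", "wilson_lower_bound"]
}
for col, idx in top3.items():
    print(col, idx)
```

The average rating puts rows 0 (15 up, 0 down) and 19 (18 up, 0 down) first purely because they have no down votes, while the Wilson lower bound ranks row 11 (147 up, 2 down) first.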
