# Application: User and Time Weighted Product Score Calculation

This project discusses several approaches to calculating a product's average rating score for use on e-commerce sites. A Udemy course is used as the example.

The methods used are:

- `Time Weighted Average Rating`
- `User-Based Weighted Average Rating`
- `Weighted Rating` (a combination of both approaches)

When calculating the rating score, we may want to give more weight to recent reviews. `Time Weighted Average Rating` can be used for this.

Recent reviews may be important, but the quality of the voting users is also an important factor, since not every rater knows the product equally well.

Certainly both approaches seem sensible. But what if we wanted to use both?

In the final part of the study, the `Time Weighted` and `User-Based Weighted` average ratings are combined into a single score.

# Rating Products

- Average
- Time-Based Weighted Average
- User-Based Weighted Average
- Weighted Rating

In [1]:
import pandas as pd
import math
import scipy.stats as st
from sklearn.preprocessing import MinMaxScaler

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 500)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

# Section I - Data Understanding


### Step #1 
Read data

In [3]:
df = pd.read_csv("data/course_reviews.csv")

In [5]:
df.head()

|   | Rating | Timestamp | Enrolled | Progress | Questions Asked | Questions Answered |
|---|---|---|---|---|---|---|
| 0 | 5.0 | 2021-02-05 07:45:55 | 2021-01-25 15:12:08 | 5.0 | 0.0 | 0.0 |
| 1 | 5.0 | 2021-02-04 21:05:32 | 2021-02-04 20:43:40 | 1.0 | 0.0 | 0.0 |
| 2 | 4.5 | 2021-02-04 20:34:03 | 2019-07-04 23:23:27 | 1.0 | 0.0 | 0.0 |
| 3 | 5.0 | 2021-02-04 16:56:28 | 2021-02-04 14:41:29 | 10.0 | 0.0 | 0.0 |
| 4 | 4.0 | 2021-02-04 15:00:24 | 2020-10-13 03:10:07 | 10.0 | 0.0 | 0.0 |


In [7]:
df.shape

(4323, 6)

### Step #2
Rating distribution


In [8]:
df["Rating"].value_counts()

5.00000    3267
4.50000     475
4.00000     383
3.50000      96
3.00000      62
1.00000      15
2.00000      12
2.50000      11
1.50000       2
Name: Rating, dtype: int64

In [9]:
df["Questions Asked"].value_counts()

0.00000     3867
1.00000      276
2.00000       80
3.00000       43
4.00000       15
5.00000       13
6.00000        9
8.00000        5
9.00000        3
14.00000       2
11.00000       2
7.00000        2
10.00000       2
15.00000       2
22.00000       1
12.00000       1
Name: Questions Asked, dtype: int64

In [10]:
df.groupby("Questions Asked").agg({"Questions Asked": "count",
                                   "Rating": "mean"})

| Questions Asked | Questions Asked (count) | Rating (mean) |
|---|---|---|
| 0.0 | 3867 | 4.76519 |
| 1.0 | 276 | 4.74094 |
| 2.0 | 80 | 4.80625 |
| 3.0 | 43 | 4.74419 |
| 4.0 | 15 | 4.83333 |
| 5.0 | 13 | 4.65385 |
| 6.0 | 9 | 5.0 |
| 7.0 | 2 | 4.75 |
| 8.0 | 5 | 4.9 |
| 9.0 | 3 | 5.0 |


In [11]:
df.head()

|   | Rating | Timestamp | Enrolled | Progress | Questions Asked | Questions Answered |
|---|---|---|---|---|---|---|
| 0 | 5.0 | 2021-02-05 07:45:55 | 2021-01-25 15:12:08 | 5.0 | 0.0 | 0.0 |
| 1 | 5.0 | 2021-02-04 21:05:32 | 2021-02-04 20:43:40 | 1.0 | 0.0 | 0.0 |
| 2 | 4.5 | 2021-02-04 20:34:03 | 2019-07-04 23:23:27 | 1.0 | 0.0 | 0.0 |
| 3 | 5.0 | 2021-02-04 16:56:28 | 2021-02-04 14:41:29 | 10.0 | 0.0 | 0.0 |
| 4 | 4.0 | 2021-02-04 15:00:24 | 2020-10-13 03:10:07 | 10.0 | 0.0 | 0.0 |


### Step #3

Average Rating

In [13]:
df["Rating"].mean()

4.764284061993986

# Section II  - Time-Based Weighted Average
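**Weighted average by rating time**

We may want to give more weight to recent reviews when calculating the rating score. To do this, the ratings are split into buckets by age in days, and the mean of each bucket is weighted so that newer buckets count more.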

In [25]:
df["Timestamp"] = pd.to_datetime(df["Timestamp"])

In [26]:
current_date = pd.to_datetime('2021-02-10 0:0:0')

In [27]:
df["days"] = (current_date - df["Timestamp"]).dt.days
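Note that `.dt.days` keeps only the whole-day part of each time difference, so a review that is 4 days and 16 hours old counts as 4 days. The following is a minimal sketch of that behaviour, using the first two timestamps shown in `df.head()`; the variable names `timestamps` and `days` are chosen here for illustration.

```python
import pandas as pd

# Reference date used in the notebook.
current_date = pd.to_datetime("2021-02-10 00:00:00")

# First two review timestamps from df.head().
timestamps = pd.to_datetime(pd.Series(["2021-02-05 07:45:55",
                                       "2021-02-04 21:05:32"]))

# The differences are "4 days 16:14:05" and "5 days 02:54:28";
# .dt.days drops the hours, minutes and seconds.
days = (current_date - timestamps).dt.days
print(days.tolist())
```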


In [28]:
df.loc[df["days"] <= 30, "Rating"].mean()

4.775773195876289

In [29]:
df.loc[(df["days"] > 30) & (df["days"] <= 90), "Rating"].mean()

4.763833992094861

In [30]:
df.loc[(df["days"] > 90) & (df["days"] <= 180), "Rating"].mean()

4.752503576537912

In [31]:
df.loc[(df["days"] > 180), "Rating"].mean()

4.76641586867305

In [32]:
df.loc[df["days"] <= 30, "Rating"].mean() * 28/100 + \
    df.loc[(df["days"] > 30) & (df["days"] <= 90), "Rating"].mean() * 26/100 + \
    df.loc[(df["days"] > 90) & (df["days"] <= 180), "Rating"].mean() * 24/100 + \
    df.loc[(df["days"] > 180), "Rating"].mean() * 22/100

4.765025682267194
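Written out, the expression above is a weighted sum of the four bucket means. The symbols $\bar{r}_i$ (mean rating in time bucket $i$) and $w_i$ (its percentage weight) are notation introduced here for illustration and do not appear in the notebook:

$$
R_{\text{time}} = \sum_{i=1}^{4} \frac{w_i}{100}\,\bar{r}_i, \qquad \sum_{i=1}^{4} w_i = 100
$$

With the weights 28, 26, 24, 22 and the bucket means printed above (rounded to five decimals):

$$
R_{\text{time}} = 0.28 \cdot 4.77577 + 0.26 \cdot 4.76383 + 0.24 \cdot 4.75250 + 0.22 \cdot 4.76642 \approx 4.7650
$$

Because the four bucket means differ only in the second decimal place, the result stays close to the plain average of 4.76428.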

In [33]:
def time_based_weighted_average(dataframe, w1=28, w2=26, w3=24, w4=22):
    return dataframe.loc[dataframe["days"] <= 30, "Rating"].mean() * w1 / 100 + \
           dataframe.loc[(dataframe["days"] > 30) & (dataframe["days"] <= 90), "Rating"].mean() * w2 / 100 + \
           dataframe.loc[(dataframe["days"] > 90) & (dataframe["days"] <= 180), "Rating"].mean() * w3 / 100 + \
           dataframe.loc[(dataframe["days"] > 180), "Rating"].mean() * w4 / 100

In [34]:
time_based_weighted_average(df)

4.765025682267194

In [35]:
time_based_weighted_average(df, 30, 26, 22, 22)

4.765491074653962
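The two calls above differ only in their weights, while the bucket boundaries (30, 90, 180 days) are fixed inside the function. As a minimal sketch, the same calculation can be written with `pd.cut` so that both boundaries and weights are parameters. The helper name `bucket_weighted_average`, its arguments, and the toy data are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd


def bucket_weighted_average(dataframe, column, edges, weights, target="Rating"):
    """Weighted sum of per-bucket means of `target` (hypothetical helper).

    `edges` are the inner cut points, e.g. [30, 90, 180], giving the buckets
    (-inf, 30], (30, 90], (90, 180], (180, inf), which match the <= / > filters.
    `weights` are percentages, one per bucket, expected to sum to 100.
    """
    if len(weights) != len(edges) + 1:
        raise ValueError("need exactly one weight per bucket")
    if abs(sum(weights) - 100) > 1e-9:
        raise ValueError("weights must sum to 100")

    bins = [-np.inf, *edges, np.inf]
    # labels=False returns the bucket position (0, 1, 2, ...) for each row;
    # intervals are closed on the right by default.
    bucket_ids = pd.cut(dataframe[column], bins=bins, labels=False)
    # reindex keeps empty buckets as NaN, so an empty bucket makes the
    # result NaN, as it would with the explicit .loc filters.
    means = dataframe.groupby(bucket_ids)[target].mean().reindex(range(len(weights)))
    return float(np.dot(means.to_numpy(), weights) / 100)


# Toy data (made up for illustration): two ratings per time bucket.
toy = pd.DataFrame({
    "days":   [5, 30, 31, 90, 91, 180, 181, 400],
    "Rating": [5.0, 4.0, 3.0, 5.0, 4.0, 4.0, 2.0, 4.0],
})
toy_score = bucket_weighted_average(toy, "days", [30, 90, 180], [28, 26, 24, 22])
print(toy_score)
```

On the toy data the bucket means are 4.5, 4.0, 4.0 and 3.0, so the score is 0.28 * 4.5 + 0.26 * 4.0 + 0.24 * 4.0 + 0.22 * 3.0 = 3.92. The same helper could be pointed at the `Progress` column with different cut points.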

# Section III - User-Based Weighted Average

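Recent reviews may be important, but the quality of the voters is also a very important factor. Here course `Progress` serves as a proxy for that quality, so ratings from users who have completed more of the course receive larger weights.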

In [38]:
df.groupby("Progress").agg({"Rating": "mean"})[0:5]

| Progress | Rating (mean) |
|---|---|
| 0.0 | 4.67391 |
| 1.0 | 4.64269 |
| 2.0 | 4.65476 |
| 3.0 | 4.66355 |
| 4.0 | 4.77733 |
| 5.0 | 4.69821 |


In [39]:
df.loc[df["Progress"] <= 10, "Rating"].mean() * 22 / 100 + \
    df.loc[(df["Progress"] > 10) & (df["Progress"] <= 45), "Rating"].mean() * 24 / 100 + \
    df.loc[(df["Progress"] > 45) & (df["Progress"] <= 75), "Rating"].mean() * 26 / 100 + \
    df.loc[(df["Progress"] > 75), "Rating"].mean() * 28 / 100

4.800257704672543

In [40]:
def user_based_weighted_average(dataframe, w1=22, w2=24, w3=26, w4=28):
    return dataframe.loc[dataframe["Progress"] <= 10, "Rating"].mean() * w1 / 100 + \
           dataframe.loc[(dataframe["Progress"] > 10) & (dataframe["Progress"] <= 45), "Rating"].mean() * w2 / 100 + \
           dataframe.loc[(dataframe["Progress"] > 45) & (dataframe["Progress"] <= 75), "Rating"].mean() * w3 / 100 + \
           dataframe.loc[(dataframe["Progress"] > 75), "Rating"].mean() * w4 / 100

In [41]:
user_based_weighted_average(df, 20, 24, 26, 30)

4.803286469062915
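Each progress bucket contributes through its mean alone, so a bucket with very few ratings carries the same weight as a crowded one, and an empty bucket would turn the whole score into `NaN`. A quick check of bucket sizes can help when choosing cut points. The sketch below assumes a hypothetical helper `bucket_summary` and made-up toy data; it is not part of the original notebook.

```python
import numpy as np
import pandas as pd


def bucket_summary(dataframe, column, edges, target="Rating"):
    """Count and mean of `target` per bucket of `column` (hypothetical helper)."""
    bins = [-np.inf, *edges, np.inf]
    buckets = pd.cut(dataframe[column], bins=bins)
    # observed=False keeps buckets that contain no rows,
    # so empty buckets show up with count 0 and mean NaN.
    return dataframe.groupby(buckets, observed=False)[target].agg(["count", "mean"])


# Toy data (made up for illustration): no rating falls in the (45, 75] bucket.
toy = pd.DataFrame({
    "Progress": [1.0, 10.0, 20.0, 100.0],
    "Rating":   [5.0, 4.0, 3.0, 5.0],
})
summary = bucket_summary(toy, "Progress", [10, 45, 75])
print(summary)
```

On the real data, the equivalent call would be `bucket_summary(df, "Progress", [10, 45, 75])`.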

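# Section IV - Weighted Rating

The time-based and user-based scores are combined into a single rating as a weighted sum, with `time_w` and `user_w` given as percentages.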

In [42]:
def course_weighted_rating(dataframe, time_w=50, user_w=50):
    return time_based_weighted_average(dataframe) * time_w/100 + user_based_weighted_average(dataframe)*user_w/100

In [43]:
course_weighted_rating(df)

4.782641693469868

In [44]:
course_weighted_rating(df, time_w=40, user_w=60)

4.786164895710403
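Both results can be checked by hand from the scores printed earlier, since `course_weighted_rating` calls the two functions with their default weights. The sketch below redoes the arithmetic with those printed values; the function name `blend` and the variable names are chosen here for illustration.

```python
# Scores printed earlier in the notebook (default weights).
time_score = 4.765025682267194   # time_based_weighted_average(df)
user_score = 4.800257704672543   # user-based score with weights 22, 24, 26, 28


def blend(time_score, user_score, time_w=50, user_w=50):
    """Percentage-weighted sum of the two scores (illustrative helper)."""
    return time_score * time_w / 100 + user_score * user_w / 100


equal_blend = blend(time_score, user_score)            # 50 / 50
user_heavy_blend = blend(time_score, user_score, 40, 60)  # 40 / 60
print(equal_blend, user_heavy_blend)
```

Shifting weight toward the user-based score raises the combined rating slightly, because the user-based score (about 4.800) is higher than the time-based one (about 4.765).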