# Measurement Problems
### What makes someone buy a product?
### Many factors influence a user's purchase decision. The most prominent one in recent years is the concept of social proof: product comments, ratings, and reviews.
### What makes us accept the positive opinions of others is our belief in the wisdom of crowds (The Wisdom of Crowds).
- Calculating product ratings
- Sorting products
- Sorting user reviews on product detail pages
- Designing pages, processes, and interaction areas
- Feature experiments
- Testing possible actions and reactions

- Rating Products
- Sorting Products
- Sorting Reviews
- AB Testing
- Dynamic Pricing

## 1- Rating Products
- Weighted product rating that takes possible factors into account
    - Average
    - Time-Based Weighted Average
    - User-Based Weighted Average
    - Weighted Rating
### a) Calculating a User- and Time-Weighted Course Rating

In [11]:
import pandas as pd
import math
import scipy.stats as st
from sklearn.preprocessing import MinMaxScaler
import datetime as dt

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
pd.set_option("display.width", 500)
pd.set_option("display.expand_frame_repr", False)
pd.set_option("display.float_format", lambda x: "%.5f" % x)

In [2]:
df = pd.read_csv("C:/measurement_problems/datasets/course_reviews.csv")
df.head()

Unnamed: 0,Rating,Timestamp,Enrolled,Progress,Questions Asked,Questions Answered
0,5.0,2021-02-05 07:45:55,2021-01-25 15:12:08,5.0,0.0,0.0
1,5.0,2021-02-04 21:05:32,2021-02-04 20:43:40,1.0,0.0,0.0
2,4.5,2021-02-04 20:34:03,2019-07-04 23:23:27,1.0,0.0,0.0
3,5.0,2021-02-04 16:56:28,2021-02-04 14:41:29,10.0,0.0,0.0
4,4.0,2021-02-04 15:00:24,2020-10-13 03:10:07,10.0,0.0,0.0


In [3]:
df.shape

(4323, 6)

In [4]:
df["Rating"].value_counts()

5.00000    3267
4.50000     475
4.00000     383
3.50000      96
3.00000      62
1.00000      15
2.00000      12
2.50000      11
1.50000       2
Name: Rating, dtype: int64

In [5]:
df["Questions Asked"].value_counts()

0.00000     3867
1.00000      276
2.00000       80
3.00000       43
4.00000       15
5.00000       13
6.00000        9
8.00000        5
9.00000        3
15.00000       2
11.00000       2
10.00000       2
7.00000        2
14.00000       2
22.00000       1
12.00000       1
Name: Questions Asked, dtype: int64

In [6]:
df.groupby("Questions Asked").agg({"Rating": "mean",
                                  "Questions Asked": "count"})

Questions Asked,Rating,Questions Asked
0.0,4.76519,3867
1.0,4.74094,276
2.0,4.80625,80
3.0,4.74419,43
4.0,4.83333,15
5.0,4.65385,13
6.0,5.0,9
7.0,4.75,2
8.0,4.9,5
9.0,5.0,3


#### Average

In [7]:
df["Rating"].mean()

4.764284061993986

- A plain average like this can miss the satisfaction trend of a product. For example, most of a course's ratings may have been given years ago, and after later updates the exercises shown in the videos may no longer work. In that case the most recent ratings would be expected to be lower.
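
To make the concern concrete, here is a minimal sketch on made-up data (the numbers and the `ratings` frame are hypothetical, not taken from the course dataset). It assumes each rating comes with a `days` value, meaning days elapsed since the rating was given, and compares the overall mean with the mean of the last 30 days.

```python
import pandas as pd

# Hypothetical data: 90 old five-star ratings followed by 10 recent two-star ratings
ratings = pd.DataFrame({
    "Rating": [5.0] * 90 + [2.0] * 10,
    "days": [400] * 90 + [10] * 10,  # days since each rating was given
})

# The plain mean is dominated by the many old ratings
overall_mean = ratings["Rating"].mean()

# The mean over the last 30 days reflects only the recent ratings
recent_mean = ratings.loc[ratings["days"] <= 30, "Rating"].mean()

print(f"overall mean: {overall_mean:.2f}")
print(f"last 30 days: {recent_mean:.2f}")
```

The overall mean stays high even though every recent rating in this toy example is low, which is the kind of trend a single average hides.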

#### Time-Based Weighted Average
- A weighted average based on when each rating was given
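
As a sketch of the idea (the notation is introduced here for illustration and does not come from the notebook): split the ratings into $K$ recency buckets, let $\bar{r}_k$ be the mean rating in bucket $k$ and $w_k$ its weight, with bucket 1 being the most recent.

$$
\text{TBWA} = \sum_{k=1}^{K} w_k \, \bar{r}_k, \qquad \sum_{k=1}^{K} w_k = 1, \qquad w_1 \ge w_2 \ge \dots \ge w_K
$$

Because the weights sum to 1, the result stays on the same scale as the ratings themselves.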

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4323 entries, 0 to 4322
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Rating              4323 non-null   float64
 1   Timestamp           4323 non-null   object 
 2   Enrolled            4323 non-null   object 
 3   Progress            4323 non-null   float64
 4   Questions Asked     4323 non-null   float64
 5   Questions Answered  4323 non-null   float64
dtypes: float64(4), object(2)
memory usage: 202.8+ KB


In [9]:
df["Timestamp"] = pd.to_datetime(df["Timestamp"])
df["Enrolled"] = pd.to_datetime(df["Enrolled"])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4323 entries, 0 to 4322
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   Rating              4323 non-null   float64       
 1   Timestamp           4323 non-null   datetime64[ns]
 2   Enrolled            4323 non-null   datetime64[ns]
 3   Progress            4323 non-null   float64       
 4   Questions Asked     4323 non-null   float64       
 5   Questions Answered  4323 non-null   float64       
dtypes: datetime64[ns](2), float64(4)
memory usage: 202.8 KB


In [13]:
today_date = df["Timestamp"].max() + dt.timedelta(days=5)
today_date

Timestamp('2021-02-10 07:45:55')

In [14]:
df["days"] = (today_date - df["Timestamp"]).dt.days
df.head()

Unnamed: 0,Rating,Timestamp,Enrolled,Progress,Questions Asked,Questions Answered,days
0,5.0,2021-02-05 07:45:55,2021-01-25 15:12:08,5.0,0.0,0.0,5
1,5.0,2021-02-04 21:05:32,2021-02-04 20:43:40,1.0,0.0,0.0,5
2,4.5,2021-02-04 20:34:03,2019-07-04 23:23:27,1.0,0.0,0.0,5
3,5.0,2021-02-04 16:56:28,2021-02-04 14:41:29,10.0,0.0,0.0,5
4,4.0,2021-02-04 15:00:24,2020-10-13 03:10:07,10.0,0.0,0.0,5


In [15]:
df.tail()

Unnamed: 0,Rating,Timestamp,Enrolled,Progress,Questions Asked,Questions Answered,days
4318,5.0,2019-05-17 09:51:44,2019-05-17 09:08:53,34.0,1.0,0.0,634
4319,5.0,2019-05-16 21:27:05,2019-05-16 20:32:15,5.0,0.0,0.0,635
4320,5.0,2019-05-16 20:22:26,2019-05-16 20:21:19,1.0,0.0,0.0,635
4321,5.0,2019-05-16 19:49:07,2019-05-16 19:47:29,1.0,0.0,0.0,635
4322,5.0,2019-05-16 13:40:35,2019-05-15 14:10:24,56.0,0.0,0.0,635


In [18]:
# Reviews from the last 30 days
df.loc[df["days"]<=30].count()

Rating                194
Timestamp             194
Enrolled              194
Progress              194
Questions Asked       194
Questions Answered    194
days                  194
dtype: int64

In [27]:
df.loc[df["days"]<=30].agg({"Rating": "mean"})

Rating   4.77577
dtype: float64

In [23]:
df.loc[(df["days"] > 30) & (df["days"] <=90)].agg({"Rating": "mean"})

Rating   4.76383
dtype: float64

In [24]:
df.loc[(df["days"] > 90) & (df["days"] <=180)].agg({"Rating": "mean"})

Rating   4.75250
dtype: float64

In [25]:
df.loc[df["days"]>180].agg({"Rating": "mean"})

Rating   4.76642
dtype: float64

In [28]:
# Weighted averaging of the four time buckets
df.loc[df["days"]<=30].agg({"Rating": "mean"}) * .28 + \
df.loc[(df["days"] > 30) & (df["days"] <=90)].agg({"Rating": "mean"}) * .26 + \
df.loc[(df["days"] > 90) & (df["days"] <=180)].agg({"Rating": "mean"}) * .24 + \
df.loc[df["days"]>180].agg({"Rating": "mean"}) * .22 

Rating   4.76503
dtype: float64

In [31]:
def time_based_weighted_average(dataframe, w1=.28, w2=.26, w3=.24, w4=.22):
    # Weighted sum of the mean rating in four recency buckets; the weights should sum to 1
    return dataframe.loc[dataframe["days"] <= 30].agg({"Rating": "mean"}) * w1 + \
           dataframe.loc[(dataframe["days"] > 30) & (dataframe["days"] <= 90)].agg({"Rating": "mean"}) * w2 + \
           dataframe.loc[(dataframe["days"] > 90) & (dataframe["days"] <= 180)].agg({"Rating": "mean"}) * w3 + \
           dataframe.loc[dataframe["days"] > 180].agg({"Rating": "mean"}) * w4

time_based_weighted_average(df, w1=.30, w2=.26, w3=.22, w4=.22)

Rating   4.76549
dtype: float64

### Should everyone's rating carry the same weight?

#### User-Based Weighted Average
- Should someone who watched the entire course carry the same weight as someone who watched only 5% of it?
- Should a customer who is rating and reviewing their very first purchase carry the same weight as a customer who has written hundreds of reviews?

In [32]:
df.groupby("Progress").agg({"Rating": "mean"})

Progress,Rating
0.0,4.67391
1.0,4.64269
2.0,4.65476
3.0,4.66355
4.0,4.77733
5.0,4.69821
6.0,4.7551
7.0,4.73256
8.0,4.74194
9.0,4.83125


In [33]:
df.loc[df["Progress"]<=10].agg({"Rating": "mean"}) * .22 + \
df.loc[(df["Progress"] > 10) & (df["Progress"] <=45)].agg({"Rating": "mean"}) * .24 + \
df.loc[(df["Progress"] > 45) & (df["Progress"] <=75)].agg({"Rating": "mean"}) * .26 + \
df.loc[df["Progress"]>75].agg({"Rating": "mean"}) * .28 

Rating   4.80026
dtype: float64

In [40]:
def user_based_weighted_average(dataframe, w1=.22, w2=.24, w3=.26, w4=.28):
    # Weighted sum of the mean rating in four course-progress buckets; the weights should sum to 1
    return dataframe.loc[dataframe["Progress"] <= 10].agg({"Rating": "mean"}) * w1 + \
           dataframe.loc[(dataframe["Progress"] > 10) & (dataframe["Progress"] <= 45)].agg({"Rating": "mean"}) * w2 + \
           dataframe.loc[(dataframe["Progress"] > 45) & (dataframe["Progress"] <= 75)].agg({"Rating": "mean"}) * w3 + \
           dataframe.loc[dataframe["Progress"] > 75].agg({"Rating": "mean"}) * w4

user_based_weighted_average(df, w1=.20, w2=.24, w3=.26, w4=.30)

Rating   4.80329
dtype: float64

#### Weighted Rating

In [41]:
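def course_weighted_rating(dataframe, time_w=50, user_w=50):
    # Blend the time-based and user-based scores; time_w and user_w are percentages that should sum to 100
    return time_based_weighted_average(dataframe) * time_w / 100 + user_based_weighted_average(dataframe) * user_w / 100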

course_weighted_rating(df)

Rating   4.78264
dtype: float64

In [42]:
course_weighted_rating(df, time_w=40, user_w=60)

Rating   4.78616
dtype: float64
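
The blend itself is plain arithmetic, which the sketch below reproduces from the two rounded scores printed earlier (4.76503 for the time-based average and 4.80026 for the user-based average with default weights). `combine_scores` is a hypothetical helper introduced here; unlike `course_weighted_rating`, it divides by the sum of the weights, so they do not have to add up to 100. Because the inputs are rounded, the last digit can differ from the notebook output.

```python
def combine_scores(time_score, user_score, time_w=50, user_w=50):
    """Blend two scores with relative weights (illustrative helper)."""
    total = time_w + user_w
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalising by the total means 40/60 and 2/3 give the same result
    return (time_score * time_w + user_score * user_w) / total


# Rounded values taken from the outputs shown earlier in this notebook
time_score = 4.76503
user_score = 4.80026

equal_blend = combine_scores(time_score, user_score)
user_heavy_blend = combine_scores(time_score, user_score, time_w=40, user_w=60)
print(f"50/50: {equal_blend:.4f}, 40/60: {user_heavy_blend:.4f}")
```

Shifting weight toward the user-based score raises the blended rating here only because the user-based score is the higher of the two.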

In [44]:
# For comparison: the plain average (select the column with df["Rating"], not df.loc["Rating"])
df["Rating"].mean()

4.764284061993986