**Average Rank vs PP Over Time**

This analysis shows how the average rank and average pp values have changed over time, where "average" means the monthly median of each value.

This analysis will provide:
- An interactive line graph with two lines, one for rank and one for pp
- A slider that lets the user select the date range to display

In [2]:
import pandas as pd
from pathlib import Path
import numpy as np
from datetime import date

In [5]:
#open dataset with ranges
path = Path.cwd().parent.parent.absolute()
df = pd.read_csv(path / "final_data/fixed_range_updates.csv")

#list of dicts containing each rank range to be evaluated
ranges = [
            {"from": 100000, "to": 200000, "name": "100,000 - 200,000"},
            {"from": 50000, "to": 99999, "name": "50,000 - 99,999"},
            {"from": 25000, "to": 49999, "name": "25,000 - 49,999"},
            {"from": 10000, "to": 24999, "name": "10,000 - 24,999"},
            {"from": 5000, "to": 9999, "name": "5,000 - 9,999"},
            {"from": 1000, "to": 4999, "name": "1,000 - 4,999"},
            {"from": 500, "to": 999, "name": "500 - 999"},
            {"from": 100, "to": 499, "name": "100 - 499"}]

#add date field to dataset (parsed from the first 10 characters of the timestamp)
#note: astype() returns a new object, so the result has to be assigned back
df["date"] = pd.to_datetime(df["timestamp"].astype(str).str[:10])

#add month field
df["month"] = pd.to_datetime(df["date"]).dt.to_period("M")

#drop unneeded columns
df_new = df[["id", "pp_rank", "pp_raw", "month"]]

#get unique list of months
months = df_new["month"].unique()
print(months)

<PeriodArray>
['2014-05', '2014-06', '2014-07', '2014-08', '2014-09', '2014-10', '2014-11',
 '2014-12', '2015-01', '2015-02',
 ...
 '2022-01', '2022-02', '2022-03', '2022-04', '2022-05', '2022-06', '2022-07',
 '2022-08', '2022-09', '2022-10']
Length: 102, dtype: period[M]
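
The month-extraction step can be checked in isolation on a few rows. This is a minimal sketch: the `sample` frame and its timestamp format (`YYYY-MM-DD HH:MM:SS`) are assumptions made for illustration, not rows taken from the real dataset.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from fixed_range_updates.csv
sample = pd.DataFrame({
    "timestamp": ["2014-05-03 12:00:00", "2014-05-28 08:30:00", "2014-06-01 00:00:00"],
    "pp_rank": [100, 300, 200],
})

# Keep the first 10 characters of each timestamp and parse them as dates
sample["date"] = pd.to_datetime(sample["timestamp"].str[:10])

# Collapse each date to its calendar month (dtype period[M])
sample["month"] = sample["date"].dt.to_period("M")

# Unique months as plain strings, in order of first appearance
unique_months = [str(m) for m in sample["month"].unique()]
print(unique_months)
```

Keeping `month` as a pandas period rather than a string means it sorts chronologically and can be compared directly against other periods.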


We will now iterate through the months to get the median pp and rank values for each month.

In [6]:
#create lists for new data
dates = []
ranks = []
pps = []

for month in months:
    this = df_new.loc[df_new["month"] == month]
    
    #get medians (used here as the monthly "average"), stored as strings
    rank = this["pp_rank"].median().astype(str)
    pp = this["pp_raw"].median().astype(str)

    #append values to lists
    dates.append(month)
    ranks.append(rank)
    pps.append(pp)

#create new dataset from this
df_month_avgs = pd.DataFrame({"month": dates, "avg_rank": ranks, "avg_pp": pps})
print(df_month_avgs)

       month avg_rank   avg_pp
0    2014-05  24771.0  1514.59
1    2014-06  21064.0  1735.54
2    2014-07  17298.0  2005.97
3    2014-08  20833.5  1872.12
4    2014-09  19490.0  2006.18
..       ...      ...      ...
97   2022-06  37142.0  5567.52
98   2022-07  60395.0  4583.51
99   2022-08  72276.0  4586.37
100  2022-09  54086.5  5146.65
101  2022-10  39228.0  5378.93

[102 rows x 3 columns]
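
The same monthly medians can also be computed without an explicit loop. The sketch below uses `groupby` on a small made-up frame (the `sample` values are hypothetical). Unlike the loop above, it keeps the results as numbers rather than strings, which is usually more convenient for plotting.

```python
import pandas as pd

# Hypothetical sample data with the same columns as df_new
sample = pd.DataFrame({
    "month": pd.PeriodIndex(
        ["2014-05", "2014-05", "2014-05", "2014-06", "2014-06"], freq="M"
    ),
    "pp_rank": [100, 300, 200, 50, 150],
    "pp_raw": [10.0, 30.0, 20.0, 5.0, 15.0],
})

# Group rows by month, take the median of each column, and rename the
# columns to match the names used in df_month_avgs
monthly = (
    sample.groupby("month")[["pp_rank", "pp_raw"]]
    .median()
    .rename(columns={"pp_rank": "avg_rank", "pp_raw": "avg_pp"})
    .reset_index()
)
print(monthly)
```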


In [7]:
#save this new dataset to a csv file

df_month_avgs.to_csv(path / "final_data/monthly_avgs.csv")
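
When the saved file is loaded again, two details need attention: `to_csv` writes the row index as an unnamed first column by default, and the `month` values come back as plain strings. The sketch below shows one way to handle both. It round-trips through an in-memory buffer instead of the real `monthly_avgs.csv`, and the two rows are copied from the printed output above.

```python
import io

import pandas as pd

# Two rows copied from the printed df_month_avgs output, kept numeric here
monthly = pd.DataFrame({
    "month": pd.PeriodIndex(["2014-05", "2014-06"], freq="M"),
    "avg_rank": [24771.0, 21064.0],
    "avg_pp": [1514.59, 1735.54],
})

# Write to an in-memory buffer; this stands in for the file on disk
buffer = io.StringIO()
monthly.to_csv(buffer)
buffer.seek(0)

# index_col=0 treats the unnamed first column as the row index again
restored = pd.read_csv(buffer, index_col=0)

# The month column is read back as strings, so convert it to periods
restored["month"] = pd.PeriodIndex(restored["month"], freq="M")
print(restored.dtypes)
```

Alternatively, passing `index=False` to `to_csv` would omit the index column from the file altogether.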