We know that popularity-based recommendation is a strong baseline. But which period should we use for counting popularity? Since we can't know what will be popular in the test period, we have to make a best guess from the sales in the training period. If the period is too short, such as only the last day, the counts could be noisy. If it is too long, such as the last year, it could blur the current trend.

In this notebook, I investigate which period is optimal for counting popularity. Again, as we don't have the popularity in the test period, I use a validation period (the last week of the training data) to check it instead.

# Summary
It is best to use only the last day to count popularity.

# Setups

In [None]:
from datetime import datetime, date, timedelta
from typing import List, Tuple, Dict, Union

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from tqdm import tqdm

sns.set()

# Configs

In [None]:
LAST_DAY = date(2020,9,22)
FIRST_DAY = date(2018,9,20)
LAST_DAY_TRAIN = (LAST_DAY - timedelta(1*7))
df = pd.read_csv("../input/h-and-m-personalized-fashion-recommendations/customers.csv")
CUSTOMERID2INDEX = dict(zip(df["customer_id"], df.index))
INDEX2CUSTOMERID = dict(zip(df.index, df["customer_id"]))
del df

# Loading Data

In [None]:
def load_efficient_df() -> pd.DataFrame:
    df = pd.read_csv(
        "../input/h-and-m-personalized-fashion-recommendations/transactions_train.csv",
        dtype={"t_dat": "object", "customer_id": "object", "article_id": "object", "price": float, "sales_channel_id": int},
    )
    df['t_dat'] = pd.to_datetime(df['t_dat'])
    # For reducing memory usage
    df["customer_id"] = df["customer_id"].map(CUSTOMERID2INDEX).astype('int32')
    df['article_id'] = df['article_id'].astype('int32')
    return df

In [None]:
df = load_efficient_df()
N_ARTICLES = df["article_id"].nunique()
display(df)

# Counting Popularity with Different Periods

Here is the "target" popularity, counted over the last week of the training data (the validation period). This is what we want to predict/approximate in this experiment.

In [None]:
# Target: rank articles by the number of unique customers who bought them in the final week
ranks = df.query(f"t_dat > '{LAST_DAY_TRAIN}'").groupby("article_id")["customer_id"].nunique() \
                .rank(ascending=False)
ranks.name = "target"
display(ranks)

We are interested in the popular articles. Here is how their ranks fluctuate as the counting period grows from 1 day to 4 weeks.

In [None]:
deltas = list(range(1, 4*7+1))
for delta in tqdm(deltas):
    # Rank articles by unique buyers in the `delta` days ending on LAST_DAY_TRAIN
    past = df.query(f"t_dat > '{LAST_DAY_TRAIN - timedelta(delta)}' and t_dat <= '{LAST_DAY_TRAIN}'") \
        .groupby("article_id")["customer_id"].nunique().rank(ascending=False)
    past.name = f"{delta}D"
    # Keep only the 12 articles with the best target rank; articles with no sales in the window get NaN
    ranks = pd.merge(ranks, past, on="article_id", how="left").sort_values("target")[:12]
display(ranks)

In [None]:
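# Columns are target, 1D, ..., 28D; reversed, they map to x = 28, ..., 1, 0 (x = 0 is the target week)
for i in range(12):
    plt.plot(range(28, -1, -1), ranks.iloc[i].values[::-1].clip(0,24))  # cap ranks at 24 for readability
plt.gca().invert_yaxis()
plt.xlabel("Length of counting period [days] (0 = target)")
plt.ylabel("Rank")
plt.title("Change in rankings of top 12 articles")
plt.show()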

To measure how much each ranking differs from the target ranking, I use the following two metrics:
- Spearman's rank correlation coefficient (equivalent to Pearson's $r$ of ranks)
- Rank-weighted reciprocal rank (I extended the idea of mean reciprocal rank)
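
To make the two metrics concrete, here is a minimal, dependency-free sketch on made-up ranks. The numbers and the helper names `pearson` and `rwrr_toy` are illustrative only and are not part of this notebook's pipeline. It assumes the articles are already sorted by target rank, so position `i` stands for target rank `i + 1`.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's r; applied to ranks it gives Spearman's coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rwrr_toy(pred_ranks):
    """Sum of (1 / predicted rank), each weighted by 1 / target position."""
    return sum((1 / r) / (i + 1) for i, r in enumerate(pred_ranks))

# Made-up ranks for four articles, sorted by target rank.
target = [1, 2, 3, 4]
good = [1, 2, 4, 3]   # only the bottom two articles are swapped
bad = [4, 3, 2, 1]    # fully reversed ranking

spearman_good = pearson(target, good)
spearman_bad = pearson(target, bad)
rwrr_perfect = rwrr_toy(target)
rwrr_good = rwrr_toy(good)
rwrr_bad = rwrr_toy(bad)
```

Both scores are highest when the predicted ranking matches the target. Because of the `1 / (i + 1)` weight, the rank-weighted reciprocal rank is dominated by the top positions, so swapping two low-ranked articles costs very little.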

NOTE: It is also possible to validate with MAP@12, the true evaluation metric, but I omit it here for brevity. Calculating MAP@12 involves an arbitrary choice of which customers receive popularity-based recommendations and which receive recommendations from other algorithms.
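
For readers who do want to validate with MAP@12, a minimal sketch of the metric follows. It assumes each customer has a list of predicted article IDs and a set of articles actually bought in the validation week. The function names and the toy data are hypothetical, and skipping customers with no purchases is an assumption about how the score is averaged.

```python
def average_precision_at_k(actual, predicted, k=12):
    """AP@k for one customer: precision at each hit, averaged over min(#bought, k)."""
    actual = set(actual)
    if not actual:
        return 0.0
    hits, score, seen = 0, 0.0, set()
    for i, article in enumerate(predicted[:k], start=1):
        # Count each article at most once, even if it is predicted twice.
        if article in actual and article not in seen:
            hits += 1
            score += hits / i
        seen.add(article)
    return score / min(len(actual), k)

def map_at_k(actuals, predictions, k=12):
    """Mean AP@k over customers who bought at least one article."""
    scores = [
        average_precision_at_k(actuals[c], predictions.get(c, []), k)
        for c in actuals
        if actuals[c]
    ]
    return sum(scores) / len(scores) if scores else 0.0

# Toy data: two customers who both receive the same popularity-based list.
popular = ["a", "b", "c"]
actuals = {"u1": {"a", "c"}, "u2": {"z"}}
predictions = {"u1": popular, "u2": popular}
score = map_at_k(actuals, predictions)
```

In this toy case `u1` gets hits at positions 1 and 3 and `u2` gets none, so the mean depends heavily on which customers are served the popularity list.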

In [None]:
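def spearmanr(y_true, y_pred):
    """Spearman's rank correlation coefficient.

    Both inputs are expected to be ranks already, so Pearson's r is applied directly.
    """
    score, _ = stats.pearsonr(y_true, y_pred)
    return score

def rwrr(y_true, y_pred):
    """Rank-weighted reciprocal rank.

    Assumes y_pred is ordered by target rank, so position i stands for target rank i+1;
    y_true is unused and kept only for a uniform signature.
    """
    score = 0
    for i, rank in enumerate(y_pred):
        # Reciprocal of the predicted rank, down-weighted by the target position
        score += (1 / rank) / (i+1)
    return score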

In [None]:
y1s, y2s = [], []
for delta in deltas:
    y1s.append(spearmanr(ranks["target"], ranks[f"{delta}D"]))
    y2s.append(rwrr(ranks["target"], ranks[f"{delta}D"]))


In [None]:
plt.plot(deltas, y1s)
plt.xlabel("Length of counting period [days]")
plt.ylabel("Rank correlation (Spearman)")
plt.show()

In [None]:
plt.plot(deltas, y2s)
plt.xlabel("Length of counting period [days]")
plt.ylabel("Rank-weighted reciprocal rank")
plt.show()

Both metrics tell us it is best to **use only the last day** to count popularity.
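
As a sketch of how this finding could be applied, the snippet below picks the top articles by unique buyers in a trailing window, using the same window convention as above (start exclusive, end inclusive). It assumes transactions are plain `(date, customer, article)` tuples; `top_articles` and the toy data are hypothetical and only illustrate how the window length changes the result.

```python
from collections import defaultdict
from datetime import date, timedelta

def top_articles(transactions, last_day, n_days=1, k=12):
    """Rank articles by unique buyers in the `n_days` days ending on `last_day`."""
    start = last_day - timedelta(days=n_days)
    buyers = defaultdict(set)
    for t_dat, customer_id, article_id in transactions:
        if start < t_dat <= last_day:
            buyers[article_id].add(customer_id)
    # Sort by buyer count (descending), then by article ID for a deterministic order.
    ranked = sorted(buyers, key=lambda a: (-len(buyers[a]), a))
    return ranked[:k]

# Made-up transactions: (date, customer, article).
toy = [
    (date(2020, 9, 20), 1, "old_hit"),
    (date(2020, 9, 20), 2, "old_hit"),
    (date(2020, 9, 20), 3, "old_hit"),
    (date(2020, 9, 22), 1, "new_hit"),
    (date(2020, 9, 22), 2, "new_hit"),
    (date(2020, 9, 22), 3, "old_hit"),
]
last_day_top = top_articles(toy, date(2020, 9, 22), n_days=1)
three_day_top = top_articles(toy, date(2020, 9, 22), n_days=3)
```

With the one-day window the newly trending article comes first, while the three-day window still favors the older one.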