# Weighted Top Pop

As other participants have shown in their notebooks, two factors matter for every item:
* Seasonality of products: some products are sold only at a specific time of the year.
* Age of products: products sold in the days closest to the test week are more likely to be bought in the test week.

To address these two factors in a Top Popularity recommendation approach, I added a weight to each transaction.
This weight starts at 1 and is multiplied by an exponentially decaying factor given by the formula:
    e^(-(days/temperature))
where days is the distance in days between the transaction and the start of the test week, and temperature is a parameter that tunes how fast the decay is.
With a lower temperature, the weight of older interactions becomes very small, so those items are less likely to be ranked as top popular.
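
To get a feel for how temperature shapes the decay, here is a minimal sketch that evaluates the formula for a few distances. The helper name `decay_weight` and the sample values are illustrative assumptions, not part of the notebook's code.

```python
import numpy as np

def decay_weight(days, temperature):
    """Exponential decay e^(-(days / temperature)); hypothetical helper."""
    return np.exp(-np.asarray(days, dtype=float) / temperature)

# Distances (in days) between a transaction and the start of the test week.
days = np.array([0, 3, 30])

w_low = decay_weight(days, temperature=3)    # fast decay
w_high = decay_weight(days, temperature=30)  # slow decay

for d, lo, hi in zip(days, w_low, w_high):
    print(f"days={d:>2}  T=3: {lo:.6f}  T=30: {hi:.6f}")
```

With a temperature of 3, a transaction that is 30 days old keeps a weight of e^(-10), which is practically zero, while with a temperature of 30 it still keeps e^(-1), roughly 0.37.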

To address seasonality, I multiplied the weight by a per-month factor taken from a vector with one entry per calendar month.
Since the predictions are for the last days of September 2020, I gave a weight of 1 to interactions in September of any year and lower weights to interactions in other months.
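
As a rough illustration of how the two factors combine, the sketch below scores a tiny made-up transaction table. The toy data, the reference date, and the temperature of 30 are assumptions chosen for readability; the month vector is the one used later in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up transactions (assumption: illustrative data only).
toy = pd.DataFrame({
    "article_id": ["A", "A", "B", "C"],
    "date_time": pd.to_datetime(
        ["2020-09-20", "2019-09-25", "2020-08-30", "2020-03-01"]
    ),
})

reference = pd.Timestamp("2020-09-23")  # assumed start of the week to predict
temperature = 30.0                      # assumed value, for readability
month_weights = [0, 0, 0, 0, 0, 0, 0.2, 0.6, 1, 0.8, 0.4, 0.1]

days = (reference - toy["date_time"]).dt.days
recency = np.exp(-days / temperature)
season = toy["date_time"].dt.month.map(lambda m: month_weights[m - 1])

# Each transaction contributes recency * seasonality to its article's score.
toy["weight"] = recency * season
scores = toy.groupby("article_id")["weight"].sum().sort_values(ascending=False)
print(scores)
```

Here article A ranks first thanks to a recent September purchase, B follows with an older August purchase, and C scores zero because March has a month weight of 0.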

By changing these parameters, the MAP@12 on a local validation set (the transactions in the week before the one we must predict) goes from 0.0026 for a basic TopPop to values like 0.0088 with a very low temperature, which behaves much like a Top Popular recommender that considers only the most recent transactions.

# Imports

In [None]:
import numpy as np 
import pandas as pd 
import os
from datetime import datetime
is_test = True  # True: train on all data for the submission; False: hold out the last week for local validation

# Prepare Data

In [None]:
df = pd.read_csv("../input/h-and-m-personalized-fashion-recommendations/transactions_train.csv",dtype={"article_id":str})

In [None]:
df

In [None]:
df["date_time"]=pd.to_datetime(df["t_dat"])

In [None]:
df.drop([c for c in df.columns if c not in ["date_time","article_id","customer_id"]],axis=1,inplace=True)

In [None]:
if is_test:
    last_week_start = datetime.strptime("24/09/20 00:00:00", '%d/%m/%y %H:%M:%S')
else:
    last_week_start = datetime.strptime("16/09/20 00:00:00", '%d/%m/%y %H:%M:%S')

In [None]:
df_valid = df.loc[df["date_time"] >= last_week_start ].drop("date_time",axis=1)
user_to_evaluate=df_valid["customer_id"].unique()
df_train = df.loc[df["date_time"] <  last_week_start ].drop("customer_id",axis=1)

In [None]:
df_valid["list"] = df_valid.groupby("customer_id")["article_id"].transform(lambda x: " ".join([str(i) for i in x]))
df_valid.drop_duplicates("customer_id",inplace=True)  

In [None]:
valid=df_valid["list"].to_numpy()

In [None]:
valid_list=[x.split(" ") for x in valid]

In [None]:
df_train["month"] = df_train["date_time"].dt.month
df_train["days_distance"] = (last_week_start - df_train["date_time"]).dt.days

# Calculate Weight for each product

In [None]:
temperature = 3 # parameter of exponential decay 
df_train["weight"] = 1
df_train["weight"] *= np.exp(-(df_train["days_distance"]/temperature))

In [None]:
month_weights = [0,0,0,0,0,0,0.2,0.6,1,0.8,0.4,0.1] #weight of products bought in every month

df_train["weight"]*=df_train["month"].apply(lambda x: month_weights[x-1])


In [None]:
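# Sum the weights of all transactions of each product.
# Selecting "weight" avoids summing the datetime column, which recent pandas versions reject.
df_train_g = df_train.groupby("article_id")["weight"].sum().reset_index()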


In [None]:
df_train_sorted=df_train_g.sort_values(by="weight",ascending=False)

products = df_train_sorted["article_id"].to_numpy()[:12]

In [None]:
df_train_sorted[["article_id","weight"]].head(20)

In [None]:
products

# Visualize Weighted Top Popular Items

Code from https://www.kaggle.com/negoto/h-m-sales-period-of-fashion-items-with-k-means#kln-69

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from PIL import Image
from pathlib import Path
path = Path("/kaggle/input/h-and-m-personalized-fashion-recommendations/")

def show_images(article_ids, cols=1, rows=-1):
    if isinstance(article_ids, int) or isinstance(article_ids, str):
        article_ids = [article_ids]
    article_count = len(article_ids)
    if rows < 0: rows = (article_count + cols - 1) // cols  # ceiling division, no empty extra row
    plt.figure(figsize=(3 + 3.5 * cols, 3 + 5 * rows))
    for i in range(article_count):
        article_id = ("0" + str(article_ids[i]))[-10:]
        plt.subplot(rows, cols, i + 1)
        plt.axis('off')
        plt.title(article_id)
        try:
            image = Image.open(path / "images" / article_id[:3] / f"{article_id}.jpg")
            plt.imshow(image)
        except FileNotFoundError:
            # Some articles have no image; leave the subplot empty.
            pass

In [None]:
show_images(products)

# Compute Metric MAP@12

Code from https://github.com/benhamner/Metrics
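
Before the implementation, a small worked example may help. The sketch below uses a compact, hypothetical helper `ap_at_k` that follows the same definition as the `apk` function in this notebook, applied to made-up item lists. As in a Top Popular recommender, the same predicted list is scored against every user.

```python
def ap_at_k(actual, predicted, k=12):
    """Average precision at k for one user (hypothetical helper)."""
    if not actual:
        return 0.0
    predicted = predicted[:k]
    hits, score = 0, 0.0
    for i, p in enumerate(predicted):
        # Count each relevant item once, at the rank where it first appears.
        if p in actual and p not in predicted[:i]:
            hits += 1
            score += hits / (i + 1)
    return score / min(len(actual), k)

predicted = ["a", "x", "b"]          # one ranked list shared by all users
users = [["a", "b"], ["x"], ["z"]]   # items each user actually bought

# User 1: hits at ranks 1 and 3 -> (1/1 + 2/3) / 2 = 5/6
# User 2: hit at rank 2         -> (1/2) / 1     = 1/2
# User 3: no hits               -> 0
per_user = [ap_at_k(u, predicted, k=3) for u in users]
map_at_3 = sum(per_user) / len(per_user)
print(per_user, map_at_3)
```

The mean over the three users is (5/6 + 1/2 + 0) / 3 = 4/9.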

In [None]:
def apk(actual, predicted, k=12):
    """
    Computes the average precision at k.
    This function computes the average precision at k between two lists of
    items.
    Parameters
    ----------
    actual : list
             A list of elements that are to be predicted (order doesn't matter)
    predicted : list
                A list of predicted elements (order does matter)
    k : int, optional
        The maximum number of predicted elements
    Returns
    -------
    score : double
            The average precision at k over the input lists
    """
    if len(predicted)>k:
        predicted = predicted[:k]

    score = 0.0
    num_hits = 0.0

    for i,p in enumerate(predicted):
        if p in actual and p not in predicted[:i]:
            num_hits += 1.0
            score += num_hits / (i+1.0)

    if not actual:
        return 0.0

    return score / min(len(actual), k)

def mapk(actual, predicted, k=12):
    """
    Computes the mean average precision at k.
    This function computes the mean average precision at k, scoring the same
    predicted list against the actual items of every user (a Top Popular
    recommender returns one list for everyone).
    Parameters
    ----------
    actual : list
             A list of lists of elements that are to be predicted 
             (order doesn't matter in the lists)
    predicted : list
                A single list of predicted elements shared by all users
                (order matters)
    k : int, optional
        The maximum number of predicted elements
    Returns
    -------
    score : double
            The mean average precision at k over the input lists
    """
    return np.mean([apk(a,predicted,k) for a in actual])
        

In [None]:
if not is_test:
    # Print explicitly: an expression inside an if block is not displayed by the notebook.
    print(mapk(valid_list, [str(x) for x in products]))

# Submission

In [None]:
df_test = pd.read_csv("../input/h-and-m-personalized-fashion-recommendations/sample_submission.csv")

In [None]:
df_test["prediction"]=" ".join([str(x) for x in products])

In [None]:
df_test.to_csv("/kaggle/working/submission.csv",index=False)