## Why I wrote this notebook

In the discussions [Addressing common questions and what the competition is really about](https://www.kaggle.com/c/h-and-m-personalized-fashion-recommendations/discussion/307288) and [Care to share?](https://www.kaggle.com/competitions/h-and-m-personalized-fashion-recommendations/discussion/314458), participants pointed out that candidate generation is important for improving the score.

However, I could not find a good notebook for scoring candidate generation, so I made one!

1. I evaluate the candidates with a local CV. See [this discussion](https://www.kaggle.com/competitions/h-and-m-personalized-fashion-recommendations/discussion/308919) for how to build a local CV.
2. I evaluate the candidates with two metrics, following [@jacob34's post](https://www.kaggle.com/competitions/h-and-m-personalized-fashion-recommendations/discussion/314458): `Recall` (the share of actual purchases covered by the candidates) and `Multiple Factor` (the number of candidates generated per correct candidate).
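
As a minimal sketch of the two metrics, here is a toy calculation with plain Python sets. The helper name `candidate_metrics` and the toy IDs are made up for illustration, and unlike the `score` function in this notebook it deduplicates the purchases and uses true division instead of integer division.

```python
def candidate_metrics(actual_pairs, candidate_pairs):
    """Return (recall, multiple_factor) for iterables of (customer_id, article_id)."""
    actual = set(actual_pairs)          # unique purchased pairs
    candidates = set(candidate_pairs)   # unique candidate pairs
    hits = len(actual & candidates)     # purchases that were also proposed
    recall = hits / len(actual)
    # Candidates generated per correct candidate; infinite if nothing was hit.
    multiple_factor = len(candidates) / hits if hits else float("inf")
    return recall, multiple_factor


# Hypothetical toy data: 4 purchases, 6 candidates, 2 of them correct.
toy_actual = [("a", 1), ("a", 2), ("b", 3), ("c", 4)]
toy_candidates = [("a", 1), ("a", 5), ("a", 6), ("b", 3), ("b", 7), ("c", 8)]

recall, multiple_factor = candidate_metrics(toy_actual, toy_candidates)
print(f"Recall = {recall:.0%}, Multiple Factor = {multiple_factor:.1f}")
# prints: Recall = 50%, Multiple Factor = 3.0
```

A good candidate generator pushes Recall up while keeping Multiple Factor low, since every extra candidate is one more row a later ranking model has to process.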

In [None]:
import numpy as np
import pandas as pd

import cudf

## Prepare the Local CV

In [None]:
%%time

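# Load the transactions on the GPU and parse the dates.
transactions = cudf.read_csv("../input/h-and-m-personalized-fashion-recommendations/transactions_train.csv")
transactions.t_dat = pd.to_datetime(transactions.t_dat.to_pandas())
# Number the weeks backwards from the last date: the final 7 days are week 104.
transactions["week"] = 104 - (transactions.t_dat.max() - transactions.t_dat).dt.days // 7

USE_WEEKS = 5
TEST_WEEK = 104

# Hold out the test week as the validation labels.
valid = transactions[transactions['week'] == TEST_WEEK][['customer_id', 'article_id']].to_pandas()
# Keep only the weeks before it: TEST_WEEK - USE_WEEKS < week < TEST_WEEK.
transactions = transactions[(transactions.week > TEST_WEEK - USE_WEEKS) & (transactions.week < TEST_WEEK)]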
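
The week index counts backwards from the last transaction date, so the final 7 days of the data become week 104 (the validation week) and each earlier 7-day block gets the next smaller number. The sketch below reproduces that labelling with pandas only, on made-up dates; the column name `t_dat` matches the data, everything else is illustrative.

```python
import pandas as pd

toy = pd.DataFrame({"t_dat": pd.to_datetime([
    "2020-09-22",  # last day in the toy data
    "2020-09-16",  # 6 days earlier, still the last week
    "2020-09-15",  # 7 days earlier, one week back
    "2020-09-01",  # 21 days earlier, three weeks back
])})

# Whole days between each row and the most recent date.
days_back = (toy["t_dat"].max() - toy["t_dat"]).dt.days
# Integer-divide into 7-day blocks and count down from 104.
toy["week"] = 104 - days_back // 7

print(toy["week"].tolist())  # prints [104, 104, 103, 101]
```

With `USE_WEEKS = 5` and `TEST_WEEK = 104`, the strict inequalities in the filter keep weeks 100 to 103, that is, the four weeks before the validation week.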

## Generate Candidates

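I generate simple candidates with two methods.

1. `previous_week`: each customer's purchase history from the previous week
2. `previous_week_top`: the most popular products of the previous week (the code takes the top 200) crossed with every customer in the training window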

In [None]:
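# Candidates 1: what each customer bought in the week before the test week.
last_week = transactions[transactions['week'] == TEST_WEEK - 1]
previous_week = last_week[['customer_id', 'article_id']].to_pandas()
# Candidates 2: the 200 most popular articles of that week, paired with every customer in the window.
top_products = pd.DataFrame(data=last_week.to_pandas().value_counts('article_id').iloc[:200].index.tolist(),
                            columns=['article_id'])
previous_week_top = transactions[['customer_id']].drop_duplicates().to_pandas().merge(top_products, how='cross')
# Combine both sources and drop repeated (customer, article) pairs.
cand = pd.concat([previous_week, previous_week_top]).drop_duplicates()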
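
The cross join is what makes the popularity candidates grow quickly, because every customer receives every top product. Here is a small pandas-only sketch of the same two sources on toy data. The IDs are made up, it uses the top 2 products instead of 200, and `how="cross"` assumes pandas 1.2 or newer.

```python
import pandas as pd

# Hypothetical toy transactions: six rows in week 103, one older row.
toy_tx = pd.DataFrame({
    "customer_id": ["a", "a", "b", "c", "c", "c", "d"],
    "article_id": [10, 11, 10, 10, 11, 12, 13],
    "week": [103, 103, 103, 103, 103, 103, 102],
})

last_week = toy_tx[toy_tx["week"] == 103]

# Source 1: each customer's own purchases from the previous week.
history = last_week[["customer_id", "article_id"]]

# Source 2: the 2 most popular articles, paired with every known customer.
top_n = last_week["article_id"].value_counts().index[:2].tolist()
top_products = pd.DataFrame({"article_id": top_n})
customers = toy_tx[["customer_id"]].drop_duplicates()
popular = customers.merge(top_products, how="cross")

# Union of both sources without repeated (customer, article) pairs.
toy_cand = pd.concat([history, popular]).drop_duplicates()

print(top_n, len(popular), len(toy_cand))  # prints [10, 11] 8 9
```

Four customers times two products give 8 popularity candidates, and the history adds only the one pair, `("c", 12)`, that the popular list did not already cover.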

## Score

In [None]:
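def score(actual, predict):
    """Print Recall and Multiple Factor of the candidates `predict` against the purchases `actual`."""
    act_tot = len(actual)
    pre_tot = len(predict)
    # Number of actual purchase rows that also appear among the candidates.
    correct = actual.merge(predict, on=['customer_id', 'article_id'], how='inner').shape[0]
    print(f"[+] Recall = {correct/act_tot*100:.1f}% ({correct}/{act_tot})")
    if correct == 0:
        # Avoid dividing by zero when no candidate matches a purchase.
        print(f"[+] Multiple Factor = undefined ({pre_tot}/0)")
    else:
        print(f"[+] Multiple Factor = {pre_tot//correct} ({pre_tot}/{correct})")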

In [None]:
score(valid, cand)

**If you have good ideas for generating candidates, let's discuss them!**

**If this notebook was useful to you, please upvote!**