## Competition goal
Predict which articles each customer will purchase in the 7-day period immediately after the training data ends.
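
One way to mirror this setup offline is to hold out the final 7 days of the transaction log as a validation window. The sketch below assumes a DataFrame with a datetime `t_dat` column, as in the transactions file; the helper name `split_last_week` and the toy data are illustrative only.

```python
import pandas as pd

def split_last_week(transactions, date_col="t_dat", days=7):
    """Split transactions into (history, holdout); holdout is the last `days` days."""
    last_day = transactions[date_col].max()
    # The holdout window covers the final `days` calendar days, including last_day.
    cutoff = last_day - pd.Timedelta(days=days - 1)
    history = transactions[transactions[date_col] < cutoff]
    holdout = transactions[transactions[date_col] >= cutoff]
    return history, holdout

# Toy data (hypothetical), standing in for the real transaction log
toy = pd.DataFrame({
    "t_dat": pd.to_datetime(["2020-09-10", "2020-09-15", "2020-09-16", "2020-09-22"]),
    "customer_id": ["a", "b", "a", "c"],
    "article_id": ["0001", "0002", "0003", "0001"],
})
history, holdout = split_last_week(toy)
print(len(history), len(holdout))
```

Models can then be fit on `history` and scored against the purchases in `holdout`.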

## Evaluation Metric
Submissions are evaluated according to the Mean Average Precision @ 12 (MAP@12):
$$
MAP@12 = \frac{1}{U} \sum_{u=1}^{U} \frac{1}{\min(m,12)} \sum_{k=1}^{\min(n,12)} P(k) \times rel(k)
$$
where U is the number of customers, P(k) is the precision at cutoff k, n is the number of predictions per customer, m is the number of ground-truth purchases per customer, and rel(k) is an indicator function equal to 1 if the item at rank k is a relevant (correct) label and 0 otherwise.
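
A minimal sketch of the metric in plain Python may make the formula more concrete. It assumes each customer's average precision is normalized by min(m, 12), where m is the number of articles the customer actually bought, and it skips customers with no purchases; both choices, and the function names, are assumptions of this sketch.

```python
def average_precision_at_k(actual, predicted, k=12):
    """AP@k for one customer.

    actual: article ids the customer really bought.
    predicted: ranked list of predicted article ids (best first).
    """
    actual = set(actual)
    if not actual:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for rank, article in enumerate(predicted[:k], start=1):
        # Count each correct article once, even if it is predicted twice.
        if article in actual and article not in seen:
            hits += 1
            score += hits / rank  # P(rank) * rel(rank)
            seen.add(article)
    return score / min(len(actual), k)

def map_at_k(actuals, predictions, k=12):
    """Mean AP@k over customers with at least one purchase (parallel lists)."""
    scores = [
        average_precision_at_k(a, p, k)
        for a, p in zip(actuals, predictions)
        if len(a) > 0
    ]
    return sum(scores) / len(scores) if scores else 0.0

# Toy check: the first customer bought "a" and "b", the second bought "c".
score = map_at_k([["a", "b"], ["c"]], [["a", "x", "b"], ["y", "c"]])
print(round(score, 4))
```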

### Other important points
* Some, but not all, articles (products) have corresponding images.
* Up to 12 articles can be predicted for each customer.
* Predictions must be made for every customer ID in sample_submission.csv, whether or not that customer appears in the transactions training data.
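
Because every customer ID needs a row, customers with no purchase history need some fallback. A minimal sketch, assuming a simple popularity baseline (the best-selling articles overall) and a submission layout of a `customer_id` column plus a space-separated `prediction` column; the function name and toy data are hypothetical, and `article_id` is assumed to be read as a string.

```python
import pandas as pd

def popularity_submission(customer_ids, transactions, n=12):
    """Give every customer the same top-`n` best-selling articles."""
    top_articles = transactions["article_id"].value_counts().head(n).index.tolist()
    prediction = " ".join(top_articles)
    # The scalar `prediction` string is broadcast to every customer row.
    return pd.DataFrame({"customer_id": list(customer_ids), "prediction": prediction})

# Toy data (hypothetical)
toy_tx = pd.DataFrame({
    "customer_id": ["a", "a", "b", "b", "b", "a"],
    "article_id": ["0001", "0002", "0002", "0003", "0002", "0001"],
})
# Customer "c" never appears in the transactions but still gets a row.
submission = popularity_submission(["a", "b", "c"], toy_tx, n=2)
print(submission)
```

A personalized model would replace the shared list for customers with history, keeping the popular items only as a fallback.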

## Exploring Data files

In [None]:
import numpy as np
import pandas as pd 
import seaborn as sns
import plotly.express as px
import os

## Exploring articles.csv

In [None]:
# articles.csv describes every article (product) available for customers to buy
articles = pd.read_csv('../input/h-and-m-personalized-fashion-recommendations/articles.csv',dtype={'article_id': str})
articles.head()

In [None]:
articles.info()

In [None]:
print(f'Number of unique Articles : {articles.article_id.nunique()}')
print(f'Number of unique Product_code : {articles.product_code.nunique()}')
print(f'Number of unique Product_type_no : {articles.product_type_no.nunique()}')
print(f'Number of unique Product_group_name : {articles.product_group_name.nunique()}')

In [None]:
# Let us see what the 19 groups of products are
articles.product_group_name.unique()

In [None]:
product_group_counts = articles.groupby(['product_group_name'])['article_id'].count()
product_group_counts.sort_values(ascending=False)

## Exploring customers.csv

In [None]:
customers_df = pd.read_csv('../input/h-and-m-personalized-fashion-recommendations/customers.csv')

In [None]:
customers_df.head()

In [None]:
customers_df.info()

* There are 1,371,980 unique customers.
* Of these, 464,404 are marked active.
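
These two figures can be checked directly from the DataFrame. The sketch below assumes the customers file has an `Active` column that is 1.0 for active customers and missing otherwise; the toy frame stands in for `customers_df`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for customers_df (hypothetical values)
toy_customers = pd.DataFrame({
    "customer_id": ["a", "b", "c", "d"],
    "Active": [1.0, np.nan, 1.0, np.nan],
})
n_customers = toy_customers["customer_id"].nunique()
# Non-missing entries in `Active` are treated as active customers.
n_active = int(toy_customers["Active"].notna().sum())
print(f"{n_customers} unique customers, {n_active} marked active")
```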

In [None]:
customers_df['age'].describe()

In [None]:
print(f'The average age of customers is {customers_df["age"].mean():.1f} and the median age is {customers_df["age"].median()}')

In [None]:
sns.displot(customers_df['age'])

## Transactions training data

In [None]:
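# Read article_id as a string so it keeps its leading zero and matches articles.csv
transactions_train_df = pd.read_csv('../input/h-and-m-personalized-fashion-recommendations/transactions_train.csv', dtype={'article_id': str, 'customer_id': str}, parse_dates=['t_dat'])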

In [None]:
transactions_train_df.head()

In [None]:
transactions_train_df.info()

In [None]:
# Number of transaction rows per sales channel (both aggregations count non-null rows)
transactions_train_df.groupby(['sales_channel_id']).agg(article_id_count=('article_id','count'),
                                                        customer_id_count= ('customer_id','count'))

In [None]:
sns.boxplot(y=transactions_train_df['price'],color='yellow')

In [None]:
date_range = str(transactions_train_df['t_dat'].dt.date.min()) + ' to ' +str(transactions_train_df['t_dat'].dt.date.max())
print(date_range)

In [None]:
df = transactions_train_df.groupby(['t_dat',"sales_channel_id"])['price'].agg(['sum']).sort_values(by = 't_dat').reset_index()
fig = px.bar( df, x='t_dat', y='sum', title='Daily Sales',color="sales_channel_id", labels={'t_dat':'Transaction Date','sum':'Total Sales'})
fig.show()

In [None]:
# Rank articles by number of transactions, then by total sales amount (both descending)
transactions_train_df.groupby(['article_id']).agg({'customer_id':'count','price':'sum'}).sort_values(by=['customer_id','price'],ascending=[False,False])

transactions_train.csv is a large file with more than 31 million transaction records spanning just over two years, from 2018-09-20 to 2020-09-22.

## sample_submission file

In [None]:
sample_submission_df = pd.read_csv('../input/h-and-m-personalized-fashion-recommendations/sample_submission.csv')

In [None]:
sample_submission_df.info()

In [None]:
sample_submission_df['customer_id'].nunique()

## Work in Progress!
* Images
* Transactions deep dive