# H&M Personalized Fashion Recommendations

In this competition, we predict which articles each customer will buy in the 7 days following 2020/09/22.
We are given three datasets: article data (articles.csv), customer data (customers.csv), and the customers' purchase history over the past two years (transactions_train.csv). With so much information available, it is hard to tell which parts are useful for prediction, so I would like to explore the data first.
I will keep updating this notebook, so please let me know if you spot any mistakes!

## Import libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
articles_data = pd.read_csv('../input/h-and-m-personalized-fashion-recommendations/articles.csv')
articles_data

In [None]:
customers = pd.read_csv('../input/h-and-m-personalized-fashion-recommendations/customers.csv')
customers

In [None]:
sample_submission = pd.read_csv('../input/h-and-m-personalized-fashion-recommendations/sample_submission.csv')
sample_submission

In [None]:
transactions_data = pd.read_csv('../input/h-and-m-personalized-fashion-recommendations/transactions_train.csv')
transactions_data.head()

## EDA

- First, let's look at the distribution of customer ages. The graph below shows peaks in the late 20s and around age 50.

In [None]:
sns.displot(customers['age'])
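
To read the peaks off as numbers rather than from the plot, the ages can be binned into decades and counted. This is a minimal sketch on made-up values; `ages` is a hypothetical stand-in for `customers['age']`, and the decade bins are my own choice, not something the dataset prescribes.

```python
import pandas as pd

# Toy stand-in for customers['age']; the values are invented for illustration.
ages = pd.Series([19, 24, 27, 28, 31, 45, 49, 51, 52, 67, None])

# Decade bins [10, 20), [20, 30), ...; right=False makes each bin left-closed.
bins = list(range(10, 101, 10))
labels = [f"{lo}s" for lo in bins[:-1]]
age_group = pd.cut(ages, bins=bins, labels=labels, right=False)

# value_counts drops missing ages by default; sort_index keeps the decade order.
age_counts = age_group.value_counts().sort_index()
print(age_counts)
```

On the real data, replacing `ages` with `customers['age']` should give the same kind of table, with missing ages left out of the counts.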

- Next, let's look at sales by month. The cell below creates a new column called month.

In [None]:
def get_month(date):
    return date[0:4]+date[5:7]

transactions_data['month'] = transactions_data['t_dat'].map(get_month)
transactions_data
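
The same month key can also be built by parsing the dates instead of slicing strings. The sketch below assumes `t_dat` is stored as `YYYY-MM-DD` text, which is what the slicing in `get_month` implies; `tx` is a hypothetical toy frame, not the real transactions table.

```python
import pandas as pd

# Toy stand-in for transactions_data; dates and prices are invented.
tx = pd.DataFrame({
    "t_dat": ["2018-09-20", "2019-04-03", "2020-09-22"],
    "price": [0.05, 0.02, 0.03],
})

# Parse once, then format as 'YYYYMM' to match the output of get_month.
tx["t_dat"] = pd.to_datetime(tx["t_dat"], format="%Y-%m-%d")
tx["month"] = tx["t_dat"].dt.strftime("%Y%m")
print(tx)
```

Parsing has the side benefit that `t_dat` becomes a datetime column, which makes later date arithmetic (for example, selecting the last week) easier.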

- We can see that sales are high in the April-June period.

In [None]:
transactions_data_month = transactions_data.groupby('month')['price'].sum()
fig = plt.figure(figsize=(8,5))
ax = fig.add_subplot(111)
ax.plot(transactions_data_month)
ax.set_title('Monthly Sales')
plt.xticks(rotation=60)
plt.show()
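
One way to check a seasonal pattern like this is to average the monthly totals by calendar month across years. This is only a sketch on invented numbers; the `month` keys follow the `YYYYMM` format created above, and the `sales` values are hypothetical.

```python
import pandas as pd

# Invented monthly totals keyed by 'YYYYMM'; not real competition figures.
monthly = pd.DataFrame({
    "month": ["201811", "201904", "201905", "201911", "202004", "202005"],
    "sales": [6.0, 10.0, 12.0, 4.0, 14.0, 8.0],
})

# Characters 4-5 of 'YYYYMM' are the calendar month ('01' to '12').
monthly["calendar_month"] = monthly["month"].str[4:6]

# Average over years so that months observed more often are not favoured.
by_calendar_month = monthly.groupby("calendar_month")["sales"].mean()
print(by_calendar_month)
```

Using the mean rather than the sum matters here because the history does not cover every calendar month the same number of times.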

- Next, let's look at how much sales each article generates.

In [None]:
article_price_sum = transactions_data.groupby('article_id')['price'].sum().sort_values(ascending=False)
article_price_sum

In [None]:
article_price_ctgr_list = [0,1,5,25,125,625,5000]
article_price_ctgr_name = ['(0,1]','(1,5]','(5,25]','(25,125]','(125,625]','625-']
article_price_ctgr = pd.cut(article_price_sum, bins=article_price_ctgr_list, labels=article_price_ctgr_name)
article_price_ctgr
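
A note on how `pd.cut` treats the bin edges: intervals are right-closed by default, and any value above the last edge becomes NaN. The sketch below uses invented totals to show this, and shows one possible alternative (my own suggestion, not part of the original analysis) of using `np.inf` as the last edge so that the '625-' label is truly open-ended.

```python
import numpy as np
import pandas as pd

# Invented per-article totals; the last value exceeds the fixed upper edge of 5000.
totals = pd.Series([0.5, 1.0, 3.0, 30.0, 700.0, 6000.0])
names = ['(0,1]', '(1,5]', '(5,25]', '(25,125]', '(125,625]', '625-']

# Fixed upper edge: 6000 falls outside every bin and is returned as NaN.
fixed = pd.cut(totals, bins=[0, 1, 5, 25, 125, 625, 5000], labels=names)

# Open-ended last bin: np.inf catches everything above 625.
open_ended = pd.cut(totals, bins=[0, 1, 5, 25, 125, 625, np.inf], labels=names)

print(fixed.isna().sum(), open_ended.isna().sum())
```

If no article total exceeds 5000 in the real data, both versions give the same result.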

In [None]:
plot_data = article_price_ctgr.value_counts().sort_index()
plt.title('Relationship between sales and quantity of each item')
plt.xlabel('sales')
plt.ylabel('quantity')
plt.bar(plot_data.index, plot_data)

- Let's look at the five articles with the highest sales!

In [None]:
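import cv2
import os
base_path = "../input/h-and-m-personalized-fashion-recommendations/images"

fig = plt.figure(figsize=(16,4))
for i in range(5):
    # article_id was read as an integer, so restore the leading zero of the file name
    article_id = "0" + str(article_price_sum.index[i]) + ".jpg"
    # Images are stored in subdirectories named after the first three characters of the file name
    article_path = os.path.join(base_path, article_id[0:3], article_id)
    article_img = cv2.imread(article_path)
    ax = fig.add_subplot(1,5,i+1)
    if article_img is not None:
        # cv2.imread returns BGR (or None if the file cannot be read); imshow expects RGB
        ax.imshow(cv2.cvtColor(article_img, cv2.COLOR_BGR2RGB))
    ax.set_title('id:%s'%(article_id))

plt.tight_layout()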
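
The path construction used above can be wrapped in a small helper. This is a sketch under the assumption, inferred from the `"0" + str(...)` step, that image file names are the article ID zero-padded to 10 digits; `article_image_path` is a hypothetical name, not something provided by the competition.

```python
import os

def article_image_path(article_id, base_path="images"):
    """Build the image path for an article ID (assumed layout: <base>/<first 3 chars>/<id>.jpg)."""
    # Pad to 10 digits so that integer IDs regain their leading zero.
    file_name = str(article_id).zfill(10) + ".jpg"
    # The subdirectory is named after the first three characters of the file name.
    return os.path.join(base_path, file_name[:3], file_name)

print(article_image_path(108775015))
```

`zfill` leaves an ID that is already a 10-character string unchanged, so the helper works whether `article_id` was read as an integer or as text.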

## Show sample images for each product_type_name

In [None]:
product_list = articles_data['product_type_name'].unique()
len(product_list)
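
Besides the number of distinct product types, it can help to know how many articles each type contains, since types with very few articles cannot fill a row of five images. The sketch below uses an invented table; `articles` is a hypothetical stand-in for `articles_data`, and the threshold of 2 is arbitrary.

```python
import pandas as pd

# Toy stand-in for articles_data; the rows are invented for illustration.
articles = pd.DataFrame({
    "article_id": [1, 2, 3, 4, 5, 6],
    "product_type_name": ["Vest top", "Vest top", "Bra", "Vest top", "Bra", "Umbrella"],
})

# Number of articles per product type, most common first.
type_counts = articles["product_type_name"].value_counts()

# Product types with fewer articles than the chosen threshold.
rare_types = type_counts[type_counts < 2].index.tolist()
print(type_counts)
print(rare_types)
```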

In [None]:
def display_sample_image(product_type_name="Vest top"):
    """
    Display up to five randomly chosen images of articles with the given product_type_name.
    """
    base_path = '../input/h-and-m-personalized-fashion-recommendations/images'
    
    articles_data_new = articles_data[articles_data["product_type_name"]==product_type_name]
    articles_data_new = articles_data_new.reset_index(drop=True)
    
    fig = plt.figure(figsize=(16,4))
    fig.suptitle("product_type_name: {}".format(product_type_name))

    # Sample without replacement so that the same article is not shown twice
    k = min(len(articles_data_new), 5)
    indices = np.random.choice(len(articles_data_new), size=k, replace=False)
    for i, index in enumerate(indices):
        # article_id was read as an integer, so restore the leading zero of the file name
        article_id = "0" + str(articles_data_new.iloc[index]["article_id"]) + ".jpg"
        
        img_path = os.path.join(base_path, article_id[0:3], article_id)

        sample_pic = cv2.imread(img_path)
        
        ax = fig.add_subplot(1,5,i+1)
        ax.set_xticks([])
        ax.set_yticks([])
        if sample_pic is not None:
            # cv2.imread returns BGR (or None if the file cannot be read); imshow expects RGB
            ax.imshow(cv2.cvtColor(sample_pic, cv2.COLOR_BGR2RGB))
    
    plt.tight_layout()

In [None]:
display_sample_image(product_type_name="Vest top")

In [None]:
# Show sample images for every remaining product type (product_list[1] onward)
for product_type_name in product_list[1:]:
    display_sample_image(product_type_name=product_type_name)
    plt.show()

- If you found this notebook useful, an upvote would be much appreciated!
