<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:110%;
           font-family:Verdana;
           letter-spacing:0.5px">

<h1 style="padding: 10px;
              color:white;">

              H&M Data Visualization
</h1>
</div>

***

<h4 style="color:purple;">In this competition, H&M asks you to build a personalized fashion recommendation system. Their online platform offers a huge number of products, and with too many choices customers might not quickly find what interests them or what they are looking for, and ultimately might not make a purchase. Good recommendations are meant to enhance the shopping experience.</h4>

***

# Dataset

<ol style="color:purple;"><li><h4>images/ - a folder of images corresponding to each article_id; images are placed in subfolders starting with the first three digits of the article_id; note that not all article_id values have a corresponding image.</h4></li>
    <li><h4>articles.csv - detailed metadata for each article_id available for purchase</h4></li>
    <li><h4>customers.csv - metadata for each customer_id in the dataset</h4></li>
    <li><h4>sample_submission.csv - a sample submission file in the correct format</h4></li>
    <li><h4>transactions_train.csv - the training data, consisting of the purchases of each customer for each date, as well as additional information. Duplicate rows correspond to multiple purchases of the same item. Your task is to predict the article_ids each customer will purchase during the 7-day period immediately after the training data period.</h4></li></ol>
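
The folder rule for images/ described above can be turned into a small path helper. This is only a minimal sketch: it assumes ids are 10 characters long with a leading zero (which is lost when the column is parsed as an integer) and that each image file is named `<article_id>.jpg`. `article_image_path` is a hypothetical helper, not part of the competition files.

```python
import os

def article_image_path(article_id, img_dir="images", id_width=10):
    """Build the expected image path for an article_id.

    Assumption: ids are `id_width` characters long once the leading zero
    is restored, and the image file is named "<article_id>.jpg".
    """
    # zfill restores the leading zero that an integer dtype drops
    aid = str(article_id).zfill(id_width)
    # images sit in a subfolder named after the first three digits of the id
    return os.path.join(img_dir, aid[:3], aid + ".jpg")

print(article_image_path(108775015))
```

The helper only builds a path; whether the file exists still has to be checked, since not every article has an image.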

***

In [None]:
# import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns 
import os
import numpy as np
import cv2
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('float_format', '{:f}'.format)

In [None]:
IMG_DIR="../input/h-and-m-personalized-fashion-recommendations/images"

In [None]:
# Reading all the csv files
articles=pd.read_csv("../input/h-and-m-personalized-fashion-recommendations/articles.csv")
customers=pd.read_csv("../input/h-and-m-personalized-fashion-recommendations/customers.csv")
transactions=pd.read_csv("../input/h-and-m-personalized-fashion-recommendations/transactions_train.csv")
sample_submission=pd.read_csv("../input/h-and-m-personalized-fashion-recommendations/sample_submission.csv")
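
With the default settings pandas parses article_id as an integer, so the leading zero of each id is dropped. One possible way to keep the ids exactly as written is to pass `dtype` to `read_csv`. The sketch below shows the difference on a tiny in-memory CSV with two illustrative rows rather than the real articles.csv.

```python
import io
import pandas as pd

# two illustrative rows standing in for articles.csv
csv_text = (
    "article_id,product_group_name\n"
    "0108775015,Garment Upper body\n"
    "0110065001,Underwear\n"
)

# default parsing: the column becomes int64 and the leading zero is lost
as_int = pd.read_csv(io.StringIO(csv_text))
# forcing a string dtype keeps the ids as written in the file
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"article_id": str})

print(as_int["article_id"].tolist())
print(as_str["article_id"].tolist())
```

Note that the cells below rely on the integer form and re-add the zero by hand, so switching to string ids would mean adjusting them as well.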

<h4 style="color:blue;">Let's display the first few rows of each dataframe</h4>

In [None]:
articles.head(2)

In [None]:
customers.head(2)

In [None]:
transactions.head(2)
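
Because duplicate rows in transactions_train.csv correspond to multiple purchases of the same item, the quantity bought can be recovered by counting identical rows. This is a minimal sketch on invented toy rows; the column names `t_dat`, `customer_id` and `article_id` are assumed to match the real file.

```python
import pandas as pd

# invented rows; the first two are an exact duplicate (same item bought twice)
toy = pd.DataFrame({
    "t_dat": ["2020-09-01", "2020-09-01", "2020-09-01", "2020-09-02"],
    "customer_id": ["c1", "c1", "c2", "c1"],
    "article_id": [108775015, 108775015, 108775015, 110065001],
})

# count how many times each (date, customer, article) combination occurs
quantity = (
    toy.groupby(["t_dat", "customer_id", "article_id"])
       .size()
       .reset_index(name="quantity")
)
print(quantity)
```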

In [None]:
# Let's find out the shapes of all three dataframes
shape=pd.DataFrame({"Row":[articles.shape[0],customers.shape[0],transactions.shape[0]],
             "Column":[articles.shape[1],customers.shape[1],transactions.shape[1]]},index=['articles',
                                                                                          'customers','transactions'])
green = [{'selector': 'th', 'props': 'background-color: green'}]
red = [{'selector': 'th', 'props': 'background-color: red'}]
shape.style.set_table_styles({"articles": green, "customers": red, "transactions": green}, axis=1)

In [None]:
articles.product_group_name.value_counts().to_frame()

In [None]:
plt.figure(figsize=(16,10))
sns.countplot(y='product_group_name',data=articles,order=articles['product_group_name'].value_counts().index[:10])
plt.title("Product Group Name",font='serif',size=20,color="purple")
plt.xlabel("Count",size=20,color="purple")
plt.ylabel("Product_Group_Name",size=20,color="purple")
plt.xticks(size=16)
plt.yticks(size=16)
plt.show()

These are the 10 most frequent product groups in the articles dataset.

In [None]:
articles.dtypes

In [None]:
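# article_id is parsed as an integer, so its leading zero is lost; the first two digits
# of the integer form, prefixed with "0" later on, give the image subfolder name
# (this assumes every id has nine digits once the leading zero is dropped)
articles['dir'] = articles.article_id.astype(str).str[:2].astype(int)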

In [None]:
articles.head()

<h3 style="color:purple">The article_id in the articles dataset is also the image file name in the images folder.
Here I look up the article_ids that belong to a particular product_group_name (for instance, Shoes) and visualize the corresponding images.</h3>
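
Since not every article_id has an image, it can help to check which files exist before trying to read them. The sketch below is one way to do that with the standard library only. `existing_images` is a hypothetical helper, and the `<first three digits>/<article_id>.jpg` layout and the 10-character id width are assumptions based on the dataset description.

```python
import os
import tempfile

def existing_images(article_ids, img_dir, limit=5):
    """Return up to `limit` image paths that are actually present on disk."""
    found = []
    for article_id in article_ids:
        aid = str(article_id).zfill(10)  # assumed id width, restores the leading zero
        path = os.path.join(img_dir, aid[:3], aid + ".jpg")
        if os.path.exists(path):
            found.append(path)
        if len(found) == limit:
            break
    return found

# demo on a throwaway folder holding a single empty placeholder file
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "010"))
    open(os.path.join(tmp, "010", "0108775015.jpg"), "wb").close()
    found = existing_images([108775015, 110065001], tmp)
    print([os.path.basename(p) for p in found])
```

In the notebook this could be called with the article_id values of one product group and `IMG_DIR` as the folder.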

In [None]:
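def get_article_id(df,group_name):
    # copy the filtered rows so the assignments below do not write into a slice of df
    article_id=df[df['product_group_name']==group_name].copy()
    # restore the leading zero that was dropped when the ids were parsed as integers
    article_id['article_id']="0"+article_id['article_id'].astype(str)
    article_id['dir']="0"+article_id['dir'].astype(str)
    return article_id[['article_id','dir']].reset_index(drop=True)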



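def read_img(data):
    li=[]
    # look at the first 10 articles of the group (fewer if the group is smaller)
    for i in range(min(10,len(data))):
        arti=data['article_id'][i]
        di=data['dir'][i]
        im=cv2.imread(IMG_DIR+"/"+di+"/"+arti+".jpg")
        # cv2.imread returns None when the file is missing; not every article has an image
        if im is None:
            continue
        im=cv2.resize(im,(224,224),interpolation=cv2.INTER_CUBIC)
        # OpenCV loads images as BGR, while matplotlib expects RGB
        im=cv2.cvtColor(im,cv2.COLOR_BGR2RGB)
        li.append(im)
    return li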


def show_img(data):
    # show up to five images side by side
    f, axarr = plt.subplots(1,5,figsize=(15,10))
    for ax, im in zip(axarr, data):
        ax.imshow(im)
    f.tight_layout()
    
def call(df,group_name):
    # look up the ids for one product group, load their images, and plot them
    _id=get_article_id(df,group_name)
    img=read_img(_id)
    show_img(img)



    

In [None]:
call(articles,"Garment Lower body")

In [None]:
call(articles,"Garment Upper body")

In [None]:
call(articles,"Accessories")

In [None]:
call(articles,"Underwear")

In [None]:
call(articles,"Swimwear")

In [None]:
call(articles,"Socks & Tights")