# Customer Compact Visualization!

Hi guys, this notebook can be used to understand customers at the individual level.
It complements the many excellent notebooks we already have, which focus more on macro EDA / macro properties of the dataset.

## Usage
I think this notebook can be used for explainable **"error-analysis"**. That is, after making predictions, you can keep a small "eyeball" validation set and inspect the recommendation performance manually. Some errors are sensible, e.g. a customer who buys totally random items. Others are **not sensible**, e.g. a customer who follows a clear buying pattern that your model still gets wrong.

In the latter case, this micro EDA/error analysis can help you adjust the model so that it handles these easy-but-failed cases more sensibly.
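As a rough illustration of the error-analysis idea above, the sketch below builds a per-customer hit table and lists the customers the model missed completely. The dictionaries `truth` and `preds`, the customer names, and the helper `hit_table` are all hypothetical stand-ins; in practice they would come from your own validation split and model output.

```python
import pandas as pd

# Toy stand-ins for real data (all values are made up for illustration).
# `truth`: articles each customer actually bought in the validation week.
# `preds`: articles the model recommended for each customer.
truth = {
    'cus_a': ['0108775015', '0108775044'],
    'cus_b': ['0111586001'],
    'cus_c': ['0120129001'],
}
preds = {
    'cus_a': ['0108775015', '0111586001'],
    'cus_b': ['0120129001', '0108775044'],
    'cus_c': ['0120129001'],
}

def hit_table(truth, preds):
    """Return one row per customer with the number of correctly predicted articles."""
    rows = []
    for cus_id, bought in truth.items():
        hits = len(set(bought) & set(preds.get(cus_id, [])))
        rows.append({'customer_id': cus_id, 'n_bought': len(bought), 'n_hits': hits})
    return pd.DataFrame(rows)

hits_df = hit_table(truth, preds)

# Customers with zero hits are the candidates for manual inspection.
failed_customers = hits_df.loc[hits_df['n_hits'] == 0, 'customer_id'].tolist()
print(failed_customers)
```

Each ID in `failed_customers` could then be passed to `print_customer()` (defined later in this notebook) to judge whether the miss was sensible or not.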

In [None]:
import numpy as np
import pandas as pd
import os
import glob

from tqdm import tqdm
import datetime

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

## Helper functions
Below are the helper functions for showing a list of an arbitrary number of images.

In [None]:
from IPython.display import display

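def show_image(image, figsize=None, title=None):
    # Draw one image on the current axes; `image` is an array or an article_id string.
    if figsize is not None:
        plt.figure(figsize=figsize)

    if isinstance(image, str):
        try:
            # Images live in a folder named after the first 3 characters of the zero-padded ID.
            image = mpimg.imread(f'../input/h-and-m-personalized-fashion-recommendations/images/{str(image)[:3]}/0{int(image)}.jpg')
        except (OSError, ValueError):
            # Missing or unreadable image: fall back to a small black placeholder.
            image = np.zeros([16,16,3])

    if image.ndim == 2:
        plt.imshow(image,cmap='gray')
    else:
        plt.imshow(image)

    if title is not None:
        plt.title(title)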
            
def show_Nimages_all(imgs,scale=1,titles=None):

    N=len(imgs)
    fig = plt.figure(figsize=(25/scale, 16/scale))
    for i, img in enumerate(imgs):
        ax = fig.add_subplot(1, N, i + 1, xticks=[], yticks=[])
        show_image(img)
        if titles is not None:
            title = titles[i]
            ax.title.set_text(title)
    plt.show()

def show_Nimages(imgs, num_per_row=10, scale=1, titles=None):
    # Show the images row by row, with at most `num_per_row` images per row.
    for start in range(0, len(imgs), num_per_row):
        row_imgs = imgs[start:start + num_per_row]

        row_titles = None
        if titles is not None:
            row_titles = titles[start:start + num_per_row]

        show_Nimages_all(row_imgs, scale, row_titles)
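The image path built inside `show_image` relies on the article ID keeping its leading zero. The sketch below isolates that step in a hypothetical helper, `article_image_path`, which is not part of the original notebook. It assumes article IDs are 10 characters long once zero-padded and that images are grouped into folders named after the first three characters, which matches the path pattern used above.

```python
# Assumed location of the competition images on Kaggle (same root as in show_image).
IMAGE_ROOT = '../input/h-and-m-personalized-fashion-recommendations/images'

def article_image_path(article_id, root=IMAGE_ROOT):
    """Build the image path for an article ID (hypothetical helper).

    Assumes IDs are 10 characters long once zero-padded, and that images
    are grouped into folders named after the first three characters.
    """
    padded = str(article_id).zfill(10)
    return f'{root}/{padded[:3]}/{padded}.jpg'

print(article_image_path('0108775015'))
print(article_image_path(108775015))  # integer ID that lost its leading zero
```

Both calls should resolve to the same file, which is why reading `article_id` as a string (or re-padding it) matters before looking up images.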

Next are the helper and the main function for printing information about each customer. To use it, just call `print_customer()` as shown below.

In [None]:
def make_titles(titles, prices):
    new_titles = []
    assert len(titles) == len(prices)
    
    for i in range(len(titles)):
        new_titles.append(f'{titles[i]}:{prices[i]:.2f}')
    return new_titles

def print_customer(data, articles_df, cus_id, repeat_threshold=3):
    cus = cus_id

    cus_df = data.query('customer_id == @cus')
    print(f'\n** Total items bought = {cus_df.shape[0]}**\n')

    titles = cus_df.t_dat.values
    prices = cus_df.price.values
    plt.plot(prices)
    plt.title(f'average price = {prices.mean():.3f}, std = {prices.std():.3f}')
    
    # number of purchases per sales channel
    print('channel_id:', cus_df['sales_channel_id'].value_counts(),'\n')
    
    # favorite colors, repeated sales
    # articles_df stores article_id as int, so merge on an integer key,
    # then restore the zero-padded strings needed to build image paths.
    article_id_str = cus_df['article_id'].values
    cus_df = cus_df.assign(article_id=cus_df['article_id'].astype(int))
    cus_df = pd.merge(cus_df, articles_df, how="left", on=["article_id"])
    cus_df['article_id'] = article_id_str
        
#     print(cus_df['perceived_colour_master_name'].value_counts(normalize=True)[:3])
    
    f, ax = plt.subplots(figsize=(10, 6))
    ax = sns.histplot(data=cus_df, y='perceived_colour_master_name', hue='prod_name', multiple="stack")
    ax.set_xlabel('Count by color')
    ax.set_ylabel('Favorite Colors')
    plt.show()
    
    # value_counts() sorts in descending order, so the first entry is the most-bought product
    product_repeated_num = cus_df['prod_name'].value_counts(normalize=False)
    if product_repeated_num.iloc[0] >= repeat_threshold:
        print('** LOVES to buy repeated items **')
        print(product_repeated_num.head(3))
    else:
        print("** DOESN'T love to buy repeated items **")
    
    titles  = make_titles(titles, prices)
    
    images = cus_df.article_id.values
    show_Nimages(images,titles=titles)
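The repeat-buyer check inside `print_customer()` can be tried in isolation. The sketch below wraps it in a hypothetical function, `is_repeat_buyer`, and runs it on a made-up purchase history; only the `prod_name` column and the `repeat_threshold` parameter come from the code above.

```python
import pandas as pd

# Toy purchase history; product names are made up for illustration.
toy_df = pd.DataFrame(
    {'prod_name': ['Basic tee', 'Basic tee', 'Slim jeans', 'Basic tee', 'Wool scarf']}
)

def is_repeat_buyer(df, repeat_threshold=3):
    """Return (flag, top_counts) based on the most frequently bought product."""
    counts = df['prod_name'].value_counts()  # sorted in descending order
    if counts.empty:
        return False, counts
    return bool(counts.iloc[0] >= repeat_threshold), counts.head(3)

flag, top = is_repeat_buyer(toy_df)
print(flag)
print(top)
```

Raising `repeat_threshold` makes the check stricter; with a threshold of 4 the same toy customer would no longer count as a repeat buyer.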


In [None]:
%%time
data_df = pd.read_csv("../input/h-and-m-personalized-fashion-recommendations/transactions_train.csv", dtype={'article_id':str})
print(data_df.shape)
data_df.head()

articles_df = pd.read_csv("../input/h-and-m-personalized-fashion-recommendations/articles.csv")
customers_df = pd.read_csv("../input/h-and-m-personalized-fashion-recommendations/customers.csv")

all_cus = data_df['customer_id'].unique()
print(len(all_cus))

## Customer Visualization
Visualization of customer #0, who bought 18 items, tends to buy black items, and loves long-sleeve shirts.

In [None]:
cus = all_cus[0]
print_customer(data_df, articles_df, cus)

Visualization of customer #10, who bought 30 items during the past 2 years, including some gray and pink items. She seems to love buying the same products repeatedly.

In [None]:
cus = all_cus[10]
print_customer(data_df, articles_df, cus)

Customer #1000 is a fan of H&M and bought 72 items during the last 2 years. She used both channel 1 and channel 2 (possibly online and offline).
Her favorite color is obviously black. She bought a wide variety of items, so her purchases next week are not easy to guess (except that the items will likely be black). So if the model fails on this case, you should not worry too much.

In [None]:
cus = all_cus[1000]
print_customer(data_df, articles_df, cus)
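To put a rough number on how "diverse" a customer like #1000 is, one option is to compare the share of distinct products with the share of the dominant color. The function `purchase_diversity` and both metrics are assumptions made for illustration, not part of the original analysis; the column names match those used in `print_customer()`.

```python
import pandas as pd

def purchase_diversity(df):
    """Summarise how varied a customer's purchases are (illustrative metrics)."""
    n = len(df)
    if n == 0:
        return {'unique_product_ratio': 0.0, 'top_colour_share': 0.0}
    colour_share = df['perceived_colour_master_name'].value_counts(normalize=True)
    return {
        # 1.0 means every purchase was a different product
        'unique_product_ratio': df['prod_name'].nunique() / n,
        # fraction of purchases in the most common colour
        'top_colour_share': float(colour_share.iloc[0]),
    }

# Toy customer: four different products, three of them black (made-up values).
toy_df = pd.DataFrame({
    'prod_name': ['A', 'B', 'C', 'D'],
    'perceived_colour_master_name': ['Black', 'Black', 'Black', 'Grey'],
})
stats = purchase_diversity(toy_df)
print(stats)
```

A high `unique_product_ratio` together with a high `top_colour_share` would describe a customer whose exact items are hard to predict but whose color preference is not.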