# Exploratory Data Analysis

In [1]:
# import the libraries
%matplotlib inline

import pandas as pd
import numpy as np
import scipy
import matplotlib.pyplot as plt
import seaborn as sns

# apply style to all the charts
sns.set_style('whitegrid')

## Load the clean dataset

In [2]:
# Load the data
df = pd.read_csv('black_friday_processed.csv')

## Data Story

### Set-up & Hook; Rising Insight #1; Rising Insight #2; Aha Moment; Solution & Next Steps

In [3]:
# Catplot for Age x Purchase considering the Gender and the City Category
sns.catplot(x = 'Age',
           y = 'Purchase',
           hue = 'Gender',
           col = 'City_Category',
           order = ['0-17', '18-25','26-35', '36-45','46-50', '51-55','55+'],
           hue_order = ['M', 'F'],   
           data = df)
plt.savefig('figures/Data_Story_General.png')
plt.clf()
plt.close()

The figure above shows the distribution of purchase amounts for each age group across cities A, B, and C. In general, men in the 26-35 age group are both the most numerous shoppers and the highest spenders in all three cities. Women, on the other hand, appear to peak in number and purchase amount in the 46-50 age group, except in City A, where the majority of purchases were clearly made by the 0-17 age group.

Based on this summary, we can advise the store manager to focus sales efforts on these gender-specific age groups next Black Friday. The choice of products will depend on the store's past experience with each age group and on which products are known to be popular with each gender.
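The visual reading of the catplot can be cross-checked with a grouped summary. The following is a minimal sketch, assuming the columns `City_Category`, `Gender`, `Age`, and `Purchase` used in the plot; the helper name `summarize_purchases` and the toy rows are hypothetical and only illustrate the call.

```python
import pandas as pd


def summarize_purchases(data: pd.DataFrame) -> pd.DataFrame:
    """Return the number of purchases and the mean purchase amount
    for every City_Category / Gender / Age combination."""
    return (
        data.groupby(['City_Category', 'Gender', 'Age'])['Purchase']
            .agg(['count', 'mean'])
            .reset_index()
    )


# Toy rows, made up purely to show the call (not taken from the dataset).
toy = pd.DataFrame({
    'City_Category': ['A', 'A', 'A', 'B'],
    'Gender': ['M', 'M', 'F', 'M'],
    'Age': ['26-35', '26-35', '0-17', '26-35'],
    'Purchase': [8000, 12000, 5000, 9000],
})

summary = summarize_purchases(toy)
print(summary)
```

In the notebook, calling `summarize_purchases(df)` and sorting by `count` or `mean` within each city would show whether the groups that look dominant in the plot are also the largest in the numbers.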

In [4]:
# Catplot for Age x Purchase considering the Gender and the Marital Status

sns.catplot(x = 'Age',
           y = 'Purchase',
           hue = 'Gender',
           col = 'Marital_Status',
           order = ['0-17', '18-25','26-35', '36-45','46-50', '51-55','55+'],
           hue_order = ['M', 'F'],   
           data = df)
plt.savefig('figures/Data_Story_Specific.png')
plt.clf()
plt.close()

Let us return to the initial question we asked of the dataset: are young single males the driving force on Black Friday? On the one hand, we know from the dataset that the majority of Black Friday customers are male; in fact, there are about three times as many male as female customers. On the other hand, we bring in other factors (marital status and high purchase amount) to test the hypothesis further. Because we measure profitability by sales (purchase amount), we focus on purchases around the 20000 mark, since higher amounts typically imply bigger-ticket items. For age, we look at the combined 18-35 group. We can confirm this in later analyses, but the blue color (male customers) appears to dominate this area.

Note: The legal marriage age of 18 explains why the 0-17 age group has no data in the second graph.
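The hypothesis about young single males can also be probed numerically. This is a rough sketch, not the notebook's own method. It assumes `Marital_Status` is coded as 0/1 (the meaning of each code is an assumption to verify against the data), and the helper names, the 20000 threshold default, and the toy rows are illustrative only.

```python
import pandas as pd


def gender_ratio(data: pd.DataFrame) -> float:
    """Ratio of male rows to female rows."""
    counts = data['Gender'].value_counts()
    return counts['M'] / counts['F']


def high_purchase_share(data: pd.DataFrame,
                        threshold: int = 20000,
                        ages=('18-25', '26-35')) -> pd.Series:
    """Among purchases at or above `threshold` made by the given age
    groups, return the share contributed by each Gender / Marital_Status
    combination."""
    mask = (data['Purchase'] >= threshold) & data['Age'].isin(list(ages))
    subset = data[mask]
    return subset.groupby(['Gender', 'Marital_Status']).size() / len(subset)


# Toy rows, made up purely to show the calls (not taken from the dataset).
toy = pd.DataFrame({
    'Gender': ['M', 'M', 'M', 'F', 'M', 'F'],
    'Age': ['18-25', '26-35', '26-35', '26-35', '46-50', '18-25'],
    'Marital_Status': [0, 0, 1, 0, 0, 1],
    'Purchase': [21000, 20500, 23000, 20000, 22000, 9000],
})

ratio = gender_ratio(toy)
shares = high_purchase_share(toy)
print(ratio)
print(shares)
```

Running both helpers on `df` would give the male-to-female ratio and the share of high-value purchases per gender and marital status in the 18-35 range, which can then be compared with what the plot suggests.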