# **Kaggle Merchandise Conundrum**

## Store Chains: KaggleMart and KaggleRama
## Countries: Norway, Sweden and Finland
## Products: Mug, Hat, and Sticker


Let us explore the data with Python.
In this notebook, we'll cover basic exploratory data analysis (EDA) using pandas, and visualization using the matplotlib and seaborn modules.
We'll also learn how to work with the datetime data type.

## Importing Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Importing dataset

In [None]:
df=pd.read_csv('../input/tabular-playground-series-jan-2022/train.csv',index_col=["row_id"])
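
As a small illustration of what index_col does, here is a minimal sketch that reads a few made-up rows from an in-memory string instead of the real file. The column names are the ones used later in this notebook; the values and the date format are assumptions for illustration only. The parse_dates argument is optional and is shown only as an alternative to converting the date column later.

```python
import io
import pandas as pd

# Made-up rows; only the column names mirror the notebook's dataset.
csv_text = """row_id,date,country,store,product,num_sold
0,2015-01-01,Finland,KaggleMart,Mug,100
1,2015-01-01,Norway,KaggleRama,Hat,250
"""

# index_col turns row_id into the DataFrame index instead of a regular column.
# parse_dates asks pandas to parse the date column while reading.
sample = pd.read_csv(io.StringIO(csv_text), index_col="row_id", parse_dates=["date"])

print(sample.index.name)
print(sample.dtypes)
```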

## Basic EDA

Now that we have loaded the dataset, let's take a high-level look at the data.

In [None]:
print(df.head())
print('--'*50)
print(df.tail())

Let's look at the unique values of the categorical columns in our dataset.

In [None]:
df['country'].unique()

In [None]:
df['store'].unique()

In [None]:
df['product'].unique()
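
The three cells above can also be folded into one loop. The sketch below uses a tiny made-up frame with the same column names, so it runs without the competition file; the rows are illustrative, not taken from the dataset.

```python
import pandas as pd

# Illustrative rows only; the real data is loaded from train.csv above.
sample = pd.DataFrame({
    "country": ["Norway", "Sweden", "Finland", "Norway"],
    "store": ["KaggleMart", "KaggleRama", "KaggleMart", "KaggleRama"],
    "product": ["Mug", "Hat", "Sticker", "Mug"],
})

# Collect the sorted unique values of each categorical column in one pass.
unique_values = {col: sorted(sample[col].unique()) for col in ["country", "store", "product"]}

for col, values in unique_values.items():
    print(col, values)
```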

Let's have a look at the data type of each column.

In [None]:
df.info()

# Store-wise Split of Sales

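How many items are sold in each store chain? The three panels show the number of rows, the total number sold, and the mean number sold per row.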

In [None]:
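plt.style.use('ggplot')
plt.figure(figsize=(12,5))

# Panel 1: number of rows (records) per store
plt.subplot(1,3,1)
sns.countplot(x='store',data=df)

# Panel 2: total units sold per store
plt.subplot(1,3,2)
sns.barplot(x='store',y='num_sold',data=df,estimator=sum)

# Panel 3: mean units sold per row (the default estimator of barplot)
plt.subplot(1,3,3)
sns.barplot(x='store',y='num_sold',data=df)

plt.tight_layout()
plt.show()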
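
The numbers behind these three panels can also be read off a table. The following is a minimal sketch on made-up values, assuming only the store and num_sold column names from the dataset: count corresponds to the countplot, sum to the barplot with estimator=sum, and mean to the default barplot. The same pattern should carry over to the country and product columns.

```python
import pandas as pd

# Made-up values for illustration; column names match the notebook's dataset.
sample = pd.DataFrame({
    "store": ["KaggleMart", "KaggleMart", "KaggleRama", "KaggleRama"],
    "num_sold": [100, 200, 300, 500],
})

# One row per store: number of records, total sold, mean sold per record.
store_summary = sample.groupby("store")["num_sold"].agg(["count", "sum", "mean"])
print(store_summary)
```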

# Country-wise Split of Sales

How many items are sold in each country?

In [None]:
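plt.style.use('ggplot')
plt.figure(figsize=(12,5))

# Panel 1: number of rows (records) per country
plt.subplot(1,3,1)
sns.countplot(x='country',data=df)

# Panel 2: total units sold per country
plt.subplot(1,3,2)
sns.barplot(x='country',y='num_sold',data=df,estimator=sum)

# Panel 3: mean units sold per row (the default estimator of barplot)
plt.subplot(1,3,3)
sns.barplot(x='country',y='num_sold',data=df)

plt.tight_layout()
plt.show()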

# Product-wise Split of Sales

How many items are sold in each product category?

In [None]:
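plt.style.use('ggplot')
plt.figure(figsize=(12,5))

# Panel 1: number of rows (records) per product
plt.subplot(1,3,1)
sns.countplot(x='product',data=df)

# Panel 2: total units sold per product
plt.subplot(1,3,2)
sns.barplot(x='product',y='num_sold',data=df,estimator=sum)

# Panel 3: mean units sold per row (the default estimator of barplot)
plt.subplot(1,3,3)
sns.barplot(x='product',y='num_sold',data=df)

plt.tight_layout()
plt.show()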

# Convert date to pandas datetime format 
## Additional columns for year-month-day and dayname

To plot and work with the daily sales data, we need to convert the date column from plain text to an appropriate data type, namely the pandas datetime type (datetime64).

In [None]:
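# Let pandas infer the date format; a hard-coded format raises an error in recent pandas if the separator does not match
df['date'] = pd.to_datetime(df['date'])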
df.info()

In [None]:
df['year']= df['date'].dt.year
df['month'] = df['date'].dt.month_name()
df['day'] = df['date'].dt.day
df['dayname']=df['date'].dt.day_name()

Let's have a peek at the transformed DataFrame.

In [None]:
df.head()
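
To see what the .dt accessor returns, here is a minimal sketch on three hand-picked dates. The is_weekend column is a hypothetical extra that is not part of this notebook; it is included only to show that further calendar features can be derived the same way.

```python
import pandas as pd

# Three hand-picked dates, parsed into the pandas datetime type.
dates = pd.Series(pd.to_datetime(["2015-01-01", "2015-01-03", "2015-12-31"]))

features = pd.DataFrame({
    "date": dates,
    "year": dates.dt.year,
    "month": dates.dt.month_name(),
    "day": dates.dt.day,
    "dayname": dates.dt.day_name(),
    # Hypothetical extra feature: dayofweek is 0 for Monday, so 5 and 6 are the weekend.
    "is_weekend": dates.dt.dayofweek >= 5,
})
print(features)
```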

# Time-Series EDA

Let's have a look at how total sales progress over time.

In [None]:
# Sum only num_sold; summing every column would also try to add up the text columns
sld_date = df.groupby('date')['num_sold'].sum().reset_index()
plt.figure(figsize=(20,8))
sns.lineplot(x='date', y='num_sold', data=sld_date)
plt.title('Number sold over time', fontsize=14)
plt.show()
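
Daily totals can look noisy, so one option is to aggregate them to a coarser frequency before plotting. The sketch below is an optional extra rather than a step from this notebook. It uses a made-up daily series (0, 1, 2, ... units per day) and resamples it to monthly totals; "MS" labels each bucket by the first day of the month.

```python
import pandas as pd

# Made-up daily series: 90 consecutive days starting 2015-01-01.
daily = pd.DataFrame({
    "date": pd.date_range("2015-01-01", periods=90, freq="D"),
    "num_sold": range(90),
})

# resample needs a datetime index; sum the daily values within each calendar month.
monthly = daily.set_index("date")["num_sold"].resample("MS").sum()
print(monthly)
```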

# Country-wise Sales Progression
This will give us an idea of the seasonality and holiday effects in each individual country.

In [None]:
# Draw all three countries in a single figure, one panel per country
plt.figure(figsize=(16,12))

df_nrw = df[df['country']=='Norway'].groupby('date')['num_sold'].sum().reset_index()
plt.subplot(3,1,1)
sns.lineplot(x='date', y='num_sold', data=df_nrw)
plt.title('Number sold in Norway', fontsize=14)

df_fnl = df[df['country']=='Finland'].groupby('date')['num_sold'].sum().reset_index()
plt.subplot(3,1,2)
sns.lineplot(x='date', y='num_sold', data=df_fnl)
plt.title('Number sold in Finland', fontsize=14)

df_swe = df[df['country']=='Sweden'].groupby('date')['num_sold'].sum().reset_index()
plt.subplot(3,1,3)
sns.lineplot(x='date', y='num_sold', data=df_swe)
plt.title('Number sold in Sweden', fontsize=14)

# Keep titles and axis labels of neighbouring panels from overlapping
plt.tight_layout()
plt.show()
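
The three per-country frames can also be built in one step with a pivot table, which gives one column per country and one row per date. This is a minimal sketch on made-up numbers, assuming the date, country and num_sold column names used above.

```python
import pandas as pd

# Made-up values: two days, one record per country per day.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01"] * 3 + ["2015-01-02"] * 3),
    "country": ["Norway", "Finland", "Sweden"] * 2,
    "num_sold": [10, 20, 30, 40, 50, 60],
})

# Rows are dates, columns are countries, cells are the daily totals.
by_country = sample.pivot_table(index="date", columns="country", values="num_sold", aggfunc="sum")
print(by_country)
```

A wide table like this can then be plotted one column at a time, or compared across countries directly.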