# EDA on E-Commerce Shipping Data

### Data Description
- ID: ID number of the customer.
- Warehouse block: The company has a big warehouse which is divided into blocks such as A, B, C, D, E.
- Mode of shipment: The company ships products in multiple ways, such as Ship, Flight and Road.
- Customer care calls: The number of calls made to enquire about the shipment.
- Customer rating: The rating given by each customer. 1 is the lowest (worst), 5 is the highest (best).
- Cost of the product: Cost of the product in US dollars.
- Prior purchases: The number of prior purchases.
- Product importance: The company has categorized each product by importance: low, medium, high.
- Gender: Male and Female.
- Discount offered: Discount offered on that specific product.
- Weight in gms: The weight in grams.
- Reached on time: The target variable, where 1 indicates that the product has NOT reached on time and 0 indicates it has reached on time.

In [None]:
# necessary imports

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

import warnings
warnings.filterwarnings('ignore')

plt.style.use('fivethirtyeight')
%matplotlib inline

In [None]:
# loading data

df = pd.read_csv('../input/customer-analytics/Train.csv')
df.head()

In [None]:
df.shape # looking at the shape of data

In [None]:
df.describe() # getting description of data

In [None]:
df.info() # taking a look at info of the data.

In [None]:
# checking for null values using missingno module

import missingno as msno
msno.bar(df, color = 'lightblue')
plt.title('Checking for Null Values\n', fontsize = 40)
plt.show()
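
The bar chart gives a visual check; a per-column count of missing values can complement it. The sketch below runs on a small made-up frame (`toy`, which is hypothetical and not taken from `Train.csv`) so that it is self-contained. On the notebook's data the equivalent call would be `df.isnull().sum()`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df (hypothetical values, not from Train.csv)
toy = pd.DataFrame({
    'Warehouse_block': ['A', 'B', None, 'F'],
    'Customer_rating': [3.0, np.nan, 5.0, np.nan],
    'Gender': ['M', 'F', 'F', 'M'],
})

# Number of missing values in each column
null_counts = toy.isnull().sum()
print(null_counts)

# Share of missing values per column, as a percentage
null_pct = toy.isnull().mean() * 100
print(null_pct.round(1))
```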

In [None]:
# dropping unwanted column using drop method

df.drop('ID', axis = 1, inplace = True)
df.head()

In [None]:
# heatmap of the data for checking the correlation between the features and target column.

plt.figure(figsize = (18, 7))
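# correlating numeric columns only; object columns such as Gender would raise an error in recent pandas versions
sns.heatmap(df.select_dtypes(include = 'number').corr(), annot = True, fmt = '0.2f', annot_kws = {'size' : 15}, linewidths = 5, linecolor = 'orange')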
plt.show()

Conclusions from the correlation matrix:
- Discount offered has a positive correlation of 0.40 with Reached on Time or Not. Because 1 means the product did NOT reach on time, larger discounts are associated with late delivery.
- Weight in grams has a negative correlation of -0.27 with Reached on Time or Not.
- Discount offered and weight in grams have a negative correlation of -0.38.
- Customer care calls and weight in grams have a negative correlation of -0.28.
- Customer care calls and cost of the product have a positive correlation of 0.32.
- Prior purchases and customer care calls have a slight positive correlation.
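
To read the same information without scanning the whole heatmap, the correlations with the target can be pulled out and ranked by absolute value. This is a minimal sketch on a made-up frame (`toy` and its values are hypothetical); the column names are the ones used elsewhere in this notebook.

```python
import pandas as pd

# Toy frame standing in for df (hypothetical values, not from Train.csv)
toy = pd.DataFrame({
    'Customer_care_calls': [2, 3, 4, 5, 3, 6],
    'Prior_purchases': [2, 3, 3, 4, 2, 5],
    'Gender': ['M', 'F', 'F', 'M', 'F', 'M'],
    'Reached.on.Time_Y.N': [1, 1, 0, 0, 1, 0],
})

target = 'Reached.on.Time_Y.N'

# Keep numeric columns only; object columns such as Gender cannot be correlated
corr = toy.select_dtypes(include='number').corr()

# Correlation of every numeric feature with the target, strongest first
target_corr = corr[target].drop(target)
order = target_corr.abs().sort_values(ascending=False).index
target_corr = target_corr.reindex(order)
print(target_corr)
```

Sorting by absolute value keeps strong negative correlations near the top instead of pushing them to the bottom of the list.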

In [None]:
df.head() # looking at first five rows of the data

## Exploratory Data Analysis (EDA)

**Checking value counts of categorical columns**

In [None]:
# these plots show the count of each category in the categorical columns
# creating a list of categorical columns
cols = ['Warehouse_block', 'Mode_of_Shipment', 'Customer_care_calls', 'Customer_rating',
        'Prior_purchases', 'Product_importance', 'Gender', 'Reached.on.Time_Y.N']

plt.figure(figsize = (16, 20))

# plotting the countplot of each categorical column.

for plotnumber, col in enumerate(cols, start = 1):
    ax = plt.subplot(4, 2, plotnumber)
    sns.countplot(x = col, data = df, ax = ax, palette = 'rocket')
    ax.set_title(f"\n{col} Value Counts\n", fontsize = 20)

plt.tight_layout()
plt.show()

From the above plots, we can conclude the following:
- Warehouse block F has more records than any other warehouse block.
- In the mode of shipment column we can clearly see that Ship delivers most of the products to the customers.
- Most of the customers call the customer care center 3 or 4 times.
- Customer ratings do not show much variation.
- Most of the customers have 3 prior purchases.
- Most of the products are of low importance.
- The Gender column doesn't have much variance.
- More products failed to reach on time than reached on time.
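
The last point can be quantified with `value_counts(normalize=True)`. The sketch below uses a made-up target column (`reached`, hypothetical values) and maps the 0/1 codes to labels following the data description, where 1 means NOT on time.

```python
import pandas as pd

# Toy target column standing in for df['Reached.on.Time_Y.N'] (hypothetical values)
reached = pd.Series([1, 1, 0, 1, 0, 1, 1, 0, 1, 0], name='Reached.on.Time_Y.N')

# Per the data description: 1 = NOT on time, 0 = on time
labels = reached.map({1: 'Not on time', 0: 'On time'})

# Absolute counts and percentage share of each class
counts = labels.value_counts()
shares = labels.value_counts(normalize=True).mul(100).round(1)

summary = pd.DataFrame({'count': counts, 'percent': shares})
print(summary)
```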


### Exploring relation of categorical columns with reached on time or not

In [None]:
# selecting the categorical (object dtype) columns

object_columns = df.select_dtypes(include = ['object'])
object_columns.head()

### Warehouse block

In [None]:
# looking at the warehouse column and what are the categories present in it

warehouse = object_columns['Warehouse_block'].value_counts().reset_index()
warehouse.columns = ['warehouse', 'value_counts']
fig = px.pie(warehouse, names = 'warehouse', values = 'value_counts', 
             color_discrete_sequence = px.colors.sequential.matter_r, width = 650, height = 400,
             hole = 0.5)
fig.update_traces(textinfo = 'percent+label')

In [None]:
# making a countplot of the warehouse column, split by whether the product reached on time or not.

plt.figure(figsize = (17, 6))
sns.countplot(x = 'Warehouse_block', hue = 'Reached.on.Time_Y.N', data = df, palette='rocket')
plt.show()
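
Raw counts are hard to compare when the blocks differ in size, so a late-delivery rate per block can help. The helper `late_rate_by` below is a hypothetical name introduced for this sketch, and `toy` is made-up data. Because the target is coded 0/1, the group mean is the fraction of shipments that did not reach on time.

```python
import pandas as pd

def late_rate_by(data, column, target='Reached.on.Time_Y.N'):
    """Percentage of late shipments (target == 1) per category of `column`."""
    # The target is 0/1, so the group mean is the fraction of late shipments
    rates = data.groupby(column)[target].mean().mul(100).round(1)
    return rates.sort_values(ascending=False)

# Toy frame standing in for df (hypothetical values, not from Train.csv)
toy = pd.DataFrame({
    'Warehouse_block': ['A', 'A', 'B', 'B', 'F', 'F', 'F', 'F'],
    'Reached.on.Time_Y.N': [1, 0, 1, 1, 1, 0, 0, 0],
})

rates = late_rate_by(toy, 'Warehouse_block')
print(rates)
```

The same helper could be called with any other categorical column, for example `late_rate_by(df, 'Mode_of_Shipment')`.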

### Gender

In [None]:
# looking at the gender column and what are the categories present in it

gender = object_columns['Gender'].value_counts().reset_index()
gender.columns = ['Gender', 'value_counts']
fig = px.pie(gender, names = 'Gender', values = 'value_counts', color_discrete_sequence = 
            px.colors.sequential.Darkmint_r, width = 650, height = 400, hole = 0.5)
fig.update_traces(textinfo = 'percent+label')

In [None]:
# making a countplot of the gender column, split by whether the product reached on time or not.

plt.figure(figsize = (17, 6))
sns.countplot(x = 'Gender', hue = 'Reached.on.Time_Y.N', data = df, palette='rocket')
plt.show()
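
The numbers behind a count plot like this can be tabulated with `pd.crosstab`. The sketch below uses a made-up frame (`toy`, hypothetical values) and shows both the raw counts and the row-wise shares for each gender.

```python
import pandas as pd

# Toy frame standing in for df (hypothetical values, not from Train.csv)
toy = pd.DataFrame({
    'Gender': ['F', 'F', 'F', 'M', 'M', 'M', 'M', 'F'],
    'Reached.on.Time_Y.N': [1, 0, 1, 1, 1, 0, 0, 1],
})

# Rows: gender, columns: target value (0 = on time, 1 = not on time)
counts = pd.crosstab(toy['Gender'], toy['Reached.on.Time_Y.N'])
print(counts)

# normalize='index' turns each row into shares that sum to 1
row_share = pd.crosstab(toy['Gender'], toy['Reached.on.Time_Y.N'],
                        normalize='index').round(2)
print(row_share)
```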