In [None]:
from IPython import display
display.Image("https://co-well.vn/wp-content/uploads/2019/12/why-ecommerce-is-important-with-business.png")

# **Table of Contents**

* [Importing Libraries](#section-one)
* [Data Cleaning](#section-two)
    - [Missing Values](#subsection-one)
    - [Duplicates](#subsection-two)
* [Analyzing Ship Modes](#section-three) 
* [Analyzing Segments](#section-four)  
* [Analyzing Categories](#section-five)   
* [Analyzing Sub Categories](#section-six)
* [Analyzing Discounts](#section-seven)
* [Analyzing Products](#section-eight)
* [Analyzing Customers](#section-nine)
* [Conclusion](#section-ten)   





- This dataset contains records of online e-commerce services and their customers in the USA.

<a id="section-one"></a>
# Importing Libraries

In [None]:
# importing libraries
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
file_path = '../input/online-store-dataset/SuperstoreDataset.csv'

data = pd.read_csv(file_path)
data = data.drop('Unnamed: 17', axis = 1) # dropping unnecessary column
data

In [None]:
data.columns

<a id="section-two"></a>
# Data Cleaning

<a id="subsection-one"></a>
## Missing Values

In [None]:
data.info()

- No missing values are present.
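
`data.info()` only shows non-null counts per column, so the conclusion above has to be read off by comparing those counts with the row total. Counting nulls explicitly is more direct. A minimal sketch, using a small made-up frame (`sample`, hypothetical) in place of the real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset; values are made up for illustration.
sample = pd.DataFrame({
    "Ship Mode": ["First Class", "Same Day", None],
    "Sales": [10.0, np.nan, 30.0],
    "Profit": [1.0, 2.0, 3.0],
})

# Count missing values per column; a column without gaps reports 0.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Single yes/no answer for the whole frame.
has_missing = bool(sample.isna().any().any())
print(has_missing)
```

On the real data, `data.isna().sum()` should report 0 for every column if no values are missing.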

## Number of unique values in each column

In [None]:
data.nunique()

<a id="subsection-two"></a>
## Duplicates

In [None]:
data[data.duplicated()]

In [None]:
data.loc[3405:3407]

In [None]:
data.drop_duplicates(inplace = True) # dropping duplicates

In [None]:
data.loc[3405:3407]

- The duplicate row with index 3406 has been removed.
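
The gap at index 3406 appears because `drop_duplicates` keeps the first occurrence of each row and leaves the original index labels untouched. A small sketch of that behaviour, on a made-up frame (`sample`, hypothetical) rather than the real data:

```python
import pandas as pd

# Hypothetical stand-in; rows 1 and 2 are exact duplicates.
sample = pd.DataFrame({
    "Order ID": ["A-1", "A-2", "A-2", "A-3"],
    "Sales": [10.0, 20.0, 20.0, 30.0],
})

n_before = len(sample)
# duplicated() flags repeats only, not the first occurrence.
n_duplicates = int(sample.duplicated().sum())

# Keeps the first occurrence and preserves the original index labels.
deduplicated = sample.drop_duplicates()

print(n_before, n_duplicates, len(deduplicated))
print(deduplicated.index.tolist())
```

If a contiguous index is preferred afterwards, `reset_index(drop=True)` renumbers the rows.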

<a id="section-three"></a>
# Analyzing Ship Modes

- There are four different ship modes available.
1. Standard Class
2. Second Class
3. First Class
4. Same Day

In [None]:
fig, ax = plt.subplots(figsize=(6,5))
data.groupby('Ship Mode').Sales.sum().sort_values(ascending = False).plot.barh(width=.5)

# removing structural elements (axes, ticks..) to improve readability, reduce distractions and to focus on analysing data

ax.xaxis.tick_top() # placing x ticks to the top
ax.tick_params(left=False) # removing ticks in the left side
plt.ylabel('')# removing y label
for location in ['left', 'right', 'bottom', 'top']: 
     ax.spines[location].set_visible(False) # removing all spines
        
plt.title('Ship Modes and Sales in million $',size = 17,weight = 'bold')
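
The same decluttering steps (ticks on top, no left tick marks, no spines, no y label) are repeated for every chart in this notebook. One possible way to avoid the repetition is a small helper. The function name `plot_ranked_barh` and the toy totals below are hypothetical and only illustrate the idea:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_ranked_barh(series, title, figsize=(6, 5), width=0.5):
    """Draw a decluttered horizontal bar chart of `series` (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=figsize)
    series.sort_values(ascending=False).plot.barh(ax=ax, width=width)
    ax.xaxis.tick_top()          # place x ticks at the top
    ax.tick_params(left=False)   # remove tick marks on the left side
    ax.set_ylabel("")            # remove the y label
    for location in ["left", "right", "bottom", "top"]:
        ax.spines[location].set_visible(False)  # remove all spines
    ax.set_title(title, size=17, weight="bold")
    return ax


# Made-up totals, only to exercise the helper.
toy_sales = pd.Series(
    [300.0, 120.0, 80.0, 40.0],
    index=["Standard Class", "Second Class", "First Class", "Same Day"],
    name="Sales",
)
ax = plot_ranked_barh(toy_sales, "Ship Modes and Sales")
```

With the real data, a call such as `plot_ranked_barh(data.groupby('Ship Mode').Sales.sum(), 'Ship Modes and Sales')` would follow the same steps as the cell above.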

In [None]:
fig, ax = plt.subplots(figsize=(6,5))
data.groupby('Ship Mode').Profit.sum().sort_values(ascending = False).plot.barh(width=.5)
ax.xaxis.tick_top()
ax.tick_params(left=False)
plt.ylabel('')# removing y label
plt.title('Ship Modes and Profit',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)

In [None]:
fig, ax = plt.subplots(figsize=(4,5))
data.groupby('Ship Mode').Quantity.sum().sort_values(ascending = False).plot.barh(width=.5)
ax.xaxis.tick_top()
ax.tick_params(left=False)
plt.ylabel('')# removing y label
plt.title('Ship Modes and Quantity',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)

In [None]:
fig, ax = plt.subplots(figsize=(4,5))
data.groupby('Ship Mode').Discount.mean().sort_values(ascending = False).plot.barh(width=.5)
ax.xaxis.tick_top()
ax.tick_params(left=False)
plt.ylabel('')# removing y label
plt.title('Ship Modes and Average Discount',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)

- Standard Class is the most profitable ship mode, even though First Class receives a higher discount on average.
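
Comparing profit and average discount across four separate charts is easier with the numbers side by side. A sketch using pandas named aggregation, run on made-up rows (`sample` and the output column names are hypothetical):

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the real dataset.
sample = pd.DataFrame({
    "Ship Mode": ["Standard Class", "Standard Class", "First Class", "First Class"],
    "Sales": [100.0, 200.0, 50.0, 150.0],
    "Profit": [20.0, 40.0, 5.0, 10.0],
    "Quantity": [2, 3, 1, 4],
    "Discount": [0.0, 0.2, 0.2, 0.4],
})

# One row per ship mode, with the four metrics plotted above.
summary = sample.groupby("Ship Mode").agg(
    total_sales=("Sales", "sum"),
    total_profit=("Profit", "sum"),
    total_quantity=("Quantity", "sum"),
    mean_discount=("Discount", "mean"),
).sort_values("total_profit", ascending=False)
print(summary)
```

The same call works for any other grouping column in this notebook, such as `'Segment'` or `'Category'`.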

<a id="section-four"></a>
# Analyzing Segments

- There are three different Segments
1. Consumer
2. Corporate
3. Home Office

In [None]:
fig, ax = plt.subplots(figsize=(4,5))
data.groupby('Segment').Sales.sum().plot.barh(width=.4)
ax.xaxis.tick_top()
ax.tick_params(left=False)
plt.ylabel('')# removing y label
plt.title('Segments and Total Sales in million $',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)

In [None]:
fig, ax = plt.subplots(figsize=(4,5))
data.groupby('Segment').Profit.sum().sort_values(ascending = False).plot.barh(width=.4)
ax.xaxis.tick_top()
plt.ylabel('')# removing y label
ax.tick_params(left=False)
plt.title('Segments and Total Profit in $',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)

In [None]:
fig, ax = plt.subplots()
data.groupby('Segment').Quantity.sum().sort_values(ascending = False).plot.barh(width=.5)
ax.xaxis.tick_top()
plt.ylabel('')# removing y label
ax.tick_params(left=False)
plt.title('Segments and Total Quantity',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)

In [None]:
fig, ax = plt.subplots()
data.groupby('Segment').Discount.mean().sort_values(ascending = False).plot.barh(width=.5)
ax.xaxis.tick_top()
plt.ylabel('')# removing y label
ax.tick_params(left=False)
plt.title('Segments and Average Discount',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)

- The Consumer Segment is the most profitable, but on average the Corporate Segment receives the same discount as Consumer.

<a id="section-five"></a>
# Analyzing Categories

- There are three different Categories
1. Technology 
2. Furniture
3. Office Supplies

In [None]:
fig, ax = plt.subplots()
data.groupby('Category').Sales.sum().sort_values(ascending = False).plot.barh(width=.5)
ax.xaxis.tick_top()
ax.tick_params(left=False)
plt.ylabel('')# removing y label
plt.title('Categories and Total Sale in $',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)

In [None]:
fig, ax = plt.subplots()
data.groupby('Category').Profit.sum().sort_values(ascending = False).plot.barh(width=.5)
ax.xaxis.tick_top()
ax.tick_params(left=False)
plt.ylabel('')# removing y label
plt.title('Categories and Total Profit',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)

In [None]:
fig, ax = plt.subplots()
totals = data.groupby('Category')[['Profit', 'Sales']].sum() # aggregate once; index order defines the labels
Category = totals.index.tolist() # labels taken from the data instead of a hard-coded list
ypos = np.arange(len(Category))
plt.yticks(ypos, Category)
plt.barh(ypos-.2, totals['Profit'], height=.4, label = 'Profit')
plt.barh(ypos+.2, totals['Sales'], height=.4, label = 'Sale')
plt.legend()
ax.xaxis.tick_top()
ax.tick_params(left=False)
plt.ylabel('')# removing y label
plt.title('Profit and Sales of Categories',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)

In [None]:
fig, ax = plt.subplots()
data.groupby('Category').Quantity.sum().sort_values(ascending = False).plot.barh(width=.5)
ax.xaxis.tick_top()
ax.tick_params(left=False)
plt.title('Categories and Total Quantity',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)
plt.ylabel('')

In [None]:
fig, ax = plt.subplots()
data.groupby('Category').Discount.mean().sort_values(ascending = False).plot.barh(width=.5)
ax.xaxis.tick_top()
ax.tick_params(left=False)
plt.title('Categories and Average Discount',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)
plt.ylabel('')

- The Technology Category has the highest sales and profit.
- Even though the Furniture Category has more sales than Office Supplies, Office Supplies produces significantly more profit than Furniture.
- Total Quantity is highest for Office Supplies and lowest for Technology, but Technology is more profitable.
- Furniture gets the highest average discount, followed by Office Supplies and Technology, but profit is distributed in the reverse order.
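
The gap between sales and profit can be made concrete by expressing profit as a share of sales. This margin is an added metric, not part of the original analysis, and the rows below are made up (`sample` is hypothetical):

```python
import pandas as pd

# Made-up rows; not taken from the real dataset.
sample = pd.DataFrame({
    "Category": ["Furniture", "Furniture", "Office Supplies", "Technology"],
    "Sales": [400.0, 600.0, 800.0, 1200.0],
    "Profit": [10.0, 20.0, 120.0, 180.0],
})

totals = sample.groupby("Category")[["Sales", "Profit"]].sum()
# Profit margin = total profit / total sales (an added metric for illustration).
totals["Margin"] = totals["Profit"] / totals["Sales"]
print(totals.sort_values("Margin", ascending=False))
```

In this toy data Furniture has more sales than Office Supplies but less profit, which is the pattern the bullets above describe.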

<a id="section-six"></a>
# Analyzing Sub Categories

- There are 17 different Sub-Categories available. They are:

In [None]:
data['Sub-Category'].unique()

In [None]:
fig, ax = plt.subplots(figsize=(6,8))
data.groupby('Sub-Category').Sales.sum().sort_values(ascending = False).plot.barh()
ax.xaxis.tick_top()
ax.tick_params(left=False)
ax.axvline(x=150000, ymin=0.045, c='grey', alpha=0.5) # vertical line at 150000
plt.ylabel('')# removing y label
plt.title('Sub Categories and Total sales in $',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)

In [None]:
fig, ax = plt.subplots(figsize=(6,8))
data.groupby('Sub-Category').Profit.sum().sort_values(ascending = False).plot.barh()
ax.xaxis.tick_top()
ax.tick_params(left=False)
#ax.axvline(x=150000, ymin=0.045, c='grey', alpha=0.5)
plt.ylabel('')# removing y label
plt.title('Sub Categories and net Profit in $',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)
ax.axvline(x=0, ymin=0.045, c='grey', alpha=0.5)

## Total loss made by each Sub-Category

In [None]:
# select Profit before summing so that non-numeric columns are not aggregated
data[data['Profit'] < 0].groupby('Sub-Category')['Profit'].sum().sort_values()

## Sub-Categories with Maximum Average Discount

In [None]:
data.groupby('Sub-Category').Discount.mean().sort_values(ascending = False).head(10)

- On average, Binders gets the highest discount, followed by Machines, Tables and Bookcases.
- Tables, Bookcases and Supplies are not profitable.
- Even though loss-making Binders orders add up to a loss of $38,510.50, the Binders Sub-Category is net profitable.
- Even though total sales of Copiers are around average, Copiers are the most profitable Sub-Category.
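
The loss table and the net-profit chart measure different things: the first sums only rows with negative profit, the second sums all rows. A sketch of the two side by side, on made-up rows (`sample` is hypothetical):

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Sub-Category": ["Binders", "Binders", "Binders", "Tables", "Tables"],
    "Profit": [-30.0, 50.0, 20.0, -40.0, 10.0],
})

# Sum of negative-profit rows only (what the loss table reports).
gross_loss = sample[sample["Profit"] < 0].groupby("Sub-Category")["Profit"].sum()
# Sum over all rows (what the net-profit chart reports).
net_profit = sample.groupby("Sub-Category")["Profit"].sum()

comparison = pd.DataFrame({"gross_loss": gross_loss, "net_profit": net_profit})
print(comparison)
```

A Sub-Category can therefore show a sizeable gross loss and still be net profitable, as Binders is in this toy example.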

<a id="section-seven"></a>
# Analyzing Discounts

In [None]:
fig, ax = plt.subplots(figsize=(4,7))
data.groupby('Discount').Quantity.sum().sort_values(ascending = False).plot.barh(width=.4)
ax.xaxis.tick_top()
ax.tick_params(left=False)
plt.title('Discounts and Total Quantity',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)
plt.ylabel('')

In [None]:
fig, ax = plt.subplots(figsize=(4,7))
data.groupby('Discount').Sales.sum().sort_values(ascending = False).plot.barh(width=.4)
ax.xaxis.tick_top()
ax.tick_params(left=False)
plt.title('Discounts and Total Sales',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)
plt.ylabel('')

In [None]:
fig, ax = plt.subplots(figsize=(4,7))
data.groupby('Discount').Profit.sum().sort_values(ascending = False).plot.barh(width=.4)
ax.xaxis.tick_top()
ax.tick_params(left=False)
plt.title('Discounts and Total Profit',size = 17,weight = 'bold')
for location in ['left', 'right', 'bottom', 'top']:
     ax.spines[location].set_visible(False)
ax.axvline(x=0, ymin=0.045, c='grey', alpha=0.5)
plt.ylabel('')

## Products with Maximum Average Discount

In [None]:
data.groupby('Product Name').Discount.mean().sort_values(ascending = False).head(10)

## Products with highest Discount

In [None]:
data.groupby('Product Name').Discount.max().sort_values(ascending = False).head(10)

- Most products received a discount of 0 or 0.2.
- Products with discounts greater than 0.2 are not profitable.
- Many products received discounts of up to 0.8.
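
These observations can be checked numerically instead of read off the charts. A sketch on made-up rows (`sample` and the band labels are hypothetical) that computes the share of rows at each discount level and the total profit on either side of the 0.2 threshold:

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Discount": [0.0, 0.0, 0.2, 0.2, 0.2, 0.4, 0.8],
    "Profit": [30.0, 20.0, 10.0, 5.0, 8.0, -15.0, -40.0],
})

# Share of rows at each discount level.
discount_share = sample["Discount"].value_counts(normalize=True).sort_index()

# Total profit for discounts up to 0.2 versus above 0.2.
above_threshold = sample["Discount"] > 0.2
profit_by_band = sample.groupby(above_threshold)["Profit"].sum()
# Boolean groups are sorted False, True, so the labels follow that order.
profit_by_band.index = ["discount <= 0.2", "discount > 0.2"]

print(discount_share)
print(profit_by_band)
```

On the real data the same steps apply with `data` in place of `sample`.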

<a id="section-eight"></a>
# Analyzing Products

## Top Ten Profitable Products

In [None]:
data.groupby('Product Name').Profit.sum().sort_values(ascending = False).head(10)

## Ten Least Profitable Products

In [None]:
data.groupby('Product Name').Profit.sum().sort_values().head(10)

## Total Loss Made by Each Product

In [None]:
# select Profit before summing so that non-numeric columns are not aggregated
data[data['Profit'] < 0].groupby('Product Name')['Profit'].sum().sort_values().head(10)

<a id="section-nine"></a>
# Analyzing Customers

## Top Ten Customers by Sales

In [None]:
data.groupby('Customer Name').Sales.sum().sort_values(ascending=False).head(10)

## Top Ten Customers by Profit

In [None]:
data.groupby('Customer Name').Profit.sum().sort_values(ascending=False).head(10)

<a id="section-ten"></a>
# Conclusion

- The Standard Class Ship Mode is the most profitable, even though First Class receives a higher discount on average.

- The Consumer Segment is the most profitable, but on average the Corporate Segment receives the same discount as Consumer.

- The Technology Category has the highest sales and profit.
- Even though the Furniture Category has more sales than Office Supplies, Office Supplies produces significantly more profit than Furniture.
- Total Quantity is highest for Office Supplies and lowest for Technology, but Technology is more profitable.
- Furniture gets the highest average discount, followed by Office Supplies and Technology, but profit is distributed in the reverse order.

- On average, Binders gets the highest discount, followed by Machines, Tables and Bookcases.
- Tables, Bookcases and Supplies are not profitable.
- Even though loss-making Binders orders add up to a loss of $38,510.50, the Binders Sub-Category is net profitable.
- Even though total sales of Copiers are around average, Copiers are the most profitable Sub-Category.

- Most products received a discount of 0 or 0.2.
- Products with discounts greater than 0.2 are not profitable.
- Many products received discounts of up to 0.8.

