### As a business manager, we need to find the weak areas where we can work to make more profit.
### What business problems can we derive by exploring the data?

## Importing Libraries

In [None]:
# importing the required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
import warnings
warnings.filterwarnings('ignore')

## LOADING THE DATA

In [None]:
# importing csv file of data.

df = pd.read_csv('../input/tsf-datasets/SampleSuperstore.csv')

In [None]:
# Preview the first 5 rows of the data.

df.head()

## Shape of the dataset

In [None]:
# Check the shape of the dataset, i.e. the number of rows and columns.

df.shape


## info

In [None]:
# Check each column's data type and whether it contains any null values.

df.info()

In [None]:
# To check the columns of the Dataset

df.columns

## Checking the unique values in columns

In [None]:
# Count the unique values in each column

df.nunique()

**There are no null values in the dataset (see the `df.info()` output above), so we can now proceed with statistical analysis.**
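
`df.nunique()` counts distinct values and does not report missing entries, so a direct null count can make the claim above easier to verify. A minimal sketch, assuming a small made-up frame in place of the Superstore data (the `toy` frame and the helper name `null_report` are illustrative, not part of the original notebook):

```python
import numpy as np
import pandas as pd


def null_report(frame: pd.DataFrame) -> pd.Series:
    """Return the number of missing values in each column."""
    return frame.isnull().sum()


# Hypothetical toy data, not the Superstore dataset.
toy = pd.DataFrame({
    "Sales": [10.0, 20.0, np.nan],
    "Profit": [1.0, 2.0, 3.0],
})

report = null_report(toy)
has_nulls = bool(report.sum() > 0)
print(report)
print("Any nulls:", has_nulls)
```

On the real data, `null_report(df)` would be expected to show zero for every column if the statement above holds.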

## describe

In [None]:
# Summary statistics for the numeric columns

df.describe()

In [None]:
# Total Sales and Profit generated by the Superstore

print('Sales:', df['Sales'].sum())
print('Profit:', df['Profit'].sum())

In [None]:
# Check whether the dataset contains any duplicate rows

df.duplicated().sum()

In [None]:
# Since we have 17 duplicate rows we need to drop them for further Analysis

df = df.drop_duplicates()

In [None]:
# To check the total number of rows and columns after removing Duplicates

df.shape
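
Before dropping duplicates, it can help to look at the repeated rows themselves. A sketch on a hypothetical toy frame (the values are made up): `keep=False` marks every member of a duplicate group, while the default marks only the repeats after the first occurrence.

```python
import pandas as pd

# Hypothetical toy data, not the Superstore dataset.
toy = pd.DataFrame({
    "Region": ["West", "West", "East", "West"],
    "Sales": [100.0, 100.0, 50.0, 100.0],
})

# Every row that belongs to a group of identical rows.
all_dupes = toy[toy.duplicated(keep=False)]

# Number of rows that drop_duplicates() would remove.
n_repeats = int(toy.duplicated().sum())

deduped = toy.drop_duplicates()
print(all_dupes)
print("Rows removed:", n_repeats, "Remaining shape:", deduped.shape)
```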

# REGIONAL ANALYSIS

In [None]:
# Share of transactions made in each region

df.Region.value_counts().plot.pie(autopct="%.1f%%")
plt.show()

**This pie chart shows that the most transactions were made in the West region, followed by East and then Central, with South at the bottom.**

Next, we check the sales and profit generated by each region.

In [None]:
df.groupby('Region')[['Sales','Profit']].sum().plot.bar()
plt.title('SALES AND PROFITS IN EACH REGION')
plt.legend()
plt.show()

**The bar plot shows that the West region has the highest sales and also the highest profit, followed by the East region. The South region is the weakest performer.**

# CUSTOMER ANALYSIS

In [None]:
# Total Sales and Profit in each segment

df.groupby('Segment')[['Sales','Profit']].sum().plot.bar(color=['lightgreen','yellow'])
plt.title('SALES AND PROFIT IN EACH SEGMENT')
plt.legend()
plt.show()


**The graph shows that the Consumer segment buys the most, with the highest total sales.**
**It also contributes the most profit to the Superstore, whereas Home Office purchases the least and adds the least profit.**

Next, we check the ship mode used within each segment.

In [None]:
# To check this we will use countplot 

sns.countplot(x='Segment' , hue='Ship Mode' , data=df)
plt.show()

**In each segment, most transactions were shipped under Standard Class.**

In [None]:
# Histograms of the numeric columns
df.hist(bins=50 ,figsize=(20,15))
plt.show()

In [None]:
# pairplot plots pairwise relationships in a dataset.
# It creates a grid of Axes such that each variable in the data is shared
# on the y-axis across a single row and on the x-axis across a single column.
sns.pairplot(df, hue='Sub-Category')
plt.show()

# PRODUCT ANALYSIS

In [None]:
# To check profit and sales category-wise

In [None]:
df.groupby('Category')[['Sales','Profit']].sum().plot.bar()
plt.title('PROFIT AND SALES CATEGORY WISE')
plt.legend(loc = 1)
plt.show()

**This bar plot shows that Technology has the highest sales and, correspondingly, the highest profit.
Furniture breaks this trend: its sales are also high, yet it earns the least profit.**
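
One way to make the gap between sales and profit explicit is to compute profit as a fraction of sales for each category. A minimal sketch, assuming made-up numbers rather than the Superstore figures (the `Margin` column name is an illustrative choice):

```python
import pandas as pd

# Hypothetical toy data, not the Superstore dataset.
toy = pd.DataFrame({
    "Category": ["Furniture", "Furniture", "Technology", "Technology"],
    "Sales": [500.0, 500.0, 600.0, 400.0],
    "Profit": [10.0, 10.0, 100.0, 100.0],
})

# Total sales and profit per category, then profit per unit of sales.
totals = toy.groupby("Category")[["Sales", "Profit"]].sum()
totals["Margin"] = totals["Profit"] / totals["Sales"]
print(totals.sort_values("Margin"))
```

A category with high sales but a low margin, like Furniture in this toy example, is the pattern the bar plot above points to.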

In [None]:
df[df['Category'] == 'Furniture'].groupby('Sub-Category')[['Sales','Profit']].sum().plot.bar(color = ['black','red'])
plt.title('SALES AND PROFIT IN FURNITURE BY SUB-CATEGORY')
plt.legend(loc = 1)
plt.show()

**The Furniture sub-categories are Bookcases, Chairs, Furnishings and Tables.
From this bar plot we can conclude that, despite high sales in Tables and Bookcases, the store is incurring a loss on them.
This loss drags down the profit of the whole Furniture category.**

Next, we need to find out why we are incurring a loss despite high sales.

In [None]:
# To check the probable reason of loss 

df[df['Category'] == 'Furniture'].groupby('Sub-Category')['Discount'].mean().plot.bar(color =['green'])
plt.title('DISCOUNT GIVEN IN FURNITURE CATEGORY')
plt.legend(loc = 0)
plt.show()

**Tables and Bookcases receive the highest average discounts, and they are also the sub-categories on which the store is incurring losses.**

Since sales are high and discounts are also being given, we will now check the correlation between the two.
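
To read discount and profit side by side rather than from two separate charts, one option is a single aggregation per sub-category. A sketch on made-up numbers (the values and the output column names `mean_discount` and `total_profit` are illustrative assumptions):

```python
import pandas as pd

# Hypothetical toy data, not the Superstore dataset.
toy = pd.DataFrame({
    "Category": ["Furniture"] * 4 + ["Technology"],
    "Sub-Category": ["Tables", "Tables", "Chairs", "Chairs", "Phones"],
    "Discount": [0.4, 0.2, 0.1, 0.1, 0.0],
    "Profit": [-50.0, -10.0, 30.0, 20.0, 40.0],
})

furniture = toy[toy["Category"] == "Furniture"]

# Mean discount and total profit for each Furniture sub-category.
summary = furniture.groupby("Sub-Category").agg(
    mean_discount=("Discount", "mean"),
    total_profit=("Profit", "sum"),
)
print(summary.sort_values("mean_discount", ascending=False))
```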

## Correlation

In [None]:
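# Restrict to numeric columns; newer pandas versions raise an error on text columns
df.select_dtypes(include='number').corr()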

In [None]:
# Heatmap of the correlations between the numeric columns, including Sales, Discount and Profit

sns.heatmap(df.select_dtypes(include='number').corr(), annot=True, cmap='Blues')
plt.show()

**From the heatmap above we conclude that there is a negative correlation between Profit and Discount and a positive correlation between Profit and Sales.**
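
The two correlations of interest can also be read off directly instead of from the heatmap colours. A minimal sketch on a hypothetical toy frame whose values are constructed to be perfectly correlated, so it only illustrates the calls and not the strength of the relationship in the real data:

```python
import pandas as pd

# Hypothetical toy data, not the Superstore dataset.
toy = pd.DataFrame({
    "Sales": [100.0, 200.0, 300.0, 400.0],
    "Discount": [0.6, 0.4, 0.2, 0.0],
    "Profit": [10.0, 20.0, 30.0, 40.0],
    "Region": ["West", "East", "South", "Central"],
})

# Keep numeric columns only, then compute the Pearson correlation matrix.
corr = toy.select_dtypes(include="number").corr()

# Individual pairs via Series.corr.
profit_discount = toy["Profit"].corr(toy["Discount"])
profit_sales = toy["Profit"].corr(toy["Sales"])
print(corr)
print("Profit vs Discount:", profit_discount)
print("Profit vs Sales:", profit_sales)
```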

In [None]:
# Apply seaborn's default theme; the 'seaborn' style name is deprecated in recent matplotlib
sns.set_theme()
df.plot(kind = 'scatter', figsize = (10,5) , x = 'Sales', y='Profit', c = 'Discount' , s = 20 , fontsize = 16 , colormap = 'plasma')
plt.ylabel('TOTAL PROFITS', fontsize = 16)
plt.title('DEPENDENCY OF SALES AND PROFIT ON DISCOUNT' , fontsize = 16)
plt.show()

**The scatterplot above shows that the lower the discount, the higher the profit.**
**Discount affects profit up to a certain point; beyond that point, profit shows no clear relation to discount.**
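
One way to probe where discount stops helping is to group transactions into discount bands and compare the mean profit per band. A sketch assuming made-up values and arbitrarily chosen band edges (the `Discount band` column and the cut points are illustrative, not derived from the dataset):

```python
import pandas as pd

# Hypothetical toy data, not the Superstore dataset.
toy = pd.DataFrame({
    "Discount": [0.0, 0.1, 0.2, 0.3, 0.5, 0.7],
    "Profit": [40.0, 30.0, 10.0, -5.0, -20.0, -60.0],
})

# Band edges are an assumption; intervals are closed on the right.
bins = [-0.01, 0.2, 0.4, 0.8]
labels = ["0-20%", "20-40%", "40-80%"]
toy["Discount band"] = pd.cut(toy["Discount"], bins=bins, labels=labels)

# Mean profit per transaction within each discount band.
band_profit = toy.groupby("Discount band", observed=True)["Profit"].mean()
print(band_profit)
```

On the real data, the band at which mean profit turns negative would indicate roughly where discounting starts to cost more than it earns.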

# TOP PRODUCTS

In [None]:
# Now we will check the Top Products Sold

df.groupby('Sub-Category')['Sales'].sum().sort_values(ascending=False).plot.bar(color = 'pink')
plt.show()

**From this we conclude that Phones, Chairs, Storage, Tables and Binders are the top-selling sub-categories, in that order,
whereas Fasteners, Labels and Envelopes sold the least.**

In [None]:
# To check the profit earned in each Sub-Category

df.groupby('Sub-Category')['Profit'].sum().sort_values(ascending = False).plot.bar(color = 'brown')
plt.show()

**Here we see that Copiers, Phones and Accessories are the top profit-generating sub-categories,
whereas the store is incurring losses on Tables, Bookcases and Supplies.**

## OBSERVATIONS:

* MAXIMUM TRANSACTIONS were made in WEST REGION
* MAXIMUM SALES in WEST REGION
* MAXIMUM PROFITS in WEST REGION
* MAXIMUM SALES AND PROFIT in CONSUMER SEGMENT
* MAXIMUM TRANSACTIONS were shipped in STANDARD CLASS irrespective of SEGMENT
* LEAST PROFIT is earned in FURNITURE CATEGORY despite a good amount of Sales
* Under FURNITURE, TABLES and BOOKCASES are INCURRING LOSSES, which is affecting the TOTAL PROFIT of the Furniture Category
* HIGH DISCOUNT is being offered on TABLES and BOOKCASES, which is the probable reason for the losses
* POSITIVE CORRELATION: Profit and Sales
* NEGATIVE CORRELATION: Profit and Discount
* LESS the DISCOUNT, MORE the PROFIT and vice versa

    

## CONCLUSION:

**From the above observations we conclude that the FURNITURE CATEGORY is the WEAK AREA we need to work on.**
**Within Furniture, TABLES and BOOKCASES are INCURRING LOSSES, probably because of the HIGH DISCOUNT offered on them.**
**So we need to REDUCE the DISCOUNT in order to INCREASE the PROFIT.**