# **THE SPARKS FOUNDATION**
# Task 3 - Exploratory Data Analysis - Retail

# Objective:
* **Perform ‘Exploratory Data Analysis’ on dataset ‘SampleSuperstore’**
* **As a business manager, try to find out the weak areas where you can work to make more profit.**
* **What all business problems you can derive by exploring the data?**

# Data Source

[https://www.kaggle.com/ravichandra498/samplesuperstore](https://www.kaggle.com/ravichandra498/samplesuperstore)

# Author - Ravi Chandra

In [None]:
#importing all the libraries 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
mydataset=pd.read_csv("../input/samplesuperstore/SampleSuperstore.csv")
mydataset.head()

In [None]:
mydataset.info() #getting information of dataset

# **2. Data Cleaning**
**After data sourcing, the next step in EDA is data cleaning. It is important to find and remove irregularities once the data has been loaded. Common kinds of irregularity are:**
* **Missing Values**
* **Incorrect Format**
* **Incorrect Headers**
* **Anomalies/Outliers**
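
A minimal sketch of how some of these checks might look in pandas. The `toy` frame and its values are made up for illustration and are not taken from SampleSuperstore. The 1.5 × IQR rule is one common outlier convention, assumed here rather than prescribed by this notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
toy = pd.DataFrame({
    "Sales":  [10.0, 12.0, 11.0, 13.0, 500.0, None],
    "Profit": [2.0, 3.0, 2.5, 3.5, -40.0, 1.0],
})

# Missing values: count the nulls in each column.
missing = toy.isnull().sum()

# Incorrect format: dtypes reveal columns stored with the wrong type.
dtypes = toy.dtypes

# Outliers: flag rows outside 1.5 * IQR of the Sales column.
q1, q3 = toy["Sales"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (toy["Sales"] < q1 - 1.5 * iqr) | (toy["Sales"] > q3 + 1.5 * iqr)
outliers = toy[is_outlier]

print(missing)
print(dtypes)
print(outliers)
```

Whether a flagged row is a real anomaly or a legitimate large order is a judgment call that depends on the business context.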

In [None]:
mydataset.isnull().sum() #to check null/missing values in dataset

In [None]:
mydataset.columns # column names

In [None]:
mydataset.shape #shape of dataset

In [None]:
mydataset.nunique() #number of distinct entries in each column

In [None]:
#drop the unwanted 'Postal Code' column from dataset
dt=mydataset.drop(['Postal Code'],axis=1)
dt.head()

# **3. Exploratory Data Analysis**

In [None]:
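#correlate numeric columns only; recent pandas versions raise an error on text columns
dt.select_dtypes(include='number').corr()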

# Heatmap for correlation

In [None]:
sns.heatmap(dt.select_dtypes(include='number').corr(),cmap='rocket_r',annot=True)

* From the heatmap above:
  * Sales and Profit are moderately correlated.
  * Discount and Profit are negatively correlated.
  * Quantity and Profit are weakly correlated.
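
To make the reading of the matrix concrete, here is a small sketch on made-up numbers (the `toy` frame is hypothetical, not the real data). It shows a negative Discount/Profit coefficient and how text columns are excluded before correlating.

```python
import pandas as pd

# Hypothetical toy data: profit falls as discount rises.
toy = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A"],   # text column, excluded below
    "Sales":    [100, 200, 300, 400, 500],
    "Discount": [0.0, 0.1, 0.2, 0.3, 0.4],
    "Profit":   [30, 40, 35, 20, -10],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = toy.select_dtypes(include="number").corr()

# A value near -1 means the two columns move in opposite directions.
print(corr.loc["Discount", "Profit"])
```

A coefficient near 0 indicates little linear relationship, which is how a weak Quantity/Profit value would be read.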

# Countplot for each column

In [None]:
fig,axs=plt.subplots(nrows=2,ncols=2,figsize=(10,7))

#pass the column by name (x=..., data=...) so the calls work across seaborn versions
sns.countplot(x='Category',data=dt,ax=axs[0][0])
sns.countplot(x='Segment',data=dt,ax=axs[0][1])
sns.countplot(x='Ship Mode',data=dt,ax=axs[1][0])
sns.countplot(x='Region',data=dt,ax=axs[1][1])
axs[0][0].set_title('Category',fontsize=20)
axs[0][1].set_title('Segment',fontsize=20)
axs[1][0].set_title('Ship Mode',fontsize=20)
axs[1][1].set_title('Region',fontsize=20)

plt.tight_layout()

In [None]:
plt.figure(figsize=(20,8))
sns.countplot(x='Sub-Category',data=dt)
plt.title('Sub-Category',fontsize=20)

In [None]:
plt.figure(figsize=(18,5))
sns.countplot(x='State',data=dt)
plt.xticks(rotation=90)
plt.title('State',fontsize=20)

In [None]:
plt.figure(figsize=(18,5))
sns.countplot(x='Quantity',data=dt)
plt.title('Quantity',fontsize=20)

In [None]:
plt.figure(figsize=(18,5))
sns.countplot(x='Discount',data=dt)
plt.xticks(rotation=90)
plt.title('Discount',fontsize=20)

# **Statewise Analysis**

In [None]:
dt1=dt['State'].value_counts()
dt1

In [None]:
dt1.plot(kind='bar',figsize=(15,5))
plt.ylabel('Frequency of deals / Number of deals')
plt.xlabel('States')

plt.title('State Wise Dealings Bar Representation', fontsize = 20)
plt.show()

* Top 3 states by number of deals:
  * California
  * New York
  * Texas
* **Wyoming: lowest number of deals**
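
The top and bottom entries can also be read off programmatically rather than from the chart. A minimal sketch, using a made-up `states` Series with invented counts in place of `dt['State']`:

```python
import pandas as pd

# Hypothetical stand-in for dt['State']; the counts are invented.
states = pd.Series(
    ["CA", "CA", "CA", "NY", "NY", "TX", "TX", "WY", "NY", "CA"],
    name="State",
)

# value_counts() sorts from most to least frequent.
counts = states.value_counts()

top3 = counts.head(3)      # the three most frequent states
least = counts.idxmin()    # the label with the fewest rows

print(top3)
print(least)
```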

In [None]:
dt1.mean()

# **Citywise Analysis**

In [None]:
dt2 = dt['City'].value_counts()
dt2=dt2.head(50)
dt2

In [None]:
dt2.plot(kind='bar',figsize=(15,5))
plt.ylabel('Frequency of deals / Number of deals')
plt.xlabel('City')

plt.title('City Wise Dealings bar representation', fontsize = 20)
plt.show()

* Top 3 cities by number of deals:
  1. New York City
  2. Los Angeles
  3. Philadelphia

In [None]:
dt2.mean()

# **Statewise Analysis of Profit, Discount and Sales**

In [None]:
dt['State'].value_counts().head(10)

In [None]:
dt_state= dt.groupby(['State'])[['Sales', 'Discount', 'Profit']].mean()
dt_state.head(10)

# Statewise Profit Analysis

In [None]:
dt_state1=dt_state.sort_values('Profit')

dt_state1[['Profit']].plot(kind = 'bar', figsize = (15,4))
plt.title('State wise Profit Analysis', fontsize = 20)
plt.ylabel('Profit per State')
plt.xlabel('States')
plt.show()

* **Vermont**: highest average profit per order
* **Ohio**: lowest average profit per order

In [None]:
dt_state['Sales'].plot(kind='pie',
                        figsize = (20,20),
                        autopct='%1.1f%%',
                        startangle=90,     # rotate the start of the pie by 90°
                        shadow=True)
plt.title('State wise analysis of Sale',fontsize=20)

* Largest share of average sales per order: **Wyoming (11.8%)**
* Smallest share of average sales per order: **South Dakota (0.8%)**
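
Because `dt_state` holds per-state means, a state with very few orders can top this chart on the strength of a single large sale. The sketch below, on made-up data with hypothetical states `X` and `Y`, shows how the ranking can differ between the mean and the total.

```python
import pandas as pd

# Hypothetical data: state X has four mid-sized orders, state Y a single large one.
toy = pd.DataFrame({
    "State": ["X", "X", "X", "X", "Y"],
    "Sales": [300.0, 250.0, 350.0, 300.0, 900.0],
})

# Compare order count, average sale per order, and total sales side by side.
summary = toy.groupby("State")["Sales"].agg(["count", "mean", "sum"])
print(summary)

highest_mean = summary["mean"].idxmax()   # state with the largest average order
highest_total = summary["sum"].idxmax()   # state with the largest total sales
print(highest_mean, highest_total)
```

Looking at `count` alongside `mean` helps judge how much weight a per-state average deserves.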

In [None]:
dt_state1['Discount'].plot(kind='bar',figsize=(18,5))
plt.title('State wise analysis of Discount', fontsize=20)

*Illinois has the highest average discount.*

# Citywise Analysis of the Profit

In [None]:
dt_city= dt.groupby(['City'])[['Sales', 'Discount', 'Profit']].mean()
dt_city = dt_city.sort_values('Profit')
dt_city.head()

In [None]:
#1. Low Profit (dt_city is sorted by Profit in ascending order)
dt_city['Profit'].head(30).plot(kind='bar',figsize=(15,5),color = 'Green')
plt.title('City wise analysis of Profit: 30 lowest')

In [None]:
#2. High Profit
dt_city['Profit'].tail(30).plot(kind='bar',figsize=(15,5),color = 'Green')
plt.title('City wise analysis of Profit: 30 highest')

The two charts above show the 30 cities with the lowest average profit, which is negative, and the 30 cities with the highest average profit, which is positive.
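
Instead of taking a fixed 30 rows from each end, the cities can be split by the sign of their average profit. A minimal sketch on made-up data; the city names `P`, `Q`, `R` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical orders for three cities.
toy = pd.DataFrame({
    "City":   ["P", "P", "Q", "Q", "R"],
    "Profit": [10.0, 30.0, -50.0, 10.0, 5.0],
})

# Average profit per city, sorted from lowest to highest.
city_profit = toy.groupby("City")["Profit"].mean().sort_values()

# Boolean masks separate loss-making cities from profitable ones.
loss_making = city_profit[city_profit < 0]
profitable = city_profit[city_profit > 0]

print(loss_making)
print(profitable)
```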

# **Sub-Category wise Sales, Profit and Discount**

In [None]:
dt_sub_category = dt.groupby(['Sub-Category'])[['Sales', 'Quantity', 'Discount', 'Profit']].mean()
dt_sub_category.head(10)

In [None]:
plt.figure(figsize = (15,15))
# each wedge is one sub-category's share of the summed average sales
plt.pie(dt_sub_category['Sales'], labels = dt_sub_category.index, autopct = '%1.1f%%')
plt.title('Sub-Category Wise Sales Analysis', fontsize = 20)
plt.legend()
plt.show()

In [None]:
plt.figure(figsize = (15,15))
# each wedge is one sub-category's share of the summed average discounts
plt.pie(dt_sub_category['Discount'], labels = dt_sub_category.index, autopct = '%1.1f%%')
plt.title('Sub-Category Wise Discount Analysis', fontsize = 20)
plt.legend()
plt.show()

# SHIP MODE WISE ANALYSIS

In [None]:
dt['Ship Mode'].value_counts()

In [None]:
dt_shipmode = dt.groupby(['Ship Mode'])[['Sales', 'Discount', 'Profit']].mean()

In [None]:
dt_shipmode.plot.pie(subplots=True,
                     figsize=(18, 20), 
                     autopct='%1.1f%%', 
                     labels = dt_shipmode.index)

*Average profit and discount are highest for First Class. Average sales are highest for Same Day shipping.*

# RESULT AND CONCLUSION
* The store is profitable overall, but there are some areas where profit could be increased.
* Average profit and discount are highest for First Class shipping.
* State: **Vermont**: highest average profit per order
* State: **Ohio**: lowest average profit per order
* Top 3 cities by number of deals:
  * New York City
  * Los Angeles
  * Philadelphia
* Sales and Profit are moderately correlated.
* Quantity and Profit are weakly correlated.
* Discount and Profit are negatively correlated.
* Top 3 states by number of deals:
  * California
  * New York
  * Texas
* **Wyoming**: lowest number of deals, but the largest share of average sales per order (11.8%)
* Smallest share of average sales per order: **South Dakota** (0.8%)