# <font color="blue"><center><b>DATA ANALYSIS AND VISUALIZATION OF<br>BIGMART SALES DATA</b></center></font>

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcRzdq1Fhc2TqNxCQSorwpLlNnDFq2SW-KNWqg&usqp=CAU" width="800" height="400">

# Problem Statement:
The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities. Certain attributes of each product and store have also been defined. The aim is to find out the sales of each product at a particular store. Using this, BigMart will try to understand which properties of products and stores play a key role in increasing sales.

# <font color="red"><center>DATA ANALYSIS</center></font>

# Importing Libraries

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Reading the data

In [None]:
data = pd.read_csv("../input/bigmartsales/bigmartsales.csv")
data.head()  # first few rows

In [None]:
data.tail()  # last few rows

# Dimension of the data

In [None]:
data.shape

# The columns are of type float, object and int

In [None]:
data.info()

# Columns in the data set

<b>Columns Description:</b><br><br>
<b>Item_Identifier :</b> Unique product ID

<b>Item_Weight:</b>  Weight of product

<b>Item_Fat_Content :</b> Whether the product is low fat or not

<b>Item_Visibility :</b> The % of total display area of all products in a store allocated to the particular product

<b>Item_Type :</b> The category to which the product belongs

<b>Item_MRP :</b> Maximum Retail Price (list price) of the product

<b>Outlet_Identifier :</b> Unique store ID

<b>Outlet_Establishment_Year :</b> The year in which store was established

<b>Outlet_Size :</b> The size of the store in terms of ground area covered

<b>Outlet_Location_Type :</b> The type of city in which the store is located

<b>Outlet_Type :</b> Whether the outlet is just a grocery store or some sort of supermarket

<b>Item_Outlet_Sales :</b> Sales of the product in the particular store.

In [None]:
print(data.columns)

# The describe method 

In [None]:
data.describe() #shows basic statistical characteristics of each numerical feature (int64 and float64 types)

# The average item price (Item_MRP) is 142.7; the maximum is 266.9 and the minimum is 31.3

In [None]:
data.describe(include=['object', 'float']) 

# Unique method

In [None]:
data['Item_Fat_Content'].unique()

In [None]:
data['Item_Type'].unique()

# Missing Values

In [None]:
data.isna().sum()


<b>Observations:</b><br>
There are 749 missing values in Item_Weight column<br>
There are 2410 missing values in Outlet_Size column<br>
There are 2050 missing values in Outlet_Location_Type column
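
The counts above are easier to compare when they are also expressed as a share of all rows. The sketch below shows one way to do that. The `toy` frame and the `missing_summary` helper are made up for illustration and are not part of the BigMart data or of pandas.

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for the real data (values are illustrative only)
toy = pd.DataFrame({
    "Item_Weight": [9.3, np.nan, 17.5, np.nan],
    "Outlet_Size": ["Medium", None, "Small", "High"],
    "Item_MRP": [249.8, 48.3, 141.6, 182.1],
})

def missing_summary(df):
    """Hypothetical helper: missing count and percentage per column, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    # Keep only the columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

summary = missing_summary(toy)
print(summary)
```

Calling `missing_summary(data)` on the real frame should list the same three columns reported above, together with their percentages.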

# Filling missing values

In [None]:
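# Fill missing weights with the column mean (assign back instead of a chained inplace call)
data['Item_Weight'] = data['Item_Weight'].fillna(data['Item_Weight'].mean())

# Forward-fill missing location types from the previous row
data['Outlet_Location_Type'] = data['Outlet_Location_Type'].ffill()

# Encode outlet size as ordered numbers, then fill the gaps with the median code
map1 = {"Small":1,"Medium":2,"High":3}
data["Outlet_Size"] = data["Outlet_Size"].map(map1)
data["Outlet_Size"] = data["Outlet_Size"].fillna(data["Outlet_Size"].median())
data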

In [None]:
data.isna().sum()

# Indexing by Name

In [None]:
data.loc[0:3, 'Item_Identifier':'Item_Type']

# Indexing by Number

In [None]:
data.iloc[0:3, 0:13]

In [None]:
data[-1:]
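
The two indexing styles above differ in how they treat the end of a slice: `loc` slices by label and includes both ends, while `iloc` slices by position and excludes the end. A minimal sketch on a made-up frame (the values are illustrative, not taken from the data set):

```python
import pandas as pd

# Made-up rows; only the column names mirror the BigMart data
toy = pd.DataFrame({
    "Item_Identifier": ["A01", "A02", "A03", "A04", "A05"],
    "Item_Weight": [9.3, 5.9, 17.5, 19.2, 8.9],
    "Item_Type": ["Dairy", "Soft Drinks", "Meat", "Breads", "Household"],
})

# Label-based slice: rows 0 through 3 and both named columns are included
by_label = toy.loc[0:3, "Item_Identifier":"Item_Weight"]

# Position-based slice: rows 0, 1, 2 and columns 0, 1 (the end is excluded)
by_position = toy.iloc[0:3, 0:2]

# A plain negative slice selects rows by position, here the last row
last_row = toy[-1:]

print(by_label.shape)     # (4, 2)
print(by_position.shape)  # (3, 2)
print(last_row)
```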

# Apply method

In [None]:
import numpy as np
data.apply(np.max) 

# <font color="RED"><center> Visualization</center></font>

# <font color='green'>A.Univariate Analysis:</font>

# <font color="purple">1) Item fat content</font>

In [None]:
# Merge the spelling variants of the two fat-content labels
fat_labels = {'LF':'Low Fat','reg':'Regular','low fat':'Low Fat'}
data['Item_Fat_Content'] = data['Item_Fat_Content'].replace(fat_labels)
print(data.Item_Fat_Content.value_counts())

In [None]:
plt.rcParams['figure.figsize']=(5,5)
fat_counts = data.Item_Fat_Content.value_counts()
# Take the labels from the counts so each bar is matched to its own category
plt.bar(fat_counts.index, fat_counts, width=0.5, color=['blue', 'cyan'], edgecolor='yellow')

<b>Observations:</b><br>
-Most items are Low Fat.
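
The label clean-up used in this section can be tried on a small made-up Series before applying it to the full column. The sketch below assumes the same three spelling variants as the notebook; the input values are invented, and `normalize=True` is used to turn counts into proportions.

```python
import pandas as pd

# Invented sample containing the spelling variants seen in Item_Fat_Content
fat = pd.Series(["Low Fat", "LF", "reg", "Regular", "low fat", "Low Fat"])

label_map = {"LF": "Low Fat", "reg": "Regular", "low fat": "Low Fat"}
cleaned = fat.replace(label_map)

counts = cleaned.value_counts()                           # absolute counts per label
shares = cleaned.value_counts(normalize=True).round(2)    # proportions per label

print(counts)
print(shares)
```

After the mapping only two labels remain, which is what makes a two-bar chart meaningful.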
    

# <font color="purple">2) Item Type</font>

In [None]:
plt.figure(figsize=(25,7))
sns.countplot(x='Item_Type', data=data, palette='spring')  # recent seaborn versions require x= as a keyword

<b>Observations:</b><br>
-Fruits and Vegetables appear most often in the data, plausibly because people buy them daily.<br>
-Snack Foods are also well represented.

# <font color="purple">3) Outlet Size</font>

In [None]:
plt.figure(figsize=(8,5))
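sns.countplot(x='Outlet_Size', data=data, palette='Purples')  # recent seaborn versions require x= as a keyword

<b>Observations:</b><br>
-Most records come from Medium-sized outlets (coded 2 by the mapping applied earlier).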

# <font color="purple">4) Outlet Type</font>

In [None]:
plt.figure(figsize=(8,5))
sns.countplot(x='Outlet_Type', data=data, palette='autumn')  # recent seaborn versions require x= as a keyword

<b>Observations:</b><br>
-Most outlets are of type Supermarket Type1.

# <font color='green'>B.Bivariate Analysis</font>

# <font color="purple">5) Impact of Item Fat Content on Item outlet sales</font>

In [None]:
plt.figure(figsize=(8,5))
sns.barplot(x='Item_Fat_Content', y='Item_Outlet_Sales', data=data, palette='winter')  # x= and y= must be keywords in recent seaborn

<b>Observations:</b><br>
Item outlet sales are high for both Low Fat and Regular items.

# <font color="purple"> 6) Impact of Item type on Outlet Sales</font>

In [None]:
# Total sales per item type, largest first
sales_by_type = data.groupby('Item_Type')['Item_Outlet_Sales'].sum().sort_values(ascending=False)
plt.rcParams['font.size'] = 10
# Labels come from the index so they always match the sorted values; the two largest slices are pulled out
plt.pie(sales_by_type, labels=sales_by_type.index, autopct='%0.1f%%', radius=2.0,
        explode=[0.2, 0.2] + [0] * (len(sales_by_type) - 2),
        colors=['#ff6666', '#ffcc99', '#99ff99', '#66b3ff'])
plt.show()


<b>Observations:</b><br>
Fruits and Vegetables generated the most sales, whereas Seafood generated the least across all stores.
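
The percentages on the pie chart can also be computed directly, which makes them easier to quote or sort. A minimal sketch on invented numbers (only the column names follow the data set):

```python
import pandas as pd

# Invented sales records for three item types
toy = pd.DataFrame({
    "Item_Type": ["Dairy", "Seafood", "Dairy", "Snack Foods", "Snack Foods", "Snack Foods"],
    "Item_Outlet_Sales": [100.0, 50.0, 300.0, 200.0, 150.0, 200.0],
})

# Total sales per type, largest first
total_by_type = (
    toy.groupby("Item_Type")["Item_Outlet_Sales"].sum().sort_values(ascending=False)
)

# Each type's share of overall sales, in percent
share_pct = (total_by_type / total_by_type.sum() * 100).round(1)
print(share_pct)
```

On the real data the same two lines, applied to `data`, should reproduce the slice percentages of the chart.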

# <font color="purple">7) Impact of Outlets on Sales</font>

In [None]:
type1 = data.groupby('Outlet_Identifier').sum(numeric_only=True)  # totals of the numeric columns per outlet
#type1.sort_values(by='Item_Outlet_Sales',ascending=False)
type1

In [None]:
plt.rcParams['figure.figsize']=(10,6)
# Use the groupby index as labels so each bar is matched to its own outlet
plt.bar(type1.index, type1.Item_Outlet_Sales, color='gold', width=0.6)
plt.xlabel('Outlet_Store_ID')
plt.ylabel('Sales')
plt.title('Outlet vs Sales')

<b>Observations:</b><br>
Most sales were generated at outlet 'OUT027'.

# <font color="purple">8) Impact of Outlet Type on Outlet Sales</font>

In [None]:

plt.figure(figsize=(8,5))
sns.barplot(x='Outlet_Type',y='Item_Outlet_Sales',data=data,palette='Oranges_r')


<b>Observations:</b><br>
Supermarket Type3 outlets have the highest average sales.
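
By default `sns.barplot` draws the mean of the y variable for each category, so the bar heights above are average sales per record, not totals. The invented example below shows how the two can disagree: both outlet types have the same total, but different means.

```python
import pandas as pd

# Invented records: four for one outlet type, two for the other
toy = pd.DataFrame({
    "Outlet_Type": ["Supermarket Type1"] * 4 + ["Supermarket Type3"] * 2,
    "Item_Outlet_Sales": [1000.0, 2000.0, 3000.0, 2000.0, 3500.0, 4500.0],
})

# Mean (what the bar plot shows), total and number of records per outlet type
stats = toy.groupby("Outlet_Type")["Item_Outlet_Sales"].agg(["mean", "sum", "count"])
print(stats)
```

Running the same aggregation on `data` is a quick way to check whether a tall bar reflects high sales per record or simply a different number of records.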

# <font color="purple">9) Impact of Outlet Size on Outlet Sales</font>

In [None]:
plt.figure(figsize=(8,5))
sns.barplot(x='Outlet_Size',y='Item_Outlet_Sales',data=data,palette='winter')

<b>Observations:</b><br>
Average outlet sales are highest for Medium and High sized outlets.<br>
Larger outlets may therefore help improve outlet sales.


# <font color="purple">10) Item type vs Item MRP</font>

In [None]:
plt.rcParams['figure.figsize'] = 25,5
chart=sns.boxplot(x="Item_Type",y="Item_MRP",data=data,palette='husl')
chart.set_xticklabels(chart.get_xticklabels(), rotation=45,horizontalalignment='right', fontweight='light',fontsize='x-large')
plt.show()

<b>Observations:</b><br>
Dairy and Starchy Foods have the highest MRPs (about 200-250).
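
A box plot summarises each group by its quartiles, and those numbers can be read off exactly with `groupby` and `quantile`. A minimal sketch on invented prices (the two item types and their values are illustrative only):

```python
import pandas as pd

# Invented prices for two item types
toy = pd.DataFrame({
    "Item_Type": ["Dairy", "Dairy", "Dairy", "Seafood", "Seafood", "Seafood"],
    "Item_MRP": [100.0, 200.0, 300.0, 50.0, 60.0, 70.0],
})

# Lower quartile, median and upper quartile of MRP per item type;
# unstack() turns the quantile level into columns 0.25, 0.5 and 0.75
quartiles = toy.groupby("Item_Type")["Item_MRP"].quantile([0.25, 0.5, 0.75]).unstack()
print(quartiles)
```

Sorting the result by the `0.5` column on the real data gives a ranking of item types by median MRP to set beside the plot.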

# <font color="purple">11) Impact of Outlet Establishment Year on Sales</font>

In [None]:
plt.figure(figsize=(10,8))
sns.boxplot(x='Outlet_Establishment_Year', y='Item_Outlet_Sales', data=data, palette="Paired")  # x= and y= must be keywords in recent seaborn
plt.show()

<b>Observations:</b><br>
Sales reported by the older stores are higher than those of the relatively newer stores (except for the store established in 1998).

# <font color="purple">12) Impact of Item MRP on Outlet Sales</font>

In [None]:

sns.scatterplot(x='Item_MRP', y='Item_Outlet_Sales', data=data, edgecolor='purple')

<b>Observations:</b><br>
Products with a higher MRP tend to have higher sales.
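
The trend seen in the scatter plot can be put into a single number with the Pearson correlation coefficient, which `Series.corr` computes by default. The values below are invented to illustrate the call; the coefficient for the real data has to be computed on `data` itself.

```python
import pandas as pd

# Invented price and sales pairs with a roughly increasing trend
toy = pd.DataFrame({
    "Item_MRP": [50.0, 100.0, 150.0, 200.0, 250.0],
    "Item_Outlet_Sales": [400.0, 1500.0, 1800.0, 3900.0, 3600.0],
})

# Pearson correlation: close to +1 means sales rise almost linearly with price
r = toy["Item_MRP"].corr(toy["Item_Outlet_Sales"])
print(round(r, 3))
```

A value near zero on the real data would mean the visual impression is weaker than it looks; a value near one would support the observation.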

# Key Observations:
1. Most items are Low Fat.<br>
2. Fruits and Vegetables appear most often in the data, plausibly because people buy them daily.<br>
3. Snack Foods are also well represented.<br>
4. Most sales were generated at outlet 'OUT027'.<br>
5. Supermarket Type3 outlets have the highest average sales.<br>
6. Average outlet sales are highest for Medium and High sized outlets.<br>
7. Larger outlets may therefore help improve outlet sales.<br>
8. Dairy and Starchy Foods have the highest MRPs (about 200-250).<br>
9. Sales reported by the older stores are higher than those of the relatively newer stores (except for the store established in 1998).<br>
10. Products with a higher MRP tend to have higher sales.