## Blinkit Analysis 
 - Requirement Gathering / Business Requirements
 - Data Walkthrough
 - Data Connection
 - Data Cleaning / Quality Check
 - Data Modeling
 - Data Processing
 - DAX Calculations
 - Dashboard Layout
 - Chart Development and Formatting
 - Dashboard / Report Development
 - Insights Generation

## Business Requirement:
 - To conduct a comprehensive analysis of Blinkit's sales performance, customer satisfaction, and inventory distribution, using KPIs and visualizations in Power BI to identify key insights and opportunities for optimization.

## KPI Requirements:
 1. Total Sales: The overall revenue generated from all items sold.
 2. Average Sales: The average revenue per sale.
 3. Number of Items: The total count of different items sold.
 4. Average Rating: The average customer rating for items sold.
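Each of the four KPIs reduces to a single pandas aggregation over the `Sales` and `Rating` columns. The sketch below is illustrative only: the helper name `compute_kpis` and the sample rows are hypothetical, and it assumes one row per item sold, so Number of Items is a row count. If "different items" should mean distinct products, swap in `nunique()` on `Item Identifier`.

```python
import pandas as pd


def compute_kpis(data: pd.DataFrame) -> dict:
    """Return the four headline KPIs for a Blinkit-style sales table (hypothetical helper)."""
    return {
        # Total Sales: sum of revenue over all rows
        "Total Sales": data["Sales"].sum(),
        # Average Sales: mean revenue per row (assumes one row = one sale)
        "Average Sales": data["Sales"].mean(),
        # Number of Items: row count; use data["Item Identifier"].nunique() for distinct items
        "Number of Items": len(data),
        # Average Rating: mean customer rating
        "Average Rating": data["Rating"].mean(),
    }


# Hypothetical sample rows, not taken from the real dataset
sample = pd.DataFrame({
    "Item Identifier": ["FD001", "FD002", "FD001"],
    "Sales": [100.0, 200.0, 300.0],
    "Rating": [4.0, 5.0, 3.0],
})
kpis = compute_kpis(sample)
print(kpis)
```

On the cleaned notebook data this would be called as `compute_kpis(df)`, which gives a quick cross-check for the Power BI KPI cards.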

## Chart Requirements:
 1. **Total Sales by Fat Content:**
      - Objective: Analyze the impact of fat content on total sales.
      - Additional KPI Metrics: Assess how other KPIs (Average Sales, Number of Items, Average Rating) vary with fat content.
      - Chart Type: Donut Chart
 2. **Total Sales by Item Type:**
      - Objective: Identify the performance of different item types in terms of total sales.
      - Additional KPI Metrics: Assess how other KPIs (Average Sales, Number of Items, Average Rating) vary with item type.
      - Chart Type: Bar Chart
 3. **Fat Content by Outlet for Total Sales:**
      - Objective: Compare total sales across different outlets, segmented by fat content.
      - Additional KPI Metrics: Assess how other KPIs (Average Sales, Number of Items, Average Rating) vary with fat content across outlets.
      - Chart Type: Stacked Column Chart
 4. **Total Sales by Outlet Establishment:**
      - Objective: Evaluate how the age or type of outlet establishment influences total sales.
      - Chart Type: Line Chart
 5. **Sales by Outlet Size:**
      - Objective: Analyze the relationship between outlet size and total sales.
      - Chart Type: Donut/Pie Chart
 6. **Sales by Outlet Location:**
      - Objective: Assess the geographic distribution of sales across different locations.
      - Chart Type: Funnel Chart
 7. **All Metrics by Outlet Type:**
      - Objective: Provide a comprehensive view of all key metrics (Total Sales, Average Sales, Number of Items, Average Rating) broken down by outlet type.
      - Chart Type: Matrix
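Every chart above slices the same KPIs by one or two dimensions, so the numbers can be sanity-checked in pandas before the Power BI visuals are built. The sketch below is a hedged illustration, not the project's method. The helper names `kpis_by` and `sales_by_outlet_and_fat` and the sample rows are hypothetical. The single-dimension helper covers charts 1, 2, 4, 5, 6 and 7, and the pivot covers chart 3. The pivot assumes "outlet" means `Outlet Location Type`; pass `Outlet Identifier` instead for per-store bars.

```python
import pandas as pd


def kpis_by(data: pd.DataFrame, dim: str) -> pd.DataFrame:
    """Aggregate the four KPIs for each value of one dimension column (hypothetical helper)."""
    return (
        data.groupby(dim)
        .agg(
            total_sales=("Sales", "sum"),
            average_sales=("Sales", "mean"),
            number_of_items=("Sales", "count"),  # rows with non-null Sales
            average_rating=("Rating", "mean"),
        )
        .sort_values("total_sales", ascending=False)
    )


def sales_by_outlet_and_fat(data: pd.DataFrame, outlet_col: str = "Outlet Location Type") -> pd.DataFrame:
    """Total sales with outlets as rows and fat-content categories as stacked columns."""
    return data.pivot_table(
        index=outlet_col,
        columns="Item Fat Content",
        values="Sales",
        aggfunc="sum",
        fill_value=0,
    )


# Hypothetical sample rows, not taken from the real dataset
sample = pd.DataFrame({
    "Item Fat Content": ["Low Fat", "Regular", "Low Fat", "Regular"],
    "Outlet Location Type": ["Tier 1", "Tier 1", "Tier 3", "Tier 3"],
    "Outlet Size": ["Small", "Medium", "Medium", "High"],
    "Sales": [10.0, 20.0, 30.0, 40.0],
    "Rating": [4.0, 5.0, 3.0, 4.0],
})

print(kpis_by(sample, "Item Fat Content"))  # chart 1: donut by fat content
print(kpis_by(sample, "Outlet Size"))       # chart 5: donut/pie by outlet size
print(sales_by_outlet_and_fat(sample))      # chart 3: stacked columns
```

Sorting by `total_sales` mirrors how the bar and funnel charts are usually ordered. Drop the `sort_values` call if the natural category order is preferred, for example establishment year on the line chart.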

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Suppress library warnings (note: this also hides pandas FutureWarnings)
warnings.simplefilter('ignore')
```

```python
# Load the raw Blinkit grocery data (reading .xlsx requires openpyxl)
df = pd.read_excel('BlinkIT Grocery Data.xlsx')
df
```

|      | Item Fat Content | Item Identifier | Item Type | Outlet Establishment Year | Outlet Identifier | Outlet Location Type | Outlet Size | Outlet Type | Item Visibility | Item Weight | Sales | Rating |
|------|------------------|-----------------|-----------|---------------------------|-------------------|----------------------|-------------|-------------|-----------------|-------------|-------|--------|
| 0 | Regular | FDX32 | Fruits and Vegetables | 2012 | OUT049 | Tier 1 | Medium | Supermarket Type1 | 0.100014 | 15.10 | 145.4786 | 5.0 |
| 1 | Low Fat | NCB42 | Health and Hygiene | 2022 | OUT018 | Tier 3 | Medium | Supermarket Type2 | 0.008596 | 11.80 | 115.3492 | 5.0 |
| 2 | Regular | FDR28 | Frozen Foods | 2016 | OUT046 | Tier 1 | Small | Supermarket Type1 | 0.025896 | 13.85 | 165.0210 | 5.0 |
| 3 | Regular | FDL50 | Canned | 2014 | OUT013 | Tier 3 | High | Supermarket Type1 | 0.042278 | 12.15 | 126.5046 | 5.0 |
| 4 | Low Fat | DRI25 | Soft Drinks | 2015 | OUT045 | Tier 2 | Small | Supermarket Type1 | 0.033970 | 19.60 | 55.1614 | 5.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8518 | low fat | NCT53 | Health and Hygiene | 2018 | OUT027 | Tier 3 | Medium | Supermarket Type3 | 0.000000 | NaN | 164.5526 | 4.0 |
| 8519 | low fat | FDN09 | Snack Foods | 2018 | OUT027 | Tier 3 | Medium | Supermarket Type3 | 0.034706 | NaN | 241.6828 | 4.0 |
| 8520 | low fat | DRE13 | Soft Drinks | 2018 | OUT027 | Tier 3 | Medium | Supermarket Type3 | 0.027571 | NaN | 86.6198 | 4.0 |
| 8521 | reg | FDT50 | Dairy | 2018 | OUT027 | Tier 3 | Medium | Supermarket Type3 | 0.107715 | NaN | 97.8752 | 4.0 |


```python
df.columns
```

```
Index(['Item Fat Content', 'Item Identifier', 'Item Type',
       'Outlet Establishment Year', 'Outlet Identifier',
       'Outlet Location Type', 'Outlet Size', 'Outlet Type', 'Item Visibility',
       'Item Weight', 'Sales', 'Rating'],
      dtype='object')
```

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8523 entries, 0 to 8522
Data columns (total 12 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Item Fat Content           8523 non-null   object 
 1   Item Identifier            8523 non-null   object 
 2   Item Type                  8523 non-null   object 
 3   Outlet Establishment Year  8523 non-null   int64  
 4   Outlet Identifier          8523 non-null   object 
 5   Outlet Location Type       8523 non-null   object 
 6   Outlet Size                8523 non-null   object 
 7   Outlet Type                8523 non-null   object 
 8   Item Visibility            8523 non-null   float64
 9   Item Weight                7060 non-null   float64
 10  Sales                      8523 non-null   float64
 11  Rating                     8523 non-null   float64
dtypes: float64(4), int64(1), object(7)
memory usage: 799.2+ KB
```


```python
# Standardize inconsistent fat-content labels ('LF', 'low fat', 'reg').
# Assign the result back instead of using inplace=True on a column selection,
# which is chained assignment and does not update df under Copy-on-Write.
df['Item Fat Content'] = df['Item Fat Content'].replace(
    {'LF': 'Low Fat', 'low fat': 'Low Fat', 'reg': 'Regular'}
)
```

```python
# Print the distinct values of every text (object) column
for col in df.select_dtypes(include='object').columns:
    print(col)
    print(df[col].unique())
    print('-' * 50)
```

```
Item Fat Content
['Regular' 'Low Fat']
--------------------------------------------------
Item Identifier
['FDX32' 'NCB42' 'FDR28' ... 'FDU60' 'NCX53' 'FDE52']
--------------------------------------------------
Item Type
['Fruits and Vegetables' 'Health and Hygiene' 'Frozen Foods' 'Canned'
 'Soft Drinks' 'Household' 'Snack Foods' 'Meat' 'Breads' 'Hard Drinks'
 'Others' 'Dairy' 'Breakfast' 'Baking Goods' 'Seafood' 'Starchy Foods']
--------------------------------------------------
Outlet Identifier
['OUT049' 'OUT018' 'OUT046' 'OUT013' 'OUT045' 'OUT017' 'OUT010' 'OUT027'
 'OUT035' 'OUT019']
--------------------------------------------------
Outlet Location Type
['Tier 1' 'Tier 3' 'Tier 2']
--------------------------------------------------
Outlet Size
['Medium' 'Small' 'High']
--------------------------------------------------
Outlet Type
['Supermarket Type1' 'Supermarket Type2' 'Grocery Store'
 'Supermarket Type3']
--------------------------------------------------
```
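The fat-content `replace` mapping only catches the exact spellings it lists (`LF`, `low fat`, `reg`). A slightly more defensive sketch, offered as an alternative and not as the notebook's method, normalizes case and surrounding whitespace first. It then fails loudly if any unexpected label survives. The helper name `normalize_fat_content` and the sample values are hypothetical.

```python
import pandas as pd

# Canonical labels keyed by lower-cased, whitespace-stripped variants
FAT_CONTENT_MAP = {
    "low fat": "Low Fat",
    "lf": "Low Fat",
    "regular": "Regular",
    "reg": "Regular",
}


def normalize_fat_content(series: pd.Series) -> pd.Series:
    """Map every known spelling of fat content onto 'Low Fat' or 'Regular'."""
    cleaned = series.str.strip().str.lower().map(FAT_CONTENT_MAP)
    # Any value that did not map (including missing values) is reported, not silently kept
    unknown = series[cleaned.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f"Unmapped fat-content labels: {list(unknown)}")
    return cleaned


# Hypothetical raw labels covering the variants seen in the data plus stray whitespace
raw = pd.Series(["Regular", "Low Fat", "low fat", "LF", "reg", " Regular "])
print(normalize_fat_content(raw).value_counts())
```

In the notebook this would be applied as `df['Item Fat Content'] = normalize_fat_content(df['Item Fat Content'])`. The explicit error makes a new spelling in a future data extract visible instead of letting it surface as a third donut slice.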

```python
df.isnull().sum()
```

```
Item Fat Content                0
Item Identifier                 0
Item Type                       0
Outlet Establishment Year       0
Outlet Identifier               0
Outlet Location Type            0
Outlet Size                     0
Outlet Type                     0
Item Visibility                 0
Item Weight                  1463
Sales                           0
Rating                          0
dtype: int64
```
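Only `Item Weight` has gaps: 1463 of 8523 rows, roughly 17%. Reporting missingness as a share of rows makes that easier to judge than a raw count. The sketch below lists count and percentage for the columns that actually have gaps. The helper name `missing_report` and the sample values are hypothetical. Because `Item Weight` does not feed any of the KPIs listed above, leaving those gaps in place may be acceptable, but that is a judgment call for the analysis.

```python
import numpy as np
import pandas as pd


def missing_report(data: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first."""
    counts = data.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(data) * 100).round(2),
    })
    # Keep only columns that actually have gaps
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical sample rows, not taken from the real dataset
sample = pd.DataFrame({
    "Item Weight": [15.10, np.nan, 13.85, np.nan],
    "Sales": [145.48, 115.35, 165.02, 126.50],
})
print(missing_report(sample))
```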

```python
df.duplicated().sum()
```

```
0
```
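`df.duplicated()` only flags rows that are identical in every column. A stricter check looks for repeated key combinations even when other columns differ. The sketch below assumes, without confirmation from the source, that each item should appear at most once per outlet, so it checks `Item Identifier`/`Outlet Identifier` pairs. The helper name `duplicate_keys` and the sample rows are hypothetical.

```python
import pandas as pd


def duplicate_keys(data: pd.DataFrame, keys: list[str]) -> pd.DataFrame:
    """Return every row whose key combination occurs more than once."""
    # keep=False marks all members of a duplicated group, not just the repeats
    mask = data.duplicated(subset=keys, keep=False)
    return data[mask].sort_values(keys)


# Hypothetical rows: the first two share a key but differ in Sales
sample = pd.DataFrame({
    "Item Identifier": ["FD001", "FD001", "NC002"],
    "Outlet Identifier": ["OUT001", "OUT001", "OUT002"],
    "Sales": [145.4786, 150.0, 115.3492],
})
dupes = duplicate_keys(sample, ["Item Identifier", "Outlet Identifier"])
print(len(dupes))
```

On the notebook data this would be `duplicate_keys(df, ['Item Identifier', 'Outlet Identifier'])`. An empty result supports the one-row-per-item-per-outlet assumption.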

```python
# Save the cleaned data for import into Power BI (writing .xlsx requires openpyxl)
df.to_excel('cleaned_data.xlsx', index=False)
```