<a href="https://colab.research.google.com/github/quantumhome/DataAnalysisCaseStudy/blob/master/31stMay_BigBasket_Dev.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Big Basket 🧺**
  **Forget the days of grocery shopping being a chore! Imagine this: you're lounging on the couch, phone in hand, and with a few taps you've got a truckload (well, maybe a basketful) of fresh produce, pantry staples, and even household essentials on their way to your doorstep. That's the magic of bigbasket, India's one stop grocery shopping destination.**

  **They've got over 20,000 products from all your favorite brands, so you can stock up on everything you need without ever leaving home. Fruits and veggies? Got it. Dairy and meat for that epic dinner party? No problem. Bigbasket even has beauty supplies and cleaning products, so you can basically tackle your entire shopping list in one place. Plus, they have crazy convenient delivery options, so you can ditch the supermarket lines and spend that time doing way cooler things (like prepping for that dinner party!). Bigbasket basically makes grocery shopping a breeze, so you can get back to the fun stuff.**

<hr>

# **About the dataset 📊**

**This dataset is basically a big ol' bunch of info about products, all broken down into 10 easy-peasy pieces:**

  * **Index: This is just a fancy way of saying it's a unique ID for each item, like a fingerprint in the data world.**
  * **Product: The name of the product, just like you'd see it on the website.**
  * **Category: The broad group the product falls into, like groceries or home stuff.**
  * **Sub-Category: This is like zooming in on the category. So, maybe "groceries" becomes "fruits" or "home stuff" becomes "cleaning supplies."**
  * **Brand: Who makes the product? You know, like Nike or that yummy jam brand you love.**
  * **Sale Price: How much you gotta pay for it right now.**
  * **Market Price: This is kind of like a reference point, showing the usual price for the product.**
  * **Type: Another way to classify the product, just for extra organization.**
  * **Rating: What other customers think! This is a number showing how much people liked it.**
  * **Description: This is where they tell you all the juicy details about the product itself, what it's made of and how it's meant to be used.**

**Dataset Link: https://drive.google.com/file/d/1aEuXxadTlHS4d_BBqrhVVurOFI154ATS/view?usp=drive_link**

# **Step 1 - Loading the libraries**

##### **Configuration Libraries**

In [None]:
import warnings
warnings.filterwarnings("ignore")

##### **Classical Libraries**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt

##### **External Libraries**

In [None]:
!pip install colorama
import colorama
from colorama import Fore, Back, Style

Collecting colorama
  Downloading colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Installing collected packages: colorama
Successfully installed colorama-0.4.6


# **Step 2 - Data Ingestion**

### **Data Loading**

In [None]:
df = pd.read_csv("/content/drive/MyDrive/Datasets/BBData.csv")

### **Data Inspection**

In [None]:
df.head().style.set_properties(
    **{
        "background-color": "#FF9B49",
        "color": "black",
        "border-color": "black",
        "border-style": "solid"
    }
)

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220.0,220.0,Hair Oil & Serum,4.1,"This Product contains Garlic Oil that is known to help proper digestion, maintain proper cholesterol levels, support cardiovascular and also build immunity. For Beauty tips, tricks & more visit https://bigbasket.blog/"
1,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180.0,180.0,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), refrigerator safe, dishwasher safe and can also be used for re-heating food and not for cooking. All containers come with airtight lids and a wide variety of attractive colours. Stack these stylish and colourful containers in your kitchen with ease and for a look-good factor."
2,3,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119.0,250.0,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your mother, sister, in-laws, boss or your friends, this beautiful designer piece wherever placed, is sure to beautify the surroundings Traditional design This type diya has been used for Diwali and All other Festivals for centuries. Sturdy and easy to carry The feet keep it balanced to ensure safety. Wonderful Oil Lamp made in Brass also called as Jyoti. This is a handcrafted piece of Indian brass Deepak."
3,4,Cereal Flip Lid Container/Storage Jar - Assorted Colour,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149.0,176.0,"Laundry, Storage Baskets",3.7,"Multipurpose container with an attractive design and made from food-grade plastic for your hygiene and safety ideal for storing pulses. Grains, spices, and more with easy opening and closing flip-open lid. Strong, durable and transparent body for longevity and easy identification of contents. Multipurpose storage solution for your daily needs stores your everyday food essentials in style with the Nakoda container set. With transparent bodies, you can easily identify your stored items without having to open the lids. These containers are ideal for storing a large variety of items such as food grains, snacks and pulses to sugar, spices, condiments and more. Featuring unique flip-open lids, you can easily open and close this container without any hassles. The Nakoda container is made from high-quality food-grade and BPA-free plastic that is 100% safe for storing food items. You can safely store your food items in this container without worrying about contamination and harmful toxins. As they are constructed using highly durable virgin plastic, this container will last for a long time even with regular use. This container can enhance the overall look of your kitchen decor. Being dishwasher safe, cleaning and maintaining this container is an easy task. You can also use a simple soap solution to manually wash and retain their looks for a long time."
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.0,162.0,Bathing Bars & Soaps,4.4,"Nivea Creme Soft Soap gives your skin the best care that it must get. The soft bar consists of Vitamins F and Almonds which are really skin gracious and help you get great skin. It provides the skin with moisture and leaves behind flawless and smooth skin. It makes sure that your body is totally free of germs & dirt and at the same time well nourished.For Beauty tips, tricks & more visit https://bigbasket.blog/"


# **Step 3 - Cleaning and Preprocessing Data**

### **Null Check**

In [None]:
df.isnull().sum()

Unnamed: 0,0
index,0
product,1
category,0
sub_category,0
brand,1
sale_price,0
market_price,0
type,0
rating,8626
description,115


**Assumption**
* **For the missing values in the `rating` column, we assume the products are either new to the inventory or among the lowest sellers**

* **Fill values used: Rating: 0, Description: NoDescriptionFound**

**Working with the null values**

In [None]:
df["product"] = df["product"].fillna("NoProductNameFound")
df["brand"] = df["brand"].fillna("NoBrandNameFound")
df["rating"] = df["rating"].fillna(0)
df["description"] = df["description"].fillna("NoDescriptionFound")
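**One thing to keep in mind: once the missing ratings become 0, any average over `rating` gets pulled down by products nobody has rated yet. The sketch below shows the effect on a tiny made-up table (the rows and the `has_rating` / `rating_filled` column names are assumptions for illustration, not part of the dataset), and keeps a flag so rated and unrated products can still be told apart later.**

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two rated products and two with no rating yet
toy = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "rating": [4.0, 3.0, np.nan, np.nan],
})

# Remember which rows were actually rated before filling anything
toy["has_rating"] = toy["rating"].notna()

# Same fill strategy as above: unrated products get 0
toy["rating_filled"] = toy["rating"].fillna(0)

# mean() skips NaN, so this averages only the rated products
mean_rated_only = toy["rating"].mean()

# Here the zeros are counted like real ratings
mean_with_zeros = toy["rating_filled"].mean()

print("Mean over rated products only:", mean_rated_only)
print("Mean after filling with 0:", mean_with_zeros)
```

**On the real data, filtering on such a flag (or on `rating > 0`) before averaging would keep unrated products out of any rating-based comparison.**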

**Rounding off the data**

In [None]:
# Rounding off the sales price for easy understanding
df["sale_price"] = df["sale_price"].round().astype(int)

# Rounding off the market price for easy understanding
df["market_price"] = df["market_price"].round().astype(int)

**Data Inspection**

In [None]:
df.head()

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220,220,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...
1,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180,180,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ..."
2,3,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119,250,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your m..."
3,4,Cereal Flip Lid Container/Storage Jar - Assort...,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149,176,"Laundry, Storage Baskets",3.7,Multipurpose container with an attractive desi...
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162,162,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...


# **Step 4 - Exploratory Data Analysis (EDA)**

### **Task 1 - Find out the discount we are offering relative to the market price**

In [None]:
df["finalized_discounts"] = (((df["market_price"] - df["sale_price"]) / df["market_price"])*100)
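**The discount is simply (market price - sale price) / market price x 100. Here is a small sketch of the same formula on a toy table. The first three rows reuse prices shown in `df.head()`; the last row, with a market price of 0, is a made-up edge case, and the `discount_pct` column name is an assumption. Swapping a zero market price for NaN keeps the division from producing `inf`.**

```python
import numpy as np
import pandas as pd

# First three rows copy prices from df.head(); the last row is a hypothetical edge case
prices = pd.DataFrame({
    "sale_price": [220, 119, 149, 50],
    "market_price": [220, 250, 176, 0],
})

# Replace a zero market price with NaN so the division gives NaN instead of inf
safe_market = prices["market_price"].replace(0, np.nan)

# Discount as a percentage of the market price, rounded to 2 decimals
prices["discount_pct"] = (
    (prices["market_price"] - prices["sale_price"]) / safe_market * 100
).round(2)

print(prices)
```

**Whether the real data contains any zero market prices is not checked here; the guard is only a precaution.**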

In [None]:
df.head(10)

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description,finalized_discounts
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220,220,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...,0.0
1,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180,180,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ...",0.0
2,3,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119,250,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your m...",52.4
3,4,Cereal Flip Lid Container/Storage Jar - Assort...,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149,176,"Laundry, Storage Baskets",3.7,Multipurpose container with an attractive desi...,15.340909
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162,162,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...,0.0
5,6,Germ - Removal Multipurpose Wipes,Cleaning & Household,All Purpose Cleaners,Nature Protect,169,199,Disinfectant Spray & Cleaners,3.3,Stay protected from contamination with Multipu...,15.075377
6,7,Multani Mati,Beauty & Hygiene,Skin Care,Satinance,58,58,Face Care,3.6,Satinance multani matti is an excellent skin t...,0.0
7,8,Hand Sanitizer - 70% Alcohol Base,Beauty & Hygiene,Bath & Hand Wash,Bionova,250,250,Hand Wash & Sanitizers,4.0,70%Alcohol based is gentle of hand leaves skin...,0.0
8,9,Biotin & Collagen Volumizing Hair Shampoo + Bi...,Beauty & Hygiene,Hair Care,StBotanica,1098,1098,Shampoo & Conditioner,3.5,"An exclusive blend with Vitamin B7 Biotin, Hyd...",0.0
9,10,"Scrub Pad - Anti- Bacterial, Regular",Cleaning & Household,"Mops, Brushes & Scrubs",Scotch brite,20,20,"Utensil Scrub-Pad, Glove",4.3,Scotch Brite Anti- Bacterial Scrub Pad thoroug...,0.0


**From these first ten rows, we get the idea that discounts show up mainly on products that seem to sell less, while products that sell well or regularly carry no discount. The dataset has no sales figures, so this is only a first impression.**
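**One rough way to probe this idea is to compare ratings of discounted and full-price products, using `rating` as a stand-in for popularity (an assumption, since ratings are not sales). The sketch below uses the first six rows printed above, with discounts rounded to two decimals; the `discount_status` column and its labels are made up for illustration. On the full data, products whose rating was filled with 0 should be left out first.**

```python
import pandas as pd

# Ratings and discounts taken from the first six rows of df.head(10)
toy = pd.DataFrame({
    "rating": [4.1, 2.3, 3.4, 3.7, 4.4, 3.3],
    "finalized_discounts": [0.0, 0.0, 52.4, 15.34, 0.0, 15.08],
})

# Label each product by whether it carries any discount at all
toy["discount_status"] = (toy["finalized_discounts"] > 0).map(
    {True: "discounted", False: "full price"}
)

# Count products and average their ratings within each group
summary = toy.groupby("discount_status").agg(
    products=("rating", "size"),
    mean_rating=("rating", "mean"),
)

print(summary)
```

**Six rows prove nothing on their own; the same grouping on the whole `df` would give a fairer picture.**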

#### **Get an overview of the entire data**

In [None]:
print(Back.BLACK + Style.BRIGHT + "Summary of Products" + Style.RESET_ALL)
print(Fore.RED + "Total Number of unique products:" + Style.RESET_ALL + Fore.YELLOW + str(df["product"].nunique()) + Style.RESET_ALL)
print(Fore.RED + "Total Number of unique products categories:" + Style.RESET_ALL + Fore.YELLOW + str(df["category"].nunique())+ Style.RESET_ALL)
print(Fore.RED + "Total Number of unique products sub categories:"  + Style.RESET_ALL + Fore.YELLOW +str(df["sub_category"].nunique())+ Style.RESET_ALL)
print(Fore.RED + "Total Number of unique products type:" + Style.RESET_ALL + Fore.YELLOW + str(df["type"].nunique())+ Style.RESET_ALL)
print(Fore.RED + "Total Number of unique products brands:" + Style.RESET_ALL + Fore.YELLOW + str(df["brand"].nunique())+ Style.RESET_ALL)

[40m[1mSummary of Products[0m
[31mTotal Number of unique products:[0m[33m23541[0m
[31mTotal Number of unique products categories:[0m[33m11[0m
[31mTotal Number of unique products sub categories:[0m[33m90[0m
[31mTotal Number of unique products type:[0m[33m426[0m
[31mTotal Number of unique products brands:[0m[33m2314[0m
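**If the colours are not needed, the same counts can be pulled in one go with `DataFrame.nunique()`. A minimal sketch on a made-up three-row table (the rows are hypothetical; only the column names match `df`):**

```python
import pandas as pd

# Hypothetical toy frame with the same column names as df
toy = pd.DataFrame({
    "product": ["Soap", "Soap", "Shampoo"],
    "category": ["Beauty & Hygiene"] * 3,
    "sub_category": ["Bath & Hand Wash", "Bath & Hand Wash", "Hair Care"],
    "type": ["Bathing Bars & Soaps", "Bathing Bars & Soaps", "Shampoo & Conditioner"],
    "brand": ["Nivea", "Dove", "StBotanica"],
})

# Number of distinct values per column, returned as a Series
columns = ["product", "category", "sub_category", "type", "brand"]
unique_counts = toy[columns].nunique()

print(unique_counts)
```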


### **Task 2 - Analyze the data based on the Products and Categories to anticipate the demand**

In [None]:
# Grabbing the category and product columns
df_product_category = df[["category", "product"]]

In [None]:
# Drop all the duplicates, as we want to work with distinct data points
df_product_category = df_product_category.drop_duplicates()

In [None]:
# Now, grouping the data by category based on count of the products
df_product_category = df_product_category.groupby("category").agg(product_count = ("product", "count")).reset_index().sort_values("product_count", ascending = False)

In [None]:
# Results
df_product_category.head()

Unnamed: 0,category,product_count
2,Beauty & Hygiene,6839
8,Gourmet & World Food,4109
9,"Kitchen, Garden & Pets",3186
10,Snacks & Branded Foods,2454
4,Cleaning & Household,2411
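**The three cells above (select columns, drop duplicates, group and count) boil down to counting distinct product names per category. The sketch below runs both versions on a tiny made-up table to show they agree; the rows and the `three_step` / `one_step` names are assumptions for illustration.**

```python
import pandas as pd

# Hypothetical toy rows: "Soap" appears twice in the same category
toy = pd.DataFrame({
    "category": [
        "Beauty & Hygiene",
        "Beauty & Hygiene",
        "Beauty & Hygiene",
        "Cleaning & Household",
    ],
    "product": ["Soap", "Soap", "Shampoo", "Scrub Pad"],
})

# Three-step version, mirroring the cells above
three_step = (
    toy[["category", "product"]]
    .drop_duplicates()
    .groupby("category")
    .agg(product_count=("product", "count"))
    .reset_index()
    .sort_values("product_count", ascending=False)
)

# One-step version: count distinct product names per category
one_step = (
    toy.groupby("category")["product"]
    .nunique()
    .reset_index(name="product_count")
    .sort_values("product_count", ascending=False)
)

print(three_step)
print(one_step)
```

**Either form works; the one-step version just avoids overwriting `df_product_category` three times.**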


In [None]:
# Visualize
fig = px.bar(df_product_category, x = "category", y = "product_count", color = "category", title = "Analysis based on category via product count")
fig.show()

**Insights**
  * **If product count is a fair proxy for demand, we should invest more in the categories that are in demand, to make better use of space**
  * **Of all the given categories, `Beauty & Hygiene` has the most products (6,839)**
  * **It is followed by `Gourmet & World Food` (4,109) and `Kitchen, Garden & Pets` (3,186), which complete the top 3**
  * **Given this data, demand most likely comes mainly from these three categories, since each of them carries more products than any other category. Product count measures assortment rather than sales, so treat this as an estimate**
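**To put a number on how much the top 3 dominate, we can turn the counts into shares. The sketch below uses only the five rows printed by `df_product_category.head()`, so the percentage is relative to those five categories, not all 11; the `counts`, `share_pct` and `top3_share` names are assumptions for illustration. On the real data, the full `df_product_category` frame would be used instead.**

```python
import pandas as pd

# The five rows shown in the head() output above (not the full 11 categories)
counts = pd.DataFrame({
    "category": [
        "Beauty & Hygiene",
        "Gourmet & World Food",
        "Kitchen, Garden & Pets",
        "Snacks & Branded Foods",
        "Cleaning & Household",
    ],
    "product_count": [6839, 4109, 3186, 2454, 2411],
})

total = counts["product_count"].sum()

# Share of each category within the rows listed here
counts["share_pct"] = (counts["product_count"] / total * 100).round(1)

# Combined share of the three largest categories
top3_count = counts.nlargest(3, "product_count")["product_count"].sum()
top3_share = top3_count / total * 100

print(counts)
print("Top 3 combined share (%):", round(top3_share, 1))
```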