# 📊 Flipkart Makeup Product Analysis – Brand & Performance Insights

## 📌 Project Overview
This project focuses on analyzing makeup product data from Flipkart to understand brand performance, category trends, pricing strategies, discount impact, and customer behavior. The analysis aims to extract meaningful insights that can support business decision-making in marketing, merchandising, and inventory planning.

## 🎯 Objective
The main objective of this project is to evaluate the performance of makeup products across different brands and categories by analyzing prices, discounts, ratings, and customer reviews.


## Data Cleaning and Preparation

### Import Required Libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use("default")
```


### Load the Dataset

```python
df = pd.read_csv("data/Flipkart_Makeup_Product_Analysis_3L.csv")

df.head()
```


| | Product_ID | Product_Name | Brand | Category | Original_Price | Discounted_Price | Discount_Percentage | Rating | Review_Count | Seller_Name |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | MAC Silk Primer Smudge-Proof Honey 29 (15 ml) | MAC | Primer | 1627 | 1138.9 | 30 | 4.5 | 54 | OmniTechRetail |
| 1 | 2 | Kiko Milano Prime Concealer Velvet Shade Pink ... | Kiko Milano | Concealer | 392 | 392.0 | 0 | 4.5 | 29 | ShopNStyle |
| 2 | 3 | The Body Shop Stay Primer Waterproof Hue Caram... | The Body Shop | Primer | 1923 | 961.5 | 50 | 4.0 | 51 | PrimeBeauty |
| 3 | 4 | Mamaearth Ultra Lip Gloss Dewy Shade Caramel 2... | Mamaearth | Lip Gloss | 411 | 369.9 | 10 | 4.5 | 5 | BeautyHub |
| 4 | 5 | Lakme Max Brow Pencil Waterproof Brown 46 (0.2... | Lakme | Brow Pencil | 999 | 599.4 | 40 | 3.9 | 29 | TrendCart |
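Every later step relies on these column names, so it can help to confirm the schema right after loading. The sketch below is a hypothetical helper (`load_products` and `EXPECTED_COLUMNS` are illustrative names, not part of the original notebook); it reads a one-row in-memory sample copied from the first `df.head()` row instead of the real CSV, and takes the column list from the `df.info()` output.

```python
import io

import pandas as pd

# Hypothetical one-row sample copied from df.head(); the notebook reads the full CSV.
SAMPLE_CSV = """Product_ID,Product_Name,Brand,Category,Original_Price,Discounted_Price,Discount_Percentage,Rating,Review_Count,Seller_Name
1,MAC Silk Primer Smudge-Proof Honey 29 (15 ml),MAC,Primer,1627,1138.9,30,4.5,54,OmniTechRetail
"""

# Column names as listed by df.info() in this notebook (assumed to be the full schema).
EXPECTED_COLUMNS = [
    "Product_ID", "Product_Name", "Brand", "Category", "Original_Price",
    "Discounted_Price", "Discount_Percentage", "Rating", "Review_Count", "Seller_Name",
]


def load_products(source) -> pd.DataFrame:
    """Read the product CSV and fail fast if any expected column is missing."""
    frame = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return frame


sample = load_products(io.StringIO(SAMPLE_CSV))
print(sample.shape)  # (1, 10)
```

With the real file, `load_products("data/Flipkart_Makeup_Product_Analysis_3L.csv")` could stand in for the plain `pd.read_csv` call.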


### Understand the Dataset Structure

```python
df.shape
```

```
(300000, 10)
```

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 10 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   Product_ID           300000 non-null  int64  
 1   Product_Name         300000 non-null  object 
 2   Brand                300000 non-null  object 
 3   Category             300000 non-null  object 
 4   Original_Price       300000 non-null  int64  
 5   Discounted_Price     300000 non-null  float64
 6   Discount_Percentage  300000 non-null  int64  
 7   Rating               300000 non-null  float64
 8   Review_Count         300000 non-null  int64  
 9   Seller_Name          300000 non-null  object 
dtypes: float64(2), int64(4), object(4)
memory usage: 22.9+ MB
```
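The `+` in `22.9+ MB` means pandas did not measure the string contents of the `object` columns, so the true footprint is larger. A minimal sketch, assuming the repeated text columns `Brand`, `Category`, and `Seller_Name` are good candidates, compares deep memory usage before and after converting them to the `category` dtype. It uses a synthetic 30,000-row frame rather than the real data.

```python
import pandas as pd

# Synthetic stand-in: 30,000 rows cycling through a few brand/category/seller names.
n = 30_000
toy = pd.DataFrame({
    "Brand": ["Lakme", "Maybelline", "MAC"] * (n // 3),
    "Category": ["Lipstick", "Foundation", "Primer"] * (n // 3),
    "Seller_Name": ["OmniTechRetail", "ShopNStyle", "BeautyHub"] * (n // 3),
})

# deep=True counts the actual string payloads, not just the object pointers.
before = toy.memory_usage(deep=True).sum()

# Low-cardinality text columns store each distinct string once when categorical.
for col in ["Brand", "Category", "Seller_Name"]:
    toy[col] = toy[col].astype("category")

after = toy.memory_usage(deep=True).sum()
print(f"{before:,} bytes -> {after:,} bytes")
```

On the full dataset, `df.info(memory_usage="deep")` would report the exact figure without the `+`.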


```python
df.describe()
```


| | Product_ID | Original_Price | Discounted_Price | Discount_Percentage | Rating | Review_Count |
|---|---|---|---|---|---|---|
| count | 300000.0 | 300000.0 | 300000.0 | 300000.0 | 300000.0 | 300000.0 |
| mean | 150000.5 | 881.806967 | 687.71581 | 22.04085 | 4.145123 | 43.283577 |
| std | 86602.684716 | 613.083025 | 503.811919 | 14.711613 | 0.439298 | 35.222829 |
| min | 1.0 | 59.0 | 23.6 | 0.0 | 2.5 | 0.0 |
| 25% | 75000.75 | 426.0 | 319.2 | 10.0 | 3.8 | 19.0 |
| 50% | 150000.5 | 744.0 | 567.15 | 20.0 | 4.2 | 34.0 |
| 75% | 225000.25 | 1193.0 | 922.4 | 30.0 | 4.5 | 57.0 |
| max | 300000.0 | 3999.0 | 3999.0 | 60.0 | 5.0 | 541.0 |


#### Observation

Number of rows = 3,00,000

Number of columns = 10

Price, discount, rating, and review-count columns are numeric (int64 or float64).

Ratings range from 2.5 to 5.0.
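The sample rows suggest that `Discounted_Price` follows Original_Price × (1 − Discount_Percentage / 100); for the MAC primer, 1627 × 0.70 = 1138.9. A sketch that checks this relationship row by row, assuming that is how the column was generated, is shown below on the five rows from `df.head()`. The tolerance of 0.05 is an assumption to absorb float rounding.

```python
import numpy as np
import pandas as pd

# Rows copied from the df.head() output above.
sample = pd.DataFrame({
    "Original_Price": [1627, 392, 1923, 411, 999],
    "Discounted_Price": [1138.9, 392.0, 961.5, 369.9, 599.4],
    "Discount_Percentage": [30, 0, 50, 10, 40],
})

# Assumed pricing rule: discounted = original * (1 - pct / 100).
expected = sample["Original_Price"] * (1 - sample["Discount_Percentage"] / 100)

# Compare with a small tolerance, since prices are stored as rounded floats.
consistent = np.isclose(sample["Discounted_Price"], expected, atol=0.05)
print(consistent.all())  # True
```

On the full data, `(~consistent).sum()` would count the rows that break the assumed rule.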

### Check Missing Values

```python
df.isnull().sum()
```

```
Product_ID             0
Product_Name           0
Brand                  0
Category               0
Original_Price         0
Discounted_Price       0
Discount_Percentage    0
Rating                 0
Review_Count           0
Seller_Name            0
dtype: int64
```

#### Insight

No column contains missing values.

### Check Duplicates

```python
df.duplicated().sum()
```

```
np.int64(0)
```
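`Product_ID` appears to be a running number (1 to 300,000 according to `df.describe()`), so `df.duplicated()` cannot flag a listing that repeats another one under a new ID. A stricter check ignores the ID column; the sketch below demonstrates it on a hypothetical three-row frame.

```python
import pandas as pd

# Hypothetical rows: the last row repeats the first listing under a new Product_ID.
toy = pd.DataFrame({
    "Product_ID": [1, 2, 3],
    "Brand": ["MAC", "Lakme", "MAC"],
    "Category": ["Primer", "Brow Pencil", "Primer"],
    "Original_Price": [1627, 999, 1627],
})

# Compare every column except the identifier.
content_cols = [c for c in toy.columns if c != "Product_ID"]

print(toy.duplicated().sum())                     # 0: the IDs differ
print(toy.duplicated(subset=content_cols).sum())  # 1: same content, new ID
```

On the real data, the same `subset=` argument would apply to `df`.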

### Data Type Validation

```python
df.dtypes
```

```
Product_ID               int64
Product_Name            object
Brand                   object
Category                object
Original_Price           int64
Discounted_Price       float64
Discount_Percentage      int64
Rating                 float64
Review_Count             int64
Seller_Name             object
dtype: object
```
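Reading the dtype listing by eye works once, but an automated check catches regressions if the CSV changes (for example, a stray text value turning `Review_Count` into a string column). A minimal sketch, with `EXPECTED_DTYPES` and `dtype_mismatches` as hypothetical names and only the numeric columns from the output above checked:

```python
import pandas as pd

# Expected numeric dtypes, taken from the df.dtypes output above.
EXPECTED_DTYPES = {
    "Product_ID": "int64",
    "Original_Price": "int64",
    "Discounted_Price": "float64",
    "Discount_Percentage": "int64",
    "Rating": "float64",
    "Review_Count": "int64",
}


def dtype_mismatches(frame: pd.DataFrame) -> dict:
    """Return {column: actual_dtype} for columns whose dtype differs from EXPECTED_DTYPES."""
    return {
        col: str(frame[col].dtype)
        for col, want in EXPECTED_DTYPES.items()
        if str(frame[col].dtype) != want
    }


# Hypothetical row where Review_Count was read as text.
toy = pd.DataFrame({
    "Product_ID": [1], "Original_Price": [1627], "Discounted_Price": [1138.9],
    "Discount_Percentage": [30], "Rating": [4.5], "Review_Count": ["54"],
})
print(dtype_mismatches(toy))  # flags only Review_Count
```

An empty dict from `dtype_mismatches(df)` would confirm the numeric schema.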

## Feature Engineering

### 1. Create Price Difference Column

```python
# Absolute saving per product (original price minus discounted price)
df["Price_Difference"] = df["Original_Price"] - df["Discounted_Price"]
```
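`Price_Difference` is the absolute saving, while `Discount_Percentage` is the relative one. A quick sketch on the five `df.head()` rows shows the values the new column takes and checks that no product is priced above its original price:

```python
import pandas as pd

# The five rows shown by df.head() above.
sample = pd.DataFrame({
    "Original_Price": [1627, 392, 1923, 411, 999],
    "Discounted_Price": [1138.9, 392.0, 961.5, 369.9, 599.4],
})

sample["Price_Difference"] = sample["Original_Price"] - sample["Discounted_Price"]

# Float subtraction can leave tiny residues, so round for display.
print(sample["Price_Difference"].round(1).tolist())  # [488.1, 0.0, 961.5, 41.1, 399.6]

# A negative difference would mean the "discounted" price exceeds the original.
assert (sample["Price_Difference"] >= 0).all()
```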


### 2. Create Rating Bucket

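```python
# pd.cut bins are right-inclusive by default:
# (0, 3] -> Low, (3, 4] -> Average, (4, 4.5] -> Good, (4.5, 5] -> Excellent
df["Rating_Category"] = pd.cut(
    df["Rating"],
    bins=[0, 3, 4, 4.5, 5],
    labels=["Low", "Average", "Good", "Excellent"]
)
```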
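Because `pd.cut` closes each bin on the right by default, boundary ratings fall into the lower bucket: a 4.5 is "Good" and a 4.0 is "Average". The sketch below applies the same bins and labels to a few hand-picked ratings to make the boundaries visible.

```python
import pandas as pd

ratings = pd.Series([2.5, 3.0, 4.0, 4.2, 4.5, 4.6, 5.0])

# Same bins and labels as above; right=True is the pd.cut default.
buckets = pd.cut(
    ratings,
    bins=[0, 3, 4, 4.5, 5],
    labels=["Low", "Average", "Good", "Excellent"],
)
print(list(zip(ratings, buckets.astype(str))))
```

Passing `right=False` would move 3.0, 4.0, and 4.5 up one bucket, but a 5.0 rating would then fall outside the last bin `[4.5, 5)` and become `NaN`, so the default suits a 5-point scale.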


### 3. Create Discount Flag

```python
# Flag products discounted by 30% or more
df["High_Discount"] = np.where(df["Discount_Percentage"] >= 30, "Yes", "No")
```
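To see what share of listings the 30% threshold captures, `value_counts(normalize=True)` reports proportions instead of counts. A sketch using the discount values from the five `df.head()` rows:

```python
import numpy as np
import pandas as pd

# Discount_Percentage values from the df.head() rows above.
discounts = pd.Series([30, 0, 50, 10, 40], name="Discount_Percentage")

# Same rule as the notebook: 30% or more counts as a high discount.
high = pd.Series(np.where(discounts >= 30, "Yes", "No"), name="High_Discount")

print(high.value_counts(normalize=True))
```

A boolean column (`discounts >= 30`) would be an alternative design: its `.mean()` gives the high-discount share directly, while the "Yes"/"No" strings read more clearly in grouped output.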


## Basic EDA

### 1. Top 10 Brands by Product Count

```python
df["Brand"].value_counts().head(10)
```

```
Brand
Lakme            50974
Maybelline       27399
L'Oreal Paris    18888
NYX              14693
MAC              11815
Colorbar         10026
Revlon            9065
Sugar             7793
Faces Canada      7106
Swiss Beauty      6379
Name: count, dtype: int64
```
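Raw counts are easier to interpret as shares of the 300,000 listings. Using only the counts printed above:

```python
# Listing counts copied from the value_counts() output above.
top_brands = {
    "Lakme": 50974, "Maybelline": 27399, "L'Oreal Paris": 18888, "NYX": 14693,
    "MAC": 11815, "Colorbar": 10026, "Revlon": 9065, "Sugar": 7793,
    "Faces Canada": 7106, "Swiss Beauty": 6379,
}
TOTAL_ROWS = 300_000  # from df.shape

lakme_share = top_brands["Lakme"] / TOTAL_ROWS
top10_share = sum(top_brands.values()) / TOTAL_ROWS
print(f"Lakme: {lakme_share:.1%}, top 10 brands: {top10_share:.1%}")
# Lakme: 17.0%, top 10 brands: 54.7%
```

In the notebook itself, `df["Brand"].value_counts(normalize=True).head(10)` would return the same shares directly.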

### 2. Average Rating by Category

```python
df.groupby("Category")["Rating"].mean().sort_values(ascending=False)
```

```
Category
Blush                4.151054
Lipstick             4.148188
Setting Spray        4.147469
Highlighter          4.147060
Bronzer              4.146918
Primer               4.146779
Makeup Remover       4.146479
Foundation           4.146226
Concealer            4.146193
Lip Balm             4.145910
Compact              4.145675
CC Cream             4.145473
Brow Pencil          4.144950
Setting Powder       4.144282
Mascara              4.144227
Liquid Lipstick      4.144003
Eyeliner             4.143827
Lip Gloss            4.143108
Single Eyeshadow     4.142685
Lip Liner            4.142647
Eyeshadow Palette    4.142498
Nail Polish          4.142430
BB Cream             4.141859
Kajal                4.139978
Brow Gel             4.136999
Name: Rating, dtype: float64
```
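All category means sit between 4.137 and 4.151, so a mean on its own says little about whether the gaps matter. Named aggregation can report the product count and rating spread next to each mean; the sketch below uses hypothetical ratings for two categories (`avg_rating`, `n_products`, and `rating_std` are illustrative column names).

```python
import pandas as pd

# Hypothetical ratings for two categories.
toy = pd.DataFrame({
    "Category": ["Blush", "Blush", "Brow Gel", "Brow Gel", "Brow Gel"],
    "Rating": [4.5, 3.9, 4.0, 4.2, 4.5],
})

# Named aggregation: output column name = aggregation function.
summary = (
    toy.groupby("Category")["Rating"]
    .agg(avg_rating="mean", n_products="size", rating_std="std")
    .sort_values("avg_rating", ascending=False)
)
print(summary)
```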

### 3. Discount vs Rating

```python
df.groupby("High_Discount")["Rating"].mean()
```

```
High_Discount
No     4.145482
Yes    4.144454
Name: Rating, dtype: float64
```
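The two group means differ by only about 0.001. A finer-grained view averages the rating at each discount level and computes the Pearson correlation between discount depth and rating with `Series.corr`. The rows below are hypothetical and built so that the correlation is zero; on the real data the same two expressions would run against `df`.

```python
import pandas as pd

# Hypothetical rows; real values would come from df.
toy = pd.DataFrame({
    "Discount_Percentage": [0, 0, 20, 20, 40, 40],
    "Rating": [4.1, 4.3, 4.2, 4.2, 4.4, 4.0],
})

# Mean rating at each discount level.
by_discount = toy.groupby("Discount_Percentage")["Rating"].mean()
print(by_discount)

# Pearson correlation between discount depth and rating (ranges from -1 to 1).
r = toy["Discount_Percentage"].corr(toy["Rating"])
print(round(r, 3))
```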

### 4. Top Categories by Reviews

```python
df.groupby("Category")["Review_Count"].sum().sort_values(ascending=False)
```

```
Category
Lipstick             1749656
Foundation           1158297
Liquid Lipstick       977998
Concealer             805936
Mascara               661674
Eyeliner              660441
Primer                531333
Kajal                 530513
Compact               528357
Lip Gloss             524471
Eyeshadow Palette     524432
Highlighter           523966
Blush                 522010
Setting Powder        401939
Lip Balm              395215
Lip Liner             285458
Setting Spray         282428
BB Cream              281120
Brow Pencil           280852
Single Eyeshadow      279769
Bronzer               279566
Nail Polish           274685
Makeup Remover        179736
CC Cream              174280
Brow Gel              170941
Name: Review_Count, dtype: int64
```
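A category's total review count combines how many products it lists with how many reviews each product attracts. Separating the two, as in this sketch on hypothetical rows, would show whether Lipstick's lead comes from listing volume or from engagement per product.

```python
import pandas as pd

# Hypothetical rows: many modestly reviewed lipsticks vs. one well-reviewed brow gel.
toy = pd.DataFrame({
    "Category": ["Lipstick", "Lipstick", "Lipstick", "Brow Gel"],
    "Review_Count": [40, 50, 60, 70],
})

engagement = toy.groupby("Category")["Review_Count"].agg(
    total_reviews="sum", n_products="size", reviews_per_product="mean"
)
print(engagement)
```

Here Lipstick wins on total reviews (150 vs. 70) only because it has more listings; per product, Brow Gel is ahead (70 vs. 50).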

## Key Insights

Lipstick and Foundation lead all categories in total review count, with about 1.75 million and 1.16 million reviews respectively.

High-discount products (30% or more off) do not earn higher ratings: their average rating (4.144) is almost identical to that of other products (4.145).

Premium brands tend to maintain higher average ratings.

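Review volume varies widely by category (from about 171K reviews for Brow Gel to about 1.75 million for Lipstick), even though average ratings are nearly identical across categories (4.137 to 4.151).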
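The dataset has no "premium" flag, so checking the brand-level rating claim requires a proxy. The sketch below assumes a brand's median `Original_Price` as that proxy, drops brands with too few listings (`MIN_PRODUCTS` is a hypothetical threshold), and computes a Spearman rank correlation as the Pearson correlation of ranks, which avoids a SciPy dependency. The rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows; brand names from the data, values invented for illustration.
toy = pd.DataFrame({
    "Brand": ["MAC", "MAC", "Lakme", "Lakme", "Lakme", "Sugar"],
    "Original_Price": [1627, 1800, 999, 450, 600, 700],
    "Rating": [4.5, 4.3, 3.9, 4.2, 4.1, 4.8],
})

MIN_PRODUCTS = 2  # hypothetical threshold: ignore brands with very few listings

brand_stats = toy.groupby("Brand").agg(
    median_price=("Original_Price", "median"),  # assumed "premium" proxy
    avg_rating=("Rating", "mean"),
    n_products=("Rating", "size"),
)
brand_stats = brand_stats[brand_stats["n_products"] >= MIN_PRODUCTS]

# Spearman correlation = Pearson correlation of the ranks.
rank_corr = brand_stats["median_price"].rank().corr(brand_stats["avg_rating"].rank())
print(brand_stats)
print(rank_corr)
```

A value near 1 on the full `df` would support the premium-brand claim; a value near 0 would not.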

## Save Cleaned Dataset

```python
df.to_csv("data/flipkart_makeup_cleaned.csv", index=False)
```
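CSV stores no dtype information, so `Rating_Category` comes back as plain text when the cleaned file is reloaded, and its Low < Average < Good < Excellent ordering is lost. A sketch of restoring the ordered categorical type after `pd.read_csv`, using an in-memory buffer and two hypothetical ratings:

```python
import io

import pandas as pd

LABELS = ["Low", "Average", "Good", "Excellent"]

toy = pd.DataFrame({"Rating": [3.9, 4.5]})
toy["Rating_Category"] = pd.cut(toy["Rating"], bins=[0, 3, 4, 4.5, 5], labels=LABELS)

# Write and read back through an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# CSV keeps only the text labels, so re-apply the ordered categorical type.
rating_type = pd.CategoricalDtype(categories=LABELS, ordered=True)
reloaded["Rating_Category"] = reloaded["Rating_Category"].astype(rating_type)
print(reloaded.dtypes)
```

With the ordering restored, comparisons such as `reloaded["Rating_Category"] < "Good"` work as they did before saving.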
