# Flipkart Product Recommendation System - Data Preparation
### **Steps:**
1. [Importing Libraries](#1)
2. [Load the Dataset](#2)
3. [Data Cleaning](#3)
4. [Feature Engineering](#4)
5. [Create Price categories](#5)
6. [Normalize numerical features](#6)
7. [Encode categorical features](#7)

<a name="1"></a>
## 1. Importing Libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
from sklearn.neighbors import NearestNeighbors
from scipy.sparse import csr_matrix
from sklearn.preprocessing import MinMaxScaler, LabelEncoder

<a name="2"></a>
## 2. Load the Dataset

In [3]:
df = pd.read_csv('flipkart_products_20250405.csv')
df.head()

,Product Name,Price (₹),Rating (★),Number of Buyers,Total Sold,Available Stock,Main Category,Sub Category,Discount (%),Seller,Return Policy,Product URL
0,Krishnamurthy-Devan Laboriosam Ultra Smartphon...,142247.04,3.2,7348,4812,364,Electronics,Smartphones,45,RetailNet,False,https://www.flipkart.com/Krishnamurthy-Devan-L...
1,Nanda-Mahal Dignissimos Lite Laptops 1,186922.43,4.1,2342,881,145,Electronics,Laptops,55,Flipkart Assured,False,https://www.flipkart.com/Nanda-Mahal-Dignissim...
2,Choudhury LLC Amet Plus Decor 15,11843.41,5.0,739,2580,206,Home,Decor,58,SuperComNet,True,https://www.flipkart.com/Choudhury-LLC-Amet-Pl...
3,Borah LLC Accusantium Lite Smartphones 9,10864.31,4.8,1543,4562,1585,Electronics,Smartphones,0,ElectroWorld,False,https://www.flipkart.com/Borah-LLC-Accusantium...
4,Murty Inc Placeat Pro Smartwatches 8,32950.41,4.5,7702,4925,1064,Electronics,Smartwatches,18,MobileHub,False,https://www.flipkart.com/Murty-Inc-Placeat-Pro...


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Product Name      5000 non-null   object 
 1   Price (₹)         5000 non-null   float64
 2   Rating (★)        5000 non-null   float64
 3   Number of Buyers  5000 non-null   int64  
 4   Total Sold        5000 non-null   int64  
 5   Available Stock   5000 non-null   int64  
 6   Main Category     5000 non-null   object 
 7   Sub Category      5000 non-null   object 
 8   Discount (%)      5000 non-null   int64  
 9   Seller            5000 non-null   object 
 10  Return Policy     5000 non-null   bool   
 11  Product URL       5000 non-null   object 
dtypes: bool(1), float64(2), int64(4), object(5)
memory usage: 434.7+ KB


In [7]:
df.shape

(5000, 12)

In [5]:
df.describe()

,Price (₹),Rating (★),Number of Buyers,Total Sold,Available Stock,Discount (%)
count,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0
mean,35884.09435,4.00364,5044.4256,5005.0662,1250.8702,26.208
std,39867.681428,0.584742,2886.016013,2897.042048,1109.872427,23.484332
min,100.45,3.0,11.0,50.0,0.0,0.0
25%,12275.5025,3.5,2553.75,2495.0,323.75,0.0
50%,27761.01,4.0,5099.5,4972.5,916.0,23.0
75%,42852.2475,4.5,7575.5,7571.5,1934.25,47.0
max,249158.91,5.0,10000.0,10000.0,4924.0,70.0


<a name="3"></a>
## 3. Data Cleaning

In [8]:
df.isnull().sum()

Product Name        0
Price (₹)           0
Rating (★)          0
Number of Buyers    0
Total Sold          0
Available Stock     0
Main Category       0
Sub Category        0
Discount (%)        0
Seller              0
Return Policy       0
Product URL         0
dtype: int64
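
The per-column listing above is easy to scan for 12 columns, but it gets long on wider frames. As a minimal sketch (the helper name `missing_report` and the two-row sample frame are hypothetical, not part of the original notebook), the same check can be narrowed to only the columns that actually contain missing values:

```python
import pandas as pd

def missing_report(frame: pd.DataFrame) -> pd.Series:
    """Return only the columns that contain at least one missing value."""
    counts = frame.isnull().sum()
    return counts[counts > 0]

# Hypothetical two-row sample with one missing price, for illustration only.
sample = pd.DataFrame({"Price": [100.45, None], "Rating": [3.0, 4.5]})
report = missing_report(sample)
print(report)
```

Calling `missing_report(df)` on a frame with no nulls, as is the case here, would return an empty Series.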

**Note: There are no null values. However, some column names include symbols (₹, ★, %), so we rename those columns.**

In [9]:
df.rename(columns={'Price (₹)': 'Price', 'Rating (★)': 'Rating', 'Discount (%)': 'Discount'}, inplace=True)
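
The explicit mapping above works well for three columns. If more headers carried a symbol in parentheses, one possible alternative is to pass a function to `rename`. The sketch below assumes every symbol appears as a trailing parenthesised suffix; the helper name `strip_symbol_suffix` and the empty stand-in frame are hypothetical.

```python
import re
import pandas as pd

def strip_symbol_suffix(name: str) -> str:
    """Drop a trailing parenthesised suffix such as ' (₹)' or ' (%)'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", name)

# Empty stand-in frame that reuses a few of the dataset's original headers.
sample = pd.DataFrame(columns=["Product Name", "Price (₹)", "Rating (★)", "Discount (%)"])

# rename accepts a callable and applies it to every column label.
cleaned = sample.rename(columns=strip_symbol_suffix)
print(list(cleaned.columns))
```

Headers without a parenthesised suffix, such as `Product Name`, pass through unchanged.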

In [10]:
df.head()

,Product Name,Price,Rating,Number of Buyers,Total Sold,Available Stock,Main Category,Sub Category,Discount,Seller,Return Policy,Product URL
0,Krishnamurthy-Devan Laboriosam Ultra Smartphon...,142247.04,3.2,7348,4812,364,Electronics,Smartphones,45,RetailNet,False,https://www.flipkart.com/Krishnamurthy-Devan-L...
1,Nanda-Mahal Dignissimos Lite Laptops 1,186922.43,4.1,2342,881,145,Electronics,Laptops,55,Flipkart Assured,False,https://www.flipkart.com/Nanda-Mahal-Dignissim...
2,Choudhury LLC Amet Plus Decor 15,11843.41,5.0,739,2580,206,Home,Decor,58,SuperComNet,True,https://www.flipkart.com/Choudhury-LLC-Amet-Pl...
3,Borah LLC Accusantium Lite Smartphones 9,10864.31,4.8,1543,4562,1585,Electronics,Smartphones,0,ElectroWorld,False,https://www.flipkart.com/Borah-LLC-Accusantium...
4,Murty Inc Placeat Pro Smartwatches 8,32950.41,4.5,7702,4925,1064,Electronics,Smartwatches,18,MobileHub,False,https://www.flipkart.com/Murty-Inc-Placeat-Pro...
