"""
TOPICS WE ARE GOING TO COVER TODAY:

1. Loading CSV Files
   - Reading CSV files with pd.read_csv()
   - Specifying file path and delimiter
   - Saving DataFrame to CSV with .to_csv()

2. Common Pandas Methods
   - .head() and .tail() – viewing top/bottom rows
   - .shape – checking rows and columns
   - .info() – data summary and types
   - .describe() – statistical summary
   - .columns and .index – viewing column and index labels
   - .dtypes – checking data types

3. Basic Data Manipulation (EDA)
   - Selecting columns (df['col']) and rows (.loc[], .iloc[])
   - Filtering rows with conditions
   - Sorting data with .sort_values()
   - Renaming columns with .rename()
   - Checking for missing values (.isnull(), .notnull())
   - Dropping missing values (.dropna()) and filling them (.fillna())
"""


In [1]:
# STEP 0: IMPORTING REQUIRED LIBRARIES
import pandas as pd
import numpy as np

**_What is a file path?_**  
**_A file path is the location of a file on your computer._**

**_Types of file paths:_**

**_1. Relative path:_** File location relative to current folder  
- Example: `'BigBasket Products.csv'` (file is in same folder)  
- Example: `'data/BigBasket Products.csv'` (file is in 'data' subfolder)  

**_2. Absolute path:_** Complete file location from root directory  
- Example (Windows): `'C:/Users/YourName/Documents/BigBasket Products.csv'`  
- Example (Mac/Linux): `'/home/username/data/BigBasket Products.csv'`  

**_Examples of reading CSV with different paths:_**

**_1. Same folder:_**  
`df = pd.read_csv('BigBasket Products.csv')`  

**_2. Subfolder:_**  
`df = pd.read_csv('data/BigBasket Products.csv')`  

**_3. Full path (Windows):_**  
`df = pd.read_csv('C:/Users/YourName/Documents/BigBasket Products.csv')`  

**_4. Full path (Mac/Linux):_**  
`df = pd.read_csv('/home/username/data/BigBasket Products.csv')`  
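
The relative and absolute forms above can be tried out with `pathlib`. The sketch below is an illustration only: it writes a two-row stand-in file called `sample.csv` (a hypothetical name, not the BigBasket file) into a temporary `data` subfolder and reads it back through its absolute path.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create a throwaway folder with a 'data' subfolder, mirroring the layout above.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp).resolve() / "data"
    data_dir.mkdir()

    # Hypothetical stand-in file; the notebook itself uses 'BigBasket Products.csv'.
    csv_path = data_dir / "sample.csv"
    csv_path.write_text("product,sale_price\nSoap,162.0\nTea,396.0\n")

    print(csv_path.is_absolute())  # True: the path starts from the root directory
    print(csv_path.exists())       # checking first avoids a FileNotFoundError

    # pd.read_csv() accepts either a string or a pathlib.Path
    sample = pd.read_csv(csv_path)

print(sample.shape)  # (2, 2)
```

Checking `exists()` before reading is optional, but it gives a clearer signal than a traceback when the notebook is started from a different folder than expected.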


In [2]:
# SECTION 1: LOADING CSV FILES
df = pd.read_csv('BigBasket Products.csv')

**Specifying Delimiter**  
---

**_What is a delimiter?_**  
A delimiter is a character that separates values in a file.  
CSV files typically use comma (`,`) as delimiter.  
But some files use semicolon (`;`), tab (`\t`), or pipe (`|`).  

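**_Why specify delimiter?_**  
If your file uses a different separator, pass it to `pd.read_csv()` through `delimiter` (an alias of `sep`). Otherwise Pandas splits each line on commas and the columns come out wrong.  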

**_Default delimiter is comma:_**  
`df = pd.read_csv('file.csv')  # delimiter=',' is default`  

**_Other delimiters:_**  
- `df = pd.read_csv('file.csv', delimiter=';')  # semicolon`  
- `df = pd.read_csv('file.csv', delimiter='\t')  # tab`  
- `df = pd.read_csv('file.csv', delimiter='|')  # pipe`  
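
To see what happens when the separator is wrong, here is a minimal sketch that reads made-up rows from in-memory text with `io.StringIO` instead of a real file. The sample rows are assumptions for illustration.

```python
import io

import pandas as pd

# Two in-memory "files" holding the same rows with different separators.
semicolon_text = "product;sale_price\nSoap;162.0\nTea;396.0\n"
tab_text = "product\tsale_price\nSoap\t162.0\nTea\t396.0\n"

# With the default comma separator, each whole line lands in a single column.
wrong = pd.read_csv(io.StringIO(semicolon_text))
print(wrong.shape)  # (2, 1)

# Naming the separator splits the columns correctly.
by_semicolon = pd.read_csv(io.StringIO(semicolon_text), sep=";")
by_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")
print(by_semicolon.shape)           # (2, 2)
print(by_semicolon.equals(by_tab))  # True: same data once parsed
```

A DataFrame with a single, oddly named column right after loading is usually the first hint that the delimiter needs to be set.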


**Saving DataFrame to CSV with .to_csv()**  
---

**_What is .to_csv()?_**  
`.to_csv()` saves a DataFrame to a CSV file.  
**Syntax:** `df.to_csv(filepath, index=False)`  

**_Why use .to_csv()?_**  
- Save your cleaned/modified data  
- Export data for sharing  
- Create backup of processed data  

**_Parameter: index=False_**  
By default, Pandas saves row numbers (index) as a column.  
`index=False` prevents this and saves only your actual data columns.  

**_Saving DataFrame to CSV:_**

**Basic usage:**  
`df.to_csv('output.csv', index=False)`  

**Save to specific folder:**  
`df.to_csv('data/output.csv', index=False)`  

**With index column:**  
`df.to_csv('output_with_index.csv', index=True)`  

**Practical example – saving our DataFrame:**  
`df.to_csv('BigBasket_Cleaned.csv', index=False)`  
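
The effect of `index=False` is easiest to see in a round trip. This sketch uses a small made-up DataFrame and relies on `to_csv()` returning the CSV text when no file path is given, so nothing is written to disk.

```python
import io

import pandas as pd

small = pd.DataFrame({"product": ["Soap", "Tea"], "sale_price": [162.0, 396.0]})

# Without a path, to_csv() returns the CSV text instead of writing a file.
text_without_index = small.to_csv(index=False)
text_with_index = small.to_csv(index=True)

# Reading the text back shows what each choice produces.
without_index = pd.read_csv(io.StringIO(text_without_index))
with_index = pd.read_csv(io.StringIO(text_with_index))

print(without_index.columns.tolist())  # ['product', 'sale_price']
print(with_index.columns.tolist())     # ['Unnamed: 0', 'product', 'sale_price']
```

The extra `Unnamed: 0` column is the saved row index coming back as ordinary data, which is why `index=False` is the usual choice when the index is just row numbers.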

**_SECTION 2: COMMON PANDAS METHODS_**  
---

**_These methods help us understand and explore our data._**  
**_They are essential for any data analysis work._**  

**_2.1 .head() and .tail() – Viewing Top/Bottom Rows_**

**_.head():_**  
- Shows the first few rows of the DataFrame  
- **Default:** Displays the first 5 rows  
- **Syntax:** `df.head(n)` where *n* is the number of rows  

**_Why use .head()?_**  
- Quick preview of data  
- Check if data loaded correctly  
- See column names and sample values  

**_.tail():_**  
- Shows the last few rows of the DataFrame  
- **Default:** Displays the last 5 rows  
- **Syntax:** `df.tail(n)` where *n* is the number of rows  

**_Examples:_**  
```python
df.head()        # shows first 5 rows
df.head(10)      # shows first 10 rows
df.tail()        # shows last 5 rows
df.tail(8)       # shows last 8 rows
```


In [4]:
df.head()

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220.0,220.0,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...
1,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180.0,180.0,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ..."
2,3,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119.0,250.0,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your m..."
3,4,Cereal Flip Lid Container/Storage Jar - Assort...,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149.0,176.0,"Laundry, Storage Baskets",3.7,Multipurpose container with an attractive desi...
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.0,162.0,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...


In [5]:
df.tail()

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
27550,27551,"Wottagirl! Perfume Spray - Heaven, Classic",Beauty & Hygiene,Fragrances & Deos,Layerr,199.2,249.0,Perfume,3.9,Layerr brings you Wottagirl Classic fragrant b...
27551,27552,Rosemary,Gourmet & World Food,Cooking & Baking Needs,Puramate,67.5,75.0,"Herbs, Seasonings & Rubs",4.0,Puramate rosemary is enough to transform a dis...
27552,27553,Peri-Peri Sweet Potato Chips,Gourmet & World Food,"Snacks, Dry Fruits, Nuts",FabBox,200.0,200.0,Nachos & Chips,3.8,We have taken the richness of Sweet Potatoes (...
27553,27554,Green Tea - Pure Original,Beverages,Tea,Tetley,396.0,495.0,Tea Bags,4.2,"Tetley Green Tea with its refreshing pure, ori..."
27554,27555,United Dreams Go Far Deodorant,Beauty & Hygiene,Men's Grooming,United Colors Of Benetton,214.53,390.0,Men's Deodorants,4.5,The new mens fragrance from the United Dreams ...


In [6]:
df.head(3)

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220.0,220.0,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...
1,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180.0,180.0,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ..."
2,3,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119.0,250.0,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your m..."


In [7]:
df.tail(3)

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
27552,27553,Peri-Peri Sweet Potato Chips,Gourmet & World Food,"Snacks, Dry Fruits, Nuts",FabBox,200.0,200.0,Nachos & Chips,3.8,We have taken the richness of Sweet Potatoes (...
27553,27554,Green Tea - Pure Original,Beverages,Tea,Tetley,396.0,495.0,Tea Bags,4.2,"Tetley Green Tea with its refreshing pure, ori..."
27554,27555,United Dreams Go Far Deodorant,Beauty & Hygiene,Men's Grooming,United Colors Of Benetton,214.53,390.0,Men's Deodorants,4.5,The new mens fragrance from the United Dreams ...
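
Two edge cases of `n` are worth knowing: a value larger than the DataFrame, and a negative value. The sketch below uses a made-up seven-row DataFrame rather than the BigBasket data.

```python
import pandas as pd

small = pd.DataFrame({"product": ["A", "B", "C", "D", "E", "F", "G"]})

print(len(small.head()))    # 5: the default
print(len(small.head(20)))  # 7: asking for more rows than exist returns them all

# A negative n means "everything except that many rows".
print(small.head(-2)["product"].tolist())  # ['A', 'B', 'C', 'D', 'E']
print(small.tail(-2)["product"].tolist())  # ['C', 'D', 'E', 'F', 'G']
```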


**_2.2 .shape – Checking Rows and Columns_**  
---

**_What is .shape?_**  
- `.shape` returns the dimensions of a DataFrame as `(rows, columns)`  
- It's an **attribute**, not a method, so no parentheses are needed  

**_Why use .shape?_**  
- Know dataset size  
- Verify data after filtering  
- Check if operations changed dimensions  

**_Example:_**  
```python
df.shape        # returns (27555, 10) meaning 27555 rows and 10 columns
```


In [8]:
df.shape

(27555, 10)

In [10]:
# Extracting individual values
num_rows = df.shape[0]  # First value is rows
num_rows

27555

In [11]:
num_columns = df.shape[1]  # Second value is columns
num_columns

10

In [12]:
print(f"Total data points: {num_rows * num_columns}")
print()

Total data points: 275550
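
One use listed above is verifying data after filtering. A minimal sketch of that check, on a made-up four-row DataFrame, could look like this. It also shows `.size`, which gives the rows × columns product directly.

```python
import pandas as pd

small = pd.DataFrame({
    "product": ["Soap", "Tea", "Oil", "Chips"],
    "sale_price": [162.0, 396.0, 875.0, 200.0],
})

rows_before, cols_before = small.shape        # unpack the (rows, columns) tuple
expensive = small[small["sale_price"] > 300]  # keep rows priced above 300
rows_after, cols_after = expensive.shape

print(rows_before, cols_before)  # 4 2
print(rows_after, cols_after)    # 2 2: fewer rows, same columns
print(small.size)                # 8: total data points (rows * columns)
```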



**_2.3 .info() – Data Summary and Types_**  
---

**_What is .info()?_**  
`.info()` provides a concise summary of the DataFrame including:  
- Number of rows and columns  
- Column names  
- Number of non-null values in each column  
- Data type of each column  
- Memory usage  

**_Why use .info()?_**  
- Get overall data structure  
- Identify missing values quickly  
- Check data types  
- Estimate memory usage  

**_Example:_**  
```python
df.info()
```

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27555 entries, 0 to 27554
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   index         27555 non-null  int64  
 1   product       27554 non-null  object 
 2   category      27555 non-null  object 
 3   sub_category  27555 non-null  object 
 4   brand         27554 non-null  object 
 5   sale_price    27555 non-null  float64
 6   market_price  27555 non-null  float64
 7   type          27555 non-null  object 
 8   rating        18929 non-null  float64
 9   description   27440 non-null  object 
dtypes: float64(3), int64(1), object(6)
memory usage: 2.1+ MB


<class 'pandas.core.frame.DataFrame'>  
RangeIndex: **27555 entries, 0 to 27554**  
Data columns (total **10 columns**):  

| # | Column        | Non-Null Count | Dtype   | Missing Values |
|---|---------------|----------------|---------|----------------|
| 0 | index         | 27555          | int64   | None |
| 1 | product       | 27554          | object  | 1 missing |
| 2 | category      | 27555          | object  | None |
| 3 | sub_category  | 27555          | object  | None |
| 4 | brand         | 27554          | object  | 1 missing |
| 5 | sale_price    | 27555          | float64 | None |
| 6 | market_price  | 27555          | float64 | None |
| 7 | type          | 27555          | object  | None |
| 8 | rating        | 18929          | float64 | 6626 missing |
| 9 | description   | 27440          | object  | 115 missing |

**dtypes:** float64(3), int64(1), object(6)  
**memory usage:** ~2.1 MB  

---

### 🔎 Explanation of Missing Values
- **product:** 1 missing → likely a data entry error.  
- **brand:** 1 missing → could be unbranded or incomplete metadata.  
- **rating:** 6,626 missing (~24%) → many products not rated.  
- **description:** 115 missing → some products lack textual descriptions.  

---

###  Summary
- Dataset has **27,555 rows × 10 columns**.  
- Most columns are complete.  
- **Largest gap:** `rating` (~24% missing).  
- **Minor gaps:** `product`, `brand`, `description`.  

**Handling suggestions:**  
- Drop rows for critical fields (`product`).  
- Fill `description` with placeholders like `"No description"`.  
- Treat missing `rating` as `"Not rated"` instead of an error.  
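
The handling suggestions above might be sketched as follows. The three-row DataFrame is made up, and the `is_rated` flag column is a hypothetical choice for this example: it records "not rated" without writing text into the numeric `rating` column.

```python
import numpy as np
import pandas as pd

small = pd.DataFrame({
    "product": ["Soap", None, "Tea"],
    "rating": [4.4, 3.9, np.nan],
    "description": ["Gentle soap", "Unknown item", None],
})

# Count missing values per column.
missing = small.isnull().sum()
print(missing)

# Drop rows where the critical 'product' field is missing.
cleaned = small.dropna(subset=["product"]).copy()

# Fill missing descriptions with a placeholder.
cleaned["description"] = cleaned["description"].fillna("No description")

# Hypothetical flag column: marks unrated products, keeps 'rating' numeric.
cleaned["is_rated"] = cleaned["rating"].notnull()

print(cleaned)
```

Keeping `rating` numeric means `.describe()` and comparisons such as `rating > 4` continue to work, while `is_rated` still separates unrated products from rated ones.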


**_2.4 .describe() – Statistical Summary_**  
---

**_What is .describe()?_**  
`.describe()` generates a statistical summary for numerical columns in a DataFrame. It provides:  
- **count:** Number of non-null values  
- **mean:** Average value  
- **std:** Standard deviation (spread of values)  
- **min:** Minimum value  
- **25%, 50%, 75%:** Quartiles (percentiles)  
- **max:** Maximum value  

**_Why use .describe()?_**  
- Understand data distribution  
- Identify outliers  
- Get quick statistics  
- Compare columns  

---

###  Statistical Summary Output

| Metric | index       | sale_price   | market_price | rating    |
|--------|-------------|--------------|--------------|-----------|
| count  | 27555.00000 | 27555.000000 | 27555.000000 | 18929.000000 |
| mean   | 13778.00000 | 322.514808   | 382.056664   | 3.943410 |
| std    | 7954.58767  | 486.263116   | 581.730717   | 0.739063 |
| min    | 1.00000     | 2.450000     | 3.000000     | 1.000000 |
| 25%    | 6889.50000  | 95.000000    | 100.000000   | 3.700000 |
| 50%    | 13778.00000 | 190.000000   | 220.000000   | 4.100000 |
| 75%    | 20666.50000 | 359.000000   | 425.000000   | 4.300000 |
| max    | 27555.00000 | 12500.000000 | 12500.000000 | 5.000000 |

---

###  Explanation
- **Index:** Ranges from 1 to 27,555. This is the dataset's own 1-based `index` column, and its maximum matches the row count.  
- **Sale Price:** Average ≈ 322, with a wide spread (std ≈ 486). The minimum is very low (2.45) and the maximum is 12,500, far above the 75th percentile of 359, which indicates possible outliers.  
- **Market Price:** Average ≈ 382, higher than the average sale price, suggesting discounts. The maximum is also 12,500.  
- **Rating:** Average ≈ 3.94 (close to 4 stars). The range is 1–5, with the middle 50% of values between 3.7 and 4.3.  

This summary helps quickly **spot anomalies, understand central tendencies, and compare distributions**.

In [15]:
df.describe()

Unnamed: 0,index,sale_price,market_price,rating
count,27555.0,27555.0,27555.0,18929.0
mean,13778.0,322.514808,382.056664,3.94341
std,7954.58767,486.263116,581.730717,0.739063
min,1.0,2.45,3.0,1.0
25%,6889.5,95.0,100.0,3.7
50%,13778.0,190.0,220.0,4.1
75%,20666.5,359.0,425.0,4.3
max,27555.0,12500.0,12500.0,5.0


In [18]:
# For specific columns
df[['sale_price', 'rating']].describe()

Unnamed: 0,sale_price,rating
count,27555.0,18929.0
mean,322.514808,3.94341
std,486.263116,0.739063
min,2.45,1.0
25%,95.0,3.7
50%,190.0,4.1
75%,359.0,4.3
max,12500.0,5.0
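
The quartiles from `.describe()` can feed a simple outlier check. The sketch below applies the common 1.5 × IQR rule of thumb to made-up prices. That rule is an assumption chosen for illustration, not something this notebook prescribes.

```python
import pandas as pd

prices = pd.DataFrame({"sale_price": [95.0, 120.0, 190.0, 220.0, 359.0, 12500.0]})

# describe() on one column returns a Series indexed by 'count', 'mean', '25%', ...
summary = prices["sale_price"].describe()

q1, q3 = summary["25%"], summary["75%"]
iqr = q3 - q1                 # interquartile range
upper_fence = q3 + 1.5 * iqr  # rule-of-thumb threshold (an assumption here)

# Rows priced above the fence are candidates for a closer look.
outliers = prices[prices["sale_price"] > upper_fence]
print(outliers)
```

Rows flagged this way are not necessarily errors. A very expensive product can be perfectly valid, so the check is a prompt to inspect, not to delete.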


In [21]:
# To include non-numerical columns

print(df.describe(include='all'))
print()

              index                       product          category  \
count   27555.00000                         27554             27555   
unique          NaN                         23540                11   
top             NaN  Turmeric Powder/Arisina Pudi  Beauty & Hygiene   
freq            NaN                            26              7867   
mean    13778.00000                           NaN               NaN   
std      7954.58767                           NaN               NaN   
min         1.00000                           NaN               NaN   
25%      6889.50000                           NaN               NaN   
50%     13778.00000                           NaN               NaN   
75%     20666.50000                           NaN               NaN   
max     27555.00000                           NaN               NaN   

       sub_category   brand    sale_price  market_price       type  \
count         27555   27554  27555.000000  27555.000000      27555   
unique 

**_2.5 .columns and .index – Viewing Labels_**  
---

**_.columns:_**  
- `.columns` returns the column names (headers) of the DataFrame.  
- It's an **attribute**, not a method (no parentheses needed).  

**_Why use .columns?_**  
- See all column names  
- Check spelling of columns  
- Use in loops or operations  

**_Examples:_**  
```python
df.columns          # shows column names
df.columns.tolist() # converts column names into a Python list
```


In [22]:
df.columns

Index(['index', 'product', 'category', 'sub_category', 'brand', 'sale_price',
       'market_price', 'type', 'rating', 'description'],
      dtype='object')

In [23]:
df.columns.tolist()

['index',
 'product',
 'category',
 'sub_category',
 'brand',
 'sale_price',
 'market_price',
 'type',
 'rating',
 'description']
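
As a small illustration of checking spelling and using `.columns` in loops, the sketch below works on a made-up one-row DataFrame.

```python
import pandas as pd

small = pd.DataFrame({"product": ["Soap"], "brand": ["Nivea"], "sale_price": [162.0]})

# A membership test catches a misspelled column name before it raises a KeyError.
print("sale_price" in small.columns)  # True
print("sale price" in small.columns)  # False

# Loop over the column labels.
for name in small.columns:
    print(name, small[name].dtype)

# Collect only the numeric column names.
numeric_cols = small.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['sale_price']
```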

**_Understanding `.index` in Pandas_**  
---

**_What is `.index`?_**  
- `.index` is an **attribute** of a DataFrame that returns the row labels.  
- By default, Pandas assigns a **RangeIndex** starting from 0 up to the number of rows − 1.  
- You can also set a custom index (e.g., product IDs, dates, or unique identifiers).  

---

**_Why use `.index`?_**  
- To view row identifiers.  
- To check if a custom index is set.  
- To verify dataset size after filtering or transformations.  

---

**_Examples:_**  
```python
df.index
# Output: RangeIndex(start=0, stop=27555, step=1)

len(df.index)
# Output: 27555  → total number of rows
```


In [24]:
df.index

RangeIndex(start=0, stop=27555, step=1)

In [25]:
len(df.index)

27555
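
Setting a custom index, as mentioned above, can be sketched with `set_index()` and undone with `reset_index()`. The three-row DataFrame is made up, and its `index` column mimics the 1-based ID column in the BigBasket data.

```python
import pandas as pd

small = pd.DataFrame({
    "index": [1, 2, 3],
    "product": ["Soap", "Tea", "Oil"],
})

print(small.index)  # RangeIndex(start=0, stop=3, step=1)

# set_index() returns a new DataFrame; 'small' itself is unchanged.
by_id = small.set_index("index")
print(by_id.index.tolist())     # [1, 2, 3]
print(by_id.loc[2, "product"])  # Tea: looked up by the custom label 2

# reset_index() turns the index back into an ordinary column.
restored = by_id.reset_index()
print(restored.columns.tolist())  # ['index', 'product']
```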

**_2.6 .dtypes – Checking Data Types_**  
---

**_What is .dtypes?_**  
- `.dtypes` shows the **data type of each column** in a DataFrame.  
- It's an **attribute**, not a method (no parentheses needed).  

**_Common Data Types in Pandas:_**  
- **object:** Text/string data  
- **int64:** Integer numbers  
- **float64:** Decimal numbers  
- **bool:** True/False values  
- **datetime64:** Dates and times  

---

**_Why use .dtypes?_**  
- Verify correct data types  
- Identify type conversion needs (e.g., converting strings to numbers or dates)  
- Understand memory usage and optimize performance  

---

**_Examples:_**  
```python
# Show data types of each column
df.dtypes

# Output:
# index             int64
# product          object
# category         object
# sub_category     object
# brand            object
# sale_price      float64
# market_price    float64
# type             object
# rating          float64
# description      object
# dtype: object

# Count how many columns belong to each data type
df.dtypes.value_counts()

# Output:
# object     6
# float64    3
# int64      1
# Name: count, dtype: int64
```


In [27]:
df.dtypes

index             int64
product          object
category         object
sub_category     object
brand            object
sale_price      float64
market_price    float64
type             object
rating          float64
description      object
dtype: object

In [31]:
df.dtypes.value_counts()

object     6
float64    3
int64      1
Name: count, dtype: int64
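
When `.dtypes` shows `object` for a column that should hold numbers or dates, a conversion is needed. A minimal sketch follows, assuming a made-up DataFrame with prices stored as text and a hypothetical `listed_on` date column (the BigBasket data has no such column).

```python
import pandas as pd

raw = pd.DataFrame({
    "sale_price": ["220", "180.5", "n/a"],                    # numbers stored as text
    "listed_on": ["2021-01-05", "2021-02-10", "2021-03-15"],  # hypothetical date column
})

# errors="coerce" turns unparseable text such as 'n/a' into NaN instead of raising.
raw["sale_price"] = pd.to_numeric(raw["sale_price"], errors="coerce")

# Convert date strings to a datetime64 column.
raw["listed_on"] = pd.to_datetime(raw["listed_on"])

print(raw.dtypes)
print(raw["sale_price"].isna().sum())  # 1: the 'n/a' entry
```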

**_SECTION 3: BASIC DATA MANIPULATION (EDA)_**  
---

**EDA = Exploratory Data Analysis**  
This is the process of understanding your data through various operations.  

**_3.1 Selecting Columns_**  

**Method 1: Select single column with `df['column_name']`**  
- Returns a **Series** (single column).  

```python
df['product']
type(df['product'])   # <class 'pandas.core.series.Series'>
```


In [32]:
df['product']

0                   Garlic Oil - Vegetarian Capsule 500 mg
1                                    Water Bottle - Orange
2                           Brass Angle Deep - Plain, No.2
3        Cereal Flip Lid Container/Storage Jar - Assort...
4                       Creme Soft Soap - For Hands & Body
                               ...                        
27550           Wottagirl! Perfume Spray - Heaven, Classic
27551                                             Rosemary
27552                         Peri-Peri Sweet Potato Chips
27553                            Green Tea - Pure Original
27554                       United Dreams Go Far Deodorant
Name: product, Length: 27555, dtype: object

In [33]:
type(df['product'])

pandas.core.series.Series

**Method 2: Select multiple columns with `df[['col1', 'col2']]`**  
- Returns a **DataFrame** (multiple columns).  
- Notice the double square brackets `[[ ]]`.  

In [34]:
df[['product', 'brand', 'sale_price']]

Unnamed: 0,product,brand,sale_price
0,Garlic Oil - Vegetarian Capsule 500 mg,Sri Sri Ayurveda,220.00
1,Water Bottle - Orange,Mastercook,180.00
2,"Brass Angle Deep - Plain, No.2",Trm,119.00
3,Cereal Flip Lid Container/Storage Jar - Assort...,Nakoda,149.00
4,Creme Soft Soap - For Hands & Body,Nivea,162.00
...,...,...,...
27550,"Wottagirl! Perfume Spray - Heaven, Classic",Layerr,199.20
27551,Rosemary,Puramate,67.50
27552,Peri-Peri Sweet Potato Chips,FabBox,200.00
27553,Green Tea - Pure Original,Tetley,396.00


In [36]:
# Practical Example – Selecting columns for analysis:
selected_columns = df[['product', 'brand', 'sale_price']]
selected_columns.head()

Unnamed: 0,product,brand,sale_price
0,Garlic Oil - Vegetarian Capsule 500 mg,Sri Sri Ayurveda,220.0
1,Water Bottle - Orange,Mastercook,180.0
2,"Brass Angle Deep - Plain, No.2",Trm,119.0
3,Cereal Flip Lid Container/Storage Jar - Assort...,Nakoda,149.0
4,Creme Soft Soap - For Hands & Body,Nivea,162.0
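
The bracket style decides the return type even when only one column is selected. A short sketch on a made-up DataFrame:

```python
import pandas as pd

small = pd.DataFrame({"product": ["Soap", "Tea"], "brand": ["Nivea", "Tetley"]})

as_series = small["product"]   # single brackets -> Series
as_frame = small[["product"]]  # double brackets -> one-column DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (2,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)
```

This matters when later code expects a DataFrame: the double-bracket form keeps the two-dimensional shape.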


**_3.2 Selecting Rows with .loc[] and .iloc[]_**  
---

**_.loc[]:_**  
- `.loc[]` selects rows by **LABEL** (index name).  
- **Syntax:** `df.loc[row_label, column_label]`  

**_.iloc[]:_**  
- `.iloc[]` selects rows by **INTEGER POSITION** (like list indexing).  
- **Syntax:** `df.iloc[row_position, column_position]`  

**_Key Difference:_**  
- `.loc[]` → uses actual **index labels**  
- `.iloc[]` → uses numeric **positions** (0, 1, 2, …)  

**_Examples:_**  
```python
# Using .loc[] (label-based)
df.loc[5]                 # Select row with index label 5
df.loc[5, 'product']      # Select 'product' column from row with index label 5

# Using .iloc[] (position-based)
df.iloc[0]                # Select the first row (position 0)
df.iloc[0, 1]             # Select value at first row, second column
```

In [40]:
df.head(6)

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220.0,220.0,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...
1,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180.0,180.0,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ..."
2,3,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119.0,250.0,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your m..."
3,4,Cereal Flip Lid Container/Storage Jar - Assort...,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149.0,176.0,"Laundry, Storage Baskets",3.7,Multipurpose container with an attractive desi...
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.0,162.0,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...
5,6,Germ - Removal Multipurpose Wipes,Cleaning & Household,All Purpose Cleaners,Nature Protect,169.0,199.0,Disinfectant Spray & Cleaners,3.3,Stay protected from contamination with Multipu...


In [41]:
# Using .loc[] (label-based)
df.loc[5]

index                                                           6
product                         Germ - Removal Multipurpose Wipes
category                                     Cleaning & Household
sub_category                                 All Purpose Cleaners
brand                                              Nature Protect
sale_price                                                  169.0
market_price                                                199.0
type                                Disinfectant Spray & Cleaners
rating                                                        3.3
description     Stay protected from contamination with Multipu...
Name: 5, dtype: object

In [38]:
df.loc[5, 'product']

'Germ - Removal Multipurpose Wipes'

In [46]:
# Using .iloc[] (position-based)
df.iloc[0]

index                                                           1
product                    Garlic Oil - Vegetarian Capsule 500 mg
category                                         Beauty & Hygiene
sub_category                                            Hair Care
brand                                           Sri Sri Ayurveda 
sale_price                                                  220.0
market_price                                                220.0
type                                             Hair Oil & Serum
rating                                                        4.1
description     This Product contains Garlic Oil that is known...
Name: 0, dtype: object

In [47]:
df.iloc[0, 1] # Select value at first row, second column

'Garlic Oil - Vegetarian Capsule 500 mg'
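
With the default index, labels and positions happen to coincide, so `.loc[]` and `.iloc[]` look interchangeable. The difference shows up once rows have been filtered out, as in this sketch on a made-up four-row DataFrame.

```python
import pandas as pd

small = pd.DataFrame({
    "product": ["Soap", "Tea", "Oil", "Chips"],
    "sale_price": [162.0, 396.0, 875.0, 200.0],
})

# Filtering keeps the original labels: here rows labelled 1 and 2 remain.
expensive = small[small["sale_price"] > 300]
print(expensive.index.tolist())  # [1, 2]

print(expensive.iloc[0]["product"])  # Tea: the first row by position
print(expensive.loc[1, "product"])   # Tea: the row whose label is 1
# expensive.loc[0] would raise a KeyError, because label 0 was filtered out.

# Slices differ too: .loc includes the end label, .iloc excludes the end position.
print(len(small.loc[0:2]))   # 3
print(len(small.iloc[0:2]))  # 2
```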

**_3.3 Filtering Rows with Conditions_**  
---

**_What is filtering?_**  
- Filtering selects rows that meet certain conditions.  
- **Syntax:** `df[condition]`  

**_Why use filtering?_**  
- Find specific records  
- Analyze subsets of data  
- Remove unwanted data  

**_Examples:_**  

In [48]:
# Filter rows where sale_price is greater than 500
df[df['sale_price'] > 500]

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
8,9,Biotin & Collagen Volumizing Hair Shampoo + Bi...,Beauty & Hygiene,Hair Care,StBotanica,1098.00,1098.0,Shampoo & Conditioner,3.5,"An exclusive blend with Vitamin B7 Biotin, Hyd..."
11,12,Butter Cookies Gold Collection,Gourmet & World Food,Chocolates & Biscuits,Sapphire,600.00,600.0,"Luxury Chocolates, Gifts",2.2,Enjoy a tin full of delicious butter cookies m...
20,21,Ceramic Barrel Brush - Colour May Vary,Beauty & Hygiene,Hair Care,Bronson Professional,525.00,700.0,Tools & Accessories,4.2,This Ceramic Barrel Brush by Bronson Professio...
25,26,Insulated Hot Fresh Casserole For Roti/Chapati...,"Kitchen, Garden & Pets",Flask & Casserole,Cello,659.00,895.0,Casserole,3.3,Stop your worries about keeping your food warm...
47,48,Colour Catcher Sheets,Cleaning & Household,All Purpose Cleaners,Dylon,799.00,799.0,Imported Cleaners,4.0,1. Prevents Colour Run Accidents Colours that ...
...,...,...,...,...,...,...,...,...,...,...
27505,27506,Virgin Coconut Oil,"Foodgrains, Oil & Masala",Edible Oils & Ghee,Merkera,875.00,875.0,Other Edible Oils,,"Merkera Extra Virgin Coconut Oil 100% natural,..."
27514,27515,Verge & Sheer Perfume For Pair,Beauty & Hygiene,Fragrances & Deos,Skinn by Titan,1615.50,1795.0,Perfume,,VERGE for men paints a picture of a classy out...
27515,27516,EDT Spray - Musk For Men,Beauty & Hygiene,Fragrances & Deos,Brut,595.00,595.0,Men's Deodorants,5.0,Brut Musk was launched in 1986 as an elegant m...
27538,27539,Quista Pro Advanced Whey Protein Formula forti...,Beauty & Hygiene,Health & Medicine,Himalaya,4500.00,4500.0,Supplements & Proteins,4.0,Quista Pro is a whey protein blend that helps ...


In [49]:
# Filter rows where brand is 'Nivea'
df[df['brand'] == 'Nivea']

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.00,162.0,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...
318,319,Milk Delights Face Wash With Honey For Dry Skin,Beauty & Hygiene,Skin Care,Nivea,86.40,90.0,Face Care,4.2,Presenting New Nivea Milk Delights Face Wash w...
412,413,Original Care Lip Balm For 24h Moisture With S...,Beauty & Hygiene,Skin Care,Nivea,114.80,140.0,Lip Care,4.3,Nivea Original Care Lip Blam nourishes lips in...
2538,2539,Extra Whitening Cell Repair Body Lotion - SPF ...,Beauty & Hygiene,Skin Care,Nivea,249.60,260.0,Body Care,3.6,The ultra-light formula of Nivea Express Hydra...
2582,2583,Whitening Smooth Skin Women Deodorant Roll On ...,Beauty & Hygiene,Fragrances & Deos,Nivea,139.30,199.0,Women's Deodorants,4.4,Care for your underarms with NIVEA Whitening S...
...,...,...,...,...,...,...,...,...,...,...
26911,26912,Body Wash - Frangipani & Oil Shower Gel,Beauty & Hygiene,Bath & Hand Wash,Nivea,339.15,399.0,Shower Gel & Body Wash,4.0,Give your skin refreshing care with Nivea fran...
26992,26993,Milk Delights Face Wash With Besan For Oily Skin,Beauty & Hygiene,Skin Care,Nivea,86.40,90.0,Face Care,4.1,Presenting New Nivea Milk Delights Face Wash h...
27080,27081,"Women Deodorant Roll On - Protect & Care, Non-...",Beauty & Hygiene,Men's Grooming,Nivea,181.30,185.0,Men's Deodorants,4.4,Nivea Protect & Care is the only roll on with ...
27153,27154,Cherry Shine Lip Balm - 24h Moisture With Natu...,Beauty & Hygiene,Makeup,Nivea,139.00,185.0,Lips,4.3,Nivea charming Cherry Shine lip care delights ...


In [50]:
# Filter rows where rating is above 4
df[df['rating'] > 4]

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220.00,220.0,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.00,162.0,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...
9,10,"Scrub Pad - Anti- Bacterial, Regular",Cleaning & Household,"Mops, Brushes & Scrubs",Scotch brite,20.00,20.0,"Utensil Scrub-Pad, Glove",4.3,Scotch Brite Anti- Bacterial Scrub Pad thoroug...
12,13,"Face Wash - Oil Control, Active",Beauty & Hygiene,Skin Care,Oxy,110.00,110.0,Face Care,5.0,This face wash deeply cleanses dirt and impuri...
14,15,Just Spray - Mosquito Repellent Room Spray,Cleaning & Household,Fresheners & Repellents,Herbal Strategi,200.00,200.0,Mosquito Repellent,4.2,Strategi Just Spray is a very effective 100% H...
...,...,...,...,...,...,...,...,...,...,...
27543,27544,Popcorn - French Butter & Pink Salt,Gourmet & World Food,"Snacks, Dry Fruits, Nuts",4700BC,31.50,35.0,Gourmet Popcorn,4.1,High-quality mushroom corn popped in olive oil...
27546,27547,Organic Powder - Garam Masala,"Foodgrains, Oil & Masala",Organic Staples,Organic Tattva,152.00,160.0,Organic Masalas & Spices,4.2,Organic Tattva Garam masala is a famous spice ...
27548,27549,Apple Cider Vinegar Shampoo,Beauty & Hygiene,Hair Care,Morpheme Remedies,499.00,499.0,Shampoo & Conditioner,5.0,"Say no to dull, lifeless, dry and damaged hair..."
27553,27554,Green Tea - Pure Original,Beverages,Tea,Tetley,396.00,495.0,Tea Bags,4.2,"Tetley Green Tea with its refreshing pure, ori..."


In [51]:
# Combine multiple conditions (AND)
df[(df['sale_price'] > 500) & (df['rating'] > 4)]

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
20,21,Ceramic Barrel Brush - Colour May Vary,Beauty & Hygiene,Hair Care,Bronson Professional,525.00,700.0,Tools & Accessories,4.2,This Ceramic Barrel Brush by Bronson Professio...
51,52,Peach Syrup,Gourmet & World Food,Drinks & Beverages,Pekers,850.00,850.0,Gourmet Juices & Drinks,4.2,Pekers peach syrup takes you on a historical t...
91,92,Hard Anodised Ezee-Pour Saucepan With Lid - L88,"Kitchen, Garden & Pets",Cookware & Non Stick,Hawkins Futura,864.50,910.0,Tawa & Sauce Pan,4.6,Futura Hard Anodised Saucepan comes with a spo...
219,220,Gentle Baby Wipes,Baby Care,Diapers & Wipes,Himalaya,699.00,699.0,Baby Wipes,4.1,Infused with the goodness of Aloe Vera and Ind...
253,254,Regenerist - Advanced Anti-Ageing Revitalising...,Beauty & Hygiene,Skin Care,Olay,1049.25,1399.0,Face Care,4.1,Olay Regenerist offers advanced appearance cor...
...,...,...,...,...,...,...,...,...,...,...
27414,27415,Pure Ghee,"Foodgrains, Oil & Masala",Edible Oils & Ghee,Amul,1000.00,1000.0,Ghee & Vanaspati,4.1,Clarified Butter or Pure Ghee is a staple in I...
27465,27466,Matic Top Load Detergent Combo - Washing Powder,Cleaning & Household,Detergents & Dishwash,Ariel,1100.00,1100.0,"Detergent Powder, Liquid",4.6,New & Improved Ariel Matic gives you tough sta...
27474,27475,Green Tea Bags - Detox Pack,Beverages,Tea,VAHDAM,799.00,799.0,Tea Bags,4.6,Indulge in a 100% natural detox and cleanse yo...
27515,27516,EDT Spray - Musk For Men,Beauty & Hygiene,Fragrances & Deos,Brut,595.00,595.0,Men's Deodorants,5.0,Brut Musk was launched in 1986 as an elegant m...


In [52]:
# Combine multiple conditions (OR)
df[(df['brand'] == 'Nivea') | (df['brand'] == 'Sri Sri Ayurveda')]

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.00,162.0,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...
318,319,Milk Delights Face Wash With Honey For Dry Skin,Beauty & Hygiene,Skin Care,Nivea,86.40,90.0,Face Care,4.2,Presenting New Nivea Milk Delights Face Wash w...
412,413,Original Care Lip Balm For 24h Moisture With S...,Beauty & Hygiene,Skin Care,Nivea,114.80,140.0,Lip Care,4.3,Nivea Original Care Lip Blam nourishes lips in...
2538,2539,Extra Whitening Cell Repair Body Lotion - SPF ...,Beauty & Hygiene,Skin Care,Nivea,249.60,260.0,Body Care,3.6,The ultra-light formula of Nivea Express Hydra...
2582,2583,Whitening Smooth Skin Women Deodorant Roll On ...,Beauty & Hygiene,Fragrances & Deos,Nivea,139.30,199.0,Women's Deodorants,4.4,Care for your underarms with NIVEA Whitening S...
...,...,...,...,...,...,...,...,...,...,...
26911,26912,Body Wash - Frangipani & Oil Shower Gel,Beauty & Hygiene,Bath & Hand Wash,Nivea,339.15,399.0,Shower Gel & Body Wash,4.0,Give your skin refreshing care with Nivea fran...
26992,26993,Milk Delights Face Wash With Besan For Oily Skin,Beauty & Hygiene,Skin Care,Nivea,86.40,90.0,Face Care,4.1,Presenting New Nivea Milk Delights Face Wash h...
27080,27081,"Women Deodorant Roll On - Protect & Care, Non-...",Beauty & Hygiene,Men's Grooming,Nivea,181.30,185.0,Men's Deodorants,4.4,Nivea Protect & Care is the only roll on with ...
27153,27154,Cherry Shine Lip Balm - 24h Moisture With Natu...,Beauty & Hygiene,Makeup,Nivea,139.00,185.0,Lips,4.3,Nivea charming Cherry Shine lip care delights ...
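
When the same column is compared against several values, `.isin()` is a compact alternative to chaining `|`. The sketch below uses a small made-up DataFrame (`toy`, not the BigBasket data) to show that both forms select the same rows.

```python
import pandas as pd

# Toy data standing in for the BigBasket DataFrame (values are made up).
toy = pd.DataFrame({
    "product": ["Soap", "Face Wash", "Garlic Oil", "Green Tea"],
    "brand": ["Nivea", "Nivea", "Sri Sri Ayurveda", "Tetley"],
})

brands = ["Nivea", "Sri Sri Ayurveda"]

# Chained OR conditions, one comparison per brand
with_or = toy[(toy["brand"] == "Nivea") | (toy["brand"] == "Sri Sri Ayurveda")]

# Membership test: easier to extend when the list of brands grows
with_isin = toy[toy["brand"].isin(brands)]

print(with_isin)
```

Adding a third brand only means appending to `brands`, rather than writing another `|` clause.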


**_3.4 Sorting Data with .sort_values()_**  
---

**_What is .sort_values()?_**  
- `.sort_values()` sorts the DataFrame by one or more columns.  
- **Syntax:** `df.sort_values(by='column_name', ascending=True/False)`  

**_Why use .sort_values()?_**  
- Arrange data in order  
- Find top/bottom values  
- Better data presentation  

**_Parameters:_**  
- **by:** Column name(s) to sort by  
- **ascending:** `True` for low → high, `False` for high → low  
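
Two details are easy to miss here: `.sort_values()` returns a new DataFrame and leaves the original untouched, and "top/bottom values" usually means sorting and then taking the first few rows. A minimal sketch on made-up data (`toy` is an illustrative name):

```python
import pandas as pd

# Made-up prices, only to illustrate the behaviour
toy = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "sale_price": [250.0, 5.0, 1200.0, 80.0],
})

# Returns a sorted copy; toy itself keeps its original row order
cheapest_first = toy.sort_values(by="sale_price", ascending=True)

# Top 2 most expensive products: sort descending, then keep the first rows
top2 = toy.sort_values(by="sale_price", ascending=False).head(2)

print(top2)
```

To keep the sorted order, assign the result to a variable, as done with `cheapest_first`.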

---


In [53]:
# Sort by sale_price in ascending order
df.sort_values(by='sale_price', ascending=True)

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
26976,26977,Curry Leaves,Fruits & Vegetables,Herbs & Seasonings,Fresho,2.45,15.00,Indian & Exotic Herbs,,"With dark green and glossy appearance, curry l..."
21312,21313,Serum,Beauty & Hygiene,Hair Care,Livon,3.00,3.00,Hair Oil & Serum,2.5,"Instantly Softens and Smoothens Dry, Rough, Ta..."
14184,14185,"Tomato - Local, Organically Grown",Fruits & Vegetables,Organic Fruits & Vegetables,Fresho,5.00,6.25,Organic Vegetables,,Fresho brings to you an exquisite range of loc...
2761,2762,Orbit Sugar-Free Chewing Gum - Lemon & Lime,Snacks & Branded Foods,Chocolates & Candies,Wrigleys,5.00,5.00,Mints & Chewing Gum,4.2,"With Orbit Sugarfree Chewing Gums, there's no ..."
17943,17944,Fulltoss Tangy Tomato,Gourmet & World Food,"Snacks, Dry Fruits, Nuts",Parle,5.00,5.00,Nachos & Chips,4.2,A snacking sensation which brings a little fla...
...,...,...,...,...,...,...,...,...,...,...
2781,2782,Extra Virgin Olive Oil,Gourmet & World Food,Oils & Vinegar,Abbies,7299.00,7299.00,Extra Virgin Olive Oil,,Suitable to cook Indian meals due to its neutr...
23082,23083,"Gas Stove-4 Burner Royale Plus Schott Glass, B...","Kitchen, Garden & Pets",Cookware & Non Stick,Prestige,7999.00,12245.00,Gas Stove,,Prestige Royale Plus Gas Stove. Add a Touch of...
12669,12670,Epilator SE9-9961 Legs-Body-Face,Beauty & Hygiene,Feminine Hygiene,Braun,8184.44,10769.00,Hair Removal,,This cordless epilator has a sonic exfoliation...
21761,21762,Pet Food - N&D Team Breeder Puppy Top Farmina,"Kitchen, Garden & Pets",Pet Food & Accessories,Farmina,10090.00,10090.00,Pet Meals & Treats,,Dog Food Adult Health Nutritional Dog Food.


In [54]:
# Sort by sale_price in descending order
df.sort_values(by='sale_price', ascending=False)

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
25301,25302,Bravura Clipper,"Kitchen, Garden & Pets",Pet Food & Accessories,Wahl,12500.00,12500.00,Pet Cleaning & Grooming,,The bravura clipper is a must-have clipper for...
21761,21762,Pet Food - N&D Team Breeder Puppy Top Farmina,"Kitchen, Garden & Pets",Pet Food & Accessories,Farmina,10090.00,10090.00,Pet Meals & Treats,,Dog Food Adult Health Nutritional Dog Food.
12669,12670,Epilator SE9-9961 Legs-Body-Face,Beauty & Hygiene,Feminine Hygiene,Braun,8184.44,10769.00,Hair Removal,,This cordless epilator has a sonic exfoliation...
23082,23083,"Gas Stove-4 Burner Royale Plus Schott Glass, B...","Kitchen, Garden & Pets",Cookware & Non Stick,Prestige,7999.00,12245.00,Gas Stove,,Prestige Royale Plus Gas Stove. Add a Touch of...
2781,2782,Extra Virgin Olive Oil,Gourmet & World Food,Oils & Vinegar,Abbies,7299.00,7299.00,Extra Virgin Olive Oil,,Suitable to cook Indian meals due to its neutr...
...,...,...,...,...,...,...,...,...,...,...
21228,21229,Dish Shine Bar,Cleaning & Household,Detergents & Dishwash,Exo,5.00,5.00,Dishwash Bars & Powders,4.2,Exo Dish Shine Bar makes your vessels aromatic...
19538,19539,Layer Cake - Chocolate,"Bakery, Cakes & Dairy",Cakes & Pastries,Winkies,5.00,5.00,Tea Cakes & Slice Cakes,4.2,"Sugar, Weat Flour, Egg,edible, Liquid Glucose,..."
14184,14185,"Tomato - Local, Organically Grown",Fruits & Vegetables,Organic Fruits & Vegetables,Fresho,5.00,6.25,Organic Vegetables,,Fresho brings to you an exquisite range of loc...
21312,21313,Serum,Beauty & Hygiene,Hair Care,Livon,3.00,3.00,Hair Oil & Serum,2.5,"Instantly Softens and Smoothens Dry, Rough, Ta..."


In [55]:
# Sort by multiple columns: brand ascending, then sale_price descending within each brand
df.sort_values(by=['brand', 'sale_price'], ascending=[True, False])

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
10169,10170,PCOS Green Tea - Lavender & Chamomile,Beverages,Tea,&Me,330.0,330.0,Green Tea,4.8,"&Me PCOS Tea for Regular Periods, Weight Manag..."
13299,13300,PCOS Green Tea - Kashmiri Kahwa,Beverages,Tea,&Me,330.0,330.0,Green Tea,4.7,"PCOS Tea for Regular Periods, Weight Managemen..."
17900,17901,Skin Women's Health Drink - Watermelon & Rose,Beverages,"Health Drink, Supplement",&Me,85.0,85.0,Men & Women,3.8,&Me's skin drink restores the skin by nourishi...
19107,19108,Red Sangria,Beverages,Energy & Soft Drinks,&Stirred,480.0,480.0,Soda & Cocktail Mix,3.9,Make amazing cocktails and mocktails at home w...
1984,1985,Kamikaze Shots,Beverages,Energy & Soft Drinks,&Stirred,250.0,250.0,Soda & Cocktail Mix,4.1,Make amazing cocktails and mocktails at home w...
...,...,...,...,...,...,...,...,...,...,...
15076,15077,Malai Paneer Cubes,"Bakery, Cakes & Dairy",Dairy,sumeru,81.0,90.0,"Paneer, Tofu & Cream",3.9,This Malai Paneer is one of the delicious Indi...
11086,11087,Grated Coconut,Snacks & Branded Foods,Frozen Veggies & Snacks,sumeru,80.0,80.0,Frozen Veg Snacks,3.9,This Grated Coconut is a quick frozen to retai...
15380,15381,Filter Coffee Decoction,Beverages,Coffee,sumeru,55.0,55.0,Instant Coffee,3.6,Sumeru brings to you the authentic South India...
23439,23440,Masala French Fries - Piri Piri,Snacks & Branded Foods,Frozen Veggies & Snacks,sumeru,49.5,55.0,Frozen Veg Snacks,4.3,Sumeru Wassup Masala French Fries Piri Piri Cr...
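
The output above starts at `&Me` and ends at the lowercase brand `sumeru`. That is consistent with text being sorted case-sensitively, where uppercase letters come before lowercase ones. If a case-insensitive order is wanted, the `key` parameter of `.sort_values()` can lowercase the values before comparing. A small sketch with made-up brands:

```python
import pandas as pd

# Made-up brand list mixing upper- and lowercase names
toy = pd.DataFrame({"brand": ["sumeru", "Tetley", "Amul", "&Me"]})

# Default ordering is case-sensitive: lowercase "sumeru" lands after "Tetley"
default_order = toy.sort_values(by="brand")["brand"].tolist()

# key receives the column as a Series and must return a Series of the same length
ci_order = toy.sort_values(by="brand", key=lambda s: s.str.lower())["brand"].tolist()

print(default_order)
print(ci_order)
```

The `key` function is applied to every column listed in `by`, so this sketch sorts on the text column alone.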


**_3.5 Renaming Columns with .rename()_**  
---

**_What is .rename()?_**  
- `.rename()` changes column or index names in a DataFrame.  
- **Syntax:** `df.rename(columns={'old_name': 'new_name'})`  

**_Why use .rename()?_**  
- Make column names more readable  
- Fix spelling mistakes  
- Standardize naming convention  

**_Examples:_**  
---

In [60]:
# Rename a single column
df.rename(columns={'sale_price': 'discounted_price'}).head()

Unnamed: 0,index,product,category,sub_category,brand,discounted_price,market_price,type,rating,description
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220.0,220.0,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...
1,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180.0,180.0,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ..."
2,3,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119.0,250.0,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your m..."
3,4,Cereal Flip Lid Container/Storage Jar - Assort...,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149.0,176.0,"Laundry, Storage Baskets",3.7,Multipurpose container with an attractive desi...
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.0,162.0,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...


In [61]:
# Rename multiple columns
df.rename(columns={'market_price': 'original_price', 'rating': 'customer_rating'}).head()

Unnamed: 0,index,product,category,sub_category,brand,sale_price,original_price,type,customer_rating,description
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220.0,220.0,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...
1,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180.0,180.0,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ..."
2,3,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119.0,250.0,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your m..."
3,4,Cereal Flip Lid Container/Storage Jar - Assort...,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149.0,176.0,"Laundry, Storage Baskets",3.7,Multipurpose container with an attractive desi...
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.0,162.0,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...


In [62]:
# Rename index labels
df.rename(index={0: 'first_row', 1: 'second_row'}).head()

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
first_row,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220.0,220.0,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...
second_row,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180.0,180.0,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ..."
2,3,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119.0,250.0,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your m..."
3,4,Cereal Flip Lid Container/Storage Jar - Assort...,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149.0,176.0,"Laundry, Storage Baskets",3.7,Multipurpose container with an attractive desi...
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.0,162.0,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...
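
Note that none of the cells above change `df`: `.rename()` returns a renamed copy, which is why `sale_price` is back in the next output. A minimal sketch on made-up data showing this, plus renaming with a function instead of a dictionary:

```python
import pandas as pd

# Made-up frame with two of the column names used in this notebook
toy = pd.DataFrame({"sale_price": [220.0, 180.0], "market_price": [220.0, 250.0]})

# rename returns a new DataFrame; toy keeps its original column names
renamed = toy.rename(columns={"sale_price": "discounted_price"})

# A function can be passed instead of a dict; it is applied to every column name
upper = toy.rename(columns=str.upper)

print(list(toy.columns))
print(list(renamed.columns))
print(list(upper.columns))
```

To make a rename permanent, assign the result back, for example `toy = toy.rename(columns={...})`.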


**_3.6 Checking for Missing Values_**  
---

**_What are missing values?_**  
- Missing values are empty cells in your dataset.  
- In Pandas, they are represented as **NaN (Not a Number)** or **None**.  

**_Why check for missing values?_**  
- Understand overall **data quality**  
- Decide how to **handle missing entries** (drop, fill, impute)  
- Prevent **errors in analysis or modeling**  
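
As a quick illustration, the sketch below builds a tiny made-up DataFrame containing both `None` and `NaN` and shows that `.isnull()` flags both, while `.notnull()` is its exact opposite.

```python
import numpy as np
import pandas as pd

# Made-up data: one row is missing both its brand (None) and rating (NaN)
toy = pd.DataFrame({
    "brand": ["Nivea", None, "Amul"],
    "rating": [4.4, np.nan, 4.1],
})

mask = toy.isnull()      # True where a cell is missing
present = toy.notnull()  # True where a cell has a value

print(mask)
```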

---

**_Examples:_**  

In [None]:
# Check for missing values in the entire DataFrame
df.isnull().sum()

index              0
product            1
category           0
sub_category       0
brand              1
sale_price         0
market_price       0
type               0
rating          8626
description      115
dtype: int64


---

###  Column-wise Explanation
- **index → 0**  
  No missing values. Every row has an index.  

- **product → 1**  
  Only **1 product name** is missing → likely a data entry error.  

- **category → 0**  
  Complete. No missing values.  

- **sub_category → 0**  
  Complete. No missing values.  

- **brand → 1**  
  Only **1 brand** is missing → possibly unbranded or incomplete metadata.  

- **sale_price → 0**  
  Complete. Every product has a sale price.  

- **market_price → 0**  
  Complete. Every product has a market price.  

- **type → 0**  
  Complete. No missing values.  

- **rating → 8626**  
  **8,626 ratings are missing** → largest gap (~31% of dataset). Many products have not been rated.  

- **description → 115**  
  **115 descriptions are missing** → some products lack textual descriptions.  

---

###  Summary
- Most columns are **complete**.  
- **Minor gaps:** `product` (1), `brand` (1), `description` (115).  
- **Major gap:** `rating` (8,626 missing).  

---

###  Handling Suggestions
- Drop rows with missing **critical fields** (`product`, `brand`).  
- Fill `description` with placeholders like `"No description available"`.  
- Treat missing `rating` as `"Not rated"` or impute with average/median depending on analysis needs.  
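
The three suggestions above could be combined roughly as follows. This is only a sketch on a made-up four-row DataFrame. Using the median for `rating` is one of the options mentioned, not a recommendation for every analysis.

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the kinds of gaps found in the dataset
toy = pd.DataFrame({
    "product": ["Soap", None, "Tea", "Oil"],
    "brand": ["Nivea", "Amul", "Tetley", "Fresho"],
    "rating": [4.4, 4.1, np.nan, 3.0],
    "description": ["Gentle soap", "Ghee", None, "Olive oil"],
})

# 1. Drop rows that are missing a critical field
clean = toy.dropna(subset=["product", "brand"]).copy()

# 2. Replace missing descriptions with a placeholder
clean["description"] = clean["description"].fillna("No description available")

# 3. Impute missing ratings with the median of the remaining ratings
clean["rating"] = clean["rating"].fillna(clean["rating"].median())

print(clean)
```

The `subset` argument limits `.dropna()` to the listed columns, so rows that only lack a rating or description are kept.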


In [None]:
# Check if any value is missing (True/False)
df.isnull().any()

index           False
product          True
category        False
sub_category    False
brand            True
sale_price      False
market_price    False
type            False
rating           True
description      True
dtype: bool

In [65]:
# Count total missing values
df.isnull().sum().sum()

8743


#####  Step-by-step Explanation
1. **`df.isnull()`**  
   - Returns a DataFrame of the same shape with **True/False** values.  
   - `True` = cell is missing (NaN/None).  
   - `False` = cell has a valid value.  

2. **`.sum()` (first time)**  
   - Applied column-wise.  
   - Counts how many `True` values (missing entries) exist in each column.  
   - Example:  
     ```
     product        1
     brand          1
     rating      8626
     description  115
     ```

3. **`.sum()` (second time)**  
   - Adds up all the column totals.  
   - Gives the **total number of missing values across the entire DataFrame**.  
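
The same steps can be checked on a tiny made-up DataFrame. The sketch also shows a related trick: taking `.mean()` of the boolean mask gives the fraction missing per column, which is how a figure like "~31% of ratings missing" (8,626 out of 27,555) can be computed directly.

```python
import numpy as np
import pandas as pd

# Made-up data with 1 missing brand and 2 missing ratings
toy = pd.DataFrame({
    "brand": ["Nivea", None, "Amul", "Tetley"],
    "rating": [4.4, np.nan, np.nan, 4.2],
})

per_column = toy.isnull().sum()          # missing count per column
total = toy.isnull().sum().sum()         # one number for the whole frame
pct_missing = toy.isnull().mean() * 100  # True counts as 1, so mean = share missing

print(per_column)
print(total)
print(pct_missing)
```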

---

In [66]:
# Check missing values in a specific column
df['rating'].isnull().sum()

8626

In [67]:
# Find rows with missing values
df[df.isnull().any(axis=1)]

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
55,56,Soothing Cucumber Facial Scrub With Apricot Seeds,Beauty & Hygiene,Skin Care,TJORI,299.4,499.0,Face Care,,The soothing feel of cucumber meets the gentle...
59,60,Corporate Planner Diary With Premium PU Leathe...,Cleaning & Household,Stationery,Prozo Plus,399.0,399.0,"Notebooks, Files, Folders",,A5 Size (210x150mm) \n192 Pages Premium Natura...
65,66,Ayurvedic Anti-Tan Face Pack,Beauty & Hygiene,Skin Care,TJORI,269.4,449.0,Face Care,,A nourishing face pack that removes tan and br...
68,69,Organic Carom Seeds/Ajwain/Om Kalu,"Foodgrains, Oil & Masala",Masalas & Spices,Earthon,72.0,72.0,Whole Spices,,"Earthon's Ajwain is Best quality, organically ..."
69,70,"Padded Harness - 3/4 inch, Grey Colour","Kitchen, Garden & Pets",Pet Food & Accessories,Glenand,840.0,840.0,Pet Collars & Leashes,,These are soft padded harness for your active ...
...,...,...,...,...,...,...,...,...,...,...
27509,27510,Deluxe Crackers - Veg,Gourmet & World Food,Chocolates & Biscuits,Kerk,150.0,150.0,"Cookies, Biscotti, Wafer",,Kerk Biscuits-has been a household name synony...
27511,27512,Specialist Stain Remover Pen & Marker,Cleaning & Household,All Purpose Cleaners,365,449.0,449.0,Imported Cleaners,,Mightier than the pen. The pen may be mightier...
27514,27515,Verge & Sheer Perfume For Pair,Beauty & Hygiene,Fragrances & Deos,Skinn by Titan,1615.5,1795.0,Perfume,,VERGE for men paints a picture of a classy out...
27530,27531,Tick'et to Fleadom Dry Shampoo For Dogs,"Kitchen, Garden & Pets",Pet Food & Accessories,Captain Zack,99.0,99.0,Pet Cleaning & Grooming,,1) No Rinse Defence Against Ticks and Fleas: C...


**_3.7 Handling Missing Values: .dropna() and .fillna()_**  
---

**_Two Main Approaches:_**  
1. **Drop** → Remove rows or columns with missing values.  
2. **Fill** → Replace missing values with a specific value.  

---

**_.dropna():_**  
- Removes rows or columns containing NaN/None.  
- **Syntax:** `df.dropna(axis=0, how='any')`  

**Examples:**  


In [68]:
# Drop rows with any missing values
df.dropna()

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220.00,220.0,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...
1,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180.00,180.0,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ..."
2,3,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119.00,250.0,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your m..."
3,4,Cereal Flip Lid Container/Storage Jar - Assort...,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149.00,176.0,"Laundry, Storage Baskets",3.7,Multipurpose container with an attractive desi...
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.00,162.0,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...
...,...,...,...,...,...,...,...,...,...,...
27550,27551,"Wottagirl! Perfume Spray - Heaven, Classic",Beauty & Hygiene,Fragrances & Deos,Layerr,199.20,249.0,Perfume,3.9,Layerr brings you Wottagirl Classic fragrant b...
27551,27552,Rosemary,Gourmet & World Food,Cooking & Baking Needs,Puramate,67.50,75.0,"Herbs, Seasonings & Rubs",4.0,Puramate rosemary is enough to transform a dis...
27552,27553,Peri-Peri Sweet Potato Chips,Gourmet & World Food,"Snacks, Dry Fruits, Nuts",FabBox,200.00,200.0,Nachos & Chips,3.8,We have taken the richness of Sweet Potatoes (...
27553,27554,Green Tea - Pure Original,Beverages,Tea,Tetley,396.00,495.0,Tea Bags,4.2,"Tetley Green Tea with its refreshing pure, ori..."



- Originally, your DataFrame had **27,555 rows**.  
- After applying `df.dropna()`, only **18,840 rows remain**.  
- That means **8,715 rows were removed** because they had at least one missing value.  

---

###  Why this happened
- From earlier checks (`df.isnull().sum()`), you had missing values in:
  - `product` → 1 missing  
  - `brand` → 1 missing  
  - `rating` → 8,626 missing  
  - `description` → 115 missing  

- Any row with a missing value in **any of these columns** was dropped.  
- Most of the dropped rows are due to missing `rating` values.  

---

###  Summary
- `df.dropna()` reduced your dataset from **27,555 → 18,840 rows**.  
- It removed every row that had at least one missing value.  
- This is useful when you need a dataset with **no gaps at all**, but you may lose a lot of data (here, almost a third of the rows).  
- Alternative: use `.fillna()` to replace missing values instead of dropping them.  
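
Between dropping everything and dropping nothing, `.dropna()` also accepts `subset` (only look at certain columns) and `thresh` (keep rows with at least that many non-missing values). The sketch below compares the row counts on a made-up DataFrame; the column names mirror the dataset but the values are invented.

```python
import numpy as np
import pandas as pd

# Made-up data: row 0 is complete, row 2 is entirely empty
toy = pd.DataFrame({
    "product": ["Soap", "Tea", None, None],
    "rating": [4.4, np.nan, np.nan, 3.0],
    "description": ["a", "b", None, "d"],
})

rows_any = len(toy.dropna())                        # drop if ANY value is missing
rows_all = len(toy.dropna(how="all"))               # drop only fully empty rows
rows_subset = len(toy.dropna(subset=["product"]))   # only 'product' must be present
rows_thresh = len(toy.dropna(thresh=2))             # keep rows with >= 2 real values

print(rows_any, rows_all, rows_subset, rows_thresh)
```

Comparing `len()` before and after is a cheap way to see how much data each strategy would cost.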


In [69]:
# Drop rows only if ALL values are missing
df.dropna(how='all')

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220.00,220.0,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...
1,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180.00,180.0,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ..."
2,3,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119.00,250.0,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your m..."
3,4,Cereal Flip Lid Container/Storage Jar - Assort...,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149.00,176.0,"Laundry, Storage Baskets",3.7,Multipurpose container with an attractive desi...
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.00,162.0,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...
...,...,...,...,...,...,...,...,...,...,...
27550,27551,"Wottagirl! Perfume Spray - Heaven, Classic",Beauty & Hygiene,Fragrances & Deos,Layerr,199.20,249.0,Perfume,3.9,Layerr brings you Wottagirl Classic fragrant b...
27551,27552,Rosemary,Gourmet & World Food,Cooking & Baking Needs,Puramate,67.50,75.0,"Herbs, Seasonings & Rubs",4.0,Puramate rosemary is enough to transform a dis...
27552,27553,Peri-Peri Sweet Potato Chips,Gourmet & World Food,"Snacks, Dry Fruits, Nuts",FabBox,200.00,200.0,Nachos & Chips,3.8,We have taken the richness of Sweet Potatoes (...
27553,27554,Green Tea - Pure Original,Beverages,Tea,Tetley,396.00,495.0,Tea Bags,4.2,"Tetley Green Tea with its refreshing pure, ori..."


In [70]:
# Drop columns with missing values
df.dropna(axis=1)

Unnamed: 0,index,category,sub_category,sale_price,market_price,type
0,1,Beauty & Hygiene,Hair Care,220.00,220.0,Hair Oil & Serum
1,2,"Kitchen, Garden & Pets",Storage & Accessories,180.00,180.0,Water & Fridge Bottles
2,3,Cleaning & Household,Pooja Needs,119.00,250.0,Lamp & Lamp Oil
3,4,Cleaning & Household,Bins & Bathroom Ware,149.00,176.0,"Laundry, Storage Baskets"
4,5,Beauty & Hygiene,Bath & Hand Wash,162.00,162.0,Bathing Bars & Soaps
...,...,...,...,...,...,...
27550,27551,Beauty & Hygiene,Fragrances & Deos,199.20,249.0,Perfume
27551,27552,Gourmet & World Food,Cooking & Baking Needs,67.50,75.0,"Herbs, Seasonings & Rubs"
27552,27553,Gourmet & World Food,"Snacks, Dry Fruits, Nuts",200.00,200.0,Nachos & Chips
27553,27554,Beverages,Tea,396.00,495.0,Tea Bags



- Originally, your DataFrame had **10 columns**.  
- After applying `df.dropna(axis=1)`, only **6 columns remain**.  
- That means **4 columns were dropped** because they had at least one missing value.  

---

###  Why this happened
From earlier missing value check (`df.isnull().sum()`):  
- `product` → 1 missing  
- `brand` → 1 missing  
- `rating` → 8,626 missing  
- `description` → 115 missing  

 These 4 columns contained missing values, so they were **removed entirely**.  

---

###  Summary
- `df.dropna(axis=1)` dropped all columns with missing values.  
- You now have a DataFrame with **27,555 rows and 6 complete columns**.  
- This ensures no missing data remains, but you lose potentially important information.  
- Alternative: use `.fillna()` to **retain columns** and handle missing values instead of dropping them.  
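
Dropping a column because of a single missing cell (as happened to `product` and `brand`) is rarely what you want. One possible middle ground is to drop only columns whose share of missing values exceeds a cutoff. The sketch below uses made-up data, and the 50% cutoff is an arbitrary assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: 'product' is 20% missing, 'rating' is 80% missing
toy = pd.DataFrame({
    "product": ["Soap", "Tea", None, "Oil", "Ghee"],
    "rating": [np.nan, np.nan, np.nan, 4.0, np.nan],
    "sale_price": [162.0, 396.0, 5.0, 7299.0, 1000.0],
})

# Fraction of missing values in each column
missing_share = toy.isnull().mean()

# Keep columns that are at most 50% missing (the cutoff is an assumption)
kept = toy.loc[:, missing_share <= 0.5]

print(list(kept.columns))
```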


In [71]:
# Fill missing values with a constant
df.fillna(0)

Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220.00,220.0,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...
1,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180.00,180.0,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ..."
2,3,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119.00,250.0,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your m..."
3,4,Cereal Flip Lid Container/Storage Jar - Assort...,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149.00,176.0,"Laundry, Storage Baskets",3.7,Multipurpose container with an attractive desi...
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.00,162.0,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...
...,...,...,...,...,...,...,...,...,...,...
27550,27551,"Wottagirl! Perfume Spray - Heaven, Classic",Beauty & Hygiene,Fragrances & Deos,Layerr,199.20,249.0,Perfume,3.9,Layerr brings you Wottagirl Classic fragrant b...
27551,27552,Rosemary,Gourmet & World Food,Cooking & Baking Needs,Puramate,67.50,75.0,"Herbs, Seasonings & Rubs",4.0,Puramate rosemary is enough to transform a dis...
27552,27553,Peri-Peri Sweet Potato Chips,Gourmet & World Food,"Snacks, Dry Fruits, Nuts",FabBox,200.00,200.0,Nachos & Chips,3.8,We have taken the richness of Sweet Potatoes (...
27553,27554,Green Tea - Pure Original,Beverages,Tea,Tetley,396.00,495.0,Tea Bags,4.2,"Tetley Green Tea with its refreshing pure, ori..."
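
A single constant is applied to every column, so one value has to serve both text and numeric fields. Passing a dictionary to `.fillna()` sets a different replacement per column instead. A minimal sketch on made-up data (the replacement values `"Unknown"` and `0` are illustrative choices):

```python
import numpy as np
import pandas as pd

# Made-up data with one missing brand and one missing rating
toy = pd.DataFrame({
    "brand": ["Nivea", None],
    "rating": [4.4, np.nan],
})

# Keys are column names, values are the replacement for that column
per_column = toy.fillna({"brand": "Unknown", "rating": 0})

print(per_column)
```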


In [72]:
# Fill missing values in a specific column
df['rating'].fillna(df['rating'].mean())

0        4.1
1        2.3
2        3.4
3        3.7
4        4.4
        ... 
27550    3.9
27551    4.0
27552    3.8
27553    4.2
27554    4.5
Name: rating, Length: 27555, dtype: float64
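
The cell above only displays the filled column; `df['rating']` itself still contains the missing values until the result is assigned back. A small sketch on made-up ratings:

```python
import numpy as np
import pandas as pd

# Made-up ratings with one gap
toy = pd.DataFrame({"rating": [4.0, np.nan, 2.0]})

# mean() skips NaN by default, so this is the mean of 4.0 and 2.0
filled = toy["rating"].fillna(toy["rating"].mean())

# The original column has not changed yet
missing_before = int(toy["rating"].isnull().sum())

# Assign back to store the filled values in the DataFrame
toy["rating"] = filled
missing_after = int(toy["rating"].isnull().sum())

print(missing_before, missing_after)
```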

In [73]:
# Forward fill (propagate last valid value forward)
# fillna(method='ffill') is deprecated in recent pandas versions; .ffill() is the equivalent call
df.ffill()


Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220.00,220.0,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...
1,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180.00,180.0,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ..."
2,3,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119.00,250.0,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your m..."
3,4,Cereal Flip Lid Container/Storage Jar - Assort...,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149.00,176.0,"Laundry, Storage Baskets",3.7,Multipurpose container with an attractive desi...
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.00,162.0,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...
...,...,...,...,...,...,...,...,...,...,...
27550,27551,"Wottagirl! Perfume Spray - Heaven, Classic",Beauty & Hygiene,Fragrances & Deos,Layerr,199.20,249.0,Perfume,3.9,Layerr brings you Wottagirl Classic fragrant b...
27551,27552,Rosemary,Gourmet & World Food,Cooking & Baking Needs,Puramate,67.50,75.0,"Herbs, Seasonings & Rubs",4.0,Puramate rosemary is enough to transform a dis...
27552,27553,Peri-Peri Sweet Potato Chips,Gourmet & World Food,"Snacks, Dry Fruits, Nuts",FabBox,200.00,200.0,Nachos & Chips,3.8,We have taken the richness of Sweet Potatoes (...
27553,27554,Green Tea - Pure Original,Beverages,Tea,Tetley,396.00,495.0,Tea Bags,4.2,"Tetley Green Tea with its refreshing pure, ori..."


In [74]:
# Backward fill (propagate next valid value backward)
# fillna(method='bfill') is deprecated in recent pandas versions; .bfill() is the equivalent call
df.bfill()


Unnamed: 0,index,product,category,sub_category,brand,sale_price,market_price,type,rating,description
0,1,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220.00,220.0,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...
1,2,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180.00,180.0,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ..."
2,3,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119.00,250.0,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your m..."
3,4,Cereal Flip Lid Container/Storage Jar - Assort...,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149.00,176.0,"Laundry, Storage Baskets",3.7,Multipurpose container with an attractive desi...
4,5,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.00,162.0,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...
...,...,...,...,...,...,...,...,...,...,...
27550,27551,"Wottagirl! Perfume Spray - Heaven, Classic",Beauty & Hygiene,Fragrances & Deos,Layerr,199.20,249.0,Perfume,3.9,Layerr brings you Wottagirl Classic fragrant b...
27551,27552,Rosemary,Gourmet & World Food,Cooking & Baking Needs,Puramate,67.50,75.0,"Herbs, Seasonings & Rubs",4.0,Puramate rosemary is enough to transform a dis...
27552,27553,Peri-Peri Sweet Potato Chips,Gourmet & World Food,"Snacks, Dry Fruits, Nuts",FabBox,200.00,200.0,Nachos & Chips,3.8,We have taken the richness of Sweet Potatoes (...
27553,27554,Green Tea - Pure Original,Beverages,Tea,Tetley,396.00,495.0,Tea Bags,4.2,"Tetley Green Tea with its refreshing pure, ori..."
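
Because the truncated outputs above look identical, the effect of forward and backward fill is easier to see on a short made-up Series. `.ffill()` and `.bfill()` are the method forms of the two operations.

```python
import numpy as np
import pandas as pd

# Made-up values with gaps at the start, middle, and end
s = pd.Series([np.nan, 4.0, np.nan, np.nan, 2.0, np.nan])

forward = s.ffill()   # each gap takes the last value seen above it
backward = s.bfill()  # each gap takes the next value found below it

print(pd.DataFrame({"original": s, "ffill": forward, "bfill": backward}))
```

A leading gap stays empty after forward fill, and a trailing gap stays empty after backward fill, because there is no neighbour to copy from. Both methods copy values from adjacent rows, so they suit ordered data such as time series. In a product table like this one, a filled rating would come from an unrelated product.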


# Notebook Summary: Pandas Basics (Sections 3.2–3.7)

---

## **3.2 Selecting Rows with `.loc[]` and `.iloc[]`**
- **`.loc[]`** → Selects rows by **label** (index name).  
- **`.iloc[]`** → Selects rows by **integer position** (like list indexing).  
- **Key Difference:**  
  - `.loc[]` uses actual index labels.  
  - `.iloc[]` uses numeric positions (0, 1, 2, …).  
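
The difference only becomes visible when the index labels are not simply 0, 1, 2, and so on. The sketch below uses a made-up DataFrame with labels 10, 20, 30 to show both accessors, including how their slices differ.

```python
import pandas as pd

# Made-up frame whose index labels differ from the row positions
toy = pd.DataFrame({"product": ["Soap", "Tea", "Oil"]}, index=[10, 20, 30])

by_label = toy.loc[10, "product"]  # row with label 10
by_position = toy.iloc[0, 0]       # first row, first column

# Slicing: .loc includes the end label, .iloc excludes the end position
loc_slice = toy.loc[10:20]   # labels 10 and 20
iloc_slice = toy.iloc[0:1]   # position 0 only

print(by_label, by_position, len(loc_slice), len(iloc_slice))
```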

---

## **3.3 Filtering Rows with Conditions**
- Filtering selects rows that meet certain conditions.  
- **Syntax:** `df[condition]`  
- **Uses:** Find specific records, analyze subsets, remove unwanted data.  
- **Examples:**  
  - `df[df['sale_price'] > 500]`  
  - `df[df['brand'] == 'Nivea']`  
  - Combine conditions with `&` (AND) or `|` (OR).  
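
When combining conditions, each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` or `==` in Python. A minimal sketch on made-up data, which also shows `~` for negating a condition:

```python
import pandas as pd

# Made-up rows using column names from this notebook
toy = pd.DataFrame({
    "brand": ["Nivea", "Nivea", "Amul", "Tetley"],
    "sale_price": [162.0, 650.0, 1000.0, 396.0],
})

# AND: both conditions must hold for a row to be kept
pricey_nivea = toy[(toy["sale_price"] > 500) & (toy["brand"] == "Nivea")]

# NOT: ~ inverts a boolean mask
not_nivea = toy[~(toy["brand"] == "Nivea")]

print(pricey_nivea)
print(not_nivea)
```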

---

## **3.4 Sorting Data with `.sort_values()`**
- Sorts the DataFrame by one or more columns.  
- **Syntax:** `df.sort_values(by='column_name', ascending=True/False)`  
- **Parameters:**  
  - `by` → column(s) to sort by.  
  - `ascending` → True (low → high), False (high → low).  
- **Examples:**  
  - `df.sort_values(by='sale_price')`  
  - `df.sort_values(by=['brand', 'sale_price'], ascending=[True, False])`  

---

## **3.5 Renaming Columns with `.rename()`**
- Changes column or index names.  
- **Syntax:** `df.rename(columns={'old_name': 'new_name'})`  
- **Uses:** Make names readable, fix typos, standardize naming.  
- **Examples:**  
  - `df.rename(columns={'sale_price': 'discounted_price'})`  
  - `df.rename(index={0: 'first_row'})`  

---

## **3.6 Checking for Missing Values**
- Missing values are represented as **NaN/None**.  
- **Why check?** Understand data quality, prevent errors, decide handling.  
- **Examples:**  
  - `df.isnull().sum()` → count missing per column.  
  - `df.isnull().sum().sum()` → total missing values.  
  - `df[df.isnull().any(axis=1)]` → rows with missing values.  

---

## **3.7 Handling Missing Values: `.dropna()` and `.fillna()`**
- **Two approaches:**  
  1. **Drop** rows/columns with missing values → `df.dropna()`  
  2. **Fill** missing values with replacements → `df.fillna()`  

- **Examples:**  
  - `df.dropna()` → removes rows with missing values.  
  - `df.dropna(axis=1)` → removes columns with missing values.  
  - `df.fillna(0)` → fills missing with 0.  
  - `df.ffill()` → forward fill (carry last value forward); replaces the deprecated `df.fillna(method='ffill')`.  
  - `df.bfill()` → backward fill (carry next value backward); replaces the deprecated `df.fillna(method='bfill')`.  

---

#  Final Takeaways
- `.loc[]` vs `.iloc[]` → label vs position.  
- Filtering → select subsets with conditions.  
- Sorting → order data for clarity.  
- Renaming → improve readability.  
- Missing values → detect with `.isnull()`.  
- Handling missing → drop or fill depending on context.  

This notebook covered **essential Pandas operations** for **data selection, filtering, sorting, renaming, and cleaning**, which together form the foundation for effective data analysis.  
