# **Data Cleaning** 🧹

In [24]:
```python
import pandas as pd
```

### Create an initial DataFrame with **missing values**

In [25]:
```python
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, None], 'City': ['New York', 'Los Angeles', None]}
df = pd.DataFrame(data)

df.head()
```

|   | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25.0 | New York |
| 1 | Bob | 30.0 | Los Angeles |
| 2 | Charlie | NaN | None |
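The later cells cast `Age` back to `int64`. A quick check of the dtypes, sketched below on a fresh copy of the same data, shows why: pandas stores the missing `Age` as `NaN`, which is a float, so the whole column becomes `float64`.

```python
import pandas as pd

# Same data as the cell above
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, None], 'City': ['New York', 'Los Angeles', None]}
df = pd.DataFrame(data)

# The None in the numeric Age column is stored as NaN, forcing the column to float64
print(df.dtypes)
print(df['Age'].isna().sum())  # number of missing ages
```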


### **Puzzle 1**
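Identify missing values in a DataFrame. `isnull()` (alias `isna()`) returns a Boolean DataFrame of the same shape, with `True` wherever a value is missing (`NaN` or `None`).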

In [26]:
```python
df.isnull()
```

|   | Name | Age | City |
|---|---|---|---|
| 0 | False | False | False |
| 1 | False | False | False |
| 2 | False | True | True |
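A Boolean mask is most useful once it is aggregated. In this minimal sketch on the same data, summing the mask counts the missing values per column, and `any(axis=1)` selects the rows that have at least one gap. The names `missing_per_column` and `rows_with_missing` are illustrative.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, None], 'City': ['New York', 'Los Angeles', None]}
df = pd.DataFrame(data)

# True counts as 1 when summed, so this gives the number of gaps per column
missing_per_column = df.isnull().sum()

# any(axis=1) is True for a row if any of its columns is missing
rows_with_missing = df[df.isnull().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```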


### **Puzzle 2**
Remove missing values with `dropna()`. By default it drops a row or column if it contains **any** missing value:
* `axis=0` (the default) removes every **row** that contains a missing value
* `axis=1` removes every **column** that contains a missing value

In [27]:
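```python
# Drop every row that contains at least one missing value
df_dropped = df.dropna(axis=0)

# Age is float64 because of the NaN; with that row gone it can be cast back to int64
df_dropped = df_dropped.astype({'Age': 'int64'})

df_dropped
```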

|   | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
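`dropna()` has a few more options than `axis`. The sketch below, on a fresh copy of the same data (variable names are illustrative), contrasts `axis=1`, `subset=` and `how='all'`.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, None], 'City': ['New York', 'Los Angeles', None]}
df = pd.DataFrame(data)

# axis=1: drop every column with at least one gap; only Name survives
cols_dropped = df.dropna(axis=1)

# subset=: only look for gaps in the listed columns
rows_dropped_by_age = df.dropna(subset=['Age'])

# how='all': drop a row only if every value in it is missing (none here)
rows_all_missing = df.dropna(how='all')
```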


### **Puzzle 3** - Missing Value Imputation
Fill missing values with a specified value. `fillna(0)` applies the same value to every column, so the missing `City` also becomes `0`.

In [28]:
```python
# Replace every missing value, in every column, with 0
df_filled = df.fillna(0)

# No NaN remains, so Age can be cast from float64 to int64
df_filled = df_filled.astype({'Age': 'int64'})

df_filled
```

|   | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 0 | 0 |
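A single fill value rarely suits every column, so `fillna()` also accepts a dict that maps column names to fill values. In this sketch the column mean for `Age` and the placeholder `'Unknown'` for `City` are illustrative choices, not something the puzzle prescribes.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, None], 'City': ['New York', 'Los Angeles', None]}
df = pd.DataFrame(data)

# Each column gets its own replacement; mean() skips NaN, so it averages 25 and 30
df_filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})
print(df_filled)
```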


### **Puzzle 4**
Fill missing values using **forward** fill, i.e., with the **last** observed value in the same column.

In [29]:
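```python
# Forward fill: each gap takes the last valid value above it in the same column
df_ffill = df.ffill()

# No NaN remains, so Age can be cast from float64 to int64
df_ffill = df_ffill.astype({'Age': 'int64'})

df_ffill
```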

|   | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 30 | Los Angeles |
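In the notebook's data the only gaps are in the last row, so forward fill is the only direction that can help. The sketch below uses a hypothetical series with a gap in the middle to contrast `ffill()` with `bfill()`, which copies the **next** valid value backwards, and shows how `limit=` caps the number of consecutive gaps that get filled.

```python
import pandas as pd

# Hypothetical series with two consecutive gaps (not part of the notebook's data)
s = pd.Series([1.0, None, None, 4.0])

forward = s.ffill()   # -> 1.0, 1.0, 1.0, 4.0
backward = s.bfill()  # -> 1.0, 4.0, 4.0, 4.0

# Fill at most one consecutive gap; the second one stays NaN
forward_limited = s.ffill(limit=1)
```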


### **Puzzle 5**
Drop duplicate rows from a DataFrame. Every row of `df` is distinct, so the result matches the input.

In [30]:
```python
# Keep only the first occurrence of each fully identical row
df_no_duplicates = df.drop_duplicates()

df_no_duplicates
```

|   | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25.0 | New York |
| 1 | Bob | 30.0 | Los Angeles |
| 2 | Charlie | NaN | None |
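Because `df` has no repeated rows, the call above cannot show anything being removed. The sketch below uses a hypothetical `people` DataFrame that does contain a repeat, to show the default behaviour and the `subset=` and `keep=` parameters.

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 0 exactly; Bob appears twice with different ages
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Bob'],
    'Age': [25, 30, 25, 31],
})

# Default: a row is dropped only if it matches an earlier row in every column
exact = people.drop_duplicates()  # removes row 2 (Alice, 25)

# subset= compares only the listed columns; keep='last' keeps the final occurrence
by_name = people.drop_duplicates(subset=['Name'], keep='last')
```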


### **Puzzle 6**
Convert a column to a **categorical** type.

In [32]:
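```python
# The earlier df has no 'Category' column, so build a small frame for this puzzle
df_cat = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Category': ['A', 'B', 'A']})
df_cat['Category'] = df_cat['Category'].astype('category')

df_cat
```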

|   | Name | Category |
|---|---|---|
| 0 | Alice | A |
| 1 | Bob | B |
| 2 | Charlie | A |
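A categorical column stores each distinct label once and keeps an integer code per row, which can save memory when labels repeat. The sketch below rebuilds the same small frame (the name `df_cat` is illustrative) and inspects both parts through the `.cat` accessor.

```python
import pandas as pd

df_cat = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Category': ['A', 'B', 'A']})
df_cat['Category'] = df_cat['Category'].astype('category')

# The distinct labels, stored once
print(df_cat['Category'].cat.categories)

# One integer code per row, pointing into the categories above
print(df_cat['Category'].cat.codes.tolist())  # [0, 1, 0]
```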


### **Puzzle 7**
Replace values in a column based on a condition. Assigning through `.loc[condition, column]` modifies `df_ffill` in place.

In [36]:
```python
# Rows where Age == 25 get Age set to 26
df_ffill.loc[df_ffill['Age'] == 25, 'Age'] = 26

df_ffill
```

|   | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 26 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 30 | Los Angeles |
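Conditions can also be combined, and a two-way replacement can be written as a single expression. The sketch below works on a hypothetical `df_demo` holding the same values `df_ffill` had before this puzzle; the `'LA'` abbreviation, the `Age_group` column and its labels are illustrative additions.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 30],
    'City': ['New York', 'Los Angeles', 'Los Angeles'],
})

# Combine conditions with & (and) or | (or), wrapping each one in parentheses
df_demo.loc[(df_demo['Age'] >= 30) & (df_demo['City'] == 'Los Angeles'), 'City'] = 'LA'

# np.where takes the second argument where the condition holds, the third elsewhere
df_demo['Age_group'] = np.where(df_demo['Age'] >= 30, '30+', 'under 30')
```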


In [None]:
```python
# Feel free to code ...
```
