# Handling Missing Data in a Pandas DataFrame

This notebook demonstrates several techniques for handling missing data in a pandas DataFrame, including reindexing, setting individual values, dropping rows with missing data, filling missing values, and identifying missing entries.


In [1]:
import pandas as pd
import numpy as np

## Creating a DataFrame with Missing Data

We will create a DataFrame of student data with names, ages, courses, and GPAs. Two values are deliberately left missing: Ivy's age and Jude's GPA are entered as `None`, which pandas stores as `NaN` in numeric columns.

In [2]:
# Create a dictionary with student data
students_data = {
    'Name': ['Eric', 'Ivy', 'Jude'],
    'Age': [40, None, 10],
    'Course': ['Machine Learning', 'Project Management', 'Programming'],
    'GPA': [4, 3.85, None]
}

# Create a Pandas DataFrame from the dictionary
students_df = pd.DataFrame(students_data)

# Display the DataFrame
students_df

,Name,Age,Course,GPA
0,Eric,40.0,Machine Learning,4.0
1,Ivy,,Project Management,3.85
2,Jude,10.0,Programming,
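
The output above shows `40.0` rather than `40` because a numeric column that contains `None` is stored as floating point, with the gap represented as `NaN`. The following is a minimal sketch that checks this on the same data; the variable names `age_dtype` and `ivy_age_missing` are introduced here for illustration only.

```python
import pandas as pd

# Same student data as above, rebuilt so this cell runs on its own
students_df = pd.DataFrame({
    'Name': ['Eric', 'Ivy', 'Jude'],
    'Age': [40, None, 10],
    'Course': ['Machine Learning', 'Project Management', 'Programming'],
    'GPA': [4, 3.85, None],
})

# The None in the integer list forces the whole Age column to float64
age_dtype = students_df['Age'].dtype

# The missing entry is stored as NaN, which pd.isna() recognizes
ivy_age_missing = pd.isna(students_df.at[1, 'Age'])

print(age_dtype, ivy_age_missing)
```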


## Reindexing the DataFrame

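Reindexing with `reindex()` conforms a DataFrame to a new set of labels on a given axis: labels can be reordered, added, or dropped, and the result is a new DataFrame rather than a change in place. Labels with no matching data are filled with `NaN`. The cell below does not call `reindex()`; it fills Ivy's missing age directly by assigning to a single cell with `.at`.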

In [3]:
# Work on a copy so the original DataFrame stays unchanged
students_df2 = students_df.copy()

# Fill Ivy's missing age: .at sets the single cell at row label 1, column 'Age'
students_df2.at[1, 'Age'] = 37

# Display the updated DataFrame
students_df2

,Name,Age,Course,GPA
0,Eric,40.0,Machine Learning,4.0
1,Ivy,37.0,Project Management,3.85
2,Jude,10.0,Programming,
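
For comparison, here is a small sketch of an actual `reindex()` call on the same data. The label list `[0, 2, 3]` and the name `reindexed_df` are chosen for illustration and are not part of the original notebook.

```python
import pandas as pd

# Same student data as above, rebuilt so this cell runs on its own
students_df = pd.DataFrame({
    'Name': ['Eric', 'Ivy', 'Jude'],
    'Age': [40, None, 10],
    'Course': ['Machine Learning', 'Project Management', 'Programming'],
    'GPA': [4, 3.85, None],
})

# Keep row labels 0 and 2, drop label 1, and add a new label 3.
# reindex() returns a new DataFrame; students_df itself is unchanged.
reindexed_df = students_df.reindex([0, 2, 3])

# Label 3 has no source row, so every column is missing there
reindexed_df
```

Because label 3 did not exist in the original index, reindexing is itself a common source of missing data, which is why it is usually paired with the filling and dropping techniques in this notebook.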


## Dropping Rows with Missing Data

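The `dropna()` method removes rows that contain missing data. With `how='any'` (the default), a row is dropped if at least one of its values is missing, so only Eric's row remains here.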

In [4]:
# Create a copy of the DataFrame for dropping missing data
students_df3 = students_df.copy()

# Drop any rows with missing data
students_df3.dropna(how='any', inplace=True)

# Display the updated DataFrame
students_df3

,Name,Age,Course,GPA
0,Eric,40.0,Machine Learning,4.0
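
Dropping every incomplete row can discard more than intended. As a hedged sketch of two narrower options, `subset` restricts the check to chosen columns and `axis='columns'` drops incomplete columns instead of rows. The names `age_known_df` and `complete_cols_df` are illustrative.

```python
import pandas as pd

# Same student data as above, rebuilt so this cell runs on its own
students_df = pd.DataFrame({
    'Name': ['Eric', 'Ivy', 'Jude'],
    'Age': [40, None, 10],
    'Course': ['Machine Learning', 'Project Management', 'Programming'],
    'GPA': [4, 3.85, None],
})

# Drop a row only if its Age is missing; Jude stays despite the missing GPA
age_known_df = students_df.dropna(subset=['Age'])

# Drop columns (not rows) that contain any missing value
complete_cols_df = students_df.dropna(axis='columns')

print(age_known_df)
print(complete_cols_df)
```

Both calls return new DataFrames and leave `students_df` untouched, so `inplace=True` is not needed.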


## Filling Missing Data

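The `fillna()` method replaces missing values with a specified value. Passing a single scalar, as below, applies the same value to every column, so both Ivy's age and Jude's GPA become 4.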

In [5]:
# Create a copy of the DataFrame for filling missing data
students_df4 = students_df.copy()

# Fill missing data with a specified value (e.g., 4)
students_df4.fillna(value=4, inplace=True)

# Display the updated DataFrame
students_df4

,Name,Age,Course,GPA
0,Eric,40.0,Machine Learning,4.0
1,Ivy,4.0,Project Management,3.85
2,Jude,10.0,Programming,4.0
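
A single fill value rarely suits every column (an age of 4 is not plausible for this data). `fillna()` also accepts a dictionary mapping column names to fill values. The sketch below uses each column's mean as the fill value; that choice is an assumption made for illustration, not a recommendation from this notebook.

```python
import pandas as pd

# Same student data as above, rebuilt so this cell runs on its own
students_df = pd.DataFrame({
    'Name': ['Eric', 'Ivy', 'Jude'],
    'Age': [40, None, 10],
    'Course': ['Machine Learning', 'Project Management', 'Programming'],
    'GPA': [4, 3.85, None],
})

# Choose a separate fill value per column; mean() skips NaN by default
fill_values = {
    'Age': students_df['Age'].mean(),
    'GPA': students_df['GPA'].mean(),
}

# fillna() returns a new DataFrame when inplace is not set
filled_df = students_df.fillna(value=fill_values)

filled_df
```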


## Identifying Missing Data

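The `pd.isna()` function (equivalent to the `DataFrame.isna()` method) returns a boolean mask of the same shape as the DataFrame, with `True` wherever a value is missing.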

In [6]:
# Create a copy of the DataFrame for identifying missing data
students_df5 = students_df.copy()

# Get the boolean mask where values are NaN
missing_data_mask = pd.isna(students_df5)

# Display the boolean mask
missing_data_mask

,Name,Age,Course,GPA
0,False,False,False,False
1,False,True,False,False
2,False,False,False,True
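
The mask is most useful when aggregated. As a short sketch, summing it counts missing values per column, and reducing it across columns selects the rows that need attention. The names `missing_per_column` and `rows_with_missing` are illustrative.

```python
import pandas as pd

# Same student data as above, rebuilt so this cell runs on its own
students_df = pd.DataFrame({
    'Name': ['Eric', 'Ivy', 'Jude'],
    'Age': [40, None, 10],
    'Course': ['Machine Learning', 'Project Management', 'Programming'],
    'GPA': [4, 3.85, None],
})

# True counts as 1, so summing the mask counts missing values per column
missing_per_column = students_df.isna().sum()

# any(axis=1) flags rows with at least one missing value
rows_with_missing = students_df[students_df.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```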
