In [None]:
# Python Pandas for Handling Missing Values

### Data is often collected from multiple sources and can contain missing (null) values.
> Missing values can occur in any dataset, for reasons such as
> - data entry errors,
> - incomplete data, or
> - intentional omissions.
>
> They are a common problem in data analysis because they can lead to incorrect results, so it is important to handle them before further analysis.

# Pandas, a data manipulation library in Python, provides efficient methods to handle missing data

# These include functions such as isna(), isnull(), notnull(), dropna(), fillna(), and replace().

In [None]:
# 1. Identify the missing values
# Use isna() or its alias isnull().
# Both return a Boolean DataFrame of the same shape, with True wherever a value is missing.
import pandas as pd

# create a DataFrame with missing values (None becomes NaN in numeric columns)
data = {
    'Name': ['John', 'Joseph', 'Mary', 'Mark', 'David', 'Mike'],
    'Age': [25, 33, None, 28, 29, 35],
    'Salary': [50000, None, 60000, 55000, 70000, 65000],
}
df = pd.DataFrame(data)
df

In [None]:
# check for missing values
print(df.isnull())

In [None]:
print(df.notnull())    # The notnull() method returns the opposite of isnull().
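
A Boolean mask is hard to read on anything larger than a toy table, so it is usually summarised. The sketch below rebuilds the same example DataFrame so that it runs on its own; the variable names (missing_per_column, missing_share, rows_with_missing) are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same example data as above, rebuilt so this cell runs on its own
df = pd.DataFrame({
    'Name': ['John', 'Joseph', 'Mary', 'Mark', 'David', 'Mike'],
    'Age': [25, 33, None, 28, 29, 35],
    'Salary': [50000, None, 60000, 55000, 70000, 65000],
})

# True counts as 1, so summing the mask counts missing values per column
missing_per_column = df.isna().sum()

# The mean of the mask is the fraction of missing values per column
missing_share = df.isna().mean()

# Keep only the rows that have at least one missing value
rows_with_missing = df[df.isna().any(axis=1)]

print(missing_per_column)
print(missing_share)
print(rows_with_missing)
```

Counting first helps decide whether dropping or filling is the more reasonable next step.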

# 2. Dropping Missing Values

In [None]:
# drop rows with missing values
df_dropped = df.dropna()     # Remove any row that contains missing values

print(df_dropped)
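
Calling dropna() with no arguments removes every row that has any missing value, which can be more aggressive than intended. As a rough sketch, the subset, how, thresh, and axis parameters narrow that behaviour; the example rebuilds the same DataFrame, and the result variable names are illustrative.

```python
import pandas as pd

# Same example data as above, rebuilt so this cell runs on its own
df = pd.DataFrame({
    'Name': ['John', 'Joseph', 'Mary', 'Mark', 'David', 'Mike'],
    'Age': [25, 33, None, 28, 29, 35],
    'Salary': [50000, None, 60000, 55000, 70000, 65000],
})

# Drop a row only if 'Age' is missing; a missing 'Salary' is tolerated
dropped_missing_age = df.dropna(subset=['Age'])

# Drop a row only if every value in it is missing (none are, in this data)
dropped_empty_rows = df.dropna(how='all')

# Keep rows that have at least 3 non-missing values
kept_complete_rows = df.dropna(thresh=3)

# Drop columns (axis=1) that contain any missing value
dropped_columns = df.dropna(axis=1)

print(dropped_missing_age)
print(dropped_columns)
```

Note that dropna() returns a new DataFrame and leaves df unchanged unless the result is assigned back.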

# 3. Filling or Replacing Missing Values

If your dataset has a large number of missing values, dropping rows may discard too much data, and it may be more appropriate to replace the missing values instead.
Pandas provides several methods for this, including fillna() and replace().
The fillna() method replaces missing values with a specified value, such as a constant or a column statistic.
The replace() method replaces specified values with other values.
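
Besides a fixed value, missing entries can be filled from neighbouring rows. The sketch below uses the ffill() and bfill() DataFrame methods for this. The choice of these methods is an assumption about what filling from neighbouring values would look like here, and it only makes sense when row order is meaningful (for example, time series), which is not really the case for this toy data.

```python
import pandas as pd

# Same example data as above, rebuilt so this cell runs on its own
df = pd.DataFrame({
    'Name': ['John', 'Joseph', 'Mary', 'Mark', 'David', 'Mike'],
    'Age': [25, 33, None, 28, 29, 35],
    'Salary': [50000, None, 60000, 55000, 70000, 65000],
})

# Forward fill: copy the last valid value above each gap
forward_filled = df.ffill()

# Backward fill: copy the next valid value below each gap
backward_filled = df.bfill()

print(forward_filled)
print(backward_filled)
```

With forward fill, Mary's age is taken from Joseph's row; with backward fill, it is taken from Mark's row.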

In [None]:
# Replace missing values with a specified value
df_filled = df.fillna(0) # Replace missing values with 0

print(df_filled)

In [None]:
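# Fill each numeric column with its own mean; passing a single scalar such as
# df['Age'].mean() would also write the mean age into the Salary column
df_filled = df.fillna(df.mean(numeric_only=True))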

print(df_filled)
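
Different columns often call for different fill values. As a minimal sketch, fillna() also accepts a dictionary that maps column names to replacement values; the choice of median for Age and mean for Salary below is only an illustration, not a recommendation from the original notebook.

```python
import pandas as pd

# Same example data as above, rebuilt so this cell runs on its own
df = pd.DataFrame({
    'Name': ['John', 'Joseph', 'Mary', 'Mark', 'David', 'Mike'],
    'Age': [25, 33, None, 28, 29, 35],
    'Salary': [50000, None, 60000, 55000, 70000, 65000],
})

# One fill value per column; columns not listed are left untouched
fill_values = {
    'Age': df['Age'].median(),      # median is less sensitive to outliers
    'Salary': df['Salary'].mean(),
}
df_filled_per_column = df.fillna(value=fill_values)

print(df_filled_per_column)
```

Both mean() and median() skip missing values by default, so the statistics are computed from the five known values in each column.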

In [None]:
# Replace specified values with other values
df_replace = df.replace({'Mary':'Johns'}) # Replace Mary with Johns

print(df_replace)
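
In the context of missing data, replace() is most useful when a dataset encodes missing values with a placeholder instead of leaving them empty. The sketch below assumes a hypothetical table in which -1 marks an unknown Age or Salary; both the data and the -1 convention are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data in which -1 is used as a placeholder for "unknown"
raw = pd.DataFrame({
    'Name': ['John', 'Joseph', 'Mary'],
    'Age': [25, -1, 40],
    'Salary': [50000, 52000, -1],
})

# Turn the placeholder into a real missing value so that isna(), dropna()
# and fillna() treat it as missing
cleaned = raw.replace(-1, np.nan)

print(cleaned)
print(cleaned.isna().sum())
```

Once the placeholders are converted to NaN, they no longer distort statistics such as the column mean.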