## Handling Missing Values in Python During the Data Analysis Process
Missing values are common in real-world data, and how they are handled directly affects the accuracy of the insights derived from it. This document shows how to detect missing values in Python with pandas and walks through the main techniques for handling them, with code for each.

### 1. Understanding Missing Values
#### Causes of Missing Values
- Data entry errors
- Non-responses in surveys
- Data corruption
- Merging datasets with unmatched keys
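The last cause is easy to reproduce. Below is a minimal sketch using two hypothetical tables, `customers` and `orders`, invented for illustration. A left merge keeps every customer, and any customer without a matching order receives `NaN` in the order columns.

```python
import pandas as pd

# Hypothetical tables, invented for illustration
customers = pd.DataFrame({'customer_id': [1, 2, 3], 'name': ['Sajjad', 'Noor', 'Sameer']})
orders = pd.DataFrame({'customer_id': [1, 3], 'amount': [120.0, 75.5]})

# A left merge keeps all customers; customer 2 has no order, so 'amount' becomes NaN
merged = customers.merge(orders, on='customer_id', how='left')
print(merged)

# Number of customers with no matching order
print(merged['amount'].isna().sum())
```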

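#### Types of Missing Data
- Missing Completely at Random (MCAR): whether a value is missing is unrelated to any variable, observed or not. The missing entries are a random subset of the data.
- Missing at Random (MAR): missingness depends systematically on other observed variables, but not on the missing value itself.
- Missing Not at Random (MNAR): missingness depends on the missing value itself (for example, high earners being less likely to report their salary).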
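To make the three mechanisms concrete, here is a small simulation on synthetic data. The `Age`/`Salary` relationship, the 20% rate, the thresholds, and the variable names are all assumptions invented for illustration. Each mechanism removes salaries by a different rule; comparing the observed mean with the true mean shows why the mechanism matters. Under MCAR the observed mean stays close to the truth, while under these MAR and MNAR rules it is pulled down, because the higher salaries are the ones that go missing.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n = 1000

# Synthetic data (assumption): salary grows with age, plus noise
age = rng.integers(20, 65, size=n)
salary = 20000 + 1000 * age + rng.normal(0, 5000, size=n)
sim = pd.DataFrame({'Age': age, 'Salary': salary})
true_mean = sim['Salary'].mean()

# MCAR: every salary has the same 20% chance of being missing
mcar = sim['Salary'].mask(rng.random(n) < 0.2)

# MAR: salary is missing for older respondents (depends on the observed 'Age')
mar = sim['Salary'].mask(sim['Age'] > 55)

# MNAR: salary is missing when the salary itself is high (depends on the unobserved value)
mnar = sim['Salary'].mask(sim['Salary'] > 80000)

for name, s in [('MCAR', mcar), ('MAR', mar), ('MNAR', mnar)]:
    # Series.mean() skips NaN, so this is the mean of the observed values only
    print(f"{name}: {s.isna().mean():.0%} missing, "
          f"observed mean {s.mean():,.0f} vs true mean {true_mean:,.0f}")
```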

#### Identifying Missing Values

Before handling missing values, identify them in the dataset.

```python
import pandas as pd

# Example dataset
data = {
    'Name': ['Sajjad', 'Noor', 'Sameer', None],
    'Age': [25, None, 30, 22],
    'Salary': [50000, 60000, None, 45000]
}
df = pd.DataFrame(data)

# Check for missing values
print(df.isnull())        # Boolean mask: True where a value is missing
print(df.isnull().sum())  # Count of missing values per column
```

Output:

```
    Name    Age  Salary
0  False  False   False
1  False   True   False
2  False  False    True
3   True  False   False
Name      1
Age       1
Salary    1
dtype: int64
```
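Raw counts are harder to compare across columns of different lengths, so it often helps to look at the share of missing values and at the affected rows. The sketch below rebuilds the same example data under a separate name, `df_example`, so it runs on its own; `isna()` is an alias of `isnull()`.

```python
import pandas as pd

# Same example data as above, under a separate name so this snippet is self-contained
df_example = pd.DataFrame({
    'Name': ['Sajjad', 'Noor', 'Sameer', None],
    'Age': [25, None, 30, 22],
    'Salary': [50000, 60000, None, 45000]
})

# The mean of a Boolean mask is the share of True values, i.e. the share of missing cells
missing_pct = df_example.isna().mean() * 100
print(missing_pct)  # 25.0 for each column

# Rows that contain at least one missing value
rows_with_missing = df_example[df_example.isna().any(axis=1)]
print(rows_with_missing)

# Total number of missing cells in the whole DataFrame
total_missing = int(df_example.isna().sum().sum())
print(total_missing)  # 3
```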


### 2. Techniques to Handle Missing Values

#### 2.1 Dropping Missing Values
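- When to use: when only a small fraction of values is missing, the affected rows or columns are not critical to the analysis, and the values are plausibly MCAR (dropping MAR or MNAR data can bias the results).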

```python
# Drop rows with any missing values
df_dropped_rows = df.dropna()

# Drop columns with any missing values
df_dropped_columns = df.dropna(axis=1)

# Drop rows where specific columns have missing values
df_dropped_specific = df.dropna(subset=['Age', 'Salary'])
```
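On the small example DataFrame, these calls remove more than might be expected: every column contains one missing value, and three of the four rows do. The sketch below (rebuilding the same data as `df_example` so it runs on its own) shows how much each call removes, along with two other `dropna` parameters, `thresh` and `how`, that give finer control.

```python
import pandas as pd

df_example = pd.DataFrame({
    'Name': ['Sajjad', 'Noor', 'Sameer', None],
    'Age': [25, None, 30, 22],
    'Salary': [50000, 60000, None, 45000]
})

# Only the first row is complete, so a single row survives
print(len(df_example.dropna()))  # 1

# Every column has a missing value, so axis=1 removes all of them
print(df_example.dropna(axis=1).shape)  # (4, 0)

# thresh=3 keeps only rows with at least 3 non-missing values
print(df_example.dropna(thresh=3).index.tolist())  # [0]

# how='all' drops a row only if every value in it is missing
print(df_example.dropna(how='all').shape)  # (4, 3)
```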

#### 2.2 Imputation (Filling Missing Values)
- When to use: when dropping would discard too much data, so the missing values are replaced with reasonable estimates instead.

##### 2.2.1 Fill with a Constant Value

```python
# Fill every missing value, in every column, with 0 (this also writes 0 into the text column 'Name')
df_filled_constant = df.fillna(0)
```
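A single constant rarely suits every column. `fillna` also accepts a dictionary that maps column names to fill values, so each column can get a value that makes sense for its type. In this sketch the placeholder `'Unknown'` and the zeros are arbitrary choices for illustration; a constant is only appropriate when it carries a real meaning (for example, 0 for "no purchases").

```python
import pandas as pd

df_example = pd.DataFrame({
    'Name': ['Sajjad', 'Noor', 'Sameer', None],
    'Age': [25, None, 30, 22],
    'Salary': [50000, 60000, None, 45000]
})

# One fill value per column; columns not listed in the dict are left unchanged
fill_values = {'Name': 'Unknown', 'Age': 0, 'Salary': 0}
df_filled_per_column = df_example.fillna(value=fill_values)
print(df_filled_per_column)
```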

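##### 2.2.2 Fill with Statistical Measures
- Mean or median: suitable for numerical data. The median is less sensitive to outliers and skewed distributions.
- Mode (the most frequent value): suitable for categorical data, and also usable for discrete numerical data.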

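```python
# Each strategy below is an alternative: pick ONE per column.
# (Running the mean and median fills one after the other makes the second a no-op.)

# Fill a numerical column with its mean
df['Age'] = df['Age'].fillna(df['Age'].mean())

# ...or with its median, which is more robust to outliers and skew
# df['Age'] = df['Age'].fillna(df['Age'].median())

# Fill a categorical column with its mode; mode() returns a Series
# because several values can tie, so take the first entry
df['Name'] = df['Name'].fillna(df['Name'].mode()[0])
```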
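Filling column by column gets repetitive on wide tables. A minimal sketch of applying the rules above to every column at once is shown below: numeric columns get their median and all other columns get their mode. The helper name `impute_median_and_mode` is hypothetical, and treating every non-numeric column as categorical is an assumption that may not hold for free-text or date columns.

```python
import pandas as pd

df_example = pd.DataFrame({
    'Name': ['Sajjad', 'Noor', 'Sameer', None],
    'Age': [25, None, 30, 22],
    'Salary': [50000, 60000, None, 45000]
})

def impute_median_and_mode(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: median for numeric columns, mode for all other columns."""
    numeric_cols = frame.select_dtypes(include='number').columns
    other_cols = frame.columns.difference(numeric_cols)

    # median() skips NaN by default
    fill_values = {col: frame[col].median() for col in numeric_cols}
    for col in other_cols:
        modes = frame[col].mode()  # can hold several values when there is a tie
        if not modes.empty:        # empty if the column is entirely missing
            fill_values[col] = modes.iloc[0]
    return frame.fillna(value=fill_values)

df_imputed = impute_median_and_mode(df_example)
print(df_imputed)
```

In this example every name occurs once, so all three names tie for the mode and the first one returned is used. That illustrates why mode imputation is only meaningful when one category clearly dominates.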

Note: avoid the chained in-place form `df['Age'].fillna(value, inplace=True)`. Recent pandas versions emit a warning for it: `df['Age']` is an intermediate object that behaves as a copy, so starting with pandas 3.0 the call will never update `df`. Instead, assign the result back with `df['Age'] = df['Age'].fillna(value)`, or call the method on the DataFrame with a column mapping: `df.fillna({'Age': value}, inplace=True)`.