
### 1. MCAR (Missing Completely At Random)
- **What it is**: The chance that a value is missing is unrelated to any data, observed or unobserved.
- **Example**: Imagine a survey where one person's answers are lost because of a glitch in the online form. It has nothing to do with their age, gender, or any other factor.
- **Impact**: Dropping the affected rows introduces no bias, so the analysis remains valid, although the smaller sample reduces statistical power.
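
To make the "no bias" claim concrete, here is a minimal sketch that simulates MCAR on synthetic data. The income values, the 30% loss rate, and every variable name are assumptions invented for illustration; they are not taken from any real survey.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 100_000

# Hypothetical, synthetic incomes (arbitrary units).
income = pd.Series(rng.normal(loc=50, scale=10, size=n))

# MCAR: every value has the same 30% chance of being lost,
# regardless of the income itself or of anything else.
lost = rng.random(n) < 0.30
income_observed = income.mask(lost)

true_mean = income.mean()
observed_mean = income_observed.mean()  # pandas skips NaN by default

print(f"true mean:     {true_mean:.2f}")
print(f"observed mean: {observed_mean:.2f}")
print(f"share missing: {income_observed.isna().mean():.2%}")
```

The two means should agree closely, because the lost values are a random sample of all values.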

### 2. MAR (Missing At Random)
- **What it is**: Missingness is related to observed data but not to the missing data itself.
- **Example**: In a health study, younger participants might skip questions about their income, but their age is recorded. The missingness is related to age, but among participants of the same age it does not depend on the income itself.
- **Impact**: Because the missingness depends only on observed values, you can use the age information to help estimate the missing income data.
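
The age and income example can be sketched on synthetic data. Everything here is an assumption made for illustration: the two age groups, their income levels, and the skip probabilities. The sketch compares a naive mean of the observed incomes with a mean computed after filling each gap with the average of the same age group.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical population: half are "young", and young people earn less.
young = rng.random(n) < 0.5
income = np.where(young, 30.0, 60.0) + rng.normal(0, 5, n)

# MAR: the chance of skipping the question depends only on the
# observed age group, not on the income value itself.
p_skip = np.where(young, 0.6, 0.1)
skipped = rng.random(n) < p_skip

sim = pd.DataFrame({"young": young, "income": income})
sim["income_obs"] = sim["income"].mask(skipped)

true_mean = sim["income"].mean()
naive_mean = sim["income_obs"].mean()  # young people are under-represented

# Fill each gap with the mean of the same age group.
group_means = sim.groupby("young")["income_obs"].transform("mean")
imputed_mean = sim["income_obs"].fillna(group_means).mean()

print(f"true mean:          {true_mean:.2f}")
print(f"naive observed:     {naive_mean:.2f}")
print(f"group-wise imputed: {imputed_mean:.2f}")
```

In this setup the naive mean is pulled upward, while the group-wise estimate lands close to the true mean, because age carries the information needed to correct for the missingness.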

### 3. MNAR (Missing Not At Random)
- **What it is**: Missingness is related to the value of the missing data itself.
- **Example**: In a survey about happiness, people who are very unhappy may choose not to answer the happiness question. Their decision to skip the question relates directly to their feelings.
- **Impact**: This can lead to biased results, since the missingness is connected to the very data you want to analyze, and the bias cannot be corrected from the observed values alone.
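
The happiness example can also be simulated. This is only a sketch under assumed numbers: scores from 1 to 10, with people scoring 3 or lower skipping the question 80% of the time and everyone else 10% of the time.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical happiness scores, uniform on 1..10.
happiness = pd.Series(rng.integers(1, 11, size=n).astype(float))

# MNAR: the chance of skipping depends on the unobserved answer itself.
p_skip = np.where(happiness <= 3, 0.8, 0.1)
skipped = rng.random(n) < p_skip
happiness_obs = happiness.mask(skipped)

true_mean = happiness.mean()
observed_mean = happiness_obs.mean()

# Mean imputation only copies the biased observed mean into the gaps.
mean_imputed = happiness_obs.fillna(observed_mean).mean()

print(f"true mean:     {true_mean:.2f}")
print(f"observed mean: {observed_mean:.2f}")
print(f"after mean imputation: {mean_imputed:.2f}")
```

The observed mean overstates happiness, and filling the gaps with that same mean leaves the estimate unchanged, so simple imputation does not remove the bias here.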

### Summary
- **MCAR**: Randomly missing, no bias (like a glitch).
- **MAR**: Missing related to what you do know (like age influencing income reporting).
- **MNAR**: Missing related to the missing data itself (like unhappy people skipping the happiness question).

Knowing which mechanism applies helps you choose how to handle the missing data in your analysis.

### Examples

In [2]:
import seaborn as sns

In [3]:
df = sns.load_dataset('titanic')

In [4]:
df.head()

,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True


In [8]:
round(df.isnull().sum()/len(df)*100,2)

survived        0.00
pclass          0.00
sex             0.00
age            19.87
sibsp           0.00
parch           0.00
fare            0.00
embarked        0.22
class           0.00
who             0.00
adult_male      0.00
deck           77.22
embark_town     0.22
alive           0.00
alone           0.00
dtype: float64

In [9]:
df.shape

(891, 15)

### Deleting rows (data points) with missing values

In [11]:
df.dropna().shape

(182, 15)

### Deleting columns with missing values

Passing `axis=1` deletes every column that has at least one NA value.

In [12]:
df.dropna(axis=1).shape

(891, 11)
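
Dropping on any NA value is often too aggressive (only 182 of the 891 rows survive the row-wise drop). As a hedged sketch of two gentler options, `dropna` also accepts `subset` (only look at certain columns) and `thresh` (minimum number of non-NA values required to keep a row or column). The toy frame and the 50% cutoff below are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame that mimics the pattern seen in the Titanic data:
# a few missing ages and a mostly empty deck column.
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0],
    "deck": [np.nan, "C", np.nan, np.nan],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

# Drop rows only when 'age' is missing; gaps in other columns are ignored.
rows_with_age = toy.dropna(subset=["age"])

# Keep a column only if at least 50% of its values are present.
min_non_null = int(len(toy) * 0.5)
mostly_complete = toy.dropna(axis=1, thresh=min_non_null)

print(rows_with_age.shape)
print(list(mostly_complete.columns))
```

Here `deck` is dropped because only one of its four values is present, while `age` is kept despite one gap.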

### Imputation techniques
* Mean
* Median
* Mode (categorical columns)

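Mean value imputation
* This imputation works well when the data are roughly normally distributed. With skewed data or outliers, the mean is pulled toward the tail and is a poor stand-in for a typical value.

So it is better to check the distribution before imputing.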
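
One way to check the distribution is to compare the mean with the median and look at the skewness. The sketch below uses synthetic right-skewed data, and the cutoff of 1.0 is a rule-of-thumb assumption, not a fixed standard.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical right-skewed column with every 10th value missing.
values = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=10_000))
values.iloc[::10] = np.nan

skewness = values.skew()  # NaN values are skipped by default
print(f"mean={values.mean():.2f}  median={values.median():.2f}  skew={skewness:.2f}")

# Assumed rule of thumb: prefer the median when the data are strongly skewed.
SKEW_CUTOFF = 1.0
use_median = abs(skewness) > SKEW_CUTOFF
fill_value = values.median() if use_median else values.mean()

imputed = values.fillna(fill_value)
print("imputed with:", "median" if use_median else "mean")
```

A histogram (for example `values.hist()`) gives the same information visually.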

In [21]:
df['age_mean'] = df['age'].fillna(df['age'].mean()).round(2)

In [22]:
df[['age','age_mean']]

,age,age_mean
0,22.0,22.0
1,38.0,38.0
2,26.0,26.0
3,35.0,35.0
4,35.0,35.0
...,...,...
886,27.0,27.0
887,19.0,19.0
888,,29.7
889,26.0,26.0


In [23]:
# Median

In [25]:
df["Age_median"] = df["age"].fillna(df['age'].median())

In [27]:
df[["age",'Age_median','age_mean']]

,age,Age_median,age_mean
0,22.0,22.0,22.0
1,38.0,38.0,38.0
2,26.0,26.0,26.0
3,35.0,35.0,35.0
4,35.0,35.0,35.0
...,...,...,...
886,27.0,27.0,27.0
887,19.0,19.0,19.0
888,,28.0,29.7
889,26.0,26.0,26.0


Mode imputation (appropriate when the missingness has no relationship to other data, i.e. missing completely at random)

This works on categorical values.

In [30]:
df[df['embarked'].isnull()]

,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone,age_mean,Age_median
61,1,1,female,38.0,0,0,80.0,,First,woman,False,B,,yes,True,38.0,38.0
829,1,1,female,62.0,0,0,80.0,,First,woman,False,B,,yes,True,62.0,62.0


In [44]:
df['embarked_mode'] = df['embarked'].fillna(df['embarked'].mode()[0])

In [46]:
df[['age','age_mean','Age_median','embarked','embarked_mode']].isnull().sum()

age              177
age_mean           0
Age_median         0
embarked           2
embarked_mode      0
dtype: int64
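
A note on the `mode()[0]` idiom used for `embarked_mode`: `Series.mode()` always returns a Series, because several values can tie for most frequent, and `[0]` takes the first of them. The small sketch below uses a made-up column with a tie to show what that means in practice.

```python
import pandas as pd

# Hypothetical column in which 'S' and 'C' are equally frequent.
ports = pd.Series(["S", "C", "S", "C", None])

modes = ports.mode()  # missing values are ignored by default
print(modes.tolist())

# With a tie, [0] picks the first mode in sorted order, not a "better" one.
filled = ports.fillna(modes[0])
print(filled.tolist())
```

When there is a tie, it is worth checking `len(modes)` before imputing, since the choice of fill value is then arbitrary.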