# Missing Data Mechanisms: MCAR, MAR, MNAR
Author: Sachin Laxman Masti

This notebook demonstrates practical examples of:
- MCAR (Missing Completely At Random)
- MAR (Missing At Random)
- MNAR (Missing Not At Random)


In [None]:
import pandas as pd
import numpy as np

np.random.seed(42)

## Step 1: Create Base Dataset

In [None]:
df = pd.DataFrame({
    "age": np.random.randint(20, 60, 100),
    "gender": np.random.choice(["Male", "Female"], 100),
    "salary": np.random.randint(20000, 100000, 100)
})

df.head()

## Case 1: MCAR (Missing Completely At Random)
Here we set 15 randomly chosen salary values to missing, independent of age, gender, and salary itself.

In [None]:
df_mcar = df.copy()

random_index = np.random.choice(df_mcar.index, 15, replace=False)
df_mcar.loc[random_index, "salary"] = np.nan

# Create an indicator column (1 = salary missing, 0 = observed) to check whether the data looks MCAR.
df_mcar["salary_missing"] = df_mcar["salary"].isnull().astype(int)

# Compare mean age for rows with observed (0) and missing (1) salary. Under MCAR the two means should be similar, because missingness is unrelated to age, gender, or salary.
df_mcar.groupby("salary_missing")["age"].mean()
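The group means above are only an eyeball check. Below is a minimal sketch of a slightly more formal check. It assumes a simple permutation test on the difference in mean age is acceptable. This is not Little's MCAR test, and `permutation_pvalue` is a hypothetical helper, not a pandas function. The cell rebuilds the data with the same seeded recipe so it runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the MCAR data with the same seeded recipe used in the cells above.
np.random.seed(42)
df = pd.DataFrame({
    "age": np.random.randint(20, 60, 100),
    "gender": np.random.choice(["Male", "Female"], 100),
    "salary": np.random.randint(20000, 100000, 100),
})
df_mcar = df.copy()
df_mcar["salary"] = df_mcar["salary"].astype(float)  # float column can hold NaN
random_index = np.random.choice(df_mcar.index, 15, replace=False)
df_mcar.loc[random_index, "salary"] = np.nan


def permutation_pvalue(values, flags, n_perm=2000, seed=0):
    """Hypothetical helper: two-sided permutation test for a difference in means
    between rows where `flags` is True and rows where it is False."""
    rng = np.random.default_rng(seed)
    observed = abs(values[flags].mean() - values[~flags].mean())
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(flags)  # break any link between flag and value
        diff = abs(values[shuffled].mean() - values[~shuffled].mean())
        hits += int(diff >= observed)
    return (hits + 1) / (n_perm + 1)


missing = df_mcar["salary"].isnull().to_numpy()
p_age = permutation_pvalue(df_mcar["age"].to_numpy(), missing)
print(f"missing rows: {missing.sum()}, p-value for age difference: {p_age:.3f}")
```

A large p-value is consistent with MCAR but does not prove it. A small one suggests missingness is related to age, which rules out MCAR.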

## Case 2: MAR (Missing At Random)
Here salary is missing depending on gender, an observed column: every Female row has its salary removed.

<span style='color:#91c9c0'> **Most of the time in ML, imputation is done under the MAR assumption.** </span>

In [None]:
df_mar = df.copy()

df_mar.loc[df_mar["gender"] == "Female", "salary"] = np.nan

# Create an indicator column (1 = salary missing, 0 = observed) to check whether the data looks MAR.
df_mar["salary_missing"] = df_mar["salary"].isnull().astype(int)

# Missing rate of salary by gender: 1.0 for Female and 0.0 for Male, so missingness is driven by gender.
df_mar.groupby("gender")["salary_missing"].mean()
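The example above is an extreme case. Every Female salary is missing, so no Female salary is left to learn from. Below is a minimal sketch of a softer MAR setup. It assumes hypothetical missing probabilities of 40% for Female rows and 5% for Male rows. It then imputes within each gender group, which is the kind of conditional imputation the MAR assumption is meant to support.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({
    "age": np.random.randint(20, 60, 100),
    "gender": np.random.choice(["Male", "Female"], 100),
    "salary": np.random.randint(20000, 100000, 100),
})

# Hypothetical probabilities: missingness depends only on the observed gender column.
p_missing = np.where(df["gender"] == "Female", 0.40, 0.05)
df_mar = df.copy()
df_mar["salary"] = df_mar["salary"].astype(float)  # float column can hold NaN
df_mar.loc[np.random.rand(len(df_mar)) < p_missing, "salary"] = np.nan

# Missing rate per gender: expect roughly 0.40 for Female and 0.05 for Male.
rates = df_mar["salary"].isnull().groupby(df_mar["gender"]).mean()
print(rates)

# Impute each missing salary with the mean observed salary of the same gender.
group_mean = df_mar.groupby("gender")["salary"].transform("mean")
df_mar["salary_imputed"] = df_mar["salary"].fillna(group_mean)
```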

## Case 3: MNAR (Missing Not At Random)
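Here every salary above 80,000 is made missing, so missingness depends on the salary value itself.
This is difficult to detect statistically, because the values that would reveal the pattern are exactly the ones that are missing.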

In [None]:
df_mnar = df.copy()

df_mnar.loc[df_mnar["salary"] > 80000, "salary"] = np.nan

# Create an indicator column (1 = salary missing). Here missingness depends on salary itself, which the observed data alone cannot reveal.
df_mnar["salary_missing"] = df_mnar["salary"].isnull().astype(int)

df_mnar.head()
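Because this notebook simulates the data, the original `df` still holds the hidden salaries. A real dataset would not. Below is a minimal sketch that assumes this ground truth is available and shows why MNAR is hard to handle. The observed salaries never exceed the 80,000 cutoff, so the observed mean is biased downward, and mean imputation inherits that bias.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({
    "age": np.random.randint(20, 60, 100),
    "gender": np.random.choice(["Male", "Female"], 100),
    "salary": np.random.randint(20000, 100000, 100).astype(float),
})

df_mnar = df.copy()
df_mnar.loc[df_mnar["salary"] > 80000, "salary"] = np.nan

true_mean = df["salary"].mean()
observed_mean = df_mnar["salary"].mean()  # pandas skips NaN by default
imputed_mean = df_mnar["salary"].fillna(observed_mean).mean()

# Only available because we simulated the data: the salaries that went missing.
hidden = df.loc[df_mnar["salary"].isnull(), "salary"]

print(f"true mean:     {true_mean:.0f}")
print(f"observed mean: {observed_mean:.0f}")
print(f"after mean imputation: {imputed_mean:.0f}")
print(f"smallest hidden salary: {hidden.min():.0f}")
```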

## Conclusion
- MCAR: Missingness is unrelated to any variable, observed or missing
- MAR: Missingness depends only on other observed variables
- MNAR: Missingness depends on the missing value itself (hard to detect)

In real ML workflows, MAR is the most common assumption.
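The checks from the three cases can be collected into one function. Below is a minimal sketch of a hypothetical helper, `missingness_report` (not a pandas API). It compares numeric columns between rows with and without a missing target and computes the missing rate within each category. Clear differences argue against MCAR and are consistent with MAR. No pattern in the observed columns can confirm or rule out MNAR, because MNAR depends on the values that are missing.

```python
import numpy as np
import pandas as pd


def missingness_report(data: pd.DataFrame, target: str) -> dict:
    """Hypothetical helper: summarise how missingness in `target` relates
    to the other observed columns."""
    flag = data[target].isnull()
    report = {"missing_rate": flag.mean()}
    for col in data.columns.drop(target):
        if pd.api.types.is_numeric_dtype(data[col]):
            # Mean of a numeric column for observed (False) vs missing (True) rows.
            report[col] = data[col].groupby(flag).mean().to_dict()
        else:
            # Share of missing target values within each category.
            report[col] = flag.groupby(data[col]).mean().to_dict()
    return report


# Example on the MAR case: every Female salary is missing.
np.random.seed(42)
df = pd.DataFrame({
    "age": np.random.randint(20, 60, 100),
    "gender": np.random.choice(["Male", "Female"], 100),
    "salary": np.random.randint(20000, 100000, 100).astype(float),
})
df_mar = df.copy()
df_mar.loc[df_mar["gender"] == "Female", "salary"] = np.nan

report = missingness_report(df_mar, "salary")
print(report["gender"])  # Female rows always missing, Male rows never
print(report["age"])     # mean age for observed (False) vs missing (True) salary
```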