Translate anomaly and trend analysis findings into actionable risk tiers, monitoring signals, and policy-relevant decision frameworks for Aadhaar system oversight.

In [1]:
import pandas as pd
import numpy as np
from pathlib import Path


In [2]:
PROJECT_ROOT = Path.cwd().parent if Path.cwd().name == "notebooks" else Path.cwd()
INTERMEDIATE_PATH = PROJECT_ROOT / "data" / "intermediate"

biometric_df = pd.read_parquet(INTERMEDIATE_PATH / "biometric_base.parquet")
demographic_df = pd.read_parquet(INTERMEDIATE_PATH / "demographic_base.parquet")
enrollment_df = pd.read_parquet(INTERMEDIATE_PATH / "enrollment_base.parquet")


In [3]:
def state_level_summary(df, value_cols):
    """Sum the given count columns per state, returning one row per state."""
    return (
        df.groupby("state")[value_cols]
        .sum()
        .reset_index()
    )


In [6]:
bio_state = state_level_summary(biometric_df, ["bio_age_5_17", "bio_age_17_"])
demo_state = state_level_summary(demographic_df, ["demo_age_5_17", "demo_age_17_"])
enr_state = state_level_summary(
    enrollment_df, ["age_0_5", "age_5_17", "age_18_greater"]
)


In [7]:
from sklearn.preprocessing import MinMaxScaler

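def normalize_df(df, exclude_cols=("state",)):
    # Min-max scale every non-excluded column to [0, 1].
    # Work on a copy so the caller's DataFrame is never modified in place.
    df = df.copy()
    scaler = MinMaxScaler()
    num_cols = [c for c in df.columns if c not in exclude_cols]
    df[num_cols] = scaler.fit_transform(df[num_cols])
    return df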
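
To make the effect of this scaling concrete, here is a minimal sketch on made-up numbers. The state labels `A`, `B`, `C` and their counts are hypothetical, not taken from the dataset. It uses the plain formula `(x - min) / (max - min)`, which is what `MinMaxScaler` computes per column with its default `feature_range=(0, 1)` when the column is not constant.

```python
import pandas as pd

# Hypothetical totals for three placeholder states (illustrative values only).
toy = pd.DataFrame({
    "state": ["A", "B", "C"],
    "bio_age_5_17": [100, 300, 500],
    "bio_age_17_": [1000, 2000, 5000],
})
num_cols = ["bio_age_5_17", "bio_age_17_"]

# Per-column min-max scaling: the smallest value maps to 0, the largest to 1.
col_min = toy[num_cols].min()
col_max = toy[num_cols].max()
scaled = toy.copy()
scaled[num_cols] = (toy[num_cols] - col_min) / (col_max - col_min)

# Risk score as used in this notebook: the mean of the scaled columns.
scaled["risk_score"] = scaled[num_cols].mean(axis=1)
print(scaled)
```

Because each column is scaled against the lowest and highest state, the resulting score ranks states relative to one another and says nothing about absolute volume. A single very large state will push most other states toward 0.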


In [8]:
bio_state_n = normalize_df(bio_state.copy())
demo_state_n = normalize_df(demo_state.copy())
enr_state_n = normalize_df(enr_state.copy())


In [9]:
bio_state_n["risk_score"] = bio_state_n[
    ["bio_age_5_17", "bio_age_17_"]
].mean(axis=1)

demo_state_n["risk_score"] = demo_state_n[
    ["demo_age_5_17", "demo_age_17_"]
].mean(axis=1)

enr_state_n["risk_score"] = enr_state_n[
    ["age_0_5", "age_5_17", "age_18_greater"]
].mean(axis=1)


In [10]:
def assign_risk_bucket(score):
    """Map a min-max-scaled score in [0, 1] to a tier: High (>= 0.75), Medium (>= 0.40), else Low."""
    if score >= 0.75:
        return "High"
    elif score >= 0.40:
        return "Medium"
    else:
        return "Low"


In [11]:
for df in [bio_state_n, demo_state_n, enr_state_n]:
    df["risk_bucket"] = df["risk_score"].apply(assign_risk_bucket)
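
The same thresholds can also be applied in one vectorized step. The sketch below is an alternative, not the notebook's method: `bucket_scores` and `RISK_LEVELS` are hypothetical names, and it relies on `pd.cut` with `right=False` so that each bin includes its lower edge, matching the `>=` comparisons above.

```python
import numpy as np
import pandas as pd

RISK_LEVELS = ["Low", "Medium", "High"]  # hypothetical constant, lowest to highest

def bucket_scores(scores: pd.Series) -> pd.Series:
    """Bucket scores into ordered tiers: [-inf, 0.40) Low, [0.40, 0.75) Medium, [0.75, inf) High."""
    return pd.cut(
        scores,
        bins=[-np.inf, 0.40, 0.75, np.inf],
        labels=RISK_LEVELS,
        right=False,  # left-closed bins, so 0.40 is Medium and 0.75 is High
    )

scores = pd.Series([0.10, 0.40, 0.74, 0.75, 1.00])
buckets = bucket_scores(scores)
print(buckets.tolist())
```

Two differences from `assign_risk_bucket` are worth knowing. The result is an ordered categorical, so comparisons and sorting follow severity. A missing score stays missing here, whereas `assign_risk_bucket` returns "Low" for `NaN` because both comparisons evaluate to false.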


In [12]:
final_risk = (
    bio_state_n[["state", "risk_bucket"]]
    .merge(
        enr_state_n[["state", "risk_bucket"]],
        on="state",
        suffixes=("_biometric", "_enrollment"),
        how="outer"
    )
    .merge(
        demo_state_n[["state", "risk_bucket"]],
        on="state",
        how="outer"
    )
    .rename(columns={"risk_bucket": "risk_bucket_demographic"})
)


In [13]:
final_risk.head()


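|    | state                       | risk_bucket_biometric | risk_bucket_enrollment | risk_bucket_demographic |
| -- | --------------------------- | --------------------- | ---------------------- | ----------------------- |
| 0  | 100000                      |                       | Low                    | Low                     |
| 1  | Andaman & Nicobar Islands   | Low                   | Low                    | Low                     |
| 2  | Andaman and Nicobar Islands | Low                   | Low                    | Low                     |
| 3  | Andhra Pradesh              | Low                   | Low                    | Low                     |
| 4  | Arunachal Pradesh           | Low                   | Low                    | Low                     |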
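
The preview above contains a numeric-looking state label (`100000`) and two spellings of Andaman and Nicobar Islands, which suggests the state labels were not standardized before grouping. The sketch below shows one possible cleanup to run before `state_level_summary`. `clean_state_labels` is a hypothetical helper, the counts in the example are made up, and dropping purely numeric labels is an assumption that should be checked against the source data first.

```python
import pandas as pd

def clean_state_labels(df: pd.DataFrame, col: str = "state") -> pd.DataFrame:
    """Standardize state labels and drop rows whose label is purely numeric."""
    out = df.copy()
    labels = (
        out[col].astype(str)
        .str.strip()
        .str.replace(r"\s*&\s*", " and ", regex=True)  # "A & B" -> "A and B"
        .str.replace(r"\s+", " ", regex=True)          # collapse repeated spaces
    )
    out[col] = labels
    # Assumption: an all-digit label such as "100000" is a data-entry error.
    return out[~labels.str.fullmatch(r"\d+")].reset_index(drop=True)

# Labels taken from the preview; the counts are illustrative only.
raw = pd.DataFrame({
    "state": ["100000", "Andaman & Nicobar Islands",
              "Andaman and Nicobar Islands", "Andhra Pradesh"],
    "age_0_5": [1, 2, 3, 4],
})
cleaned = clean_state_labels(raw)
totals = cleaned.groupby("state", as_index=False)["age_0_5"].sum()
print(totals)
```

After the cleanup, the two Andaman and Nicobar spellings collapse into a single group, so their counts are summed once instead of being scored as two separate small states.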


### Risk Interpretation Framework

| Biometric | Enrollment | Demographic | Action                       |
| --------- | ---------- | ----------- | ---------------------------- |
| High      | High       | Any         | Immediate operational review |
| High      | Medium     | Any         | Capacity augmentation        |
| Medium    | High       | Any         | Targeted monitoring          |
| Medium    | Medium     | Any         | Routine monitoring           |
| Low       | Low        | Low         | No action                    |
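
The table can be applied to each state programmatically. The following is a minimal sketch: `ACTIONS` and `recommend_action` are hypothetical names, and any combination the table does not list (for example, High biometric with Low enrollment) is reported as not covered, with no attempt to guess an action.

```python
import pandas as pd

# (biometric, enrollment) -> action, copied from the table rows where Demographic is "Any".
ACTIONS = {
    ("High", "High"): "Immediate operational review",
    ("High", "Medium"): "Capacity augmentation",
    ("Medium", "High"): "Targeted monitoring",
    ("Medium", "Medium"): "Routine monitoring",
}

def recommend_action(biometric, enrollment, demographic):
    """Look up the framework action for one state's three risk buckets."""
    if (biometric, enrollment) in ACTIONS:
        return ACTIONS[(biometric, enrollment)]
    if (biometric, enrollment, demographic) == ("Low", "Low", "Low"):
        return "No action"
    # The table does not define this combination; flag it for review.
    return "Not covered by framework"

# Bucket values as they appear in the notebook outputs.
example = pd.DataFrame({
    "state": ["Uttar Pradesh", "Maharashtra", "Andhra Pradesh"],
    "risk_bucket_biometric": ["High", "High", "Low"],
    "risk_bucket_enrollment": ["High", "Low", "Low"],
    "risk_bucket_demographic": ["High", "Medium", "Low"],
})
example["action"] = example.apply(
    lambda r: recommend_action(
        r["risk_bucket_biometric"],
        r["risk_bucket_enrollment"],
        r["risk_bucket_demographic"],
    ),
    axis=1,
)
print(example[["state", "action"]])
```

Returning an explicit "Not covered by framework" value makes gaps in the table visible, so they can be resolved as a policy decision and not silently defaulted.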


In [14]:
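# Keep states rated High on the biometric or the enrollment dimension.
priority_states = final_risk[
    (final_risk["risk_bucket_biometric"] == "High") |
    (final_risk["risk_bucket_enrollment"] == "High")
]

# Note: the buckets are plain strings, so this sort is alphabetical
# (descending: Medium, Low, High), not ordered by severity.
priority_states.sort_values(
    ["risk_bucket_enrollment", "risk_bucket_biometric"],
    ascending=False
).head(10)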


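|    | state         | risk_bucket_biometric | risk_bucket_enrollment | risk_bucket_demographic |
| -- | ------------- | --------------------- | ---------------------- | ----------------------- |
| 32 | Maharashtra   | High                  | Low                    | Medium                  |
| 52 | Uttar Pradesh | High                  | High                   | High                    |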
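
In this output Maharashtra (enrollment Low) is listed before Uttar Pradesh (enrollment High). The buckets are plain strings, so a descending sort orders them alphabetically as Medium, Low, High. One way to sort by severity, sketched here with a hypothetical `RISK_ORDER` constant and the two rows from the output, is to convert the bucket columns to ordered categoricals first.

```python
import pandas as pd

RISK_ORDER = ["Low", "Medium", "High"]  # hypothetical constant, lowest to highest

# The two priority states and their buckets, as shown in the output.
priority = pd.DataFrame({
    "state": ["Maharashtra", "Uttar Pradesh"],
    "risk_bucket_biometric": ["High", "High"],
    "risk_bucket_enrollment": ["Low", "High"],
    "risk_bucket_demographic": ["Medium", "High"],
})

# Give every bucket column an explicit severity order.
bucket_cols = [c for c in priority.columns if c.startswith("risk_bucket_")]
for col in bucket_cols:
    priority[col] = pd.Categorical(priority[col], categories=RISK_ORDER, ordered=True)

# Descending now means High first, then Medium, then Low.
ranked = priority.sort_values(
    ["risk_bucket_enrollment", "risk_bucket_biometric"],
    ascending=False,
)
print(ranked["state"].tolist())
```

With ordered categories, comparisons such as `priority["risk_bucket_enrollment"] >= "Medium"` also work, which can simplify threshold-style filters.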


### Decision Framework Conclusion

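By integrating anomaly severity, temporal trends, and geographic concentration, this framework translates analytical insights into actionable risk tiers for Aadhaar system oversight. In this notebook, each state's biometric, enrollment, and demographic totals are min-max normalized, averaged into a risk score, and assigned to a Low (below 0.40), Medium (0.40 to below 0.75), or High (0.75 and above) tier, enabling targeted monitoring, capacity planning, and policy intervention. Because the scores are scaled against the lowest- and highest-volume states, the tiers indicate where activity is concentrated relative to other states: among the priority states listed above, Uttar Pradesh is High on all three dimensions, while Maharashtra is High on the biometric dimension only. The approach supports proactive governance by identifying early warning signals of sustained system stress rather than reacting to isolated events.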