Recommendation Engine
---------------------------------
Policy‑Based Prescriptive Recommendation System
---------------------------------

This system generates HR actions based on diagnosed employee states.

Recommendations are derived from transparent decision rules that combine
behavioral strain with contextual factors such as performance, compensation,
development, and workload.

The design prioritizes explainability and business applicability.

In [25]:
import pandas as pd
import numpy as np

df = pd.read_csv("../data/employee_ml_dataset_v3.csv")

In [26]:

df["DaysSinceLastTraining"] = df["DaysSinceLastTraining"].replace(9999, np.nan)
df["YearsSinceLastRaise"]   = df["YearsSinceLastRaise"].replace(9999, np.nan)
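
The two replacements above treat 9999 as a placeholder for "no record" rather than a real duration. A minimal sketch on a toy column (the values are invented for illustration) suggests why this matters: if the sentinel stays in, it can inflate the median that is later used for imputation.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; 9999 stands in for "no record".
toy = pd.DataFrame({"DaysSinceLastTraining": [30, 120, 9999, 400, 9999]})

# Median computed while the sentinel is still treated as a real value.
median_with_sentinel = toy["DaysSinceLastTraining"].median()

# Same replacement as in the cell above; NaN values are skipped by median().
cleaned = toy["DaysSinceLastTraining"].replace(9999, np.nan)
median_without_sentinel = cleaned.median()

print(median_with_sentinel, median_without_sentinel)
```

On this toy column the median drops from 400 to 120 once the sentinel is masked.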


In [27]:

for c in ["EngagementScore", "BurnoutRiskScore"]:
    df.loc[df[c] < 0, c] = np.nan

# Impute numeric columns with the column median
num_for_impute = [
    "EngagementScore", "BurnoutRiskScore", "AbsenceDays_Last6M",
    "TrainingCount", "DaysSinceLastTraining", "YearsSinceLastRaise"
]
for c in num_for_impute:
    if c in df.columns:
        df[c] = pd.to_numeric(df[c], errors="coerce")
        # Assign the result back to the column. Calling fillna(inplace=True)
        # on df[c] acts on an intermediate object and triggers a pandas
        # FutureWarning (it stops working in pandas 3.0).
        df[c] = df[c].fillna(df[c].median())
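
Median imputation is most robust when the filled column is assigned back to the frame, because `fillna(..., inplace=True)` called on `df[c]` may modify only a temporary copy. The sketch below shows the assignment form on a toy frame; the data and the deliberately messy string column are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration.
toy = pd.DataFrame({
    "EngagementScore": [3.0, np.nan, 5.0, 4.0],
    "TrainingCount": ["2", "x", "1", None],  # strings on purpose
})

for col in ["EngagementScore", "TrainingCount"]:
    # Non-numeric entries such as "x" become NaN.
    toy[col] = pd.to_numeric(toy[col], errors="coerce")
    # Assignment form: the original frame is updated explicitly.
    toy[col] = toy[col].fillna(toy[col].median())

print(toy)
```

After the loop neither column contains missing values, and the fill value is each column's own median computed from its valid entries.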

In [28]:
# Behavioral Strain Index
# Percentile ranks put each component on a 0-1 scale. Engagement is inverted
# so that lower engagement contributes more strain.
df["Engagement_rank"] = 1 - df["EngagementScore"].rank(pct=True, method="average")
df["Burnout_rank"]    =     df["BurnoutRiskScore"].rank(pct=True, method="average")
df["Absence_rank"]    =     df["AbsenceDays_Last6M"].rank(pct=True, method="average")

 
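# CareerStagnationFlag enters the index as a binary 0/1 component
df["Stagnation_bin"]  = df["CareerStagnationFlag"].astype(int)

# Rule-based index: weighted sum of the four components (weights sum to 1)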
df["BehavioralStrainIndex"] = (
    0.35*df["Engagement_rank"] +
    0.35*df["Burnout_rank"] +
    0.20*df["Absence_rank"] +
    0.10*df["Stagnation_bin"]
)
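
To make the weighting concrete, here is a small worked example of the same formula on four invented employees. The toy values are assumptions; only the column names, the rank calls, and the weights come from the cell above.

```python
import pandas as pd

# Four invented employees, ordered from least to most strained.
toy = pd.DataFrame({
    "EngagementScore":      [80, 60, 40, 20],
    "BurnoutRiskScore":     [10, 30, 50, 70],
    "AbsenceDays_Last6M":   [0, 2, 4, 8],
    "CareerStagnationFlag": [0, 0, 1, 1],
})

eng = 1 - toy["EngagementScore"].rank(pct=True)   # low engagement -> high strain
burn = toy["BurnoutRiskScore"].rank(pct=True)
absn = toy["AbsenceDays_Last6M"].rank(pct=True)
stag = toy["CareerStagnationFlag"].astype(int)

# Same weights as the notebook cell.
index = 0.35 * eng + 0.35 * burn + 0.20 * absn + 0.10 * stag
print(index.round(4).tolist())
```

Because the engagement rank is inverted, its largest possible value is 1 - 1/n rather than 1, so the index stays slightly below 1 even for the most strained employee (0.9125 in this toy case).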


q50, q80 = df["BehavioralStrainIndex"].quantile([0.5, 0.8]).values


In [29]:

def strain_bucket(x):
    if x < q50: return "Low Strain"
    if x < q80: return "Medium Strain"
    return "High Strain"

df["StrainLevel"] = df["BehavioralStrainIndex"].apply(strain_bucket)
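
The row-wise `apply` above works, but the same bucketing can also be expressed in a vectorized way. The sketch below is one possible alternative using `pd.cut` with left-closed bins, checked against the original function on invented scores; it is not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Invented index values for illustration.
scores = pd.Series([0.10, 0.30, 0.50, 0.70, 0.90])
q50, q80 = scores.quantile([0.5, 0.8]).values

def strain_bucket(x):
    # Same thresholds as the notebook cell.
    if x < q50:
        return "Low Strain"
    if x < q80:
        return "Medium Strain"
    return "High Strain"

labels = ["Low Strain", "Medium Strain", "High Strain"]
# right=False gives intervals [-inf, q50), [q50, q80), [q80, inf),
# which matches the strict "<" comparisons in strain_bucket.
via_cut = pd.cut(scores, bins=[-np.inf, q50, q80, np.inf],
                 right=False, labels=labels)
via_apply = scores.apply(strain_bucket)

print(via_cut.astype(str).tolist())
```

Both approaches should assign identical labels; `pd.cut` additionally returns an ordered categorical, which can be convenient for sorting and grouping.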


In [30]:

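# Contextual flags used by the decision rules
is_high_perf = df["HighPerformerFlag"].astype(int) == 1
pay_stag     = df["PayStagnationFlag"].astype(int) == 1
# Development gap: no training on record, or last training over a year ago
needs_dev    = (df["TrainingCount"] == 0) | (df["DaysSinceLastTraining"] > 365)
# Workload proxy: absence days in the top 20% of the workforce
workload     = df["AbsenceDays_Last6M"] >= df["AbsenceDays_Last6M"].quantile(0.8)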

conditions = [
    (df["StrainLevel"].eq("High Strain") & is_high_perf),
    (pay_stag & is_high_perf),
    (needs_dev),
    (workload)
]
states = [
    "High Risk – Retention Critical",
    "Career Risk – Compensation Issue",
    "Capability Risk – Development Needed",
    "Operational Risk – Workload Issue"
]

df["DecisionState"] = np.select(conditions, states, default="Stable")
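
Note that `np.select` evaluates the conditions in order and keeps the first match, so the order of the rule list encodes rule priority. A minimal sketch with invented flags and shortened state labels (both are assumptions for illustration) shows an employee who satisfies several rules receiving only the highest-priority state.

```python
import numpy as np
import pandas as pd

# Invented flags; row 0 satisfies every rule, row 3 satisfies none.
toy = pd.DataFrame({
    "high_strain": [True, False, False, False],
    "high_perf":   [True, True, False, False],
    "pay_stag":    [True, True, False, False],
    "needs_dev":   [True, True, True, False],
})

conditions = [
    toy["high_strain"] & toy["high_perf"],
    toy["pay_stag"] & toy["high_perf"],
    toy["needs_dev"],
]
# Shortened, hypothetical labels standing in for the notebook's state names.
states = ["Retention Critical", "Compensation Issue", "Development Needed"]

toy["state"] = np.select(conditions, states, default="Stable")
print(toy["state"].tolist())
```

Reordering the `conditions` list would change which state such overlapping cases receive, so the ordering is effectively a policy decision.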


In [31]:

recommendation_map = {
    "High Risk – Retention Critical":   "Retention & Career Discussion",
    "Career Risk – Compensation Issue": "Compensation / Promotion Review",
    "Capability Risk – Development Needed": "Training & Development Plan",
    "Operational Risk – Workload Issue":    "Workload or Manager Review",
    "Stable": "No Immediate Action / Monitor"
}
df["RecommendedAction"] = df["DecisionState"].map(recommendation_map)



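# Priority tiers, checked in order (first match wins):
#   1. retention-critical cases                  -> Immediate Action
#   2. any other risk state under high strain    -> Planned Action
#   3. everything else (including Stable)        -> Monitor
df["PriorityLevel"] = np.select(
    [
        df["DecisionState"].eq("High Risk – Retention Critical"),
        df["DecisionState"].ne("Stable") & df["StrainLevel"].eq("High Strain"),
    ],
    ["Immediate Action", "Planned Action"],
    default="Monitor"
)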



In [32]:

def recommendation_reason(row):
    base = {
        "High Risk – Retention Critical":
            "High behavioral strain with strong performance indicators",
        "Career Risk – Compensation Issue":
            "Compensation stagnation observed for a high-performing employee",
        "Capability Risk – Development Needed":
            "Limited recent training or skill development activity",
        "Operational Risk – Workload Issue":
            "Elevated absence pattern suggesting workload pressure",
        "Stable":
            "No significant risk signals detected"
    }[row["DecisionState"]]

    extras = []
    if pay_stag.loc[row.name]:  extras.append("pay stagnation")
    if needs_dev.loc[row.name]: extras.append("development gap")
    if workload.loc[row.name]:  extras.append("high absence")

    return base if not extras else f"{base} | Signals: {', '.join(extras)}"

df["RecommendationReason"] = df.apply(recommendation_reason, axis=1)
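
The reason text is a base sentence for the decision state plus any active secondary signals. The sketch below isolates that string-building step in a small helper; `build_reason` is a hypothetical name introduced here for illustration, not a function from the notebook.

```python
def build_reason(base, signals):
    """Append the names of active signals to a base explanation.

    `signals` maps a signal name to a boolean; dict order is preserved.
    """
    active = [name for name, is_on in signals.items() if is_on]
    return base if not active else f"{base} | Signals: {', '.join(active)}"

# A "Stable" employee can still carry a secondary signal, for example
# pay stagnation without the high-performer flag.
example = build_reason(
    "No significant risk signals detected",
    {"pay stagnation": True, "development gap": False, "high absence": False},
)
print(example)
```

This would explain why some rows in the output read "No significant risk signals detected | Signals..." even though their decision state is Stable: the signal is present but does not meet any rule on its own.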


In [33]:
output_cols = [
    "EmployeeID","Department","JobTitle",
    "StrainLevel","DecisionState",
    "RecommendedAction","PriorityLevel","RecommendationReason"
]
recommendations = df[output_cols]

In [34]:
action_counts = recommendations["RecommendedAction"].value_counts()


In [35]:
print("Actions distribution:\n", action_counts.head())


Actions distribution:
 RecommendedAction
Training & Development Plan        5010
No Immediate Action / Monitor      4932
Compensation / Promotion Review    2417
Retention & Career Discussion      1654
Workload or Manager Review          987
Name: count, dtype: int64
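
Raw counts are easier to compare as shares of the workforce. The sketch below re-enters the printed counts by hand to compute proportions; on the real frame, `value_counts(normalize=True)` should give the same result directly.

```python
import pandas as pd

# Counts copied from the printed output above.
action_counts = pd.Series({
    "Training & Development Plan": 5010,
    "No Immediate Action / Monitor": 4932,
    "Compensation / Promotion Review": 2417,
    "Retention & Career Discussion": 1654,
    "Workload or Manager Review": 987,
})

total = action_counts.sum()
shares = (action_counts / total).round(3)
print(total)
print(shares)
```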


In [36]:
dept_summary = (recommendations
    .groupby(["Department","RecommendedAction","PriorityLevel"])
    .size().unstack(fill_value=0)
)
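
The `groupby(...).size().unstack(fill_value=0)` pattern counts rows per key combination and then pivots the last grouping key (here `PriorityLevel`) into columns. A small sketch with invented rows and shortened action labels illustrates the resulting shape.

```python
import pandas as pd

# Invented rows for illustration.
toy = pd.DataFrame({
    "Department":        ["Finance", "Finance", "Finance", "Sales"],
    "RecommendedAction": ["Training", "Training", "Review", "Training"],
    "PriorityLevel":     ["Monitor", "Planned Action", "Monitor", "Monitor"],
})

summary = (
    toy.groupby(["Department", "RecommendedAction", "PriorityLevel"])
       .size()                   # one count per key combination
       .unstack(fill_value=0)    # last key becomes columns; gaps become 0
)
print(summary)
```

Combinations that never occur, such as a "Planned Action" row for Sales in this toy data, appear as 0 rather than NaN because of `fill_value=0`.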

In [37]:
output_cols = [
    "EmployeeID",
    "Department",
    "JobTitle",
    "StrainLevel",
    "RecommendedAction",
    "PriorityLevel",
    "RecommendationReason"
]

recommendations = df[output_cols]
recommendations.head(10)

,EmployeeID,Department,JobTitle,StrainLevel,RecommendedAction,PriorityLevel,RecommendationReason
0,PNR-10012,Production,Production Worker,High Strain,Workload or Manager Review,Planned Action,Elevated absence pattern suggesting workload p...
1,PNR-10017,Logistics,Warehouse Associate,Medium Strain,Training & Development Plan,Monitor,Limited recent training or skill development a...
2,PNR-10019,Logistics,Logistics Coordinator,Medium Strain,Compensation / Promotion Review,Monitor,Compensation stagnation observed for a high-pe...
3,PNR-10036,Finance,Financial Analyst,Medium Strain,No Immediate Action / Monitor,Monitor,No significant risk signals detected | Signals...
4,PNR-10050,Quality Control,QC Inspector,Medium Strain,No Immediate Action / Monitor,Monitor,No significant risk signals detected
5,PNR-10094,Research & Development,Scientist,High Strain,Training & Development Plan,Planned Action,Limited recent training or skill development a...
6,PNR-10095,Sales,Account Manager,Medium Strain,Compensation / Promotion Review,Monitor,Compensation stagnation observed for a high-pe...
7,PNR-10097,Logistics,Logistics Coordinator,Low Strain,Training & Development Plan,Monitor,Limited recent training or skill development a...
8,PNR-10103,Marketing,Marketing Manager,High Strain,Retention & Career Discussion,Immediate Action,High behavioral strain with strong performance...
9,PNR-10111,IT Support,IT Manager,Medium Strain,Training & Development Plan,Monitor,Limited recent training or skill development a...


In [38]:
recommendations["RecommendedAction"].value_counts()

RecommendedAction
Training & Development Plan        5010
No Immediate Action / Monitor      4932
Compensation / Promotion Review    2417
Retention & Career Discussion      1654
Workload or Manager Review          987
Name: count, dtype: int64

In [39]:
recommendations = df[output_cols]

recommendations.groupby(
    ["Department", "RecommendedAction", "PriorityLevel"]
).size().unstack(fill_value=0)

,PriorityLevel,Immediate Action,Monitor,Planned Action
Department,RecommendedAction,,,
Finance,Compensation / Promotion Review,0,250,0
Finance,No Immediate Action / Monitor,0,564,0
Finance,Retention & Career Discussion,193,0,0
Finance,Training & Development Plan,0,437,93
Finance,Workload or Manager Review,0,29,86
Human Resources,Compensation / Promotion Review,0,275,0
Human Resources,No Immediate Action / Monitor,0,533,0
Human Resources,Retention & Career Discussion,186,0,0
Human Resources,Training & Development Plan,0,413,87
Human Resources,Workload or Manager Review,0,50,79
