# WellOps
## Feature Engineering


The goal of feature engineering in WellOps is to transform raw workload and behavioral data into meaningful signals that correlate with burnout risk.

This includes:
- Capturing workload intensity
- Modeling temporal patterns
- Representing behavioral stress indicators
- Preparing features suitable for both classical ML and deep learning models


In [4]:
import numpy as np
import pandas as pd

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

In [6]:
# Synthetic Data Generation

np.random.seed(42)

n_employees = 200
n_weeks = 24

data = []

for emp_id in range(n_employees):
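    # Each employee gets a fixed role and a personal baseline of weekly hours
    role = np.random.choice(['Engineer', 'Data Analyst', 'Manager'])
    base_hours = np.random.normal(40, 5)

    for week in range(n_weeks):
        # Weekly hours fluctuate around the baseline, floored at 30
        weekly_hours = max(30, base_hours + np.random.normal(0, 6))
        # Floor at 1 so the task count is never zero or negative
        tasks_assigned = max(1, int(np.random.normal(10, 3)))
        # Overtime is any time beyond a 40-hour week
        overtime_hours = max(0, weekly_hours - 40)
        task_switches = max(1, int(np.random.normal(6, 2)))
        # Stress indicator bounded to [0, 1]
        stress_indicator = np.clip(np.random.normal(0.5, 0.15), 0, 1)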

        data.append([
            emp_id, week, role, weekly_hours,
            tasks_assigned, overtime_hours,
            task_switches, stress_indicator])

df = pd.DataFrame(data, columns=[
    "employee_id", "week_id", "role",
    "weekly_hours", "tasks_assigned",
    "overtime_hours", "task_switches",
    "stress_indicator"
])

df.head()

|   | employee_id | week_id | role | weekly_hours | tasks_assigned | overtime_hours | task_switches | stress_indicator |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | Manager | 40.341426 | 11 | 0.341426 | 8 | 0.362476 |
| 1 | 0 | 1 | Manager | 36.503944 | 3 | 0.0 | 5 | 0.558887 |
| 2 | 0 | 2 | Manager | 31.67372 | 10 | 0.0 | 5 | 0.503333 |
| 3 | 0 | 3 | Manager | 34.68207 | 8 | 0.0 | 5 | 0.533312 |
| 4 | 0 | 4 | Manager | 32.640969 | 10 | 0.0 | 5 | 0.670151 |


In [11]:
# Small constant added to each denominator to guard against division by zero
epsilon = 1e-6

df["workload_intensity"] = df["weekly_hours"] / (df["tasks_assigned"] + epsilon)
df["overtime_ratio"] = df["overtime_hours"] / (df["weekly_hours"] + epsilon)
df["switch_pressure"] = df["task_switches"] / (df["tasks_assigned"] + epsilon)
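
To see how the epsilon guard behaves at the edge, the sketch below applies the same ratio to a toy frame that contains a zero `tasks_assigned` value. The toy values and the `safe_ratio` helper are illustrative assumptions, not part of the WellOps pipeline.

```python
import numpy as np
import pandas as pd

epsilon = 1e-6

# Toy rows (hypothetical values); the second row has no tasks assigned.
toy = pd.DataFrame({
    "weekly_hours": [40.0, 35.0],
    "tasks_assigned": [10, 0],
})

# Epsilon guard as in the cell above: finite, but very large when the denominator is 0.
toy["intensity_eps"] = toy["weekly_hours"] / (toy["tasks_assigned"] + epsilon)


def safe_ratio(num: pd.Series, den: pd.Series) -> pd.Series:
    """Hypothetical alternative: return NaN where the denominator is not positive."""
    return num / den.where(den > 0)


toy["intensity_nan"] = safe_ratio(toy["weekly_hours"], toy["tasks_assigned"])
print(toy)
```

With the epsilon guard, a zero-task row stays finite and therefore survives a later inf/NaN cleanup as an extreme value; the NaN variant would let `dropna` remove it instead. Which behavior is preferable depends on how such rows should be treated.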


In [8]:
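# 4-week trailing mean of hours per employee
# (rows are already in week order within each employee)
df["rolling_hours_mean"] = (
    df.groupby("employee_id")["weekly_hours"]
    .transform(lambda x: x.rolling(4, min_periods=1).mean())
)

# 4-week trailing sum of overtime per employee;
# min_periods=1 lets the first weeks use a shorter window instead of NaN
df["rolling_overtime_sum"] = (
    df.groupby("employee_id")["overtime_hours"]
    .transform(lambda x: x.rolling(4, min_periods=1).sum())
)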
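
The per-employee rolling window can be checked on a tiny frame. The sketch below uses hypothetical toy values and a 2-week window (shortened from 4 so the effect is visible) to show that the window restarts at each employee boundary.

```python
import pandas as pd

# Toy data (hypothetical): two employees, three weeks each.
toy = pd.DataFrame({
    "employee_id": [0, 0, 0, 1, 1, 1],
    "week_id": [0, 1, 2, 0, 1, 2],
    "weekly_hours": [40.0, 50.0, 60.0, 30.0, 30.0, 36.0],
})

# Sort explicitly so the window follows chronological order per employee.
toy = toy.sort_values(["employee_id", "week_id"])

# Trailing 2-week mean, computed separately for each employee.
toy["rolling_hours_mean"] = (
    toy.groupby("employee_id")["weekly_hours"]
    .transform(lambda x: x.rolling(2, min_periods=1).mean())
)

print(toy["rolling_hours_mean"].tolist())
# [40.0, 45.0, 55.0, 30.0, 30.0, 33.0]
```

The first value for employee 1 is 30.0 rather than a blend with employee 0's last week, which confirms that the grouping keeps histories separate.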


In [9]:
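# Weighted combination of the engineered signals (weights sum to 1.0)
df["burnout_score"] = (
    0.35 * df["overtime_ratio"] +
    0.25 * df["stress_indicator"] +
    0.20 * df["workload_intensity"] +
    0.20 * df["switch_pressure"]
)

# workload_intensity and switch_pressure are unbounded ratios, so the
# weighted sum can exceed 1; clip the score to the [0, 1] range
df["burnout_score"] = np.clip(df["burnout_score"], 0, 1)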
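
The weighted sum mixes components on different scales: `overtime_ratio` and `stress_indicator` lie in [0, 1], while `workload_intensity` and `switch_pressure` are unbounded ratios. One possible variant, sketched here as an assumption rather than the method used in this notebook, rescales each component to [0, 1] before weighting so that clipping is rarely needed. The toy values and the `min_max` helper are hypothetical.

```python
import pandas as pd

# Toy component values (hypothetical), on their original scales.
toy = pd.DataFrame({
    "overtime_ratio": [0.0, 0.1, 0.2],
    "stress_indicator": [0.3, 0.5, 0.7],
    "workload_intensity": [3.0, 4.0, 8.0],
    "switch_pressure": [0.4, 0.6, 1.2],
})

# Same weights as the score above.
weights = {
    "overtime_ratio": 0.35,
    "stress_indicator": 0.25,
    "workload_intensity": 0.20,
    "switch_pressure": 0.20,
}


def min_max(col: pd.Series) -> pd.Series:
    """Rescale a column to [0, 1]; a constant column maps to 0."""
    span = col.max() - col.min()
    return (col - col.min()) / span if span > 0 else col * 0.0


# Normalize each component, then take the weighted sum.
normalized = toy[list(weights)].apply(min_max)
toy["burnout_score_norm"] = sum(w * normalized[c] for c, w in weights.items())
print(toy["burnout_score_norm"].round(3).tolist())
```

Because every normalized component lies in [0, 1] and the weights sum to 1, the resulting score stays in [0, 1] and each weight reflects the component's intended share of the score.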


In [12]:
# Replace inf with NaN, then drop any rows that contain missing values
df.replace([np.inf, -np.inf], np.nan, inplace=True)
df.dropna(inplace=True)

df.isnull().sum()


employee_id             0
week_id                 0
role                    0
weekly_hours            0
tasks_assigned          0
overtime_hours          0
task_switches           0
stress_indicator        0
workload_intensity      0
overtime_ratio          0
switch_pressure         0
rolling_hours_mean      0
rolling_overtime_sum    0
burnout_score           0
dtype: int64

In [13]:
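# One-hot encode the role column; drop_first avoids a redundant dummy column
df_encoded = pd.get_dummies(df, columns=["role"], drop_first=True)

# Identifiers and the target are excluded from the feature matrix
features = df_encoded.drop(
    columns=["employee_id", "week_id", "burnout_score"]
)

y = df_encoded["burnout_score"]


In [14]:
# Split first so the scaler never sees the test rows
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    features, y, test_size=0.2, random_state=42
)

# Fit scaling statistics on the training set only, then reuse them for the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

X_train.shape, X_test.shape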

((3840, 12), (960, 12))
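
A row-level random split places weeks from the same employee in both the training and test sets, which can make test performance look better than it would on unseen employees. The sketch below shows one possible alternative using scikit-learn's `GroupShuffleSplit`, with the scaler fit on training rows only. The toy frame is a hypothetical stand-in for the feature table, and this is an assumption about how the split could be done, not the notebook's method.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the feature table (hypothetical values): 10 employees x 4 weeks.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "employee_id": np.repeat(np.arange(10), 4),
    "weekly_hours": rng.normal(40, 5, size=40),
    "burnout_score": rng.uniform(0, 1, size=40),
})

X = toy[["weekly_hours"]]
y = toy["burnout_score"]
groups = toy["employee_id"]

# Hold out 20% of employees (not 20% of rows).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

# Fit the scaler on training rows only, then apply it to the test rows.
scaler = StandardScaler()
X_train = scaler.fit_transform(X.iloc[train_idx])
X_test = scaler.transform(X.iloc[test_idx])
y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

# No employee should appear on both sides of the split.
train_emps = set(groups.iloc[train_idx])
test_emps = set(groups.iloc[test_idx])
print(len(train_emps), len(test_emps), train_emps.isdisjoint(test_emps))
```

For the full dataset, `groups` would be the `employee_id` column, which is why it helps to keep that column available until after the split even though it is excluded from the features.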

### Feature Engineering Summary

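- Raw workload data was transformed into ratio features: workload intensity, overtime ratio, and switch pressure.
- Four-week rolling features (mean hours, total overtime) were added per employee to capture burnout accumulation.
- A continuous burnout risk score in [0, 1] was constructed as a weighted combination of the engineered features and used as the target variable.
- The categorical role feature was one-hot encoded and all features were standardized for downstream ML models.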

This dataset is now ready for classical machine learning and deep learning modeling.
