# Exploratory Data Analysis – Triadic Human–AI Delegation Dataset

This notebook explores the synthetic dataset generated by the **Triadic Delegation Simulator**, which models dynamic delegation behavior between managers, AI systems, and human employees.

**Key questions:**
- How are governance orientations distributed across managers?
- How does delegation behavior (acceptance, override, escalation) vary by governance mode?
- How do delegation dynamics evolve over time (26 periods)?
- What is the effect of the transparency intervention (period 13)?
- How do latent willingness-to-delegate states transition over time?

---

### Table of Contents

1. [Import Required Libraries](#1.-Import-Required-Libraries)
2. [Load and Inspect Data](#2.-Load-and-Inspect-Data)
3. [Data Cleaning and Preprocessing](#3.-Data-Cleaning-and-Preprocessing)
4. [Exploratory Data Analysis with Descriptive Statistics](#4.-Exploratory-Data-Analysis-with-Descriptive-Statistics)
   - [4.1 Manager-Level Summary](#4.1-Manager-Level-Summary)
   - [4.2 Panel-Level Descriptive Statistics](#4.2-Panel-Level-Descriptive-Statistics)
5. [Data Visualization](#5.-Data-Visualization)
   - [5.1 Manager Trait Distributions](#5.1-Manager-Trait-Distributions)
   - [5.2 Delegation Behavior by Governance Mode](#5.2-Delegation-Behavior-by-Governance-Mode)
6. [Filtering and Grouping — Time Series Dynamics](#6.-Filtering-and-Grouping-—-Time-Series-Dynamics)
   - [6.1 Delegation Trends Over Periods](#6.1-Delegation-Trends-Over-Periods-(by-Governance-Mode))
   - [6.2 Pre vs. Post Transparency Intervention](#6.2-Pre-vs.-Post-Transparency-Intervention)
   - [6.3 Latent State Transitions Over Time](#6.3-Latent-State-Transitions-Over-Time)
7. [Correlation Analysis](#7.-Correlation-Analysis)

---

**Dataset tables:**

| Table | Description |
|---|---|
| `manager_master` | Manager traits and governance modes |
| `employee_team_master` | Team characteristics |
| `ai_system_master` | AI system configuration |
| `panel_manager_period` | Period-level panel data (main analysis table) |
| `decision_episode` | Episode-level delegation decisions |
| `execution_episode` | Episode-level task execution |

## 1. Import Required Libraries

In [None]:
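# Install the plotting libraries plus openpyxl, which pandas needs to read .xlsx workbooks
%pip install matplotlib seaborn openpyxl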
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Plot style
sns.set_theme(style="whitegrid", palette="muted", font_scale=1.1)
plt.rcParams["figure.figsize"] = (12, 5)
plt.rcParams["figure.dpi"] = 120

DATA_PATH = Path("../triadic_simulation/data/Triadic_Delegation_Dataset_SYNTH.xlsx")
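
Because `DATA_PATH` is a relative path, it only resolves when the notebook runs from the expected working directory. A minimal sketch of a guard that fails early with a readable message is shown below; the helper name `require_file` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def require_file(path: Path) -> Path:
    """Return the resolved path, or raise a clear error if the file is missing."""
    resolved = path.expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"Dataset workbook not found: {resolved}")
    return resolved
```

One possible use is `DATA_PATH = require_file(DATA_PATH)` before any sheet is loaded, so a wrong working directory surfaces immediately rather than inside `pd.read_excel`.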

## 2. Load and Inspect Data

Load all six sheets from the synthetic dataset workbook.

In [None]:
# Load all sheets
sheets = [
    "manager_master",
    "employee_team_master",
    "ai_system_master",
    "panel_manager_period",
    "decision_episode",
    "execution_episode",
]

dfs = {name: pd.read_excel(DATA_PATH, sheet_name=name) for name in sheets}

# Quick overview
for name, df in dfs.items():
    print(f"\n{'='*60}")
    print(f"  {name}  |  shape: {df.shape}")
    print(f"{'='*60}")
    print(df.dtypes)
    print("\nFirst 3 rows:")
    display(df.head(3))
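
The per-sheet printout above can get long with six tables. As a compact alternative, the sketch below collects row counts, column counts, and total missing values into one overview table. It runs on small toy frames with made-up values; `summarize_tables` and `toy_dfs` are hypothetical names used only for illustration.

```python
import pandas as pd

# Toy stand-ins for the loaded sheets (made-up values, for illustration only)
toy_dfs = {
    "manager_master": pd.DataFrame(
        {"manager_id": [1, 2, 3], "governance_mode": ["a", "b", None]}
    ),
    "panel_manager_period": pd.DataFrame(
        {"manager_id": [1, 1, 2, 2], "period_id": [1, 2, 1, 2]}
    ),
}


def summarize_tables(tables):
    """Build one row per table: number of rows, columns, and missing cells."""
    rows = [
        {
            "table": name,
            "n_rows": df.shape[0],
            "n_cols": df.shape[1],
            "n_missing": int(df.isna().sum().sum()),
        }
        for name, df in tables.items()
    ]
    return pd.DataFrame(rows).set_index("table")


overview = summarize_tables(toy_dfs)
print(overview)
```

In the notebook, the same function could be called as `summarize_tables(dfs)`.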

## 3. Data Cleaning and Preprocessing

Check each table for missing values and duplicate key IDs, then assign short aliases for the tables used in later sections.

In [None]:
# Check missing values across all tables
for name, df in dfs.items():
    missing = df.isnull().sum()
    total_missing = missing.sum()
    print(f"{name}: {total_missing} missing values")
    if total_missing > 0:
        print(missing[missing > 0])
    print()

# Check for duplicates in key ID columns
print("Duplicate manager IDs:", dfs["manager_master"]["manager_id"].duplicated().sum())
print("Duplicate decision episodes:", dfs["decision_episode"]["decision_episode_id"].duplicated().sum())
print("Duplicate execution episodes:", dfs["execution_episode"]["execution_episode_id"].duplicated().sum())

# Assign convenience references
mgr = dfs["manager_master"]
panel = dfs["panel_manager_period"]
dec = dfs["decision_episode"]
exe = dfs["execution_episode"]
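
The duplicate checks above cover single ID columns. For the panel table, the natural key is the manager-period pair, so a complementary check is sketched below. It assumes `panel` carries both a `manager_id` and a `period_id` column (only `period_id` is confirmed by the later cells) and that every manager should appear in all 26 periods. The function name `check_panel_keys` is hypothetical.

```python
import pandas as pd

N_PERIODS = 26  # number of periods stated in the notebook introduction


def check_panel_keys(panel_df, n_periods=N_PERIODS):
    """Count duplicated (manager_id, period_id) rows and list managers
    whose number of distinct periods differs from n_periods."""
    dup = int(panel_df.duplicated(subset=["manager_id", "period_id"]).sum())
    periods_per_manager = panel_df.groupby("manager_id")["period_id"].nunique()
    incomplete = periods_per_manager[periods_per_manager != n_periods].index.tolist()
    return {"duplicate_keys": dup, "incomplete_managers": incomplete}


# Toy panel with two periods: manager 2 has a repeated row, manager 3 is missing period 2
toy = pd.DataFrame(
    {"manager_id": [1, 1, 2, 2, 2, 3], "period_id": [1, 2, 1, 2, 2, 1]}
)
report = check_panel_keys(toy, n_periods=2)
print(report)
```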

## 4. Exploratory Data Analysis with Descriptive Statistics

### 4.1 Manager-Level Summary

In [None]:
# Governance mode distribution
print("Governance Mode Distribution:")
print(mgr["governance_mode"].value_counts())
print(f"\n% breakdown:\n{mgr['governance_mode'].value_counts(normalize=True).mul(100).round(1)}")

print("\n" + "="*60)
print("Manager Traits by Governance Mode:")
print("="*60)
mgr.groupby("governance_mode")[["baseline_ai_attitude", "risk_aversion_index"]].describe().round(3)

### 4.2 Panel-Level Descriptive Statistics

Key delegation and performance variables across all manager-period observations.

In [None]:
key_panel_cols = [
    "ai_decision_authority_share",
    "override_rate",
    "escalation_rate",
    "avg_decision_latency",
    "performance_pressure_index",
    "task_complexity_index",
    "demand_volatility",
    "service_level_delta",
    "inventory_cost_delta",
    "expedite_cost_delta",
    "error_incident_count",
]

panel[key_panel_cols].describe().round(4).T
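
`describe()` shows minima and maxima, but it does not say how many rows fall outside a plausible range. The sketch below counts such rows per column. It assumes that `override_rate`, `escalation_rate`, and `ai_decision_authority_share` are proportions bounded by 0 and 1, which the column names suggest but the source does not state. `out_of_range_counts` is a hypothetical helper, and the toy values are made up.

```python
import pandas as pd


def out_of_range_counts(df, cols, lower=0.0, upper=1.0):
    """Count non-missing values outside [lower, upper] for each column."""
    counts = {}
    for col in cols:
        values = df[col].dropna()  # missing values are reported elsewhere
        counts[col] = int((~values.between(lower, upper)).sum())
    return pd.Series(counts, name="n_out_of_range")


toy = pd.DataFrame(
    {
        "override_rate": [0.1, 1.2, 0.0],
        "escalation_rate": [0.5, -0.1, None],
    }
)
counts = out_of_range_counts(toy, ["override_rate", "escalation_rate"])
print(counts)
```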

## 5. Data Visualization

### 5.1 Manager Trait Distributions

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Governance mode bar chart
mgr["governance_mode"].value_counts().plot.bar(ax=axes[0], color=sns.color_palette("muted", 3))
axes[0].set_title("Governance Mode Distribution")
axes[0].set_ylabel("Count")
axes[0].tick_params(axis="x", rotation=25)

# Baseline AI attitude histogram by governance mode
for mode in mgr["governance_mode"].unique():
    subset = mgr[mgr["governance_mode"] == mode]
    axes[1].hist(subset["baseline_ai_attitude"], bins=20, alpha=0.5, label=mode)
axes[1].set_title("Baseline AI Attitude by Governance Mode")
axes[1].set_xlabel("Baseline AI Attitude")
axes[1].legend(fontsize=8)

# Risk aversion by governance mode
sns.boxplot(data=mgr, x="governance_mode", y="risk_aversion_index", ax=axes[2])
axes[2].set_title("Risk Aversion by Governance Mode")
axes[2].tick_params(axis="x", rotation=25)

plt.tight_layout()
plt.show()

### 5.2 Delegation Behavior by Governance Mode

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

sns.boxplot(data=panel, x="governance_mode", y="ai_decision_authority_share", ax=axes[0])
axes[0].set_title("AI Decision Authority Share")
axes[0].tick_params(axis="x", rotation=25)

sns.boxplot(data=panel, x="governance_mode", y="override_rate", ax=axes[1])
axes[1].set_title("Override Rate")
axes[1].tick_params(axis="x", rotation=25)

sns.boxplot(data=panel, x="governance_mode", y="escalation_rate", ax=axes[2])
axes[2].set_title("Escalation Rate")
axes[2].tick_params(axis="x", rotation=25)

plt.suptitle("Delegation Behavior by Governance Mode", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

## 6. Filtering and Grouping — Time Series Dynamics

### 6.1 Delegation Trends Over Periods (by Governance Mode)

The vertical dashed line marks the **transparency intervention at period 13**.

In [None]:
# Aggregate by period and governance mode
ts = panel.groupby(["period_id", "governance_mode"]).agg(
    mean_authority=("ai_decision_authority_share", "mean"),
    mean_override=("override_rate", "mean"),
    mean_escalation=("escalation_rate", "mean"),
    mean_latency=("avg_decision_latency", "mean"),
).reset_index()

fig, axes = plt.subplots(2, 2, figsize=(16, 10))

for mode in ts["governance_mode"].unique():
    subset = ts[ts["governance_mode"] == mode]
    axes[0, 0].plot(subset["period_id"], subset["mean_authority"], label=mode, marker="o", markersize=3)
    axes[0, 1].plot(subset["period_id"], subset["mean_override"], label=mode, marker="o", markersize=3)
    axes[1, 0].plot(subset["period_id"], subset["mean_escalation"], label=mode, marker="o", markersize=3)
    axes[1, 1].plot(subset["period_id"], subset["mean_latency"], label=mode, marker="o", markersize=3)

titles = ["AI Decision Authority Share", "Override Rate", "Escalation Rate", "Avg Decision Latency"]
for ax, title in zip(axes.flat, titles):
    ax.axvline(x=13, color="red", linestyle="--", alpha=0.7, label="Transparency shift")
    ax.set_title(title)
    ax.set_xlabel("Period")
    ax.legend(fontsize=8)

plt.suptitle("Delegation Dynamics Over Time by Governance Mode", fontsize=14, y=1.01)
plt.tight_layout()
plt.show()
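
The long-format `ts` table above is filtered once per governance mode inside the plotting loop. An alternative layout, sketched here on toy data, is a wide table with one row per period and one column per mode. The mode labels `"A"` and `"B"` and all values are made up for illustration.

```python
import pandas as pd

toy_panel = pd.DataFrame(
    {
        "period_id": [1, 1, 2, 2, 1, 1, 2, 2],
        "governance_mode": ["A", "A", "A", "A", "B", "B", "B", "B"],
        "override_rate": [0.2, 0.4, 0.1, 0.3, 0.5, 0.7, 0.6, 0.6],
    }
)

# One row per period, one column per governance mode, cell = mean override rate
wide = toy_panel.pivot_table(
    index="period_id",
    columns="governance_mode",
    values="override_rate",
    aggfunc="mean",
)
print(wide)
```

With this shape, `wide.plot(ax=ax)` draws one line per mode in a single call, and differences between modes within a period reduce to column arithmetic.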

### 6.2 Pre vs. Post Transparency Intervention

Compare delegation behavior before the transparency shift (`period_id < 13`) with behavior from period 13 onward.

In [None]:
# Create pre/post indicator
panel["phase"] = np.where(panel["period_id"] < 13, "Pre-Transparency", "Post-Transparency")

compare_cols = ["ai_decision_authority_share", "override_rate", "escalation_rate", "avg_decision_latency"]

print("Mean values by phase and governance mode:\n")
comparison = panel.groupby(["phase", "governance_mode"])[compare_cols].mean().round(4)
display(comparison)

# Visualize pre vs. post
fig, axes = plt.subplots(1, 4, figsize=(20, 5))
for i, col in enumerate(compare_cols):
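    # errorbar=("ci", 95) replaces the ci=95 argument, which is deprecated since seaborn 0.12
    sns.barplot(data=panel, x="governance_mode", y=col, hue="phase", ax=axes[i], errorbar=("ci", 95))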
    axes[i].set_title(col.replace("_", " ").title())
    axes[i].tick_params(axis="x", rotation=25)
    if i > 0:
        axes[i].get_legend().remove()

plt.suptitle("Pre vs. Post Transparency Intervention", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()
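
The grouped means above place pre and post values in separate rows, which makes the size of the change hard to read. The sketch below reshapes the phases into columns and computes a post-minus-pre difference per governance mode. It uses toy data with made-up values and mode labels, and the resulting difference is purely descriptive, not an estimate of the intervention effect.

```python
import numpy as np
import pandas as pd

INTERVENTION_PERIOD = 13  # transparency shift, as stated in the notebook

toy_panel = pd.DataFrame(
    {
        "period_id": [12, 12, 13, 14, 12, 13],
        "governance_mode": ["A", "A", "A", "A", "B", "B"],
        "override_rate": [0.30, 0.50, 0.20, 0.40, 0.10, 0.25],
    }
)
toy_panel["phase"] = np.where(
    toy_panel["period_id"] < INTERVENTION_PERIOD, "pre", "post"
)

# Rows: governance mode; columns: phase; cells: mean override rate
means = (
    toy_panel.groupby(["governance_mode", "phase"])["override_rate"]
    .mean()
    .unstack("phase")
)
means["delta"] = means["post"] - means["pre"]
print(means)
```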

### 6.3 Latent State Transitions Over Time

Distribution of latent willingness-to-delegate states (0=low, 1=medium, 2=high) across periods.

In [None]:
# Latent state distribution by period
state_dist = panel.groupby(["period_id", "latent_state"]).size().unstack(fill_value=0)
state_pct = state_dist.div(state_dist.sum(axis=1), axis=0)

fig, axes = plt.subplots(1, 2, figsize=(16, 5))

# Stacked area chart
state_pct.plot.area(ax=axes[0], alpha=0.7, color=["#e74c3c", "#f39c12", "#2ecc71"])
axes[0].axvline(x=13, color="black", linestyle="--", alpha=0.7, label="Transparency shift")
axes[0].set_title("Latent State Distribution Over Time")
axes[0].set_xlabel("Period")
axes[0].set_ylabel("Share of Managers")
axes[0].legend(title="State", labels=["Low (0)", "Medium (1)", "High (2)", "Shift"])

# Transition matrix: state → state_next
transitions = panel.groupby(["latent_state", "latent_state_next"]).size().unstack(fill_value=0)
trans_pct = transitions.div(transitions.sum(axis=1), axis=0).round(3)

sns.heatmap(trans_pct, annot=True, fmt=".3f", cmap="YlOrRd", ax=axes[1],
            xticklabels=["Low", "Med", "High"], yticklabels=["Low", "Med", "High"])
axes[1].set_title("Empirical State Transition Matrix")
axes[1].set_xlabel("Next State")
axes[1].set_ylabel("Current State")

plt.tight_layout()
plt.show()
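
The transition matrix above relies on the precomputed `latent_state_next` column. As a consistency check, the next state can also be derived by shifting `latent_state` within each manager, as sketched below on toy data. The sketch assumes a `manager_id` column in the panel and uses the hypothetical column name `next_state` to avoid overwriting the original. Each manager's final period has no successor and is dropped.

```python
import pandas as pd

toy_panel = pd.DataFrame(
    {
        "manager_id": [1, 1, 1, 1, 2, 2, 2, 2],
        "period_id": [1, 2, 3, 4, 1, 2, 3, 4],
        "latent_state": [0, 0, 1, 2, 2, 2, 1, 1],
    }
)

# Order rows in time within each manager, then look one period ahead
toy_panel = toy_panel.sort_values(["manager_id", "period_id"])
toy_panel["next_state"] = toy_panel.groupby("manager_id")["latent_state"].shift(-1)

# Drop the last period of each manager (no next state) and restore integer labels
pairs = toy_panel.dropna(subset=["next_state"]).astype({"next_state": int})

# Row-normalized counts: each row gives the distribution over next states
trans = pd.crosstab(pairs["latent_state"], pairs["next_state"], normalize="index")
print(trans)
```

Each row of `trans` sums to 1, so entry (i, j) reads as the empirical share of moves from state i to state j.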

## 7. Correlation Analysis

Correlation heatmap of key panel-level variables to identify relationships between delegation behavior, performance, and contextual factors.

In [None]:
corr_cols = [
    "ai_decision_authority_share",
    "override_rate",
    "escalation_rate",
    "avg_decision_latency",
    "performance_pressure_index",
    "target_difficulty",
    "demand_volatility",
    "task_complexity_index",
    "service_level_delta",
    "inventory_cost_delta",
    "expedite_cost_delta",
    "error_incident_count",
    "latent_state",
]

corr_matrix = panel[corr_cols].corr().round(3)

fig, ax = plt.subplots(figsize=(14, 10))
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))
sns.heatmap(
    corr_matrix,
    mask=mask,
    annot=True,
    fmt=".2f",
    cmap="RdBu_r",
    center=0,
    vmin=-1,
    vmax=1,
    ax=ax,
    linewidths=0.5,
    annot_kws={"size": 8},
)
ax.set_title("Correlation Matrix – Key Panel Variables", fontsize=14)
plt.tight_layout()
plt.show()
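
With 13 variables the heatmap holds 78 distinct pairs, so a ranked list of the strongest correlations can be easier to scan. The sketch below extracts the upper triangle of a correlation matrix and sorts the pairs by absolute value. `top_correlations` is a hypothetical helper, and the toy columns `x`, `y`, `z` are made up.

```python
import numpy as np
import pandas as pd


def top_correlations(corr, n=5):
    """Return the n variable pairs with the largest absolute correlation."""
    # Keep each pair once: strictly upper triangle, diagonal excluded
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


toy = pd.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [1, 3, 2, 4],
        "z": [4, 3, 2, 1],
    }
)
top = top_correlations(toy.corr(), n=5)
print(top)
```

In the notebook this could be applied as `top_correlations(panel[corr_cols].corr())`.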