<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/341_WDO_DataLoading_Utils.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Workforce Development Orchestrator — Data Loading & Lookup Utilities

This module is responsible for **bringing structured, trusted data into the agent** and preparing it for fast, deterministic analysis. It follows a strict design principle:

> **Utilities implement logic. Nodes orchestrate behavior.**

As a result, this code is:

* Independently testable
* Easy to reason about
* Reusable across agents and workflows

---

## 1. Controlled Data Ingestion

The `load_json_file` function acts as the **single gatekeeper** for all data entering the system.

### What This Function Guarantees

* Data is valid JSON
* Data is consistently returned as a list
* File and parsing errors are surfaced immediately

This prevents subtle downstream failures where:

* A malformed file silently loads partial data
* A single-object JSON breaks list-based logic
* Errors appear deep in the analysis instead of at ingestion

From a business perspective, this ensures the agent never reasons over **unknown or corrupted inputs**.
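
The guarantees above can be exercised with throwaway files. The following is a minimal sketch: `load_json_file` mirrors the module's loader, while `describe_load` and the file names are hypothetical helpers introduced only for this illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_json_file(file_path: Path) -> List[Dict[str, Any]]:
    """Same contract as the module's loader: always a list, errors raised early."""
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except FileNotFoundError as e:
        raise FileNotFoundError(f"Data file not found: {file_path}") from e
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid JSON in {file_path}: {e}") from e
    # A single top-level object is wrapped so callers can always iterate.
    return data if isinstance(data, list) else [data]


def describe_load(file_path: Path) -> str:
    """Hypothetical helper: summarize what happens when a file is loaded."""
    try:
        records = load_json_file(file_path)
        return f"ok: {len(records)} record(s)"
    except (FileNotFoundError, ValueError) as e:
        return type(e).__name__


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    single = tmp_dir / "single.json"
    single.write_text('{"role_id": "R001"}', encoding="utf-8")
    broken = tmp_dir / "broken.json"
    broken.write_text('[{"role_id": "R001"}', encoding="utf-8")  # missing closing bracket
    results = {
        "single": describe_load(single),
        "broken": describe_load(broken),
        "missing": describe_load(tmp_dir / "missing.json"),
    }

print(results)
```

A single-object file comes back as a one-element list, while the truncated and missing files fail at ingestion instead of deeper in the analysis.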

---

## 2. Explicit Loaders for Each Dataset

Each dataset (employees, roles, tasks, skills, automation signals, etc.) has its own dedicated loader.

This design choice is intentional:

* Filenames are explicit
* Dataset responsibility is clear
* Each loader can be mocked or tested independently

If leadership ever asks:

> “What data does this system depend on?”

The answer is immediate and inspectable.
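
Because each loader takes a directory and a filename, it can be tested against a temporary fixture rather than the project's real data. This is a sketch under assumptions: the fixture record (`E900`, `R900`) and the helper `check_load_employees_with_fixture` are invented for the example, and the two loader functions are simplified copies of the module's.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_json_file(file_path: Path) -> List[Dict[str, Any]]:
    """Simplified copy of the module's loader (error wrapping omitted)."""
    with open(file_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    return data if isinstance(data, list) else [data]


def load_employees(data_dir: Path, filename: str = "employees.json") -> List[Dict[str, Any]]:
    """Dedicated loader: the dataset it depends on is visible in the signature."""
    return load_json_file(data_dir / filename)


def check_load_employees_with_fixture() -> List[Dict[str, Any]]:
    """Hypothetical test helper: write a fixture file, then load it back."""
    fixture = [{"employee_id": "E900", "name": "Test Person", "role_id": "R900"}]
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        (data_dir / "employees.json").write_text(json.dumps(fixture), encoding="utf-8")
        return load_employees(data_dir)


loaded = check_load_employees_with_fixture()
print(loaded)
```

A test written this way does not depend on the contents of any shared data directory, so it keeps passing when the real datasets change.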

---

## 3. Lookups as a Performance and Clarity Strategy

Once raw data is loaded, the utilities build **lookup dictionaries** that map IDs to full records.

Examples:

* `role_id → role`
* `task_id → task`
* `employee_id → employee`

This avoids:

* Repeated list scanning
* Implicit joins
* Hidden computational cost

Instead, the agent retrieves records by ID through dictionary access, which is **O(1) on average**, making execution predictable and efficient.
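
To make the contrast concrete, here is a small sketch comparing a list scan with a dictionary lookup. The role `R002` and its name are made up for illustration, and `find_role_by_scan` is a hypothetical helper that is not part of the module.

```python
from typing import Any, Dict, List, Optional

# Sample records; R002 is an invented entry for this example.
roles: List[Dict[str, Any]] = [
    {"role_id": "R001", "role_name": "HR Coordinator"},
    {"role_id": "R002", "role_name": "Example Analyst"},
]


def find_role_by_scan(roles: List[Dict[str, Any]], role_id: str) -> Optional[Dict[str, Any]]:
    """Linear scan: cost grows with the number of roles on every call."""
    for role in roles:
        if role["role_id"] == role_id:
            return role
    return None


def build_roles_lookup(roles: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """One pass up front, then constant-time (average) access by ID."""
    return {role["role_id"]: role for role in roles}


roles_lookup = build_roles_lookup(roles)
scanned = find_role_by_scan(roles, "R002")
looked_up = roles_lookup.get("R002")

# Both approaches return the very same record object.
print(scanned is looked_up)
```

One caveat of the dictionary-comprehension approach: if two records share an ID, the later one silently replaces the earlier one in the lookup.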

---

## 4. Relationship Maps Make Organizational Structure Explicit

The `build_tasks_by_role` and `build_employees_by_role` utilities encode **organizational structure** directly into state.

This enables:

* Role-level automation analysis
* Role-based reskilling strategies
* Team-level reporting

Importantly, these mappings are **derived**, not hard-coded, which keeps the system adaptable as the organization changes.
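
The grouping idea can be sketched generically. `group_by_role` below is a hypothetical single helper using `collections.defaultdict`; it is an alternative formulation, not the module's `build_tasks_by_role` or `build_employees_by_role`, and the sample records are invented.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_role(records: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group records under their role_id, skipping records without one."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        role_id = record.get("role_id")
        if role_id:
            grouped[role_id].append(record)
    return dict(grouped)


tasks = [
    {"task_id": "T001", "role_id": "R001"},
    {"task_id": "T002", "role_id": "R001"},
    {"task_id": "T003", "role_id": "R002"},
]
employees = [
    {"employee_id": "E001", "role_id": "R001"},
    {"employee_id": "E002"},  # no role assigned, so it is left out of the map
]

tasks_by_role = group_by_role(tasks)
employees_by_role = group_by_role(employees)

# Derived structure: counts per role come from the data, not from constants.
role_summary = {
    role_id: {
        "tasks": len(tasks_by_role.get(role_id, [])),
        "employees": len(employees_by_role.get(role_id, [])),
    }
    for role_id in sorted(set(tasks_by_role) | set(employees_by_role))
}
print(role_summary)
```

If a new role appears in the data, it shows up in the maps on the next run with no code change.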

---

## 5. Defensive, Transparent Design

Several subtle design choices improve trust and maintainability:

* Records with a missing or empty `role_id` are skipped by the role-grouping utilities rather than crashing execution
* No mutation of input data
* No hidden defaults or inference

The system always operates on **what is explicitly present**.
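
A short sketch of these behaviors, assuming the same grouping logic as the module's `build_tasks_by_role`. The sample tasks are invented to cover the three ways a `role_id` can be absent.

```python
import copy
from typing import Any, Dict, List


def build_tasks_by_role(tasks: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """role_id -> list of tasks; records with a falsy role_id are skipped."""
    tasks_by_role: Dict[str, List[Dict[str, Any]]] = {}
    for task in tasks:
        role_id = task.get("role_id")
        if role_id:
            tasks_by_role.setdefault(role_id, []).append(task)
    return tasks_by_role


tasks = [
    {"task_id": "T001", "role_id": "R001"},
    {"task_id": "T002"},                   # key absent
    {"task_id": "T003", "role_id": None},  # explicit null
    {"task_id": "T004", "role_id": ""},    # empty string is falsy too
]
snapshot = copy.deepcopy(tasks)

tasks_by_role = build_tasks_by_role(tasks)
skipped = [task["task_id"] for task in tasks if not task.get("role_id")]

print(sorted(tasks_by_role))   # only roles that are explicitly present
print(skipped)                 # records left out of the map
print(tasks == snapshot)       # input list is unchanged by the builder
```

Note that the grouped lists hold references to the original dictionaries rather than copies, so a later in-place change to a record would be visible through both the raw list and the map.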

---

## Why This Design Matters for Trust and ROI

This data layer ensures that:

* All downstream analytics are grounded in verified inputs
* Errors are caught early and explained clearly
* Performance remains stable as data grows
* Changes to data do not require changes to orchestration logic

For leaders, this means:

* Fewer surprises
* Easier audits
* Higher confidence in outputs

For developers, it means:

* Clean boundaries
* Low coupling
* Easy extensibility

---

## Architectural Takeaway

This module quietly does something very important:

It **turns raw files into structured organizational knowledge** that the agent can reason over safely.

That’s the difference between a demo agent and a system designed to support real workforce decisions.



# Data loading utilities for Workforce Development Orchestrator

In [None]:
"""Data loading utilities for Workforce Development Orchestrator

Following the pattern: Utilities implement, nodes orchestrate.
These utilities are independently testable.
"""

import json
from pathlib import Path
from typing import Dict, List, Any


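def load_json_file(file_path: Path) -> List[Dict[str, Any]]:
    """Load JSON data from a file, always returning a list of records."""
    try:
        # Explicit encoding avoids platform-dependent defaults.
        with open(file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
        # Wrap a single top-level object so callers can always iterate.
        return data if isinstance(data, list) else [data]
    except FileNotFoundError as e:
        # Chain the original exception so the traceback keeps its cause.
        raise FileNotFoundError(f"Data file not found: {file_path}") from e
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid JSON in {file_path}: {e}") from e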


def load_employees(data_dir: Path, filename: str = "employees.json") -> List[Dict[str, Any]]:
    """Load employees data"""
    file_path = data_dir / filename
    return load_json_file(file_path)


def load_roles(data_dir: Path, filename: str = "roles.json") -> List[Dict[str, Any]]:
    """Load roles data"""
    file_path = data_dir / filename
    return load_json_file(file_path)


def load_tasks(data_dir: Path, filename: str = "tasks.json") -> List[Dict[str, Any]]:
    """Load tasks data"""
    file_path = data_dir / filename
    return load_json_file(file_path)


def load_skills(data_dir: Path, filename: str = "skills.json") -> List[Dict[str, Any]]:
    """Load skills data"""
    file_path = data_dir / filename
    return load_json_file(file_path)


def load_automation_signals(data_dir: Path, filename: str = "automation_signals.json") -> List[Dict[str, Any]]:
    """Load automation signals data"""
    file_path = data_dir / filename
    return load_json_file(file_path)


def load_skill_gaps(data_dir: Path, filename: str = "skills_gaps.json") -> List[Dict[str, Any]]:
    """Load skill gaps data"""
    file_path = data_dir / filename
    return load_json_file(file_path)


def load_learning_paths(data_dir: Path, filename: str = "learning_paths.json") -> List[Dict[str, Any]]:
    """Load learning paths data"""
    file_path = data_dir / filename
    return load_json_file(file_path)


def load_role_evolution(data_dir: Path, filename: str = "role_evolution.json") -> List[Dict[str, Any]]:
    """Load role evolution data"""
    file_path = data_dir / filename
    return load_json_file(file_path)


def build_roles_lookup(roles: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Build lookup dictionary: role_id -> role dict"""
    return {role["role_id"]: role for role in roles}


def build_tasks_lookup(tasks: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Build lookup dictionary: task_id -> task dict"""
    return {task["task_id"]: task for task in tasks}


def build_skills_lookup(skills: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Build lookup dictionary: skill_id -> skill dict"""
    return {skill["skill_id"]: skill for skill in skills}


def build_employees_lookup(employees: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Build lookup dictionary: employee_id -> employee dict"""
    return {emp["employee_id"]: emp for emp in employees}


def build_tasks_by_role(tasks: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Build lookup dictionary: role_id -> list of tasks"""
    tasks_by_role: Dict[str, List[Dict[str, Any]]] = {}
    for task in tasks:
        role_id = task.get("role_id")
        # Skip tasks with a missing or empty role_id.
        if role_id:
            tasks_by_role.setdefault(role_id, []).append(task)
    return tasks_by_role


def build_employees_by_role(employees: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Build lookup dictionary: role_id -> list of employees"""
    employees_by_role: Dict[str, List[Dict[str, Any]]] = {}
    for emp in employees:
        role_id = emp.get("role_id")
        # Skip employees with a missing or empty role_id.
        if role_id:
            employees_by_role.setdefault(role_id, []).append(emp)
    return employees_by_role



# Test data loading utilities for Workforce Development Orchestrator

In [None]:
"""Test data loading utilities for Workforce Development Orchestrator

Testing Phase 2: Data Loading Utilities
Following the pattern: Test utilities before building nodes
"""

from pathlib import Path
from agents.workforce_development_orchestrator.utilities.data_loading import (
    load_employees,
    load_roles,
    load_tasks,
    load_skills,
    load_automation_signals,
    load_skill_gaps,
    load_learning_paths,
    load_role_evolution,
    build_roles_lookup,
    build_tasks_lookup,
    build_skills_lookup,
    build_employees_lookup,
    build_tasks_by_role,
    build_employees_by_role
)


def test_load_employees():
    """Test loading employees data"""
    data_dir = Path("agents/data")
    employees = load_employees(data_dir)

    assert len(employees) > 0
    assert "employee_id" in employees[0]
    assert "name" in employees[0]
    assert "role_id" in employees[0]

    print("✅ test_load_employees: PASSED")


def test_load_roles():
    """Test loading roles data"""
    data_dir = Path("agents/data")
    roles = load_roles(data_dir)

    assert len(roles) > 0
    assert "role_id" in roles[0]
    assert "role_name" in roles[0]
    assert "required_skills" in roles[0]

    print("✅ test_load_roles: PASSED")


def test_load_tasks():
    """Test loading tasks data"""
    data_dir = Path("agents/data")
    tasks = load_tasks(data_dir)

    assert len(tasks) > 0
    assert "task_id" in tasks[0]
    assert "task_name" in tasks[0]
    assert "automation_risk_score" in tasks[0]

    print("✅ test_load_tasks: PASSED")


def test_load_skills():
    """Test loading skills data"""
    data_dir = Path("agents/data")
    skills = load_skills(data_dir)

    assert len(skills) > 0
    assert "skill_id" in skills[0]
    assert "skill_name" in skills[0]
    assert "skill_type" in skills[0]

    print("✅ test_load_skills: PASSED")


def test_load_all_data_files():
    """Test loading all data files"""
    data_dir = Path("agents/data")

    employees = load_employees(data_dir)
    roles = load_roles(data_dir)
    tasks = load_tasks(data_dir)
    skills = load_skills(data_dir)
    automation_signals = load_automation_signals(data_dir)
    skill_gaps = load_skill_gaps(data_dir)
    learning_paths = load_learning_paths(data_dir)
    role_evolution = load_role_evolution(data_dir)

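    assert len(employees) == 10
    assert len(roles) == 5
    assert len(tasks) == 15
    assert len(skills) == 12

    # The remaining datasets are only checked for shape here.
    for dataset in (automation_signals, skill_gaps, learning_paths, role_evolution):
        assert isinstance(dataset, list)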

    print("✅ test_load_all_data_files: PASSED")


def test_build_roles_lookup():
    """Test building roles lookup"""
    data_dir = Path("agents/data")
    roles = load_roles(data_dir)
    lookup = build_roles_lookup(roles)

    assert "R001" in lookup
    assert lookup["R001"]["role_name"] == "HR Coordinator"

    print("✅ test_build_roles_lookup: PASSED")


def test_build_tasks_lookup():
    """Test building tasks lookup"""
    data_dir = Path("agents/data")
    tasks = load_tasks(data_dir)
    lookup = build_tasks_lookup(tasks)

    assert "T001" in lookup
    assert lookup["T001"]["task_name"] == "Maintain employee records"

    print("✅ test_build_tasks_lookup: PASSED")


def test_build_skills_lookup():
    """Test building skills lookup"""
    data_dir = Path("agents/data")
    skills = load_skills(data_dir)
    lookup = build_skills_lookup(skills)

    assert "data_entry" in lookup
    assert lookup["data_entry"]["skill_name"] == "Data Entry"

    print("✅ test_build_skills_lookup: PASSED")


def test_build_employees_lookup():
    """Test building employees lookup"""
    data_dir = Path("agents/data")
    employees = load_employees(data_dir)
    lookup = build_employees_lookup(employees)

    assert "E001" in lookup
    assert lookup["E001"]["name"] == "Sarah Chen"

    print("✅ test_build_employees_lookup: PASSED")


def test_build_tasks_by_role():
    """Test building tasks by role"""
    data_dir = Path("agents/data")
    tasks = load_tasks(data_dir)
    tasks_by_role = build_tasks_by_role(tasks)

    assert "R001" in tasks_by_role
    assert len(tasks_by_role["R001"]) == 3  # R001 has 3 tasks

    print("✅ test_build_tasks_by_role: PASSED")


def test_build_employees_by_role():
    """Test building employees by role"""
    data_dir = Path("agents/data")
    employees = load_employees(data_dir)
    employees_by_role = build_employees_by_role(employees)

    assert "R001" in employees_by_role
    assert len(employees_by_role["R001"]) == 2  # R001 has 2 employees

    print("✅ test_build_employees_by_role: PASSED")


if __name__ == "__main__":
    print("=" * 60)
    print("Testing Data Loading Utilities (Phase 2)")
    print("=" * 60)
    print()

    test_load_employees()
    test_load_roles()
    test_load_tasks()
    test_load_skills()
    test_load_all_data_files()
    test_build_roles_lookup()
    test_build_tasks_lookup()
    test_build_skills_lookup()
    test_build_employees_lookup()
    test_build_tasks_by_role()
    test_build_employees_by_role()

    print()
    print("=" * 60)
    print("✅ All data loading utility tests passed!")
    print("=" * 60)



In [None]:
(.venv) micahshull@Micahs-iMac AI_AGENTS_008_Workforce_Development_Orchestrator % python3 test_data_loading_utilities.py
============================================================
Testing Data Loading Utilities (Phase 2)
============================================================

✅ test_load_employees: PASSED
✅ test_load_roles: PASSED
✅ test_load_tasks: PASSED
✅ test_load_skills: PASSED
✅ test_load_all_data_files: PASSED
✅ test_build_roles_lookup: PASSED
✅ test_build_tasks_lookup: PASSED
✅ test_build_skills_lookup: PASSED
✅ test_build_employees_lookup: PASSED
✅ test_build_tasks_by_role: PASSED
✅ test_build_employees_by_role: PASSED

============================================================
✅ All data loading utility tests passed!
============================================================


# Workforce Development Orchestrator — Data Loading Node Code

In [None]:
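from pathlib import Path
from typing import Any, Dict

from agents.workforce_development_orchestrator.utilities.data_loading import (
    load_employees,
    load_roles,
    load_tasks,
    load_skills,
    load_automation_signals,
    load_skill_gaps,
    load_learning_paths,
    load_role_evolution,
    build_roles_lookup,
    build_tasks_lookup,
    build_skills_lookup,
    build_employees_lookup,
    build_tasks_by_role,
    build_employees_by_role,
)

# WorkforceDevelopmentOrchestratorState and WorkforceDevelopmentOrchestratorConfig
# are defined elsewhere in the project and must be imported from there.


def data_loading_node(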
    state: WorkforceDevelopmentOrchestratorState,
    config: WorkforceDevelopmentOrchestratorConfig
) -> Dict[str, Any]:
    """
    Data Loading Node: Orchestrate loading all workforce data.

    Loads employees, roles, tasks, skills, and related data from JSON files,
    then builds lookup dictionaries for fast access.
    """
    errors = state.get("errors", [])
    data_dir = Path(config.data_dir)

    if not data_dir.exists():
        return {
            "errors": errors + [f"data_loading_node: Data directory not found: {data_dir}"]
        }

    try:
        # Load all data files
        employees = load_employees(data_dir, config.employees_file)
        roles = load_roles(data_dir, config.roles_file)
        tasks = load_tasks(data_dir, config.tasks_file)
        skills = load_skills(data_dir, config.skills_file)
        automation_signals = load_automation_signals(data_dir, config.automation_signals_file)
        skill_gaps = load_skill_gaps(data_dir, config.skill_gaps_file)
        learning_paths = load_learning_paths(data_dir, config.learning_paths_file)
        role_evolution = load_role_evolution(data_dir, config.role_evolution_file)

        # Build lookup dictionaries for fast access
        roles_lookup = build_roles_lookup(roles)
        tasks_lookup = build_tasks_lookup(tasks)
        skills_lookup = build_skills_lookup(skills)
        employees_lookup = build_employees_lookup(employees)
        tasks_by_role = build_tasks_by_role(tasks)
        employees_by_role = build_employees_by_role(employees)

        return {
            "employees": employees,
            "roles": roles,
            "tasks": tasks,
            "skills": skills,
            "automation_signals": automation_signals,
            "skill_gaps": skill_gaps,
            "learning_paths": learning_paths,
            "role_evolution": role_evolution,
            "roles_lookup": roles_lookup,
            "tasks_lookup": tasks_lookup,
            "skills_lookup": skills_lookup,
            "employees_lookup": employees_lookup,
            "tasks_by_role": tasks_by_role,
            "employees_by_role": employees_by_role,
            "errors": errors
        }
    except (FileNotFoundError, ValueError) as e:
        # Known ingestion failures already carry a descriptive message.
        return {
            "errors": errors + [f"data_loading_node: {e}"]
        }
    except Exception as e:
        # Include the exception type: a bare KeyError message is only the key name.
        return {
            "errors": errors + [f"data_loading_node: Unexpected error: {type(e).__name__}: {e}"]
        }



# Workforce Development Orchestrator — Data Loading Node

The `data_loading_node` is the **foundation node** of the Workforce Development Orchestrator. Its responsibility is simple but critical:

> **Safely transform raw workforce data into a structured, reliable state the agent can trust.**

Every downstream analysis — automation risk, skill gaps, learning paths, role evolution — depends on this step being correct, complete, and auditable.

---

## 1. Why This Node Exists

Rather than allowing each analytical component to load data independently, this node centralizes **all data ingestion** into a single, controlled step.

This design:

* Eliminates duplication
* Prevents inconsistent views of data
* Creates a single source of truth for the entire run

If something goes wrong here, the failure is recorded in `errors` right away, before any misleading insights are generated.

---

## 2. Configuration-Driven, Not Hard-Coded

The node relies entirely on the `WorkforceDevelopmentOrchestratorConfig` for file locations and names.

This ensures:

* Environments can change without code changes
* Data sources can be swapped safely
* Executives can understand *where the data comes from*

This is a small but powerful step toward operational trust.
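
The real `WorkforceDevelopmentOrchestratorConfig` is not shown in this notebook, so the following is only a sketch of what such a config could look like. The attribute names are the ones the node reads (`data_dir`, `employees_file`, and so on) and the defaults reuse the loaders' default filenames; the class name `ExampleOrchestratorConfig`, the dataclass form, the `resolved_paths` helper, and the staging values are assumptions.

```python
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Dict


@dataclass(frozen=True)
class ExampleOrchestratorConfig:
    """Hypothetical stand-in for the project's real config class."""
    data_dir: str = "agents/data"
    employees_file: str = "employees.json"
    roles_file: str = "roles.json"
    tasks_file: str = "tasks.json"
    skills_file: str = "skills.json"
    automation_signals_file: str = "automation_signals.json"
    skill_gaps_file: str = "skills_gaps.json"
    learning_paths_file: str = "learning_paths.json"
    role_evolution_file: str = "role_evolution.json"

    def resolved_paths(self) -> Dict[str, Path]:
        """Return every dataset path, so dependencies can be listed or audited."""
        base = Path(self.data_dir)
        return {
            f.name: base / getattr(self, f.name)
            for f in fields(self)
            if f.name.endswith("_file")
        }


default_config = ExampleOrchestratorConfig()

# Switching environments means constructing a different config, not editing the node.
staging_config = ExampleOrchestratorConfig(
    data_dir="/tmp/staging_data",          # invented location
    employees_file="employees_2025.json",  # invented filename
)

for name, path in staging_config.resolved_paths().items():
    print(f"{name}: {path}")
```

Making the config immutable (`frozen=True`) is a design choice in this sketch: it prevents a node from changing file locations partway through a run.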

---

## 3. Explicit Data Coverage

The node loads **every dataset required by the agent** in one place:

* Employees
* Roles
* Tasks
* Skills
* Automation signals
* Skill gaps
* Learning paths
* Role evolution scenarios

Nothing is implicit. Nothing is assumed.

This makes it immediately clear:

* What data the agent depends on
* What would break if a dataset were missing
* What needs to be updated as the organization evolves

---

## 4. Performance and Determinism Through Lookups

After loading raw data, the node builds **lookup and relationship maps**:

* Direct ID lookups (roles, tasks, skills, employees)
* Role-to-task mappings
* Role-to-employee mappings

These structures are deliberately created *once* and reused everywhere else.

From a business standpoint, this ensures:

* Predictable execution time
* No hidden computation costs
* No inconsistent joins during analysis
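
As an illustration of how a downstream step might reuse these maps, here is a hedged sketch of a per-role summary. `summarize_role` is a hypothetical function, the sample values are invented, and the 0 to 1 scale for `automation_risk_score` is an assumption.

```python
from typing import Any, Dict, List, Optional

# Maps shaped like the node's output; the contents are made up for this example.
tasks_by_role: Dict[str, List[Dict[str, Any]]] = {
    "R001": [
        {"task_id": "T001", "automation_risk_score": 0.75},
        {"task_id": "T002", "automation_risk_score": 0.25},
    ],
    "R002": [{"task_id": "T003", "automation_risk_score": 0.5}],
}
employees_by_role: Dict[str, List[Dict[str, Any]]] = {
    "R001": [{"employee_id": "E001"}, {"employee_id": "E002"}],
}


def summarize_role(
    role_id: str,
    tasks_by_role: Dict[str, List[Dict[str, Any]]],
    employees_by_role: Dict[str, List[Dict[str, Any]]],
) -> Dict[str, Any]:
    """Combine both maps for one role without rescanning the raw lists."""
    tasks = tasks_by_role.get(role_id, [])
    scores = [task["automation_risk_score"] for task in tasks]
    mean_risk: Optional[float] = sum(scores) / len(scores) if scores else None
    return {
        "role_id": role_id,
        "headcount": len(employees_by_role.get(role_id, [])),
        "task_count": len(tasks),
        "mean_risk": mean_risk,
    }


summary_r001 = summarize_role("R001", tasks_by_role, employees_by_role)
summary_r002 = summarize_role("R002", tasks_by_role, employees_by_role)
summary_unknown = summarize_role("R999", tasks_by_role, employees_by_role)
print(summary_r001)
```

Because every analysis reads from the same prebuilt maps, two nodes asking about the same role see the same tasks and the same employees.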

---

## 5. Defensive Error Handling

The node treats errors as first-class outputs.

### Key behaviors:

* Missing data directories are detected immediately
* File and parsing errors are captured and contextualized
* Unexpected failures are surfaced without crashing the agent

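Importantly, errors are **accumulated**, not overwritten: each failure is appended to the `errors` list already in state. Loading itself is all-or-nothing, because the first failure returns only the updated `errors` list and no data keys. This allows:

* Downstream nodes to inspect `errors` and decide whether to continue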
* Clear diagnostics at the end of a run
* Easier debugging and post-mortems

This is critical for long-running or enterprise-facing agents.
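
The accumulation pattern can be sketched in isolation. `failing_node` imitates the node's missing-directory branch, while `merge_update` is a hypothetical stand-in for however the orchestration framework applies a node's returned update; the earlier `config_node` message is invented.

```python
from typing import Any, Dict


def failing_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """Imitates the node's early return when the data directory is missing."""
    errors = state.get("errors", [])
    # List concatenation builds a new list, so the incoming state is not mutated.
    return {"errors": errors + ["data_loading_node: Data directory not found: missing_dir"]}


def merge_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical merge step: keys in the update replace keys in the state."""
    return {**state, **update}


initial_state: Dict[str, Any] = {"errors": ["config_node: example earlier warning"]}
update = failing_node(initial_state)
final_state = merge_update(initial_state, update)

for message in final_state["errors"]:
    print(message)
```

The earlier message survives alongside the new one, and the update carries no data keys, which is the signal a downstream node can use to stop before analyzing anything.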

---

## 6. Thin Node, Strong Utilities

Notice what this node does *not* do:

* It does not parse JSON directly
* It does not build data structures inline
* It does not embed business logic

Instead, it delegates to utilities that are:

* Independently tested
* Reusable
* Easier to reason about

Because every utility passed its tests before this node was built, the orchestration layer stays clean and the logic beneath it is already verified.

---

## Why This Node Builds Confidence

From an executive or stakeholder perspective, this node guarantees:

* The agent reasons only over verified inputs
* Failures are visible and explainable
* Outputs are grounded in consistent data
* Changes to data are controlled and auditable

From a system design perspective, this node establishes a **stable base layer** that allows every other component to focus on *analysis*, not *data hygiene*.

---

## Architectural Takeaway

This node does something deceptively important:

It draws a clear line between **data reality** and **agent intelligence**.

Everything beyond this point is interpretation and recommendation — but only because this node ensures the underlying facts are solid.

