
# Workforce Development Orchestrator — Data Loading Node

The `data_loading_node` is the **foundation node** of the Workforce Development Orchestrator. Its responsibility is simple but critical:

> **Safely transform raw workforce data into a structured, reliable state the agent can trust.**

Every downstream analysis — automation risk, skill gaps, learning paths, role evolution — depends on this step being correct, complete, and auditable.

---

## 1. Why This Node Exists

Rather than allowing each analytical component to load data independently, this node centralizes **all data ingestion** into a single, controlled step.

This design:

* Eliminates duplication
* Prevents inconsistent views of data
* Creates a single source of truth for the entire run

If something goes wrong here, the node reports the failure through the `errors` list right away, so downstream nodes can stop before misleading insights are generated.
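
As a rough illustration of loading once and sharing the result, the sketch below uses a stand-in loader and two hypothetical analysis functions. None of these names come from the orchestrator itself; they only show the shape of the idea.

```python
from typing import Any, Dict, List, Set

load_calls = 0  # counts how often the stand-in loader runs


def load_employees_once() -> List[Dict[str, Any]]:
    """Stand-in loader (hypothetical); a real one would read from disk."""
    global load_calls
    load_calls += 1
    return [{"employee_id": "E001", "role_id": "R001"}]


def count_employees(state: Dict[str, Any]) -> int:
    """Downstream analysis reads from shared state instead of reloading."""
    return len(state["employees"])


def roles_in_use(state: Dict[str, Any]) -> Set[str]:
    """A second analysis sees exactly the same records as the first."""
    return {employee["role_id"] for employee in state["employees"]}


# Ingestion happens once; both analyses reuse the same state.
state = {"employees": load_employees_once()}
headcount = count_employees(state)
roles = roles_in_use(state)
```

Because both functions read from `state`, the loader runs a single time and the two analyses cannot disagree about what the data contains.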

---

## 2. Configuration-Driven, Not Hard-Coded

The node relies entirely on the `WorkforceDevelopmentOrchestratorConfig` for file locations and names.

This ensures:

* Environments can change without code changes
* Data sources can be swapped safely
* Executives can understand *where the data comes from*

This is a small but powerful step toward operational trust.
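
To make the idea concrete, here is a minimal sketch of a configuration object of this kind. `ExampleLoaderConfig`, its default values, and the `path_for` helper are assumptions for illustration; only the attribute names `data_dir`, `employees_file`, and `roles_file` appear in the node's code.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ExampleLoaderConfig:
    """Hypothetical stand-in for WorkforceDevelopmentOrchestratorConfig."""

    data_dir: str = "data"                  # assumed default
    employees_file: str = "employees.json"  # assumed default
    roles_file: str = "roles.json"          # assumed default

    def path_for(self, filename: str) -> Path:
        """Resolve a configured file name against the data directory."""
        return Path(self.data_dir) / filename


# Switching environments is a config change, not a code change.
dev = ExampleLoaderConfig()
staging = ExampleLoaderConfig(data_dir="/mnt/staging/workforce")

dev_path = dev.path_for(dev.employees_file)
staging_path = staging.path_for(staging.employees_file)
```

The loading code stays identical in both cases; only the values in the config differ.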

---

## 3. Explicit Data Coverage

The node loads **every dataset required by the agent** in one place:

* Employees
* Roles
* Tasks
* Skills
* Automation signals
* Skill gaps
* Learning paths
* Role evolution scenarios

Nothing is implicit. Nothing is assumed.

This makes it immediately clear:

* What data the agent depends on
* What would break if a dataset were missing
* What needs to be updated as the organization evolves
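
One way to make this coverage checkable is a manifest that maps each dataset to the config attribute holding its file name. The sketch below is an assumption about how that could look: `REQUIRED_DATASETS` and `find_missing_datasets` are hypothetical, the file names are invented, and only the attribute names are taken from the node's code.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace

# Dataset name -> config attribute that holds its file name.
REQUIRED_DATASETS = {
    "employees": "employees_file",
    "roles": "roles_file",
    "tasks": "tasks_file",
    "skills": "skills_file",
    "automation_signals": "automation_signals_file",
    "skill_gaps": "skill_gaps_file",
    "learning_paths": "learning_paths_file",
    "role_evolution": "role_evolution_file",
}


def find_missing_datasets(data_dir, config):
    """Return the names of required datasets whose files are absent."""
    missing = []
    for name, attr in REQUIRED_DATASETS.items():
        if not (Path(data_dir) / getattr(config, attr)).is_file():
            missing.append(name)
    return missing


# Example config with invented file names such as "employees.json".
example_config = SimpleNamespace(
    **{attr: f"{name}.json" for name, attr in REQUIRED_DATASETS.items()}
)

with tempfile.TemporaryDirectory() as tmp:
    # Only two of the eight files exist in this example directory.
    for present in ("employees.json", "roles.json"):
        (Path(tmp) / present).write_text("[]", encoding="utf-8")
    missing = find_missing_datasets(tmp, example_config)
```

A check like this answers "what would break if a dataset were missing" before any loader runs.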

---

## 4. Performance and Determinism Through Lookups

After loading raw data, the node builds **lookup and relationship maps**:

* Direct ID lookups (roles, tasks, skills, employees)
* Role-to-task mappings
* Role-to-employee mappings

These structures are deliberately created *once* and reused everywhere else.

From a business standpoint, this ensures:

* Predictable execution time
* No hidden computation costs
* No inconsistent joins during analysis
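
The two kinds of structure can be sketched as follows. This is not the project's `build_*` code; `build_lookup`, `group_by`, and the record field names (`task_id`, `role_id`, `name`) are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]


def build_lookup(records: List[Record], key: str) -> Dict[str, Record]:
    """Index records by a unique ID field for constant-time access."""
    return {record[key]: record for record in records}


def group_by(records: List[Record], key: str) -> Dict[str, List[Record]]:
    """Group records that share the same value for `key`."""
    grouped: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        grouped[record[key]].append(record)
    return dict(grouped)


# Illustrative records; the field names and values are assumptions.
tasks = [
    {"task_id": "T001", "role_id": "R001", "name": "Enter invoices"},
    {"task_id": "T002", "role_id": "R001", "name": "Reconcile accounts"},
    {"task_id": "T003", "role_id": "R002", "name": "Review contracts"},
]

tasks_lookup = build_lookup(tasks, "task_id")   # direct ID lookup
tasks_by_role = group_by(tasks, "role_id")      # role-to-task mapping
```

Each structure is built in a single pass over the data, so later nodes pay only for dictionary access rather than repeated scans or joins.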

---

## 5. Defensive Error Handling

The node treats errors as first-class outputs.

### Key behaviors:

* Missing data directories are detected immediately
* File and parsing errors are captured and contextualized
* Unexpected failures are surfaced without crashing the agent

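Importantly, errors are **accumulated**, not overwritten: each failure is appended to the `errors` list already in state, prefixed with the node name. On failure the node returns only the updated `errors` list and no partially loaded datasets. This allows:

* Downstream nodes to check `errors` and decide whether to continue
* Clear diagnostics at the end of a run
* Easier debugging and post-mortems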

This is critical for long-running or enterprise-facing agents.
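
The accumulation pattern can be sketched in a few lines. `append_error` is a hypothetical helper, not part of the node, and the earlier `goal_node` error is invented for the example.

```python
from typing import Any, Dict, List


def append_error(state: Dict[str, Any], node_name: str, message: str) -> Dict[str, List[str]]:
    """Return a state update that adds one error without mutating `state`."""
    previous = state.get("errors", [])
    # List concatenation builds a new list, so the caller's list is untouched.
    return {"errors": previous + [f"{node_name}: {message}"]}


# An error from an earlier node is already present in state (invented example).
state = {"errors": ["goal_node: example earlier failure"]}
update = append_error(state, "data_loading_node", "Data directory not found: data")
```

The update carries both the earlier error and the new one, while the original `state` is left unchanged for whatever merges the update.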

---

## 6. Thin Node, Strong Utilities

Notice what this node does *not* do:

* It does not parse JSON directly
* It does not build data structures inline
* It does not embed business logic

Instead, it delegates to utilities that are:

* Independently tested
* Reusable
* Easier to reason about

All utilities passed their own tests before this node was built, so the node's tests only need to cover orchestration. This keeps orchestration clean and logic trustworthy.
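
For a sense of what such a utility might look like, here is a minimal sketch of a JSON loader that raises the two exception types the node catches. `load_json_records` and its error messages are assumptions; the project's actual `load_*` functions are not shown in this notebook.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_json_records(data_dir: Path, filename: str) -> List[Dict[str, Any]]:
    """Load a JSON file that is expected to contain a list of objects."""
    path = Path(data_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"File not found: {path}")
    try:
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
    except json.JSONDecodeError as exc:
        # Re-raise with the file path so the node's error message has context.
        raise ValueError(f"Invalid JSON in {path}: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError(f"Expected a list of records in {path}")
    return data


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "roles.json").write_text('[{"role_id": "R001"}]', encoding="utf-8")
    (tmp_dir / "tasks.json").write_text("{not valid json", encoding="utf-8")

    roles = load_json_records(tmp_dir, "roles.json")
    try:
        load_json_records(tmp_dir, "tasks.json")
        parse_error = None
    except ValueError as exc:
        parse_error = str(exc)
```

A utility shaped like this can be tested in isolation with temporary files, which is what lets the node stay a thin sequence of calls.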

---

## Why This Node Builds Confidence

From an executive or stakeholder perspective, this node guarantees:

* The agent reasons only over inputs that loaded and parsed successfully
* Failures are visible and explainable
* Outputs are grounded in consistent data
* Changes to data are controlled and auditable

From a system design perspective, this node establishes a **stable base layer** that allows every other component to focus on *analysis*, not *data hygiene*.

---

## Architectural Takeaway

This node does something deceptively important:

It draws a clear line between **data reality** and **agent intelligence**.

Everything beyond this point is interpretation and recommendation — but only because this node ensures the underlying facts are solid.



# Workforce Development Orchestrator — Data Loading Node Code

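In [None]:
from pathlib import Path
from typing import Any, Dict

from config import WorkforceDevelopmentOrchestratorState, WorkforceDevelopmentOrchestratorConfig

# The load_* and build_* helpers used below are assumed to be imported
# from the project's own utility modules.


def data_loading_node(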
    state: WorkforceDevelopmentOrchestratorState,
    config: WorkforceDevelopmentOrchestratorConfig
) -> Dict[str, Any]:
    """
    Data Loading Node: Orchestrate loading all workforce data.

    Loads employees, roles, tasks, skills, and related data from JSON files,
    then builds lookup dictionaries for fast access.
    """
    errors = state.get("errors", [])
    data_dir = Path(config.data_dir)

    # is_dir() also rejects a path that exists but is a regular file
    if not data_dir.is_dir():
        return {
            "errors": errors + [f"data_loading_node: Data directory not found: {data_dir}"]
        }

    try:
        # Load all data files
        employees = load_employees(data_dir, config.employees_file)
        roles = load_roles(data_dir, config.roles_file)
        tasks = load_tasks(data_dir, config.tasks_file)
        skills = load_skills(data_dir, config.skills_file)
        automation_signals = load_automation_signals(data_dir, config.automation_signals_file)
        skill_gaps = load_skill_gaps(data_dir, config.skill_gaps_file)
        learning_paths = load_learning_paths(data_dir, config.learning_paths_file)
        role_evolution = load_role_evolution(data_dir, config.role_evolution_file)

        # Build lookup dictionaries for fast access
        roles_lookup = build_roles_lookup(roles)
        tasks_lookup = build_tasks_lookup(tasks)
        skills_lookup = build_skills_lookup(skills)
        employees_lookup = build_employees_lookup(employees)
        tasks_by_role = build_tasks_by_role(tasks)
        employees_by_role = build_employees_by_role(employees)

        return {
            "employees": employees,
            "roles": roles,
            "tasks": tasks,
            "skills": skills,
            "automation_signals": automation_signals,
            "skill_gaps": skill_gaps,
            "learning_paths": learning_paths,
            "role_evolution": role_evolution,
            "roles_lookup": roles_lookup,
            "tasks_lookup": tasks_lookup,
            "skills_lookup": skills_lookup,
            "employees_lookup": employees_lookup,
            "tasks_by_role": tasks_by_role,
            "employees_by_role": employees_by_role,
            "errors": errors
        }
    except (FileNotFoundError, ValueError) as e:
        # Expected failures: a missing file or invalid data
        return {
            "errors": errors + [f"data_loading_node: {e}"]
        }
    except Exception as e:
        # Anything else is surfaced as an error instead of crashing the agent
        return {
            "errors": errors + [f"data_loading_node: Unexpected error: {e}"]
        }

# Test data loading node for Workforce Development Orchestrator

In [None]:
"""Test data loading node for Workforce Development Orchestrator

Testing Phase 2: Data Loading Node
Following the pattern: Test node after utilities pass
"""

from pathlib import Path
from agents.workforce_development_orchestrator.nodes import data_loading_node
from config import WorkforceDevelopmentOrchestratorState, WorkforceDevelopmentOrchestratorConfig


def test_data_loading_node():
    """Test data loading node loads all data and builds lookups"""
    state: WorkforceDevelopmentOrchestratorState = {
        "errors": []
    }
    config = WorkforceDevelopmentOrchestratorConfig()

    result = data_loading_node(state, config)

    # Check that all data is loaded
    assert "employees" in result
    assert "roles" in result
    assert "tasks" in result
    assert "skills" in result
    assert "automation_signals" in result
    assert "skill_gaps" in result
    assert "learning_paths" in result
    assert "role_evolution" in result

    # Check that lookups are built
    assert "roles_lookup" in result
    assert "tasks_lookup" in result
    assert "skills_lookup" in result
    assert "employees_lookup" in result
    assert "tasks_by_role" in result
    assert "employees_by_role" in result

    # Verify data counts
    assert len(result["employees"]) == 10
    assert len(result["roles"]) == 5
    assert len(result["tasks"]) == 15
    assert len(result["skills"]) == 12

    # Verify lookups work
    assert "R001" in result["roles_lookup"]
    assert "T001" in result["tasks_lookup"]
    assert "data_entry" in result["skills_lookup"]
    assert "E001" in result["employees_lookup"]
    assert "R001" in result["tasks_by_role"]
    assert "R001" in result["employees_by_role"]

    # Verify no errors
    assert len(result.get("errors", [])) == 0

    print("✅ test_data_loading_node: PASSED")


def test_data_loading_node_with_errors():
    """Test data loading node handles missing directory gracefully"""
    state: WorkforceDevelopmentOrchestratorState = {
        "errors": []
    }
    config = WorkforceDevelopmentOrchestratorConfig()
    config.data_dir = "nonexistent_directory"

    result = data_loading_node(state, config)

    # Should have errors
    assert "errors" in result
    assert len(result["errors"]) > 0
    assert "not found" in result["errors"][0].lower()

    print("✅ test_data_loading_node_with_errors: PASSED")


def test_data_loading_node_integration():
    """Test data loading node with goal and planning nodes"""
    state: WorkforceDevelopmentOrchestratorState = {
        "employee_id": None,
        "errors": []
    }
    config = WorkforceDevelopmentOrchestratorConfig()

    # First goal node (merge each node's update so earlier state keys are kept)
    from agents.workforce_development_orchestrator.nodes import goal_node, planning_node
    state = {**state, **goal_node(state)}
    assert "goal" in state

    # Then planning node
    state = {**state, **planning_node(state)}
    assert "plan" in state

    # Then data loading node
    state = {**state, **data_loading_node(state, config)}

    # Verify all data loaded and earlier results are still present
    assert "employees" in state
    assert "roles_lookup" in state
    assert "goal" in state and "plan" in state
    assert len(state.get("errors", [])) == 0

    print("✅ test_data_loading_node_integration: PASSED")


if __name__ == "__main__":
    print("=" * 60)
    print("Testing Data Loading Node (Phase 2)")
    print("=" * 60)
    print()

    test_data_loading_node()
    test_data_loading_node_with_errors()
    test_data_loading_node_integration()

    print()
    print("=" * 60)
    print("✅ All data loading node tests passed!")
    print("=" * 60)



# Test Results

In [None]:
$ python3 test_data_loading_node.py
============================================================
Testing Data Loading Node (Phase 2)
============================================================

✅ test_data_loading_node: PASSED
✅ test_data_loading_node_with_errors: PASSED
✅ test_data_loading_node_integration: PASSED

============================================================
✅ All data loading node tests passed!
============================================================
