<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/540_EaaS_v2_gaolPlanning_nodes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



# Goal & Planning Nodes — Why This Matters

These two nodes define the **governance layer** of the EaaS Orchestrator.

Before a single scenario is executed or a single score is computed, the system explicitly answers two questions:

1. **What are we trying to evaluate, and why?**
2. **How will we go about doing it, step by step?**

That ordering is deliberate, and it is one of the clearest differences between this system and many AI agents in production today.

---

## The Goal Node: Making Evaluation Intent Explicit

### What this node does in real-world terms

The `goal_node` turns a loosely defined idea like *“let’s test the agent”* into a **formal evaluation contract**.

It explicitly defines:

* the scope of the evaluation
* the metrics that matter
* what success looks like
* and what thresholds determine pass or fail

This means the system never evaluates “just because.”
It evaluates **against a declared objective**.
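To make the idea of a contract concrete, a goal can be checked for completeness before anything runs. The sketch below is a minimal illustration that assumes the goal shape built by `goal_node` later in this notebook. `validate_goal` and `REQUIRED_KEYS` are hypothetical names and are not part of the original code.

```python
from typing import Any, Dict, List

# Hypothetical: key names mirror the goal dict built by goal_node.
REQUIRED_KEYS = ("objective", "evaluation_type", "scope", "metrics", "success_criteria")


def validate_goal(goal: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a goal dict; an empty list means it is valid."""
    problems: List[str] = []
    for key in REQUIRED_KEYS:
        if key not in goal:
            problems.append(f"missing key: {key}")

    if not goal.get("metrics"):
        problems.append("metrics must be a non-empty list")

    # Both thresholds are fractions, so they must be numbers in [0, 1].
    criteria = goal.get("success_criteria", {})
    for name in ("pass_threshold", "target_pass_rate"):
        value = criteria.get(name)
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            problems.append(f"success_criteria.{name} must be a number in [0, 1]")
    return problems


goal = {
    "objective": "Evaluate AI agent performance using test scenarios",
    "evaluation_type": "comprehensive",
    "scope": {"scenario_id": None, "target_agent_id": None},
    "metrics": ["correctness_score", "response_time_score", "output_quality_score", "overall_score"],
    "success_criteria": {"pass_threshold": 0.80, "target_pass_rate": 0.90},
}
print(validate_goal(goal))  # []

# An out-of-range threshold and a missing target rate are both reported.
broken = {**goal, "success_criteria": {"pass_threshold": 1.5}}
print(validate_goal(broken))
```

Rejecting an incomplete goal up front means a missing threshold surfaces as an explicit error instead of as a quietly different pass/fail outcome.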

---

### Why this matters to executives and managers

Executives are often uncomfortable with AI systems because they can’t tell:

* what the system is optimizing for
* whether success criteria changed silently
* or why a result was considered acceptable

This node eliminates that ambiguity.

By encoding:

* evaluation type
* metric definitions
* success thresholds

leaders can see — in plain data — **what standards are being applied**.

That’s deeply reassuring, especially in regulated or customer-facing environments.

---

### Why this is different from most agents today

Most agents:

* implicitly define goals inside prompts
* mix execution logic with evaluation logic
* change behavior without changing declared objectives

Here, the goal is:

* structured
* inspectable
* reproducible
* versionable

That’s governance, not guesswork.
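Because the goal is plain data, *versionable* can be taken literally. One possible approach, sketched here as an assumption and not as part of the original system, is to store a content hash of the goal with every evaluation run. A silently changed threshold then shows up as a changed fingerprint. `goal_fingerprint` is a hypothetical helper.

```python
import hashlib
import json
from typing import Any, Dict


def goal_fingerprint(goal: Dict[str, Any]) -> str:
    """Return a stable SHA-256 hex digest of a JSON-serializable goal definition.

    sort_keys=True makes the digest independent of dict insertion order, so the
    fingerprint changes only when the goal's content changes.
    """
    canonical = json.dumps(goal, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


baseline = {
    "evaluation_type": "comprehensive",
    "success_criteria": {"pass_threshold": 0.80, "target_pass_rate": 0.90},
}
reordered = {  # same content, different key order
    "success_criteria": {"target_pass_rate": 0.90, "pass_threshold": 0.80},
    "evaluation_type": "comprehensive",
}
loosened = {  # pass threshold quietly lowered
    "evaluation_type": "comprehensive",
    "success_criteria": {"pass_threshold": 0.70, "target_pass_rate": 0.90},
}

print(goal_fingerprint(baseline) == goal_fingerprint(reordered))  # True
print(goal_fingerprint(baseline) == goal_fingerprint(loosened))   # False
```

`sort_keys=True` also sorts nested dicts, so key order never matters. List order (for example, the order of `metrics`) still affects the fingerprint.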

---

## The Planning Node: Turning Intent Into a Controlled Workflow

### What this node does in practical terms

The `planning_node` takes the declared goal and converts it into a **deterministic execution plan**.

This plan answers:

* what steps will run
* in what order
* with what dependencies
* and what artifacts each step produces

This is the difference between:

> “The agent ran and something happened”

and:

> “The system executed a known workflow and produced known outputs.”
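Each step declares its `dependencies`, so the ordering can be checked mechanically instead of being trusted. Below is a minimal sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It assumes plan entries shaped like those `planning_node` produces. `execution_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Any, Dict, List


def execution_order(plan: List[Dict[str, Any]]) -> List[str]:
    """Validate a plan's dependencies and return step names in a runnable order.

    Raises ValueError if a step depends on an unknown step or the plan has a cycle.
    """
    names = {step["name"] for step in plan}
    graph: Dict[str, List[str]] = {}
    for step in plan:
        unknown = set(step["dependencies"]) - names
        if unknown:
            raise ValueError(f"{step['name']} depends on unknown steps: {sorted(unknown)}")
        graph[step["name"]] = step["dependencies"]  # node -> its predecessors
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes forming the cycle.
        raise ValueError(f"plan has a dependency cycle: {exc.args[1]}") from exc


plan = [
    {"name": "data_loading", "dependencies": []},
    {"name": "evaluation_execution", "dependencies": ["data_loading"]},
    {"name": "scoring_analysis", "dependencies": ["evaluation_execution"]},
    {"name": "report_generation", "dependencies": ["scoring_analysis"]},
]
print(execution_order(plan))
# ['data_loading', 'evaluation_execution', 'scoring_analysis', 'report_generation']
```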

---

### Why leaders are relieved to see this

From a business perspective, this plan:

* reduces operational risk
* prevents hidden behavior
* supports auditability

If something goes wrong, teams can trace:

* which step ran
* what inputs it used
* what outputs it produced

This is how **enterprise systems** are designed — not experimental demos.
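Tracing which step ran, what it read, and what it produced can be done without changing the nodes themselves. The sketch below shows one hypothetical way to do it. It assumes nodes read state through `.get()` or `[]` and return a partial-update dict, as `goal_node` and `planning_node` do. `RecordingState`, `run_traced`, and the two toy steps are illustrative names and are not part of the original code.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


class RecordingState(dict):
    """A dict that remembers which keys were read via get() or []."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.read_keys: set = set()

    def __getitem__(self, key: str) -> Any:
        self.read_keys.add(key)
        return super().__getitem__(key)

    def get(self, key: str, default: Any = None) -> Any:
        self.read_keys.add(key)
        return super().get(key, default)


def run_traced(name: str, step: Callable[[State], State], state: State,
               trace: List[Dict[str, Any]]) -> State:
    """Run one step, log the keys it read and wrote, and merge its outputs into state."""
    recorder = RecordingState(state)
    outputs = step(recorder)
    trace.append({
        "step": name,
        "inputs_read": sorted(recorder.read_keys),
        "outputs_written": sorted(outputs),
    })
    return {**state, **outputs}


def load_step(state: State) -> State:  # toy stand-in for data loading
    return {"journey_scenarios": ["s1", "s2"]}


def execute_step(state: State) -> State:  # toy stand-in for evaluation execution
    scenarios = state.get("journey_scenarios", [])
    return {"executed_evaluations": [f"ran {s}" for s in scenarios]}


trace: List[Dict[str, Any]] = []
state: State = {"scenario_id": None}
state = run_traced("data_loading", load_step, state, trace)
state = run_traced("evaluation_execution", execute_step, state, trace)
for entry in trace:
    print(entry["step"], entry["inputs_read"], entry["outputs_written"])
# data_loading [] ['journey_scenarios']
# evaluation_execution ['journey_scenarios'] ['executed_evaluations']
```

Reads through other paths, such as iteration or `in` checks, are not recorded. A production tracer would also want timestamps and run identifiers.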

---

### Why this is rare in agentic systems

Most AI agents today:

* act in an event-driven or reactive way
* do not expose a formal execution plan
* rely on implicit sequencing inside code or prompts

This makes failures:

* hard to debug
* hard to explain
* hard to trust

This approach makes the workflow **explicit and reviewable**, which is exactly what stakeholders need once AI starts influencing real decisions.

---

## Rule-Based Planning: A Strategic Choice

One of the most important design decisions here is what the planner *doesn’t* do.

It deliberately avoids:

* LLM-based planning
* dynamic goal reinterpretation
* opaque reasoning chains

Instead, it relies on:

* fixed steps
* clear dependencies
* predictable outputs

This is a deliberate trade-off: a fixed plan cannot adapt on the fly, but it also cannot drift between runs.

It signals:

> “We optimize for reliability first, intelligence second.”

That’s how trust is earned.
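A rule-based planner behaves like a pure function: the same goal should always produce the same plan. The sketch below is a cheap regression check for that property, not a proof of determinism. `check_deterministic`, `fixed_plan`, and `drifting_plan` are hypothetical. `drifting_plan` simulates a planner whose output depends on hidden state.

```python
import copy
import itertools
import json
from typing import Any, Callable, Dict, List

Goal = Dict[str, Any]
Plan = List[Dict[str, Any]]

STEP_NAMES = ["data_loading", "evaluation_execution", "scoring_analysis", "report_generation"]


def fixed_plan(goal: Goal) -> Plan:
    """Toy rule-based planner: each step depends on the one before it."""
    return [
        {"step": i + 1, "name": name, "dependencies": STEP_NAMES[max(i - 1, 0):i]}
        for i, name in enumerate(STEP_NAMES)
    ]


_calls = itertools.count()


def drifting_plan(goal: Goal) -> Plan:
    """Toy planner whose output depends on hidden state (a global call counter)."""
    return fixed_plan(goal) + [{"step": 5, "name": f"extra_{next(_calls)}", "dependencies": []}]


def check_deterministic(build_plan: Callable[[Goal], Plan], goal: Goal, runs: int = 5) -> bool:
    """True if build_plan returns identical JSON for the same goal on every run
    and does not mutate the goal."""
    snapshot = copy.deepcopy(goal)
    outputs = {json.dumps(build_plan(goal), sort_keys=True) for _ in range(runs)}
    return len(outputs) == 1 and goal == snapshot


goal = {"evaluation_type": "comprehensive", "scope": {"scenario_id": None}}
print(check_deterministic(fixed_plan, goal))     # True
print(check_deterministic(drifting_plan, goal))  # False
```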

---

## How These Nodes Support ROI and Accountability

Together, these nodes ensure that:

* Every evaluation has a **declared purpose**
* Every workflow follows a **known structure**
* Every output can be tied back to a **specific step**
* Every success or failure can be explained

This directly supports:

* release gating
* regression detection
* performance reporting
* executive oversight

In short, they make AI behavior **manageable at scale**.
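Release gating is where the declared `success_criteria` pay off. The sketch below assumes one plausible reading of the two thresholds in `goal_node`. Under that reading, an evaluation passes when its `overall_score` is at least `pass_threshold`, and a release is acceptable when the share of passing evaluations reaches `target_pass_rate`. `release_gate` is hypothetical, and the scores are made up for illustration.

```python
from typing import Dict, List


def release_gate(overall_scores: List[float], pass_threshold: float = 0.80,
                 target_pass_rate: float = 0.90) -> Dict[str, object]:
    """Apply goal-style success criteria to a batch of overall scores.

    An evaluation passes if its score >= pass_threshold; the release passes if the
    fraction of passing evaluations >= target_pass_rate.
    """
    if not overall_scores:
        raise ValueError("no scores to gate on")
    passed = sum(score >= pass_threshold for score in overall_scores)
    pass_rate = passed / len(overall_scores)
    return {
        "passed": passed,
        "total": len(overall_scores),
        "pass_rate": pass_rate,
        "release_ok": pass_rate >= target_pass_rate,
    }


scores = [0.95, 0.82, 0.79, 0.91, 0.88, 0.85, 0.97, 0.83, 0.90, 0.86]
print(release_gate(scores))
# {'passed': 9, 'total': 10, 'pass_rate': 0.9, 'release_ok': True}
```

Because the thresholds come from the goal and not from the gate itself, tightening a release standard is a visible change to the goal and not a hidden edit to scoring code.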

---

## The Bigger Architectural Pattern

These nodes reinforce a broader philosophy that runs through the entire system:

> **AI should execute within boundaries that humans define — not define its own boundaries.**

The goal defines *what matters*.
The plan defines *how the evaluation will run*.
Everything else follows.
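One way to enforce human-defined boundaries at runtime is to check every step against the declared plan before it executes. This is a minimal sketch under that assumption. `PlanViolation` and `check_step_allowed` are hypothetical names, and the two-step plan is a trimmed-down version of the one `planning_node` builds.

```python
from typing import Any, Dict, List, Set


class PlanViolation(RuntimeError):
    """Raised when a step outside the declared plan, or out of order, is attempted."""


def check_step_allowed(step_name: str, plan: List[Dict[str, Any]], completed: Set[str]) -> None:
    """Refuse to run steps that are not in the plan or whose dependencies are unfinished."""
    steps = {step["name"]: step for step in plan}
    if step_name not in steps:
        raise PlanViolation(f"step '{step_name}' is not in the declared plan")
    missing = [dep for dep in steps[step_name]["dependencies"] if dep not in completed]
    if missing:
        raise PlanViolation(f"step '{step_name}' is blocked by unfinished dependencies: {missing}")


plan = [
    {"name": "data_loading", "dependencies": []},
    {"name": "evaluation_execution", "dependencies": ["data_loading"]},
]
completed: Set[str] = set()

check_step_allowed("data_loading", plan, completed)  # no dependencies, so allowed
completed.add("data_loading")
check_step_allowed("evaluation_execution", plan, completed)  # dependency finished, allowed

try:
    check_step_allowed("auto_fix_prompts", plan, completed)
except PlanViolation as exc:
    print(exc)  # step 'auto_fix_prompts' is not in the declared plan
```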

---

## Executive Takeaway

What leaders see here is not “AI automation.”

They see:

* intent before action
* structure before execution
* policy before intelligence

That’s why this design feels safe, professional, and deployable — and why it stands apart from most agent systems in production today.



"""
EaaS Orchestrator Nodes

Nodes orchestrate the evaluation workflow. Utilities handle the actual work.
"""

from typing import Dict, Any
from config import EvalAsServiceOrchestratorState


def goal_node(state: EvalAsServiceOrchestratorState) -> Dict[str, Any]:
    """
    Goal Node: Define the evaluation goal.

    Sets up the framework for evaluating AI agents by defining:
    - What we're evaluating (scenarios, agents)
    - What metrics we're tracking
    - What success looks like
    """
    scenario_id = state.get("scenario_id")
    target_agent_id = state.get("target_agent_id")
    errors = state.get("errors", [])

    # Build goal definition
    goal = {
        "objective": "Evaluate AI agent performance using test scenarios",
        # "targeted" when the scope is narrowed to a scenario or agent
        "evaluation_type": (
            "targeted"
            if scenario_id is not None or target_agent_id is not None
            else "comprehensive"
        ),
        "scope": {
            "scenario_id": scenario_id,  # None = evaluate all scenarios
            "target_agent_id": target_agent_id,  # None = evaluate all agents
        },
        "metrics": [
            "correctness_score",  # Does output match expected?
            "response_time_score",  # Is response time acceptable?
            "output_quality_score",  # Is structure/format correct?
            "overall_score"  # Weighted combination
        ],
        "success_criteria": {
            "pass_threshold": 0.80,  # Minimum score to pass (from config)
            "target_pass_rate": 0.90  # Target overall pass rate
        }
    }

    return {
        "goal": goal,
        "errors": errors
    }


def planning_node(state: EvalAsServiceOrchestratorState) -> Dict[str, Any]:
    """
    Planning Node: Create execution plan based on goal.

    Creates a step-by-step plan for the evaluation workflow.
    Rule-based, no LLM needed for MVP.
    """
    goal = state.get("goal")
    errors = state.get("errors", [])

    if not goal:
        return {
            "errors": errors + ["planning_node: goal is required"]
        }

    # Create execution plan
    plan = [
        {
            "step": 1,
            "name": "data_loading",
            "description": "Load test scenarios, specialist agents, and supporting data",
            "dependencies": [],
            "outputs": [
                "journey_scenarios",
                "specialist_agents",
                "supporting_data",
                "decision_rules"
            ]
        },
        {
            "step": 2,
            "name": "evaluation_execution",
            "description": "Execute test scenarios through target agents",
            "dependencies": ["data_loading"],
            "outputs": [
                "executed_evaluations"
            ]
        },
        {
            "step": 3,
            "name": "scoring_analysis",
            "description": "Score evaluations and analyze performance",
            "dependencies": ["evaluation_execution"],
            "outputs": [
                "evaluation_scores",
                "agent_performance_summary",
                "evaluation_summary"
            ]
        },
        {
            "step": 4,
            "name": "report_generation",
            "description": "Generate comprehensive evaluation report",
            "dependencies": ["scoring_analysis"],
            "outputs": [
                "evaluation_report",
                "report_file_path"
            ]
        }
    ]

    return {
        "plan": plan,
        "errors": errors
    }
