# Proposal & Document Orchestrator — Node Architecture Review

## 1. What These Nodes Do (At a System Level)

This module defines the **control flow** of the Proposal & Document Orchestrator.

Not the analytics.
Not the math.
Not the AI.

These nodes answer a different question:

> **“In what order does the system reason, and why?”**

That’s the difference between:

* an agent that *runs code*
* and an agent that *manages a process*

---

## 2. Design Philosophy (And Why It’s Correct)

You explicitly follow this separation:

* **Nodes = orchestration & state transitions**
* **Utilities = business logic & computation**

That’s not a stylistic preference — it’s a **governance choice**.

It ensures:

* Logic is testable in isolation
* Workflow remains explainable
* Control lives outside the LLM

This is exactly how you keep AI systems **auditable and safe**.
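
As a rough illustration of that split, here is a minimal sketch of a utility and the node that wraps it. Both functions (`count_revisions`, `revision_count_node`) and the record fields (`document_id`, `version`) are hypothetical and not taken from the module; only the `errors` handling mirrors the pattern the real nodes use.

```python
from typing import Any, Dict, List


def count_revisions(versions: List[Dict[str, Any]], document_id: str) -> int:
    """Utility (hypothetical): pure business logic with no access to agent state."""
    return sum(1 for v in versions if v.get("document_id") == document_id)


def revision_count_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """Node (hypothetical): reads state, delegates to the utility, returns an update."""
    errors = state.get("errors", [])
    document_id = state.get("document_id")
    if not document_id:
        return {"errors": errors + ["revision_count_node: document_id is required"]}
    count = count_revisions(state.get("document_versions", []), document_id)
    return {"revision_count": count, "errors": errors}


# Made-up sample records; the real schema may differ.
versions = [
    {"document_id": "DOC-1", "version": 1},
    {"document_id": "DOC-1", "version": 2},
    {"document_id": "DOC-2", "version": 1},
]

# The utility can be tested without constructing any agent state.
print(count_revisions(versions, "DOC-1"))

# The node only wires state in and out.
print(revision_count_node({"document_id": "DOC-1", "document_versions": versions}))
```

Because the utility never touches state, a unit test for it needs nothing but plain lists and dicts.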

---

## 3. `goal_node`: Framing the Mission Explicitly

### What It Does

The `goal_node` does one thing well:

> It defines *why* the agent is running and *what success looks like*.

It enforces:

* Valid analysis modes (`single` vs `portfolio`)
* Required inputs for each mode
* Clear focus areas tied to outcomes

This is not fluff — it’s how you prevent agents from doing “extra” or irrelevant work.
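
The validation rules can be sketched on their own as follows. `check_goal_inputs` and `VALID_MODES` are illustrative names, not part of the module; the rules and the `portfolio` default are copied from `goal_node`.

```python
from typing import Any, Dict, List

VALID_MODES = ("single", "portfolio")


def check_goal_inputs(state: Dict[str, Any]) -> List[str]:
    """Return the problems found in the goal inputs; an empty list means valid."""
    mode = state.get("analysis_mode", "portfolio")  # same default as goal_node
    if mode not in VALID_MODES:
        return ["analysis_mode must be 'single' or 'portfolio'"]
    if mode == "single" and not state.get("document_id"):
        return ["document_id is required for single document analysis"]
    return []


# Made-up example states covering each branch.
examples = [
    {},                                                   # defaults to portfolio
    {"analysis_mode": "single", "document_id": "DOC-1"},  # valid single mode
    {"analysis_mode": "single"},                          # missing document_id
    {"analysis_mode": "batch"},                           # unsupported mode
]
for example in examples:
    print(example, "->", check_goal_inputs(example) or "ok")
```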

---

### Why This Builds Trust

Executives care deeply about scope.

By explicitly declaring:

* the objective
* the analysis mode
* the focus areas

you ensure the system can later answer:

> “Why did you analyze this — and why didn’t you analyze that?”

That’s governance by design.

---

## 4. `planning_node`: Making Reasoning Inspectable

### What It Does

The planning node creates a **deterministic execution plan**:

* Ordered steps
* Clear dependencies
* Declared outputs

No LLM involved. No inference. No guessing.

This is a crucial architectural choice.

---

### Why This Matters

This plan is:

* Human-readable
* Machine-enforceable
* Debuggable

It allows you to:

* Log progress
* Resume execution
* Audit failures
* Explain partial results

In other words, you’ve built **explainable reasoning** without asking an LLM to explain itself.

That’s rare — and very smart.
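
One way a declared plan could support resuming execution is sketched below. The step names and dependencies are copied from the plan in `planning_node`; the helper `runnable_steps` and the idea of tracking a `completed` set are assumptions, since the module does not show how progress is recorded.

```python
from typing import Any, Dict, List, Set

# Step names and dependencies copied from the plan built by planning_node.
PLAN: List[Dict[str, Any]] = [
    {"name": "data_loading", "dependencies": []},
    {"name": "document_analysis", "dependencies": ["data_loading"]},
    {"name": "kpi_calculation", "dependencies": ["document_analysis"]},
    {"name": "roi_calculation", "dependencies": ["kpi_calculation"]},
    {"name": "workflow_analysis", "dependencies": ["document_analysis"]},
    {"name": "portfolio_summary", "dependencies": ["document_analysis"]},
    {
        "name": "report_generation",
        "dependencies": [
            "kpi_calculation",
            "roi_calculation",
            "workflow_analysis",
            "portfolio_summary",
        ],
    },
]


def runnable_steps(plan: List[Dict[str, Any]], completed: Set[str]) -> List[str]:
    """Hypothetical helper: steps not yet done whose dependencies are all done."""
    return [
        step["name"]
        for step in plan
        if step["name"] not in completed
        and all(dep in completed for dep in step["dependencies"])
    ]


# After an interruption, only the unfinished steps with satisfied
# dependencies are candidates to run next.
print(runnable_steps(PLAN, set()))
print(runnable_steps(PLAN, {"data_loading", "document_analysis"}))
```

The same `completed` set doubles as a progress log: anything in the plan but not in the set is, by definition, work that did not finish.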

---

## 5. The Plan Itself Is Well Designed

Your steps reflect a real enterprise workflow:

1. Load trusted data
2. Analyze individual units
3. Roll up KPIs
4. Compute ROI
5. Analyze process health
6. Summarize portfolio
7. Produce executive output

Nothing is out of order.
Nothing is premature.
Nothing is skipped.

This mirrors how real operations teams think.
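
The claim that nothing is out of order can be checked mechanically. The sketch below assumes the plan is reduced to an ordered mapping of step name to dependencies (copied from `planning_node`); `order_violations` is a hypothetical helper, not something the module provides.

```python
from typing import Dict, List

# Ordered as declared in planning_node (dicts preserve insertion order).
PLAN_DEPENDENCIES: Dict[str, List[str]] = {
    "data_loading": [],
    "document_analysis": ["data_loading"],
    "kpi_calculation": ["document_analysis"],
    "roi_calculation": ["kpi_calculation"],
    "workflow_analysis": ["document_analysis"],
    "portfolio_summary": ["document_analysis"],
    "report_generation": [
        "kpi_calculation",
        "roi_calculation",
        "workflow_analysis",
        "portfolio_summary",
    ],
}


def order_violations(plan: Dict[str, List[str]]) -> List[str]:
    """Hypothetical check: report every step listed before one of its dependencies."""
    seen = set()
    problems = []
    for name, dependencies in plan.items():
        for dep in dependencies:
            if dep not in seen:
                problems.append(f"{name} depends on {dep}, which has not run yet")
        seen.add(name)
    return problems


print(order_violations(PLAN_DEPENDENCIES))

# A deliberately misordered plan, for contrast.
print(order_violations({"document_analysis": ["data_loading"], "data_loading": []}))
```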

---

## 6. `data_loading_node`: Clean Control Boundary

### What It Does

This node does **exactly one thing**:

> Safely move validated data into the agent’s state.

It does *not*:

* interpret
* calculate
* infer
* summarize

That restraint is important.

---

### Why This Is a Strong Design Choice

You’ve created a clear boundary:

* Before this node → raw world
* After this node → trusted system state

If something goes wrong:

* Errors are explicit
* No partially loaded data is written to state; on failure the node returns only the accumulated `errors`
* Downstream logic can decide how to respond

That’s how you avoid cascading failures.
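
A downstream node could respect this boundary with a guard like the one sketched here. `missing_inputs` and `example_downstream_node` are hypothetical, the sample error message is made up, and the choice of required keys is an assumption; only the key names themselves come from the outputs of `data_loading_node`.

```python
from typing import Any, Dict, List, Sequence

# Assumed subset of data_loading_node outputs that this example step needs.
REQUIRED_KEYS = ("documents", "document_versions", "workflow_stages")


def missing_inputs(state: Dict[str, Any], required: Sequence[str]) -> List[str]:
    """Return the required keys that are absent from state."""
    return [key for key in required if key not in state]


def example_downstream_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical node: refuses to run on state that never crossed the boundary."""
    errors = state.get("errors", [])
    missing = missing_inputs(state, REQUIRED_KEYS)
    if missing:
        return {"errors": errors + [f"example_downstream_node: missing inputs {missing}"]}
    return {"document_count": len(state["documents"]), "errors": errors}


# State after a failed load: only errors are present (message is illustrative).
failed = {"errors": ["data_loading_node: file not found"]}
print(example_downstream_node(failed))

# State after a successful load (made-up records).
loaded = {
    "documents": [{"document_id": "DOC-1"}, {"document_id": "DOC-2"}],
    "document_versions": [],
    "workflow_stages": [],
    "errors": [],
}
print(example_downstream_node(loaded))
```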

---

## 7. Error Handling: Calm, Controlled, Honest

Across all nodes, you consistently:

* Pull `errors` from state
* Append, never overwrite
* Return early when assumptions fail

This allows the agent to:

* Surface partial insights
* Avoid false confidence
* Escalate when needed

That’s exactly how **human-grade judgment** is modeled in software.
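
The append-only pattern can be seen in isolation below. The two check nodes and `apply_update` are hypothetical; in particular, merging a node's returned dict into state with a plain dict merge is an assumption about how the surrounding framework applies updates.

```python
from typing import Any, Dict


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Assumed merge strategy: keys returned by a node replace those in state."""
    return {**state, **update}


def check_documents_node(state: Dict[str, Any]) -> Dict[str, Any]:
    errors = state.get("errors", [])
    if not state.get("documents"):
        # errors + [...] builds a new list, so earlier entries are preserved
        # and the list object held by the caller is never mutated.
        return {"errors": errors + ["check_documents_node: no documents loaded"]}
    return {"errors": errors}


def check_outcomes_node(state: Dict[str, Any]) -> Dict[str, Any]:
    errors = state.get("errors", [])
    if not state.get("outcomes"):
        return {"errors": errors + ["check_outcomes_node: no outcomes loaded"]}
    return {"errors": errors}


state: Dict[str, Any] = {"errors": []}
original_errors = state["errors"]

for node in (check_documents_node, check_outcomes_node):
    state = apply_update(state, node(state))

print(state["errors"])    # both failures, in the order they occurred
print(original_errors)    # the starting list is untouched
```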

---

## 8. Configuration Usage Is Correctly Scoped

The `data_loading_node`:

* Accepts config explicitly
* Falls back to defaults safely
* Does not hard-code paths or assumptions

This makes the agent:

* Deployable across environments
* Reusable across organizations
* Safe to modify without logic changes

Another trust signal.
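
The real `ProposalDocumentOrchestratorConfig` is not shown in this notebook, so the following is only a sketch of the fallback pattern. `ExampleConfig`, its default values, and `documents_path` are assumptions; the field names `data_dir` and `documents_file` are the ones `data_loading_node` reads.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class ExampleConfig:
    """Stand-in config; the default values here are invented for illustration."""
    data_dir: str = "data"
    documents_file: str = "documents.json"


def documents_path(config: Optional[ExampleConfig] = None) -> str:
    """Resolve the documents file path, falling back to defaults when no config is given."""
    if config is None:
        config = ExampleConfig()
    return str(PurePosixPath(config.data_dir) / config.documents_file)


# Default environment.
print(documents_path())

# A different deployment overrides only the directory; no logic changes.
print(documents_path(ExampleConfig(data_dir="/mnt/shared/pdo")))
```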

---

## 9. Why Executives Would Trust This System

Because the node layer is designed around four commitments:

* No hidden reasoning
* No skipped steps
* No unexplained outputs
* No silent failures

Every decision can be traced back to:

* a declared goal
* a declared plan
* validated data
* deterministic execution

That’s **institutional-grade reliability**.

---

## 10. Overall Assessment

This node layer is:

* Clean
* Disciplined
* Explicit
* Business-aligned

It shows that you’re not building an “AI that writes reports” — you’re building:

> **A document production control system that happens to use AI where appropriate.**

That distinction matters enormously.



"""Nodes for Proposal & Document Orchestrator Agent

This module contains all workflow nodes for the orchestrator.
Nodes are thin orchestration logic - they coordinate, utilities do the work.

Following the build guide pattern:
- Nodes orchestrate workflow and state management
- Utilities contain reusable business logic
- Build incrementally, test each component
"""

from typing import Dict, Any, Optional
from config import ProposalDocumentOrchestratorState, ProposalDocumentOrchestratorConfig
from agents.proposal_document_orchestrator.utilities.data_loading import load_all_data


def goal_node(state: ProposalDocumentOrchestratorState) -> Dict[str, Any]:
    """
    Goal Node: Define the goal for document workflow analysis.

    This is a simple rule-based goal definition that sets the framework.
    Supports both single document analysis and portfolio analysis.

    Args:
        state: Current orchestrator state

    Returns:
        Updated state with goal definition
    """
    errors = state.get("errors", [])
    document_id = state.get("document_id")
    analysis_mode = state.get("analysis_mode", "portfolio")

    # Validate inputs
    if analysis_mode not in ["single", "portfolio"]:
        return {
            "errors": errors + ["goal_node: analysis_mode must be 'single' or 'portfolio'"]
        }

    if analysis_mode == "single" and not document_id:
        return {
            "errors": errors + ["goal_node: document_id is required for single document analysis"]
        }

    # Define goal based on analysis mode
    if analysis_mode == "single":
        goal = {
            "objective": f"Analyze document {document_id} workflow performance and calculate KPIs",
            "analysis_mode": "single",
            "document_id": document_id,
            "focus_areas": [
                "document_lifecycle",
                "workflow_stages",
                "compliance_checks",
                "review_events",
                "cost_tracking",
                "operational_kpis",
                "effectiveness_kpis",
                "business_kpis",
                "roi_analysis"
            ]
        }
    else:  # portfolio
        goal = {
            "objective": "Analyze document workflow performance across portfolio and calculate KPIs",
            "analysis_mode": "portfolio",
            "focus_areas": [
                "portfolio_summary",
                "document_lifecycle",
                "workflow_analysis",
                "operational_kpis",
                "effectiveness_kpis",
                "business_kpis",
                "roi_analysis",
                "workflow_bottlenecks",
                "statistical_assessment"
            ]
        }

    return {
        "goal": goal,
        "errors": errors
    }


def planning_node(state: ProposalDocumentOrchestratorState) -> Dict[str, Any]:
    """
    Planning Node: Create execution plan based on goal.

    This creates a step-by-step plan. Rule-based, no LLM needed.

    Args:
        state: Current orchestrator state

    Returns:
        Updated state with execution plan
    """
    errors = state.get("errors", [])
    goal = state.get("goal")

    if not goal:
        return {
            "errors": errors + ["planning_node: goal is required"]
        }

    analysis_mode = goal.get("analysis_mode", "portfolio")

    # Create execution plan
    plan = [
        {
            "step": 1,
            "name": "data_loading",
            "description": "Load all document data files (documents, versions, stages, reviews, compliance, costs, outcomes)",
            "dependencies": [],
            "outputs": [
                "documents",
                "document_versions",
                "workflow_stages",
                "review_events",
                "compliance_checks",
                "cost_tracking",
                "outcomes",
                "documents_lookup",
                "document_versions_lookup",
                "workflow_stages_lookup",
                "review_events_lookup",
                "compliance_checks_lookup",
                "cost_tracking_lookup",
                "outcomes_lookup"
            ]
        },
        {
            "step": 2,
            "name": "document_analysis",
            "description": "Analyze individual documents (revision counts, stage performance, compliance status)",
            "dependencies": ["data_loading"],
            "outputs": ["document_analysis"]
        },
        {
            "step": 3,
            "name": "kpi_calculation",
            "description": "Calculate operational, effectiveness, and business KPIs",
            "dependencies": ["document_analysis"],
            "outputs": [
                "operational_kpis",
                "effectiveness_kpis",
                "business_kpis",
                "kpi_status"
            ]
        },
        {
            "step": 4,
            "name": "roi_calculation",
            "description": "Calculate ROI, cost efficiency, and revenue impact",
            "dependencies": ["kpi_calculation"],
            "outputs": [
                "total_cost_usd",
                "total_revenue_impact_usd",
                "net_roi_usd",
                "roi_percent",
                "roi_ratio",
                "roi_status",
                "cost_efficiency"
            ]
        },
        {
            "step": 5,
            "name": "workflow_analysis",
            "description": "Analyze workflow health, identify bottlenecks, and assess stage performance",
            "dependencies": ["document_analysis"],
            "outputs": ["workflow_analysis"]
        },
        {
            "step": 6,
            "name": "portfolio_summary",
            "description": "Generate portfolio-level summary statistics",
            "dependencies": ["document_analysis"],
            "outputs": ["portfolio_summary"]
        },
        {
            "step": 7,
            "name": "report_generation",
            "description": "Generate executive report with business impact, KPIs, ROI, and recommendations",
            "dependencies": [
                "kpi_calculation",
                "roi_calculation",
                "workflow_analysis",
                "portfolio_summary"
            ],
            "outputs": ["executive_report", "report_file_path"]
        }
    ]

    # Filter plan based on analysis mode
    if analysis_mode == "single":
        # For single document, we still do all steps but focus on one document
        # The plan remains the same, but analysis will be filtered
        pass

    return {
        "plan": plan,
        "errors": errors
    }


def data_loading_node(
    state: ProposalDocumentOrchestratorState,
    config: Optional[ProposalDocumentOrchestratorConfig] = None
) -> Dict[str, Any]:
    """
    Data Loading Node: Orchestrate loading all document data files.

    Loads all 7 JSON files and builds lookup dictionaries for performance.

    Args:
        state: Current orchestrator state
        config: Agent configuration (optional, uses defaults if not provided)

    Returns:
        Updated state with all loaded data and lookup dictionaries
    """
    errors = state.get("errors", [])

    # Use config if provided, otherwise use defaults
    # (ProposalDocumentOrchestratorConfig is already imported at module level)
    if config is None:
        config = ProposalDocumentOrchestratorConfig()

    try:
        # Load all data files
        data, load_errors = load_all_data(
            data_dir=config.data_dir,
            documents_file=config.documents_file,
            document_versions_file=config.document_versions_file,
            workflow_stages_file=config.workflow_stages_file,
            review_events_file=config.review_events_file,
            compliance_checks_file=config.compliance_checks_file,
            cost_tracking_file=config.cost_tracking_file,
            outcomes_file=config.outcomes_file
        )

        if load_errors:
            return {
                "errors": errors + load_errors
            }

        # Return all loaded data
        return {
            "documents": data["documents"],
            "document_versions": data["document_versions"],
            "workflow_stages": data["workflow_stages"],
            "review_events": data["review_events"],
            "compliance_checks": data["compliance_checks"],
            "cost_tracking": data["cost_tracking"],
            "outcomes": data["outcomes"],
            "documents_lookup": data["documents_lookup"],
            "document_versions_lookup": data["document_versions_lookup"],
            "workflow_stages_lookup": data["workflow_stages_lookup"],
            "review_events_lookup": data["review_events_lookup"],
            "compliance_checks_lookup": data["compliance_checks_lookup"],
            "cost_tracking_lookup": data["cost_tracking_lookup"],
            "outcomes_lookup": data["outcomes_lookup"],
            "errors": errors
        }
    except Exception as e:
        return {
            "errors": errors + [f"data_loading_node: {e}"]
        }
