
This node is **where your agent truly becomes an orchestrator**. The explanation below treats it as a *control boundary* and *trust gate* rather than as Python mechanics, and focuses on practical intent, architecture fit, and why this design earns confidence.

---

# Data Loading Node — Explained

## What This Node Does in the System

The `data_loading_node` is the **single, authoritative entry point for all factual knowledge** used by the Experimentation Portfolio Orchestrator.

Before this node runs:

* the agent has intent (goal)
* the agent has a plan

After this node runs:

* the agent has *evidence*
* that evidence is structured, filtered, indexed, and ready for safe reasoning

In simple terms:

> This node transforms external data into trusted internal state.

That makes it one of the most critical nodes in the entire workflow.

---

## Why This Node Exists as a Separate Step

You intentionally did **not** load data inside:

* the goal node
* the planning node
* analysis logic
* decision logic

That separation is important.

It ensures:

* planning happens *before* touching data
* data ingestion is observable
* failures occur early
* downstream logic never guesses

This mirrors how serious analytics pipelines work.

---

## Step 1: Establish a Controlled Runtime Context

```python
experiment_id = state.get("experiment_id")
# Use the config passed in, or fall back to the default
if config is None:
    config = ExperimentationPortfolioOrchestratorConfig()
data_dir = config.data_dir
```

### Why this matters

* The node respects the **state-driven execution model**
* Configuration is explicit and overridable
* No global assumptions are made

This allows:

* safe reuse
* testing with different datasets
* future multi-environment support

The agent does not “just know” where data lives — it is told.
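
To make "it is told" concrete, here is a minimal sketch of an overridable config. The `SketchConfig` dataclass, its fields, and its default values are assumptions for illustration only; the real `ExperimentationPortfolioOrchestratorConfig` is not shown in this notebook and may look different.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for the orchestrator config (fields are assumed)."""
    data_dir: str = "data"
    portfolio_file: str = "portfolio.json"


# Default config: what a node could build when no config is passed in.
default_config = SketchConfig()

# Test config: same shape, different data location; the default stays untouched.
test_config = replace(default_config, data_dir="tests/fixtures")

print(default_config.data_dir)
print(test_config.data_dir)
```

Passing `test_config` into the node instead of relying on the default is what makes reuse across datasets and environments possible without editing the node itself.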

---

## Step 2: Load All Source Data (Raw, Unmodified)

```python
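portfolio = load_portfolio(data_dir, config.portfolio_file)
definitions = load_experiment_definitions(data_dir, config.definitions_file)
metrics = load_experiment_metrics(data_dir, config.metrics_file)
analysis = load_experiment_analysis(data_dir, config.analysis_file)
decisions = load_experiment_decisions(data_dir, config.decisions_file)
learnings = load_experiment_learnings(data_dir, config.learnings_file)
audit_log = load_experiment_audit_log(data_dir, config.audit_log_file)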
```

### Why this matters

You load **everything**, even if you might not need all of it.

That’s deliberate.

It ensures:

* completeness
* consistent cross-file reasoning
* accurate portfolio-level insights

Nothing is inferred. Nothing is lazily fetched.
If a file is missing, the node records a clear error in `errors` instead of returning partial data.

This is a trust-preserving choice.
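
The `load_*` helpers themselves are not shown in this notebook. As a rough sketch of what a loader with this "fail clearly" behavior might look like, the example below assumes each source file is a JSON array of records; both the file format and the `load_records` name are assumptions, not the project's actual implementation.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_records(data_dir: str, filename: str) -> List[Dict[str, Any]]:
    """Hypothetical loader: read a JSON array of records or raise clearly."""
    path = Path(data_dir) / filename
    if not path.is_file():
        # Name the missing file explicitly so the error message is actionable.
        raise FileNotFoundError(f"Required data file not found: {path}")
    with path.open(encoding="utf-8") as handle:
        records = json.load(handle)
    if not isinstance(records, list):
        # A non-list payload would break the list filtering done later.
        raise ValueError(f"Expected a JSON array in {path}")
    return records
```

A loader shaped like this never returns an empty list as a stand-in for a missing file, which is what lets the node distinguish "no records" from "no data source".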

---

## Step 3: Scope Control via Filtering

```python
if experiment_id:
    ...
```

### What this enables

This one conditional is doing a lot of architectural work:

* supports **single-experiment deep dives**
* supports **portfolio-wide analysis**
* avoids duplicate nodes or logic paths
* keeps downstream code simple

Instead of teaching every node how to filter, you **centralize scope control** here.

That’s clean orchestration.
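
The same scoping rule can be expressed as one small helper. This is a sketch only: `filter_by_experiment` is a hypothetical name, and the sample records are made up for illustration.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def filter_by_experiment(records: List[Record], experiment_id: Optional[str]) -> List[Record]:
    """Return all records when no ID is given, otherwise only matching ones."""
    if not experiment_id:
        return records
    return [r for r in records if r.get("experiment_id") == experiment_id]


# Illustrative data, not taken from the real dataset.
metrics = [
    {"experiment_id": "E001", "metric": "conversion"},
    {"experiment_id": "E002", "metric": "retention"},
]

portfolio_scope = filter_by_experiment(metrics, None)    # portfolio-wide
single_scope = filter_by_experiment(metrics, "E001")     # single experiment
unknown_scope = filter_by_experiment(metrics, "E999")    # no match
```

One consequence worth knowing: an unknown `experiment_id` produces empty lists rather than an error, so downstream nodes may want to treat an empty scope as a condition to report.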

---

## Step 4: Build Lookups (Agent Working Memory)

```python
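portfolio_lookup = build_portfolio_lookup(portfolio)
definitions_lookup = build_definitions_lookup(definitions)
metrics_lookup = build_metrics_lookup(metrics)
analysis_lookup = build_analysis_lookup(analysis)
decisions_lookup = build_decisions_lookup(decisions)
learnings_lookup = build_learnings_lookup(learnings)
audit_log_lookup = build_audit_log_lookup(audit_log)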
```

### Why this is powerful

This transforms raw lists into **fast, intention-revealing memory structures**.

Downstream nodes can now say:

* “Give me everything about experiment E001”
* “Check if analysis exists”
* “Retrieve all learnings”

without re-scanning data or risking inconsistencies.

This dramatically reduces:

* logic complexity
* bug surface area
* cognitive load

It’s one of the strongest signals that this is a production-grade agent.
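
The `build_*_lookup` functions are not shown here, so the following is only a sketch of two plausible shapes: one record per experiment, and many records per experiment. The function names and the choice of `experiment_id` as the key are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]


def build_unique_lookup(records: List[Record]) -> Dict[str, Record]:
    """One record per experiment, e.g. a definition."""
    return {r["experiment_id"]: r for r in records if "experiment_id" in r}


def build_grouped_lookup(records: List[Record]) -> Dict[str, List[Record]]:
    """Many records per experiment, e.g. metrics or audit entries."""
    grouped: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        if "experiment_id" in r:
            grouped[r["experiment_id"]].append(r)
    return dict(grouped)


# Illustrative data, not taken from the real dataset.
definitions = [{"experiment_id": "E001", "name": "Checkout test"}]
metrics = [
    {"experiment_id": "E001", "metric": "conversion"},
    {"experiment_id": "E001", "metric": "revenue"},
]

definitions_lookup = build_unique_lookup(definitions)
metrics_lookup = build_grouped_lookup(metrics)
```

With structures like these, "check if analysis exists" becomes a key membership test, and "everything about experiment E001" becomes a handful of dictionary reads instead of repeated list scans.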

---

## Step 5: Explicit, Structured State Updates

```python
return {
    "portfolio": ...,
    "experiment_definitions": ...,
    ...
}
```

### Why this matters

You are not mutating state in place.
You are **returning a clear, additive state update**.

This means:

* state transitions are traceable
* partial updates are possible
* rollback is easier
* debugging is simpler

This is exactly how safe state machines behave.
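
How the surrounding framework merges a node's return value into state is not shown in this notebook. As a simplified illustration of the idea, the sketch below applies an update with a shallow merge that leaves the previous state untouched; `apply_update` is a hypothetical helper.

```python
from typing import Any, Dict


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new state with the update's keys layered over the old ones."""
    return {**state, **update}


before = {"goal": "review portfolio", "errors": []}
update = {"portfolio": [{"experiment_id": "E001"}], "errors": []}
after = apply_update(before, update)
```

Because `before` still exists unchanged after the merge, comparing `before` and `after` shows exactly which keys this node contributed.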

---

## Error Handling: Fail Fast, Fail Loud

```python
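except FileNotFoundError as e:
    return {"errors": errors + [f"data_loading_node: {str(e)}"]}
except Exception as e:
    return {"errors": errors + [f"data_loading_node: Unexpected error - {str(e)}"]}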
```

### Why this matters

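This node guarantees:

* a missing file yields an error entry and no data keys, never a partial dataset
* errors are recorded explicitly in the `errors` list, prefixed with the node name
* no silent degradation occurs

The node catches exceptions and returns them in state rather than raising, so actually halting the workflow depends on `errors` being checked afterwards. If the node returns data at all, it returns **the full data**.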

That’s a non-negotiable requirement for trustworthy decision systems.
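
In the full listing, failures come back as entries in `errors` instead of as raised exceptions, so something after this node has to act on them. A minimal sketch of such a check follows; the function name `route_after_data_loading` and the step labels `"halt"` and `"analysis"` are hypothetical and not part of the source.

```python
from typing import Any, Dict


def route_after_data_loading(state: Dict[str, Any]) -> str:
    """Hypothetical gate: pick the next step based on recorded errors."""
    return "halt" if state.get("errors") else "analysis"


ok_state = {"portfolio": [{"experiment_id": "E001"}], "errors": []}
failed_state = {"errors": ["data_loading_node: file not found"]}

next_when_ok = route_after_data_loading(ok_state)
next_when_failed = route_after_data_loading(failed_state)
```

Keeping this decision in one small function means the "stop on missing data" policy is visible and testable on its own.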

---

## What This Node Is *Not* Doing (Intentionally)

It does **not**:

* validate experiment design
* check statistical sufficiency
* interpret results
* make decisions
* calculate ROI

That restraint is important.

This node’s job is **truth ingestion**, not judgment.

---

## Why This Node Is a Big Deal Architecturally

This node establishes:

* a clean boundary between external reality and internal reasoning
* a single place to enforce data availability
* a single place to control scope
* a single place to index knowledge

Most agents blur these responsibilities.
Yours does not.

---

## How This Supports Executive Trust

From a leadership perspective, this node guarantees:

* all decisions are based on complete data
* no experiment is partially analyzed
* errors are surfaced immediately
* scope is intentional, not accidental

That’s why this system can scale beyond demos.

---

## How This Fits the Overall Workflow

In sequence:

1. **Goal defined**
2. **Plan created**
3. **Data loaded and indexed** ← *this node*
4. **Analysis performed**
5. **Decisions generated**
6. **Insights synthesized**
7. **Reports produced**

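Everything after this node assumes the data is present, complete, and consistently scoped, and because of this design, that assumption is justified. Whether the data is *valid* is a separate question that this node deliberately leaves to later steps.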




```python
from typing import Any, Dict, Optional

# The state/config classes and the load_* / build_*_lookup helpers are
# defined elsewhere in the project and must be imported before this cell runs.


def data_loading_node(
    state: ExperimentationPortfolioOrchestratorState,
    config: Optional[ExperimentationPortfolioOrchestratorConfig] = None
) -> Dict[str, Any]:
    """
    Data Loading Node: Orchestrate loading all experiment data.

    Loads all data files and builds lookup dictionaries for fast access.
    Filters by experiment_id if provided (for single experiment analysis).

    Args:
        state: Current state
        config: Optional config (creates default if not provided)

    Returns:
        State update with the loaded data and lookups, or only "errors" on failure.
    """
    errors = state.get("errors", [])
    experiment_id = state.get("experiment_id")

    # Use provided config or create default
    if config is None:
        config = ExperimentationPortfolioOrchestratorConfig()

    data_dir = config.data_dir

    try:
        # Load all data files
        portfolio = load_portfolio(data_dir, config.portfolio_file)
        definitions = load_experiment_definitions(data_dir, config.definitions_file)
        metrics = load_experiment_metrics(data_dir, config.metrics_file)
        analysis = load_experiment_analysis(data_dir, config.analysis_file)
        decisions = load_experiment_decisions(data_dir, config.decisions_file)
        learnings = load_experiment_learnings(data_dir, config.learnings_file)
        audit_log = load_experiment_audit_log(data_dir, config.audit_log_file)

        # Filter by experiment_id if provided (for single experiment analysis)
        if experiment_id:
            portfolio = [p for p in portfolio if p.get("experiment_id") == experiment_id]
            definitions = [d for d in definitions if d.get("experiment_id") == experiment_id]
            metrics = [m for m in metrics if m.get("experiment_id") == experiment_id]
            analysis = [a for a in analysis if a.get("experiment_id") == experiment_id]
            decisions = [d for d in decisions if d.get("experiment_id") == experiment_id]
            learnings = [item for item in learnings if item.get("experiment_id") == experiment_id]
            audit_log = [a for a in audit_log if a.get("experiment_id") == experiment_id]

        # Build lookup dictionaries
        portfolio_lookup = build_portfolio_lookup(portfolio)
        definitions_lookup = build_definitions_lookup(definitions)
        metrics_lookup = build_metrics_lookup(metrics)
        analysis_lookup = build_analysis_lookup(analysis)
        decisions_lookup = build_decisions_lookup(decisions)
        learnings_lookup = build_learnings_lookup(learnings)
        audit_log_lookup = build_audit_log_lookup(audit_log)

        return {
            "portfolio": portfolio,
            "experiment_definitions": definitions,
            "experiment_metrics": metrics,
            "experiment_analysis": analysis,
            "experiment_decisions": decisions,
            "experiment_learnings": learnings,
            "experiment_audit_log": audit_log,
            "portfolio_lookup": portfolio_lookup,
            "definitions_lookup": definitions_lookup,
            "metrics_lookup": metrics_lookup,
            "analysis_lookup": analysis_lookup,
            "decisions_lookup": decisions_lookup,
            "learnings_lookup": learnings_lookup,
            "audit_log_lookup": audit_log_lookup,
            "errors": errors
        }
    except FileNotFoundError as e:
        # A missing source file: report it and return no data keys
        return {
            "errors": errors + [f"data_loading_node: {str(e)}"]
        }
    except Exception as e:
        # Any other failure is also surfaced through the errors list
        return {
            "errors": errors + [f"data_loading_node: Unexpected error - {str(e)}"]
        }
```
