# Building an Autonomous PE Due Diligence Agent with NucleusIQ

This notebook demonstrates how to build a **production-grade autonomous agent** using the NucleusIQ framework. We'll create an agent that performs complex Private Equity due diligence analyses — the kind of work that requires multi-step financial calculations, cross-referencing multiple data sources, and getting exact numbers right.

## What You'll Learn

1. **How NucleusIQ's 3 execution modes work** — DIRECT, STANDARD, AUTONOMOUS
2. **When to use Autonomous mode** — parallel execution, external validation, structured retry
3. **How to build domain-specific validation plugins** — catch errors the LLM can't catch itself
4. **Real results** — Standard vs Autonomous performance on 8 PE scenarios
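This notebook associates Autonomous mode with a Generate → Verify → Revise cycle, in which an external validator catches mistakes the model cannot catch itself. The idea can be illustrated without the framework. This is a framework-free sketch, not NucleusIQ's implementation. `generate_verify_revise`, `toy_generate`, and `toy_verify` are hypothetical names, and the "model" is a hard-coded function that makes one typical mistake. The EV inputs use DataForge's `total_equity`, `total_debt`, and `cash` from the dataset cell.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures: a generator takes the previous failure message (or None)
# and returns a numeric answer; a verifier returns None on success or an error string.
Generator = Callable[[Optional[str]], float]
Verifier = Callable[[float], Optional[str]]


def generate_verify_revise(
    generate: Generator, verify: Verifier, max_attempts: int = 3
) -> Tuple[Optional[float], List[str]]:
    """Retry with verifier feedback until an answer passes or attempts run out."""
    feedback: Optional[str] = None
    errors: List[str] = []
    for _ in range(max_attempts):
        answer = generate(feedback)   # Generate (acts as Revise once feedback is set)
        error = verify(answer)        # Verify with an independent check
        if error is None:
            return answer, errors
        errors.append(error)
        feedback = error              # the failure message drives the next attempt
    return None, errors


# DataForge: total_equity used as equity value, plus total_debt and cash.
EQUITY, DEBT, CASH = 60_000_000.0, 15_000_000.0, 25_000_000.0


def toy_generate(feedback: Optional[str]) -> float:
    # Simulated model: forgets to subtract cash until it is told about the mismatch.
    return EQUITY + DEBT if feedback is None else EQUITY + DEBT - CASH


def toy_verify(ev: float) -> Optional[str]:
    # External check: recompute EV = equity + debt - cash instead of trusting the model.
    expected = EQUITY + DEBT - CASH
    if abs(ev - expected) > 0.5:
        return f"EV {ev:,.0f} != equity + debt - cash = {expected:,.0f}"
    return None


answer, errors = generate_verify_revise(toy_generate, toy_verify)
print(answer, errors)
```

The first attempt fails verification. The error message is fed back, and the second attempt passes. A generator that never corrects itself exhausts `max_attempts` and returns `None` with every error recorded.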

## The Problem

Apex Capital PE Fund is evaluating an acquisition of DataForge. The analyst must complete 8 due-diligence analyses, each requiring 5–12 chained tool calls with real interdependencies:

| Challenge | Why it's hard |
|-----------|--------------|
| Multi-step WACC/DCF | 8+ chained calculations — any error cascades |
| LBO modeling | Debt paydown projection + IRR computation |
| Merger math | Share issuance, synergies, EPS accretion |
| Cross-company comparison | Must query and combine data from 3 companies |
| Formula selection | CAPM vs simple, Gordon Growth, amortization |
| Unit handling | Monthly vs annual, percentages vs decimals |

**Can an LLM-powered agent handle this reliably?** Let's find out.
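The "Unit handling" row is easy to underestimate. The plain-Python illustration below uses CloudVault's `monthly_churn_rate` and the `data_analytics` CAPM inputs from the dataset defined later in the notebook. The comparison itself is illustrative and is not one of the 8 scenarios.

```python
# CloudVault's monthly churn from the dataset cell.
monthly_churn = 0.025

# Monthly vs annual: retention compounds month over month, so churn does not simply add.
naive_annual_churn = 12 * monthly_churn
compounded_annual_churn = 1 - (1 - monthly_churn) ** 12

# Percentages vs decimals: CAPM with the data_analytics benchmarks (no size premium).
beta, equity_risk_premium = 1.3, 0.055
ke_decimal = 0.043 + beta * equity_risk_premium  # risk-free rate passed as 0.043
ke_percent = 4.3 + beta * equity_risk_premium    # same rate mistakenly passed as 4.3

print(f"annual churn: naive {naive_annual_churn:.1%}, compounded {compounded_annual_churn:.1%}")
print(f"cost of equity: {ke_decimal:.2%} vs {ke_percent:.2%}")
```

Compounding gives roughly 26.2% annual churn rather than 30%. Passing a percentage where a decimal is expected inflates the cost of equity by a factor of about 38.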

In [1]:
import os, sys, asyncio, time, json, re, math, logging
from typing import Any, Dict, List, Optional, Tuple
from dataclasses import dataclass, asdict

# ── Path setup ──────────────────────────────────────────────────────
for p in ["../../src/nucleusiq/core", "../../src/providers/llms/openai"]:
    abs_p = os.path.abspath(p)
    if os.path.isdir(abs_p) and abs_p not in sys.path:
        sys.path.insert(0, abs_p)

for env_path in ["../../.env", ".env"]:
    if os.path.isfile(env_path):
        with open(env_path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    k, v = line.split("=", 1)
                    # Strip surrounding quotes so KEY="value" and KEY=value behave the same
                    os.environ.setdefault(k.strip(), v.strip().strip("'\""))
        break

from nucleusiq.agents import Agent
from nucleusiq.agents.task import Task
from nucleusiq.agents.config import AgentConfig, ExecutionMode, AgentState
from nucleusiq.llms.base_llm import BaseLLM
from nucleusiq.llms.mock_llm import MockLLM
from nucleusiq.plugins.builtin.tool_call_limit import ToolCallLimitPlugin
from tools.base_tool import BaseTool

HAS_OPENAI = False
try:
    from nucleusiq_openai import BaseOpenAI
    HAS_OPENAI = True
except ImportError:
    pass

logging.basicConfig(level=logging.WARNING, format="%(name)-20s %(levelname)-7s %(message)s")
USE_OPENAI = HAS_OPENAI and bool(os.environ.get("OPENAI_API_KEY"))

print("=" * 70)
print("  Enterprise Decision Support Agent — Complex Analysis")
print("=" * 70)
print(f"  LLM:  {'OpenAI gpt-5.2-2025-12-11' if USE_OPENAI else 'MockLLM'}")
print("  Mode: Standard vs Autonomous (Generate → Verify → Revise)")
print("=" * 70)

  Enterprise Decision Support Agent — Complex Analysis
  LLM:  OpenAI gpt-5.2-2025-12-11
  Mode: Standard vs Autonomous (Generate → Verify → Revise)


## Step 1: Setup

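The cell above imports the NucleusIQ framework. It also copies any variables found in a `.env` file (such as `OPENAI_API_KEY`) into the environment, without overwriting variables that are already set. The key imports are: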
- **`Agent`** — the main agent class
- **`AgentConfig`** / **`ExecutionMode`** — configuration with `DIRECT`, `STANDARD`, `AUTONOMOUS` modes
- **`BaseTool`** — base class for custom tools
- **`BaseOpenAI`** — OpenAI LLM provider (optional; the notebook falls back to `MockLLM` if `nucleusiq_openai` is not installed or `OPENAI_API_KEY` is not set)

## Step 2: Define Domain-Specific Tools

An agent is only as good as its tools. We define 4 tools that cover the PE analyst's toolkit:

| Tool | Purpose | Example |
|------|---------|---------|
| `CompanyDataTool` | Query company financials | Revenue, EBITDA, debt, shares outstanding |
| `MarketBenchmarkTool` | Industry benchmarks | Risk-free rate, equity risk premium, beta, sector multiples |
| `FinancialCalcTool` | Financial calculations | CAPM, WACC, NPV, IRR, DCF terminal value, loan payments, LTV, EV |
| `PortfolioMathTool` | Portfolio analytics | Portfolio return and risk, Sharpe ratio, asset correlations |

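Each tool subclasses `BaseTool`, the framework's generic tool interface, and implements `initialize()`, `execute()`, and `get_spec()`. `get_spec()` returns the tool's name, description, and a JSON-Schema `parameters` object. The agent exposes these specs to the LLM and, based on the task, decides which tool to call at each step and with which arguments.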
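To make the tool contract concrete, here is a framework-free sketch of the pattern. A tool exposes a JSON-Schema spec, and the agent loop routes each model-issued call to that tool's `execute()`. A call consists of a tool name plus JSON-encoded arguments, which is how OpenAI-style function calling delivers them. `MiniTool`, `EVTool`, and `dispatch` are hypothetical stand-ins, not NucleusIQ's `BaseTool` or its internal dispatcher.

```python
import asyncio
import json
from typing import Any, Dict, List


class MiniTool:
    """Hypothetical stand-in for BaseTool: name, description, execute(), get_spec()."""

    def __init__(self, name: str, description: str):
        self.name = name
        self.description = description

    async def execute(self, **kwargs: Any) -> str:
        raise NotImplementedError

    def get_spec(self) -> Dict[str, Any]:
        raise NotImplementedError


class EVTool(MiniTool):
    def __init__(self):
        super().__init__("ev", "Enterprise Value = equity_value + total_debt - cash")

    async def execute(self, equity_value: float, total_debt: float, cash: float) -> str:
        return json.dumps({"enterprise_value": round(equity_value + total_debt - cash, 2)})

    def get_spec(self) -> Dict[str, Any]:
        # Same shape as the specs in this notebook: name, description, JSON-Schema parameters.
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": {
                    "equity_value": {"type": "number"},
                    "total_debt": {"type": "number"},
                    "cash": {"type": "number"},
                },
                "required": ["equity_value", "total_debt", "cash"],
            },
        }


async def dispatch(tools: List[MiniTool], call: Dict[str, str]) -> str:
    """Route one tool call (name + JSON-encoded arguments) to the matching tool."""
    by_name = {t.name: t for t in tools}
    tool = by_name.get(call["name"])
    if tool is None:
        return json.dumps({"error": f"Unknown tool: {call['name']}"})
    args = json.loads(call["arguments"])
    return await tool.execute(**args)


tools = [EVTool()]
specs = [t.get_spec() for t in tools]  # what the LLM would be shown
call = {"name": "ev",
        "arguments": json.dumps({"equity_value": 60e6, "total_debt": 15e6, "cash": 25e6})}
print(asyncio.run(dispatch(tools, call)))
```

Inside a Jupyter cell, use `await dispatch(tools, call)` instead of `asyncio.run(...)`, because the notebook kernel already runs an event loop.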

In [2]:
# ═══════════════════════════════════════════════════════════════════
# EMBEDDED DATASETS
# ═══════════════════════════════════════════════════════════════════

COMPANIES = {
    "CloudVault": {
        "sector": "SaaS / Cloud Storage",
        "metrics": {
            "monthly_revenue": 2_400_000,
            "annual_revenue": 28_800_000,
            "revenue_growth_rate": 0.35,
            "gross_margin": 0.78,
            "operating_margin": 0.12,
            "ebitda_margin": 0.18,
            "ebitda": 5_184_000,
            "net_income": 2_016_000,
            "total_debt": 8_000_000,
            "cash": 12_000_000,
            "total_equity": 45_000_000,
            "shares_outstanding": 10_000_000,
            "monthly_churn_rate": 0.025,
            "arpu_monthly": 120,
            "cac": 850,
            "expansion_revenue_rate": 0.03,
            "customers": 20_000,
            "employees": 180,
            "capex_annual": 3_200_000,
            "depreciation": 1_728_000,
            "tax_rate": 0.21,
            "interest_expense": 480_000,
        },
        "projections": {
            "year1_growth": 0.35,
            "year2_growth": 0.28,
            "year3_growth": 0.22,
            "year4_growth": 0.18,
            "year5_growth": 0.15,
            "terminal_growth": 0.03,
            "target_operating_margin": 0.25,
        },
    },
    "DataForge": {
        "sector": "Data Analytics Platform",
        "metrics": {
            "monthly_revenue": 1_800_000,
            "annual_revenue": 21_600_000,
            "revenue_growth_rate": 0.42,
            "gross_margin": 0.82,
            "operating_margin": -0.05,
            "ebitda_margin": 0.02,
            "ebitda": 432_000,
            "net_income": -1_080_000,
            "total_debt": 15_000_000,
            "cash": 25_000_000,
            "total_equity": 60_000_000,
            "shares_outstanding": 12_000_000,
            "monthly_churn_rate": 0.018,
            "arpu_monthly": 200,
            "cac": 1_400,
            "expansion_revenue_rate": 0.05,
            "customers": 9_000,
            "employees": 220,
            "capex_annual": 4_500_000,
            "depreciation": 1_512_000,
            "tax_rate": 0.21,
            "interest_expense": 900_000,
        },
        "projections": {
            "year1_growth": 0.42,
            "year2_growth": 0.35,
            "year3_growth": 0.28,
            "year4_growth": 0.22,
            "year5_growth": 0.18,
            "terminal_growth": 0.03,
            "target_operating_margin": 0.30,
        },
    },
    "SecureNet": {
        "sector": "Cybersecurity",
        "metrics": {
            "monthly_revenue": 3_500_000,
            "annual_revenue": 42_000_000,
            "revenue_growth_rate": 0.25,
            "gross_margin": 0.72,
            "operating_margin": 0.20,
            "ebitda_margin": 0.28,
            "ebitda": 11_760_000,
            "net_income": 5_544_000,
            "total_debt": 20_000_000,
            "cash": 18_000_000,
            "total_equity": 80_000_000,
            "shares_outstanding": 15_000_000,
            "monthly_churn_rate": 0.012,
            "arpu_monthly": 350,
            "cac": 2_100,
            "expansion_revenue_rate": 0.02,
            "customers": 10_000,
            "employees": 350,
            "capex_annual": 5_000_000,
            "depreciation": 3_360_000,
            "tax_rate": 0.21,
            "interest_expense": 1_200_000,
        },
        "projections": {
            "year1_growth": 0.25,
            "year2_growth": 0.22,
            "year3_growth": 0.20,
            "year4_growth": 0.18,
            "year5_growth": 0.15,
            "terminal_growth": 0.03,
            "target_operating_margin": 0.28,
        },
    },
}

MARKET_DATA = {
    "saas": {
        "median_revenue_multiple": 8.5,
        "median_ebitda_multiple": 25.0,
        "median_growth_rate": 0.25,
        "risk_free_rate": 0.043,
        "equity_risk_premium": 0.055,
        "small_cap_premium": 0.02,
        "median_beta": 1.2,
        "median_ltv_cac_ratio": 3.0,
        "median_churn_monthly": 0.02,
        "median_gross_margin": 0.75,
    },
    "cybersecurity": {
        "median_revenue_multiple": 10.0,
        "median_ebitda_multiple": 30.0,
        "median_growth_rate": 0.22,
        "risk_free_rate": 0.043,
        "equity_risk_premium": 0.055,
        "small_cap_premium": 0.015,
        "median_beta": 1.1,
        "median_ltv_cac_ratio": 4.0,
        "median_churn_monthly": 0.015,
        "median_gross_margin": 0.70,
    },
    "data_analytics": {
        "median_revenue_multiple": 9.0,
        "median_ebitda_multiple": 28.0,
        "median_growth_rate": 0.30,
        "risk_free_rate": 0.043,
        "equity_risk_premium": 0.055,
        "small_cap_premium": 0.025,
        "median_beta": 1.3,
        "median_ltv_cac_ratio": 3.5,
        "median_churn_monthly": 0.02,
        "median_gross_margin": 0.78,
    },
}

PORTFOLIO_ASSETS = {
    "US_EQUITY":  {"expected_return": 0.095, "std_dev": 0.185},
    "INTL_EQUITY": {"expected_return": 0.075, "std_dev": 0.210},
    "BONDS":       {"expected_return": 0.042, "std_dev": 0.055},
    "REAL_ESTATE": {"expected_return": 0.082, "std_dev": 0.145},
}

CORRELATION_MATRIX = {
    ("US_EQUITY", "US_EQUITY"): 1.0,
    ("US_EQUITY", "INTL_EQUITY"): 0.72,
    ("US_EQUITY", "BONDS"): -0.15,
    ("US_EQUITY", "REAL_ESTATE"): 0.45,
    ("INTL_EQUITY", "INTL_EQUITY"): 1.0,
    ("INTL_EQUITY", "BONDS"): -0.10,
    ("INTL_EQUITY", "REAL_ESTATE"): 0.38,
    ("BONDS", "BONDS"): 1.0,
    ("BONDS", "REAL_ESTATE"): 0.20,
    ("REAL_ESTATE", "REAL_ESTATE"): 1.0,
}
for (a, b), v in list(CORRELATION_MATRIX.items()):
    CORRELATION_MATRIX[(b, a)] = v

print(f"  Companies loaded: {list(COMPANIES.keys())}")
print(f"  Market sectors:   {list(MARKET_DATA.keys())}")
print(f"  Portfolio assets: {list(PORTFOLIO_ASSETS.keys())}")

  Companies loaded: ['CloudVault', 'DataForge', 'SecureNet']
  Market sectors:   ['saas', 'cybersecurity', 'data_analytics']
  Portfolio assets: ['US_EQUITY', 'INTL_EQUITY', 'BONDS', 'REAL_ESTATE']
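Validation works by checking numbers against known identities, so it helps to confirm that the embedded data satisfies the obvious ones. The minimal sketch below copies the relevant fields from `COMPANIES` so it runs on its own. The three identities and the 1% tolerance are choices made here, not part of the notebook.

```python
from typing import Dict, List

# Subset of the COMPANIES metrics above.
METRICS: Dict[str, Dict[str, float]] = {
    "CloudVault": {"monthly_revenue": 2_400_000, "annual_revenue": 28_800_000,
                   "ebitda_margin": 0.18, "ebitda": 5_184_000,
                   "customers": 20_000, "arpu_monthly": 120},
    "DataForge": {"monthly_revenue": 1_800_000, "annual_revenue": 21_600_000,
                  "ebitda_margin": 0.02, "ebitda": 432_000,
                  "customers": 9_000, "arpu_monthly": 200},
    "SecureNet": {"monthly_revenue": 3_500_000, "annual_revenue": 42_000_000,
                  "ebitda_margin": 0.28, "ebitda": 11_760_000,
                  "customers": 10_000, "arpu_monthly": 350},
}


def consistency_errors(name: str, m: Dict[str, float], tol: float = 0.01) -> List[str]:
    """Return the identities that fail by more than `tol` (relative to the stored value)."""
    checks = {
        "annual_revenue = 12 * monthly_revenue":
            (m["annual_revenue"], 12 * m["monthly_revenue"]),
        "ebitda = ebitda_margin * annual_revenue":
            (m["ebitda"], m["ebitda_margin"] * m["annual_revenue"]),
        "monthly_revenue = customers * arpu_monthly":
            (m["monthly_revenue"], m["customers"] * m["arpu_monthly"]),
    }
    return [f"{name}: {label}" for label, (actual, implied) in checks.items()
            if abs(actual - implied) > tol * abs(actual)]


for company, metrics in METRICS.items():
    print(company, consistency_errors(company, metrics) or "OK")
```

All three companies pass. Changing a single field, for example CloudVault's `ebitda`, makes the corresponding identity fail.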


In [3]:
# ═══════════════════════════════════════════════════════════════════
# TOOL 1: financial_calc  — core financial formulas
# ═══════════════════════════════════════════════════════════════════

class FinancialCalcTool(BaseTool):
    """Computes standard financial metrics from provided inputs."""

    def __init__(self):
        super().__init__(
            name="financial_calc",
            description=(
                "Financial calculator. Operations:\n"
                "  npv        — Net Present Value. Args: rate (discount rate, e.g. 0.10), cash_flows (list of floats, year 0 first)\n"
                "  irr        — Internal Rate of Return. Args: cash_flows (list of floats, year 0 first)\n"
                "  wacc       — Weighted Avg Cost of Capital. Args: equity_value, debt_value, cost_of_equity, cost_of_debt, tax_rate\n"
                "  ltv        — Customer Lifetime Value. Args: arpu_monthly, monthly_churn_rate, gross_margin, expansion_rate (optional, default 0)\n"
                "  dcf_terminal — DCF terminal value (Gordon Growth). Args: final_year_fcf, terminal_growth, discount_rate\n"
                "  loan_payment — Monthly loan payment. Args: principal, annual_rate, years\n"
                "  capm       — Cost of equity via CAPM. Args: risk_free_rate, beta, equity_risk_premium, size_premium (optional, default 0)\n"
                "  ev         — Enterprise Value. Args: equity_value, total_debt, cash\n"
                "  expression — Evaluate a math expression. Args: expr (string)\n"
            ),
        )

    async def initialize(self):
        pass

    async def execute(self, operation: str = "expression", **kwargs) -> str:
        try:
            if isinstance(operation, list):
                operation = str(operation[0]) if operation else "expression"
            op = operation.lower().strip()

            if op == "npv":
                rate = float(kwargs["rate"])
                cfs = [float(x) for x in kwargs["cash_flows"]]
                result = sum(cf / (1 + rate) ** t for t, cf in enumerate(cfs))
                return json.dumps({"npv": round(result, 2)})

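            elif op == "irr":
                cfs = [float(x) for x in kwargs["cash_flows"]]

                def npv_at(r):
                    return sum(cf / (1 + r) ** t for t, cf in enumerate(cfs))

                # Bisection on [-50%, 500%]; require a sign change so a root is bracketed
                lo, hi = -0.5, 5.0
                f_lo = npv_at(lo)
                if f_lo * npv_at(hi) > 0:
                    return json.dumps({"error": "No IRR in [-50%, 500%]; check cash-flow signs"})
                for _ in range(200):
                    mid = (lo + hi) / 2
                    f_mid = npv_at(mid)
                    if f_mid * f_lo > 0:
                        lo, f_lo = mid, f_mid  # same sign as the lower end: root lies above mid
                    else:
                        hi = mid
                return json.dumps({"irr": round((lo + hi) / 2, 6)})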

            elif op == "wacc":
                E = float(kwargs["equity_value"])
                D = float(kwargs["debt_value"])
                ke = float(kwargs["cost_of_equity"])
                kd = float(kwargs["cost_of_debt"])
                t = float(kwargs["tax_rate"])
                V = E + D
                wacc = (E / V) * ke + (D / V) * kd * (1 - t)
                return json.dumps({"wacc": round(wacc, 6)})

            elif op == "ltv":
                arpu = float(kwargs["arpu_monthly"])
                churn = float(kwargs["monthly_churn_rate"])
                gm = float(kwargs["gross_margin"])
                exp = float(kwargs.get("expansion_rate", 0))
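                # Net churn = churn - expansion, floored at 0.1%/month so LTV stays finite.
                # For all three embedded companies expansion_revenue_rate > monthly_churn_rate,
                # so the floor binds whenever both are passed as monthly rates.
                effective_churn = max(churn - exp, 0.001)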
                ltv = (arpu * gm) / effective_churn
                return json.dumps({"ltv": round(ltv, 2), "effective_monthly_churn": round(effective_churn, 6)})

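            elif op == "dcf_terminal":
                fcf = float(kwargs["final_year_fcf"])
                g = float(kwargs["terminal_growth"])
                r = float(kwargs["discount_rate"])
                # Gordon Growth is only defined when the discount rate exceeds terminal growth
                if r <= g:
                    return json.dumps({"error": "discount_rate must exceed terminal_growth"})
                tv = fcf * (1 + g) / (r - g)
                return json.dumps({"terminal_value": round(tv, 2)})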

            elif op == "loan_payment":
                P = float(kwargs["principal"])
                r_annual = float(kwargs["annual_rate"])
                years = float(kwargs["years"])
                r_m = r_annual / 12
                n = int(years * 12)
                if r_m == 0:
                    pmt = P / n
                else:
                    pmt = P * (r_m * (1 + r_m) ** n) / ((1 + r_m) ** n - 1)
                total = pmt * n
                return json.dumps({"monthly_payment": round(pmt, 2), "total_paid": round(total, 2), "total_interest": round(total - P, 2)})

            elif op == "capm":
                rf = float(kwargs["risk_free_rate"])
                beta = float(kwargs["beta"])
                erp = float(kwargs["equity_risk_premium"])
                sp = float(kwargs.get("size_premium", 0))
                ke = rf + beta * erp + sp
                return json.dumps({"cost_of_equity": round(ke, 6)})

            elif op == "ev":
                eq = float(kwargs["equity_value"])
                debt = float(kwargs["total_debt"])
                cash = float(kwargs["cash"])
                return json.dumps({"enterprise_value": round(eq + debt - cash, 2)})

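            elif op == "expression":
                # eval() with empty __builtins__ narrows, but does not sandbox, what can run;
                # for untrusted input an ast-based arithmetic evaluator is the safer choice.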
                safe_ns = {"pi": math.pi, "sqrt": math.sqrt, "abs": abs,
                           "round": round, "min": min, "max": max, "sum": sum}
                result = eval(str(kwargs["expr"]), {"__builtins__": {}}, safe_ns)
                return json.dumps({"result": round(float(result), 4)})

            else:
                return json.dumps({"error": f"Unknown operation: {operation}"})
        except Exception as e:
            return json.dumps({"error": str(e)})

    def get_spec(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": {
                    "operation": {"type": "string", "description": "One of: npv, irr, wacc, ltv, dcf_terminal, loan_payment, capm, ev, expression"},
                    "rate": {"type": "number", "description": "Discount rate for NPV"},
                    "cash_flows": {"type": "array", "items": {"type": "number"}, "description": "Cash flows starting from year 0"},
                    "equity_value": {"type": "number"}, "debt_value": {"type": "number"},
                    "cost_of_equity": {"type": "number"}, "cost_of_debt": {"type": "number"},
                    "tax_rate": {"type": "number"},
                    "arpu_monthly": {"type": "number"}, "monthly_churn_rate": {"type": "number"},
                    "gross_margin": {"type": "number"}, "expansion_rate": {"type": "number"},
                    "final_year_fcf": {"type": "number"}, "terminal_growth": {"type": "number"},
                    "discount_rate": {"type": "number"},
                    "principal": {"type": "number"}, "annual_rate": {"type": "number"},
                    "years": {"type": "number"},
                    "risk_free_rate": {"type": "number"}, "beta": {"type": "number"},
                    "equity_risk_premium": {"type": "number"}, "size_premium": {"type": "number"},
                    "total_debt": {"type": "number"}, "cash": {"type": "number"},
                    "expr": {"type": "string", "description": "Math expression to evaluate"},
                },
                "required": ["operation"],
            },
        }


# ═══════════════════════════════════════════════════════════════════
# TOOL 2: company_data  — query embedded company financials
# ═══════════════════════════════════════════════════════════════════

class CompanyDataTool(BaseTool):
    def __init__(self):
        super().__init__(
            name="company_data",
            description=(
                "Query company financial data. Args:\n"
                "  company — name: CloudVault, DataForge, or SecureNet\n"
                "  fields  — comma-separated metric names (e.g. 'annual_revenue,ebitda,monthly_churn_rate')\n"
                "           Use 'all' for everything, 'projections' for growth projections.\n"
                "Available metric fields: monthly_revenue, annual_revenue, revenue_growth_rate, "
                "gross_margin, operating_margin, ebitda_margin, ebitda, net_income, total_debt, "
                "cash, total_equity, shares_outstanding, monthly_churn_rate, arpu_monthly, cac, "
                "expansion_revenue_rate, customers, employees, capex_annual, depreciation, "
                "tax_rate, interest_expense"
            ),
        )

    async def initialize(self):
        pass

    async def execute(self, company: str, fields: str = "all") -> str:
        if isinstance(fields, list):
            fields = ",".join(str(f) for f in fields)
        if isinstance(company, list):
            company = str(company[0]) if company else ""
        company = company.strip()
        match = None
        for name in COMPANIES:
            if name.lower() == company.lower():
                match = name
                break
        if not match:
            return json.dumps({"error": f"Unknown company: {company}. Available: {list(COMPANIES.keys())}"})

        data = COMPANIES[match]
        if fields.strip().lower() == "all":
            return json.dumps({"company": match, **data["metrics"]})
        elif fields.strip().lower() == "projections":
            return json.dumps({"company": match, **data["projections"]})
        else:
            result = {"company": match}
            for f in fields.split(","):
                f = f.strip()
                if f in data["metrics"]:
                    result[f] = data["metrics"][f]
                elif f in data["projections"]:
                    result[f] = data["projections"][f]
                else:
                    result[f] = "unknown field"
            return json.dumps(result)

    def get_spec(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": {
                    "company": {"type": "string", "description": "Company name: CloudVault, DataForge, or SecureNet"},
                    "fields": {"type": "string", "description": "Comma-separated fields or 'all' or 'projections'"},
                },
                "required": ["company"],
            },
        }


# ═══════════════════════════════════════════════════════════════════
# TOOL 3: market_benchmark  — industry comparables
# ═══════════════════════════════════════════════════════════════════

class MarketBenchmarkTool(BaseTool):
    def __init__(self):
        super().__init__(
            name="market_benchmark",
            description=(
                "Get industry benchmark data. Args:\n"
                "  sector — one of: saas, cybersecurity, data_analytics\n"
                "  fields — comma-separated or 'all'. Available: median_revenue_multiple, "
                "median_ebitda_multiple, median_growth_rate, risk_free_rate, equity_risk_premium, "
                "small_cap_premium, median_beta, median_ltv_cac_ratio, median_churn_monthly, "
                "median_gross_margin"
            ),
        )

    async def initialize(self):
        pass

    async def execute(self, sector: str, fields: str = "all") -> str:
        if isinstance(fields, list):
            fields = ",".join(str(f) for f in fields)
        if isinstance(sector, list):
            sector = str(sector[0]) if sector else ""
        sector = sector.strip().lower().replace(" ", "_")
        if sector not in MARKET_DATA:
            return json.dumps({"error": f"Unknown sector: {sector}. Available: {list(MARKET_DATA.keys())}"})
        data = MARKET_DATA[sector]
        if fields.strip().lower() == "all":
            return json.dumps({"sector": sector, **data})
        result = {"sector": sector}
        for f in fields.split(","):
            f = f.strip()
            if f in data:
                result[f] = data[f]
            else:
                result[f] = "unknown field"
        return json.dumps(result)

    def get_spec(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": {
                    "sector": {"type": "string", "description": "Sector: saas, cybersecurity, or data_analytics"},
                    "fields": {"type": "string", "description": "Comma-separated fields or 'all'"},
                },
                "required": ["sector"],
            },
        }


# ═══════════════════════════════════════════════════════════════════
# TOOL 4: portfolio_math  — portfolio-level calculations
# ═══════════════════════════════════════════════════════════════════

class PortfolioMathTool(BaseTool):
    def __init__(self):
        super().__init__(
            name="portfolio_math",
            description=(
                "Portfolio calculations. Args:\n"
                "  operation — one of:\n"
                "    portfolio_return: weighted return. Args: weights (dict asset->weight)\n"
                "    portfolio_risk: portfolio std dev using correlations. Args: weights (dict asset->weight)\n"
                "    sharpe_ratio: (portfolio_return - risk_free) / portfolio_risk. Args: weights (dict), risk_free_rate\n"
                "    asset_info: get return/risk for an asset. Args: asset (name)\n"
                "    correlation: get correlation between two assets. Args: asset1, asset2\n"
                "  Available assets: US_EQUITY, INTL_EQUITY, BONDS, REAL_ESTATE"
            ),
        )

    async def initialize(self):
        pass

    async def execute(self, operation: str = "asset_info", **kwargs) -> str:
        try:
            if isinstance(operation, list):
                operation = str(operation[0]) if operation else "asset_info"
            op = operation.lower().strip()

            if op == "asset_info":
                a = kwargs["asset"].upper().strip()
                if a not in PORTFOLIO_ASSETS:
                    return json.dumps({"error": f"Unknown asset: {a}"})
                return json.dumps({"asset": a, **PORTFOLIO_ASSETS[a]})

            elif op == "correlation":
                a1 = kwargs["asset1"].upper().strip()
                a2 = kwargs["asset2"].upper().strip()
                key = (a1, a2)
                if key not in CORRELATION_MATRIX:
                    return json.dumps({"error": f"Unknown pair: {a1}, {a2}"})
                return json.dumps({"asset1": a1, "asset2": a2, "correlation": CORRELATION_MATRIX[key]})

            elif op == "portfolio_return":
                weights = kwargs["weights"]
                if isinstance(weights, str):
                    weights = json.loads(weights)
                ret = sum(
                    float(w) * PORTFOLIO_ASSETS[a.upper()]["expected_return"]
                    for a, w in weights.items()
                )
                return json.dumps({"portfolio_return": round(ret, 6)})

            elif op == "portfolio_risk":
                weights = kwargs["weights"]
                if isinstance(weights, str):
                    weights = json.loads(weights)
                assets = list(weights.keys())
                variance = 0.0
                for i, a1 in enumerate(assets):
                    for j, a2 in enumerate(assets):
                        w1 = float(weights[a1])
                        w2 = float(weights[a2])
                        s1 = PORTFOLIO_ASSETS[a1.upper()]["std_dev"]
                        s2 = PORTFOLIO_ASSETS[a2.upper()]["std_dev"]
                        corr = CORRELATION_MATRIX.get((a1.upper(), a2.upper()), 0)
                        variance += w1 * w2 * s1 * s2 * corr
                return json.dumps({"portfolio_variance": round(variance, 8), "portfolio_std_dev": round(math.sqrt(variance), 6)})

            elif op == "sharpe_ratio":
                weights = kwargs["weights"]
                if isinstance(weights, str):
                    weights = json.loads(weights)
                rf = float(kwargs.get("risk_free_rate", 0.043))
                ret = sum(float(w) * PORTFOLIO_ASSETS[a.upper()]["expected_return"] for a, w in weights.items())
                assets = list(weights.keys())
                variance = 0.0
                for a1 in assets:
                    for a2 in assets:
                        w1, w2 = float(weights[a1]), float(weights[a2])
                        s1 = PORTFOLIO_ASSETS[a1.upper()]["std_dev"]
                        s2 = PORTFOLIO_ASSETS[a2.upper()]["std_dev"]
                        corr = CORRELATION_MATRIX.get((a1.upper(), a2.upper()), 0)
                        variance += w1 * w2 * s1 * s2 * corr
                std = math.sqrt(variance)
                sharpe = (ret - rf) / std if std > 0 else 0
                return json.dumps({"sharpe_ratio": round(sharpe, 4), "return": round(ret, 6), "risk": round(std, 6)})
            else:
                return json.dumps({"error": f"Unknown operation: {operation}"})
        except Exception as e:
            return json.dumps({"error": str(e)})

    def get_spec(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": {
                    "operation": {"type": "string", "description": "portfolio_return, portfolio_risk, sharpe_ratio, asset_info, or correlation"},
                    "weights": {"type": "object", "description": "Asset weights dict, e.g. {\"US_EQUITY\": 0.4, \"BONDS\": 0.6}"},
                    "asset": {"type": "string"}, "asset1": {"type": "string"}, "asset2": {"type": "string"},
                    "risk_free_rate": {"type": "number"},
                },
                "required": ["operation"],
            },
        }


print("Tools defined: financial_calc, company_data, market_benchmark, portfolio_math")

Tools defined: financial_calc, company_data, market_benchmark, portfolio_math
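The "Multi-step WACC/DCF" challenge is a chain in which each output feeds the next call. The sketch below re-implements the `capm`, `wacc`, and `dcf_terminal` formulas from `FinancialCalcTool` as plain functions and chains them for DataForge. Three inputs are assumptions made for illustration:

- Cost of debt is taken as `interest_expense / total_debt`.
- Book `total_equity` stands in for equity value.
- The $10M final-year free cash flow is hypothetical; it is not in the dataset.

```python
from typing import Tuple


# Same formulas as the capm, wacc and dcf_terminal operations of FinancialCalcTool.
def capm(risk_free_rate: float, beta: float, equity_risk_premium: float,
         size_premium: float = 0.0) -> float:
    return risk_free_rate + beta * equity_risk_premium + size_premium


def wacc(equity_value: float, debt_value: float, cost_of_equity: float,
         cost_of_debt: float, tax_rate: float) -> float:
    v = equity_value + debt_value
    return (equity_value / v) * cost_of_equity + (debt_value / v) * cost_of_debt * (1 - tax_rate)


def gordon_terminal_value(final_year_fcf: float, terminal_growth: float,
                          discount_rate: float) -> float:
    if discount_rate <= terminal_growth:
        raise ValueError("Gordon Growth requires discount_rate > terminal_growth")
    return final_year_fcf * (1 + terminal_growth) / (discount_rate - terminal_growth)


HYPOTHETICAL_FINAL_YEAR_FCF = 10_000_000.0  # assumption: not part of the dataset


def dataforge_chain(size_premium: float) -> Tuple[float, float, float]:
    """CAPM -> WACC -> terminal value, using DataForge and data_analytics inputs."""
    ke = capm(0.043, 1.3, 0.055, size_premium)     # rf, median_beta, ERP, size premium
    kd = 900_000 / 15_000_000                       # assumption: interest_expense / total_debt
    r = wacc(60_000_000, 15_000_000, ke, kd, 0.21)  # assumption: book total_equity as E
    tv = gordon_terminal_value(HYPOTHETICAL_FINAL_YEAR_FCF, 0.03, r)
    return ke, r, tv


for label, sp in [("with size premium", 0.025), ("without size premium", 0.0)]:
    ke, r, tv = dataforge_chain(sp)
    print(f"{label:22s} ke={ke:.4f}  WACC={r:.4f}  TV=${tv / 1e6:,.1f}M")
```

Dropping only the 2.5-point small-cap premium lowers WACC by 2 points and raises the terminal value by roughly 28%. A single wrong input early in the chain therefore propagates into a large valuation error.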


In [4]:
# ═══════════════════════════════════════════════════════════════════
# TOOL 1: financial_calc  — core financial formulas
# ═══════════════════════════════════════════════════════════════════

class FinancialCalcTool(BaseTool):
    """Computes standard financial metrics from provided inputs."""

    def __init__(self):
        super().__init__(
            name="financial_calc",
            description=(
                "Financial calculator. Operations:\n"
                "  npv        — Net Present Value. Args: rate (discount rate, e.g. 0.10), cash_flows (list of floats, year 0 first)\n"
                "  irr        — Internal Rate of Return. Args: cash_flows (list of floats, year 0 first)\n"
                "  wacc       — Weighted Avg Cost of Capital. Args: equity_value, debt_value, cost_of_equity, cost_of_debt, tax_rate\n"
                "  ltv        — Customer Lifetime Value. Args: arpu_monthly, monthly_churn_rate, gross_margin, expansion_rate (optional, default 0)\n"
                "  dcf_terminal — DCF terminal value (Gordon Growth). Args: final_year_fcf, terminal_growth, discount_rate\n"
                "  loan_payment — Monthly loan payment. Args: principal, annual_rate, years\n"
                "  capm       — Cost of equity via CAPM. Args: risk_free_rate, beta, equity_risk_premium, size_premium (optional, default 0)\n"
                "  ev         — Enterprise Value. Args: equity_value, total_debt, cash\n"
                "  expression — Evaluate a math expression. Args: expr (string)\n"
            ),
        )

    async def initialize(self):
        pass

    async def execute(self, operation: str = "expression", **kwargs) -> str:
        try:
            if isinstance(operation, list):
                operation = str(operation[0]) if operation else "expression"
            op = operation.lower().strip()

            if op == "npv":
                rate = float(kwargs["rate"])
                cfs = [float(x) for x in kwargs["cash_flows"]]
                result = sum(cf / (1 + rate) ** t for t, cf in enumerate(cfs))
                return json.dumps({"npv": round(result, 2)})

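            elif op == "irr":
                cfs = [float(x) for x in kwargs["cash_flows"]]

                def npv_at(r):
                    return sum(cf / (1 + r) ** t for t, cf in enumerate(cfs))

                # Bisection on [-50%, 500%]; require a sign change so a root is bracketed
                lo, hi = -0.5, 5.0
                f_lo = npv_at(lo)
                if f_lo * npv_at(hi) > 0:
                    return json.dumps({"error": "No IRR in [-50%, 500%]; check cash-flow signs"})
                for _ in range(200):
                    mid = (lo + hi) / 2
                    f_mid = npv_at(mid)
                    if f_mid * f_lo > 0:
                        lo, f_lo = mid, f_mid  # same sign as the lower end: root lies above mid
                    else:
                        hi = mid
                return json.dumps({"irr": round((lo + hi) / 2, 6)})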

            elif op == "wacc":
                E = float(kwargs["equity_value"])
                D = float(kwargs["debt_value"])
                ke = float(kwargs["cost_of_equity"])
                kd = float(kwargs["cost_of_debt"])
                t = float(kwargs["tax_rate"])
                V = E + D
                wacc = (E / V) * ke + (D / V) * kd * (1 - t)
                return json.dumps({"wacc": round(wacc, 6)})

            elif op == "ltv":
                arpu = float(kwargs["arpu_monthly"])
                churn = float(kwargs["monthly_churn_rate"])
                gm = float(kwargs["gross_margin"])
                exp = float(kwargs.get("expansion_rate", 0))
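                # Net churn = churn - expansion, floored at 0.1%/month so LTV stays finite.
                # For all three embedded companies expansion_revenue_rate > monthly_churn_rate,
                # so the floor binds whenever both are passed as monthly rates.
                effective_churn = max(churn - exp, 0.001)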
                ltv = (arpu * gm) / effective_churn
                return json.dumps({"ltv": round(ltv, 2), "effective_monthly_churn": round(effective_churn, 6)})

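            elif op == "dcf_terminal":
                # Gordon Growth: TV = FCF_N * (1 + g) / (r - g), defined only when r > g.
                fcf = float(kwargs["final_year_fcf"])
                g = float(kwargs["terminal_growth"])
                r = float(kwargs["discount_rate"])
                if r <= g:
                    return json.dumps({"error": "discount_rate must exceed terminal_growth"})
                tv = fcf * (1 + g) / (r - g)
                return json.dumps({"terminal_value": round(tv, 2)})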

            elif op == "loan_payment":
                P = float(kwargs["principal"])
                r_annual = float(kwargs["annual_rate"])
                years = float(kwargs["years"])
                r_m = r_annual / 12
                n = int(years * 12)
                if r_m == 0:
                    pmt = P / n
                else:
                    pmt = P * (r_m * (1 + r_m) ** n) / ((1 + r_m) ** n - 1)
                total = pmt * n
                return json.dumps({"monthly_payment": round(pmt, 2), "total_paid": round(total, 2), "total_interest": round(total - P, 2)})

            elif op == "capm":
                rf = float(kwargs["risk_free_rate"])
                beta = float(kwargs["beta"])
                erp = float(kwargs["equity_risk_premium"])
                sp = float(kwargs.get("size_premium", 0))
                ke = rf + beta * erp + sp
                return json.dumps({"cost_of_equity": round(ke, 6)})

            elif op == "ev":
                eq = float(kwargs["equity_value"])
                debt = float(kwargs["total_debt"])
                cash = float(kwargs["cash"])
                return json.dumps({"enterprise_value": round(eq + debt - cash, 2)})

            elif op == "expression":
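                # Only the names below are exposed. Emptying __builtins__ blocks casual misuse,
                # but eval() is not a security sandbox, so never pass it untrusted input.
                safe_ns = {"pi": math.pi, "sqrt": math.sqrt, "abs": abs,
                           "round": round, "min": min, "max": max, "sum": sum}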
                result = eval(str(kwargs["expr"]), {"__builtins__": {}}, safe_ns)
                return json.dumps({"result": round(float(result), 4)})

            else:
                return json.dumps({"error": f"Unknown operation: {operation}"})
        except Exception as e:
            return json.dumps({"error": str(e)})

    def get_spec(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": {
                    "operation": {"type": "string", "description": "One of: npv, irr, wacc, ltv, dcf_terminal, loan_payment, capm, ev, expression"},
                    "rate": {"type": "number", "description": "Discount rate for NPV"},
                    "cash_flows": {"type": "array", "items": {"type": "number"}, "description": "Cash flows starting from year 0"},
                    "equity_value": {"type": "number"}, "debt_value": {"type": "number"},
                    "cost_of_equity": {"type": "number"}, "cost_of_debt": {"type": "number"},
                    "tax_rate": {"type": "number"},
                    "arpu_monthly": {"type": "number"}, "monthly_churn_rate": {"type": "number"},
                    "gross_margin": {"type": "number"}, "expansion_rate": {"type": "number"},
                    "final_year_fcf": {"type": "number"}, "terminal_growth": {"type": "number"},
                    "discount_rate": {"type": "number"},
                    "principal": {"type": "number"}, "annual_rate": {"type": "number"},
                    "years": {"type": "number"},
                    "risk_free_rate": {"type": "number"}, "beta": {"type": "number"},
                    "equity_risk_premium": {"type": "number"}, "size_premium": {"type": "number"},
                    "total_debt": {"type": "number"}, "cash": {"type": "number"},
                    "expr": {"type": "string", "description": "Math expression to evaluate"},
                },
                "required": ["operation"],
            },
        }


# ═══════════════════════════════════════════════════════════════════
# TOOL 2: company_data  — query embedded company financials
# ═══════════════════════════════════════════════════════════════════

class CompanyDataTool(BaseTool):
    def __init__(self):
        super().__init__(
            name="company_data",
            description=(
                "Query company financial data. Args:\n"
                "  company — name: CloudVault, DataForge, or SecureNet\n"
                "  fields  — comma-separated metric names (e.g. 'annual_revenue,ebitda,monthly_churn_rate')\n"
                "           Use 'all' for everything, 'projections' for growth projections.\n"
                "Available metric fields: monthly_revenue, annual_revenue, revenue_growth_rate, "
                "gross_margin, operating_margin, ebitda_margin, ebitda, net_income, total_debt, "
                "cash, total_equity, shares_outstanding, monthly_churn_rate, arpu_monthly, cac, "
                "expansion_revenue_rate, customers, employees, capex_annual, depreciation, "
                "tax_rate, interest_expense"
            ),
        )

    async def initialize(self):
        pass

    async def execute(self, company: str, fields: str = "all") -> str:
        if isinstance(fields, list):
            fields = ",".join(str(f) for f in fields)
        if isinstance(company, list):
            company = str(company[0]) if company else ""
        company = company.strip()
        match = None
        for name in COMPANIES:
            if name.lower() == company.lower():
                match = name
                break
        if not match:
            return json.dumps({"error": f"Unknown company: {company}. Available: {list(COMPANIES.keys())}"})

        data = COMPANIES[match]
        if fields.strip().lower() == "all":
            return json.dumps({"company": match, **data["metrics"]})
        elif fields.strip().lower() == "projections":
            return json.dumps({"company": match, **data["projections"]})
        else:
            result = {"company": match}
            for f in fields.split(","):
                f = f.strip()
                if f in data["metrics"]:
                    result[f] = data["metrics"][f]
                elif f in data["projections"]:
                    result[f] = data["projections"][f]
                else:
                    result[f] = "unknown field"
            return json.dumps(result)

    def get_spec(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": {
                    "company": {"type": "string", "description": "Company name: CloudVault, DataForge, or SecureNet"},
                    "fields": {"type": "string", "description": "Comma-separated fields or 'all' or 'projections'"},
                },
                "required": ["company"],
            },
        }


# ═══════════════════════════════════════════════════════════════════
# TOOL 3: market_benchmark  — industry comparables
# ═══════════════════════════════════════════════════════════════════

class MarketBenchmarkTool(BaseTool):
    def __init__(self):
        super().__init__(
            name="market_benchmark",
            description=(
                "Get industry benchmark data. Args:\n"
                "  sector — one of: saas, cybersecurity, data_analytics\n"
                "  fields — comma-separated or 'all'. Available: median_revenue_multiple, "
                "median_ebitda_multiple, median_growth_rate, risk_free_rate, equity_risk_premium, "
                "small_cap_premium, median_beta, median_ltv_cac_ratio, median_churn_monthly, "
                "median_gross_margin"
            ),
        )

    async def initialize(self):
        pass

    async def execute(self, sector: str, fields: str = "all") -> str:
        if isinstance(fields, list):
            fields = ",".join(str(f) for f in fields)
        if isinstance(sector, list):
            sector = str(sector[0]) if sector else ""
        sector = sector.strip().lower().replace(" ", "_")
        if sector not in MARKET_DATA:
            return json.dumps({"error": f"Unknown sector: {sector}. Available: {list(MARKET_DATA.keys())}"})
        data = MARKET_DATA[sector]
        if fields.strip().lower() == "all":
            return json.dumps({"sector": sector, **data})
        result = {"sector": sector}
        for f in fields.split(","):
            f = f.strip()
            if f in data:
                result[f] = data[f]
            else:
                result[f] = "unknown field"
        return json.dumps(result)

    def get_spec(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": {
                    "sector": {"type": "string", "description": "Sector: saas, cybersecurity, or data_analytics"},
                    "fields": {"type": "string", "description": "Comma-separated fields or 'all'"},
                },
                "required": ["sector"],
            },
        }


# ═══════════════════════════════════════════════════════════════════
# TOOL 4: portfolio_math  — portfolio-level calculations
# ═══════════════════════════════════════════════════════════════════

class PortfolioMathTool(BaseTool):
    def __init__(self):
        super().__init__(
            name="portfolio_math",
            description=(
                "Portfolio calculations. Args:\n"
                "  operation — one of:\n"
                "    portfolio_return: weighted return. Args: weights (dict asset->weight)\n"
                "    portfolio_risk: portfolio std dev using correlations. Args: weights (dict asset->weight)\n"
                "    sharpe_ratio: (portfolio_return - risk_free) / portfolio_risk. Args: weights (dict), risk_free_rate\n"
                "    asset_info: get return/risk for an asset. Args: asset (name)\n"
                "    correlation: get correlation between two assets. Args: asset1, asset2\n"
                "  Available assets: US_EQUITY, INTL_EQUITY, BONDS, REAL_ESTATE"
            ),
        )

    async def initialize(self):
        pass

    async def execute(self, operation: str = "asset_info", **kwargs) -> str:
        try:
            if isinstance(operation, list):
                operation = str(operation[0]) if operation else "asset_info"
            op = operation.lower().strip()

            if op == "asset_info":
                a = kwargs["asset"].upper().strip()
                if a not in PORTFOLIO_ASSETS:
                    return json.dumps({"error": f"Unknown asset: {a}"})
                return json.dumps({"asset": a, **PORTFOLIO_ASSETS[a]})

            elif op == "correlation":
                a1 = kwargs["asset1"].upper().strip()
                a2 = kwargs["asset2"].upper().strip()
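                # Accept either key order, in case the matrix stores each pair only once.
                key = (a1, a2) if (a1, a2) in CORRELATION_MATRIX else (a2, a1)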
                if key not in CORRELATION_MATRIX:
                    return json.dumps({"error": f"Unknown pair: {a1}, {a2}"})
                return json.dumps({"asset1": a1, "asset2": a2, "correlation": CORRELATION_MATRIX[key]})

            elif op == "portfolio_return":
                weights = kwargs["weights"]
                if isinstance(weights, str):
                    weights = json.loads(weights)
                ret = sum(
                    float(w) * PORTFOLIO_ASSETS[a.upper()]["expected_return"]
                    for a, w in weights.items()
                )
                return json.dumps({"portfolio_return": round(ret, 6)})

            elif op == "portfolio_risk":
                weights = kwargs["weights"]
                if isinstance(weights, str):
                    weights = json.loads(weights)
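                # Variance = sum over all asset pairs (i, j) of w_i * w_j * sd_i * sd_j * rho_ij.
                # Relies on CORRELATION_MATRIX containing both key orders and the diagonal
                # (rho_ii = 1); any pair missing from it is treated as uncorrelated (rho = 0).
                assets = list(weights.keys())
                variance = 0.0
                for a1 in assets:
                    for a2 in assets: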
                        w1 = float(weights[a1])
                        w2 = float(weights[a2])
                        s1 = PORTFOLIO_ASSETS[a1.upper()]["std_dev"]
                        s2 = PORTFOLIO_ASSETS[a2.upper()]["std_dev"]
                        corr = CORRELATION_MATRIX.get((a1.upper(), a2.upper()), 0)
                        variance += w1 * w2 * s1 * s2 * corr
                return json.dumps({"portfolio_variance": round(variance, 8), "portfolio_std_dev": round(math.sqrt(variance), 6)})

            elif op == "sharpe_ratio":
                weights = kwargs["weights"]
                if isinstance(weights, str):
                    weights = json.loads(weights)
                rf = float(kwargs.get("risk_free_rate", 0.043))
                ret = sum(float(w) * PORTFOLIO_ASSETS[a.upper()]["expected_return"] for a, w in weights.items())
                assets = list(weights.keys())
                variance = 0.0
                for a1 in assets:
                    for a2 in assets:
                        w1, w2 = float(weights[a1]), float(weights[a2])
                        s1 = PORTFOLIO_ASSETS[a1.upper()]["std_dev"]
                        s2 = PORTFOLIO_ASSETS[a2.upper()]["std_dev"]
                        corr = CORRELATION_MATRIX.get((a1.upper(), a2.upper()), 0)
                        variance += w1 * w2 * s1 * s2 * corr
                std = math.sqrt(variance)
                sharpe = (ret - rf) / std if std > 0 else 0
                return json.dumps({"sharpe_ratio": round(sharpe, 4), "return": round(ret, 6), "risk": round(std, 6)})
            else:
                return json.dumps({"error": f"Unknown operation: {operation}"})
        except Exception as e:
            return json.dumps({"error": str(e)})

    def get_spec(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": {
                    "operation": {"type": "string", "description": "portfolio_return, portfolio_risk, sharpe_ratio, asset_info, or correlation"},
                    "weights": {"type": "object", "description": "Asset weights dict, e.g. {\"US_EQUITY\": 0.4, \"BONDS\": 0.6}"},
                    "asset": {"type": "string"}, "asset1": {"type": "string"}, "asset2": {"type": "string"},
                    "risk_free_rate": {"type": "number"},
                },
                "required": ["operation"],
            },
        }


print("Tools defined: financial_calc, company_data, market_benchmark, portfolio_math")

Tools defined: financial_calc, company_data, market_benchmark, portfolio_math
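The `irr` operation of `financial_calc` finds the rate at which NPV is zero by bisection, which only works when NPV changes sign inside the search bracket; `dcf_terminal` uses the Gordon Growth formula, which only makes sense when the discount rate exceeds terminal growth. A minimal standalone sketch of both checks follows, assuming a conventional cash-flow pattern and hypothetical numbers. The helper names (`npv`, `irr_bisection`, `gordon_terminal_value`) are illustrative and not part of NucleusIQ. The bisection compares each midpoint's NPV sign with the lower bound's, so it does not depend on NPV falling as the rate rises; for conventional flows it converges to the same root.

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """NPV with the year-0 cash flow undiscounted (same convention as financial_calc)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr_bisection(cash_flows: list[float], lo: float = -0.5, hi: float = 5.0, iters: int = 200) -> float:
    """Bisection on NPV(rate) = 0; requires NPV to change sign between lo and hi."""
    f_lo = npv(lo, cash_flows)
    if f_lo * npv(hi, cash_flows) > 0:
        raise ValueError("IRR is not bracketed by [lo, hi]")
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = npv(mid, cash_flows)
        if f_mid * f_lo > 0:  # same sign as the lower bound, so the root lies above mid
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def gordon_terminal_value(final_fcf: float, g: float, r: float) -> float:
    """Terminal value FCF_N * (1 + g) / (r - g); only defined for r > g."""
    if r <= g:
        raise ValueError("discount rate must exceed terminal growth")
    return final_fcf * (1 + g) / (r - g)


# Hypothetical LBO-style flows: invest 100 at year 0, receive 10 and 10, then 130 at exit.
flows = [-100.0, 10.0, 10.0, 130.0]
rate = irr_bisection(flows)
print(round(rate, 4))              # the IRR
print(round(npv(rate, flows), 6))  # NPV evaluated at the IRR should be ~0
print(round(gordon_terminal_value(50.0, 0.03, 0.10), 2))  # → 735.71
```

Checking that NPV at the returned IRR is numerically zero is a cheap, tool-independent way to confirm an agent's IRR figure.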


In [5]:
# ═══════════════════════════════════════════════════════════════════
# TOOL 1: financial_calc  — core financial formulas
# ═══════════════════════════════════════════════════════════════════

class FinancialCalcTool(BaseTool):
    """Computes standard financial metrics from provided inputs."""

    def __init__(self):
        super().__init__(
            name="financial_calc",
            description=(
                "Financial calculator. Operations:\n"
                "  npv        — Net Present Value. Args: rate (discount rate, e.g. 0.10), cash_flows (list of floats, year 0 first)\n"
                "  irr        — Internal Rate of Return. Args: cash_flows (list of floats, year 0 first)\n"
                "  wacc       — Weighted Avg Cost of Capital. Args: equity_value, debt_value, cost_of_equity, cost_of_debt, tax_rate\n"
                "  ltv        — Customer Lifetime Value. Args: arpu_monthly, monthly_churn_rate, gross_margin, expansion_rate (optional, default 0)\n"
                "  dcf_terminal — DCF terminal value (Gordon Growth). Args: final_year_fcf, terminal_growth, discount_rate\n"
                "  loan_payment — Monthly loan payment. Args: principal, annual_rate, years\n"
                "  capm       — Cost of equity via CAPM. Args: risk_free_rate, beta, equity_risk_premium, size_premium (optional, default 0)\n"
                "  ev         — Enterprise Value. Args: equity_value, total_debt, cash\n"
                "  expression — Evaluate a math expression. Args: expr (string)\n"
            ),
        )

    async def initialize(self):
        pass

    async def execute(self, operation: str = "expression", **kwargs) -> str:
        try:
            if isinstance(operation, list):
                operation = str(operation[0]) if operation else "expression"
            op = operation.lower().strip()

            if op == "npv":
                rate = float(kwargs["rate"])
                cfs = [float(x) for x in kwargs["cash_flows"]]
                result = sum(cf / (1 + rate) ** t for t, cf in enumerate(cfs))
                return json.dumps({"npv": round(result, 2)})

            elif op == "irr":
                cfs = [float(x) for x in kwargs["cash_flows"]]
                lo, hi = -0.5, 5.0
                for _ in range(200):
                    mid = (lo + hi) / 2
                    npv = sum(cf / (1 + mid) ** t for t, cf in enumerate(cfs))
                    if npv > 0:
                        lo = mid
                    else:
                        hi = mid
                return json.dumps({"irr": round((lo + hi) / 2, 6)})

            elif op == "wacc":
                E = float(kwargs["equity_value"])
                D = float(kwargs["debt_value"])
                ke = float(kwargs["cost_of_equity"])
                kd = float(kwargs["cost_of_debt"])
                t = float(kwargs["tax_rate"])
                V = E + D
                wacc = (E / V) * ke + (D / V) * kd * (1 - t)
                return json.dumps({"wacc": round(wacc, 6)})

            elif op == "ltv":
                arpu = float(kwargs["arpu_monthly"])
                churn = float(kwargs["monthly_churn_rate"])
                gm = float(kwargs["gross_margin"])
                exp = float(kwargs.get("expansion_rate", 0))
                effective_churn = max(churn - exp, 0.001)
                ltv = (arpu * gm) / effective_churn
                return json.dumps({"ltv": round(ltv, 2), "effective_monthly_churn": round(effective_churn, 6)})

            elif op == "dcf_terminal":
                fcf = float(kwargs["final_year_fcf"])
                g = float(kwargs["terminal_growth"])
                r = float(kwargs["discount_rate"])
                tv = fcf * (1 + g) / (r - g)
                return json.dumps({"terminal_value": round(tv, 2)})

            elif op == "loan_payment":
                P = float(kwargs["principal"])
                r_annual = float(kwargs["annual_rate"])
                years = float(kwargs["years"])
                r_m = r_annual / 12
                n = int(years * 12)
                if r_m == 0:
                    pmt = P / n
                else:
                    pmt = P * (r_m * (1 + r_m) ** n) / ((1 + r_m) ** n - 1)
                total = pmt * n
                return json.dumps({"monthly_payment": round(pmt, 2), "total_paid": round(total, 2), "total_interest": round(total - P, 2)})

            elif op == "capm":
                rf = float(kwargs["risk_free_rate"])
                beta = float(kwargs["beta"])
                erp = float(kwargs["equity_risk_premium"])
                sp = float(kwargs.get("size_premium", 0))
                ke = rf + beta * erp + sp
                return json.dumps({"cost_of_equity": round(ke, 6)})

            elif op == "ev":
                eq = float(kwargs["equity_value"])
                debt = float(kwargs["total_debt"])
                cash = float(kwargs["cash"])
                return json.dumps({"enterprise_value": round(eq + debt - cash, 2)})

            elif op == "expression":
                safe_ns = {"pi": math.pi, "sqrt": math.sqrt, "abs": abs,
                           "round": round, "min": min, "max": max, "sum": sum}
                result = eval(str(kwargs["expr"]), {"__builtins__": {}}, safe_ns)
                return json.dumps({"result": round(float(result), 4)})

            else:
                return json.dumps({"error": f"Unknown operation: {operation}"})
        except Exception as e:
            return json.dumps({"error": str(e)})

    def get_spec(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": {
                    "operation": {"type": "string", "description": "One of: npv, irr, wacc, ltv, dcf_terminal, loan_payment, capm, ev, expression"},
                    "rate": {"type": "number", "description": "Discount rate for NPV"},
                    "cash_flows": {"type": "array", "items": {"type": "number"}, "description": "Cash flows starting from year 0"},
                    "equity_value": {"type": "number"}, "debt_value": {"type": "number"},
                    "cost_of_equity": {"type": "number"}, "cost_of_debt": {"type": "number"},
                    "tax_rate": {"type": "number"},
                    "arpu_monthly": {"type": "number"}, "monthly_churn_rate": {"type": "number"},
                    "gross_margin": {"type": "number"}, "expansion_rate": {"type": "number"},
                    "final_year_fcf": {"type": "number"}, "terminal_growth": {"type": "number"},
                    "discount_rate": {"type": "number"},
                    "principal": {"type": "number"}, "annual_rate": {"type": "number"},
                    "years": {"type": "number"},
                    "risk_free_rate": {"type": "number"}, "beta": {"type": "number"},
                    "equity_risk_premium": {"type": "number"}, "size_premium": {"type": "number"},
                    "total_debt": {"type": "number"}, "cash": {"type": "number"},
                    "expr": {"type": "string", "description": "Math expression to evaluate"},
                },
                "required": ["operation"],
            },
        }


# ═══════════════════════════════════════════════════════════════════
# TOOL 2: company_data  — query embedded company financials
# ═══════════════════════════════════════════════════════════════════

class CompanyDataTool(BaseTool):
    def __init__(self):
        super().__init__(
            name="company_data",
            description=(
                "Query company financial data. Args:\n"
                "  company — name: CloudVault, DataForge, or SecureNet\n"
                "  fields  — comma-separated metric names (e.g. 'annual_revenue,ebitda,monthly_churn_rate')\n"
                "           Use 'all' for everything, 'projections' for growth projections.\n"
                "Available metric fields: monthly_revenue, annual_revenue, revenue_growth_rate, "
                "gross_margin, operating_margin, ebitda_margin, ebitda, net_income, total_debt, "
                "cash, total_equity, shares_outstanding, monthly_churn_rate, arpu_monthly, cac, "
                "expansion_revenue_rate, customers, employees, capex_annual, depreciation, "
                "tax_rate, interest_expense"
            ),
        )

    async def initialize(self):
        pass

    async def execute(self, company: str, fields: str = "all") -> str:
        if isinstance(fields, list):
            fields = ",".join(str(f) for f in fields)
        if isinstance(company, list):
            company = str(company[0]) if company else ""
        company = company.strip()
        match = None
        for name in COMPANIES:
            if name.lower() == company.lower():
                match = name
                break
        if not match:
            return json.dumps({"error": f"Unknown company: {company}. Available: {list(COMPANIES.keys())}"})

        data = COMPANIES[match]
        if fields.strip().lower() == "all":
            return json.dumps({"company": match, **data["metrics"]})
        elif fields.strip().lower() == "projections":
            return json.dumps({"company": match, **data["projections"]})
        else:
            result = {"company": match}
            for f in fields.split(","):
                f = f.strip()
                if f in data["metrics"]:
                    result[f] = data["metrics"][f]
                elif f in data["projections"]:
                    result[f] = data["projections"][f]
                else:
                    result[f] = f"unknown field"
            return json.dumps(result)

    def get_spec(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": {
                    "company": {"type": "string", "description": "Company name: CloudVault, DataForge, or SecureNet"},
                    "fields": {"type": "string", "description": "Comma-separated fields or 'all' or 'projections'"},
                },
                "required": ["company"],
            },
        }


# ═══════════════════════════════════════════════════════════════════
# TOOL 3: market_benchmark  — industry comparables
# ═══════════════════════════════════════════════════════════════════

class MarketBenchmarkTool(BaseTool):
    def __init__(self):
        super().__init__(
            name="market_benchmark",
            description=(
                "Get industry benchmark data. Args:\n"
                "  sector — one of: saas, cybersecurity, data_analytics\n"
                "  fields — comma-separated or 'all'. Available: median_revenue_multiple, "
                "median_ebitda_multiple, median_growth_rate, risk_free_rate, equity_risk_premium, "
                "small_cap_premium, median_beta, median_ltv_cac_ratio, median_churn_monthly, "
                "median_gross_margin"
            ),
        )

    async def initialize(self):
        pass

    async def execute(self, sector: str, fields: str = "all") -> str:
        if isinstance(fields, list):
            fields = ",".join(str(f) for f in fields)
        if isinstance(sector, list):
            sector = str(sector[0]) if sector else ""
        sector = sector.strip().lower().replace(" ", "_")
        if sector not in MARKET_DATA:
            return json.dumps({"error": f"Unknown sector: {sector}. Available: {list(MARKET_DATA.keys())}"})
        data = MARKET_DATA[sector]
        if fields.strip().lower() == "all":
            return json.dumps({"sector": sector, **data})
        result = {"sector": sector}
        for f in fields.split(","):
            f = f.strip()
            if f in data:
                result[f] = data[f]
            else:
                result[f] = "unknown field"
        return json.dumps(result)

    def get_spec(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": {
                    "sector": {"type": "string", "description": "Sector: saas, cybersecurity, or data_analytics"},
                    "fields": {"type": "string", "description": "Comma-separated fields or 'all'"},
                },
                "required": ["sector"],
            },
        }


# ═══════════════════════════════════════════════════════════════════
# TOOL 4: portfolio_math  — portfolio-level calculations
# ═══════════════════════════════════════════════════════════════════

class PortfolioMathTool(BaseTool):
    def __init__(self):
        super().__init__(
            name="portfolio_math",
            description=(
                "Portfolio calculations. Args:\n"
                "  operation — one of:\n"
                "    portfolio_return: weighted return. Args: weights (dict asset->weight)\n"
                "    portfolio_risk: portfolio std dev using correlations. Args: weights (dict asset->weight)\n"
                "    sharpe_ratio: (portfolio_return - risk_free) / portfolio_risk. Args: weights (dict), risk_free_rate\n"
                "    asset_info: get return/risk for an asset. Args: asset (name)\n"
                "    correlation: get correlation between two assets. Args: asset1, asset2\n"
                "  Available assets: US_EQUITY, INTL_EQUITY, BONDS, REAL_ESTATE"
            ),
        )

    async def initialize(self):
        pass

    async def execute(self, operation: str = "asset_info", **kwargs) -> str:
        try:
            if isinstance(operation, list):
                operation = str(operation[0]) if operation else "asset_info"
            op = operation.lower().strip()

            if op == "asset_info":
                a = kwargs["asset"].upper().strip()
                if a not in PORTFOLIO_ASSETS:
                    return json.dumps({"error": f"Unknown asset: {a}"})
                return json.dumps({"asset": a, **PORTFOLIO_ASSETS[a]})

            elif op == "correlation":
                a1 = kwargs["asset1"].upper().strip()
                a2 = kwargs["asset2"].upper().strip()
                key = (a1, a2)
                if key not in CORRELATION_MATRIX:
                    return json.dumps({"error": f"Unknown pair: {a1}, {a2}"})
                return json.dumps({"asset1": a1, "asset2": a2, "correlation": CORRELATION_MATRIX[key]})

            elif op == "portfolio_return":
                weights = kwargs["weights"]
                if isinstance(weights, str):
                    weights = json.loads(weights)
                ret = sum(
                    float(w) * PORTFOLIO_ASSETS[a.upper()]["expected_return"]
                    for a, w in weights.items()
                )
                return json.dumps({"portfolio_return": round(ret, 6)})

            elif op == "portfolio_risk":
                weights = kwargs["weights"]
                if isinstance(weights, str):
                    weights = json.loads(weights)
                assets = list(weights.keys())
                variance = 0.0
                for i, a1 in enumerate(assets):
                    for j, a2 in enumerate(assets):
                        w1 = float(weights[a1])
                        w2 = float(weights[a2])
                        s1 = PORTFOLIO_ASSETS[a1.upper()]["std_dev"]
                        s2 = PORTFOLIO_ASSETS[a2.upper()]["std_dev"]
                        corr = CORRELATION_MATRIX.get((a1.upper(), a2.upper()), 0)
                        variance += w1 * w2 * s1 * s2 * corr
                return json.dumps({"portfolio_variance": round(variance, 8), "portfolio_std_dev": round(math.sqrt(variance), 6)})

            elif op == "sharpe_ratio":
                weights = kwargs["weights"]
                if isinstance(weights, str):
                    weights = json.loads(weights)
                rf = float(kwargs.get("risk_free_rate", 0.043))
                ret = sum(float(w) * PORTFOLIO_ASSETS[a.upper()]["expected_return"] for a, w in weights.items())
                assets = list(weights.keys())
                variance = 0.0
                for a1 in assets:
                    for a2 in assets:
                        w1, w2 = float(weights[a1]), float(weights[a2])
                        s1 = PORTFOLIO_ASSETS[a1.upper()]["std_dev"]
                        s2 = PORTFOLIO_ASSETS[a2.upper()]["std_dev"]
                        corr = CORRELATION_MATRIX.get((a1.upper(), a2.upper()), 0)
                        variance += w1 * w2 * s1 * s2 * corr
                std = math.sqrt(variance)
                sharpe = (ret - rf) / std if std > 0 else 0
                return json.dumps({"sharpe_ratio": round(sharpe, 4), "return": round(ret, 6), "risk": round(std, 6)})
            else:
                return json.dumps({"error": f"Unknown operation: {operation}"})
        except Exception as e:
            return json.dumps({"error": str(e)})

    def get_spec(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {
                "type": "object",
                "properties": {
                    "operation": {"type": "string", "description": "portfolio_return, portfolio_risk, sharpe_ratio, asset_info, or correlation"},
                    "weights": {"type": "object", "description": "Asset weights dict, e.g. {\"US_EQUITY\": 0.4, \"BONDS\": 0.6}"},
                    "asset": {"type": "string"}, "asset1": {"type": "string"}, "asset2": {"type": "string"},
                    "risk_free_rate": {"type": "number"},
                },
                "required": ["operation"],
            },
        }


print("Tools defined: financial_calc, company_data, market_benchmark, portfolio_math")

Tools defined: financial_calc, company_data, market_benchmark, portfolio_math
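
The risk branch above (returning `portfolio_variance`) and the `sharpe_ratio` branch both evaluate the standard portfolio variance, `sigma_p^2 = sum_i sum_j w_i * w_j * sigma_i * sigma_j * rho_ij`. The tool looks up `CORRELATION_MATRIX.get((a1, a2), 0)`, so it implicitly assumes that the matrix stores both orderings of every pair and a 1.0 diagonal. The matrix is defined earlier and is not shown here. Any missing key silently contributes zero risk. The minimal sketch below makes both assumptions explicit, using a symmetric lookup and a unit diagonal. The asset names, volatilities, and returns are hypothetical and are not the notebook's `PORTFOLIO_ASSETS` data.

```python
import math
from typing import Dict, Tuple

# Hypothetical inputs for illustration only (not the notebook's data).
STD_DEV = {"EQUITY": 0.20, "BONDS": 0.10}
EXPECTED_RETURN = {"EQUITY": 0.08, "BONDS": 0.04}


def correlation(a: str, b: str, corr: Dict[Tuple[str, str], float]) -> float:
    """rho(a, b) with a unit diagonal and a symmetric lookup, so each pair is stored once."""
    if a == b:
        return 1.0
    return corr.get((a, b), corr.get((b, a), 0.0))


def portfolio_std(weights: Dict[str, float], corr: Dict[Tuple[str, str], float]) -> float:
    """sqrt(sum_i sum_j w_i * w_j * sigma_i * sigma_j * rho_ij)."""
    variance = 0.0
    for a, wa in weights.items():
        for b, wb in weights.items():
            variance += wa * wb * STD_DEV[a] * STD_DEV[b] * correlation(a, b, corr)
    return math.sqrt(variance)


def sharpe_ratio(weights: Dict[str, float], corr: Dict[Tuple[str, str], float],
                 rf: float = 0.043) -> float:
    """(E[R_p] - rf) / sigma_p, returning 0 for a riskless portfolio as the tool does."""
    ret = sum(w * EXPECTED_RETURN[a] for a, w in weights.items())
    std = portfolio_std(weights, corr)
    return (ret - rf) / std if std > 0 else 0.0


w = {"EQUITY": 0.5, "BONDS": 0.5}
uncorrelated = portfolio_std(w, {("EQUITY", "BONDS"): 0.0})
perfect = portfolio_std(w, {("BONDS", "EQUITY"): 1.0})  # reversed key is still found
print(round(uncorrelated, 6), round(perfect, 6), round(sharpe_ratio(w, {}), 4))
```

The tool repeats the variance loop in two branches. Moving that loop into a single helper, as this sketch does, keeps the two branches consistent.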


## Step 3: Ground Truth

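We compute the reference answer for each task directly in Python, with no LLM in the loop. This lets us measure accuracy objectively, and the agent never sees these values. The answers are exact only under the notebook's simplifying conventions. Book `total_equity` stands in for market value in three places: the WACC weights (Tasks 1 to 3), the EV in Task 4, and CloudVault's share price in Task 6.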
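
Task 1's ground truth is a two-formula chain. First comes the CAPM build-up `Ke = rf + beta * ERP + size_premium`, then WACC with an after-tax cost of debt. The standalone sketch below shows that chain. Its inputs are hypothetical round numbers, not DataForge's figures.

```python
def capm_cost_of_equity(rf: float, beta: float, erp: float, size_premium: float = 0.0) -> float:
    """CAPM with an additive size premium: Ke = rf + beta * ERP + size_premium."""
    return rf + beta * erp + size_premium


def wacc(equity: float, debt: float, ke: float, kd: float, tax_rate: float) -> float:
    """E/(E+D) * Ke + D/(E+D) * Kd * (1 - t); the (1 - t) factor is the debt tax shield."""
    total = equity + debt
    return (equity / total) * ke + (debt / total) * kd * (1 - tax_rate)


# Hypothetical inputs: 4% risk-free rate, beta 1.2, 5% ERP, 2% size premium.
ke = capm_cost_of_equity(0.04, 1.2, 0.05, 0.02)
# Cost of debt as interest expense / total debt.
kd = 3_000_000 / 50_000_000
result = wacc(150_000_000, 50_000_000, ke, kd, 0.25)
print(f"Ke={ke:.4f}  Kd={kd:.4f}  WACC={result:.5f}")
```

Leaving out `(1 - tax_rate)` is the trap flagged for Task 1. With these inputs, that omission would raise WACC from 0.10125 to 0.105.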

In [6]:
def compute_ground_truth():
    gt = {}

    # ── Task 1: DataForge WACC ──
    m = COMPANIES["DataForge"]["metrics"]
    mkt = MARKET_DATA["data_analytics"]
    ke = mkt["risk_free_rate"] + mkt["median_beta"] * mkt["equity_risk_premium"] + mkt["small_cap_premium"]
    kd = m["interest_expense"] / m["total_debt"]
    E, D = m["total_equity"], m["total_debt"]
    wacc = (E / (E + D)) * ke + (D / (E + D)) * kd * (1 - m["tax_rate"])
    gt["task1"] = {"wacc": round(wacc, 6), "ke": round(ke, 6), "kd": round(kd, 4)}

    # ── Task 2: CloudVault DCF Price Per Share ──
    def dcf_model(co_name, sector):
        m2 = COMPANIES[co_name]["metrics"]
        p = COMPANIES[co_name]["projections"]
        mkt2 = MARKET_DATA[sector]
        ke2 = mkt2["risk_free_rate"] + mkt2["median_beta"] * mkt2["equity_risk_premium"] + mkt2["small_cap_premium"]
        kd2 = m2["interest_expense"] / m2["total_debt"]
        E2, D2 = m2["total_equity"], m2["total_debt"]
        wacc2 = (E2 / (E2 + D2)) * ke2 + (D2 / (E2 + D2)) * kd2 * (1 - m2["tax_rate"])
        rev = m2["annual_revenue"]
        growths = [p[f"year{i}_growth"] for i in range(1, 6)]
        capex_ratio = m2["capex_annual"] / m2["annual_revenue"]
        depr_ratio = m2["depreciation"] / m2["annual_revenue"]
        fcfs = []
        for i, g in enumerate(growths):
            rev = rev * (1 + g)
            margin = m2["operating_margin"] + (p["target_operating_margin"] - m2["operating_margin"]) * (i + 1) / 5
            op_income = rev * margin
            fcf = op_income * (1 - m2["tax_rate"]) + rev * depr_ratio - rev * capex_ratio
            fcfs.append(fcf)
        tv = fcfs[-1] * (1 + p["terminal_growth"]) / (wacc2 - p["terminal_growth"])
        pv_fcfs = sum(fcf / (1 + wacc2) ** (t + 1) for t, fcf in enumerate(fcfs))
        pv_tv = tv / (1 + wacc2) ** 5
        ev = pv_fcfs + pv_tv
        equity = ev - m2["total_debt"] + m2["cash"]
        pps = equity / m2["shares_outstanding"]
        return {"wacc": wacc2, "ev": ev, "equity": equity, "pps": pps, "fcfs": fcfs, "tv": tv, "year5_fcf": fcfs[-1]}

    cv_dcf = dcf_model("CloudVault", "saas")
    gt["task2"] = {"price_per_share": round(cv_dcf["pps"], 2)}

    # ── Task 3: DataForge DCF Enterprise Value ──
    df_dcf = dcf_model("DataForge", "data_analytics")
    gt["task3"] = {"enterprise_value": round(df_dcf["ev"], 2)}

    # ── Task 4: Cross-Company EV/EBITDA Ranking ──
    df_m = COMPANIES["DataForge"]["metrics"]
    ev_ebitda_multiples = {}
    for name, co in COMPANIES.items():
        m3 = co["metrics"]
        ev = m3["total_equity"] + m3["total_debt"] - m3["cash"]
        ev_ebitda_multiples[name] = ev / m3["ebitda"]
    gt["task4"] = {"best_ev_ebitda": round(min(ev_ebitda_multiples.values()), 2)}

    # ── Task 5: LBO IRR — SecureNet at 30x EBITDA ──
    sn_m = COMPANIES["SecureNet"]["metrics"]
    sn_p = COMPANIES["SecureNet"]["projections"]
    entry_ev = sn_m["ebitda"] * 30
    equity_invested = entry_ev * 0.40  # 40% equity
    remaining_debt = entry_ev * 0.60
    ebitda = sn_m["ebitda"]
    for i in range(1, 6):
        ebitda = ebitda * (1 + sn_p[f"year{i}_growth"])
        remaining_debt = max(remaining_debt - 15_000_000, 0)
    exit_ev = ebitda * 30
    exit_equity = exit_ev - remaining_debt
    cfs = [-equity_invested, 0, 0, 0, 0, exit_equity]
    lo, hi = -0.5, 5.0
    for _ in range(200):
        mid = (lo + hi) / 2
        npv = sum(cf / (1 + mid) ** t for t, cf in enumerate(cfs))
        if npv > 0: lo = mid
        else: hi = mid
    gt["task5"] = {"irr": round((lo + hi) / 2, 4)}

    # ── Task 6: Merger EPS — CloudVault + DataForge all-stock ──
    cv_m = COMPANIES["CloudVault"]["metrics"]
    acq_price = df_m["annual_revenue"] * 1.5
    cv_share_price = cv_m["total_equity"] / cv_m["shares_outstanding"]
    new_shares = acq_price / cv_share_price
    combined_ni = cv_m["net_income"] + df_m["net_income"]
    synergy = 2_000_000 * (1 - cv_m["tax_rate"])
    adj_ni = combined_ni + synergy
    total_shares = cv_m["shares_outstanding"] + new_shares
    gt["task6"] = {"combined_eps": round(adj_ni / total_shares, 4)}

    # ── Task 7: Comparative Revenue-Multiple Valuation ──
    sector_map = {"CloudVault": "saas", "DataForge": "data_analytics", "SecureNet": "cybersecurity"}
    rev_pps = {}
    for name, co in COMPANIES.items():
        m3 = co["metrics"]
        sector = sector_map[name]
        implied_ev = m3["annual_revenue"] * MARKET_DATA[sector]["median_revenue_multiple"]
        equity = implied_ev - m3["total_debt"] + m3["cash"]
        pps = equity / m3["shares_outstanding"]
        rev_pps[name] = pps
    gt["task7"] = {"highest_pps": round(max(rev_pps.values()), 2)}

    # ── Task 8: Post-Acquisition DSCR ──
    mkt_da = MARKET_DATA["data_analytics"]
    rev_ev = df_m["annual_revenue"] * mkt_da["median_revenue_multiple"]
    acq_debt = rev_ev * 0.50
    r_m = 0.07 / 12
    n = 84
    pmt = acq_debt * (r_m * (1 + r_m)**n) / ((1 + r_m)**n - 1)
    annual_ds = pmt * 12
    y1_ebitda = df_m["ebitda"] * (1 + COMPANIES["DataForge"]["projections"]["year1_growth"])
    gt["task8"] = {"dscr": round(y1_ebitda / annual_ds, 4)}

    return gt

GROUND_TRUTH_ANSWERS = compute_ground_truth()
for tid, vals in GROUND_TRUTH_ANSWERS.items():
    print(f"  {tid}: {vals}")

  task1: {'wacc': 0.12108, 'ke': 0.1395, 'kd': 0.06}
  task2: {'price_per_share': 10.95}
  task3: {'enterprise_value': 48086188.95}
  task4: {'best_ev_ebitda': 6.97}
  task5: {'irr': 0.3927}
  task6: {'combined_eps': 0.1463}
  task7: {'highest_pps': 27.87}
  task8: {'dscr': 0.0348}
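
Two of the ground-truth calculations rely on numerical helpers that are worth isolating:

- **Task 5 (IRR by bisection).** For a conventional cash-flow vector (one outflow, then inflows), NPV falls as the discount rate rises. Repeatedly halving the bracket [-0.5, 5.0] therefore converges on the rate where NPV = 0.
- **Task 8 (loan payment).** This uses the standard fixed-rate annuity payment for a loan that amortizes monthly.

The sketch below restates both as reusable functions. Two pieces are additions that the notebook's inline code does not have: the sign check on the bracket and the zero-rate branch. The cash flows and loan terms are hypothetical.

```python
from typing import Sequence


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Net present value, with cash_flows[t] arriving at the end of year t (t = 0 is today)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr_bisection(cash_flows: Sequence[float], lo: float = -0.5, hi: float = 5.0,
                  iters: int = 200) -> float:
    """Bisection IRR, assuming NPV is positive at `lo` and negative at `hi`."""
    if not (npv(lo, cash_flows) > 0 > npv(hi, cash_flows)):
        raise ValueError("IRR is not bracketed by [lo, hi]")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if npv(mid, cash_flows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Level payment that fully amortizes the loan: P * r(1+r)^n / ((1+r)^n - 1)."""
    r, n = annual_rate / 12, years * 12
    if r == 0:
        return principal / n
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


# LBO-style vector: invest 100 today, receive 200 at exit in year 5, so IRR = 2**(1/5) - 1.
irr = irr_bisection([-100, 0, 0, 0, 0, 200])

# Hypothetical $10M loan over 7 years at 7%, against $2.5M of Year 1 EBITDA.
pmt = monthly_payment(10_000_000, 0.07, 7)
dscr = 2_500_000 / (pmt * 12)
print(f"IRR={irr:.4f}  monthly payment={pmt:,.2f}  DSCR={dscr:.2f}")
```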


## Step 4: Define the 8 PE Due Diligence Tasks

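Each task mirrors a standard PE analysis: WACC, DCF valuation, trading multiples, an LBO, merger accretion, and debt-service coverage. The prompts spell out the intended steps, but the agent still has three jobs. It must map each step to the right tool, call the tools in a workable order, and carry intermediate results forward into a single final number. Each task dict also records four pieces of metadata:

- `verify_key`: the scoring key.
- `tolerance`: a relative tolerance.
- `extract_hint`: used to locate the answer in free text.
- `trap`: the failure mode we expect the model to fall into.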
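
Tasks 2 and 3 share one chain:

- Revenue compounds year by year.
- The operating margin ramps linearly from today's level to the target.
- FCF is after-tax operating income plus depreciation minus capex, both scaled to revenue.
- A Gordon Growth terminal value closes the model.

Below is a minimal sketch of that chain, mirroring the `dcf_model` helper in the ground-truth cell. Like that helper, it applies the tax rate to negative operating income too, which amounts to assuming a tax credit in loss years. The inputs in the sanity check are hypothetical.

```python
from typing import List, Sequence


def project_fcfs(revenue: float, growths: Sequence[float], margin_now: float,
                 margin_target: float, tax_rate: float,
                 capex_ratio: float, depr_ratio: float) -> List[float]:
    """Free cash flow for each projection year, with a linear margin ramp."""
    fcfs = []
    n = len(growths)
    for i, g in enumerate(growths, start=1):
        revenue *= 1 + g
        margin = margin_now + (margin_target - margin_now) * i / n  # reaches target in year n
        op_income = revenue * margin
        fcfs.append(op_income * (1 - tax_rate) + revenue * depr_ratio - revenue * capex_ratio)
    return fcfs


def enterprise_value(fcfs: Sequence[float], wacc: float, terminal_growth: float) -> float:
    """PV of the explicit FCFs plus the PV of a Gordon Growth terminal value."""
    if wacc <= terminal_growth:
        raise ValueError("Gordon Growth requires wacc > terminal_growth")
    tv = fcfs[-1] * (1 + terminal_growth) / (wacc - terminal_growth)
    pv_fcfs = sum(f / (1 + wacc) ** t for t, f in enumerate(fcfs, start=1))
    return pv_fcfs + tv / (1 + wacc) ** len(fcfs)


# Sanity check: flat revenue, flat 20% margin, 25% tax, no capex/depreciation, g = 0.
# Every FCF is 100 * 0.20 * 0.75 = 15, so EV must collapse to the perpetuity 15 / 0.10.
flat = project_fcfs(100.0, [0.0] * 5, 0.20, 0.20, 0.25, 0.0, 0.0)
print([round(f, 6) for f in flat], round(enterprise_value(flat, 0.10, 0.0), 6))
```

With zero growth, the terminal value plus five explicit years must equal a plain perpetuity. That makes it a cheap way to catch the "forgot to discount TV" trap listed for Task 2.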

In [7]:
COMPLEX_TASKS = [
    {
        "id": "task1",
        "name": "DataForge WACC",
        "question": (
            "Calculate DataForge's Weighted Average Cost of Capital (WACC). Steps:\n"
            "1. Get DataForge financials: total_equity, total_debt, tax_rate, interest_expense.\n"
            "2. Get data_analytics market benchmarks: risk_free_rate, equity_risk_premium, median_beta, small_cap_premium.\n"
            "3. Use financial_calc 'capm' to compute cost of equity (Ke = risk_free + beta*ERP + size_premium).\n"
            "4. Compute cost of debt: Kd = interest_expense / total_debt (use financial_calc 'expression').\n"
            "5. Use financial_calc 'wacc' with equity_value, debt_value, cost_of_equity, cost_of_debt, tax_rate.\n"
            "Report the WACC as a decimal number (e.g., 0.12 for 12%)."
        ),
        "verify_key": "wacc",
        "tolerance": 0.02,
        "extract_hint": "wacc",
        "complexity": "5-step: 2 data lookups → CAPM → cost of debt → WACC formula",
        "trap": "Using wrong beta/premium, forgetting tax shield on debt",
    },
    {
        "id": "task2",
        "name": "CloudVault DCF Valuation",
        "question": (
            "Perform a full DCF valuation of CloudVault to find its intrinsic price per share. Steps:\n"
            "1. Get CloudVault financials: annual_revenue, operating_margin, total_debt, cash, "
            "total_equity, shares_outstanding, capex_annual, depreciation, tax_rate, interest_expense.\n"
            "2. Get CloudVault projections: year1-5 growth rates, terminal_growth, target_operating_margin.\n"
            "3. Get SaaS market benchmarks: risk_free_rate, equity_risk_premium, median_beta, small_cap_premium.\n"
            "4. Calculate WACC via CAPM (same method as Task 1 but for CloudVault/SaaS).\n"
            "5. Project revenues for 5 years using growth rates.\n"
            "6. For each year, operating margin linearly ramps from current (0.12) to target (0.25).\n"
            "   FCF = Operating_Income*(1-tax) + Revenue*depr_ratio - Revenue*capex_ratio,\n"
            "   where depr_ratio = depreciation/annual_revenue, capex_ratio = capex_annual/annual_revenue.\n"
            "7. Terminal Value = Year5_FCF * (1+terminal_growth) / (WACC - terminal_growth).\n"
            "8. EV = sum of PV(each FCF) + PV(Terminal Value), discounted at WACC.\n"
            "9. Equity = EV - total_debt + cash. Price per share = Equity / shares_outstanding.\n"
            "Report the price per share as a number."
        ),
        "verify_key": "price_per_share",
        "tolerance": 0.10,
        "extract_hint": "price per share",
        "complexity": "9-step: 3 data lookups → WACC → 5yr FCF projection → terminal value → discount → equity",
        "trap": "Wrong margin ramp, forgetting to discount TV, wrong capex/depr ratios",
    },
    {
        "id": "task3",
        "name": "DataForge DCF Enterprise Value",
        "question": (
            "Calculate DataForge's intrinsic Enterprise Value using a 5-year DCF model.\n"
            "1. Get DataForge financials: annual_revenue, operating_margin, capex_annual, depreciation, "
            "tax_rate, interest_expense, total_equity, total_debt.\n"
            "2. Get DataForge projections: year1_growth through year5_growth, terminal_growth, target_operating_margin.\n"
            "3. Get data_analytics market benchmarks: risk_free_rate, equity_risk_premium, median_beta, small_cap_premium.\n"
            "4. Calculate WACC via CAPM (same method as Task 1).\n"
            "5. Compute two ratios from current financials:\n"
            "   capex_ratio = capex_annual / annual_revenue\n"
            "   depr_ratio = depreciation / annual_revenue\n"
            "6. Project 5 years of revenue and FCF. For each year i (1 to 5):\n"
            "   Revenue_i = Revenue_{i-1} * (1 + year_i_growth)\n"
            "   Margin_i = current_operating_margin + (target_operating_margin - current_operating_margin) * i/5\n"
            "   (This linearly ramps margin from -0.05 to 0.30.)\n"
            "   Operating_Income_i = Revenue_i * Margin_i\n"
            "   FCF_i = Operating_Income_i * (1 - tax_rate) + Revenue_i * depr_ratio - Revenue_i * capex_ratio\n"
            "   IMPORTANT: Use financial_calc 'expression' for each year's calculation.\n"
            "7. Terminal Value = Year5_FCF * (1 + terminal_growth) / (WACC - terminal_growth).\n"
            "8. Discount EACH FCF: PV(FCF_i) = FCF_i / (1+WACC)^i for i=1..5.\n"
            "   PV(TV) = Terminal_Value / (1+WACC)^5.\n"
            "9. Enterprise Value = sum of all PV(FCFs) + PV(TV).\n"
            "Note: DataForge starts at -5% operating margin, so early FCFs will be NEGATIVE. This is expected.\n"
            "Report ONLY the Enterprise Value as a single number (the total EV, NOT per share)."
        ),
        "verify_key": "enterprise_value",
        "tolerance": 0.10,
        "extract_hint": "enterprise value|EV",
        "complexity": "7-step: data → WACC → negative-FCF projection → terminal value → discount",
        "trap": "Negative FCFs confuse the model, wrong margin ramp for negative starting point",
    },
    {
        "id": "task4",
        "name": "Cross-Company EV/EBITDA Ranking",
        "question": (
            "Rank all 3 companies (CloudVault, DataForge, SecureNet) by EV/EBITDA multiple.\n"
            "For EACH company:\n"
            "1. Use company_data to get total_equity, total_debt, cash, and ebitda.\n"
            "2. Use financial_calc 'ev' to compute Enterprise Value = total_equity + total_debt - cash.\n"
            "3. Use financial_calc 'expression' to compute EV/EBITDA = Enterprise_Value / ebitda.\n"
            "After computing all 3 multiples, identify the company with the LOWEST EV/EBITDA "
            "(this represents the best value, cheapest relative to earnings).\n"
            "Report the LOWEST EV/EBITDA multiple as a number."
        ),
        "verify_key": "best_ev_ebitda",
        "tolerance": 0.05,
        "extract_hint": "ev.ebitda|lowest|best value|multiple",
        "complexity": "9-step: 3 companies x (data + EV calc + ratio) + comparison",
        "trap": "Wrong EV formula (forgetting to subtract cash), comparing wrong direction",
    },
    {
        "id": "task5",
        "name": "SecureNet LBO Equity IRR",
        "question": (
            "A PE firm buys SecureNet in a leveraged buyout at 30x current EBITDA.\n"
            "Structure: 60% debt, 40% equity. Debt is paid down by $15,000,000 per year (capped at remaining balance).\n"
            "Steps:\n"
            "1. Get SecureNet's current EBITDA using company_data.\n"
            "2. Get SecureNet's projections (year1_growth through year5_growth).\n"
            "3. Entry EV = current EBITDA * 30.\n"
            "4. Equity invested = 40% of Entry EV. Initial debt = 60% of Entry EV.\n"
            "5. Project EBITDA for each of 5 years: EBITDA_year_i = EBITDA_{i-1} * (1 + year_i_growth).\n"
            "   Use financial_calc 'expression' for each year.\n"
            "6. After 5 years of $15M/year paydown, remaining debt = max(initial_debt - 5*15000000, 0).\n"
            "7. Exit EV = Year5_EBITDA * 30 (same exit multiple).\n"
            "8. Exit equity = Exit_EV - remaining_debt.\n"
            "9. Use financial_calc 'irr' with cash_flows = [-equity_invested, 0, 0, 0, 0, exit_equity].\n"
            "   These 6 values represent year 0 (investment) through year 5 (exit).\n"
            "Report the IRR as a DECIMAL (e.g., 0.25 means 25% annual return). This is the return to equity holders."
        ),
        "verify_key": "irr",
        "tolerance": 0.05,
        "extract_hint": "irr|internal rate of return",
        "complexity": "6-step: data → entry structure → 5yr EBITDA projection → debt schedule → exit → IRR",
        "trap": "Wrong EBITDA compounding, forgetting to cap debt paydown at 0, wrong cash flow vector",
    },
    {
        "id": "task6",
        "name": "Merger EPS Accretion",
        "question": (
            "CloudVault acquires DataForge in an all-stock deal at 1.5x DataForge's revenue.\n"
            "Calculate the combined post-merger EPS with $2M annual cost synergies.\n"
            "Steps:\n"
            "1. Get CloudVault: net_income, total_equity, shares_outstanding.\n"
            "2. Get DataForge: net_income, annual_revenue.\n"
            "3. Acquisition price = DataForge annual_revenue * 1.5.\n"
            "4. CloudVault share price = total_equity / shares_outstanding.\n"
            "5. New shares issued = acquisition_price / CloudVault_share_price.\n"
            "6. Combined net income = CloudVault NI + DataForge NI.\n"
            "7. After-tax synergies = $2,000,000 * (1 - CloudVault tax_rate). Get tax_rate from company_data.\n"
            "8. Adjusted combined NI = combined NI + after-tax synergies.\n"
            "9. Total shares = CloudVault existing + new shares issued.\n"
            "10. Combined EPS = adjusted NI / total shares.\n"
            "Report the combined EPS as a number."
        ),
        "verify_key": "combined_eps",
        "tolerance": 0.05,
        "extract_hint": "combined eps|eps|earnings per share",
        "complexity": "10-step: 2 company lookups → deal structure → share dilution → synergies → EPS",
        "trap": "Forgetting after-tax on synergies, wrong share price, mixing up acquirer/target",
    },
    {
        "id": "task7",
        "name": "Comparative Revenue-Multiple Valuation",
        "question": (
            "Calculate the implied equity price per share for ALL 3 companies using their sector's revenue multiple.\n"
            "For EACH company:\n"
            "1. Use company_data to get annual_revenue, total_debt, cash, shares_outstanding.\n"
            "2. Use market_benchmark to get the median_revenue_multiple for that company's sector:\n"
            "   - CloudVault uses 'saas' sector\n"
            "   - DataForge uses 'data_analytics' sector\n"
            "   - SecureNet uses 'cybersecurity' sector\n"
            "3. Implied EV = annual_revenue * median_revenue_multiple.\n"
            "4. Equity Value = Implied_EV - total_debt + cash.\n"
            "5. Price Per Share = Equity_Value / shares_outstanding.\n"
            "After computing all 3, identify the company with the HIGHEST implied price per share.\n"
            "Report that HIGHEST price per share as a number."
        ),
        "verify_key": "highest_pps",
        "tolerance": 0.05,
        "extract_hint": "price per share|highest|pps",
        "complexity": "15-step: 3 companies x (data + market + EV calc + equity + PPS) + comparison",
        "trap": "Wrong sector mapping, forgetting debt/cash adjustment, wrong multiple",
    },
    {
        "id": "task8",
        "name": "Post-Acquisition Debt Service Coverage",
        "question": (
            "If DataForge is acquired at its revenue-multiple EV with 50% debt financing,\n"
            "calculate the Year 1 Debt Service Coverage Ratio (DSCR).\n"
            "Steps:\n"
            "1. Get DataForge annual_revenue, ebitda, and year1_growth projection.\n"
            "2. Get data_analytics median_revenue_multiple.\n"
            "3. Acquisition EV = annual_revenue * median_revenue_multiple.\n"
            "4. Acquisition debt = 50% of Acquisition EV.\n"
            "5. Loan terms: 7 years, 7.0% annual rate. Use financial_calc 'loan_payment'.\n"
            "6. Annual debt service = monthly_payment * 12.\n"
            "7. Year 1 EBITDA = current EBITDA * (1 + year1_growth).\n"
            "8. DSCR = Year1_EBITDA / Annual_debt_service.\n"
            "A DSCR > 1.0 means the company can cover its debt payments.\n"
            "Report the DSCR as a number."
        ),
        "verify_key": "dscr",
        "tolerance": 0.05,
        "extract_hint": "dscr|debt service coverage",
        "complexity": "8-step: data → acquisition structure → loan amortization → projected EBITDA → ratio",
        "trap": "Using current EBITDA instead of Year 1, wrong loan calculation, forgetting annual conversion",
    },
]

print(f"PE Due Diligence tasks defined: {len(COMPLEX_TASKS)}\n")
for t in COMPLEX_TASKS:
    gt_val = GROUND_TRUTH_ANSWERS[t["id"]][t["verify_key"]]
    print(f"  {t['id']}: {t['name']}")
    print(f"    Complexity: {t['complexity']}")
    print(f"    Ground truth ({t['verify_key']}): {gt_val}")
    print()

PE Due Diligence tasks defined: 8

  task1: DataForge WACC
    Complexity: 5-step: 2 data lookups → CAPM → cost of debt → WACC formula
    Ground truth (wacc): 0.12108

  task2: CloudVault DCF Valuation
    Complexity: 9-step: 3 data lookups → WACC → 5yr FCF projection → terminal value → discount → equity
    Ground truth (price_per_share): 10.95

  task3: DataForge DCF Enterprise Value
    Complexity: 7-step: data → WACC → negative-FCF projection → terminal value → discount
    Ground truth (enterprise_value): 48086188.95

  task4: Cross-Company EV/EBITDA Ranking
    Complexity: 9-step: 3 companies x (data + EV calc + ratio) + comparison
    Ground truth (best_ev_ebitda): 6.97

  task5: SecureNet LBO Equity IRR
    Complexity: 6-step: data → entry structure → 5yr EBITDA projection → debt schedule → exit → IRR
    Ground truth (irr): 0.3927

  task6: Merger EPS Accretion
    Complexity: 10-step: 2 company lookups → deal structure → share dilution → synergies → EPS
    Ground truth (com

## Step 5: Evaluation Harness

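The evaluation harness runs each task, extracts a numeric answer from the agent's free-text response, and compares it with the ground truth. The comparison uses the task's relative tolerance: a `tolerance` of 0.05 accepts anything within ±5% of the expected value. This is our objective scoring system.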
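
Scoring is relative rather than absolute, so one `tolerance` field works for answers as different in scale as a 0.12 WACC and a $48M enterprise value. The minimal sketch below shows only the core check and omits both the extraction step and the percentage fallback. The two sample values are the Standard-mode results for Tasks 2 and 4 reported in Step 6.

```python
from typing import Optional


def within_tolerance(extracted: Optional[float], expected: float, tolerance: float) -> bool:
    """True if |extracted - expected| / |expected| <= tolerance (absolute 0.01 when expected is 0)."""
    if extracted is None:
        return False
    if expected != 0:
        return abs(extracted - expected) / abs(expected) <= tolerance
    return abs(extracted - expected) <= 0.01


# Task 2: 10.0874 vs 10.95 is about 7.9% off, inside the 10% band.
# Task 4: 82,000,000 vs 6.97 is a different quantity entirely.
print(within_tolerance(10.0874, 10.95, 0.10), within_tolerance(82_000_000.0, 6.97, 0.05))
```

A consequence of the 10% band on DCF tasks: an answer can pass while still differing from the reference by several percent, as the Task 2 result does.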

In [8]:
# ═══════════════════════════════════════════════════════════════════
# LLM CALL TRACKER & SCORING
# ═══════════════════════════════════════════════════════════════════

class LLMCallTracker(BaseLLM):
    model_name: str = "tracker"

    def __init__(self, wrapped_llm: BaseLLM):
        super().__init__()
        self._wrapped = wrapped_llm
        self.model_name = getattr(wrapped_llm, "model_name", "unknown")
        self._call_count = 0

    @property
    def call_count(self):
        return self._call_count

    async def call(self, **kwargs):
        self._call_count += 1
        return await self._wrapped.call(**kwargs)

    def convert_tool_specs(self, tools):
        return self._wrapped.convert_tool_specs(tools)

    def create_completion(self, *args, **kwargs):
        return self._wrapped.create_completion(*args, **kwargs)

    def reset(self):
        self._call_count = 0


def extract_number(text: str, hint: str = "") -> Optional[float]:
    if text is None:
        return None
    text = str(text)

    # Try parsing as JSON first — tool outputs are often JSON
    try:
        data = json.loads(text.strip())
        if isinstance(data, dict):
            if hint:
                for h in hint.split("|"):
                    key = h.strip().replace(".", "_").replace(" ", "_").replace("/", "_").lower()
                    for k, v in data.items():
                        if key in k.lower().replace(" ", "_"):
                            try:
                                return float(v)
                            except (ValueError, TypeError):
                                pass
            for v in data.values():
                try:
                    return float(v)
                except (ValueError, TypeError):
                    pass
    except (json.JSONDecodeError, ValueError):
        pass

    if hint:
        for pat_word in hint.split("|"):
            clean = pat_word.strip()
            # JSON key:value pattern
            escaped = re.escape(clean).replace(r"\ ", r"[\s_]*").replace(r"\.", r"[\s_:/-]*")
            json_pat = rf'"{escaped}"[\s]*:[\s]*([\d.]+)'
            m = re.search(json_pat, text, re.IGNORECASE)
            if m:
                try:
                    return float(m.group(1))
                except ValueError:
                    pass

            pat_word_re = clean.replace(".", r"[\s_:/-]*").replace(" ", r"[\s_:/-]*")
            pattern = rf"(?:{pat_word_re})[\s:=]*\*?\*?\$?([\d,]+\.?\d*)"
            m = re.search(pattern, text, re.IGNORECASE)
            if m:
                try:
                    return float(m.group(1).replace(",", ""))
                except ValueError:
                    pass

    bold_patterns = [
        r"\*\*(?:final\s+)?(?:answer|result|total|ratio|score|sharpe|price)[:\s]*\*?\*?\s*\$?([\d,]+\.?\d*)",
        r"\*\*([\d,]+\.?\d*)\*\*",
    ]
    for pat in bold_patterns:
        matches = re.findall(pat, text, re.IGNORECASE)
        if matches:
            try:
                return float(matches[-1].replace(",", ""))
            except ValueError:
                pass

    answer_patterns = [
        r"(?:final\s+\w+|answer|result|total)[:\s]*\$?([\d,]+\.?\d*)",
        r"\$\s*([\d,]+\.?\d*)",
    ]
    for pat in answer_patterns:
        m = re.search(pat, text, re.IGNORECASE)
        if m:
            try:
                return float(m.group(1).replace(",", ""))
            except ValueError:
                pass

    numbers = re.findall(r"[\d,]+\.?\d+", text)
    if numbers:
        try:
            return float(numbers[-1].replace(",", ""))
        except ValueError:
            return None
    return None


def score_result(raw_result: str, task: Dict, gt: Dict) -> Dict:
    hint = task.get("extract_hint", "")
    extracted = extract_number(raw_result, hint)
    expected = gt[task["verify_key"]]
    tolerance = task.get("tolerance", 0.05)

    if extracted is None:
        return {"correct": False, "extracted": None, "expected": expected, "error": None}

    if isinstance(expected, (int, float)) and expected != 0:
        rel_error = abs(extracted - expected) / abs(expected)
        correct = rel_error <= tolerance
    else:
        correct = abs(extracted - expected) <= 0.01

    return {
        "correct": correct,
        "extracted": extracted,
        "expected": expected,
        "error": round(abs(extracted - expected), 4) if extracted is not None else None,
    }


@dataclass
class TaskResult:
    task_id: str
    task_name: str
    mode: str
    raw_result: str
    extracted_answer: Optional[float]
    expected_answer: float
    correct: bool
    error: Optional[float]
    time_seconds: float
    llm_calls: int


print("Evaluation infrastructure ready.")

Evaluation infrastructure ready.
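
`LLMCallTracker` is a delegating wrapper. It forwards every call to the real LLM and increments a counter on the way through, which is how each task's `llm_calls` figure is measured. The sketch below shows the same pattern in a minimal, framework-free form. `EchoLLM` and `CallCounter` are hypothetical stand-ins, not NucleusIQ classes.

```python
import asyncio
from typing import Any


class EchoLLM:
    """Hypothetical stand-in for a real async LLM client."""
    model_name = "echo"

    async def call(self, **kwargs: Any) -> dict:
        return {"echo": kwargs.get("prompt")}

    def convert_tool_specs(self, tools):
        return list(tools)


class CallCounter:
    """Counts call() invocations and forwards every other attribute to the wrapped client."""

    def __init__(self, wrapped: Any) -> None:
        self._wrapped = wrapped
        self.model_name = getattr(wrapped, "model_name", "unknown")
        self.call_count = 0

    async def call(self, **kwargs: Any) -> Any:
        self.call_count += 1
        return await self._wrapped.call(**kwargs)

    def reset(self) -> None:
        self.call_count = 0

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes not found on CallCounter itself.
        if name == "_wrapped":  # guard against infinite recursion before __init__ runs
            raise AttributeError(name)
        return getattr(self._wrapped, name)


async def demo() -> CallCounter:
    llm = CallCounter(EchoLLM())
    for prompt in ["a", "b", "c"]:
        await llm.call(prompt=prompt)
    return llm


counter = asyncio.run(demo())  # in a notebook, use `counter = await demo()` instead
print(counter.call_count, counter.convert_tool_specs(("x",)))
```

The notebook's version subclasses `BaseLLM` and forwards `convert_tool_specs` and `create_completion` explicitly, presumably so the agent accepts it as an LLM. The generic `__getattr__` forwarding here is shorter, but it would not satisfy an `isinstance(..., BaseLLM)` check.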


In [9]:
# Redefines score_result from In [8] so it also accepts answers reported as percentages.
def score_result(raw_result: str, task: Dict, gt: Dict) -> Dict:
    hint = task.get("extract_hint", "")
    extracted = extract_number(raw_result, hint)
    expected = gt[task["verify_key"]]
    tolerance = task.get("tolerance", 0.05)

    if extracted is None:
        return {"correct": False, "extracted": None, "expected": expected, "error": None}

    if isinstance(expected, (int, float)) and expected != 0:
        rel_error = abs(extracted - expected) / abs(expected)
        correct = rel_error <= tolerance
        # Fallback: a fractional target (e.g. WACC 0.121) may have been reported as a percentage (12.1).
        if not correct and 0 < abs(expected) < 1 and abs(extracted) > 1:
            pct_converted = extracted / 100.0
            rel_error_pct = abs(pct_converted - expected) / abs(expected)
            if rel_error_pct <= tolerance:
                correct = True
                extracted = pct_converted
    else:
        correct = abs(extracted - expected) <= 0.01

    return {
        "correct": correct,
        "extracted": extracted,
        "expected": expected,
        "error": round(abs(extracted - expected), 4) if extracted is not None else None,
    }


def create_llm():
    if USE_OPENAI:
        return BaseOpenAI(model_name="gpt-5.2-2025-12-11")
    return MockLLM()


async def run_complex_evaluation(
    mode_name: str,
    execution_mode: ExecutionMode,
    extra_config: Optional[Dict] = None,
    extra_plugins: Optional[List] = None,
) -> List[TaskResult]:

    base_llm = create_llm()
    tracker = LLMCallTracker(base_llm)
    tools = [FinancialCalcTool(), CompanyDataTool(), MarketBenchmarkTool(), PortfolioMathTool()]

    config_kwargs = {
        "execution_mode": execution_mode,
        "verbose": False,
    }
    if extra_config:
        config_kwargs.update(extra_config)
    config = AgentConfig(**config_kwargs)
    plugins = [ToolCallLimitPlugin(max_calls=50)]
    if extra_plugins:
        plugins.extend(extra_plugins)

    agent = Agent(
        name=f"PEAnalyst-{mode_name}",
        role="Senior PE Due Diligence Analyst",
        objective=(
            "You are a private equity analyst performing due diligence on an acquisition target. "
            "Use the provided tools for ALL calculations — never compute math in your head. "
            "Use company_data for financials, market_benchmark for industry data, "
            "financial_calc for computations. Always report the final numeric answer clearly."
        ),
        llm=tracker,
        tools=tools,
        config=config,
        plugins=plugins,
    )
    await agent.initialize()

    print(f"\n{'='*70}")
    print(f"  MODE: {mode_name.upper()}  |  Tasks: {len(COMPLEX_TASKS)}")
    print(f"{'='*70}\n")

    results = []
    for i, task_data in enumerate(COMPLEX_TASKS):
        tracker.reset()
        task = Task(id=task_data["id"], objective=task_data["question"])
        gt = GROUND_TRUTH_ANSWERS[task_data["id"]]

        start = time.time()
        try:
            raw = await agent.execute(task)
        except Exception as e:
            raw = f"Error: {e}"
        elapsed = time.time() - start

        raw_str = str(raw)
        sc = score_result(raw_str, task_data, gt)

        tr = TaskResult(
            task_id=task_data["id"], task_name=task_data["name"],
            mode=mode_name, raw_result=raw_str[:500],
            extracted_answer=sc["extracted"], expected_answer=sc["expected"],
            correct=sc["correct"], error=sc["error"],
            time_seconds=round(elapsed, 1), llm_calls=tracker.call_count,
        )
        results.append(tr)
        mark = "V CORRECT" if sc["correct"] else "X WRONG"
        print(f"  [{i+1}/{len(COMPLEX_TASKS)}] {task_data['name']:40s} {mark:10s} "
              f"got={sc['extracted']!s:20s} exp={sc['expected']!s:20s} "
              f"{elapsed:>7.1f}s  {tracker.call_count:3d} calls")

    correct = sum(1 for r in results if r.correct)
    total_time = sum(r.time_seconds for r in results)
    total_calls = sum(r.llm_calls for r in results)
    print(f"\n  Summary: {correct}/{len(results)} correct ({100*correct//len(results)}%) "
          f"| {total_time:.1f}s | {total_calls} LLM calls")
    return results
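
The redefined `score_result` adds one rule. When the expected value is a fraction in (0, 1) and the extracted number exceeds 1, it retries the comparison with the number divided by 100. An agent that reports "12.108%" for a 0.12108 WACC is therefore not penalized. The sketch below isolates that rule with hypothetical inputs.

```python
from typing import Tuple


def score_with_percent_fallback(extracted: float, expected: float,
                                tolerance: float) -> Tuple[bool, float]:
    """Return (correct, value_used), retrying as a percentage for fractional targets."""
    if expected == 0:
        return abs(extracted) <= 0.01, extracted
    if abs(extracted - expected) / abs(expected) <= tolerance:
        return True, extracted
    if 0 < abs(expected) < 1 and abs(extracted) > 1:
        as_fraction = extracted / 100.0
        if abs(as_fraction - expected) / abs(expected) <= tolerance:
            return True, as_fraction
    return False, extracted


print(score_with_percent_fallback(12.108, 0.12108, 0.02))  # WACC reported as a percentage
print(score_with_percent_fallback(3.48, 0.0348, 0.05))     # a ratio, not a percentage
```

The rule is unit-blind. Task 8's DSCR is also a fraction (0.0348), so a DSCR reported as 3.48 would be rescaled to 0.0348 and scored correct. That is wrong: a coverage ratio of 3.48 is a different claim, not a percentage. A per-task flag would confine the fallback to rate-like answers such as WACC and IRR. For example, a hypothetical `"allow_percent": True` entry could be added to the task dict.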

## Step 6: Run Standard Mode (Baseline)

**Standard mode** runs one continuous conversation between the LLM and its tools. The LLM decides which tool to call, reads the result, and continues until it has an answer. There is no orchestration and no external validation: the model alone is responsible for getting every number right.

```
User task → LLM thinks → calls tool → gets result → thinks → calls tool → ... → final answer
```
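
The toy version below makes that loop concrete. It is framework-free and uses a scripted stand-in for the LLM. This is a simplified illustration of the control flow in the diagram, not NucleusIQ's implementation. `LLMTurn`, `run_standard_loop`, and `scripted_llm` are hypothetical names. The `max_tool_calls` cap only loosely mirrors the `ToolCallLimitPlugin(max_calls=50)` used in the harness.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class LLMTurn:
    tool: Optional[str] = None            # tool to call next, or None when finished
    args: Dict[str, Any] = field(default_factory=dict)
    answer: Optional[str] = None          # final answer once tool is None


def run_standard_loop(llm_step: Callable[[List[Tuple[str, str]]], LLMTurn],
                      tools: Dict[str, Callable[..., Any]],
                      task: str, max_tool_calls: int = 50) -> Tuple[Optional[str], int]:
    """Single conversation: think, call a tool, read the result, repeat until an answer."""
    history: List[Tuple[str, str]] = [("user", task)]
    calls = 0
    while True:
        turn = llm_step(history)
        if turn.tool is None:
            return turn.answer, calls
        if calls >= max_tool_calls:
            raise RuntimeError("tool-call budget exhausted")
        calls += 1
        result = tools[turn.tool](**turn.args)
        history.append(("tool", f"{turn.tool} -> {result}"))


def scripted_llm(history: List[Tuple[str, str]]) -> LLMTurn:
    """Stand-in LLM: request one multiplication, then report the tool's result."""
    if not any(role == "tool" for role, _ in history):
        return LLMTurn(tool="multiply", args={"a": 6, "b": 7})
    return LLMTurn(answer=history[-1][1].split(" -> ")[1])


answer, calls = run_standard_loop(scripted_llm, {"multiply": lambda a, b: a * b}, "What is 6 * 7?")
print(answer, calls)
```

Nothing in this loop checks the answer: whatever the model says last is returned as-is. This is the gap that Autonomous mode's external validation is meant to address.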

In [10]:
standard_results = await run_complex_evaluation(
    mode_name="standard",
    execution_mode=ExecutionMode.STANDARD,
)

[INFO] PEAnalyst-standard: Initializing agent: PEAnalyst-standard
[INFO] PEAnalyst-standard: Agent initialization completed successfully
[INFO] PEAnalyst-standard: Agent 'PEAnalyst-standard' executing in STANDARD mode



  MODE: STANDARD  |  Tasks: 8



[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: market_benchmark
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Agent 'PEAnalyst-standard' executing in STANDARD mode


  [1/8] DataForge WACC                           V CORRECT  got=0.12108              exp=0.12108                 10.8s    5 calls


[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: market_benchmark
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst

  [2/8] CloudVault DCF Valuation                 V CORRECT  got=10.0874              exp=10.95                   46.5s   32 calls


[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: market_benchmark
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst

  [3/8] DataForge DCF Enterprise Value           V CORRECT  got=48086188.9477        exp=48086188.95             50.9s   35 calls


[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Agent 'PEAnalyst-standard' executing in STANDARD mode


  [4/8] Cross-Company EV/EBITDA Ranking          X WRONG    got=82000000.0           exp=6.97                    10.4s    4 calls


[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Agent 'PEAnalyst-standard' executing in STANDARD mode


  [5/8] SecureNet LBO Equity IRR                 X WRONG    got=0.268794             exp=0.3927                  12.7s    3 calls


[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Agent 'PEAnalyst-standard' executing in STANDARD mode


  [6/8] Merger EPS Accretion                     V CORRECT  got=0.1463               exp=0.1463                  15.8s   10 calls


[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: market_benchmark
[INFO] PEAnalyst-standard: Tool requested: market_benchmark
[INFO] PEAnalyst-standard: Tool requested: market_benchmark
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Agent 'PEAnalyst-standard' executing in STANDARD mode


  [7/8] Comparative Revenue-Multiple Valuation   V CORRECT  got=27.8667              exp=27.87                   13.9s    5 calls


[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: company_data
[INFO] PEAnalyst-standard: Tool requested: market_benchmark
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc
[INFO] PEAnalyst-standard: Tool requested: financial_calc


  [8/8] Post-Acquisition Debt Service Coverage   V CORRECT  got=0.0348               exp=0.0348                  13.1s    8 calls

  Summary: 6/8 correct (75%) | 174.1s | 102 LLM calls


## Step 7: Run Autonomous Mode (with External Validation)

**Autonomous mode** adds orchestration capabilities that Standard mode can't provide:

- **Parallel execution** — independent sub-tasks run simultaneously via isolated sub-agents
- **External validation** — plugin-based validators catch errors the LLM can't catch itself
- **Structured retry** — when validation fails, the agent retries with error context
- **Progress tracking** — every step is recorded
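
The execute, validate, retry cycle appears in the logs below as `Attempt 1/3 [EXECUTE]` followed by `[VALIDATE]`. Here is a minimal plain-asyncio sketch of that loop, assuming an executor and a validator with the shapes shown. `run_with_validation`, `fake_execute`, and `irr_in_range` are hypothetical names, not NucleusIQ APIs. Appending the failure reason to the prompt simplifies what the framework describes as feeding error context back into the conversation.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signatures: one Standard-mode attempt, and an external validator
# returning (is_valid, reason) like validate_result in the cell below.
Execute = Callable[[str], Awaitable[str]]
Validate = Callable[[str], Awaitable[tuple[bool, str]]]


async def run_with_validation(task: str, execute: Execute, validate: Validate,
                              max_attempts: int = 3) -> tuple[str, int]:
    """Execute -> validate -> retry, feeding the failure reason back each time."""
    prompt = task
    result = ""
    for attempt in range(1, max_attempts + 1):
        result = await execute(prompt)
        ok, reason = await validate(result)
        if ok:
            return result, attempt
        # Structured retry: the next attempt sees why the previous answer failed.
        prompt = f"{task}\n\nYour previous answer failed validation: {reason}. Fix it."
    return result, max_attempts  # attempts exhausted: return the last answer


# Toy executor: reports a dollar amount first, then the IRR once it gets feedback.
async def fake_execute(prompt: str) -> str:
    return "IRR = 0.3927" if "failed validation" in prompt else "Exit equity = 512000000"


async def irr_in_range(result: str) -> tuple[bool, str]:
    value = float(result.split("=")[-1])
    ok = -1.0 <= value <= 10.0
    return ok, ("" if ok else f"value {value} is outside [-1, 10]")


# In Jupyter, use `await run_with_validation(...)` instead of asyncio.run.
print(asyncio.run(run_with_validation("Compute the equity IRR", fake_execute, irr_in_range)))
# ('IRR = 0.3927', 2)
```

The first attempt fails the range check, the reason is folded into the next prompt, and the second attempt passes.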

The key insight: **validation comes from external signals, not LLM self-assessment.** Below, we define a `PEFinancialValidator` that uses domain knowledge (e.g., "IRR should be between -1 and 10") to catch obviously wrong answers. It is a cheap regex-plus-range check that makes no LLM calls, yet it can catch errors that an LLM verifier is likely to miss, because a verifier built on the same model tends to repeat the same mistake.
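
The two-pass check can be tried in isolation before it is wired into the agent. The sketch below is a standalone, hypothetical `check_irr` helper (not part of NucleusIQ) that reuses the IRR bounds and regex from the cell below, but ignores the task-keyword gate for brevity. It also shows the limit of a range check: a hypothetical exit equity value reported in place of the IRR is flagged, while the Standard run's wrong IRR of 0.268794 lies inside [-1, 10] and passes.

```python
import re

# IRR bounds and keyword-anchored pattern copied from PE_METRIC_RULES / PE_METRIC_PATTERNS.
IRR_RANGE = (-1.0, 10.0)
NUMBER = r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"
IRR_PATTERN = re.compile(
    r"(?:IRR|internal\s+rate\s+of\s+return)[^0-9\-]*?(" + NUMBER + ")", re.IGNORECASE
)


def check_irr(text: str) -> tuple[bool, str]:
    """Hypothetical helper: the validator's two-pass check for a single metric."""
    lo, hi = IRR_RANGE
    match = IRR_PATTERN.search(text)
    if match:  # Pass 1: number anchored to the IRR keyword
        value, source = float(match.group(1)), "IRR"
    else:  # Pass 2: fall back to the last number in the output
        numbers = re.findall(NUMBER, text)
        if not numbers:
            return True, ""
        value, source = float(numbers[-1]), "final number"
    if not lo <= value <= hi:
        return False, f"{source} {value} is outside [{lo}, {hi}]"
    return True, ""


print(check_irr("Equity IRR = 0.392714"))         # (True, '')
print(check_irr("Equity IRR = 0.268794"))         # (True, ''): wrong, but plausible
print(check_irr("Exit equity value: 512000000"))  # flagged by Pass 2
```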

In [11]:
import re
from nucleusiq.plugins.builtin.result_validator import ResultValidatorPlugin

# Project-level validator: domain-specific range checks for PE metrics.
# Subclasses the framework's generic ResultValidatorPlugin base class.
# This is NOT part of the framework -- it's specific to this PE notebook.

PE_METRIC_RULES = {
    "irr": (-1.0, 10.0),
    "wacc": (0.0, 0.50),
    "eps": (-500.0, 5000.0),
    "price_per_share": (0.0, 100_000.0),  # no pattern below yet, so this rule is never checked
}

PE_METRIC_PATTERNS = {
    "irr": re.compile(r"(?:IRR|internal\s+rate\s+of\s+return)[^0-9\-]*?(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)", re.IGNORECASE),
    "wacc": re.compile(r"(?:WACC|weighted\s+average\s+cost\s+of\s+capital)[^0-9\-]*?(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)", re.IGNORECASE),
    "eps": re.compile(r"(?:EPS|earnings\s+per\s+share)[^0-9\-]*?(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)", re.IGNORECASE),
}

PE_METRIC_ALIASES = {
    "irr": ["irr", "internal rate of return", "equity irr"],
    "wacc": ["wacc", "weighted average cost of capital", "cost of capital"],
    "eps": ["eps", "earnings per share", "accretion"],
}

class PEFinancialValidator(ResultValidatorPlugin):
    """Validates PE financial metrics fall within sensible ranges.
    Mechanical check -- no LLM involved, just regex + range bounds.

    Two-pass strategy:
      Pass 1 -- keyword-anchored: look for "IRR = 0.39" near the metric keyword.
      Pass 2 -- final-number fallback: if the task asks for a metric but Pass 1
               found nothing, extract the last standalone number in the response
               and check it against the range. This catches cases where the LLM
               reports the exit equity value instead of the IRR.
    """

    async def validate_result(self, result, context):
        text = str(result) if result else ""
        if not text.strip():
            return True, ""
        task = context.get("task_objective", "").lower()
        for metric, pattern in PE_METRIC_PATTERNS.items():
            if metric not in PE_METRIC_RULES:
                continue
            lo, hi = PE_METRIC_RULES[metric]
            keywords = PE_METRIC_ALIASES.get(metric, [metric])
            if not any(kw in task for kw in keywords):
                continue

            # Pass 1: keyword-anchored regex
            match = pattern.search(text)
            if match:
                try:
                    value = float(match.group(1))
                except ValueError:
                    continue
                if value < lo or value > hi:
                    return False, f"{metric.upper()}={value} is outside expected range [{lo}, {hi}]"
                continue

            # Pass 2: fallback -- check last number in output
            numbers = re.findall(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?", text)
            if numbers:
                try:
                    last_val = float(numbers[-1])
                except ValueError:
                    continue
                if last_val < lo or last_val > hi:
                    return False, (
                        f"No explicit '{metric.upper()}' label found in output, "
                        f"but final numeric value {last_val} is outside expected "
                        f"range [{lo}, {hi}] for this metric"
                    )
        return True, ""

autonomous_results = await run_complex_evaluation(
    mode_name="autonomous",
    execution_mode=ExecutionMode.AUTONOMOUS,
    extra_plugins=[PEFinancialValidator()],
)

[INFO] PEAnalyst-autonomous: Initializing agent: PEAnalyst-autonomous
[INFO] PEAnalyst-autonomous: Agent initialization completed successfully
[INFO] PEAnalyst-autonomous: Agent 'PEAnalyst-autonomous' executing in AUTONOMOUS mode



  MODE: AUTONOMOUS  |  Tasks: 8



[INFO] PEAnalyst-autonomous: Task classified as SIMPLE — standard + validate
[INFO] PEAnalyst-autonomous: Attempt 1/3 [EXECUTE]
[INFO] PEAnalyst-autonomous: Tool requested: company_data
[INFO] PEAnalyst-autonomous: Tool requested: market_benchmark
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Attempt 1/3 [VALIDATE]: valid=True layer=all
[INFO] PEAnalyst-autonomous: Agent 'PEAnalyst-autonomous' executing in AUTONOMOUS mode


  [1/8] DataForge WACC                           V CORRECT  got=0.12108              exp=0.12108                 11.8s    6 calls


[INFO] PEAnalyst-autonomous: Task classified as SIMPLE — standard + validate
[INFO] PEAnalyst-autonomous: Attempt 1/3 [EXECUTE]
[INFO] PEAnalyst-autonomous: Tool requested: company_data
[INFO] PEAnalyst-autonomous: Tool requested: company_data
[INFO] PEAnalyst-autonomous: Tool requested: market_benchmark
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool 

  [2/8] CloudVault DCF Valuation                 V CORRECT  got=9.886                exp=10.95                   46.1s   29 calls


[INFO] PEAnalyst-autonomous: Task classified as SIMPLE — standard + validate
[INFO] PEAnalyst-autonomous: Attempt 1/3 [EXECUTE]
[INFO] PEAnalyst-autonomous: Tool requested: company_data
[INFO] PEAnalyst-autonomous: Tool requested: company_data
[INFO] PEAnalyst-autonomous: Tool requested: market_benchmark
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool 

  [3/8] DataForge DCF Enterprise Value           V CORRECT  got=48086188.9475        exp=48086188.95             34.6s   15 calls


[INFO] PEAnalyst-autonomous: Task classified as COMPLEX (3 sub-tasks) — decomposing
[INFO] PEAnalyst-autonomous: Decomposing into 3 sub-tasks (cap=5)
[INFO] PEAnalyst-autonomous-sub-sub1: Initializing agent: PEAnalyst-autonomous-sub-sub1
[INFO] PEAnalyst-autonomous-sub-sub1: Agent initialization completed successfully
[INFO] PEAnalyst-autonomous-sub-sub2: Initializing agent: PEAnalyst-autonomous-sub-sub2
[INFO] PEAnalyst-autonomous-sub-sub2: Agent initialization completed successfully
[INFO] PEAnalyst-autonomous-sub-sub3: Initializing agent: PEAnalyst-autonomous-sub-sub3
[INFO] PEAnalyst-autonomous-sub-sub3: Agent initialization completed successfully
[INFO] PEAnalyst-autonomous-sub-sub1: Agent 'PEAnalyst-autonomous-sub-sub1' executing in STANDARD mode
[INFO] PEAnalyst-autonomous-sub-sub2: Agent 'PEAnalyst-autonomous-sub-sub2' executing in STANDARD mode
[INFO] PEAnalyst-autonomous-sub-sub3: Agent 'PEAnalyst-autonomous-sub-sub3' executing in STANDARD mode
[INFO] PEAnalyst-autonomous-sub

  [4/8] Cross-Company EV/EBITDA Ranking          V CORRECT  got=6.9728               exp=6.97                    12.4s   15 calls


[INFO] PEAnalyst-autonomous: Task classified as SIMPLE — standard + validate
[INFO] PEAnalyst-autonomous: Attempt 1/3 [EXECUTE]
[INFO] PEAnalyst-autonomous: Tool requested: company_data
[INFO] PEAnalyst-autonomous: Tool requested: company_data
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Validat

  [5/8] SecureNet LBO Equity IRR                 V CORRECT  got=0.392714             exp=0.3927                  20.8s    9 calls


[INFO] PEAnalyst-autonomous: Task classified as SIMPLE — standard + validate
[INFO] PEAnalyst-autonomous: Attempt 1/3 [EXECUTE]
[INFO] PEAnalyst-autonomous: Tool requested: company_data
[INFO] PEAnalyst-autonomous: Tool requested: company_data
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Attempt 1/3 [VALIDATE]: valid=True layer=all
[INFO] PEAnalyst-autonomous: Agent 'PEAnalyst-autonomous' executing in AUTONOMOUS mode


  [6/8] Merger EPS Accretion                     V CORRECT  got=0.1463               exp=0.1463                  14.5s   11 calls


[INFO] PEAnalyst-autonomous: Task classified as COMPLEX (4 sub-tasks) — decomposing
[INFO] PEAnalyst-autonomous: Decomposing into 4 sub-tasks (cap=5)
[INFO] PEAnalyst-autonomous-sub-sub1: Initializing agent: PEAnalyst-autonomous-sub-sub1
[INFO] PEAnalyst-autonomous-sub-sub1: Agent initialization completed successfully
[INFO] PEAnalyst-autonomous-sub-sub2: Initializing agent: PEAnalyst-autonomous-sub-sub2
[INFO] PEAnalyst-autonomous-sub-sub2: Agent initialization completed successfully
[INFO] PEAnalyst-autonomous-sub-sub3: Initializing agent: PEAnalyst-autonomous-sub-sub3
[INFO] PEAnalyst-autonomous-sub-sub3: Agent initialization completed successfully
[INFO] PEAnalyst-autonomous-sub-sub4: Initializing agent: PEAnalyst-autonomous-sub-sub4
[INFO] PEAnalyst-autonomous-sub-sub4: Agent initialization completed successfully
[INFO] PEAnalyst-autonomous-sub-sub1: Agent 'PEAnalyst-autonomous-sub-sub1' executing in STANDARD mode
[INFO] PEAnalyst-autonomous-sub-sub2: Agent 'PEAnalyst-autonomous-s

  [7/8] Comparative Revenue-Multiple Valuation   V CORRECT  got=27.8667              exp=27.87                   23.3s   24 calls


[INFO] PEAnalyst-autonomous: Task classified as SIMPLE — standard + validate
[INFO] PEAnalyst-autonomous: Attempt 1/3 [EXECUTE]
[INFO] PEAnalyst-autonomous: Tool requested: company_data
[INFO] PEAnalyst-autonomous: Tool requested: market_benchmark
[INFO] PEAnalyst-autonomous: Tool requested: company_data
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Tool requested: financial_calc
[INFO] PEAnalyst-autonomous: Attempt 1/3 [VALIDATE]: valid=True layer=all


  [8/8] Post-Acquisition Debt Service Coverage   V CORRECT  got=0.0348               exp=0.0348                  12.9s   10 calls

  Summary: 8/8 correct (100%) | 176.4s | 119 LLM calls


## Step 8: Results — Standard vs Autonomous

Now let's compare the two modes head-to-head across all 8 PE scenarios.

In [12]:
print("=" * 100)
print("  PER-TASK COMPARISON: Standard vs Autonomous")
print("=" * 100)
print(f"  {'Task':<40s} | {'Std':>12s}     | {'Auto':>12s}     | {'Expected':>12s} | Note")
print("  " + "-" * 95)
for s, a in zip(standard_results, autonomous_results):
    std_v = f"{s.extracted_answer}" if s.extracted_answer is not None else "None"
    auto_v = f"{a.extracted_answer}" if a.extracted_answer is not None else "None"
    exp_v = f"{s.expected_answer}"
    s_mark = "V" if s.correct else "X"
    a_mark = "V" if a.correct else "X"
    note = ""
    if not s.correct and a.correct:
        note = "<-- FIXED"
    elif s.correct and not a.correct:
        note = "<-- REGRESSED"
    print(f"  {s.task_name:<40s} | {std_v:>10s}   {s_mark} | {auto_v:>10s}   {a_mark} | {exp_v:>12s} | {note}")

print()
print("=" * 100)
print("  KPI DASHBOARD — PE Due Diligence")
print("=" * 100)
std_correct = sum(1 for r in standard_results if r.correct)
auto_correct = sum(1 for r in autonomous_results if r.correct)
std_time = sum(r.time_seconds for r in standard_results)
auto_time = sum(r.time_seconds for r in autonomous_results)
std_calls = sum(r.llm_calls for r in standard_results)
auto_calls = sum(r.llm_calls for r in autonomous_results)
n = len(standard_results)
std_pct = 100 * std_correct // n
auto_pct = 100 * auto_correct // n
delta_pct = auto_pct - std_pct
fixes = sum(1 for s, a in zip(standard_results, autonomous_results) if not s.correct and a.correct)
regressions = sum(1 for s, a in zip(standard_results, autonomous_results) if s.correct and not a.correct)

print(f"  {'KPI':<40s} | {'Standard':>12s} | {'Autonomous':>12s} | {'Delta':>12s}")
print("  " + "-" * 85)
print(f"  {'Accuracy':<40s} | {std_pct:>11d}% | {auto_pct:>11d}% | {delta_pct:>+10d}pp")
print(f"  {'Correct / Total':<40s} | {std_correct:>10d}/{n:<2d} | {auto_correct:>10d}/{n:<2d} | {auto_correct-std_correct:>+11d}")
print(f"  {'Total Time (s)':<40s} | {std_time:>12.1f} | {auto_time:>12.1f} | {auto_time-std_time:>+12.1f}")
print(f"  {'Avg Time / Task (s)':<40s} | {std_time/n:>12.1f} | {auto_time/n:>12.1f} | {(auto_time-std_time)/n:>+12.1f}")
print(f"  {'Total LLM Calls':<40s} | {std_calls:>12d} | {auto_calls:>12d} | {auto_calls-std_calls:>+12d}")
print(f"  {'Avg Calls / Task':<40s} | {std_calls/n:>12.1f} | {auto_calls/n:>12.1f} | {(auto_calls-std_calls)/n:>+12.1f}")
print(f"  {'Tasks Fixed (X -> V)':<40s} | {'--':>12s} | {fixes:>12d} |")
print(f"  {'Regressions':<40s} | {'--':>12s} | {regressions:>12d} |")
if fixes > 0:
    print(f"  {'Extra LLM Calls per Fix':<40s} | {'--':>12s} | {(auto_calls-std_calls)/max(fixes,1):>12.1f} |")
print()
if auto_pct > std_pct:
    print(f"  RESULT: Autonomous mode improved accuracy by {delta_pct} percentage points "
          f"({fixes} task(s) fixed) at the cost of {auto_calls-std_calls} additional LLM calls.")
elif auto_pct == std_pct:
    print(f"  RESULT: Same accuracy ({auto_pct}%), but {fixes} fix(es) offset by {regressions} regression(s).")
else:
    print(f"  RESULT: Autonomous mode regressed by {-delta_pct} percentage points.")
print()

# Export
results_data = {
    "scenario": "PE Due Diligence",
    "llm": "gpt-5.2-2025-12-11" if USE_OPENAI else "MockLLM",
    "standard": {"accuracy": std_pct, "correct": std_correct, "total": n, "time": std_time, "calls": std_calls},
    "autonomous": {"accuracy": auto_pct, "correct": auto_correct, "total": n, "time": auto_time, "calls": auto_calls},
    "fixes": fixes, "regressions": regressions,
    "tasks": [
        {"id": s.task_id, "name": s.task_name,
         "std_answer": s.extracted_answer, "std_correct": s.correct,
         "auto_answer": a.extracted_answer, "auto_correct": a.correct,
         "expected": s.expected_answer}
        for s, a in zip(standard_results, autonomous_results)
    ],
}
with open("pe_evaluation_results.json", "w") as f:
    json.dump(results_data, f, indent=2)
print("Results exported to pe_evaluation_results.json")
print()
print("Blog-ready summary:")
print(f"  LLM:                 {'gpt-5.2-2025-12-11' if USE_OPENAI else 'MockLLM'}")
print(f"  Task complexity:     PE Due Diligence (8 scenarios, 5-12 tool calls each)")
print(f"  Standard accuracy:   {std_pct}%")
print(f"  Autonomous accuracy: {auto_pct}%")
print(f"  Improvement:         {delta_pct:+d} pp")
print(f"  Tasks fixed:         {fixes}")
print(f"  Cost overhead:       {auto_calls-std_calls} extra LLM calls")

  PER-TASK COMPARISON: Standard vs Autonomous
  Task                                     |          Std     |         Auto     |     Expected | Note
  -----------------------------------------------------------------------------------------------
  DataForge WACC                           |    0.12108   V |    0.12108   V |      0.12108 | 
  CloudVault DCF Valuation                 |    10.0874   V |      9.886   V |        10.95 | 
  DataForge DCF Enterprise Value           | 48086188.9477   V | 48086188.9475   V |  48086188.95 | 
  Cross-Company EV/EBITDA Ranking          | 82000000.0   X |     6.9728   V |         6.97 | <-- FIXED
  SecureNet LBO Equity IRR                 |   0.268794   X |   0.392714   V |       0.3927 | <-- FIXED
  Merger EPS Accretion                     |     0.1463   V |     0.1463   V |       0.1463 | 
  Comparative Revenue-Multiple Valuation   |    27.8667   V |    27.8667   V |        27.87 | 
  Post-Acquisition Debt Service Coverage   |     0.0348   V |   
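
The dashboard output above is cut off after the per-task table. The headline KPIs can be recomputed by hand from the per-task lines printed in the two runs (times, LLM calls, and correctness copied from the logs). This standalone sketch reports the accuracy change in percentage points, since both inputs are already percentages; `kpis` is a hypothetical helper.

```python
# Per-task (seconds, LLM calls, correct) copied from the two runs logged above.
standard = [(10.8, 5, True), (46.5, 32, True), (50.9, 35, True), (10.4, 4, False),
            (12.7, 3, False), (15.8, 10, True), (13.9, 5, True), (13.1, 8, True)]
autonomous = [(11.8, 6, True), (46.1, 29, True), (34.6, 15, True), (12.4, 15, True),
              (20.8, 9, True), (14.5, 11, True), (23.3, 24, True), (12.9, 10, True)]


def kpis(rows):
    n = len(rows)
    correct = sum(ok for _, _, ok in rows)
    return {"accuracy_pct": 100 * correct // n,  # same integer percent as the dashboard
            "time_s": round(sum(t for t, _, _ in rows), 1),
            "calls": sum(c for _, c, _ in rows)}


std, auto = kpis(standard), kpis(autonomous)
fixes = sum(1 for s, a in zip(standard, autonomous) if not s[2] and a[2])
extra_calls = auto["calls"] - std["calls"]
print(std)   # {'accuracy_pct': 75, 'time_s': 174.1, 'calls': 102}
print(auto)  # {'accuracy_pct': 100, 'time_s': 176.4, 'calls': 119}
print(f"delta: {auto['accuracy_pct'] - std['accuracy_pct']:+d} pp, "
      f"{extra_calls:+d} LLM calls, {extra_calls / fixes:.1f} extra calls per fix")
# delta: +25 pp, +17 LLM calls, 8.5 extra calls per fix
```

The totals match the two `Summary:` lines printed by the runs (174.1s / 102 calls and 176.4s / 119 calls).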

## Key Takeaways

### Architecture

NucleusIQ's autonomous agent is a **thin orchestrator** over Standard mode — not a separate execution engine:

```
DIRECT     = Single LLM call (no tools)
STANDARD   = LLM + tools in a conversation loop
AUTONOMOUS = Standard + parallelism + validation + retry + progress tracking
```

### What Autonomous Mode Adds

| Capability | How it works |
|------------|-------------|
| **Parallel execution** | Independent sub-tasks run as isolated sub-agents simultaneously |
| **External validation** | `ResultValidatorPlugin` subclasses provide domain-specific checks |
| **Structured retry** | Failed validation → retry with error context in the conversation |
| **Progress tracking** | Every step recorded in `ExecutionProgress` |
| **Low overhead for simple tasks** | Tasks classified as SIMPLE run in Standard mode plus a validation pass |
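
In the Step 7 logs, tasks classified as COMPLEX are decomposed (`Decomposing into 3 sub-tasks (cap=5)`), and each sub-task runs in its own sub-agent in STANDARD mode. A minimal asyncio sketch of that fan-out follows. It assumes sub-agents are independent coroutines; `run_subagent` and `run_decomposed` are hypothetical names, and dropping sub-tasks beyond the cap is an assumption of this sketch, not documented framework behavior.

```python
import asyncio

MAX_SUBTASKS = 5  # mirrors "cap=5" in the decomposition log lines


async def run_subagent(name: str, subtask: str) -> str:
    """Hypothetical stand-in for an isolated Standard-mode sub-agent."""
    await asyncio.sleep(0.1)  # simulate LLM/tool latency
    return f"{name}: finished '{subtask}'"


async def run_decomposed(subtasks: list[str]) -> list[str]:
    """Fan independent sub-tasks out to separate sub-agents and gather results."""
    capped = subtasks[:MAX_SUBTASKS]  # this sketch simply drops extras beyond the cap
    jobs = [run_subagent(f"sub{i}", st) for i, st in enumerate(capped, start=1)]
    # gather runs the coroutines concurrently and returns results in input order
    return list(await asyncio.gather(*jobs))


# In Jupyter, use `await run_decomposed(...)` instead of asyncio.run.
results = asyncio.run(run_decomposed(["company A metrics", "company B metrics", "company C metrics"]))
for line in results:
    print(line)
```

Because `asyncio.gather` runs the three 0.1 s sleeps concurrently, the whole call takes roughly 0.1 s rather than 0.3 s, which is the point of running independent sub-tasks in parallel.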

### The Validation Principle

> **The framework orchestrates, the LLM executes, external signals validate.**

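The `PEFinancialValidator` above demonstrates this: a cheap regex-based range check caught an IRR error that an LLM self-verifier is likely to miss, because a verifier built on the same model tends to make the same mistake. For quantities with well-known plausible ranges, domain knowledge is a more reliable validation signal than LLM judgment. Not every fix in this run came from validation, though: the Cross-Company EV/EBITDA task was classified as COMPLEX and decomposed into 3 sub-tasks.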

### Build Your Own Validator

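```python
from nucleusiq.plugins.builtin.result_validator import ResultValidatorPlugin

# Agent, AgentConfig, and ExecutionMode: use the same imports as the agent setup earlier in this notebook.


class MyDomainValidator(ResultValidatorPlugin):
    async def validate_result(self, result, context):
        """Return (is_valid, reason); the reason becomes error context for the retry."""
        text = str(result) if result else ""
        # Replace this with your domain-specific checks.
        if not text.strip():
            return False, "Empty answer: no result was produced"
        return True, ""


agent = Agent(
    ...,  # the remaining constructor arguments, as in the agent setup earlier
    config=AgentConfig(execution_mode=ExecutionMode.AUTONOMOUS),
    plugins=[MyDomainValidator()],
)
```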
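
For example, the Standard run's Cross-Company EV/EBITDA answer was 82000000.0, a dollar figure rather than a multiple, and `PEFinancialValidator` has no EV/EBITDA rule to flag it. Below is a sketch of a validator for that case. The 0 to 100 band is an illustrative assumption, not a figure from this notebook, and the check reuses the last-number fallback from Pass 2. `EVEBITDAValidator` is hypothetical and written as a plain class so it runs without NucleusIQ; in the notebook it would subclass `ResultValidatorPlugin`.

```python
import asyncio
import re

# Assumed plausible band for an EV/EBITDA multiple (illustrative, not from this notebook).
EV_EBITDA_RANGE = (0.0, 100.0)
NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")


class EVEBITDAValidator:
    """Same validate_result contract as above, returning (is_valid, reason).

    In the notebook this would subclass ResultValidatorPlugin; it is a plain
    class here so the sketch runs without NucleusIQ installed.
    """

    async def validate_result(self, result, context):
        task = context.get("task_objective", "").lower()
        if "ev/ebitda" not in task:
            return True, ""  # only check tasks that ask for this multiple
        numbers = NUMBER.findall(str(result) if result else "")
        if not numbers:
            return True, ""
        value = float(numbers[-1])  # last number, like Pass 2 above
        lo, hi = EV_EBITDA_RANGE
        if not lo <= value <= hi:
            return False, (f"EV/EBITDA={value} is outside [{lo}, {hi}]; "
                           "the answer may be a dollar amount, not a multiple")
        return True, ""


validator = EVEBITDAValidator()
ctx = {"task_objective": "Rank the three companies by EV/EBITDA"}
# In Jupyter, use `await validator.validate_result(...)` instead of asyncio.run.
print(asyncio.run(validator.validate_result("Lowest EV/EBITDA: 82000000.0", ctx)))
print(asyncio.run(validator.validate_result("Lowest EV/EBITDA: 6.9728", ctx)))
```

The first call is rejected with a reason the agent can act on during a retry; the second (the Autonomous run's answer) passes.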

### Next Steps

- See `notebooks/agents/` for more agent examples
- See `docs/strategy/autonomous-mode-industry-analysis.md` for the architectural rationale
- Subclass `ResultValidatorPlugin` to add domain validation for your use case