# 001: Claim Extraction with Operation Graphs

Demonstrates sequential coordination using Operation Graphs to orchestrate ReaderTool workflows for academic claim extraction.

## What You'll Learn

1. **Sequential Coordination**: Building context step-by-step through dependent operations
2. **ReaderTool Integration**: Document chunking and progressive reading strategies
3. **Structured Extraction**: Using Pydantic models for reliable claim extraction
4. **Custom Operations**: Registering and using custom workflow patterns

## Use Case: Extracting Claims from Academic Papers

We'll extract verifiable claims from an arXiv paper (Mamba: Linear-Time Sequence Modeling) by:
- Opening the document with ReaderTool
- Analyzing the document structure progressively
- Extracting specific claims with structured output
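
Progressive reading amounts to walking the document in fixed-size character windows. The sketch below is a plain-Python illustration, not part of ReaderTool: `chunk_windows`, its `chunk_size` and `overlap` parameters, and the 500-character overlap are assumptions chosen for the example. It yields `(start_offset, end_offset)` pairs of the kind passed to the `read` action later in this notebook.

```python
def chunk_windows(total_length: int, chunk_size: int = 8000, overlap: int = 500):
    """Yield (start_offset, end_offset) pairs that cover [0, total_length).

    Hypothetical helper: consecutive windows share `overlap` characters so a
    claim that straddles a boundary appears whole in at least one window.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    start = 0
    while start < total_length:
        end = min(start + chunk_size, total_length)
        yield start, end
        if end == total_length:
            break
        # Step back by `overlap` so the next window re-reads the boundary region.
        start = end - overlap


windows = list(chunk_windows(20_000))
print(windows)
```

Each pair could then be fed to a separate `read` request, with the overlap tuned to the typical length of a sentence or paragraph in the source.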

In [None]:
# Setup and imports
from pathlib import Path
from typing import Literal

# ReaderTool from lionagi
from lionagi.tools.types import ReaderTool
from pydantic import BaseModel, Field

from lionpride import Builder, Session
from lionpride.operations import (
    GenerateParams,
)
from lionpride.services.types import iModel

# Target document - Mamba paper from arXiv
here = Path.cwd()
document_path = here / "data" / "mamba_paper.pdf"

print("Environment setup complete")
print(f"Target: {document_path.name}")
print("Goal: Extract verifiable claims using coordinated operations")

In [None]:
# Data models for structured responses


class Claim(BaseModel):
    """A single verifiable claim from the document."""

    claim: str = Field(..., description="The specific claim text")
    type: Literal["citation", "performance", "technical", "theoretical", "other"] = Field(
        ..., description="Category of claim"
    )
    location: str = Field(..., description="Section/page reference")
    verifiability: Literal["high", "medium", "low"] = Field(
        ..., description="How easily this claim can be verified"
    )
    search_strategy: str = Field(..., description="How to verify this claim")


class ClaimExtraction(BaseModel):
    """Collection of extracted claims."""

    claims: list[Claim] = Field(default_factory=list)


class DocumentOutline(BaseModel):
    """Document structure analysis."""

    title: str = Field(..., description="Document title")
    sections: list[str] = Field(default_factory=list, description="Main sections")
    key_topics: list[str] = Field(default_factory=list, description="Key topics covered")
    claim_density_sections: list[str] = Field(
        default_factory=list, description="Sections likely to contain verifiable claims"
    )


print("Data models defined")

In [None]:
# Initialize session with model and ReaderTool

# Create model
model = iModel(
    provider="openai",
    model="gpt-4o-mini",
    name="gpt4o-mini",
)

# Create session with default branch that has access to the model
session = Session(
    default_generate_model=model,
    default_branch="main",
)

# Initialize ReaderTool
reader = ReaderTool()

print(f"Session: {session}")
print(f"Default branch: {session.default_branch}")
print(f"Operations: {session.operations.list_names()}")

## Pattern 1: Direct Operations via Session.conduct()

The simplest way to run an operation is to call it directly through the session with `session.conduct()`.

In [None]:
# Step 1: Open document with ReaderTool
doc_info = reader.handle_request({"action": "open", "path_or_url": str(document_path)})

print(f"Document opened: {doc_info.doc_info.doc_id}")
print(f"Total length: {doc_info.doc_info.length:,} characters")
print(f"Estimated tokens: {doc_info.doc_info.num_tokens:,}")

In [None]:
# Step 2: Read first chunk to understand structure
first_chunk = reader.handle_request(
    {
        "action": "read",
        "doc_id": doc_info.doc_info.doc_id,
        "start_offset": 0,
        "end_offset": 8000,  # ~2000 tokens, assuming roughly 4 characters per token
    }
)

print(f"Read {len(first_chunk.chunk.content):,} characters")
print("\n--- First 500 chars ---")
print(first_chunk.chunk.content[:500])

In [None]:
# Step 3: Use the communicate operation to analyze structure
from lionpride.types import Operable, Spec

# Build an Operable whose specs mirror the fields of the DocumentOutline model
outline_operable = Operable(
    specs=(
        Spec(name="title", base_type=str),
        Spec(name="sections", base_type=str, listable=True),
        Spec(name="key_topics", base_type=str, listable=True),
        Spec(name="claim_density_sections", base_type=str, listable=True),
    ),
    name="DocumentOutline",
)

# Run communicate operation
op = await session.conduct(
    "communicate",
    generate=GenerateParams(
        imodel="gpt4o-mini",
        instruction="Analyze this document excerpt and identify its structure.",
        context={"document_content": first_chunk.chunk.content},
    ),
    operable=outline_operable,
    capabilities={"title", "sections", "key_topics", "claim_density_sections"},
)

print(f"Operation status: {op.status}")
print("\nDocument outline:")
print(op.response)

## Pattern 2: Operation Graphs with Builder

For multi-step workflows with dependencies, use Builder to construct operation graphs.
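
Conceptually, a dependency graph is executed in topological order, with each step receiving the results of the steps it depends on. The following is a minimal standard-library sketch of that idea, not lionpride's implementation: `run_graph` and the two stub steps are hypothetical, and real operations would call a model instead of returning fixed strings. It uses `asyncio.run()` so it works as a script; inside a notebook you would `await run_graph(...)` directly.

```python
import asyncio
from graphlib import TopologicalSorter


async def run_graph(nodes, deps):
    """Run async steps in dependency order, passing upstream results forward.

    nodes: {name: async callable taking a dict of upstream results}
    deps:  {name: [names it depends on]}
    """
    results = {}
    # TopologicalSorter takes {node: predecessors} and emits predecessors first.
    order = TopologicalSorter({name: deps.get(name, []) for name in nodes}).static_order()
    for name in order:
        upstream = {d: results[d] for d in deps.get(name, [])}
        results[name] = await nodes[name](upstream)
    return results


async def analyze_structure(upstream):
    # Stub: a real step would ask the model to outline the document.
    return "sections: abstract, method, experiments"


async def extract_claims(upstream):
    # Stub: uses the outline produced by the step it depends on.
    return f"claims drawn from [{upstream['analyze_structure']}]"


results = asyncio.run(
    run_graph(
        {"extract_claims": extract_claims, "analyze_structure": analyze_structure},
        {"extract_claims": ["analyze_structure"]},
    )
)
print(list(results))
```

Although `extract_claims` is listed first, it runs second because of its declared dependency, which is the same role `depends_on` plays in the Builder calls in this notebook.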

In [None]:
async def claim_extraction_workflow():
    """Sequential workflow: analyze structure -> extract claims."""

    builder = Builder()

    # Step 1: Analyze document structure
    builder.add(
        name="analyze_structure",
        operation="communicate",
        parameters={
            "generate": {
                "imodel": "gpt4o-mini",
                "instruction": (
                    "Analyze this academic paper excerpt. Identify the main sections "
                    "and which sections are most likely to contain verifiable claims "
                    "(performance benchmarks, citations, technical specifications)."
                ),
                "context": {"document_content": first_chunk.chunk.content},
            },
            "return_as": "text",
        },
    )

    # Step 2: Extract claims (depends on structure analysis)
    builder.add(
        name="extract_claims",
        operation="communicate",
        parameters={
            "generate": {
                "imodel": "gpt4o-mini",
                "instruction": (
                    "Based on the document structure analysis, extract 5-7 specific, "
                    "verifiable claims from this paper. Focus on:\n"
                    "- Performance benchmarks and metrics\n"
                    "- Technical specifications\n"
                    "- Citations that can be verified\n"
                    "- Theoretical claims with mathematical basis"
                ),
                "context": {"document_content": first_chunk.chunk.content},
            },
            "return_as": "text",
        },
        depends_on=["analyze_structure"],
        inherit_context=True,
    )

    # Build and execute graph
    graph = builder.build()
    print(f"Graph: {len(graph.nodes)} operations, {len(graph.edges)} dependencies")

    results = await session.flow(graph, verbose=True)
    return results


# Execute workflow
results = await claim_extraction_workflow()

print("\n" + "=" * 60)
print("WORKFLOW RESULTS")
print("=" * 60)

for name, result in results.items():
    print(f"\n--- {name} ---")
    print(result[:1000] if isinstance(result, str) else result)

## Pattern 3: Custom Operations for Nested DAGs

Register custom operations that can contain their own nested workflows.
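
Registering an operation can be pictured as adding an async function to a name-to-function table that the session consults when asked to run something. The sketch below assumes exactly that and nothing more: `OperationRegistry` is a hypothetical stand-in, not lionpride's class, and `count_words` is a toy operation that follows the `(session, branch, params)` signature used in this notebook.

```python
import asyncio


class OperationRegistry:
    """Hypothetical stand-in for a session's operation table."""

    def __init__(self):
        self._ops = {}

    def register(self, name, func):
        if name in self._ops:
            raise ValueError(f"operation {name!r} is already registered")
        self._ops[name] = func

    def list_names(self):
        return sorted(self._ops)

    async def conduct(self, name, **params):
        # Look up the operation by name and call it with the shared signature.
        # The branch argument is left as None in this toy version.
        return await self._ops[name](self, None, params)


async def count_words(session, branch, params):
    """Toy operation: count whitespace-separated words in params['text']."""
    return {"words": len(params.get("text", "").split())}


registry = OperationRegistry()
registry.register("count_words", count_words)
result = asyncio.run(
    registry.conduct("count_words", text="state space models scale linearly")
)
print(registry.list_names(), result)
```

Because every operation shares one signature, the caller needs only a name and a set of parameters, which is what lets custom operations sit next to built-in ones.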

In [None]:
# Define a custom operation that contains a nested DAG


async def deep_claim_analysis(session, branch, params):
    """Custom operation with nested DAG for deep claim analysis.

    This demonstrates arbitrary nested workflow patterns:
    1. Outer operation receives parameters
    2. Builds inner DAG based on context
    3. Executes inner flow
    4. Aggregates results
    """
    from lionpride import Builder

    document_content = params.get("document_content", "")
    # num_passes is read but not used: the inner DAG below is fixed at three passes.
    # A more complex workflow could use it to control iteration depth.
    _ = params.get("num_passes", 2)

    # Build inner DAG with multiple analysis passes
    inner_builder = Builder()

    # First pass: broad analysis
    inner_builder.add(
        name="broad_scan",
        operation="communicate",
        parameters={
            "generate": {
                "imodel": "gpt4o-mini",
                "instruction": "Identify all potential claims in this text. Be exhaustive.",
                "context": {"text": document_content[:4000]},
            },
            "return_as": "text",
        },
    )

    # Second pass: validation (depends on first)
    inner_builder.add(
        name="validate_claims",
        operation="communicate",
        parameters={
            "generate": {
                "imodel": "gpt4o-mini",
                "instruction": (
                    "Review these potential claims. For each, assess:\n"
                    "1. Is it actually a claim (not just description)?\n"
                    "2. Is it verifiable?\n"
                    "3. What evidence would validate/refute it?"
                ),
            },
            "return_as": "text",
        },
        depends_on=["broad_scan"],
        inherit_context=True,
    )

    # Third pass: synthesis (depends on validation)
    inner_builder.add(
        name="synthesize",
        operation="communicate",
        parameters={
            "generate": {
                "imodel": "gpt4o-mini",
                "instruction": (
                    "Synthesize the validated claims into a final list. "
                    "Include only high-confidence, verifiable claims."
                ),
            },
            "return_as": "text",
        },
        depends_on=["validate_claims"],
        inherit_context=True,
    )

    # Execute inner DAG
    inner_graph = inner_builder.build()
    inner_results = await session.flow(inner_graph, branch, verbose=True)

    # Return aggregated results
    return {
        "passes": list(inner_results.keys()),
        "final_claims": inner_results.get("synthesize", ""),
        "validation_notes": inner_results.get("validate_claims", ""),
    }


# Register custom operation
session.register_operation("deep_claim_analysis", deep_claim_analysis)
print(f"Registered operations: {session.operations.list_names()}")

In [None]:
# Use the custom operation
op = await session.conduct(
    "deep_claim_analysis",
    document_content=first_chunk.chunk.content,
    num_passes=3,
)

print(f"\nCustom operation status: {op.status}")
print(f"\nPasses executed: {op.response.get('passes', [])}")
print("\n--- Final Claims ---")
print(op.response.get("final_claims", "")[:2000])

## Pattern 4: Nested DAG in Builder Graph

Custom operations can be used inside Builder graphs, enabling arbitrary nesting.

In [None]:
async def nested_dag_workflow():
    """Outer graph that uses custom operations containing inner graphs."""

    builder = Builder()

    # Step 1: Standard operation
    builder.add(
        name="prepare_context",
        operation="communicate",
        parameters={
            "generate": {
                "imodel": "gpt4o-mini",
                "instruction": "Summarize the key themes in this document for claim extraction.",
                "context": {"document_content": first_chunk.chunk.content[:2000]},
            },
            "return_as": "text",
        },
    )

    # Step 2: Custom operation with nested DAG
    builder.add(
        name="deep_analysis",
        operation="deep_claim_analysis",  # Our custom operation!
        parameters={
            "document_content": first_chunk.chunk.content,
            "num_passes": 3,
        },
        depends_on=["prepare_context"],
        inherit_context=True,
    )

    # Step 3: Final aggregation
    builder.add(
        name="final_report",
        operation="communicate",
        parameters={
            "generate": {
                "imodel": "gpt4o-mini",
                "instruction": (
                    "Create a final report summarizing the extracted claims. "
                    "Format as a structured list with verification strategies."
                ),
            },
            "return_as": "text",
        },
        depends_on=["deep_analysis"],
        inherit_context=True,
    )

    # Execute
    graph = builder.build()
    print(f"Outer graph: {len(graph.nodes)} operations")
    print("Note: 'deep_analysis' contains its own inner DAG with 3 operations!")

    results = await session.flow(graph, verbose=True)
    return results


# Execute nested workflow
nested_results = await nested_dag_workflow()

print("\n" + "=" * 60)
print("NESTED DAG RESULTS")
print("=" * 60)
print(f"\nOperations completed: {list(nested_results.keys())}")
print("\n--- Final Report ---")
print(nested_results.get("final_report", "")[:2000])

## Summary

This cookbook demonstrated:

1. **Direct Operations**: `session.conduct()` for simple workflows
2. **Operation Graphs**: `Builder` + `session.flow()` for multi-step DAGs
3. **Custom Operations**: `session.register_operation()` for arbitrary patterns
4. **Nested DAGs**: Custom operations containing inner graphs

The key insight is that lionpride's operation system is **compositional**:
- Operations are just async functions `(session, branch, params) -> result`
- Custom operations can call `session.flow()` with inner graphs
- This enables arbitrary nesting depth for complex workflows
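
The compositional idea can be shown without any framework. In this sketch, `run_chain` is a hypothetical helper that stands in for a flow, and every operation is a toy function with the `(session, branch, params)` signature; none of it is lionpride code. `deep_analysis` runs its own inner chain, yet the outer chain sees it as a single step.

```python
import asyncio


async def run_chain(steps, params):
    """Run (name, operation) pairs in order; each sees the results so far."""
    results = {}
    for name, op in steps:
        results[name] = await op(None, None, {**params, "previous": dict(results)})
    return results


async def broad_scan(session, branch, params):
    # Split the text into sentences as candidate claims.
    return [s.strip() for s in params["text"].split(".") if s.strip()]


async def validate(session, branch, params):
    # Keep sentences containing a digit, a toy proxy for "verifiable".
    return [s for s in params["previous"]["broad_scan"] if any(c.isdigit() for c in s)]


async def deep_analysis(session, branch, params):
    # A custom operation whose body is itself a chain of operations.
    inner = await run_chain([("broad_scan", broad_scan), ("validate", validate)], params)
    return inner["validate"]


async def report(session, branch, params):
    return f"{len(params['previous']['deep_analysis'])} verifiable claim(s)"


text = "The model is fast. It reaches 5x higher throughput. Results are promising."
outer = asyncio.run(
    run_chain([("deep_analysis", deep_analysis), ("report", report)], {"text": text})
)
print(list(outer), outer["report"])
```

The inner step names never appear in the outer results, so an inner chain can itself contain operations that run further chains.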