# Accrue v0.4 Demo

This notebook demonstrates the Accrue pipeline across **Phases 2, 3, and 4**.

**Phase 2 — Resilience & API redesign:**
- `Pipeline.run(df)` returns `PipelineResult` with `.data`, `.cost`, `.errors`, `.success_rate`
- Per-step cost aggregation (`CostSummary` with token counts per step)
- Per-row error handling (`on_error="continue"` / `"raise"`, `RowError`)
- `EnrichmentConfig` presets (`for_development()`, `for_production()`, `for_server()`)
- Provider flexibility via `base_url` shortcut and `LLMClient` protocol
- tqdm progress bar per step
- Two-layer retry: API errors (429/500) with backoff + parse errors fed back to the LLM

**Phase 3 — Field spec & dynamic prompts:**
- 7-key field spec: `prompt`, `type`, `format`, `enum`, `examples`, `bad_examples`, `default`
- Dynamic prompt builder (markdown headers + XML data boundaries)
- Default enforcement — refusals replaced with field `default` in Python
- FieldSpec validation — unknown keys rejected at construction time

**Phase 4 — Caching & `list[dict]` input:**
- SQLite-backed input-hash cache (`enable_caching=True`) — skip redundant API calls
- Per-step cache stats: `cache_hits`, `cache_misses`, `cache_hit_rate`
- `pipeline.clear_cache()` for full or per-step invalidation
- `list[dict]` input: `pipeline.run([{...}])` returns `list[dict]` (output matches input type)
- `FunctionStep(..., cache=False)` to disable caching for non-deterministic steps

> **Note:** Jupyter runs its own event loop, so we use `await pipeline.run_async(df)`.
> In scripts, use `pipeline.run(df)` (sync wrapper) instead.

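The reason for the two entry points can be sketched with plain `asyncio`. This is a minimal illustration, not Accrue's implementation: `run_async` and `run` below are hypothetical stand-ins, and this notebook does not show how Accrue's own `run()` behaves inside a running loop.

```python
import asyncio


async def run_async(rows):
    """Stand-in for an async pipeline run (hypothetical, not Accrue code)."""
    await asyncio.sleep(0)  # pretend to await network I/O
    return [{**row, "done": True} for row in rows]


def run(rows):
    """Sync wrapper: starts its own event loop, so it must not be called from one."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running (plain script): safe to start one.
        return asyncio.run(run_async(rows))
    raise RuntimeError("An event loop is already running; use `await run_async(rows)`.")
```

In a notebook cell, `await run_async(rows)` works because Jupyter's loop is already running, and `asyncio.run()` refuses to start a second loop in the same thread.
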
In [None]:
import pandas as pd
from accrue import Pipeline, LLMStep, FunctionStep, EnrichmentConfig, FieldSpec

## 1. Pipeline.run() → PipelineResult (Phase 2)

`Pipeline.run(df)` is the primary API. It returns a `PipelineResult` with:
- `.data` — enriched DataFrame
- `.cost` — `CostSummary` with per-step token usage
- `.errors` — list of `RowError` objects
- `.success_rate` — fraction of rows that succeeded
- `.has_errors` — quick boolean check
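
How `.errors`, `.has_errors`, and `.success_rate` relate to each other can be sketched with two small dataclasses. The class names and the exact formula are assumptions for illustration; Accrue's `PipelineResult` may compute these differently.

```python
from dataclasses import dataclass, field


@dataclass
class RowErrorSketch:
    row_index: int
    error: str


@dataclass
class ResultSketch:
    data: list  # one dict per input row
    errors: list = field(default_factory=list)

    @property
    def has_errors(self) -> bool:
        return bool(self.errors)

    @property
    def success_rate(self) -> float:
        if not self.data:
            return 1.0  # assumption: an empty input counts as fully successful
        # A row that failed in several steps is still counted once.
        failed_rows = {err.row_index for err in self.errors}
        return 1 - len(failed_rows) / len(self.data)


result_sketch = ResultSketch(
    data=[{"id": 0}, {"id": 1}, {"id": 2}, {"id": 3}],
    errors=[RowErrorSketch(1, "timeout")],
)
print(result_sketch.has_errors, result_sketch.success_rate)  # True 0.75
```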

The simplest case: 3 rows, 2 fields, one LLM call per row. Default model is `gpt-4.1-mini`.

In [None]:
df = pd.DataFrame({
    "company": ["Stripe", "Notion", "Figma"],
    "description": [
        "Online payment processing for internet businesses",
        "All-in-one workspace for notes, docs, and project management",
        "Collaborative interface design tool for teams",
    ],
})
df

In [None]:
pipeline = Pipeline([
    LLMStep("analyze", fields={
        "category": "Classify into one of: Fintech, Productivity, Design, Other",
        "target_market": "Describe the primary target market in 10 words or less",
    })
])

result = await pipeline.run_async(df)
result.data

In [None]:
# PipelineResult gives you everything in one object
print(f"Success rate: {result.success_rate:.0%}")
print(f"Has errors:   {result.has_errors}")
print(f"Total tokens: {result.cost.total_tokens}")
print(f"  Prompt:     {result.cost.total_prompt_tokens}")
print(f"  Completion: {result.cost.total_completion_tokens}")

print("\nPer-step breakdown:")
for step_name, usage in result.cost.steps.items():
    print(f"  {step_name}: {usage.total_tokens} tokens, {usage.rows_processed} rows, model={usage.model}")

## 2. Per-row error handling (Phase 2)

With `on_error="continue"` (the default), a failed row does not crash the pipeline.
Each failure is recorded as a `RowError`, the failed row's output fields are set to the sentinel `None`, and the remaining rows succeed normally.

With `on_error="raise"`, the pipeline fails fast on the first error.
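
The two modes can be sketched as a plain loop over rows. This is a simplified, synchronous illustration under assumed names (`run_rows`, dict-based error records); it is not how Accrue implements `on_error`.

```python
def run_rows(rows, fn, fields, on_error="continue"):
    """Apply fn to each row; collect failures or re-raise, depending on on_error."""
    if on_error not in ("continue", "raise"):
        raise ValueError(f"unsupported on_error: {on_error!r}")
    results, errors = [], []
    for index, row in enumerate(rows):
        try:
            results.append({**row, **fn(row)})
        except Exception as exc:
            if on_error == "raise":
                raise  # fail fast: propagate the original exception
            errors.append(
                {"row_index": index, "error_type": type(exc).__name__, "error": str(exc)}
            )
            # Keep the row, with None sentinels for the fields the step would have set.
            results.append({**row, **dict.fromkeys(fields)})
    return results, errors


def flaky(row):
    if row["company"] == "Notion":
        raise ConnectionError("API timeout for Notion")
    return {"status": f"{row['company']} found"}


rows = [{"company": "Stripe"}, {"company": "Notion"}, {"company": "Figma"}]
results, errors = run_rows(rows, flaky, fields=["status"])
print([r["status"] for r in results])  # ['Stripe found', None, 'Figma found']
print(errors[0]["row_index"], errors[0]["error_type"])  # 1 ConnectionError
```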

In [None]:
def flaky_lookup(ctx):
    """Simulates an API that fails for unknown companies."""
    company = ctx.row["company"]
    if company == "Notion":
        raise ConnectionError(f"API timeout for {company}")
    return {"status": f"{company} found"}


pipeline_err = Pipeline([
    FunctionStep("lookup", fn=flaky_lookup, fields=["status"]),
])

config = EnrichmentConfig(enable_progress_bar=False, on_error="continue")
result = await pipeline_err.run_async(df, config)

print(f"Success rate: {result.success_rate:.0%}")
print(f"Errors: {len(result.errors)}\n")

for err in result.errors:
    print(f"  Row {err.row_index} ({df.iloc[err.row_index]['company']}): "
          f"{err.error_type} \u2014 {err.error}")

print("\nData (failed rows get None sentinels):")
result.data[["company", "status"]]

In [None]:
# on_error="raise" fails fast on the first error
from accrue.core.exceptions import RowError

config_raise = EnrichmentConfig(enable_progress_bar=False, on_error="raise")
try:
    await pipeline_err.run_async(df, config_raise)
except ConnectionError as e:
    print(f"Pipeline stopped immediately: {e}")

## 3. EnrichmentConfig presets (Phase 2)

Three built-in presets cover common scenarios. Each tunes concurrency, logging, caching, and checkpointing.

In [None]:
from dataclasses import asdict

# Compute the baseline once; every preset is compared against it
defaults = asdict(EnrichmentConfig())

for name, preset in [
    ("for_development()", EnrichmentConfig.for_development()),
    ("for_production()",  EnrichmentConfig.for_production()),
    ("for_server()",      EnrichmentConfig.for_server()),
]:
    print(f"{name}:")
    # Show only the fields that differ from defaults
    diff = {k: v for k, v in asdict(preset).items() if v != defaults[k]}
    for k, v in diff.items():
        print(f"  {k} = {v!r}")
    print()

## 4. Provider flexibility (Phase 2)

LLMStep uses the `LLMClient` protocol. OpenAI is the default (zero config).
The `base_url` shortcut works with any OpenAI-compatible provider.
Anthropic and Google ship as optional extras.

```python
# OpenAI-compatible providers (Ollama, Groq, DeepSeek, etc.)
LLMStep("analyze", fields={...}, model="llama3", base_url="http://localhost:11434/v1")

# Anthropic: pip install accrue[anthropic]
from accrue.providers import AnthropicClient
LLMStep("analyze", fields={...}, model="claude-sonnet-4-5-20250929", client=AnthropicClient())

# Google: pip install accrue[google]
from accrue.providers import GoogleClient
LLMStep("analyze", fields={...}, model="gemini-2.5-flash", client=GoogleClient())

# Any provider — implement the ~30-line LLMClient protocol
LLMStep("analyze", fields={...}, client=MyCustomClient())
```
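
What "implement the protocol" means in practice can be sketched with `typing.Protocol`. The method name `complete` and its signature are assumptions made for illustration only; the notebook does not show the real `LLMClient` methods, so check the Accrue source before writing a custom client.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ChatClientSketch(Protocol):
    """Hypothetical interface; Accrue's real `LLMClient` may look different."""

    async def complete(self, messages: list, model: str) -> str: ...


class CannedClient:
    """Offline client that satisfies the protocol structurally (no inheritance)."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    async def complete(self, messages: list, model: str) -> str:
        # A real client would call a provider SDK here.
        return self.reply


client = CannedClient('{"category": "Fintech"}')
# runtime_checkable only verifies that the method exists, not its signature.
print(isinstance(client, ChatClientSketch))  # True
```

Because the check is structural, any object with a matching method qualifies, which is what lets one step class work with several providers. In a notebook cell you would call it with `await client.complete(messages, model="demo")`.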

## 5. Multi-step pipeline with cost tracking (Phase 2)

A `FunctionStep` generates context, and an `LLMStep` consumes it via `depends_on`.
Internal fields (names prefixed with `__`) are filtered from the output, and cost is tracked across all steps.
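
The dependency check and the `__` filtering can be sketched for a single row as follows. This is an assumption-laden simplification: `run_steps` is a hypothetical helper that requires steps to be listed in a valid order, whereas the real pipeline may resolve the order itself.

```python
def run_steps(row, steps):
    """Run (name, fn, depends_on) steps in list order, merging their outputs."""
    merged = dict(row)
    finished = set()
    for name, fn, depends_on in steps:
        missing = [dep for dep in depends_on if dep not in finished]
        if missing:
            raise ValueError(f"step {name!r} needs {missing} to run first")
        merged.update(fn(merged))  # later steps see earlier outputs
        finished.add(name)
    # Drop internal fields (double-underscore prefix) from the final output.
    return {key: value for key, value in merged.items() if not key.startswith("__")}


steps = [
    ("lookup", lambda r: {"__context": f"{r['company']} facts"}, []),
    ("synthesize", lambda r: {"summary": r["__context"].upper()}, ["lookup"]),
]
print(run_steps({"company": "Stripe"}, steps))
# {'company': 'Stripe', 'summary': 'STRIPE FACTS'}
```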

In [None]:
def generate_context(ctx):
    """Simulate an API lookup."""
    fake_data = {
        "Stripe": "Founded 2010. $95B valuation. 8000+ employees. Competes with Adyen, Square.",
        "Notion": "Founded 2013. $10B valuation. 500+ employees. Competes with Confluence, Coda.",
        "Figma": "Founded 2012. Acquired by Adobe for $20B (cancelled). Competes with Sketch, Canva.",
    }
    company = ctx.row["company"]
    return {"__context": fake_data.get(company, "No data available")}


pipeline = Pipeline([
    FunctionStep("lookup", fn=generate_context, fields=["__context"]),
    LLMStep("synthesize", fields={
        "competitive_position": {
            "prompt": "Rate the company's competitive position based on the context",
            "enum": ["Leader", "Challenger", "Niche"],
        },
        "investment_thesis": "One-sentence investment thesis using context and description",
    }, depends_on=["lookup"]),
])

result = await pipeline.run_async(df)

print("Columns:", list(result.data.columns))
assert "__context" not in result.data.columns  # internal fields filtered

print(f"\nCost across {len(result.cost.steps)} steps:")
for step_name, usage in result.cost.steps.items():
    print(f"  {step_name}: {usage.total_tokens} tokens, {usage.rows_processed} rows")

result.data[["company", "competitive_position", "investment_thesis"]]

---

## 6. Full 7-key field spec (Phase 3)

Each field can use up to 7 keys:
- `prompt` (required) — the extraction instruction
- `type` — String, Number, Boolean, Date, List[String], JSON
- `format` — output format pattern
- `enum` — constrained value list (LLM MUST pick one)
- `examples` — good output examples
- `bad_examples` — anti-patterns to avoid
- `default` — fallback when data is insufficient (enforced in Python, not by the LLM)
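
To make the "markdown headers + XML data boundaries" idea concrete, here is a rough sketch of how a spec could be rendered into a prompt. The layout, wording, and function name `build_prompt` are invented for illustration and will not match the text Accrue actually generates.

```python
from xml.sax.saxutils import escape


def build_prompt(field_specs, row):
    """Render field specs as markdown sections and the row as an XML block."""
    lines = ["# Fields to extract"]
    for name, spec in field_specs.items():
        if isinstance(spec, str):  # shorthand: a bare string is just the prompt
            spec = {"prompt": spec}
        lines += ["", f"## {name}", spec["prompt"]]
        if "type" in spec:
            lines.append(f"- Type: {spec['type']}")
        if "format" in spec:
            lines.append(f"- Format: {spec['format']}")
        if "enum" in spec:
            lines.append("- Allowed values: " + ", ".join(map(str, spec["enum"])))
        if "examples" in spec:
            lines.append("- Good examples: " + ", ".join(map(str, spec["examples"])))
        if "bad_examples" in spec:
            lines.append("- Avoid: " + ", ".join(map(str, spec["bad_examples"])))
        # `default` is deliberately not rendered: it is applied in Python afterwards.
    lines += ["", "# Row data", "<row>"]
    for key, value in row.items():
        # Escaping keeps row text from being read as markup or instructions.
        lines.append(f"  <{key}>{escape(str(value))}</{key}>")
    lines.append("</row>")
    return "\n".join(lines)


demo_prompt = build_prompt(
    {"sector": {"prompt": "Classify the sector", "enum": ["Fintech", "Other"], "default": "Other"}},
    {"company": "A & B"},
)
print(demo_prompt)
```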

In [None]:
pipeline = Pipeline([
    LLMStep("enrich", fields={
        "sector": {
            "prompt": "Classify the company's primary sector",
            "enum": ["Fintech", "Productivity", "Design", "Infrastructure", "Other"],
            "examples": ["Fintech", "Productivity"],
            "bad_examples": ["Tech company", "Software"],
            "default": "Other",
        },
        "employee_count": {
            "prompt": "Estimate the number of employees",
            "type": "Number",
            "format": "integer",
            "default": 0,
        },
        "growth_stage": {
            "prompt": "Classify the company's growth stage based on its description and market position",
            "enum": ["Seed", "Growth", "Mature", "Decline"],
            # Examples must be valid enum values, since the LLM has to pick one
            "examples": ["Growth", "Mature"],
            "default": "Unknown",
        },
    })
])

result = await pipeline.run_async(df)
result.data[["company", "sector", "employee_count", "growth_stage"]]

In [None]:
# Inspect the generated system prompt to see the dynamic builder in action
from accrue.steps.prompt_builder import build_system_message

step = pipeline.get_step("enrich")
sample_prompt = build_system_message(
    field_specs=step._field_specs,
    row={"company": "Stripe", "description": "Online payment processing"},
)
print(sample_prompt)

## 7. Default enforcement (Phase 3)

When the LLM cannot determine a value, it tends to return refusal text such as "Unable to determine".
Default enforcement catches this in Python and replaces it with the field's `default`.

We use a made-up company with no description to trigger refusals.
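
The replacement step can be sketched as a post-processing pass over the parsed LLM output. The marker list and the function name `enforce_defaults` are assumptions for illustration; the notebook does not show how Accrue detects a refusal.

```python
# Hypothetical refusal phrases; the real detection rules may differ.
REFUSAL_MARKERS = ("unable to determine", "cannot determine", "not enough information")


def enforce_defaults(values, field_specs):
    """Replace refusal-like or empty values with the field's default, if it has one."""
    cleaned = {}
    for name, value in values.items():
        spec = field_specs.get(name, {})
        is_refusal = value is None
        if isinstance(value, str):
            text = value.strip().lower()
            is_refusal = text == "" or text.startswith(REFUSAL_MARKERS)
        cleaned[name] = spec["default"] if is_refusal and "default" in spec else value
    return cleaned


specs = {"sector": {"default": "Unknown"}, "founded_year": {"default": "N/A"}}
raw = {"sector": "Unable to determine from the given data", "founded_year": "2010"}
print(enforce_defaults(raw, specs))
# {'sector': 'Unknown', 'founded_year': '2010'}
```

Doing this in Python rather than in the prompt means the fallback value is exact and does not depend on the model following instructions.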

In [None]:
df_unknown = pd.DataFrame({
    "company": ["Stripe", "Xylophonica Dynamics Ltd"],
    "description": [
        "Online payment processing for internet businesses",
        "",
    ],
})

pipeline = Pipeline([
    LLMStep("analyze", fields={
        "sector": {
            "prompt": "What sector does this company operate in?",
            "enum": ["Fintech", "SaaS", "Hardware", "Other"],
            "default": "Unknown",
        },
        "founded_year": {
            "prompt": "What year was this company founded?",
            "type": "String",
            "format": "YYYY",
            "default": "N/A",
        },
    })
])

result = await pipeline.run_async(df_unknown)
result.data[["company", "sector", "founded_year"]]

## 8. FieldSpec validation (Phase 3)

Unknown keys are rejected when the `LLMStep` is constructed, before any row is processed.
This catches typos and legacy keys like `instructions` immediately.
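
The same fail-early idea can be sketched without pydantic, using the seven keys listed earlier. `validate_field_spec` is a hypothetical helper that raises a plain `ValueError`; Accrue itself raises pydantic's `ValidationError`, as the cell below shows.

```python
ALLOWED_KEYS = frozenset(
    {"prompt", "type", "format", "enum", "examples", "bad_examples", "default"}
)


def validate_field_spec(name, spec):
    """Return a normalized spec dict, or raise before any row is processed."""
    if isinstance(spec, str):  # shorthand: a bare string is just the prompt
        return {"prompt": spec}
    unknown = sorted(set(spec) - ALLOWED_KEYS)
    if unknown:
        raise ValueError(f"field {name!r}: unknown keys {unknown}")
    if "prompt" not in spec:
        raise ValueError(f"field {name!r}: 'prompt' is required")
    return dict(spec)


print(validate_field_spec("f1", "Classify the company"))  # {'prompt': 'Classify the company'}
try:
    validate_field_spec("f1", {"prompt": "test", "instructions": "extra guidance"})
except ValueError as exc:
    print(exc)  # field 'f1': unknown keys ['instructions']
```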

In [None]:
from pydantic import ValidationError

# This SHOULD fail — "instructions" is not a valid key (use "prompt" instead)
try:
    LLMStep("bad", fields={
        "f1": {"prompt": "test", "instructions": "extra guidance"},
    })
except ValidationError as e:
    print("Caught validation error (as expected):")
    print(e.errors()[0]["type"], "\u2014", e.errors()[0]["msg"])

---

## 9. SQLite-backed caching (Phase 4)

With `enable_caching=True`, Accrue stores step results in a local SQLite database (`.accrue/cache.db`).
The cache key is a SHA-256 hash of the step's full input: row data, prior results, field specs, model, and temperature.
Changing any of these auto-invalidates the cache — no manual flushing needed.

On the **first run**, all rows are cache misses (the step executes normally).
On the **second run** with the same inputs, all rows are cache hits (zero API calls).

Cache stats appear on `StepUsage`: `cache_hits`, `cache_misses`, and `cache_hit_rate`.
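
An input-hash cache of this kind can be sketched with `hashlib`, `json`, and `sqlite3` from the standard library. The table layout, the JSON serialization, and the names `cache_key` and `StepCacheSketch` are assumptions; only the list of hashed inputs comes from the description above.

```python
import hashlib
import json
import sqlite3


def cache_key(row, prior_results, field_specs, model, temperature):
    """Hash every input that can change a step's output into one hex digest."""
    payload = json.dumps(
        {"row": row, "prior_results": prior_results, "field_specs": field_specs,
         "model": model, "temperature": temperature},
        sort_keys=True,  # canonical key order, so equal inputs hash equally
        default=str,     # fall back to str() for values JSON cannot encode
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class StepCacheSketch:
    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "step TEXT NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL, "
            "PRIMARY KEY (step, key))"
        )
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, step, key, compute):
        found = self.conn.execute(
            "SELECT value FROM cache WHERE step = ? AND key = ?", (step, key)
        ).fetchone()
        if found is not None:
            self.hits += 1
            return json.loads(found[0])
        self.misses += 1
        value = compute()  # only runs on a miss
        self.conn.execute(
            "INSERT INTO cache (step, key, value) VALUES (?, ?, ?)",
            (step, key, json.dumps(value)),
        )
        self.conn.commit()
        return value

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = StepCacheSketch()
key = cache_key({"company": "Stripe"}, {}, {"info": {"prompt": "Look up"}}, "demo-model", 0.0)
first = cache.get_or_compute("lookup", key, lambda: {"info": "Stripe: looked up"})
second = cache.get_or_compute("lookup", key, lambda: {"info": "never computed"})
print(first == second, cache.hits, cache.misses, f"{cache.hit_rate:.0%}")  # True 1 1 50%
```

Because the model name and field specs are part of the hashed payload, changing either one yields a different key and therefore a miss, which is the auto-invalidation behavior described above.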

In [None]:
import tempfile

# Use a temp directory so the demo doesn't pollute the working directory
cache_dir = tempfile.mkdtemp()

call_count = 0

def counted_lookup(ctx):
    """A FunctionStep that counts how many times it's actually called."""
    global call_count
    call_count += 1
    company = ctx.row["company"]
    return {"info": f"{company}: looked up"}


pipeline_cached = Pipeline([
    FunctionStep("lookup", fn=counted_lookup, fields=["info"]),
])

config_cache = EnrichmentConfig(
    enable_caching=True,
    cache_dir=cache_dir,
    enable_progress_bar=False,
)

# --- First run: all cache misses ---
call_count = 0
result1 = await pipeline_cached.run_async(df, config_cache)

usage1 = result1.cost.steps["lookup"]
print("=== First run ===")
print(f"  Function called:  {call_count} times")
print(f"  cache_hits:       {usage1.cache_hits}")
print(f"  cache_misses:     {usage1.cache_misses}")
print(f"  cache_hit_rate:   {usage1.cache_hit_rate:.0%}")

# --- Second run: all cache hits (zero function calls) ---
call_count = 0
result2 = await pipeline_cached.run_async(df, config_cache)

usage2 = result2.cost.steps["lookup"]
print("\n=== Second run (cached) ===")
print(f"  Function called:  {call_count} times")
print(f"  cache_hits:       {usage2.cache_hits}")
print(f"  cache_misses:     {usage2.cache_misses}")
print(f"  cache_hit_rate:   {usage2.cache_hit_rate:.0%}")

# Data is identical
assert result1.data.equals(result2.data)
print("\nData matches between runs ✓")
result2.data[["company", "info"]]

## 10. Cache invalidation and `clear_cache()` (Phase 4)

`pipeline.clear_cache()` wipes all cached entries. `pipeline.clear_cache(step="name")` wipes only one step.

Caching also auto-invalidates when inputs change — if you modify a row's data or change a field spec,
the cache key changes and the step re-executes for that row.
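
Full versus per-step clearing maps naturally onto a SQL `DELETE` with an optional filter. The sketch below assumes a hypothetical `cache` table with a `step` column; Accrue's actual schema is not shown in this notebook.

```python
import sqlite3


def clear_cache_sketch(conn, step=None):
    """Delete all entries, or only one step's entries; return how many were removed."""
    where, params = ("", ()) if step is None else (" WHERE step = ?", (step,))
    # Count first so the return value does not depend on driver rowcount behavior.
    deleted = conn.execute(f"SELECT COUNT(*) FROM cache{where}", params).fetchone()[0]
    conn.execute(f"DELETE FROM cache{where}", params)
    conn.commit()
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (step TEXT, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO cache VALUES (?, ?, ?)",
    [("lookup", "k1", "{}"), ("lookup", "k2", "{}"), ("analyze", "k3", "{}")],
)
removed_lookup = clear_cache_sketch(conn, step="lookup")
removed_rest = clear_cache_sketch(conn)
print(removed_lookup, removed_rest)  # 2 1
```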

In [None]:
# clear_cache() returns the number of entries deleted
deleted = pipeline_cached.clear_cache(cache_dir=cache_dir)
print(f"Cleared {deleted} cache entries")

# After clearing, the next run is all cache misses again
call_count = 0
result3 = await pipeline_cached.run_async(df, config_cache)
usage3 = result3.cost.steps["lookup"]
print(f"\nAfter clear — cache_misses: {usage3.cache_misses}, function called: {call_count} times")

## 11. `cache=False` per step (Phase 4)

Non-deterministic FunctionSteps (e.g. current time, random sampling) should skip caching.
Set `cache=False` on the step. FunctionSteps can also use `cache_version="v1"` — bumping
the version string invalidates all cached entries for that step.

In [None]:
import random

def random_score(ctx):
    """Non-deterministic — should NOT be cached."""
    return {"score": random.randint(1, 100)}


pipeline_nocache = Pipeline([
    FunctionStep("rand", fn=random_score, fields=["score"], cache=False),
])

# Even with caching enabled globally, this step always re-executes
r1 = await pipeline_nocache.run_async(df, config_cache)
r2 = await pipeline_nocache.run_async(df, config_cache)

print("Run 1 scores:", list(r1.data["score"]))
print("Run 2 scores:", list(r2.data["score"]))
print("Different?", not r1.data["score"].equals(r2.data["score"]))

## 12. `list[dict]` input (Phase 4)

`Pipeline.run()` also accepts `list[dict]` — useful for server contexts, test code,
or Polars users (`polars_df.to_dicts()`). The output type matches the input type:
DataFrame in → DataFrame out, `list[dict]` in → `list[dict]` out.
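
One way to make the output type follow the input type is to normalize to rows on the way in and convert back on the way out. This sketch uses duck typing so it runs without pandas installed; `run_any` is a hypothetical helper, and it ignores details such as preserving a DataFrame's index.

```python
def run_any(data, transform):
    """Accept a DataFrame-like object or list[dict]; return the same kind of object."""
    # Duck-typed check: pandas DataFrames expose both `to_dict` and `columns`.
    is_frame = hasattr(data, "to_dict") and hasattr(data, "columns")
    rows = data.to_dict("records") if is_frame else list(data)
    enriched = [{**row, **transform(row)} for row in rows]
    if is_frame:
        return type(data)(enriched)  # e.g. pandas.DataFrame(list_of_dicts)
    return enriched


tagged = run_any(
    [{"company": "Stripe"}, {"company": "Notion"}],
    lambda row: {"tag": row["company"].lower()},
)
print(type(tagged).__name__, tagged)
# list [{'company': 'Stripe', 'tag': 'stripe'}, {'company': 'Notion', 'tag': 'notion'}]
```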

In [None]:
# Pass list[dict] instead of DataFrame
rows = [
    {"company": "Stripe", "description": "Online payment processing"},
    {"company": "Notion", "description": "All-in-one workspace"},
]

pipeline_simple = Pipeline([
    FunctionStep("tag", fn=lambda ctx: {"tag": ctx.row["company"].lower()}, fields=["tag"]),
])

config_quiet = EnrichmentConfig(enable_progress_bar=False)
result = await pipeline_simple.run_async(rows, config_quiet)

# Output is list[dict], not DataFrame
print(f"Input type:  {type(rows).__name__}")
print(f"Output type: {type(result.data).__name__}")
print(f"Success rate: {result.success_rate:.0%}\n")

for row in result.data:
    print(row)

In [None]:
# Cleanup temp cache directory
import shutil
shutil.rmtree(cache_dir, ignore_errors=True)