Pre Tools

lacause edited this page Mar 30, 2026 · 3 revisions

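Pre-tools inject data into a step's prompt before the LLM runs. They execute in order, and each result is available as a variable named by the pre-tool's inject_as field (for example, inject_as: search_results is referenced as {search_results}).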

OCC has 27 pre-tool types across 6 categories. All pre-tools support:

  • {variable} interpolation in string fields
  • on_error: "inject" | "skip" | "fail" - behavior on failure (default: inject; see Error Handling)
  • timeout_ms - per-pre-tool timeout in milliseconds (default: 30000, i.e. 30 s)
  • retry: N - retry up to N times with exponential backoff
  • cache_ttl_minutes: N - cache the result for N minutes
  • parallel: true - run in parallel with other parallel pre-tools
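
As a sketch, several of the common options can be combined on a single pre-tool. The URL and the inject_as name below are placeholders, not part of OCC:

```yaml
pre_tools:
  - type: http_fetch
    url: "https://api.example.com/status"   # placeholder URL
    timeout_ms: 5000            # give up after 5 s instead of the 30 s default
    retry: 2                    # retry with exponential backoff
    cache_ttl_minutes: 15       # reuse the result for 15 minutes
    on_error: skip              # on failure, inject an empty string
    inject_as: service_status   # hypothetical variable name
```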

Pre-tools also support chaining: the output of pre-tool A can be referenced in pre-tool B through A's inject_as name.


Data Fetching

http_fetch

Full HTTP client with method, headers, auth, body, and JSON path extraction.

- type: http_fetch
  url: "https://api.example.com/search"
  method: POST                              # GET (default), POST, PUT, PATCH, DELETE
  headers:
    Authorization: "Bearer {token}"         # {variable} interpolation
    Content-Type: "application/json"
  body: '{"query": "{input.topic}"}'        # Request body (POST/PUT/PATCH)
  json_path: "data.results[0].name"         # Extract JSON path from response
  timeout_ms: 10000
  retry: 2
  on_error: fail
  inject_as: search_results

web_search

Search the web using Claude's built-in WebSearch tool.

- type: web_search
  query: "{input.topic} latest research"
  inject_as: search_results

mcp_call

Call a tool on an external MCP server (GitHub, Slack, PostgreSQL, etc.).

- type: mcp_call
  server: "github"
  tool: "search_repositories"
  args: { query: "{input.topic}" }
  inject_as: repos

Requires occ-mcp-servers.json config. See MCP Client.

db_query

SQL query via CLI (requires psql, mysql, or sqlite3 installed).

- type: db_query
  connection: "postgres://user:pass@localhost/mydb"
  sql: "SELECT * FROM users WHERE role = 'admin' LIMIT 10"
  inject_as: users

Supports: PostgreSQL (postgres://), MySQL (mysql://), SQLite (path.db).
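
Going by the path.db form listed above, a SQLite connection is a file path rather than a URL. A minimal sketch, assuming a hypothetical database file and table:

```yaml
- type: db_query
  connection: "./data/app.db"   # hypothetical SQLite file; requires sqlite3 installed
  sql: "SELECT name, created_at FROM projects ORDER BY created_at DESC LIMIT 5"
  inject_as: recent_projects
```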

parallel_fetch

Fetch multiple URLs in one batch, with rate limiting.

- type: parallel_fetch
  urls:
    - "https://api.example.com/page/1"
    - "https://api.example.com/page/2"
    - "https://api.example.com/page/3"
  rate_limit_ms: 200
  inject_as: all_pages

Files & Code

read_file

Read a file and inject its contents.

- type: read_file
  path: "/path/to/file.txt"
  encoding: "utf-8"          # Any Node.js encoding (default: utf-8)
  inject_as: content

write_file

Write content to a file, either overwriting it or appending to it.

- type: write_file
  path: "/tmp/output.txt"
  content: "{analysis}"
  append: true                # Append instead of overwrite (default: false)
  encoding: "utf-8"
  inject_as: file_path

bash

Run a shell command and inject its output.

- type: bash
  command: "git log --oneline -10"
  stderr: true                # Capture stderr too (default: false)
  timeout_ms: 30000
  inject_as: git_log

diff_inject

Injects a Git diff between two refs in a structured, LLM-optimized format with per-file summaries.

- type: diff_inject
  repo: "{repo_path}"
  base: "main"                # Base ref (default: main)
  head: "HEAD"                # Head ref (default: HEAD)
  max_tokens: 4000            # Max output size in estimated tokens
  inject_as: smart_diff

ast_parse

Extract code structure (functions, classes, imports, exports, types) using regex-based parsing.

- type: ast_parse
  path: "{repo_path}/src/index.ts"
  extract: ["functions", "classes", "exports", "types"]
  inject_as: code_structure

Supports: TypeScript/JavaScript, Python, Go.

ocr

Image → text via Tesseract.

- type: ocr
  image_path: "/tmp/document.png"
  language: "fra"             # Tesseract language code (default: eng)
  inject_as: extracted_text

screenshot

URL → PNG screenshot via Playwright.

- type: screenshot
  url: "https://example.com/dashboard"
  viewport: { width: 1440, height: 900 }
  wait_ms: 3000
  inject_as: screenshot_path

pdf_generate

HTML → PDF via wkhtmltopdf or Chrome headless.

- type: pdf_generate
  html: "<h1>Report</h1><p>{analysis}</p>"
  output_path: "/tmp/report.pdf"
  inject_as: pdf_path

State & Memory

state_load / state_save

Persistent key-value store across chain executions. Chains remember results between runs.

# Load state from a previous run
- type: state_load
  key: "last_scan_results"
  scope: "bounty-hunter"      # Chain name or "global" (default: current chain)
  default: "No previous data"
  inject_as: previous_results

# Save state for the next run (in a later step)
- type: state_save
  key: "last_scan_results"
  value: "{scan_results}"
  scope: "bounty-hunter"
  inject_as: save_status

vector_query / vector_index

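Local document search backed by SQLite FTS5. FTS5 is a full-text index, so matching is keyword-based rather than embedding-based, despite the vector_ prefix. This gives RAG-style retrieval without external infrastructure.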

# Index documents
- type: vector_index
  collection: "project_docs"
  source: "{document_content}"
  chunk_size: 512
  inject_as: index_status

# Query indexed documents
- type: vector_query
  collection: "project_docs"
  query: "authentication architecture"
  top_k: 5
  inject_as: relevant_docs
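
Because pre-tools chain, indexing and querying can sit in the same pre_tools list. The sketch below uses a hypothetical file path and assumes that vector_index accepts the interpolated file contents as its source:

```yaml
pre_tools:
  - type: read_file
    path: "/path/to/docs/architecture.md"   # hypothetical path
    inject_as: document_content
  - type: vector_index
    collection: "project_docs"
    source: "{document_content}"            # output of read_file above
    chunk_size: 512
    inject_as: index_status
  - type: vector_query
    collection: "project_docs"
    query: "{input.topic}"
    top_k: 3
    inject_as: relevant_docs
```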

semantic_cache

Cache keyed by semantic similarity rather than an exact hash. A query that is similar enough to a cached one (see similarity_threshold) returns the cached result.

- type: semantic_cache
  query: "{input.topic} market analysis"
  cache_ttl_minutes: 720
  similarity_threshold: 0.85
  inject_as: cached_analysis

graph_query

Knowledge graph with triples (subject → predicate → object).

# Write triples
- type: graph_query
  triples:
    - { subject: "OCC", predicate: "is_a", object: "orchestrator" }
    - { subject: "OCC", predicate: "uses", object: "Claude" }
  inject_as: write_status

# Read triples
- type: graph_query
  graph_query_subject: "OCC"
  inject_as: occ_facts

Data Processing

json_parse

Extract a field from JSON output (handles markdown-wrapped JSON).

- type: json_parse
  input: "{llm_output}"
  json_path: "opportunities[0].name"
  inject_as: best_opportunity
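
json_parse can also post-process the output of an earlier pre-tool. A sketch, assuming a hypothetical JSON report on disk that contains an opportunities array:

```yaml
pre_tools:
  - type: read_file
    path: "/tmp/report.json"          # hypothetical file
    inject_as: raw_report
  - type: json_parse
    input: "{raw_report}"             # output of read_file above
    json_path: "opportunities[0].name"
    inject_as: best_opportunity
```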

template_render

Handlebars-style template engine supporting {{#each}}, {{#if}}, and nested paths.

- type: template_render
  template: |
    {{#each items}}
    - {{name}}: {{value}}
    {{/each}}
    {{#if show_total}}Total: {{total}}{{/if}}
  data:
    items: [{ name: "A", value: 1 }, { name: "B", value: 2 }]
    show_total: true
    total: 3
  inject_as: formatted

embed_compare

Compare two texts and return a similarity score plus a drift verdict.

- type: embed_compare
  text_a: "{previous_analysis}"
  text_b: "{current_analysis}"
  inject_as: drift_report
  # Returns: {"similarity": 0.73, "verdict": "partially_changed", "new_keywords": [...]}

cost_gate

Check spend against a USD budget mid-execution. Warn, skip, or downgrade when the budget is exceeded.

- type: cost_gate
  budget_usd: 0.50
  action: "warn"              # "warn" | "skip" | "downgrade"
  inject_as: budget_status
  # Returns: {"status": "within_budget", "spent_usd": 0.12, "remaining_usd": 0.38}

Notifications

notify

Multi-channel notifications (Slack, Discord, Telegram, webhook).

- type: notify
  channel: "slack"            # slack | discord | telegram | webhook
  webhook_url: "{env:SLACK_WEBHOOK}"
  message: "Chain found {count} results"
  inject_as: notification_status

email

Send email via SMTP or SendGrid.

- type: email
  to: "user@example.com"
  subject: "Alert: {input.topic}"
  content: "Details: {analysis}"
  provider: sendgrid           # smtp (default) | sendgrid
  inject_as: email_result

approval_request

Generate a shareable approval URL for human-in-the-loop workflows.

- type: approval_request
  title: "Deploy to production?"
  description: "Changes: {diff_summary}"
  expires_hours: 4
  inject_as: approval_info
  # Returns: {"approve_url": "http://...", "approve_command": "curl ...", "token": "..."}
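
One way to get the approval link to a reviewer is to chain approval_request into notify. This is only a sketch: it assumes that {approval_info} interpolates as the JSON shown in the Returns comment, and SLACK_WEBHOOK is a placeholder environment variable.

```yaml
pre_tools:
  - type: approval_request
    title: "Deploy to production?"
    description: "Changes: {diff_summary}"
    expires_hours: 4
    inject_as: approval_info
  - type: json_parse
    input: "{approval_info}"          # assumed to be the JSON result above
    json_path: "approve_url"
    inject_as: approve_url
  - type: notify
    channel: "slack"
    webhook_url: "{env:SLACK_WEBHOOK}"
    message: "Approval needed: {approve_url}"
    inject_as: notification_status
```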

System

env_var

Read an environment variable, with an optional fallback value.

- type: env_var
  var_name: "API_KEY"
  default_value: "demo_key"   # Fallback if not set (default: "")
  inject_as: key

current_datetime

Inject the current date and time in a given timezone and format.

- type: current_datetime
  timezone: "Europe/Paris"     # IANA timezone (default: UTC)
  format: "locale"             # iso (default) | locale | unix
  inject_as: now

sandbox_exec

Run commands in an isolated Docker container.

- type: sandbox_exec
  image: "node:20-slim"
  command: "npm install && npm test"
  mount: "/project:/workspace"
  timeout_ms: 60000
  inject_as: test_results

Error Handling

pre_tools:
  - type: http_fetch
    url: "https://unreliable.api.com"
    on_error: skip       # "inject" (default) | "skip" | "fail"
    retry: 3             # Retry 3 times with backoff
    timeout_ms: 5000     # 5 second timeout
    inject_as: data

Mode     Behavior
inject   Injects [PRE-TOOL ERROR: message] into the prompt (default)
skip     Injects an empty string; the step continues cleanly
fail     Aborts the step with an error
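
The mode can be set per pre-tool. In this sketch (placeholder URL and variable names), a required API call aborts the step on failure, while an optional search is dropped silently:

```yaml
pre_tools:
  - type: http_fetch
    url: "https://api.example.com/required-data"   # placeholder URL
    retry: 2
    on_error: fail        # the step should not run without this data
    inject_as: required_data
  - type: web_search
    query: "{input.topic} background"
    on_error: skip        # optional context; inject an empty string on failure
    inject_as: background
```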

Pre-Tool Chaining

Pre-tools execute sequentially by default. Output of pre-tool A is available in pre-tool B:

pre_tools:
  # Step 1: Get auth token
  - type: env_var
    var_name: API_TOKEN
    inject_as: token

  # Step 2: Use token in API call (chaining!)
  - type: http_fetch
    url: "https://api.example.com/data"
    headers:
      Authorization: "Bearer {token}"
    inject_as: api_data

Parallel Execution

Mark pre-tools as parallel: true to run them simultaneously:

pre_tools:
  - type: web_search
    query: "topic A news"
    parallel: true           # ← runs in parallel
    inject_as: news_a
  - type: web_search
    query: "topic B news"
    parallel: true           # ← runs in parallel
    inject_as: news_b
  - type: bash
    command: "echo done"     # ← sequential (after parallel batch)
    inject_as: status
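
Parallel and sequential pre-tools can be mixed with chaining. This sketch assumes that results from the parallel batch are available to the sequential pre-tools that follow it; the URLs and file path are placeholders:

```yaml
pre_tools:
  - type: http_fetch
    url: "https://api.example.com/metrics"     # placeholder URL
    parallel: true
    inject_as: metrics
  - type: http_fetch
    url: "https://api.example.com/incidents"   # placeholder URL
    parallel: true
    inject_as: incidents
  - type: write_file                           # sequential: runs after the batch
    path: "/tmp/snapshot.txt"
    content: "Metrics: {metrics}\nIncidents: {incidents}"
    inject_as: snapshot_path
```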
