Pre Tools
Pre-tools inject data into a step's prompt before the LLM runs. They execute in order and make their results available as {inject_as} variables.
OCC has 27 pre-tool types across 6 categories. All pre-tools support:
- {variable} interpolation in string fields
- on_error: "inject" | "skip" | "fail" (error handling)
- timeout_ms (per-pre-tool timeout, default: 30s)
- retry: N (retry with exponential backoff)
- cache_ttl_minutes: N (cache result for N minutes)
- parallel: true (run in parallel with other parallel pre-tools)
Pre-tools also support chaining: the output of pre-tool A is available as {inject_as} in pre-tool B.
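The interplay between {variable} interpolation and chaining can be pictured with a small Python sketch. This is not OCC's implementation: the placeholder regex, the function names, the stand-in handlers, and the flat `auth` field are all assumptions made for illustration, and nested fields such as headers are not interpolated here.

```python
import re

# Assumed placeholder syntax: {name} or {dotted.path}; anything else is left alone.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")


def lookup(path, variables):
    """Resolve a dotted path such as 'input.topic' against nested dicts."""
    value = variables
    for key in path.split("."):
        value = value[key]
    return value


def interpolate(text, variables):
    """Replace each {path} with its value; unknown placeholders stay as written."""
    def substitute(match):
        try:
            return str(lookup(match.group(1), variables))
        except (KeyError, TypeError):
            return match.group(0)
    return PLACEHOLDER.sub(substitute, text)


def run_pre_tools(pre_tools, handlers, variables):
    """Run pre-tools in order; each result is stored under its inject_as name."""
    for tool in pre_tools:
        resolved = {
            key: interpolate(value, variables) if isinstance(value, str) else value
            for key, value in tool.items()
        }
        variables[resolved["inject_as"]] = handlers[resolved["type"]](resolved)
    return variables


# Stand-in handlers (hypothetical): real pre-tools would read the environment or call an API.
handlers = {
    "env_var": lambda tool: "secret-token",
    "http_fetch": lambda tool: "fetched with " + tool["auth"],
}
variables = run_pre_tools(
    [
        {"type": "env_var", "var_name": "API_TOKEN", "inject_as": "token"},
        {"type": "http_fetch", "auth": "Bearer {token}", "inject_as": "api_data"},
    ],
    handlers,
    {"input": {"topic": "rust"}},
)
print(variables["api_data"])  # prints "fetched with Bearer secret-token"
```

Because each result is written back into the same variable map before the next pre-tool is resolved, the second pre-tool sees the first one's output under its inject_as name.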
Full HTTP client with method, headers, auth, body, and JSON path extraction.
- type: http_fetch
  url: "https://api.example.com/search"
  method: POST  # GET (default), POST, PUT, PATCH, DELETE
  headers:
    Authorization: "Bearer {token}"  # {variable} interpolation
    Content-Type: "application/json"
  body: '{"query": "{input.topic}"}'  # Request body (POST/PUT/PATCH)
  json_path: "data.results[0].name"  # Extract JSON path from response
  timeout_ms: 10000
  retry: 2
  on_error: fail
  inject_as: search_results

Search the web using Claude's built-in WebSearch tool.
- type: web_search
  query: "{input.topic} latest research"
  inject_as: search_results

Call a tool on an external MCP server (GitHub, Slack, PostgreSQL, etc.).
- type: mcp_call
  server: "github"
  tool: "search_repositories"
  args: { query: "{input.topic}" }
  inject_as: repos

Requires occ-mcp-servers.json config. See MCP Client.
SQL query via CLI (requires psql, mysql, or sqlite3 installed).
- type: db_query
  connection: "postgres://user:pass@localhost/mydb"
  sql: "SELECT * FROM users WHERE role = 'admin' LIMIT 10"
  inject_as: users

Supports: PostgreSQL (postgres://), MySQL (mysql://), SQLite (path.db).
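To picture how a connection string could be mapped to one of these CLIs, here is a minimal Python sketch. The function name and the exact argument layout are assumptions for illustration, not OCC's actual code; the command is only built, never executed.

```python
from urllib.parse import urlsplit


def build_db_command(connection, sql):
    """Return an argv list for the CLI that matches the connection string."""
    if connection.startswith("postgres://"):
        # psql accepts a connection URI in place of the database name.
        return ["psql", connection, "--command", sql]
    if connection.startswith("mysql://"):
        # The mysql client takes host, user, and database as separate options.
        parts = urlsplit(connection)
        argv = ["mysql", f"--host={parts.hostname}", f"--user={parts.username}"]
        if parts.password:
            argv.append(f"--password={parts.password}")
        if parts.port:
            argv.append(f"--port={parts.port}")
        return argv + [parts.path.lstrip("/"), f"--execute={sql}"]
    # Anything else is treated as a path to a SQLite database file.
    return ["sqlite3", connection, sql]
```

Returning an argument list rather than a single shell string means the SQL text is passed to the CLI as one argument and is not reinterpreted by a shell.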
Batch multiple URLs with rate limiting.
- type: parallel_fetch
  urls:
    - "https://api.example.com/page/1"
    - "https://api.example.com/page/2"
    - "https://api.example.com/page/3"
  rate_limit_ms: 200
  inject_as: all_pages

- type: read_file
  path: "/path/to/file.txt"
  encoding: "utf-8"  # Any Node.js encoding (default: utf-8)
  inject_as: content

- type: write_file
  path: "/tmp/output.txt"
  content: "{analysis}"
  append: true  # Append instead of overwrite (default: false)
  encoding: "utf-8"
  inject_as: file_path

- type: bash
  command: "git log --oneline -10"
  stderr: true  # Capture stderr too (default: false)
  timeout_ms: 30000
  inject_as: git_log

Git diff: structured, LLM-optimized format with per-file summaries.
- type: diff_inject
  repo: "{repo_path}"
  base: "main"  # Base ref (default: main)
  head: "HEAD"  # Head ref (default: HEAD)
  max_tokens: 4000  # Max output size in estimated tokens
  inject_as: smart_diff

Extract code structure (functions, classes, imports, exports, types) using regex-based parsing.
- type: ast_parse
  path: "{repo_path}/src/index.ts"
  extract: ["functions", "classes", "exports", "types"]
  inject_as: code_structure

Supports: TypeScript/JavaScript, Python, Go.
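As a rough idea of what regex-based structure extraction looks like, the Python sketch below pulls declaration names out of TypeScript-style source. The patterns and the function name are assumptions for illustration only; OCC's real patterns are not shown in this page and very likely differ.

```python
import re

# Hypothetical patterns for TypeScript/JavaScript declarations, one per extract kind.
PATTERNS = {
    "functions": re.compile(
        r"^[ \t]*(?:export[ \t]+)?(?:async[ \t]+)?function[ \t]+(\w+)", re.MULTILINE
    ),
    "classes": re.compile(r"^[ \t]*(?:export[ \t]+)?class[ \t]+(\w+)", re.MULTILINE),
    "exports": re.compile(
        r"^[ \t]*export[ \t]+(?:default[ \t]+)?(?:async[ \t]+)?"
        r"(?:function|class|const|let|var|type|interface)[ \t]+(\w+)",
        re.MULTILINE,
    ),
    "types": re.compile(
        r"^[ \t]*(?:export[ \t]+)?(?:type|interface)[ \t]+(\w+)", re.MULTILINE
    ),
}


def extract_structure(source, extract):
    """Return the declared names found for each requested kind, in source order."""
    return {kind: PATTERNS[kind].findall(source) for kind in extract}


SAMPLE = """
import { readFile } from "fs";

export interface Options { verbose: boolean }
type Id = string;

export async function load(path: string) {}
function helper() {}
export class Runner {}
export const VERSION = "1.0";
"""

structure = extract_structure(SAMPLE, ["functions", "classes", "exports", "types"])
```

A line-anchored regex like this is cheap and dependency-free, but it only sees declarations that start a line; arrow functions assigned to variables, for example, would not be reported as functions by this sketch.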
Image → text via Tesseract.
- type: ocr
  image_path: "/tmp/document.png"
  language: "fra"  # Tesseract language code (default: eng)
  inject_as: extracted_text

URL → PNG screenshot via Playwright.
- type: screenshot
  url: "https://example.com/dashboard"
  viewport: { width: 1440, height: 900 }
  wait_ms: 3000
  inject_as: screenshot_path

HTML → PDF via wkhtmltopdf or Chrome headless.
- type: pdf_generate
  html: "<h1>Report</h1><p>{analysis}</p>"
  output_path: "/tmp/report.pdf"
  inject_as: pdf_path

Persistent key-value store across chain executions. Chains remember results between runs.
# Load state from a previous run
- type: state_load
  key: "last_scan_results"
  scope: "bounty-hunter"  # Chain name or "global" (default: current chain)
  default: "No previous data"
  inject_as: previous_results
# Save state for the next run (in a later step)
- type: state_save
  key: "last_scan_results"
  value: "{scan_results}"
  scope: "bounty-hunter"
  inject_as: save_status

Local semantic search via SQLite FTS5. RAG without external infrastructure.
# Index documents
- type: vector_index
  collection: "project_docs"
  source: "{document_content}"
  chunk_size: 512
  inject_as: index_status
# Query indexed documents
- type: vector_query
  collection: "project_docs"
  query: "authentication architecture"
  top_k: 5
  inject_as: relevant_docs

Cache by semantic similarity (not exact hash). Re-running similar queries returns cached results.
- type: semantic_cache
  query: "{input.topic} market analysis"
  cache_ttl_minutes: 720
  similarity_threshold: 0.85
  inject_as: cached_analysis

Knowledge graph with triples (subject → predicate → object).
# Write triples
- type: graph_query
  triples:
    - { subject: "OCC", predicate: "is_a", object: "orchestrator" }
    - { subject: "OCC", predicate: "uses", object: "Claude" }
  inject_as: write_status
# Read triples
- type: graph_query
  graph_query_subject: "OCC"
  inject_as: occ_facts

Extract a field from JSON output (handles markdown-wrapped JSON).
- type: json_parse
  input: "{llm_output}"
  json_path: "opportunities[0].name"
  inject_as: best_opportunity

Handlebars-style template engine with {{#each}}, {{#if}}, nested paths.
- type: template_render
  template: |
    {{#each items}}
    - {{name}}: {{value}}
    {{/each}}
    {{#if show_total}}Total: {{total}}{{/if}}
  data:
    items: [{ name: "A", value: 1 }, { name: "B", value: 2 }]
    show_total: true
    total: 3
  inject_as: formatted

Compare two texts; returns a similarity score and drift detection.
- type: embed_compare
  text_a: "{previous_analysis}"
  text_b: "{current_analysis}"
  inject_as: drift_report
  # Returns: {"similarity": 0.73, "verdict": "partially_changed", "new_keywords": [...]}

Check spend against a USD budget mid-execution. Warn, skip, or downgrade if over budget.
- type: cost_gate
  budget_usd: 0.50
  action: "warn"  # "warn" | "skip" | "downgrade"
  inject_as: budget_status
  # Returns: {"status": "within_budget", "spent_usd": 0.12, "remaining_usd": 0.38}

Multi-channel notifications (Slack, Discord, Telegram, webhook).
- type: notify
  channel: "slack"  # slack | discord | telegram | webhook
  webhook_url: "{env:SLACK_WEBHOOK}"
  message: "Chain found {count} results"
  inject_as: notification_status

Send email via SMTP or SendGrid.
- type: email
  to: "user@example.com"
  subject: "Alert: {input.topic}"
  content: "Details: {analysis}"
  provider: sendgrid  # smtp (default) | sendgrid
  inject_as: email_result

Generate a shareable approval URL for human-in-the-loop workflows.
- type: approval_request
  title: "Deploy to production?"
  description: "Changes: {diff_summary}"
  expires_hours: 4
  inject_as: approval_info
  # Returns: {"approve_url": "http://...", "approve_command": "curl ...", "token": "..."}

- type: env_var
  var_name: "API_KEY"
  default_value: "demo_key"  # Fallback if not set (default: "")
  inject_as: key

- type: current_datetime
  timezone: "Europe/Paris"  # IANA timezone (default: UTC)
  format: "locale"  # iso (default) | locale | unix
  inject_as: now

Run commands in an isolated Docker container.
- type: sandbox_exec
  image: "node:20-slim"
  command: "npm install && npm test"
  mount: "/project:/workspace"
  timeout_ms: 60000
  inject_as: test_results

pre_tools:
  - type: http_fetch
    url: "https://unreliable.api.com"
    on_error: skip  # "inject" (default) | "skip" | "fail"
    retry: 3  # Retry 3 times with backoff
    timeout_ms: 5000  # 5 second timeout
    inject_as: data

| Mode | Behavior |
|---|---|
| inject | Injects [PRE-TOOL ERROR: message] into prompt (default) |
| skip | Injects empty string; step continues cleanly |
| fail | Aborts the step with an error |
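The three on_error modes combined with retry can be sketched in Python as follows. This is an illustration under assumptions, not OCC's code: the function name, the 0.5 s base delay, and the doubling schedule are hypothetical (the page only says "exponential backoff"), and timeouts are left out.

```python
import time


def run_with_policy(call, on_error="inject", retry=0, base_delay_s=0.5, sleep=time.sleep):
    """Run call() with retries, then apply the on_error mode if every attempt failed."""
    attempts = max(retry, 0) + 1
    last_error = None
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Assumed schedule: 0.5 s, 1 s, 2 s, ... between attempts.
                sleep(base_delay_s * 2 ** attempt)
    if on_error == "skip":
        return ""  # empty string is injected, the step continues
    if on_error == "fail":
        raise RuntimeError(f"pre-tool failed: {last_error}") from last_error
    return f"[PRE-TOOL ERROR: {last_error}]"  # "inject" mode, the default
```

Passing `sleep` as a parameter is a convenience of this sketch: a test can substitute a recording function and check the delays without actually waiting.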
Pre-tools execute sequentially by default. Output of pre-tool A is available in pre-tool B:
pre_tools:
  # Step 1: Get auth token
  - type: env_var
    var_name: API_TOKEN
    inject_as: token
  # Step 2: Use token in API call (chaining!)
  - type: http_fetch
    url: "https://api.example.com/data"
    headers:
      Authorization: "Bearer {token}"
    inject_as: api_data

Mark pre-tools as parallel: true to run them simultaneously:
pre_tools:
  - type: web_search
    query: "topic A news"
    parallel: true  # ← runs in parallel
    inject_as: news_a
  - type: web_search
    query: "topic B news"
    parallel: true  # ← runs in parallel
    inject_as: news_b
  - type: bash
    command: "echo done"  # ← sequential (after parallel batch)
    inject_as: status

- Chain Format: YAML reference
- Step Types: How steps use pre-tool data
- MCP Client: External MCP server configuration
- Token Optimization: Pre-tools reduce LLM token usage