Pre-Tools

Pre-tools inject data into a step's prompt before the LLM runs. They execute in order, and each result is exposed as a variable named by the pre-tool's inject_as field.

OCC has 27 pre-tool types across 6 categories. All pre-tools support:

{variable} interpolation in string fields
on_error: "inject" | "skip" | "fail" (error handling; default: "inject")
timeout_ms: N (per-pre-tool timeout in milliseconds; default: 30000, i.e. 30 s)
retry: N (retry N times with exponential backoff)
cache_ttl_minutes: N (cache the result for N minutes)
parallel: true (run concurrently with other parallel pre-tools)

Pre-tools also support chaining: the output of an earlier pre-tool is available to later pre-tools as a {variable} named by its inject_as field.
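
To make the interpolation and chaining rules concrete, here is a minimal Python sketch of how {variable} placeholders could be resolved against a shared context. The `interpolate` helper, the `context` dict, and the choice to leave unknown placeholders untouched are all assumptions made for illustration, not OCC's actual implementation.

```python
import re

# Hypothetical helper: matches {name} and {dotted.path} placeholders.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")


def interpolate(template: str, context: dict) -> str:
    """Replace {var} placeholders with values looked up in context.

    Unknown placeholders are left as written (an assumption, not
    confirmed OCC behavior).
    """
    def resolve(match: re.Match) -> str:
        value = context
        for part in match.group(1).split("."):
            if not isinstance(value, dict) or part not in value:
                return match.group(0)  # keep the placeholder unchanged
            value = value[part]
        return str(value)

    return _PLACEHOLDER.sub(resolve, template)


# Chaining: each pre-tool result is stored under its inject_as name,
# so any later pre-tool can reference it.
context = {"input": {"topic": "rust"}}
context["token"] = "abc123"  # pretend result of an earlier env_var pre-tool

header = interpolate("Bearer {token}", context)
query = interpolate("{input.topic} latest research", context)
body = interpolate('{"query": "{input.topic}"}', context)
```

Because the pattern only matches identifier-like names, the outer braces of a JSON body are left alone and only the inner placeholder is substituted.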

Data Fetching

http_fetch

Full HTTP client with method, headers, auth, body, and JSON path extraction.

- type: http_fetch
  url: "https://api.example.com/search"
  method: POST                              # GET (default), POST, PUT, PATCH, DELETE
  headers:
    Authorization: "Bearer {token}"         # {variable} interpolation
    Content-Type: "application/json"
  body: '{"query": "{input.topic}"}'        # Request body (POST/PUT/PATCH)
  json_path: "data.results[0].name"         # Extract JSON path from response
  timeout_ms: 10000
  retry: 2
  on_error: fail
  inject_as: search_results
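
The json_path field above uses dotted keys with bracketed list indexes. As a rough illustration of that path syntax, the Python sketch below walks such a path through parsed JSON. The `extract_path` function and the sample response are hypothetical, and the syntax OCC accepts may be broader than what is handled here.

```python
import json
import re

# A path is a sequence of key names and [index] parts, separated by dots.
_SEGMENT = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def extract_path(document, path: str):
    """Walk a path such as "data.results[0].name" through nested dicts and lists."""
    current = document
    for key, index in _SEGMENT.findall(path):
        if key:
            current = current[key]         # dict lookup
        else:
            current = current[int(index)]  # list index
    return current


response = json.loads('{"data": {"results": [{"name": "alpha"}, {"name": "beta"}]}}')
first_name = extract_path(response, "data.results[0].name")
```

In this sketch a missing key or index raises KeyError or IndexError; a real pre-tool would presumably route that through its on_error mode.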

web_search

Search the web using Claude's built-in WebSearch tool.

- type: web_search
  query: "{input.topic} latest research"
  inject_as: search_results

mcp_call

Call a tool on an external MCP server (GitHub, Slack, PostgreSQL, etc.).

- type: mcp_call
  server: "github"
  tool: "search_repositories"
  args: { query: "{input.topic}" }
  inject_as: repos

Requires an occ-mcp-servers.json config file. See MCP Client.

db_query

Run a SQL query through the database's CLI client (requires psql, mysql, or sqlite3 to be installed).

- type: db_query
  connection: "postgres://user:pass@localhost/mydb"
  sql: "SELECT * FROM users WHERE role = 'admin' LIMIT 10"
  inject_as: users

Supports: PostgreSQL (postgres://), MySQL (mysql://), SQLite (path.db).

parallel_fetch

Batch multiple URLs with rate limiting.

- type: parallel_fetch
  urls:
    - "https://api.example.com/page/1"
    - "https://api.example.com/page/2"
    - "https://api.example.com/page/3"
  rate_limit_ms: 200
  inject_as: all_pages

Files & Code

read_file

- type: read_file
  path: "/path/to/file.txt"
  encoding: "utf-8"          # Any Node.js encoding (default: utf-8)
  inject_as: content

write_file

- type: write_file
  path: "/tmp/output.txt"
  content: "{analysis}"
  append: true                # Append instead of overwrite (default: false)
  encoding: "utf-8"
  inject_as: file_path

bash

- type: bash
  command: "git log --oneline -10"
  stderr: true                # Capture stderr too (default: false)
  timeout_ms: 30000
  inject_as: git_log

diff_inject

Injects a Git diff in a structured, LLM-optimized format with per-file summaries.

- type: diff_inject
  repo: "{repo_path}"
  base: "main"                # Base ref (default: main)
  head: "HEAD"                # Head ref (default: HEAD)
  max_tokens: 4000            # Max output size in estimated tokens
  inject_as: smart_diff

ast_parse

Extract code structure (functions, classes, imports, exports, types) using regex-based parsing.

- type: ast_parse
  path: "{repo_path}/src/index.ts"
  extract: ["functions", "classes", "exports", "types"]
  inject_as: code_structure

Supports: TypeScript/JavaScript, Python, Go.
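
Regex-based parsing trades accuracy for speed and zero dependencies. The Python sketch below shows the general idea for Python source only. The patterns and the `extract_structure` function are simplified assumptions, not the expressions OCC uses, and they will miss unusual formatting that a real parser would handle.

```python
import re

# Simplified, line-anchored patterns for Python source (illustrative only).
_FUNCTION = re.compile(r"^[ \t]*(?:async\s+)?def\s+(\w+)\s*\(", re.MULTILINE)
_CLASS = re.compile(r"^[ \t]*class\s+(\w+)", re.MULTILINE)
_IMPORT = re.compile(
    r"^[ \t]*(?:from\s+([\w.]+)\s+import|import\s+([\w.]+))", re.MULTILINE
)


def extract_structure(source: str) -> dict:
    """Return the names of functions, classes, and imported modules in source."""
    return {
        "functions": _FUNCTION.findall(source),
        "classes": _CLASS.findall(source),
        # Each match fills exactly one of the two groups; keep the non-empty one.
        "imports": [a or b for a, b in _IMPORT.findall(source)],
    }


sample = '''
import os
from pathlib import Path

class Loader:
    def load(self, path):
        return Path(path).read_text()

async def main():
    pass
'''

structure = extract_structure(sample)
```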

ocr

Image → text via Tesseract.

- type: ocr
  image_path: "/tmp/document.png"
  language: "fra"             # Tesseract language code (default: eng)
  inject_as: extracted_text

screenshot

URL → PNG screenshot via Playwright.

- type: screenshot
  url: "https://example.com/dashboard"
  viewport: { width: 1440, height: 900 }
  wait_ms: 3000
  inject_as: screenshot_path

pdf_generate

HTML → PDF via wkhtmltopdf or Chrome headless.

- type: pdf_generate
  html: "<h1>Report</h1><p>{analysis}</p>"
  output_path: "/tmp/report.pdf"
  inject_as: pdf_path

State & Memory

state_load / state_save

Persistent key-value store across chain executions. Chains remember results between runs.

# Load state from a previous run
- type: state_load
  key: "last_scan_results"
  scope: "bounty-hunter"      # Chain name or "global" (default: current chain)
  default: "No previous data"
  inject_as: previous_results

# Save state for the next run (in a later step)
- type: state_save
  key: "last_scan_results"
  value: "{scan_results}"
  scope: "bounty-hunter"
  inject_as: save_status
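
The load-then-save pattern above can be pictured as a small scoped key-value store. The Python sketch below is one possible shape, using a JSON file as the backing store. The `StateStore` class and the file layout are assumptions for illustration; the source does not say how OCC persists state.

```python
import json
import tempfile
from pathlib import Path


class StateStore:
    """Tiny file-backed key-value store, namespaced by scope (illustrative only)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _read(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return {}

    def save(self, scope: str, key: str, value: str) -> None:
        data = self._read()
        data.setdefault(scope, {})[key] = value
        self.path.write_text(json.dumps(data), encoding="utf-8")

    def load(self, scope: str, key: str, default: str = "") -> str:
        return self._read().get(scope, {}).get(key, default)


store = StateStore(Path(tempfile.mkdtemp()) / "state.json")

# First run: nothing saved yet, so the default is returned.
first_run = store.load("bounty-hunter", "last_scan_results", default="No previous data")

# A later step saves a value; the next run sees it.
store.save("bounty-hunter", "last_scan_results", "3 findings")
second_run = store.load("bounty-hunter", "last_scan_results", default="No previous data")
```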

vector_query / vector_index

Local document search backed by SQLite FTS5, a full-text index, so matching is keyword-based. RAG without external infrastructure.

# Index documents
- type: vector_index
  collection: "project_docs"
  source: "{document_content}"
  chunk_size: 512
  inject_as: index_status

# Query indexed documents
- type: vector_query
  collection: "project_docs"
  query: "authentication architecture"
  top_k: 5
  inject_as: relevant_docs

semantic_cache

Cache by semantic similarity (not exact hash). Re-running similar queries returns cached results.

- type: semantic_cache
  query: "{input.topic} market analysis"
  cache_ttl_minutes: 720
  similarity_threshold: 0.85
  inject_as: cached_analysis

graph_query

Knowledge graph with triples (subject → predicate → object).

# Write triples
- type: graph_query
  triples:
    - { subject: "OCC", predicate: "is_a", object: "orchestrator" }
    - { subject: "OCC", predicate: "uses", object: "Claude" }
  inject_as: write_status

# Read triples
- type: graph_query
  graph_query_subject: "OCC"
  inject_as: occ_facts
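
For readers new to triple stores, the write and read examples above map onto a very small data structure. The Python sketch below is an in-memory stand-in; the `Triple` and `TripleStore` names are hypothetical, and OCC's own storage and query options are not described in this page.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


class TripleStore:
    """In-memory set of triples with lookup by subject (illustrative only)."""

    def __init__(self) -> None:
        self._triples = set()

    def add(self, subject: str, predicate: str, object_: str) -> None:
        self._triples.add(Triple(subject, predicate, object_))

    def by_subject(self, subject: str) -> list:
        # Sorted so the result order is deterministic.
        return sorted(t for t in self._triples if t.subject == subject)


graph = TripleStore()
graph.add("OCC", "is_a", "orchestrator")
graph.add("OCC", "uses", "Claude")
graph.add("Claude", "is_a", "LLM")

occ_facts = graph.by_subject("OCC")
```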

Data Processing

json_parse

Extract a field from JSON output (handles markdown-wrapped JSON).

- type: json_parse
  input: "{llm_output}"
  json_path: "opportunities[0].name"
  inject_as: best_opportunity
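
LLM output often wraps JSON in a Markdown code fence, which a plain JSON parser rejects. The Python sketch below shows one way to unwrap it before parsing. The `parse_llm_json` function is a hypothetical illustration of the idea, not OCC's parser, and it only handles a single fenced block.

```python
import json
import re

# Built programmatically so this example can itself sit inside a fenced block.
FENCE = "`" * 3

# Optional "json" language tag after the opening fence, then the payload.
_WRAPPED = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_llm_json(text: str):
    """Parse JSON that may be wrapped in a Markdown code fence."""
    match = _WRAPPED.search(text)
    payload = match.group(1) if match else text.strip()
    return json.loads(payload)


llm_output = (
    "Here are the results:\n"
    + FENCE + "json\n"
    + '{"opportunities": [{"name": "Acme"}]}\n'
    + FENCE
)

parsed = parse_llm_json(llm_output)
best = parsed["opportunities"][0]["name"]
```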

template_render

Handlebars-style template engine with {{#each}}, {{#if}}, nested paths.

- type: template_render
  template: |
    {{#each items}}
    - {{name}}: {{value}}
    {{/each}}
    {{#if show_total}}Total: {{total}}{{/if}}
  data:
    items: [{ name: "A", value: 1 }, { name: "B", value: 2 }]
    show_total: true
    total: 3
  inject_as: formatted

embed_compare

Compare two texts and return a similarity score with a drift verdict.

- type: embed_compare
  text_a: "{previous_analysis}"
  text_b: "{current_analysis}"
  inject_as: drift_report
  # Returns: {"similarity": 0.73, "verdict": "partially_changed", "new_keywords": [...]}

cost_gate

Check spend against a USD budget mid-execution. Warn, skip, or downgrade when the budget is exceeded.

- type: cost_gate
  budget_usd: 0.50
  action: "warn"              # "warn" | "skip" | "downgrade"
  inject_as: budget_status
  # Returns: {"status": "within_budget", "spent_usd": 0.12, "remaining_usd": 0.38}
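
The returned object above suggests a simple comparison of spend against budget. A minimal Python sketch of that check follows; the `check_budget` function and the "over_budget" status string are assumptions, since the source only shows the "within_budget" case.

```python
def check_budget(spent_usd: float, budget_usd: float) -> dict:
    """Compare spend to budget and report what is left (never below zero)."""
    remaining = max(round(budget_usd - spent_usd, 2), 0.0)
    # "over_budget" is a hypothetical label; only "within_budget" appears in the docs.
    status = "within_budget" if spent_usd <= budget_usd else "over_budget"
    return {"status": status, "spent_usd": spent_usd, "remaining_usd": remaining}


ok = check_budget(spent_usd=0.12, budget_usd=0.50)
exceeded = check_budget(spent_usd=0.75, budget_usd=0.50)
```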

Notifications

notify

Multi-channel notifications (Slack, Discord, Telegram, webhook).

- type: notify
  channel: "slack"            # slack | discord | telegram | webhook
  webhook_url: "{env:SLACK_WEBHOOK}"
  message: "Chain found {count} results"
  inject_as: notification_status

email

Send email via SMTP or SendGrid.

- type: email
  to: "user@example.com"
  subject: "Alert: {input.topic}"
  content: "Details: {analysis}"
  provider: sendgrid           # smtp (default) | sendgrid
  inject_as: email_result

approval_request

Generate a shareable approval URL for human-in-the-loop workflows.

- type: approval_request
  title: "Deploy to production?"
  description: "Changes: {diff_summary}"
  expires_hours: 4
  inject_as: approval_info
  # Returns: {"approve_url": "http://...", "approve_command": "curl ...", "token": "..."}

System

env_var

- type: env_var
  var_name: "API_KEY"
  default_value: "demo_key"   # Fallback if not set (default: "")
  inject_as: key

current_datetime

- type: current_datetime
  timezone: "Europe/Paris"     # IANA timezone (default: UTC)
  format: "locale"             # iso (default) | locale | unix
  inject_as: now

sandbox_exec

Run commands in an isolated Docker container.

- type: sandbox_exec
  image: "node:20-slim"
  command: "npm install && npm test"
  mount: "/project:/workspace"
  timeout_ms: 60000
  inject_as: test_results

Error Handling

pre_tools:
  - type: http_fetch
    url: "https://unreliable.api.com"
    on_error: skip       # "inject" (default) | "skip" | "fail"
    retry: 3             # Retry 3 times with backoff
    timeout_ms: 5000     # 5 second timeout
    inject_as: data

| Mode | Behavior |
| --- | --- |
| `inject` | Injects `[PRE-TOOL ERROR: message]` into the prompt (default) |
| `skip` | Injects an empty string; the step continues cleanly |
| `fail` | Aborts the step with an error |
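
The interaction between retry and on_error can be sketched in a few lines of Python: retries are exhausted first, and only then does the error mode decide what reaches the prompt. The `run_pre_tool` function, the 0.5 s base delay, and the doubling schedule are assumptions for illustration; the source only states that backoff is exponential.

```python
import time
from typing import Callable


def run_pre_tool(
    tool: Callable[[], str],
    on_error: str = "inject",
    retry: int = 0,
    base_delay_s: float = 0.5,  # assumed value, not documented
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run tool, retrying up to `retry` times, then apply the on_error mode."""
    attempt = 0
    while True:
        try:
            return tool()
        except Exception as error:
            if attempt < retry:
                sleep(base_delay_s * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
                attempt += 1
                continue
            if on_error == "fail":
                raise          # abort the step
            if on_error == "skip":
                return ""      # inject nothing
            return f"[PRE-TOOL ERROR: {error}]"


def always_fails() -> str:
    raise RuntimeError("boom")


delays = []  # records the waits instead of actually sleeping
injected = run_pre_tool(always_fails, on_error="inject", retry=2, sleep=delays.append)
skipped = run_pre_tool(always_fails, on_error="skip")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.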

Pre-Tool Chaining

Pre-tools execute sequentially by default. Output of pre-tool A is available in pre-tool B:

pre_tools:
  # Step 1: Get auth token
  - type: env_var
    var_name: API_TOKEN
    inject_as: token

  # Step 2: Use token in API call (chaining!)
  - type: http_fetch
    url: "https://api.example.com/data"
    headers:
      Authorization: "Bearer {token}"
    inject_as: api_data

Parallel Execution

Mark pre-tools with parallel: true to run them concurrently:

pre_tools:
  - type: web_search
    query: "topic A news"
    parallel: true           # ← runs in parallel
    inject_as: news_a
  - type: web_search
    query: "topic B news"
    parallel: true           # ← runs in parallel
    inject_as: news_b
  - type: bash
    command: "echo done"     # ← sequential (after parallel batch)
    inject_as: status
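
The scheduling in the example above can be read as: consecutive parallel entries form one concurrent batch, and everything else runs one at a time in order. The Python sketch below models that reading. The `run_pre_tools` function and the `run_one` callback are hypothetical, and the batching rule is inferred from the comments in the example rather than confirmed by the source.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def run_pre_tools(pre_tools: list, run_one) -> dict:
    """Run consecutive parallel entries as one concurrent batch; others in order."""
    results = {}
    grouped = groupby(pre_tools, key=lambda tool: bool(tool.get("parallel")))
    for is_parallel, group in grouped:
        batch = list(group)
        if is_parallel:
            with ThreadPoolExecutor(max_workers=len(batch)) as pool:
                outputs = list(pool.map(run_one, batch))  # preserves input order
        else:
            outputs = [run_one(tool) for tool in batch]
        for tool, output in zip(batch, outputs):
            results[tool["inject_as"]] = output
    return results


pre_tools = [
    {"type": "web_search", "query": "topic A news", "parallel": True, "inject_as": "news_a"},
    {"type": "web_search", "query": "topic B news", "parallel": True, "inject_as": "news_b"},
    {"type": "bash", "command": "echo done", "inject_as": "status"},
]

# Stand-in executor: a real one would dispatch on tool["type"].
results = run_pre_tools(pre_tools, lambda tool: f"ran {tool['type']}")
```

Under this reading, a parallel pre-tool cannot safely reference the output of another pre-tool in the same batch, since neither is guaranteed to finish first.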
