v0.13.0

initializ-mk released this 07 Jun 16:36
· 28 commits to main since this release

Forge v0.13.0 — A2A 0.3.0 conformance, audit pipeline hardening, and configurable platform policy

Published: 2026-06-07

Forge v0.13.0 is a feature release covering every commit merged to main
since v0.12.0 on 2026-05-25. 119 files changed, +16,324 / −566 lines,
13 issues closed, 10 numbered workstreams (FWS-1 through FWS-10),
plus the new Claude knowledge skills.

If you operate Forge in production, this release lands the
verifiable version of the security and observability claims the
project has made since v0.10: an A2A 0.3.0 spec-conformant Agent Card,
a documented schema-versioned audit contract with monotonic sequence
numbers and a metadata-only invariant pinned by regression tests, a
dedicated Unix Domain Socket sink for audit export, a three-layer
platform policy (system / user / workspace) with first-match-wins
attribution, configurable per-IP rate limiting with sensible
orchestration-friendly defaults, and a one-line stream split so SIEM
pipelines can route ops logs and audit NDJSON without parsing
payloads.

The full per-PR detail lives in CHANGELOG.md.


Highlights

  • A2A 0.3.0 spec-conformant Agent Card at the canonical
    /.well-known/agent-card.json path. Every required field populated
    from forge.yaml and SKILL.md. Legacy /.well-known/agent.json
    alias preserved with a Deprecation: true header per RFC 8594.
  • Hardened audit emission: every event carries
    schema_version: "1.0" and a monotonic per-invocation seq so
    consumers detect gaps and reordering by grouping
    (correlation_id, task_id). The default audit posture is
    metadata-only (token counts, sizes, durations — never raw prompt or
    completion text or tool args/results), and a TestNoPayloadByDefault_LLMCall
    regression test pins the invariant.
  • Audit event export via Unix Domain Socket sink (preferred) or
    localhost HTTP fallback. Fire-and-forget, 50ms per-event timeout,
    exponential reconnect backoff, periodic audit_export_status event
    every 60s carrying per-sink health.
  • Three-layer platform policy. The single-layer
    FORGE_PLATFORM_POLICY env from v0.12 is now joined by a system
    layer (/etc/forge/policy.yaml, sysadmin) and a user layer
    (~/.forge/policy.yaml, developer). Deny lists union; max-bounds
    use smallest non-zero; audit attribution credits the first layer
    to deny.
  • Channel deny is first-class. A new denied_channels field in
    the platform policy schema skips named adapters at startup with a
    channel_denied_by_policy audit event. forge channel disable /
    enable edit the user layer; --system retargets to the system
    layer.
  • Per-IP rate limit is configurable end-to-end via
    server.rate_limit: in forge.yaml, matching --rate-limit-*
    CLI flags, and FORGE_RATE_LIMIT_* env vars. Resolution order:
    CLI > env > yaml > defaults. Defaults bumped for orchestrated
    workloads: the write side is now 60/min sustained with burst 20
    (was 10/min, burst 3). tasks/cancel is exempt from the write bucket
    by default so a cost-ceiling cancel burst never throttles the
    recovery mechanism.
  • Workflow correlation. X-Workflow-ID, X-Workflow-Stage-ID,
    X-Workflow-Step-ID, and X-Invocation-Caller headers are extracted
    on every request and stamped on every audit event for multi-agent
    flow tracing.
  • Token usage in audit and response headers. Every llm_call
    event carries input_tokens / output_tokens / duration_ms /
    model / provider; every A2A response carries
    X-Forge-Tokens-In/Out, X-Forge-Duration-Ms, X-Forge-Model,
    X-Forge-Provider for cost-enforcement pipelines.
  • tasks/cancel actually cancels. The JSON-RPC tasks/cancel
    method now signals an in-flight invocation via
    context.CancelCauseFunc with a typed reason. The agent loop
    honors cancellation at iteration and tool-call boundaries. An
    invocation_cancelled audit event closes the cancelled invocation
    with the reason + partial token totals.
  • Ops logs to stdout, audit to stderr. Container log collectors
    and SIEM pipelines can now split the two at the stream level —
    no payload parsing.
  • New Claude-side reference: .claude/skills/forge.md ships a
    single self-contained knowledge file for Forge, and
    .claude/skills/forge-skill-builder.md ports the Web UI Skill
    Builder's prompt for use in any Claude chat. The sync-docs
    workflow keeps both current with the code.

What's new

1. A2A 0.3.0 Agent Card conformance (FWS-1, #85)

Forge's Agent Card now conforms to the A2A 0.3.0 spec end-to-end:
canonical path, required-field shape, security schemes derived from
the configured auth chain, and SKILL.md skills bridged into the
published card at both forge build and forge run time.

| What | Where |
| --- | --- |
| Canonical path | GET /.well-known/agent-card.json |
| Legacy alias | GET /.well-known/agent.json (same body, plus Deprecation: true and Link: rel=successor-version per RFC 8594). Removable one release after this ships. |
| Required fields | name, description, url, version (from forge.yaml), protocolVersion: "0.3.0", defaultInputModes, defaultOutputModes, skills[], capabilities, securitySchemes, security |
| Skills mapping | SKILL.md frontmatter (name, description, category, tags) → A2A AgentSkill objects with deterministic ID-sorted output so card bytes are stable across restarts |
| Security schemes | Derived from auth.providers: oidc/azure_ad → openIdConnect with discovery URL; static_token/http_verifier → http+bearer; gcp_iap → apiKey in header; aws_sigv4 → http+bearer with bearerFormat: "forge-aws-v1" |
| New audit event | agent_card_published at startup + hot-reload, carrying name, version, protocol_version, url, skill_count, capabilities, security_schemes, card_size_bytes, card_sha256, so consumers can detect config drift across deploys |

See docs/reference/a2a-agent-card.md.
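
To illustrate why ID-sorted output matters for card_sha256, here is a minimal Go sketch that sorts skills by ID before serializing and hashing them. The AgentSkill struct and the cardDigest function are hypothetical names chosen for this example; they are not Forge's actual types, and the real card contains more than the skills array.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// AgentSkill is a hypothetical, trimmed-down stand-in for an A2A skill entry.
type AgentSkill struct {
	ID          string   `json:"id"`
	Name        string   `json:"name"`
	Description string   `json:"description"`
	Tags        []string `json:"tags"`
}

// cardDigest sorts a copy of the skills by ID, marshals the result, and
// returns the byte length and hex SHA-256 of the serialized form.
func cardDigest(skills []AgentSkill) (int, string, error) {
	sorted := append([]AgentSkill(nil), skills...) // leave the caller's slice untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })
	b, err := json.Marshal(sorted)
	if err != nil {
		return 0, "", err
	}
	sum := sha256.Sum256(b)
	return len(b), hex.EncodeToString(sum[:]), nil
}

func main() {
	a := []AgentSkill{{ID: "summarize", Name: "Summarize"}, {ID: "deploy", Name: "Deploy"}}
	b := []AgentSkill{a[1], a[0]} // same skills, different discovery order
	_, ha, _ := cardDigest(a)
	_, hb, _ := cardDigest(b)
	fmt.Println("digests match:", ha == hb)
}
```

Because both inputs serialize to the same bytes, a changed digest between deploys points at a real configuration change rather than at discovery order.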

2. Workflow correlation ID threading (FWS-2, #86)

Orchestrators (initializ Command, custom controllers, GitOps tooling)
that fire multi-agent workflows now get end-to-end audit traceability
without any per-agent code change. Inbound requests carrying
X-Workflow-ID / X-Workflow-Stage-ID / X-Workflow-Step-ID /
X-Invocation-Caller headers have those values extracted into the
request context, threaded through the agent loop, and stamped on
every audit event the invocation produces — invocation_complete,
llm_call, tool_exec, etc.

The emitted JSON is byte-for-byte identical to the pre-FWS-2 shape
for direct (non-orchestrated) calls, so consumers reading the v0.12
audit stream continue to work unchanged.

See docs/security/workflow-correlation.md.
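
The header-to-audit threading described above can be sketched in Go as an HTTP middleware plus an event struct whose correlation fields are omitted when empty. Everything here is an assumption made for illustration: the Correlation type, the withCorrelation middleware, the emit helper, and the JSON field names (workflow_id and so on) are hypothetical, not Forge's actual identifiers.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Correlation holds the four workflow headers. JSON names are assumed.
type Correlation struct {
	WorkflowID string `json:"workflow_id,omitempty"`
	StageID    string `json:"workflow_stage_id,omitempty"`
	StepID     string `json:"workflow_step_id,omitempty"`
	Caller     string `json:"invocation_caller,omitempty"`
}

type ctxKey struct{}

// withCorrelation copies the workflow headers into the request context.
func withCorrelation(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		c := Correlation{
			WorkflowID: r.Header.Get("X-Workflow-ID"),
			StageID:    r.Header.Get("X-Workflow-Stage-ID"),
			StepID:     r.Header.Get("X-Workflow-Step-ID"),
			Caller:     r.Header.Get("X-Invocation-Caller"),
		}
		next.ServeHTTP(w, r.WithContext(context.WithValue(r.Context(), ctxKey{}, c)))
	})
}

// auditEvent embeds Correlation; omitempty drops unset fields, so a direct
// call serializes exactly as it would without any correlation support.
type auditEvent struct {
	Event string `json:"event"`
	Correlation
}

// emit builds one audit line from whatever correlation the context carries.
func emit(ctx context.Context, event string) string {
	c, _ := ctx.Value(ctxKey{}).(Correlation)
	b, _ := json.Marshal(auditEvent{Event: event, Correlation: c})
	return string(b)
}

// demo sends one request through the middleware and returns the audit line
// produced inside the handler.
func demo(headers map[string]string) string {
	var line string
	h := withCorrelation(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		line = emit(r.Context(), "llm_call")
	}))
	req := httptest.NewRequest(http.MethodPost, "/", nil)
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	h.ServeHTTP(httptest.NewRecorder(), req)
	return line
}

func main() {
	fmt.Println(demo(nil))                                          // direct call
	fmt.Println(demo(map[string]string{"X-Workflow-ID": "wf-42"})) // orchestrated call
}
```

The omitempty tags are one plausible way to get the unchanged shape for direct calls; the notes do not say how Forge achieves it.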

3. Token usage + execution duration in audit (FWS-3, #87)

Every llm_call audit event now carries input_tokens,
output_tokens, model, provider, duration_ms, and request_id
captured directly from provider response metadata (Anthropic, OpenAI,
Ollama via the OpenAI-compatible path, OpenAI Responses). Field naming
aligns with OTel GenAI semantic conventions
(gen_ai.usage.input_tokens / gen_ai.usage.output_tokens) so audit
consumers correlate to OTel traces without a translation table.

When a provider returns no usage (some self-hosted Ollama setups),
the event flags tokens_unavailable: true rather than silent zeros.

Each tool_exec event gains duration_ms plus structured arg-shape
metadata (args_size, result_size). Raw arg values are not emitted
(that's FWS-8's payload-stripping invariant, not FWS-3's).

A new invocation_complete audit event closes every A2A invocation
with the wall-clock duration and aggregated input_tokens_total /
output_tokens_total / llm_call_count. Response headers
X-Forge-Tokens-In, X-Forge-Tokens-Out, X-Forge-Duration-Ms,
X-Forge-Model, X-Forge-Provider carry the same totals so
cost-enforcement layers act without parsing the body.
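
As a sketch of how a cost-enforcement layer might use these headers, the Go snippet below parses the totals and checks them against a budget. It assumes the header values are plain decimal integers, which the notes do not state; Usage, parseUsage, and overBudget are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
)

// Usage mirrors the per-response totals carried in the X-Forge-* headers.
type Usage struct{ TokensIn, TokensOut, DurationMs int }

// parseUsage reads the headers through a getter such as http.Header.Get.
// Integer-valued headers are an assumption of this sketch.
func parseUsage(get func(string) string) (Usage, error) {
	var u Usage
	fields := []struct {
		name string
		dst  *int
	}{
		{"X-Forge-Tokens-In", &u.TokensIn},
		{"X-Forge-Tokens-Out", &u.TokensOut},
		{"X-Forge-Duration-Ms", &u.DurationMs},
	}
	for _, f := range fields {
		n, err := strconv.Atoi(get(f.name))
		if err != nil {
			return Usage{}, fmt.Errorf("%s: %w", f.name, err)
		}
		*f.dst = n
	}
	return u, nil
}

// overBudget reports whether adding u to the tokens already spent would
// exceed the workflow's token limit.
func overBudget(spent, limit int, u Usage) bool {
	return spent+u.TokensIn+u.TokensOut > limit
}

func main() {
	h := map[string]string{
		"X-Forge-Tokens-In":   "1200",
		"X-Forge-Tokens-Out":  "300",
		"X-Forge-Duration-Ms": "850",
	}
	u, err := parseUsage(func(k string) string { return h[k] })
	if err != nil {
		fmt.Println("usage unavailable:", err)
		return
	}
	fmt.Println("over budget:", overBudget(9000, 10000, u))
}
```

With a real response, the getter would be resp.Header.Get from net/http.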

4. Cancellation signal handling (FWS-4, #88)

The A2A tasks/cancel JSON-RPC method now actually cancels in-flight
invocations instead of merely flipping the stored task state. A
per-Runner CancellationRegistry tracks every active invocation by
task ID; the cancel handler signals the registered
context.CancelCauseFunc with a typed reason
(workflow_failure / cost_limit_exceeded / timeout /
external_signal).

The agent loop honors cancellation at the iteration boundary and
between tool calls within an iteration, so cancellation latency is
bounded by the current LLM call or tool exec.

A new invocation_cancelled audit event closes every cancelled
invocation with the classified reason, duration_ms up to
cancellation, and partial token totals consumed before the signal.
The A2A response carries state canceled plus a cancelled: <reason>
message so the orchestrator can react. Cancel-after-complete is
idempotent.
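
The registry pattern can be sketched with the standard library's context.WithCancelCause (Go 1.20+). This is a simplified illustration under stated assumptions, not Forge's CancellationRegistry: the Registry type, its methods, the CancelReason type, and runLoop are hypothetical, and only the iteration-boundary check is modelled.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// CancelReason is a typed cancellation cause; it implements error so it can
// be passed to context.CancelCauseFunc.
type CancelReason string

func (r CancelReason) Error() string { return "cancelled: " + string(r) }

const CostLimitExceeded CancelReason = "cost_limit_exceeded"

// Registry tracks the cancel function of every active invocation by task ID.
type Registry struct {
	mu     sync.Mutex
	active map[string]context.CancelCauseFunc
}

func NewRegistry() *Registry {
	return &Registry{active: make(map[string]context.CancelCauseFunc)}
}

// Register derives a cancellable context for taskID. The returned done func
// must be called when the invocation finishes.
func (r *Registry) Register(parent context.Context, taskID string) (context.Context, func()) {
	ctx, cancel := context.WithCancelCause(parent)
	r.mu.Lock()
	r.active[taskID] = cancel
	r.mu.Unlock()
	return ctx, func() {
		r.mu.Lock()
		delete(r.active, taskID)
		r.mu.Unlock()
		cancel(nil) // release resources; no effect if already cancelled
	}
}

// Cancel signals taskID with reason. It returns false for unknown or already
// finished tasks, which makes cancel-after-complete a harmless no-op.
func (r *Registry) Cancel(taskID string, reason CancelReason) bool {
	r.mu.Lock()
	cancel, ok := r.active[taskID]
	r.mu.Unlock()
	if ok {
		cancel(reason)
	}
	return ok
}

// runLoop stands in for the agent loop: it checks for cancellation at each
// iteration boundary and reports how many steps ran.
func runLoop(ctx context.Context, steps int, onStep func(i int)) (int, error) {
	for i := 0; i < steps; i++ {
		if ctx.Err() != nil {
			return i, context.Cause(ctx)
		}
		onStep(i)
	}
	return steps, nil
}

// demo cancels the task during its second step, then cancels again after
// completion to show the idempotent path.
func demo() (int, string, bool) {
	reg := NewRegistry()
	ctx, done := reg.Register(context.Background(), "task-1")
	n, err := runLoop(ctx, 5, func(i int) {
		if i == 1 {
			reg.Cancel("task-1", CostLimitExceeded)
		}
	})
	done()
	late := reg.Cancel("task-1", CostLimitExceeded)
	return n, err.Error(), late
}

func main() {
	n, cause, late := demo()
	fmt.Println(n, cause, late)
}
```

The step that was running when the signal arrived still completes, which matches the stated latency bound of one LLM call or tool exec.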

5. Three-layer platform policy + channel scope (FWS-5 / FWS-6, #89 + #90)

The single-layer FORGE_PLATFORM_POLICY env from v0.12.0 is now
joined by two more layers so different operators with different
trust contexts can each express their bounds independently.

| Layer | Path | Set by |
| --- | --- | --- |
| system | /etc/forge/policy.yaml (override FORGE_SYSTEM_POLICY) | Sysadmin (MDM, corporate image, Ansible) |
| user | ~/.forge/policy.yaml | Developer (via forge channel disable or Web UI chip) |
| workspace | path at FORGE_PLATFORM_POLICY | Operator (Initializ Command, GitOps tooling); forge package pre-wires this env into generated Kubernetes manifests |

Resolution semantics:

  • Deny lists (denied_egress_domains, denied_tools,
    forbidden_models, denied_channels) → union across layers
  • Max bounds (max_egress_allowlist_size, max_tool_count) →
    smallest non-zero value wins
  • Audit attribution → first layer to deny in load order
    (system → user → workspace) takes credit, so operators grepping
    layer=system see every sysadmin-enforced violation without false
    positives from per-user overrides
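
The three resolution rules can be sketched in a few lines of Go. This is a minimal illustration, not Forge's loader: the Layer and Effective types and the merge function are hypothetical, and only denied_channels and max_tool_count are modelled.

```go
package main

import (
	"fmt"
	"sort"
)

// Layer is a hypothetical, reduced view of one policy file.
type Layer struct {
	Name           string
	DeniedChannels []string
	MaxToolCount   int // 0 means "no bound set"
}

// Effective is the merged result across all layers.
type Effective struct {
	DeniedBy     map[string]string // channel -> first layer that denied it
	MaxToolCount int
}

// merge expects layers in load order (system, user, workspace).
func merge(layers []Layer) Effective {
	eff := Effective{DeniedBy: map[string]string{}}
	for _, l := range layers {
		// Deny lists union; the first layer to deny keeps the attribution.
		for _, ch := range l.DeniedChannels {
			if _, seen := eff.DeniedBy[ch]; !seen {
				eff.DeniedBy[ch] = l.Name
			}
		}
		// Max bounds: the smallest non-zero value wins.
		if l.MaxToolCount > 0 && (eff.MaxToolCount == 0 || l.MaxToolCount < eff.MaxToolCount) {
			eff.MaxToolCount = l.MaxToolCount
		}
	}
	return eff
}

func main() {
	eff := merge([]Layer{
		{Name: "system", DeniedChannels: []string{"telegram"}},
		{Name: "user", DeniedChannels: []string{"slack", "telegram"}, MaxToolCount: 12},
		{Name: "workspace", DeniedChannels: []string{"slack"}, MaxToolCount: 8},
	})
	channels := make([]string, 0, len(eff.DeniedBy))
	for ch := range eff.DeniedBy {
		channels = append(channels, ch)
	}
	sort.Strings(channels)
	for _, ch := range channels {
		fmt.Printf("%s denied by layer=%s\n", ch, eff.DeniedBy[ch])
	}
	fmt.Println("max_tool_count:", eff.MaxToolCount)
}
```

In this example telegram is attributed to the system layer even though the user layer also lists it, and the unset system bound (0) does not override the workspace bound of 8.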

Channel deny is now first-class: denied_channels in any layer skips
the named adapter at startup with a channel_denied_by_policy
audit event. The runner continues with the remaining channels —
channel skip is non-fatal, unlike egress / tool / model violations
which abort startup.

CLI:

forge channel disable slack             # edits ~/.forge/policy.yaml (user layer)
forge channel disable slack --system    # edits /etc/forge/policy.yaml (system layer; warns when not root)
forge channel enable slack              # removes from the user layer

The Web UI surfaces all three layers via new GET/PUT /api/user-policy
endpoints. The agent card renders denied channels as locked / dimmed
chips with a tooltip naming the enforcing layer.

See docs/security/platform-policy.md
and examples/platform-policy.yaml.

6. Audit event export — UDS sink + HTTP fallback (FWS-7, #95)

Audit events can now be exported to a local Unix Domain Socket
(preferred) or localhost HTTP endpoint in addition to the existing
NDJSON-to-stderr stream. An in-pod sidecar (the initializ platform
receiver or a customer SIEM) can consume audit at low latency while
stderr stays as the safety-net fallback.

Configure via:

| Flag | Env | Purpose |
| --- | --- | --- |
| --audit-socket | FORGE_AUDIT_SOCKET | Unix Domain Socket path (preferred) |
| --audit-http-endpoint | FORGE_AUDIT_HTTP_ENDPOINT | localhost HTTP POST endpoint (fallback) |
| --audit-write-timeout | FORGE_AUDIT_WRITE_TIMEOUT | per-event timeout (default 50ms) |

Behavior:

  • Lazy connect — the socket need not exist at agent startup; the
    first emit dials. Sidecar deploys that come up later pick up future
    events without restarting the agent.
  • Per-event timeout — 50ms default. Beyond that the event drops
    and increments drops_timeout. A slow sidecar cannot back-pressure
    the agent.
  • Exponential backoff between failed dials — 100ms → 200ms → … →
    5s cap. During backoff, writes drop without dialing.
  • No buffering — the sink is fire-and-forget; buffering is the
    sidecar's job.
  • Byte parity — events leaving the export sink are byte-identical
    to events leaving stderr. No sink transforms the payload.

A periodic audit_export_status event fires every 60s carrying
per-sink writes_ok, drops_timeout, drops_dial, and connected
counters so operators tail the audit stream itself to confirm export
health.
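
A rough Go sketch of the sink behavior listed above (lazy connect, per-event deadline, doubling backoff with a 5s cap, no buffering) follows. It is an approximation under stated assumptions rather than Forge's sink: the Sink type and nextBackoff are hypothetical, writes dropped during backoff are counted as dial drops, and any write failure is counted as a timeout drop.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

const (
	backoffStart = 100 * time.Millisecond
	backoffCap   = 5 * time.Second
)

// nextBackoff doubles the previous wait, starting at 100ms and capping at 5s.
func nextBackoff(prev time.Duration) time.Duration {
	if prev <= 0 {
		return backoffStart
	}
	if next := prev * 2; next < backoffCap {
		return next
	}
	return backoffCap
}

// Sink is a fire-and-forget writer to a Unix Domain Socket.
type Sink struct {
	Path    string
	Timeout time.Duration

	conn    net.Conn
	wait    time.Duration
	retryAt time.Time

	WritesOK, DropsTimeout, DropsDial int
}

// Emit writes one NDJSON line or drops it; it never blocks past Timeout per
// dial or write and never buffers.
func (s *Sink) Emit(line []byte) {
	now := time.Now()
	if s.conn == nil {
		if now.Before(s.retryAt) {
			s.DropsDial++ // inside the backoff window: drop without dialing
			return
		}
		c, err := net.DialTimeout("unix", s.Path, s.Timeout)
		if err != nil {
			s.wait = nextBackoff(s.wait)
			s.retryAt = now.Add(s.wait)
			s.DropsDial++
			return
		}
		s.conn, s.wait = c, 0
	}
	s.conn.SetWriteDeadline(now.Add(s.Timeout))
	if _, err := s.conn.Write(line); err != nil {
		// Simplification: every write failure is treated as a timeout drop.
		s.conn.Close()
		s.conn = nil
		s.DropsTimeout++
		return
	}
	s.WritesOK++
}

func main() {
	d := time.Duration(0)
	for i := 0; i < 8; i++ {
		d = nextBackoff(d)
		fmt.Print(d, " ")
	}
	fmt.Println()

	// The socket does not exist yet, so both events are dropped: the first by
	// a failed dial, the second because the sink is inside its backoff window.
	s := &Sink{Path: "/nonexistent/forge-audit.sock", Timeout: 50 * time.Millisecond}
	s.Emit([]byte("{}\n"))
	s.Emit([]byte("{}\n"))
	fmt.Println("writes_ok:", s.WritesOK, "drops_dial:", s.DropsDial)
}
```

Because the dial is attempted on emit rather than at startup, a sidecar that creates the socket later is picked up on the first emit after the backoff window passes.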

See docs/security/audit-logging.md § Audit Event Export.

7. Hardened audit emission — schema, sequence, payload (FWS-8, #91)

The audit event schema is now a stable, versioned contract.

  • Every event carries schema_version: "1.0". The schema is
    additive-by-default; the version bumps only on removals or semantic
    changes.
  • Every event emitted on behalf of an A2A invocation carries a
    monotonic seq field starting at 1. Consumers detect gaps
    (lost events) and reordering by grouping
    (correlation_id, task_id). Startup events
    (policy_loaded, agent_card_published, audit_export_status)
    omit seq (no invocation scope).
  • The default audit posture stays metadata-only: token counts,
    sizes, durations, tool names. No prompt text, no completion text,
    no raw tool args / results. A TestNoPayloadByDefault_LLMCall
    regression test pins the invariant: any future caller that
    smuggles raw user content into a default event fails the test.
  • Customers who need raw payloads (debugging incidents,
    supervised-learning corpora, compliance replay) opt in field by
    field via AuditPayloadCapture on the runner config:
    LLMMessages / LLMResponse / ToolArgs / ToolResult. Captured
    strings are truncated to a per-field byte cap (default 16 KiB) with
    a …[truncated:N] marker so a runaway prompt cannot bloat one
    audit event.

Audit-event signing is deferred per the issue's architectural
recommendation — sequence numbers cover gap detection in the
meantime, and schema_version gives signing a stable place to land
when added.

See docs/security/audit-logging.md § Schema contract.
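
On the consumer side, gap and reorder detection over seq might look like the Go sketch below. It assumes each NDJSON line exposes correlation_id, task_id, and seq as top-level fields, as named in the notes; the Event, Key, and Report types and the Check and Parse functions are hypothetical, and the "event" field name in the sample is an assumption.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// Event is the subset of an audit line needed for sequence checking.
type Event struct {
	CorrelationID string `json:"correlation_id"`
	TaskID        string `json:"task_id"`
	Seq           int    `json:"seq"`
}

// Key groups events by (correlation_id, task_id).
type Key struct{ CorrelationID, TaskID string }

// Report lists missing sequence numbers and whether any event arrived after
// one with a higher seq.
type Report struct {
	Missing   []int
	Reordered bool
}

// Check inspects events in arrival order. Events without seq (startup
// events) are skipped.
func Check(events []Event) map[Key]Report {
	type state struct {
		seen      map[int]bool
		max       int
		reordered bool
	}
	groups := map[Key]*state{}
	for _, e := range events {
		if e.Seq == 0 {
			continue
		}
		k := Key{e.CorrelationID, e.TaskID}
		st := groups[k]
		if st == nil {
			st = &state{seen: map[int]bool{}}
			groups[k] = st
		}
		if e.Seq < st.max {
			st.reordered = true
		}
		if e.Seq > st.max {
			st.max = e.Seq
		}
		st.seen[e.Seq] = true
	}
	out := map[Key]Report{}
	for k, st := range groups {
		r := Report{Reordered: st.reordered}
		for n := 1; n <= st.max; n++ { // seq starts at 1
			if !st.seen[n] {
				r.Missing = append(r.Missing, n)
			}
		}
		out[k] = r
	}
	return out
}

// Parse decodes an NDJSON stream, skipping blank lines.
func Parse(ndjson string) ([]Event, error) {
	var events []Event
	sc := bufio.NewScanner(strings.NewReader(ndjson))
	for sc.Scan() {
		if strings.TrimSpace(sc.Text()) == "" {
			continue
		}
		var e Event
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			return nil, err
		}
		events = append(events, e)
	}
	return events, sc.Err()
}

func main() {
	stream := `{"event":"policy_loaded","schema_version":"1.0"}
{"event":"llm_call","correlation_id":"c1","task_id":"t1","seq":1}
{"event":"tool_exec","correlation_id":"c1","task_id":"t1","seq":3}
{"event":"llm_call","correlation_id":"c1","task_id":"t1","seq":2}
{"event":"invocation_complete","correlation_id":"c1","task_id":"t1","seq":5}`
	events, err := Parse(stream)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	r := Check(events)[Key{"c1", "t1"}]
	fmt.Println("missing:", r.Missing, "reordered:", r.Reordered)
}
```

A trailing gap (events lost after the highest seq seen) cannot be detected this way; a consumer would need the closing invocation_complete or invocation_cancelled event to know the invocation ended.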

8. Ops logger output to stdout (FWS-9, #100)

forge run / forge serve use the OS streams as a stream-level
audit-vs-ops split so container log collectors and SIEM pipelines can
route the two concerns separately without parsing any payload.

| Stream | Carries | Consumer |
| --- | --- | --- |
| stdout | Ops logs: startup banner, request lines, runtime errors via r.logger.Info/Warn/Error (JSONLogger) | Container log collector / local debugging |
| stderr | Audit NDJSON: every event constant. After FWS-7, also lands on the dedicated UDS / HTTP sink in parallel (stderr stays as the safety-net fallback) | SIEM pipeline today / FWS-7 sink primary |

Operator migration: if you previously captured ops logs via
forge run 2> ops.log, switch to forge run > ops.log (and
2> audit.log for audit). Container deployments that capture both
streams via the runtime's standard log collector are unaffected.

Interactive CLI commands (forge init, forge build,
forge channel) keep writing user-facing warnings + errors to stderr
— those are UX messages, not server ops logs, and the stream-split
policy doesn't apply.

9. Rate limit configurability + cancel exemption (FWS-10, #110)

The per-IP A2A rate limiter from #31 is now configurable end-to-end
via a new top-level server.rate_limit: block in forge.yaml,
matching --rate-limit-* CLI flags, and FORGE_RATE_LIMIT_* env vars.

| Field | New default | Previous |
| --- | --- | --- |
| read_rps | 1.0 (60/min) | unchanged |
| read_burst | 10 | unchanged |
| write_rps | 1.0 (60/min) | 10/60 ≈ 10/min |
| write_burst | 20 | 3 |
| cancel_exempt | true | (didn't exist) |

Why the bump. The old defaults predated parallel workflow
execution and cron bursts — a 10-step parallel stage was serialized
after the 3rd dispatch. The new 60/min sustained + burst 20 absorbs
orchestrator dispatch without throttling and still bounds anonymous
public-internet DoS at 1 task/sec sustained.
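
The effect of the bump can be checked with simple token-bucket arithmetic. The Go sketch below assumes a standard token bucket that starts full and refills continuously; Forge's limiter internals are not described in these notes, and earliestAdmit is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// earliestAdmit returns the earliest time, measured from a simultaneous
// burst, at which the n-th request can be admitted by a token bucket that
// starts full with the given burst size and refills at perMinute tokens.
func earliestAdmit(n, burst, perMinute int) time.Duration {
	if n <= burst {
		return 0
	}
	return time.Duration(n-burst) * time.Minute / time.Duration(perMinute)
}

func main() {
	fmt.Println("old defaults, 10th dispatch:", earliestAdmit(10, 3, 10))
	fmt.Println("new defaults, 10th dispatch:", earliestAdmit(10, 20, 60))
	fmt.Println("new defaults, 30th dispatch:", earliestAdmit(30, 20, 60))
}
```

Under the old defaults the last step of a 10-step parallel stage could not be admitted until 42 seconds after the first; under the new defaults all ten are admitted immediately, and only a burst beyond 20 falls back to one request per second.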

Why cancel exemption. The cost-ceiling cancel-burst case
(orchestrator firing N parallel tasks/cancel when a workflow budget
trips) was hitting -32603: rate limit exceeded at exactly the moment
cancellation matters most. DoS via cancel-spam is naturally bounded by
the registry's O(1) unknown-task lookup, so exempting cancel from the
write bucket is safe by default. Configurable via
cancel_exempt: false for stricter threat models.

Resolution per field is CLI > env > yaml > defaults.

See docs/reference/forge-yaml-schema.md § server.rate_limit.
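
The per-field precedence (CLI > env > yaml > defaults) amounts to "first set value wins". A minimal Go sketch, using nil pointers to mean "not set"; the resolve and ptr helpers are hypothetical and not part of Forge:

```go
package main

import "fmt"

// resolve returns the first set value in precedence order CLI, env, YAML,
// falling back to def. A nil pointer means the source did not set the field.
func resolve[T any](def T, cli, env, yaml *T) T {
	for _, v := range []*T{cli, env, yaml} {
		if v != nil {
			return *v
		}
	}
	return def
}

// ptr is a small helper for building "set" values in examples.
func ptr[T any](v T) *T { return &v }

func main() {
	// write_burst: forge.yaml pins 3, the env var raises it to 10, no CLI flag.
	fmt.Println("write_burst:", resolve[int](20, nil, ptr(10), ptr(3)))
	// cancel_exempt: nothing set anywhere, so the default applies.
	fmt.Println("cancel_exempt:", resolve[bool](true, nil, nil, nil))
}
```

Using pointers rather than zero values keeps an explicit cancel_exempt: false or a burst of 0 distinguishable from "not configured".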

10. Workspace-level Skill Builder LLM config (#92)

The Web UI Skill Builder now resolves its LLM from a workspace-level
.forge/ui.yaml (with ~/.forge/ui.yaml as fallback), decoupled from
any agent's runtime model. Operator picks the model verbatim — no
hardcoded "upgrade" to gpt-4.1 or claude-opus-4-6. Credentials are
threaded as request-scoped values; the pre-#92 os.Setenv leak that
caused cross-agent credential stomping when switching agents in the UI
is removed.

See docs/ui/skill-builder-llm.md.

11. Claude knowledge skills (PR #118)

Two new Markdown files under .claude/skills/ make Forge knowledge
loadable into any Claude conversation:

  • .claude/skills/forge.md — single comprehensive knowledge skill.
    Architecture, security model, A2A protocol, CLI surface, forge.yaml
    schema, audit pipeline, how to build agents + skills, FWS-1–FWS-10
    recap, docs map, recipes. ~600 lines, table of contents, every
    claim cross-links to the canonical doc.
  • .claude/skills/forge-skill-builder.md — byte-identical port of
    the Web UI Skill Builder's system prompt
    (forge-ui/skill_builder_context.go:skillBuilderSystemPrompt).
    Author a valid SKILL.md in any Claude chat without running
    forge ui.

The sync-docs workflow keeps both current with the code; the
byte-identical invariant for the skill-builder file is documented as
a rule in .claude/commands/sync-docs.md.


New audit events

Eight new event constants land in this release. All carry the FWS-8
schema_version field; those emitted in an invocation scope also carry
seq.

| Event | When |
| --- | --- |
| agent_card_published | Agent Card finalized at startup + hot-reload. card_size_bytes, card_sha256 for drift detection. (FWS-1) |
| invocation_complete | A2A invocation closed; duration_ms, token totals, llm_call_count. (FWS-3) |
| llm_call_cancelled | Streaming LLM call aborted mid-flight; partial usage counts. (FWS-3 / FWS-4) |
| invocation_cancelled | A2A invocation cancelled via tasks/cancel; classified reason + partial token totals. (FWS-4) |
| policy_loaded | One per non-empty platform-policy layer at startup; layer, source, per-list size counters. (FWS-5 / FWS-6) |
| policy_violation_at_build_time | forge.yaml conflicts with a policy layer at startup. violation_kind, offending_value, forge_yaml_field, layer, source. (FWS-5 / FWS-6) |
| channel_denied_by_policy | Channel adapter skipped at startup. channel, layer, source. (FWS-6) |
| audit_export_status | Every 60s when an export sink is configured. Per-sink writes_ok, drops_timeout, drops_dial, connected. (FWS-7) |

See the full event-type table in
docs/security/audit-logging.md § Event Types.


New CLI flags and env vars

| Flag (on forge run / forge serve start) | Env var | Purpose |
| --- | --- | --- |
| --audit-socket | FORGE_AUDIT_SOCKET | Unix Domain Socket export path (FWS-7) |
| --audit-http-endpoint | FORGE_AUDIT_HTTP_ENDPOINT | Localhost HTTP fallback (FWS-7) |
| --audit-write-timeout | FORGE_AUDIT_WRITE_TIMEOUT | Per-event sink timeout, default 50ms (FWS-7) |
| --rate-limit-read-rps | FORGE_RATE_LIMIT_READ_RPS | Per-IP read RPS (FWS-10) |
| --rate-limit-read-burst | FORGE_RATE_LIMIT_READ_BURST | Per-IP read burst (FWS-10) |
| --rate-limit-write-rps | FORGE_RATE_LIMIT_WRITE_RPS | Per-IP write RPS (FWS-10) |
| --rate-limit-write-burst | FORGE_RATE_LIMIT_WRITE_BURST | Per-IP write burst (FWS-10) |
| --rate-limit-cancel-exempt | FORGE_RATE_LIMIT_CANCEL_EXEMPT | Exempt tasks/cancel from the write bucket (FWS-10) |

forge serve start forwards each flag to the forked forge run; env
vars flow through os.Environ() regardless.


forge.yaml schema additions

New top-level server: block:

server:
  rate_limit:
    read_rps: 1.0
    read_burst: 10
    write_rps: 1.0
    write_burst: 20
    cancel_exempt: true

All five fields are optional; omitting them inherits the new defaults.
See docs/reference/forge-yaml-schema.md § server.rate_limit.


Removed

  • Per-agent disabled_channels: field on forge.yaml. FWS-6
    shipped this in its first cut, then replaced it with the three-layer
    policy's denied_channels. Migration: move any
    disabled_channels: [X] from forge.yaml into
    ~/.forge/policy.yaml's denied_channels: (developer scope),
    /etc/forge/policy.yaml (laptop-wide), or the workspace ConfigMap
    (deployed-agent). forge channel disable <name> does this
    automatically.
  • channel_disabled_by_config audit event.
    channel_denied_by_policy (with layer attribution) carries every
    skip now.

Upgrade notes

  1. No action required for existing deployments that don't use the
    new audit export sinks, platform policy, or server.rate_limit.
    Defaults are backward-compatible and metadata-only audit is
    unchanged on the wire.

  2. If you previously captured ops logs via forge run 2> ops.log,
    switch to forge run > ops.log (and 2> audit.log for audit). The
    FWS-9 stream split routes ops to stdout. Container deployments that
    capture both streams via the runtime's standard log collector are
    unaffected.

  3. If your SIEM grouped audit by correlation_id alone, you can
    now add seq for gap detection. The schema is additive — no
    existing field changed shape.

  4. If you carry a per-agent disabled_channels: block in your
    forge.yaml, migrate it to denied_channels: in
    ~/.forge/policy.yaml (per-developer) or the workspace policy
    ConfigMap (per-deployment). forge channel disable <name> does
    this for you.

  5. Rate-limit defaults are more permissive. If you previously
    relied on the WriteBurst=3 ceiling for DoS protection, set
    server.rate_limit.write_burst: 3 in your forge.yaml to restore
    the prior behavior. The new 20-burst default targets
    orchestration-friendly workloads.

  6. Audit-event signing is deferred to a future release per the
    FWS-8 architectural recommendation. Sequence numbers cover gap
    detection in the meantime.


Merged pull requests

| PR | Title |
| --- | --- |
| #118 | docs(claude): ship the forge knowledge skill + skill-builder skill (+ sync-docs hook) |
| #117 | feat(server): rate-limit configurability + orchestration-friendly defaults + cancel exemption (FWS-10) |
| #116 | chore(runtime): ops logs to stdout, stream separation from audit (FWS-9) |
| #115 | feat(runtime): hardened audit emission: sequence numbers, schema version, opt-in payload capture (FWS-8) |
| #114 | feat(runtime): audit event export: UDS sink + HTTP fallback (FWS-7) |
| #113 | feat(security): three-layer platform policy + channel scope (FWS-6) |
| #112 | feat(security): platform policy enforcement at runtime (FWS-5) |
| #111 | feat(runtime): cancellation signal handling (FWS-4) |
| #109 | feat(runtime): token usage + execution duration emission (FWS-3) |
| #108 | feat(runtime): workflow correlation ID threading (FWS-2) |
| #107 | feat(a2a): A2A 0.3.0 Agent Card conformance (FWS-1) |
| #84 | feat(ui): workspace-level skill-builder LLM config |

(PR numbers above the v0.12 release reflect the per-feature PRs merged
since then. The CHANGELOG entry for each FWS workstream lives in
CHANGELOG.md.)


Issues closed

  • FWS-1 → #85
  • FWS-2 → #86
  • FWS-3 → #87
  • FWS-4 → #88
  • FWS-5 → #89
  • FWS-6 → #90
  • FWS-7 → #95
  • FWS-8 → #91
  • FWS-9 → #100
  • FWS-10 → #110
  • Custom-provider regression → #83
  • Skill Builder LLM workspace config → #92

Documentation


About Forge

Forge is an open-source runtime for building, packaging, and operating
LLM-backed agents that do real work. The design commitment is three
properties at once — atomicity (explicit skills, declared tools,
declared dependencies), security (restricted egress, encrypted
secrets, end-to-end audit), and portability (the same agent runs
locally, in a container, or in Kubernetes with the same forge.yaml).
The runtime speaks A2A 0.3.0 over JSON-RPC and REST, integrates with
multiple LLM providers (Anthropic, OpenAI, Ollama, OpenAI-compatible)
behind a common interface, and ships with a pluggable channel system
(Slack / Telegram / MS Teams) plus an MCP client for external tool
servers.


Full diff: v0.12.0...v0.13.0