feat: structured output mode — constrain agent responses to a declared JSON schema by Copilot · Pull Request #28964 · github/gh-aw

Copilot · 2026-04-28T16:32:13Z

Agentic workflows currently produce freeform text, forcing downstream jobs to parse unstructured output with brittle regex or prompt-engineering hacks. This adds a first-class structured-output frontmatter field that declares a JSON Schema the agent's response must conform to, with compile-time validation and runtime enforcement.

Changes

New structured-output frontmatter field

Accepts either schema (inline JSON Schema object) or schema-file (repo-relative path to a .json file)
Exactly one must be specified; both or neither is a compile-time error
Schema is validated at compile time using santhosh-tekuri/jsonschema/v6 (already a project dependency)

Runtime behavior (generated YAML)

Pre-agent step writes the resolved schema to /tmp/gh-aw/structured-output-schema.json
GH_AW_STRUCTURED_OUTPUT_SCHEMA and GH_AW_STRUCTURED_OUTPUT_FILE env vars injected into all engine execution steps (copilot, claude, codex, crush, gemini, opencode) — parallel to the existing applySafeOutputEnvToMap pattern
Post-agent actions/github-script step (runs always()) reads /tmp/gh-aw/structured-output.json, validates it is well-formed JSON, and sets the structured_output step output

Codex native enforcement

When structured-output is configured, --output-schema <schema-path> is passed to the Codex exec command, constraining the model at the token-sampling level to produce schema-conformant JSON (belt-and-suspenders on top of the file-write convention)
Not applied for detection runs

Agent job output

structured_output added to the agent job's outputs map, exposing the compact validated JSON string to downstream jobs via ${{ needs.agent.outputs.structured_output }}
Structured output file included in the agent artifact for traceability

Data model

StructuredOutputConfig added to FrontmatterConfig and WorkflowData
Extraction happens in extractAdditionalConfigurations alongside cache-memory, repo-memory, and mcp-scripts

Example

---
on:
  issues:
    types: [labeled]
engine:
  id: copilot

structured-output:
  schema:
    type: object
    properties:
      decision:
        type: string
        enum: [APPROVE, REQUEST_CHANGES, ESCALATE]
      reasoning:
        type: string
        minLength: 10
      confidence:
        type: number
        minimum: 0
        maximum: 1
    required: [decision, reasoning, confidence]
    additionalProperties: false
---

Downstream job:

jobs:
  route:
    needs: agent
    steps:
      - run: |
          DECISION=$(echo '${{ needs.agent.outputs.structured_output }}' | jq -r .decision)
          if [ "$DECISION" = "ESCALATE" ]; then
            gh issue edit ${{ github.event.issue.number }} --add-label "needs-human"
          fi

Alternatively, reference a schema file to keep large schemas out of the frontmatter:

structured-output:
  schema-file: .github/schemas/triage-output.schema.json

…lared JSON schema Resolves: #structured-output-mode - New `structured-output` frontmatter field accepting `schema` (inline) or `schema-file` (file path) containing a JSON Schema draft-07 object - Compile-time validation of the schema using santhosh-tekuri/jsonschema/v6 - Pre-agent step writes schema to /tmp/gh-aw/structured-output-schema.json - GH_AW_STRUCTURED_OUTPUT_SCHEMA and GH_AW_STRUCTURED_OUTPUT_FILE env vars injected into all engine execution steps (copilot, claude, codex, crush, gemini, opencode) - Post-agent validation step (actions/github-script, always runs) checks /tmp/gh-aw/structured-output.json is well-formed JSON and sets the `validate_structured_output` step output - Agent job exposes `structured_output` output for downstream job consumption - JSON schema updated with `structured-output` property definition - StructuredOutputConfig added to FrontmatterConfig and WorkflowData - Unit tests covering all new functions Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d1463ae7-cb32-4831-981b-39e1993c2cbc Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>

…shellEscapeSingleQuote Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d1463ae7-cb32-4831-981b-39e1993c2cbc Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>

lpcox

Review: Engine-Level Structured Output Support

This PR implements structured output as a file-based convention — the agent is given env vars pointing to a schema file and an output file path, then a post-agent step validates the JSON. This is engine-agnostic but leaves significant native capabilities unused.

How Each Engine Supports Structured Outputs Natively

Copilot CLI — ❌ No native structured output support

Copilot CLI does not expose a --response-format or JSON-mode flag
The env-var + file-write approach is the only viable path
The agent must be instructed via system prompt to write JSON to the file path

Claude CLI — ⚠️ Partial native support available but unused

Claude CLI (claude) supports --output-format json which constrains the final response to valid JSON
The Anthropic API natively supports tool_use with structured schemas and prefilled responses for JSON mode
Gap: This PR could pass --output-format json to the Claude CLI command when structured output is configured, providing a belt-and-suspenders guarantee (native JSON mode + post-validation)

Codex CLI — ⚠️ Partial native support available but unused

OpenAI's API supports response_format: { type: "json_schema", json_schema: {...} } which provides guaranteed schema-conformant output at the model level
The Codex CLI does not currently expose this as a flag, but OpenAI's structured outputs are the gold standard for this feature — the model is constrained at the token sampling level
Gap: If/when Codex CLI adds a --response-format flag, this PR's architecture should pass the schema through

Gemini — ⚠️ Native support exists

Gemini API supports response_schema in generation config for JSON mode
Gemini CLI may not expose this yet

Crush/OpenCode — Unknown/varies

Summary of Concerns

The PR relies entirely on prompt-engineering + post-hoc validation — the schema file is written to disk and env vars are set, but no engine is actually told to produce structured output. The agent must discover the env vars and voluntarily write JSON. There's no mechanism in the system prompt injection to instruct the agent about this requirement.
No runtime schema validation — The post-agent step only checks JSON.parse() succeeds. It does NOT validate against the declared schema. The schema is validated at compile time but never enforced at runtime. A conforming-JSON-but-wrong-schema response would pass silently.
Missing prompt injection — How does the agent know to write JSON to $GH_AW_STRUCTURED_OUTPUT_FILE? The env vars are set, but unless the system prompt or agent instructions reference them, most engines will produce freeform text. Is there a separate PR that adds the system prompt integration?
Native capabilities create stronger guarantees — For Claude and OpenAI/Codex, model-level JSON constraints prevent malformed output entirely (no retry needed). The current approach allows the agent to produce invalid output, fail validation, and waste the entire run.

Recommendation

The architecture is sound as a lowest-common-denominator approach, but the PR should:

Add runtime schema validation (not just JSON.parse) in the post-agent step
Document which engines need prompt-based instruction vs. which could use native structured output in future
Clarify how the agent discovers and acts on the env vars (system prompt integration)

pelikhan · 2026-04-28T22:28:57Z

@copilot configure codex to use the cli argument --output-schema

https://www.mintlify.com/openai/codex/cli/exec?utm_source=chatgpt.com#param-output-schema

pelikhan · 2026-04-28T22:34:19Z

Aside from codex none of these are native, it will not work well. We need to take into the SDK and provide the Schema into the LLM inference calls.

pelikhan

Unlikely to work properly

…t is configured When structured-output is configured and it is not a detection run, the Codex exec command now includes --output-schema <schema-path> so the model is constrained at the token-sampling level to produce schema-conformant JSON in addition to the file-write convention enforced by the post-agent validation step. Ref: https://openai.github.io/codex/cli/exec#param-output-schema Agent-Logs-Url: https://github.com/github/gh-aw/sessions/2811a83a-be95-4f17-a521-37792e0ae26d Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-04-28T22:37:16Z

@copilot configure codex to use the cli argument --output-schema

https://www.mintlify.com/openai/codex/cli/exec?utm_source=chatgpt.com#param-output-schema

Done in 5aa6e43. When structured-output is configured and it's not a detection run, the Codex exec command now includes --output-schema /tmp/gh-aw/structured-output-schema.json — the same schema file written by the pre-agent setup step. This provides native model-level JSON constraint on top of the existing env-var + post-validation layer.

pelikhan · 2026-04-28T22:58:16Z

It's not well supported so far. Looks like a job for sub-agents.

Initial plan

19cb3da

Copilot AI assigned Copilot and lpcox Apr 28, 2026

Copilot started work on behalf of lpcox April 28, 2026 16:32 View session

Copilot AI linked an issue Apr 28, 2026 that may be closed by this pull request

Structured output mode — constrain agent responses to a declared JSON schema #28963

Open

Copilot AI and others added 2 commits April 28, 2026 17:05

fix: address code review — fix comment typo, add edge case tests for …

a917305

…shellEscapeSingleQuote Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d1463ae7-cb32-4831-981b-39e1993c2cbc Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add structured output mode to constrain agent responses~~ feat: structured output mode — constrain agent responses to a declared JSON schema Apr 28, 2026

Copilot AI requested a review from lpcox April 28, 2026 17:14

Copilot finished work on behalf of lpcox April 28, 2026 17:14

This was referenced Apr 28, 2026

[Contribution Check Report] Contribution Check — 2026-04-28 #28835

Closed

[aw] No-Op Runs #28859

Closed

lpcox reviewed Apr 28, 2026

View reviewed changes

Copilot started work on behalf of pelikhan April 28, 2026 22:29 View session

pelikhan requested changes Apr 28, 2026

View reviewed changes

Copilot finished work on behalf of pelikhan April 28, 2026 22:38

Copilot AI requested a review from pelikhan April 28, 2026 22:38

pelikhan closed this Apr 28, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: structured output mode — constrain agent responses to a declared JSON schema#28964

feat: structured output mode — constrain agent responses to a declared JSON schema#28964
Copilot wants to merge 4 commits intomainfrom
copilot/add-structured-output-mode

Copilot AI commented Apr 28, 2026 •

edited

Loading

Uh oh!

lpcox left a comment

Uh oh!

pelikhan commented Apr 28, 2026

Uh oh!

pelikhan commented Apr 28, 2026

Uh oh!

pelikhan left a comment

Uh oh!

Copilot AI commented Apr 28, 2026

Uh oh!

pelikhan commented Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Copilot AI commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Example

Uh oh!

lpcox left a comment

Choose a reason for hiding this comment

Review: Engine-Level Structured Output Support

How Each Engine Supports Structured Outputs Natively

Summary of Concerns

Recommendation

Uh oh!

pelikhan commented Apr 28, 2026

Uh oh!

pelikhan commented Apr 28, 2026

Uh oh!

pelikhan left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI commented Apr 28, 2026

Uh oh!

pelikhan commented Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Apr 28, 2026 •

edited

Loading