diff --git a/docs/adr/29721-extract-focused-helpers-from-post-agent-compiler.md b/docs/adr/29721-extract-focused-helpers-from-post-agent-compiler.md new file mode 100644 index 0000000000..2d08286bb6 --- /dev/null +++ b/docs/adr/29721-extract-focused-helpers-from-post-agent-compiler.md @@ -0,0 +1,69 @@ +# ADR-29721: Extract Focused Helpers from Post-Agent Workflow Compiler Function + +**Date**: 2026-05-02 +**Status**: Draft +**Deciders**: pelikhan (copilot-swe-agent) + +--- + +## Part 1 — Narrative (Human-Friendly) + +### Context + +The `generatePostAgentCollectionAndUpload` function in `pkg/workflow/compiler_yaml_main_job.go` had grown to 203 lines handling over 10 unrelated concerns: artifact path collection (MCP logs, DIFC proxy, OTLP spans, safe-outputs, patches, firewall audit paths), `GITHUB_STEP_SUMMARY` log-parsing step emission, engine cleanup, access log upload, safe-outputs staging, token invalidation, dev-mode restore, and step-order validation. This violated the Single Responsibility Principle and made it impossible to unit-test individual concerns (e.g., path collection logic) without exercising the entire post-agent pipeline. + +### Decision + +We will decompose `generatePostAgentCollectionAndUpload` into three clearly scoped units: a pure `collectArtifactPaths` helper that returns all paths for the unified artifact upload without touching the YAML builder, a `generateSummarySteps` helper that emits all `GITHUB_STEP_SUMMARY` log-parsing steps, and a reduced `generatePostAgentCollectionAndUpload` orchestrator (~97 lines) that delegates to both helpers and retains only the remaining YAML writes. The primary driver is testability and maintainability: `collectArtifactPaths` is now a pure function (no side effects on the YAML builder) that can be unit-tested in isolation. 
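The decomposed call structure can be sketched in miniature. This is a simplified illustration of the two seams the Decision describes, not the real compiler API: the struct, field names, and path literals below are hypothetical stand-ins (the actual helpers read `WorkflowData` and `CodingAgentEngine` and write to a `strings.Builder`).

```go
package main

import "fmt"

// workflowData is a hypothetical stand-in for the compiler's WorkflowData;
// only the single flag this sketch needs is modeled.
type workflowData struct {
	firewallEnabled bool
}

// collectArtifactPaths is the pure seam: it reads configuration and the
// initial slice, returns an augmented copy, and never touches the YAML output.
func collectArtifactPaths(d workflowData, logFile string, initial []string) []string {
	paths := append([]string{}, initial...)
	paths = append(paths, "/tmp/gh-aw/mcp-logs/", logFile)
	if d.firewallEnabled {
		paths = append(paths, "/tmp/gh-aw/sandbox/firewall/")
	}
	return paths
}

// generateSummarySteps is the side-effecting seam: it only appends output steps.
func generateSummarySteps(yaml *[]string, d workflowData) {
	*yaml = append(*yaml, "parse-agent-logs")
	if d.firewallEnabled {
		*yaml = append(*yaml, "parse-firewall-logs")
	}
}

// generatePostAgent plays the reduced orchestrator: it delegates to both
// helpers and keeps the ordering — collect paths, emit summary steps,
// then emit the unified upload referencing the collected paths.
func generatePostAgent(yaml *[]string, d workflowData, logFile string, initial []string) {
	paths := collectArtifactPaths(d, logFile, initial)
	generateSummarySteps(yaml, d)
	*yaml = append(*yaml, fmt.Sprintf("upload-artifact(%d paths)", len(paths)))
}

func main() {
	var yaml []string
	generatePostAgent(&yaml, workflowData{firewallEnabled: true}, "/tmp/gh-aw/agent.log", nil)
	fmt.Println(yaml) // [parse-agent-logs parse-firewall-logs upload-artifact(3 paths)]
}
```

Because the first helper is pure, a test can compare input and output slices directly, with no builder scaffolding.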
+### Alternatives Considered
+
+#### Alternative 1: Keep the monolithic function
+
+Leave `generatePostAgentCollectionAndUpload` as a single function and rely on comments to delineate logical sections. This was rejected because the function had already become difficult to reason about: the interleaving of path-collection assignments and YAML step emissions made it hard to follow the data flow, and comment-only boundaries offered no enforcement. Future additions would have continued to grow the function.
+
+#### Alternative 2: Extract into more than three units (fine-grained decomposition)
+
+Further split into per-concern functions (e.g., separate helpers for firewall paths, safe-outputs paths, MCP paths). This was not chosen because the existing code has exactly two natural seams: data collection (no side effects) versus step emission (YAML side effects). Adding helpers beyond those two seams would have created unnecessary call-graph complexity without a proportionate testability benefit for this refactor.
+
+### Consequences
+
+#### Positive
+- `collectArtifactPaths` is a pure function with no YAML builder side effects, enabling direct unit testing of path-collection logic.
+- The orchestrator function is reduced from 203 to ~97 lines, making the post-agent pipeline easier to read and modify.
+- `generateSummarySteps` groups all `GITHUB_STEP_SUMMARY`-related step emissions in one place, making it easier to add or audit summary steps.
+
+#### Negative
+- Introduces two additional unexported methods on `Compiler`, increasing the internal surface that future contributors must understand when modifying post-agent behavior.
+- The refactor slightly changes the call order relative to the original function (engine cleanup now precedes access log extraction rather than being interleaved with path collection), which requires careful review to confirm there is no behavioral change.
+ +#### Neutral +- All compiled lock files recompile identically — the refactor is purely structural with no behavioral change. +- New contributors must understand that `generatePostAgentCollectionAndUpload` is now an orchestrator and that the two helpers must be called together to reproduce the original behavior. + +--- + +## Part 2 — Normative Specification (RFC 2119) + +> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119). + +### Compiler Function Structure + +1. Implementations **MUST** keep `collectArtifactPaths` free of side effects on the YAML builder; it **MUST** only read `WorkflowData`, `CodingAgentEngine`, and the initial paths slice, and **MUST** return the augmented paths slice. +2. Implementations **MUST NOT** add `GITHUB_STEP_SUMMARY` step-emission calls to `collectArtifactPaths`; all such calls **MUST** reside in `generateSummarySteps` or `generatePostAgentCollectionAndUpload`. +3. Implementations **MUST NOT** add artifact path accumulation (`paths = append(paths, ...)`) directly inside `generatePostAgentCollectionAndUpload`; new paths **MUST** be added inside `collectArtifactPaths`. +4. Implementations **SHOULD** call `collectArtifactPaths` and `generateSummarySteps` exclusively via `generatePostAgentCollectionAndUpload` to preserve the correct step ordering in the emitted workflow YAML. + +### Post-Agent Step Ordering + +1. The orchestrator **MUST** invoke engine output cleanup before calling `collectArtifactPaths` so that engine-declared output paths are still present on disk when collected. +2. The orchestrator **MUST** invoke `generateSummarySteps` after `collectArtifactPaths` so that all paths are resolved before any step-summary steps reference them. +3. 
The orchestrator **MUST** emit the unified artifact upload step after all path-collection and step-summary emission, following the ordering established in this refactor. + +### Conformance + +An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance. + +--- + +*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/25249422020) workflow. The PR author must review, complete, and finalize this document before the PR can merge.* diff --git a/pkg/workflow/compiler_yaml_main_job.go b/pkg/workflow/compiler_yaml_main_job.go index 6d5b595961..d8667c0863 100644 --- a/pkg/workflow/compiler_yaml_main_job.go +++ b/pkg/workflow/compiler_yaml_main_job.go @@ -529,49 +529,117 @@ func (c *Compiler) generateAgentRunSteps(yaml *strings.Builder, data *WorkflowDa return artifactPaths, logFileFull, nil } -// generatePostAgentCollectionAndUpload emits all post-agent steps: engine output collection, -// access log extraction/upload, log parsing, token usage/AWF reflect/observability summaries, -// artifact path accumulation, safe-outputs agent output placeholder, Copilot engine cleanup, -// repo-memory/cache-memory/staging artifact uploads, patch collection, post-steps, firewall -// audit paths, the unified artifact upload, token invalidation steps, dev-mode actions restore, -// and step-order validation. -func (c *Compiler) generatePostAgentCollectionAndUpload(yaml *strings.Builder, data *WorkflowData, engine CodingAgentEngine, artifactPaths []string, logFileFull string, checkoutMgr *CheckoutManager) error { - // Merge engine-declared output files into the unified artifact instead of creating a - // separate agent_outputs artifact. The cleanup step is still generated so workspace files - // are removed after collection. 
- if enginePaths := getEngineArtifactPaths(engine); len(enginePaths) > 0 { - artifactPaths = append(artifactPaths, enginePaths...) - c.generateEngineOutputCleanup(yaml, engine) - } +// collectArtifactPaths gathers all paths for the unified artifact upload. +// It starts from the initial paths already accumulated by generateAgentRunSteps and appends +// engine-declared output paths, log directories, observability files, safe-outputs files, +// patch/bundle paths, and firewall audit paths. +func (c *Compiler) collectArtifactPaths(data *WorkflowData, engine CodingAgentEngine, logFileFull string, initialPaths []string) []string { + paths := initialPaths - // Extract and upload squid access logs (if any proxy tools were used) - c.generateExtractAccessLogs(yaml, data.Tools) - c.generateUploadAccessLogs(yaml, data.Tools) + // Merge engine-declared output files into the unified artifact instead of creating a + // separate agent_outputs artifact. + paths = append(paths, getEngineArtifactPaths(engine)...) // Collect MCP logs path if any MCP tools were used - artifactPaths = append(artifactPaths, "/tmp/gh-aw/mcp-logs/") + paths = append(paths, "/tmp/gh-aw/mcp-logs/") // Collect DIFC proxy logs (proxy-tls certs + container stderr) when proxy was injected - artifactPaths = append(artifactPaths, difcProxyLogPaths(data)...) + paths = append(paths, difcProxyLogPaths(data)...) // Collect MCPScripts logs path if mcp-scripts is enabled if IsMCPScriptsEnabled(data.MCPScripts) { - artifactPaths = append(artifactPaths, "/tmp/gh-aw/mcp-scripts/logs/") + paths = append(paths, "/tmp/gh-aw/mcp-scripts/logs/") + } + + // Include the aggregated agent_usage.json in the agent artifact so third-party + // tools can consume structured token data without parsing the step summary. 
+ // Requires AWF v0.25.8+ + if isFirewallEnabled(data) { + paths = append(paths, "/tmp/gh-aw/"+constants.TokenUsageFilename) + } + + // Collect agent stdio logs path for unified upload + paths = append(paths, logFileFull) + + // Collect agent-generated files path for unified upload + // This directory is used by workflows that instruct the agent to write files + // (e.g., smoke-claude status summaries) + paths = append(paths, "/tmp/gh-aw/agent/") + + // Collect GitHub API rate-limit log for observability. + // Written by github_rate_limit_logger.cjs during REST API calls. + paths = append(paths, "/tmp/gh-aw/"+constants.GithubRateLimitsFilename) + + // Collect OTLP span mirror — enables post-hoc trace debugging without a live collector. + // Written by send_otlp_span.cjs; each line is a full OTLP/HTTP JSON traces payload. + // Only included when OTLP is configured for this workflow. + if isOTLPEnabled(data) { + paths = append(paths, "/tmp/gh-aw/"+constants.OtelJsonlFilename) + } + + // Collect safe outputs and agent output paths for the unified artifact. + // These were previously uploaded as separate safe-output and agent-output artifacts. + if data.SafeOutputs != nil { + // Raw safe-output NDJSON (copied to /tmp/gh-aw/ by generateOutputCollectionStep) + paths = append(paths, "/tmp/gh-aw/"+constants.SafeOutputsFilename) + // Processed agent output JSON produced by collect_ndjson_output.cjs + paths = append(paths, "/tmp/gh-aw/"+constants.AgentOutputFilename) + if data.SafeOutputs.CommentMemory != nil { + paths = append(paths, "/tmp/gh-aw/comment-memory/") + } + } + + // Collect git patch path if safe-outputs with PR operations is configured. + // NOTE: Git patch generation has been moved to the safe-outputs MCP server. + // The patch is now generated when create_pull_request or push_to_pull_request_branch + // tools are called, providing immediate error feedback if no changes are present. + // Include patches in the artifact when: + // 1. 
Safe outputs needs them for checkout (non-staged create_pull_request/push_to_pull_request_branch) + // 2. Threat detection is enabled (detection job needs patches for security analysis, even when the + // safe-output handler is staged and doesn't need checkout itself) + threatDetectionNeedsPatches := IsDetectionJobEnabled(data.SafeOutputs) + if usesPatchesAndCheckouts(data.SafeOutputs) || threatDetectionNeedsPatches { + paths = append(paths, "/tmp/gh-aw/aw-*.patch") + // Bundle files are generated when patch-format: bundle is configured. + // Both formats use the same download path in the safe_outputs job, so + // include the bundle glob unconditionally alongside the patch glob. + // The artifact upload step already sets if-no-files-found: ignore, so + // this is safe even when no bundle files exist. + paths = append(paths, "/tmp/gh-aw/aw-*.bundle") } - // parse agent logs for GITHUB_STEP_SUMMARY + // Include firewall audit/observability logs in the unified agent artifact + // so all agent job outputs ship as a single artifact (AWF v0.25.0+). + if isFirewallEnabled(data) { + paths = append(paths, constants.AWFConfigFilePath) + paths = append(paths, constants.AWFProxyLogsDir+"/") + paths = append(paths, constants.AWFAuditDir+"/") + // Include the AWF /reflect payload persisted by the agent harness. + // Co-located under /tmp/gh-aw/sandbox/firewall/ so the existing + // chmod -R a+r step covers its permissions before upload. + paths = append(paths, constants.AWFReflectFilePath) + } + + return paths +} + +// generateSummarySteps emits all GITHUB_STEP_SUMMARY log-parsing steps for the agent job. +// It covers agent log parsing, MCP scripts, MCP gateway, firewall logs, token usage, +// AWF reflect summary, and observability summary. 
+func (c *Compiler) generateSummarySteps(yaml *strings.Builder, data *WorkflowData, engine CodingAgentEngine) { + // Parse agent logs for GITHUB_STEP_SUMMARY c.generateLogParsing(yaml, data, engine) - // parse mcp-scripts logs for GITHUB_STEP_SUMMARY (if mcp-scripts is enabled) + // Parse mcp-scripts logs for GITHUB_STEP_SUMMARY (if mcp-scripts is enabled) if IsMCPScriptsEnabled(data.MCPScripts) { c.generateMCPScriptsLogParsing(yaml, data) } - // parse MCP gateway logs for GITHUB_STEP_SUMMARY - // The MCP gateway is always enabled, even when agent sandbox is disabled + // Parse MCP gateway logs for GITHUB_STEP_SUMMARY. + // The MCP gateway is always enabled, even when agent sandbox is disabled. c.generateMCPGatewayLogParsing(yaml, data) - // Add firewall log parsing and dedicated audit upload for all firewall-enabled engines. + // Add firewall log parsing for all firewall-enabled engines. // This replaces the previous per-engine blocks (Copilot, Codex, Claude) and extends // support to all engines (including Gemini) so every agentic workflow uploads audit logs. if isFirewallEnabled(data) { @@ -584,9 +652,6 @@ func (c *Compiler) generatePostAgentCollectionAndUpload(yaml *strings.Builder, d // Parse token-usage.jsonl and append to step summary (requires AWF v0.25.8+) if isFirewallEnabled(data) { c.generateTokenUsageSummary(yaml, data) - // Include the aggregated agent_usage.json in the agent artifact so third-party - // tools can consume structured token data without parsing the step summary. - artifactPaths = append(artifactPaths, "/tmp/gh-aw/"+constants.TokenUsageFilename) } // Append AWF API proxy reflection data (available endpoints and models) to step summary. @@ -598,41 +663,35 @@ func (c *Compiler) generatePostAgentCollectionAndUpload(yaml *strings.Builder, d // Synthesize a compact observability section from runtime artifacts when OTLP is enabled. 
c.generateObservabilitySummary(yaml, data) +} - // Collect agent stdio logs path for unified upload - artifactPaths = append(artifactPaths, logFileFull) +// generatePostAgentCollectionAndUpload orchestrates the post-agent phase: +// engine output cleanup, access log collection, artifact path accumulation via collectArtifactPaths, +// step-summary generation via generateSummarySteps, safe-outputs/memory/staging artifact uploads, +// post-steps, the unified artifact upload, token invalidation, dev-mode actions restore, +// and step-order validation. +func (c *Compiler) generatePostAgentCollectionAndUpload(yaml *strings.Builder, data *WorkflowData, engine CodingAgentEngine, artifactPaths []string, logFileFull string, checkoutMgr *CheckoutManager) error { + // Generate engine output cleanup step so workspace files are removed after collection. + // The engine-declared output paths are gathered by collectArtifactPaths below. + if len(getEngineArtifactPaths(engine)) > 0 { + c.generateEngineOutputCleanup(yaml, engine) + } - // Collect agent-generated files path for unified upload - // This directory is used by workflows that instruct the agent to write files - // (e.g., smoke-claude status summaries) - artifactPaths = append(artifactPaths, "/tmp/gh-aw/agent/") + // Extract and upload squid access logs (if any proxy tools were used) + c.generateExtractAccessLogs(yaml, data.Tools) + c.generateUploadAccessLogs(yaml, data.Tools) - // Collect GitHub API rate-limit log for observability. - // Written by github_rate_limit_logger.cjs during REST API calls. - artifactPaths = append(artifactPaths, "/tmp/gh-aw/"+constants.GithubRateLimitsFilename) + // Collect all artifact paths for the unified upload. + artifactPaths = c.collectArtifactPaths(data, engine, logFileFull, artifactPaths) - // Collect OTLP span mirror — enables post-hoc trace debugging without a live collector. - // Written by send_otlp_span.cjs; each line is a full OTLP/HTTP JSON traces payload. 
- // Only included when OTLP is configured for this workflow. - if isOTLPEnabled(data) { - artifactPaths = append(artifactPaths, "/tmp/gh-aw/"+constants.OtelJsonlFilename) - } + // Emit all GITHUB_STEP_SUMMARY log-parsing steps. + c.generateSummarySteps(yaml, data, engine) - // Collect safe outputs and agent output paths for the unified artifact. - // These were previously uploaded as separate safe-output and agent-output artifacts. + // Write a minimal agent_output.json placeholder when the engine fails before + // producing any safe outputs, so downstream safe_outputs and conclusion jobs + // receive a valid (empty) JSON file instead of an ENOENT error. + // The placeholder is only written if the engine did not already write the file. if data.SafeOutputs != nil { - // Raw safe-output NDJSON (copied to /tmp/gh-aw/ by generateOutputCollectionStep) - artifactPaths = append(artifactPaths, "/tmp/gh-aw/"+constants.SafeOutputsFilename) - // Processed agent output JSON produced by collect_ndjson_output.cjs - artifactPaths = append(artifactPaths, "/tmp/gh-aw/"+constants.AgentOutputFilename) - if data.SafeOutputs.CommentMemory != nil { - artifactPaths = append(artifactPaths, "/tmp/gh-aw/comment-memory/") - } - - // Write a minimal agent_output.json placeholder when the engine fails before - // producing any safe outputs, so downstream safe_outputs and conclusion jobs - // receive a valid (empty) JSON file instead of an ENOENT error. - // The placeholder is only written if the engine did not already write the file. 
c.generateAgentOutputPlaceholderStep(yaml) } @@ -668,40 +727,9 @@ func (c *Compiler) generatePostAgentCollectionAndUpload(yaml *strings.Builder, d // to be downloaded and processed by the upload_artifact job generateSafeOutputsArtifactStagingUpload(yaml, data) - // Collect git patch path if safe-outputs with PR operations is configured - // NOTE: Git patch generation has been moved to the safe-outputs MCP server - // The patch is now generated when create_pull_request or push_to_pull_request_branch - // tools are called, providing immediate error feedback if no changes are present. - // Include patches in the artifact when: - // 1. Safe outputs needs them for checkout (non-staged create_pull_request/push_to_pull_request_branch) - // 2. Threat detection is enabled (detection job needs patches for security analysis, even when the - // safe-output handler is staged and doesn't need checkout itself) - threatDetectionNeedsPatches := IsDetectionJobEnabled(data.SafeOutputs) - if usesPatchesAndCheckouts(data.SafeOutputs) || threatDetectionNeedsPatches { - artifactPaths = append(artifactPaths, "/tmp/gh-aw/aw-*.patch") - // Bundle files are generated when patch-format: bundle is configured. - // Both formats use the same download path in the safe_outputs job, so - // include the bundle glob unconditionally alongside the patch glob. - // The artifact upload step already sets if-no-files-found: ignore, so - // this is safe even when no bundle files exist. - artifactPaths = append(artifactPaths, "/tmp/gh-aw/aw-*.bundle") - } - // Add post-steps (if any) after AI execution c.generatePostSteps(yaml, data) - // Include firewall audit/observability logs in the unified agent artifact - // so all agent job outputs ship as a single artifact (AWF v0.25.0+). 
- if isFirewallEnabled(data) { - artifactPaths = append(artifactPaths, constants.AWFConfigFilePath) - artifactPaths = append(artifactPaths, constants.AWFProxyLogsDir+"/") - artifactPaths = append(artifactPaths, constants.AWFAuditDir+"/") - // Include the AWF /reflect payload persisted by the agent harness. - // Co-located under /tmp/gh-aw/sandbox/firewall/ so the existing - // chmod -R a+r step covers its permissions before upload. - artifactPaths = append(artifactPaths, constants.AWFReflectFilePath) - } - // Generate single unified artifact upload with all collected paths. // In workflow_call context, apply the per-invocation prefix to avoid name clashes. agentArtifactPrefix := artifactPrefixExprForDownstreamJob(data)