[smoke-detector] 🔍 Smoke Test Investigation - Smoke Codex Run #71: Agent Output Artifact Missing (Recurring)

# 🔍 Smoke Test Investigation - Run #71

## Summary
The Smoke Codex workflow failed because the `create_issue` job could not find the expected `agent_output.json` file. The agent job completed successfully in 1.7 minutes, but did not create the safe-outputs artifact that downstream jobs expected. This is a **recurring pattern** that has occurred at least 4 times for Codex, with similar issues affecting GenAIScript and OpenCode engines.

## Failure Details
- **Run**: [18981163567]((redacted))
- **Run Number**: 71
- **Commit**: cd115b7b06812befb91f4b3c56f88e8b4615890a
- **Branch**: main
- **Trigger**: schedule
- **Duration**: 3.0 minutes
- **Failed Jobs**: create_issue (8s)
- **Workflow**: Smoke Codex

## Root Cause Analysis

### Primary Error
```
Error reading agent output file: ENOENT: no such file or directory, 
open '/tmp/gh-aw/safeoutputs/agent_output.json'
```

### Error Chain

**1. Agent Job Succeeds** (1.7 minutes)
- ✅ Agent job completed successfully
- ✅ No errors reported during execution
- ❌ But no safe-outputs file was created

**2. `create_issue` Job Fails** (8 seconds)
```
Error reading agent output file: ENOENT: no such file or directory, 
open '/tmp/gh-aw/safeoutputs/agent_output.json'
```

### Why Did This Happen?

This failure indicates the agent completed its task successfully but did not use the safe-outputs MCP tools to create the expected output file. Possible reasons:

1. **Agent didn't use safe-outputs tools**: The Codex agent may have completed without calling the `safe_outputs_create_issue` tool
2. **Safe-outputs file not created**: Even if the agent intended to use safe-outputs, the file `/tmp/gh-aw/safeoutputs/outputs.jsonl` was never written
3. **Staged mode behavior**: In staged mode, the agent may behave differently regarding output file creation
4. **MCP server issue**: The safe-outputs MCP server may not have been properly initialized or available
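
One way to narrow these possibilities down on the runner is to inspect the safe-outputs file directly. The following is a minimal sketch, not part of gh-aw: the function name `classify_safe_outputs` and its status labels are hypothetical, and only the default path comes from this investigation.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper; the name and status strings are illustrative only.
# The default path is the one reported in this investigation.
classify_safe_outputs() {
  local file="${1:-/tmp/gh-aw/safeoutputs/outputs.jsonl}"
  if [ ! -e "$file" ]; then
    echo "missing"   # never written: tools not called, or MCP server unavailable
  elif [ ! -s "$file" ]; then
    echo "empty"     # file exists but contains no entries
  else
    echo "present"   # something was recorded by the safe-outputs tools
  fi
}
```

A `missing` result is consistent with reasons 1, 2, and 4 above, so it would still need to be combined with the agent logs to tell them apart.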

## Failed Jobs and Errors

### Job Sequence
1. ✅ **pre_activation** - succeeded (6s)
2. ✅ **activation** - succeeded (3s)
3. ✅ **agent** - succeeded (1.7m) - **No artifact created**
4. ✅ **detection** - succeeded (21s)
5. ❌ **create_issue** - failed (8s) - **ENOENT error**
6. ⏭️ **missing_tool** - skipped

### Key Observations
- Agent job **succeeded** with 1.7m runtime
- Detection job **succeeded** 
- Only create_issue job failed
- Workflow ran in **staged mode** (`GH_AW_SAFE_OUTPUTS_STAGED=true`)
- Error count was 1 (from the create_issue failure)

## Historical Context

This is a **well-documented recurring pattern**.

### Pattern Classification
- **Pattern ID**: `CODEX_AGENT_NO_ARTIFACT_STAGED_MODE`
- **Category**: Workflow Configuration - Staged Mode Issue
- **Severity**: Medium
- **First Seen**: 2025-10-27
- **Occurrence Count**: **4 occurrences** for Codex

### Previous Codex Occurrences

| Date | Run ID | Issue | Status |
|------|--------|-------|--------|
| 2025-10-31 | 18981163567 | This investigation | New |
| 2025-10-29 | 18892865991 | Cached | Pattern identified |
| 2025-10-28 | 18890591960 | Cached | Pattern identified |
| 2025-10-27 | 18840299097 | #2604 | Closed |

### Related Issues (Other Engines)
- #2604 - Codex agent output artifact missing (closed, 2025-10-27)
- #2307 - GenAIScript agent doesn't use safe-outputs (closed)
- #2143 - OpenCode agent doesn't use safe-outputs (closed)
- #2121 - Missing agent_output.json file (closed)

This pattern affects **multiple AI engines** (Codex, GenAIScript, OpenCode), suggesting a systemic issue with safe-outputs MCP tool usage in staged mode.

## Investigation Findings

### Environment Configuration
```bash
GH_AW_SAFE_OUTPUTS_STAGED=true
GH_AW_WORKFLOW_NAME=Smoke Codex
GH_AW_AGENT_OUTPUT=/tmp/gh-aw/safeoutputs/agent_output.json
```

### Staged Mode Expectations
In staged mode, the workflow should:
- ✅ Run the agent normally
- ✅ Generate output to preview
- ✅ Not actually create GitHub issues (only preview)
- ❌ **Still upload artifacts for preview** (this failed)

### Artifact Details
- **Expected source path**: `/tmp/gh-aw/safeoutputs/outputs.jsonl`
- **Artifact name**: `agent_output.json`
- **Result**: File not found, no artifact created

## Recommended Actions

### 🔴 High Priority

- [ ] **Make create_issue job conditional on artifact existence**
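  ```yaml
  create_issue:
    needs: [agent, detection]
    # hashFiles() cannot be used in a job-level `if` and only matches files under
    # GITHUB_WORKSPACE, so gate on a job output instead. Assumes the agent job
    # exposes a `has_output` output (hypothetical name).
    if: needs.agent.outputs.has_output == 'true'
    runs-on: ubuntu-latest
  ```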
  **Impact**: Prevents cascading failures when agent doesn't create output

- [ ] **Make artifact upload conditional**
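  ```yaml
  - name: Upload Safe Outputs
    # hashFiles() only matches files under GITHUB_WORKSPACE, so it would always
    # return '' for a /tmp path; let the upload action tolerate a missing file.
    uses: actions/upload-artifact@v4
    with:
      name: agent_output.json
      path: /tmp/gh-aw/safeoutputs/outputs.jsonl
      if-no-files-found: ignore
  ```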
  **Impact**: Eliminates artifact upload warnings

- [ ] **Add validation step before artifact operations**
  ```yaml
  - name: Validate Safe Outputs Created
    id: validate_outputs  # lets later steps read steps.validate_outputs.outputs.has_output
    run: |
      if [ -f "/tmp/gh-aw/safeoutputs/outputs.jsonl" ]; then
        echo "✓ outputs.jsonl created"
        echo "has_output=true" >> "$GITHUB_OUTPUT"
      else
        echo "⚠️ outputs.jsonl not found - agent may not have used safe-outputs tools"
        echo "has_output=false" >> "$GITHUB_OUTPUT"
      fi
  ```

### 🟡 Medium Priority

- [ ] **Enhance agent prompt for staged mode**
  - Make it explicit that safe-outputs tools MUST be used
  - Clarify that output file is required even in preview mode
  - Add examples of proper tool usage

- [ ] **Investigate Codex MCP integration**
  - Verify safe-outputs MCP server initialization
  - Check if tools are available to the agent
  - Review agent logs for MCP-related warnings

- [ ] **Add detection job enhancement**
  - Detection job should identify missing output files
  - Warn when agent completes without using required tools

### 🟢 Low Priority

- [ ] **Compare with successful Codex runs**
  - Identify differences in agent behavior
  - Check if certain prompts trigger the issue
  - Analyze patterns in successful vs failed runs

- [ ] **Document expected behavior**
  - Clarify staged mode requirements
  - Define success criteria for artifact creation
  - Add troubleshooting guide

## Prevention Strategies

1. **Graceful Degradation**
   - Don't fail workflow if artifacts are optional
   - Distinguish between "agent failed" vs "agent succeeded without output"
   - Add conditional logic throughout artifact handling

2. **Explicit Requirements**
   - Make prompts very clear about tool usage
   - Add validation that verifies expected outputs
   - Fail fast if required outputs are truly missing (vs optional)

3. **Better Error Handling**
   - Provide context-aware error messages
   - Include troubleshooting steps
   - Link to documentation about safe-outputs

4. **Monitoring and Tracking**
   - Track artifact creation success rates by engine
   - Alert on patterns of missing artifacts
   - Monitor staged mode workflow health metrics
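
As a rough illustration of tracking artifact creation rates by engine, the sketch below aggregates per-run records with `awk`. The input format (one line per run, `<engine> <has_output>`) and the function name `artifact_success_rates` are assumptions made for this example; gh-aw does not emit such records today.

```bash
#!/usr/bin/env bash
# Hypothetical aggregation: reads "<engine> <true|false>" lines from stdin and
# prints "<engine> <runs with output>/<total runs> <percentage>" per engine.
artifact_success_rates() {
  awk '
    { total[$1]++; if ($2 == "true") ok[$1]++ }
    END {
      for (e in total)
        printf "%s %d/%d %.0f%%\n", e, ok[e], total[e], 100 * ok[e] / total[e]
    }
  ' | sort   # awk iteration order is unspecified, so sort for stable output
}

# Usage with made-up sample records (not real run data):
#   printf 'codex true\ncodex false\nopencode true\n' | artifact_success_rates
```

A sustained drop in the ratio for one engine would be the signal to alert on.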

## Related PR Assessment

**PR #2886**: "Fix inconsistent formatting of github-workflow.json during build"
- **Related to failure?**: No
- **Merged at**: 2025-10-31T16:53:15Z (before this run)
- **Assessment**: The PR fixed schema formatting issues with prettier. This is unrelated to safe-outputs or agent behavior. The failure is a pre-existing recurring pattern.

## Analysis: Why This Pattern Persists

After analyzing the 4 Codex occurrences and similar issues across other engines:

1. **No Conditional Logic**: Downstream jobs always run regardless of artifact creation
2. **Agent Autonomy**: Agents may complete tasks without using safe-outputs tools
3. **Staged Mode Ambiguity**: Unclear whether agents MUST create outputs in preview mode
4. **Missing Validation**: No checks before artifact upload or download operations
5. **Pattern Recognition**: Each engine hits this independently, suggesting systemic issue
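
The missing validation in point 4 could be closed with a small guard at the start of any job that consumes the agent output. This is only a sketch under assumptions: the function name `check_agent_output` and the `required` flag are hypothetical, and the real consumer in gh-aw may be structured differently.

```bash
#!/usr/bin/env bash
# Hypothetical guard for a downstream job. Distinguishes "output required but
# missing" (hard failure) from "agent succeeded without output" (skip quietly).
check_agent_output() {
  local file="$1" required="${2:-false}"
  if [ -s "$file" ]; then
    echo "ok: agent output found at $file"
    return 0
  fi
  if [ "$required" = "true" ]; then
    echo "error: required agent output missing at $file;" \
         "check whether the agent called the safe-outputs tools" >&2
    return 1
  fi
  echo "notice: no agent output at $file; agent produced no output, skipping"
  return 0
}

# Usage (path taken from GH_AW_AGENT_OUTPUT in this run):
#   check_agent_output /tmp/gh-aw/safeoutputs/agent_output.json || exit 1
```

With `required` left at its default, a run like #71 would finish with a notice instead of an `ENOENT` failure.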

## Technical Details

### Workflow Run Metrics
- **Turns**: 1
- **Error Count**: 1 (from create_issue failure)
- **Warning Count**: 0
- **Total Duration**: 3.0 minutes
- **Agent Duration**: 1.7 minutes

### Environment Variables
```
GH_AW_SAFE_OUTPUTS_STAGED=true
GH_AW_WORKFLOW_NAME=Smoke Codex
GH_AW_AGENT_OUTPUT=/tmp/gh-aw/safeoutputs/agent_output.json
GH_AW_SAFE_OUTPUTS=/tmp/gh-aw/safeoutputs/outputs.jsonl
```

## Immediate Next Steps

Given this is the **4th occurrence** of this pattern for Codex:

1. **Implement conditional checks** for create_issue job (high priority fix)
2. **Make artifact operations conditional** on file existence
3. **Enhance agent prompts** to be more explicit about requirements
4. **Monitor next 5 runs** after implementing fixes
5. **Consider systemwide solution** since multiple engines are affected
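
For step 4, the last few runs can be checked from a terminal with the GitHub CLI. The helper below is a sketch: `count_unsuccessful` is a hypothetical name, and it simply counts any conclusion other than `success`, so runs still in progress (which have an empty conclusion) are counted as well.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads one run conclusion per line from stdin and prints
# how many are not exactly "success".
count_unsuccessful() {
  # grep exits non-zero when the count is 0, so ignore its exit status.
  grep -vc '^success$' || true
}

# Usage (requires an authenticated GitHub CLI, run inside the repository):
#   gh run list --workflow "Smoke Codex" --limit 5 \
#     --json conclusion --jq '.[].conclusion' | count_unsuccessful
```

A result of `0` across the five runs following the fix would suggest the conditional checks are holding.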

---

**Investigation Metadata:**
- **Investigator**: Smoke Detector (automated investigator)
- **Investigation Run**: 18981242914
- **Investigation Record**: `/tmp/gh-aw/cache-memory/investigations/2025-10-31-18981163567.json`
- **Pattern Database**: `/tmp/gh-aw/cache-memory/patterns/codex_no_artifact_staged.json`
- **Similar Patterns**: `GENAISCRIPT_NO_SAFE_OUTPUTS`, `OPENCODE_NO_SAFE_OUTPUTS`

**Labels**: `smoke-test`, `investigation`, `codex`, `safe-outputs`, `staged-mode`, `artifact-missing`, `recurring`




> AI generated by [Smoke Detector - Smoke Test Failure Investigator](https://github.com/githubnext/gh-aw/actions/runs/18981242914)
