[smoke-detector] 🚨 CRITICAL: GenAIScript Invalid Model (gpt-4.1) - 5th Consecutive Failure Post-v0.24.0 #2227

@github-actions

🚨 CRITICAL RECURRING FAILURE - 5th Consecutive Occurrence

Summary

The Smoke GenAIScript workflow has FAILED AGAIN after the v0.24.0 release with the EXACT SAME ROOT CAUSE that has been reported in THREE previous issues (#2157, #2204, #2207). This is the 5th consecutive failure of this smoke test since 2025-10-22. Despite multiple investigations and issue reports, the configuration has never been corrected.

Failure Details

  • Run: #18757658104
  • Commit: 8993988 - "Release v0.24.0"
  • Trigger: schedule (automated smoke test)
  • Duration: 3.5 minutes
  • Failed Job: detection (1.2 minutes)
  • Status: ❌ FAILED

Root Cause Analysis

The Problem Persists UNCHANGED

The GenAIScript configuration STILL uses an invalid OpenAI model name:

Location: .github/workflows/shared/genaiscript.md line 6

GH_AW_AGENT_MODEL_VERSION: "openai:gpt-4.1"

Problem: gpt-4.1 DOES NOT EXIST in OpenAI's model catalog.

Valid OpenAI models:

  • gpt-4o ✅ (recommended)
  • gpt-4-turbo
  • gpt-4
  • gpt-3.5-turbo

Error Chain (Identical to All Previous Occurrences)

  1. GenAIScript attempts to resolve and use model openai:gpt-4.1
  2. OpenAI API rejects the request (invalid model)
  3. GenAIScript receives undefined/null response
  4. GenAIScript crashes: TypeError: Cannot read properties of undefined (reading 'text')
  5. Detection job fails with exit code 255
  6. Smoke test marked as failed
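Step 4 is where the real cause gets lost: the crash reports a missing 'text' property instead of the rejected model. A minimal Python sketch of the kind of guard that would surface the cause is shown here. It is an illustration only, not GenAIScript's code; the `extract_text` helper, the `ModelCallError` class, and the `{"text": ...}` response shape are all hypothetical.

```python
from typing import Optional


class ModelCallError(RuntimeError):
    """Raised when a model call returns no usable result."""


def extract_text(response: Optional[dict], model: str) -> str:
    """Return the response text, or fail with a message that names the model."""
    # Hypothetical response shape: {"text": "..."}. None stands in for the
    # undefined value received in step 3 of the error chain.
    if response is None or "text" not in response:
        raise ModelCallError(
            f"model call for {model!r} returned no result; "
            "check that the model name is valid for the provider"
        )
    return response["text"]
```

With a guard like this, the log would name the configured model rather than ending in a bare TypeError.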

Stack Trace

2025-10-23T18:10:09.4293104Z 2025-10-23T18:10:09.429Z genaiscript:error {
2025-10-23T18:10:09.4293428Z   name: 'TypeError',
2025-10-23T18:10:09.4293872Z   message: "Cannot read properties of undefined (reading 'text')",
2025-10-23T18:10:09.4294339Z   stack: "TypeError: Cannot read properties of undefined (reading 'text')\n" +
2025-10-23T18:10:09.4295107Z     '    at githubActionSetOutputs ((redacted))\n' +
2025-10-23T18:10:09.4296330Z     '    at async Command.runScriptWithExitCode ((redacted))'
2025-10-23T18:10:09.4297303Z }

Failed Jobs and Errors

Job Execution Summary

  1. activation - succeeded (2s)
  2. agent - succeeded (1.6m) - Agent completed successfully
  3. detection - FAILED (1.2m) - Threat detection crashed
  4. create_issue - succeeded (5s)
  5. ⏭️ missing_tool - skipped

Investigation Findings

Complete Failure Timeline

| # | Run ID      | Date/Time (UTC)     | Trigger           | Issue Created | Issue Status            |
|---|-------------|---------------------|-------------------|---------------|-------------------------|
| 1 | 18727962258 | 2025-10-22 19:45:52 | workflow_dispatch | #2157         | Closed as "not_planned" |
| 2 | 18733557489 | 2025-10-23 00:19:22 | schedule          | -             | Covered by #2157        |
| 3 | 18739169072 | 2025-10-23 06:07:04 | schedule          | #2204         | Closed as "completed"   |
| 4 | 18747816413 | 2025-10-23 12:08:41 | schedule          | #2207         | Closed as "completed"   |
| 5 | 18757658104 | 2025-10-23 18:06:57 | schedule          | This issue    | Open                    |

Pattern: Failing every ~6 hours on scheduled runs
Duration: Over 22 hours of continuous failures
Failure Rate: 100% since first occurrence
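The interval and duration figures can be rechecked from the run start times in the timeline. The sketch below assumes those timestamps are accurate; `RUN_TIMES` and `hours_between` are names introduced here for illustration.

```python
from datetime import datetime

# Run start times (UTC), copied from the failure timeline in this issue.
RUN_TIMES = [
    "2025-10-22 19:45:52",
    "2025-10-23 00:19:22",
    "2025-10-23 06:07:04",
    "2025-10-23 12:08:41",
    "2025-10-23 18:06:57",
]


def hours_between(times):
    """Return the gap in hours between each pair of consecutive timestamps."""
    parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in times]
    return [(b - a).total_seconds() / 3600 for a, b in zip(parsed, parsed[1:])]


gaps = hours_between(RUN_TIMES)
total_hours = sum(gaps)  # time from the first failing run to the latest one
print([round(g, 1) for g in gaps], round(total_hours, 1))
```

The total comes out a little over 22 hours, which matches the duration stated above.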

Why This Is Critical NOW

  1. Post-Release Failure: This failure occurred immediately after the v0.24.0 release, indicating the configuration issue persists across releases
  2. Multiple Closed Issues: Three separate issues (#2157, #2204, #2207) have been created and closed without fixing the root cause
  3. Wasted Resources: Every scheduled run (every ~6 hours) consumes CI minutes while producing no value
  4. Security Gap: Threat detection has been non-functional for over 22 hours
  5. False Confidence: The team may not realize smoke tests are failing continuously

Recommended Actions

🔴 CRITICAL - Immediate Fix (1 minute)

Update .github/workflows/shared/genaiscript.md line 6:

- GH_AW_AGENT_MODEL_VERSION: "openai:gpt-4.1"
+ GH_AW_AGENT_MODEL_VERSION: "openai:gpt-4o"

That's it: a one-line change. It cannot repair the 5 runs that have already failed, but it should stop later runs from failing for this reason.

🟡 Alternative: Disable Scheduled Workflow

If GenAIScript smoke tests are not being maintained, disable the scheduled trigger to stop generating failed runs and investigation overhead:

# .github/workflows/smoke-genaiscript.md
# Comment out or remove the schedule trigger

🟢 Long-Term: Prevent Recurrence

  1. Add Pre-Flight Model Validation - Validate model names before execution
  2. Schema Validation - Use JSON schema to validate workflow configurations
  3. Better Error Handling - Work with GenAIScript team to improve error messages
  4. Documentation - Document valid model names in configuration files
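Item 1 could be sketched as a small pre-flight check that runs before the workflow calls the model. This is a minimal sketch, not an existing gh-aw or GenAIScript feature: `find_invalid_models` and `ALLOWED_MODELS` are hypothetical names, and the allowlist is copied from the "Valid OpenAI models" list in this issue rather than verified against the provider's current catalog.

```python
import re

# Allowlist copied from the "Valid OpenAI models" list in this issue. It is an
# assumption of this sketch and is not checked against the provider's catalog.
ALLOWED_MODELS = {"gpt-4o", "gpt-4-turbo", "gpt-4", "gpt-3.5-turbo"}

# Matches lines such as: GH_AW_AGENT_MODEL_VERSION: "openai:gpt-4.1"
MODEL_LINE = re.compile(r'GH_AW_AGENT_MODEL_VERSION:\s*"?([\w.-]+):([\w.-]+)"?')


def find_invalid_models(config_text: str) -> list[str]:
    """Return each provider:model value whose OpenAI model is not allowlisted."""
    invalid = []
    for provider, model in MODEL_LINE.findall(config_text):
        if provider == "openai" and model not in ALLOWED_MODELS:
            invalid.append(f"{provider}:{model}")
    return invalid
```

A CI step could read .github/workflows/shared/genaiscript.md, pass its contents to `find_invalid_models`, and exit non-zero when the returned list is not empty, so a bad model name fails fast with a clear message instead of crashing the detection job.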

Historical Context

From investigation database (/tmp/gh-aw/cache-memory/investigations/):

{
  "pattern_signature": "GENAISCRIPT_INVALID_MODEL",
  "first_occurrence": "2025-10-22T19:45:52Z",
  "recurrence_count": 5,
  "failure_rate": "100%",
  "days_recurring": 1,
  "hours_between_occurrences": [5.5, 6.2, 6.6, 6.0],
  "is_flaky": false,
  "external_dependency": "OpenAI API",
  "persistence_across_releases": true
}

Impact Assessment

Severity: 🔴 CRITICAL

  • All GenAIScript smoke tests failing continuously
  • Threat detection non-functional for 22+ hours
  • Multiple issues created and closed without resolution
  • Post-release failure indicates configuration persists across versions

Urgency: 🔴 IMMEDIATE

  • Simple one-line fix available
  • Will keep failing roughly every 6 hours until the configuration is changed
  • Wasting CI resources and investigation time

Scope:

  • Affects: All workflows using shared/genaiscript.md
  • Frequency: Every scheduled smoke test run
  • Duration: Ongoing since 2025-10-22 19:45 UTC (22+ hours)

Reproduction Steps

  1. Configure GenAIScript with model: openai:gpt-4.1
  2. Set OPENAI_API_KEY (so validation passes)
  3. Run any GenAIScript workflow
  4. Observe failure when invalid model is used
  5. See TypeError accessing undefined result

Related Issues

  • #2157 - Closed as "not_planned"
  • #2204 - Closed as "completed"
  • #2207 - Closed as "completed"


Request for Action

This issue is being created to request a decision on one of the following:

  1. Fix the configuration (1-line change) to resolve the issue permanently
  2. Disable the scheduled workflow if GenAIScript smoke tests are not planned to be maintained
  3. Explain the strategy if this is expected behavior (so future investigations understand context)

The current situation - where the same failure occurs every 6 hours, generates investigation reports, creates issues that get closed, but nothing gets fixed - is not sustainable.


Investigation Metadata

  • Investigator: Smoke Detector (Failure Investigation Agent)
  • Investigation Run: #18757754195
  • Pattern: GENAISCRIPT_INVALID_MODEL (5th occurrence)
  • Investigation Record: /tmp/gh-aw/cache-memory/investigations/2025-10-23-18757658104.json
  • Created: 2025-10-23T18:15:00Z

🤖 AI generated by Smoke Detector - Smoke Test Failure Investigator
This is an automated investigation of recurring smoke test failures.
