fix: Fix openai responses api issues#685

Merged
Henry-811 merged 1 commit into dev/v0.1.29 from fix_responses_api on Dec 24, 2025

ncrispino commented on Dec 22, 2025

Fix GPT-5 Reasoning Token Errors and Double Voting

Summary

This PR fixes two critical issues affecting GPT-5 reasoning models and multi-agent voting:

  1. GPT-5-mini (and other GPT-5-X models) reasoning token errors - Fixed malformed Responses API requests that caused 400 errors
  2. Double voting handling - Gracefully handle reasoning models that make multiple vote calls

Closes MAS-181

Issues Fixed

Issue 1: GPT-5 Reasoning Token Errors

Error:

Error code: 400 - {'error': {'message': "Item 'rs_...' of type 'reasoning' was provided without its required following item."}}

Root Causes:

  1. The id field was being stripped from function_call items, breaking the reasoning-to-function-call pairing required by OpenAI's Responses API
  2. When using previous_response_id, response items were being added both manually and automatically (via previous_response_id), causing duplicate reasoning items

Fixes:

  • massgen/formatter/_response_formatter.py: Preserve id field in function_call items (required for reasoning pairing per LangChain PR #9082)
  • massgen/backend/response.py: Only add response items manually when NOT using previous_response_id to avoid duplicates
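
The two fixes above can be sketched roughly as follows. This is a minimal illustration, not the actual MassGen code: the helper names `format_function_call` and `build_input`, and the exact item shapes, are assumptions made for the example.

```python
from typing import Any, Optional


def format_function_call(item: dict[str, Any]) -> dict[str, Any]:
    """Format a function_call item for the next request (hypothetical helper)."""
    formatted = {
        "type": "function_call",
        "call_id": item["call_id"],
        "name": item["name"],
        "arguments": item["arguments"],
    }
    # Keep the id so the preceding reasoning item stays paired with this call.
    if "id" in item:
        formatted["id"] = item["id"]
    return formatted


def build_input(
    new_messages: list[dict[str, Any]],
    response_items: list[dict[str, Any]],
    previous_response_id: Optional[str],
) -> list[dict[str, Any]]:
    """Add prior response items manually only when previous_response_id is unset.

    When previous_response_id is set, the prior items are already part of the
    conversation, so adding them again would duplicate reasoning items.
    """
    if previous_response_id is not None:
        return list(new_messages)
    return list(response_items) + list(new_messages)
```

The design choice here is to treat `previous_response_id` and manual item replay as mutually exclusive ways of carrying context forward.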

Issue 2: Double Voting in Multi-Agent Workflows

Issue: Certain GPT-5 models (particularly gpt-5.1 and gpt-5.2) make multiple vote calls despite enforcement messages, causing workflow failures.

Model Behavior:

  • gpt-5.1, gpt-5.2: Ignore tool error enforcement, repeat violations
  • gpt-5(-X), gpt-5.1-codex: Respond to enforcement correctly

Fix: Instead of rejecting multiple votes, handle them gracefully by taking the last vote as the agent's final decision.

Rationale:

  • The last vote represents the agent's most refined thinking
  • Reasoning models iterate through their logic - later votes are more informed
  • Simpler than vote counting and more aligned with how reasoning models work
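
As a rough sketch of the "take the last vote" rule (the function name `select_final_vote` and the vote dictionary shape are hypothetical, not taken from massgen/orchestrator.py):

```python
from typing import Any


def select_final_vote(vote_calls: list[dict[str, Any]]) -> tuple[dict[str, Any], bool]:
    """Return (final_vote, had_multiple) for one agent's vote calls.

    The last call wins; had_multiple tells the caller whether to emit a warning.
    """
    if not vote_calls:
        raise ValueError("no vote calls to select from")
    return vote_calls[-1], len(vote_calls) > 1


# Example: an agent that voted twice in one turn (illustrative data only).
votes = [
    {"agent_id": "agent2", "reason": "first pass"},
    {"agent_id": "agent1", "reason": "revised after more reasoning"},
]
final_vote, had_multiple = select_final_vote(votes)
if had_multiple:
    print(f"Agent made {len(votes)} votes - using last: {final_vote['agent_id']}")
```

No counting or tie-breaking is needed, since only the order of the calls matters.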

Changes

Files Modified

  1. massgen/formatter/_response_formatter.py

    • Preserve id field for reasoning item pairing
    • Added reference to LangChain's similar fix
  2. massgen/backend/response.py

    • Conditional response item addition based on whether previous_response_id is set
    • Deduplicate items by ID to prevent duplicate reasoning tokens
    • Improved logging (debug level instead of info)
  3. massgen/orchestrator.py

    • Simplified multiple vote handling (38 lines → 14 lines)
    • Take last vote instead of complex counting/enforcement
    • Removed unnecessary debug logging
    • Cleaner variable names
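
The ID-based deduplication listed for massgen/backend/response.py might look something like the sketch below. The helper name `dedupe_items_by_id` is an assumption for illustration, and so is the choice to keep items that carry no id.

```python
from typing import Any


def dedupe_items_by_id(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop items whose id was already seen, preserving first-seen order.

    Items without an id (for example plain messages) are always kept.
    """
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    for item in items:
        item_id = item.get("id")
        if item_id is not None:
            if item_id in seen:
                continue  # duplicate reasoning or function_call item
            seen.add(item_id)
        unique.append(item)
    return unique
```

Keeping the first occurrence preserves the original ordering, which matters because a reasoning item must be followed directly by its paired item.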

Lines Changed

massgen/backend/response.py              | +16 -3
massgen/formatter/_response_formatter.py | +5 -2
massgen/orchestrator.py                  | +14 -38
Total: +35 -43 (net -8 lines, cleaner code)

Testing

Test Case 1: Reasoning Models

uv run massgen --automation --model gpt-5-mini "Create a simple website"

Before: 400 error about reasoning tokens
After: ✅ Completes successfully

Test Case 2: Multi-Agent Voting with gpt-5.1/5.2

uv run massgen --config config.yaml "Create a simple website"

Before: Agent execution failed: 'agent1' (KeyError)
After:

⚠️ Agent made 2 votes - using last (final decision): agent1
🏆 Turn 1 winner: agent_a

Verified Behavior

  • ✅ No reasoning token errors with GPT-5 models
  • ✅ Warning shown when multiple votes detected
  • ✅ Execution completes successfully using last vote
  • ✅ Anonymous ID mapping works correctly (agent1 → agent_a)
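
For context on the last item, here is one way the anonymous-to-real ID translation could be sketched. The numbering scheme (agent1, agent2, ... assigned in list order) and both helper names are assumptions for illustration. The PR itself only confirms that agent1 maps to agent_a.

```python
def build_anonymous_map(agent_ids: list[str]) -> dict[str, str]:
    """Map anonymous labels (agent1, agent2, ...) to real agent IDs, in order."""
    return {f"agent{i}": real_id for i, real_id in enumerate(agent_ids, start=1)}


def resolve_vote_target(voted_for: str, anon_map: dict[str, str]) -> str:
    """Translate a voted-for label into a real agent ID.

    Accepts either an anonymous label or an ID that is already real, and
    raises ValueError (rather than a bare KeyError) for unknown targets.
    """
    if voted_for in anon_map:
        return anon_map[voted_for]
    if voted_for in anon_map.values():
        return voted_for
    raise ValueError(f"unknown vote target: {voted_for!r}")


anon_map = build_anonymous_map(["agent_a", "agent_b"])
winner = resolve_vote_target("agent1", anon_map)  # resolves to "agent_a"
```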

Related Documentation

Known Limitations

GPT-5 Model Variants: Some models (gpt-5.1, gpt-5.2) don't properly respond to tool error enforcement. This fix works around the limitation by handling multiple votes gracefully rather than relying on enforcement.

Recommendation: For critical voting scenarios, prefer models that respond to enforcement (gpt-5(-X), gpt-5.1-codex) or accept that multiple votes will be reduced to the last vote.

Migration Notes

No breaking changes. This is purely a bug fix that:

  • Makes reasoning models work correctly
  • Handles edge cases more gracefully
  • Improves user experience with better messaging

Users will see informational warnings when multiple votes are detected, but workflows will continue successfully.

Henry-811 changed the base branch from main to dev/v0.1.29 on December 24, 2025 at 16:17
Henry-811 merged commit 0f4b482 into dev/v0.1.29 on Dec 24, 2025
21 checks passed