Skip to content

Clarification on GPT-5.5 results under the Claude Code harness in Table 1 #95

Description

@samkim07

Hello, thank you for sharing this interesting work.

I had a question about the experimental setup in Table 1, specifically the rows under the Claude Code harness. The table lists the model as GPT-5.5, while the harness is described as Claude Code.

My understanding is that the paper separates the target model from the execution harness:

  • model: the frozen target model that solves the task
  • harness: the execution environment / adapter used to run the task

In that interpretation, the table would mean that GPT-5.5 was evaluated inside a Claude Code-style harness.

However, Section 4 also describes the Claude Code harness as mirroring the workspace contract through the claude CLI. Since Claude Code is usually understood as an Anthropic/Claude-based coding agent, I was unsure how GPT-5.5 is used in this setting.

Could you clarify which interpretation is correct?

  1. Does “Claude Code harness + GPT-5.5” mean that the Claude Code harness is used only as an execution interface, while the underlying target model is GPT-5.5?
  2. Or is the table meant to report results from Claude Code’s native model, in which case “GPT-5.5” might be a typo?
  3. If GPT-5.5 was indeed used through the Claude Code harness, could you briefly explain how the model was connected to the claude CLI / harness adapter?

This would be helpful for correctly interpreting the cross-harness results and for avoiding confusion between the model and the execution environment.

Thank you!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions