
Low cache hit rate when Codex integrates with GPT-5.5 #20301

@pyfdtic

Description


What version of Codex CLI is running?

0.125.0

Which model were you using?

gpt-5.5

What platform is your computer?

WSL2

What terminal emulator and version are you using (if applicable)?

Windows Terminal

What issue are you seeing?

When Codex is used with the GPT-5.5 model, its cache hit rate is very low, which causes costs to be consumed rapidly.

In contrast, OpenCode used with GPT-5.5 has no such problem and maintains a high cache hit rate.

Additionally, Codex paired with GPT-5.4 also achieves a normally high cache hit rate.

What steps can reproduce the bug?

We tested Codex with GPT-5.4 and with GPT-5.5. Comparing the session logs of the two runs shows a significant difference in the frequency and size of cached_input_tokens.
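To make the comparison concrete, a small script can aggregate cached_input_tokens versus input_tokens across a session log and report an overall cache hit rate. This is a minimal sketch, not the official Codex log format: it assumes the session log is JSONL where each event carries a `usage` object with `input_tokens` and `cached_input_tokens` fields, which is an assumption about the log layout.

```python
import json

def cache_hit_rate(events):
    """Aggregate cache hit rate over a list of usage events.

    Assumes each event is a dict like
    {"usage": {"input_tokens": N, "cached_input_tokens": M}};
    events without a usage object contribute zero to both totals.
    """
    total_input = 0
    total_cached = 0
    for ev in events:
        usage = ev.get("usage", {})
        total_input += usage.get("input_tokens", 0)
        total_cached += usage.get("cached_input_tokens", 0)
    return total_cached / total_input if total_input else 0.0

def rate_from_jsonl(path):
    """Parse a JSONL session log and compute the aggregate cache hit rate."""
    with open(path) as f:
        events = [json.loads(line) for line in f if line.strip()]
    return cache_hit_rate(events)
```

Running this over a GPT-5.4 session and a GPT-5.5 session side by side would quantify the gap instead of eyeballing the logs.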

What is the expected behavior?

Codex with GPT-5.5 should maintain a normally high cache hit rate, consistent with the performance of Codex + GPT-5.4 and OpenCode + GPT-5.5, to avoid excessive and rapid cost consumption.

Additional information

No response

Metadata


Labels

    bug (Something isn't working), rate-limits (Issues related to rate limits, quotas, and token usage reporting)
