GPT-5 does not work well compared to 4.1 #1397

@sibblegp

Description

Describe the bug

A clear and concise description of what the bug is.

Debug information

  • Agents SDK version: 0.2.4
  • Python version: 3.13

Repro steps

I've been using GPT-5 as a drop-in replacement for GPT-4.1 for the last hour. It's a disaster. While 4.1 understood my requests, routed to the right agent, and called tools, GPT-5 keeps asking for more specifics and details before it'll call a tool. It's a much worse experience.

Like, for example, I sent in "What is the weather tomorrow night?"

My system prompt has a location in it for each request as well as the current date time and time zone. There is a tool that takes latitude and longitude and a date and gets the hourly forecast.
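To make the setup above easier to reproduce, here is a minimal sketch of how the per-request system prompt and the forecast tool could look. It is not the code from my project: `RequestContext`, `build_system_prompt`, `get_hourly_forecast`, and `tomorrow` are hypothetical names, the coordinates and timestamp are arbitrary, and the tool is a stub that returns placeholder data instead of calling a real weather service. It uses only the standard library; wiring it into an agent is left out.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone


@dataclass
class RequestContext:
    """Per-request values injected into the system prompt (hypothetical)."""
    latitude: float
    longitude: float
    now: datetime  # must be timezone-aware


def build_system_prompt(ctx: RequestContext) -> str:
    """Render location, current date/time, and time zone into the prompt."""
    return (
        "You are a helpful assistant with access to a weather tool.\n"
        f"User location: latitude {ctx.latitude}, longitude {ctx.longitude}.\n"
        f"Current date and time: {ctx.now.isoformat()} ({ctx.now.tzinfo}).\n"
    )


def get_hourly_forecast(latitude: float, longitude: float, day: date) -> list[dict]:
    """Stub tool: returns 24 placeholder entries, not real weather data."""
    return [
        {
            "hour": hour,
            "latitude": latitude,
            "longitude": longitude,
            "date": day.isoformat(),
            "summary": "placeholder",
        }
        for hour in range(24)
    ]


def tomorrow(ctx: RequestContext) -> date:
    """The date "tomorrow night" refers to, relative to the request time."""
    return (ctx.now + timedelta(days=1)).date()
```

With this context in the prompt, "tomorrow night" resolves to a single date, so I'd expect the model to call the tool with the prompt's latitude and longitude and `tomorrow(ctx)` without asking a follow-up question.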

GPT-4.1 does this 100% correctly. It gets the hourly forecast and gives a great answer.

GPT-5 asks if I mean tomorrow night (with the date) specifically and doesn't call the tool without more input.

It sometimes even says it'll do something but doesn't actually call the tool.

This is an unmitigated disaster and GPT-5 doesn't work at all for agents.

It's also very slow at handing off while actual responses do come back quickly. 4.1 overall is faster.
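The speed difference could be put into numbers with a small timing harness like the sketch below. `median_latency` is a hypothetical helper, and `run_once` stands for any zero-argument callable that performs one full agent run with a given model; nothing here is specific to the SDK.

```python
import time
from statistics import median
from typing import Callable


def median_latency(run_once: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock seconds over `repeats` calls of run_once."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return median(samples)
```

Running it once with a callable bound to GPT-5 and once with one bound to GPT-4.1, using the same input, would give comparable figures for the handoff delay.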

Expected behavior

GPT-5 should outperform 4.1 at tool calling, but it understands commands and tools less well, fails to take the correct actions, and is slower than 4.1.
