Conversation
…when a model is unset)
Force-pushed from 9b07c0b to 70ec1ea
Merged
seratch added a commit that referenced this pull request on May 7, 2026.
This was referenced on May 7, 2026.
I've tested both, and 4.1 works better: it has a better conversational tone and more accurate tool calling. This is a downgrade.
This pull request updates the SDK default model from `gpt-4.1` to `gpt-5.4-mini` for agents that do not specify a model explicitly. Although `gpt-5.5` is the latest model, `gpt-5.4-mini` is a pragmatic default for users getting started because it keeps latency closer to `gpt-4.1` while moving the default onto the GPT-5 family. This default is not meant to be permanent; we may update it again as newer models offer a better balance of intelligence, latency, and cost.

Detailed analysis report
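As a minimal sketch of the fallback behavior this PR changes, the snippet below models an agent whose `model` field is optional and resolves to an SDK-wide default when unset. The `Agent` dataclass, `resolve_model` helper, and `DEFAULT_MODEL` constant here are illustrative stand-ins, not the SDK's actual names.

```python
# Hypothetical sketch of default-model fallback; real SDK helper names differ.
from dataclasses import dataclass
from typing import Optional

DEFAULT_MODEL = "gpt-5.4-mini"  # was "gpt-4.1" before this change


@dataclass
class Agent:
    name: str
    model: Optional[str] = None  # unset means "use the SDK-wide default"


def resolve_model(agent: Agent) -> str:
    """Return the model for a run: the explicit one if set, else the default."""
    return agent.model if agent.model is not None else DEFAULT_MODEL


assert resolve_model(Agent(name="helper")) == "gpt-5.4-mini"
assert resolve_model(Agent(name="pinned", model="gpt-4.1")) == "gpt-4.1"
```

The key point is that only agents that leave `model` unset are affected; agents pinned to `gpt-4.1` keep their behavior.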
gpt-5.4-mini default-model validation report
Status: completed on the current checkout.
Investigation target
Validate `gpt-5.4-mini` as the Agents SDK default model replacement for `gpt-4.1`. The probe compares the current baseline default behavior against the candidate with the model settings the SDK applies to `gpt-5.4-mini`: `reasoning.effort="none"` and `verbosity="low"`.

Validation matrix
| Check | Baseline | Evidence |
| --- | --- | --- |
| Does the SDK default resolve to `gpt-5.4-mini` with GPT-5 default settings? The `agents.models` default helpers in the current checkout resolve to `gpt-5.4-mini` with `reasoning.effort="none"` and `verbosity="low"`. | n/a | `artifacts/summary.json` (READY) |
| Text case | `gpt-4.1` in the same probe | `artifacts/results.json` |
| Function-tool case: call `lookup_order_status` and report the returned status | `gpt-4.1` in the same probe | `artifacts/results.json` |
| Handoff case | `gpt-4.1` in the same probe | `artifacts/results.json` |
| HITL approval-resume case | `gpt-4.1` in the same probe | `artifacts/results.json` |

Parity controls
Both arms used the same checkout, Python environment, and Responses path.

- Baseline: `gpt-4.1` without GPT-5 default model settings.
- Candidate: `gpt-5.4-mini` with `reasoning.effort="none"` and `verbosity="low"`.
- The results do not claim broad quality equivalence outside the covered patterns.
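The two probe arms above can be sketched as request-settings builders. The dictionary shapes below assume the Responses API style for `reasoning` and `text.verbosity`; the `arm_settings` helper itself is illustrative, not part of the probe.

```python
# Sketch of the two probe arms' model settings (illustrative helper).
def arm_settings(candidate: bool) -> dict:
    """Build the per-arm settings: candidate vs. baseline."""
    if candidate:
        return {
            "model": "gpt-5.4-mini",
            "reasoning": {"effort": "none"},  # latency-oriented default
            "text": {"verbosity": "low"},
        }
    # Baseline: gpt-4.1 without GPT-5 default model settings.
    return {"model": "gpt-4.1"}


assert arm_settings(True)["reasoning"]["effort"] == "none"
assert "reasoning" not in arm_settings(False)
```

Keeping everything else identical between arms is what lets a pass/fail difference be attributed to the model and its settings.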
Docs preflight
The OpenAI developer docs guidance for GPT-5 reasoning models says `reasoning.effort` can be used to tune latency and intelligence tradeoffs, and that `none` is reserved for cases where low latency is more important than intelligence. The probe therefore treats `gpt-5.4-mini` with `reasoning.effort="none"` as a latency-oriented default candidate that still needs representative agent-workflow validation.
Probe command
Findings
No candidate-specific regression was observed in the covered patterns.
`gpt-5.4-mini` with `reasoning.effort="none"` passed 10/10 measured runs for the text, function-tool, handoff, and HITL approval-resume cases. The local SDK default helper also resolved to `gpt-5.4-mini` with `reasoning.effort="none"` and `verbosity="low"`.

The comparison supports pattern parity for these representative workflows, not a broad quality equivalence claim. The covered workflows intentionally focus on low-latency agent mechanics: single-turn constrained output, required function tool use, a simple handoff, and approval interruption/resume.
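A pass-rate tally like the 10/10 figures above can be computed from per-run records. The record shape and sample data below are illustrative, standing in for the probe's `artifacts/results.json`.

```python
# Sketch of tallying per-case pass rates from probe result records.
from collections import Counter

records = [  # stand-in for artifacts/results.json contents
    {"case": "text", "model": "gpt-5.4-mini", "passed": True},
    {"case": "function_tool", "model": "gpt-5.4-mini", "passed": True},
    {"case": "handoff", "model": "gpt-5.4-mini", "passed": False},
]

passed = Counter()
total = Counter()
for rec in records:
    total[rec["case"]] += 1
    passed[rec["case"]] += rec["passed"]  # bool counts as 0/1

for case in total:
    print(f"{case}: {passed[case]}/{total[case]} passed")
```

A per-case breakdown like this is what distinguishes "10/10 on these four patterns" from a blanket quality claim.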
Median total latency was comparable or better for the candidate in three of four live cases (the per-case `gpt-4.1` and `gpt-5.4-mini` medians are recorded in the artifacts). Tail latency varied by case: the largest candidate max was 5.662s in the approval-resume case; the largest baseline max was 5.592s in the handoff case.
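Median and max summaries of this kind are a one-liner with the stdlib. The latency samples below are made up for illustration; only the two tail maxima echo numbers from the report.

```python
# Sketch of the per-arm latency summary; sample values are illustrative.
from statistics import median

latencies_s = {
    "gpt-4.1": [1.9, 2.3, 2.1, 5.592],
    "gpt-5.4-mini": [1.7, 2.0, 1.8, 5.662],
}

for model, xs in latencies_s.items():
    print(f"{model}: median={median(xs):.3f}s max={max(xs):.3f}s")
```

Reporting both median and max matters here: the medians drive the "comparable or better" claim, while the maxima capture the tail behavior that varied by case.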
Artifact status
The probe was run from the repository root on commit ce462354fd3bbb841bb808dd63c8b94a4026a680 with Python 3.12.9 and openai 2.26.0. It used the approved OPENAI_API_KEY environment variable and did not print the secret value. Raw runtime artifacts were generated under `validation/gpt_5_4_mini_default/artifacts/`: `metadata.json`, `results.json`, and `summary.json`.

Probe script
see also: openai/openai-agents-js#1248