With the CAPI Responses API, when constructing the input messages for the next request, we should preserve the original structure and ordering of the model output as much as possible. Doing so keeps repeated prompt prefixes stable, which improves cache hit rates, and it keeps the prompt format closer to what the model saw during training, which should lead to more accurate behavior.
Currently, we observe two post-processing behaviors in the `vscode-copilot-chat` extension:
- Within the same turn, previous rounds can be transformed in a way that changes the original ordering of commentary and analysis. For example, the original order could be `commentary-then-analysis`, but the post-processing converts it to `analysis-then-commentary`.
  - This is the primary issue that affects model quality.
- Historical turns drop the model’s analysis / reasoning content.
  - This is fine. We can discuss whether it should be kept or not.
Ideally, input construction should avoid such reshaping of prior assistant output. Historical and previous-round content should remain as close as possible to the original model response, including preserving reasoning/analysis metadata when available and maintaining the original relative order between commentary, analysis, final text, and tool calls.
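
As a rough illustration of the difference, here is a minimal TypeScript sketch. The `OutputItem` and `Round` shapes and both function names are hypothetical and are not taken from the extension or from the Responses API schema. The "analysis first" function only models the kind of reordering described above, not the actual post-processing code.

```typescript
// Hypothetical shapes for illustration only; not the real Responses API schema.
type ItemKind = "commentary" | "analysis" | "final" | "tool_call";

interface OutputItem {
  kind: ItemKind;
  // Opaque payload as returned by the model (text, reasoning metadata, tool-call arguments, ...).
  payload: unknown;
}

interface Round {
  // Items in the exact order the model emitted them.
  output: OutputItem[];
}

// Order-preserving construction: replay every round's items exactly as emitted.
function buildInputPreservingOrder(rounds: Round[]): OutputItem[] {
  const input: OutputItem[] = [];
  for (const round of rounds) {
    for (const item of round.output) {
      input.push(item);
    }
  }
  return input;
}

// Models the problematic reshaping: analysis items are hoisted ahead of
// everything else in the same round, so the replayed prefix no longer
// matches what the model originally produced.
function buildInputAnalysisFirst(rounds: Round[]): OutputItem[] {
  const input: OutputItem[] = [];
  for (const round of rounds) {
    const analysis = round.output.filter((item) => item.kind === "analysis");
    const rest = round.output.filter((item) => item.kind !== "analysis");
    input.push(...analysis, ...rest);
  }
  return input;
}

const exampleRounds: Round[] = [
  {
    output: [
      { kind: "commentary", payload: "I'll read the file first." },
      { kind: "analysis", payload: "Need to check the imports." },
      { kind: "tool_call", payload: { name: "read_file", arguments: "{}" } },
    ],
  },
];

const preserved = buildInputPreservingOrder(exampleRounds).map((item) => item.kind);
const reshaped = buildInputAnalysisFirst(exampleRounds).map((item) => item.kind);

console.log(preserved); // ["commentary", "analysis", "tool_call"]
console.log(reshaped); // ["analysis", "commentary", "tool_call"]
```

With the order-preserving version, the replayed items form a byte-stable prefix of the next request, which is what prefix caching relies on.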
Notes
- Each turn represents a user request. Each turn could contain multiple rounds (model requests).
- We do not hit this commentary/analysis reordering issue when using AOAI BYOK, because the AOAI BYOK Responses API path uses stateful Responses API calls, while CAPI uses stateless calls.
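
To make the stateful versus stateless distinction concrete, here is a small TypeScript sketch. It assumes that "stateful" means chaining requests through the Responses API `previous_response_id` parameter, which this issue does not confirm. The request interfaces and builder functions are hypothetical simplifications.

```typescript
// Simplified, hypothetical request shapes. Only `previous_response_id` and
// `input` mirror real Responses API field names.
interface StatelessRequest {
  // The client rebuilds and resends the whole history on every call, so any
  // client-side post-processing can change the order of prior output.
  input: unknown[];
}

interface StatefulRequest {
  // The server already holds the prior output in its original order.
  previous_response_id: string;
  // Only the new items for this round are sent.
  input: unknown[];
}

function buildStatelessRequest(history: unknown[], newItems: unknown[]): StatelessRequest {
  return { input: [...history, ...newItems] };
}

function buildStatefulRequest(previousResponseId: string, newItems: unknown[]): StatefulRequest {
  return { previous_response_id: previousResponseId, input: [...newItems] };
}

const priorOutput = ["commentary item", "analysis item"];
const toolResult = ["tool result item"];

const statelessReq = buildStatelessRequest(priorOutput, toolResult);
const statefulReq = buildStatefulRequest("resp_example", toolResult);

console.log(statelessReq.input.length); // 3
console.log(statefulReq.input.length); // 1
```

In the stateful shape the client never re-serializes prior assistant output, so there is no step at which commentary and analysis could be reordered.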