GitHub Copilot summarization is too simplistic, losing crucial semantics #283446

@garretwilson

Description

Conversation summarization seems to have gotten worse in v1.107, or maybe I'm imagining things. It seems to simply be losing information that any real summarization would have kept. It's almost as bad as truncation now.

Here is just one example. This happens frequently, especially in long-running architectural discussions with multiple refactorings along the way.

[image attachment]

In this example, the things it mentioned were all quite recent, after the last summarization. The things it had forgotten were from before the last summarization. (And the part about deferring caching until later, merely adding a TODO in its place, I had told it over and over, several times over several days.)

This sort of thing seems to be happening significantly more since v1.107, but I can't swear to it. In any case it is now happening over and over. I don't know if you're truncating or simply telling the LLM to "make a summary", but either way is inadequate for a world-class tool used for enterprise software development. The LLM needs to keep a running, semantically weighted summary, continually aggregating the decisions made. A robust summarization strategy would include the following:

  • Provide a way to designate an LLM just for summarization (Specify GitHub Copilot LLM specifically for summarizing conversation history #282207).
  • Ask the LLM to analyze and categorize sections of the conversation, e.g. "brainstorming", "deep dive", "conclusion", "plan", "summary". (Those are off the top of my head as I'm writing this.) Examples:
    • Some parts of the conversation may go back and forth for ages on a deep-dive on some trivial detail. The summarization need only note that this deep-dive discussion happened, and simply preserve its conclusion.
    • Anything that is a "conclusion" should have a much higher weight than "brainstorming", and should only be lightly summarized, e.g. removing redundancy and wordsmithing for conciseness.
    • Anything that has been categorized as a "summary" should be maintained almost literally and carried forward. (After all, if the LLM provided a summary in the conversation, a resummarization will likely lose fidelity, like using JPEG compression on an already-compressed JPEG image.)
  • You might use a cheap LLM to discover, demarcate, and categorize the sections; while using a more robust LLM to make the actual summarizations.
  • The initial part of the conversation, as well as the most recent part of the conversation, should both have higher weight to be maintained in the summary.
  • GitHub Copilot needs a facility for a user to flag a response as important, so that it is kept with utmost fidelity in summarizations.
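To make the weighting idea above concrete, here is a minimal sketch of category-weighted compaction. Everything in it is hypothetical: the categories, weights, `Turn` structure, and `compact()` function are illustrative assumptions for this issue, not any actual Copilot internals.

```python
from dataclasses import dataclass

# Illustrative weights: higher weight = preserved with higher fidelity.
# A real implementation would tune these and let an LLM assign categories.
CATEGORY_WEIGHTS = {
    "summary": 1.0,       # carry forward almost verbatim
    "conclusion": 0.9,    # only lightly summarize (remove redundancy)
    "plan": 0.8,
    "deep dive": 0.3,     # keep only that it happened, plus its conclusion
    "brainstorming": 0.2,
}

@dataclass
class Turn:
    text: str
    category: str
    pinned: bool = False  # user flagged this response as important

def compact(turns, budget_chars):
    """Keep turns in descending weight order until the budget is spent.
    Pinned turns are always kept at full fidelity."""
    kept, used = [], 0
    # Pinned turns first, then by category weight, highest first.
    ranked = sorted(
        turns,
        key=lambda t: (not t.pinned, -CATEGORY_WEIGHTS.get(t.category, 0.1)),
    )
    for turn in ranked:
        if turn.pinned or used + len(turn.text) <= budget_chars:
            kept.append(turn)
            used += len(turn.text)
    # Restore the original conversation order.
    return [t for t in turns if t in kept]
```

Under a tight budget, this keeps the user-pinned turn, the prior summary, and the conclusion verbatim, while the brainstorming and deep-dive chatter is the first content to be dropped (or, in a fuller version, replaced by a one-line note that the discussion occurred).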

All that is just off the top of my head. I'm sure if you sit down and brainstorm with team members you can come up with something much more sophisticated. The current implementation is honestly not adequate.

Version: 1.107.0 (user setup)
Commit: 618725e67565b290ba4da6fe2d29f8fa1d4e3622
Date: 2025-12-10T07:43:47.883Z
Electron: 39.2.3
ElectronBuildId: 12895514
Chromium: 142.0.7444.175
Node.js: 22.21.1
V8: 14.2.231.21-electron.0
OS: Windows_NT x64 10.0.26200
