Fix an issue where the AI Assistant was not retaining conversation context between messages, with chat history compaction to manage token budgets. (#9737)
Walkthrough

Adds conversation-history compaction for the AI Assistant: a new compaction module with token estimation, message classification, serialization, and filtering; integrates history deserialization/compaction into the NLQ chat backend; the frontend sends and receives history; tests and release notes are added.
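The token-estimation step mentioned in the walkthrough is typically a cheap character-ratio heuristic rather than a real tokenizer. The sketch below illustrates that idea only; the names and the 4-characters-per-token ratio are assumptions, not the actual `compaction.py` implementation:

```python
# Hypothetical sketch of a character-ratio token estimator. The real
# module may use a provider-specific tokenizer instead.
CHARS_PER_TOKEN = 4  # rough average for English text


def estimate_tokens(text):
    """Estimate the token count of a string (ceiling of chars / 4)."""
    return max(1, (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN)


def estimate_message_tokens(content, overhead=4):
    """Add a small fixed per-message overhead for role/framing tokens."""
    return estimate_tokens(content) + overhead
```

A heuristic like this deliberately overestimates slightly, which is the safe direction when the goal is staying under a hard token budget.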
Sequence Diagram

```mermaid
sequenceDiagram
    participant Frontend as NLQChatPanel
    participant Backend as nlq_chat_stream
    participant Compaction as compaction.py
    participant LLM as chat_with_database
    Frontend->>Backend: POST (message + optional history)
    activate Backend
    Backend->>Compaction: deserialize_history(history_data)
    activate Compaction
    Compaction-->>Backend: Message[] objects
    deactivate Compaction
    Backend->>Compaction: compact_history(messages, max_tokens)
    activate Compaction
    Note over Compaction: Classify messages by importance<br/>Preserve first & recent messages<br/>Maintain tool-call/result pairs<br/>Drop low-importance messages until within budget
    Compaction-->>Backend: compacted Message[]
    deactivate Compaction
    Backend->>LLM: chat_with_database(conversation_history)
    activate LLM
    LLM-->>Backend: updated_history
    deactivate LLM
    Backend->>Compaction: filter_conversational(updated_history)
    activate Compaction
    Note over Compaction: Remove internal tool messages<br/>Keep user & final assistant messages
    Compaction-->>Backend: filtered Message[]
    deactivate Compaction
    Backend-->>Frontend: SSE event (response + history + conversation_id)
    deactivate Backend
    Frontend->>Frontend: update conversationHistory state
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@web/pgadmin/llm/compaction.py`:
- Around line 185-212: _find_tool_pair_indices currently only records one
partner index per assistant turn, which loses additional tool-result indices
when a single assistant message has multiple Message.tool_calls; change it to
return (or build) groupings of indices so each index maps to the full set of
indices that belong to that tool-use turn (e.g., map an assistant index and all
matching tool-result indices to the same set/list of indices), update the
function logic to collect all matching j indices for every assistant i (use a
collection like set for tool_call_ids and matched_tool_indices), and then return
a mapping where every index in that group points to the complete group (and
update compact_history to consult the group for protection/removal instead of
expecting a single partner).
- Around line 249-258: The current protected set (protected, protected.add(0),
recent_window) can itself exceed max_tokens and leave candidates empty; add a
fallback after building protected that computes the token count of protected
messages (use the same tokenization logic used elsewhere) and, if
protected_tokens > max_tokens, iteratively relax protections by shrinking the
recent window (e.g., decrement recent_window or remove the highest-index entries
from protected, preserving index 0) until protected_tokens <= max_tokens or only
the first message remains protected; then rebuild candidates and proceed so
compaction can produce a trimmed history even when the initially protected set
is over budget.
In `@web/pgadmin/llm/tests/test_compaction.py`:
- Around line 138-168: The test_drops_low_value fixture accidentally inflates
low-value messages with the shared filler so they no longer classify as
low-importance; update the test so the transient/routine messages (the ones with
contents 'ok', 'thanks', etc.) remain short (remove the appended filler for
those specific Message.user/Message.assistant entries) or alternatively stub the
token estimation used by compact_history/_classify_message to force those
entries to be low-value, then assert that those exact short entries are removed
while the first and recent messages remain.
In `@web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx`:
- Around line 408-412: The Clear handler (handleClearConversation) currently
resets state via setMessages, setConversationId and setConversationHistory but
does not prevent in-flight stream handlers from later re-appending old events;
add a silent-cancel token/guard: store a mutable requestId or AbortController in
component state (e.g., currentRequestToken/currentAbortController), increment or
replace it inside handleClearConversation, and update all
streaming/read/complete event handlers to compare the incoming event's token
against the latest token (or check abortController.signal) and return early if
it doesn’t match, so late "Generation stopped." or "complete" events are ignored
and cannot repopulate messages/conversationId/conversationHistory after Clear.
In `@web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py`:
- Around line 50-65: The test case for "NLQ Chat - With History" doesn't verify
that nlq_chat_stream() passes the provided history to chat_with_database() or
that the SSE output includes the history; update the test to assert that
chat_with_database() was called with a non-empty conversation_history (or
specifically equals the provided history) and/or parse the emitted SSE event
from nlq_chat_stream() to assert the returned payload contains the history field
matching the input; reference the test helper/mock for chat_with_database(), the
nlq_chat_stream() invocation, and the "history" variable in the test to locate
where to add these assertions.
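The grouping fix the first `compaction.py` comment asks for — mapping every index in a tool-use turn to the full set of indices in that turn — can be sketched like this. Messages are illustrated as plain dicts; the real module uses `Message` objects with `tool_calls`/`tool_results` attributes, so this is an assumption-laden sketch, not the actual code:

```python
def find_tool_groups(messages):
    """Map each index in a tool-use turn to the full set of indices in
    that turn (the assistant message plus all of its tool results).

    Messages are illustrated as dicts: an assistant turn carries
    'tool_calls' (a list of {'id': ...}), and each tool-result message
    carries the 'tool_call_id' it answers.
    """
    groups = {}
    for i, msg in enumerate(messages):
        calls = msg.get('tool_calls') or []
        if not calls:
            continue
        wanted = {call['id'] for call in calls}
        group = {i}
        # Collect every later tool-result message answering one of this
        # assistant turn's tool-call ids — not just the first match.
        for j in range(i + 1, len(messages)):
            if messages[j].get('tool_call_id') in wanted:
                group.add(j)
        # Every member of the group points at the same shared set, so a
        # caller can protect or drop the whole turn atomically.
        for idx in group:
            groups[idx] = group
    return groups
```

With this shape, `compact_history` can consult `groups.get(i, {i})` instead of expecting a single partner index, which is exactly the change the comment describes.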
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a096758c-1b91-42eb-8f4f-519c35414927
📒 Files selected for processing (7)

- docs/en_US/release_notes_9_14.rst
- web/pgadmin/llm/compaction.py
- web/pgadmin/llm/tests/README.md
- web/pgadmin/llm/tests/test_compaction.py
- web/pgadmin/tools/sqleditor/__init__.py
- web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx
- web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py
The failed test here appears to be the one fixed in #9723
- Track tool-use turns as groups instead of one-to-one pairs, so multi-tool assistant messages don't leave orphaned results.
- Add a fallback to shrink the recent window when protected messages alone exceed the token budget, preventing compaction no-ops.
- Fix low-value test fixtures to keep transient messages short so they actually classify as low-importance.
- Guard the Clear button against in-flight stream race conditions by adding a clearedRef flag and cancelling active streams.
- Assert that conversation history is actually passed through to chat_with_database in the "With History" test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
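The last bullet's assertion can be sketched with `unittest.mock`. The names below come from the review thread, and the stand-in call is a placeholder for invoking `nlq_chat_stream()` — this is not the actual test code:

```python
from unittest import mock


def assert_history_forwarded(chat_mock, history):
    """Sketch of the assertions the review asks for: the mocked
    chat_with_database must have received a non-empty
    conversation_history equal to the history the caller provided.
    The real test would patch the function at its import site in the
    sqleditor module."""
    chat_mock.assert_called_once()
    passed = chat_mock.call_args.kwargs.get('conversation_history')
    assert passed == history and len(passed) > 0


# Minimal demonstration with a stand-in for the backend call:
history = [{'role': 'user', 'content': 'earlier question'}]
chat_with_database = mock.Mock(return_value=[])
chat_with_database(message='next question', conversation_history=history)
assert_history_forwarded(chat_with_database, history)
```

The key point is asserting on `call_args.kwargs` rather than only on the SSE output, so the test fails if the history is silently dropped before the LLM call.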
Actionable comments posted: 3
♻️ Duplicate comments (1)
web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx (1)
409-424: ⚠️ Potential issue | 🟠 Major

Make Clear request-scoped instead of relying on one shared clearedRef. Line 464 resets clearedRef.current for every new submit. If request A is cleared and request B starts before A finishes unwinding, A's later cleanup can still clobber B's active refs/loading state and potentially reapply stale conversation state. Please gate cleanup and SSE handling with a per-request token (or current controller identity), not a single component-wide boolean.

💡 Suggested direction

```diff
+ const requestIdRef = useRef(0);
+
  const handleClearConversation = () => {
+   requestIdRef.current += 1;
    clearedRef.current = true;
    ...
  };

  const handleSubmit = async () => {
+   const requestId = ++requestIdRef.current;
    stoppedRef.current = false;
    clearedRef.current = false;
    ...
-   handleSSEEvent(data, thinkingId);
+   if (requestIdRef.current === requestId) {
+     handleSSEEvent(data, thinkingId);
+   }
    ...
-   } finally {
-     setIsLoading(false);
-     setThinkingMessageId(null);
-   }
+   } finally {
+     if (requestIdRef.current === requestId) {
+       setIsLoading(false);
+       setThinkingMessageId(null);
+     }
+   }
  };
```

Also applies to: 462-464, 579-582
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx` around lines 409 - 424, The component-wide clearedRef flow causes race conditions; replace it with a request-scoped identity (e.g., a requestId or attach an id to the AbortController) and use that id to gate cleanup and SSE handlers instead of clearedRef.current. On submit create a new id (or use abortControllerRef.current) and store it as the “currentRequestId” on a ref; when starting/handling streams (and in any async callbacks) compare the callback’s captured requestId to the live currentRequestId before mutating readerRef, abortControllerRef, setMessages, setIsLoading, setConversationId, or setConversationHistory. Modify handleClearConversation to cancel/abort the specific controller and advance/clear the currentRequestId so only that request’s callbacks will run cleanup, and ensure all SSE/stream handlers check the request-scoped id before applying state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@web/pgadmin/llm/compaction.py`:
- Around line 256-279: The recent-window protection logic builds the protected
set (protected, recent_window, recent_start) but can split tool-use groups
leaving orphaned Role.TOOL or assistant tool-call messages; before computing
protected_tokens (and before the while budget loop) expand protected to include
entire tool-use groups or mark any group that intersects protected as
undroppable so all partner messages are included; specifically, after building
protected (and similarly in the other similar blocks around the code handling
indices 311-318 and 333-339), iterate the message group boundaries (using
messages, their roles and any tool-call markers used by your grouping logic) to
union in all indices of any group that intersects protected, then compute
protected_tokens via estimate_message_tokens(provider) on the expanded set so
the budgeting loop never leaves an orphaned tool message.
- Around line 349-400: deserialize_history currently assumes history_data and
nested items are dicts and calls .get() directly, which will crash if any item
is None or not a dict; update deserialize_history to validate shapes before
accessing keys: ensure history_data is an iterable of dicts (skip non-dict
items), treat item.get('tool_calls') and item.get('tool_results') as
(item.get(...) or []) and iterate only over entries that are dicts, and check
each tc_data/tr_data is a dict before constructing ToolCall/ToolResult; also
validate role_str is a string before passing to Role(role_str) and skip
malformed entries so Message, Role, ToolCall, ToolResult creation only occurs
for well-formed dicts.
In `@web/pgadmin/llm/tests/test_compaction.py`:
- Around line 133-136: The test only compares overlapping elements so it can
pass if compact_history() returns fewer than four messages; update the assertion
to ensure the full four-message tail is preserved by first asserting
len(result[-4:]) == 4 (or len(result) >= 4) and then asserting the two slices
are equal (compare last_4_original == last_4_result) so the test fails when any
protected recent message is missing; locate this change around compact_history()
usage and the variables last_4_original / last_4_result in test_compaction.py.
---
Duplicate comments:
In `@web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx`:
- Around line 409-424: The component-wide clearedRef flow causes race
conditions; replace it with a request-scoped identity (e.g., a requestId or
attach an id to the AbortController) and use that id to gate cleanup and SSE
handlers instead of clearedRef.current. On submit create a new id (or use
abortControllerRef.current) and store it as the “currentRequestId” on a ref;
when starting/handling streams (and in any async callbacks) compare the
callback’s captured requestId to the live currentRequestId before mutating
readerRef, abortControllerRef, setMessages, setIsLoading, setConversationId, or
setConversationHistory. Modify handleClearConversation to cancel/abort the
specific controller and advance/clear the currentRequestId so only that
request’s callbacks will run cleanup, and ensure all SSE/stream handlers check
the request-scoped id before applying state.
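The request-scoped gating asked for above is language-agnostic. Here is a minimal Python sketch of the same pattern — the real fix belongs in the JSX component, so every name here is illustrative:

```python
class StreamGate:
    """Monotonic request counter: callbacks captured for an older
    request compare their id against the latest and bail out. This is
    the same gating the review asks the JSX SSE handlers to perform."""

    def __init__(self):
        self._current = 0

    def begin(self):
        """Start a new request and return its id."""
        self._current += 1
        return self._current

    def cancel(self):
        """Clear/abort: advance the counter so stale callbacks no-op."""
        self._current += 1

    def is_live(self, request_id):
        """A callback mutates state only if its id is still current."""
        return request_id == self._current


# A late event from a cleared request is ignored:
gate = StreamGate()
rid = gate.begin()
gate.cancel()                 # user hits Clear mid-stream
assert not gate.is_live(rid)  # the stale "complete" event is dropped
```

Because the counter only ever advances, no cleanup of old ids is needed; each new submit or clear automatically invalidates everything before it.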
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 49df7280-efd9-4ef9-9139-880143b82c2a
📒 Files selected for processing (4)

- web/pgadmin/llm/compaction.py
- web/pgadmin/llm/tests/test_compaction.py
- web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx
- web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py
🚧 Files skipped from review as they are similar to previous changes (1)
- web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py
```python
# Determine protected indices
protected = set()

# Always protect the first message
protected.add(0)

# Always protect the recent window
recent_start = max(1, total - recent_window)
for i in range(recent_start, total):
    protected.add(i)

# If protected messages alone exceed the budget, shrink the
# recent window until we have room for compaction candidates.
while recent_window > 0:
    protected_tokens = sum(
        estimate_message_tokens(messages[i], provider)
        for i in protected
    )
    if protected_tokens <= max_tokens:
        break
    recent_window -= 1
    recent_start = max(1, total - recent_window)
    protected = {0} | set(range(recent_start, total))
```
A protected recent window can still split a tool-use turn.
If one member of a tool group lands in protected and another does not, the drop loop can still remove the unprotected side because it only skips protected partners. That leaves an orphaned assistant tool-call or Role.TOOL message in the compacted history. Expand protected to whole groups before budgeting, or treat any group intersecting protected as undroppable.
Also applies to: 311-318, 333-339
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@web/pgadmin/llm/compaction.py` around lines 256 - 279, The recent-window
protection logic builds the protected set (protected, recent_window,
recent_start) but can split tool-use groups leaving orphaned Role.TOOL or
assistant tool-call messages; before computing protected_tokens (and before the
while budget loop) expand protected to include entire tool-use groups or mark
any group that intersects protected as undroppable so all partner messages are
included; specifically, after building protected (and similarly in the other
similar blocks around the code handling indices 311-318 and 333-339), iterate
the message group boundaries (using messages, their roles and any tool-call
markers used by your grouping logic) to union in all indices of any group that
intersects protected, then compute protected_tokens via
estimate_message_tokens(provider) on the expanded set so the budgeting loop
never leaves an orphaned tool message.
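The group-expansion step this prompt describes can be sketched as follows, assuming a `groups` mapping from message index to the full index set of its tool-use turn (a hypothetical shape for illustration, not the actual compaction.py API):

```python
def expand_protected(protected, groups):
    """Union every tool-use group that intersects the protected set
    into it, so budgeting never strands one half of a tool turn.

    `groups` maps a message index to the set of all indices in its
    tool-use group; indices outside any group map to themselves.
    """
    expanded = set(protected)
    for idx in list(protected):
        expanded |= groups.get(idx, {idx})
    return expanded
```

Calling this right after the protected set is built (and again whenever the recent window shrinks) keeps assistant tool-call messages and their `Role.TOOL` results together before `protected_tokens` is computed.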
```python
def deserialize_history(
    history_data: list[dict]
) -> list[Message]:
    """Deserialize conversation history from JSON request data.

    Converts a list of message dictionaries (from the frontend) into
    Message objects suitable for passing to chat_with_database().

    Args:
        history_data: List of dicts with 'role' and 'content' keys,
            and optionally 'tool_calls' and 'tool_results'.

    Returns:
        List of Message objects.
    """
    messages = []
    for item in history_data:
        role_str = item.get('role', '')
        content = item.get('content', '')

        try:
            role = Role(role_str)
        except ValueError:
            continue  # Skip unknown roles

        # Reconstruct tool calls if present
        tool_calls = []
        for tc_data in item.get('tool_calls', []):
            tool_calls.append(ToolCall(
                id=tc_data.get('id', ''),
                name=tc_data.get('name', ''),
                arguments=tc_data.get('arguments', {})
            ))

        # Reconstruct tool results if present
        from pgadmin.llm.models import ToolResult
        tool_results = []
        for tr_data in item.get('tool_results', []):
            tool_results.append(ToolResult(
                tool_call_id=tr_data.get('tool_call_id', ''),
                content=tr_data.get('content', ''),
                is_error=tr_data.get('is_error', False)
            ))

        messages.append(Message(
            role=role,
            content=content,
            tool_calls=tool_calls,
            tool_results=tool_results
        ))

    return messages
```
Validate deserialized history before calling .get().
This helper consumes request JSON, but it assumes every history, tool_calls, and tool_results entry is a dict. A single null or string value will raise here and fail the whole turn instead of just ignoring the bad element. Guard the top-level and nested items with shape checks, and use or [] for nullable lists.
💡 Suggested fix
```diff
 def deserialize_history(
     history_data: list[dict]
 ) -> list[Message]:
@@
-    messages = []
+    if not isinstance(history_data, list):
+        return []
+
+    messages = []
     for item in history_data:
+        if not isinstance(item, dict):
+            continue
+
         role_str = item.get('role', '')
         content = item.get('content', '')
@@
-        for tc_data in item.get('tool_calls', []):
+        for tc_data in item.get('tool_calls') or []:
+            if not isinstance(tc_data, dict):
+                continue
             tool_calls.append(ToolCall(
                 id=tc_data.get('id', ''),
                 name=tc_data.get('name', ''),
                 arguments=tc_data.get('arguments', {})
             ))
@@
-        for tr_data in item.get('tool_results', []):
+        for tr_data in item.get('tool_results') or []:
+            if not isinstance(tr_data, dict):
+                continue
             tool_results.append(ToolResult(
                 tool_call_id=tr_data.get('tool_call_id', ''),
                 content=tr_data.get('content', ''),
                 is_error=tr_data.get('is_error', False)
             ))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@web/pgadmin/llm/compaction.py` around lines 349 - 400, deserialize_history
currently assumes history_data and nested items are dicts and calls .get()
directly, which will crash if any item is None or not a dict; update
deserialize_history to validate shapes before accessing keys: ensure
history_data is an iterable of dicts (skip non-dict items), treat
item.get('tool_calls') and item.get('tool_results') as (item.get(...) or []) and
iterate only over entries that are dicts, and check each tc_data/tr_data is a
dict before constructing ToolCall/ToolResult; also validate role_str is a string
before passing to Role(role_str) and skip malformed entries so Message, Role,
ToolCall, ToolResult creation only occurs for well-formed dicts.
```python
last_4_original = messages[-4:]
last_4_result = result[-4:]
for orig, res in zip(last_4_original, last_4_result):
    self.assertEqual(orig.content, res.content)
```
Assert the full protected tail, not just the overlapping prefix.
This comparison still passes when compact_history() returns fewer than four trailing messages, because the loop only checks as many elements as result[-4:] happens to contain. Compare the two four-message slices directly, or assert the tail length first, so the test fails when any protected recent message disappears.
💡 Suggested fix
```diff
-        last_4_original = messages[-4:]
-        last_4_result = result[-4:]
-        for orig, res in zip(last_4_original, last_4_result):
-            self.assertEqual(orig.content, res.content)
+        self.assertGreaterEqual(len(result), 5)
+        self.assertEqual(
+            [m.content for m in result[-4:]],
+            [m.content for m in messages[-4:]],
+        )
```
🧰 Tools

🪛 Ruff (0.15.5)

[warning] 135-135: `zip()` without an explicit `strict=` parameter. Add an explicit value for the `strict=` parameter. (B905)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@web/pgadmin/llm/tests/test_compaction.py` around lines 133 - 136, The test
only compares overlapping elements so it can pass if compact_history() returns
fewer than four messages; update the assertion to ensure the full four-message
tail is preserved by first asserting len(result[-4:]) == 4 (or len(result) >= 4)
and then asserting the two slices are equal (compare last_4_original ==
last_4_result) so the test fails when any protected recent message is missing;
locate this change around compact_history() usage and the variables
last_4_original / last_4_result in test_compaction.py.