fix: normalize restored tool results#2320
Conversation
0fada87 to
4dc960a
Compare
|
Rebased this onto current main and revalidated the focused session-manager path. Current head is 4dc960a. Validation from strands-py: uv run pytest tests/strands/session/test_repository_session_manager.py -q --basetemp ...tmp\pytest-2320 -p no:cacheprovider: 42 passed; uv run ruff check src/strands/session/repository_session_manager.py tests/strands/session/test_repository_session_manager.py; uv run ruff format --check src/strands/session/repository_session_manager.py tests/strands/session/test_repository_session_manager.py; python -m py_compile strands-py\src\strands\session\repository_session_manager.py strands-py\tests\strands\session\test_repository_session_manager.py; git diff --check origin/main...HEAD. |
|
/strands review |
| if missing_tool_use_ids or removed_tool_results: | ||
| logger.warning( | ||
| "Session message history has an orphaned toolUse with no toolResult. " | ||
| "Adding toolResult content blocks to create valid conversation." |
There was a problem hiding this comment.
Issue: The warning message is misleading when only duplicate/stale results are removed but no tool results are actually missing (i.e., removed_tool_results=True and missing_tool_use_ids is empty). The message says "orphaned toolUse with no toolResult" and "Adding toolResult content blocks" but no blocks are added in that scenario.
Suggestion: Consider differentiating the log message based on the actual repair being performed:
if missing_tool_use_ids:
logger.warning(
"Session message history has an orphaned toolUse with no toolResult. "
"Adding toolResult content blocks to create valid conversation."
)
if removed_tool_results:
logger.warning(
"Session message history has duplicate or unrelated toolResult blocks. "
"Removing extra toolResult content blocks to create valid conversation."
)There was a problem hiding this comment.
Agreed with the bot callout, I think specific log statements would be helpful here
| assert missing_result["toolResult"]["content"][0]["text"] == "Tool was interrupted." | ||
|
|
||
|
|
||
| def test_fix_broken_tool_use_drops_extra_tool_results(session_manager): |
There was a problem hiding this comment.
Issue: Missing docstring. The neighboring tests (test_fix_broken_tool_use_extends_partial_tool_results, test_fix_broken_tool_use_handles_multiple_orphaned_tools) all have docstrings describing what they test. This one should too for consistency.
Suggestion: Add a brief docstring:
def test_fix_broken_tool_use_drops_extra_tool_results(session_manager):
"""Test that stale and duplicate toolResult blocks are removed."""There was a problem hiding this comment.
Same here, docstring would be great
| tool_results = [content["toolResult"] for content in fixed_messages[1]["content"]] | ||
| assert [tool_result["toolUseId"] for tool_result in tool_results] == ["complete-123", "missing-456"] | ||
| assert tool_results[0]["content"] == [{"text": "ok"}] | ||
| assert tool_results[1]["status"] == "error" |
There was a problem hiding this comment.
Issue: Consider adding a test for the edge case where removed_tool_results=True but missing_tool_use_ids is empty (i.e., all toolUse IDs have a matching result, but there are also stale/duplicate results to drop). This scenario exercises the if missing_tool_use_ids or removed_tool_results path differently — generate_missing_tool_result_content returns [] and the content is just filtered_content.
Suggestion:
def test_fix_broken_tool_use_drops_stale_results_with_no_missing_ids(session_manager):
"""Test that stale toolResults are removed even when all toolUse IDs are satisfied."""
messages = [
{
"role": "assistant",
"content": [
{"toolUse": {"toolUseId": "id-1", "name": "t", "input": {}}},
],
},
{
"role": "user",
"content": [
{"toolResult": {"toolUseId": "id-1", "status": "success", "content": [{"text": "ok"}]}},
{"toolResult": {"toolUseId": "stale-999", "status": "error", "content": [{"text": "old"}]}},
],
},
]
fixed = session_manager._fix_broken_tool_use(messages)
results = [c["toolResult"] for c in fixed[1]["content"]]
assert len(results) == 1
assert results[0]["toolUseId"] == "id-1"There was a problem hiding this comment.
I agree with the bot again! I think we need this test for sure
| @@ -282,25 +282,39 @@ def _fix_broken_tool_use(self, messages: list[Message]) -> list[Message]: | |||
| ] | |||
|
|
|||
| # Check if there are more messages after the current toolUse message | |||
There was a problem hiding this comment.
Issue: The comment says "Check if there are more messages after the current toolUse message" but that's the old comment that no longer accurately describes what the block below now does (filtering and deduplication).
Suggestion: Update to something like:
# Filter the next message's content: keep only the first toolResult matching each toolUse id,
# drop duplicates and unrelated toolResults.There was a problem hiding this comment.
This one is definitely not blocking at all but if you are taking care of the others then might as well :)
|
Assessment: Comment (leans approve) The fix is well-scoped and correctly addresses the bug: it deduplicates and filters stale Review Categories
Clean, focused bugfix — nice work keeping the repair narrow and preserving existing behavior for the happy paths. |
|
Hi @he-yufeng , thanks for taking this up – if you can address the 4 callouts above, I am happy to shepherd this PR through! |
4dc960a to
6a0e7fb
Compare
|
Addressed the latest review feedback: split the warning for missing tool results vs duplicate/unrelated tool results, updated the stale filtering comment, added a docstring to the mixed missing/stale regression, and added a regression where all toolUse IDs already have valid results but stale toolResults are still removed. Validated with: uv run --extra dev python -m pytest tests\strands\session\test_repository_session_manager.py -q --basetemp ..\.tmp\pytest-2320-local -p no:cacheprovider; uv run --extra dev ruff check src\strands\session\repository_session_manager.py tests\strands\session\test_repository_session_manager.py; uv run --extra dev ruff format --check src\strands\session\repository_session_manager.py tests\strands\session\test_repository_session_manager.py; uv run --extra dev python -m py_compile src\strands\session\repository_session_manager.py tests\strands\session\test_repository_session_manager.py; git diff --check. |
Fixes #2296.
The repository session recovery path already fills in missing toolResult blocks, but it kept every existing toolResult in the following user message. If persisted history already contains a stale or duplicate toolResult, the repaired message can end up with more toolResult blocks than the previous assistant toolUse blocks, which Bedrock rejects.
This keeps the recovery narrow: for the user message immediately after an assistant toolUse message, preserve only the first toolResult matching each previous toolUse id, drop unrelated or duplicate toolResults, and then add any missing synthetic error results.
Validation: