fix: normalize restored tool results by he-yufeng · Pull Request #2320 · strands-agents/sdk-python

he-yufeng · 2026-05-23T12:46:56Z

The repository session recovery path already fills in missing toolResult blocks, but it kept every existing toolResult in the following user message. If persisted history already contains a stale or duplicate toolResult, the repaired message can end up with more toolResult blocks than the previous assistant toolUse blocks, which Bedrock rejects.

This keeps the recovery narrow: for the user message immediately after an assistant toolUse message, preserve only the first toolResult matching each previous toolUse id, drop unrelated or duplicate toolResults, and then add any missing synthetic error results.

Validation:

.venv\Scripts\python.exe -m pytest tests\strands\session\test_repository_session_manager.py::test_fix_broken_tool_use_extends_partial_tool_results tests\strands\session\test_repository_session_manager.py::test_fix_broken_tool_use_drops_extra_tool_results tests\strands\session\test_repository_session_manager.py::test_fix_broken_tool_use_handles_multiple_orphaned_tools -q
.venv\Scripts\python.exe -m pytest tests\strands\session\test_repository_session_manager.py -q
.venv\Scripts\python.exe -m ruff format --check src\strands\session\repository_session_manager.py tests\strands\session\test_repository_session_manager.py
.venv\Scripts\python.exe -m ruff check src\strands\session\repository_session_manager.py tests\strands\session\test_repository_session_manager.py
.venv\Scripts\python.exe -m py_compile src\strands\session\repository_session_manager.py tests\strands\session\test_repository_session_manager.py
git diff --check

he-yufeng · 2026-05-27T09:45:39Z

Rebased this onto current main and revalidated the focused session-manager path. Current head is 4dc960a. Validation from strands-py: uv run pytest tests/strands/session/test_repository_session_manager.py -q --basetemp ...tmp\pytest-2320 -p no:cacheprovider: 42 passed; uv run ruff check src/strands/session/repository_session_manager.py tests/strands/session/test_repository_session_manager.py; uv run ruff format --check src/strands/session/repository_session_manager.py tests/strands/session/test_repository_session_manager.py; python -m py_compile strands-py\src\strands\session\repository_session_manager.py strands-py\tests\strands\session\test_repository_session_manager.py; git diff --check origin/main...HEAD.

gautamsirdeshmukh · 2026-05-28T16:35:02Z

/strands review

github-actions · 2026-05-28T16:39:14Z

+                    if missing_tool_use_ids or removed_tool_results:
                        logger.warning(
                            "Session message history has an orphaned toolUse with no toolResult. "
                            "Adding toolResult content blocks to create valid conversation."


Issue: The warning message is misleading when only duplicate/stale results are removed but no tool results are actually missing (i.e., removed_tool_results=True and missing_tool_use_ids is empty). The message says "orphaned toolUse with no toolResult" and "Adding toolResult content blocks" but no blocks are added in that scenario.

Suggestion: Consider differentiating the log message based on the actual repair being performed:

if missing_tool_use_ids: logger.warning( "Session message history has an orphaned toolUse with no toolResult. " "Adding toolResult content blocks to create valid conversation." ) if removed_tool_results: logger.warning( "Session message history has duplicate or unrelated toolResult blocks. " "Removing extra toolResult content blocks to create valid conversation." )

Agreed with the bot callout, I think specific log statements would be helpful here

github-actions · 2026-05-28T16:39:16Z

    assert missing_result["toolResult"]["content"][0]["text"] == "Tool was interrupted."


+def test_fix_broken_tool_use_drops_extra_tool_results(session_manager):


Issue: Missing docstring. The neighboring tests (test_fix_broken_tool_use_extends_partial_tool_results, test_fix_broken_tool_use_handles_multiple_orphaned_tools) all have docstrings describing what they test. This one should too for consistency.

Suggestion: Add a brief docstring:

def test_fix_broken_tool_use_drops_extra_tool_results(session_manager): """Test that stale and duplicate toolResult blocks are removed."""

Same here, docstring would be great

github-actions · 2026-05-28T16:39:18Z

+    tool_results = [content["toolResult"] for content in fixed_messages[1]["content"]]
+    assert [tool_result["toolUseId"] for tool_result in tool_results] == ["complete-123", "missing-456"]
+    assert tool_results[0]["content"] == [{"text": "ok"}]
+    assert tool_results[1]["status"] == "error"


Issue: Consider adding a test for the edge case where removed_tool_results=True but missing_tool_use_ids is empty (i.e., all toolUse IDs have a matching result, but there are also stale/duplicate results to drop). This scenario exercises the if missing_tool_use_ids or removed_tool_results path differently — generate_missing_tool_result_content returns [] and the content is just filtered_content.

Suggestion:

def test_fix_broken_tool_use_drops_stale_results_with_no_missing_ids(session_manager): """Test that stale toolResults are removed even when all toolUse IDs are satisfied.""" messages = [ { "role": "assistant", "content": [ {"toolUse": {"toolUseId": "id-1", "name": "t", "input": {}}}, ], }, { "role": "user", "content": [ {"toolResult": {"toolUseId": "id-1", "status": "success", "content": [{"text": "ok"}]}}, {"toolResult": {"toolUseId": "stale-999", "status": "error", "content": [{"text": "old"}]}}, ], }, ] fixed = session_manager._fix_broken_tool_use(messages) results = [c["toolResult"] for c in fixed[1]["content"]] assert len(results) == 1 assert results[0]["toolUseId"] == "id-1"

I agree with the bot again! I think we need this test for sure

github-actions · 2026-05-28T16:39:20Z

@@ -282,25 +282,39 @@ def _fix_broken_tool_use(self, messages: list[Message]) -> list[Message]:
                    ]

                    # Check if there are more messages after the current toolUse message


Issue: The comment says "Check if there are more messages after the current toolUse message" but that's the old comment that no longer accurately describes what the block below now does (filtering and deduplication).

Suggestion: Update to something like:

# Filter the next message's content: keep only the first toolResult matching each toolUse id, # drop duplicates and unrelated toolResults.

This one is definitely not blocking at all but if you are taking care of the others then might as well :)

github-actions · 2026-05-28T16:39:22Z

Assessment: Comment (leans approve)

The fix is well-scoped and correctly addresses the bug: it deduplicates and filters stale toolResult blocks so that the count never exceeds the preceding toolUse blocks. The logic is sound and the new test covers the key scenario.

Review Categories

Logging accuracy: The warning message is misleading in the "only duplicates removed, no missing IDs" path — it claims orphaned toolUse and adding blocks when it's actually removing extras.
Test coverage: Missing an edge case where all toolUse IDs are satisfied but stale results still need to be dropped (exercises removed_tool_results=True with empty missing_tool_use_ids).
Code comments: One inherited comment is stale and no longer describes the new filtering logic.

Clean, focused bugfix — nice work keeping the repair narrow and preserving existing behavior for the happy paths.

gautamsirdeshmukh · 2026-05-29T20:39:57Z

Hi @he-yufeng , thanks for taking this up – if you can address the 4 callouts above, I am happy to shepherd this PR through!

he-yufeng · 2026-05-30T07:19:31Z

Addressed the latest review feedback: split the warning for missing tool results vs duplicate/unrelated tool results, updated the stale filtering comment, added a docstring to the mixed missing/stale regression, and added a regression where all toolUse IDs already have valid results but stale toolResults are still removed. Validated with: uv run --extra dev python -m pytest tests\strands\session\test_repository_session_manager.py -q --basetemp ..\.tmp\pytest-2320-local -p no:cacheprovider; uv run --extra dev ruff check src\strands\session\repository_session_manager.py tests\strands\session\test_repository_session_manager.py; uv run --extra dev ruff format --check src\strands\session\repository_session_manager.py tests\strands\session\test_repository_session_manager.py; uv run --extra dev python -m py_compile src\strands\session\repository_session_manager.py tests\strands\session\test_repository_session_manager.py; git diff --check.

github-actions Bot added the size/s label May 23, 2026

he-yufeng requested a deployment to manual-approval May 23, 2026 12:47 — with GitHub Actions Waiting

he-yufeng force-pushed the fix/session-manager-orphan-tool-results branch from 0fada87 to 4dc960a Compare May 27, 2026 09:45

github-actions Bot removed the size/s label May 27, 2026

he-yufeng requested a deployment to manual-approval May 27, 2026 09:45 — with GitHub Actions Waiting

github-actions Bot added the size/s label May 27, 2026

he-yufeng requested a deployment to manual-approval May 27, 2026 09:45 — with GitHub Actions Waiting

yonib05 added area-persistence Session management or checkpointing area-context Session or context related labels May 27, 2026

opieter-aws mentioned this pull request May 28, 2026

fix: prune stale tool results during session repair #2299

Closed

github-actions Bot added the strands-running label May 28, 2026

github-actions Bot reviewed May 28, 2026

View reviewed changes

github-actions Bot removed the strands-running label May 28, 2026

yonib05 added python Pull requests that update python code bug Something isn't working labels May 29, 2026

fix: normalize restored tool results

6a0e7fb

he-yufeng force-pushed the fix/session-manager-orphan-tool-results branch from 4dc960a to 6a0e7fb Compare May 30, 2026 07:19

github-actions Bot added size/m and removed size/s labels May 30, 2026

he-yufeng requested a deployment to manual-approval May 30, 2026 07:19 — with GitHub Actions Waiting

		assert missing_result["toolResult"]["content"][0]["text"] == "Tool was interrupted."


		def test_fix_broken_tool_use_drops_extra_tool_results(session_manager):

		@@ -282,25 +282,39 @@ def _fix_broken_tool_use(self, messages: list[Message]) -> list[Message]:
		]

		# Check if there are more messages after the current toolUse message

Conversation

he-yufeng commented May 23, 2026

Uh oh!

he-yufeng commented May 27, 2026

Uh oh!

gautamsirdeshmukh commented May 28, 2026

Uh oh!

github-actions Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

gautamsirdeshmukh May 29, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

gautamsirdeshmukh May 29, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

gautamsirdeshmukh May 29, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

gautamsirdeshmukh May 29, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 28, 2026

Uh oh!

gautamsirdeshmukh commented May 29, 2026

Uh oh!

he-yufeng commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants