fix: Isolate evaluator errors in run_evaluations#84
fix: Isolate evaluator errors in run_evaluations#84afarntrog merged 2 commits intostrands-agents:mainfrom
Conversation
Previously, a single try/except wrapped both task execution and all evaluators, causing all evaluators to fail if any single evaluator threw an exception. Now task execution and evaluator execution have separate error handling: - Task execution errors: Record failure for all evaluators and skip case - Evaluator errors: Only the failing evaluator is marked failed; others continue to run and can succeed independently
Previously, a single try/except wrapped both task execution and all evaluators, causing all evaluators to fail if any single evaluator threw an exception. Now task execution and evaluator execution have separate error handling: - Task execution errors: Record failure for all evaluators and skip case - Evaluator errors: Only the failing evaluator is marked failed; others continue to run and can succeed independently
|
The integ tests failure is unrelated: The test |
strands-agent
left a comment
There was a problem hiding this comment.
Code Review Summary
Overall Assessment: ✅ APPROVE - This is a well-structured bug fix with proper error isolation.
What This PR Does
The PR fixes a critical bug in run_evaluations() where a single failing evaluator would cause ALL evaluators to fail. The fix properly isolates errors:
- Task execution errors → All evaluators marked as failed (correct behavior - no output to evaluate)
- Individual evaluator errors → Only the failing evaluator is marked failed; others continue independently
Code Quality Assessment
✅ Strengths
- Clean separation of concerns: Task execution and evaluator execution now have distinct try/except blocks
- Appropriate error messages:
"Task execution error: {str(e)}"for task failures"Evaluator error: {str(e)}"for evaluator failures
- Good test coverage: Added
ThrowingEvaluatortest fixture andtest_experiment_run_evaluations_evaluator_error_isolated()that validates the exact behavior - Minimal changes: Only 18 lines added to production code, focused and atomic
- Correct context preservation: Evaluator errors now use
evaluation_context.model_dump()instead ofcase.model_dump(), preserving the actual task output
📝 Minor Observations (Non-blocking)
-
Typo fixed: Nice catch on changing
"An error occured"→"Evaluator error"(was a typo: "occured" → "occurred") -
Test completeness: The test covers the sync path. Consider adding an async test in a future PR:
@pytest.mark.asyncio async def test_experiment_run_evaluations_async_evaluator_error_isolated(): # Similar test for run_evaluations_async
CI Status
- ✅ Pull Request and Push Action: SUCCESS
⚠️ Secure Integration test: FAILURE (as noted by author - AWS throttling, unrelated to this change)
Verdict
This is a clean, well-tested bug fix that follows best practices. The error isolation pattern is exactly what's needed for robust evaluator execution. The integration test failure is confirmed unrelated (AWS throttling on parallel evaluations).
🦆
🤖 This is an experimental AI agent response from the Strands team, powered by Strands Agents. We're exploring how AI agents can help with community support and development. Your feedback helps us improve! If you'd prefer human assistance, please let us know.
Description
Previously, a single try/except wrapped both task execution and all evaluators, causing all evaluators to fail if any single evaluator threw an exception.
Now task execution and evaluator execution have separate error handling:
Related Issues
#82
Documentation PR
Type of Change
Bug fix
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
hatch run prepareChecklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.