When running adk optimize, if a metric evaluation fails (e.g., due to a transient API error, rate limiting, or a JSONDecodeError from the LLM judge), the LocalEvalSampler crashes with a TypeError. This happens because the evaluation logic gracefully catches the exception and returns a result with a None score, but the sampler subsequently tries to round this None value.
Error Logs
TypeError: type NoneType doesn't define __round__ method
Traceback (most recent call last):
...
File ".../google/adk/optimization/local_eval_sampler.py", line 362, in sample_and_score
self._extract_eval_data(eval_set_id, eval_results)
File ".../google/adk/optimization/local_eval_sampler.py", line 292, in _extract_eval_data
"score": round(eval_metric_result.score, 2), # accurate enough
TypeError: type NoneType doesn't define __round__ method
Root Cause
In google/adk/evaluation/local_eval_service.py, the _evaluate_metric_for_eval_case method catches all exceptions during evaluation:
except Exception as e:
logger.error(...)
# We use an empty result.
evaluation_result = EvaluationResult(
overall_eval_status=EvalStatus.NOT_EVALUATED
)
The EvaluationResult (and its nested PerInvocationResult) defaults its score field to None.
In google/adk/optimization/local_eval_sampler.py, the _extract_eval_data method iterates through these results and attempts to round the score without checking whether it is None:
for eval_metric_result in per_invocation_result.eval_metric_results:
eval_metric_results.append({
"metric_name": eval_metric_result.metric_name,
"score": round(eval_metric_result.score, 2), # <--- CRASH HERE
"eval_status": eval_metric_result.eval_status.name,
})
Reproduction Steps
- Configure an agent for optimization using adk optimize.
- Include a metric that relies on an LLM judge (e.g., rubric_based_tool_use_quality_v1).
- Trigger a scenario where the judge evaluation fails (e.g., simulate a network error or a malformed judge response).
- The process will crash during data extraction instead of reporting a 0.0 score or skipping the failed case.
Proposed Fix
The sampler should handle None scores gracefully, either by defaulting them to 0.0 or skipping the rounding step for un-evaluated metrics.
# google/adk/optimization/local_eval_sampler.py
"score": round(eval_metric_result.score, 2) if eval_metric_result.score is not None else 0.0,