TypeError in LocalEvalSampler when metric evaluation fails #5403

@msteiner-google

Description

When running adk optimize, if a metric evaluation fails (e.g., due to a transient API error, rate limiting, or a JSONDecodeError from the LLM judge), the LocalEvalSampler crashes with a TypeError. This happens because the evaluation logic gracefully catches the exception and returns a result with a None score, but the sampler subsequently tries to round this None value.

Error Logs

    TypeError: type NoneType doesn't define __round__ method
   
    Traceback (most recent call last):
      ...
      File ".../google/adk/optimization/local_eval_sampler.py", line 362, in sample_and_score
        self._extract_eval_data(eval_set_id, eval_results)
      File ".../google/adk/optimization/local_eval_sampler.py", line 292, in _extract_eval_data
        "score": round(eval_metric_result.score, 2),  # accurate enough
    TypeError: type NoneType doesn't define __round__ method

Root Cause

In google/adk/evaluation/local_eval_service.py, the _evaluate_metric_for_eval_case method catches all exceptions during evaluation:

except Exception as e:
  logger.error(...)
  # We use an empty result.
  evaluation_result = EvaluationResult(
     overall_eval_status=EvalStatus.NOT_EVALUATED
  )

The EvaluationResult (and its nested PerInvocationResult) defaults its score field to None.
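In pseudo-form, the defaulting behavior looks like this (simplified stand-in dataclasses for illustration only, not the actual ADK Pydantic models; field names abbreviated):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EvalMetricResult:  # stand-in for the ADK metric result type
    metric_name: str = ""
    score: Optional[float] = None  # no default score: stays None on failure

@dataclass
class EvaluationResult:  # stand-in for the fallback built in the except-branch
    overall_eval_status: str = "NOT_EVALUATED"
    eval_metric_results: list = field(default_factory=list)

# The fallback result carries no scores at all, and any metric result
# constructed without an explicit score keeps score=None.
fallback = EvaluationResult()
print(fallback.overall_eval_status, EvalMetricResult().score)  # NOT_EVALUATED None
```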

In google/adk/optimization/local_eval_sampler.py, the _extract_eval_data method iterates through these results and attempts to round the score without checking whether it is None:

for eval_metric_result in per_invocation_result.eval_metric_results:
  eval_metric_results.append({
      "metric_name": eval_metric_result.metric_name,
      "score": round(eval_metric_result.score, 2),  # <--- CRASH HERE
      "eval_status": eval_metric_result.eval_status.name,
  })
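The crash reduces to a one-liner: Python's built-in round() has no handling for None, so it raises exactly the TypeError seen in the logs above.

```python
# round() dispatches to the argument's __round__ method,
# which NoneType does not define.
try:
    round(None, 2)
except TypeError as e:
    print(e)  # type NoneType doesn't define __round__ method
```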

Reproduction Steps

  1. Configure an agent for optimization using adk optimize.
  2. Include a metric that relies on an LLM judge (e.g., rubric_based_tool_use_quality_v1).
  3. Trigger a scenario where the judge evaluation fails (e.g., simulate a network error or a malformed judge response).
  4. The process will crash during data extraction instead of reporting a 0.0 score or skipping the failed case.

Proposed Fix

The sampler should handle None scores gracefully, either by defaulting them to 0.0 or skipping the rounding step for un-evaluated metrics.

# google/adk/optimization/local_eval_sampler.py
   
"score": round(eval_metric_result.score, 2) if eval_metric_result.score is not None else 0.0,
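The other option, skipping, may be preferable since defaulting to 0.0 conflates "failed to evaluate" with "scored zero" and could bias the optimizer. A minimal sketch of that variant, using a hypothetical stand-in type and helper rather than the actual ADK classes:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MetricResult:  # simplified stand-in for the ADK metric result type
    metric_name: str
    score: Optional[float]

def extract_scores(results: list) -> list:
    """Drop metrics that were never evaluated instead of coercing to 0.0."""
    extracted = []
    for r in results:
        if r.score is None:
            continue  # NOT_EVALUATED: nothing meaningful to round
        extracted.append({
            "metric_name": r.metric_name,
            "score": round(r.score, 2),
        })
    return extracted

print(extract_scores([MetricResult("m1", 0.9), MetricResult("m2", None)]))
# [{'metric_name': 'm1', 'score': 0.9}]
```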

Labels: eval ([Component] This issue is related to evaluation)