devin-ai-integration (bot) commented Dec 10, 2025

fix: update metrics table printing to handle backend response format

Summary

Fixes an issue where the datapoint and aggregate metrics tables printed after running evaluate() showed empty values.

Root cause: The SDK expected metrics as dynamic top-level keys (e.g., {"accuracy": {...}}), but the backend returns them in a details array format per the OpenAPI spec:

{
  "metrics": {
    "aggregation_function": "average",
    "details": [{"metric_name": "accuracy", "aggregate": 0.85, ...}]
  }
}
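
To make the mismatch concrete, here is a minimal sketch using only the fields shown in the payload above (the elided fields are left out). The two helper functions are hypothetical illustrations, not the SDK's code: one mirrors the old top-level-key assumption, the other reads the details array.

```python
# Response shape from the PR description; elided fields are omitted.
response = {
    "metrics": {
        "aggregation_function": "average",
        "details": [{"metric_name": "accuracy", "aggregate": 0.85}],
    }
}


def lookup_top_level(metrics: dict, name: str):
    """Old assumption (hypothetical helper): each metric is a top-level key."""
    value = metrics.get(name)
    return value.get("aggregate") if isinstance(value, dict) else None


def lookup_in_details(metrics: dict, name: str):
    """New format (hypothetical helper): scan `details` for a matching `metric_name`."""
    for detail in metrics.get("details", []):
        if detail.get("metric_name") == name:
            return detail.get("aggregate")
    return None


# The top-level lookup finds nothing, which is why the table cells were empty.
old_value = lookup_top_level(response["metrics"], "accuracy")
# The details lookup finds the aggregate value.
new_value = lookup_in_details(response["metrics"], "accuracy")
```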

Changes:

  • Added typed Pydantic models: MetricDetail, DatapointResult, DatapointMetric, MetricDatapoints
  • Updated AggregatedMetrics to use details array instead of dynamic keys
  • Updated print_table() to use typed attributes instead of hasattr checks
  • Updated get_run_result() to parse datapoints into DatapointResult objects
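
As a rough illustration of the typed approach, the sketch below uses stdlib dataclasses as a dependency-free stand-in for the Pydantic models named above. Only metric_name, aggregate, aggregation_function, and details come from the PR description; from_response() and format_rows() are hypothetical names, and the real models likely carry more fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetricDetail:
    # Field names taken from the payload shown in the PR description.
    metric_name: str
    aggregate: Optional[float] = None


@dataclass
class AggregatedMetrics:
    aggregation_function: Optional[str] = None
    details: List[MetricDetail] = field(default_factory=list)

    @classmethod
    def from_response(cls, payload: dict) -> "AggregatedMetrics":
        """Hypothetical constructor: build typed details from the raw dict."""
        details = [
            MetricDetail(metric_name=d["metric_name"], aggregate=d.get("aggregate"))
            for d in payload.get("details", [])
        ]
        return cls(
            aggregation_function=payload.get("aggregation_function"),
            details=details,
        )


def format_rows(metrics: AggregatedMetrics) -> List[str]:
    """Typed attribute access replaces hasattr checks when rendering rows."""
    return [f"{d.metric_name}: {d.aggregate}" for d in metrics.details]


parsed = AggregatedMetrics.from_response(
    {
        "aggregation_function": "average",
        "details": [{"metric_name": "accuracy", "aggregate": 0.85}],
    }
)
rows = format_rows(parsed)
```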

Updates since last revision

  • Manually verified: User confirmed the metrics table now prints correctly with actual experiments
  • Documentation updates:
    • Added changelog entries to CHANGELOG.md and docs/changelog.rst
    • Updated docs/reference/experiments/models.rst with API reference for all new typed models
  • Backward compatibility: get_metric(), list_metrics(), and get_all_metrics() support BOTH the new details array format AND the legacy model_extra format
  • Test coverage: 19 unit tests for typed models + integration test against real API

All unit tests pass locally (64 tests in models + results files). Pre-commit checks pass.
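
The backward-compatibility behavior described above could look roughly like the following. This is a sketch under assumptions, not the SDK's implementation: dataclasses stand in for Pydantic models, the `extra` dict stands in for Pydantic's model_extra, and the legacy entry shape ({"aggregate": ...}) is assumed for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union


@dataclass
class MetricDetail:
    metric_name: str
    aggregate: Optional[float] = None


@dataclass
class AggregatedMetrics:
    details: List[MetricDetail] = field(default_factory=list)
    # Stand-in for model_extra: unknown top-level keys from legacy payloads.
    extra: Dict[str, Any] = field(default_factory=dict)

    def get_metric(self, name: str) -> Optional[Union[MetricDetail, Dict[str, Any]]]:
        # Prefer the new details array.
        for detail in self.details:
            if detail.metric_name == name:
                return detail
        # Fall back to the legacy dynamic-key format.
        legacy = self.extra.get(name)
        return legacy if isinstance(legacy, dict) else None

    def list_metrics(self) -> List[str]:
        names = [d.metric_name for d in self.details]
        # Only dict-valued extras are treated as legacy metrics (assumption).
        names += [
            key
            for key, value in self.extra.items()
            if isinstance(value, dict) and key not in names
        ]
        return names


metrics = AggregatedMetrics(
    details=[MetricDetail("accuracy", 0.85)],
    extra={"latency": {"aggregate": 1.2}, "aggregation_function": "average"},
)
```

Checking details first means a payload that carries both formats resolves to the typed object, which matches the direction of the change.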

Review & Testing Checklist for Human

  • Critical: Run an actual experiment with evaluators and verify the metrics table displays values correctly (manually verified by user)
  • Breaking change risk: get_metric() now returns Union[MetricDetail, Dict] instead of just Dict - check if any downstream code does strict type checking on the return value
  • Verify backward compatibility - any existing code using AggregatedMetrics with model_extra format should still work
  • Verify field names in typed models match actual backend response (especially DatapointMetric.name vs potential metric_name)

Recommended test plan:

  1. Run a simple experiment with at least one evaluator
  2. Verify the aggregated metrics table shows metric names and values
  3. Verify the datapoints table shows datapoint IDs, session IDs, and pass/fail status
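
For the breaking-change item in the checklist above, downstream code that previously assumed a Dict return can normalize both return shapes. The helper below is a hypothetical sketch: aggregate_value() is not part of the SDK, the MetricDetail dataclass is a stand-in for the Pydantic model, and the "aggregate" key in the legacy dict is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class MetricDetail:
    # Stand-in for the typed model; only fields from the PR description.
    metric_name: str
    aggregate: Optional[float] = None


def aggregate_value(
    metric: Optional[Union[MetricDetail, Dict[str, Any]]]
) -> Optional[float]:
    """Hypothetical helper: read the aggregate from either return shape."""
    if metric is None:
        return None
    if isinstance(metric, dict):
        # Legacy dict result; the "aggregate" key is assumed here.
        return metric.get("aggregate")
    # Typed result from the details array.
    return metric.aggregate


typed_result = aggregate_value(MetricDetail("accuracy", 0.85))
legacy_result = aggregate_value({"aggregate": 0.5})
```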

Notes

- Add typed models for MetricDetail, DatapointResult, DatapointMetric, MetricDatapoints
- Update AggregatedMetrics to use details array instead of dynamic keys
- Update print_table() to use typed MetricDetail objects
- Update get_run_result() to parse datapoints into DatapointResult objects
- Fix datapoint rendering to use typed attributes instead of hasattr checks

This fixes the issue where the metrics table was printing without values
because the SDK expected dynamic metric keys but the backend returns
metrics in a 'details' array format per the OpenAPI spec.

Co-Authored-By: Dhruv <dhruv@honeyhive.ai>
@devin-ai-integration
Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

@github-actions
Contributor

📚 Documentation Preview Built

Documentation preview is ready!

📦 Download Preview

Download documentation artifact

🔍 How to Review

  1. Download the artifact from the link above
  2. Extract the files
  3. Open index.html in your browser

✅ Validation Status

  • API validation: ✅ Passed
  • Build process: ✅ Successful
  • Import tests: ✅ All imports working

Preview generated for PR #160

- Add pylint disable=no-member comments for Pydantic field access
- Add unit tests for new typed models (MetricDetail, DatapointResult, etc.)
- Add unit tests for print_table() with details array format
- Add integration test to validate typed models against real API

Co-Authored-By: Dhruv <dhruv@honeyhive.ai>

Co-Authored-By: Dhruv <dhruv@honeyhive.ai>

… AggregatedMetrics

Co-Authored-By: Dhruv <dhruv@honeyhive.ai>

…dels

Co-Authored-By: Dhruv <dhruv@honeyhive.ai>

@dhruv-hhai merged commit 6d24cb5 into complete-refactor Dec 11, 2025
6 of 12 checks passed