[LEADS-348] evaluation latency#235
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (15)
WalkthroughThis PR refactors evaluation result timing from a single ChangesTiming Model Refactoring
🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
3eed19a to
14e437e
Compare
14e437e to
8eb287c
Compare
8eb287c to
e9c4b17
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lightspeed_evaluation/core/output/generator.py`:
- Line 817: The JSON summary currently adds "evaluation_latency":
r.evaluation_latency but removed the older "execution_time" field; restore
compatibility by including "execution_time" in the emitted dict (e.g., set
"execution_time": r.execution_time or, if r has no execution_time, set it to the
same value as r.evaluation_latency) alongside "evaluation_latency" in the output
constructed where the dict containing "evaluation_latency" is created so older
consumers still receive execution_time.
In `@src/lightspeed_evaluation/core/storage/sql_storage.py`:
- Line 60: The SQL schema and persistence code currently define/persist only
evaluation_latency, dropping the previously stored execution_time; add back
execution_time = Column(Float, nullable=True) alongside evaluation_latency in
the model definition and update the persistence logic that currently writes
evaluation_latency (e.g., in the function that inserts/updates evaluation
records—look for code referencing evaluation_latency around the save/insert
method) to also set execution_time from the same source value so existing
consumers continue receiving execution_time.
In `@tests/unit/pipeline/evaluation/test_errors.py`:
- Line 160: The test currently asserts results[0].evaluation_latency but misses
verifying the backward-compatible execution_time field on error results; update
the test in tests/unit/pipeline/evaluation/test_errors.py to also assert that
results[0].execution_time equals 0.0 alongside the existing evaluation_latency
assertion so error results explicitly include the legacy execution_time field.
In `@tests/unit/pipeline/evaluation/test_evaluator.py`:
- Around line 583-605: The test
test_execution_time_conversation_level_averages_agent_latency incorrectly
asserts result.agent_latency == 4.0; update the assertion to expect the average
agent latency of the conversation (2.0) since TurnData instances have
agent_latency 1.0 and 3.0 and EvaluationRequest.for_conversation should compute
the mean; change the assertion referencing result.agent_latency in this test to
assert result.agent_latency == 2.0 and keep the other
execution_time/evaluation_latency checks unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 56132854-62fe-4236-8dc5-3380d17e64a8
📒 Files selected for processing (15)
config/system.yamlsrc/lightspeed_evaluation/core/constants.pysrc/lightspeed_evaluation/core/models/data.pysrc/lightspeed_evaluation/core/output/generator.pysrc/lightspeed_evaluation/core/storage/sql_storage.pysrc/lightspeed_evaluation/pipeline/evaluation/evaluator.pytests/script/conftest.pytests/unit/core/models/test_data.pytests/unit/core/models/test_summary.pytests/unit/core/output/test_generator.pytests/unit/core/storage/test_sql_storage.pytests/unit/core/system/test_loader.pytests/unit/pipeline/evaluation/conftest.pytests/unit/pipeline/evaluation/test_errors.pytests/unit/pipeline/evaluation/test_evaluator.py
e9c4b17 to
457af2f
Compare
Description
Depends on #230
Further changes for LEADS-348:
result.execution_timewas renamed toresult.evaluation_latencywhich represents better that it is measuring the evaluationresult.evaluation_latencywas removed for consistency, since other timing metrics are not being roundedresult.execution_timewas reintroduced (for backward compatibility), but currently measures the whole api execution and evaluation per turntest_evaluation.pywas refactored - redundant mocking was moved to pytest fixture (~500 lines reduced)Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
Release Notes
New Features
Tests