feat: implement max_response_time_ms behavior rule #201
Merged
Pull request overview
Implements the max_response_time_ms behavior constraint so runs can be evaluated against a maximum allowed duration, and surfaces the results in behavior metrics and scoring.
Changes:
- Add `max_response_time_ms` to `BehaviorRules` and to the task JSON schema.
- Extend `BehaviorMetrics` to record actual duration, pass/fail for the response-time constraint, and include it in `AllConstraintsPassed()`.
- Update behavior efficiency scoring to 5 constraint categories and expand unit tests accordingly.
Show a summary per file
| File | Description |
|---|---|
| `schemas/task.schema.json` | Adds `max_response_time_ms` to the `behaviorRules` schema. |
| `internal/models/testcase.go` | Adds `MaxResponseTimeMs` to `BehaviorRules` (and also introduces an `output_contains_any` field in `TestExpectation`). |
| `internal/metrics/behavior.go` | Implements response-time compliance check and updates efficiency scoring weights. |
| `internal/metrics/behavior_test.go` | Adds response-time test cases and updates expected efficiency scores for 5-category weighting. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 1
Re: Copilot review on |
Add MaxResponseTimeMs field to BehaviorRules and implement timing compliance check in ComputeBehaviorMetrics.

Changes:
- Add MaxResponseTimeMs int64 to BehaviorRules (testcase.go)
- Add MaxResponseTimeMs, ActualResponseTimeMs, MaxResponseTimeMsPassed to BehaviorMetrics (behavior.go)
- Check run.DurationMs <= rules.MaxResponseTimeMs when set
- Include MaxResponseTimeMsPassed in AllConstraintsPassed()
- Update computeEfficiency from 4×0.25 to 5×0.20 categories
- Add max_response_time_ms to JSON schema (task.schema.json)
- Add 4 new test cases: under/at/over limit, combined failure
- Update existing test expected efficiency scores

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Document the new behavior rule field in both the Writing Eval Specs guide and the YAML Schema reference page. Includes field table, usage examples, and description of efficiency scoring.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bacd80e to 4b99ca8
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Implements the `max_response_time_ms` behavior rule as requested in #136.

Working as Linus (Backend Developer).
Changes
`internal/models/testcase.go`
- Add `MaxResponseTimeMs int64` field to `BehaviorRules` struct

`internal/metrics/behavior.go`
- Add `MaxResponseTimeMs`, `ActualResponseTimeMs`, `MaxResponseTimeMsPassed` fields to `BehaviorMetrics`
- `ComputeBehaviorMetrics`: compares `run.DurationMs` against `rules.MaxResponseTimeMs` when set (>0)
- Include `MaxResponseTimeMsPassed` in `AllConstraintsPassed()`
- Update `computeEfficiency` from 4 categories × 0.25 to 5 categories × 0.20

`internal/metrics/behavior_test.go`
- Add `wantResponseTimePassed` assertion to test struct

`schemas/task.schema.json`
- Add `max_response_time_ms` (integer, minimum: 1) to `behaviorRules` definition

Usage
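To make the rule's semantics concrete, here is a minimal Go sketch of the comparison described for `ComputeBehaviorMetrics`. It is not the code from this PR: the `BehaviorRules` type below carries only the one field, `responseTimePassed` is a hypothetical helper name, and treating an unset limit (0) as a pass is an assumption inferred from the "when set (>0)" wording.

```go
package main

import "fmt"

// BehaviorRules here mirrors only the field discussed in this PR;
// the real struct in internal/models/testcase.go has other fields.
type BehaviorRules struct {
	MaxResponseTimeMs int64 // 0 is assumed to mean "no limit configured"
}

// responseTimePassed is a hypothetical helper, not a function from the PR.
// It reports whether a run's duration satisfies the configured limit.
func responseTimePassed(rules BehaviorRules, durationMs int64) bool {
	if rules.MaxResponseTimeMs <= 0 {
		// Assumption: with no limit set, the constraint is not enforced.
		return true
	}
	// A run that lands exactly on the limit still passes (<=).
	return durationMs <= rules.MaxResponseTimeMs
}

func main() {
	rules := BehaviorRules{MaxResponseTimeMs: 5000}
	for _, d := range []int64{4999, 5000, 5001} {
		fmt.Printf("duration=%dms passed=%v\n", d, responseTimePassed(rules, d))
	}
}
```

The three durations correspond to the under/at/over-limit cases named in the commit message; only the over-limit run fails.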
Breaking Change Note
Efficiency scoring changed from 4×0.25 to 5×0.20 per constraint category. Anyone relying on exact efficiency score values will see different numbers (e.g., 1 failed constraint: 0.75 → 0.80).
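The arithmetic behind the changed numbers can be sketched in Go as follows. This assumes each constraint category carries equal weight and that the score is simply the fraction of categories that passed; `efficiency` is a hypothetical stand-in, and the actual `computeEfficiency` may be structured differently.

```go
package main

import "fmt"

// efficiency is a hypothetical stand-in for computeEfficiency.
// Assumption: every category is weighted equally, so the score is
// the share of categories that passed.
func efficiency(passed, total int) float64 {
	if total <= 0 {
		return 0
	}
	return float64(passed) / float64(total)
}

func main() {
	// One failed constraint under the old 4-category weighting (4 x 0.25).
	fmt.Printf("before: %.2f\n", efficiency(3, 4)) // prints "before: 0.75"
	// One failed constraint under the new 5-category weighting (5 x 0.20).
	fmt.Printf("after:  %.2f\n", efficiency(4, 5)) // prints "after:  0.80"
}
```

The same single failure now costs 0.20 instead of 0.25, which is why previously recorded efficiency values shift.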
Closes #136