feat: implement max_response_time_ms behavior rule #201

Merged: spboyer merged 3 commits into main from squad/136-max-response-time on Apr 21, 2026
Conversation

spboyer (Member) commented Apr 21, 2026

Summary

Implements max_response_time_ms behavior rule as requested in #136.

Working as Linus (Backend Developer).

Changes

internal/models/testcase.go

  • Added MaxResponseTimeMs int64 field to BehaviorRules struct
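
For illustration, a minimal sketch of how the new field might sit on the struct and decode from a spec. Only the field name MaxResponseTimeMs and its int64 type come from this PR; the MaxToolCalls field, the struct tags, and the parseRules helper are assumptions, and JSON stands in for the project's YAML so the example needs only the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BehaviorRules is a trimmed-down stand-in for the struct in
// internal/models/testcase.go. Only MaxResponseTimeMs (int64) is confirmed
// by the PR; MaxToolCalls and both struct tags are assumptions.
type BehaviorRules struct {
	MaxToolCalls      int   `json:"max_tool_calls,omitempty"`
	MaxResponseTimeMs int64 `json:"max_response_time_ms,omitempty"`
}

// parseRules is a hypothetical helper that decodes a JSON rules document.
func parseRules(raw string) (BehaviorRules, error) {
	var rules BehaviorRules
	err := json.Unmarshal([]byte(raw), &rules)
	return rules, err
}

func main() {
	rules, err := parseRules(`{"max_response_time_ms": 5000, "max_tool_calls": 10}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(rules.MaxResponseTimeMs) // prints 5000

	// An omitted field decodes to the zero value, which the PR treats as "not set".
	unset, _ := parseRules(`{}`)
	fmt.Println(unset.MaxResponseTimeMs) // prints 0
}
```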

internal/metrics/behavior.go

  • Added MaxResponseTimeMs, ActualResponseTimeMs, MaxResponseTimeMsPassed fields to BehaviorMetrics
  • Implemented timing compliance check in ComputeBehaviorMetrics: when rules.MaxResponseTimeMs is set (> 0), the run passes if run.DurationMs <= rules.MaxResponseTimeMs
  • Included MaxResponseTimeMsPassed in AllConstraintsPassed()
  • Updated computeEfficiency from 4 categories × 0.25 to 5 categories × 0.20
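
The timing check described in these bullets can be sketched roughly as follows. This is not the PR's actual code: the Run type, the int64 type of DurationMs, the checkResponseTime helper, and the choice to treat an unset rule as passed are all assumptions made for illustration. The three metric field names and the `<=` comparison come from the PR.

```go
package main

import "fmt"

// Hypothetical, trimmed-down versions of the types touched by this PR.
type BehaviorRules struct {
	MaxResponseTimeMs int64 // 0 means "not set"
}

type Run struct {
	DurationMs int64 // assumed to be int64
}

type BehaviorMetrics struct {
	MaxResponseTimeMs       int64
	ActualResponseTimeMs    int64
	MaxResponseTimeMsPassed bool
}

// checkResponseTime is a hypothetical helper isolating the timing part of
// ComputeBehaviorMetrics. The rule is enforced only when it is set (> 0).
func checkResponseTime(run Run, rules BehaviorRules) BehaviorMetrics {
	m := BehaviorMetrics{
		MaxResponseTimeMs:       rules.MaxResponseTimeMs,
		ActualResponseTimeMs:    run.DurationMs,
		MaxResponseTimeMsPassed: true, // assumption: an unset rule counts as passed
	}
	if rules.MaxResponseTimeMs > 0 {
		m.MaxResponseTimeMsPassed = run.DurationMs <= rules.MaxResponseTimeMs
	}
	return m
}

func main() {
	rules := BehaviorRules{MaxResponseTimeMs: 5000}
	fmt.Println(checkResponseTime(Run{DurationMs: 5000}, rules).MaxResponseTimeMsPassed) // prints true
	fmt.Println(checkResponseTime(Run{DurationMs: 5001}, rules).MaxResponseTimeMsPassed) // prints false
}
```

Using `<=` means a run that finishes exactly at the limit still passes, which matches the "at limit" test case listed for behavior_test.go.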

internal/metrics/behavior_test.go

  • Added 4 new test cases: under limit, at limit, exceeds limit, combined failure
  • Updated all existing test expected efficiency scores for new 5-category weighting
  • Added wantResponseTimePassed assertion to test struct
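
A table-driven layout for the four new cases might look like the sketch below. The case names and the wantResponseTimePassed field come from the PR; the concrete durations and limits, the tool-call constraint used for the combined failure, and the helper functions are assumptions. It is written as a plain program rather than a `_test.go` file so it can run on its own.

```go
package main

import "fmt"

// responseTimePassed mirrors the rule from the PR: a limit of 0 means
// "not set"; otherwise the run passes at or under the limit.
func responseTimePassed(durationMs, limitMs int64) bool {
	return limitMs <= 0 || durationMs <= limitMs
}

// toolCallsPassed is a hypothetical second constraint, used only to build
// the "combined failure" case.
func toolCallsPassed(calls, limit int) bool {
	return limit <= 0 || calls <= limit
}

type testCase struct {
	name                    string
	durationMs, limitMs     int64
	toolCalls, maxToolCalls int
	wantResponseTimePassed  bool
	wantAllPassed           bool
}

// cases lists the four scenarios named in the PR with made-up values.
func cases() []testCase {
	return []testCase{
		{"under limit", 3000, 5000, 2, 10, true, true},
		{"at limit", 5000, 5000, 2, 10, true, true},
		{"exceeds limit", 7000, 5000, 2, 10, false, false},
		{"combined failure", 7000, 5000, 12, 10, false, false},
	}
}

// failures returns the names of cases whose results differ from expectations.
func failures() []string {
	var out []string
	for _, tc := range cases() {
		gotTime := responseTimePassed(tc.durationMs, tc.limitMs)
		gotAll := gotTime && toolCallsPassed(tc.toolCalls, tc.maxToolCalls)
		if gotTime != tc.wantResponseTimePassed || gotAll != tc.wantAllPassed {
			out = append(out, tc.name)
		}
	}
	return out
}

func main() {
	if f := failures(); len(f) > 0 {
		fmt.Println("FAIL:", f)
		return
	}
	fmt.Println("ok") // prints ok
}
```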

schemas/task.schema.json

  • Added max_response_time_ms (integer, minimum: 1) to behaviorRules definition

Usage

expected:
  behavior:
    max_response_time_ms: 5000   # must complete within 5 seconds
    max_tool_calls: 10

Breaking Change Note

Efficiency scoring changed from 4×0.25 to 5×0.20 per constraint category. Anyone relying on exact efficiency score values will see different numbers (e.g., 1 failed constraint: 0.75 → 0.80).
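
To make the arithmetic concrete, here is a minimal sketch that assumes the score is the equally weighted fraction of constraint categories that passed. The efficiency function is hypothetical and is not the repository's computeEfficiency; the zero result for an empty input is also an assumption.

```go
package main

import "fmt"

// efficiency returns the fraction of constraint categories that passed,
// giving every category the same weight (0.25 for 4, 0.20 for 5).
func efficiency(passed []bool) float64 {
	if len(passed) == 0 {
		return 0 // assumption: no categories yields a zero score
	}
	n := 0
	for _, ok := range passed {
		if ok {
			n++
		}
	}
	return float64(n) / float64(len(passed))
}

func main() {
	// One failed constraint under the old 4-category weighting.
	fmt.Printf("%.2f\n", efficiency([]bool{true, true, true, false})) // prints 0.75
	// The same single failure under the new 5-category weighting.
	fmt.Printf("%.2f\n", efficiency([]bool{true, true, true, true, false})) // prints 0.80
}
```

Under this assumption a single failure costs 0.20 instead of 0.25, so any threshold tuned against the old scores (for example "efficiency >= 0.8") may now admit runs it used to reject.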

Closes #136

Copilot AI review requested due to automatic review settings April 21, 2026 17:02
@github-actions github-actions Bot enabled auto-merge (squash) April 21, 2026 17:02
Copilot AI (Contributor) left a comment

Pull request overview

Implements the max_response_time_ms behavior constraint so runs can be evaluated against a maximum allowed duration, and surfaces the results in behavior metrics and scoring.

Changes:

  • Add max_response_time_ms to BehaviorRules and to the task JSON schema.
  • Extend BehaviorMetrics to record actual duration, pass/fail for the response-time constraint, and include it in AllConstraintsPassed().
  • Update behavior efficiency scoring to 5 constraint categories and expand unit tests accordingly.

Summary per file:

  • schemas/task.schema.json: Adds max_response_time_ms to the behaviorRules schema.
  • internal/models/testcase.go: Adds MaxResponseTimeMs to BehaviorRules (and also introduces an output_contains_any field in TestExpectation).
  • internal/metrics/behavior.go: Implements response-time compliance check and updates efficiency scoring weights.
  • internal/metrics/behavior_test.go: Adds response-time test cases and updates expected efficiency scores for 5-category weighting.

Copilot's findings

  • Files reviewed: 4/4 changed files
  • Comments generated: 1

Comment thread internal/models/testcase.go
spboyer (Member, Author) commented Apr 21, 2026

Re: Copilot review on MayInclude/output_contains_any: This field is pre-existing and was not introduced by this PR. Our change only adds MaxResponseTimeMs to BehaviorRules. The schema alignment for output_contains_any can be tracked separately.

Copilot AI review requested due to automatic review settings April 21, 2026 17:09
Copilot AI (Contributor) left a comment

Copilot's findings

  • Files reviewed: 6/6 changed files
  • Comments generated: 6

Comment thread internal/metrics/behavior.go
Comment thread internal/metrics/behavior_test.go
Comment thread site/src/content/docs/guides/eval-yaml.mdx
Comment thread site/src/content/docs/guides/eval-yaml.mdx
Comment thread site/src/content/docs/reference/schema.mdx
Comment thread site/src/content/docs/reference/schema.mdx
Copilot AI added 2 commits April 21, 2026 14:21
Add MaxResponseTimeMs field to BehaviorRules and implement timing
compliance check in ComputeBehaviorMetrics.

Changes:
- Add MaxResponseTimeMs int64 to BehaviorRules (testcase.go)
- Add MaxResponseTimeMs, ActualResponseTimeMs, MaxResponseTimeMsPassed
  to BehaviorMetrics (behavior.go)
- Check run.DurationMs <= rules.MaxResponseTimeMs when set
- Include MaxResponseTimeMsPassed in AllConstraintsPassed()
- Update computeEfficiency from 4×0.25 to 5×0.20 categories
- Add max_response_time_ms to JSON schema (task.schema.json)
- Add 4 new test cases: under/at/over limit, combined failure
- Update existing test expected efficiency scores

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Document the new behavior rule field in both the Writing Eval Specs
guide and the YAML Schema reference page. Includes field table,
usage examples, and description of efficiency scoring.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the squad/136-max-response-time branch from bacd80e to 4b99ca8 April 21, 2026 18:21
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer merged commit 6fb2f65 into main Apr 21, 2026
5 checks passed
@spboyer spboyer deleted the squad/136-max-response-time branch April 21, 2026 18:26
Development

Successfully merging this pull request may close these issues.

Add implementation of currently non-existent BehaviorRules max_response_time_ms field.

3 participants