Autoloop currently assumes higher is better everywhere — in the best_metric comparison in the scheduler, in the "metric improved" check in the iteration loop, in the iteration-history delta formatting, and implicitly in the halting-condition rule for programs with a target-metric. Programs whose natural fitness is lower is better (minimize ratio / error / latency / cost / fitness score) currently have to invert their metric in the Evaluation block, which makes the value hard to read in iteration comments and inverts the semantics of target-metric in unintuitive ways.
Add first-class support for metric_direction: lower in the program frontmatter, and thread it through everywhere a metric comparison or delta is computed.
Motivation
OpenEvolve programs (proposed in sibling issue "Add a strategy system; ship OpenEvolve as the first specialized iteration playbook", #47) typically minimize a fitness ratio (candidate / reference; lower means our candidate beats the reference). Wanting best_metric = min(history) is natural; having to negate the value and remember the sign adds cognitive overhead.
Latency / cost / error / bundle-size optimization programs are all lower-is-better. Users shouldn't have to invert their metric to use autoloop.
target-metric becomes confusing under inversion: a program targeting "reach ratio ≤ 0.9" would currently have to encode the target as -0.9 or 1/0.9, with cryptic comparison logic.
Today every user who wants lower-is-better has to either invert in their Evaluation script (ugly) or read the iteration comments "backwards" (confusing).
Proposed changes
1. Frontmatter field
---
schedule: every 6h
metric_direction: lower   # defaults to "higher" if omitted
target-metric: 0.9        # interpreted as "program is complete when best_metric ≤ 0.9"
---
Values: higher (default, current behaviour) or lower. Reject anything else at frontmatter-parse time.
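The validation rule above could be sketched roughly as follows. This is only an illustration: `parse_metric_direction` is a hypothetical helper name, and it assumes the frontmatter has already been parsed into a plain dict, which may not match how `parse_program_frontmatter` actually works.

```python
VALID_DIRECTIONS = ("higher", "lower")


def parse_metric_direction(frontmatter: dict) -> str:
    """Return the validated direction; hypothetical helper, not existing autoloop code."""
    raw = frontmatter.get("metric_direction")
    if raw is None:
        # Field omitted: keep today's behaviour.
        return "higher"
    if raw not in VALID_DIRECTIONS:
        # Reject anything else at parse time rather than guessing.
        raise ValueError(
            f"metric_direction must be one of {VALID_DIRECTIONS}, got {raw!r}"
        )
    return raw
```

The comparison is deliberately strict (no case folding), so a typo such as `Lower` fails loudly instead of silently falling back to `higher`.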
2. Scheduler (workflows/scripts/autoloop_scheduler.py)
parse_program_frontmatter already parses schedule and target-metric; extend it to return metric_direction.
Plumb through parse_program_frontmatter's callers, into all_programs[name] + /tmp/gh-aw/autoloop.json.
Emit a new field in autoloop.json:
"selected_metric_direction": "lower"
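As a rough sketch of the plumbing, the scheduler might add the field when it builds the JSON payload. Only `selected_metric_direction` comes from this proposal; the function name, the `selected` key, and the shape of the program dict are assumptions made for illustration.

```python
import json


def build_selection_payload(name: str, program: dict) -> dict:
    """Build the part of autoloop.json that describes the selected program.

    Hypothetical helper: the real payload has other fields not shown here.
    """
    return {
        "selected": name,  # assumed key name, for illustration only
        "selected_metric_direction": program.get("metric_direction", "higher"),
    }


payload = build_selection_payload("ratio-opt", {"metric_direction": "lower"})
serialized = json.dumps(payload, indent=2)
```

Defaulting with `.get(..., "higher")` at this point means programs parsed before the change still produce a well-formed field.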
3. Agent prompt (workflows/autoloop.md)
Four places need direction-aware logic:
a. The "metric improved" check in Step 5 (Accept or Reject)
-**If the metric improved** (or this is the first run establishing a baseline):
+**If the metric improved** (or this is the first run establishing a baseline).
+Improvement is direction-aware:
+- If `metric_direction` is `higher` (default): improved = `new > best_metric`.
+- If `metric_direction` is `lower`: improved = `new < best_metric`.
+Read `selected_metric_direction` from `/tmp/gh-aw/autoloop.json` to know which.
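The prompt change above is prose for the agent, but the rule it describes is small enough to state as code. A minimal sketch, assuming a hypothetical `is_improvement` helper and that a missing `best_metric` is represented as `None`:

```python
from typing import Optional


def is_improvement(new: float, best: Optional[float], direction: str = "higher") -> bool:
    """Direction-aware "metric improved" check (illustrative, not existing code)."""
    if best is None:
        # First run: any value establishes the baseline.
        return True
    if direction == "higher":
        return new > best
    if direction == "lower":
        return new < best
    raise ValueError(f"unknown metric_direction: {direction!r}")
```

Both comparisons are strict, matching `new > best_metric` and `new < best_metric` in the prompt text, so an unchanged metric does not count as an improvement.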
b. The best_metric update in the state file
Currently "set best_metric" assumes replace-if-higher. Change it to "set best_metric to the new value", since improvement was already validated above. Separately, make the state-file reader in the pre-step direction-aware: the scheduler comparison that picks the most-overdue program doesn't depend on metric direction, but the displayed delta does.
c. The Iteration History delta formatting
-Prepend an entry to **📊 Iteration History** (newest first) with status ✅, metric, PR link, the fix-attempt count if `> 0`, and a one-line summary…
+Prepend an entry to **📊 Iteration History** (newest first) with status ✅, metric, **signed delta** (`+N` for `higher`-direction programs, `-N` for `lower`-direction programs; both are "improvement" arrows), PR link, the fix-attempt count if `> 0`, and a one-line summary…
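One way to read the signed-delta rule: the delta is simply `new - previous_best`, so an accepted iteration is positive for `higher` programs and negative for `lower` programs without any special casing. A small sketch, with `format_delta` as a hypothetical name and the `"baseline"` label as an assumption:

```python
from typing import Optional


def format_delta(new: float, previous_best: Optional[float]) -> str:
    """Format the change against the previous best with an explicit sign."""
    if previous_best is None:
        return "baseline"  # assumed label for the first recorded run
    # "+g" forces a leading sign and trims floating-point noise.
    return f"{new - previous_best:+g}"


print(format_delta(1.3, 1.5))  # lower-is-better improvement: -0.2
print(format_delta(1.7, 1.5))  # higher-is-better improvement: +0.2
```

Whether a given sign means "better" still depends on the program's direction, which is why the history entry should be read alongside the Metric Direction value.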
d. Halting condition
-If the program has a `target-metric` in its frontmatter and the new `best_metric` meets or surpasses the target, mark the program as completed.
+If the program has a `target-metric` in its frontmatter:
+- `metric_direction: higher`: completed when `best_metric >= target-metric`.
+- `metric_direction: lower`: completed when `best_metric <= target-metric`.
+Mark the program as completed (set `Completed: true`, remove the `autoloop-program` label, add `autoloop-completed`).
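The halting rule can be expressed the same way. This is a sketch under the assumption that a missing target or missing best metric is `None`; `target_reached` is a hypothetical name.

```python
from typing import Optional


def target_reached(
    best_metric: Optional[float],
    target_metric: Optional[float],
    direction: str = "higher",
) -> bool:
    """Return True when best_metric meets or passes target-metric in the given direction."""
    if best_metric is None or target_metric is None:
        # No target configured, or no accepted run yet: never halt.
        return False
    if direction == "higher":
        return best_metric >= target_metric
    if direction == "lower":
        return best_metric <= target_metric
    raise ValueError(f"unknown metric_direction: {direction!r}")
```

Unlike the improvement check, these comparisons are inclusive, so a program that lands exactly on its target completes.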
4. Machine State table
Add a row:
| Metric Direction | lower |
For backward compatibility, if the row is absent, treat the program as higher. On the first iteration after this change lands for an existing program, the agent adds the row with the value from frontmatter (or higher if absent).
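The backward-compatibility rule amounts to a lookup with a default. The sketch below assumes the Machine State section is a two-column markdown table of `| Key | Value |` rows; that layout and both function names are assumptions for illustration.

```python
def read_machine_state(markdown: str) -> dict:
    """Collect two-column table rows into a dict (assumed table layout)."""
    rows = {}
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            continue
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        # Skip malformed rows and the |---|---| separator row.
        if len(cells) != 2 or not cells[0] or set(cells[0]) <= set("-: "):
            continue
        rows[cells[0]] = cells[1]
    return rows


def metric_direction_from_state(markdown: str) -> str:
    """Absent row means the program predates this change: treat it as higher."""
    return read_machine_state(markdown).get("Metric Direction", "higher")
```

Because the default lives in the reader, existing state files need no migration; the row only has to be written on the next iteration.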
5. Tests
In tests/, add fixtures for metric_direction: lower:
Given best_metric = 1.5 and new metric 1.3 with metric_direction: lower, improvement returns true.
Given best_metric = 1.5 and new metric 1.7 with metric_direction: lower, improvement returns false.
Given target-metric: 0.9 and best_metric: 0.85 with metric_direction: lower, halting condition returns true.
Default (direction omitted) behaves as higher exactly as today — no regression for existing programs.
Backward compatibility
Programs without the field default to higher — no change in behaviour.
Machine State row is optional; absence is treated as higher.
No migration needed for existing state files.
Acceptance
A new program with metric_direction: lower in frontmatter has its best_metric ratchet downward, and iteration comments show -<delta> as improvement.
A program with target-metric: 0.9 and metric_direction: lower completes when best_metric reaches 0.9 or below.
All existing programs (implicit higher) keep working identically.
Tests for both directions land in tests/ and pass.
Related
lower direction for programs whose metric is "failing tests count" or "lint violations count"; should work out of the box once this lands.