autoloop directive (#3) rewards line-count inflation, not working software — concrete fixes #72

## Summary

After ~140 iterations of the autoloop program defined in #3, the Go CLI entry point is still a 17-line stub that only handles `version`, while `migration-status.json` claims **1084.2%** of the Python code is migrated. This isn't an agent bug: the directive in #3 **rewards the wrong thing**, and the agent is correctly maximising it.

This issue proposes concrete edits to #3 so iterations produce **working software**, not catalogued packages.

## Diagnosis: why the loop produces no real work

### 1. The evaluation metric is self-reported and unbounded

```python
# from issue #3
migrated = data.get('migrated_python_lines', 0)
pct = round((migrated / total) * 100, 2)
# Higher is better.
```

`migrated_python_lines` is a field the agent itself writes into `migration-status.json`. The file now holds **2,321 entries for only 660 unique Go packages**: many packages are logged 6–12 times across iterations (e.g. `internal/cache/gitcache` × 10, `internal/adapters/packagemanager` × 12), each time crediting the Python line count again. Sum: 883,171 lines "migrated" against an 87,626-line source, more than ten times what exists to migrate. That is how the reported percentage climbs past 1000%.

The agent is doing exactly what the directive asks: maximise a number. The number is the wrong number.
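
To make the double-counting concrete, here is a minimal sketch of how repeated entries inflate the percentage, and how crediting each package once and capping the result would bound it. The entry shape (`go_package`, `python_lines`) and the sample numbers are assumptions for illustration, not the real schema or data in `migration-status.json`.

```python
def naive_pct(entries, total):
    """Sum every entry, the way a self-reported running counter behaves."""
    migrated = sum(e["python_lines"] for e in entries)
    return round(migrated / total * 100, 2)


def deduped_pct(entries, total):
    """Credit each Go package at most once and cap the result at 100%."""
    by_package = {}
    for e in entries:
        pkg = e.get("go_package")
        if pkg is None:
            continue  # an entry with no package cannot be checked against the tree
        by_package[pkg] = e["python_lines"]  # a repeat overwrites instead of adding
    migrated = min(sum(by_package.values()), total)
    return round(migrated / total * 100, 2)


# Hypothetical data: one 300-line module logged three times, another logged once.
entries = [
    {"go_package": "internal/cache/gitcache", "python_lines": 300},
    {"go_package": "internal/cache/gitcache", "python_lines": 300},
    {"go_package": "internal/cache/gitcache", "python_lines": 300},
    {"go_package": "internal/utils", "python_lines": 100},
]
print(naive_pct(entries, total=500))    # 200.0
print(deduped_pct(entries, total=500))  # 80.0
```

Deduplication only bounds the number. The input is still written by the agent, so it does not make the metric trustworthy on its own.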

### 2. Step 5 ("Validate") is unenforced

The directive says:

> The Go binary must be callable from the existing CLI entry point or replace it.

But there is no check for this. `cmd/apm/main.go` has been a 17-line stub since the start of the migration. ~140 iterations later it is still:

```go
func main() {
    if len(os.Args) > 1 && os.Args[1] == "version" {
        fmt.Println(version.GetVersion())
        return
    }
    fmt.Fprintln(os.Stderr, "apm-go: stub binary (migration in progress)")
    os.Exit(1)
}
```

No iteration has ever wired a migrated package into the CLI. The directive's success criterion is a sentence in prose; the metric is a number in a JSON file. The agent optimises the metric.

### 3. "Add extra test files" is a free LOC inflator

Recent iterations (`Iteration 131..137`) are all variants of "extra test suites for N thin Go packages", targeting packages that are already migrated and already tested. These iterations add no functionality but bump test counts and trigger `migration-status.json` updates. They look like progress; they aren't.

### 4. No vertical-slice requirement

The directive says to start with leaf modules (`utils/`, `constants.py`) and "work inward." There is no requirement to ever **finish a slice end-to-end**, e.g. "the `apm install` subcommand works end-to-end in Go." So the agent stays at the leaves indefinitely, where the easy wins are.

## Proposed fixes to #3

Each of these is a small edit to the program body that closes a specific loophole above:

### Fix A — replace the self-reported metric with an observable one

Replace the current metric with something the agent **cannot inflate by editing JSON**:

```bash
# Number of CLI subcommands that work end-to-end in the Go binary,
# verified by running them and comparing exit code + stdout against Python.
go build -o /tmp/apm-go ./cmd/apm
python3 scripts/cli_parity_check.py /tmp/apm-go | jq .working_subcommands_pct
```

Here `cli_parity_check.py` enumerates the Python subcommands and runs each against both the Go binary and the Python CLI with a fixture, scoring 1.0 only when the Go exit code and stdout match Python's. **Higher is better, and the agent cannot raise it by editing a JSON file.**
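
As a rough illustration of the scoring logic, the core of such a script could look like the sketch below. `cli_parity_check.py` does not exist yet; the function names, the report fields other than `working_subcommands_pct`, and the injectable `runner` parameter are all assumptions made for this example.

```python
import subprocess


def run(cmd):
    """Run a command and return (exit code, stdout)."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    return proc.returncode, proc.stdout


def parity_report(go_cmd, py_cmd, subcommands, runner=run):
    """Count a subcommand as working only if exit code and stdout both match."""
    working = []
    for args in subcommands:
        if runner(go_cmd + args) == runner(py_cmd + args):
            working.append(" ".join(args))
    pct = round(len(working) / len(subcommands) * 100, 2) if subcommands else 0.0
    return {"working": working, "working_subcommands_pct": pct}
```

A wrapper script would call something like `parity_report(["/tmp/apm-go"], [...], subcommands)` and print the result with `json.dumps`; how the Python CLI is invoked and how its subcommands are enumerated is left open here. One caveat with this design: a Go stub and a Python command that both fail with empty stdout would compare equal, so the fixtures should exercise invocations that succeed in Python.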

### Fix B — make Step 5 a gate, not a guideline

Change *"The Go binary must be callable from the existing CLI entry point or replace it"* to:

> **Each iteration must end with at least one new Go subcommand exposed via `cmd/apm/main.go` and exercised by an integration test that runs the binary.** Iterations that add only `internal/` packages without wiring them into the CLI do not count and must be rejected by the evaluator.
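
One way the evaluator could enforce this gate is to compare the set of working subcommands before and after an iteration. The sketch below assumes the evaluator has those two lists available (for example from a parity check like the one in Fix A); the function name, the return shape, and the extra "no regressions" condition are assumptions, not part of #3.

```python
def gate(previous_working, current_working):
    """Accept an iteration only if a new subcommand works and none regressed."""
    previous, current = set(previous_working), set(current_working)
    newly_working = sorted(current - previous)
    regressed = sorted(previous - current)
    return {
        "accepted": bool(newly_working) and not regressed,
        "newly_working": newly_working,
        "regressed": regressed,
    }
```

Under this rule an iteration that only adds `internal/` packages leaves both sets equal and is rejected, because nothing new became callable from the binary.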

### Fix C — forbid "extra tests for already-migrated packages"

Add to the "Do NOT" rules:

> Do NOT add additional test files to a package that already has `>= 80%` Go coverage. Coverage-padding iterations do not advance the migration.
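
A check for this rule could be built on the per-package lines that `go test -cover ./...` prints. The sketch below parses that output and flags new test files landing in packages already at or above the threshold. The helper names and the `example.com/apm` module path are placeholders, since the real module path is not stated in this issue.

```python
import re

# Matches lines such as: "ok  <pkg>  0.012s  coverage: 91.3% of statements"
COVER_LINE = re.compile(r"^ok\s+(\S+)\s+\S+\s+coverage: ([\d.]+)% of statements")


def saturated_packages(cover_output, threshold=80.0):
    """Map package path to coverage for packages at or above the threshold."""
    result = {}
    for line in cover_output.splitlines():
        m = COVER_LINE.match(line)
        if m and float(m.group(2)) >= threshold:
            result[m.group(1)] = float(m.group(2))
    return result


def padding_test_files(changed_paths, saturated, module="example.com/apm"):
    """Return changed *_test.go paths that sit in already-covered packages."""
    flagged = []
    for path in changed_paths:
        pkg = module + "/" + path.rsplit("/", 1)[0]
        if path.endswith("_test.go") and pkg in saturated:
            flagged.append(path)
    return flagged
```

An evaluator could run the coverage command on the previous commit, pass the iteration's changed paths to `padding_test_files`, and reject the iteration if the result is non-empty and nothing else of substance changed.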

### Fix D — require a vertical slice cadence

Add to the loop:

> Every 5th iteration must migrate **a complete user-facing command** (e.g. `apm install`, `apm drift check`, `apm uninstall`) end-to-end, wiring all of its internal dependencies through to `cmd/apm/main.go`, with a passing integration test invoking the Go binary.

### Fix E — repair migration-status.json or stop trusting it

The file currently overcounts by 11x and has 87 entries with a `null` `go_package`. Either:

- Have the autoloop **regenerate it from source** each iteration (walk `internal/` and `src/apm_cli/`, compute line counts, no agent self-reporting), or
- Stop using it as the evaluation metric (see Fix A).
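
For the first option, regeneration could be as simple as walking the two trees and counting lines, as in the sketch below. The output field names (`python_lines`, `go_lines`) are assumptions and do not match the current schema of `migration-status.json`.

```python
from pathlib import Path


def count_lines(root, suffix):
    """Total line count of all files under root whose name ends with suffix."""
    total = 0
    for path in Path(root).rglob(f"*{suffix}"):
        if path.is_file():
            with path.open(encoding="utf-8", errors="replace") as fh:
                total += sum(1 for _ in fh)
    return total


def regenerate_status(repo="."):
    """Rebuild the status from the working tree, with no agent-written input."""
    repo = Path(repo)
    return {
        "python_lines": count_lines(repo / "src" / "apm_cli", ".py"),
        "go_lines": count_lines(repo / "internal", ".go"),
    }
```

This deliberately reports sizes only and does not derive a "percent migrated" figure, since line counts alone cannot show that the Go code does what the Python code did.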

## Why this matters

#3 has run for 298 comments and ~140 iterations. The Go module now exceeds the Python module in line count (91k vs 87k) but **provides zero user-visible functionality**. Without these directive changes, iteration 200 will look exactly like iteration 140: more thin Go packages, more "extra tests," same stub binary, same 1084% in the JSON.

This is a great test case for autoloop itself: the fault diagnosed here lies in *the directive*, not the agent. Fixing #3's body should fix the loop.

## Related

- #3 — the directive being amended
- #71 — companion issue requesting matched Go benchmarks (also blocked by the same "no end-to-end work" pattern)
- #20 — progress site, which currently shows the inflated number
