Summary
After ~140 iterations of the autoloop program defined in #3, the Go CLI is still a 17-line stub binary that only handles `version`, while `migration-status.json` claims 1084.2% of Python is migrated. This isn't an agent bug: the directive in #3 optimises for the wrong thing, and the agent is correctly maximising it.
This issue proposes concrete edits to #3 so iterations produce working software, not catalogued packages.
Diagnosis: why the loop produces no real work
1. The evaluation metric is self-reported and unbounded
```python
# from issue #3
migrated = data.get('migrated_python_lines', 0)
pct = round((migrated / total) * 100, 2)
# Higher is better.
```
`migrated_python_lines` is a field the agent itself writes into `migration-status.json`. The file now holds 2,321 entries for only 660 unique Go packages: many packages are logged 6–12 times across iterations (e.g. `internal/cache/gitcache` × 10, `internal/adapters/packagemanager` × 12), and each entry credits the Python line count again. Sum: 883,171 lines "migrated" against an 87,626-line source. Hence 1084%.
The agent is doing exactly what the directive asks: maximise a number. The number is the wrong number.
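To make the inflation mechanism concrete, here is a minimal sketch of how repeated entries alone push the percentage up. It assumes each entry carries a `go_package` and a per-entry `python_lines` count; the `python_lines` field name and all sample values are hypothetical, chosen only for illustration.

```python
def migration_pct(entries, total_python_lines):
    """Return (naive_pct, deduped_pct) for a list of status entries.

    naive:   sums every entry, as the current metric effectively does.
    deduped: credits each Go package once and skips entries whose
             go_package is null.
    """
    naive = sum(e["python_lines"] for e in entries)

    per_package = {}
    for e in entries:
        pkg = e.get("go_package")
        if pkg is None:
            continue  # a null go_package cannot be attributed to anything
        per_package[pkg] = max(per_package.get(pkg, 0), e["python_lines"])
    deduped = sum(per_package.values())

    def pct(n):
        return round(n * 100 / total_python_lines, 2)

    return pct(naive), pct(deduped)


# Hypothetical sample: one package logged three times plus one null entry.
sample_entries = [
    {"go_package": "internal/cache/gitcache", "python_lines": 300},
    {"go_package": "internal/cache/gitcache", "python_lines": 300},
    {"go_package": "internal/cache/gitcache", "python_lines": 300},
    {"go_package": None, "python_lines": 100},
]
inflation_demo = migration_pct(sample_entries, total_python_lines=1000)
print(inflation_demo)  # (100.0, 30.0)
```

In this toy case the naive sum reports the source as fully migrated while only 30% of it is attributable to a distinct package.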
2. Step 5 ("Validate") is unenforced
The directive says:
> The Go binary must be callable from the existing CLI entry point or replace it.
But there is no check for this. `cmd/apm/main.go` has been a 17-line stub since the start of the migration. ~140 iterations later it is still:
```go
func main() {
	if len(os.Args) > 1 && os.Args[1] == "version" {
		fmt.Println(version.GetVersion())
		return
	}
	fmt.Fprintln(os.Stderr, "apm-go: stub binary (migration in progress)")
	os.Exit(1)
}
```
No iteration has ever wired a migrated package into the CLI. The directive's success criterion is a sentence of prose; the metric is a number in a JSON file. The agent optimises the metric.
3. "Add extra test files" is a free LOC inflator
Recent iterations (`Iteration 131..137`) are all variants of "extra test suites for N thin Go packages" — packages that are already migrated and already tested. These iterations add no functionality but bump test counts and trigger `migration-status.json` updates. They look like progress; they aren't.
4. No vertical-slice requirement
The directive says start with leaf modules (`utils/`, `constants.py`) and "work inward." There is no requirement to ever finish a slice end-to-end — e.g. "the `apm install` subcommand works end-to-end in Go." So the agent stays at the leaves indefinitely, where the easy wins are.
Proposed fixes to #3
Each of these is a small edit to the program body that closes a specific loophole above:
Fix A — replace the self-reported metric with an observable one
Replace the current metric with something the agent cannot inflate by editing JSON:
```bash
# Number of CLI subcommands that work end-to-end in the Go binary,
# verified by running them and comparing exit code + stdout against Python.
go build -o /tmp/apm-go ./cmd/apm
python3 scripts/cli_parity_check.py /tmp/apm-go | jq .working_subcommands_pct
```
Here `cli_parity_check.py` enumerates the Python subcommands, runs each against both binaries with a fixture, and scores 1.0 only when the Go exit code and stdout match Python's. Higher is better, and the agent cannot fake it.
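A minimal sketch of the core of such a parity check might look like the following. The function names, the case table, and the two stand-in commands are all hypothetical; the stand-ins are small Python one-liners that imitate a complete CLI and the current stub, so the sketch runs without either real binary. Fixture setup and subcommand enumeration are left out.

```python
import json
import subprocess
import sys


def run(cmd, args, timeout=30):
    """Run cmd + args; return (exit_code, stdout), or None if it cannot run."""
    try:
        proc = subprocess.run(
            [*cmd, *args], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return proc.returncode, proc.stdout


def parity_report(go_cmd, py_cmd, cases):
    """Count a subcommand as working only if exit code and stdout both match."""
    working = []
    for name, args in cases.items():
        expected = run(py_cmd, args)
        actual = run(go_cmd, args)
        if expected is not None and actual == expected:
            working.append(name)
    pct = round(100 * len(working) / len(cases), 2) if cases else 0.0
    return {"working": sorted(working), "working_subcommands_pct": pct}


# Stand-in for the Python CLI: answers both subcommands successfully.
FAKE_PY = [sys.executable, "-c",
           "import sys; a = sys.argv[1:]; "
           "print({'version': '1.0.0', 'install': 'installed'}[a[0]])"]
# Stand-in for the Go stub: handles `version`, exits 1 for everything else.
FAKE_GO = [sys.executable, "-c",
           "import sys; print('1.0.0') if sys.argv[1:] == ['version'] "
           "else sys.exit(1)"]

report = parity_report(
    FAKE_GO, FAKE_PY, {"version": ["version"], "install": ["install"]}
)
print(json.dumps(report))
```

With the stub stand-in only `version` matches, so the score is 50%. The score can only rise when a subcommand actually behaves like its Python counterpart.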
Fix B — make Step 5 a gate, not a guideline
Change "The Go binary must be callable from the existing CLI entry point or replace it" to:
> Each iteration must end with at least one new Go subcommand exposed via `cmd/apm/main.go` and exercised by an integration test that runs the binary. Iterations that add only `internal/` packages without wiring them into the CLI do not count and must be rejected by the evaluator.
Fix C — forbid "extra tests for already-migrated packages"
Add to the "Do NOT" rules:
> Do NOT add additional test files to a package that already has `>= 80%` Go coverage. Coverage-padding iterations do not advance the migration.
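One way the evaluator could enforce this rule is to parse the per-package summary lines printed by `go test -cover ./...` and flag new test files that land in already-saturated packages. The sketch below assumes that approach; the module path `example.com/apm/`, the file names, and the coverage figures are hypothetical.

```python
import posixpath
import re

# Matches summary lines such as:
# ok  <TAB>example.com/apm/internal/utils<TAB>0.004s<TAB>coverage: 42.0% of statements
COVERAGE_LINE = re.compile(
    r"^ok\s+(\S+)\s.*?coverage:\s+([0-9.]+)% of statements"
)


def coverage_by_dir(go_test_output, module_prefix):
    """Map each package directory (relative to the module root) to its coverage."""
    coverage = {}
    for line in go_test_output.splitlines():
        match = COVERAGE_LINE.match(line)
        if not match:
            continue  # e.g. "[no test files]" lines
        pkg = match.group(1)
        if pkg.startswith(module_prefix):
            pkg = pkg[len(module_prefix):]
        coverage[pkg] = float(match.group(2))
    return coverage


def padding_violations(new_test_files, coverage, threshold=80.0):
    """Return new test files added to packages already at or above threshold."""
    return [
        path for path in new_test_files
        if coverage.get(posixpath.dirname(path), 0.0) >= threshold
    ]


SAMPLE_OUTPUT = (
    "ok  \texample.com/apm/internal/cache/gitcache\t0.012s\tcoverage: 91.3% of statements\n"
    "ok  \texample.com/apm/internal/utils\t0.004s\tcoverage: 42.0% of statements\n"
    "?   \texample.com/apm/cmd/apm\t[no test files]\n"
)
cov = coverage_by_dir(SAMPLE_OUTPUT, "example.com/apm/")
violations = padding_violations(
    ["internal/cache/gitcache/extra_test.go", "internal/utils/paths_test.go"],
    cov,
)
print(violations)  # ['internal/cache/gitcache/extra_test.go']
```

Coverage must be measured before the iteration's changes are applied; otherwise the new tests themselves could push a package over the threshold.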
Fix D — require a vertical slice cadence
Add to the loop:
> Every 5th iteration must migrate a complete user-facing command (e.g. `apm install`, `apm drift check`, `apm uninstall`) end-to-end, wiring all of its internal dependencies through to `cmd/apm/main.go`, with a passing integration test invoking the Go binary.
Fix E — repair migration-status.json or stop trusting it
The file currently overcounts by 11x and has 87 entries with a `null` `go_package`. Either:
- Have the autoloop regenerate it from source each iteration (walk `internal/` and `src/apm_cli/`, compute line counts, no agent self-reporting), or
- Stop using it as the evaluation metric (see Fix A).
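The first option could be sketched roughly as follows: walk both trees and compute line counts from the files on disk, with no agent-written numbers involved. The function names and output keys are hypothetical, and the demo builds a throwaway tree rather than reading the real repository. Note that this only measures how much code exists on each side; it does not decide which Python modules count as migrated.

```python
import tempfile
from pathlib import Path


def count_lines(root, suffix):
    """Map each directory under root (relative path) to its total line count."""
    root = Path(root)
    totals = {}
    for path in sorted(root.rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            lines = sum(1 for _ in fh)
        key = path.parent.relative_to(root).as_posix()
        totals[key] = totals.get(key, 0) + lines
    return totals


def regenerate_status(go_root, python_root):
    """Rebuild the status purely from source files."""
    go_lines = count_lines(go_root, ".go")
    py_lines = count_lines(python_root, ".py")
    return {
        "go_lines_by_package": go_lines,
        "go_lines": sum(go_lines.values()),
        "python_lines": sum(py_lines.values()),
    }


# Demo on a temporary tree that mimics internal/ and src/apm_cli/.
with tempfile.TemporaryDirectory() as tmp:
    go_root = Path(tmp, "internal")
    py_root = Path(tmp, "src", "apm_cli")
    (go_root / "utils").mkdir(parents=True)
    py_root.mkdir(parents=True)
    (go_root / "utils" / "paths.go").write_text("package utils\n\nfunc Join() {}\n")
    (py_root / "constants.py").write_text("A = 1\nB = 2\n")
    status = regenerate_status(go_root, py_root)

print(status)
```

Because every value is recomputed from disk, logging the same package twice cannot change the result.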
Why this matters
#3 has run for 298 comments and ~140 iterations. The Go module now exceeds the Python module in line count (91k vs 87k) but provides zero user-visible functionality. Without these directive changes, iteration 200 will look exactly like iteration 140: more thin Go packages, more "extra tests," same stub binary, same 1084% in the JSON.
This is a great test case for autoloop itself: the problem diagnosed here lies in the directive, not the agent. Fixing #3's body should fix the loop.
Related