feat: optional evaluate_batch hook on ga()/nsga2() for whole-population evaluation #20
Adversarial review: APPROVE (safe to merge)

Verified against the committed code on `step/ctrl-freak-batch-hook`, not the description.

1. Hook correctness: PASS. When `evaluate_batch is not None`, `lifted_evaluate` is replaced by a closure calling `batch_fn(x)` directly (ga.py:172-176, nsga2.py:154-158). It does not wrap `lift`/`lift_parallel`; the per-individual loop and (for GA) `evaluate_array` are genuinely never constructed. Both call sites are covered in each algorithm: initial population (ga.py:183, nsga2.py:178) and offspring (ga.py:229, nsga2.py:217) all route through `lifted_evaluate`. GA's `ndim==1` reshape handles `(n,)`/`(n,1)`; NSGA-II passes the `(n,n_obj)` matrix straight through. Correct.

2. Back-compat: PASS (byte-identical). `evaluate_batch` is appended last with default `None`; the `else` branch reproduces the original assignment verbatim (`lift_parallel(..., n_workers) if n_workers != 1 else lift(...)`). Pre-existing test files are untouched (`git diff origin/main...HEAD -- tests/test_ga.py tests/test_nsga2.py tests/test_results.py tests/conftest.py operators/base.py` is empty). Full suite: 531 passed, 98.91% coverage.

3. Equivalence test non-vacuous: PASS. `_matches_per_individual` compares two independent runs on the same seed (per-individual reference vs batch) via `assert_array_equal` on x/objectives/fitness (or rank+crowding). The wiring is proven by the separate `_receives_full_matrix_and_bypasses_loop` tests: `forbidden_evaluate` raises `AssertionError` if the per-individual path is ever entered, and recorded shapes assert every call sees `(pop_size, n_params)`. Together these are non-tautological. All 6 tests execute (not skipped).

4. Conventions: PASS. `ruff format --check` clean (59 files), `ruff check` clean, `uv run ty check src/` (the CI-enforced gate, ci.yml:45) = "All checks passed!", doctests `--doctest-modules src/ctrl_freak` = 38 passed (incl. the two new evaluate_batch examples). Version 0.2.0→0.2.1 via single source (pyproject + uv.lock only); no tag at HEAD; no publish.

5. Scope: PASS. Exactly 5 files: ga.py, nsga2.py, pyproject.toml, uv.lock, tests/test_evaluate_batch.py. `.gitignore` not staged. CI green on py3.11/3.12/3.13 + numpy-floor.

No blockers found.
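
For readers without the branch checked out, the dispatch in item 1 and the bypass check in item 3 can be sketched roughly as below. This is a standalone illustration, not the code in ga.py or nsga2.py: `select_evaluator` and `lift_sketch` are hypothetical names, `lift_sketch` is only a stand-in for the library's `lift`, and the direction of the `(n, 1)` to `(n,)` reshape is an assumption, since the review only says both shapes are handled.

```python
from typing import Callable, Optional

import numpy as np

Array = np.ndarray


def lift_sketch(evaluate: Callable[[Array], float]) -> Callable[[Array], Array]:
    """Hypothetical stand-in for a per-individual lift: loop over rows."""

    def lifted(x: Array) -> Array:
        return np.array([evaluate(row) for row in x], dtype=float)

    return lifted


def select_evaluator(
    evaluate: Callable[[Array], float],
    evaluate_batch: Optional[Callable[[Array], Array]] = None,
) -> Callable[[Array], Array]:
    """Pick the population-level evaluator (hypothetical helper)."""
    if evaluate_batch is not None:
        batch_fn = evaluate_batch

        def lifted_evaluate(x: Array) -> Array:
            # The batch function sees the whole population matrix once.
            out = np.asarray(batch_fn(x), dtype=float)
            # Assumption: normalise a single-objective (n, 1) result to (n,).
            if out.ndim == 2 and out.shape[1] == 1:
                return out.reshape(-1)
            return out

        # The per-individual loop is never constructed on this branch.
        return lifted_evaluate
    return lift_sketch(evaluate)


def forbidden_evaluate(_: Array) -> float:
    """Fails loudly if the per-individual path is ever entered."""
    raise AssertionError("per-individual path must not be entered")


seen_shapes = []


def sphere_batch(x: Array) -> Array:
    seen_shapes.append(x.shape)  # record what the hook actually receives
    return np.sum(x**2, axis=1)


pop = np.random.default_rng(0).uniform(-1.0, 1.0, size=(8, 3))
evaluator = select_evaluator(forbidden_evaluate, evaluate_batch=sphere_batch)
fitness = evaluator(pop)
```

If the hook were accidentally wrapped in the per-row loop, `forbidden_evaluate` would raise, and `seen_shapes` would not hold a single `(8, 3)` entry.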
Adds an optional `evaluate_batch: Callable[[ndarray], ndarray] | None = None` to `ga()` and `nsga2()`. When supplied, it receives the whole `(n, n_params)` population matrix and returns objectives directly, bypassing the per-individual `lift`/`lift_parallel` loop, which enables a single vmapped/jit'd evaluation (e.g. a JAX model). When `None`, behavior is byte-identical to today (back-compat).

Used downstream by hydrologeez for batched GR6J calibration.
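
As a rough illustration of the contract described above, the sketch below writes the same two-objective function in per-individual and batch form and checks that they agree. The objective itself and the names `objectives_one` and `objectives_batch` are made up for this example. NumPy is used to keep it dependency-light, and it does not call `ga()` or `nsga2()`, since their other arguments are not shown here.

```python
import numpy as np


def objectives_one(x: np.ndarray) -> np.ndarray:
    """Per-individual form: x has shape (n_params,), result has shape (2,)."""
    return np.array([np.sum(x**2), np.sum((x - 2.0) ** 2)])


def objectives_batch(pop: np.ndarray) -> np.ndarray:
    """Batch form: pop has shape (n, n_params), result has shape (n, 2)."""
    f1 = np.sum(pop**2, axis=1)
    f2 = np.sum((pop - 2.0) ** 2, axis=1)
    return np.stack([f1, f2], axis=1)


rng = np.random.default_rng(42)
population = rng.uniform(-5.0, 5.0, size=(16, 4))

# Reference: evaluate row by row, as a per-individual loop would.
per_individual = np.stack([objectives_one(row) for row in population])

# Batch: one call on the whole (n, n_params) matrix.
batched = objectives_batch(population)
```

A callable shaped like `objectives_batch` is what would be passed as `evaluate_batch=`. A `jax.jit(jax.vmap(...))` of a per-individual function would presumably fit the same slot, provided its output is converted to whatever array type the library expects.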
Gate: ruff format --check ✓ · ruff check ✓ · ty check ✓ · pytest 531 passed @ 98.91% cov ✓ · doctests ✓ · new equivalence tests (batch == per-individual) ✓. Version 0.2.0→0.2.1.