v5.3.0

@michael-denyer michael-denyer released this 29 Apr 19:12
· 47 commits to master since this release

Added

  • Weekly ASAN/UBSAN sanitizer workflow (.github/workflows/sanitizer.yml):
    rebuilds C extensions with -fsanitize=address,undefined and runs the test
    suite under both sanitizers every Sunday. Uses a sentinel meta-test
    (JAMMA_SENTINEL_UB=1 injects -DJAMMA_SENTINEL_UB, triggering a known
    heap-OOB) to verify the sanitizer harness is actually catching bugs and not
    silently passing. CI workflow runs with set -o pipefail, ASAN traces are
    written to a file rather than piped, and the sentinel asserter accepts both
    ASan heap-OOB and UBSan out-of-bounds signatures.
  • JAMMA_SANITIZE build seam in _build_support/compile_and_link.py:
    appends -fsanitize=... flags and disables the post-link import probe so
    sanitized builds don't crash the wheel-build subprocess. Wired into all
    three compile entry points (hatch_build.py, _compile_jlinalg.py,
    _compile_accel.py). check-compile-flag-literals.py extended to recognise
    sanitizer flag literals so they are not flagged by the lint hook.
  • JAMMA_FORCE_NUMPY_FALLBACK env-var gate for jlinalg and lmm:
    forces the NumPy fallback path even when vendor BLAS is available. Used by
    the sanitizer workflow to exercise the pure-Python paths and by debugging
    workflows where vendor-LAPACK output needs to be cross-checked against
    NumPy reference. Documented in docs/TESTING.md §1.10.
  • Sanitizer suppression file for ASan and the heap-OOB sentinel
    (-DJAMMA_SENTINEL_UB). Documented in docs/TESTING.md §1.10 "Running
    under sanitizers (local repro of CI)".
  • New pre-commit hooks (commit 2745846): actionlint (GitHub Actions
    workflow lint), zizmor (workflow security audit), shellcheck (shell-script
    lint), vulture (dead-code detection), refurb (Python refactor suggestions),
    and pytest-rerunfailures (test re-run on transient failure). Three
    categories of pre-existing findings are deferred — see project notes for
    triage status and tightening conditions for each hook.
  • Tier-marker enforcement gate: tests/conftest.py now AST-parses every
    collected test file in pytest_configure and fails the run when any file
    lacks a tier (tier0/tier1/tier2), slow, or benchmark marker. The
    gate runs once on the controller before xdist forks workers, fixing the
    silent fail-open under -n N that the previous collection-based gate had.
    Recognises parametrised markers (@pytest.mark.skipif(...)) and list-form
    pytestmark. Regression test test_gate_fires_under_xdist asserts the
    gate fires under -n 2.
  • Forbidden-patches gate: new scripts/check-forbidden-patches.py +
    pre-commit hook bans patching numpy.linalg.*, scipy.*, and JAMMA's own
    numerical functions in tests. Feature-flag constants (_C_*_AVAILABLE)
    are excluded; # allow-patch: escape hatch documented. Now uses AST
    scanning rather than regex, covers patch.object(<module>, ...),
    mocker.patch(...), and monkeypatch.setattr("dotted.path"...). Module-
    arg monkeypatch.setattr(<module>, "<func>") is also caught (closes a
    hole where two test files set callables on numerical modules and slipped
    past the previous gate). Read failures raise _ScanError and exit
    non-zero rather than passing vacuously on docs-only batches.
  • AST + runtime safety gates: replaced regex source-greps in
    TestLOCOIteratorRuntimeError and TestJlinalgABIValidation with
    ast.parse structural checks plus runtime tests that exercise the
    guards (python -O subprocess for loco_iter; in-subprocess monkey-
    patched _EXPECTED_JLINALG_ABI for ABI drift, asserting on exit code
    and stderr).
  • Fakes package: tests/fakes/ provides FakePipelineRunner,
    FakePipelineRunnerFactory, FakeAssocWriter, FakeProgressbarModule,
    and FakeProgressBar. Type-narrowed to real PipelineConfig /
    PipelineResult so adding a required field actually breaks tests.
    TestFakeProductionDrift compares inspect.signature of each fake
    method to the real production method and fails with a specific drift
    message instead of silently masking new args. Adopted by test_progress.py
    (10 nested patch(...) + MagicMock blocks → one fake_progressbar
    fixture) and test_cli.py (4 MagicMock chains → one factory).
  • GEMMA fixture manifest: tests/fixtures/MANIFEST.toml (55 entries)
    with SHA-256 of every git-tracked fixture. scripts/check_fixture_manifest.py
    verifies on-disk hashes match, flags untracked additions, and flags
    manifest-without-disk entries. scripts/regenerate_fixture_manifest.py
    rebuilds the manifest after intentional updates and auto-extracts
    GEMMA Version and Command Line Input from .log.txt headers.
    Pre-commit hook (fast) + tier0 self-test tests/test_fixture_manifest.py
    (slow) gate it.
  • Scheduled flaky-test detection: .github/workflows/flaky-detect.yml
    runs the default suite under five distinct pytest-randomly seeds every
    Sunday 06:00 UTC. Non-blocking; opens an issue on disagreement.
  • Subsystem coverage gates: per-subsystem coverage floors enforced in
    CI (src/jamma/jlinalg/ floor at 18% to accommodate the Linux-vs-macOS
    vendor-LAPACK fallback delta — Linux measured 21.8% without MKL-ILP64,
    macOS-Accelerate measured 33.6%; both reference numbers documented in
    the threshold comment).

Changed

  • Tier marker hygiene: 8 previously-unmarked test files now have
    module-level pytestmark. test_jlinalg_dispatch.py converts
    pytestmark = skipif(...) to a list combining tier0 + the existing
    skipif. test_runner_numpy.py: the GEMMA-parity tests at :443 and :518
    are promoted to tier1; the internal dispatch test at :396 is reclassified
    tier1 → tier0.
  • Tier3 marker removed from pyproject.toml, both CI workflows,
    conftest.py, and both docs — defined and excluded everywhere but
    never used.
  • Scratch-bin renames (git mv preserves history):
    test_audit_fixes.py → test_lmm_audit.py,
    test_review_fixes.py → test_lmm_io_validation.py,
    test_loco_bugs.py → test_loco_orchestration.py,
    test_lmm_likelihood_dev2.py → test_likelihood_derivatives.py.
  • Fakes drop call-count integers: FakeAssocWriter.call_count,
    FakePipelineRunner.run_calls, FakePipelineRunnerFactory.call_count,
    FakeProgressBar.start_calls/finish_calls replaced with state
    booleans and lifecycle-violation AssertionErrors. update_calls: list[int]
    is retained because it records observable values, not counts.
  • FakeProgressbarModule.widgets simplified from nested class to
    SimpleNamespace(WidgetBase=_FakeWidget).
  • test_jlinalg_lapack.py: folded test_reconstruction_accuracy_large
    and test_orthogonality_large into one
    test_large_5000x200_reconstruction_and_orthogonality (both checked
    the same 5000×200 QR — running it twice wasted CI minutes). Loosened
    orthogonality bound for the large case from 1e-14 to 1e-13 (theoretical
    floor for sqrt(5000) accumulation is ~1.6e-14).
  • blas_backend known-backends set extended with system-BLAS-ILP64
    and system-BLAS-LP64 (returned by blas_dispatch.c:132 when a vendor
    library is loaded but path-string detection cannot identify it — typical
    on Linux distros linking against alias-only libblas.so).
  • test_blas_backend_string_has_known_value asserts membership in a
    documented set (incl. Accelerate-ILP64) instead of printing.
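The ~1.6e-14 floor quoted for the 5000×200 QR case is consistent with sqrt(rows) times float64 machine epsilon. That formula is an assumption about how the figure was derived, not something the notes state. The helper name `accumulation_floor` is hypothetical. A quick check:

```python
import math
import sys


def accumulation_floor(rows: int) -> float:
    """sqrt(rows) * eps: an assumed model of rounding-error growth over `rows` terms."""
    return math.sqrt(rows) * sys.float_info.epsilon


floor = accumulation_floor(5000)
print(f"{floor:.2e}")          # about 1.57e-14
print(1e-14 < floor < 1e-13)   # old bound sits below the floor, new bound above it
```

Under this model the previous 1e-14 bound was tighter than the expected rounding floor, so loosening it to 1e-13 leaves roughly a 6x margin.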

Fixed

  • Tier-marker gate failed open under xdist: collection-based gate
    silently no-op'd whenever -n N was active (default -n 3). Empirically
    reproduced — an unmarked file ran cleanly under -n 2. Switched the
    gate to source-parsing in pytest_configure (runs once on the controller
    before xdist forks workers).
  • monkeypatch.setattr(<module>, "<func>") previously bypassed the
    forbidden-patches policy. test_lmm_accel.py:207 set
    _compute_lmm_batch_c to a sentinel and test_prepare_common.py:282
    set _compute_score_batch_c to None — both exited 0 under the old
    gate. Added a module-form rule keyed off the documented forbidden-module
    aliases (compute_numpy, cn, likelihood, jlinalg, jl,
    kinship_compute, kc), still allowing _AVAILABLE/_ENABLED flags.
    Audited the existing call sites and added # allow-patch: comments to
    the 5 legitimate dispatch toggles.
  • scripts/check-forbidden-patches.py no longer swallows OSError /
    UnicodeDecodeError. Read failures now exit non-zero rather than silently
    producing zero findings (the silent-failure mode the gate is meant to
    prevent). Detects "argv passed but no .py among them" and falls back
    to a repo-wide scan with a stderr note instead of passing vacuously when
    pre-commit hands the hook a docs-only batch.
  • tests/conftest.py: replaced silent except ImportError: return in
    pytest_configure with a stderr warning so a broken freshness script
    is visible.
  • TestEigendecompLP64Threshold: replaced
    contextlib.suppress(...) with
    pytest.raises(RuntimeError, match="test stub"). The previous form could
    not distinguish "RuntimeError propagated to caller" from "caller silently
    caught and returned a default"; both passed the warning-routing assertion.
  • .github/workflows/ci.yml: dropped not tier3 from the default
    pytest filter (the marker was removed from pyproject / conftest /
    docs in 6d9ab15 but this one workflow line was missed).
  • git mv rename deletes: the renames in 6d9ab15 staged the new
    files but the matching D entries for the old files were never added
    to the index, so the new files shipped alongside the old ones. Staged
    the deletes for test_audit_fixes.py, test_review_fixes.py,
    test_loco_bugs.py, and test_lmm_likelihood_dev2.py.
  • tests/test_conftest_tier_gate.py: previously embedded a parallel
    stub of the old collection-based gate; after the xdist fail-open fix it
    was no longer testing the implementation it claimed to. Rewired the
    stub conftest to importlib-load the real _enforce_tier_markers from
    tests/conftest.py.
  • Removed dead scripts/pre-push: standalone bash hook duplicated
    the .pre-commit-config.yaml's ruff-format-all pre-push entry and
    was never wired into any git hook (.git/hooks/pre-push is prek-managed).
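The module-form rule from the forbidden-patches fix can be sketched with the standard `ast` module. The alias set and the `_AVAILABLE`/`_ENABLED` exemption come from the notes above. The function name, the same-line placement of `# allow-patch:`, and the example attribute `_C_QR_AVAILABLE` are assumptions for illustration, not the contents of scripts/check-forbidden-patches.py.

```python
import ast

# Forbidden-module aliases as listed in the release notes.
FORBIDDEN_ALIASES = {
    "compute_numpy", "cn", "likelihood", "jlinalg", "jl", "kinship_compute", "kc",
}
ALLOWED_SUFFIXES = ("_AVAILABLE", "_ENABLED")


def find_module_form_patches(source: str) -> list[tuple[int, str]]:
    """Return (line, 'alias.attr') for monkeypatch.setattr(<alias>, "<attr>", ...) calls."""
    lines = source.splitlines()
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "setattr"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "monkeypatch"
            and len(node.args) >= 2
        ):
            continue
        target, name = node.args[0], node.args[1]
        if not (isinstance(target, ast.Name) and target.id in FORBIDDEN_ALIASES):
            continue
        if not (isinstance(name, ast.Constant) and isinstance(name.value, str)):
            continue
        if name.value.endswith(ALLOWED_SUFFIXES):
            continue  # feature-flag toggles stay allowed
        if "# allow-patch:" in lines[node.lineno - 1]:
            continue  # escape hatch, assumed to sit on the call's first line
        findings.append((node.lineno, f"{target.id}.{name.value}"))
    return sorted(findings)


example = (
    "def test_a(monkeypatch):\n"
    "    monkeypatch.setattr(cn, '_compute_lmm_batch_c', object())\n"
    "    monkeypatch.setattr(jl, '_C_QR_AVAILABLE', False)\n"
    "    monkeypatch.setattr(kc, '_dispatch', None)  # allow-patch: dispatch toggle\n"
    "    monkeypatch.setattr(os, 'getcwd', lambda: '/')\n"
)
print(find_module_form_patches(example))
```

Only the first call is reported: the second is a feature flag, the third carries the escape-hatch comment, and the fourth targets a module outside the alias set.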

Removed

  • tier3 pytest marker (defined but never used).
  • scripts/pre-push (dead code; functionality lives in pre-commit).
  • docs/TESTING.md §3.3 "Tests / markers to remove" (all rows were
    already done); subsequent sections renumbered.
  • Stale 35-line "Test Tier System" block from conftest.py (claimed
    three tiers, listed nonexistent example tests, duplicated TESTING.md
    §1.5); replaced with a pointer to the source-of-truth doc.
  • Three near-identical "@pytest.mark.slow on individual tests still
    applies" comments (restated standard pytest semantics).
  • Transitional FakeAssocWriter re-export comment in
    test_runner_numpy.py.