v0.1.11 - Benchmark tasks, Rust profile, and localization coverage

@baskduf baskduf released this 11 Jun 04:26
· 10 commits to main since this release

Patch release that adds deterministic benchmark tasks, expands stack-profile fixture coverage, and hardens README localization drift checks. It adds repository-owned benchmark task specs for the external runner while preserving the rule that project-specific oracles live in this repository, not in the runner.

Added

  • Rust profile guidance, fixture coverage, smoke-test wiring, installer coverage, and README/profile documentation so Rust crate and Cargo workspace targets have a conservative local verification path.
  • Buildable gobasic package in the go-basic fixture plus a Go toolchain smoke test that runs the installed Go profile check_harness.py (go build, go vet, go test) when go is available and skips otherwise, closing the verification gap from issue #41.
  • Eight deterministic benchmark task definitions under benchmarks/tasks/, covering small bugfixes, docs-only boundaries, forbidden-file guards, failure memory, decision memory, profile scope, installer safety, and command workflow guidance.
  • Benchmark documentation and task outcome evidence for runner smoke checks, Codex dry-run oracle fixes, and benchmark ownership boundaries.
  • Traditional Chinese README localization and README language-switcher drift coverage.
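
The "runs when go is available and skips otherwise" behavior of the Go toolchain smoke test can be illustrated with a minimal sketch. It is not the repository's actual test: the real test runs the installed Go profile check_harness.py, whereas this sketch calls the three Go commands directly. The fixture path `FIXTURE_DIR`, the helper `go_steps`, and the `./...` package pattern are assumptions made for illustration.

```python
import pathlib
import shutil
import subprocess
import unittest

# Hypothetical fixture location; the real path in the repository may differ.
FIXTURE_DIR = pathlib.Path("tests/fixtures/go-basic")

# shutil.which returns None when the go binary is not on PATH.
GO = shutil.which("go")


def go_steps():
    """Return the three Go commands named in the release notes, in order.

    The "./..." package pattern is an assumption, not taken from the release.
    """
    return [
        ["go", "build", "./..."],
        ["go", "vet", "./..."],
        ["go", "test", "./..."],
    ]


class GoToolchainSmokeTest(unittest.TestCase):
    @unittest.skipUnless(GO, "go toolchain not available")
    def test_go_fixture_passes_all_steps(self):
        # Each step must exit with status 0; stderr is shown on failure.
        for cmd in go_steps():
            result = subprocess.run(
                cmd, cwd=FIXTURE_DIR, capture_output=True, text=True
            )
            self.assertEqual(result.returncode, 0, result.stderr)
```

With this shape, a machine without Go reports the test as skipped rather than failed, which would be consistent with the skipped tests reported under Validation.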

Changed

  • Harden benchmark task tests so each task has a deterministic oracle, narrow expected files, explicit forbidden files, and required expected-file edits where the runner treats expected_files as an allowlist.
  • Normalize Markdown-oriented benchmark oracles for docs-only and refresh workflow tasks so ordinary line wrapping does not create false negatives.
  • Refresh README badges, localized README image references, repository-transfer URLs, static-site metadata, profile lists, and validation docs.
  • Update /harness update and adoption prompt guidance around source tracking and current repository URLs.
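
Two of the hardening ideas above, whitespace-insensitive Markdown oracles and file-scope checks, can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the repository's oracle code: the function names `normalize_markdown`, `oracle_contains`, and `check_changed_files` are hypothetical, and the real benchmark runner may define allowlist semantics differently.

```python
import re


def normalize_markdown(text: str) -> str:
    """Collapse every whitespace run, including newlines, to one space."""
    return re.sub(r"\s+", " ", text).strip()


def oracle_contains(document: str, expected_phrase: str) -> bool:
    """Check for a phrase while ignoring ordinary line wrapping."""
    return normalize_markdown(expected_phrase) in normalize_markdown(document)


def check_changed_files(changed, expected_files, forbidden_files):
    """Classify file-scope problems for one task (hypothetical helper).

    - forbidden: changed files the task explicitly forbids
    - outside_allowlist: changed files not listed in expected_files
    - missing_expected: expected files that were never edited
    """
    changed_set = set(changed)
    expected_set = set(expected_files)
    return {
        "forbidden": sorted(changed_set & set(forbidden_files)),
        "outside_allowlist": sorted(changed_set - expected_set),
        "missing_expected": sorted(expected_set - changed_set),
    }


# A phrase split across wrapped lines still matches after normalization.
wrapped = "Run the harness\ncheck before\n   committing."
print(oracle_contains(wrapped, "harness check before committing"))  # True
```

A task would pass the scope check only when all three lists come back empty, which is one way to express "narrow expected files, explicit forbidden files, and required expected-file edits" as a single deterministic result.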

Validation

  • python3 -m unittest discover -s tests (194 tests, 2 skipped)
  • python3 -m py_compile scripts/apply_harness.py scripts/check_docs_drift.py scripts/check_structure.py scripts/check_encoding_hygiene.py scripts/check_effectiveness_plan.py scripts/check_failure_memory.py scripts/check_decision_memory.py scripts/harness_doctor.py
  • python3 scripts/check_docs_drift.py
  • python3 scripts/check_structure.py
  • python3 scripts/check_encoding_hygiene.py
  • python3 scripts/check_effectiveness_plan.py
  • python3 scripts/check_failure_memory.py
  • python3 scripts/check_decision_memory.py
  • python3 scripts/harness_doctor.py --target . (100/100)
  • git diff --check
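
For anyone who wants to repeat the validation list locally, the commands above could be chained by a small wrapper along these lines. This is a convenience sketch, not a script shipped with the release: the names `VALIDATION_COMMANDS` and `run_validation` are hypothetical, and it assumes the commands are run from the repository root with python3 and git on PATH.

```python
import subprocess
import sys

# Scripts passed to py_compile, copied from the validation list.
SCRIPTS = [
    "scripts/apply_harness.py",
    "scripts/check_docs_drift.py",
    "scripts/check_structure.py",
    "scripts/check_encoding_hygiene.py",
    "scripts/check_effectiveness_plan.py",
    "scripts/check_failure_memory.py",
    "scripts/check_decision_memory.py",
    "scripts/harness_doctor.py",
]

# The ten validation commands, in the order the release notes list them.
VALIDATION_COMMANDS = [
    ["python3", "-m", "unittest", "discover", "-s", "tests"],
    ["python3", "-m", "py_compile", *SCRIPTS],
    ["python3", "scripts/check_docs_drift.py"],
    ["python3", "scripts/check_structure.py"],
    ["python3", "scripts/check_encoding_hygiene.py"],
    ["python3", "scripts/check_effectiveness_plan.py"],
    ["python3", "scripts/check_failure_memory.py"],
    ["python3", "scripts/check_decision_memory.py"],
    ["python3", "scripts/harness_doctor.py", "--target", "."],
    ["git", "diff", "--check"],
]


def run_validation(commands):
    """Run every command and return the ones that failed.

    A command counts as failed when it exits non-zero or when its
    executable cannot be found.
    """
    failed = []
    for cmd in commands:
        try:
            returncode = subprocess.run(cmd).returncode
        except FileNotFoundError:
            returncode = 127  # conventional "command not found" status
        if returncode != 0:
            print("FAILED:", " ".join(cmd), file=sys.stderr)
            failed.append(cmd)
    return failed
```

Calling `run_validation(VALIDATION_COMMANDS)` from the repository root returns an empty list when every check passes. The sketch runs all commands instead of stopping at the first failure so that one run surfaces every failing check.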