Releases: harnessworks/harness-starter-kit
v0.1.11 - Benchmark tasks, Rust profile, and localization coverage
Patch release for deterministic benchmark tasks, stack-profile fixture coverage, and README localization drift hardening. This release adds repository-owned benchmark task specs for the external runner while preserving the rule that project-specific oracles live in this repository, not in the runner.
Added
- Rust profile guidance, fixture coverage, smoke-test wiring, installer coverage, and README/profile documentation so Rust crate and Cargo workspace targets have a conservative local verification path.
- Buildable `gobasic` package in the `go-basic` fixture plus a Go toolchain smoke test that runs the installed Go profile `check_harness.py` (`go build`, `go vet`, `go test`) when `go` is available and skips otherwise, closing the verification gap from issue #41.
- Eight deterministic benchmark task definitions under `benchmarks/tasks/`, covering small bugfixes, docs-only boundaries, forbidden-file guards, failure memory, decision memory, profile scope, installer safety, and command workflow guidance.
- Benchmark documentation and task outcome evidence for runner smoke checks, Codex dry-run oracle fixes, and benchmark ownership boundaries.
- Traditional Chinese README localization and README language-switcher drift coverage.
Changed
- Harden benchmark task tests so each task has a deterministic oracle, narrow expected files, explicit forbidden files, and required expected-file edits where the runner treats `expected_files` as an allowlist.
- Normalize Markdown-oriented benchmark oracles for docs-only and refresh workflow tasks so ordinary line wrapping does not create false negatives.
- Refresh README badges, localized README image references, repository-transfer URLs, static-site metadata, profile lists, and validation docs.
- Update `/harness update` and adoption prompt guidance around source tracking and current repository URLs.
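The line-wrapping normalization mentioned above can be illustrated with a minimal sketch. It is not the repository's actual oracle code: the function names `normalize_markdown` and `oracle_contains` are hypothetical, and the approach (collapsing whitespace inside each paragraph before comparing) is one assumed way to keep ordinary re-wrapping from failing a text oracle.

```python
import re


def normalize_markdown(text: str) -> str:
    """Collapse whitespace runs inside each paragraph; keep paragraph breaks.

    Hypothetical helper, not taken from the repository.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


def oracle_contains(document: str, expected: str) -> bool:
    """Return True when `expected` occurs in `document`, ignoring line wrapping."""
    return normalize_markdown(expected) in normalize_markdown(document)


# The same sentence, hard-wrapped in the document and unwrapped in the oracle.
wrapped_doc = "Run the checks before\ncommitting any change.\n\nThen open a PR."
expected_text = "Run the checks before committing any change."
print(oracle_contains(wrapped_doc, expected_text))
```

A plain substring check on the raw text would report a miss here because of the newline after "before"; comparing normalized forms avoids that false negative while still treating blank-line paragraph breaks as significant.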
Validation
- `python3 -m unittest discover -s tests` (194 tests, 2 skipped)
- `python3 -m py_compile scripts/apply_harness.py scripts/check_docs_drift.py scripts/check_structure.py scripts/check_encoding_hygiene.py scripts/check_effectiveness_plan.py scripts/check_failure_memory.py scripts/check_decision_memory.py scripts/harness_doctor.py`
- `python3 scripts/check_docs_drift.py`
- `python3 scripts/check_structure.py`
- `python3 scripts/check_encoding_hygiene.py`
- `python3 scripts/check_effectiveness_plan.py`
- `python3 scripts/check_failure_memory.py`
- `python3 scripts/check_decision_memory.py`
- `python3 scripts/harness_doctor.py --target .` (100/100)
- `git diff --check`
v0.1.10 - Harness Doctor v2 coupling diagnostics
Patch release for Harness Doctor v2. This release turns Doctor from a flat baseline scan into a six-element repository health and coupling diagnostic while preserving the boundary between harness readiness and agent effectiveness.
Added
- Harness Doctor v2 scoring across Instructions, Constraints, Feedback, Memory, Evaluation, and Governance.
- First-class coupling findings for orphan constraints, orphan feedback, unoperationalized memory, unevaluated memory, ungoverned change types, and promotion gaps.
- Optional Doctor gates for minimum score and critical coupling findings, disabled by default.
- Decision memory for the Doctor v2 model and task outcome evidence for the implementation and review loop.
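As a rough illustration of what "optional gates, disabled by default" could look like, here is a minimal sketch. The function `evaluate_gates` and its parameters `min_score` and `fail_on_critical` are hypothetical names chosen for this example; they are not the actual Harness Doctor interface or its command-line flags.

```python
from typing import List, Optional


def evaluate_gates(
    score: int,
    critical_findings: List[str],
    min_score: Optional[int] = None,   # assumed: gate is off when None
    fail_on_critical: bool = False,    # assumed: gate is off when False
) -> List[str]:
    """Return the list of gate failures; an empty list means the run passes."""
    failures: List[str] = []
    if min_score is not None and score < min_score:
        failures.append(f"score {score} is below minimum {min_score}")
    if fail_on_critical and critical_findings:
        failures.append(f"{len(critical_findings)} critical coupling finding(s)")
    return failures


findings = ["orphan constraint: rule has no bound check"]
print(evaluate_gates(72, findings))                       # no gates enabled
print(evaluate_gates(72, findings, min_score=80))         # score gate only
print(evaluate_gates(72, findings, fail_on_critical=True))
```

With both gates left at their defaults the diagnostic stays purely informational, which matches the release's description of gates as opt-in.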
Changed
- Update `/harness doctor`, the scoring rubric, example reports, component map, theory docs, and roadmap guidance to describe the six-element diagnostic.
- Keep Proven/effectiveness signals unmeasured in Doctor output unless durable outcome evidence supports a claim.
- Tighten feedback-binding heuristics so generic documentation mentions do not count as execution wiring, and unbound check scripts do not inflate Feedback health.
- Expand Doctor regression tests for coupling findings, optional gates, non-comparable outcome evidence, illustrative effectiveness reports, and feedback-binding edge cases.
v0.1.9 - Operational evidence and command-reference validation
Patch release for operational evidence tracking, Go profile coverage, and command-reference validation. This release strengthens the kit's ability to collect trustworthy agent-work evidence without turning the starter kit into a heavier automation framework.
Added
- Go profile guidance, fixture coverage, smoke-test wiring, and README/profile documentation so Go targets have a conservative local verification path.
- Task outcome evidence decision guidance for substantial harness work, including required evidence fields for included task outcome records.
- Dogfood and effectiveness evidence reports for Today Bus, Harness ERP, and small evidence-pass scenarios, plus task outcome examples for harness adoption and maintenance work.
- Failure records and decision memory for dogfood evidence consistency and first-pass task outcome evidence gaps.
- Google site verification, sitemap, robots, and static-site metadata updates.
Changed
- Extend `scripts/check_effectiveness_plan.py`, plus the generic template copy, to validate task outcome evidence fields and reject contradictory or stale effectiveness-report completion language.
- Extend `scripts/check_failure_memory.py` and `scripts/check_effectiveness_plan.py`, plus generic template copies, to validate root `make` targets and root `just` recipes referenced by failure-memory records, adoption reports, and task outcome verification commands.
- Strengthen dogfood evidence validation, effectiveness templates, adoption evidence checklists, roadmap guidance, and evaluation docs around operational evidence loops.
- Refresh README, localized README files, contributor visuals, profile lists, validation docs, and lifecycle pilot notes to match the current evidence and profile coverage.
- Revert the Crowdin localization sync path while preserving localized README consistency.
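The `make` target validation described in the Changed list above can be sketched roughly as follows. This is an assumed approach, not the logic in `scripts/check_failure_memory.py`: the helper names are hypothetical, only single-target rules at the start of a line are recognized, and references are only picked up from backtick-quoted spans such as `make test`.

```python
import re
from typing import List, Set

# A rule line such as "test:" or "lint: test"; skips ":=" assignments and
# special targets that start with a dot (for example .PHONY).
TARGET_RE = re.compile(r"^([A-Za-z0-9_][A-Za-z0-9_.-]*)\s*:(?!=)", re.MULTILINE)
CODE_SPAN_RE = re.compile(r"`([^`]+)`")
MAKE_REF_RE = re.compile(r"make\s+([A-Za-z0-9_][A-Za-z0-9_.-]*)")


def makefile_targets(makefile_text: str) -> Set[str]:
    """Collect target names defined in a Makefile (simplified parsing)."""
    return set(TARGET_RE.findall(makefile_text))


def missing_make_targets(makefile_text: str, record_text: str) -> List[str]:
    """List targets referenced in backtick spans but absent from the Makefile."""
    referenced = set()
    for span in CODE_SPAN_RE.findall(record_text):
        match = MAKE_REF_RE.match(span.strip())
        if match:
            referenced.add(match.group(1))
    return sorted(referenced - makefile_targets(makefile_text))


makefile = ".PHONY: test lint\nCC := gcc\ntest:\n\tpython3 -m unittest\nlint: test\n\techo lint\n"
record = "Verification: run `make test`, then `make docs`. Make sure it passes."
print(missing_make_targets(makefile, record))
```

Restricting the search to code spans keeps ordinary prose such as "Make sure" from being read as a target reference. A comparable check for `just` recipes would need its own parser and is not covered by this sketch.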
Validation
- `python3 -m unittest discover -s tests` (146 tests, 1 skipped)
- `python3 -m py_compile scripts/apply_harness.py scripts/check_docs_drift.py scripts/check_structure.py scripts/check_encoding_hygiene.py scripts/check_effectiveness_plan.py scripts/check_failure_memory.py scripts/check_decision_memory.py scripts/harness_doctor.py`
- `python3 scripts/check_docs_drift.py`
- `python3 scripts/check_structure.py`
- `python3 scripts/check_encoding_hygiene.py`
- `python3 scripts/check_effectiveness_plan.py`
- `python3 scripts/check_failure_memory.py`
- `python3 scripts/check_decision_memory.py`
- `python3 scripts/harness_doctor.py --target .` (98/100)
- `git diff --check`
v0.1.8
Summary
- Hardened failure-memory verification so failure records must point to concrete detection or prevention evidence instead of non-committal future checks.
- Added `scripts/check_failure_memory.py` and kept the generic template copy aligned.
- Extended adoption and effectiveness report checks to validate failure-memory linkage, concrete path references, and failure-memory fields.
- Added root `package.json` script existence validation for `npm`, `pnpm`, `yarn`, and `bun` `run <script>` references.
- Documented the Today Bus Next.js dogfood target alongside the Django dogfood target.
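The `package.json` script existence check in the summary above could work along these lines. This is a minimal sketch under stated assumptions, not the repository's implementation: `missing_scripts` is a hypothetical helper, and it only recognizes the explicit `<tool> run <script>` form.

```python
import json
import re
from typing import List

# Matches "npm run lint", "pnpm run test", "yarn run build", "bun run dev".
RUN_REF_RE = re.compile(r"\b(?:npm|pnpm|yarn|bun)\s+run\s+([A-Za-z0-9:_-]+)")


def missing_scripts(package_json_text: str, doc_text: str) -> List[str]:
    """Return referenced script names that the root package.json does not define."""
    scripts = json.loads(package_json_text).get("scripts", {})
    referenced = set(RUN_REF_RE.findall(doc_text))
    return sorted(referenced - set(scripts))


package_json = '{"name": "demo", "scripts": {"lint": "eslint .", "test": "vitest run"}}'
report = "Run `npm run lint`, then `pnpm run test` and `bun run typecheck`."
print(missing_scripts(package_json, report))
```

Shorthand forms such as `yarn lint` or `npm test` are deliberately ignored here, since without the `run` keyword a script name cannot be told apart from a built-in subcommand by pattern alone.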
Validation
- `python3 -m unittest discover -s tests` -> 115 OK, 1 skipped
- `python3 -m py_compile scripts/apply_harness.py scripts/check_docs_drift.py scripts/check_structure.py scripts/check_encoding_hygiene.py scripts/check_effectiveness_plan.py scripts/check_failure_memory.py scripts/check_decision_memory.py scripts/harness_doctor.py`
- `python3 scripts/check_docs_drift.py`
- `python3 scripts/check_structure.py`
- `python3 scripts/check_encoding_hygiene.py`
- `python3 scripts/check_effectiveness_plan.py`
- `python3 scripts/check_failure_memory.py`
- `python3 scripts/check_decision_memory.py`
- `python3 scripts/harness_doctor.py --target .` -> 100/100
- `git diff --check`
- `git diff --cached --check`
v0.1.7
Summary
- Added deterministic behavior check gate-placement guidance across harness review, refresh, adoption, generic AGENTS, and verification checklist workflows.
- Added ADR and failure memory for deterministic product-behavior checks that remain focused/manual without gate-placement review.
- Added adoption report gate-placement fields and examples for normal, focused, and manual verification paths.
- Extended `check_effectiveness_plan.py` and the generic template copy to validate adoption report gate-placement fields, including exact heading matching and wrapped/nested field values.
Validation
- `python3 -m unittest discover -s tests` -> 77 OK, 1 skipped
- `python3 -m py_compile scripts/apply_harness.py scripts/check_docs_drift.py scripts/check_structure.py scripts/check_encoding_hygiene.py scripts/check_effectiveness_plan.py scripts/check_decision_memory.py scripts/harness_doctor.py templates/generic/scripts/check_effectiveness_plan.py`
- `python3 scripts/check_docs_drift.py`
- `python3 scripts/check_structure.py`
- `python3 scripts/check_encoding_hygiene.py`
- `python3 scripts/check_effectiveness_plan.py`
- `python3 scripts/check_decision_memory.py`
- `python3 scripts/harness_doctor.py --target .` -> 100/100
- `git diff --check`
- `git diff --cached --check`
v0.1.5
Patch release for the decision-memory follow-up to v0.1.4. This release moves the decision-docs gate from review-only guidance into the generic target template that future adoptions copy.
Added
- Decision-memory guidance in the generic `AGENTS.md` template for non-trivial product or workflow structure, integration or mock external-behavior boundaries, major data models, state classifications, or UX principles that become code structure.
- A completion criterion requiring agents to report whether decision docs were added, an existing ADR covers the choice, or no decision record was needed.
- A `Decision-docs gate` field in the harness review report template so the specific `/harness review` diagnostic does not get lost when reviewers use the template.
- Regression coverage that keeps the generic completion gate and review report gate wired in.
Validation
- `python3 -m unittest discover -s tests`
- `python3 -m py_compile scripts/apply_harness.py scripts/check_docs_drift.py scripts/check_structure.py scripts/check_encoding_hygiene.py scripts/check_effectiveness_plan.py scripts/harness_doctor.py`
- `python3 scripts/check_docs_drift.py`
- `python3 scripts/check_structure.py`
- `python3 scripts/check_encoding_hygiene.py`
- `python3 scripts/check_effectiveness_plan.py`
- `python3 scripts/harness_doctor.py --target .`
v0.1.4
Patch release for governance documentation and review diagnostics. This release keeps /harness review diagnostic-only while tightening the durable-memory review path and clarifying command usage.
Added
- Failure memory for missed ADR review when structural product decisions are implemented without decision-record consideration.
- A `/harness review` diagnostic warning for product or workflow structure, mock external-behavior boundaries, major data models, state classifications, or product UX principles that become code structure without a `docs/decisions/` update or explicit justification.
- iOS as a roadmap candidate profile paired with the existing Android profile, with Xcode, simulator, signing, and device checks documented as macOS/manual unless a target repository already has macOS CI.
Changed
- Clarify that `/harness ...` names are prompt conventions by default, not built-in editor commands.
- Refine `/harness review sub-agent` ownership guidance so reviewer mode and fallback reason stay parent/orchestrator-owned.
- Update localized README command guidance to match the English prompt convention wording.
Validation
- `python3 -m unittest discover -s tests`
- `python3 -m py_compile scripts/apply_harness.py scripts/check_docs_drift.py scripts/check_structure.py scripts/check_encoding_hygiene.py scripts/check_effectiveness_plan.py scripts/harness_doctor.py`
- `python3 scripts/check_docs_drift.py`
- `python3 scripts/check_structure.py`
- `python3 scripts/check_encoding_hygiene.py`
- `python3 scripts/check_effectiveness_plan.py`
- `python3 scripts/harness_doctor.py --target .`
v0.1.3 - Harness review sub-agent routing fix
Patch release for /harness review reviewer-mode routing. This keeps the command diagnostic-only while making subagent availability and fallback reporting harder to skip silently.
Added
- Explicit `/harness review sub-agent` invocation mode for read-only reviewer subagent use when the active runtime and tool instructions allow it.
- Review report `Invocation`, `Reviewer mode`, and `Fallback reason` fields in the template and example report.
- Regression coverage for subagent fallback guidance, prompt drift, localized README wiring, and route precedence.
Changed
- Clarified `/harness review` fallback behavior when a subagent tool is present but not permitted by active tool instructions.
- Routed `/harness review sub-agent` before the generic `/harness review` command in agent-facing prompts and command routing.
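The route-precedence rule above (the more specific `/harness review sub-agent` is matched before the generic `/harness review`) can be illustrated with a small sketch. In the kit these names are prompt conventions rather than executable commands, so the `ROUTES` table and `route` function below are purely hypothetical and only demonstrate the ordering idea.

```python
from typing import Optional

# Hypothetical route table; the values are illustrative workflow labels.
ROUTES = {
    "/harness review": "review",
    "/harness review sub-agent": "review-sub-agent",
    "/harness doctor": "doctor",
}


def route(prompt: str) -> Optional[str]:
    """Match the longest command prefix first so specific routes win."""
    text = " ".join(prompt.split())  # tolerate extra spaces between words
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if text == prefix or text.startswith(prefix + " "):
            return ROUTES[prefix]
    return None


print(route("/harness review sub-agent"))
print(route("/harness review the current diff"))
```

Checking prefixes in insertion order instead would send `/harness review sub-agent` to the generic review route, which is the kind of silent misrouting this release guards against.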
Validation
- `python3 -m unittest discover -s tests`
- `python3 -m py_compile scripts/apply_harness.py scripts/check_docs_drift.py scripts/check_structure.py scripts/check_encoding_hygiene.py scripts/check_effectiveness_plan.py scripts/harness_doctor.py`
- `python3 scripts/check_docs_drift.py`
- `python3 scripts/check_structure.py`
- `python3 scripts/check_encoding_hygiene.py`
- `python3 scripts/check_effectiveness_plan.py`
- `python3 scripts/harness_doctor.py --target .`
v0.1.2
Governance release for change-set review. This release adds a diagnostic
/harness review workflow so maintainers can challenge current changes before
completion without adding runtime hooks, policy enforcement, CI adapters, or
more automatic installer behavior.
Added
- `/harness review` command workflow for opposing harness-engineering review of the current change set.
- Harness review report template and example report.
- Quick Start, full adoption prompt, localized README, static site, and component-map wiring for `/harness review`.
- Regression tests that keep `/harness review` command routing, localized docs, report-template sections, and prompt drift covered.
Changed
- Clarify that the existing harness review checklist is a periodic maintenance checklist, distinct from the `/harness review` change-set command.
- Update the roadmap from adding `/harness review` to refining it through real target-repository use.
v0.1.1
Stabilization release for the initial harness workflow. This release strengthens
the theory, evaluation, failure-memory, and contributor guidance added around
the v0.1.0 early release.
Added
- Harness engineering theory document that separates repository harness health from observed agent effectiveness.
- Task outcome record template for comparable agent-work observations.
- Roadmap and expanded contributor guidance for profiles, drift checks, adoption examples, and release validation.
- Regression coverage that keeps the static site copy prompt aligned with the README adoption prompt.
Changed
- Compact root and generic `AGENTS.md` guidance while preserving command routing, analysis, validation, and commit rules.
- Clarify `python3` validation commands for macOS/Linux environments where `python` is unavailable.
- Clarify Harness Doctor score scope and non-scored evaluation/governance signals.
- Strengthen adoption and update guidance around failure-memory records for user-visible runtime failures, high-risk bug paths, failed checks, repeated agent mistakes, and cross-environment mismatches.