Releases: harnessworks/harness-starter-kit

v0.1.11 - Benchmark tasks, Rust profile, and localization coverage

11 Jun 04:26

Patch release for deterministic benchmark tasks, stack-profile fixture coverage, and README localization drift hardening. This release adds repository-owned benchmark task specs for the external runner while preserving the rule that project-specific oracles live in this repository, not in the runner.

Added

  • Rust profile guidance, fixture coverage, smoke-test wiring, installer coverage, and README/profile documentation so Rust crate and Cargo workspace targets have a conservative local verification path.
  • Buildable gobasic package in the go-basic fixture, plus a Go toolchain smoke test that runs the installed Go profile's check_harness.py (go build, go vet, go test) when go is available and skips otherwise. This closes the verification gap from issue #41.
  • Eight deterministic benchmark task definitions under benchmarks/tasks/, covering small bugfixes, docs-only boundaries, forbidden-file guards, failure memory, decision memory, profile scope, installer safety, and command workflow guidance.
  • Benchmark documentation and task outcome evidence for runner smoke checks, Codex dry-run oracle fixes, and benchmark ownership boundaries.
  • Traditional Chinese README localization and README language-switcher drift coverage.
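
The run-when-available, skip-otherwise behavior described for the Go toolchain smoke test can be sketched with the standard library alone. This is a minimal illustration, not the kit's actual test: the class name GoToolchainSmokeTest and the helper run_smoke_suite are hypothetical, and `go version` stands in for the real profile check so the sketch stays self-contained.

```python
import shutil
import subprocess
import unittest

# Resolve the Go binary once; None means the toolchain is not installed.
GO_BINARY = shutil.which("go")


@unittest.skipUnless(GO_BINARY, "go toolchain not available; skipping smoke test")
class GoToolchainSmokeTest(unittest.TestCase):
    """Hypothetical smoke test: runs only when `go` is on PATH."""

    def test_go_responds(self):
        # Stand-in for the real profile check (go build, go vet, go test).
        result = subprocess.run(
            [GO_BINARY, "version"],
            capture_output=True,
            text=True,
            check=False,
        )
        self.assertEqual(result.returncode, 0, result.stderr)


def run_smoke_suite() -> unittest.TestResult:
    """Run the suite and return the result so callers can inspect skips."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GoToolchainSmokeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A skipped test still counts as a successful run, which is what lets the same suite pass on machines with and without Go installed.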

Changed

  • Harden benchmark task tests so each task has a deterministic oracle, narrow expected files, explicit forbidden files, and required expected-file edits where the runner treats expected_files as an allowlist.
  • Normalize Markdown-oriented benchmark oracles for docs-only and refresh workflow tasks so ordinary line wrapping does not create false negatives.
  • Refresh README badges, localized README image references, repository-transfer URLs, static-site metadata, profile lists, and validation docs.
  • Update /harness update and adoption prompt guidance around source tracking and current repository URLs.
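
The line-wrapping normalization mentioned for the Markdown-oriented oracles could look something like the following. This is a sketch under stated assumptions, not the repository's oracle code: normalize_markdown and oracle_contains are hypothetical names, and the approach simply collapses every whitespace run to a single space before comparing.

```python
import re

_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_markdown(text: str) -> str:
    """Collapse all whitespace runs (including newlines) to single spaces."""
    return _WHITESPACE_RUN.sub(" ", text).strip()


def oracle_contains(document: str, expected_phrase: str) -> bool:
    """Return True if the phrase appears in the document, ignoring wrapping."""
    return normalize_markdown(expected_phrase) in normalize_markdown(document)


# The phrase spans a hard line break in the document but still matches.
wrapped = "Run the refresh workflow before\nopening a pull request."
print(oracle_contains(wrapped, "refresh workflow before opening"))  # True
```

One trade-off of this approach is that it also erases paragraph breaks, so it suits phrase-level checks rather than checks on document structure.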

Validation

  • python3 -m unittest discover -s tests (194 tests, 2 skipped)
  • python3 -m py_compile scripts/apply_harness.py scripts/check_docs_drift.py scripts/check_structure.py scripts/check_encoding_hygiene.py scripts/check_effectiveness_plan.py scripts/check_failure_memory.py scripts/check_decision_memory.py scripts/harness_doctor.py
  • python3 scripts/check_docs_drift.py
  • python3 scripts/check_structure.py
  • python3 scripts/check_encoding_hygiene.py
  • python3 scripts/check_effectiveness_plan.py
  • python3 scripts/check_failure_memory.py
  • python3 scripts/check_decision_memory.py
  • python3 scripts/harness_doctor.py --target . (100/100)
  • git diff --check

v0.1.10 - Harness Doctor v2 coupling diagnostics

08 Jun 07:27

Patch release for Harness Doctor v2. This release turns Doctor from a flat baseline scan into a six-element repository health and coupling diagnostic while preserving the boundary between harness readiness and agent effectiveness.

Added

  • Harness Doctor v2 scoring across Instructions, Constraints, Feedback, Memory, Evaluation, and Governance.
  • First-class coupling findings for orphan constraints, orphan feedback, unoperationalized memory, unevaluated memory, ungoverned change types, and promotion gaps.
  • Optional Doctor gates for minimum score and critical coupling findings, disabled by default.
  • Decision memory for the Doctor v2 model and task outcome evidence for the implementation and review loop.
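
To illustrate what "optional, disabled by default" can mean for the Doctor gates, here is a small sketch. The function and parameter names (evaluate_gates, min_score, fail_on_critical) are assumptions made for illustration only and do not reflect the real harness_doctor.py interface.

```python
from typing import List, Optional


def evaluate_gates(
    score: int,
    critical_findings: List[str],
    min_score: Optional[int] = None,  # gate disabled unless a threshold is given
    fail_on_critical: bool = False,   # gate disabled unless explicitly enabled
) -> List[str]:
    """Return a list of gate failures; an empty list means the run passes."""
    failures = []
    if min_score is not None and score < min_score:
        failures.append(f"score {score} is below minimum {min_score}")
    if fail_on_critical and critical_findings:
        failures.append(f"{len(critical_findings)} critical coupling finding(s)")
    return failures


# With defaults, no gate fires even for a low score with a critical finding.
print(evaluate_gates(60, ["orphan constraint"]))  # []
# Opting in to both gates turns the same inputs into two failures.
print(evaluate_gates(60, ["orphan constraint"], min_score=80, fail_on_critical=True))
```

Keeping both gates opt-in means a diagnostic run never fails a build unless the caller has asked for enforcement.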

Changed

  • Update /harness doctor, the scoring rubric, example reports, component map, theory docs, and roadmap guidance to describe the six-element diagnostic.
  • Keep Proven/effectiveness signals unmeasured in Doctor output unless durable outcome evidence supports a claim.
  • Tighten feedback-binding heuristics so generic documentation mentions do not count as execution wiring, and unbound check scripts do not inflate Feedback health.
  • Expand Doctor regression tests for coupling findings, optional gates, non-comparable outcome evidence, illustrative effectiveness reports, and feedback-binding edge cases.

v0.1.9 - Operational evidence and command-reference validation

06 Jun 14:43

Patch release for operational evidence tracking, Go profile coverage, and command-reference validation. This release strengthens the kit's ability to collect trustworthy agent-work evidence without turning the starter kit into a heavier automation framework.

Added

  • Go profile guidance, fixture coverage, smoke-test wiring, and README/profile documentation so Go targets have a conservative local verification path.
  • Task outcome evidence decision guidance for substantial harness work, including required evidence fields for included task outcome records.
  • Dogfood and effectiveness evidence reports for Today Bus, Harness ERP, and small evidence-pass scenarios, plus task outcome examples for harness adoption and maintenance work.
  • Failure records and decision memory for dogfood evidence consistency and first-pass task outcome evidence gaps.
  • Google site verification, sitemap, robots, and static-site metadata updates.

Changed

  • Extend scripts/check_effectiveness_plan.py, plus the generic template copy, to validate task outcome evidence fields and reject contradictory or stale effectiveness-report completion language.
  • Extend scripts/check_failure_memory.py and scripts/check_effectiveness_plan.py, plus generic template copies, to validate root make targets and root just recipes referenced by failure-memory records, adoption reports, and task outcome verification commands.
  • Strengthen dogfood evidence validation, effectiveness templates, adoption evidence checklists, roadmap guidance, and evaluation docs around operational evidence loops.
  • Refresh README, localized README files, contributor visuals, profile lists, validation docs, and lifecycle pilot notes to match the current evidence and profile coverage.
  • Revert the Crowdin localization sync path while preserving localized README consistency.

Validation

  • python3 -m unittest discover -s tests (146 tests, 1 skipped)
  • python3 -m py_compile scripts/apply_harness.py scripts/check_docs_drift.py scripts/check_structure.py scripts/check_encoding_hygiene.py scripts/check_effectiveness_plan.py scripts/check_failure_memory.py scripts/check_decision_memory.py scripts/harness_doctor.py
  • python3 scripts/check_docs_drift.py
  • python3 scripts/check_structure.py
  • python3 scripts/check_encoding_hygiene.py
  • python3 scripts/check_effectiveness_plan.py
  • python3 scripts/check_failure_memory.py
  • python3 scripts/check_decision_memory.py
  • python3 scripts/harness_doctor.py --target . (98/100)
  • git diff --check

v0.1.8

03 Jun 06:27

Summary

  • Hardened failure-memory verification so failure records must point to concrete detection or prevention evidence instead of non-committal future checks.
  • Added scripts/check_failure_memory.py and kept the generic template copy aligned.
  • Extended adoption and effectiveness report checks to validate failure-memory linkage, concrete path references, and failure-memory fields.
  • Added validation that scripts named in npm, pnpm, yarn, and bun run <script> references exist in the root package.json.
  • Documented the Today Bus Next.js dogfood target alongside the Django dogfood target.
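
The package.json script existence check can be pictured as follows. This is only a sketch of the idea, assuming references take the form `<tool> run <script>`. The names referenced_scripts and missing_scripts are hypothetical, and the kit's actual checks may differ.

```python
import json
import re
from typing import List, Set

# Matches `npm run build`, `pnpm run test:unit`, `yarn run lint`, `bun run dev`.
# Script names must start with a letter or digit, so flags such as
# `npm run --silent build` are deliberately not handled in this sketch.
_RUN_REFERENCE = re.compile(
    r"\b(?:npm|pnpm|yarn|bun)\s+run\s+([A-Za-z0-9][A-Za-z0-9_:.-]*)"
)


def referenced_scripts(text: str) -> Set[str]:
    """Collect script names from run references, trimming sentence punctuation."""
    return {name.rstrip(".:-") for name in _RUN_REFERENCE.findall(text)}


def missing_scripts(report_text: str, package_json_text: str) -> List[str]:
    """Return referenced script names absent from the package.json scripts map."""
    scripts = json.loads(package_json_text).get("scripts", {})
    return sorted(referenced_scripts(report_text) - set(scripts))


package_json = '{"scripts": {"build": "next build", "test": "vitest run"}}'
report = "Verified with `pnpm run build` and `npm run test:e2e`."
print(missing_scripts(report, package_json))  # ['test:e2e']
```

Here the report cites a `test:e2e` script that the example package.json does not define, so the check would flag it as a stale or mistyped reference.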

Validation

  • python3 -m unittest discover -s tests -> 115 OK, 1 skipped
  • python3 -m py_compile scripts/apply_harness.py scripts/check_docs_drift.py scripts/check_structure.py scripts/check_encoding_hygiene.py scripts/check_effectiveness_plan.py scripts/check_failure_memory.py scripts/check_decision_memory.py scripts/harness_doctor.py
  • python3 scripts/check_docs_drift.py
  • python3 scripts/check_structure.py
  • python3 scripts/check_encoding_hygiene.py
  • python3 scripts/check_effectiveness_plan.py
  • python3 scripts/check_failure_memory.py
  • python3 scripts/check_decision_memory.py
  • python3 scripts/harness_doctor.py --target . -> 100/100
  • git diff --check
  • git diff --cached --check

v0.1.7

02 Jun 10:47
94e416b

Summary

  • Added gate-placement guidance for deterministic behavior checks across the harness review, refresh, adoption, generic AGENTS, and verification checklist workflows.
  • Added ADR and failure memory for deterministic product-behavior checks that remain focused/manual without gate-placement review.
  • Added adoption report gate-placement fields and examples for normal, focused, and manual verification paths.
  • Extended check_effectiveness_plan.py and the generic template copy to validate adoption report gate-placement fields, including exact heading matching and wrapped/nested field values.

Validation

  • python3 -m unittest discover -s tests -> 77 OK, 1 skipped
  • python3 -m py_compile scripts/apply_harness.py scripts/check_docs_drift.py scripts/check_structure.py scripts/check_encoding_hygiene.py scripts/check_effectiveness_plan.py scripts/check_decision_memory.py scripts/harness_doctor.py templates/generic/scripts/check_effectiveness_plan.py
  • python3 scripts/check_docs_drift.py
  • python3 scripts/check_structure.py
  • python3 scripts/check_encoding_hygiene.py
  • python3 scripts/check_effectiveness_plan.py
  • python3 scripts/check_decision_memory.py
  • python3 scripts/harness_doctor.py --target . -> 100/100
  • git diff --check
  • git diff --cached --check

v0.1.5

02 Jun 05:51

Patch release for the decision-memory follow-up to v0.1.4. This release moves the decision-docs gate from review-only guidance into the generic target template that future adoptions copy.

Added

  • Decision-memory guidance in the generic AGENTS.md template for non-trivial product or workflow structure, integration or mock external-behavior boundaries, major data models, state classifications, or UX principles that become code structure.
  • A completion criterion requiring agents to report whether decision docs were added, an existing ADR covers the choice, or no decision record was needed.
  • A Decision-docs gate field in the harness review report template so the specific /harness review diagnostic does not get lost when reviewers use the template.
  • Regression coverage that keeps the generic completion gate and review report gate wired in.

Validation

  • python3 -m unittest discover -s tests
  • python3 -m py_compile scripts/apply_harness.py scripts/check_docs_drift.py scripts/check_structure.py scripts/check_encoding_hygiene.py scripts/check_effectiveness_plan.py scripts/harness_doctor.py
  • python3 scripts/check_docs_drift.py
  • python3 scripts/check_structure.py
  • python3 scripts/check_encoding_hygiene.py
  • python3 scripts/check_effectiveness_plan.py
  • python3 scripts/harness_doctor.py --target .

v0.1.4

02 Jun 05:29

Patch release for governance documentation and review diagnostics. This release keeps /harness review diagnostic-only while tightening the durable-memory review path and clarifying command usage.

Added

  • Failure memory for missed ADR review when structural product decisions are implemented without decision-record consideration.
  • A /harness review diagnostic warning for product or workflow structure, mock external-behavior boundaries, major data models, state classifications, or product UX principles that become code structure without a docs/decisions/ update or explicit justification.
  • iOS as a roadmap candidate profile paired with the existing Android profile, with Xcode, simulator, signing, and device checks documented as macOS/manual unless a target repository already has macOS CI.

Changed

  • Clarify that /harness ... names are prompt conventions by default, not built-in editor commands.
  • Refine /harness review sub-agent ownership guidance so reviewer mode and fallback reason stay parent/orchestrator-owned.
  • Update localized README command guidance to match the English prompt convention wording.

Validation

  • python3 -m unittest discover -s tests
  • python3 -m py_compile scripts/apply_harness.py scripts/check_docs_drift.py scripts/check_structure.py scripts/check_encoding_hygiene.py scripts/check_effectiveness_plan.py scripts/harness_doctor.py
  • python3 scripts/check_docs_drift.py
  • python3 scripts/check_structure.py
  • python3 scripts/check_encoding_hygiene.py
  • python3 scripts/check_effectiveness_plan.py
  • python3 scripts/harness_doctor.py --target .

v0.1.3 - Harness review sub-agent routing fix

31 May 07:31

Patch release for /harness review reviewer-mode routing. This keeps the command diagnostic-only while making subagent availability and fallback reporting harder to skip silently.

Added

  • Explicit /harness review sub-agent invocation mode for read-only reviewer subagent use when the active runtime and tool instructions allow it.
  • Review report Invocation, Reviewer mode, and Fallback reason fields in the template and example report.
  • Regression coverage for subagent fallback guidance, prompt drift, localized README wiring, and route precedence.

Changed

  • Clarified /harness review fallback behavior when a subagent tool is present but not permitted by active tool instructions.
  • Routed /harness review sub-agent before the generic /harness review command in agent-facing prompts and command routing.

Validation

  • python3 -m unittest discover -s tests
  • python3 -m py_compile scripts/apply_harness.py scripts/check_docs_drift.py scripts/check_structure.py scripts/check_encoding_hygiene.py scripts/check_effectiveness_plan.py scripts/harness_doctor.py
  • python3 scripts/check_docs_drift.py
  • python3 scripts/check_structure.py
  • python3 scripts/check_encoding_hygiene.py
  • python3 scripts/check_effectiveness_plan.py
  • python3 scripts/harness_doctor.py --target .

v0.1.2

31 May 06:28

Governance release for change-set review. This release adds a diagnostic
/harness review workflow so maintainers can challenge current changes before
completion without adding runtime hooks, policy enforcement, CI adapters, or
more automatic installer behavior.

Added

  • /harness review command workflow for opposing harness-engineering review of
    the current change set.
  • Harness review report template and example report.
  • Quick Start, full adoption prompt, localized README, static site, and
    component-map wiring for /harness review.
  • Regression tests that keep /harness review command routing, localized docs,
    report-template sections, and prompt drift covered.

Changed

  • Clarify that the existing harness review checklist is a periodic maintenance
    checklist, distinct from the /harness review change-set command.
  • Update the roadmap from adding /harness review to refining it through real
    target-repository use.

v0.1.1

30 May 12:31

Stabilization release for the initial harness workflow. This release strengthens
the theory, evaluation, failure-memory, and contributor guidance added around
the v0.1.0 early release.

Added

  • Harness engineering theory document that separates repository harness health
    from observed agent effectiveness.
  • Task outcome record template for comparable agent-work observations.
  • Roadmap and expanded contributor guidance for profiles, drift checks,
    adoption examples, and release validation.
  • Regression coverage that keeps the static site copy prompt aligned with the
    README adoption prompt.

Changed

  • Compact root and generic AGENTS.md guidance while preserving command
    routing, analysis, validation, and commit rules.
  • Clarify python3 validation commands for macOS/Linux environments where
    python is unavailable.
  • Clarify Harness Doctor score scope and non-scored evaluation/governance
    signals.
  • Strengthen adoption and update guidance around failure-memory records for
    user-visible runtime failures, high-risk bug paths, failed checks, repeated
    agent mistakes, and cross-environment mismatches.