v0.1.21 — Phase 4 W0 setup + security hardening (supersedes 0.1.20)
Phase 4 W0 setup ships: twelve weeks of W0 machinery, four
security passes (HIGH + LOW), the polished .help/ corpus, and the
locked perf baseline. There is no public API change; every CHANGELOG
bullet below is either Security (hardening), Fixed (CDN supply chain +
XSS), or Changed (docs, internal tooling, freeze workflows).
Phase 4 W1–W4 burn-in starts from this commit; the formal 0.2.0
SemVer cut follows once the four-week soak completes cleanly.

Supersedes the never-published 0.1.20 (tagged at 1f9a7d7) so that the
security follow-up from PR #68 (W09.A.005..008) ships in the same
release as the rest of W0.
Security
- Close macOS direct-path bypass in the _SYSTEM_DIRS denylist
  (attune_rag.eval.bench_prompts). Found during W0.11 triage of
  W0.9 finding W09.S.011: /etc, /sys, etc. were caught via
  symlink resolution (those paths resolve to /private/... on
  macOS), but a user typing --output /private/etc/passwd
  directly bypassed the guard because the raw-path arm of the
  check didn't have /private/etc in its denylist. Mirrored
  each original entry under /private/ so the direct form is
  blocked too. Did NOT add bare /private, /var, or
  /usr/...; those would over-block legitimate
  user-writable temp roots (pytest tmp_path lives under
  /private/var/folders/...). Three new test cases added to
  tests/unit/test_eval_bench_prompts.py. The threat model is
  developer typos, not a hardened jail.
- W0.9 Source 2 LOW hardening (Phase 4). Three LOW findings surfaced
  by a read-only security-review agent on src/attune_rag/ are
  addressed in this PR rather than deferred:
  - W09.A.005: Rich-markup injection in dashboard/show.py.
    Snapshot fields (error message, retriever / corpus name, feature
    labels, kind names, per-query feature) were interpolated raw into
    Rich markup strings, so a corpus value containing [blink]X[/blink]
    would alter the developer's terminal styling. All untrusted fields
    now flow through rich.markup.escape. Tests in
    tests/unit/test_dashboard_show.py.
  - W09.A.006: ANSI escapes in CLI stderr.
    attune-rag dashboard render --out interpolates the user-supplied
    path into the ValueError rendered to stderr, so a path with raw
    ANSI bytes could repaint the terminal. A new _safe_stderr(msg)
    helper strips C0/C1 control characters (preserving \t, \n, \r)
    before printing. Tests in tests/unit/test_cli.py.
  - W09.A.007: exc_info=True on Anthropic SDK exceptions.
    LLMReranker.rerank and QueryExpander.expand logged failures
    with a full traceback (debug level). Traceback frames can capture
    SDK locals that may include secret-adjacent material under future
    SDK changes, so failures now log the exception type and message
    only. Tests in tests/unit/test_expander_reranker.py.
- Close render-path macOS denylist bypass. Same class as W09.S.011
  but in dashboard/render.py: Path("/etc/foo").resolve() returns
  /private/etc/foo on macOS, slipping past the /etc denylist
  entry. Added /private/etc, /private/sys, and /private/dev
  mirrors. Two new tests in tests/unit/test_dashboard_render.py.
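Both denylist fixes come down to two checks against a denylist that also carries /private/ mirrors: one on the path as typed, and one on the path after symlink resolution. The sketch below shows that idea under stated assumptions and is not the project's code. SYSTEM_DIRS, is_under, and is_system_path are hypothetical names, and the three base entries are assumed because the real _SYSTEM_DIRS contents are not listed here.

```python
from pathlib import Path

# Hypothetical base entries; the real _SYSTEM_DIRS list may differ.
_BASE_DIRS = ("/etc", "/sys", "/dev")

# Mirror every entry under /private/ so the direct macOS spelling is caught.
# Bare /private, /var, and /usr are deliberately absent: pytest's tmp_path
# lives under /private/var/folders/... on macOS.
SYSTEM_DIRS = _BASE_DIRS + tuple("/private" + d for d in _BASE_DIRS)


def is_under(path: Path, root: str) -> bool:
    """True if path is root itself or a descendant of it (compared per component)."""
    root_path = Path(root)
    return path == root_path or root_path in path.parents


def is_system_path(raw: str) -> bool:
    """Reject a path if either its raw form or its resolved form hits the denylist."""
    raw_path = Path(raw)
    # resolve() follows symlinks (e.g. /etc -> /private/etc on macOS) and,
    # with the default strict=False, does not require the path to exist.
    candidates = (raw_path, raw_path.resolve())
    return any(is_under(p, root) for p in candidates for root in SYSTEM_DIRS)
```

Comparing against Path.parents instead of using str.startswith means a sibling such as /etcetera does not match /etc.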
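The two output-side fixes, W09.A.006 and W09.A.007, can be illustrated with the standard library alone. This is a minimal sketch and does not reproduce the shipped _safe_stderr helper or the reranker's logging code. The names strip_control_chars, safe_stderr, and log_sdk_failure are hypothetical. Stripping DEL (U+007F) along with the C0/C1 ranges is an added assumption.

```python
import logging
import re
import sys

# C0 controls except \t (0x09), \n (0x0A), \r (0x0D); DEL (0x7F, an assumption);
# and the C1 block 0x80-0x9F, which includes the single-byte CSI (0x9B).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]")

logger = logging.getLogger("attune_rag_sketch")  # hypothetical logger name


def strip_control_chars(msg: str) -> str:
    """Remove terminal control characters while keeping tab, newline, and CR."""
    return _CONTROL_CHARS.sub("", msg)


def safe_stderr(msg: str) -> None:
    """Print an error message to stderr with control characters removed."""
    print(strip_control_chars(msg), file=sys.stderr)


def log_sdk_failure(operation: str, exc: Exception) -> str:
    """Log only the exception type and message; no exc_info, so no traceback frames."""
    message = f"{operation} failed: {type(exc).__name__}: {exc}"
    logger.debug(message)
    return message
```

Removing only the ESC byte leaves text like "[31m" visible. That is intended: without ESC, the sequence can no longer drive the terminal.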
Fixed
- Dashboard XSS hardening (Phase 4 W0.9 / W0.11). Three HIGH-severity
  findings closed against dashboard/render.py and
  dashboard/templates/dashboard.html:
  - The embedded snapshot JSON now goes through _json_for_script_block(),
    which \u-escapes the less-than character and the U+2028 (line
    separator) / U+2029 (paragraph separator) characters so a corpus
    value containing a literal </script> cannot terminate the inline
    <script> block.
  - The title argument is HTML-escaped via html.escape(…, quote=True)
    before substitution into <title>…</title>, so values containing
    </title><script>… cannot break out of the title element.
  - The Chart.js CDN <script> tag now carries
    integrity="sha384-…" (Subresource Integrity) plus
    crossorigin="anonymous" and referrerpolicy="no-referrer",
    closing the CDN-compromise vector.

  Tests under tests/unit/test_dashboard_render.py cover all three.
  No public API change; freeze-compatible.
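Each of the three template fixes corresponds to a standard-library primitive. The sketch below shows one way to write them. json_for_script_block, render_title, and sri_sha384 are hypothetical names and do not reproduce the project's _json_for_script_block() or its template code. Serializing with ensure_ascii=False is an assumption, chosen so the U+2028/U+2029 replacements actually have work to do; the default ensure_ascii=True already escapes both characters.

```python
import base64
import hashlib
import html
import json


def json_for_script_block(payload: object) -> str:
    """Serialize payload so it is safe to embed inside an inline <script> block."""
    text = json.dumps(payload, ensure_ascii=False)
    # "<" can only occur inside JSON strings, so \u003c is a valid in-string escape
    # and "</script>" can never appear literally in the output.
    return (
        text.replace("<", "\\u003c")
        .replace("\u2028", "\\u2028")  # LINE SEPARATOR
        .replace("\u2029", "\\u2029")  # PARAGRAPH SEPARATOR
    )


def render_title(title: str) -> str:
    """HTML-escape the title (including quotes) before placing it in <title>."""
    return f"<title>{html.escape(title, quote=True)}</title>"


def sri_sha384(asset: bytes) -> str:
    """Compute a Subresource Integrity value for the exact bytes the CDN serves."""
    digest = hashlib.sha384(asset).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")
```

The sri_sha384 value belongs in the integrity attribute. It must be recomputed whenever the referenced Chart.js file changes, because the browser compares it against the exact bytes it downloads.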
Changed
- README task #6 closeout. The reranker row now ships real numbers
  (llm_reranker_rerank.wall: mean 728 ms, threshold 1.07 s, σ ≈ 170 ms),
  sourced from the full 4-benchmark lock that landed in #64. The other
  three rows were refreshed in lockstep: the full lock measured slightly
  different timings than the LLM-free lock because the reranker
  benchmark exercises corpus paths that warm and re-evaluate adjacent
  hot paths, so the CPU means and σs shifted. New numbers
  (mean / threshold): keyword_retriever_retrieve.cpu 3,212 µs /
  34,493 µs (σ ≈ 15.6 ms, dominated by cold-cache noise, as before);
  directory_corpus_load.cpu 47 µs / 66 µs (essentially unchanged);
  rag_pipeline_run.cpu 537 µs / 625 µs (up from the LLM-free numbers,
  reflecting the full-pipeline measurement). The footnote now explains
  both noise profiles (cold cache for keyword, Anthropic network for
  reranker). Removed the "re-lock pending" placeholder; closes task #6.
- README task #6 partial pass. Filled real locked numbers into 3
  of the 4 perf-table rows (keyword_retriever_retrieve,
  directory_corpus_load, rag_pipeline_run; all on the .cpu axis,
  measured at N=30 on ubuntu-latest / CPython 3.11.15). Added a
  one-sentence note that the keyword_retriever_retrieve threshold
  is wide because cold-cache σ ≈ 3.5 ms dominates the mean + 2σ
  formula. Added a "Bundled .help/ corpus" section calling out the
  143 polished templates (13 features × 11 kinds) that landed in
  #58 + #61. The reranker row stays a placeholder until the
  full lock runs with include_llm=true.
- .help/ corpus repolished for release readiness. Cache cleared and
  every feature re-run through
  attune-author generate --no-rag --all-kinds --fact-check strict.
  Each of the 13 features now ships all 11 .help/ template kinds
  (concept, task, reference, quickstart, faq, error, warning, tip,
  note, comparison, troubleshooting), up from 3 kinds per feature in
  #58: 143 polished templates total (39 pre-existing refreshed; 104
  new). The --no-rag flag was needed because the default RAG-grounded
  polish was cross-contaminating attune-rag templates with attune-help
  vocabulary (function names, file paths, command names that don't
  exist in this repo); switching to no-RAG keeps polish doing its
  prose-rewriting work without pulling in foreign references.
  --fact-check strict enforces that every reference (function, class,
  file path, link) resolves in the repo; the pass shipped here cleared
  strict on all 143 files. Project-doc kinds (docs/how-to,
  docs/tutorials) were attempted but systematically failed strict
  (wrong package paths and dead cross-doc links) and were reverted;
  the dedicated attune-author docs three-stage pipeline is tracked as
  a follow-up.
- docs/specs/downstream-validation/security-findings.md: W0.11
  partial triage. 10 of 11 stdlib findings were confirmed non-issues
  after deeper code reads; the 11th (W09.S.011) surfaced the macOS
  denylist gap above and is closed in this same PR. Source 2
  (attune-ai deep sweep) is still pending.
- .help/ corpus refreshed and extended. Regenerated the stale
  pipeline and retrieval templates against post-0.1.18 source.
  Added four new feature areas to close public-surface coverage
  gaps: editor (template-editor primitives), dashboard
  (living-docs three-stage pipeline), expander (LLM-driven
  query expansion), and reranker (LLM-driven re-ranking). Each
  feature ships concept, reference, and task templates
  produced by attune-author regenerate with the polish pass
  applied. features.yaml extended in lockstep. No code change.
- README rewritten with the eval-as-marketing thesis foregrounded.
  A new "Why attune-rag" section quotes the locked retrieval +
  faithfulness thresholds plus the per-hot-path latency baseline
  (docs/specs/release-quality-baseline/baseline-1.md,
  docs/specs/downstream-validation/perf-baseline.md)
  as the primary differentiator. Added a comparison table vs
  LangChain / LlamaIndex and a "What attune-rag is not" section
  for honest self-disqualification. Status section bumped from
  the stale v0.1.10 to v0.1.19; roadmap section repositioned as
  post-freeze 0.2.0+ instead of "next minor release." No public
  API change.
- Deep-review MEDIUM/LOW capture marked closed in
  security-findings.md. Source 2 now lists W09.A.005..007 as
  fix-now (closed in this PR) and documents the read-only-agent
  path as the workaround for the broken security_audit MCP. The hard
  gate (zero severity: high findings open) still holds, and open LOWs
  are now at zero as well.
- benchmark.yml threshold-gate heredoc fix. The mode-decision step used
  the key=value form to write reason to $GITHUB_OUTPUT, which GitHub
  Actions rejected with Invalid format when the PR's diff touched ≥ 2
  faithfulness-affecting paths (multi-line value). Switched to the
  documented heredoc form (reason<<EOF_REASON \n … \n EOF_REASON). The
  gate now correctly emits mode=full with a multi-line rationale.
  Surfaced on PR #68, which touched both reranker.py and expander.py.
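The README notes say the thresholds come from a mean + 2σ rule, so the published (mean, threshold) pairs can be cross-checked with simple arithmetic. The sketch below assumes every row uses that same rule and only re-derives numbers quoted above. The function names are illustrative.

```python
def threshold(mean: float, sigma: float, k: float = 2.0) -> float:
    """Regression threshold: mean plus k standard deviations."""
    return mean + k * sigma


def implied_sigma(mean: float, limit: float, k: float = 2.0) -> float:
    """Invert the rule to recover sigma from a published (mean, threshold) pair."""
    return (limit - mean) / k


# Reranker row (ms): 728 + 2 * 170 = 1068, i.e. about 1.07 s.
reranker_limit_ms = threshold(728, 170)

# .cpu rows (µs) from the full lock, as (mean, threshold) pairs.
rows = {
    "keyword_retriever_retrieve": (3_212, 34_493),
    "directory_corpus_load": (47, 66),
    "rag_pipeline_run": (537, 625),
}
sigmas_us = {name: implied_sigma(m, t) for name, (m, t) in rows.items()}
```

The keyword row's implied σ of about 15.6 ms puts its threshold at roughly ten times its mean. That spread is the cold-cache effect the README footnote describes.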
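For multi-line values in $GITHUB_OUTPUT, GitHub Actions documents a heredoc form: a name<<DELIMITER line, then the value lines, then the delimiter on a line of its own. The real fix lives in the shell step of benchmark.yml. The Python helper below (format_output_entry is a hypothetical name) renders both the key=value and heredoc forms so the difference is easy to see, and refuses values that would end the heredoc early.

```python
def format_output_entry(name: str, value: str, delimiter: str = "EOF_REASON") -> str:
    """Render one $GITHUB_OUTPUT entry, switching to heredoc form for multi-line values."""
    if "\n" not in value:
        # Single-line values can use the plain key=value form.
        return f"{name}={value}\n"
    if delimiter in value.splitlines():
        # A value line equal to the delimiter would terminate the heredoc early.
        raise ValueError(f"value contains the delimiter line {delimiter!r}")
    return f"{name}<<{delimiter}\n{value}\n{delimiter}\n"
```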