Release v0.1.21 — Phase 4 W0 setup + security hardening (supersedes 0.1.20) · Smart-AI-Memory/attune-rag

Phase 4 W0 setup ships. Twelve weeks of W0 machinery + four
security passes (HIGH + LOW) + the polished .help/ corpus + the
locked perf baseline. No public API change — every CHANGELOG bullet
below is either Security (hardening), Fixed (CDN-supply-chain +
XSS), or Changed (docs, internal tooling, freeze workflows).
Phase 4 W1–W4 burn-in starts from this commit; the formal 0.2.0
SemVer cut follows once the four-week soak completes cleanly.

Supersedes the never-published 0.1.20 (tagged at 1f9a7d7) so the
security follow-up from PR #68 (W09.A.005..008) ships in the same
release as the rest of W0.

Security

Close macOS direct-path bypass in _SYSTEM_DIRS denylist
(attune_rag.eval.bench_prompts). Found during W0.11 triage of
W0.9 finding W09.S.011: /etc, /sys, etc. were caught via
symlink-resolution (those paths resolve to /private/... on
macOS), but a user typing --output /private/etc/passwd
directly bypassed the guard because the raw-path arm of the
check didn't have /private/etc in its denylist. Mirrored
each original entry under /private/ so the direct form is
blocked too. Did NOT add bare /private, /var, or
/usr/... — those would over-block legitimate
user-writable temp roots (pytest tmp_path lives under
/private/var/folders/...). 3 new test cases added to
tests/unit/test_eval_bench_prompts.py. Threat model is
developer-typo, not a hardened jail.

W0.9 Source 2 LOW hardening (Phase 4). Three LOW findings surfaced
by a read-only security-review agent on src/attune_rag/, addressed
in this PR rather than deferred:
- W09.A.005 — Rich-markup injection in dashboard/show.py.
  Snapshot fields (error message, retriever / corpus name, feature
  labels, kind names, per-query feature) were interpolated raw into
  Rich markup strings. A corpus value containing [blink]X[/blink]
  would alter the developer's terminal styling. All untrusted fields
  now flow through rich.markup.escape. Tests in
  tests/unit/test_dashboard_show.py.
- W09.A.006 — ANSI escape in CLI stderr. attune-rag dashboard
  render --out interpolates the user-supplied path into the
  ValueError rendered to stderr. A path with raw ANSI bytes would
  repaint the terminal. New _safe_stderr(msg) helper strips C0/C1
  control characters (preserves \t \n \r) before printing.
  Tests in tests/unit/test_cli.py.
- W09.A.007 — exc_info=True on Anthropic-SDK exceptions.
  LLMReranker.rerank and QueryExpander.expand logged failures
  with full traceback (debug-level). Traceback frames can capture
  SDK locals that may include secret-adjacent material under future
  SDK changes. Now logs exception type + message only.
  Tests in tests/unit/test_expander_reranker.py.
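Two of the three LOW fixes above can be sketched with the standard library alone. This is a minimal illustration, not the shipped code: the names safe_stderr_text and describe_exception are hypothetical, and the exact character set stripped by the real _safe_stderr helper is assumed from the description (C0 and C1 controls, keeping \t, \n and \r).

```python
import re

# Assumption: C0 controls except \t \n \r, plus the C1 range U+0080..U+009F.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x80-\x9f]")


def safe_stderr_text(msg: str) -> str:
    """Drop control characters that could repaint a terminal."""
    return _CONTROL_CHARS.sub("", msg)


def describe_exception(exc: BaseException) -> str:
    """Return 'TypeName: message' only, with no traceback frames."""
    return f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # The ESC byte is removed, so the colour sequence is printed as inert text.
    print(safe_stderr_text("bad path: \x1b[31mout.html"))
    print(describe_exception(ValueError("boom")))
```

Stripping only the control bytes leaves the rest of a hostile sequence visible as plain text, which is usually easier to debug than silently dropping the whole message.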
Close render-path macOS denylist bypass. Same class as W09.S.011
but in dashboard/render.py. Path("/etc/foo").resolve() returns
/private/etc/foo on macOS, slipping past the /etc denylist
entry. Added /private/etc, /private/sys, /private/dev
mirrors. Two new tests in tests/unit/test_dashboard_render.py.
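The mirroring approach used for both denylist fixes can be sketched as follows. This is an illustration under stated assumptions: the base entries, the _DENYLIST name, and is_denied are hypothetical, and the check here is purely lexical, whereas the notes describe the real guard as checking both the raw and the symlink-resolved path.

```python
from pathlib import PurePosixPath

# Hypothetical base entries; the real denylist contents are not shown in these notes.
_BASE_DIRS = ("/etc", "/sys", "/dev")

# Mirror each entry under /private so the direct macOS form is blocked too.
# Bare /private is deliberately NOT added: temp roots live under /private/var.
_DENYLIST = tuple(PurePosixPath(d) for d in _BASE_DIRS) + tuple(
    PurePosixPath("/private", d.lstrip("/")) for d in _BASE_DIRS
)


def is_denied(raw_path: str) -> bool:
    """True if the path is a denylisted directory or sits inside one."""
    path = PurePosixPath(raw_path)
    return any(path == entry or entry in path.parents for entry in _DENYLIST)


if __name__ == "__main__":
    for candidate in ("/etc/passwd", "/private/etc/passwd",
                      "/private/var/folders/ab/out.json"):
        print(candidate, is_denied(candidate))
```

Matching on whole path components rather than string prefixes avoids blocking an unrelated directory such as /etcetera.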

Fixed

Dashboard XSS hardening (Phase 4 W0.9 / W0.11). Three HIGH-severity
findings closed against dashboard/render.py and
dashboard/templates/dashboard.html:
- The embedded snapshot JSON now goes through _json_for_script_block(),
  which \u-escapes the less-than byte and the U+2028 / U+2029 line
  separators so a corpus value containing a literal </script> cannot
  terminate the inline <script> block.
- The title argument is HTML-escaped via html.escape(…, quote=True)
  before substitution into <title>…</title>, so values containing
  </title><script>… cannot break out of the title element.
- The Chart.js CDN <script> tag now carries
  integrity="sha384-…" (Subresource Integrity) plus
  crossorigin="anonymous" and referrerpolicy="no-referrer",
  closing the CDN-compromise vector.

Tests under tests/unit/test_dashboard_render.py cover all three.
No public API change; freeze-compatible.
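The first two fixes can be sketched with json and html from the standard library. This is a minimal approximation of the behaviour described above, not the shipped implementation: the function names are hypothetical stand-ins for _json_for_script_block() and the title-escaping step.

```python
import html
import json


def json_for_script_block(data) -> str:
    """Serialise data so it is safe inside an inline <script> block."""
    text = json.dumps(data, ensure_ascii=False)
    # \u-escape "<" so "</script>" can never appear, and escape the two
    # line separators that older JavaScript parsers treat as newlines.
    return (
        text.replace("<", "\\u003c")
        .replace("\u2028", "\\u2028")
        .replace("\u2029", "\\u2029")
    )


def safe_title(title: str) -> str:
    """HTML-escape a title before substituting it into <title>...</title>."""
    return html.escape(title, quote=True)


if __name__ == "__main__":
    payload = {"corpus": "</script><script>alert(1)</script>"}
    print(json_for_script_block(payload))
    print(safe_title("</title><script>alert(1)</script>"))
```

The escaped output is still valid JSON, so the dashboard script can parse it unchanged and recover the original strings.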

Changed

README task #6 closeout. Reranker row now ships real numbers
(llm_reranker_rerank.wall: mean 728 ms, threshold 1.07 s, σ ≈ 170 ms),
sourced from the full 4-benchmark lock that landed in #64. The other
three rows refreshed in lockstep: the full lock measured slightly
different timings than the LLM-free lock because the reranker
benchmark exercises corpus paths that warm and re-evaluate adjacent
hot paths, so the CPU means and σs shifted. New numbers:
keyword_retriever_retrieve.cpu 3,212 µs / 34,493 µs (σ ≈ 15.6 ms,
cold-cache-noise dominated, as before); directory_corpus_load.cpu
47 µs / 66 µs (essentially unchanged); rag_pipeline_run.cpu
537 µs / 625 µs (up from the LLM-free numbers, reflecting the
full-pipeline measurement). Footnote updated to explain both
noise profiles (cold-cache for keyword, Anthropic-network for
reranker). Removed the "re-lock pending" placeholder; closes
task #6.

README task #6 partial pass. Filled real locked numbers into 3
of the 4 perf-table rows (keyword_retriever_retrieve,
directory_corpus_load, rag_pipeline_run — all .cpu axis,
measured at N=30 on ubuntu-latest / CPython 3.11.15). Added a
one-sentence note that the keyword_retriever_retrieve threshold
is wide because cold-cache σ ≈ 3.5 ms dominates the mean+2σ
formula. Added a "Bundled .help/ corpus" section calling out the
143 polished templates (13 features × 11 kinds) that landed in
#58 + #61. The reranker row is still a placeholder, to be dropped once
the full lock runs with include_llm=true.
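The mean+2σ formula mentioned above is simple enough to check by hand. The sketch below uses hypothetical function names, and the choice of the sample standard deviation is an assumption, since the notes do not say which estimator the lock uses. Plugging in the reranker figures quoted in the closeout entry (mean 728 ms, σ ≈ 170 ms) gives 1,068 ms, which matches the stated 1.07 s threshold.

```python
import statistics


def regression_threshold(mean: float, sigma: float, k: float = 2.0) -> float:
    """Mean plus k standard deviations, in the same unit as the inputs."""
    return mean + k * sigma


def threshold_from_samples(samples: list[float], k: float = 2.0) -> float:
    """Derive the threshold from raw timings.

    Assumption: sample standard deviation (statistics.stdev).
    """
    return regression_threshold(
        statistics.fmean(samples), statistics.stdev(samples), k
    )


if __name__ == "__main__":
    # Reranker row: 728 ms mean, ~170 ms sigma.
    print(regression_threshold(728.0, 170.0))  # 1068.0
```

Because σ enters with a factor of two, a noisy benchmark such as the cold-cache keyword retriever ends up with a threshold far above its mean.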
.help/ corpus repolished for release readiness. Cache cleared,
every feature re-run through attune-author generate --no-rag
--all-kinds --fact-check strict. Each of the 13 features now ships
all 11 .help/ template kinds (concept, task, reference, quickstart,
faq, error, warning, tip, note, comparison, troubleshooting) — up
from 3 kinds per feature in #58. 143 polished templates total (39
pre-existing refreshed; 104 new). The --no-rag flag was needed
because the default RAG-grounded polish was cross-contaminating
attune-rag templates with attune-help vocabulary (function names,
file paths, command names that don't exist in this repo); switching
to no-RAG keeps polish doing its prose-rewriting work without
pulling in foreign references. The --fact-check strict setting enforces
that every reference (function, class, file path, link) resolves
in the repo; the pass shipped here cleared strict on all 143 files.
Project-doc kinds (docs/how-to, docs/tutorials) were attempted but
systematically failed strict (wrong package paths and dead
cross-doc links) and were reverted; the dedicated attune-author docs
three-stage pipeline is tracked as a follow-up.

docs/specs/downstream-validation/security-findings.md:
W0.11 partial triage. 10 of 11 stdlib findings were confirmed as
non-issues after deeper code reads; the 11th (W09.S.011)
surfaced the macOS denylist gap above and is being closed in
this same PR. Source 2 (attune-ai deep sweep) is still pending.

.help/ corpus refreshed and extended. Regenerated stale
pipeline and retrieval templates against post-0.1.18 source.
Added four new feature areas to close public-surface coverage
gaps: editor (template-editor primitives), dashboard
(living-docs three-stage pipeline), expander (LLM-driven
query expansion), reranker (LLM-driven re-ranking). Each
feature ships concept, reference, and task templates
produced by attune-author regenerate with the polish pass
applied. features.yaml extended in lockstep. No code change.

README rewritten with the eval-as-marketing thesis foregrounded.
New "Why attune-rag" section quotes the locked retrieval +
faithfulness thresholds plus the per-hot-path latency baseline
(docs/specs/release-quality-baseline/baseline-1.md,
docs/specs/downstream-validation/perf-baseline.md)
as the primary differentiator. Added a comparison table vs
LangChain / LlamaIndex and a "What attune-rag is not" section
for honest self-disqualification. Status section bumped from
stale v0.1.10 to v0.1.19; roadmap section repositioned as
post-freeze 0.2.0+ instead of "next minor release." No public
API change.

Deep-review MEDIUM/LOW capture marked closed. security-findings.md
Source 2 now lists W09.A.005..007 as fix-now (closed in this PR) and
documents the read-only-agent path as the workaround for the broken
security_audit MCP. Hard gate (zero severity: high open) still
holds; LOWs are now zero open as well.

benchmark.yml threshold-gate heredoc fix. The mode-decision step used
the key=value form to write reason to $GITHUB_OUTPUT, which GitHub
Actions rejected with Invalid format when the PR's diff touched ≥ 2
faithfulness-affecting paths (multi-line value). Switched to the
documented heredoc form (reason<<EOF_REASON \n … \n EOF_REASON). The
gate now correctly emits mode=full with a multi-line rationale.
Surfaced on PR #68, which touched both reranker.py and expander.py.
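The difference between the two output forms can be shown in a few lines. The workflow step itself presumably runs in shell; the sketch below is a Python rendering of the same file format, with hypothetical helper names, and reuses the EOF_REASON delimiter from the fix.

```python
import os
from typing import Optional


def format_output(name: str, value: str, delimiter: str = "EOF_REASON") -> str:
    """Render one step output in the format GitHub Actions expects."""
    if "\n" not in value:
        # Single-line values can use the simple key=value form.
        return f"{name}={value}\n"
    if delimiter in value.splitlines():
        raise ValueError("delimiter collides with a line of the value")
    # Multi-line values need the heredoc form: name<<DELIM, value, DELIM.
    return f"{name}<<{delimiter}\n{value}\n{delimiter}\n"


def write_output(name: str, value: str, path: Optional[str] = None) -> None:
    """Append an output to the file named by $GITHUB_OUTPUT."""
    target = path or os.environ["GITHUB_OUTPUT"]
    with open(target, "a", encoding="utf-8") as handle:
        handle.write(format_output(name, value))


if __name__ == "__main__":
    print(format_output("mode", "full"), end="")
    print(format_output("reason", "touched reranker.py\ntouched expander.py"), end="")
```

Routing every write through one formatter means a value that later grows a second line switches to the heredoc form automatically instead of failing the step.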
