v0.1.21 — Phase 4 W0 setup + security hardening (supersedes 0.1.20)

silversurfer562 released this 20 May 05:18
2e2558f

Phase 4 W0 setup ships. Twelve weeks of W0 machinery + four
security passes (HIGH + LOW) + the polished .help/ corpus + the
locked perf baseline. No public API change — every CHANGELOG bullet
below is either Security (hardening), Fixed (CDN-supply-chain +
XSS), or Changed (docs, internal tooling, freeze workflows).
Phase 4 W1–W4 burn-in starts from this commit; the formal 0.2.0
SemVer cut follows once the four-week soak completes cleanly.

Supersedes the never-published 0.1.20 (tagged at 1f9a7d7) so the
security follow-up from PR #68 (W09.A.005..008) ships in the same
release as the rest of W0.

Security

  • Close macOS direct-path bypass in _SYSTEM_DIRS denylist
    (attune_rag.eval.bench_prompts). Found during W0.11 triage of
    W0.9 finding W09.S.011: /etc, /sys, etc. were caught via
    symlink resolution (those paths resolve to /private/... on
    macOS), but a user typing --output /private/etc/passwd
    directly bypassed the guard because the raw-path arm of the
    check didn't have /private/etc in its denylist. Mirrored
    each original entry under /private/ so the direct form is
    blocked too. Did NOT add bare /private, /var, or
    /usr/... — those would over-block legitimate
    user-writable temp roots (pytest tmp_path lives under
    /private/var/folders/...). 3 new test cases added to
    tests/unit/test_eval_bench_prompts.py. Threat model is
    developer-typo, not a hardened jail.
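
The two arms of the check described above can be sketched roughly as follows. This is a minimal illustration, not the code in attune_rag.eval.bench_prompts: the base entries, the _DENYLIST name, and the is_denied helper are all assumptions made for the example.

```python
from pathlib import Path

# Hypothetical base entries; the real _SYSTEM_DIRS contents are not listed in these notes.
_BASE_DIRS = ("/etc", "/sys", "/dev")

# Mirror every entry under /private so the direct macOS spelling is blocked too.
# Bare /private and /var are deliberately absent: user-writable temp roots live there.
_DENYLIST = tuple(Path(d) for d in _BASE_DIRS) + tuple(
    Path("/private" + d) for d in _BASE_DIRS
)


def _is_under(path: Path, root: Path) -> bool:
    """True when path is root itself or sits somewhere beneath it."""
    return path == root or root in path.parents


def is_denied(raw: str) -> bool:
    """Check both the path as typed and its symlink-resolved form."""
    typed = Path(raw)
    resolved = typed.resolve()  # on macOS, /etc/... resolves to /private/etc/...
    return any(
        _is_under(form, root) for form in (typed, resolved) for root in _DENYLIST
    )
```

Matching on whole path components (rather than string prefixes) keeps a sibling such as /private/etcetera from being caught by the /private/etc entry.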

  • W0.9 Source 2 LOW hardening (Phase 4). Three LOW findings surfaced
    by a read-only security-review agent on src/attune_rag/, addressed
    in this PR rather than deferred:

    • W09.A.005 — Rich-markup injection in dashboard/show.py.
      Snapshot fields (error message, retriever / corpus name, feature
      labels, kind names, per-query feature) were interpolated raw into
      Rich markup strings. A corpus value containing [blink]X[/blink]
      would alter the developer's terminal styling. All untrusted fields
      now flow through rich.markup.escape. Tests in
      tests/unit/test_dashboard_show.py.
    • W09.A.006 — ANSI escape in CLI stderr. attune-rag dashboard
      render --out interpolates the user-supplied path into the
      ValueError rendered to stderr. A path with raw ANSI bytes would
      repaint the terminal. New _safe_stderr(msg) helper strips C0/C1
      control characters (preserves \t \n \r) before printing.
      Tests in tests/unit/test_cli.py.
    • W09.A.007 — exc_info=True on Anthropic-SDK exceptions.
      LLMReranker.rerank and QueryExpander.expand logged failures
      with full traceback (debug-level). Traceback frames can capture
      SDK locals that may include secret-adjacent material under future
      SDK changes. Now logs exception type + message only.
      Tests in tests/unit/test_expander_reranker.py.
  • Close render-path macOS denylist bypass. Same class as W09.S.011
    but in dashboard/render.py. Path("/etc/foo").resolve() returns
    /private/etc/foo on macOS, slipping past the /etc denylist
    entry. Added /private/etc, /private/sys, /private/dev
    mirrors. Two new tests in tests/unit/test_dashboard_render.py.
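
For the W09.A.006 item above, a control-character filter along these lines would match the stated behavior (strip C0/C1, keep \t \n \r). It is a sketch under assumptions: the function names are illustrative, and including DEL (U+007F) in the stripped set is a choice made here, not something these notes confirm about _safe_stderr.

```python
import re
import sys

# C0 controls (U+0000-U+001F) except \t, \n, \r, plus C1 controls (U+0080-U+009F).
# DEL (U+007F) is also stripped here; that is an assumption of this sketch.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]")


def strip_control_chars(msg: str) -> str:
    """Remove control characters that could repaint or restyle a terminal."""
    return _CONTROL_CHARS.sub("", msg)


def safe_stderr(msg: str) -> None:
    """Print a sanitized message to stderr."""
    print(strip_control_chars(msg), file=sys.stderr)
```

Removing only the ESC or CSI byte leaves the rest of the sequence (for example "[31m") visible as inert text, which is usually acceptable for an error message.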

Fixed

  • Dashboard XSS hardening (Phase 4 W0.9 / W0.11). Three HIGH-severity
    findings closed against dashboard/render.py and
    dashboard/templates/dashboard.html:
    • The embedded snapshot JSON now goes through _json_for_script_block(),
      which \u-escapes the less-than byte and the U+2028 / U+2029 line
      separators so a corpus value containing a literal </script> cannot
      terminate the inline <script> block.
    • The title argument is HTML-escaped via html.escape(…, quote=True)
      before substitution into <title>…</title>, so values containing
      </title><script>… cannot break out of the title element.
    • The Chart.js CDN <script> tag now carries
      integrity="sha384-…" (Subresource Integrity) plus
      crossorigin="anonymous" and referrerpolicy="no-referrer",
      closing the CDN-compromise vector.
    Tests under tests/unit/test_dashboard_render.py cover all three.
    No public API change; freeze-compatible.
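
The first two fixes above could look roughly like this. The sketch uses only the standard library; the function names and the ensure_ascii=False choice are assumptions for illustration, and the project's _json_for_script_block() may differ in detail.

```python
import html
import json


def json_for_script_block(data) -> str:
    """Serialize data so it is safe to embed inside an inline <script> block."""
    text = json.dumps(data, ensure_ascii=False)
    # Escape "<" so "</script>" can never appear, and escape the two line
    # separators that older JavaScript parsers treat as line terminators.
    return (
        text.replace("<", "\\u003c")
        .replace("\u2028", "\\u2028")
        .replace("\u2029", "\\u2029")
    )


def safe_title(title: str) -> str:
    """HTML-escape a title before substituting it into <title>...</title>."""
    return html.escape(title, quote=True)
```

Because the replacements are valid JSON string escapes, a parser on the receiving side recovers the original values unchanged.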

Changed

  • README task #6 closeout. Reranker row now ships real numbers
    (llm_reranker_rerank.wall: mean 728 ms, threshold 1.07 s, σ ≈ 170 ms),
    sourced from the full 4-benchmark lock that landed in #64. The other
    three rows refreshed in lockstep: the full lock measured slightly
    different timings than the LLM-free lock because the reranker
    benchmark exercises corpus paths that warm and re-evaluate adjacent
    hot paths, so the CPU means and σs shifted. New numbers:
    keyword_retriever_retrieve.cpu 3,212 µs / 34,493 µs (σ ≈ 15.6 ms,
    cold-cache-noise dominated, as before); directory_corpus_load.cpu
    47 µs / 66 µs (essentially unchanged); rag_pipeline_run.cpu
    537 µs / 625 µs (up from the LLM-free numbers, reflecting the
    full-pipeline measurement). Footnote updated to explain both
    noise profiles (cold-cache for keyword, Anthropic-network for
    reranker). Removed the "re-lock pending" placeholder; closes
    task #6.

  • README task #6 partial pass. Filled real locked numbers into 3
    of the 4 perf-table rows (keyword_retriever_retrieve,
    directory_corpus_load, rag_pipeline_run — all .cpu axis,
    measured at N=30 on ubuntu-latest / CPython 3.11.15). Added a
    one-sentence note that the keyword_retriever_retrieve threshold
    is wide because cold-cache σ ≈ 3.5 ms dominates the mean+2σ
    formula. Added a "Bundled .help/ corpus" section calling out the
    143 polished templates (13 features × 11 kinds) that landed in
    #58 + #61. Reranker row still placeholder — drops once the
    full lock runs with include_llm=true.
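
The mean+2σ formula mentioned above can be checked against the figures quoted in these notes. The helper below is a hypothetical illustration, and applying the same formula to every row is an assumption; the inputs are the rounded values from the notes, so small mismatches against the locked thresholds are expected.

```python
def threshold(mean: float, sigma: float, k: float = 2.0) -> float:
    """Latency gate as mean plus k standard deviations (k = 2 per the notes)."""
    return mean + k * sigma


# Reranker row: mean 728 ms, sigma of about 170 ms.
reranker_ms = threshold(728, 170)  # 1068 ms, i.e. about 1.07 s

# Keyword retriever row: mean 3,212 µs, sigma of about 15.6 ms (15,600 µs).
keyword_us = threshold(3212, 15600)  # 34,412 µs from the rounded sigma
```

The keyword result lands within about 0.1 ms of the quoted 34,493 µs, and it shows why that threshold is wide: the 2σ term is roughly ten times the mean.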

  • .help/ corpus repolished for release readiness. Cache cleared,
    every feature re-run through attune-author generate --no-rag
    --all-kinds --fact-check strict. Each of the 13 features now ships
    all 11 .help/ template kinds (concept, task, reference, quickstart,
    faq, error, warning, tip, note, comparison, troubleshooting) — up
    from 3 kinds per feature in #58. 143 polished templates total (39
    pre-existing refreshed; 104 new). The --no-rag flag was needed
    because the default RAG-grounded polish was cross-contaminating
    attune-rag templates with attune-help vocabulary (function names,
    file paths, command names that don't exist in this repo); switching
    to no-RAG keeps polish doing its prose-rewriting work without
    pulling in foreign references. The --fact-check strict mode enforces
    that every reference (function, class, file path, link) resolves
    in the repo; the pass shipped here cleared strict on all 143 files.
    Project-doc kinds (docs/how-to, docs/tutorials) were attempted but
    systematically failed strict — wrong package paths and dead
    cross-doc links — and reverted; the dedicated attune-author docs
    three-stage pipeline is tracked as a follow-up.

  • docs/specs/downstream-validation/security-findings.md
    W0.11 partial triage: 10 of 11 stdlib findings confirmed
    non-issue after deeper code reads; the 11th (W09.S.011)
    surfaced the macOS denylist gap above and is being closed in
    this same PR. Source 2 (attune-ai deep sweep) still pending.

  • .help/ corpus refreshed and extended. Regenerated stale
    pipeline and retrieval templates against post-0.1.18 source.
    Added four new feature areas to close public-surface coverage
    gaps: editor (template-editor primitives), dashboard
    (living-docs three-stage pipeline), expander (LLM-driven
    query expansion), reranker (LLM-driven re-ranking). Each
    feature ships concept, reference, and task templates
    produced by attune-author regenerate with the polish pass
    applied. features.yaml extended in lockstep. No code change.

  • README rewritten with the eval-as-marketing thesis foregrounded.
    New "Why attune-rag" section quotes the locked retrieval +
    faithfulness thresholds plus the per-hot-path latency baseline
    (docs/specs/release-quality-baseline/baseline-1.md,
    docs/specs/downstream-validation/perf-baseline.md)
    as the primary differentiator. Added a comparison table vs
    LangChain / LlamaIndex and a "What attune-rag is not" section
    for honest self-disqualification. Status section bumped from
    stale v0.1.10 to v0.1.19; roadmap section repositioned as
    post-freeze 0.2.0+ instead of "next minor release." No public
    API change.

  • Deep-review MEDIUM/LOW capture marked closed. security-findings.md
    Source 2 now lists W09.A.005..007 as fix-now (closed in this PR) and
    documents the read-only-agent path as the workaround for the broken
    security_audit MCP. Hard gate (zero severity: high open) still
    holds; LOWs are now zero open as well.

  • benchmark.yml threshold-gate heredoc fix. The mode-decision step
    used the key=value form to write reason to $GITHUB_OUTPUT, which
    GitHub Actions rejected with Invalid format when the PR's diff
    touched ≥ 2 faithfulness-affecting paths (multi-line value).
    Switched to the documented heredoc form
    (reason<<EOF_REASON \n … \n EOF_REASON). The gate now correctly
    emits mode=full with multi-line rationale. Surfaced on PR #68,
    which touched both reranker.py and expander.py.
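
The difference between the two $GITHUB_OUTPUT forms can be illustrated in Python. The workflow itself is not shown in these notes, so this is only a sketch: format_output and write_output are hypothetical helpers, and EOF_REASON is reused as the delimiter simply because the notes name it.

```python
import os


def format_output(key: str, value: str, delimiter: str = "EOF_REASON") -> str:
    """Render one step-output entry in the format GitHub Actions accepts."""
    if "\n" not in value:
        # Single-line values may use the simple key=value form.
        return f"{key}={value}\n"
    if delimiter in value.splitlines():
        raise ValueError("delimiter collides with a line of the value")
    # Multi-line values need the heredoc-style form.
    return f"{key}<<{delimiter}\n{value}\n{delimiter}\n"


def write_output(key: str, value: str) -> None:
    """Append an entry to the file named by $GITHUB_OUTPUT, if it is set."""
    path = os.environ.get("GITHUB_OUTPUT")
    if path is None:
        return  # not running under GitHub Actions
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(format_output(key, value))
```

A rationale that lists two touched paths on separate lines would be emitted as reason<<EOF_REASON, the two lines, then EOF_REASON, while mode=full stays in the one-line form.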