v0.1.16

silversurfer562 released this 15 May 17:42
64a8135

Added

  • attune-rag-benchmark --compare-thinking runs the judge
    twice (thinking off, thinking on at --thinking-budget) and
    prints a side-by-side aggregate table plus a per-query
    verdict-shift list. Mutually exclusive with --thinking and
    --native-citations. Calibration data lives in
    docs/rag/faithfulness-thinking-calibration.md (resolves
    #17).
    Outcome of the 2026-05-15 calibration run: an 80% verdict-shift
    rate, but mean faithfulness barely moves (−0.005) and the
    hallucination rate worsens slightly (+2.5 pp), so --thinking
    stays opt-in pending hand-labeled ground-truth queries.
  • attune-rag-benchmark --json PATH dumps the full
    structured faithfulness report — including per-query
    reasoning text and the supported / unsupported claim lists —
    to a JSON file for offline analysis. Per-query benchmark
    records also gain supported_claims, unsupported_claims,
    reasoning, and latency_ms fields.
  • plan_rename(..., kind="template_path") is now
    implemented (it raised NotImplementedError in v0.1.15). Moves a
    template file within its corpus root and updates path-keyed
    sidecars (summaries.json / summaries_by_path.json) when
    present. RenamePlan grows a new moves: list[FileMove]
    field and FileMove(old_path, new_path) is exported from
    attune_rag.editor. apply_rename applies moves first
    (creating missing parent directories, tracked for rollback)
    then text edits, reversing prior work on any mid-flight
    failure. Out of scope: cross-corpus moves, cross_links.json
    updates, attune-help static index, git history. The GUI-side
    "Rename file…" trigger and WS path-change handling live in
    attune-gui. See docs/specs/template-path-rename/.
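
The "moves first, then text edits, roll back on failure" ordering described for apply_rename can be illustrated with a minimal, self-contained sketch. This is not the attune_rag implementation: Move and apply_moves_then_edits are hypothetical names invented for this example, and text edits are simplified to whole-file replacements.

```python
from __future__ import annotations

import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Move:
    """Hypothetical stand-in for a file move; not the library's FileMove."""

    old_path: Path
    new_path: Path


def apply_moves_then_edits(moves: list[Move], edits: dict[Path, str]) -> None:
    """Apply file moves, then whole-file text edits; undo all work on failure."""
    done_moves: list[Move] = []
    created_dirs: list[Path] = []
    originals: dict[Path, str] = {}
    try:
        for move in moves:
            # Collect missing parent directories, innermost first.
            missing: list[Path] = []
            parent = move.new_path.parent
            while not parent.exists():
                missing.append(parent)
                parent = parent.parent
            # Create them outermost first, tracking each one for rollback.
            for directory in reversed(missing):
                directory.mkdir()
                created_dirs.append(directory)
            shutil.move(str(move.old_path), str(move.new_path))
            done_moves.append(move)
        for path, new_text in edits.items():
            # Record the original before writing so a failed write can be undone.
            originals[path] = path.read_text(encoding="utf-8")
            path.write_text(new_text, encoding="utf-8")
    except Exception:
        # Reverse prior work: restore text, move files back, remove new dirs.
        for path, old_text in originals.items():
            path.write_text(old_text, encoding="utf-8")
        for move in reversed(done_moves):
            shutil.move(str(move.new_path), str(move.old_path))
        for directory in reversed(created_dirs):
            directory.rmdir()
        raise
```

In this sketch a failing edit (for example, a sidecar path that does not exist) leaves the template at its old path, restores any sidecar text already rewritten, and removes the directories created for the move. The exception is re-raised so the caller still sees the failure.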