v0.1.22 — retrieval P@1 0.950→0.975 (freeze-override)
[0.1.22] - 2026-05-20
Freeze override (Phase 4 of v1.0 roadmap). The
KeywordRetriever.MIN_ALIAS_OVERLAPknob below is a new
public class attribute that lands inside the Phase 4
symbol-level freeze under the override mechanism
documented in
docs/specs/downstream-validation/tasks.md
§"Freeze semantics".[Override-rationale]: the knob
changes the default ranking behavior for attune-help in a
structurally meaningful way (replaces a documented
semantic-tie miss with a corpus-side alias-tuning miss),
and gating its default behind a separate 0.2.0 cut would
have shipped the behavior change without the
user-controllable safety valve. Cadence clock not reset
per the spec'sSecurity-scoped exception pattern,
applied here for an internal quality knob with comparable
reversibility — flippingMIN_ALIAS_OVERLAP = 1restores
pre-0.1.22 behavior exactly.
Added
KeywordRetriever.MIN_ALIAS_OVERLAPclass attribute
(default2). Requires at least this many distinct
query tokens to overlap an entry's alias-token union
before creditingaliases_hits. Multi-token alias matches
(the design intent —"CI pipeline failing",
"publish to PyPI") still fire; single common tokens
riding in via one alias no longer dominate ties. Set to
1(via subclass or class-attribute override) to restore
pre-0.1.22 behavior. Trade-off measured on the locked
golden set: gq-020 ("write unit tests") now retrieves
quickstarts/generate-tests.mdat top-1 (was
concepts/tool-fix-test.md— phantomtestalias hit);
gq-026 ("version bump and changelog") drops from top-1
(concepts/tool-release-prep.mdlost its single-token
versalias hit). Net Precision@1 unchanged at 0.975;
remaining miss is now a corpus-side alias-content
question rather than a documented embedding-gap.
Changed
-
KeywordRetrieverstemming: collapse-ity/-ities
to a shared stem._STEM_SUFFIXESnow includes"ities"
and"ity"(ordered before"ies"so plurals strip to the
same stem as the singular). Previouslyvulnerability
stayed unstemmed whilevulnerabilitiesstemmed to
vulnerabilitvia the"ies"suffix, so a singular query
token never overlapped the plural in summaries. Measured
delta on the locked 40-query golden set: Precision@1
lifted from 0.950 (38/40) to 0.975 (39/40) — gq-011
("vulnerability scan") now retrieves
concepts/tool-security-audit.mdat top-1 instead of
quickstarts/skill-security-audit.md. Recall@3 holds at
1.00 (40/40)._MIN_STEM_LEN = 3continues to protect
short tokens (city,pity,unityleft unchanged).
Rationale and per-query trace:
docs/specs/selection-criteria-robustness/proposal.md. -
Locked baseline lifted (20-run, with faithfulness).
docs/specs/release-quality-baseline/thresholds.json
re-measured after the changes above. P@1 floor lifts
0.950 → 0.975 (stdev 0); mean_faithfulness floor lifts
0.9686 → 0.9698 (mean 0.9801, stdev 0.0052); R@3 holds at
1.0.baseline-1.mdrefreshed with the new per-run table.