Presence-claim audit workflow + README/lead-paragraph scoring fixes #25
…ixes

Adds an external dynamic-workflow audit that independently re-checks the portfolio-truth snapshot's six presence claims against on-disk ground truth, and uses it to find and verify two scoring fixes in the auditor itself.

Audit (read-only):
- src/run_instructions_audit.py: deterministic pre-step (stratified pilot selection, evidence prep, live tool_today recompute, git drift) + bucket logic
- scripts/presence-claims-audit.workflow.js: Workflow that fans out one Haiku verifier per repo (judging all 6 claims), deterministic tally, Sonnet synthesis
- scripts/run-instructions-audit.workflow.js: original single-claim version, superseded by presence-claims (kept as the simpler example)

Auditor fixes in analyze_project_context, both verified by the audit:
- README fallback: presence claims now consider the top-level README, not only the primary context file (wires the previously-dormant readme_text param)
- lead-paragraph fallback: a project summary is detected as the prose under the H1 title, not only under an "## Overview" section

Verified deterministically on a 16-repo pilot: overall agreement 79% -> 90%, project_summary 75% -> 100%, stack 75% -> 100%. Adds direct unit coverage for analyze_project_context (previously untested). Full suite: 2091 passed.

Canonical portfolio-truth-latest.json intentionally NOT regenerated (the fixes shift context_quality portfolio-wide; that actualization is a separate step).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7c101c6058
    if readme_path.is_file():
        readme_text = _read_small_text(readme_path)
Keep README fallback within context-file limits
When a top-level README.md exists but was excluded by _collect_context_files because it exceeds MAX_CONTEXT_BYTES, this fallback still reads the entire file and lets it drive all presence claims and context_quality. In the normal discovery path, that means oversized READMEs that the collector intentionally filtered out can now be scored as valid context, and the read is no longer bounded; gate this fallback on README.md being in context_file_names or apply the same size limit before reading.
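A minimal sketch of the size gate suggested here, assuming the collector's limit is a plain byte count. The function name read_readme_fallback and the MAX_BYTES value are hypothetical stand-ins; the PR's actual _read_small_text and MAX_CONTEXT_BYTES definitions are not shown in this thread.

```python
from pathlib import Path
from typing import Optional

# Hypothetical limit for illustration; the real MAX_CONTEXT_BYTES value is not shown here.
MAX_BYTES = 64 * 1024


def read_readme_fallback(repo_root: Path, max_bytes: int = MAX_BYTES) -> Optional[str]:
    """Return README.md text only if the file exists and fits within the size limit."""
    readme_path = repo_root / "README.md"
    if not readme_path.is_file():
        return None
    # Check the size before reading so an oversized README is never loaded.
    if readme_path.stat().st_size > max_bytes:
        return None
    return readme_path.read_text(encoding="utf-8", errors="replace")
```

Returning None for an oversized file keeps the fallback consistent with a collector that already excluded it, so the README cannot drive presence claims the collector chose to ignore.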
    def _has_lead_summary(text: str) -> bool:
        return _is_nontrivial_text(_lead_paragraph_text(text))
Don't treat list-only leads as summaries
For READMEs whose area before the first ## is only a table of contents or navigation list, _lead_paragraph_text preserves the link/list text and _is_nontrivial_text will mark project_summary_present=True once it has four words. That over-claims the project summary and can promote context_quality even though there is no prose saying what the project is; strip list-only/TOC leads or require at least one non-list prose sentence before accepting the fallback.
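One way to require prose before accepting the fallback is sketched below. It counts only words on non-list lines between the title and the first ## heading. The helper names lead_prose_words and has_prose_lead are hypothetical and do not appear in the PR; the four-word threshold mirrors the one mentioned above.

```python
import re

# Matches bullet ("-", "*", "+") and numbered ("1.", "2)") list items.
_LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")


def lead_prose_words(text: str) -> int:
    """Count words on non-list, non-heading lines before the first '## ' heading."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            break  # the lead area ends at the first second-level heading
        if not stripped or stripped.startswith("#") or _LIST_ITEM.match(line):
            continue  # skip blank lines, heading lines, and list/TOC entries
        count += len(stripped.split())
    return count


def has_prose_lead(text: str, min_words: int = 4) -> bool:
    """True only when the lead holds at least min_words of non-list prose."""
    return lead_prose_words(text) >= min_words
```

With this check, a lead made up entirely of navigation links contributes zero words, so it cannot set project_summary_present on its own.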
What this is
An external dynamic-workflow audit that independently re-checks the portfolio-truth snapshot's six presence claims against on-disk ground truth — and uses it to find and verify two scoring fixes in the auditor itself. Read-only; never mutates repos, the snapshot, or git.
The audit (read-only)
- src/run_instructions_audit.py: deterministic pre-step covering stratified pilot selection, evidence prep, live tool_today recompute, git-drift detection, and the bucket logic (TDD'd).
- scripts/presence-claims-audit.workflow.js: the Workflow. It fans out one Haiku verifier per repo (judging all 6 claims in one read, blind to the tool's answer), then runs a deterministic per-(repo, claim) tally, then one Sonnet synthesis.
- scripts/run-instructions-audit.workflow.js: original single-claim version, superseded by presence-claims (kept as the simpler example).
The fixes (analyze_project_context), both verified by the audit
- README fallback: presence claims now consider the top-level README.md, not only the primary context file. Wires the previously-dormant readme_text parameter; primary-file identity unchanged (surgical).
- Lead-paragraph fallback: a project summary is detected as the prose under the # Title, not only under an ## Overview section.
Verification (deterministic: verifier verdicts held constant, only tool_today recomputed)
On the 16-repo pilot, overall agreement goes from 79% to 90%; stack and project_summary reach 100% (each from 75%). Adds the first direct unit coverage for analyze_project_context (it had none).
Test plan
- pytest -q: 2091 passed, 2 skipped, 0 regressions
- ruff check clean
Notes / out of scope
- portfolio-truth-latest.json is not regenerated: these fixes shift context_quality portfolio-wide; that actualization (plus merge-gate/tier review) is a separate step.
- AGENTS.md-generator fence, a boilerplate-vs-real judgment, an auditor-audits-itself branch confound, and deferred bespoke-heading cases.
- docs/plans/2026-05-29-run-instructions-external-audit.md.
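The deterministic tally described under Verification compares, for each (repo, claim) pair, the verifier's verdict with the tool's recomputed answer. A minimal sketch of that comparison follows. The record shape, the function name agreement_by_claim, and the sample values are hypothetical and are not taken from the workflow or the pilot data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record: (repo, claim, verifier_says_present, tool_says_present).
Record = Tuple[str, str, bool, bool]


def agreement_by_claim(records: Iterable[Record]) -> Dict[str, float]:
    """Return, per claim, the fraction of records where verifier and tool agree."""
    agree: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for _repo, claim, verifier, tool in records:
        total[claim] += 1
        if verifier == tool:
            agree[claim] += 1
    return {claim: agree[claim] / total[claim] for claim in total}


# Made-up sample data for illustration only.
sample = [
    ("repo-a", "stack", True, True),
    ("repo-b", "stack", True, False),
    ("repo-a", "project_summary", False, False),
    ("repo-b", "project_summary", True, True),
]
rates = agreement_by_claim(sample)
```

Holding the verifier column fixed and recomputing only the tool column before and after a fix means any change in these rates comes from the auditor change alone.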