Patch-level release. Zero Python code changed; pipeline behaves byte-identically to v0.4.0. The new "feature" is agent-side runbook routing only.
What changed
Agents reading the SKILL.md now handle two flows:
- Mode A (original, unchanged): one paper in → all of its references out.
- Mode B (new): a custom batch in → those papers downloaded.
Mode B accepts whatever the user has on hand:
- DOIs in any wrapping: bare, {}, https://doi.org/..., full-width slash (／)
- BibTeX / RIS files
- Non-DOI identifiers: arXiv:, PMID:, Semantic Scholar IDs
- Title lists ("download these papers: title1, title2, ...")
- Abstract queries ("Wang 2024 Nature Energy papers on hydrogen evolution")
- Mixed bags of the above
The agent resolves everything to DOIs through:
- Step 0 canonicalize: BibTeX braces stripped, URL prefixes peeled, full-width slash → ASCII, Unicode punctuation normalized
- B.0 non-DOI IDs: arXiv → 10.48550/arXiv.<id> (or journal DOI via Crossref), PMID → eutils, SS ID → SS API
- B.1 DOI extract: regex character class excludes {}; <50% yield falls back to B.2
- B.2 title → Crossref query.title with relative score ratio + token overlap (NOT absolute thresholds)
- B.3 discovery ladder: Crossref → OpenAlex → Semantic Scholar (rate-limited, 30s backoff) → PubMed → WebSearch. Google Scholar scraping explicitly forbidden.
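Step 0 and B.1 can be sketched in pure Python. This is a minimal illustration, not the runbook's actual wording: the names canonicalize, extract_dois and needs_title_fallback are hypothetical, the regex is one plausible shape for a character class that excludes {}, and using NFKC for the full-width slash is an assumption about how the normalization is done.

```python
import re
import unicodedata

# Hypothetical names throughout; SKILL.md describes these steps in prose.
URL_PREFIX = re.compile(r"^(?:https?://)?(?:dx\.)?doi\.org/", re.IGNORECASE)
# The character class excludes whitespace, quotes, angle brackets and {}
# so a BibTeX closing brace cannot leak into the match.
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s{}\"<>]+")
# NFKC does not fold every Unicode dash, so map a few explicitly.
DASHES = {0x2010: "-", 0x2011: "-", 0x2013: "-"}


def canonicalize(raw: str) -> str:
    """Step 0: normalize one input string before any regex runs."""
    # NFKC folds full-width forms such as U+FF0F into ASCII "/".
    text = unicodedata.normalize("NFKC", raw).translate(DASHES).strip()
    text = text.strip("{}").strip()      # BibTeX braces around a bare DOI
    text = URL_PREFIX.sub("", text)      # peel https://doi.org/ prefixes
    return text.lower()                  # DOIs compare case-insensitively


def extract_dois(lines):
    """B.1: return (found DOIs, lines with no DOI)."""
    found, misses = [], []
    for line in lines:
        match = DOI_RE.search(canonicalize(line))
        if match:
            found.append(match.group(0).rstrip(".,;"))
        else:
            misses.append(line)
    return found, misses


def needs_title_fallback(found, total, min_yield=0.5):
    """True when fewer than half the inputs yielded a DOI (hand off to B.2)."""
    return total > 0 and len(found) / total < min_yield
```

Running the raw regex on an uncanonicalized string with a full-width slash finds nothing, which is why canonicalization has to happen before extraction rather than after.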
The confirmation table is shown in full, with source and confidence for every row. Low-confidence rows are excluded by default. Append mode never renumbers existing entries, which preserves validate_refs.py's id-keyed incremental skip.
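The append rule is easiest to see as a merge that only ever adds ids past the current maximum. The sketch below is an assumption-laden illustration: the field names (id, doi, source, confidence), the "low" label and the function name append_refs are hypothetical, not the actual refs_raw.json schema.

```python
def append_refs(existing, candidates):
    """Merge new candidates without touching existing entries or their ids."""
    known = {entry["doi"].lower() for entry in existing}
    next_id = max((entry["id"] for entry in existing), default=0) + 1
    merged = list(existing)  # existing rows are carried over unchanged
    for cand in candidates:
        doi = cand["doi"].lower()
        # Skip low-confidence rows (the default) and DOIs already present.
        if cand.get("confidence") == "low" or doi in known:
            continue
        merged.append({
            "id": next_id,
            "doi": doi,
            "source": cand["source"],
            "confidence": cand["confidence"],
        })
        known.add(doi)
        next_id += 1
    return merged
```

Because ids 1..n are never reassigned, a validator that skips entries by id sees the earlier rows as already processed and only works on the new ones.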
Why 0.4.1 not 0.5.0?
Strict semver would argue minor. But zero Python source changed, the pipeline is unchanged, and anyone running the scripts directly sees byte-identical behavior. Patch-level signals "no install / re-pip needed; just re-copy the skill folder if you want the new Mode B docs."
Reviewed
- Opus design review caught: append-renumber would corrupt incremental skip; trigger table missed non-DOI IDs; B.2 absolute score thresholds; B.3 missing OpenAlex.
- Codex (gpt-5.5 high effort, read-only, repo workspace) verified each script claim by file:line and refuted Opus's None→"None" claim by reading validate_refs.py:309,367,379-380. Also found: preview-only-5 hides bad matches; BibTeX } leak; router not MECE for "Smith 2024 Nature paper" ambiguity; full-width slash misses ASCII regex.
15 risks total, all folded into the final design. Full design at docs/plans/2026-05-28-mode-b-custom-batch-design.md.
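The B.2 finding (relative score ratio plus token overlap instead of absolute thresholds) can be illustrated as follows. This is a sketch under stated assumptions: candidates are taken to be already-fetched results flattened to dicts with title and score keys, pick_match is a hypothetical name, and the 1.5 ratio and 0.8 overlap cut-offs are placeholders rather than the values in the design doc.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_overlap(query, candidate):
    """Fraction of query tokens that also appear in the candidate title."""
    q = tokens(query)
    return len(q & tokens(candidate)) / len(q) if q else 0.0


def pick_match(query, candidates, min_ratio=1.5, min_overlap=0.8):
    """Accept the top hit only if it clearly beats the runner-up AND
    shares most of the query's tokens; otherwise return None."""
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    if not ranked:
        return None
    top = ranked[0]
    runner_up = ranked[1]["score"] if len(ranked) > 1 else 0.0
    # Relative test: scores are only comparable within one result set.
    ratio_ok = runner_up == 0 or top["score"] / runner_up >= min_ratio
    overlap_ok = token_overlap(query, top["title"]) >= min_overlap
    return top if ratio_ok and overlap_ok else None
```

A relative test is used because search scores vary with query length and wording, so a fixed cut-off that works for one title can reject or accept the wrong hit for another.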
Tested
13-test merged plan executed (3 parallel agents designed 29 tests, deduped to 13):
- Tier 1 simulations (pure Python, no network): S1 heredoc shape · M3 BibTeX {} excluded from regex · M4 full-width slash canonicalization (with control proving raw regex misses it) · S5 canonical dedupe collapses case + URL-wrap dupes → 19/19 ✅
- Tier 1 live scripts: S2 validate_refs.py accepts hand-built refs_raw.json with parent_doi="" (3/3 refs verified, publishers correctly classified) · S4 two-round append (Verified: 3 (2 cached + 1 new) proving incremental skip honors hand-built JSON; ids 1,2 byte-identical between rounds) · S6 wrapper refuses list-file + garbage with clear error → ✅
- Tier 3 routing dispatch: fresh agent reading SKILL.md, 5 scenarios (Mode A canonical / single DOI / conversational refuse / mixed bag / ambiguous ask): 5/5 cite the matching trigger family row, no hallucination on 下次给你的那 5 篇 ("those 5 papers I'll give you next time") ✅
pytest 10/10 still passes throughout.
Commits
- 5b42464: design doc (530 lines, post Opus + Codex review)
- 61ca6d7: SKILL.md restructure (Mode router top + Mode A subsection + Mode B subsection; +265 net lines)
- 1cc6673: release commit (CHANGELOG + version + README badges + Mode B bullet)
Known follow-ups (not blocking)
- Semantic Scholar 429 honor-system pacing (skill instructs agent to pace but can't enforce)
- B.3 discovery confidence ranking (currently hands all candidates to user; no auto top-N-by-citations)
- Mode B project resumption across sessions (agent re-resolves from scratch each time)
Full notes: CHANGELOG.md