v0.4.1 — Mode B (custom batch download) docs-only release

ltczding-gif released this 01 Jun 09:17

Patch-level release. Zero Python code changed; pipeline behaves byte-identically to v0.4.0. The new "feature" is agent-side runbook routing only.

What changed

Agents reading the SKILL.md now handle two flows:

  • Mode A (original, unchanged): one paper in → all of its references out.
  • Mode B (new): a custom batch in → those papers downloaded.

Mode B accepts whatever the user has on hand:

  • DOIs in any wrapping: bare, {}, https://doi.org/..., full-width slash (／)
  • BibTeX / RIS files
  • Non-DOI identifiers: arXiv:, PMID:, Semantic Scholar IDs
  • Title lists ("download these papers: title1, title2, ...")
  • Abstract queries ("Wang 2024 Nature Energy papers on hydrogen evolution")
  • Mixed bags of the above

The agent resolves everything to DOIs through:

  • Step 0 canonicalize: BibTeX braces stripped, URL prefixes peeled, full-width slash → ASCII, Unicode punctuation normalized
  • B.0 non-DOI IDs: arXiv → 10.48550/arXiv.<id> (or journal DOI via Crossref), PMID → eutils, SS ID → SS API
  • B.1 DOI extract: regex character class excludes {}; <50% yield falls back to B.2
  • B.2 title → Crossref query.title with relative score ratio + token overlap (NOT absolute thresholds)
  • B.3 discovery ladder: Crossref → OpenAlex → Semantic Scholar (rate-limited, 30s backoff) → PubMed → WebSearch. Google Scholar scraping explicitly forbidden.

The confirm table is shown in full, with source and confidence on every row. Low-confidence rows are excluded by default. Append mode never renumbers existing entries, which preserves validate_refs.py's id-keyed incremental skip.
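
To illustrate why renumbering matters, here is a small sketch of an append that leaves existing ids alone. The entry shape (a dict with `id` and `doi` keys) and the function name `append_refs` are assumptions for illustration; the real refs_raw.json schema is not shown in these notes.

```python
def append_refs(existing: list[dict], new_dois: list[str]) -> list[dict]:
    """Append new DOIs with fresh ids; existing entries are never renumbered.

    Assumed entry shape: {"id": int, "doi": str}. Hypothetical, not the
    actual refs_raw.json schema.
    """
    known = {entry["doi"].lower() for entry in existing}
    next_id = max((entry["id"] for entry in existing), default=0) + 1
    merged = list(existing)              # existing entries kept as-is, in order
    for doi in new_dois:
        key = doi.lower()
        if key in known:                 # already present: skip, do not re-add
            continue
        merged.append({"id": next_id, "doi": key})
        known.add(key)
        next_id += 1
    return merged


round_one = [{"id": 1, "doi": "10.1/a"}, {"id": 3, "doi": "10.1/b"}]
round_two = append_refs(round_one, ["10.1/B", "10.1/c"])
print([entry["id"] for entry in round_two])
```

Because new ids continue from the current maximum instead of being reassigned by position, any cache keyed on id still points at the same reference after the second round.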

Why 0.4.1 not 0.5.0?

Strict semver would argue for a minor bump. But zero Python source changed, the pipeline is unchanged, and anyone running the scripts directly sees byte-identical behavior. A patch-level bump signals "no install / re-pip needed; just re-copy the skill folder if you want the new Mode B docs."

Reviewed

  • Opus design review caught: append-renumber would corrupt incremental skip; trigger table missed non-DOI IDs; B.2 absolute score thresholds; B.3 missing OpenAlex.
  • Codex (gpt-5.5 high effort, read-only, repo workspace) verified each script claim by file:line and refuted Opus's None"None" claim by reading validate_refs.py:309,367,379-380. Also found: preview-only-5 hides bad matches; BibTeX } leak; router not MECE for the ambiguous "Smith 2024 Nature paper" ask; full-width slash slips past the ASCII-only regex.
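
The B.2 finding above (absolute score thresholds) can be made concrete with a sketch of the relative check that replaced them. It assumes raw Crossref relevance scores are not comparable across queries, so the top hit is judged against the runner-up and against token overlap with the requested title. The function name `accept_top_hit` and both cutoff values are placeholders, not values taken from the design doc.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def accept_top_hit(query_title: str,
                   candidates: list[tuple[str, float]],
                   min_ratio: float = 1.5,      # placeholder cutoff
                   min_overlap: float = 0.8) -> bool:  # placeholder cutoff
    """candidates: (title, score) pairs sorted by score, best first."""
    if not candidates:
        return False
    top_title, top_score = candidates[0]
    runner_up = candidates[1][1] if len(candidates) > 1 else 0.0
    # Relative test: the best hit must clearly beat the second-best hit.
    ratio_ok = runner_up == 0 or top_score / runner_up >= min_ratio
    # Overlap test: share of query tokens that appear in the matched title.
    query = _tokens(query_title)
    overlap = len(query & _tokens(top_title)) / len(query) if query else 0.0
    return ratio_ok and overlap >= min_overlap


clear_win = accept_top_hit(
    "Hydrogen evolution on MoS2 edges",
    [("Hydrogen Evolution on MoS2 Edges", 80.0), ("Oxygen reduction on Pt", 30.0)],
)
near_tie = accept_top_hit(
    "Hydrogen evolution on MoS2 edges",
    [("Hydrogen Evolution on MoS2 Edges", 80.0), ("Hydrogen evolution on MoS2", 78.0)],
)
high_score_wrong_title = accept_top_hit(
    "Deep learning for proteins",
    [("Hydrogen Evolution on MoS2 Edges", 500.0), ("Unrelated", 10.0)],
)
print(clear_win, near_tie, high_score_wrong_title)
```

The third case is the one an absolute threshold would get wrong: a very high raw score on a title that shares no tokens with the request is still rejected.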

15 risks total, all folded into the final design. Full design at docs/plans/2026-05-28-mode-b-custom-batch-design.md.

Tested

13-test merged plan executed (3 parallel agents designed 29 tests, deduped to 13):

  • Tier 1 simulations (pure Python, no network): S1 heredoc shape · M3 BibTeX {} excluded from regex · M4 full-width slash canonicalization (with control proving raw regex misses it) · S5 canonical dedupe collapses case + URL-wrap dupes → 19/19 ✅
  • Tier 1 live scripts: S2 validate_refs.py accepts hand-built refs_raw.json with parent_doi="" (3/3 refs verified, publishers correctly classified) · S4 two-round append ("Verified: 3 (2 cached + 1 new)", proving incremental skip honors hand-built JSON; ids 1,2 byte-identical between rounds) · S6 wrapper refuses list-file + garbage with clear error → ✅
  • Tier 3 routing dispatch: fresh agent reading SKILL.md, 5 scenarios (Mode A canonical / single DOI / conversational refuse / mixed bag / ambiguous ask): 5/5 cite the matching trigger family row, no hallucination on "下次给你的那 5 篇" ("those 5 papers I'll give you next time")

pytest 10/10 still passes throughout.

Commits

  • 5b42464 — design doc (530 lines, post Opus + Codex review)
  • 61ca6d7 — SKILL.md restructure (Mode router top + Mode A subsection + Mode B subsection; +265 net lines)
  • 1cc6673 — release commit (CHANGELOG + version + README badges + Mode B bullet)

Known follow-ups (not blocking)

  • Semantic Scholar 429 honor-system pacing (skill instructs agent to pace but can't enforce)
  • B.3 discovery confidence ranking (currently hands all candidates to user; no auto top-N-by-citations)
  • Mode B project resumption across sessions (agent re-resolves from scratch each time)
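
For the first follow-up, this is a rough sketch of what enforced pacing could look like if it ever moved from the runbook into code. Nothing like it exists in this release. The helper name `fetch_with_backoff` and its parameters are hypothetical; only the 30 s backoff figure comes from the B.3 description. The `fetch` callable stands in for any HTTP call and returns a `(status, body)` pair, so the sketch runs without a network.

```python
import time
from typing import Callable, Optional


def fetch_with_backoff(fetch: Callable[[], tuple[int, str]],
                       retries: int = 3,
                       backoff_s: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call fetch(); on HTTP 429 wait backoff_s seconds and retry.

    Returns the body on 200, or None on any other status or once the
    retries are exhausted. Hypothetical helper, not part of the skill.
    """
    for attempt in range(retries + 1):
        status, body = fetch()
        if status != 429:
            return body if status == 200 else None
        if attempt < retries:
            sleep(backoff_s)             # fixed 30 s pause, as in B.3
    return None


# Simulated API: two rate-limit responses, then success.
responses = iter([(429, ""), (429, ""), (200, "ok")])
slept: list[float] = []
result = fetch_with_backoff(lambda: next(responses), sleep=slept.append)
print(result, slept)
```

Passing `sleep` as a parameter keeps the example testable; a real caller would leave the default `time.sleep`.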

Full notes: CHANGELOG.md