Previously, when a bundle's pdf_hash matched an existing Paper, ingest()
early-returned without updating the Ref's header-derived fields. This left
refs permanently stuck with stale garbage metadata (e.g. title = InDesign
filename "nmat1849 Geim Progress Article.indd") even after a verified
bundle with the correct metadata was re-ingested.

Fix:
- Add _should_upgrade_ref() policy: upgrade when there is no Paper yet, or
  the existing Paper is unverified and the new bundle is verified, or the
  current title is garbage (acatome_meta.is_garbage_title) and the new
  title is clean.
- Add _refresh_ref_metadata(): always fill blanks; on upgrade, overwrite
  stale non-null scalar fields (doi, s2_id, arxiv_id, title, authors, year,
  journal, entry_type, source). Flip Paper.verified on upgrade.
- Route both _upsert_ref() and the ingest() pdf_hash-dedup early-return
  through the new refresh helper.
- Upgrade the slug on the ingest() dedup path, guarded by a collision check.
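For reviewers, the policy and refresh rules above can be sketched roughly as follows. This is a simplified illustration, not the committed code: PaperState, the dict-based ref, the function signatures, and the stand-in is_garbage_title heuristic are all assumptions made so the sketch runs on its own. It also assumes fill-blanks applies to the same scalar fields listed above.

```python
from dataclasses import dataclass
from typing import Optional

# Scalar fields named in the commit message as overwritable on upgrade.
UPGRADABLE_FIELDS = (
    "doi", "s2_id", "arxiv_id", "title", "authors",
    "year", "journal", "entry_type", "source",
)


def is_garbage_title(title: Optional[str]) -> bool:
    """Stand-in for acatome_meta.is_garbage_title (real heuristic may differ)."""
    if not title:
        return True
    return title.lower().endswith((".indd", ".pdf", ".doc", ".docx"))


@dataclass
class PaperState:
    """Hypothetical minimal view of a Paper row."""
    verified: bool


def should_upgrade_ref(
    existing_paper: Optional[PaperState],
    current_title: Optional[str],
    new_verified: bool,
    new_title: Optional[str],
) -> bool:
    """Return True if the incoming bundle may overwrite stale metadata."""
    if existing_paper is None:
        return True  # no Paper yet
    if not existing_paper.verified and new_verified:
        return True  # unverified -> verified
    # garbage -> clean title
    return is_garbage_title(current_title) and not is_garbage_title(new_title)


def refresh_ref_metadata(ref: dict, incoming: dict, upgrade: bool) -> dict:
    """Fill blank fields always; overwrite non-null fields only on upgrade."""
    updated = dict(ref)
    for name in UPGRADABLE_FIELDS:
        new_value = incoming.get(name)
        if new_value is None:
            continue  # never replace a value with a blank
        if updated.get(name) is None or upgrade:
            updated[name] = new_value
    return updated
```

With this sketch, a ref titled "nmat1849 Geim Progress Article.indd" is upgraded by any bundle carrying a clean title, while a verified ref with a clean title only has its blank fields filled by an unverified bundle.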

Safety:
- Verified refs cannot be clobbered by later unverified bundles.
- A slug is never overwritten with a value that collides with another ref.
- User-curated keywords/tags are left untouched.
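The slug guarantee can be illustrated with a small sketch. The helper name pick_slug, the set of taken slugs, and the choice to keep the old slug on collision are assumptions for illustration; the message only states that a collision is never written.

```python
def pick_slug(current_slug: str, candidate_slug: str, taken_slugs: set) -> str:
    """Return the slug to store: the candidate only if it is free."""
    if candidate_slug == current_slug:
        return current_slug  # nothing to change
    if candidate_slug in taken_slugs:
        return current_slug  # assumed behaviour: keep the old slug on collision
    return candidate_slug


# Illustrative values taken from the regression case named in this message.
print(pick_slug("simpson2007nmat", "geim2007rise", {"simpson2007nmat"}))
print(pick_slug("simpson2007nmat", "geim2007rise", {"simpson2007nmat", "geim2007rise"}))
```

The first call would adopt the new slug because it is free; the second keeps the stale slug because another ref already owns the candidate.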

Tests: 8 new parametrized (sqlite + postgres) cases covering
garbage→clean replacement, fill-blanks on verified re-ingest,
verified-preservation, and slug collision on upgrade.

Regression: matches the 'simpson2007nmat / geim2007rise' case I hit when
re-ingesting nmat1849.pdf after the acatome-meta 0.3.6 filename-DOI fix.

Also: bump acatome-meta floor to 0.3.6 (imports is_garbage_title).

style: ruff format