v0.8.1

retospect tagged this 24 Apr 12:48
Previously, when a bundle's pdf_hash matched an existing Paper, ingest()
early-returned without updating the Ref's header-derived fields. This left
refs permanently stuck with stale garbage metadata (e.g. title = InDesign
filename "nmat1849 Geim Progress Article.indd") even after a verified
bundle with the correct metadata was re-ingested.

Fix:
- Add _should_upgrade_ref() policy: upgrade when no Paper yet, or existing
  Paper is unverified and new is verified, or current title is garbage
  (acatome_meta.is_garbage_title) and new title is clean.
- Add _refresh_ref_metadata(): fill blanks always; on upgrade, overwrite
  stale non-null scalar fields (doi, s2_id, arxiv_id, title, authors, year,
  journal, entry_type, source). Flip Paper.verified on upgrade.
- Route both _upsert_ref() and the ingest() pdf_hash-dedup early-return
  through the new refresh helper.
- Upgrade the slug on the ingest() dedup path, guarded by a collision check.
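
Sketch of the policy and refresh steps listed above. This is a minimal
illustration, not the code shipped in this release: the Ref dataclass, the
trimmed field list, the function names, and looks_like_garbage_title() (a
stand-in for acatome_meta.is_garbage_title, whose real heuristic is not shown
here) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


def looks_like_garbage_title(title: Optional[str]) -> bool:
    """Hypothetical stand-in for acatome_meta.is_garbage_title."""
    if not title:
        return False  # blanks are handled by the fill-blanks rule instead
    return title.lower().endswith((".indd", ".pdf", ".doc", ".docx"))


@dataclass
class Ref:
    # Simplified model: the real Ref/Paper split is collapsed into one record.
    title: Optional[str] = None
    doi: Optional[str] = None
    year: Optional[int] = None
    paper_exists: bool = False
    verified: bool = False


# Trimmed for the example; the release lists more header-derived fields.
SCALAR_FIELDS = ("doi", "title", "year")


def should_upgrade(ref: Ref, new_title: Optional[str], new_verified: bool) -> bool:
    """Upgrade when there is no Paper yet, when an unverified record meets a
    verified bundle, or when a garbage title meets a clean one."""
    if not ref.paper_exists:
        return True
    if not ref.verified and new_verified:
        return True
    return looks_like_garbage_title(ref.title) and not looks_like_garbage_title(new_title)


def refresh(ref: Ref, header: dict, new_verified: bool) -> bool:
    """Always fill blank fields; overwrite non-null fields only on upgrade."""
    upgrade = should_upgrade(ref, header.get("title"), new_verified)
    for name in SCALAR_FIELDS:
        new_value = header.get(name)
        if new_value is None:
            continue  # never blank out an existing value
        if getattr(ref, name) is None or upgrade:
            setattr(ref, name, new_value)
    if upgrade and new_verified:
        ref.verified = True  # assumed reading of "flip Paper.verified on upgrade"
    return upgrade
```

With this sketch, a verified record keeps its title when an unverified bundle
arrives, but still has its blank fields filled in.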

Safety:
- Verified refs cannot be clobbered by later unverified bundles.
- Slug never overwritten into a collision.
- User-curated keywords/tags untouched.
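
Sketch of the slug collision rule. The pick_slug() helper and the in-memory
set of taken slugs are hypothetical; the real check presumably queries the
store. The slug strings are borrowed from the regression note below purely as
sample values.

```python
def pick_slug(current: str, candidate: str, taken: set[str]) -> str:
    """Return the slug to keep: the candidate if it is free, else the current one."""
    if candidate == current:
        return current
    if candidate in taken:
        # Another ref already owns the candidate slug, so keep the old one
        # rather than overwrite into a collision.
        return current
    return candidate
```

For example, pick_slug("simpson2007nmat", "geim2007rise", taken=set()) moves to
the new slug, while the same call with "geim2007rise" already in taken leaves
the slug unchanged.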

Tests: 8 new parametrized (sqlite + postgres) cases covering
garbage→clean replacement, fill-blanks on verified re-ingest,
verified-preservation, and slug collision on upgrade.

Regression: matches the 'simpson2007nmat / geim2007rise' case I hit when
re-ingesting nmat1849.pdf after the acatome-meta 0.3.6 filename-DOI fix.

Also: bump acatome-meta floor to 0.3.6 (imports is_garbage_title).
style: ruff format