Skip to content

refactor: tighten scope (drop pdf_dir_summary + pdf_doc_open_url) + list summaries#37

Merged
billdenney merged 3 commits into
mainfrom
claude/scope-tighten
May 21, 2026
Merged

refactor: tighten scope (drop pdf_dir_summary + pdf_doc_open_url) + list summaries#37
billdenney merged 3 commits into
mainfrom
claude/scope-tighten

Conversation

@billdenney
Copy link
Copy Markdown
Member

Acts on a design-review Q&A pass. Three substantive changes plus
a CLAUDE.md scope clarification:

What changed

  • pdf_doc_open(path) accepts URLs. The path argument now
    auto-detects URL strings (any RFC 3986 scheme:// form) and
    routes them through base::url() + readBin() ahead of
    PDFium's in-memory loader. No scheme allowlist — whatever
    url() knows how to open is what works. The standalone
    pdf_doc_open_url() is removed; its tests move into
    test-document.R as a "URL paths" section.

  • pdf_dir_summary() removed. Its body was list.files() +
    a loop over pdf_doc_summary — base R primitives glued
    together with no PDFium-specific behaviour. Users wanting
    bulk triage can write the three-line loop themselves.

  • Six new summary.pdfium_*_list S3 methods (obj_list,
    annot_list, attachment_list, signature_list, bookmark_list,
    form_field_list). Each dispatches to the companion as_tibble.*
    method, so summary(x) matches the R idiom of print() for
    the one-liner and summary() for the deep-dive tibble.

  • CLAUDE.md gains a "Scope" section capturing the
    deletion-justification for the two retracted functions, so
    future contributors don't re-add R-only conveniences that
    don't wrap PDFium.

Validation

  • 2158/2158 tests pass
  • 100% R coverage (2851/2851 lines)
  • 0 lints
  • Both pre-commit reference hooks (pkgdown-reference-check and
    rd-xref-check) pass

🤖 Generated with Claude Code

billdenney and others added 3 commits May 21, 2026 18:58
Acts on a Q&A pass that surfaced four design questions about
post-v0.1.0 additions:

Q1+Q2: fold URL handling into pdf_doc_open(path)

The `path =` argument now auto-detects URL strings (anything
matching RFC 3986's `scheme://` form) and routes them through
`base::url()` + `readBin()` ahead of PDFium's existing in-memory
loader. No scheme allowlist — `url()` decides what it supports.

`pdf_doc_open_url()` is removed. Its tests move into
test-document.R as a "URL paths" section.

Q3: remove pdf_dir_summary()

Per the package's PDFium-wrapper mandate, a function whose body
is `list.files()` + a loop over an existing reader doesn't
belong in this package. Users with bulk-triage needs can write
the loop themselves in three lines.

The internal `pdf_doc_summary_empty()` helper (only consumer of
which was pdf_dir_summary's zero-row early-return) is removed.

Q4: add summary() S3 methods for every pdfium_*_list class

Six new one-line methods, one per list type
(pdfium_obj_list / annot_list / attachment_list / signature_list
/ bookmark_list / form_field_list). Each dispatches to its
companion `as_tibble.*` method so `summary(x)` returns the
standard tibble view, matching the R idiom of `print()` for
the one-line summary and `summary()` for the deep dive.

CLAUDE.md: new "Scope" section

Captures the deletion-justification for pdf_doc_open_url +
pdf_dir_summary in a durable place so future contributors don't
re-add R-only conveniences that don't wrap PDFium. The rule:
"every public function should ultimately call into PDFium, or
be unambiguously tied to PDF-format concepts". Filesystem
walking, generic network plumbing, and bulk/batch wrappers
explicitly don't fit.

Test stats after all four changes:
* Full suite 2158/2158 pass.
* R coverage 100% (2851/2851 lines).
* 0 lints; pkgdown reference check + Rd xref check both pass.

NEWS.md folds the new pdf_doc_open URL behaviour into the
"Documents and pages" section and adds a "Scope retraction"
block explaining what was removed and why.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The comparison-vignette row and README "Authoring" bullet both
hedged with "small PDFs", which conflates two different things:

* Scope (page count + objects-per-page + vector complexity) —
  pdfium scales linearly here, no real ceiling.
* Asset richness (image embedding, custom fonts, encryption,
  doc-level metadata writes) — pdfium hits real walls in v0.1.0.

The original "small" qualifier was a defensive hedge that's
unhelpfully vague. Replace it with the actual constraint list.

`vignettes/comparison.Rmd`:
* The TL;DR-table row drops "small" and the (today-inaccurate)
  "images" claim. Points users at `minipdf` if they need image
  embedding now.
* New subsection "4. Programmatic PDF authoring (with v0.1.0
  limits)" makes the gap inventory explicit: image embedding,
  custom fonts, /Info writes, encryption-on-save. Each row in
  the limits table names the missing PDFium symbol and a
  workaround (use minipdf, render-as-paths, xmpdf, qpdf
  respectively), so readers picking a tool can decide quickly
  whether the gap matters for their workflow.

`README.Rmd` / `README.md`:
* The "Authoring" bullet now reads
  "v0.1.0 ships paths / text in the 14 standard PDF fonts /
  annotations; image embedding and custom-font loading come
  in a later release."

No code change; vignette + README only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous "v0.1.0 limits" table treated all four authoring gaps
as one undifferentiated list, which hid the most actionable
distinction:

* Two of the four (image embedding + custom fonts) are gaps in
  this package's wrapping — PDFium already exposes the public
  symbols (FPDFImageObj_LoadJpegFile / SetBitmap;
  FPDFText_LoadFont / LoadStandardFont / LoadCidType2Font), so
  closure timing is entirely in our hands.
* Two of the four (/Info dict writes + encryption on save) are
  gaps in PDFium itself — the symbols (FPDF_SetMetaText,
  FPDF_SetEncryption) don't exist upstream. We've drafted the
  first as a Gerrit patch and tracked the second as CL 5 in
  dev/upstream-api-gaps.md, but closure depends on Google's
  review cycle, a PDFium release, and a bblanchon binary
  rebuild — not our schedule.

A reader picking the package needs to know which planning horizon
applies. The vignette table now splits into two named sub-tables
("Blocked on this package's roadmap" vs "Blocked on upstream
PDFium") with the missing symbols spelled out and per-row
workarounds preserved.

The README "Authoring" bullet gets a one-sentence summary of the
same distinction so the README and vignette align.

No code change; vignette + README only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@billdenney billdenney merged commit 2ef8a3c into main May 21, 2026
13 checks passed
@billdenney billdenney deleted the claude/scope-tighten branch May 21, 2026 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant