refactor: tighten scope (drop pdf_dir_summary + pdf_doc_open_url) + list summaries#37
Merged
Conversation
Acts on a Q&A pass that surfaced four design questions about post-v0.1.0 additions: Q1+Q2: fold URL handling into pdf_doc_open(path) The `path =` argument now auto-detects URL strings (anything matching RFC 3986's `scheme://` form) and routes them through `base::url()` + `readBin()` ahead of PDFium's existing in-memory loader. No scheme allowlist — `url()` decides what it supports. `pdf_doc_open_url()` is removed. Its tests move into test-document.R as a "URL paths" section. Q3: remove pdf_dir_summary() Per the package's PDFium-wrapper mandate, a function whose body is `list.files()` + a loop over an existing reader doesn't belong in this package. Users with bulk-triage needs can write the loop themselves in three lines. The internal `pdf_doc_summary_empty()` helper (only consumer of which was pdf_dir_summary's zero-row early-return) is removed. Q4: add summary() S3 methods for every pdfium_*_list class Six new one-line methods, one per list type (pdfium_obj_list / annot_list / attachment_list / signature_list / bookmark_list / form_field_list). Each dispatches to its companion `as_tibble.*` method so `summary(x)` returns the standard tibble view, matching the R idiom of `print()` for the one-line summary and `summary()` for the deep dive. CLAUDE.md: new "Scope" section Captures the deletion-justification for pdf_doc_open_url + pdf_dir_summary in a durable place so future contributors don't re-add R-only conveniences that don't wrap PDFium. The rule: "every public function should ultimately call into PDFium, or be unambiguously tied to PDF-format concepts". Filesystem walking, generic network plumbing, and bulk/batch wrappers explicitly don't fit. Test stats after all four changes: * Full suite 2158/2158 pass. * R coverage 100% (2851/2851 lines). * 0 lints; pkgdown reference check + Rd xref check both pass. NEWS.md folds the new pdf_doc_open URL behaviour into the "Documents and pages" section and adds a "Scope retraction" block explaining what was removed and why. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The comparison-vignette row and README "Authoring" bullet both hedged with "small PDFs", which conflates two different things: * Scope (page count + objects-per-page + vector complexity) — pdfium scales linearly here, no real ceiling. * Asset richness (image embedding, custom fonts, encryption, doc-level metadata writes) — pdfium hits real walls in v0.1.0. The original "small" qualifier was a defensive hedge that's unhelpfully vague. Replace it with the actual constraint list. `vignettes/comparison.Rmd`: * The TL;DR-table row drops "small" and the (today-inaccurate) "images" claim. Points users at `minipdf` if they need image embedding now. * New subsection "4. Programmatic PDF authoring (with v0.1.0 limits)" makes the gap inventory explicit: image embedding, custom fonts, /Info writes, encryption-on-save. Each row in the limits table names the missing PDFium symbol and a workaround (use minipdf, render-as-paths, xmpdf, qpdf respectively), so readers picking a tool can decide quickly whether the gap matters for their workflow. `README.Rmd` / `README.md`: * The "Authoring" bullet now reads "v0.1.0 ships paths / text in the 14 standard PDF fonts / annotations; image embedding and custom-font loading come in a later release." No code change; vignette + README only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous "v0.1.0 limits" table treated all four authoring gaps
as one undifferentiated list, which hid the most actionable
distinction:
* Two of the four (image embedding + custom fonts) are gaps in
this package's wrapping — PDFium already exposes the public
symbols (FPDFImageObj_LoadJpegFile / SetBitmap;
FPDFText_LoadFont / LoadStandardFont / LoadCidType2Font), so
closure timing is entirely in our hands.
* Two of the four (/Info dict writes + encryption on save) are
gaps in PDFium itself — the symbols (FPDF_SetMetaText,
FPDF_SetEncryption) don't exist upstream. We've drafted the
first as a Gerrit patch and tracked the second as CL 5 in
dev/upstream-api-gaps.md, but closure depends on Google's
review cycle, a PDFium release, and a bblanchon binary
rebuild — not our schedule.
A reader picking the package needs to know which planning horizon
applies. The vignette table now splits into two named sub-tables
("Blocked on this package's roadmap" vs "Blocked on upstream
PDFium") with the missing symbols spelled out and per-row
workarounds preserved.
The README "Authoring" bullet gets a one-sentence summary of the
same distinction so the README and vignette align.
No code change; vignette + README only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Acts on a design-review Q&A pass. Three substantive changes plus
a CLAUDE.md scope clarification:
What changed
pdf_doc_open(path)accepts URLs. Thepathargument nowauto-detects URL strings (any RFC 3986
scheme://form) androutes them through
base::url()+readBin()ahead ofPDFium's in-memory loader. No scheme allowlist — whatever
url()knows how to open is what works. The standalonepdf_doc_open_url()is removed; its tests move intotest-document.Ras a "URL paths" section.pdf_dir_summary()removed. Its body waslist.files()+a loop over
pdf_doc_summary— base R primitives gluedtogether with no PDFium-specific behaviour. Users wanting
bulk triage can write the three-line loop themselves.
Six new
summary.pdfium_*_listS3 methods (obj_list,annot_list, attachment_list, signature_list, bookmark_list,
form_field_list). Each dispatches to the companion
as_tibble.*method, so
summary(x)matches the R idiom ofprint()forthe one-liner and
summary()for the deep-dive tibble.CLAUDE.mdgains a "Scope" section capturing thedeletion-justification for the two retracted functions, so
future contributors don't re-add R-only conveniences that
don't wrap PDFium.
Validation
rd-xref-check) pass
🤖 Generated with Claude Code