Desktop: Resolves #9794: Plugin API: Add support for loading PDFs with the imaging API #10177

personalizedrefrigerator · 2024-03-21T16:37:18Z

Summary

Resolves #9794.

Adds a plugin API to convert PDFs to images.

Specifically, it adds the following:

joplin.imaging.createFromPdfResource and joplin.imaging.createFromPdfPath: Create images from a PDF resource or file (one per page). Return a handle to each page.

It also makes the following changes to existing methods:

joplin.imaging.createFromResource and joplin.imaging.createFromPath: If given a PDF resource or a path to a PDF file (detected by the file extension), return an image of its first page.
joplin.imaging.free: Support freeing an array of image handles.
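Accepting either one handle or an array in free mostly comes down to normalizing the argument first. A minimal sketch, assuming handles are plain strings; the Handle alias, the images registry and the freeImages function are hypothetical stand-ins, not Joplin's actual implementation:

```typescript
// Hypothetical alias: image handles are assumed to be opaque strings.
type Handle = string;

// Hypothetical registry standing in for the native images the plugin API keeps alive.
const images = new Map<Handle, unknown>();

// Normalizes a single handle or an array of handles into an array.
function toHandleArray(handles: Handle | Handle[]): Handle[] {
	return Array.isArray(handles) ? handles : [handles];
}

// Frees one handle or many; handles that are not registered are ignored.
function freeImages(handles: Handle | Handle[]): void {
	for (const handle of toHandleArray(handles)) {
		images.delete(handle);
	}
}
```

With this shape, existing callers that pass a single handle keep working, and a plugin can free every page returned by a PDF conversion in one call.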

Testing

This pull request updates the imaging example plugin to use the new APIs. To manually test this pull request using that plugin:

Open a note that contains a PDF link in the markdown editor.
Select the PDF link.
Click on the second gear icon in the toolbar.
Verify that images for the first 10 pages are inserted into the note.


laurent22 · 2024-03-23T14:48:09Z

packages/app-desktop/services/plugins/PlatformImplementation.ts

-			nativeImage: nativeImage,
+			nativeImage: {
+				async createFromPath(path: string) {
+					if (path.endsWith('.pdf') || path.endsWith('.PDF')) {


Or path.toLowerCase().endsWith('.pdf')? That would also handle the edge case where the PDF is named with a .Pdf extension.
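The suggested check as a standalone helper (the name isPdfPath is hypothetical). Lower-casing the whole path before comparing makes .pdf, .PDF and .Pdf all match, while only the final extension is considered:

```typescript
// Case-insensitive check for a ".pdf" extension at the end of the path.
function isPdfPath(path: string): boolean {
	return path.toLowerCase().endsWith('.pdf');
}
```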

laurent22 · 2024-03-23T14:52:20Z

packages/lib/services/plugins/api/JoplinImaging.ts

+	/**
+	 * The first page to export. Defaults to `1`, the first page in
+	 * the document.
+	 */
+	minPage?: number;
+
+	/**
+	 * The number of the last page to convert. Defaults to the last page
+	 * if not given.
+	 *
+	 * If `maxPage` is greater than the number of pages in the PDF, all pages
+	 * in the PDF will be converted to images.
+	 */
+	maxPage?: number;
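The page-range semantics documented above (1-based, minPage defaulting to 1, maxPage defaulting to the last page and allowed to exceed it) can be illustrated with a small resolver. This is a sketch only: resolvePageNumbers is hypothetical, and clamping minPage to 1 is an assumption not stated in the diff:

```typescript
// Options mirroring the minPage/maxPage fields documented above.
interface PdfPageRangeOptions {
	minPage?: number;
	maxPage?: number;
}

// Hypothetical helper: resolves the 1-based, inclusive page numbers to convert.
// minPage defaults to 1 (and is assumed to be clamped to 1);
// maxPage defaults to pageCount and is clamped to it, so a large maxPage converts every remaining page.
function resolvePageNumbers(pageCount: number, options: PdfPageRangeOptions = {}): number[] {
	const first = Math.max(1, options.minPage ?? 1);
	const last = Math.min(pageCount, options.maxPage ?? pageCount);
	const pages: number[] = [];
	for (let page = first; page <= last; page++) {
		pages.push(page);
	}
	return pages;
}
```

Note that a minPage past the end yields an empty list, which is what lets a plugin detect the end of the document without knowing its page count.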


Do we need to expose this for now, given that we don't have an API to get the page count from the PDF in the first place?

Because PDF-to-image conversion can be slow, I would prefer to have some way to only load a specific number of pages (similar to the cursor we use for the data API). The intention for minPage and maxPage was to allow fetching only a certain number of pages, starting at a specific point.

An alternative could be to allow users to provide a cursor and a limit (and also return a has_more property), similar to the data API.
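For comparison, the cursor-and-limit alternative might return batches shaped like the one below. This is a hypothetical sketch of a design that was not adopted; PdfPageBatch and paginate are illustrative names, and an in-memory array of handles stands in for pages that would really be converted lazily:

```typescript
// Hypothetical batch shape, loosely modelled on the data API's has_more pagination.
interface PdfPageBatch {
	pages: string[]; // image handles for this batch
	cursor: number; // 1-based page number to pass back to fetch the next batch
	has_more: boolean;
}

// Hypothetical: returns up to `limit` pages starting at the 1-based `cursor`.
function paginate(allPages: string[], cursor: number, limit: number): PdfPageBatch {
	if (limit < 1) throw new RangeError('limit must be at least 1');
	const start = Math.max(0, cursor - 1);
	const pages = allPages.slice(start, start + limit);
	const nextCursor = start + pages.length + 1;
	return { pages, cursor: nextCursor, has_more: nextCursor <= allPages.length };
}
```

The caller only ever passes back an opaque cursor, which is the extra level of indirection the review below considers too abstract for this API.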

I think the cursor interface would be a bit too abstract for this, so it's fine as it is. However, how will the plugin know the number of pages in the document?

however how will the plugin know the number of pages in the document?

It currently isn't possible to get the number of pages in the document. However, it's fine if maxPage is greater than the maximum number of pages. As such, a plugin could do something similar to the following:

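```typescript
// Convert the PDF in batches of `processCount` pages. resourceId and processImages
// are assumed to be defined elsewhere in the plugin.
const processCount = 5;
for (let i = 1; ; i += processCount) {
	// maxPage may exceed the page count, so no page count is needed up front.
	const images = await joplin.imaging.createFromPdfResource(
		resourceId,
		{ minPage: i, maxPage: i + (processCount - 1) },
	);
	// An empty batch means every page has already been processed.
	if (images.length === 0) {
		break;
	}
	await processImages(images);
}
```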
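The loop above can be made self-contained and testable by passing the loader in as a function. A minimal sketch: PdfPageLoader and processInBatches are hypothetical names, and the loader is assumed to behave like createFromPdfResource as described in this thread (one handle per page, an empty array once minPage is past the last page):

```typescript
// Hypothetical stand-in for joplin.imaging.createFromPdfResource bound to one resource.
type PdfPageLoader = (range: { minPage: number; maxPage: number }) => Promise<string[]>;

// Processes a PDF in fixed-size batches without knowing its page count,
// stopping at the first empty batch.
async function processInBatches(
	loadPages: PdfPageLoader,
	processImages: (images: string[]) => Promise<void>,
	batchSize = 5,
): Promise<void> {
	for (let minPage = 1; ; minPage += batchSize) {
		const images = await loadPages({ minPage, maxPage: minPage + batchSize - 1 });
		if (images.length === 0) break;
		await processImages(images);
	}
}
```

Because termination relies on an empty batch, this always makes one extra call past the end; also stopping when a batch returns fewer than batchSize images would avoid it.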

It could make sense to add the above to the test plugin (which currently processes only the first 10 pages).

It could also make sense to have joplin.imaging.createFromPdfResource return an object that includes additional PDF information alongside the image handles. For example,

```typescript
{
	pages: Handle[]; // String image handles for each page
	pdfInfo: {
		pageCount: number;
	};
}
```

Such an object would also make it easier to return more information about PDFs in the future, without breaking changes to the plugin API.
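With pdfInfo.pageCount available, a plugin could plan its batches up front instead of probing until it gets an empty result. A sketch assuming the object shape proposed above; PageRange and pageBatches are hypothetical helpers:

```typescript
// Shape of the pdfInfo field proposed above.
interface PdfInfo {
	pageCount: number;
}

// Hypothetical: an inclusive, 1-based range of pages to request in one call.
interface PageRange {
	minPage: number;
	maxPage: number;
}

// Splits a document into consecutive ranges of at most `batchSize` pages.
function pageBatches(info: PdfInfo, batchSize: number): PageRange[] {
	if (batchSize < 1) throw new RangeError('batchSize must be at least 1');
	const ranges: PageRange[] = [];
	for (let minPage = 1; minPage <= info.pageCount; minPage += batchSize) {
		ranges.push({ minPage, maxPage: Math.min(info.pageCount, minPage + batchSize - 1) });
	}
	return ranges;
}
```

Knowing the count also lets a plugin show progress or skip very large documents before converting anything.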

packages/generator-joplin/generators/app/templates/api/JoplinImaging.d.ts

laurent22 · 2024-03-26T11:53:00Z

packages/app-cli/tests/support/plugins/imaging/api/JoplinImaging.d.ts

+    scaleFactor?: number;
+}
+export interface PdfInfo {
+    numPages: number;


Could you name it "pageCount", please, since that is what we use for pagination in various APIs? (It is often "page_count" there, but "pageCount" makes more sense in this context.)

laurent22 · 2024-03-26T11:56:09Z

packages/app-cli/tests/support/plugins/imaging/api/JoplinImaging.d.ts

+}
+export interface PdfInfo {
+    numPages: number;
+}


For a follow-up pull request: is there any additional information we can retrieve using pdf.js? I see there's a doc.getMetadata() method that we could maybe use.

Desktop: Resolves laurent22#9794: Allow imaging API to convert PDFs to images

a0796ab

personalizedrefrigerator changed the title from "Desktop: Resolves #9794: Allow imaging API to convert PDFs to images" to "Desktop: Resolves #9794: Plugin API: Add support for loading PDFs with the imaging API" on Mar 21, 2024

Update documentation text

39a0b7f

JackGruber mentioned this pull request Mar 23, 2024

Add thumbnail support for PDFs JackGruber/joplin-plugin-notelistpreview#15

Closed

laurent22 reviewed Mar 23, 2024


Handle edge case where path has a .Pdf extension

c78c712

personalizedrefrigerator commented Mar 25, 2024


packages/generator-joplin/generators/app/templates/api/JoplinImaging.d.ts

personalizedrefrigerator added 2 commits March 25, 2024 11:42

Support getting PDF page count

916f7c2

Fix mobile build

2ab8b75

laurent22 reviewed Mar 26, 2024


Rename numPages -> pageCount

5075fe8

laurent22 merged commit 06aa640 into laurent22:dev Mar 27, 2024
10 checks passed
