PDF Exports with Images Very Large #7314

Open
DaAwesomeP opened this issue Nov 18, 2022 · 17 comments
Labels
backlog (We'll get to it... eventually...), bug (It's a bug), desktop (All desktop platforms)

Comments

@DaAwesomeP

I am trying to export a PDF of a note that has two images in it. The images are JPEGs with very high resolution (3024x4032); however, each is less than 2.4 MB in file size. These photos were taken on a Google Pixel 4a 5G.

I have verified that Joplin copies them with this small file size by right-clicking the images and clicking "reveal in file folder." When I export this note as a PDF, the export is 66 MB. If I try to print to a file using the print menu and the file printer in CUPS, Joplin crashes.

Environment

Joplin version: 2.8.8
Platform: AppImage
OS specifics: openSUSE Leap 15.3 x64

Steps to reproduce

  1. Drag photos into a Joplin note (JPEG, 3024x4032, 2.3MB)
  2. Click export to PDF
  3. Exported PDF is 66 MB and takes some time to export
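
For scale, here is a back-of-the-envelope estimate. It is an assumption about what might be happening, not a diagnosis: if the exporter stored each photo as decoded 8-bit RGB pixels rather than the original JPEG stream, the raw data would be far larger than the source files. `rawRgbBytes` is a hypothetical helper used only for this arithmetic.

```typescript
// Size of an image held as raw 8-bit RGB (3 bytes per pixel),
// i.e. with no JPEG compression applied at all.
function rawRgbBytes(width: number, height: number): number {
  return width * height * 3;
}

// Dimensions taken from the report above.
const perImage = rawRgbBytes(3024, 4032); // 36578304 bytes

// Two such images, in decimal megabytes.
const twoImagesMB = (2 * perImage) / 1e6; // about 73.2 MB
```

Lossless compression of that pixel data would shrink it somewhat, so a 66 MB export is at least in a plausible range for this scenario. Nothing here proves that this is what the exporter does.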

Describe what you expected to happen

PDF should maintain good JPEG compression of source images.

Logfile

The log shows notes loading and syncing, but nothing about the export except for this line:

2022-11-17 19:03:21: "CommandService::execute:", "exportPdf", "[]"

I am hesitant to share the log since it contains notes and keys, but I can go through it and redact if necessary.

DaAwesomeP added the bug (It's a bug) label on Nov 18, 2022
laurent22 added the desktop (All desktop platforms) and high (High priority issues) labels on Nov 18, 2022
@om2137

om2137 commented Nov 24, 2022

I have tried to reproduce this issue.
The PDF is not oversized for every JPEG of those dimensions or larger; only certain JPEGs trigger the problem.
With 2 JPEGs (4000x6000) of 6 MB, the exported PDF is 82 MB.
With a different JPEG (4400x6216) of 5.88 MB, the exported PDF is 8.89 MB.
Can I work on this issue?

@roman-r-m
Collaborator

> Can I work on this issue?

sure

@om2137

om2137 commented Nov 24, 2022

> > Can I work on this issue?
>
> sure

Can you tell me where I can discuss the issue?
Discord, the forum, or somewhere else?

@roman-r-m
Collaborator

Here, on the forum or on discord.

Not sure what there is to discuss though.

@DaAwesomeP
Author

> Here, on the forum or on discord.

Since the issue has been reported here, maybe try to keep the discussion contained here? That way the debugging and solution steps remain well documented and searchable, and eventually all that information will be linkable to a pull request. Personally, I am not in the Discord so you will only be able to get a hold of me here.

I'm happy to hear it is reproducible! A good place to start with debugging this would be to see how the PDF gets exported, which tools/libraries are used to do it, and what options/flags those tools might have available. I have yet to look into it at all myself.

@om2137

om2137 commented Nov 29, 2022

@DaAwesomeP can you confirm whether the JPEGs you were using came from a DSLR or pro camera?

@om2137

om2137 commented Nov 29, 2022

I have created a topic for this issue on the Joplin forum: https://discourse.joplinapp.org/t/pdf-with-jpg-selected-exports-oversized-pdf-github-7314/28419

@DaAwesomeP
Author

> @DaAwesomeP can you confirm whether the JPEGs you were using came from a DSLR or pro camera?

The images came from a Google Pixel 4a 5G. I'm not certain which settings were enabled on the phone.

@roman-r-m
Collaborator

Might be related: https://bugs.chromium.org/p/chromium/issues/detail?id=801430
@DaAwesomeP do you have any custom CSS?

@DaAwesomeP
Author

@roman-r-m No, my Joplin is unmodified and installed via AppImage. That issue seems to suggest that EXIF vs JFIF JPEGs may cause a different result if that bug still exists.
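
To check which kind of JPEG a given attachment is, one can look at the first segment after the start-of-image marker: JFIF files begin with an APP0 segment tagged `JFIF`, while camera files commonly begin with an APP1 segment tagged `Exif`. A minimal sketch follows. `classifyJpeg` and `hasAscii` are hypothetical helper names and are not part of Joplin or Electron.

```typescript
type JpegKind = "JFIF" | "Exif" | "other" | "not-jpeg";

// Compare an ASCII tag against the bytes starting at `offset`.
function hasAscii(bytes: Uint8Array, offset: number, tag: string): boolean {
  if (offset + tag.length > bytes.length) return false;
  for (let i = 0; i < tag.length; i++) {
    if (bytes[offset + i] !== tag.charCodeAt(i)) return false;
  }
  return true;
}

// Classify a JPEG by the first segment after the SOI marker (FF D8).
function classifyJpeg(bytes: Uint8Array): JpegKind {
  if (bytes.length < 4 || bytes[0] !== 0xff || bytes[1] !== 0xd8) {
    return "not-jpeg";
  }
  if (bytes[2] !== 0xff) return "other";
  const marker = bytes[3];
  // Segment layout: FF, marker byte, 2-byte big-endian length, then payload,
  // so the payload of the first segment starts at offset 6.
  const payloadStart = 6;
  if (marker === 0xe0 && hasAscii(bytes, payloadStart, "JFIF\0")) return "JFIF";
  if (marker === 0xe1 && hasAscii(bytes, payloadStart, "Exif\0\0")) return "Exif";
  return "other";
}
```

This only inspects the first segment, so a file that carries both a JFIF and an EXIF segment is reported by whichever comes first.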

@roman-r-m
Collaborator

I haven't been able to replicate it so far, so can only guess.
Any chance you could share one of those huge pdfs?

@om2137

om2137 commented Nov 30, 2022

> I haven't been able to replicate it so far, so can only guess.
> Any chance you could share one of those huge pdfs?

The issue does not appear with every JPEG, only with certain ones.

@DaAwesomeP
Author

@roman-r-m OK, Gist with photos (in Gist instead of attaching here to avoid compression/modification) and exported PDFs here: https://gist.github.com/DaAwesomeP/1e2359f73334471184d670f59ec21abc

I can confirm this is an EXIF issue. If I run exiftool -EXIF= original.jpg on the image first, then the issue goes away and the PDF is the expected size.

In the Gist, the 11.4MB file export_original.pdf is an export of original.jpg. The 2.7 MB file export_stripped.pdf is an export of stripped.jpg. You can see that the export of the file without EXIF data is effectively the same size as the original image, as expected. Note that in this example the PDF did not balloon to 60+ MB as this is a very simple, mostly white background photo that I took for this issue. More complicated photos definitely get much, much larger.

Please excuse my phone not properly rotating/applying metadata to rotate the image.

@roman-r-m
Collaborator

> I can confirm this is an EXIF issue. If I run exiftool -EXIF= original.jpg on the image first, then the issue goes away and the PDF is the expected size.

In this case I'm not sure what can possibly be done on the Joplin side as it relies on Electron/Chrome for creating PDFs.

There was an idea to replace Chrome's built in PDF converter with a 3rd party library but I doubt it's going to be done anytime soon, if at all.

@DaAwesomeP
Author

> In this case I'm not sure what can possibly be done on the Joplin side as it relies on Electron/Chrome for creating PDFs.

As a temporary workaround, there may be a simple way to remove the EXIF data before exporting. I will test more closely and try to figure out exactly which EXIF fields are causing this issue and propose a lightweight solution (obviously don't want to include something as large as ImageMagick). I think it's fine to remove some EXIF data from exported PDFs, as extracting images from PDFs and expecting the same EXIF data is somewhat niche. Chrome may already remove some of the data in the export process.
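
A dependency-free version of that idea could look roughly like the sketch below, which walks the JPEG header segments and drops any APP1 segment whose payload starts with the `Exif` identifier. `stripExif` and `EXIF_ID` are hypothetical names, this is not Joplin code, and it has not been checked against the export path. The thread only confirms that `exiftool -EXIF=` avoids the problem.

```typescript
// ASCII "Exif" followed by two NUL bytes: the identifier of an EXIF APP1 payload.
const EXIF_ID = [0x45, 0x78, 0x69, 0x66, 0x00, 0x00];

// Return a copy of a JPEG with every APP1 "Exif" segment removed.
// Input that does not start with the SOI marker is returned unchanged.
function stripExif(jpeg: Uint8Array): Uint8Array {
  if (jpeg.length < 2 || jpeg[0] !== 0xff || jpeg[1] !== 0xd8) return jpeg;

  const view = new DataView(jpeg.buffer, jpeg.byteOffset, jpeg.byteLength);
  const kept: Uint8Array[] = [jpeg.subarray(0, 2)]; // keep the SOI marker
  let pos = 2;

  while (pos + 4 <= jpeg.length && view.getUint8(pos) === 0xff) {
    const marker = view.getUint8(pos + 1);
    // Stop at SOS (start of scan data), at markers that carry no length
    // field, and at fill bytes; everything from here on is copied verbatim.
    if (marker < 0xc0 || marker === 0xff || (marker >= 0xd0 && marker <= 0xda)) {
      break;
    }
    // The length is big-endian and counts its own two bytes, not the marker.
    const length = view.getUint16(pos + 2);
    const end = pos + 2 + length;
    if (length < 2 || end > jpeg.length) break; // malformed: stop parsing

    const payloadStart = pos + 4;
    const isExif =
      marker === 0xe1 &&
      EXIF_ID.every((b, i) => jpeg[payloadStart + i] === b);
    if (!isExif) kept.push(jpeg.subarray(pos, end));
    pos = end;
  }
  kept.push(jpeg.subarray(pos)); // remaining segments, scan data, EOI

  // Concatenate the kept pieces into a fresh buffer.
  const total = kept.reduce((n, part) => n + part.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of kept) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}
```

One caveat: the Orientation tag lives in the EXIF block, so removing it can change how a photo is rotated on display. A real workaround would probably need to account for that.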

> There was an idea to replace Chrome's built in PDF converter with a 3rd party library but I doubt it's going to be done anytime soon, if at all.

I can potentially look into this too, but this is obviously a much bigger task.

laurent22 added the backlog (We'll get to it... eventually...) label and removed the high (High priority issues) label on Jan 5, 2023
@laurent22
Owner

Perhaps something to report to the Electron repo? We use webContents.printToPDF() to export to PDF

@DaAwesomeP
Author

@laurent22 I began to submit an issue just now, but it seems that Electron v19 is EOL. Maybe updating (if possible) would help to resolve the issue?

Projects
None yet
Development

No branches or pull requests

4 participants