Converts PDF without Fonts/Text #42

Open
dms-ts opened this issue Apr 27, 2023 · 16 comments

@dms-ts

dms-ts commented Apr 27, 2023

I'm trying to convert some shipping labels to PNG. The barcodes and images convert, but none of the text/fonts do. I already installed the font fix, but it doesn't work.

@tabetommy

I'm having the same issue as @dms-ts. Does anyone have any ideas?

@Jussinevavuori

Same issue, but only for some types of PDFs. Regular PDF files uploaded from the user's device convert fine as they are, but for some reason this library fails to convert PDFs created with React PDF.

@ojtramp

ojtramp commented Jun 20, 2023

I'm having the same issue. Any advice? I've installed Microsoft Fonts and checked that Arial is installed on my EC2 Ubuntu system running Node, but still no luck.

I'm looking for a package that doesn't save to the file system, can import a PDF from a URL, and exports an array of images. I'm very happy with this package except for the missing text (obviously a big problem), but I'm happy to switch to an alternative if anyone has advice.
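For the "no file system" part of that requirement, here is a minimal sketch, assuming Node 18+ (which ships a global `fetch`). `fetchPdfBytes` is a hypothetical helper, not part of pdf-img-convert: it downloads the PDF into a `Uint8Array` entirely in memory and checks for the `%PDF-` header that PDF files normally begin with. The `fetchImpl` parameter exists only so the network call can be swapped out.

```javascript
// Minimal sketch, assuming Node 18+ (global fetch).
// fetchPdfBytes is a hypothetical helper, not part of pdf-img-convert.
async function fetchPdfBytes(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Download failed: HTTP ${response.status}`);
  }
  // Keep everything in memory: no temp files are written.
  const bytes = new Uint8Array(await response.arrayBuffer());
  // Well-formed PDFs normally begin with the ASCII header "%PDF-".
  const header = String.fromCharCode(...bytes.subarray(0, 5));
  if (header !== "%PDF-") {
    throw new Error("Downloaded data does not look like a PDF");
  }
  return bytes;
}
```

The bytes could then be handed to pdf-img-convert's `convert`, assuming your version accepts a `Uint8Array` (check its README). How the bytes are obtained does not change the font problem discussed in this issue, since the same data ends up in `pdfjs.getDocument` either way.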

@ojtramp

ojtramp commented Jun 20, 2023

I changed the verbosity of the PDF.js call to 1 so that I could see the following error messages; the ones relating to Helvetica match the text that is missing. These are my error messages:

Warning: fetchStandardFontData: failed to fetch file "FoxitSans.pfb" with "UnknownErrorException: The standard font "baseUrl" parameter must be specified, ensure that the "standardFontDataUrl" API parameter is provided.".
Warning: fetchStandardFontData: failed to fetch file "FoxitSansBold.pfb" with "UnknownErrorException: The standard font "baseUrl" parameter 
Warning: getPathGenerator - ignoring character: "Error: Requesting object that isn't resolved yet Helvetica_path_T.".
Warning: getPathGenerator - ignoring character: "Error: Requesting object that isn't resolved yet Helvetica_path_h.".
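When text goes missing, scanning warnings like these shows which font files pdf.js failed to load and which glyphs it skipped. A minimal sketch, assuming the verbosity-1 output has been captured as a string; `summarizeFontWarnings` is a hypothetical helper, and its regular expressions are written against the exact messages quoted above, so other pdf.js versions may need different patterns.

```javascript
// Hypothetical helper: summarize pdf.js font warnings captured with verbosity: 1.
// The patterns mirror the messages quoted in this issue; other pdf.js
// versions may word them differently.
function summarizeFontWarnings(logText) {
  const missingFiles = new Set();
  const skippedGlyphs = new Map(); // font name -> Set of skipped characters

  // e.g. failed to fetch file "FoxitSans.pfb"
  for (const m of logText.matchAll(/failed to fetch file "([^"]+)"/g)) {
    missingFiles.add(m[1]);
  }
  // e.g. Requesting object that isn't resolved yet Helvetica_path_T.
  for (const m of logText.matchAll(/resolved yet ([^\s"]+?)_path_(.)/g)) {
    if (!skippedGlyphs.has(m[1])) skippedGlyphs.set(m[1], new Set());
    skippedGlyphs.get(m[1]).add(m[2]);
  }

  return {
    missingFiles: [...missingFiles],
    skippedGlyphs: Object.fromEntries(
      [...skippedGlyphs].map(([font, chars]) => [font, [...chars]])
    ),
  };
}
```

As the warning text itself says, the `.pfb` fetches fail because no `standardFontDataUrl` was provided to pdf.js.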

I think my system is saying that it would substitute Arial for Helvetica:

fc-match Helvetica
Arial.ttf: "Arial" "Regular"

So I'm not sure what's going on. I'll keep trying to find a solution and post back if I find something.

@ojtramp

ojtramp commented Jun 20, 2023

I think I found a legitimate fix:

I changed line 100 in the file pdf-img-convert.js:

var loadingTask = pdfjs.getDocument({data: pdfData, disableFontFace: false, verbosity: 0});

Based on the 2018 answer here, this looks like it should be okay.

@ojtramp

ojtramp commented Jun 20, 2023

So that didn't work: as mentioned earlier in that 2018 thread, this change breaks the fonts in other documents.

@deathemperor

deathemperor commented Oct 12, 2023

I was able to resolve this issue by following the instructions in mozilla/pdf.js#4244 (comment).

Final version:

diff --git a/pdf-img-convert.js b/pdf-img-convert.js
index 01e8c64c9ffa13ea226a689fa08e78d97213dabe..97939693584b700a985fe3ef3a2fe054a26ddf41 100644
--- a/pdf-img-convert.js
+++ b/pdf-img-convert.js
@@ -29,6 +29,7 @@ const Canvas = require("canvas");
 const assert = require("assert").strict;
 const fs = require("fs");
 const util = require('util');
+const path = require('path');
 
 const readFile = util.promisify(fs.readFile);
 
@@ -95,9 +96,9 @@ module.exports.convert = async function (pdf, conversion_config = {}) {
 
   // At this point, we want to convert the pdf data into a 2D array representing
   // the images (indexed like array[page][pixel])
-
+  let packagePath = path.dirname(require.resolve("pdfjs-dist/package.json"));
   var outputPages = [];
-  var loadingTask = pdfjs.getDocument({data: pdfData, disableFontFace: true, verbosity: 0});
+  var loadingTask = pdfjs.getDocument({data: pdfData, disableFontFace: true, verbosity: 0, standardFontDataUrl: packagePath + '/standard_fonts/'});
 
   var pdfDocument = await loadingTask.promise
 

@ol-th would you accept a PR for this?
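The essential part of this patch is the `standardFontDataUrl` option. pdf.js builds each font file's location by appending the file name (for example `FoxitSans.pfb`) directly to this value, so it must end with a separator; the pdf.js API documentation says to include the trailing slash. Below is a minimal sketch of the same options object. `buildLoadingOptions` is a hypothetical helper name, `pdfjsPackageDir` is assumed to be what the patch computes with `path.dirname(require.resolve("pdfjs-dist/package.json"))`, and `path.join` is used so the directory is built with the platform's own separator.

```javascript
const path = require("path");

// Hypothetical helper mirroring the getDocument() options in the patch above.
// pdfjsPackageDir is assumed to be the pdfjs-dist install directory, e.g.
// path.dirname(require.resolve("pdfjs-dist/package.json")).
function buildLoadingOptions(pdfData, pdfjsPackageDir) {
  // pdf.js concatenates baseUrl + filename, so keep the trailing separator.
  const standardFontDataUrl =
    path.join(pdfjsPackageDir, "standard_fonts") + path.sep;
  return {
    data: pdfData,
    disableFontFace: true,
    verbosity: 0,
    standardFontDataUrl,
  };
}
```

Inside `convert`, the call would then read `pdfjs.getDocument(buildLoadingOptions(pdfData, packagePath))`, which is equivalent to the inline object in the diff.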

@YoricWatterott

I'd also like to bump this issue; I'll have to look for another library if it doesn't get solved.
Has anyone looked at @deathemperor's response? Could it work?

I love the simplicity of this library and just hope this issue can be resolved.
All the best

@deathemperor

Hope you find it useful. That patch successfully converts our 300+ PDFs daily.

@YoricWatterott

YoricWatterott commented Nov 14, 2023

How can I implement your change, @deathemperor?
Has it been patched into the latest version, or do you mean you made the change yourself in the library files?

I can't edit the file directly, because I have a pipeline that runs npm install.

If I do have to implement the change myself, I'll have to add a script to my pipeline to edit the file afterwards.

I'd prefer not to do that, so if you have an alternative suggestion, that would be great.

Thanks for your response though, @deathemperor; I appreciate your time.

@ol-th
Owner

ol-th commented Nov 14, 2023

@deathemperor if you could send a PR for this fix that would be great. I'll test it out and add it to a new release if all good.

@deathemperor

I use https://www.npmjs.com/package/patch-package to maintain patches like these until the repo officially supports them.
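For a pipeline that runs `npm install`, the usual patch-package workflow is: edit `node_modules/pdf-img-convert/pdf-img-convert.js` locally, run `npx patch-package pdf-img-convert` to record the change under `patches/`, commit that folder, install patch-package itself as a dependency, and add a `postinstall` script so the patch is re-applied after every install. The sketch below covers only that last step. `ensurePatchPackagePostinstall` is a hypothetical helper that updates a parsed `package.json` object and keeps any existing `postinstall` command.

```javascript
// Hypothetical helper: ensure package.json runs patch-package after every
// `npm install`. Works on the parsed object; writing the file is left to the caller.
function ensurePatchPackagePostinstall(pkg) {
  const scripts = { ...(pkg.scripts || {}) };
  const current = scripts.postinstall;
  if (!current) {
    scripts.postinstall = "patch-package";
  } else if (!/\bpatch-package\b/.test(current)) {
    // Keep the existing hook, but apply patches before it runs.
    scripts.postinstall = `patch-package && ${current}`;
  }
  // Return a copy so the caller's original object is left untouched.
  return { ...pkg, scripts };
}
```

For example, read the file with `JSON.parse(fs.readFileSync("package.json", "utf8"))`, pass the object through the helper, and write it back with `JSON.stringify(pkg, null, 2)`. Chaining with `&&` means a failed patch stops the install script instead of silently shipping the unpatched library.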

deathemperor added a commit to deathemperor/pdf-img-convert.js that referenced this issue Nov 14, 2023
@deathemperor

Sure, here's the PR: #50

@YoricWatterott

Hi guys, has this been merged into the latest release?
I'd love to start using it, thanks!

@YoricWatterott

Hi @deathemperor, thank you so much for pointing me to https://www.npmjs.com/package/patch-package.

I managed to implement it successfully and can continue using the library seamlessly.

Much appreciated!

@deathemperor

I'm glad it helped!


7 participants