
Getting error 'Invalid number of entries in field Widths' #125

Closed
matpowel opened this issue Jan 8, 2021 · 18 comments

@matpowel

matpowel commented Jan 8, 2021

Our production app is getting a lot of errors when processing PDFs. I want to document several of them, but only one is causing major problems right now, and that is the one in the title. Unfortunately, like the vast majority of our documents, this one contains sensitive data, so I can't share it, but here you can see the problem:

```
[8] pry(main)> doc = HexaPDF::Document.open("/Users/matt/Downloads/VII\ MC\ Eachus\ stmt\ closing\ 12-3-20.pdf");
[9] pry(main)> doc.pages.size
=> 4
[10] pry(main)> doc.write('/tmp/test.pdf')
HexaPDF::Error: Validation error for (1,0): Invalid number of entries in field Widths
from /Users/matt/.rvm/gems/ruby-2.7.1@invoice-router/gems/hexapdf-0.12.3/lib/hexapdf/document.rb:648:in `block in write'
```

For all the other problems we see (like the ones from the last couple of days listed below), it's enough to rescue HexaPDF::MalformedPDFError or HexaPDF::Error and send the PDF to a conversion service we use, which opens it and converts it back to PDF. The Widths error is not resolved this way, so apparently the conversion does not correct that field.
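
The rescue-and-convert fallback described in this comment can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `process_with_fallback`, `process` and `convert` are hypothetical names, and in the real app `errors` would presumably be `[HexaPDF::MalformedPDFError, HexaPDF::Error]`.

```ruby
# Hypothetical helper: try to handle a PDF; if that raises one of the given
# error classes, send the file to a conversion service and retry exactly once.
#   process - callable that opens/handles a PDF path (e.g. with HexaPDF)
#   convert - callable that re-generates the PDF and returns the new path
def process_with_fallback(path, process:, convert:, errors: [StandardError])
  process.call(path)
rescue *errors => e
  warn "Failed to process PDF, will try to convert it to a new PDF file: #{e.message}"
  # If the converted file still fails (as with the /Widths error), the
  # exception raised here propagates to the caller.
  process.call(convert.call(path))
end
```

Retrying only once keeps a file that the conversion cannot fix from looping forever; the second failure surfaces to the caller instead.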

Is this expected? Reading through this, I was wondering whether it really needs to be a fatal error. Is there any workaround?

Here is the full list of errors again from the last 2 days:

Jan 06 07:55:34 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:55:34 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:55:46 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:55:48 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:56:46 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (7,0) values of the indirect object don't match the values (8,0) from the xref
Jan 06 07:56:47 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (7,0) values of the indirect object don't match the values (8,0) from the xref
Jan 06 07:57:00 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:57:01 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:57:06 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:57:07 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:57:15 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:57:18 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:57:29 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:57:29 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:57:36 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:57:40 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:57:44 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:57:50 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:57:52 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:57:56 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:58:01 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 07:58:08 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref
Jan 06 14:10:20 Failed to process PDF, will try convert it to a new PDF file: Validation error for (1,0): Field PageLayout does not contain an allowed value
Jan 06 14:34:18 Failed to process PDF, will try convert it to a new PDF file: PDF malformed around position : The oid,gen (22,0) values of the indirect object don't match the values (8,0) from the xref
Jan 07 05:52:07 Failed to process PDF, will try convert it to a new PDF file: Validation error for (1,0): Field PageLayout does not contain an allowed value
Jan 07 13:40:04 Failed to process PDF, will try convert it to a new PDF file: Validation error for (1,0): Invalid number of entries in field Widths
Jan 07 13:40:04 Failed to process PDF, will try convert it to a new PDF file: Validation error for (1,0): Invalid number of entries in field Widths
Jan 08 07:48:46 Failed to process PDF, will try convert it to a new PDF file: Validation error for (66,0): Field PageMode does not contain an allowed value
Jan 08 08:08:50 Failed to process PDF, will try convert it to a new PDF file: Validation error for (66,0): Field PageMode does not contain an allowed value
Jan 08 08:25:10 Failed to process PDF, will try convert it to a new PDF file: Validation error for (66,0): Field PageMode does not contain an allowed value
Jan 08 08:51:33 Failed to process PDF, will try convert it to a new PDF file: Validation error for (66,0): Field PageMode does not contain an allowed value
Jan 08 13:10:47 Failed to process PDF, will try convert it to a new PDF file: Validation error for (33,0): Required field Parent is not set
@gettalong gettalong self-assigned this Jan 8, 2021
@gettalong
Owner

What you can do immediately is to use doc.write('/tmp/test.pdf', validate: false) and do the validation manually, if you need it, by invoking the doc.validate method before writing the document (generally, running the validation code this way should not make a document any worse).

If you just call doc.validate, validation does its thing without raising any errors. You can either log the return value to see if any validation failed or do something like I did for the hexapdf info command - https://github.com/gettalong/hexapdf/blob/master/lib/hexapdf/cli/info.rb#L92-L105.

The validation in #write is to ensure a valid document. It just calls #validate and aborts if any validation fails. However, PDF readers are often quite accepting when it comes to invalid PDFs, so if you don't need a completely valid file, you can do the validation manually and respond to validation errors any way you like.
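
The manual-validation approach above could look roughly like this. It is a minimal sketch, assuming that HexaPDF's Document#validate yields (message, correctable, object) for each problem and that Document#write accepts `validate: false` as described above; the helper name and log format are hypothetical, and `doc` is duck-typed so the helper can be exercised without a real PDF.

```ruby
# Hypothetical helper: validate without raising, log every message, then
# write the file without the built-in validation in #write (which would
# abort if validation fails, e.g. on the /Widths error).
def write_with_logged_validation(doc, path, log: $stderr)
  problems = []
  valid = doc.validate do |msg, correctable, _object|
    problems << [msg, correctable]
    log << "#{correctable ? 'correctable' : 'error'}: #{msg}\n"
  end
  doc.write(path, validate: false)
  [valid, problems]
end
```

The returned list of problems lets the caller decide whether the output is still usable for its purpose (for example, whether downstream tools can render it) instead of treating every validation failure as fatal.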

As to the errors:

  • /Widths field error: Generally, the information in this field could be recovered from the font, if it was embedded, but this is not implemented in HexaPDF. If the conversion API cannot produce a valid PDF for you, you could send me the PDF so that I can have a look at it; I can sign a confidentiality agreement if necessary. Then I can tell you what is wrong with the file and maybe find a way to implement an auto-correction feature.

  • Field XXX does not contain an allowed value: I will amend the current validation check to include the invalid value, which might give a hint as to what is wrong.

  • The oid,gen (4,0) values of the indirect object don't match the values (5,0) from the xref: This should not occur (or should at least occur less often) with the latest version, thanks to the cross-reference table reconstruction feature. Are you seeing this with the latest version?

@matpowel
Author

matpowel commented Jan 9, 2021

Thanks for the fast and thorough response! It's late on Friday here now so I'll make this brief but wanted to get back to you:

  1. doc.validate: That makes sense. I'll give it a try and see if we start having any issues. One of the more important things we do after validation is split page images out of it using Vips, so the main thing we need to check is that Vips can handle it.
  2. Widths: Per number 1, we'll see if Vips handles the fonts in a reasonable way with this being wrong.
  3. Disallowed value: I'm happy to try running from a branch to generate the logging if it helps.
  4. oid,gen: Yes, that's right, sorry; this was the other issue you fixed previously, but we're still on 0.12.3 (or whatever the version was) in production. Our Rails 6.1.1 branch has HexaPDF 0.14.0, so we'll be on the latest version shortly! Note that I tested the Widths error and it still happens on 0.14.0.

@gettalong
Owner

ad 3) I have pushed some changes to the devel branch, including the one that shows the invalid value. It would help if you could test this with your files to see which values they are using.

ad 4) Yes, the Widths validation error is meant to still happen because it is really an error.

@gettalong
Owner

@matpowel Have you tried the latest release 0.14.4? Is there anything else I can help you with in this regard?

@gettalong
Owner

@matpowel Just a quick reminder: If you could run the latest version and report back the error with respect to Field PageMode does not contain an allowed value and Field PageLayout does not contain an allowed value, it would help figure out a way to maybe correct that error. Thanks!

@matpowel
Author

Hey @gettalong, sorry about the delay and thanks for pursuing this.

Yes, we're on 0.14.4 in production now. I just checked our production logs and we are still seeing the occasional error like this one:

Validation error for (204,0): Field FontWeight does not contain an allowed value: 350

We do still occasionally get the Widths error. It doesn't seem to be mitigated by the PDF conversion we use but it is quite rare.

We're not, however, seeing any of the oid,gen errors anymore. At least not within the retention range of our production logs. So that's great news!

Matt

@gettalong
Owner

@matpowel If you update to 0.15.0, the validation error for /FontWeight is auto-correcting: the field is deleted, since it is optional anyway. So it will still show up in logs if you log all validation messages, correctable or not, but it won't be a hard error anymore.
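
To illustrate what "auto-correcting by deleting the field" means, here is a minimal sketch; it is not HexaPDF's source, and `correct_font_weight!` is a hypothetical name. The PDF specification only allows 100, 200, ..., 900 for a font descriptor's optional /FontWeight, so a value like the 350 from the log above would be reported and the entry removed. The descriptor only needs `[]` and `delete`, so a plain Hash stands in for a font descriptor dictionary.

```ruby
# Values permitted for /FontWeight by the PDF specification.
ALLOWED_FONT_WEIGHTS = (100..900).step(100).to_a.freeze

# Returns a validation message if the descriptor was corrected, nil otherwise.
def correct_font_weight!(descriptor)
  weight = descriptor[:FontWeight]
  return nil if weight.nil? || ALLOWED_FONT_WEIGHTS.include?(weight)

  # The field is optional, so dropping it leaves a valid descriptor.
  descriptor.delete(:FontWeight)
  "Field FontWeight does not contain an allowed value: #{weight}"
end
```

Because the fix cannot lose required information, the problem can be reported as correctable instead of aborting the write.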

As for the /Widths error: I guess that PDF reader applications just fall back to the widths stored in the font file if there is something wrong with that entry.

@gettalong
Owner

@matpowel Is there still something to do on my side for this issue?

@matpowel
Author

matpowel commented Nov 2, 2021

@gettalong I can't see any record of those errors in the logs now. I believe we have a couple of weeks of retention, so it's looking good! Thanks for chasing this up. We're on version 0.15.8 now, btw.

@gettalong
Owner

@matpowel Thanks for getting back to me, then I will close this issue now! Feel free to open a new one if something pops up!

@joshkinabrew

@gettalong I work with @matpowel

We're running into this issue again for a PDF using version 0.28.0 of HexaPDF. We can send you the file we are using via email.

To reproduce the error:

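```ruby
require 'hexapdf'
require 'stringio'

# Read the file in binary mode so the PDF bytes are passed through unchanged.
doc = HexaPDF::Document.new(io: StringIO.new(File.binread('invalidPDF.pdf')))
doc.validate #=> false
doc.pages.delete_at(1)
doc.write('test.pdf') # raises the /Widths validation error
```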

Error stacktrace is:

```
--
 0: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/document.rb:663:in `block in write'
 1: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/object.rb:292:in `block in validate'
 2: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/type/font_simple.rb:176:in `perform_validation'
 3: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/type/font_true_type.rb:66:in `perform_validation'
 4: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/object.rb:291:in `validate'
 5: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/document.rb:619:in `block in validate'
 6: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/revisions.rb:265:in `block (2 levels) in each_object'
 7: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/revision.rb:217:in `block in each'
 8: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/utils/object_hash.rb:120:in `block in each'
 9: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/utils/object_hash.rb:120:in `each'
10: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/utils/object_hash.rb:120:in `each'
11: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/revision.rb:217:in `each'
12: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/revisions.rb:263:in `block in each_object'
13: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/revisions.rb:262:in `reverse_each'
14: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/revisions.rb:262:in `each_object'
15: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/document.rb:394:in `each'
16: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/document.rb:618:in `validate'
17: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/hexapdf-0.28.0/lib/hexapdf/document.rb:661:in `write'
18: (pry):11:in `__pry__'
19: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/pry_instance.rb:290:in `eval'
20: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/pry_instance.rb:290:in `evaluate_ruby'
21: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/pry_instance.rb:659:in `handle_line'
22: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/pry_instance.rb:261:in `block (2 levels) in eval'
23: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/pry_instance.rb:260:in `catch'
24: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/pry_instance.rb:260:in `block in eval'
25: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/pry_instance.rb:259:in `catch'
26: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/pry_instance.rb:259:in `eval'
27: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/repl.rb:77:in `block in repl'
28: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/repl.rb:67:in `loop'
29: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/repl.rb:67:in `repl'
30: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/repl.rb:38:in `block in start'
31: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/input_lock.rb:61:in `__with_ownership'
32: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/input_lock.rb:78:in `with_ownership'
33: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/repl.rb:38:in `start'
34: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/repl.rb:15:in `start'
35: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/pry-0.14.1/lib/pry/pry_class.rb:188:in `start'
36: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/railties-6.1.7/lib/rails/commands/console/console_command.rb:70:in `start'
37: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/railties-6.1.7/lib/rails/commands/console/console_command.rb:19:in `start'
38: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/railties-6.1.7/lib/rails/commands/console/console_command.rb:102:in `perform'
39: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/thor-1.2.1/lib/thor/command.rb:27:in `run'
40: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/thor-1.2.1/lib/thor/invocation.rb:127:in `invoke_command'
41: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/thor-1.2.1/lib/thor.rb:392:in `dispatch'
42: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/railties-6.1.7/lib/rails/command/base.rb:69:in `perform'
43: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/railties-6.1.7/lib/rails/command.rb:48:in `invoke'
44: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/railties-6.1.7/lib/rails/commands.rb:18:in `<main>'
45: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/bootsnap-1.12.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:30:in `require'
46: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/bootsnap-1.12.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:30:in `require'
47: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/zeitwerk-2.6.0/lib/zeitwerk/kernel.rb:35:in `require'
48: /Users/me/code/myApp/bin/rails:5:in `<main>'
49: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/bootsnap-1.12.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:39:in `load'
50: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/bootsnap-1.12.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:39:in `load'
51: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/activesupport-6.1.7/lib/active_support/fork_tracker.rb:10:in `block in fork'
52: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/activesupport-6.1.7/lib/active_support/fork_tracker.rb:8:in `fork'
53: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/activesupport-6.1.7/lib/active_support/fork_tracker.rb:8:in `fork'
54: /Users/me/.rvm/gems/ruby-2.7.7@myApp/gems/activesupport-6.1.7/lib/active_support/fork_tracker.rb:27:in `fork'
55: /Users/me/.rvm/rubies/ruby-2.7.7/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:83:in `require'
56: /Users/me/.rvm/rubies/ruby-2.7.7/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:83:in `require'
57: -e:1:in `<main>'
```

@gettalong
Owner

Hi @joshkinabrew! Thanks for providing the invalid file.

So inspecting the file shows that the /Widths array of the font object with oid (40,0) has 225 entries, while /FirstChar and /LastChar are 30 and 255 respectively. If you count the number of character codes from the first char to the last char, however, you get 226 entries (LastChar - FirstChar + 1 = 255 - 30 + 1), so this is an off-by-one error by the application/library that created the file.
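
The count check described above is plain arithmetic and can be sketched in a few lines. This is an illustration of the rule (one /Widths entry per character code from /FirstChar to /LastChar inclusive), not HexaPDF's validation code; the method names are hypothetical.

```ruby
# Number of /Widths entries a simple font must have: one per character code
# in the inclusive range FirstChar..LastChar.
def expected_widths_count(first_char, last_char)
  last_char - first_char + 1
end

def widths_count_valid?(widths, first_char, last_char)
  widths.size == expected_widths_count(first_char, last_char)
end

expected_widths_count(30, 255)            # => 226
widths_count_valid?([600] * 225, 30, 255) # => false (one entry short)
```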

In theory it is possible to inspect the font used by the font object and regenerate the widths array. However, this is not that easy to implement and would also be quite slow.

So I think the better way would be to try and correct these errors in situations that are easy to fix, like in this case, where all entries in /Widths are 600, by adding another 600 value to the array.
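
The easy-to-fix case proposed above can be sketched like this. It is an illustration under stated assumptions, not the code that shipped in HexaPDF: `pad_uniform_widths!` is a hypothetical name, and the heuristic only fires when /Widths is exactly one entry short and all existing entries are identical (like the all-600 array in this file). The font only needs `[]` and `[]=`, so a plain Hash works for testing.

```ruby
# Hypothetical auto-correction: append one more copy of a uniform width when
# /Widths is exactly one entry short. Returns true if the font was changed.
def pad_uniform_widths!(font)
  first_char = font[:FirstChar]
  last_char = font[:LastChar]
  return false unless first_char && last_char

  widths = font[:Widths].to_a
  expected = last_char - first_char + 1
  # Only handle the off-by-one case; anything else is left untouched.
  return false unless widths.size == expected - 1
  # Only pad when every existing width is the same (e.g. all 600).
  return false unless widths.any? && widths.uniq.size == 1

  font[:Widths] = widths + [widths.first]
  true
end
```

Applying it to a loaded HexaPDF document might look like `doc.each { |obj| pad_uniform_widths!(obj) if obj.kind_of?(HexaPDF::Dictionary) && obj[:Type] == :Font && obj.key?(:Widths) }` before writing; that wiring is an untested assumption and only matters for versions without the built-in correction.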

@joshkinabrew

Thanks for the explanation on that @gettalong!

Regarding adding another 600 value to the array, would you be able to provide some code to do that? Is that done in HexaPDF? We don't have control over the application that generated the PDF, so we'd need to be able to handle situations like this inside our app.

@gettalong
Owner

I will add that work-around to the validation code, so you would have nothing to do.

@joshkinabrew

Hi @gettalong, just checking on the status of this. I've tried v0.29.0 and it still throws the same error.

@gettalong
Owner

@joshkinabrew Yes, I haven't gotten around to implementing this yet. So expect a new release with the fix today or tomorrow!

@gettalong
Owner

@joshkinabrew Just pushed 0.30.0 with the fix.

@joshkinabrew

@gettalong you are awesome! Thanks so much. The PDF now shows as valid and we're able to delete pages.
