Name length error in Lexer_getName #6151

Closed
griffinmyers opened this issue Jun 29, 2015 · 6 comments

@griffinmyers

I'm loading Form ADV Part 1a from the SEC and encountering the following error when rendering page 34:

Warning: name token is longer than allowed by the spec: 139

console.logging the name buffer that triggers the error yields:

D,o,e,s, ,a,n,y, ,r,e,p,o,r,t, ,p,r,e,p,a,r,e,d, ,b,y, ,t,h,e, ,i,n,d,e,p,e,n,d,e,n,t, ,p,u,b,l,i,c, ,a,c,c,o,u,n,t,a,n,t, ,t,h,a,t, ,a,u,d,i,t,e,d, ,t,h,e, ,p,o,o,l,e,d, ,i,n,v,e,s,t,m,e,n,t, ,v,e,h,i,c,l,e, ,o,r, ,t,h,a,t, ,e,x,a,m,i,n,e,d, ,i,n,t,e,r,n,a,l, ,c,o,n,t,r,o,l,s

which I can see on page 34 of the PDF.

I'd like to solicit your opinion on the severity of the error, on whether it's technically an issue with PDFJS or with the PDF (Adobe Reader can open and save this file, albeit sluggishly), and on the options for recovering from the error.

  • PDFJS Version: 1.0.1130
  • Browser: Chrome 43.0.2357.130 on Mac OSX 10.7.5 and PhantomJS 2.0 on Mac OSX 10.7.5
@timvandermeij
Contributor

Adobe Reader/Acrobat does not complain about this PDF for me, so either they silently ignore the error (and the PDF is corrupted), or there is a bug in our parsing logic. We need to apply further triage here.

@griffinmyers
Author

👍 Thanks for the quick look @timvandermeij. Looking forward to seeing what we can come up with.

@timvandermeij
Contributor

Probably we need to change error to warn here:

error('Warning: name token is longer than allowed by the spec: ' +

This change is motivated by the fact that the message itself says 'warning', and I don't think it's a fatal error. I'm not sure why we call it a warning and then use an error call. I would like to hear what others think about this.

Edit: Poppler seems to do something similar (https://github.com/danigm/poppler/blob/master/poppler/Lexer.cc#L434-L444), which is confusing me.

@Snuffleupagus
Collaborator

According to the specification, names are limited to a length (in bytes) of 127; see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G16.1010218.

Based on the message above, I'm wondering whether we're failing to parse the file correctly, in which case the actual issue would be somewhere else in parser.js.
In any case, just removing the error might move a potential error elsewhere in the code, so I think we should be careful with that kind of "solution".
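
To make the trade-off between the two approaches concrete, here is a minimal sketch, not the actual PDF.js code. The function `readName` and its `strict` option are hypothetical names introduced only for illustration; the 127-byte limit is the one from the specification cited above.

```javascript
// Hypothetical sketch, NOT the PDF.js lexer: readName and `strict` are
// illustrative names chosen for this example.
const MAX_NAME_LENGTH = 127; // name length limit (in bytes) from the PDF spec

function readName(bytes, { strict }) {
  const name = String.fromCharCode(...bytes);
  if (bytes.length > MAX_NAME_LENGTH) {
    const message =
      'name token is longer than allowed by the spec: ' + bytes.length;
    if (strict) {
      // "error" behaviour: abort parsing of this object.
      throw new Error(message);
    }
    // "warn" behaviour: report the problem and keep the over-long name.
    console.warn('Warning: ' + message);
  }
  return name;
}
```

In the lenient branch the over-long name flows on into the rest of the parser, which is where the concern about moving a potential error elsewhere comes from.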

@timvandermeij
Contributor

Good find, and you're right. I was confused because the error has 'warning' in the message.

@Rob--W
Member

Rob--W commented Jul 10, 2015

@Snuffleupagus
FYI: Our parser is correct, the PDF is bad. I have identified the ID of the faulty object, extracted it from the PDF using qpdf (qpdf page34.pdf --show-object=5144,0), and then formatted the output. Clearly, the name is too long.

5142 0 obj
<<
  /AP <<
    /D <<
      /Does#20any#20report#20prepared#20by#20the#20independent#20public#20accountant#20that#20audited#20the#20pooled#20investment#20vehicle#20or#20that#20examined#20internal#20controls 5144 0 R
      /Off 5145 0 R
    >>
    /N <<
      /Does#20any#20report#20prepared#20by#20the#20independent#20public#20accountant#20that#20audited#20the#20pooled#20investment#20vehicle#20or#20that#20examined#20internal#20controls 5143 0 R
    >>
  >>
  /AS /Off
  /BS <<
    /W 0
  >>
  /DA (/ZaDb 0 Tf 0 g)
  /F 4
  /FT /Btn
  /Ff 49152
  /MK <<
    /CA (n)
  >>
  /P 4 0 R
  /Rect [ 378.84000 579.96000 387.60000 588.72000 ]
  /Subtype /Widget
  /T (Does any report prepared by the independent public accountant that audited the pooled investment vehicle or that examined internal controls contain an unqualified opinion? yes check box)
  /TU (Does any report prepared by the independent public accountant that audited the pooled investment vehicle or that examined internal controls contain an unqualified opinion?yes check box)
  /Type /Annot
>>
endobj
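
For anyone who wants to check the length themselves, the following is a small sketch that decodes the `#xx` escapes in the name from the object dump and compares the result with the 127-byte limit. The helper `decodePdfName` is a hypothetical name written for this example, not a PDF.js or qpdf function.

```javascript
const MAX_NAME_LENGTH = 127; // name length limit (in bytes) from the PDF spec

// Hypothetical helper: replace each #xx hex escape in a PDF name
// (given without its leading slash) with the character it encodes.
function decodePdfName(raw) {
  return raw.replace(/#([0-9A-Fa-f]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// The key used in the /D and /N dictionaries of the object above.
const rawName =
  'Does#20any#20report#20prepared#20by#20the#20independent#20public' +
  '#20accountant#20that#20audited#20the#20pooled#20investment#20vehicle' +
  '#20or#20that#20examined#20internal#20controls';

const decoded = decodePdfName(rawName);
console.log(decoded.length, decoded.length > MAX_NAME_LENGTH); // 139 true
```

The decoded length of 139 matches the number reported in the warning from the original post.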
