
Problem with AutoCad generated PDF #24

Closed
laffen opened this Issue Aug 28, 2013 · 10 comments

4 participants

laffen commented Aug 28, 2013

Hi

I am trying to use the PyPDF2 module to merge a lot of PDF files. For some of them it fails.
The failing PDF files are generated directly from AutoCAD.


Traceback (most recent call last):
  File "", line 37, in
  File "", line 29, in main
  File "C:\Python27\lib\site-packages\PyPDF2\merger.py", line 168, in append
    self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
  File "C:\Python27\lib\site-packages\PyPDF2\merger.py", line 116, in merge
    pages = (0, pdfr.getNumPages())


My script:

import os
from PyPDF2 import PdfFileMerger

def main():
    doclistdir = r'xxxxxxxxxxxxxxxxxx'
    # list.txt contains one PDF file name per line
    with open(r'xxxxxxxxxx\list.txt', 'r') as doclistfile:
        doclist = doclistfile.readlines()
    merger = PdfFileMerger()

    for doc in doclist:
        name = doc.strip()
        if not name:
            continue  # skip blank lines in list.txt
        pdfdoc = os.path.join(doclistdir, name)
        #print 'Processing:  ' + pdfdoc
        merger.append(pdfdoc)  # append() also accepts a path to a PDF file

    merger.write(os.path.join(doclistdir, "document-output.pdf"))
    merger.close()  # release the merger's resources

if __name__ == '__main__':
    main()


regards
Olav


Contributor

adammorris commented Aug 28, 2013

Hi Olav, I've encountered the same problem with a number of PDFs, also generated from AutoCAD. I haven't been able to get to the bottom of what causes it, but it is something in the way AutoCAD writes the PDF file. If the files are opened and re-saved in Adobe, they can then be merged.


Owner

mstamy2 commented Aug 28, 2013

Hello, I can now merge PDFs generated by AutoCAD with this simple fix.
See if commit 428dbf9 solves the issue for you.


laffen commented Aug 29, 2013

Hi Matthew

Thanks for your help!

This fixed the problem for all but a few PDFs.
For those remaining files, it looks like return NumberObject(name) fails when name == ''.
I have sent you an example of such a failing PDF (also generated by AutoCAD).

regards Olav
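The failure described above can be reproduced in plain Python: a number constructor built on int() raises ValueError when handed an empty token, so a tolerant reader has to check for that case before converting. The sketch below assumes tokens arrive as bytes; parse_number_token is a hypothetical helper, not part of PyPDF2, and returning None for an empty token is one possible policy rather than what PyPDF2 actually does.

```python
def parse_number_token(token):
    """Hypothetical tolerant parser for a PDF numeric token given as bytes.

    int('') raises ValueError, which is the kind of crash seen when an
    empty token reaches a number constructor. Here an empty token is
    mapped to None instead (an assumed policy, not PyPDF2's behavior).
    """
    text = token.decode('ascii').strip()
    if not text:
        return None  # nothing to convert: avoids int('') -> ValueError
    if '.' in text:
        return float(text)  # PDF reals such as '-.5' or '4.'
    return int(text)  # PDF integers such as '+17' or '-98'


try:
    int('')
except ValueError as exc:
    print('int() rejects an empty token:', exc)

print(parse_number_token(b'42'), parse_number_token(b'-.5'), parse_number_token(b''))
```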

ghost assigned mstamy2 Aug 29, 2013


Collaborator

claird commented Aug 29, 2013

adammorris wrote yesterday, in regard to AutoCAD, "... If the files are opened and re-saved in adobe, they can be then merged." I need to explain this in the FAQ: a LOT of PDF-producing software in the world at large, including scanners and AutoCAD, produces broken PDF. At the same time, Adobe software is exceptionally "forgiving" in doing its best to read anything it encounters. It's generally safe to assume that you can "mollify" any busted PDF you find by running it through Acrobat and writing it back out. Preview for MacOS, incidentally, is almost as good in this role.

One of the aims we have with PyPDF2 is to make it as intelligent in reading as Acrobat is, so that it does "the right thing" with PDF instances that don't conform to the PDF standard.

laffen (and anyone else): send us any PDFs that cause PyPDF2 to fail but that you think PyPDF2 should be able to handle. If your examples need to be kept private, be sure to tell us so; we're accustomed to handling proprietary material, and do so conscientiously.


Owner

mstamy2 commented Sep 4, 2013

I created fixes for a few more bugs that occur with the PDF you sent me. For example, PyPDF2 could not read '\x00', which it thought was a number object (it is actually an unconventional representation of a null object). There are a few bugs left, though; I will commit the results soon.


Owner

mstamy2 commented Sep 5, 2013

See if commit 54e0b6d works for the remaining PDFs


laffen commented Sep 6, 2013

Hi Matthew!

Thanks a lot for your help. I really appreciate it :-)

I tried commit 54e0b6d (https://github.com/mstamy2/PyPDF2/commit/54e0b6d6a8e82c06b9d72c3bec635fe5bd1e76dd),
but there seems to be an indentation error in the new file at line 570.

Regards Olav



Owner

mstamy2 commented Sep 6, 2013

Sorry about that. I had added some code for debugging purposes and deleted more than I intended. The only essential lines of added code are:

if tok == b_('\x00'):
    continue

This was added to the code that reads a dictionary object, but there should probably be a more general fix in case the byte occurs elsewhere.
I am not sure why \x00 was written into the PDF file, but it seems the best we can do is to ignore it.
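One way to make the more general fix mentioned above is to treat the NUL byte as white space everywhere the tokenizer skips white space, rather than only inside dictionaries. This agrees with the PDF specification, which lists NUL (0x00) alongside tab, line feed, form feed, carriage return and space as white-space characters. The sketch assumes the reader pulls bytes one at a time from a file-like stream; read_non_whitespace and PDF_WHITESPACE are hypothetical names, not PyPDF2's API.

```python
import io

# PDF white-space characters (ISO 32000-1, Table 1): NUL, HT, LF, FF, CR, SP.
PDF_WHITESPACE = b'\x00\t\n\x0c\r '


def read_non_whitespace(stream):
    """Hypothetical helper: return the next byte that is not PDF white space.

    Returns b'' if the stream ends first. Because NUL is in the set, a stray
    NUL byte is skipped here, so every caller benefits, not just the
    dictionary reader.
    """
    while True:
        tok = stream.read(1)
        if not tok or tok not in PDF_WHITESPACE:
            return tok


# Hypothetical input: a dictionary body preceded by stray NUL bytes.
stream = io.BytesIO(b'\x00\x00 /Type /Page')
print(read_non_whitespace(stream))  # first significant byte
print(stream.read())  # remainder of the stream after that byte
```

Checking `not tok` first matters: an empty bytes object is "in" every bytes object, so without it the loop would never stop at end of stream.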


laffen commented Sep 9, 2013

Hi

Great!
This solved the problem - every file merged successfully.
Thanks!

Regards Olav

laffen closed this Sep 9, 2013


Contributor

adammorris commented Sep 10, 2013

Awesome, this fixed all my problem PDFs as well! Thank you so much!
