Problem with AutoCad generated PDF #24

laffen · 2013-08-28T06:05:21Z

Hi

I am trying to use the PyPDF2 module to merge a lot of PDF files. For some of the PDF files it fails.
The failing PDF files are files generated directly from AutoCAD.

Traceback (most recent call last):
  File "", line 37, in
  File "", line 29, in main
  File "C:\Python27\lib\site-packages\PyPDF2\merger.py", line 168, in append
    self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
  File "C:\Python27\lib\site-packages\PyPDF2\merger.py", line 116, in merge
    pages = (0, pdfr.getNumPages())

My script:

import os
from PyPDF2 import PdfFileMerger

def main():
    doclistdir = r'xxxxxxxxxxxxxxxxxx'
    # list.txt holds one PDF file name per line
    doclistfile = open(r'xxxxxxxxxx\list.txt', 'r')
    doclist = doclistfile.readlines()
    doclistfile.close()
    merger = PdfFileMerger()

    for doc in doclist:
        # Build the full path to each PDF and append it to the merger
        pdfdoc = os.path.join(doclistdir, doc.strip())
        mergerelement = open(pdfdoc, 'rb')
        #print 'Processing: ' + pdfdoc
        merger.append(mergerelement)

    # Write the merged result next to the input files
    output = open(os.path.join(doclistdir, "document-output.pdf"), "wb")
    merger.write(output)
    output.close()

if __name__ == '__main__':
    main()

regards
Olav

adammorris · 2013-08-28T12:50:29Z

Hi Olav - I've encountered the same with a number of PDFs, also generated from AutoCAD. I haven't been able to get to the bottom of what causes it, but it is something in the way AutoCAD writes the PDF file. If the files are opened and re-saved in Adobe, they can then be merged.
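
To find out which files need the re-save workaround, one option is to keep merging and record the files that fail instead of stopping at the first one. The following is only a rough sketch: `merge_tolerantly` is a hypothetical helper name, not part of PyPDF2, and it assumes nothing more than a merger object with `append` and `write` methods, as `PdfFileMerger` is used in the script above. It also assumes that a failed `append` leaves the merger in a usable state, which should be checked against the PyPDF2 version in use.

```python
def merge_tolerantly(paths, merger, output_path):
    """Append every file in `paths` to `merger`, skipping files that raise.

    Returns a list of (path, exception) pairs for the files that could not
    be appended, so they can be re-saved and retried later.
    """
    failed = []
    handles = []  # keep successfully appended files open until the output is written
    try:
        for path in paths:
            fh = open(path, "rb")
            try:
                merger.append(fh)
            except Exception as exc:  # broad on purpose: the exact error type varies
                fh.close()
                failed.append((path, exc))
            else:
                handles.append(fh)
        with open(output_path, "wb") as out:
            merger.write(out)
    finally:
        for fh in handles:
            fh.close()
    return failed

# Possible usage, assuming PyPDF2 is installed:
#   from PyPDF2 import PdfFileMerger
#   for path, exc in merge_tolerantly(paths, PdfFileMerger(), "document-output.pdf"):
#       print("could not merge %s: %r" % (path, exc))
```

The files are kept open until the output has been written because the script above also leaves them open while merging; closing them earlier may or may not be safe.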

mstamy2 · 2013-08-28T21:02:20Z

Hello, I can now merge PDFs generated by AutoCAD with this simple fix.
See if commit 428dbf9 solves the issue for you.

laffen · 2013-08-29T08:49:51Z

Hi Matthew

Thanks for your help!

This fixed the problem for all but a few PDFs.
In the remaining cases, it looks like return NumberObject(name) fails when name=''.
I have sent you an example of such a failing PDF (also generated by AutoCAD).

regards Olav

claird · 2013-08-29T16:07:11Z

adammorris wrote yesterday, in regard to AutoCAD, "... If the files are opened and re-saved in adobe, they can be then merged." I need to explain this in the FAQ: a LOT of PDF-producing software in the world at large, including scanners and AutoCAD, produces broken PDF. At the same time, Adobe software is exceptionally "forgiving" in doing its best to read anything it encounters. It's generally safe to assume that you can "mollify" any busted PDF you find by running it through Acrobat and writing it back out. Preview for MacOS, incidentally, is almost as good in this role.

One of the aims we have with PyPDF2 is to make it as intelligent in reading as Acrobat is, so that it does "the right thing" with PDF instances that don't conform to the PDF standard.

laffen (and anyone else), send us any examples of PDF that cause PyPDF2 to fail, but which you think PyPDF2 should be able to handle. If your examples need to be kept private, be sure to tell us so; we're accustomed to handling proprietary material, and do so conscientiously.

mstamy2 · 2013-09-04T22:31:31Z

I created fixes for a few more bugs that occur with the PDF you sent me. For example, PyPDF2 could not read '\x00', which it thought was a number object (it is actually an unconventional representation of a null object). There are a few bugs left, though. I will commit the results soon.

mstamy2 · 2013-09-05T22:49:11Z

See if commit 54e0b6d works for the remaining PDFs

laffen · 2013-09-06T07:35:26Z

Hi Matthew!

Thanks a lot for your help. I really appreciate it :-)

I tried commit 54e0b6d (https://github.com/mstamy2/PyPDF2/commit/54e0b6d6a8e82c06b9d72c3bec635fe5bd1e76dd),
but there seems to be an indentation error in the new file at line 570.

Regards Olav

mstamy2 · 2013-09-06T21:48:18Z

Sorry about that - I had added some code for debugging purposes and deleted more than I intended. The only essential lines of added code are

if tok == b_('\x00'):
    continue

This was added where a dictionary object is read, but there should probably be a more general fix in case it occurs elsewhere.
I am not sure why \x00 was written into the PDF file, but it seems the best thing we can do is to ignore it.
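
One possible shape for a more general fix: the PDF specification lists NUL (0x00) among its white-space characters, so a reader could skip it wherever it skips other white space, not only inside dictionaries. The sketch below is an illustration under that assumption, not the code from the commit; `PDF_WHITESPACE` and `read_non_whitespace` are names chosen here for the example.

```python
import io

# The six white-space bytes of the PDF specification: NUL, tab, line feed,
# form feed, carriage return and space.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "

def read_non_whitespace(stream):
    """Return the next byte that is not PDF white space, or b'' at end of stream."""
    tok = stream.read(1)
    # `tok and ...` stops at end of stream, where read() returns an empty string
    while tok and tok in PDF_WHITESPACE:
        tok = stream.read(1)
    return tok

# A stray NUL before a dictionary key is skipped like any other white space.
stream = io.BytesIO(b"\x00 \n/Type /Page")
first = read_non_whitespace(stream)
```

With this approach the stray byte never reaches the object parser, so no special case is needed in each object type.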

laffen · 2013-09-09T07:53:51Z

Hi

Great!
This solved the problem - every file merged successfully.
Thanks!

Regards Olav

adammorris · 2013-09-10T19:22:00Z

Awesome - this fixed all my problem PDFs as well! Thank you so much!

ghost assigned mstamy2 Aug 29, 2013

laffen closed this as completed Sep 9, 2013
