Can't use preexisting streams like pyPdf while initializing PdfReader #223

Open
jerrian opened this issue Jul 14, 2021 · 0 comments

jerrian commented Jul 14, 2021

When I tried to get the total page count of "test.pdf" using PdfReader, it reported 2 pages, but the file actually has 19 pages.
I tried again with PdfFileReader from PyPDF2, and it reported the correct count.

```python
>>> from pdfrw import PdfReader
>>> from PyPDF2 import PdfFileReader
>>> filename = './test.pdf'
>>> pdf_reader = PdfReader(filename)
>>> len(pdf_reader.pages)
2
>>> pdf_file_reader = PdfFileReader(open(filename, 'rb'))
>>> pdf_file_reader.getNumPages()
19
```

I don't know why PdfReader gets the page count wrong, so I tried passing a preexisting stream when initializing PdfReader instead, since the source code says this is supported:

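```python
# Excerpt from PdfReader.__init__ in pdfrw/pdfreader.py
# Allow reading preexisting streams like pyPdf
if hasattr(fname, 'read'):
    fdata = fname.read()
else:
    try:
        f = open(fname, 'rb')
        fdata = f.read()
        f.close()
    # ... (the except clause is omitted from this excerpt)
```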
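The branch above is plain duck typing: any object with a `read` attribute is assumed to be a binary file whose `read()` takes no arguments. Below is a minimal standalone sketch of that logic, not pdfrw's actual code; `load_pdf_bytes` is a hypothetical name, and the path branch uses a `with` block in place of the omitted except clause.

```python
import io


def load_pdf_bytes(fname):
    """Hypothetical helper mirroring the duck-typed branch quoted above."""
    # Any object with a .read attribute is treated as a binary file-like object
    if hasattr(fname, 'read'):
        return fname.read()
    # Otherwise treat fname as a path and read the whole file in binary mode
    with open(fname, 'rb') as f:
        return f.read()


# An in-memory binary stream satisfies the branch because its read() needs no arguments
data = load_pdf_bytes(io.BytesIO(b'%PDF-1.4\n%%EOF\n'))
print(data[:8])  # b'%PDF-1.4'
```

If pdfrw behaves like this sketch, an open binary file such as `open(filename, 'rb')` is the kind of object this branch expects, rather than a PdfFileReader instance.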

That also failed. PdfReader calls `fname.read()` with no arguments, but the `read` method of PdfFileReader in both pyPdf and PyPDF2 requires a `stream` argument:

```python
>>> pdf_reader2 = PdfReader(pdf_file_reader)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/pdf_test/venv/lib/python3.7/site-packages/pdfrw/pdfreader.py", line 565, in __init__
    fdata = fname.read()
TypeError: read() missing 1 required positional argument: 'stream'
```

```python
# pyPdf
def read(self, stream):
    # start at the end:
    stream.seek(-1, 2)
    # ...

# PyPDF2
def read(self, stream):
    debug = False
    if debug: print(">>read", stream)
    # start at the end:
    # ...
```
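The failure can be reproduced without either library. The sketch below uses a hypothetical `FakeFileReader` class that only imitates the `read(self, stream)` signature quoted above: it passes the `hasattr(fname, 'read')` check, but the argument-less call that follows raises `TypeError`.

```python
class FakeFileReader:
    """Hypothetical stand-in for PdfFileReader: read() requires a stream."""

    def read(self, stream):
        # Parsing starts at the end of the file, as in the excerpts above
        stream.seek(-1, 2)


reader = FakeFileReader()

# Passes the duck-typed check, so PdfReader takes the "preexisting stream" branch
print(hasattr(reader, 'read'))  # True

try:
    fdata = reader.read()  # the call PdfReader makes: fdata = fname.read()
except TypeError as exc:
    # The message reports the missing 'stream' argument
    print('TypeError:', exc)
```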

Could you update the source code so that PdfReader works with these streams?
I'm also attaching "test.pdf" so you can check why the page count is wrong.

test.pdf
