PdfReadError: Could not read Boolean object #377

Closed
tataganesh opened this issue Nov 13, 2017 · 7 comments
Labels
is-bug: From a user's perspective, this is a bug (a violation of the expected behavior with a compliant PDF)
needs-pdf: The issue needs a PDF file to show the problem
PdfReader: The PdfReader component is affected

tataganesh commented Nov 13, 2017

I am getting the following error for certain PDFs. Due to the confidential nature of these documents, I can't share them, but I can try to provide information that helps solve this problem.
Stack trace:

    inputpdf = PdfFileReader(open(pdfpath, "rb"), strict=False)
  File "/home/tata/.virtualenvs/obu/local/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1084, in __init__
    self.read(stream)
  File "/home/tata/.virtualenvs/obu/local/lib/python2.7/site-packages/PyPDF2/pdf.py", line 1732, in read
    num = readObject(stream, self)
  File "/home/tata/.virtualenvs/obu/local/lib/python2.7/site-packages/PyPDF2/generic.py", line 74, in readObject
    return BooleanObject.readFromStream(stream)
  File "/home/tata/.virtualenvs/obu/local/lib/python2.7/site-packages/PyPDF2/generic.py", line 137, in readFromStream
    raise utils.PdfReadError('Could not read Boolean object')
PdfReadError: Could not read Boolean object

The exception seems to be raised from the following function, in generic.py:

    def readFromStream(stream):
        word = stream.read(4)
        if word == b_("true"):
            return BooleanObject(True)
        elif word == b_("fals"):
            stream.read(1)
            return BooleanObject(False)
        else:
            raise utils.PdfReadError('Could not read Boolean object')

Printing the variable word yields "trai". Can anyone suggest a solution for this issue?

tataganesh (Author) commented

All of these PDFs appear to be encrypted in some way. The solution cited in issue #53 worked for me: I ran the following command and then read output.pdf instead of the original file.

    qpdf --password= --decrypt input.pdf output.pdf

I am not sure how to determine beforehand whether a PDF is encrypted (or in this particular state).
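
One rough way to check beforehand, as a sketch only: an encrypted PDF refers to its encryption dictionary through an /Encrypt key in the trailer (or cross-reference stream) dictionary, so scanning the raw bytes for that key avoids parsing the file at all. The helper names below are hypothetical, and the check is a heuristic: the same byte sequence could appear in unrelated content, and it says nothing about other kinds of damage. The qpdf arguments are the ones from the command above.

```python
import subprocess
from pathlib import Path


def looks_encrypted_bytes(data: bytes) -> bool:
    """Heuristic: report True if the raw PDF bytes mention an /Encrypt key."""
    return b"/Encrypt" in data


def looks_encrypted(path) -> bool:
    """Apply the heuristic to a file on disk without parsing it as a PDF."""
    return looks_encrypted_bytes(Path(path).read_bytes())


def qpdf_decrypt_command(src, dst):
    """Build the qpdf command line quoted in this thread."""
    return ["qpdf", "--password=", "--decrypt", str(src), str(dst)]


def decrypt_with_qpdf(src, dst) -> int:
    """Run qpdf (assumed to be installed and on PATH) and return its exit code."""
    return subprocess.run(qpdf_decrypt_command(src, dst)).returncode
```

A caller could run looks_encrypted(pdfpath) first and only fall back to decrypt_with_qpdf when it returns True, then open the rewritten file.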

hvbtup commented May 17, 2018

I had the same error for a signed document (Austrian "Amtssignatur"). The file contained the sequence

xref
trailer

I think this violates the PDF 1.7 specification, which says that a cross-reference section (marked by the xref keyword) consists of one or more subsections, but here we have zero subsections.
Nevertheless, all the browsers and Adobe Reader render the document without an error.

The signed PDF obviously uses the "incremental update" feature of the PDF format.
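
To illustrate why an empty cross-reference section ends in a Boolean error, here is a simplified stand-in for the parsing steps, not PyPDF2's actual code; all names are hypothetical. After the xref keyword the reader expects the first number of a subsection header. If the next token starts with "t", it is dispatched to the Boolean parser, which reads the four bytes "trai" from "trailer" and gives up. The tolerate_empty flag shows one possible lenient behavior, which is an assumption and not necessarily how the library handles it.

```python
import io


class PdfReadError(Exception):
    """Stand-in for the library's error type."""


def skip_whitespace(stream):
    """Advance the stream to the next non-whitespace byte (or EOF)."""
    while True:
        byte = stream.read(1)
        if byte == b"":
            return
        if byte not in b" \t\r\n":
            stream.seek(-1, io.SEEK_CUR)
            return


def read_boolean(stream):
    """Mimic the Boolean parser quoted earlier in this thread."""
    word = stream.read(4)
    if word == b"true":
        return True
    if word == b"fals":
        stream.read(1)  # consume the trailing "e"
        return False
    raise PdfReadError("Could not read Boolean object (got %r)" % word)


def read_subsection_start(stream):
    """Read the token after 'xref', dispatching on its first byte."""
    skip_whitespace(stream)
    first = stream.read(1)
    if first == b"":
        raise PdfReadError("Unexpected end of stream")
    stream.seek(-1, io.SEEK_CUR)
    if first in b"tf":
        return read_boolean(stream)
    digits = b""
    while True:
        byte = stream.read(1)
        if not byte.isdigit():
            break
        digits += byte
    if not digits:
        raise PdfReadError("Expected a number, got %r" % first)
    return int(digits)


def parse_xref_header(data, tolerate_empty=False):
    """Return the first object number of the first xref subsection.

    With tolerate_empty=True, an xref section followed directly by
    'trailer' is treated as having zero subsections and yields None.
    """
    stream = io.BytesIO(data)
    if stream.read(4) != b"xref":
        raise PdfReadError("Expected xref keyword")
    if tolerate_empty:
        skip_whitespace(stream)
        position = stream.tell()
        if stream.read(7) == b"trailer":
            return None
        stream.seek(position)
    return read_subsection_start(stream)
```

Calling parse_xref_header(b"xref\ntrailer\n") raises PdfReadError in this sketch, while the same input with tolerate_empty=True is accepted as an empty section.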

hvbtup commented May 17, 2018

@ganeshtata, please reopen the issue.

tataganesh reopened this Jun 9, 2018
MartinThoma added the is-bug and PdfReader labels Apr 7, 2022

MartinThoma (Member) commented

#1015 could fix this issue

MartinThoma added the needs-pdf label Jun 26, 2022
MartinThoma pushed a commit that referenced this issue Jun 26, 2022

MartinThoma (Member) commented

As there is no PDF, we cannot confirm that it's solved. I assume it is, and hence I'll close the issue.

Please let me know if you still encounter the same issue with the latest PyPDF2 version (and please attach the PDF, if possible).

isriam commented Aug 9, 2023

I am able to reproduce this issue with test.pdf and pypdf.

Traceback (most recent call last):

  File "C:\Users\jerem\PycharmProjects\testing\main.py", line 16, in <module>
    main()
  File "C:\Users\jerem\PycharmProjects\testing\main.py", line 8, in main
    text = pypdf.PdfReader('test.pdf')
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python310\lib\site-packages\pypdf\_reader.py", line 318, in __init__
    self.read(stream)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python310\lib\site-packages\pypdf\_reader.py", line 1548, in read
    self._read_xref_tables_and_trailers(stream, startxref, xref_issue_nr)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python310\lib\site-packages\pypdf\_reader.py", line 1758, in _read_xref_tables_and_trailers
    startxref = self._read_xref(stream)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python310\lib\site-packages\pypdf\_reader.py", line 1794, in _read_xref
    self._read_standard_xref_table(stream)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python310\lib\site-packages\pypdf\_reader.py", line 1657, in _read_standard_xref_table
    size = cast(int, read_object(stream, self))
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python310\lib\site-packages\pypdf\generic\_data_structures.py", line 1229, in read_object
    return BooleanObject.read_from_stream(stream)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python310\lib\site-packages\pypdf\generic\_base.py", line 257, in read_from_stream
    raise PdfReadError("Could not read Boolean object")
pypdf.errors.PdfReadError: Could not read Boolean object

Process finished with exit code 1

pubpub-zz (Collaborator) commented

@isriam, please create a new issue. It will be easier to track.
