TypeError: 'NumberObject' object is not subscriptable #1273

DL6ER · 2022-08-25T19:14:38Z

See #1269 for further details.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.4.0-122-generic-x86_64-with-glibc2.29

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.10.3

Code + PDF

This is a minimal, complete example that shows the issue:

import PyPDF2
with open("shiv_resume.pdf", "rb") as f:
  pdfreader = PyPDF2.PdfFileReader(f, strict=False)

PDF used above: shiv_resume.pdf

Traceback

This is the complete Traceback I see:

Xref table not zero-indexed. ID numbers for objects will be corrected.
Xref table not zero-indexed. ID numbers for objects will be corrected.
Superfluous whitespace found in object header b'17' b'23'

Traceback (most recent call last):
  File "test4.py", line 3, in <module>
    pdfreader = PyPDF2.PdfFileReader(f, strict=True)
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/_reader.py", line 1775, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/_reader.py", line 275, in __init__
    self.read(stream)
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/_reader.py", line 1279, in read
    self._read_xref_tables_and_trailers(stream, startxref, xref_issue_nr)
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/_reader.py", line 1435, in _read_xref_tables_and_trailers
    xrefstream = self._read_pdf15_xref_stream(stream)
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/_reader.py", line 1515, in _read_pdf15_xref_stream
    assert cast(str, xrefstream["/Type"]) == "/XRef"
TypeError: 'NumberObject' object is not subscriptable


MartinThoma · 2022-08-31T05:07:59Z

@DL6ER Thank you for sharing this issue and all the details you put in here (and in all the other issues as well) 🤗 I appreciate this a lot ❤️

MartinThoma · 2022-08-31T05:09:56Z

One part that might be interesting to you: PdfFileReader is deprecated; please use PdfReader instead. The only difference is that the old PdfFileReader defaults to strict=True, while PdfReader defaults to strict=False. We made this change because most users will want strict=False, and PdfFileReader doesn't actually need a file: a BytesIO object works fine as well.

Hence this:

import PyPDF2
with open("shiv_resume.pdf", "rb") as f:
  pdfreader = PyPDF2.PdfFileReader(f, strict=False)

can be simplified to this:

from PyPDF2 import PdfReader

reader = PdfReader("shiv_resume.pdf")

DL6ER · 2022-08-31T05:42:17Z

Just to clarify: with, i.e. close() is also not needed?

pubpub-zz · 2022-08-31T08:00:26Z

The PDF file is linearized, and there seem to be some issues in reading this part of the header. I will analyze it in more depth later.

MartinThoma · 2022-08-31T08:00:55Z

> Just to clarify: with, i.e. close() is also not needed?

Exactly! PyPDF2 takes care of that: https://github.com/py-pdf/PyPDF2/blob/main/PyPDF2/_reader.py#L272-L274 - no file handles are left open. That is also the reason why I typically recommend passing the file path directly to PyPDF2.
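
For readers wondering how a reader can avoid leaving a handle open, here is a minimal sketch of the general pattern. It is not PyPDF2's actual code: `load_pdf_bytes` is a hypothetical helper, and it assumes the whole file is copied into memory when a path is given.

```python
from io import BytesIO
from pathlib import Path
from typing import BinaryIO, Union


def load_pdf_bytes(source: Union[str, Path, BinaryIO]) -> BinaryIO:
    """Hypothetical helper: return a seekable binary stream for `source`."""
    if isinstance(source, (str, Path)):
        # The file handle exists only inside this `with` block; the
        # returned BytesIO holds an in-memory copy of the file content.
        with open(source, "rb") as fh:
            return BytesIO(fh.read())
    # A caller-supplied stream is passed through unchanged; the caller
    # stays responsible for closing it.
    return source
```

With this pattern, passing a path means there is nothing for the caller to close, whereas a stream opened by the caller still belongs to the caller.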

pubpub-zz · 2022-08-31T20:14:15Z

The problem is not directly due to linearization but to other errors (possibly generated by the linearization process?).

The pointer to the 3rd (and potentially 4th) chained xref/trailer is invalid; in such a case we will stop the xref/trailer analysis.
(I've fixed the /Prev entry in the 2nd chained trailer to extend the test coverage.)
shiv_resume.pdf
I've added a solution to search for an entry when the xref pointer is invalid.
I've also added the same solution to search for an entry when the id/gen is not present in the xref.
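
To illustrate the fallback idea (scanning the raw file for an object header when the xref entry cannot be trusted), here is a rough sketch. It is not the code from the PR: `find_object_offset` is a hypothetical name, and the sketch assumes the whole file is available as a bytes object and that the first matching header is the wanted one.

```python
import re
from typing import Optional


def find_object_offset(data: bytes, idnum: int, generation: int) -> Optional[int]:
    """Return the byte offset of the 'id gen obj' header, or None if absent."""
    pattern = re.compile(
        rb"(?<![0-9])"                       # id must not be the tail of a longer number
        + str(idnum).encode() + rb"\s+"      # object id, then whitespace
        + str(generation).encode() + rb"\s+" # generation number, then whitespace
        + rb"obj\b"                          # the literal keyword 'obj'
    )
    match = pattern.search(data)
    return match.start() if match else None


# Hypothetical sample bytes, only for illustration.
sample = b"%PDF-1.5\n17 0 obj\n<< /Type /XRef >>\nendobj\n7 0 obj\n42\nendobj\n"
offset = find_object_offset(sample, 17, 0)
```

A real implementation would also have to cope with several revisions of the same object in one file and with header-like byte sequences inside stream data, which this sketch ignores.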

PR #1297 completed

* if chained xref/trailer are not good
* if the object header ('id' 'gen' obj) or if the object is not present in the xref table, will search the file for the object.

fixes py-pdf#1273

pubpub-zz mentioned this issue Aug 31, 2022

ENH: Process XRefStm #1297

Merged

MartinThoma closed this as completed in 1252a49 Sep 3, 2022
