getOutlines() returns repeated items #381

Closed
aalvrz opened this issue Nov 29, 2017 · 5 comments
Labels
is-bug: From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF
needs-pdf: The issue needs a PDF file to show the problem
PdfReader: The PdfReader component is affected

Comments

@aalvrz

aalvrz commented Nov 29, 2017

When obtaining the outline of a PDF file:

import PyPDF2

# file_path is the path to the PDF whose outline is being read
with open(file_path, 'rb') as pdf_file:
    pdf_reader = PyPDF2.PdfFileReader(pdf_file)
    outlines = pdf_reader.getOutlines()

The method returns repeated items in the list. For example:

[
{'/Title': 'Working with a virtual environment', '/Page': IndirectObject(327, 0), '/Type': '/XYZ', '/Left': <PyPDF2.generic.NullObject object at 0x7f06aba3f278>, '/Top': <PyPDF2.generic.NullObject object at 0x7f06aba3f2b0>, '/Zoom': <PyPDF2.generic.NullObject object at 0x7f06aba3f2e8>}, 

{'/Title': 'Working with a virtual environment', '/Page': IndirectObject(327, 0), '/Type': '/XYZ', '/Left': <PyPDF2.generic.NullObject object at 0x7f06aba3f278>, '/Top': <PyPDF2.generic.NullObject object at 0x7f06aba3f2b0>, '/Zoom': <PyPDF2.generic.NullObject object at 0x7f06aba3f2e8>}, 

# ...

]

The first item should actually be a different one, with the title "Introduction".
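A minimal sketch of how repeats like the ones above could be detected, assuming the outline is a list of dict-like entries in which nested lists hold child entries. The helper names `flatten_outlines` and `find_repeated_items` are hypothetical, not part of PyPDF2, and treating "same title and same page" as a repeat is only a heuristic, since a PDF can legitimately contain two such bookmarks.

```python
from collections import Counter


def flatten_outlines(outlines):
    """Yield every outline entry, descending into nested lists of children."""
    for item in outlines:
        if isinstance(item, list):
            yield from flatten_outlines(item)
        else:
            yield item


def find_repeated_items(outlines):
    """Return the (title, page) keys that occur more than once, with counts."""
    # repr() is used for the page so that non-hashable or opaque page
    # references can still serve as part of a dictionary key.
    keys = [
        (item.get("/Title"), repr(item.get("/Page")))
        for item in flatten_outlines(outlines)
    ]
    return {key: count for key, count in Counter(keys).items() if count > 1}


# Stand-in data shaped like the output above; plain integers replace the
# page references so the example runs without a PDF (an assumption).
sample = [
    {"/Title": "Working with a virtual environment", "/Page": 327},
    {"/Title": "Working with a virtual environment", "/Page": 327},
    [{"/Title": "Installing packages", "/Page": 330}],
]
print(find_repeated_items(sample))
```

With a real file, the list returned by getOutlines() would be passed in place of `sample`.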

@jayantsolanki

I am also facing the same issue: one of the bookmarks is repeated. It happens for just one PDF; the other PDFs work perfectly.

@MartinThoma MartinThoma added the is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF label Apr 8, 2022
@MartinThoma
Member

Could you share a pdf that causes these issues?

@MartinThoma MartinThoma added needs-pdf The issue needs a PDF file to show the problem PdfReader The PdfReader component is affected labels Jun 27, 2022
@mtd91429
Contributor

Without a PDF to test and compare, I can only guess. However, I believe that this issue is related to #1121 and is fixed by PR #1128.

@MartinThoma
Member

Yes, it looks like that to me as well. Thank you for the hint 🤗

I'm closing this issue now as I assume that PyPDF2==2.6.0 has fixed this issue. Please let me know if you see it again with the latest PyPDF2 version.
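A small sketch for checking whether an installed release is at least 2.6.0, the version assumed above to contain the fix. `parse_version` and `has_outline_fix` are hypothetical helper names introduced for illustration; the comparison only looks at the leading numeric components of the version string.

```python
import re


def parse_version(version):
    """Turn '2.6.0' into (2, 6, 0), padding short versions with zeros."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def has_outline_fix(installed, fixed_in="2.6.0"):
    """True if the installed version is at least the assumed fix version."""
    return parse_version(installed) >= parse_version(fixed_in)


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        print(has_outline_fix(version("PyPDF2")))
    except PackageNotFoundError:
        print("PyPDF2 is not installed")
```

If this reports True and the duplicates still appear, that would be worth reporting back here together with a PDF that reproduces the problem.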

@jayantsolanki

> Could you share a pdf that causes these issues?

I am very sorry, I missed this. The issue happened while I was interning somewhere, and I no longer have access to that PDF (it was proprietary). #1121 seems to tackle a similar issue. By the way, I hugely appreciate the development work you have all done and that you keep PyPDF alive. PyPDF2 has been a very versatile tool for me. I referenced your work in one of our papers, How SAS® and Python Enhance PDF
