Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDFs aren't merged properly #1062

Closed
gprzyby opened this issue Jul 5, 2022 · 6 comments · Fixed by #1063
Closed

PDFs aren't merged properly #1062

gprzyby opened this issue Jul 5, 2022 · 6 comments · Fixed by #1063
Labels
is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF PdfMerger The PdfMerger component is affected PdfWriter The PdfWriter component is affected

Comments

@gprzyby
Copy link

gprzyby commented Jul 5, 2022

After upgrading from version 1.28.1 to version 2.4.1 I've faced an issue with merging linked pdfs.
I've also checked previous versions and problem appears after 2.0.0 version

Environment

$ python -m platform
Linux-5.18.5-200.fc36.x86_64-x86_64-with-glibc2.28

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.4.1

Code + PDF

PDFs:
file1.pdf
file2.pdf
merged.pdf

from PyPDF2 import PdfMerger

merger = PdfMerger(strict=True)
merger.append("file1.pdf")
merger.merge(1, "file2.pdf")
merger.write("merged.pdf")

Traceback

There is no traceback, but after checking merged.pdf content from file2.pdf dosen't exists, but that slide has dimensions from file2.pdf
It looks like dimensions are from file2.pdf, but image comes from first slide of file1.pdf

Expected behavior

In second slide of merged.pdf we could see slides from file2.pdf

@gprzyby gprzyby changed the title PDFs arent merged properly PDFs aren't merged properly Jul 5, 2022
@MartinThoma
Copy link
Member

Thank you for the nice bug report!

I made a git bisect and found that #207 caused the issue. That PR was introduced with PyPDF2==2.4.1. PyPDF2==2.4.0 should be fine. Could you please check @gprzyby ?

@Hatell would you mind having a look why _sweep_indirect_references causes those issues?

@MartinThoma
Copy link
Member

@Hatell As a quick-fix, I will revert the changes of #207 within _sweep_indirect_references today. Being slow is better than being incorrect

@Hatell
Copy link
Contributor

Hatell commented Jul 5, 2022

I will check this.

@MartinThoma MartinThoma added is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF PdfMerger The PdfMerger component is affected PdfWriter The PdfWriter component is affected labels Jul 5, 2022
MartinThoma added a commit that referenced this issue Jul 5, 2022
Caused-by: #207

Why it wasn't detected by the tests: We don't have any tests that check
for the correct result of a merge. We just check for exceptions

Closes: #1062
MartinThoma added a commit that referenced this issue Jul 5, 2022
Caused-by: #207

Why it wasn't detected by the tests: We don't have any tests that check
for the correct result of a merge. We just check for exceptions

How we prevent it in future: Unit test was added

Risk of the fix:
- We will have bigger file sizes again as #207 was effectively reverted
- We will need to adjust this test if we change the way we write PDFs

Closes: #1062
MartinThoma added a commit that referenced this issue Jul 5, 2022
Caused-by: #207

Why it wasn't detected by the tests: We don't have any tests that check
for the correct result of a merge. We just check for exceptions

How we prevent it in future: Unit test was added

Risk of the fix:
- We will have bigger file sizes again as #207 was effectively reverted
- We will need to adjust this test if we change the way we write PDFs

Closes: #1062
@gprzyby
Copy link
Author

gprzyby commented Jul 5, 2022

@MartinThoma
Great thanks for fast reply :D

I've checked how code will behave on PyPDF2=2.4.0 and it works like charm

@Hatell
Copy link
Contributor

Hatell commented Jul 5, 2022

It seems that objects sweep iteration is stopped too early. I made pull request to fix issue.

MartinThoma added a commit that referenced this issue Jul 5, 2022
Caused-by: #207

Why it wasn't detected by the tests: We don't have any tests that check
for the correct result of a merge. We just check for exceptions

How we prevent it in future: Unit test was added

Risk of the fix:
- We will have bigger file sizes again as #207 was effectively reverted
- We will need to adjust this test if we change the way we write PDFs

Closes: #1062
@MartinThoma
Copy link
Member

@gprzyby PyPDF2==2.4.2 was released with a revert and a test to prevent this to happen again. So you can still take the latest version or 2.4.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF PdfMerger The PdfMerger component is affected PdfWriter The PdfWriter component is affected
Projects
None yet
3 participants