Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure indirect sweep to handle all objects. #1064

Closed
wants to merge 5 commits into from

Conversation

Hatell
Copy link
Contributor

@Hatell Hatell commented Jul 5, 2022

This fixes #1062

@MartinThoma
Copy link
Member

You're quick!

I've just made a bugfix-release so that we have time to fix this. No need to rush :-)

@Hatell Hatell marked this pull request as draft July 5, 2022 12:51
@Hatell Hatell marked this pull request as ready for review July 5, 2022 12:57
@Hatell
Copy link
Contributor Author

Hatell commented Jul 5, 2022

Could you run tests for unknown reason they didn't run.

I found that there was missing some cross references added them as well.

PyPDF2/_writer.py Outdated Show resolved Hide resolved
@MartinThoma
Copy link
Member

@Hatell I've just added a tiny stylistic commit. I don't know why CI wasn't triggered before

@MartinThoma
Copy link
Member

@Hatell The failed test might be actually ok. It checks for file identity - so the file needs to be updated. We need to manually inspect the merged file if it looks ok. Not ideal, but better than no check.

@Hatell
Copy link
Contributor Author

Hatell commented Jul 5, 2022

Don't merge yet. There is some issues left.

@Hatell Hatell closed this Jul 5, 2022
@Hatell Hatell reopened this Jul 5, 2022
@Hatell Hatell marked this pull request as draft July 5, 2022 13:46
@Hatell Hatell marked this pull request as ready for review July 5, 2022 14:03
@Hatell
Copy link
Contributor Author

Hatell commented Jul 5, 2022

I found that some data to calculate hash was ignored. Now tests should pass.

@codecov
Copy link

codecov bot commented Jul 5, 2022

Codecov Report

Merging #1064 (b17baef) into main (a345690) will increase coverage by 0.09%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1064      +/-   ##
==========================================
+ Coverage   90.86%   90.95%   +0.09%     
==========================================
  Files          24       24              
  Lines        4508     4520      +12     
  Branches      920      923       +3     
==========================================
+ Hits         4096     4111      +15     
+ Misses        271      268       -3     
  Partials      141      141              
Impacted Files Coverage Δ
PyPDF2/_writer.py 89.25% <100.00%> (+0.18%) ⬆️
PyPDF2/generic.py 91.57% <100.00%> (+0.33%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a345690...b17baef. Read the comment docs.

@Hatell
Copy link
Contributor Author

Hatell commented Jul 5, 2022

Now all issues should be resolved.

Added test because coverage checks and found a bug.

There is a bug in main and this fixes it also. I found from previous PR #207 those files with NoneType errors and checked that xref-table is incorrectly produced.

@MartinThoma
Copy link
Member

Thank you ❤️

@MartinThoma
Copy link
Member

I'm running the compression checks again

@MartinThoma
Copy link
Member

It looks ok for me. @MasterOdin what do you think?

@MasterOdin
Copy link
Member

I'm still quite concerned that both extern_map and self._idnum_hash are doing very similar things with regards to trying to remove duplicate indirect objects during the sweep, but can potentially mingle in weird ways, especially since we only use self._idnum_hash when extern_map has a cache miss, but when it doesn't I'm not sure there's a guarantee that what extern_map has cached will be the same as what we'd have for self._idnum_hash, and so I guess we could still end up with repeated objects, though less than before.

Can we replace extern_map with self._idnum_hash?

@Hatell
Copy link
Contributor Author

Hatell commented Jul 7, 2022

It may work.

Need only to figure how this extern_map which is like tree structure based on pdf, generation and idnum to transform a flat lookup-table.

Also I would change this recursive depth-first processing to iterative version. Maybe after this structure change.

@Hatell
Copy link
Contributor Author

Hatell commented Jul 7, 2022

I created PR #1072 which do need testing.

@Hatell Hatell closed this Jul 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PDFs aren't merged properly
3 participants