Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add page label support to PdfWriter #1558

Merged
merged 23 commits into from
Jan 19, 2023
Merged

Conversation

lorenzomanini
Copy link
Contributor

Introduce a new PdfWriter interface set_page_labels that sets page labels to a page interval (as mentioned in #1554).
It should support setting multiple page lables for different page intervals.
I've done some local testing and it seems to be working.
Pending: tests to be added

It's my first time contributing to an open-source project and I have little experience with python so my code will certainly need corrections, so I would like to know what you think about it.
Also, I've no idea how to set up correctly tests for my code but I'll look into it as soon as I will have some spare time.

@MartinThoma
Copy link
Member

I've done some local testing and it seems to be working.

How did you test it? Please add the test script

pypdf/_writer.py Outdated Show resolved Hide resolved
pypdf/_writer.py Outdated Show resolved Hide resolved
pypdf/_writer.py Outdated Show resolved Hide resolved
@lorenzomanini
Copy link
Contributor Author

Thank you a lot! This week I'm pretty busy, but I will try to address everything as soon as possible. At the latest, I think, next week I will be able to work on it.

@lorenzomanini
Copy link
Contributor Author

I have added the tests and changed a bit the behavior to manage correctly when ranges of page labels overlap. The three _nums functions should be moved to _page_labels.py I think, what do you think?
Also, the inputs of set_page_label (page_number_from and page_number_to) must be 1 based indexes. It seemed more natural and practical to me for this use case but I'm not sure.
Docs and mypy are still missing.
Let me know what you think! :)

pypdf/_writer.py Outdated Show resolved Hide resolved
pypdf/_writer.py Outdated Show resolved Hide resolved
pypdf/_writer.py Outdated Show resolved Hide resolved
pypdf/_writer.py Outdated Show resolved Hide resolved
Co-authored-by: pubpub-zz <4083478+pubpub-zz@users.noreply.github.com>
@lorenzomanini
Copy link
Contributor Author

Thank you for the mypy suggestions!
The label parameter should already be renamed style, did I mess something up with Git?
I agree about creating a class in constants for the possible styles.

@pubpub-zz
Copy link
Collaborator

Thank you for the mypy suggestions! The label parameter should already be renamed style, did I mess something up with Git? I agree about creating a class in constants for the possible styles.
oups old comment... All good.

pypdf/_writer.py Outdated Show resolved Hide resolved
pypdf/_writer.py Outdated Show resolved Hide resolved
pypdf/_writer.py Outdated Show resolved Hide resolved
pypdf/_writer.py Outdated Show resolved Hide resolved
pypdf/_writer.py Outdated Show resolved Hide resolved
lorenzomanini and others added 3 commits January 18, 2023 00:03
Co-authored-by: pubpub-zz <4083478+pubpub-zz@users.noreply.github.com>
@codecov
Copy link

codecov bot commented Jan 17, 2023

Codecov Report

Base: 91.87% // Head: 91.84% // Decreases project coverage by -0.04% ⚠️

Coverage data is based on head (daad866) compared to base (3560550).
Patch coverage: 89.04% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1558      +/-   ##
==========================================
- Coverage   91.87%   91.84%   -0.04%     
==========================================
  Files          33       33              
  Lines        6130     6202      +72     
  Branches     1210     1228      +18     
==========================================
+ Hits         5632     5696      +64     
- Misses        322      326       +4     
- Partials      176      180       +4     
Impacted Files Coverage Δ
pypdf/_page_labels.py 76.74% <71.42%> (-2.92%) ⬇️
pypdf/_writer.py 84.37% <100.00%> (+0.61%) ⬆️
pypdf/constants.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

pypdf/constants.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@lorenzomanini lorenzomanini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are nearly there.
I wanted your opinion about the following: @MartinThoma @pubpub-zz

pypdf/_page_labels.py Show resolved Hide resolved
pypdf/_page_labels.py Show resolved Hide resolved
pypdf/_writer.py Outdated
def set_page_label(
self,
page_number_from: int,
page_number_to: int,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about page ranges indexes.
Now set_page_label requires page numbers starting with 1 while _set_page_label requires them starting with 0 (the parameters name change accordingly from page_number to page_index)
Also in both cases extremes are included.
I did it this way in the public interface because I think it is more natural this way for the user that probably is setting these watching a pdf (where page numbers start from 1), but I understand if you don't agree, especially considering that the private interface is 0 based.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to use page_index everywhere and hence starting from 0.
Don't forget that pypdf users are Python developers. Starting a list of pages with index 0 is a lot more natural than starting with 1.

My suggestion would be to rename the parameters to page_index_from and page_index_to and letting it start with 0.

@pubpub-zz / @MasterOdin What's your opinion?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with your approach

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hehehe 1 based indexing is never the answer. The more I think about it the more I agree with you. I'll wait a bit for MasterOdin answer and then I'll change that. Unfortunately, I will have to change all the indexes in the test but I knew what I was risking when I did it:sweat_smile:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree with making it 0-based indexed for the simple reason that all other functions that refer to pages uses a 0-based index.

I agree with @MartinThoma that using page_index is the more "proper" way to refer to this as people may conflate page_number with a 1-based index scheme, whereas page_index, in my opinion, really only refers to 0-based index schemes in the context of Python.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I'll do that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last commit changes everything to 0-based indexing. Are you convinced by including both extremes or should we do lower included and upper excluded?

@MartinThoma
Copy link
Member

I still need to go over a lot of details, but I wanted to give you a small heads-up: Excellent work 🎉 ❤️
Thank you for contributing to pypdf 🤗

pypdf/_page_labels.py Outdated Show resolved Hide resolved
pypdf/_page_labels.py Outdated Show resolved Hide resolved
pypdf/_page_labels.py Outdated Show resolved Hide resolved
pypdf/_writer.py Outdated Show resolved Hide resolved
@MartinThoma
Copy link
Member

Overall it looks good to me. There are some minor docstring improvements I suggested (the first line should always be a summary of the function) + the page_number/page_index question needs to be clarified before we merge.

I'm confident that this can be merged + released until Sunday :-)

@lorenzomanini If you want, you can add yourself to the https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html (docs/meta/CONTRIBUTORS.md) within this PR :-)

Co-authored-by: Martin Thoma <info@martin-thoma.de>
@lorenzomanini
Copy link
Contributor Author

I still need to go over a lot of details, but I wanted to give you a small heads-up: Excellent work 🎉 ❤️ Thank you for contributing to pypdf 🤗

Thank you!:blush: I'm having a lot of fun. And thank you for being so welcoming!:heart:

@lorenzomanini
Copy link
Contributor Author

I'm confident that this can be merged + released until Sunday :-)

@lorenzomanini If you want, you can add yourself to the https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html (docs/meta/CONTRIBUTORS.md) within this PR :-)

Cool! Thank you!

pypdf/_writer.py Outdated Show resolved Hide resolved
pypdf/_writer.py Outdated Show resolved Hide resolved
@MartinThoma
Copy link
Member

@lorenzomanini Looks good from my side - let me know when I can merge :-)

@lorenzomanini
Copy link
Contributor Author

Good, I think it is ready! Thank you @MartinThoma and @pubpub-zz for the help and for being so nice.
This was a great first experience contributing to an open-source project. I hope to collab again! ❤️ 😊

@MartinThoma MartinThoma merged commit e711846 into py-pdf:main Jan 19, 2023
@MartinThoma
Copy link
Member

I'm happy to hear that you feel welcome 🤗

Your contribution is now in the main branch and will be on PyPI with version pypdf==3.3.0 on Sunday.

Really good work!

MartinThoma added a commit that referenced this pull request Jan 22, 2023
New Features (ENH):
-  Add page label support to PdfWriter (#1558)
-  Accept inline images with space before EI (#1552)
-  Add circle annotation support (#1556)
-  Add polygon annotation support (#1557)
-  Make merging pages produce a deterministic PDF (#1542, #1543)

Bug Fixes (BUG):
-  Fix error in cmap extraction (#1544)
-  Remove erroneous assertion check (#1564)
-  Fix dictionary access of optional page label keys (#1562)

Robustness (ROB):
-  Set ignore_eof=True for read_until_regex (#1521)

Documentation (DOC):
-  Paper size (#1550)

Developer Experience (DEV):
-  Fix broken combination of dependencies of docs.txt
-  Annotate tests appropriately (#1551)

[Full Changelog](3.2.1...3.3.0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants