Merge main ours #894

MartinThoma · 2022-05-23T20:21:34Z

In order to make the 2.0.0-dev branch become the new main branch, I would do this:

git merge main -s ours

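After that, merging the 2.0.0-dev branch into main (with a merge commit) should work automatically: the "ours" strategy records main as merged into 2.0.0-dev while keeping the 2.0.0-dev tree unchanged, so nothing is left to conflict.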

See #863

This commit strives to make PyPDF2 easier to use for new users by following PEP8 naming schemes. It's mostly about camelCase method names being converted to snake_case. Other changes make the public interface of PyPDF2 smaller and thus easier to discover.

This commit does not introduce any breaking changes, as the old modules / classes / method signatures are still present. They now emit deprecation warnings, and the docs mark them as deprecated.

If a property and a getter-method are both present, use the property.

Module level changes
--------------------
- utils ➔ _utils: The module is renamed to '_utils' to indicate that it should not be used by PyPDF2 users. It's only meant for PyPDF2 itself.
- The 'pdf' module was removed. Most classes / functions are now either in '_utils' or in 'generic'.

Core classes
------------
- PdfFileReader ➔ PdfReader (strict=False is new default)
- PdfFileWriter ➔ PdfWriter
- PdfFileMerger ➔ PdfMerger (strict=False is new default)

PdfReader
---------
- reader.getPage(pageNumber) ➔ reader.pages[page_number]
- reader.getNumPages() ➔ len(reader.pages)
- getPageLayout / pageLayout ➔ page_layout
- getPageMode / pageMode ➔ page_mode
- getIsEncrypted / isEncrypted ➔ is_encrypted
- getDocumentInfo ➔ metadata

PdfWriter
---------
- writer.getPage(pageNumber) ➔ writer.pages[page_number]
- writer.getNumPages() ➔ len(writer.pages)
- getPageLayout / setPageLayout / pageLayout ➔ page_layout
- getPageMode / setPageMode / pageMode ➔ page_mode

Page
----
- mediabox / trimbox / cropbox / bleedbox / artbox:
  - getWidth, getHeight ➔ width / height
  - getLowerLeft_x / getUpperLeft_x ➔ left
  - getUpperRight_x / getLowerRight_x ➔ right
  - getLowerLeft_y / getLowerRight_y ➔ bottom
  - getUpperRight_y / getUpperLeft_y ➔ top
  - getLowerLeft / setLowerLeft ➔ lower_left property
  - upperRight ➔ upper_right
- Add Transformation class to make it easy to create transformation matrices
- add_transformation and merge_page should be used instead of:
  - mergeTransformedPage
  - mergeScaledPage
  - mergeRotatedPage
  - mergeTranslatedPage
  - mergeRotatedTranslatedPage
  - mergeRotatedScaledPage
  - mergeScaledTranslatedPage
  - mergeRotatedScaledTranslatedPage

See the CHANGELOG for a full list of changes
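The compatibility approach described in this commit message (old names keep working but warn, and the property is preferred over the getter) could look roughly like the sketch below. It is only an illustration: the `Writer` class, its `_page_layout` attribute, and the warning texts are assumptions for this example and are not taken from PyPDF2's code.

```python
import warnings


class Writer:
    """Hypothetical class for illustration; not PyPDF2's implementation."""

    def __init__(self):
        self._page_layout = None

    @property
    def page_layout(self):
        """New snake_case property: the preferred way to read the layout."""
        return self._page_layout

    @page_layout.setter
    def page_layout(self, layout):
        self._page_layout = layout

    def getPageLayout(self):
        """Old camelCase getter: still works, but warns and forwards."""
        warnings.warn(
            "getPageLayout is deprecated; use the page_layout property",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.page_layout

    def setPageLayout(self, layout):
        """Old camelCase setter: still works, but warns and forwards."""
        warnings.warn(
            "setPageLayout is deprecated; assign to page_layout instead",
            DeprecationWarning,
            stacklevel=2,
        )
        self.page_layout = layout


writer = Writer()
writer.page_layout = "/TwoColumnLeft"  # new style, no warning emitted
```

Because the old methods only forward to the property, existing callers keep their behaviour while new code has a single, smaller interface to learn.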

This release adds a lot of deprecation warnings in preparation of the PyPDF2 2.0.0 release. The changes are mostly using snake_case function-, method-, and variable-names as well as using properties instead of getter-methods.

Maintenance (MAINT):
- Remove IronPython Fallback for zlib (#868)

Full Changelog: 1.27.12...1.27.13

* Make the `PyPDF2.utils` module private
* Rename of core classes:
  * PdfFileReader ➔ PdfReader
  * PdfFileWriter ➔ PdfWriter
  * PdfFileMerger ➔ PdfMerger
* Use PEP8 conventions for function names and parameters
* If a property and a getter-method are both present, use the property

In many places:
- getObject ➔ get_object
- writeToStream ➔ write_to_stream
- readFromStream ➔ read_from_stream

PyPDF2.generic
- readObject ➔ read_object
- convertToInt ➔ convert_to_int
- DocumentInformation.getText ➔ DocumentInformation._get_text : This method should typically not be used; please let me know if you need it.

PdfReader class:
- `reader.getPage(pageNumber)` ➔ `reader.pages[page_number]`
- `reader.getNumPages()` / `reader.numPages` ➔ `len(reader.pages)`
- getDocumentInfo ➔ metadata
- flattenedPages attribute ➔ flattened_pages
- resolvedObjects attribute ➔ resolved_objects
- xrefIndex attribute ➔ xref_index
- getNamedDestinations / namedDestinations attribute ➔ named_destinations
- getPageLayout / pageLayout ➔ page_layout attribute
- getPageMode / pageMode ➔ page_mode attribute
- getIsEncrypted / isEncrypted ➔ is_encrypted attribute
- getOutlines ➔ get_outlines
- readObjectHeader ➔ read_object_header (TODO: read vs get?)
- cacheGetIndirectObject ➔ cache_get_indirect_object (TODO: public vs private?)
- cacheIndirectObject ➔ cache_indirect_object (TODO: public vs private?)
- getDestinationPageNumber ➔ get_destination_page_number
- readNextEndLine ➔ read_next_end_line
- _zeroXref ➔ _zero_xref
- _authenticateUserPassword ➔ _authenticate_user_password
- _pageId2Num attribute ➔ _page_id2num
- _buildDestination ➔ _build_destination
- _buildOutline ➔ _build_outline
- _getPageNumberByIndirect(indirectRef) ➔ _get_page_number_by_indirect(indirect_ref)
- _getObjectFromStream ➔ _get_object_from_stream
- _decryptObject ➔ _decrypt_object
- _flatten(..., indirectRef) ➔ _flatten(..., indirect_ref)
- _buildField ➔ _build_field
- _checkKids ➔ _check_kids
- _writeField ➔ _write_field
- _write_field(..., fieldAttributes) ➔ _write_field(..., field_attributes)
- _read_xref_subsections(..., getEntry, ...) ➔ _read_xref_subsections(..., get_entry, ...)

PdfWriter class:
- `writer.getPage(pageNumber)` ➔ `writer.pages[page_number]`
- `writer.getNumPages()` ➔ `len(writer.pages)`
- addMetadata ➔ add_metadata
- addPage ➔ add_page
- addBlankPage ➔ add_blank_page
- addAttachment(fname, fdata) ➔ add_attachment(filename, data)
- insertPage ➔ insert_page
- insertBlankPage ➔ insert_blank_page
- appendPagesFromReader ➔ append_pages_from_reader
- updatePageFormFieldValues ➔ update_page_form_field_values
- cloneReaderDocumentRoot ➔ clone_reader_document_root
- cloneDocumentFromReader ➔ clone_document_from_reader
- getReference ➔ get_reference
- getOutlineRoot ➔ get_outline_root
- getNamedDestRoot ➔ get_named_dest_root
- addBookmarkDestination ➔ add_bookmark_destination
- addBookmarkDict ➔ add_bookmark_dict
- addBookmark ➔ add_bookmark
- addNamedDestinationObject ➔ add_named_destination_object
- addNamedDestination ➔ add_named_destination
- removeLinks ➔ remove_links
- removeImages(ignoreByteStringObject) ➔ remove_images(ignore_byte_string_object)
- removeText(ignoreByteStringObject) ➔ remove_text(ignore_byte_string_object)
- addURI ➔ add_uri
- addLink ➔ add_link
- getPage(pageNumber) ➔ get_page(page_number)
- getPageLayout / setPageLayout / pageLayout ➔ page_layout attribute
- getPageMode / setPageMode / pageMode ➔ page_mode attribute
- _addObject ➔ _add_object
- _addPage ➔ _add_page
- _sweepIndirectReferences ➔ _sweep_indirect_references

PdfMerger class
- `__init__` parameter: strict=True ➔ strict=False (the PdfFileMerger still has the old default)
- addMetadata ➔ add_metadata
- addNamedDestination ➔ add_named_destination
- setPageLayout ➔ set_page_layout
- setPageMode ➔ set_page_mode

Page class:
- artBox / bleedBox / cropBox / mediaBox / trimBox ➔ artbox / bleedbox / cropbox / mediabox / trimbox
  - getWidth, getHeight ➔ width / height
  - getLowerLeft_x / getUpperLeft_x ➔ left
  - getUpperRight_x / getLowerRight_x ➔ right
  - getLowerLeft_y / getLowerRight_y ➔ bottom
  - getUpperRight_y / getUpperLeft_y ➔ top
  - getLowerLeft / setLowerLeft ➔ lower_left property
  - upperRight ➔ upper_right
- mergePage ➔ merge_page
- rotateClockwise / rotateCounterClockwise ➔ rotate_clockwise
- _mergeResources ➔ _merge_resources
- _contentStreamRename ➔ _content_stream_rename
- _pushPopGS ➔ _push_pop_gs
- _addTransformationMatrix ➔ _add_transformation_matrix
- _mergePage ➔ _merge_page

XmpInformation class:
- getElement(..., aboutUri, ...) ➔ get_element(..., about_uri, ...)
- getNodesInNamespace(..., aboutUri, ...) ➔ get_nodes_in_namespace(..., aboutUri, ...)
- _getText ➔ _get_text

utils.py:
- matrixMultiply ➔ matrix_multiply
- RC4_encrypt is moved to the security module

* STY: Adjust code/docs in several places to make it more similar to the 2.0.0 branch
* MAINT: Remove excessive <py36 warnings

Bug Fixes (BUG):
- Incorrectly show deprecation warnings on internal usage (#887)

Maintenance (MAINT):
- Add stacklevel=2 to deprecation warnings (#889)
- Remove duplicate warnings imports (#888)

Full Changelog: 1.28.0...1.28.1
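To illustrate what adding stacklevel=2 (#889) changes, here is a small sketch using only the standard `warnings` module. The function names are made up for the example; it only demonstrates that `stacklevel=1` attributes the warning to the line inside the deprecated function, while `stacklevel=2` attributes it to the caller's line, which is usually what a user needs in order to find the code to update.

```python
import warnings


def old_api_level1():
    # Hypothetical deprecated function: the warning points at this function.
    warnings.warn("old_api is deprecated", DeprecationWarning, stacklevel=1)


def old_api_level2():
    # Hypothetical deprecated function: the warning points at the caller.
    warnings.warn("old_api is deprecated", DeprecationWarning, stacklevel=2)


def reported_lines():
    """Call both variants and return the line numbers the warnings report."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        old_api_level1()
        old_api_level2()
    return [w.lineno for w in caught]


level1_line, level2_line = reported_lines()
print("stacklevel=1 reports line", level1_line)
print("stacklevel=2 reports line", level2_line)
```

The second number falls inside `reported_lines`, i.e. at the call site, whereas the first falls inside `old_api_level1` itself.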

Fixes a deprecation warning being raised when trying to use the PdfMerger class. This regression of #887 was caused by #889, which reverted the changes made to the PyPDF2/merger.py module so that it once again used the deprecated user-facing isString method instead of the internal _isString method. Additionally, this PR fixes the deprecation warning raised by referencing reader.namedDestinations instead of reader.named_destinations. Closes #890
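The distinction between the user-facing isString and the internal _isString can be sketched as follows. The two names come from the description above, but the function bodies, the warning category, and the `merge_source_is_path` caller are assumptions made for this illustration, not the code in PyPDF2/merger.py.

```python
import warnings


def _isString(s):
    """Internal helper: library code calls this one, so it never warns."""
    return isinstance(s, str)


def isString(s):
    """Deprecated user-facing wrapper: warns, then delegates."""
    warnings.warn(
        "isString is deprecated",
        PendingDeprecationWarning,
        stacklevel=2,
    )
    return _isString(s)


def merge_source_is_path(fileobj):
    """Hypothetical library-internal caller that uses the private helper."""
    return _isString(fileobj)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    merge_source_is_path("input.pdf")  # internal path: no warning
    internal_warning_count = len(caught)
    isString("input.pdf")  # user-facing path: one warning
    public_warning_count = len(caught) - internal_warning_count
```

If internal code called `isString` instead, every user of the library would see a warning they cannot fix in their own code, which is the kind of regression this fix addresses.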

Bug Fixes (BUG):
- PendingDeprecationWarning for getContents (#893)
- PendingDeprecationWarning on using PdfMerger (#891)

MartinThoma · 2022-05-23T20:35:00Z

Initially, I wanted to get a couple of big shots done to have a great 2.0.0 release with nice new features. However, I do see that the fact that 2.0.0-dev is currently the branch to work on is confusing. For this reason, I thought about this procedure:

1. Phase PyPDF2 1.X support out: Wait until 29th of May. If any breaking changes turn up (e.g. internal deprecation warnings), I would fix those in a 1.29.X release.
2. main branch: Make the current state of 2.0.0-dev become the new main (e.g. by merging this PR and then merging 2.0.0-dev back into main).
3. Release PyPDF2 2.0.0 on 1st of June at the latest and continue with a normal development workflow.
4. Actual deprecations: Although several of the warnings say that some functions become deprecated, I would keep many around for a long time (e.g. until 1st of June 2023) so that people / articles / tutorials have time to adjust. Especially the PdfFileReader. But I would likely drop tests for those "adapter classes / methods / functions".

I've also added version support on readthedocs.

I would keep the latest patch version for each minor version on readthedocs.

@MasterOdin Do you think that is an acceptable way to continue?

MasterOdin · 2022-05-23T23:01:22Z

I think it would make sense to do the following immediately:

1. Create a branch 1.x from main
2. Merge main into 2.0.0-dev
3. Merge 2.0.0-dev into main

Then, any new commits either go:

- Commit to main, optionally backport to 1.x
- Commit to 1.x

Thus, we never go from 1.x -> main, and we avoid any painful merge conflicts for the default branch, especially given the amount of BC breakage happening in 2.0.0-dev. I find it infinitely easier to backport things than to forward port them. It is equally easy to make commits to 1.x for any theoretical 1.29 release that includes stuff that doesn't make sense on the 2.x branch (e.g. dealing with isString).

MartinThoma · 2022-05-24T05:37:00Z

That's an excellent plan! I'm about to do it ⏳

https://github.com/py-pdf/PyPDF2/tree/1.x

MartinThoma · 2022-05-24T06:25:56Z

I merged #859 into main. I hope I didn't mess up 🙈

Next steps:

- Fix the warnings when running the tests (mostly by switching to the new syntax, but I also want to add a few tests for the core parts that got deprecated to ensure I didn't accidentally remove something like PdfFileReader)
- Update the merge targets of several existing PRs which currently might point to 2.0.0-dev
- Support people who are still on an older version

I'm not exactly sure yet when to make the 2.0.0 release. Some time after the warnings are fixed, I guess. But I would also love to include a nice new feature, e.g. there was something with cryptography and improved text extraction.

MasterOdin · 2022-05-25T16:01:34Z

> Actual deprecations: Although several of the warnings say that some functions become deprecated, I would keep many around for a long time (e.g. until 1st of June 2023) so that people / articles / tutorials have time to adjust. Especially the PdfFileReader. But I would likely drop tests for those "adapter classes / methods / functions".

Should the deprecation warning messages then be updated to say 3.0.0 instead of 2.0.0 for the stuff that'll stick around into the 2.0.0 release? I do think that we should try to have tests around the methods that just test that a deprecation warning is raised just to ensure that functionality works, but otherwise yeah, ignore any sort of deeper testing.
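A shallow test of that kind might look like the sketch below, written with the standard library's `unittest`. The `Reader` class is a hypothetical stand-in defined only so the example runs on its own; a real test would import the deprecated class or method from PyPDF2, and the actual warning category may differ.

```python
import unittest
import warnings


class Reader:
    """Hypothetical stand-in with one deprecated method; not PyPDF2 code."""

    def __init__(self, pages):
        self.pages = list(pages)

    def getNumPages(self):
        warnings.warn(
            "getNumPages is deprecated; use len(reader.pages) instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return len(self.pages)


class DeprecatedAliasTest(unittest.TestCase):
    def test_get_num_pages_warns_and_still_works(self):
        reader = Reader(["page 0", "page 1"])
        # Only check that the warning fires and the old call still answers.
        with self.assertWarns(DeprecationWarning):
            result = reader.getNumPages()
        self.assertEqual(result, 2)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DeprecatedAliasTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` executes the single check; it guards against accidentally removing a deprecated name without testing the wrapped functionality in depth.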

MartinThoma and others added 11 commits May 9, 2022 22:20

MAINT: Remove IronPython Fallback for zlib (#868)

a791ef1

See #863

MAINT: Remove duplicate warnings imports (#888)

560d2a7

BUG: Incorrectly show deprecation warnings on internal usage (#887)

ce1cb66

MAINT: Add stacklevel=2 to deprecation warnings (#889)

f74d733

REL: 1.28.1

000ac49

BUG: PendingDeprecationWarning for getContents (#893)

9947c7b

REL: 1.28.2

c68b98d

Merge branch 'main' into merge-main-ours

7336b8f

MartinThoma merged commit 3729af0 into 2.0.0-dev May 24, 2022

MartinThoma deleted the merge-main-ours branch May 24, 2022 05:38
