Merge main ours #894

Merged
merged 11 commits into from May 24, 2022

MartinThoma
Member

In order to make the 2.0.0-dev branch become the new main branch, I would do this:

git merge main -s ours

After that, merging the 2.0.0-dev branch into main (with a merge-commit) should be automatically possible.

MartinThoma and others added 11 commits May 9, 2022 22:20
This commit strives to make the usage for new PyPDF2 users easier by following
PEP8 naming schemes. It's mostly about camelCase method names being converted to
snake_case. Other changes make the public interface of PyPDF2 smaller and thus
easier to discover.

This commit does not introduce any breaking changes as the old modules /
classes / method signatures are still present. They now emit deprecation
warnings, and the docs mark them as deprecated.

If a property and a getter-method are both present, use the property.
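
A minimal sketch of the pattern described above, assuming a made-up class: the old camelCase getter stays callable, emits a warning, and delegates to the new snake_case property. `ReaderSketch` and its attribute are hypothetical and only illustrate the idea; this is not PyPDF2's actual implementation.

```python
import warnings


class ReaderSketch:
    """Hypothetical class; illustrates the pattern only, not PyPDF2's code."""

    def __init__(self, layout):
        self._layout = layout

    @property
    def page_layout(self):
        # New, preferred spelling: a snake_case property.
        return self._layout

    def getPageLayout(self):
        # Old spelling kept for compatibility: warn, then delegate.
        warnings.warn(
            "getPageLayout is deprecated; use the page_layout property instead.",
            PendingDeprecationWarning,
            stacklevel=2,
        )
        return self.page_layout
```

Existing callers of `getPageLayout()` keep working and see a warning, while new code reads `page_layout` directly.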

Module level changes
--------------------

- utils ➔ _utils: The module is renamed to '_utils' to indicate that it should
                not be used by PyPDF2 users. It's only meant for PyPDF2 itself.
- The 'pdf' module was removed. Most classes / functions are now either in
  '_utils' or in 'generic'.


Core classes
------------

- PdfFileReader ➔ PdfReader (strict=False is new default)
- PdfFileWriter ➔ PdfWriter
- PdfFileMerger ➔ PdfMerger (strict=False is new default)

PdfReader
---------

- reader.getPage(pageNumber) ➔ reader.pages[page_number]
- reader.getNumPages() ➔ len(reader.pages)
- getPageLayout / pageLayout ➔ page_layout
- getPageMode / pageMode ➔ page_mode
- getIsEncrypted / isEncrypted ➔ is_encrypted
- getDocumentInfo ➔ metadata

PdfWriter
---------

- writer.getPage(pageNumber) ➔ writer.pages[page_number]
- writer.getNumPages() ➔ len(writer.pages)
- getPageLayout / setPageLayout / pageLayout ➔ page_layout
- getPageMode / setPageMode / pageMode ➔ page_mode

Page
----

- mediabox / trimbox / cropbox / bleedbox / artbox:
    - getWidth, getHeight  ➔ width / height
    - getLowerLeft_x / getUpperLeft_x ➔ left
    - getUpperRight_x / getLowerRight_x ➔ right
    - getLowerLeft_y / getLowerRight_y ➔ bottom
    - getUpperRight_y / getUpperLeft_y ➔ top
    - getLowerLeft / setLowerLeft ➔ lower_left property
    - upperRight ➔ upper_right
- Add Transformation class to make it easy to create transformation matrices
- add_transformation and merge_page should be used instead of:
    - mergeTransformedPage
    - mergeScaledPage
    - mergeRotatedPage
    - mergeTranslatedPage
    - mergeRotatedTranslatedPage
    - mergeRotatedScaledPage
    - mergeScaledTranslatedPage
    - mergeRotatedScaledTranslatedPage
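
To illustrate why one chainable transformation object can stand in for the eight `merge*Page` variants listed above, here is a rough sketch of composing scale, rotate, and translate steps into a single matrix. `AffineSketch` is a hypothetical stand-in written for this example; it is not PyPDF2's `Transformation` class and its method signatures are assumptions.

```python
import math


class AffineSketch:
    """Hypothetical builder for a 2D affine matrix (a, b, c, d, e, f)."""

    def __init__(self, ctm=(1.0, 0.0, 0.0, 1.0, 0.0, 0.0)):
        self.ctm = ctm  # starts as the identity matrix

    def _then(self, m):
        # Row-vector convention: [x y 1] @ M_self @ m (self is applied first).
        a1, b1, c1, d1, e1, f1 = self.ctm
        a2, b2, c2, d2, e2, f2 = m
        return AffineSketch((
            a1 * a2 + b1 * c2,
            a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2,
            c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2,
            e1 * b2 + f1 * d2 + f2,
        ))

    def scale(self, sx, sy):
        return self._then((sx, 0.0, 0.0, sy, 0.0, 0.0))

    def rotate(self, degrees):
        # Counterclockwise rotation about the origin.
        r = math.radians(degrees)
        return self._then((math.cos(r), math.sin(r), -math.sin(r), math.cos(r), 0.0, 0.0))

    def translate(self, tx, ty):
        return self._then((1.0, 0.0, 0.0, 1.0, tx, ty))

    def apply(self, x, y):
        # Map a single point through the composed matrix.
        a, b, c, d, e, f = self.ctm
        return (a * x + c * y + e, b * x + d * y + f)


# Scale first, then translate: (1, 1) -> (2, 2) -> (12, 2)
op = AffineSketch().scale(2, 2).translate(10, 0)
point = op.apply(1, 1)
```

Because any combination of the three steps collapses into one matrix, a single "apply this transformation, then merge" pair covers every rotated/scaled/translated variant.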

See the CHANGELOG for a full list of changes
This release adds a lot of deprecation warnings in preparation for the
PyPDF2 2.0.0 release. The changes mostly switch to snake_case function, method,
and variable names and to properties instead of getter methods.

Maintenance (MAINT):
-  Remove IronPython Fallback for zlib (#868)

Full Changelog: 1.27.12...1.27.13

* Make the `PyPDF2.utils` module private
* Rename of core classes:
  * PdfFileReader ➔ PdfReader
  * PdfFileWriter ➔ PdfWriter
  * PdfFileMerger ➔ PdfMerger
* Use PEP8 conventions for function names and parameters
* If a property and a getter-method are both present, use the property

In many places:
  - getObject ➔ get_object
  - writeToStream ➔ write_to_stream
  - readFromStream ➔ read_from_stream

PyPDF2.generic
  - readObject ➔ read_object
  - convertToInt ➔ convert_to_int
  - DocumentInformation.getText ➔ DocumentInformation._get_text :
    This method should typically not be used; please let me know if you need it.

PdfReader class:
  - `reader.getPage(pageNumber)` ➔ `reader.pages[page_number]`
  - `reader.getNumPages()` / `reader.numPages` ➔ `len(reader.pages)`
  - getDocumentInfo ➔ metadata
  - flattenedPages attribute ➔ flattened_pages
  - resolvedObjects attribute ➔ resolved_objects
  - xrefIndex attribute ➔ xref_index
  - getNamedDestinations / namedDestinations attribute ➔ named_destinations
  - getPageLayout / pageLayout ➔ page_layout attribute
  - getPageMode / pageMode ➔ page_mode attribute
  - getIsEncrypted / isEncrypted ➔ is_encrypted attribute
  - getOutlines ➔ get_outlines
  - readObjectHeader ➔ read_object_header (TODO: read vs get?)
  - cacheGetIndirectObject ➔ cache_get_indirect_object (TODO: public vs private?)
  - cacheIndirectObject ➔ cache_indirect_object (TODO: public vs private?)
  - getDestinationPageNumber ➔ get_destination_page_number
  - readNextEndLine ➔ read_next_end_line
  - _zeroXref ➔ _zero_xref
  - _authenticateUserPassword ➔ _authenticate_user_password
  - _pageId2Num attribute ➔ _page_id2num
  - _buildDestination ➔ _build_destination
  - _buildOutline ➔ _build_outline
  - _getPageNumberByIndirect(indirectRef) ➔ _get_page_number_by_indirect(indirect_ref)
  - _getObjectFromStream ➔ _get_object_from_stream
  - _decryptObject ➔ _decrypt_object
  - _flatten(..., indirectRef) ➔ _flatten(..., indirect_ref)
  - _buildField ➔ _build_field
  - _checkKids ➔ _check_kids
  - _writeField ➔ _write_field
  - _write_field(..., fieldAttributes) ➔ _write_field(..., field_attributes)
  - _read_xref_subsections(..., getEntry, ...) ➔ _read_xref_subsections(..., get_entry, ...)

PdfWriter class:
  - `writer.getPage(pageNumber)` ➔ `writer.pages[page_number]`
  - `writer.getNumPages()` ➔ `len(writer.pages)`
  - addMetadata ➔ add_metadata
  - addPage ➔ add_page
  - addBlankPage ➔ add_blank_page
  - addAttachment(fname, fdata) ➔ add_attachment(filename, data)
  - insertPage ➔ insert_page
  - insertBlankPage ➔ insert_blank_page
  - appendPagesFromReader ➔ append_pages_from_reader
  - updatePageFormFieldValues ➔ update_page_form_field_values
  - cloneReaderDocumentRoot ➔ clone_reader_document_root
  - cloneDocumentFromReader ➔ clone_document_from_reader
  - getReference ➔ get_reference
  - getOutlineRoot ➔ get_outline_root
  - getNamedDestRoot ➔ get_named_dest_root
  - addBookmarkDestination ➔ add_bookmark_destination
  - addBookmarkDict ➔ add_bookmark_dict
  - addBookmark ➔ add_bookmark
  - addNamedDestinationObject ➔ add_named_destination_object
  - addNamedDestination ➔ add_named_destination
  - removeLinks ➔ remove_links
  - removeImages(ignoreByteStringObject) ➔ remove_images(ignore_byte_string_object)
  - removeText(ignoreByteStringObject) ➔ remove_text(ignore_byte_string_object)
  - addURI ➔ add_uri
  - addLink ➔ add_link
  - getPage(pageNumber) ➔ get_page(page_number)
  - getPageLayout / setPageLayout / pageLayout ➔ page_layout attribute
  - getPageMode / setPageMode / pageMode ➔ page_mode attribute
  - _addObject ➔ _add_object
  - _addPage ➔ _add_page
  - _sweepIndirectReferences ➔ _sweep_indirect_references

PdfMerger class
  - `__init__` parameter: strict=True ➔ strict=False (the PdfFileMerger still has the old default)
  - addMetadata ➔ add_metadata
  - addNamedDestination ➔ add_named_destination
  - setPageLayout ➔ set_page_layout
  - setPageMode ➔ set_page_mode

Page class:
  - artBox / bleedBox / cropBox / mediaBox / trimBox ➔ artbox / bleedbox / cropbox / mediabox / trimbox
    - getWidth, getHeight  ➔ width / height
    - getLowerLeft_x / getUpperLeft_x ➔ left
    - getUpperRight_x / getLowerRight_x ➔ right
    - getLowerLeft_y / getLowerRight_y ➔ bottom
    - getUpperRight_y / getUpperLeft_y ➔ top
    - getLowerLeft / setLowerLeft ➔ lower_left property
    - upperRight ➔ upper_right
  - mergePage ➔ merge_page
  - rotateClockwise / rotateCounterClockwise ➔ rotate_clockwise
  - _mergeResources ➔ _merge_resources
  - _contentStreamRename ➔ _content_stream_rename
  - _pushPopGS ➔ _push_pop_gs
  - _addTransformationMatrix ➔ _add_transformation_matrix
  - _mergePage ➔ _merge_page

XmpInformation class:
  - getElement(..., aboutUri, ...) ➔ get_element(..., about_uri, ...)
  - getNodesInNamespace(..., aboutUri, ...) ➔ get_nodes_in_namespace(..., about_uri, ...)
  - _getText ➔ _get_text

utils.py:
  - matrixMultiply ➔ matrix_multiply
  - RC4_encrypt is moved to the security module
* STY: Adjust code/docs in several places to make it more similar to the 2.0.0 branch
* MAINT: Remove excessive <py36 warnings
Bug Fixes (BUG):
-  Incorrectly show deprecation warnings on internal usage (#887)

Maintenance (MAINT):
-  Add stacklevel=2 to deprecation warnings (#889)
-  Remove duplicate warnings imports (#888)

Full Changelog: 1.28.0...1.28.1
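
As a rough illustration of what `stacklevel=2` changes, the sketch below records which source line a warning is attributed to. The function names are hypothetical and only the standard-library `warnings` module is used; it does not reproduce PyPDF2's own helper.

```python
import warnings


def old_name_default():
    # Default stacklevel=1: the warning is attributed to this line,
    # i.e. to the library code that issues it.
    warnings.warn("old_name is deprecated", PendingDeprecationWarning)


def old_name_stacklevel2():
    # stacklevel=2: the warning is attributed to the caller's line,
    # which is where the user has to change something.
    warnings.warn("old_name is deprecated", PendingDeprecationWarning, stacklevel=2)


def report(func):
    """Call func and return the line number the warning was attributed to."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func()  # with stacklevel=2, the warning points at this line
    return caught[0].lineno
```

Comparing `report(old_name_default)` with `report(old_name_stacklevel2)` shows the first pointing inside the deprecated function and the second pointing at the call site inside `report`.
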
Fixes a deprecation warning raised when using the PdfMerger class. This regression of #887 was caused by #889, which reverted the changes to the PyPDF2/merger.py module so that it once again used the deprecated user-facing isString method instead of the internal _isString method.

Additionally, this PR fixes the deprecation warning raised by referencing reader.namedDestinations as opposed to reader.named_destinations.

Closes #890
Bug Fixes (BUG):
-  PendingDeprecationWarning for getContents (#893)
-  PendingDeprecationWarning on using PdfMerger (#891)
@MartinThoma
Member Author

Initially, I wanted to get a couple of big shots done to have a great 2.0.0 release with nice new features. However, I do see that the fact that 2.0.0-dev is currently the branch to work on is confusing. For this reason, I thought about this procedure:

  1. Phase PyPDF2 1.X support out: Wait until 29th of May. If there were any breaking changes (e.g. internal deprecation warnings), I would fix those in a 1.29.X release.
  2. main branch: Make the current state of 2.0.0-dev become the new main (e.g. by merging this PR + then merging 2.0.0-dev back into main)
  3. Release PyPDF2 2.0.0 on 1st of June at the latest and continue with a normal development workflow.
  4. Actual deprecations: Although several of the warnings say that some functions become deprecated, I would keep many around for a long time (e.g. until 1st of June 2023) so that people / articles / tutorials have time to adjust. Especially the PdfFileReader. But I would likely drop tests for those "adapter classes / methods / functions".

I've also added version support in readthedocs.

I would keep the latest patch version for each minor version on readthedocs.

@MasterOdin Do you think that is an acceptable way to continue?

@MasterOdin
Member

MasterOdin commented May 23, 2022

I think it would make sense to do the following immediately:

  1. Create a branch 1.x from main
  2. Merge main into 2.0.0-dev
  3. Merge 2.0.0-dev into main

Then, any new commits either go:

  • Commit to main, optionally backport to 1.x
  • Commit to 1.x

Thus, we never go from 1.x -> main, and avoid any painful merge conflicts for the default branch, especially given the amount of BC breakage happening in 2.0.0-dev. I find it infinitely easier to backport things than forward port. Equally easy to make commits to 1.x for any theoretical 1.29 release that includes stuff that doesn't make sense on 2.x branch (e.g. dealing with isString).

@MartinThoma
Member Author

That's an excellent plan! I'm about to do it ⏳

MartinThoma merged commit 3729af0 into 2.0.0-dev May 24, 2022
MartinThoma deleted the merge-main-ours branch May 24, 2022 05:38
@MartinThoma
Member Author

MartinThoma commented May 24, 2022

I merged #859 into main. I hope I didn't mess up 🙈

Next steps:

  1. Fix the warnings when running the tests (mostly by switching to the new syntax, but I also want to add a few tests for the core parts that got deprecated to ensure I didn't accidentally remove something like PdfFileReader)
  2. Update the merge targets of several existing PRs which currently might point to 2.0.0-dev
  3. Support people who are still on an older version

I'm not exactly sure yet when to make the 2.0.0 release. Some time after the warnings are fixed, I guess. But I would also love to have a nice new feature in it, e.g. there was something with cryptography and improved text extraction.

@MasterOdin
Copy link
Member

MasterOdin commented May 25, 2022

> Actual deprecations: Although several of the warnings say that some functions become deprecated, I would keep many around for a long time (e.g. until 1st of June 2023) so that people / articles / tutorials have time to adjust. Especially the PdfFileReader. But I would likely drop tests for those "adapter classes / methods / functions".

Should the deprecation warning messages then be updated to say 3.0.0 instead of 2.0.0 for the stuff that'll stick around into the 2.0.0 release? I do think that we should try to have tests around the methods that just test that a deprecation warning is raised just to ensure that functionality works, but otherwise yeah, ignore any sort of deeper testing.
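
A shallow test of that kind could look roughly like the sketch below. `oldName` and `new_name` are hypothetical placeholders for a deprecated alias and its replacement, and the test uses only the standard-library `unittest` module rather than the project's actual test setup.

```python
import unittest
import warnings


def new_name():
    return 42


def oldName():
    # Hypothetical deprecated alias kept around for compatibility.
    warnings.warn(
        "oldName is deprecated; use new_name instead.",
        PendingDeprecationWarning,
        stacklevel=2,
    )
    return new_name()


class DeprecatedAliasTest(unittest.TestCase):
    def test_old_name_warns_and_delegates(self):
        # One shallow check per alias: it warns and still returns the new result.
        with self.assertWarns(PendingDeprecationWarning):
            result = oldName()
        self.assertEqual(result, new_name())


def run_sketch_tests():
    """Run the test case above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DeprecatedAliasTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

This guards against an alias being removed by accident without repeating the deeper tests that already cover the new name.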
