gh-115988: Add missing ARM64 and RISCV filter in lzma module #115989

Open: wants to merge 20 commits into base: main

@ivq commented Feb 27, 2024

#115988

Example:

```python
import lzma

# RISC-V BCJ filter followed by LZMA2; LZMA2 must be the last filter in the chain.
filters = [
    {"id": lzma.FILTER_RISCV},
    {"id": lzma.FILTER_LZMA2},
]

in_fn = "in.bin"
out_fn = "out.bin"

with open(in_fn, mode="rb") as f:
    in_data = f.read()

with lzma.open(out_fn, mode="wb", filters=filters) as f:
    f.write(in_data)
```

📚 Documentation preview 📚: https://cpython-previews--115989.org.readthedocs.build/


cpython-cla-bot bot commented Feb 27, 2024

All commit authors signed the Contributor License Agreement.
CLA signed


bedevere-app bot commented Feb 27, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@terryjreedy (Member) left a comment

Does test_lzma test the existing constants in any way, for example checking that they are spelled correctly? If so, the new ones should be added.

Someone more familiar with lzma should do a commit review.

Doc/library/lzma.rst (outdated; resolved)
Modules/_lzmamodule.c (resolved)
Modules/_lzmamodule.c (resolved)

bedevere-app bot commented Feb 27, 2024

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again". I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.


@ivq (Author) commented Feb 28, 2024

I have made the requested changes; please review again.

@terryjreedy (Member) commented

I cannot do a commit review of this, as it is outside my expertise. I have requested reviews from the two core devs who edited the file most recently.

@serhiy-storchaka (Member) left a comment

They are pretty new features (https://xz.tukaani.org/format/xz-file-format-1.2.0.txt):

        1.2.0     2024-01-19    Added RISC-V filter and updated URLs in
                                Sections 0.2 and 7. The URL of this
                                specification was changed.

        1.1.0     2022-12-11    Added ARM64 filter and clarified 32-bit
                                ARM endianness in Section 5.3.2,
                                language improvements in Section 5.4

Adding support for them is a new feature in Python, so you need to add versionadded or versionchanged directives in the module documentation and add the corresponding entry in the What's New file.

You should also specify that the new filters (work? are supported? are available?) only with the specified XZ (library? file format?) version. Please add tests for the new filters. They should produce the expected result or raise the expected exception, depending on the xz version.

I am not sure whether these constants should be defined if the underlying library does not support them. The new tests should show what is better.
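A minimal sketch of the kind of version-dependent test requested here, assuming that "not supported" shows up either as a missing module constant or as an `lzma.LZMAError` raised when liblzma rejects the filter chain. The helper `bcj_round_trip` and the test class are hypothetical (not part of test_lzma), and the ARM64 constant is assumed to be spelled `FILTER_ARM64` by analogy with `FILTER_RISCV`.

```python
import lzma
import unittest

DATA = bytes(range(256)) * 16


def bcj_round_trip(filter_id, data):
    """Compress and decompress `data` with a raw [BCJ, LZMA2] filter chain."""
    filters = [{"id": filter_id}, {"id": lzma.FILTER_LZMA2}]
    packed = lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters)
    return lzma.decompress(packed, format=lzma.FORMAT_RAW, filters=filters)


class NewFilterTest(unittest.TestCase):
    def check_filter(self, name):
        # Look the constant up by name: it may be absent on some builds.
        filter_id = getattr(lzma, name, None)
        if filter_id is None:
            self.skipTest(f"lzma.{name} is not defined in this build")
        try:
            result = bcj_round_trip(filter_id, DATA)
        except lzma.LZMAError:
            # Defined at build time, but the runtime liblzma rejects it.
            self.skipTest(f"runtime liblzma does not support {name}")
        self.assertEqual(result, DATA)

    def test_arm64(self):
        self.check_filter("FILTER_ARM64")  # assumed constant name

    def test_riscv(self):
        self.check_filter("FILTER_RISCV")
```

Skipping on `LZMAError` keeps the test usable when the headers and the runtime library differ; with structured version info, the test could instead assert that the exception is raised only for library versions older than the one that introduced each filter.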


@ivq (Author) commented Mar 17, 2024

OK. Done. Correct me if anything is missing.

@serhiy-storchaka (Member) left a comment

LGTM. You only need to add an entry in the What's New document.

Doc/library/lzma.rst (outdated; resolved)
@ivq (Author) commented Mar 19, 2024

Added

@ivq (Author) commented Mar 24, 2024

Ping @terryjreedy

@terryjreedy (Member) commented

As I said above, I am done with this. @serhiy-storchaka should do the final review of the changes he requested.

@serhiy-storchaka (Member) commented

It all LGTM, and I can merge it at any moment. Thank you for your contribution @ivq.

But it touches on some other issues.

  • Data for lzma tests. Currently it is inlined in the Python code, and it takes up a lot of space (many hundreds of lines). Your PR is correct and follows the existing practice, but it may be better to save the data as separate files (and do the same for the gzip and bz2 tests). Some of the zipfile and tarfile test data is already saved as separate files; the rest is inlined.
  • You exposed the version of the lzma library as strings. This is the same as what was done for zlib, so it is all fine. But structured version info is actually more useful, and it is what the tests need. We should consider adding structured version objects for gzip, lzma, and other libraries.

I will try to resolve these issues separately. If I do this in the near future, this PR will need to be reworked (I can do it, or you can if you want). If it takes a long time, I will merge this PR as is.

@ivq (Author) commented Mar 28, 2024

OK. It's up to your schedule. I am willing to help.
I did consider structured version info, but thought a string would be enough, since normal use cases would not care about the
library version. Note that the lzma version string may contain more info: https://github.com/tukaani-project/xz/blob/master/src/liblzma/api/lzma/version.h#L99
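Because the string can carry extra information, version comparisons only need its numeric prefix. A sketch, assuming the string always starts with `major.minor.patch` and that any trailing text (a stability suffix such as `alpha`, or commit information) can be ignored; `parse_lzma_version` is a hypothetical helper, not an existing API.

```python
import re

# Leading "major.minor.patch"; any suffix after it is ignored.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_lzma_version(text):
    """Return (major, minor, patch) parsed from a liblzma version string."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


# Tuples compare element by element, so feature checks become simple.
print(parse_lzma_version("5.6.1") >= (5, 6, 0))       # True
print(parse_lzma_version("5.5.1alpha") >= (5, 6, 0))  # False
```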

@erkia commented Mar 30, 2024

Pre-generated random binary data in tests is not considered a good practice nowadays, I guess :)

@ivq (Author) commented Mar 30, 2024

True, especially now that the xz backdoor has recently been discovered (https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27).
I think we should lock the PR for now, until things become clear and perhaps until the xz code base has been audited.

We only test that the constants are defined if the library we compiled
against was supposed to define them.

Their functionality is up to the external library and not something we
need to test ourselves.
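The check described in this commit message could look roughly like this sketch. It assumes the header version is available as a `(major, minor, patch)` tuple (how it should be exposed is still under discussion in this thread) and takes the module as a parameter so it can be exercised with a stand-in object. `FIRST_HEADER_VERSION` and `missing_filter_constants` are hypothetical; the table records the xz releases that introduced each filter (ARM64 in 5.4.0, RISC-V in 5.6.0).

```python
import types

# Hypothetical table: filter constant -> first liblzma release defining it.
FIRST_HEADER_VERSION = {
    "FILTER_ARM64": (5, 4, 0),
    "FILTER_RISCV": (5, 6, 0),
}


def missing_filter_constants(module, header_version):
    """Return constants `module` should define for `header_version` but lacks."""
    return sorted(
        name
        for name, first in FIRST_HEADER_VERSION.items()
        if header_version >= first and not hasattr(module, name)
    )


# Stand-in for an lzma module built against 5.4.x headers (value is a placeholder).
fake_lzma = types.SimpleNamespace(FILTER_ARM64=0)
print(missing_filter_constants(fake_lzma, (5, 4, 6)))  # []
print(missing_filter_constants(fake_lzma, (5, 6, 1)))  # ['FILTER_RISCV']
```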

We don't want to accept new binary data from the ill-fated backdoored xz
project into CPython itself.
Modules/_lzmamodule.c (outdated; resolved)
Lib/test/test_lzma.py (outdated; resolved)
@gpshead (Member) commented Mar 30, 2024

This PR itself seems fine and is the kind of thing we want. Although the fate of the upstream code is certainly unknown at the moment, that isn't related to this specific issue. (I removed the added binary test data.)

@serhiy-storchaka - I'm glad we agree on what needs doing in this PR! I hope I'm not stepping on your toes by going ahead and doing some of that today. =)

We make lzma_version the runtime version, as that is the one most
interesting to users. The compile-time version remains exposed as
the header version.
@gpshead added the type-feature label (A feature request or enhancement) Mar 30, 2024
@gpshead (Member) commented Mar 30, 2024

Alright, I believe this is ready. I'll let @serhiy-storchaka do the final review and merge.

@gpshead (Member) commented Mar 31, 2024

Exposing the decimal number from the C library probably isn't right; I like Serhiy's larger issue. These can be structured (a tuple). The code to do so won't be much different, thanks to the Py_BuildValue APIs.
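To illustrate the structured alternative: liblzma documents its numeric version (`LZMA_VERSION` and the value returned by `lzma_version_number()`) as decimal digits of the form xyyyzzzs, where x is the major version, yyy the minor, zzz the patch, and s the stability (0 = alpha, 1 = beta, 2 = stable). A sketch of decoding that number into a `sys.version_info`-like tuple, assuming that encoding; `LzmaVersionInfo` and `decode_lzma_version` are hypothetical names.

```python
from typing import NamedTuple

# Stability digit as documented for liblzma's numeric version.
_STABILITY = {0: "alpha", 1: "beta", 2: "stable"}


class LzmaVersionInfo(NamedTuple):
    major: int
    minor: int
    patch: int
    releaselevel: str


def decode_lzma_version(number):
    """Split a liblzma version number (decimal xyyyzzzs) into its fields."""
    stability = number % 10
    patch = (number // 10) % 1000
    minor = (number // 10_000) % 1000
    major = number // 10_000_000
    return LzmaVersionInfo(major, minor, patch, _STABILITY[stability])


info = decode_lzma_version(50060012)
print(info)                   # LzmaVersionInfo(major=5, minor=6, patch=1, releaselevel='stable')
print(info[:3] >= (5, 6, 0))  # True
```

A named tuple keeps ordinary tuple comparison working while giving each field a readable name.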

@gpshead (Member) commented Mar 31, 2024

And we can elide the LZMA_ prefix on the names and leave them out of __all__. They're in the lzma module, after all.
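A quick illustration of what leaving names out of `__all__` means: they remain reachable as module attributes but are not pulled in by `from module import *`. The module `fake_lzma` and its attribute names are stand-ins, not the names this PR settled on.

```python
import sys
import types

# Build a throwaway module standing in for lzma.
fake = types.ModuleType("fake_lzma")
fake.compress = lambda data: data  # public API, listed in __all__
fake.version = "5.6.1"             # hypothetical version attribute
fake.__all__ = ["compress"]
sys.modules["fake_lzma"] = fake

namespace = {}
exec("from fake_lzma import *", namespace)

print("compress" in namespace)  # True
print("version" in namespace)   # False: not listed in __all__
print(fake.version)             # 5.6.1, still accessible as an attribute
```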

Labels: awaiting merge, type-feature (A feature request or enhancement)

5 participants