Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zstd compression support #606

Closed
dralley opened this issue Jan 31, 2023 · 13 comments
Closed

zstd compression support #606

dralley opened this issue Jan 31, 2023 · 13 comments

Comments

@dralley
Copy link

dralley commented Jan 31, 2023

Currently DNF supports zstd compression for all other metadata, and this support is (IIRC) enabled on RHEL 8.2+

Createrepo_c is adding support so it is conceivable that it will be used more commonly soon: rpm-software-management/createrepo_c#345

I intend to promote its use in Fedora once that support is stabilized, as I've found it does a significantly better job with RPM metadata than Gzip / Zlib and is generally much faster to compress than Xz / Lzma.

@ppisar
Copy link
Collaborator

ppisar commented Feb 1, 2023

Adding new compression formats into createrepo_c and librepo (used by DNF) is the right place.

Extending libmodulemd to support new compression formats is not good. In my opinion adding a compression support into libmodulemd was a mistake. Applications should compress and decompress the YAML files as the like and only pass a raw YAML into libmoduemd. I actually plan to deprecate the compression support from libmodulemd.

If you agree, I will close this libmodulemd issue as there is nothing to do in libmodulemd.

@dralley
Copy link
Author

dralley commented Feb 1, 2023

Acknowledged

ppisar added a commit to ppisar/libmodulemd that referenced this issue Feb 1, 2023
YUM repositories gain support for more and more compression formats.
That naturally means enhancing createrepo_c (a repository producer),
and librepo (a repository consumer). The repository works here as
a transport mechanism.

Therefore it does not make sense to duplicate the compression support
in libmodulemd. Especially if the main application is package manager
which uses librepo for retrieving YAML documents. These applications
can use librepo for decompressing the documents before passing them
into libmodulemd.

Compression support in libmodulemd needs magic (file) and rpm
libraries. Especially libmagic carry dozen-megabyte large file format
database. It is desired to remove compression support from
libmodulemd.

As a result, this patch marks compression support deprecated.

<fedora-modularity#606>
@dralley
Copy link
Author

dralley commented Feb 1, 2023

@ppisar I do not see where in librepo compression types are handled? Are they handled there at all?

@ppisar
Copy link
Collaborator

ppisar commented Feb 2, 2023

I haven't looked yet at librepo. But there or somewhere else must be something because primary.xml is also compressed.

@kontura
Copy link
Contributor

kontura commented Feb 2, 2023

It is handled by libsolv.

@ppisar
Copy link
Collaborator

ppisar commented Feb 2, 2023

That's a terrible design. I'm working on an XML representation of module metadata and I do not want to duplicate the handling of compression. Because there is no compression declaration in repomd.xml, the consumers have to wild-guess and that makes the implementation prone to various attacks. Having the same functionality in libsolv, libmodulemd, and now in libmodulexd (my working name for the XML handler) is suboptimal.

In the light of the news I will accept extending libmodulemd for zstd, however I'd rather see separating the decompression into librepo or a yet another library. Or can libsolv decompress an arbitrary file from a YUM repository and return its raw representation?

@dralley
Copy link
Author

dralley commented Feb 2, 2023

@ppisar It sounds like what you want is a single library for all handling of RPM repository metadata.

@ppisar
Copy link
Collaborator

ppisar commented Feb 8, 2023

Yes, and that library should be librepo.

@kontura
Copy link
Contributor

kontura commented Feb 8, 2023

Or can libsolv decompress an arbitrary file from a YUM repository and return its raw representation?

I think it can (de)compress an arbitrary file: https://github.comodulemd/modulemd-module-index.cm/openSUSE/libsolv/blob/master/doc/libsolv-bindings.txt#file-management

libdnf even has a wrapper (for reading): https://github.com/rpm-software-management/libdnf/blob/dnf-4-master/libdnf/utils/CompressedFile.cpp

@ppisar
Copy link
Collaborator

ppisar commented Mar 29, 2023

I've prepared a support for Zstandard in libmodulemd. The library supports a detection of the compressed format. One of the detection method is by a file name extension. Therefore I'd like to know which extension createrepo_c will produce. Zstandard standard dictates ".zst", however this createrepo_c pull request rpm-software-management/createrepo_c#345 supports both ".zst" and ".zstd".

Which extension will be used in YUM repositories?

@dralley
Copy link
Author

dralley commented Mar 29, 2023

I agree, it should probably just be .zst

ppisar added a commit to ppisar/libmodulemd that referenced this issue Mar 30, 2023
Magic numbers and file name extension of Zstandard archives are taken
from RFC 8878. File extensions were consuleted with crearerepo_c
<rpm-software-management/createrepo_c#345>.
Decompression is implemented with rpmio, which added the support in
rpm-4.14.0. Zstandard support in rpmio is checked at run-time.

NOTE: Only standardized "zst" file name extension is suppored.
I believe that createrepo_c won't implement nonstandard "zstd"
extension
<rpm-software-management/createrepo_c#345>.

NOTE: Version bumped to 2.15 because this adds a new feature and
enhances API.

TODO: Check interoperability with crearerepo_c when it receives the
support.

TODO: Deprecate MODULEMD_COMPRESSION_TYPE_SENTINEL and remove it on
next ABI break. This end of enum prevents from adding new enum values
in natural order. E.g. if an application uses
MODULEMD_COMPRESSION_TYPE_SENTINEL for whatever reason, inserting
a value before it would change MODULEMD_COMPRESSION_TYPE_SENTINEL
value which is an ABI break. Funilly that value is not used anywhere
in libmodulemd code. Only in tests to iterate over the enum.

fedora-modularity#606
@ppisar
Copy link
Collaborator

ppisar commented Mar 30, 2023

I'm glad that you agreed. I only merged a support for "zst" extension. Once createrepo_c receives the support, I can revisit the "zstd" extension.

@ppisar ppisar closed this as completed Mar 30, 2023
@kontura
Copy link
Contributor

kontura commented Apr 3, 2023

I have updated the createrepo_c PR to recognize just zst. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants