Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add conditional zstd compression support #345

Merged
merged 2 commits into from
May 16, 2023

Conversation

kontura
Copy link
Contributor

@kontura kontura commented Jan 31, 2023

For: #82

Including zstd is optional but defaults to ON.

I have converted PR: #251 to not use rpmio but instead use zstd directly by basically inlining the rpm zstd functions.

I chose the compression level value 11 kind of arbitrarily, we may want to find a better value.

src/cmd_parser.c Outdated Show resolved Hide resolved
@@ -12,8 +12,10 @@

%if 0%{?rhel}
%bcond_with zchunk
%bcond_with zstd
Copy link
Contributor

@dralley dralley Jan 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike zchunk, support for zstd compressed metadata is (I believe) already enabled in libsolv on RHEL, there are just no repos with such metadata because none of the tools support it yet. So it might make sense to likewise default to ON for RHEL >7. Or at least 9+.

Copy link
Contributor Author

@kontura kontura Feb 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have enabled it also for rhel 8. I believe it was enabled by: https://bugzilla.redhat.com/show_bug.cgi?id=1914876 (that is since 8.4) so I have added a note about that.

@dralley
Copy link
Contributor

dralley commented Jan 31, 2023

I've filed fedora-modularity/libmodulemd#606, as far as I can tell createrepo_c uses its own wrappers for opening up modulemd metadata files so it shouldn't be a problem here, but we wouldn't want to produce zstd compressed module metadata and have libmodulemd not be able to read it.

@dralley
Copy link
Contributor

dralley commented Feb 1, 2023

Basic testing with seems to work fine. Results are good. This is with the Fedora 37 "Everything" release repo.

Before:

Screenshot from 2023-01-31 23-32-20

After (zstd level 9):

Screenshot from 2023-01-31 23-38-22

After (zstd level 11):

Screenshot from 2023-01-31 23-43-29

Doesn't seem to be a lot of difference between 9 and 11 though the latter takes about 30% longer. I'd maybe recommend going with that unless we find some cases where there's a significant difference.

@j-mracek
Copy link
Member

j-mracek commented Feb 1, 2023

I think it is a great time to change createrepo defaults and not only for compression. But we have to be careful that metadata are compatible with DNF from RHEL8.0

For testing it would be also good to test a large RHEL8 metadata. If I good remember they have a different order of element then Fedora metadata. As most important value I suggest compressed size, decompression time, and RAM requirement for decompression. I am mentioning RAM, because if I good remember with some setting it could create the large dictionary that requires more RAM - minimal system requirements are not small at all, but people have 0.5 or 1 GB RAM on their systems and then they complain to DNF.

I think that the time required to compress files is important but it has less impact then other values.

@dralley
Copy link
Contributor

dralley commented Feb 1, 2023

If I good remember they have a different order of element then Fedora metadata.

You mean no order :) RHELDST-15791

For testing it would be also good to test a large RHEL8 metadata. If I good remember they have a different order of element then Fedora metadata. As most important value I suggest compressed size, decompression time, and RAM requirement for decompression. I am mentioning RAM, because if I good remember with some setting it could create the large dictionary that requires more RAM - minimal system requirements are not small at all, but people have 0.5 or 1 GB RAM on their systems and then they complain to DNF.

As far as I can tell lower level = less memory, as with higher levels the window and dictionary sizes are larger.

I did use /usr/bin/time -v during some of the testing, which outputs max RSS, and it looked comparable. But that would be measuring both compression and decompression.

@kontura
Copy link
Contributor Author

kontura commented Feb 2, 2023

Ok, I have set it to level 9 for now, it seems like a good compromise between compression efficiency and speed.

I did some quick tests as well (on rhel metadata with 22 000 pkgs) and it seems that zstd (with level 9) is strictly better than gz in all mentioned aspects except maybe for memory usage during decompression but only by a tiny amount (~2%).

I can do a more detailed analysis when we will be changing the default from gz to zstd.

@dralley
Copy link
Contributor

dralley commented Feb 13, 2023

It still looks fine, but what got updated in the last push?

@kontura
Copy link
Contributor Author

kontura commented Feb 13, 2023

It still looks fine, but what got updated in the last push?

Sorry, I should explain, there was a segfault when accessing one of the errors.
I found it when running rpm-software-management/ci-dnf-stack#1217 test with mergerepo_c.

src/cmd_parser.c Outdated Show resolved Hide resolved
ppisar added a commit to ppisar/libmodulemd that referenced this pull request Mar 30, 2023
Magic numbers and file name extension of Zstandard archives are taken
from RFC 8878. File extensions were consuleted with crearerepo_c
<rpm-software-management/createrepo_c#345>.
Decompression is implemented with rpmio, which added the support in
rpm-4.14.0. Zstandard support in rpmio is checked at run-time.

NOTE: Only standardized "zst" file name extension is suppored.
I believe that createrepo_c won't implement nonstandard "zstd"
extension
<rpm-software-management/createrepo_c#345>.

NOTE: Version bumped to 2.15 because this adds a new feature and
enhances API.

TODO: Check interoperability with crearerepo_c when it receives the
support.

TODO: Deprecate MODULEMD_COMPRESSION_TYPE_SENTINEL and remove it on
next ABI break. This end of enum prevents from adding new enum values
in natural order. E.g. if an application uses
MODULEMD_COMPRESSION_TYPE_SENTINEL for whatever reason, inserting
a value before it would change MODULEMD_COMPRESSION_TYPE_SENTINEL
value which is an ABI break. Funilly that value is not used anywhere
in libmodulemd code. Only in tests to iterate over the enum.

fedora-modularity#606
@kontura kontura force-pushed the zstd branch 2 times, most recently from 59260bb to 8c57740 Compare April 3, 2023 06:50
@@ -255,6 +277,8 @@ cr_compression_type(const char *name)
type = CR_CW_XZ_COMPRESSION;
if (!g_strcmp0(name_lower, "zck"))
type = CR_CW_ZCK_COMPRESSION;
if (!g_strcmp0(name_lower, "zst") || !g_strcmp0(name_lower, "zstd"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this "zst" and "zstd" choice? The compression is named "Zstandard" and "zstd". I think you are still mistaken it with "zst" file name extension. Can you remove "zst" option from here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sorry, I went through it again, hopefully "zst" is just an extension now.

@dralley
Copy link
Contributor

dralley commented Apr 25, 2023

@kontura Is anything remaining blocking this?

@kontura
Copy link
Contributor Author

kontura commented Apr 26, 2023

@dralley it should be ready. I need to find a reviewer.

@m-blaha m-blaha merged commit b9a323c into rpm-software-management:master May 16, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

5 participants