Add zchunk support #92

Merged
merged 14 commits into from Nov 26, 2018

6 participants
@jdieter
Contributor

jdieter commented Jun 12, 2018

This request adds zchunk support to createrepo_c. It does now pass the python test cases, but there are some untested edge cases that will definitely need some work, namely using zchunk compression for anything other than primary, filelists and other.

The zchunk metadata is in addition to the standard gzip metadata, and createrepo_c currently only creates zchunk versions of primary, filelists and other.

@Conan-Kudo

Member

Conan-Kudo commented Jun 12, 2018

@jdieter So this means that if I added other files to the metadata using modifyrepo_c, we can't chunk via zck for now?

@jdieter

Contributor Author

jdieter commented Jun 13, 2018

That is correct. I don't think it will be too hard to fix that, especially now that zchunk has the ability to chunk using the buzhash algorithm as well as user-specified blocks, but I first wanted to get the big three (well, more the big two: primary and filelists) sorted.

@Conan-Kudo

Member

Conan-Kudo commented Jun 13, 2018

Well, as long as there is a plan for eventually supporting any arbitrary files manipulated by createrepo_c, this looks good to me.

@Conan-Kudo
Member

Conan-Kudo left a comment

Overall, looks good.

A couple of things:

  1. Can you regenerate the man pages to include your new options? The script to do so is in the utils directory.
  2. Could you add some tests specifically for zck stuff?
{ "zck-filelists-dict", 0, 0, G_OPTION_ARG_FILENAME, &(_cmd_options.zck_filelists_dict),
"Compression dictionary to use for zchunk filelists file", "ZCK_FILELISTS_DICT" },
{ "zck-other-dict", 0, 0, G_OPTION_ARG_FILENAME, &(_cmd_options.zck_other_dict),
"Compression dictionary to use for zchunk other file", "ZCK_OTHER_DICT" },

@Conan-Kudo

Conan-Kudo Jun 13, 2018

Member

Are the dictionaries mandatory, or optional? My understanding of zstd is that dictionaries improve compression, but aren't necessary.

@jdieter

jdieter Jun 13, 2018

Author Contributor

They are optional, but make a significant difference in the size of the zchunked metadata. I've put together dictionaries for primary, filelists and other based on data from current Fedora releases, which I'd suggest packaging for Fedora/EPEL (if that's what our builders are on).

The dictionary is embedded in the zchunk file, so clients don't need a copy.

If the dictionary changes, there will be no common chunks between the pre-change and post-change files, so dictionary changes should happen at most once per release (and probably aren't needed even that often).

@Conan-Kudo

Conan-Kudo Jun 13, 2018

Member

Forgive me for being forgetful, but is there a script or something that people can use to generate dictionaries for their distributions' metadata?

@jdieter

jdieter Jun 13, 2018

Author Contributor

I've got a small python script that splits the metadata into much smaller files and strips out any checksums (which by definition are unpredictable).

I ran the script over a couple of days' worth of updates metadata, which produced thousands of files, and then, in the directory containing those files, ran:

zstd --train --dictID=0 *

The default dict size of roughly 100KB was fairly optimal for Fedora's metadata. Too small and you don't see the savings you'd expect. Too large and you're just wasting space.

@jdieter

jdieter Jun 13, 2018

Author Contributor

The proposed Fedora dicts are available at https://www.jdieter.net/downloads/zchunk-dicts

@jdieter

jdieter Jun 13, 2018

Author Contributor

I've gone ahead and updated the man page. I'll work on the tests but it might take a bit longer to get them done.

src/misc.c Outdated
@@ -1256,6 +1256,46 @@ cr_write_to_file(GError **err, gchar *filename, const char *format, ...)
return ret;
}

gchar *cr_read_from_file(GError **err, gchar *filename, size_t *file_size) {

@cgwalters

cgwalters Jun 13, 2018

How is this different from g_file_get_contents()?

@jdieter

jdieter Jun 13, 2018

Author Contributor

Wow, I feel stupid. It's exactly the same. I've just pushed a commit that removes cr_read_from_file and uses g_file_get_contents instead.

@jdieter jdieter referenced this pull request Jun 13, 2018

Closed

Zchunk support for dnf #1107

@jdieter

Contributor Author

jdieter commented Jun 22, 2018

@Conan-Kudo, I've added some tests (and found and fixed some problems in the process). Do you have some further suggestions as to which tests I should create?

@jdieter

Contributor Author

jdieter commented Jun 22, 2018

Just a heads up that the latest tests require zchunk-0.7.3 (available in the copr) in order to work, as it added the --stdout flag to unzck.

@Conan-Kudo

Member

Conan-Kudo commented Jun 22, 2018

No, it looks good to me.

@Conan-Kudo
Member

Conan-Kudo left a comment

One small issue with the CMakeLists...

@@ -34,7 +34,13 @@ find_package(LZMA REQUIRED)
find_package(OpenSSL REQUIRED)
find_package(Sqlite3 REQUIRED)
find_package(ZLIB REQUIRED)

find_package(PkgConfig REQUIRED)
find_library(ZCHUNKLIB NAMES zck)

@Conan-Kudo

Conan-Kudo Jun 22, 2018

Member

Could you write a FindZChunk.cmake similar to the FindGLIB2.cmake module and use that instead?

@jdieter

jdieter Jun 22, 2018

Author Contributor

Ok, let me know if this works or you want something different

@jdieter jdieter force-pushed the jdieter:zchunk branch from d96b051 to 36709e9 Jul 8, 2018

@@ -34,6 +34,8 @@ find_package(LZMA REQUIRED)
find_package(OpenSSL REQUIRED)
find_package(Sqlite3 REQUIRED)
find_package(ZLIB REQUIRED)
find_package(PkgConfig REQUIRED)
find_package(Zchunk REQUIRED)

@ignatenkobrain

ignatenkobrain Jul 30, 2018

Member

Why not use pkg_check_modules()? Please do not over-complicate this.

@jdieter

jdieter Aug 8, 2018

Author Contributor

I didn't know about pkg_check_modules(). Thanks for the pointer!

cr_contentstat_free(pri_zck_stat, NULL);
cr_contentstat_free(fil_zck_stat, NULL);
cr_contentstat_free(oth_zck_stat, NULL);
g_free(pri_zck_filename);

@ignatenkobrain

ignatenkobrain Jul 30, 2018

Member

use g_autofree

@jdieter

jdieter Aug 8, 2018

Author Contributor

g_autofree() was added in glib-2.44. What's the minimum glib version that we need to support?

int ret = CRE_OK;

assert(cr_file);
assert(!err || *err == NULL);

@ignatenkobrain

ignatenkobrain Jul 30, 2018

Member

should be g_return_val_if_fail

@jdieter jdieter force-pushed the jdieter:zchunk branch from 49640f1 to 41a2fa6 Aug 8, 2018

@Conan-Kudo
Member

Conan-Kudo left a comment

This change set looks good to me.

@ignatenkobrain
Member

ignatenkobrain left a comment

Generally looks good, but you are missing include_directories(${ZCK_INCLUDE_DIRS}) or something like that.

@jdieter

Contributor Author

jdieter commented Aug 16, 2018

There are no zck-specific include directories. zck.h is in /usr/include. Should I still have include_directories(...)?

@Conan-Kudo

Member

Conan-Kudo commented Aug 16, 2018

@jdieter Yes, because it might be installed to non-standard paths by someone else.

@jdieter

Contributor Author

jdieter commented Aug 16, 2018

Ok, is this right?

@Conan-Kudo

Member

Conan-Kudo commented Aug 16, 2018

Looks good to me.

@j-mracek

Contributor

j-mracek commented Aug 21, 2018

What is the availability status of packages that fulfill the requirement "pkg_check_modules(ZCK REQUIRED zck)" in Fedora/CentOS/RHEL ...?

@Conan-Kudo

Member

Conan-Kudo commented Aug 21, 2018

@j-mracek It's been available in Fedora, CentOS, RHEL, openSUSE, and Mageia for a couple of weeks now.

@j-mracek

Contributor

j-mracek commented Aug 21, 2018

And what is the package name? I was unable to find it.

@Conan-Kudo

Member

Conan-Kudo commented Aug 21, 2018

zchunk-devel is the package you need installed for this code. It's in EPEL for CentOS/RHEL 7. That package name will work across all distributions (though this will install lib64zck-devel on Mageia, and libzck-devel on openSUSE).

You can also request it by the pkgconfig name: sudo dnf install 'pkgconfig(zck)'.

@jdieter

Contributor Author

jdieter commented Aug 23, 2018

I've been thinking about how zchunk-enabled createrepo_c uses dictionaries, and the current method is a bit awkward. To use createrepo_c without dictionaries, you just run:

createrepo_c --zck ./

But to enable dictionary use, you need to run:

createrepo_c --zck --zck-primary-dict=foo-primary.dict --zck-filelists-dict=foo-filelists.dict --zck-other-dict=foo-other.dict ./

As we look at adding zchunk support to prestodelta.xml and other files generated by createrepo_c or added using modifyrepo_c, this starts to become very unwieldy. I suggest a different solution, passing a --zck-dict-dir argument that points to a directory containing foo.dict where foo is primary, filelists, other, prestodelta, etc. The command usage would look like:

createrepo_c --zck --zck-dict-dir=/usr/share/repo-dict

If there is no dictionary in the directory that matches the current file, then the file would be compressed without a dictionary.

Thoughts? Should I go ahead and implement this as part of this patch set?

@Conan-Kudo

Member

Conan-Kudo commented Aug 23, 2018

@jdieter That sounds fine to me, but I suggest that you document what it expects for that.

@Conan-Kudo

Member

Conan-Kudo commented Aug 31, 2018

Rather than having that structure exclusively, it's probably better to support both ways.

So, for now, I'd like to see this go in as-is, and we can add the --zck-dict-dir option later.

@jdieter

Contributor Author

jdieter commented Sep 1, 2018

@Conan-Kudo, if you feel strongly about it, we can keep the current options, but I feel like they're really bulky. I think --zck-dict-dir is a more elegant method, and I'd definitely prefer to drop the current options before people start depending on them.

@jdieter

Contributor Author

jdieter commented Sep 8, 2018

@dmach Ok, I've gone ahead and written a patch to make zchunk optional, but it will currently fail the python tests when you compile with -DWITH_ZCHUNK=no. I'm going to work on the tests next.

FWIW, I've been focusing on getting createrepo_c fully finished before fixing up the client side because we've changed how the zchunk metadata is stored in repomd.xml due to mailing list feedback. The zchunk library has also had some API changes that I haven't yet ported to librepo.

If you'd prefer to review all of the patches simultaneously, we can do that, but my preference would be to get createrepo done first, and then do the others together afterwards.

@jdieter

Contributor Author

jdieter commented Sep 8, 2018

@Conan-Kudo, I've gone ahead and implemented #92 (comment). When you set --zck-dict-dir, createrepo_c will look for a file in the directory that matches the metadata file and a .dict extension. So for primary.xml, the directory would need to have primary.zck.dict. This way, you can (eventually) use the same arguments when adding files using modifyrepo.
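
A minimal sketch of the lookup described above, assuming the dictionary name is formed by replacing the .xml suffix of the metadata file with .zck.dict. The helpers dict_path_for() and find_dict() are hypothetical names used for illustration only and are not functions from createrepo_c. A missing dictionary simply means the file would be compressed without one.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not createrepo_c's actual code): build the dictionary
 * path for a metadata file, e.g. "primary.xml" -> "<dict_dir>/primary.zck.dict".
 * Returns 0 on success, -1 if the name lacks a ".xml" suffix or the output
 * buffer is too small.
 */
static int
dict_path_for(const char *dict_dir, const char *metadata_name,
              char *out, size_t out_len)
{
    const char *suffix = ".xml";
    size_t name_len = strlen(metadata_name);
    size_t suffix_len = strlen(suffix);

    /* Require a non-empty stem followed by ".xml". */
    if (name_len <= suffix_len ||
        strcmp(metadata_name + name_len - suffix_len, suffix) != 0)
        return -1;

    int stem_len = (int)(name_len - suffix_len);
    int written = snprintf(out, out_len, "%s/%.*s.zck.dict",
                           dict_dir, stem_len, metadata_name);
    if (written < 0 || (size_t)written >= out_len)
        return -1;
    return 0;
}

/*
 * Hypothetical helper: return 1 if a dictionary file can be opened for this
 * metadata file, 0 otherwise. A result of 0 means "compress without a
 * dictionary", matching the fallback described in the thread.
 */
static int
find_dict(const char *dict_dir, const char *metadata_name,
          char *out, size_t out_len)
{
    if (dict_path_for(dict_dir, metadata_name, out, out_len) != 0)
        return 0;

    FILE *f = fopen(out, "rb");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}
```

For example, find_dict("/usr/share/repo-dict", "primary.xml", buf, sizeof buf) writes /usr/share/repo-dict/primary.zck.dict into buf and returns 1 only if that file can be opened.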

@Conan-Kudo

Member

Conan-Kudo commented Sep 8, 2018

@jdieter So this does not yet work for mergerepo_c (used by Koji) and modifyrepo_c (used by COPR)?

@jdieter

Contributor Author

jdieter commented Sep 8, 2018

@Conan-Kudo, you still can't run --zck in mergerepo_c or modifyrepo_c, but, at least as far as I've tested (which hasn't been extensive), they will keep any zchunk metadata that's already been added.

@jdieter

Contributor Author

jdieter commented Sep 13, 2018

> @jdieter So this does not yet work for mergerepo_c (used by Koji) and modifyrepo_c (used by COPR)?

cb111c3 implements zchunk support in mergerepo_c, just for primary, filelists and other.

@jdieter

Contributor Author

jdieter commented Oct 17, 2018

Is there anything else that needs to be done with this before it's reviewed?

@jdieter

Contributor Author

jdieter commented Nov 13, 2018

@dmach, now that F29's out the door, I'd love to start generating zchunk metadata in official Fedora. Is there anything I can do to help get this reviewed?

@Conan-Kudo

Member

Conan-Kudo commented Nov 19, 2018

@jdieter Can you please rebase this change? There are reported conflicts from the other merge earlier today.

jdieter added some commits Apr 16, 2018

Add zchunk integration
Signed-off-by: Jonathan Dieter <jdieter@gmail.com>
Use pkg_check_modules to simplify CMakeLists
Signed-off-by: Jonathan Dieter <jdieter@gmail.com>
Fix C tests
Signed-off-by: Jonathan Dieter <jdieter@gmail.com>
Add zchunk include directories
Signed-off-by: Jonathan Dieter <jdieter@gmail.com>
Add --zck-dict-dir argument and remove --zck-primary-dict,
--zck-filelists-dict and --zck-other-dict.

Jonathan
Make zchunk dependency optional
Signed-off-by: Jonathan Dieter <jdieter@gmail.com>
Pass python tests when not compiled with zchunk support
Signed-off-by: Jonathan Dieter <jdieter@gmail.com>
Set zchunk defaults, move zchunk compression options in list and do
argument checking in cmd_parser.c rather than createrepo_c.c

Signed-off-by: Jonathan Dieter <jdieter@gmail.com>
Add zchunk support to mergerepo
Signed-off-by: Jonathan Dieter <jdieter@gmail.com>
Chunk packages based on srpm rather than rpm
Signed-off-by: Jonathan Dieter <jdieter@gmail.com>
Free header checksum from open_stat
Signed-off-by: Jonathan Dieter <jdieter@gmail.com>
Free dictionary file names when we're done with them
Signed-off-by: Jonathan Dieter <jdieter@gmail.com>

@jdieter jdieter force-pushed the jdieter:zchunk branch from 9e7c830 to 07e1235 Nov 19, 2018

@jdieter

Contributor Author

jdieter commented Nov 19, 2018

Done! Thanks for the heads up!

@Conan-Kudo

Member

Conan-Kudo commented Nov 19, 2018

@dmach Can we please merge this now?

#include <zlib.h>
#include <bzlib.h>
#include <lzma.h>
#include <zck.h>

@dmach

dmach Nov 26, 2018

Contributor

This needs to be wrapped into #ifdef WITH_ZCHUNK

@jdieter

jdieter Nov 26, 2018

Author Contributor

Ok, this is done now. I did a test build on a system without zchunk, and it built and ran just fine.
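
A rough sketch of the kind of guard requested here, assuming the build system defines WITH_ZCHUNK as a preprocessor macro when zchunk support is enabled. The helper names zchunk_supported() and zchunk_status() are hypothetical and only illustrate how the surrounding code can keep building on a system where zck.h is absent.

```c
/* Only pull in the zchunk header when the build enables zchunk support. */
#ifdef WITH_ZCHUNK
#include <zck.h>
#endif

/* Hypothetical helper: report whether this build has zchunk support. */
static int
zchunk_supported(void)
{
#ifdef WITH_ZCHUNK
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: a human-readable status string for diagnostics. */
static const char *
zchunk_status(void)
{
    return zchunk_supported() ? "zchunk: enabled" : "zchunk: disabled";
}
```

Compiled without the macro, zchunk_supported() returns 0 and nothing from zck.h is referenced, so the header does not need to be installed.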

@jdieter

jdieter Nov 26, 2018

Author Contributor

Thanks so much for committing these patches!!!

Add missing #ifdef around zck.h header
Signed-off-by: Jonathan Dieter <jdieter@gmail.com>
@dmach

dmach approved these changes Nov 26, 2018

@dmach dmach merged commit 93c9970 into rpm-software-management:master Nov 26, 2018

@dmach

Contributor

dmach commented Nov 26, 2018

I'll submit a patch to Fedora's dist-git; the spec needs a couple of tweaks to handle bcond_with(out) zchunk.
