RPM with Copy on Write #1470
Closed
Commits on Feb 1, 2021
This is part of https://fedoraproject.org/wiki/Changes/RPMCoW

The majority of changes are in two new programs:

= rpm2extents

Modeled as a 'stream processor'. It reads a regular .rpm file on stdin, and produces a modified .rpm file on stdout. The lead, signature and headers are preserved 1:1 so that all the normal metadata inspection and signature verification work as expected. Only the 'payload' is modified.

The primary motivation for this tool is to re-organize the payload as a sequence of raw file extents (hence the name). The files are organized by their digest identity instead of path/filename. If any digest is repeated, then the file is skipped/de-duped. Only regular files are represented. All other entries like directories, symlinks and devices are fully described in the headers and are omitted.

The files are padded so they start on `sysconf(_SC_PAGESIZE)` boundaries to permit 'reflink' syscalls to work in the `reflink` plugin.

At the end of the file is a footer with 3 sections:

1. List of calculated digests of the input stream. This is used in `librepo` because the file *written* is a derivative, and not the same as the repo metadata describes. `rpm2extents` takes one or more positional arguments that describe which digest algorithms are desired. This is often just `SHA256`. This program is only measuring and recording the digest - it does not express an opinion on whether the file is correct. Due to the API on most compression libraries directly reading the source file, the whole file digest is measured using a subprocess and pipes. I don't love it, but it works.

2. Sorted list of file content digest + offset pairs. This is used in the plugin with a trivial binary search to locate the start of file content. The size is not needed because it's part of normal headers.

3. (offset of 1., offset of 2., 8 byte MAGIC value) triple

= reflink plugin

Looks for the 8 byte magic value at the end of the rpm file. If present it alters the `RPMTAG_PAYLOADFORMAT` in memory to `clon`, and reads in the digest->offset table.

`rpmPackageFilesInstall()` in `fsm.c` is modified to alter the enumeration strategy from `rpmfiNewArchiveReader()` to `rpmfilesIter()` if not `cpio`. This is needed because there is no cpio to enumerate. In the same function, if `rpmpluginsCallFsmFilePre()` returns `RPMRC_PLUGIN_CONTENTS` then `fsmMkfile()` is skipped as it is assumed the plugin did the work.

The majority of the work is in `reflink_fsm_file_pre()` - the per file hook for RPM plugins. If the file enumerated in `rpmPackageFilesInstall()` is a regular file, this function will look up the offset in the digest->offset table and will try to reflink it, then fall back to a regular copy. If reflinking does work: we will have reflinked a whole number of pages, so we truncate the file to the expected size. Therefore installing most files does involve two writes: the reflink of the full size, then a fork/copy on write for the last page worth.

If the file passed to `reflink_fsm_file_pre()` is anything other than a regular file, it returns `RPMRC_OK` so the normal mechanics of `rpmPackageFilesInstall()` are used. That handles directories, symlinks and other non file types.

= New API for internal use

1. `rpmReadPackageRaw()` is used within `rpm2extents` to read all the headers without trying to validate signatures. This eliminates the runtime dependency on rpmdb.

2. `rpmteFd()` exposes the Fd behind the rpmte, so plugins can interact with the rpm itself.

3. `RPMRC_PLUGIN_CONTENTS` in `rpmRC_e` for use in `rpmpluginsCallFsmFilePre()` specifically.

4. `pgpStringVal()` is used to help parse the command line in `rpm2extents` - the positional arguments are strings, and this converts the values back to the values in the table.

Nothing has been removed, and none of the changes are intended to be used externally, so I don't think a soname bump is warranted here.
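As a rough illustration of the two mechanisms described above (page-aligned padding in `rpm2extents`, and the binary search over the sorted digest + offset pairs in the plugin), here is a minimal sketch. It is not the code from this patch: `struct digest_offset`, `pad_to_page()`, `find_offset()` and the fixed `DIGEST_LEN` of 32 bytes (SHA256-sized) are hypothetical names and assumptions chosen for the example, and the table is assumed to be sorted in `memcmp()` order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: SHA256-sized digests. */
#define DIGEST_LEN 32

/* Hypothetical table entry: a file content digest and the offset at which
 * that content starts inside the transcoded rpm. */
struct digest_offset {
    unsigned char digest[DIGEST_LEN];
    uint64_t offset;
};

/* Round pos up to the next multiple of page_size. A caller would pass
 * sysconf(_SC_PAGESIZE) so each file starts on a page boundary. */
static uint64_t pad_to_page(uint64_t pos, uint64_t page_size)
{
    assert(page_size > 0);
    uint64_t rem = pos % page_size;
    return rem ? pos + (page_size - rem) : pos;
}

/* Binary search over a table sorted by memcmp() order of the digests.
 * Returns the matching entry, or NULL if the digest is not present. */
static const struct digest_offset *
find_offset(const struct digest_offset *table, size_t count,
            const unsigned char *digest)
{
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = memcmp(digest, table[mid].digest, DIGEST_LEN);

        if (cmp == 0)
            return &table[mid];
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```

With a 4096 byte page, a file that would otherwise start at byte 4097 is moved to 8192. Only the start offset needs to be stored per digest, since the file size already comes from the normal headers.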
Commit 82b454d
Commit 9de362d
Match formatting/style of existing code
The existing code contains some variability in formatting. I'm not sure if { is meant to be on the end of the line, or on a new line, but I've standardized on the former. The indentation is intended to match the existing convention: 4 column indent, but 8 column wide tab characters. This is easy to follow/use in vim, but is surprisingly difficult to get right in vscode. I am doing this reformat here and now, and future changes will be after this. I'm keen to fold the patches together, but for now, I'm trying to keep the history of rpm-software-management#1470 linear so everyone can follow along.
Commit 91f7284
Fix printf formatting in reflink.c
There were some mismatches on field "sizes". This should eliminate the error messages.
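For readers unfamiliar with this class of warning: it typically appears when a fixed-width or `size_t` value is printed with a conversion such as `%d` or `%lu` whose size differs between platforms. The sketch below shows the portable specifiers. It is an illustration only, not the change made to `reflink.c`, and `format_extent()` is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats an offset/length pair using conversions
 * that match the argument types on both 32-bit and 64-bit builds. */
static int format_extent(char *buf, size_t buflen, uint64_t offset, size_t len)
{
    assert(buf != NULL);
    /* PRIu64 expands to the correct conversion for uint64_t;
     * %zu is the standard conversion for size_t. */
    return snprintf(buf, buflen, "offset=%" PRIu64 " len=%zu", offset, len);
}
```

Calling `format_extent(buf, sizeof(buf), 8192, 100)` writes `offset=8192 len=100` into `buf` without a format warning from the compiler.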
Commit 19694b7