rpm package header based db backend #1959

lnussel · 2022-03-16T14:24:59Z

Following up on the idea in #1151, this submission uses rpm package(!) headers in /usr to determine what's installed. That way it is possible to query and verify(!) installed packages even when there is no real database. Filing as draft for the world to look at. I don't expect this to go in this way 😆

The mode is sufficient for eg containers or appliances that do not
use rpm at runtime. Nevertheless one can still inspect and check the
content of such images. Also, diffing fs trees of revisions of a
container/os tree shows changed packages rather than only individual files.

Even though installation using this backend is actually possible and
implemented for the fun of it, it's not really the intended use
case. Due to lack of indexes there is no dependency checking nor
execution of triggers. To build a container one would use a real
backend like sqlite, then dump the headers to /usr and drop the
database in /var.

Nevertheless even the primitive install feature could be useful
in a way similar to systemd-sysext
(https://www.freedesktop.org/software/systemd/man/systemd-sysext.html).
Ie have /usr read-only and install temporary extensions as overlay
eg for debugging. cf https://github.com/kubic-project/microos-tools/pull/5/files

By storing package headers, certain information is not available
though. File states cannot be stored, so the code reading headers just
assumes all files are installed. For the intended use case some
files may actually be skipped though. This could be fixed by
factoring out the code in the install path that determines the
intended state of files and calling it when reading the header, ie
evaluating eg %_install_langs there.


pmatilai · 2022-03-18T09:31:25Z

Fs based db backend has been talked about for so long that it's nice to see somebody actually try it out.

I didn't look too deeply, but it seems to me you're doing a whole lot of extra work to avoid something you shouldn't be avoiding, and creating unnecessary problems in the process.

The backend should store the exact same data in the headers as the other backends do: the file states, install times and signature header contents are "piggybacked" after the immutable section (known as "dribbles" in the rpm lore). pkgdbPut() should quite literally be "open a filename at the desired location followed by write(fd, hdrBlob, hdrLen), close fd", and none of this import stuff. Just use the header number as the filename, and then you could actually create simple indexes at least for name and the like; it's a rather simple "string -> int" file format after all. And pkgdbGet() simply returns that raw blob to the caller, nothing more. That way all the relevant state info is preserved, signature checks on the imported headers work without an extra rpm-like file format, etc. If you encountered something that makes that not work then it's most likely a bug that we should have a look at.
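For illustration only, the "open, write, close" idea could look roughly like the sketch below. The names `fsdb_put()` and `fsdb_get()` are hypothetical and are not the rpm backend API (the real pkgdbPut()/pkgdbGet() take rpm-internal arguments); the sketch only assumes a directory of files named after the header number and treats the blob as opaque bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: build "<dir>/<hdrNum>" into buf, -1 if it does not fit. */
static int fsdb_path(char *buf, size_t len, const char *dir, unsigned int hdrNum)
{
    int n = snprintf(buf, len, "%s/%u", dir, hdrNum);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Store the raw header blob unchanged; the header number is the file name. */
int fsdb_put(const char *dir, unsigned int hdrNum, const void *blob, size_t blobLen)
{
    char path[4096];
    assert(dir != NULL && blob != NULL);
    if (fsdb_path(path, sizeof(path), dir, hdrNum))
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = blob;
    size_t left = blobLen;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {            /* give up and do not leave a partial file */
            close(fd);
            unlink(path);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    return close(fd);
}

/* Return the raw blob as stored (malloc'ed, caller frees), NULL on error. */
void *fsdb_get(const char *dir, unsigned int hdrNum, size_t *blobLen)
{
    char path[4096];
    struct stat st;
    assert(dir != NULL && blobLen != NULL);
    if (fsdb_path(path, sizeof(path), dir, hdrNum))
        return NULL;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }

    size_t len = (size_t)st.st_size;
    char *buf = malloc(len ? len : 1);
    size_t off = 0;
    while (buf && off < len) {
        ssize_t n = read(fd, buf + off, len - off);
        if (n <= 0) {           /* read error or unexpectedly short file */
            free(buf);
            buf = NULL;
            break;
        }
        off += (size_t)n;
    }
    close(fd);
    if (buf)
        *blobLen = len;
    return buf;
}
```

A real backend would presumably also want to write to a temporary name and rename() it into place so readers never see a half-written header; that is left out here to keep the sketch short.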

lnussel · 2022-03-22T13:18:20Z

Yes, the way you describe would be the straightforward implementation from an rpm maintainer's PoV when trying to solve installation and to reuse the internal data formats when writing to the database.

I'm coming from a different direction though, ignoring how rpm installs packages.
With the implementation here I had https://lnussel.github.io/2020/07/07/rpm-delta-updates/ in mind (also #1151 and #1470). Ie use the actual package header (lead|sigheader|header) as "database".
With a reflinkable payload the idea would be to have the actual package including payload in the header directory.

The rpm internal API as is does not allow access to the original header directly, that's why the db backend has to reconstruct it in a rather crude way when installing. As stated, the install part is not the important one here. Maybe I should remove it to avoid distractions :-)
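To make the (lead|sigheader|header) layout concrete, here is a minimal sketch that locates the three parts in a buffer holding such a file. It relies on the on-disk rpm v3/v4 format: a 96-byte lead starting with `ed ab ee db`, and header sections starting with `8e ad e8 01`, four reserved bytes, then big-endian index count and data size, with the signature header padded to an 8-byte boundary. The struct and function names are made up for this example and are not part of librpm, and no content validation or signature checking is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LEAD_SIZE  96   /* fixed size of the legacy lead */
#define INTRO_SIZE 16   /* magic(4) + reserved(4) + il(4) + dl(4) */
#define ENTRY_SIZE 16   /* one index entry: tag, type, offset, count */

static const unsigned char lead_magic[4] = { 0xed, 0xab, 0xee, 0xdb };
static const unsigned char hdr_magic[4]  = { 0x8e, 0xad, 0xe8, 0x01 };

/* Read a big-endian 32-bit value. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Size in bytes of the header section at p, or 0 if it is invalid/truncated. */
static size_t hdr_section_size(const unsigned char *p, size_t avail)
{
    if (avail < INTRO_SIZE || memcmp(p, hdr_magic, sizeof(hdr_magic)) != 0)
        return 0;
    uint64_t size = INTRO_SIZE + (uint64_t)be32(p + 8) * ENTRY_SIZE + be32(p + 12);
    return size <= avail ? (size_t)size : 0;
}

/* Hypothetical result type: offsets and lengths of the parts of a package file. */
struct pkg_layout {
    size_t sig_off, sig_len;   /* signature header */
    size_t hdr_off, hdr_len;   /* main header */
    size_t payload_off;        /* equals the file size for a header-only file */
};

/* Locate lead, signature header and header in buf; 0 on success, -1 on error. */
int pkg_layout_parse(const unsigned char *buf, size_t len, struct pkg_layout *out)
{
    assert(buf != NULL && out != NULL);
    if (len < LEAD_SIZE || memcmp(buf, lead_magic, sizeof(lead_magic)) != 0)
        return -1;

    out->sig_off = LEAD_SIZE;
    out->sig_len = hdr_section_size(buf + out->sig_off, len - out->sig_off);
    if (out->sig_len == 0)
        return -1;

    /* The main header starts at the next 8-byte boundary after the sigheader. */
    out->hdr_off = out->sig_off + ((out->sig_len + 7) & ~(size_t)7);
    if (out->hdr_off > len)
        return -1;
    out->hdr_len = hdr_section_size(buf + out->hdr_off, len - out->hdr_off);
    if (out->hdr_len == 0)
        return -1;

    out->payload_off = out->hdr_off + out->hdr_len;
    return 0;
}
```

With a file that has the payload stripped, `payload_off` simply equals the file length, so the same parsing would work for header-only files and for full packages kept next to a reflinked payload.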

pmatilai · 2022-03-22T14:04:05Z

I was about to say you can get the original, aka unmodified, header with headerGet(h, RPMTAG_HEADERIMMUTABLE, ...) but the term "header" is used ambiguously here, I guess you mean the lead+signature+header combo instead. "header" has a very specific meaning in rpm so that combo will need another name to be able to discuss stuff at all. I think it's better to just call them package files. They may be partial or not, this backend doesn't really care.

That out of the way, package files as the rpmdb is a wonderfully strange idea. Maybe an fs backend could support two modes: one like the traditional db (which I described in the earlier comment) and one which expects package files instead. There are different tradeoffs to each: the traditional way can properly support installs, carry state and use less space, except if you have the original package files lying around and can link to them.

Dump original package header into /usr/lib/sysimage/rpm-headers when installing packages. Querying installed packages becomes trivial with that. No separate DB needed. PoC of a similar thing in original rpm: rpm-software-management/rpm#1959

pmatilai · 2022-08-25T09:27:50Z

Okay, I think the world has seen this now.

The problem with drafts is that they linger on forever in our review queue. Wild ideas might be better shown to the world by starting a discussion and linking to your work from there. It's an interesting idea anyhow, thanks for sharing.

lnussel and others added 12 commits March 16, 2022 14:50

Fix chroot for file based keyring

2e8e43e

Don't import into db for %_keyring fs

3bda25e

Implement headerWriteAsPackage()

a129bc7

Write a package header based on internal representation

Introduce rpmReadPackageHeader() to allow custom verify flags

8098c31

rpmvs.h needs to include rpmgpg.h

d57c713

rpmvs.h uses eg DIGEST_CTX from rpmgpg.h

Support absolute file names for databases

5994fb8

rpmdbNextIteratorHeaderBlob() for libsolv

361119f

Speed optimization used by libsolv to get unverified header from the database.

Comment FAF_UNOWNED

08535c0

Factor out headerAddInstallTags()

a7e76ac

Function that adds header tags only found in installed packages

Prevent NULL deref in rpmfsGetStates()

12f000a

rpmdb: export of package headers

f7c3768

Support export also as package headers in addition to rpm headers.

lnussel mentioned this pull request Apr 13, 2022

RFE: container-native rpmdb format #2005

Open

pmatilai closed this Aug 25, 2022
