rpm package header based db backend #1959

Closed
wants to merge 12 commits into from

Conversation

lnussel
Contributor

@lnussel lnussel commented Mar 16, 2022

Following up on the idea in #1151, this submission uses rpm package(!) headers in /usr to determine what's installed. That way it is possible to query and verify(!) installed packages even when there is no real database. Filing as draft for the world to look at it. I don't expect this to go in this way 😆

This mode is sufficient for e.g. containers or appliances that do not
use rpm at runtime. One can still inspect and check the content of
such images. Also, diffing the fs trees of two revisions of a
container/os tree shows changed packages rather than only individual files.

Even though installation using this backend is actually possible and
implemented for the fun of it, it's not really the intended use
case. Due to the lack of indexes there is neither dependency checking
nor execution of triggers. To build a container one would use a real
backend like sqlite, then dump the headers to /usr and drop the
database in /var.

Nevertheless, even the primitive install feature could be useful
in a way similar to systemd-sysext
(https://www.freedesktop.org/software/systemd/man/systemd-sysext.html),
i.e. keep /usr read-only and install temporary extensions as an overlay,
e.g. for debugging. Cf. https://github.com/kubic-project/microos-tools/pull/5/files

Storing package headers means certain information is not available,
though. File states cannot be stored, so the code reading headers just
assumes all files are installed. For the intended use case some
files may actually be skipped. This could be fixed by
factoring out the code in the install path that determines the
intended state of files and calling it when reading the header, i.e.
evaluating e.g. %_install_langs there.

lnussel and others added 12 commits March 16, 2022 14:50
Write a package header based on internal representation
rpmvs.h uses eg DIGEST_CTX from rpmgpg.h
Speed optimization used by libsolv to get unverified header from the
database.
Function that adds header tags only found in installed packages
Support export also as package headers in addition to rpm headers.
Uses rpm package(!) headers in /usr to determine what's installed.
That way it is possible to query and verify installed packages even
when there is no real database.
@pmatilai
Member

pmatilai commented Mar 18, 2022

Fs based db backend has been talked about for so long that it's nice to see somebody actually try it out.

I didn't look too deeply, but it seems to me you're doing a whole lot of extra work to avoid something you shouldn't be avoiding, and creating unnecessary problems in the process.

The backend should store the same exact data in the headers as the other backends do: the file states, install times and signature header contents are "piggybacked" after the immutable section (known as "dribbles" in the rpm lore). pkgdbPut() should quite literally be "open a filename at desired location followed by write(fd, hdrBlob, hdrLen), close fd", and none of this import stuff. Just use the header number as the filename, and then you could actually create simple indexes at least for name and the like; it's a rather simple "string -> int" file format after all. And pkgdbGet() simply returns that raw blob to the caller, nothing more. That way all the relevant state info is preserved, signature checks on the imported headers work without extra rpm-like file format etc. If you encountered something that makes that not work then it's most likely a bug that we should have a look at.
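
The layout described above can be sketched roughly as follows. This is a minimal illustration in Python rather than rpm's C, and not rpm code: the function names `pkgdb_put`, `pkgdb_get` and `lookup_name`, the `name.idx` file and its tab-separated line format are all hypothetical stand-ins for the "string -> int" index mentioned in the comment.

```python
import os
from pathlib import Path
from typing import List

# Hypothetical index file: one "name<TAB>hdrnum" line per stored header.
INDEX_NAME = "name.idx"


def pkgdb_put(dbdir: Path, hdrnum: int, name: str, blob: bytes) -> None:
    """Write the raw header blob to a file named after the header number."""
    dbdir.mkdir(parents=True, exist_ok=True)
    fd = os.open(dbdir / str(hdrnum), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(blob)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            view = view[os.write(fd, view):]
    finally:
        os.close(fd)
    # Append a "string -> int" record so lookups by name need no header parsing.
    with open(dbdir / INDEX_NAME, "a", encoding="utf-8") as idx:
        idx.write(f"{name}\t{hdrnum}\n")


def pkgdb_get(dbdir: Path, hdrnum: int) -> bytes:
    """Return the stored blob unchanged; interpreting it is the caller's job."""
    return (dbdir / str(hdrnum)).read_bytes()


def lookup_name(dbdir: Path, name: str) -> List[int]:
    """Return all header numbers recorded for a package name."""
    index = dbdir / INDEX_NAME
    if not index.exists():
        return []
    matches = []
    with open(index, encoding="utf-8") as idx:
        for line in idx:
            key, _, num = line.rstrip("\n").rpartition("\t")
            if key == name:
                matches.append(int(num))
    return matches
```

A real backend would also need removal, locking and index rebuilds; the sketch only shows that storing and fetching the blob can stay this small.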

@lnussel
Contributor Author

lnussel commented Mar 22, 2022

Yes, the way you describe would be the straightforward implementation from an rpm maintainer's PoV when trying to solve installation and to reuse the internal data formats when writing to the database.

I'm coming from a different direction though, ignoring how rpm installs packages.
With the implementation here I had https://lnussel.github.io/2020/07/07/rpm-delta-updates/ in mind (also #1151 and #1470), i.e. use the actual package header (lead|sigheader|header) as "database".
With a reflinkable payload the idea would be to have the actual package including payload in the header directory.
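
To make the lead|sigheader|header combination concrete, here is a hedged sketch in Python that computes how many leading bytes of a package file belong to that part, so everything after it is payload. It relies on the rpm v3/v4 package layout (96-byte lead, then two header sections each starting with a 16-byte intro, with the signature section padded to an 8-byte boundary). The function names are made up for illustration, and this is not how the PR itself reads headers.

```python
import struct

LEAD_SIZE = 96
LEAD_MAGIC = b"\xed\xab\xee\xdb"
HEADER_MAGIC = b"\x8e\xad\xe8\x01"  # three magic bytes plus version 1
# Section intro: magic, 4 reserved bytes, index entry count, data size (big endian).
INTRO = struct.Struct(">4s4xII")
ENTRY_SIZE = 16  # each index entry is four 32-bit fields


def header_section_size(buf: bytes, offset: int) -> int:
    """Size in bytes of the header section starting at offset."""
    magic, nindex, hsize = INTRO.unpack_from(buf, offset)
    if magic != HEADER_MAGIC:
        raise ValueError(f"bad header magic at offset {offset}")
    return INTRO.size + nindex * ENTRY_SIZE + hsize


def package_header_size(buf: bytes) -> int:
    """Length of lead + signature header (with padding) + main header."""
    if buf[:4] != LEAD_MAGIC:
        raise ValueError("not an rpm package: bad lead magic")
    offset = LEAD_SIZE
    offset += header_section_size(buf, offset)  # signature header
    offset += -offset % 8                       # pad to an 8-byte boundary
    offset += header_section_size(buf, offset)  # main header
    return offset
```

Slicing a package file at `package_header_size(data)` would give the "package header" in the sense used here; the remainder is the payload that a reflink-capable setup could keep alongside it.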

The rpm internal API as it stands does not allow accessing the original header directly; that's why the db backend has to reconstruct it in a rather crude way when installing. As stated, the install part is not the important one here. Maybe I should remove it to avoid distractions :-)

@pmatilai
Member

I was about to say you can get the original, aka unmodified, header with headerGet(h, RPMTAG_HEADERIMMUTABLE, ...) but the term "header" is used ambiguously here, I guess you mean the lead+signature+header combo instead. "header" has a very specific meaning in rpm so that combo will need another name to be able to discuss stuff at all. I think it's better to just call them package files. They may be partial or not, this backend doesn't really care.

That out of the way, package files as the rpmdb is a wonderfully strange idea. Maybe an fs backend could support two modes: one like the traditional db (which I described in the earlier comment) and one which expects package files instead. There are different tradeoffs to each: the traditional way can properly support installs, carry state and use less space, except if you have the original package files lying around and can link to them.

lnussel added a commit to lnussel/busybox that referenced this pull request Jul 26, 2022
Dump original package header into /usr/lib/sysimage/rpm-headers when
installing packages. Querying installed packages becomes trivial
with that. No separate DB needed.

PoC of a similar thing in original rpm:
rpm-software-management/rpm#1959
lnussel added a commit to lnussel/busybox that referenced this pull request Aug 9, 2022
lnussel added a commit to lnussel/busybox that referenced this pull request Aug 25, 2022
@pmatilai
Member

Okay, I think the world has seen this now.

The problem with drafts is that they linger on forever in our review queue. Wild ideas might be better shown to the world by starting a discussion and linking to your work from there. It's an interesting idea anyhow, thanks for sharing.

@pmatilai pmatilai closed this Aug 25, 2022