Add support for reading BDB without the library #980

mlschroe · 2019-12-19T13:23:25Z

This commit implements a slow read-only backend that allows
accessing of BerkeleyDB databases without using the BerkeleyDB
library. The code supports btree version 9 and hash version 8
and 9.

There are two use cases for this:

Conversion of an existing BerkeleyDB to a different
backend.
Allowing package scriptlets to do database queries while
in a transaction that replaced rpm with a version that
no longer links against BerkeleyDB.

Currently prefix searching is not supported (but it would be
easy to add).

Conan-Kudo

I don't know how I feel about this, but...

Conan-Kudo · 2019-12-19T13:47:06Z

configure.ac

@@ -594,6 +594,20 @@ AS_IF([test "$enable_ndb" = yes],[
 ])
 AM_CONDITIONAL([NDB], [test "$enable_ndb" = yes])

+#=================
+# Process --enable-bdb-ro
+AC_ARG_ENABLE([bdb-ro], [AS_HELP_STRING([--enable-ndb-ro (EXPERIMENTAL)],[enable the read-only Berkeley DB code])],


--enable-ndb-ro -> --enable-bdb-ro

Freudian slip? ;)

No, copy'n'pasted from the ndb lines.

mlschroe · 2019-12-20T13:03:18Z

Updated the code. It's now much faster and also implements prefix search.

Conan-Kudo · 2019-12-21T15:30:49Z

@mlschroe I'm going to guess this probably doesn't work with a bdb6-based rpmdb (e.g. OpenMandriva's rpmdb)?

mlschroe · 2019-12-23T11:17:10Z

Db6 just added support for large data blobs and a heap database type, both are not used by rpm. I tested it with a cooker container image, and the code could read the database without problems.

mlschroe · 2019-12-23T11:27:05Z

Added a check for the hash/btree version.

ignatenkobrain

 5 files changed, 837 insertions(+)

:/

Conan-Kudo · 2019-12-26T08:16:29Z

@mlschroe This should probably conflict with the option to enable the normal bdb backend. I see no reason for both to be enabled at the same time.

pmatilai · 2020-01-02T09:06:39Z

Oh my.

I'm not exactly overjoyed about the idea of having custom BDB reader code in rpm, but given the alternatives, it actually looks almost pretty 😁
Oh and our database team will love you forever for this.

pmatilai · 2020-01-02T09:19:57Z

 5 files changed, 837 insertions(+)

:/

To put it into perspective:

$ find db-5.3.28 -name "*.[ch]" |wc -l
1037

In other words, this is fewer lines than the BDB has source files.

Conan-Kudo · 2020-01-02T09:49:56Z

In other words, this is fewer lines than the BDB has source files.

This is a wonderful and equally terrifying statistic.

pmatilai · 2020-01-02T10:14:03Z

It really is 😂

Actually the above stats are a bit off, because the Fedora sources include db-1.85 compat. But the point does hold even with a pristine BDB tarball: it has 894 *.[ch] source files. And that's missing quite a bit of other stuff, the tarball has 9241 files in total (including docs and all).

Conan-Kudo · 2020-01-02T10:18:30Z

@pmatilai I'm pretty sure I'm going to want this for transitioning OpenMandriva away from BDB. We're using db6 (even though I didn't want to...), and with the latest versions of DNF okay with non-BDB, I can finally start considering it...

Conan-Kudo · 2020-01-02T10:57:00Z

@mlschroe The code works surprisingly well for me (which is terrifying and awesome in itself), but I think I'd be more comfortable with this if it conflicted with the regular bdb backend option.

pmatilai · 2020-01-02T11:25:08Z

Greetings from our DB people: "truly like X-mas gift" 🎅

One possibility to handle the "conflict" might be making it an argument to --enable-bdb (eg --enable-bdb=readonly), which then skips the other variants. In that case it could technically be called "bdb" and avoid all the "configured to blabla, using blabla" warnings from backend detection.
OTOH for various purposes (testing etc) its useful to be able to support both from one rpm version.

Conan-Kudo · 2020-01-03T22:02:44Z

One possibility to handle the "conflict" might be making it an argument to --enable-bdb (eg --enable-bdb=readonly), which then skips the other variants. In that case it could technically be called "bdb" and avoid all the "configured to blabla, using blabla" warnings from backend detection.

I think this makes sense. If @mlschroe does this, then I'm happy to approve this too.

pmatilai · 2020-01-07T09:08:41Z

Well, I only said "one possibility", doesn't necessarily mean I think it's the right one. Technically there's no reason to limit it that way, and like I said, especially for testing it's highly useful to be able to build both the "real" and the readonly backends in.

pmatilai · 2020-01-08T08:45:46Z

lib/backend/bdb_ro.c

+	if (meta[4] < 9 || meta[4] > 10) {
+	    fprintf(stderr, "unsupported btree version %d\n", meta[4]);
+	    bdb_close(db);
+	    return 0;


All these early returns should be NULLs, not zeros as the function returns a pointer.
I'd also prefer the using the "goto err" idiom to minimize the number of exit points from the function.

pmatilai · 2020-01-08T08:47:52Z

lib/backend/bdb_ro.c

+    if ((rdb->db_mode & O_ACCMODE) != O_RDONLY)
+	return EPERM;
+    if ((dbi = dbiNew(rdb, rpmtag)) == NULL)
+	return 1;


dbiNew() can't actually fail, so this is superfluous.

pmatilai · 2020-01-08T08:54:04Z

lib/backend/bdb_ro.c

+    free(path);
+    dbi->dbi_flags |= DBI_RDONLY;
+    if (dbip)
+	*dbip = dbi;


This would leak the dbi allocation if somebody actually called it with dbip as NULL. I don't think anything does, but the code checks for that case so it'll likely trip up false positives from static analyzers if nothing else.

Here too, as allocations are involved, I'd much prefer single point of cleanup + exit idiom that rpm generally uses. Not a showstopper though.

pmatilai · 2020-01-08T09:00:28Z

lib/backend/bdb_ro.c

+static int bdbro_Verify(dbiIndex dbi, unsigned int flags)
+{
+    return 0;
+}


This is beyond the scope of this patch, but just an idea: we could have a generic db verify operation that does something in a backend independent way in the case the backend doesn't implement a specific verify operation on its own. "Something" could be simply open, read through all records and close.

pmatilai · 2020-01-08T09:41:27Z

configure.ac

+  AC_DEFINE(WITH_BDB_RO, 1, [Build with read-only Berkeley DB])
+])
+AM_CONDITIONAL([BDB_RO], [test "$enable_bdb_ro" = yes])
+


Please enable this in CI (ie ci/Dockerfile)

On a related note to self/whoever gets to it first: for whatever reason ndb is not enabled there either, and the others are only implicitly enabled.

pmatilai · 2020-01-08T09:46:38Z

I think I'd be more comfortable with this if it conflicted with the regular bdb backend option.

Now that I looked atthis a bit closer, conflicting would be both unnecessary and plain draconian. If both are enabled in configure, real BDB always wins on autodetection but either can be manually configured. Which is just the way it should be, but it wouldn't hurt to mention the expected behavior in the commit message.

BDBD, LMDB and SQLite were already implicitly enabled via build dependencies, but NDB build has not been enabled at all. Came up when discussing read-only BDB in rpm-software-management#980.

BDBD, LMDB and SQLite were already implicitly enabled via build dependencies, but NDB build has not been enabled at all. Came up when discussing read-only BDB in #980.

This commit implements a read-only backend that allows accessing of BerkeleyDB databases without using the BerkeleyDB library. The code supports btree version 9-10 and hash version 8-10. There are two use cases for this: 1) Conversion of an existing BerkeleyDB to a different backend. 2) Allowing package scriptlets to do database queries while in a transaction that replaced rpm with a version that no longer links against BerkeleyDB. If both BerkeleyDB and the read-only backend are enabled, rpm will default to BerkeleyDB.

mlschroe · 2020-01-13T13:35:39Z

Updated the pull request:

about the dbiNew() call that cannot fail: this is copied from the other backends. I currently left it untouched so that a different commit can clean up all the backends.
about using NULL: I think NULL is some leftover from the old K&R days and should be avoided. Saying to use NULL as return value is like saying you need to use return 0ULL in functions that return unsigned long longs. Anyway, I've changed the code so that NULL is used.
about that goto exit idiom: I don't like it so I only use it if it really reduces the lines of code.
about the dbi allocation leak: yes, that was a bug. Fixed.
the ro backend is now also enabled in the ci.

pmatilai · 2020-01-13T14:12:06Z

Thanks.

As for the stylistic issues:

0 long long is still an integer value, a NULL pointer is a rather special entity even if the technical value happens to be zero as well. If nothing else, it's a powerful clue to the user that it's supposed to be a pointer and not an integer zero, whereas '0' is ambiguous. When looking at code with tired eyes, I take every crutch I can get :)

As for goto exit, it might not always save lines initially, the main feature is avoiding redundancy and leaks in the long run. In a function with many return points, if a new feature or such needs to use malloc, you'll end up adding umptheen free()'s to unrelated error paths instead of just doing it once at the end. Not to preach or anything, it's just a lesson I got hammered into my head in the early years of cleaning up memleaks and the like in rpm.

Conan-Kudo suggested changes Dec 19, 2019

View reviewed changes

mlschroe force-pushed the bdbro branch 3 times, most recently from 97092ae to 66c055f Compare December 20, 2019 13:02

mlschroe force-pushed the bdbro branch from 66c055f to 56736fe Compare December 23, 2019 11:26

ignatenkobrain approved these changes Dec 26, 2019

View reviewed changes

pmatilai reviewed Jan 8, 2020

View reviewed changes

pmatilai mentioned this pull request Jan 8, 2020

Explicitly enable all our database backends in CI #999

Merged

mlschroe force-pushed the bdbro branch from 56736fe to 884ad40 Compare January 13, 2020 13:27

Conan-Kudo approved these changes Jan 13, 2020

View reviewed changes

pmatilai merged commit 7cc9eb8 into rpm-software-management:master Jan 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for reading BDB without the library #980

Add support for reading BDB without the library #980

mlschroe commented Dec 19, 2019

Conan-Kudo left a comment

Conan-Kudo Dec 19, 2019

mlschroe Dec 19, 2019

mlschroe commented Dec 20, 2019

Conan-Kudo commented Dec 21, 2019

mlschroe commented Dec 23, 2019

mlschroe commented Dec 23, 2019

ignatenkobrain left a comment

Conan-Kudo commented Dec 26, 2019

pmatilai commented Jan 2, 2020

pmatilai commented Jan 2, 2020

Conan-Kudo commented Jan 2, 2020

pmatilai commented Jan 2, 2020

Conan-Kudo commented Jan 2, 2020

Conan-Kudo commented Jan 2, 2020 •

edited

pmatilai commented Jan 2, 2020

Conan-Kudo commented Jan 3, 2020

pmatilai commented Jan 7, 2020

pmatilai Jan 8, 2020

pmatilai Jan 8, 2020

pmatilai Jan 8, 2020

pmatilai Jan 8, 2020

pmatilai Jan 8, 2020

pmatilai commented Jan 8, 2020 •

edited

mlschroe commented Jan 13, 2020

pmatilai commented Jan 13, 2020

Add support for reading BDB without the library #980

Add support for reading BDB without the library #980

Conversation

mlschroe commented Dec 19, 2019

Conan-Kudo left a comment

Choose a reason for hiding this comment

Conan-Kudo Dec 19, 2019

Choose a reason for hiding this comment

mlschroe Dec 19, 2019

Choose a reason for hiding this comment

mlschroe commented Dec 20, 2019

Conan-Kudo commented Dec 21, 2019

mlschroe commented Dec 23, 2019

mlschroe commented Dec 23, 2019

ignatenkobrain left a comment

Choose a reason for hiding this comment

Conan-Kudo commented Dec 26, 2019

pmatilai commented Jan 2, 2020

pmatilai commented Jan 2, 2020

Conan-Kudo commented Jan 2, 2020

pmatilai commented Jan 2, 2020

Conan-Kudo commented Jan 2, 2020

Conan-Kudo commented Jan 2, 2020 • edited

pmatilai commented Jan 2, 2020

Conan-Kudo commented Jan 3, 2020

pmatilai commented Jan 7, 2020

pmatilai Jan 8, 2020

Choose a reason for hiding this comment

pmatilai Jan 8, 2020

Choose a reason for hiding this comment

pmatilai Jan 8, 2020

Choose a reason for hiding this comment

pmatilai Jan 8, 2020

Choose a reason for hiding this comment

pmatilai Jan 8, 2020

Choose a reason for hiding this comment

pmatilai commented Jan 8, 2020 • edited

mlschroe commented Jan 13, 2020

pmatilai commented Jan 13, 2020

Conan-Kudo commented Jan 2, 2020 •

edited

pmatilai commented Jan 8, 2020 •

edited