Add a key-only rpmdb iterator to optimize rpmtsCheck() #971

pmatilai · 2019-12-05T12:42:53Z

rpmtsCheck() builds some hashes over rpmdb keys, whose buildup is barely measurable in normal install/update/erase transactions, but when repeated thousands of times during rpm -Va, things start adding up. Add a new interface to avoid fetching tonnes of unused data and use it. More details in the commits.

mlschroe · 2019-12-05T13:22:47Z

Oooh, did you notice the #if 0 code in ndb/glue.c? I once started the same thing (but used DBC_KEY_SEARCH to select it), but didn't finish the implementation.

mlschroe · 2019-12-05T13:31:36Z

You may want to rename skipdata to ii_skipdata so that it fits in better with the other elements.

pmatilai · 2019-12-09T08:41:37Z

Yeah I noticed the #if 0 -block and its similarity, I just assumed it was a leftover/work-in-progress thing related to, well prefix search :)

I remember a fleeting thought "why is there no flags field in the index iterator struct where I could stuff this in", but somehow the thought of putting the flag in the cursor never occurred to me. Maybe thoughts on the long weekend already...

Anyway, I'll rename to ii_skipdata and merge with the ndb #if 0 part now that I know what it is.
We can always switch the implementation to use a cursor flag later if it becomes useful (where it'd really make a difference of course is the pkgdb, but don't have an immediate use-case for that).

The regular index iterator grabs the associated data too, which we don't always need. The data associated with indexes is relatively lightweight, but as with everything, it adds up if in the millions scale. Update all backends to allow for NULL set in the index retrieve to signal key-only retrieval. Ndb actually had an earlier, abandoned implementation of the same idea under slightly different API, lets reuse the code-block.

In normal transactions this is but a drop in the ocean. However on rpm -Va, the hashes get rebuilt from scratch on every single package which on my moderate rpmdb (~2800 packages) testcase results in 103418347 data values fetched and added to dbi sets that are only thrown away. With bdb and lmdb this is only a minor optimization but for ndb and sqlite which can retrieve keys independently, this is a much bigger win. In case of sqlite, it's a massive one.

pmatilai · 2019-12-09T08:53:56Z

Updated to reuse the #if 0'ed codeblock, renamed to ii_skipdata and commit messages adjusted, they were somewhat mixed up with the latter commit speaking about updating backends, which was obviously done in the first commit.

pmatilai · 2019-12-09T09:39:42Z

Oh and thanks for the review!

pmatilai added the API API related label Dec 5, 2019

pmatilai requested review from ffesti and mlschroe December 5, 2019 12:43

pmatilai added the optimization label Dec 5, 2019

mlschroe approved these changes Dec 5, 2019

View reviewed changes

pmatilai added 2 commits December 9, 2019 10:50

pmatilai force-pushed the keyiter-pr branch from d3b063e to b33397f Compare December 9, 2019 08:52

pmatilai merged commit 0f0fbea into rpm-software-management:master Dec 9, 2019

pmatilai deleted the keyiter-pr branch December 9, 2019 09:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a key-only rpmdb iterator to optimize rpmtsCheck() #971

Add a key-only rpmdb iterator to optimize rpmtsCheck() #971

pmatilai commented Dec 5, 2019

mlschroe commented Dec 5, 2019

mlschroe commented Dec 5, 2019

pmatilai commented Dec 9, 2019

pmatilai commented Dec 9, 2019

pmatilai commented Dec 9, 2019

Add a key-only rpmdb iterator to optimize rpmtsCheck() #971

Add a key-only rpmdb iterator to optimize rpmtsCheck() #971

Conversation

pmatilai commented Dec 5, 2019

mlschroe commented Dec 5, 2019

mlschroe commented Dec 5, 2019

pmatilai commented Dec 9, 2019

pmatilai commented Dec 9, 2019

pmatilai commented Dec 9, 2019