Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a key-only rpmdb iterator to optimize rpmtsCheck() #971

Merged
merged 2 commits into from Dec 9, 2019

Conversation

pmatilai
Copy link
Member

@pmatilai pmatilai commented Dec 5, 2019

rpmtsCheck() builds some hashes over rpmdb keys, whose buildup is barely measurable in normal install/update/erase transactions, but when repeated thousands of times during rpm -Va, things start adding up. Add a new interface to avoid fetching tonnes of unused data and use it. More details in the commits.

@pmatilai pmatilai added the API API related label Dec 5, 2019
@mlschroe
Copy link
Contributor

mlschroe commented Dec 5, 2019

Oooh, did you notice the #if 0 code in ndb/glue.c? I once started the same thing (but used DBC_KEY_SEARCH to select it), but didn't finish the implementation.

@mlschroe
Copy link
Contributor

mlschroe commented Dec 5, 2019

You may want to rename skipdata to ii_skipdata so that it fits in better with the other elements.

@pmatilai
Copy link
Member Author

pmatilai commented Dec 9, 2019

Yeah I noticed the #if 0 -block and its similarity, I just assumed it was a leftover/work-in-progress thing related to, well prefix search :)

I remember a fleeting thought "why is there no flags field in the index iterator struct where I could stuff this in", but somehow the thought of putting the flag in the cursor never occurred to me. Maybe thoughts on the long weekend already...

Anyway, I'll rename to ii_skipdata and merge with the ndb #if 0 part now that I know what it is.
We can always switch the implementation to use a cursor flag later if it becomes useful (where it'd really make a difference of course is the pkgdb, but don't have an immediate use-case for that).

The regular index iterator grabs the associated data too, which we
don't always need. The data associated with indexes is relatively
lightweight, but as with everything, it adds up if in the millions scale.

Update all backends to allow for NULL set in the index retrieve to
signal key-only retrieval. Ndb actually had an earlier, abandoned
implementation of the same idea under slightly different API, lets
reuse the code-block.
In normal transactions this is but a drop in the ocean. However on
rpm -Va, the hashes get rebuilt from scratch on every single package
which on my moderate rpmdb (~2800 packages) testcase results in
103418347 data values fetched and added to dbi sets that are only
thrown away.

With bdb and lmdb this is only a minor optimization but for ndb and sqlite
which can retrieve keys independently, this is a much bigger win. In case
of sqlite, it's a massive one.
@pmatilai
Copy link
Member Author

pmatilai commented Dec 9, 2019

Updated to reuse the #if 0'ed codeblock, renamed to ii_skipdata and commit messages adjusted, they were somewhat mixed up with the latter commit speaking about updating backends, which was obviously done in the first commit.

@pmatilai
Copy link
Member Author

pmatilai commented Dec 9, 2019

Oh and thanks for the review!

@pmatilai pmatilai merged commit 0f0fbea into rpm-software-management:master Dec 9, 2019
@pmatilai pmatilai deleted the keyiter-pr branch December 9, 2019 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants