RPM backend performance is limited by arrays of hdrNum's #290

n3npq · 2017-07-29T19:11:39Z

The following callgraphs for BDB/LMDB/NDB all show a common hotspot retrieving arrays of hdrNum's from indices.

The performance problem is worst on add/del operations, where a read-modify-write (RMW) loop has to be performed to add or delete a hdrNum item in an array. The array is then sorted (and perhaps uniqified) by qsort(3) on every pass, which means repeatedly re-sorting almost-sorted arrays, the worst-case behavior for the algorithm (merge sort or even a home-rolled insertion loop would be less costly).
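To make the "home-rolled insertion loop" alternative concrete, here is a minimal sketch, assuming hdrNum values are held as a sorted, duplicate-free `uint32_t` array in memory. The helper names (`hdrnum_lower_bound`, `hdrnum_insert`, `hdrnum_remove`) are hypothetical and not part of the RPM API; the point is only that a binary search plus one `memmove` keeps the array sorted and unique without calling qsort(3) at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not RPM API): index of the first element >= num
 * in a sorted array, i.e. a classic lower-bound binary search. */
size_t hdrnum_lower_bound(const uint32_t *arr, size_t n, uint32_t num)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (arr[mid] < num)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Insert num into a sorted, duplicate-free array of n items with capacity cap.
 * Returns the new length; the array is left untouched if num is present. */
size_t hdrnum_insert(uint32_t *arr, size_t n, size_t cap, uint32_t num)
{
    size_t pos = hdrnum_lower_bound(arr, n, num);
    if (pos < n && arr[pos] == num)
        return n;                       /* already there: stays unique */
    assert(n < cap);                    /* caller must provide room */
    memmove(&arr[pos + 1], &arr[pos], (n - pos) * sizeof(*arr));
    arr[pos] = num;
    return n + 1;
}

/* Remove num from a sorted array of n items. Returns the new length. */
size_t hdrnum_remove(uint32_t *arr, size_t n, uint32_t num)
{
    size_t pos = hdrnum_lower_bound(arr, n, num);
    if (pos == n || arr[pos] != num)
        return n;                       /* not present: nothing to do */
    memmove(&arr[pos], &arr[pos + 1], (n - pos - 1) * sizeof(*arr));
    return n - 1;
}
```

Each add or delete then costs one O(log n) search and one block move, and the stored array never needs a separate sort or uniq pass. This does not remove the RMW round trip to the backend; it only addresses the sorting cost.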

Maintaining the hdrNum's endianness is another flaw; exposing the hdrNum's through the RPM API is yet another flaw because the values will change with every --rebuilddb (i.e. the hdrNum's are not persistent).

The fundamental architectural problem that needs solving for better performance is the nesting of per-header and then per-tag operations performed by rpmdbAdd(). Ideally, a batch mode update for each index of all the headers would remove the need to constantly reread/modify/rewrite.
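As a rough illustration of what a batch mode update could look like, the sketch below first collects every (index key, hdrNum) pair for all headers, sorts the whole set once, and then groups it into per-key runs so that each index key would be written exactly once. The types `idx_pair` and `idx_run` and the function `batch_group` are hypothetical names made up for this example; nothing here is existing RPM code, and the actual backend write per run is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration only (not RPM API). */
struct idx_pair { const char *key; uint32_t hdrNum; };
struct idx_run  { const char *key; size_t start; size_t len; };

/* Order pairs by key, then by hdrNum, so each key's hdrNums end up
 * contiguous and already sorted. */
static int pair_cmp(const void *a, const void *b)
{
    const struct idx_pair *pa = a, *pb = b;
    int c = strcmp(pa->key, pb->key);
    if (c != 0)
        return c;
    return (pa->hdrNum > pb->hdrNum) - (pa->hdrNum < pb->hdrNum);
}

/* Sort all pairs once, then describe each key's contiguous run in runs[]
 * (which must have room for n entries). Returns the number of distinct
 * keys, i.e. the number of index writes a batch update would need. */
size_t batch_group(struct idx_pair *pairs, size_t n, struct idx_run *runs)
{
    size_t nruns = 0;

    if (n == 0)
        return 0;
    assert(pairs != NULL && runs != NULL);

    qsort(pairs, n, sizeof(*pairs), pair_cmp);   /* one sort for everything */

    for (size_t i = 0; i < n; i++) {
        if (nruns > 0 && strcmp(runs[nruns - 1].key, pairs[i].key) == 0) {
            runs[nruns - 1].len++;               /* same key: extend the run */
        } else {
            runs[nruns].key = pairs[i].key;      /* new key: start a run */
            runs[nruns].start = i;
            runs[nruns].len = 1;
            nruns++;
        }
    }
    return nruns;
}
```

With this shape, the per-header/per-tag nesting is replaced by one pass per index: the number of backend writes drops from one per (header, key) pair to one per distinct key, at the cost of holding the pairs in memory during the rebuild.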

One approach to removing the array-management overhead that "works" with Berkeley DB is to tie the secondary index to the primary store using db->associate. Berkeley DB can then do the caching/optimizations needed to maintain the indices transparently to RPM.

(aside)
I don't yet know how to do a db->associate-like optimization for NDB/LMDB. On the todo++ list ...

Using db->associate in Berkeley DB is essentially the same as using a SQL trigger to maintain indices derived from a primary store, a very common abstraction used with RDBMSs.
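The trigger-like behavior can be modeled with a toy in-memory store. This is not the Berkeley DB API: `store`, `store_associate`, `store_put`, `store_lookup` and `extract_name` are hypothetical names invented for this sketch, and the fixed-size arrays are an assumption made to keep it short. What it shows is the division of labor: the caller registers a key-extraction callback once, and every put into the primary store then maintains the secondary index without any RMW loop in the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_RECS 16  /* arbitrary small capacity for the toy model */

/* Hypothetical toy types; NOT the Berkeley DB API. */
struct record    { uint32_t hdrNum; const char *name; };
struct sec_entry { const char *key; uint32_t hdrNum; };

/* Callback that derives the secondary key from a primary record. */
typedef const char *(*key_extract_fn)(const struct record *rec);

struct store {
    struct record    recs[MAX_RECS];   size_t nrecs;   /* primary store */
    struct sec_entry index[MAX_RECS];  size_t nindex;  /* secondary index */
    key_extract_fn   extract;          /* set by store_associate() */
};

/* Example callback: index records by package name. */
const char *extract_name(const struct record *rec)
{
    return rec->name;
}

/* Register the callback once, analogous in spirit to an associate call. */
void store_associate(struct store *s, key_extract_fn fn)
{
    s->extract = fn;
}

/* Write to the primary store; the index update happens as a side effect,
 * like a trigger. Returns 0 on success, -1 if the store is full. */
int store_put(struct store *s, uint32_t hdrNum, const char *name)
{
    if (s->nrecs == MAX_RECS)
        return -1;
    struct record *rec = &s->recs[s->nrecs++];
    rec->hdrNum = hdrNum;
    rec->name = name;
    if (s->extract != NULL) {
        s->index[s->nindex].key = s->extract(rec);
        s->index[s->nindex].hdrNum = hdrNum;
        s->nindex++;
    }
    return 0;
}

/* Collect up to max hdrNums whose secondary key equals key. */
size_t store_lookup(const struct store *s, const char *key,
                    uint32_t *out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < s->nindex && found < max; i++)
        if (strcmp(s->index[i].key, key) == 0)
            out[found++] = s->index[i].hdrNum;
    return found;
}
```

In the real library the secondary would be its own database and the engine, not a linear scan, would handle lookups and duplicates; the model only illustrates why the array bookkeeping disappears from the caller's code.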

Here are the callgraphs that show the performance bottleneck for all of BDB/NDB/LMDB:

BDB

[jbj@ji rpm]$ /usr/bin/time sudo ./libtool --mode=execute /home/jbj/bin/cg ./rpmdb --rebuilddb
208.17user 3.31system 3:32.94elapsed 99%CPU (0avgtext+0avgdata 74000maxresident)k
0inputs+492608outputs (0major+37493minor)pagefaults 0swaps

bdb.cga.gz

NDB

/usr/bin/time sudo ./libtool --mode=execute /home/jbj/bin/cg ./rpmdb --rebuilddb --ndb
99.59user 3.67system 8:35.82elapsed 20%CPU (0avgtext+0avgdata 93888maxresident)k
0inputs+3315224outputs (0major+461509minor)pagefaults 0swaps

ndb.cga.gz

LMDB

[jbj@ji rpm]$ /usr/bin/time sudo ./libtool --mode=execute /home/jbj/bin/cg ./rpmdb --rebuilddb --lmdb
113.50user 1.57system 1:55.07elapsed 99%CPU (0avgtext+0avgdata 393692maxresident)k
0inputs+455720outputs (1103major+129040minor)pagefaults 0swaps

lmdb.cga.gz

n3npq closed this as completed Aug 21, 2018
