The following callgraphs for BDB/LMDB/NDB all show a common hotspot: retrieving arrays of hdrNums from indices.
The performance problem shows up worst on add/del operations, where a read-modify-write (RMW) loop has to be performed to add or delete a hdrNum item in an array. The array is then sorted (and perhaps deduplicated) by qsort(3) repeatedly; re-sorting almost-sorted arrays is near worst-case behavior for that algorithm (a merge sort, or even a hand-rolled insertion loop, would be less costly).
Maintaining the hdrNums' endianness is another flaw; exposing the hdrNums through the RPM API is yet another, because the values change with every --rebuilddb (i.e. the hdrNums are not persistent).
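For illustration, a minimal sketch of the cheaper insertion loop (not rpm's actual code; the function name and caller-managed capacity are hypothetical): binary-search the sorted hdrNum array for the insert point and do one memmove, instead of appending and re-running qsort(3) on an almost-sorted array:

```c
#include <stdint.h>
#include <string.h>

/* Insert val into the sorted array a[0..n-1], keeping it sorted and
 * unique.  Returns the new element count (n or n+1).  The caller must
 * ensure capacity for at least n+1 elements.  This replaces the
 * append-then-qsort(3) pattern with an O(log n) search plus a single
 * memmove. */
static size_t insert_sorted_unique(uint32_t *a, size_t n, uint32_t val)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {                 /* binary search for insert point */
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < val)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < n && a[lo] == val)       /* already present: keep unique */
        return n;
    memmove(a + lo + 1, a + lo, (n - lo) * sizeof(*a));
    a[lo] = val;
    return n + 1;
}
```

Deletion is the mirror image: one binary search and one memmove, with no re-sort needed at all.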
The fundamental architectural problem that needs solving for better performance is the nesting of per-header and then per-tag operations performed by rpmdbAdd(). Ideally, a batch-mode update of each index across all the headers would remove the need to constantly reread/modify/rewrite.
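As a rough sketch of what such a batch mode could look like (all names here are hypothetical, not rpm's code): accumulate every (key, hdrNum) pair for one index across all headers, then sort and deduplicate once, so each run of equal keys can be written to the index in a single pass instead of an RMW cycle per header:

```c
#include <stdint.h>
#include <stdlib.h>

/* One (index key, hdrNum) pair; in a real implementation the key
 * would be derived from the tag value being indexed. */
typedef struct {
    uint32_t key;
    uint32_t hdrNum;
} IndexEntry;

static int cmp_entry(const void *pa, const void *pb)
{
    const IndexEntry *a = pa, *b = pb;
    if (a->key != b->key)
        return a->key < b->key ? -1 : 1;
    return (a->hdrNum > b->hdrNum) - (a->hdrNum < b->hdrNum);
}

/* Batch update: sort all collected entries once with qsort(3) -- a
 * single O(n log n) pass over random-order input, rather than one
 * re-sort per header -- then drop duplicate pairs in place.  The
 * caller can then walk runs of equal keys and write each index key
 * exactly once. */
static size_t batch_sort(IndexEntry *entries, size_t n)
{
    qsort(entries, n, sizeof(*entries), cmp_entry);
    size_t w = n ? 1 : 0;
    for (size_t r = 1; r < n; r++)
        if (entries[r].key != entries[w - 1].key ||
            entries[r].hdrNum != entries[w - 1].hdrNum)
            entries[w++] = entries[r];
    return w;
}
```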
One approach to removing the array-management overhead that "works" with Berkeley DB is to tie each secondary index to the primary store using db->associate. Berkeley DB can then handle the caching and optimizations needed to maintain the indices transparently to RPM.
(aside)
I don't yet know how to do a db->associate-like optimization for NDB/LMDB. On the todo++ list ...
Using db->associate in Berkeley DB is essentially the same as using a SQL trigger to maintain indices derived from a primary store, a very common abstraction in RDBMSs.
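To make the analogy concrete, here is a toy model of that trigger/associate pattern in plain C (this is not the Berkeley DB API; every name and the fixed-size tables are invented for illustration): the primary put calls a registered key-extraction callback and maintains the secondary index itself, so callers never touch the index directly:

```c
#include <stdint.h>
#include <string.h>

#define NSLOTS 16

typedef struct {
    uint32_t hdrNum;          /* primary key   */
    char name[32];            /* stored record */
} Record;

typedef struct {
    char key[32];             /* extracted secondary key   */
    uint32_t hdrNum;          /* points back at the primary */
} SecEntry;

/* callback shape loosely modeled on db->associate's key extractor */
typedef void (*extract_fn)(const Record *rec, char keyout[32]);

static Record primary[NSLOTS];
static SecEntry secondary[NSLOTS];
static size_t np, ns;
static extract_fn extractor;

static void extract_name(const Record *rec, char keyout[32])
{
    memcpy(keyout, rec->name, sizeof(rec->name));
}

/* Every primary insert transparently updates the secondary index --
 * the "trigger" -- the way Berkeley DB does once associate() is set. */
static void primary_put(const Record *rec)
{
    SecEntry e;
    primary[np++] = *rec;
    e.hdrNum = rec->hdrNum;
    extractor(rec, e.key);
    secondary[ns++] = e;
}

static int secondary_lookup(const char *key, uint32_t *hdrNum)
{
    for (size_t i = 0; i < ns; i++)
        if (strcmp(secondary[i].key, key) == 0) {
            *hdrNum = secondary[i].hdrNum;
            return 1;
        }
    return 0;
}
```

The point of the pattern is that index maintenance lives in one place, next to the primary store, rather than being hand-rolled per tag in rpmdbAdd().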
Here are the callgraphs that show the performance bottleneck for all of BDB/NDB/LMDB:
BDB
bdb.cga.gz
NDB
ndb.cga.gz
LMDB
lmdb.cga.gz