-
Notifications
You must be signed in to change notification settings - Fork 366
Description
fd_stake_ci is used to track leader schedules with a weighted sampler. Gossip's role is to update the destination address for staked nodes, and also keep track of unstaked nodes active in the cluster. Frankendancer/Agave (for which stake_ci's APIs were written for) would publish all destinations in a single broadcast at a set interval. stake_ci would generate a new fd_shred_dest object any time a change to unstaked nodes was detected between two broadcasts. This is a costly operation that involves:
- sorting unstaked pubkeys in place
- initializing a
fd_wsamplefor all staked keys, an nlog5(n) operation
Making it an O(nlog(n) + mlog(m)) operation where n is the number of staked nodes and m is unstaked.
Firedancer's Gossip edge-triggered update model publishes updates to individual contact info entries (both insertion and removal) as they happen. Time constraints led us to retrofit this regime into the existing fd_stake_ci APIs with little modifications, introducing two shims, to deal with new and dropped contact infos. Both trigger the (expensive) regeneration of fd_shred_dest. This means anytime a new unstaked contact info is added or dropped, we undergo a O(nlog(n) + mlog(m)) operation.
This problem is exacerbated by the fact that gossip publishes contact info updates from bootup, even before stake information is extracted from snapshot loading. So there is a window where all nodes are "unstaked" in the beginning. Therefore all new contact infos observed in this window will trigger the regeneration. Therefore this O(mlog(m)) operation (and note, all nodes are "unstaked" nodes) will be called m times during this window.
#7393 further exacerbates this with a surge of contact info remove updates (which also trigger a regeneration) when stake info is received