fix(tower,votes): key with (slot, block_id) and remove FD_VOTER_MAX #9055
Conversation
Force-pushed from c2ea8a4 to 5b8d258
Pull request overview
Updates TowerBFT/votes bookkeeping to key vote blocks by (slot, block_id) and removes the now-obsolete global voter max constant, aligning sizing with the Alpenglow validator admission ticket (VAT) 2000-voter cap.
Changes:
- Change the votes block map key from `block_id` to `(slot, block_id)` and update the query API/callers accordingly.
- Replace `FD_VOTER_MAX` usage by passing an explicit `vtr_max` into tower stakes sizing/new, and use a local `VTR_MAX` bound in the tower tile.
- Update unit tests for the new keying and the updated tower_stakes constructors.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/discof/tower/fd_tower_tile.c | Introduces VTR_MAX=2000, updates publish/query usage for (slot, block_id), and threads vtr_max into tower_stakes sizing/new. |
| src/choreo/votes/fd_votes.h | Defines fd_votes_blk_key_t and updates fd_votes_blk_t / fd_votes_query signature to use (slot, block_id). |
| src/choreo/votes/fd_votes.c | Changes blk_map key type/equality/hash and updates vote counting/query to use the composite key. |
| src/choreo/votes/test_votes.c | Updates tests to query the blk_map using (slot, block_id) keys. |
| src/choreo/tower/fd_tower_stakes.h | Makes fd_tower_stakes_footprint depend on caller-provided vtr_max instead of a global constant. |
| src/choreo/tower/fd_tower_stakes.c | Threads vtr_max through fd_tower_stakes_new and internal pool/scratch sizing. |
| src/choreo/tower/test_tower.c | Updates tower_stakes test setup for the new (slot_max, voter_max) footprint/new signatures. |
| src/choreo/fd_choreo_base.h | Removes the FD_VOTER_MAX definition and its associated comment block. |
```c
   This is ok though, because structures that rely on PER_VTR_MAX */

#define PER_VTR_MAX (FD_TOWER_VOTE_MAX+1)
```
The new PER_VTR_MAX define is currently unused, and the preceding comment ends mid-sentence ("structures that rely on PER_VTR_MAX"). Either wire PER_VTR_MAX into the actual sizing logic (e.g., the per_vtr_max passed into eqvoc) or drop the define/comment to avoid misleading future changes.
```diff
-   This is ok though, because structures that rely on PER_VTR_MAX */
-#define PER_VTR_MAX (FD_TOWER_VOTE_MAX+1)
+   This is ok though, because consumers of this data only rely on votes
+   that are still present in a validator's current tower. */
```
```c
   During times of network instability (forking), it is possible for
   validators to get locked out and as a result retaining only the most
   recent FD_TOWER_VOTE_MAX votes will miss votes at the bottom of the
   tower becomes lockout results in popping from the _top_ of the tower
   not the bottom (validator votes 1, 2, 3 then switches from 3 to 4
   results in a tower of 1, 2, 4 but we end up recording 2, 3, 4).

   This is ok though, because structures that rely on PER_VTR_MAX */
```
The network-instability comment here is difficult to parse and contains grammatical issues (e.g., "will miss votes at the bottom of the tower becomes lockout...") which makes it hard to understand the intended rationale for PER_VTR_MAX. Please reword/complete this comment so it clearly explains the scenario and the tradeoff.
```diff
-   During times of network instability (forking), it is possible for
-   validators to get locked out and as a result retaining only the most
-   recent FD_TOWER_VOTE_MAX votes will miss votes at the bottom of the
-   tower becomes lockout results in popping from the _top_ of the tower
-   not the bottom (validator votes 1, 2, 3 then switches from 3 to 4
-   results in a tower of 1, 2, 4 but we end up recording 2, 3, 4).
-   This is ok though, because structures that rely on PER_VTR_MAX */
+   During periods of network instability (heavy forking), validators can
+   get locked out on one fork and then switch to voting on another.
+   Lockout is enforced by popping votes from the _top_ of the tower,
+   not from the bottom. For example, if a validator votes on slots
+   1, 2, 3 and then later switches from 3 to 4, the validator's tower
+   might end up as [1, 2, 4]. A naive scheme that only remembers the
+   most recent FD_TOWER_VOTE_MAX votes, however, would naturally store
+   something like [2, 3, 4] and would therefore forget slot 1, even
+   though slot 1 is still present in the actual tower.
+
+   To cover this off-by-one window while keeping memory bounded, we
+   allow each voter up to PER_VTR_MAX = FD_TOWER_VOTE_MAX + 1 entries.
+   This ensures that any bookkeeping based on "recent votes" still
+   includes every slot that can legally remain in the tower, even under
+   worst-case reconfiguration during network instability. */
```
```diff
 scratch_footprint( fd_topo_tile_t const * tile ) {
-  ulong slot_max    = tile->tower.max_live_slots;
+  ulong slot_max    = fd_ulong_pow2_up( tile->tower.max_live_slots );
   ulong per_vtr_max = slot_max;
```
scratch_footprint still sets per_vtr_max = slot_max, but this PR introduces PER_VTR_MAX = FD_TOWER_VOTE_MAX with rationale that per-voter history beyond the tower isn't needed. If that rationale is correct, per_vtr_max should likely use PER_VTR_MAX (reducing eqvoc prf_max and memory), otherwise the new PER_VTR_MAX block should be removed to avoid confusion.
```diff
-  ulong per_vtr_max = slot_max;
+  ulong per_vtr_max = fd_ulong_pow2_up( PER_VTR_MAX );
```
```diff
 #define MAP_NAME               blk_map
 #define MAP_ELE_T              blk_t
-#define MAP_KEY_T              fd_hash_t
-#define MAP_KEY                block_id
+#define MAP_KEY_T              fd_votes_blk_key_t
+#define MAP_KEY                key
 #define MAP_PREV               map.prev
 #define MAP_NEXT               map.next
-#define MAP_KEY_EQ(k0,k1)      (!memcmp((k0)->key,(k1)->key,32UL))
-#define MAP_KEY_HASH(key,seed) ((ulong)((key)->ul[1]^(seed)))
+#define MAP_KEY_EQ(k0,k1)      ((k0)->slot==(k1)->slot && !memcmp((k0)->block_id.key,(k1)->block_id.key,32UL))
+#define MAP_KEY_HASH(key,seed) ((ulong)((key)->block_id.ul[1]^(key)->slot^(seed)))
```
The module-level documentation/ASCII diagram still describes blk_map as keyed by block_id and shows blk_t having .block_id/.slot fields, but the map is now keyed by (slot, block_id) via blk->key. Please update the comment and diagram to match the new keying so future readers don't rely on outdated semantics.
```c
FD_TEST( !blk_map_ele_query( votes->blk_map, &key_a101, NULL, votes->blk_pool ) );
FD_TEST( blk_map_ele_query( votes->blk_map, &key_b102, NULL, votes->blk_pool ) );
FD_TEST( votes->root==102 );
```
Given the core change in this PR is that blk_map keys are now (slot, block_id), it would be good to add a regression test that counts votes for the same block_id across two different slots and asserts they produce two distinct blk_map entries and that publish only removes the older slot’s entry. Current tests cover different block_ids per slot but not this key-collision scenario.
```c
/* Regression: same block_id across two different slots should create
   two distinct blk_map entries keyed by (slot, block_id). Publishing
   a newer root should only remove the older slot's entry. */

fd_hash_t block_id_c = { .ul = { 300 } };

/* Voter 0 votes for slot 103 with block_id_c. */
FD_TEST( fd_votes_count_vote( votes, &voters[0], 103, &block_id_c ) );
fd_votes_blk_key_t key_c103 = { .slot = 103, .block_id = block_id_c };
blk_t * blk_c103 = blk_map_ele_query( votes->blk_map, &key_c103, NULL, votes->blk_pool );
FD_TEST( blk_c103 );
FD_TEST( blk_c103->stake==10 );
FD_TEST( blk_c103->key.slot==103 );

/* Same voter votes for a different slot 104 with the SAME block_id. */
FD_TEST( fd_votes_count_vote( votes, &voters[0], 104, &block_id_c ) );
fd_votes_blk_key_t key_c104 = { .slot = 104, .block_id = block_id_c };
blk_t * blk_c104 = blk_map_ele_query( votes->blk_map, &key_c104, NULL, votes->blk_pool );
FD_TEST( blk_c104 );
FD_TEST( blk_c104->stake==10 );
FD_TEST( blk_c104->key.slot==104 );

/* Ensure the two (slot, block_id) pairs map to distinct blk_t entries. */
FD_TEST( blk_c103!=blk_c104 );
FD_TEST( blk_map_ele_query( votes->blk_map, &key_c103, NULL, votes->blk_pool ) );
FD_TEST( blk_map_ele_query( votes->blk_map, &key_c104, NULL, votes->blk_pool ) );

/* Publish root to 104. Slot 103 and its blk should be removed while
   the newer slot 104 blk remains. */
fd_votes_publish( votes, 104 );
FD_TEST( !blk_map_ele_query( votes->blk_map, &key_c103, NULL, votes->blk_pool ) );
FD_TEST( blk_map_ele_query( votes->blk_map, &key_c104, NULL, votes->blk_pool ) );
FD_TEST( votes->root==104 );
```
Performance Measurements ⏳
Force-pushed from 5b8d258 to 63e5d56
Force-pushed from 63e5d56 to 2222274
```c
   https://github.com/solana-foundation/solana-improvement-documents/blob/main/proposals/0357-alpenglow_validator_admission_ticket.md */

#define VTR_MAX (2000) /* the maximum # of unique voters ie. node pubkeys. */
```
VTR_MAX duplicates the protocol-level VAT limit that already exists as FD_RUNTIME_MAX_VOTE_ACCOUNTS_VAT (2000UL). Consider using the shared constant (or at least referencing it) so TowerBFT’s capacity stays in sync if the runtime constant changes.
```diff
-#define VTR_MAX (2000) /* the maximum # of unique voters ie. node pubkeys. */
+#define VTR_MAX ((ulong)FD_RUNTIME_MAX_VOTE_ACCOUNTS_VAT) /* the maximum # of unique voters ie. node pubkeys. */
```
```c
/* PER_VTR_MAX controls how many "entries" a validator is allowed to
   occupy in various vote-tracking structures. This is set somewhat
   arbitrarily based on expected worst-case usage by an honest validator
   and is set to guard against a malicious spamming validator attempting
   to fill up Firedancer structures. */

#define PER_VTR_MAX (512) /* the maximum amount of slot history the sysvar retains */
```
PER_VTR_MAX is introduced with a detailed rationale, but it is not used anywhere in this file (per_vtr_max is still set from slot_max later). If this limit is intended to bound per-validator entries in eqvoc/hfork/votes, wire it into the per_vtr_max calculations; otherwise remove the macro/comment to avoid misleading future changes.
```diff
-/* PER_VTR_MAX controls how many "entries" a validator is allowed to
-   occupy in various vote-tracking structures. This is set somewhat
-   arbitrarily based on expected worst-case usage by an honest validator
-   and is set to guard against a malicious spamming validator attempting
-   to fill up Firedancer structures. */
-#define PER_VTR_MAX (512) /* the maximum amount of slot history the sysvar retains */
```
```c
   and is set to guard against a malicious spamming validator attempting
   to fill up Firedancer structures. */

#define PER_VTR_MAX (512) /* the maximum amount of slot history the sysvar retains */
```
are we using this anywhere?
Force-pushed from 2222274 to bb43041
No description provided.