fix(tower,votes): key with (slot, block_id) and remove FD_VOTER_MAX #9055
Conversation
Force-pushed from c2ea8a4 to 5b8d258
Pull request overview
Updates TowerBFT/votes bookkeeping to key vote blocks by (slot, block_id) and removes the now-obsolete global voter max constant, aligning sizing with the Alpenglow validator admission ticket (VAT) 2000-voter cap.
Changes:
- Change the votes block map key from `block_id` to `(slot, block_id)` and update the query API/callers accordingly.
- Replace `FD_VOTER_MAX` usage by passing an explicit `vtr_max` into tower stakes sizing/new, and use a local `VTR_MAX` bound in the tower tile.
- Update unit tests for the new keying and the updated tower_stakes constructors.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/discof/tower/fd_tower_tile.c | Introduces VTR_MAX=2000, updates publish/query usage for (slot, block_id), and threads vtr_max into tower_stakes sizing/new. |
| src/choreo/votes/fd_votes.h | Defines fd_votes_blk_key_t and updates fd_votes_blk_t / fd_votes_query signature to use (slot, block_id). |
| src/choreo/votes/fd_votes.c | Changes blk_map key type/equality/hash and updates vote counting/query to use the composite key. |
| src/choreo/votes/test_votes.c | Updates tests to query the blk_map using (slot, block_id) keys. |
| src/choreo/tower/fd_tower_stakes.h | Makes fd_tower_stakes_footprint depend on caller-provided vtr_max instead of a global constant. |
| src/choreo/tower/fd_tower_stakes.c | Threads vtr_max through fd_tower_stakes_new and internal pool/scratch sizing. |
| src/choreo/tower/test_tower.c | Updates tower_stakes test setup for the new (slot_max, voter_max) footprint/new signatures. |
| src/choreo/fd_choreo_base.h | Removes the FD_VOTER_MAX definition and its associated comment block. |
```c
   This is ok though, because structures that rely on PER_VTR_MAX */

#define PER_VTR_MAX (FD_TOWER_VOTE_MAX+1)
```
The new PER_VTR_MAX define is currently unused, and the preceding comment ends mid-sentence ("structures that rely on PER_VTR_MAX"). Either wire PER_VTR_MAX into the actual sizing logic (e.g., the per_vtr_max passed into eqvoc) or drop the define/comment to avoid misleading future changes.
```diff
-   This is ok though, because structures that rely on PER_VTR_MAX */
-#define PER_VTR_MAX (FD_TOWER_VOTE_MAX+1)
+   This is ok though, because consumers of this data only rely on votes
+   that are still present in a validator's current tower. */
```
```c
   During times of network instability (forking), it is possible for
   validators to get locked out and as a result retaining only the most
   recent FD_TOWER_VOTE_MAX votes will miss votes at the bottom of the
   tower becomes lockout results in popping from the _top_ of the tower
   not the bottom (validator votes 1, 2, 3 then switches from 3 to 4
   results in a tower of 1, 2, 4 but we end up recording 2, 3, 4).

   This is ok though, because structures that rely on PER_VTR_MAX */
```
The network-instability comment here is difficult to parse and contains grammatical issues (e.g., "will miss votes at the bottom of the tower becomes lockout...") which makes it hard to understand the intended rationale for PER_VTR_MAX. Please reword/complete this comment so it clearly explains the scenario and the tradeoff.
```diff
-   During times of network instability (forking), it is possible for
-   validators to get locked out and as a result retaining only the most
-   recent FD_TOWER_VOTE_MAX votes will miss votes at the bottom of the
-   tower becomes lockout results in popping from the _top_ of the tower
-   not the bottom (validator votes 1, 2, 3 then switches from 3 to 4
-   results in a tower of 1, 2, 4 but we end up recording 2, 3, 4).
-   This is ok though, because structures that rely on PER_VTR_MAX */
+   During periods of network instability (heavy forking), validators can
+   get locked out on one fork and then switch to voting on another.
+   Lockout is enforced by popping votes from the _top_ of the tower,
+   not from the bottom. For example, if a validator votes on slots
+   1, 2, 3 and then later switches from 3 to 4, the validator's tower
+   might end up as [1, 2, 4]. A naive scheme that only remembers the
+   most recent FD_TOWER_VOTE_MAX votes, however, would naturally store
+   something like [2, 3, 4] and would therefore forget slot 1, even
+   though slot 1 is still present in the actual tower.
+
+   To cover this off-by-one window while keeping memory bounded, we
+   allow each voter up to PER_VTR_MAX = FD_TOWER_VOTE_MAX + 1 entries.
+   This ensures that any bookkeeping based on "recent votes" still
+   includes every slot that can legally remain in the tower, even under
+   worst-case reconfiguration during network instability. */
```
```diff
 scratch_footprint( fd_topo_tile_t const * tile ) {
-  ulong slot_max    = tile->tower.max_live_slots;
+  ulong slot_max    = fd_ulong_pow2_up( tile->tower.max_live_slots );
   ulong per_vtr_max = slot_max;
```
scratch_footprint still sets per_vtr_max = slot_max, but this PR introduces PER_VTR_MAX = FD_TOWER_VOTE_MAX with rationale that per-voter history beyond the tower isn't needed. If that rationale is correct, per_vtr_max should likely use PER_VTR_MAX (reducing eqvoc prf_max and memory), otherwise the new PER_VTR_MAX block should be removed to avoid confusion.
```diff
-  ulong per_vtr_max = slot_max;
+  ulong per_vtr_max = fd_ulong_pow2_up( PER_VTR_MAX );
```
```diff
 #define MAP_NAME               blk_map
 #define MAP_ELE_T              blk_t
-#define MAP_KEY_T              fd_hash_t
-#define MAP_KEY                block_id
+#define MAP_KEY_T              fd_votes_blk_key_t
+#define MAP_KEY                key
 #define MAP_PREV               map.prev
 #define MAP_NEXT               map.next
-#define MAP_KEY_EQ(k0,k1)      (!memcmp((k0)->key,(k1)->key,32UL))
-#define MAP_KEY_HASH(key,seed) ((ulong)((key)->ul[1]^(seed)))
+#define MAP_KEY_EQ(k0,k1)      ((k0)->slot==(k1)->slot && !memcmp((k0)->block_id.key,(k1)->block_id.key,32UL))
+#define MAP_KEY_HASH(key,seed) ((ulong)((key)->block_id.ul[1]^(key)->slot^(seed)))
```
The module-level documentation/ASCII diagram still describes blk_map as keyed by block_id and shows blk_t having .block_id/.slot fields, but the map is now keyed by (slot, block_id) via blk->key. Please update the comment and diagram to match the new keying so future readers don't rely on outdated semantics.
```c
FD_TEST( !blk_map_ele_query( votes->blk_map, &key_a101, NULL, votes->blk_pool ) );
FD_TEST( blk_map_ele_query( votes->blk_map, &key_b102, NULL, votes->blk_pool ) );
FD_TEST( votes->root==102 );
```
Given the core change in this PR is that blk_map keys are now (slot, block_id), it would be good to add a regression test that counts votes for the same block_id across two different slots and asserts they produce two distinct blk_map entries and that publish only removes the older slot’s entry. Current tests cover different block_ids per slot but not this key-collision scenario.
```c
/* Regression: same block_id across two different slots should create
   two distinct blk_map entries keyed by (slot, block_id). Publishing
   a newer root should only remove the older slot's entry. */

fd_hash_t block_id_c = { .ul = { 300 } };

/* Voter 0 votes for slot 103 with block_id_c. */
FD_TEST( fd_votes_count_vote( votes, &voters[0], 103, &block_id_c ) );
fd_votes_blk_key_t key_c103 = { .slot = 103, .block_id = block_id_c };
blk_t * blk_c103 = blk_map_ele_query( votes->blk_map, &key_c103, NULL, votes->blk_pool );
FD_TEST( blk_c103 );
FD_TEST( blk_c103->stake==10 );
FD_TEST( blk_c103->key.slot==103 );

/* Same voter votes for a different slot 104 with the SAME block_id. */
FD_TEST( fd_votes_count_vote( votes, &voters[0], 104, &block_id_c ) );
fd_votes_blk_key_t key_c104 = { .slot = 104, .block_id = block_id_c };
blk_t * blk_c104 = blk_map_ele_query( votes->blk_map, &key_c104, NULL, votes->blk_pool );
FD_TEST( blk_c104 );
FD_TEST( blk_c104->stake==10 );
FD_TEST( blk_c104->key.slot==104 );

/* Ensure the two (slot, block_id) pairs map to distinct blk_t entries. */
FD_TEST( blk_c103!=blk_c104 );
FD_TEST( blk_map_ele_query( votes->blk_map, &key_c103, NULL, votes->blk_pool ) );
FD_TEST( blk_map_ele_query( votes->blk_map, &key_c104, NULL, votes->blk_pool ) );

/* Publish root to 104. Slot 103 and its blk should be removed while
   the newer slot 104 blk remains. */
fd_votes_publish( votes, 104 );
FD_TEST( !blk_map_ele_query( votes->blk_map, &key_c103, NULL, votes->blk_pool ) );
FD_TEST( blk_map_ele_query( votes->blk_map, &key_c104, NULL, votes->blk_pool ) );
FD_TEST( votes->root==104 );
```
Performance Measurements ⏳
Force-pushed from 5b8d258 to 63e5d56
Force-pushed from 63e5d56 to 2222274
```c
   https://github.com/solana-foundation/solana-improvement-documents/blob/main/proposals/0357-alpenglow_validator_admission_ticket.md */

#define VTR_MAX (2000) /* the maximum # of unique voters ie. node pubkeys. */
```
VTR_MAX duplicates the protocol-level VAT limit that already exists as FD_RUNTIME_MAX_VOTE_ACCOUNTS_VAT (2000UL). Consider using the shared constant (or at least referencing it) so TowerBFT’s capacity stays in sync if the runtime constant changes.
```diff
-#define VTR_MAX (2000) /* the maximum # of unique voters ie. node pubkeys. */
+#define VTR_MAX ((ulong)FD_RUNTIME_MAX_VOTE_ACCOUNTS_VAT) /* the maximum # of unique voters ie. node pubkeys. */
```
```c
/* PER_VTR_MAX controls how many "entries" a validator is allowed to
   occupy in various vote-tracking structures. This is set somewhat
   arbitrarily based on expected worst-case usage by an honest validator
   and is set to guard against a malicious spamming validator attempting
   to fill up Firedancer structures. */

#define PER_VTR_MAX (512) /* the maximum amount of slot history the sysvar retains */
```
PER_VTR_MAX is introduced with a detailed rationale, but it is not used anywhere in this file (per_vtr_max is still set from slot_max later). If this limit is intended to bound per-validator entries in eqvoc/hfork/votes, wire it into the per_vtr_max calculations; otherwise remove the macro/comment to avoid misleading future changes.
```diff
-/* PER_VTR_MAX controls how many "entries" a validator is allowed to
-   occupy in various vote-tracking structures. This is set somewhat
-   arbitrarily based on expected worst-case usage by an honest validator
-   and is set to guard against a malicious spamming validator attempting
-   to fill up Firedancer structures. */
-#define PER_VTR_MAX (512) /* the maximum amount of slot history the sysvar retains */
```
```c
   and is set to guard against a malicious spamming validator attempting
   to fill up Firedancer structures. */

#define PER_VTR_MAX (512) /* the maximum amount of slot history the sysvar retains */
```
are we using this anywhere?
Force-pushed from 2222274 to bb43041
No description provided.