
fix(tower,votes): key with (slot, block_id) and remove FD_VOTER_MAX #9055

Merged
lidatong merged 1 commit into main from chali/feat/8341 on Mar 26, 2026
Conversation

@lidatong
Member

No description provided.

@lidatong lidatong requested a review from emwang-jump as a code owner March 26, 2026 19:09
Copilot AI review requested due to automatic review settings March 26, 2026 19:09
Contributor

Copilot AI left a comment


Pull request overview

Updates TowerBFT/votes bookkeeping to key vote blocks by (slot, block_id) and removes the now-obsolete global voter max constant, aligning sizing with the Alpenglow validator admission ticket (VAT) 2000-voter cap.

Changes:

  • Change votes block map key from block_id to (slot, block_id) and update query API/callers accordingly.
  • Replace FD_VOTER_MAX usage by passing an explicit vtr_max into tower stakes sizing/new, and use a local VTR_MAX bound in the tower tile.
  • Update/adjust unit tests for the new keying and updated tower_stakes constructors.
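
The keying change in the first bullet can be sketched in isolation. The fragment below is a minimal illustration, not the code from this PR: `toy_hash_t`, `toy_blk_key_t`, `toy_blk_key_eq`, and `toy_blk_key_hash` are hypothetical stand-ins, and the 32-byte union layout is an assumption modeled on the `MAP_KEY_EQ` / `MAP_KEY_HASH` macros quoted later in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 32-byte block id. The union layout is an assumption. */
typedef union {
  uint8_t  key[32];
  uint64_t ul[4];
} toy_hash_t;

/* Composite key: a block is identified by its slot AND its block id. */
typedef struct {
  uint64_t   slot;
  toy_hash_t block_id;
} toy_blk_key_t;

/* Two keys match only if the slot and all 32 id bytes match. */
static int
toy_blk_key_eq( toy_blk_key_t const * k0, toy_blk_key_t const * k1 ) {
  assert( k0 && k1 );
  return k0->slot==k1->slot && !memcmp( k0->block_id.key, k1->block_id.key, 32UL );
}

/* Fold the slot into the hash so that equal ids at different slots do
   not always share a chain. */
static uint64_t
toy_blk_key_hash( toy_blk_key_t const * k, uint64_t seed ) {
  assert( k );
  return k->block_id.ul[1] ^ k->slot ^ seed;
}
```

Under this sketch, the same block id seen at slots 103 and 104 yields two keys that compare unequal, which is the property the new map keying relies on.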

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/discof/tower/fd_tower_tile.c | Introduces VTR_MAX=2000, updates publish/query usage for (slot, block_id), and threads vtr_max into tower_stakes sizing/new. |
| src/choreo/votes/fd_votes.h | Defines fd_votes_blk_key_t and updates fd_votes_blk_t / fd_votes_query signature to use (slot, block_id). |
| src/choreo/votes/fd_votes.c | Changes blk_map key type/equality/hash and updates vote counting/query to use the composite key. |
| src/choreo/votes/test_votes.c | Updates tests to query the blk_map using (slot, block_id) keys. |
| src/choreo/tower/fd_tower_stakes.h | Makes fd_tower_stakes_footprint depend on caller-provided vtr_max instead of a global constant. |
| src/choreo/tower/fd_tower_stakes.c | Threads vtr_max through fd_tower_stakes_new and internal pool/scratch sizing. |
| src/choreo/tower/test_tower.c | Updates tower_stakes test setup for the new (slot_max, voter_max) footprint/new signatures. |
| src/choreo/fd_choreo_base.h | Removes the FD_VOTER_MAX definition and its associated comment block. |
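
The move from a global voter cap to a caller-provided `vtr_max` follows a common sizing pattern, sketched below. This is an illustration only: `toy_vtr_t`, `toy_stakes_footprint`, and the "one record per (slot, voter)" layout are hypothetical and say nothing about how `fd_tower_stakes_footprint` actually sizes its pools.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-voter record; the real structure is not shown in this PR page. */
typedef struct {
  unsigned long slot;
  unsigned long stake;
} toy_vtr_t;

/* The capacity is an explicit argument, so each caller chooses its own
   bound instead of inheriting a global macro.  Assumes the product
   does not overflow size_t. */
static size_t
toy_stakes_footprint( size_t slot_max, size_t vtr_max ) {
  assert( slot_max && vtr_max );
  return sizeof(toy_vtr_t) * slot_max * vtr_max;
}
```

A caller that knows its voter bound, such as a tile using a local cap of 2000, would pass that bound directly, and a test can pass a much smaller one.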

Comment thread src/discof/tower/fd_tower_tile.c Outdated
Comment on lines +99 to +102
This is ok though, because structures that rely on PER_VTR_MAX */

#define PER_VTR_MAX (FD_TOWER_VOTE_MAX+1)

Copilot AI Mar 26, 2026

The new PER_VTR_MAX define is currently unused, and the preceding comment ends mid-sentence ("structures that rely on PER_VTR_MAX"). Either wire PER_VTR_MAX into the actual sizing logic (e.g., the per_vtr_max passed into eqvoc) or drop the define/comment to avoid misleading future changes.

Suggested change
- This is ok though, because structures that rely on PER_VTR_MAX */
- #define PER_VTR_MAX (FD_TOWER_VOTE_MAX+1)
+ This is ok though, because consumers of this data only rely on votes
+ that are still present in a validator's current tower. */

Comment thread src/discof/tower/fd_tower_tile.c Outdated
Comment on lines +92 to +99
During times of network instability (forking), it is possible for
validators to get locked out and as a result retaining only the most
recent FD_TOWER_VOTE_MAX votes will miss votes at the bottom of the
tower becomes lockout results in popping from the _top_ of the tower
not the bottom (validator votes 1, 2, 3 then switches from 3 to 4
results in a tower of 1, 2, 4 but we end up recording 2, 3, 4).

This is ok though, because structures that rely on PER_VTR_MAX */
Copilot AI Mar 26, 2026

The network-instability comment here is difficult to parse and contains grammatical issues (e.g., "will miss votes at the bottom of the tower becomes lockout...") which makes it hard to understand the intended rationale for PER_VTR_MAX. Please reword/complete this comment so it clearly explains the scenario and the tradeoff.

Suggested change
- During times of network instability (forking), it is possible for
- validators to get locked out and as a result retaining only the most
- recent FD_TOWER_VOTE_MAX votes will miss votes at the bottom of the
- tower becomes lockout results in popping from the _top_ of the tower
- not the bottom (validator votes 1, 2, 3 then switches from 3 to 4
- results in a tower of 1, 2, 4 but we end up recording 2, 3, 4).
- This is ok though, because structures that rely on PER_VTR_MAX */
+ During periods of network instability (heavy forking), validators can
+ get locked out on one fork and then switch to voting on another.
+ Lockout is enforced by popping votes from the _top_ of the tower,
+ not from the bottom. For example, if a validator votes on slots
+ 1, 2, 3 and then later switches from 3 to 4, the validator's tower
+ might end up as [1, 2, 4]. A naive scheme that only remembers the
+ most recent FD_TOWER_VOTE_MAX votes, however, would naturally store
+ something like [2, 3, 4] and would therefore forget slot 1, even
+ though slot 1 is still present in the actual tower.
+ To cover this off-by-one window while keeping memory bounded, we
+ allow each voter up to PER_VTR_MAX = FD_TOWER_VOTE_MAX + 1 entries.
+ This ensures that any bookkeeping based on "recent votes" still
+ includes every slot that can legally remain in the tower, even under
+ worst-case reconfiguration during network instability. */
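
The 1, 2, 3 then 4 example can be replayed with a toy model. The sketch below is an assumption-laden simplification: it uses a capacity of 3 instead of FD_TOWER_VOTE_MAX, and it models a fork switch as "pop every vote newer than the common ancestor" rather than real lockout expiry. `toy_votes_t`, `toy_tower_vote`, and `toy_recent_record` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_VOTE_MAX 3 /* toy capacity, NOT the real FD_TOWER_VOTE_MAX */

typedef struct {
  unsigned long slots[ TOY_VOTE_MAX ];
  size_t        cnt;
} toy_votes_t;

/* Tower model: votes newer than `ancestor` sit on the abandoned fork,
   so they are popped from the top before the new vote is pushed. */
static void
toy_tower_vote( toy_votes_t * t, unsigned long slot, unsigned long ancestor ) {
  while( t->cnt && t->slots[ t->cnt-1 ]>ancestor ) t->cnt--;
  assert( t->cnt<TOY_VOTE_MAX );
  t->slots[ t->cnt++ ] = slot;
}

/* "Most recent N" model: when full, the oldest entry is dropped from
   the bottom regardless of which fork it was on. */
static void
toy_recent_record( toy_votes_t * r, unsigned long slot ) {
  if( r->cnt==TOY_VOTE_MAX ) {
    for( size_t i=1; i<TOY_VOTE_MAX; i++ ) r->slots[ i-1 ] = r->slots[ i ];
    r->cnt--;
  }
  r->slots[ r->cnt++ ] = slot;
}
```

Feeding both models the votes 1, 2, 3 and then 4 (with slot 2 as the common ancestor) leaves the tower holding 1, 2, 4 while the recent-votes log holds 2, 3, 4, which is the mismatch the comment describes.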

Comment thread src/discof/tower/fd_tower_tile.c Outdated
  scratch_footprint( fd_topo_tile_t const * tile ) {
- ulong slot_max = tile->tower.max_live_slots;
+ ulong slot_max = fd_ulong_pow2_up( tile->tower.max_live_slots );
  ulong per_vtr_max = slot_max;
Copilot AI Mar 26, 2026

scratch_footprint still sets per_vtr_max = slot_max, but this PR introduces PER_VTR_MAX = FD_TOWER_VOTE_MAX+1, with the rationale that per-voter history beyond the tower isn't needed. If that rationale is correct, per_vtr_max should likely use PER_VTR_MAX (reducing eqvoc prf_max and memory); otherwise, the new PER_VTR_MAX block should be removed to avoid confusion.

Suggested change
- ulong per_vtr_max = slot_max;
+ ulong per_vtr_max = fd_ulong_pow2_up( PER_VTR_MAX );
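
For readers unfamiliar with power-of-two rounding, the following sketch shows what such a helper does to a capacity value. `toy_pow2_up` is a hypothetical stand-in written for this illustration. It is not the implementation of `fd_ulong_pow2_up`, and it only handles inputs from 1 to 2^63.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two that is >= x, for 1 <= x <= 2^63. */
static uint64_t
toy_pow2_up( uint64_t x ) {
  assert( x>=UINT64_C(1) && x<=(UINT64_C(1)<<63) );
  uint64_t p = UINT64_C(1);
  while( p<x ) p <<= 1; /* cannot overflow because x <= 2^63 */
  return p;
}
```

A value that is already a power of two, such as 512, is returned unchanged, while 513 rounds up to 1024. The practical effect of the suggestion therefore depends on how far PER_VTR_MAX sits from the next power of two.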

Comment on lines 61 to +68
  #define MAP_NAME blk_map
  #define MAP_ELE_T blk_t
- #define MAP_KEY_T fd_hash_t
- #define MAP_KEY block_id
+ #define MAP_KEY_T fd_votes_blk_key_t
+ #define MAP_KEY key
  #define MAP_PREV map.prev
  #define MAP_NEXT map.next
- #define MAP_KEY_EQ(k0,k1) (!memcmp((k0)->key,(k1)->key,32UL))
- #define MAP_KEY_HASH(key,seed) ((ulong)((key)->ul[1]^(seed)))
+ #define MAP_KEY_EQ(k0,k1) ((k0)->slot==(k1)->slot && !memcmp((k0)->block_id.key,(k1)->block_id.key,32UL))
+ #define MAP_KEY_HASH(key,seed) ((ulong)((key)->block_id.ul[1]^(key)->slot^(seed)))
Copilot AI Mar 26, 2026

The module-level documentation/ASCII diagram still describes blk_map as keyed by block_id and shows blk_t having .block_id/.slot fields, but the map is now keyed by (slot, block_id) via blk->key. Please update the comment and diagram to match the new keying so future readers don't rely on outdated semantics.

FD_TEST( !blk_map_ele_query( votes->blk_map, &key_a101, NULL, votes->blk_pool ) );
FD_TEST( blk_map_ele_query( votes->blk_map, &key_b102, NULL, votes->blk_pool ) );
FD_TEST( votes->root==102 );

Copilot AI Mar 26, 2026

Given the core change in this PR is that blk_map keys are now (slot, block_id), it would be good to add a regression test that counts votes for the same block_id across two different slots and asserts they produce two distinct blk_map entries and that publish only removes the older slot’s entry. Current tests cover different block_ids per slot but not this key-collision scenario.

Suggested change
+ /* Regression: same block_id across two different slots should create
+ two distinct blk_map entries keyed by (slot, block_id). Publishing
+ a newer root should only remove the older slot's entry. */
+ fd_hash_t block_id_c = { .ul = { 300 } };
+ /* Voter 0 votes for slot 103 with block_id_c. */
+ FD_TEST( fd_votes_count_vote( votes, &voters[0], 103, &block_id_c ) );
+ fd_votes_blk_key_t key_c103 = { .slot = 103, .block_id = block_id_c };
+ blk_t * blk_c103 = blk_map_ele_query( votes->blk_map, &key_c103, NULL, votes->blk_pool );
+ FD_TEST( blk_c103 );
+ FD_TEST( blk_c103->stake==10 );
+ FD_TEST( blk_c103->key.slot==103 );
+ /* Same voter votes for a different slot 104 with the SAME block_id. */
+ FD_TEST( fd_votes_count_vote( votes, &voters[0], 104, &block_id_c ) );
+ fd_votes_blk_key_t key_c104 = { .slot = 104, .block_id = block_id_c };
+ blk_t * blk_c104 = blk_map_ele_query( votes->blk_map, &key_c104, NULL, votes->blk_pool );
+ FD_TEST( blk_c104 );
+ FD_TEST( blk_c104->stake==10 );
+ FD_TEST( blk_c104->key.slot==104 );
+ /* Ensure the two (slot, block_id) pairs map to distinct blk_t entries. */
+ FD_TEST( blk_c103!=blk_c104 );
+ FD_TEST( blk_map_ele_query( votes->blk_map, &key_c103, NULL, votes->blk_pool ) );
+ FD_TEST( blk_map_ele_query( votes->blk_map, &key_c104, NULL, votes->blk_pool ) );
+ /* Publish root to 104: slot 103 and its blk should be removed while
+ the newer slot 104 blk remains. */
+ fd_votes_publish( votes, 104 );
+ FD_TEST( !blk_map_ele_query( votes->blk_map, &key_c103, NULL, votes->blk_pool ) );
+ FD_TEST( blk_map_ele_query( votes->blk_map, &key_c104, NULL, votes->blk_pool ) );
+ FD_TEST( votes->root==104 );

Comment thread src/choreo/tower/fd_tower_stakes.c Outdated
@github-actions

Performance Measurements ⏳

| Suite | Baseline | New | Change |
| --- | --- | --- | --- |
| backtest mainnet-406545575-perf per slot | 0.143515 s | 0.14386 s | 0.240% |
| backtest mainnet-406545575-perf snapshot load | 5.14 s | 3.56 s | -30.739% |
| backtest mainnet-406545575-perf total elapsed | 143.514787 s | 143.860159 s | 0.241% |
| firedancer mem usage with mainnet.toml | 1096.43 GiB | 1090.43 GiB | -0.547% |

Copilot AI review requested due to automatic review settings March 26, 2026 19:26
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.


https://github.com/solana-foundation/solana-improvement-documents/blob/main/proposals/0357-alpenglow_validator_admission_ticket.md */

#define VTR_MAX (2000) /* the maximum # of unique voters ie. node pubkeys. */
Copilot AI Mar 26, 2026

VTR_MAX duplicates the protocol-level VAT limit that already exists as FD_RUNTIME_MAX_VOTE_ACCOUNTS_VAT (2000UL). Consider using the shared constant (or at least referencing it) so TowerBFT’s capacity stays in sync if the runtime constant changes.

Suggested change
- #define VTR_MAX (2000) /* the maximum # of unique voters ie. node pubkeys. */
+ #define VTR_MAX ((ulong)FD_RUNTIME_MAX_VOTE_ACCOUNTS_VAT) /* the maximum # of unique voters ie. node pubkeys. */
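
If a local literal is preferred over aliasing the shared constant, a compile-time check is another way to keep the two in sync. The sketch below assumes C11 and uses hypothetical `TOY_` names. The value 2000 is taken from the comment above, and nothing here reflects how the Firedancer headers are actually organized.

```c
#include <assert.h>

/* Stand-in for the shared runtime limit named in the review comment. */
#define TOY_RUNTIME_MAX_VOTE_ACCOUNTS_VAT (2000UL)

/* Option 1: derive the local bound from the shared constant. */
#define TOY_VTR_MAX (TOY_RUNTIME_MAX_VOTE_ACCOUNTS_VAT)

/* Option 2: keep a literal, but fail the build if the two ever drift. */
#define TOY_VTR_MAX_LITERAL (2000UL)
static_assert( TOY_VTR_MAX_LITERAL==TOY_RUNTIME_MAX_VOTE_ACCOUNTS_VAT,
               "local voter cap is out of sync with the VAT limit" );
```

Option 1 matches the suggested change. Option 2 keeps the number visible at the definition site while still turning a silent mismatch into a build error.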

Comment on lines +88 to +95
/* PER_VTR_MAX controls how many "entries" a validator is allowed to
occupy in various vote-tracking structures. This is set somewhat
arbitrarily based on expected worst-case usage by an honest validator
and is set to guard against a malicious spamming validator attempting
to fill up Firedancer structures. */

#define PER_VTR_MAX (512) /* the maximum amount of slot history the sysvar retains */

Copilot AI Mar 26, 2026

PER_VTR_MAX is introduced with a detailed rationale, but it is not used anywhere in this file (per_vtr_max is still set from slot_max later). If this limit is intended to bound per-validator entries in eqvoc/hfork/votes, wire it into the per_vtr_max calculations; otherwise remove the macro/comment to avoid misleading future changes.

Suggested change
- /* PER_VTR_MAX controls how many "entries" a validator is allowed to
- occupy in various vote-tracking structures. This is set somewhat
- arbitrarily based on expected worst-case usage by an honest validator
- and is set to guard against a malicious spamming validator attempting
- to fill up Firedancer structures. */
- #define PER_VTR_MAX (512) /* the maximum amount of slot history the sysvar retains */

and is set to guard against a malicious spamming validator attempting
to fill up Firedancer structures. */

#define PER_VTR_MAX (512) /* the maximum amount of slot history the sysvar retains */
Contributor

are we using this anywhere?

@github-actions

Performance Measurements ⏳

| Suite | Baseline | New | Change |
| --- | --- | --- | --- |
| backtest mainnet-406545575-perf per slot | 0.13639 s | 0.137745 s | 0.993% |
| backtest mainnet-406545575-perf snapshot load | 3.594 s | 3.186 s | -11.352% |
| backtest mainnet-406545575-perf total elapsed | 136.390431 s | 137.745213 s | 0.993% |
| firedancer mem usage with mainnet.toml | 1096.43 GiB | 1090.43 GiB | -0.547% |

@lidatong lidatong enabled auto-merge (squash) March 26, 2026 19:48
@github-actions

Performance Measurements ⏳

| Suite | Baseline | New | Change |
| --- | --- | --- | --- |
| backtest mainnet-406545575-perf per slot | 0.122594 s | 0.122277 s | -0.259% |
| backtest mainnet-406545575-perf snapshot load | 3.345 s | 2.904 s | -13.184% |
| backtest mainnet-406545575-perf total elapsed | 122.594459 s | 122.276862 s | -0.259% |
| firedancer mem usage with mainnet.toml | 1096.43 GiB | 1090.43 GiB | -0.547% |

@lidatong lidatong merged commit f1ea6cd into main Mar 26, 2026
17 checks passed
@lidatong lidatong deleted the chali/feat/8341 branch March 26, 2026 20:01


3 participants