Skip to content

refactor(tower): polish tower APIs#8689

Merged
lidatong merged 1 commit intomainfrom
chali/refactor/tower-cleanup
Mar 6, 2026
Merged

refactor(tower): polish tower APIs#8689
lidatong merged 1 commit intomainfrom
chali/refactor/tower-cleanup

Conversation

@lidatong
Copy link
Copy Markdown
Member

@lidatong lidatong commented Mar 3, 2026

No description provided.

Copilot AI review requested due to automatic review settings March 3, 2026 22:45
@lidatong lidatong requested a review from emwang-jump as a code owner March 3, 2026 22:45
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch from ccff54b to 33e472f Compare March 3, 2026 22:45
@lidatong lidatong marked this pull request as draft March 3, 2026 22:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors tower-adjacent APIs in the choreo/discof tower components, while extending tower metrics to better distinguish ignored replay completions vs. equivocation-related replays.

Changes:

  • Refactors fd_tower_stakes internals (slot map rename) and renames stake insertion/removal APIs.
  • Renames tower block lockout APIs (*_add/clear*_insert/remove) and updates tower tile call sites.
  • Adds new tower metrics for equivocation detection and updates generated metrics artifacts/docs.

Reviewed changes

Copilot reviewed 8 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/discof/tower/fd_tower_tile.c Updates tower logic to use renamed stakes/lockouts APIs; introduces new equivocation metrics fields.
src/disco/metrics/metrics.xml Adds SlotEqvoced* metrics and tweaks SlotIgnored metric descriptions.
src/disco/metrics/generated/fd_metrics_tower.h Regenerates metric metadata (new offsets/total, new eqvoced metrics).
src/disco/metrics/generated/fd_metrics_tower.c Adds eqvoced metrics to the tower metric registry.
src/choreo/tower/fd_tower_stakes.h Renames/reshapes stakes APIs and related types (blkslot map).
src/choreo/tower/fd_tower_stakes.c Implements renamed stakes APIs and adds seed parameter to *_new.
src/choreo/tower/fd_tower_blocks.h Renames lockouts APIs to *_insert/remove and adjusts prototypes/formatting.
src/choreo/tower/fd_tower_blocks.c Minor formatting change; (implementation not yet aligned with renamed lockouts API).
src/choreo/tower/fd_tower.c Small refactors/robustness tweaks in switch-check logic and formatting.
book/api/metrics-generated.md Regenerates metrics documentation to include new eqvoced metrics.

Comment thread src/choreo/tower/fd_tower_blocks.h Outdated
Comment thread src/choreo/tower/fd_tower_stakes.h Outdated
Comment thread src/choreo/tower/fd_tower_stakes.h
Comment thread src/discof/tower/fd_tower_tile.c Outdated
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch from 33e472f to 9f09e06 Compare March 3, 2026 23:20
Copilot AI review requested due to automatic review settings March 4, 2026 02:57
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch from 9f09e06 to ae4e9b9 Compare March 4, 2026 02:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 19 changed files in this pull request and generated 14 comments.

Comment thread src/discof/tower/fd_tower_tile.c Outdated
Comment thread src/discof/tower/fd_tower_tile.c Outdated
Comment thread src/discof/tower/fd_tower_tile.c Outdated
Comment thread src/discof/tower/fd_tower_tile.c Outdated
Comment thread src/discof/tower/fd_tower_tile.c
Comment thread src/discof/tower/fd_tower_tile.c
Comment thread src/choreo/tower/test_tower_blocks.c Outdated
Comment thread src/choreo/tower/test_tower_blocks.c Outdated
Comment thread src/choreo/tower/test_tower_blocks.c Outdated
Comment thread src/choreo/tower/Local.mk
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch from ae4e9b9 to bddf2c2 Compare March 4, 2026 17:52
Copilot AI review requested due to automatic review settings March 4, 2026 18:34
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch from bddf2c2 to 2103d44 Compare March 4, 2026 18:34
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 20 changed files in this pull request and generated 12 comments.

Comments suppressed due to low confidence (1)

src/discof/tower/fd_tower_tile.c:1125

  • New metrics slot_eqvoced_{cnt,gauge} are tracked in ctx->metrics but never exported in metrics_write(). Add the corresponding FD_MCNT_SET/FD_MGAUGE_SET calls so the new metrics show up in the TOWER metrics stream.
static inline void
metrics_write( fd_tower_tile_t * ctx ) {
  FD_MCNT_SET  ( TOWER, SLOT_IGNORED_CNT,   ctx->metrics.slot_ignored_cnt   );
  FD_MGAUGE_SET( TOWER, SLOT_IGNORED_GAUGE, ctx->metrics.slot_ignored_gauge );

Comment thread src/choreo/tower/fd_tower_leaves.c
Comment thread src/choreo/tower/fd_tower_blocks.c
Comment thread src/choreo/tower/test_tower.c Outdated
Comment thread src/choreo/tower/fd_tower_lockos.h Outdated
Comment thread src/discof/tower/fd_tower_tile.c
Comment thread src/choreo/tower/test_tower_blocks.c Outdated
Comment thread src/choreo/tower/fd_tower_lockos.c
Comment thread src/choreo/tower/Local.mk
Comment thread src/discof/tower/fd_tower_tile.c Outdated
Comment thread src/discof/tower/fd_tower_tile.c
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch from 2103d44 to 7a26513 Compare March 4, 2026 19:44
Copilot AI review requested due to automatic review settings March 4, 2026 19:53
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch from 7a26513 to 61c668a Compare March 4, 2026 19:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 20 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

src/discof/tower/fd_tower_tile.c:1128

  • New metrics slot_eqvoced_cnt/gauge are updated in replay_slot_completed, but metrics_write() does not emit them via FD_MCNT_SET / FD_MGAUGE_SET. Add the missing metric writes so the new counters/gauges actually show up in telemetry.
static inline void
metrics_write( fd_tower_tile_t * ctx ) {
  FD_MCNT_SET  ( TOWER, SLOT_IGNORED_CNT,   ctx->metrics.slot_ignored_cnt   );
  FD_MGAUGE_SET( TOWER, SLOT_IGNORED_GAUGE, ctx->metrics.slot_ignored_gauge );

  FD_MGAUGE_SET( TOWER, REPLAY_SLOT,  ctx->metrics.replay_slot  );
  FD_MGAUGE_SET( TOWER, VOTE_SLOT,    ctx->metrics.vote_slot    );

Comment thread src/choreo/tower/fd_tower_lockos.c Outdated
Comment thread src/choreo/tower/fd_tower_lockos.h Outdated
Comment thread src/discof/tower/fd_tower_tile.c Outdated
Comment thread src/discof/tower/fd_tower_tile.c Outdated
Comment thread src/discof/tower/fd_tower_tile.c Outdated
Comment thread src/choreo/tower/fd_tower_lockos.h Outdated
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch 2 times, most recently from 5acb609 to 7e2936e Compare March 4, 2026 20:18
Copilot AI review requested due to automatic review settings March 4, 2026 20:18
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 20 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/discof/tower/fd_tower_tile.c:1128

  • New equivocation metrics (slot_eqvoced_cnt/gauge) are updated in replay_slot_completed, but metrics_write never exports them via FD_MCNT_SET / FD_MGAUGE_SET. As a result, the newly added metrics in metrics.xml/generated headers won't actually report meaningful values. Add the corresponding metric writes alongside the existing SLOT_IGNORED metrics writes.
static inline void
metrics_write( fd_tower_tile_t * ctx ) {
  FD_MCNT_SET  ( TOWER, SLOT_IGNORED_CNT,   ctx->metrics.slot_ignored_cnt   );
  FD_MGAUGE_SET( TOWER, SLOT_IGNORED_GAUGE, ctx->metrics.slot_ignored_gauge );

Comment thread src/choreo/tower/fd_tower_leaves.c
Comment thread src/choreo/tower/fd_tower_stakes.c Outdated
Comment thread src/discof/tower/fd_tower_tile.c
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch 2 times, most recently from 7d526fe to 1e9cc5d Compare March 4, 2026 20:36
Copilot AI review requested due to automatic review settings March 4, 2026 20:52
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch from 1e9cc5d to b973e50 Compare March 4, 2026 20:52
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch from ec8eb7d to 4788828 Compare March 5, 2026 17:39
Copilot AI review requested due to automatic review settings March 5, 2026 17:40
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch 3 times, most recently from 206b430 to 283c4e9 Compare March 5, 2026 17:48
@lidatong lidatong enabled auto-merge (squash) March 5, 2026 17:48
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch 2 times, most recently from 5275a57 to 79e0f63 Compare March 5, 2026 18:09
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 5, 2026

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.061301 s 0.061185 s -0.189%
backtest mainnet-368528500-perf snapshot load 3.282 s 2.509 s -23.553%
backtest mainnet-368528500-perf total elapsed 61.30082 s 61.18486 s -0.189%
firedancer mem usage with mainnet.toml 963.38 GiB 963.38 GiB 0.000%

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 28 changed files in this pull request and generated 4 comments.

Comment thread src/choreo/tower/fd_tower_lockos.c
Comment thread src/choreo/tower/fd_tower_leaves.c
Comment thread src/choreo/tower/fd_tower_lockos.h
Comment thread src/discof/tower/fd_tower_tile.c
Comment thread src/discof/tower/fd_tower_tile.c Outdated
tower_blk->replayed = 1;
tower_blk->replayed_block_id = slot_completed->block_id;
tower_blk->voted = 0;
tower_blk->confirmed = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if we should be resetting voted and confirmed here?

@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch from 79e0f63 to d1293cb Compare March 6, 2026 13:41
Copilot AI review requested due to automatic review settings March 6, 2026 13:42
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch from d1293cb to c659cbc Compare March 6, 2026 13:42
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 6, 2026

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.061091 s 0.06087 s -0.362%
backtest mainnet-368528500-perf snapshot load 3.172 s 2.522 s -20.492%
backtest mainnet-368528500-perf total elapsed 61.09129 s 60.86989 s -0.362%
firedancer mem usage with mainnet.toml 963.38 GiB 963.38 GiB 0.000%

@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch from c659cbc to 859994e Compare March 6, 2026 13:46
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 31 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/discof/tower/fd_tower_tile.c:1093

  • New metrics slot_eqvoced_cnt / slot_eqvoced_gauge are tracked in ctx->metrics but never exported in metrics_write, so they will remain invisible to the metrics subsystem. Add the corresponding FD_MCNT_SET / FD_MGAUGE_SET calls alongside the existing SlotIgnored metrics.

Comment thread src/choreo/hfork/fd_hfork.c Outdated
Comment thread src/discof/tower/fd_tower_tile.c
Comment thread src/discof/tower/fd_tower_tile.c
Comment thread src/discof/tower/fd_tower_tile.c
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 6, 2026

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.072235 s 0.072579 s 0.476%
backtest mainnet-368528500-perf snapshot load 3.238 s 2.795 s -13.681%
backtest mainnet-368528500-perf total elapsed 72.234625 s 72.579288 s 0.477%
firedancer mem usage with mainnet.toml 963.38 GiB 963.38 GiB 0.000%

@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch from 859994e to 69bf43b Compare March 6, 2026 14:48
emwang-jump
emwang-jump previously approved these changes Mar 6, 2026
Copilot AI review requested due to automatic review settings March 6, 2026 20:11
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch from 69bf43b to 432d778 Compare March 6, 2026 20:11
@lidatong lidatong force-pushed the chali/refactor/tower-cleanup branch from 432d778 to 9bb2afc Compare March 6, 2026 20:12
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 6, 2026

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-368528500-perf per slot 0.061112 s 0.06095 s -0.265%
backtest mainnet-368528500-perf snapshot load 3.059 s 2.392 s -21.805%
backtest mainnet-368528500-perf total elapsed 61.111762 s 60.949877 s -0.265%
firedancer mem usage with mainnet.toml 963.38 GiB 963.38 GiB 0.000%

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 29 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/discof/tower/fd_tower_tile.c:1093

  • New equivocation metrics (slot_eqvoced_cnt/gauge) are updated in ctx->metrics but never exported in metrics_write(), so they will stay at zero in published metrics despite being incremented. Add the corresponding FD_MCNT_SET / FD_MGAUGE_SET calls here to match the new generated tower metrics.
static inline void
metrics_write( fd_tower_tile_t * ctx ) {
  FD_MCNT_SET  ( TOWER, SLOT_IGNORED_CNT,   ctx->metrics.slot_ignored_cnt   );
  FD_MGAUGE_SET( TOWER, SLOT_IGNORED_GAUGE, ctx->metrics.slot_ignored_gauge );

  FD_MGAUGE_SET( TOWER, REPLAY_SLOT,  ctx->metrics.replay_slot  );
  FD_MGAUGE_SET( TOWER, VOTE_SLOT,    ctx->metrics.vote_slot    );

Comment on lines +710 to +748
fd_tower_blk_t * eqvoc_tower_blk = NULL;
if( FD_UNLIKELY( eqvoc_tower_blk = fd_tower_blocks_query( ctx->tower_blocks, slot_completed->slot ) ) ) {

/* As fd_replay_slot_completed guarantees, we process at most 2
equivocating blocks for a given slot. fd_tower_{...} only stores
one version of a block (by design, given the tower protocol's
limitations ... see notes in fd_tower_blocks.h). We also know
when seeing the same slot a second time that second block is
confirmed, also guaranteed by fd_replay_slot_completed ... see
notes in fd_replay_tile.h).

So we retain the existing tower_block we have (which contains the
first replayed_block_id as well as the voted_block_id if we did
indeed vote for it). We clear out the slot from the other tower
adjacent structures, and re-insert into them with the confirmed
version of the slot . */

FD_TEST( eqvoc_tower_blk->confirmed ); /* check the confirmed bit is set (second replay_slot_completed version must be confirmed) */
fd_tower_leaves_remove( ctx->tower_leaves, slot_completed->slot );
fd_tower_lockos_remove( ctx->tower_lockos, slot_completed->slot );
fd_tower_stakes_remove( ctx->tower_stakes, slot_completed->slot );

/* If the previous equivocating version had a different parent than
the new version, then we need to ensure the original equivocating
block's parent is restored as a tower leaf.

TODO check agave doesn't have equivocating? */

if( FD_UNLIKELY( eqvoc_tower_blk->parent_slot != slot_completed->parent_slot &&
!fd_tower_leaves_map_ele_query( ctx->tower_leaves->map, &eqvoc_tower_blk->parent_slot, NULL, ctx->tower_leaves->pool ) /* added by another leaf */ ) ) {
fd_tower_leaf_t * leaf = fd_tower_leaves_pool_ele_acquire( ctx->tower_leaves->pool );
leaf->slot = eqvoc_tower_blk->parent_slot;
fd_tower_leaves_map_ele_insert( ctx->tower_leaves->map, leaf, ctx->tower_leaves->pool );
fd_tower_leaves_dlist_ele_push_tail( ctx->tower_leaves->dlist, leaf, ctx->tower_leaves->pool );
eqvoc_tower_blk->parent_slot = slot_completed->parent_slot;
}
/* update the parent slot. Crucial to do this before tower_blocks_replayed*/
tower_block->parent_slot = slot_completed->parent_slot;

ctx->metrics.slot_eqvoced_cnt++;
ctx->metrics.slot_eqvoced_gauge = slot_completed->slot;
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Equivocation path assumes the existing tower block is already duplicate-confirmed (FD_TEST(eqvoc_tower_blk->confirmed)), but the first replayed version may not be confirmed yet. This will crash on the normal “first replay unconfirmed, second replay confirmed” flow described in the new fd_replay_slot_completed guarantee. Instead, on the second replay of the same slot, update the existing blk in-place to mark it confirmed and set confirmed_block_id (and any other per-slot metadata like parent_slot/bank_idx) based on slot_completed, rather than asserting it was already confirmed.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants