bundle: defer publish to after_credit via pending deque #8763

Merged
cmoyes-jump merged 12 commits into firedancer-io:main from cmoyes-jump:cmoyes/bundletile
Mar 9, 2026

@cmoyes-jump cmoyes-jump commented Mar 6, 2026

Split the bundle tile's publish path into before_credit (I/O, gated on the pending deque being empty and on cr_avail) and after_credit (drains one bundle or up to STEM_BURST packets per call). Publish metadata is buffered in a zero-copy deque. dcache writes are capped at cr_avail so that they cannot overwrite data the downstream consumer hasn't read yet.

Fixes #8654

@cmoyes-jump cmoyes-jump self-assigned this Mar 6, 2026
Copilot AI review requested due to automatic review settings March 6, 2026 19:22
@cmoyes-jump cmoyes-jump force-pushed the cmoyes/bundletile branch 3 times, most recently from 4a7245c to 25d8077 Compare March 6, 2026 19:26

Copilot AI left a comment

Pull request overview

This PR refactors the Firedancer bundle tile's publish path to separate gRPC I/O (before_credit) from stem publishing (after_credit). Instead of publishing directly to the tango message bus during gRPC callbacks, transactions are written to the dcache and their metadata is buffered in a zero-copy dynamic deque (pending_pubs). The after_credit callback then drains the deque: publishing complete bundles atomically (up to STEM_BURST) or batches of up to STEM_BURST individual packets per call. Two new metrics expose the pending buffer depth and any backpressure-induced drops.

Changes:

  • New fd_bundle_pending_pub deque in the bundle tile's private state, used to stage publish metadata between gRPC I/O and fd_stem_publish.
  • New before_credit/after_credit split: I/O is gated on the deque being empty and at least 1 credit being available; draining is gated on min_cr_avail >= STEM_BURST.
  • New metrics (PendingTransactions gauge, TransactionDroppedBackpressure counter) and associated generated code and documentation.
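
The gating and drain rules listed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `sketch_ctx_t`, `pending_pub_t`, `pending_push`, `io_allowed`, and `drain_once` are hypothetical names, the fixed-size ring stands in for the real deque template, the `published` counter stands in for fd_stem_publish, and the values chosen for STEM_BURST and PENDING_MAX are assumptions.

```c
#include <assert.h>

#define STEM_BURST   5UL  /* assumed value, for illustration only */
#define PENDING_MAX 16UL  /* hypothetical deque capacity */

/* Hypothetical publish metadata; the real fd_bundle_pending_pub layout
   is not shown in this thread. */
typedef struct {
  unsigned long sig;        /* opaque signature for the consumer */
  unsigned long sz;         /* payload size already written to the dcache */
  unsigned long bundle_cnt; /* 0 for a lone packet, else txns in its bundle */
} pending_pub_t;

typedef struct {
  pending_pub_t ring[ PENDING_MAX ];
  unsigned long head;       /* index of the next entry to publish */
  unsigned long tail;       /* index of the next free slot */
  unsigned long published;  /* stand-in for calls to fd_stem_publish */
} sketch_ctx_t;

static unsigned long
pending_cnt( sketch_ctx_t const * ctx ) {
  return ctx->tail - ctx->head;
}

/* Returns 1 on success, 0 if the deque is full (the caller would drop). */
static int
pending_push( sketch_ctx_t * ctx, pending_pub_t pub ) {
  if( pending_cnt( ctx )==PENDING_MAX ) return 0;
  ctx->ring[ ctx->tail % PENDING_MAX ] = pub;
  ctx->tail++;
  return 1;
}

/* before_credit gate: only do I/O when nothing is staged and at least
   one credit is available. */
static int
io_allowed( sketch_ctx_t const * ctx, unsigned long cr_avail ) {
  return pending_cnt( ctx )==0UL && cr_avail>=1UL;
}

/* after_credit drain: publish one whole bundle, or up to STEM_BURST
   lone packets.  Returns the number of entries published. */
static unsigned long
drain_once( sketch_ctx_t * ctx, unsigned long min_cr_avail ) {
  if( min_cr_avail<STEM_BURST ) return 0UL;
  unsigned long cnt = pending_cnt( ctx );
  if( cnt==0UL ) return 0UL;

  unsigned long n = ctx->ring[ ctx->head % PENDING_MAX ].bundle_cnt;
  if( n ) {
    /* Assumes a bundle is pushed whole and fits in a single burst. */
    assert( n<=STEM_BURST && n<=cnt );
  } else {
    /* Count consecutive lone packets, stopping at the next bundle. */
    while( n<cnt && n<STEM_BURST &&
           ctx->ring[ (ctx->head+n) % PENDING_MAX ].bundle_cnt==0UL ) n++;
  }
  ctx->head      += n;
  ctx->published += n;
  return n;
}
```

With seven lone packets staged, two successive `drain_once` calls publish five and then two entries, and `io_allowed` stays false until the ring is empty again.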

Reviewed changes

Copilot reviewed 7 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/disco/bundle/fd_bundle_tile.c | Adds before_credit, moves STEM_BURST earlier, splits after_credit into drain loop + plugin update |
| src/disco/bundle/fd_bundle_tile_private.h | Adds fd_bundle_pending_pub struct, deque template, depth field in fd_bundle_out_ctx, backpressure_drop_cnt metric |
| src/disco/bundle/fd_bundle_client.c | Replaces direct fd_stem_publish calls with deque pushes; adds per-publish backpressure checks |
| src/disco/bundle/test_bundle_common.c | Allocates/destroys the deque in the test harness, adds depth to test verify_out |
| src/disco/bundle/test_bundle_client.c | Updates existing tests to check deque count instead of mcache; adds three new drain-logic unit tests |
| src/disco/metrics/metrics.xml | Adds two new bundle metrics and two unrelated snapct gossip metrics |
| src/disco/metrics/generated/fd_metrics_bundle.h | Regenerated header reflecting new metric offsets |
| src/disco/metrics/generated/fd_metrics_bundle.c | Regenerated source with new DECLARE_METRIC entries |
| book/api/metrics-generated.md | Documentation updated for new bundle and snapct metrics |

Comment thread src/disco/bundle/fd_bundle_tile.c Outdated
Comment thread src/disco/bundle/fd_bundle_tile.c Outdated
Copilot AI review requested due to automatic review settings March 6, 2026 19:30

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 11 changed files in this pull request and generated 4 comments.


Comment thread src/disco/bundle/fd_bundle_tile.c Outdated
Comment thread src/disco/bundle/fd_bundle_tile_private.h
Comment thread src/disco/bundle/test_bundle_client.c
Comment thread src/disco/metrics/metrics.xml Outdated
Comment thread src/disco/metrics/metrics.xml Outdated

@jherrera-jump jherrera-jump left a comment

Thanks for looking into this!

<histogram name="MessageRxDelayNanos" min="100000" max="1000000000">
<summary>Message receive delay in nanoseconds from bundle server to bundle client</summary>
</histogram>
<gauge name="PendingTransactions" summary="Number of transactions buffered and waiting to be published" />
Contributor

PendingTransactions is probably less useful as a gauge, because consumers poll metrics much more slowly than the value itself changes.

Maybe this should instead count the number of times pending_transaction_cnt exceeds pending_max/2?

I don't feel too strongly about this, though, so it's fine to leave it as-is.
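
One possible reading of that suggestion is an edge-triggered counter that increments each time the pending depth rises above half of the capacity. The sketch below is only an illustration of that idea; `depth_metric_t` and `depth_metric_observe` are hypothetical names that do not appear in the PR, and the crossing semantics are an assumption.

```c
#include <assert.h>

/* Hypothetical counter state; names are illustrative, not from the PR. */
typedef struct {
  unsigned long pending_max;    /* deque capacity */
  unsigned long high_water_cnt; /* times the depth rose above pending_max/2 */
  int           above;          /* 1 while the depth is above the threshold */
} depth_metric_t;

/* Call after every change to the pending depth.  Counts upward
   crossings only, so a slow metrics poller still sees how often the
   deque was more than half full. */
static void
depth_metric_observe( depth_metric_t * m, unsigned long pending_cnt ) {
  assert( m->pending_max>0UL );
  int now_above = pending_cnt > m->pending_max/2UL;
  if( now_above && !m->above ) m->high_water_cnt++;
  m->above = now_above;
}
```

Unlike a gauge, a counter of this kind is monotonic, so a crossing that happens between two polls is not lost.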


@cmoyes-jump cmoyes-jump Mar 6, 2026

If others also feel this is worth doing, I will change it. How does that sound?

Comment thread src/disco/bundle/fd_bundle_client.c Outdated
Comment thread src/disco/bundle/fd_bundle_client.c Outdated
Comment thread src/disco/bundle/fd_bundle_tile.c Outdated
Copilot AI review requested due to automatic review settings March 6, 2026 20:12

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 9 changed files in this pull request and generated no new comments.


Copilot AI review requested due to automatic review settings March 6, 2026 20:40

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 9 changed files in this pull request and generated 2 comments.


Comment thread src/disco/bundle/fd_bundle_tile.c Outdated
Comment thread src/disco/bundle/fd_bundle_tile.c Outdated
Comment on lines +41 to +46
ulong pending_max = tile->bundle.buf_sz / FD_BUNDLE_MIN_GRPC_WIRE_SZ;
ulong l = FD_LAYOUT_INIT;
l = FD_LAYOUT_APPEND( l, alignof(fd_bundle_tile_t), sizeof(fd_bundle_tile_t) );
l = FD_LAYOUT_APPEND( l, fd_grpc_client_align(), fd_grpc_client_footprint( tile->bundle.buf_sz ) );
l = FD_LAYOUT_APPEND( l, fd_alloc_align(), fd_alloc_footprint() );
l = FD_LAYOUT_APPEND( l, pending_txn_align(), pending_txn_footprint( pending_max ) );

Copilot AI Mar 6, 2026

The pending transaction deque is sized as buf_sz / FD_BUNDLE_MIN_GRPC_WIRE_SZ entries, but each fd_bundle_pending_txn_t entry occupies approximately 1304 bytes (1232 bytes payload + metadata). For the default buffer_size_kib = 2048 (fdctl), this yields ~131,072 entries × 1304 bytes ≈ 171 MB just for the deque. For buffer_size_kib = 16384 (firedancer), it yields ~1.37 GB.

The sizing rationale is correct for overflow prevention (the gRPC frame buffer limits inbound transactions), but the deque entries are ~81x larger than the minimum wire format assumption (16 bytes). In practice, a much smaller deque (e.g., capped at a few hundred entries, since at most STEM_BURST * N transactions are needed to keep the pipeline full) would suffice and avoid this large memory footprint in the tile's scratch space.
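
The footprint figures in this comment can be reproduced with a few lines of arithmetic. The sketch below takes the 16-byte minimum wire size and the 1304-byte entry size from the comment itself; the helper names `deque_entries` and `deque_bytes` are hypothetical, and the calculation ignores any deque header or alignment overhead.

```c
#include <assert.h>

/* Entry count under the original sizing: one entry per minimum-size
   wire message that fits in the gRPC buffer. */
static unsigned long
deque_entries( unsigned long buffer_size_kib, unsigned long min_wire_sz ) {
  assert( min_wire_sz>0UL );
  return ( buffer_size_kib*1024UL ) / min_wire_sz;
}

/* Approximate payload bytes held by the deque (header and alignment
   overhead are ignored in this sketch). */
static unsigned long
deque_bytes( unsigned long buffer_size_kib,
             unsigned long min_wire_sz,
             unsigned long entry_sz ) {
  return deque_entries( buffer_size_kib, min_wire_sz ) * entry_sz;
}
```

For buffer_size_kib = 2048 this gives 131,072 entries and 170,917,888 bytes (about 171 MB); for 16384 it gives 1,048,576 entries and 1,367,343,104 bytes (about 1.37 GB).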

Contributor Author

I need guidance on whether we can safely size the deque below the theoretical upper bound.

Contributor

I think you can just size it to match the bundle_verif link, i.e. config->tiles.verify.receive_buffer_size / FD_TPU_PARSED_MTU.


@cmoyes-jump cmoyes-jump Mar 7, 2026

I agree, that makes sense. I made your suggested change.

Copilot AI review requested due to automatic review settings March 6, 2026 20:51

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 9 changed files in this pull request and generated 1 comment.


Comment thread src/disco/bundle/fd_bundle_tile_private.h
Copilot AI review requested due to automatic review settings March 7, 2026 19:45

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 13 changed files in this pull request and generated 1 comment.


Comment thread src/disco/bundle/fd_bundle_client.c
Comment thread src/disco/bundle/fd_bundle_tile.c Outdated
ripatel-fd previously approved these changes Mar 8, 2026
Copilot AI review requested due to automatic review settings March 9, 2026 14:33
@cmoyes-jump cmoyes-jump enabled auto-merge (squash) March 9, 2026 14:33
Co-authored-by: ripatel-fd <ripatel+git@jumptrading.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 13 changed files in this pull request and generated 2 comments.


Comment thread src/disco/bundle/fd_bundle_client.c Outdated
Comment thread src/disco/bundle/fd_bundle_tile.c
@cmoyes-jump cmoyes-jump merged commit b7f5790 into firedancer-io:main Mar 9, 2026
18 checks passed
Development

Successfully merging this pull request may close these issues.

Fix bundle tile STEM_BURST

5 participants