feat(tools/talis): vendor talis deployment tool + Fibre experiment runner #3301
Merged
julienrbrt merged 1 commit into evstack:julien/fiber on Apr 29, 2026
Brings the celestia-app talis multi-cloud deploy tool into ev-node,
plus a long-lived ev-node aggregator runner that wires the existing
celestia-node-fiber adapter behind ev-node's DA client interface.
Verified end-to-end on AWS — talis up → genesis → deploy →
setup-fibre → start-fibre → fibre-bootstrap-evnode reaches
24.57 MB/s @ 99.7 % ok on a 60 s sustained loadgen
(3 × c6in.4xlarge validators + c6in.2xlarge bridge +
c6in.8xlarge ev-node + c6in.2xlarge load-gen, us-east-1).
What this adds:
• tools/talis/ — vendored from celestia-app's
feat/fibre-payments. Provisions AWS / DO / GCP boxes for
validators + bridge + ev-node + load-gen, deploys binaries +
init scripts, drives the Fibre setup-fibre + start-fibre flow,
and ships a fibre-bootstrap-evnode step that scp's the bridge
JWT and Fibre payment keyring onto each ev-node before its
init script starts the daemon.
• tools/celestia-node-fiber/cmd/evnode-fibre/ — the long-lived
aggregator runner. Wires block.NewFiberDAClient on top of the
celestia-node-fiber adapter that julien/fiber already ships,
plus the in-memory executor + HTTP /tx ingress used by
evnode-txsim. Distinct from the existing fiber-bench cmd.
• tools/talis/cmd/evnode-txsim/ — small Go load-gen that pumps
the runner's HTTP /tx ingress for a fixed duration; deployed
to load-gen boxes and prints a single TXSIM: line on completion.
Two small ev-node-side helpers the runner calls:
• block/public.go: SetMaxBlobSize(n) — overrides the per-blob
byte cap so the runner can lift Celestia's 5 MiB default to
Fibre's 120 MiB headroom.
• pkg/config/config.go: Config.ApplyFiberDefaults() — flips the
DA config to Fibre-friendly settings (adaptive batching, 1 s
DA.BlockTime, 50-deep pending-cache window) when the Fiber
profile is enabled, so a runner can opt in with one call.
setup-fibre robustness fixes uncovered during the verified run:
• bash script for set-host now retries until the validator's
host appears in `query valaddr providers`. The previous one-
shot call relied on `--yes` returning the txhash before block
inclusion; if the chain wasn't ready, the tx silently bounced.
The Fibre client cached the partial set on startup and uploads
cascaded to "host not found" → "voting power: collected 0".
• talis-CLI side polls `query valaddr providers` after the per-
validator scripts finish and refuses to return until all
validators are registered (5-minute deadline).
External dependency (documented in tools/talis/fibre.md):
• Sibling clone of celestia-app on a branch with feat/fibre-payments
+ sysrex/fibre_url_fix cherry-picked. Without the URL-parse fix
the Fibre client rejects every host:port registration.
Tested:
- go build ./... — clean
- go test ./block/internal/submitting ./pkg/config (the two
pre-existing test failures on julien/fiber — TestAddFlags
and TestFiberClient_Submit_BlobTooLarge — are not introduced
by this PR and reproduce on raw julien/fiber)
- End-to-end AWS deploy from this branch — 24.57 MB/s, 99.7 % ok
Summary
Brings the celestia-app talis multi-cloud deploy tool into ev-node, plus the wiring needed to deploy a working Fibre DA aggregator end-to-end on top of it. Verified via a fresh AWS run from this branch: `talis up → genesis → deploy → setup-fibre → start-fibre → fibre-bootstrap-evnode` reaches 24.57 MB/s @ 99.7 % ok rate on a 60 s sustained loadgen (3 × c6in.4xlarge validators + c6in.2xlarge bridge + c6in.8xlarge ev-node + c6in.2xlarge load-gen, all us-east-1).
What's added
• `tools/talis/`: vendored from celestia-app's `feat/fibre-payments`. Provisions AWS / DO / GCP boxes for one or more validators + bridge + ev-node + load-gen, deploys binaries + init scripts, drives a Fibre `setup-fibre` + `start-fibre` flow, and ships a `fibre-bootstrap-evnode` step that scp's the bridge JWT and Fibre payment keyring onto each ev-node before its init script starts the daemon.
• `tools/celestia-node-fiber/cmd/evnode-fibre/`: long-lived aggregator runner that wires `block.NewFiberDAClient` on top of the celestia-node-fiber adapter. Compiled by `talis genesis` and shipped to evnode-* hosts.
• `tools/talis/cmd/evnode-txsim/`: small Go load-gen that pumps the runner's HTTP `/tx` ingress for a fixed duration; deployed to load-gen boxes and prints a single `TXSIM:` line on completion.
• `tools/talis/Makefile`: cross-compiles celestia-appd, the fibre server + load tool, the bridge/light celestia binary, and both runner binaries to linux/amd64 for `talis genesis -b`.
ev-node-side foundation work that the runner needs
These changes were what turned a "wired but slow" Fibre DA path into something that actually overlaps uploads and survives sustained load:
• `pkg/config/config.go`: `ApplyFiberDefaults()` profile (adaptive batching, 1 s `DA.BlockTime`, 50-deep pending-cache window).
• `block/public.go`: `SetMaxBlobSize` to lift the 5 MiB Celestia default to Fibre's 120 MiB headroom; `NewFiberDAClient` + Fibre type re-exports.
• `block/internal/da/{fiber_client.go, fiber/types.go, fibremock/}`: Fibre adapter wired through the DA client interface, with a matching mock used by the testing tree.
• `block/internal/submitting/da_submitter.go`: per-stream upload workers, `splitByBlobSize` chunking, parallel signing pool, and an oversized-blob safety net that advances the cache instead of looping forever (which OOM'd the daemon under sustained Fibre stalls in the AWS run).
• `core/sequencer/sequencing.go` + `pkg/sequencers/solo/sequencer.go`: `ErrQueueFull` sentinel + bounded mempool queue so the reaper backs off when the executor is paused on the pending-cache cap, instead of feeding an unbounded queue and getting one giant block on resume.
setup-fibre fixes uncovered during the verified run
• `set-host` now retries until the validator's host appears in `query valaddr providers`. The previous one-shot call relied on `--yes` returning the txhash before block inclusion; if the chain wasn't ready yet, the tx silently bounced and the validator never registered. The Fibre client cached the partial set on startup and uploads cascaded to `host not found` → `voting power: collected 0`.
• The talis CLI polls `query valaddr providers` after the per-validator scripts finish and refuses to return until all validators are registered (5-minute deadline, then errors so the operator can re-run).
External dependency
Documented in `tools/talis/fibre.md`: a sibling clone of celestia-app on a branch with `feat/fibre-payments` + the `sysrex/fibre_url_fix` cherry-pick. Without the URL-parse fix the Fibre client rejects every `host:port` registration with `first path segment in URL cannot contain colon`, and `voting power: collected 0` cascades. (This is a celestia-app fix that needs to land separately.)
Test plan
- `go test ./block/... ./pkg/... ./types/...`: all green
- `go build ./...`: clean
- `make build-bins`: all 6 binaries cross-compile cleanly to linux/amd64
🤖 Generated with Claude Code
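For readers unfamiliar with the chunking step listed under the foundation work, size-bounded batching with an oversized-blob escape hatch can be sketched as follows. The function borrows the name `splitByBlobSize` from the PR description, but its signature and behaviour here are assumptions, not the code in `da_submitter.go`.

```go
package main

import "fmt"

// splitByBlobSize groups blobs, in order, into batches whose total size does
// not exceed max. Blobs that are individually larger than max can never be
// submitted, so they are returned separately instead of being retried.
// This is an illustrative reimplementation with an assumed signature.
func splitByBlobSize(blobs [][]byte, max int) (batches [][][]byte, oversized [][]byte) {
	var cur [][]byte
	size := 0
	for _, b := range blobs {
		if len(b) > max {
			oversized = append(oversized, b)
			continue
		}
		if size+len(b) > max && len(cur) > 0 {
			// Current batch is full: close it and start a new one.
			batches = append(batches, cur)
			cur, size = nil, 0
		}
		cur = append(cur, b)
		size += len(b)
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches, oversized
}

func main() {
	blobs := [][]byte{make([]byte, 3), make([]byte, 4), make([]byte, 10), make([]byte, 2), make([]byte, 5)}
	batches, oversized := splitByBlobSize(blobs, 7)
	fmt.Println("batches:", len(batches), "oversized:", len(oversized))
}
```

Returning oversized blobs to the caller, rather than leaving them at the head of the queue, is what lets a submitter move past them; the PR describes the real safety net as advancing the cache for the same reason.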