feat: add TestDeFiSimulation benchmark for Uniswap V2 workload #3187
Conversation
Replace hardcoded spammer config with benchConfig fields so all parameters are controllable via BENCH_* env vars. Add pair_count and rebroadcast as configurable options for the uniswap-swaps scenario.
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
Claude finished @chatton's task in 2m 9s

Code Review: feat: add TestDeFiSimulation benchmark for Uniswap V2 workload

Review Checklist
Summary

This PR adds a `TestDeFiSimulation` E2E benchmark for a Uniswap V2 swap workload.

Code Quality & Architecture

The test closely mirrors the existing ERC20 benchmark test.

This means the DeFi test can silently succeed even if every transaction fails: nothing asserts on `failed_total`.
"pair_count": envInt("BENCH_PAIR_COUNT", 1),
"rebroadcast": envInt("BENCH_REBROADCAST", 0),These parameters are read directly from the environment, bypassing the
Both should be added as fields on
Both DeFi and ERC20 tests hardcode …

Error Handling & Resilience

Drain timeout hardcoded at 30s (lines 91-95, also noted by CodeRabbit):

```go
drainCtx, drainCancel := context.WithTimeout(ctx, 30*time.Second)
```

This is the same pattern used in the ERC20 test. A comment documenting the rationale (e.g., "sufficient for default pair_count=1; increase if using many pairs") would satisfy the CodeRabbit note without requiring a code change. A config-driven drain timeout would be the clean fix.
```go
case <-timer.C:
	t.Logf("metric %s did not reach target %.0f within %v", name, target, timeout)
	return
```

The function logs a warning and returns silently on timeout, so the test never fails even when the metric target is missed.
The 5-second sleep buys time for spammers to start before the health check. This is a best-effort delay: it is too short on slow CI runners and wastes time on fast ones. The existing …

Security

No security concerns. The DeFi scenario uses the same …

Performance & Resource Efficiency
"refill_amount": "10000000000000000000", // 10 ETH
"refill_balance": "5000000000000000000", // 5 ETHvs. ERC20's 5 ETH/2 ETH. The inline comment explains why ("swaps need ETH for WETH wrapping and router approvals"), which is correct and appreciated.
The log is printed before …

No spammer ID slice pre-allocation:

```go
var spammerIDs []int // line 53
```

With a known spammer count, the slice could be pre-allocated.

Testing & Reliability

No assertion on `failed_total`. As noted above, the test should assert:

```go
s.Require().Greater(sent, float64(0), "at least one transaction should have been sent")
s.Require().Zero(failed, "no transactions should have failed")
```
Documentation & Developer Experience

Test-level comment is excellent (lines 13-22). The godoc comment for the test is noticeably better than the ERC20 test's comment and sets a good precedent. Inline comments on warmup exclusion (lines 50-51, 62-68) are clear and helpful.
As noted in the architecture section, these DeFi-specific parameters are invisible in the test log. When debugging a run from CI logs alone, there is no way to know what `pair_count` or `rebroadcast` was used.

CI Workflow (.github/workflows/benchmark.yml)
| Severity | Location | Issue |
|---|---|---|
| Medium | spamoor_defi_test.go:88 | No assertion on failed_total; silent failure if all txs fail |
| Medium | spamoor_defi_test.go:39-40 | pair_count/rebroadcast bypass benchConfig; not logged |
| Low | spamoor_defi_test.go:91 | Drain timeout may be too short for pair_count > 1; undocumented |
| Low | spamoor_defi_test.go:64 | Unconditional 5s sleep before requireSpammersRunning; unexplained |
| Low | helpers.go:477 (pre-existing) | waitForMetricTarget silently returns on timeout; no test failure |
📝 Walkthrough

A new E2E benchmark test, `TestDeFiSimulation`, is added.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as Test Harness
    participant Spammer as Spamoor Spammers
    participant Node as Blockchain Node
    participant Metrics as Metrics System
    participant Traces as Trace Collector
    participant Results as Results Writer
    Test->>Spammer: Spawn Uniswap V2 swap spammers
    activate Spammer
    Test->>Test: Sleep (warm-up delay)
    Test->>Spammer: Assert spammers running
    Spammer-->>Test: Running status
    loop Warm-up polling
        Test->>Metrics: Read spamoor_transactions_sent_total
        Metrics-->>Test: Counter value
    end
    Test->>Traces: Reset trace collection window
    Test->>Node: Fetch start block header
    Node-->>Test: Start block header
    loop Steady-state
        Test->>Metrics: Poll until target transaction count reached
        Metrics-->>Test: Count updates
    end
    Test->>Node: Wait for pending tx drain (bounded)
    Node-->>Test: Drain status
    Test->>Node: Fetch end block header
    Node-->>Test: End block header
    Test->>Node: Collect block gas/tx metrics for range
    Node-->>Test: Block metrics
    Test->>Traces: Gather execution traces
    Traces-->>Test: Trace data
    Test->>Results: Construct and record benchmark result
    Results-->>Test: Ack
    deactivate Spammer
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #3187      +/-   ##
==========================================
+ Coverage   61.12%   61.14%   +0.01%
==========================================
  Files         117      117
  Lines       12082    12082
==========================================
+ Hits         7385     7387       +2
+ Misses       3870     3869       -1
+ Partials      827      826       -1
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
🧹 Nitpick comments (2)
test/e2e/benchmark/spamoor_defi_test.go (2)
91-95: Drain timeout is hardcoded; consider documenting the rationale.

The 30-second timeout for draining pending transactions is reasonable, but a brief comment explaining why this value was chosen (or why it differs from `cfg.WaitTimeout`) would help future maintainers understand the trade-off between waiting longer for accuracy vs. keeping test duration bounded.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `test/e2e/benchmark/spamoor_defi_test.go` around lines 91-95: add a short comment explaining why the drain timeout is set to 30 seconds and how it relates to test duration vs. accuracy (and note why it differs from cfg.WaitTimeout if that is relevant); place this comment immediately above the creation of drainCtx/drainCancel where waitForDrain is called, so future maintainers understand the trade-off and rationale for this hardcoded value.
39-40: Consider adding `pair_count` and `rebroadcast` to `benchConfig` for consistency.

Other `BENCH_*` environment variables are accessed via the `benchConfig` struct (e.g., `cfg.Throughput`, `cfg.MaxWallets`), but these two call `envInt()` directly. Moving them to `benchConfig` would centralize configuration and allow `cfg.log(t)` to display all parameters.

♻️ Suggested approach

Add fields to `benchConfig` in `config.go`:

```go
PairCount:   envInt("BENCH_PAIR_COUNT", 1),
Rebroadcast: envInt("BENCH_REBROADCAST", 0),
```

Then use `cfg.PairCount` and `cfg.Rebroadcast` here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/benchmark/spamoor_defi_test.go` around lines 39 - 40, Add PairCount and Rebroadcast fields to the benchConfig struct (in config.go) and initialize them using envInt("BENCH_PAIR_COUNT", 1) and envInt("BENCH_REBROADCAST", 0) respectively; then replace the direct envInt(...) calls in test/e2e/benchmark/spamoor_defi_test.go with cfg.PairCount and cfg.Rebroadcast so all BENCH_* settings are centralized and will be included when calling cfg.log(t).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: df5aba88-ffb3-47dd-9a60-f619f5a9a81c
📒 Files selected for processing (1)
test/e2e/benchmark/spamoor_defi_test.go
🧹 Nitpick comments (1)
.github/workflows/benchmark.yml (1)
103-123: New `defi-benchmark` job looks good and follows established patterns.

The job structure, action versions, and test invocation are consistent with the existing `spamoor-benchmark` and `erc20-benchmark` jobs.

One note: the static analysis tool flagged `extractions/setup-just@v3` (line 116) as an unpinned action. This is a pre-existing pattern used by all other jobs in this workflow (lines 34, 66, 94). Consider pinning to a commit SHA for supply chain security in a follow-up PR that addresses all instances.
🔒 Optional: Pin the action to a specific commit SHA

You can find the commit SHA for the v3 tag and update all usages:

```bash
#!/bin/bash
# Get the commit SHA for the extractions/setup-just v3 tag
curl -s https://api.github.com/repos/extractions/setup-just/git/refs/tags/v3 | jq -r '.object.sha'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/benchmark.yml around lines 103 - 123, Pin the unpinned GitHub Action reference "extractions/setup-just@v3" to a specific commit SHA for supply-chain security: replace every occurrence of extractions/setup-just@v3 in the workflow with extractions/setup-just@<commit-sha> (use the commit SHA for the v3 tag from the extractions/setup-just repo). Locate the action usages by searching for the exact string "extractions/setup-just@v3" and update them all consistently; fetch the correct SHA from the remote repo tags and use that SHA in the workflow entries.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a6048a31-43c5-489f-b8b4-38e1be131444
📒 Files selected for processing (1)
.github/workflows/benchmark.yml
Summary
- `TestDeFiSimulation` benchmark measuring Uniswap V2 swap throughput
- `benchConfig` with `BENCH_*` env vars (no hardcoded values)

NOTE: many of these results do not look great, but I think this is largely due to a bottleneck at the spamoor level: it cannot spam enough transactions. The full breakdown is below. Hopefully we can see more realistic results on the dedicated hardware.
Part of #2288
DeFi Simulation Results (locally on docker)
Key Findings
Spamoor injection is the bottleneck, not ev-node or ev-reth. Gas/block never exceeds ~2-3M avg at 100ms blocks (out of 100M limit). The Uniswap scenario has heavy per-spammer warmup (contract deploys, liquidity provision) and limited steady-state send rate.
Best sustained result: 21.48 MGas/s (run 1, 100ms/100M/4x30). Best peak: 26.04 MGas/s (run 4, 8 spammers) but only 7s window.
ev-node overhead is consistently low (0.7-3.9%), confirming it is not a bottleneck.
ProduceBlock avg (28-202ms) varies with block size, not block time config. More gas/block = longer ProduceBlock.
These numbers are spamoor-limited, not system-limited. The gasburner test achieves 297-383 MGas/s on Docker Desktop and 645 MGas/s on dedicated hardware. The DeFi gap is due to spamoor Uniswap scenario send rate, not EVM execution or state root cost.
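A back-of-envelope check of the figures above: sustained MGas/s is total gas used in the measurement window divided by the window's wall-clock seconds. The numbers below are illustrative, not taken from a specific run; for example, ~2.5M gas per block at 100ms blocks (10 blocks/s) lands in the same ballpark as the 21-26 MGas/s reported.

```go
package main

import "fmt"

// mGasPerSecond converts total gas used over a window into MGas/s.
func mGasPerSecond(totalGasUsed uint64, windowSeconds float64) float64 {
	return float64(totalGasUsed) / windowSeconds / 1e6
}

func main() {
	gasPerBlock := 2_500_000.0 // illustrative avg gas per block
	blocksPerSecond := 10.0    // 100ms block time
	window := 60.0             // seconds in the measurement window
	total := uint64(gasPerBlock * blocksPerSecond * window)
	fmt.Printf("%.2f MGas/s\n", mGasPerSecond(total, window)) // 25.00 MGas/s
}
```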
Summary by CodeRabbit
Tests
Chores