Add mkosi flashbots-images VM to playground#302
Conversation
|
Claude finished @canercidam's task in 4m 18s —— View job PR Review: Add mkosi flashbots-images VM to playground
SummaryThis PR adds support for running a BuilderNet VM (via QEMU/mkosi) alongside the Docker-based playground. It includes Go changes to the core components/recipes and a set of shell scripts for VM lifecycle management. Bugs
Regressions affecting existing users
Other issues
RecommendationThe main concern is that several changes affect all recipes (target-peers, healthmon disabling, Electra fork epoch, Fileserver) rather than being scoped to VM-enabled deployments. These should be behind a flag to avoid regressions for existing users. |
| resp, err := http.Get(beaconApi) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to get beacon node identity: %v", err) | ||
| } | ||
| var identityRespPayload identityResponse | ||
| if err := json.NewDecoder(resp.Body).Decode(&identityRespPayload); err != nil { | ||
| return fmt.Errorf("failed to decode identity resp payload: %v", err) | ||
| } |
There was a problem hiding this comment.
Bug: resp.Body is never closed. This leaks an HTTP connection. Add defer resp.Body.Close() after the nil-error check.
| resp, err := http.Get(beaconApi) | |
| if err != nil { | |
| return fmt.Errorf("failed to get beacon node identity: %v", err) | |
| } | |
| var identityRespPayload identityResponse | |
| if err := json.NewDecoder(resp.Body).Decode(&identityRespPayload); err != nil { | |
| return fmt.Errorf("failed to decode identity resp payload: %v", err) | |
| } | |
| resp, err := http.Get(beaconApi) | |
| if err != nil { | |
| return fmt.Errorf("failed to get beacon node identity: %v", err) | |
| } | |
| defer resp.Body.Close() | |
| var identityRespPayload identityResponse | |
| if err := json.NewDecoder(resp.Body).Decode(&identityRespPayload); err != nil { | |
| return fmt.Errorf("failed to decode identity resp payload: %v", err) | |
| } |
| respData, err := postRequest(rethApi, "/", map[string]interface{}{ | ||
| "jsonrpc": "2.0", | ||
| "method": "admin_nodeInfo", | ||
| "id": 1, | ||
| }) | ||
| var enodeRespPayload enodeResponse | ||
| if err := json.Unmarshal(respData, &enodeRespPayload); err != nil { | ||
| return fmt.Errorf("failed to decode enode resp payload: %v", err) | ||
| } |
There was a problem hiding this comment.
Bug: The error from postRequest is silently ignored. If the admin_nodeInfo call fails, respData could be nil, causing json.Unmarshal to fail with a confusing error instead of the real HTTP error.
| respData, err := postRequest(rethApi, "/", map[string]interface{}{ | |
| "jsonrpc": "2.0", | |
| "method": "admin_nodeInfo", | |
| "id": 1, | |
| }) | |
| var enodeRespPayload enodeResponse | |
| if err := json.Unmarshal(respData, &enodeRespPayload); err != nil { | |
| return fmt.Errorf("failed to decode enode resp payload: %v", err) | |
| } | |
| respData, err := postRequest(rethApi, "/", map[string]interface{}{ | |
| "jsonrpc": "2.0", | |
| "method": "admin_nodeInfo", | |
| "id": 1, | |
| }) | |
| if err != nil { | |
| return fmt.Errorf("failed to get reth admin_nodeInfo: %v", err) | |
| } | |
| var enodeRespPayload enodeResponse | |
| if err := json.Unmarshal(respData, &enodeRespPayload); err != nil { | |
| return fmt.Errorf("failed to decode enode resp payload: %v", err) | |
| } |
| @@ -558,7 +561,8 @@ func (l *LighthouseBeaconNode) Apply(ctx *ExContext) *Component { | |||
| WithArtifact("/data/jwtsecret", "jwtsecret"). | |||
| WithVolume("data", "/data_beacon") | |||
|
|
|||
| UseHealthmon(component, svc, healthmonBeacon) | |||
| // TODO: Enable later - doesn't work with --target-peers=1 which is required for builder VM | |||
| //UseHealthmon(component, svc, healthmonBeacon) | |||
There was a problem hiding this comment.
Changing --target-peers from 0 to 1 and disabling the beacon healthmon (with the TODO comments at lines 564-565 and 642-643) is a significant regression for existing users who don't use the VM feature.
With --target-peers 1, lighthouse will wait for a peer before reporting healthy, which breaks single-node L1 and non-VM BuilderNet deployments. Disabling UseHealthmon for the beacon node means the DependsOnHealthy("beacon") calls (e.g., from contender) lose their safety guarantees.
This was also flagged in the existing review discussion. Consider making these changes conditional on a flag (e.g., --with-vm) rather than applying them unconditionally to all recipes.
| flag "github.com/spf13/pflag" | ||
| ) | ||
|
|
||
| const BuilderHostIPAddress = "10.0.2.2" |
There was a problem hiding this comment.
The IP 10.0.2.2 is the QEMU user-mode networking host gateway address. This only works with QEMU's user-mode NAT (-nic user). If this constant is used in the builder config that gets registered into builder-hub (e.g., builderhub-config.yaml line 44-50 use it for relay URLs), it means BuilderNet deployments without a VM will have broken relay URLs in the config. Consider making this configurable or only injecting it when VM mode is active.
| var config builderHubConfig | ||
| if err := yaml.Unmarshal([]byte(configYaml), &config); err != nil { |
There was a problem hiding this comment.
yaml.Unmarshal takes []byte, but defaultBuilderHubConfig is already []byte (from the //go:embed directive). The []byte(configYaml) cast is redundant but harmless. However, the real issue: if b.BuilderConfig is empty and the default config is used, the configYaml variable is []byte. But if a custom config is read via os.ReadFile, it's also []byte. This is fine—just noting the type is consistent.
More importantly: there is no validation that the parsed config.Playground.BuilderHubConfig fields are non-empty before using them. If someone provides a YAML file that doesn't include the playground.builder_hub_config section, all fields will silently be empty strings, which will create a builder with empty ID, empty IP, etc.
| endpoint := fmt.Sprintf("http://localhost:%d", apiPort.HostPort) | ||
|
|
||
| err := registerBuilder(endpoint, &builderHubRegisterBuilderInput{ | ||
| err := registerBuilder(endpoint, "", "", &builderHubRegisterBuilderInput{ |
There was a problem hiding this comment.
Passing empty strings for beaconApi and rethApi will cause registerBuilder to make HTTP requests to empty URLs (/eth/v1/node/identity and /), which will fail. This test only works because the function hits the measurements endpoint first, and the previous test code didn't need beacon/reth APIs.
Now that registerBuilder fetches beacon identity and reth enode as its first operations, this test will fail at the http.Get(beaconApi) call. The test needs to either:
- Set up mock HTTP servers for those endpoints, or
- Skip the beacon/reth discovery when those URLs are empty
|
|
||
| genesisTime := time.Now().Add(time.Duration(b.genesisDelay) * time.Second) | ||
| config := params.BeaconConfig() | ||
| config.ElectraForkEpoch = 0 |
There was a problem hiding this comment.
This line (config.ElectraForkEpoch = 0) forces the Electra fork to activate at epoch 0 unconditionally, regardless of the applyLatestL1Fork flag. This changes behavior for all recipes, not just BuilderNet. Was this intentional? It should likely be conditional on b.applyLatestL1Fork or removed if Electra is already the default.
| return fmt.Errorf("failed to decode identity resp payload: %v", err) | ||
| } | ||
| peerID := identityRespPayload.Data.PeerID | ||
| libP2PAddr := fmt.Sprintf("/ip4/%s/tcp/9001/p2p/%s", BuilderHostIPAddress, peerID) |
There was a problem hiding this comment.
The CL P2P port 9001 is hardcoded here. The actual lighthouse beacon node uses {{Port "p2p" 9000}} (line 544 of components.go), which defaults to 9000, not 9001. This mismatch means the libp2p address registered in builder-hub will point to the wrong port.
If 9001 is the TCP port specifically for libp2p (distinct from the ENR port), this needs a comment explaining why it differs. Otherwise, this should match the p2p port from the beacon service.
| auth_header = "" | ||
|
|
||
| disk_encryption: | ||
| key: 5d7052c0c3aff5834f45e3f33aca0a55ef9f43ca9cf6c5c8e8375ab82564ddb6 # playground value |
There was a problem hiding this comment.
This file is embedded via //go:embed into the binary and used as the default builder-hub configuration. It contains:
- A BLS relay secret key (line 15)
- A coinbase secret key (line 17)
- A disk encryption key (line 54)
While these are clearly development/playground values, embedding secret keys in the binary (even for dev) means they'll persist in all builds. Consider adding a comment at the top of this file making it explicit that these are playground-only keys and must never be used in production.
|
|
||
| # TRIED TO DISABLE SERVICES - DID NOT WORK | ||
| # error: | ||
| # qemu-system-x86_64: -append only allowed with -kernel option | ||
|
|
||
| # PLAYGROUND_DISABLE_SERVICES=( | ||
| # reth-sync # Downloads Reth snapshot from S3 bucket | ||
| # acme-le # Issues Let's Encrypt TLS certificates | ||
| # acme-le-renewal # Renews Let's Encrypt certificates | ||
| # rbuilder-bidding-downloader # Downloads binary from private GitHub repo | ||
| # vector # Observability pipeline (logs/metrics) | ||
| # rbuilder-rebalancer # ETH balance rebalancing across wallets | ||
| # operator-api # Management API for node operators | ||
| # config-watchdog # Watches and reloads rbuilder config | ||
| # ) | ||
|
|
||
| # mask_args() { | ||
| # [[ $# -gt 0 ]] && printf "systemd.mask=%s.service " "$@" | ||
| # } | ||
| # # # add argument to qemu-system-x86_64: | ||
| # # \ | ||
| # # -append "console=ttyS0 $(mask_args "${PLAYGROUND_DISABLE_SERVICES[@]}")" |
There was a problem hiding this comment.
Large block of dead code (commented-out service masking). This should either be implemented or removed. If it's a known limitation worth tracking, file an issue and reference it instead of leaving 20+ lines of commented-out code in a new file.
|
|
||
| SSH_PORT=2222 | ||
|
|
||
| ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -p ${SSH_PORT} bnet@localhost |
There was a problem hiding this comment.
Security: -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null suppresses all host key verification. This is acceptable for ephemeral dev VMs, but consider adding a comment explaining this is intentional for playground use only, so it doesn't get cargo-culted into production scripts.
| SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" | ||
|
|
||
| FLASHBOTS_IMAGES_DIR="${SCRIPT_DIR}/.flashbots-images" | ||
| FLASHBOTS_IMAGES_BRANCH="fryd/mkosi-playground" |
There was a problem hiding this comment.
This hardcodes the branch fryd/mkosi-playground. Once the flashbots-images work is merged to main, this will need updating. Consider making it a variable that defaults to main with an override, or at least add a TODO comment.
| }) | ||
| } | ||
|
|
||
| component.AddComponent(ctx, &Fileserver{}) |
There was a problem hiding this comment.
The Fileserver is now unconditionally added to every L1 recipe deployment. This means even users who never use VMs will have a Caddy container running and serving genesis/testnet files on port 8100. This adds resource overhead and exposes configuration files on the network.
Consider making this conditional on a flag (e.g., --with-fileserver or --with-vm), or at minimum documenting that this is a new service users can expect.
playground/recipe_l1.go
Outdated
|
|
||
| svcManager.RunContenderIfEnabled() | ||
|
|
||
| svcManager.AddService(&Fileserver{}) |
There was a problem hiding this comment.
Shall we move this to BuilderNetRecipe.Apply()? I guess we don't need it for the L1 recipe.
This reverts commit 472899d.
This PR tries to unify the BuilderNet recipe by basing itself on Fryd's PR (#302). It may contain changes from PRs awaiting to be merged and will shrink in size as they are merged. --------- Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
No description provided.