Add mkosi flashbots-images VM to playground#302

Merged
canercidam merged 15 commits into main from fryd/mkosi-playground
Feb 11, 2026

@fkondej fkondej commented Jan 9, 2026

No description provided.

@fkondej fkondej self-assigned this Jan 9, 2026

claude bot commented Feb 10, 2026

Claude finished @canercidam's task in 4m 18s.


PR Review: Add mkosi flashbots-images VM to playground

  • Read and analyze all changed files
  • Review Go source changes (components, recipes, manifest, artifacts)
  • Review shell scripts and config files
  • Post inline comments for issues found
  • Post summary review

Summary

This PR adds support for running a BuilderNet VM (via QEMU/mkosi) alongside the Docker-based playground. It includes Go changes to the core components/recipes and a set of shell scripts for VM lifecycle management.

Bugs

  1. Resource leak in registerBuilder: resp.Body from http.Get(beaconApi) is never closed (playground/recipe_buildernet.go:147-154).

  2. Swallowed error from postRequest — The err return from the admin_nodeInfo call is ignored (playground/recipe_buildernet.go:159). If the HTTP call fails, the code proceeds to unmarshal nil data.

  3. Broken test: TestRecipeBuilderHub_RegisterBuilder passes empty strings for beaconApi and rethApi, but registerBuilder now makes HTTP requests to those URLs as its first operations. This test will fail at runtime (playground/components_test.go:195).

  4. Port mismatch — The libp2p address uses hardcoded port 9001 (recipe_buildernet.go:156) but the beacon node's p2p port defaults to 9000 (components.go:544).

Regressions affecting existing users

  1. --target-peers change breaks non-VM setups — Changing from --target-peers 0 to --target-peers 1 unconditionally (components.go:551) means lighthouse will wait for a peer in single-node deployments, which blocks startup. The corresponding disabling of UseHealthmon for the beacon node removes health-check guarantees for all downstream DependsOnHealthy("beacon") dependencies. This was also raised in the prior review discussion — these changes should be gated behind a flag.

  2. Fileserver added unconditionally to L1 — Every L1 deployment now includes a Caddy container serving genesis files (recipe_l1.go:140), adding overhead for users who don't need VM support.

  3. ElectraForkEpoch = 0 forced unconditionally: artifacts.go:255 sets the Electra fork at epoch 0 regardless of the applyLatestL1Fork flag, changing genesis behavior for all recipes.

Other issues

  1. No validation of parsed config — If a custom YAML config omits the playground.builder_hub_config section, all fields default to empty strings, silently registering a builder with empty ID/IP/network in builder-hub (components.go:1024-1025).

  2. Hardcoded 10.0.2.2 — The QEMU user-mode gateway IP is used in the default builderhub-config.yaml for relay URLs. Non-VM BuilderNet deployments that use the default config will get broken relay addresses (recipe_buildernet.go:17).

  3. Shell scripts: sync.sh hardcodes the fryd/mkosi-playground branch; start.sh has ~20 lines of dead commented-out code that should be removed or tracked as an issue.

Recommendation

The main concern is that several changes affect all recipes (target-peers, healthmon disabling, Electra fork epoch, Fileserver) rather than being scoped to VM-enabled deployments. These should be behind a flag to avoid regressions for existing users.

Comment on lines +147 to +154
resp, err := http.Get(beaconApi)
if err != nil {
return fmt.Errorf("failed to get beacon node identity: %v", err)
}
var identityRespPayload identityResponse
if err := json.NewDecoder(resp.Body).Decode(&identityRespPayload); err != nil {
return fmt.Errorf("failed to decode identity resp payload: %v", err)
}

Bug: resp.Body is never closed. This leaks an HTTP connection. Add defer resp.Body.Close() after the nil-error check.

Suggested change
resp, err := http.Get(beaconApi)
if err != nil {
return fmt.Errorf("failed to get beacon node identity: %v", err)
}
defer resp.Body.Close()
var identityRespPayload identityResponse
if err := json.NewDecoder(resp.Body).Decode(&identityRespPayload); err != nil {
return fmt.Errorf("failed to decode identity resp payload: %v", err)
}

Comment on lines +159 to +167
respData, err := postRequest(rethApi, "/", map[string]interface{}{
"jsonrpc": "2.0",
"method": "admin_nodeInfo",
"id": 1,
})
var enodeRespPayload enodeResponse
if err := json.Unmarshal(respData, &enodeRespPayload); err != nil {
return fmt.Errorf("failed to decode enode resp payload: %v", err)
}

Bug: The error from postRequest is silently ignored. If the admin_nodeInfo call fails, respData could be nil, causing json.Unmarshal to fail with a confusing error instead of the real HTTP error.

Suggested change
respData, err := postRequest(rethApi, "/", map[string]interface{}{
"jsonrpc": "2.0",
"method": "admin_nodeInfo",
"id": 1,
})
if err != nil {
return fmt.Errorf("failed to get reth admin_nodeInfo: %v", err)
}
var enodeRespPayload enodeResponse
if err := json.Unmarshal(respData, &enodeRespPayload); err != nil {
return fmt.Errorf("failed to decode enode resp payload: %v", err)
}

Comment on lines 551 to 565
@@ -558,7 +561,8 @@ func (l *LighthouseBeaconNode) Apply(ctx *ExContext) *Component {
 WithArtifact("/data/jwtsecret", "jwtsecret").
 WithVolume("data", "/data_beacon")

-UseHealthmon(component, svc, healthmonBeacon)
+// TODO: Enable later - doesn't work with --target-peers=1 which is required for builder VM
+//UseHealthmon(component, svc, healthmonBeacon)

Changing --target-peers from 0 to 1 and disabling the beacon healthmon (with the TODO comments at lines 564-565 and 642-643) is a significant regression for existing users who don't use the VM feature.

With --target-peers 1, lighthouse will wait for a peer before reporting healthy, which breaks single-node L1 and non-VM BuilderNet deployments. Disabling UseHealthmon for the beacon node means the DependsOnHealthy("beacon") calls (e.g., from contender) lose their safety guarantees.

This was also flagged in the existing review discussion. Consider making these changes conditional on a flag (e.g., --with-vm) rather than applying them unconditionally to all recipes.
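One way to read the suggestion above is sketched below. It is only an illustration: beaconOptions, WithVM, targetPeers, useHealthmon and beaconArgs are hypothetical names, and the playground's real types (LighthouseBeaconNode, ExContext, UseHealthmon) are not reproduced here.

```go
package main

import (
	"fmt"
	"strconv"
)

// beaconOptions is a hypothetical settings struct. WithVM stands in for a
// flag such as --with-vm, which the review only proposes as an example.
type beaconOptions struct {
	WithVM bool
}

// targetPeers returns the value for lighthouse's --target-peers argument:
// 1 only when the builder VM is expected to connect, 0 otherwise.
func targetPeers(opts beaconOptions) int {
	if opts.WithVM {
		return 1
	}
	return 0
}

// useHealthmon reports whether the beacon health monitor should stay
// attached. It is only skipped in VM mode, where the PR found it
// incompatible with --target-peers=1.
func useHealthmon(opts beaconOptions) bool {
	return !opts.WithVM
}

// beaconArgs builds just the argument pair discussed in the review.
func beaconArgs(opts beaconOptions) []string {
	return []string{"--target-peers", strconv.Itoa(targetPeers(opts))}
}

func main() {
	for _, opts := range []beaconOptions{{WithVM: false}, {WithVM: true}} {
		fmt.Printf("withVM=%v args=%v healthmon=%v\n",
			opts.WithVM, beaconArgs(opts), useHealthmon(opts))
	}
}
```

With this shape, existing single-node deployments would keep --target-peers 0 and the health check, and only VM-enabled runs would opt into the new behavior.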

flag "github.com/spf13/pflag"
)

const BuilderHostIPAddress = "10.0.2.2"

The IP 10.0.2.2 is the QEMU user-mode networking host gateway address. This only works with QEMU's user-mode NAT (-nic user). If this constant is used in the builder config that gets registered into builder-hub (e.g., builderhub-config.yaml line 44-50 use it for relay URLs), it means BuilderNet deployments without a VM will have broken relay URLs in the config. Consider making this configurable or only injecting it when VM mode is active.

Comment on lines +1024 to +1025
var config builderHubConfig
if err := yaml.Unmarshal([]byte(configYaml), &config); err != nil {

The []byte(configYaml) cast is redundant but harmless: configYaml is already a []byte, whether it comes from the embedded defaultBuilderHubConfig (via the //go:embed directive) or from a custom file read with os.ReadFile.

More importantly: there is no validation that the parsed config.Playground.BuilderHubConfig fields are non-empty before using them. If someone provides a YAML file that doesn't include the playground.builder_hub_config section, all fields will silently be empty strings, which will create a builder with empty ID, empty IP, etc.
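A minimal sketch of such a check follows. The struct and its field names (BuilderID, BuilderIP, Network) are assumptions based on the "empty ID/IP/network" wording in this review, not the PR's actual builderHubConfig type, and validateBuilderHubSettings is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// builderHubSettings is a hypothetical stand-in for the parsed
// playground.builder_hub_config section; the real field names may differ.
type builderHubSettings struct {
	BuilderID string
	BuilderIP string
	Network   string
}

// validateBuilderHubSettings returns an error naming every required field
// that is empty, so a missing YAML section fails loudly instead of
// registering a builder with blank values.
func validateBuilderHubSettings(s builderHubSettings) error {
	var missing []string
	if strings.TrimSpace(s.BuilderID) == "" {
		missing = append(missing, "builder_id")
	}
	if strings.TrimSpace(s.BuilderIP) == "" {
		missing = append(missing, "builder_ip")
	}
	if strings.TrimSpace(s.Network) == "" {
		missing = append(missing, "network")
	}
	if len(missing) > 0 {
		return fmt.Errorf("playground.builder_hub_config is missing required fields: %s",
			strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	// A zero-value struct models a YAML file without the section.
	fmt.Println(validateBuilderHubSettings(builderHubSettings{}))

	ok := builderHubSettings{BuilderID: "builder-1", BuilderIP: "10.0.2.15", Network: "playground"}
	fmt.Println(validateBuilderHubSettings(ok))
}
```

Calling a check like this right after yaml.Unmarshal would turn the silent empty registration into an early, descriptive error.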

endpoint := fmt.Sprintf("http://localhost:%d", apiPort.HostPort)

-err := registerBuilder(endpoint, &builderHubRegisterBuilderInput{
+err := registerBuilder(endpoint, "", "", &builderHubRegisterBuilderInput{

Passing empty strings for beaconApi and rethApi will cause registerBuilder to make HTTP requests to URLs with no host (/eth/v1/node/identity and /), which will fail. The test passed before only because registerBuilder went straight to the measurements endpoint and did not need the beacon/reth APIs.

Now that registerBuilder fetches beacon identity and reth enode as its first operations, this test will fail at the http.Get(beaconApi) call. The test needs to either:

  1. Set up mock HTTP servers for those endpoints, or
  2. Skip the beacon/reth discovery when those URLs are empty
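The first option could look roughly like the sketch below, built on the standard net/http/httptest package. The discover function is a hypothetical stand-in for the discovery step inside registerBuilder (whose real signature and URL handling are not shown in this thread), and the peer ID and enode values are made-up placeholders.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newFakeBeacon serves a canned reply on the beacon API identity path.
func newFakeBeacon(peerID string) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/eth/v1/node/identity", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(map[string]interface{}{
			"data": map[string]interface{}{"peer_id": peerID},
		})
	})
	return httptest.NewServer(mux)
}

// newFakeReth answers every request with a canned admin_nodeInfo result.
func newFakeReth(enode string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(map[string]interface{}{
			"jsonrpc": "2.0",
			"id":      1,
			"result":  map[string]interface{}{"enode": enode},
		})
	}))
}

// decodeResponse closes the body and decodes a 200 response into out.
func decodeResponse(resp *http.Response, out interface{}) error {
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return json.NewDecoder(resp.Body).Decode(out)
}

// discover is a hypothetical stand-in for the beacon/reth lookups that
// registerBuilder now performs first.
func discover(beaconAPI, rethAPI string) (string, string, error) {
	var identity struct {
		Data struct {
			PeerID string `json:"peer_id"`
		} `json:"data"`
	}
	resp, err := http.Get(beaconAPI + "/eth/v1/node/identity")
	if err != nil {
		return "", "", fmt.Errorf("beacon identity: %w", err)
	}
	if err := decodeResponse(resp, &identity); err != nil {
		return "", "", fmt.Errorf("beacon identity: %w", err)
	}

	var info struct {
		Result struct {
			Enode string `json:"enode"`
		} `json:"result"`
	}
	body, err := json.Marshal(map[string]interface{}{
		"jsonrpc": "2.0", "method": "admin_nodeInfo", "id": 1,
	})
	if err != nil {
		return "", "", err
	}
	resp, err = http.Post(rethAPI+"/", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", "", fmt.Errorf("reth admin_nodeInfo: %w", err)
	}
	if err := decodeResponse(resp, &info); err != nil {
		return "", "", fmt.Errorf("reth admin_nodeInfo: %w", err)
	}
	return identity.Data.PeerID, info.Result.Enode, nil
}

func main() {
	beacon := newFakeBeacon("16Uiu2HAmExamplePeer")
	defer beacon.Close()
	reth := newFakeReth("enode://abc@127.0.0.1:30303")
	defer reth.Close()

	peerID, enode, err := discover(beacon.URL, reth.URL)
	fmt.Println(peerID, enode, err)

	// Empty URLs, as in the current test, fail before any server is reached.
	_, _, err = discover("", "")
	fmt.Println(err != nil)
}
```

In the real test, the two server URLs would be passed in place of the empty strings, assuming registerBuilder appends the same paths.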


genesisTime := time.Now().Add(time.Duration(b.genesisDelay) * time.Second)
config := params.BeaconConfig()
config.ElectraForkEpoch = 0

This line (config.ElectraForkEpoch = 0) forces the Electra fork to activate at epoch 0 unconditionally, regardless of the applyLatestL1Fork flag. This changes behavior for all recipes, not just BuilderNet. Was this intentional? It should likely be conditional on b.applyLatestL1Fork or removed if Electra is already the default.

return fmt.Errorf("failed to decode identity resp payload: %v", err)
}
peerID := identityRespPayload.Data.PeerID
libP2PAddr := fmt.Sprintf("/ip4/%s/tcp/9001/p2p/%s", BuilderHostIPAddress, peerID)

The CL P2P port 9001 is hardcoded here. The actual lighthouse beacon node uses {{Port "p2p" 9000}} (line 544 of components.go), which defaults to 9000, not 9001. This mismatch means the libp2p address registered in builder-hub will point to the wrong port.

If 9001 is the TCP port specifically for libp2p (distinct from the ENR port), this needs a comment explaining why it differs. Otherwise, this should match the p2p port from the beacon service.
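As a sketch of the second option, the address could be built from a port that the caller takes from the beacon service rather than from a literal. The helper name libp2pAddr and its parameters are hypothetical, and how the p2p port is actually looked up in the playground is not shown here.

```go
package main

import "fmt"

// builderHostIPAddress repeats the value of BuilderHostIPAddress from the PR.
const builderHostIPAddress = "10.0.2.2"

// libp2pAddr formats a libp2p multiaddr from an explicit TCP port, so the
// caller has to pass the port the beacon node really listens on.
func libp2pAddr(hostIP string, p2pPort int, peerID string) (string, error) {
	if p2pPort < 1 || p2pPort > 65535 {
		return "", fmt.Errorf("invalid p2p port %d", p2pPort)
	}
	if peerID == "" {
		return "", fmt.Errorf("empty peer ID")
	}
	return fmt.Sprintf("/ip4/%s/tcp/%d/p2p/%s", hostIP, p2pPort, peerID), nil
}

func main() {
	// 9000 is the default p2p port cited in the review; the peer ID is a
	// placeholder.
	addr, err := libp2pAddr(builderHostIPAddress, 9000, "16Uiu2HAmExamplePeer")
	fmt.Println(addr, err)
}
```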

auth_header = ""

disk_encryption:
key: 5d7052c0c3aff5834f45e3f33aca0a55ef9f43ca9cf6c5c8e8375ab82564ddb6 # playground value

This file is embedded via //go:embed into the binary and used as the default builder-hub configuration. It contains:

  • A BLS relay secret key (line 15)
  • A coinbase secret key (line 17)
  • A disk encryption key (line 54)

While these are clearly development/playground values, embedding secret keys in the binary (even for dev) means they'll persist in all builds. Consider adding a comment at the top of this file making it explicit that these are playground-only keys and must never be used in production.

Comment on lines +62 to +83

# TRIED TO DISABLE SERVICES - DID NOT WORK
# error:
# qemu-system-x86_64: -append only allowed with -kernel option

# PLAYGROUND_DISABLE_SERVICES=(
# reth-sync # Downloads Reth snapshot from S3 bucket
# acme-le # Issues Let's Encrypt TLS certificates
# acme-le-renewal # Renews Let's Encrypt certificates
# rbuilder-bidding-downloader # Downloads binary from private GitHub repo
# vector # Observability pipeline (logs/metrics)
# rbuilder-rebalancer # ETH balance rebalancing across wallets
# operator-api # Management API for node operators
# config-watchdog # Watches and reloads rbuilder config
# )

# mask_args() {
# [[ $# -gt 0 ]] && printf "systemd.mask=%s.service " "$@"
# }
# # # add argument to qemu-system-x86_64:
# # \
# # -append "console=ttyS0 $(mask_args "${PLAYGROUND_DISABLE_SERVICES[@]}")"

Large block of dead code (commented-out service masking). This should either be implemented or removed. If it's a known limitation worth tracking, file an issue and reference it instead of leaving 20+ lines of commented-out code in a new file.


SSH_PORT=2222

ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -p ${SSH_PORT} bnet@localhost

Security: -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null suppresses all host key verification. This is acceptable for ephemeral dev VMs, but consider adding a comment explaining this is intentional for playground use only, so it doesn't get cargo-culted into production scripts.

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

FLASHBOTS_IMAGES_DIR="${SCRIPT_DIR}/.flashbots-images"
FLASHBOTS_IMAGES_BRANCH="fryd/mkosi-playground"

This hardcodes the branch fryd/mkosi-playground. Once the flashbots-images work is merged to main, this will need updating. Consider making it a variable that defaults to main with an override, or at least add a TODO comment.

})
}

component.AddComponent(ctx, &Fileserver{})

The Fileserver is now unconditionally added to every L1 recipe deployment. This means even users who never use VMs will have a Caddy container running and serving genesis/testnet files on port 8100. This adds resource overhead and exposes configuration files on the network.

Consider making this conditional on a flag (e.g., --with-fileserver or --with-vm), or at minimum documenting that this is a new service users can expect.

@canercidam canercidam marked this pull request as ready for review February 11, 2026 22:44
@canercidam canercidam self-requested a review as a code owner February 11, 2026 22:44

svcManager.RunContenderIfEnabled()

svcManager.AddService(&Fileserver{})
Member

Shall we move this to BuilderNetRecipe.Apply()? I guess we don't need it for the L1 recipe.

@canercidam canercidam merged commit 472899d into main Feb 11, 2026
56 of 59 checks passed
@canercidam canercidam deleted the fryd/mkosi-playground branch February 11, 2026 22:49
@canercidam canercidam restored the fryd/mkosi-playground branch February 11, 2026 23:20
canercidam added a commit that referenced this pull request Feb 11, 2026
canercidam added a commit that referenced this pull request Feb 11, 2026
canercidam added a commit that referenced this pull request Feb 11, 2026
This PR tries to unify the BuilderNet recipe by basing itself on Fryd's
PR (#302). It may contain changes from PRs waiting to be merged and
will shrink in size as they are merged.

---------

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>