Migrate S3 service to RustFS supervised as a native binary #60

Merged
munezaclovis merged 21 commits into main from feat/rustfs-native-binary on Apr 14, 2026
@munezaclovis
Contributor

Summary

Replaces the Docker-backed S3 service with RustFS supervised as a native binary by the pv daemon, and establishes the BinaryService + supervisor infrastructure that future services (mail, etc.) will reuse.

Implements the plan at docs/superpowers/plans/2026-04-14-rustfs-native-binary.md against the spec at docs/superpowers/specs/2026-04-14-rustfs-native-binary-design.md.

Architecture

  • New BinaryService interface (internal/services/binary.go) parallel to the existing Docker Service interface. RustFS registers itself as "s3" in binaryRegistry.
  • New supervisor package (internal/supervisor/) spawns/watches/restarts child processes. Crash-recovery budget: 5 restarts per 60s window. SIGTERM → SIGKILL on Stop.
  • ServerManager.Reconcile() gains a binary-service phase that diffs registry state against supervisor state. Service commands write to the registry and call server.SignalDaemon() — the daemon reconciles in place. No new IPC beyond the existing SIGHUP channel.
  • daemon-status.json is written by Reconcile and read by service:status / service:list so the CLI can report PID and running state without talking to the daemon directly.
  • Colima boot is no longer triggered when the registry contains only binary services.

Spec corrections verified during Task 1

Against rustfs 1.0.0-alpha.93:

  • Env vars are RUSTFS_ACCESS_KEY / RUSTFS_SECRET_KEY (the earlier RUSTFS_ROOT_USER / RUSTFS_ROOT_PASSWORD names are invalid).
  • --console-enable flag is required; without it port 9001 stays closed.
  • Linux assets carry a -gnu / -musl variant suffix (pv ships -gnu).
  • /releases/latest returns 404 because every release is marked prerelease; FetchLatestVersion for rustfs uses /releases?per_page=1 and picks [0].tag_name.

Spec amended in commit 38d35fe.

Key commits

  • 38d35fe Amend rustfs spec with verified CLI / asset facts
  • 1b2391e Add Rustfs binary descriptor
  • 839b01f Add ExtractZip + installRustfs
  • c614e9f Add Kind and Enabled fields to registry.ServiceInstance
  • 6632b4d Add BinaryService interface
  • dbd5efb Replace Docker S3 with RustFS BinaryService
  • e08cb6e Add supervisor package
  • 3c26550 Add supervisor-process builder + daemon-status.json
  • e8509f0 Extend ServerManager.Reconcile with binary-service phase
  • 60a041d Wire supervisor into daemon start + filter Colima boot
  • 72e40a8 Add service command kind dispatcher
  • 949f84c, 927893f, 86d226a, 44364db, f96bca9 Service CLI binary paths
  • 271cc29 Refresh binary-service binaries in pv update
  • 2ce7853 Add E2E phase for S3 binary service lifecycle
  • 3f5710b Fix supervisor Stop/watch race on cmd.Wait

Test plan

  • go test ./... -count=1 — full suite passes
  • go test ./internal/supervisor/ -race -count=1 — all supervisor tests clean under the race detector (fixed a cmd.Wait race in Stop + watch during final review)
  • go vet ./... clean
  • gofmt clean (the one outstanding file, internal/setup/shell.go, is pre-existing on main and not touched here)
  • E2E phase scripts/e2e/s3-binary.sh will run on CI (macos-26); it exercises service:add → service:stop → service:start → service:destroy against a real RustFS download.

Known follow-ups (not blockers)

  • cmd/update.go "updated binaries" hint names only binaryUpdated[0] as the stop/start example; if two binaries update it is misleading but harmless.
  • service:start / service:stop "act on all services" paths skip binary-kind entries correctly, but do not call SignalDaemon after the docker loop. In practice a binary-only reconcile can be forced by naming the service. Worth tightening when Mailpit lands.
  • TestDNSServer_Shutdown trips the race detector — pre-existing on main; out of scope here.

Verified 2026-04-14 against rustfs 1.0.0-alpha.93:
- Correct env vars are RUSTFS_ACCESS_KEY / RUSTFS_SECRET_KEY,
  not RUSTFS_ROOT_USER / RUSTFS_ROOT_PASSWORD.
- --console-enable is required to bind port 9001.
- Linux assets carry a -gnu / -musl variant suffix; pv uses -gnu.
- /releases/latest returns 404 because RustFS marks every release
  as a prerelease, so FetchLatestVersion for rustfs uses
  /releases?per_page=1 and parses [0].tag_name.

Mirror the Mago/Composer binary-descriptor pattern for RustFS.
Download URL + latest-version URL wired through manager.go.
Rustfs is not added to Tools() because it is a backing service,
not a user-facing CLI tool.

LatestVersionURL points at /releases?per_page=1 because RustFS
marks every release as a prerelease, making /releases/latest
return 404. FetchLatestVersion gains a rustfs branch that parses
the array response and picks [0].tag_name.
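The array-parsing branch described above could look roughly like the following. This is a minimal sketch, not the code in manager.go: the function name latestTagFromList and the release struct are assumptions, and the HTTP request itself is left out so the example stays self-contained.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// release models only the field needed from the GitHub releases list.
type release struct {
	TagName string `json:"tag_name"`
}

// latestTagFromList (hypothetical name) parses the JSON array returned by
// /releases?per_page=1 and returns the tag_name of the first entry.
func latestTagFromList(body []byte) (string, error) {
	var releases []release
	if err := json.Unmarshal(body, &releases); err != nil {
		return "", fmt.Errorf("decode releases: %w", err)
	}
	if len(releases) == 0 || releases[0].TagName == "" {
		return "", errors.New("no releases in response")
	}
	return releases[0].TagName, nil
}

func main() {
	// Sample payload shaped like a one-element releases list.
	body := []byte(`[{"tag_name":"1.0.0-alpha.93","prerelease":true}]`)
	tag, err := latestTagFromList(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(tag)
}
```

An empty array is treated as an error here rather than as "no update", so a misconfigured URL does not silently look like an up-to-date binary.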

Linux assets carry a -gnu / -musl variant suffix; pv uses the
-gnu build for broadest compatibility.

ExtractZip mirrors ExtractTarGz for the zip format used by RustFS
releases. installRustfs follows the installMago pattern: download
the .zip, extract the binary, delete the archive, chmod 0755.
Wired into InstallBinary via a new rustfs case.
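
For illustration, pulling a single binary out of a zip archive with the standard library might be sketched as below. The helper name extractFileFromZip and its signature are assumptions, not the actual ExtractZip API; the demo builds an in-memory archive instead of downloading one.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"os"
	"path"
	"path/filepath"
)

// extractFileFromZip (hypothetical helper) copies the first regular entry
// whose base name equals name to destPath and marks it executable.
func extractFileFromZip(zr *zip.Reader, name, destPath string) error {
	for _, f := range zr.File {
		// Zip entry names always use forward slashes, hence path.Base.
		if f.FileInfo().IsDir() || path.Base(f.Name) != name {
			continue
		}
		src, err := f.Open()
		if err != nil {
			return err
		}
		defer src.Close()

		dst, err := os.OpenFile(destPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o755)
		if err != nil {
			return err
		}
		if _, err := io.Copy(dst, src); err != nil {
			dst.Close()
			return err
		}
		if err := dst.Close(); err != nil {
			return err
		}
		// OpenFile's mode is filtered by the umask, so set 0755 explicitly.
		return os.Chmod(destPath, 0o755)
	}
	return fmt.Errorf("%s not found in archive", name)
}

// demo builds a tiny archive in memory, extracts it, and returns the content.
func demo() (string, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create("rustfs")
	if err != nil {
		return "", err
	}
	if _, err := w.Write([]byte("fake-binary")); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return "", err
	}
	dir, err := os.MkdirTemp("", "extract")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	dest := filepath.Join(dir, "rustfs")
	if err := extractFileFromZip(zr, "rustfs", dest); err != nil {
		return "", err
	}
	data, err := os.ReadFile(dest)
	return string(data), err
}

func main() {
	content, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(content)
}
```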
Kind is "docker" (default) or "binary". Enabled is only used for
binary services; nil means enabled=true so existing entries written
by older pv versions keep working unchanged. Image field gains
omitempty so binary entries do not emit an empty image string.
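
The backward-compatibility rule (missing Kind means docker, nil Enabled means enabled) can be sketched as follows. The struct is a trimmed stand-in for registry.ServiceInstance: the JSON key names, the omitempty on Kind, and the helper methods are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ServiceInstance is a hypothetical subset of the registry entry.
type ServiceInstance struct {
	Image   string `json:"image,omitempty"`
	Kind    string `json:"kind,omitempty"`
	Enabled *bool  `json:"enabled,omitempty"`
}

// EffectiveKind treats an empty Kind as "docker", the default.
func (s ServiceInstance) EffectiveKind() string {
	if s.Kind == "" {
		return "docker"
	}
	return s.Kind
}

// IsEnabled treats a nil flag as enabled, so entries written before the
// field existed keep working.
func (s ServiceInstance) IsEnabled() bool {
	return s.Enabled == nil || *s.Enabled
}

// parse decodes one registry entry from JSON.
func parse(raw string) (ServiceInstance, error) {
	var s ServiceInstance
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

// encode serializes one registry entry to JSON.
func encode(s ServiceInstance) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

func main() {
	// An entry as an older version might have written it (image name is made up).
	legacy, err := parse(`{"image":"example/image:1"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(legacy.EffectiveKind(), legacy.IsEnabled())

	on := true
	out, err := encode(ServiceInstance{Kind: "binary", Enabled: &on})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // no "image" key is emitted for a binary entry
}
```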
Parallel abstraction to the existing Docker Service interface.
Registry is populated by init() functions in per-service files;
for now it stays empty. LookupBinary / AllBinary match the
existing Service API surface.

RustFS registers itself as "s3" in the binary registry. The old
Docker-backed S3 implementation is removed — there is no longer a
docker path for s3. Available() now returns the union of Docker
and binary service names so CLI help messages stay correct.

Uses verified RUSTFS_ACCESS_KEY / RUSTFS_SECRET_KEY env vars and
--console-enable flag per rustfs 1.0.0-alpha.93 verification.
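
A rough sketch of how a binary service might register itself is shown below. The interface methods, the signatures of LookupBinary and AllBinary, the credential values, and the positional data-directory argument are all assumptions for illustration; only the flag names, the port numbers, and the env var names come from the notes above.

```go
package main

import (
	"fmt"
	"sort"
)

// BinaryService is a hypothetical, reduced version of the interface.
type BinaryService interface {
	Name() string
	Args(dataDir string) []string
	Env() []string
}

// binaryRegistry is filled by init() functions in per-service files.
var binaryRegistry = map[string]BinaryService{}

// LookupBinary returns the service registered under name, if any.
func LookupBinary(name string) (BinaryService, bool) {
	svc, ok := binaryRegistry[name]
	return svc, ok
}

// AllBinary returns the registered names in sorted order.
func AllBinary() []string {
	names := make([]string, 0, len(binaryRegistry))
	for name := range binaryRegistry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

type rustfs struct{}

func (rustfs) Name() string { return "s3" }

// Args uses the flags named in this PR; passing the data directory as a
// trailing positional argument is an assumption of this sketch.
func (rustfs) Args(dataDir string) []string {
	return []string{
		"--address", ":9000",
		"--console-enable",
		"--console-address", ":9001",
		dataDir,
	}
}

// Env uses placeholder credentials; real values would come from config.
func (rustfs) Env() []string {
	return []string{
		"RUSTFS_ACCESS_KEY=example-access",
		"RUSTFS_SECRET_KEY=example-secret",
	}
}

func init() { binaryRegistry["s3"] = rustfs{} }

func main() {
	svc, ok := LookupBinary("s3")
	if !ok {
		panic("s3 not registered")
	}
	fmt.Println(AllBinary(), svc.Args("/tmp/pv-data"))
}
```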
Supervisor spawns binaries, waits for ReadyCheck, restarts on crash
(with a 5-in-60s budget), and stops cleanly with SIGTERM->SIGKILL.
Tests exercise the happy path plus ready-timeout, StopAll, TCP
readiness, and log-file writes using /bin/sh as a stand-in binary.
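
One way to implement a 5-in-60s restart budget is a sliding window of timestamps, sketched below. Whether the supervisor package uses a sliding window or a fixed one is not stated here, so the restartBudget type and its behavior are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// restartBudget (hypothetical type) allows at most max restarts per window.
type restartBudget struct {
	max    int
	window time.Duration
	stamps []time.Time // restarts that were allowed, oldest first
}

// allow drops timestamps older than the window, then records a restart at
// now if the budget still has room. It reports whether the restart may run.
func (b *restartBudget) allow(now time.Time) bool {
	cutoff := now.Add(-b.window)
	kept := b.stamps[:0]
	for _, t := range b.stamps {
		if t.After(cutoff) {
			kept = append(kept, t)
		}
	}
	b.stamps = kept
	if len(b.stamps) >= b.max {
		return false
	}
	b.stamps = append(b.stamps, now)
	return true
}

// simulate feeds crash times (seconds from an arbitrary start) through a
// 5-per-60s budget and returns the decision for each crash.
func simulate(offsets []int) []bool {
	b := &restartBudget{max: 5, window: 60 * time.Second}
	base := time.Date(2026, 4, 14, 0, 0, 0, 0, time.UTC)
	out := make([]bool, 0, len(offsets))
	for _, s := range offsets {
		out = append(out, b.allow(base.Add(time.Duration(s)*time.Second)))
	}
	return out
}

func main() {
	// Six crashes in quick succession, then one after the window has moved on.
	fmt.Println(simulate([]int{0, 1, 2, 3, 4, 5, 61}))
}
```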
buildSupervisorProcess translates a BinaryService into a
supervisor.Process by resolving paths via internal/config and
creating data + log directories. writeDaemonStatus captures the
supervisor snapshot for CLI readers; ReadDaemonStatus rejects
stale files when the recorded PID is dead.
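
The stale-file check can be approximated on Unix-like systems by sending signal 0 to the recorded PID. The sketch below assumes a single top-level pid field in daemon-status.json; the real schema and the names parseDaemonStatus and pidAlive are not taken from the source.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"syscall"
)

// DaemonStatus is a hypothetical, reduced view of daemon-status.json.
type DaemonStatus struct {
	PID int `json:"pid"`
}

var errStale = errors.New("daemon status is stale: recorded PID is not running")

// pidAlive probes the process with signal 0 (Unix only). EPERM means the
// process exists but belongs to another user, so it still counts as alive.
func pidAlive(pid int) bool {
	if pid <= 0 {
		return false
	}
	err := syscall.Kill(pid, 0)
	return err == nil || errors.Is(err, syscall.EPERM)
}

// parseDaemonStatus decodes the file content and rejects it when the
// daemon that wrote it is gone.
func parseDaemonStatus(data []byte) (*DaemonStatus, error) {
	var st DaemonStatus
	if err := json.Unmarshal(data, &st); err != nil {
		return nil, err
	}
	if !pidAlive(st.PID) {
		return nil, errStale
	}
	return &st, nil
}

// statusFor builds a sample status document for the given PID.
func statusFor(pid int) []byte {
	return []byte(fmt.Sprintf(`{"pid":%d}`, pid))
}

// selfPID returns the PID of the current process.
func selfPID() int { return os.Getpid() }

func main() {
	_, err := parseDaemonStatus(statusFor(selfPID()))
	fmt.Println("own PID:", err)
	_, err = parseDaemonStatus(statusFor(0))
	fmt.Println("invalid PID:", err)
}
```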
ServerManager now owns a *supervisor.Supervisor alongside the
FrankenPHP instances. Reconcile runs the existing FrankenPHP phase
and then a new phase that diffs the binary registry against the
supervisor state, starting/stopping processes as needed.
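
The diff step of the binary-service phase could be sketched as a pure function over two sets. The name plan and the map-based inputs are assumptions; the real Reconcile works against the registry and supervisor types.

```go
package main

import (
	"fmt"
	"sort"
)

// plan (hypothetical) compares the desired state from the registry
// (name -> enabled) with the names the supervisor currently runs, and
// returns which processes to start and which to stop, each sorted.
func plan(desired, running map[string]bool) (start, stop []string) {
	for name, enabled := range desired {
		if enabled && !running[name] {
			start = append(start, name)
		}
	}
	for name := range running {
		// A missing or disabled registry entry both read as false here.
		if !desired[name] {
			stop = append(stop, name)
		}
	}
	sort.Strings(start)
	sort.Strings(stop)
	return start, stop
}

func main() {
	desired := map[string]bool{"s3": true, "mail": false}
	running := map[string]bool{"mail": true, "old": true}
	start, stop := plan(desired, running)
	fmt.Println("start:", start, "stop:", stop)
}
```

Keeping the diff free of side effects makes it easy to test without spawning processes; the caller applies the two lists to the supervisor.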

NewServerManager signature changes to accept the supervisor.
Call site in process.go is updated with nil supervisor temporarily;
Task 9 wires in a real supervisor.

server.Start() creates a supervisor and hands it to
NewServerManager. Colima boot is now gated on the existence of
Docker-kind services so a registry with only binary services
does not trigger an unnecessary VM boot.

ServicesToRecover filters out Kind="binary" entries so the
colima-recovery code never tries to "dockerize" a binary
service.
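
The gating logic amounts to filtering the registry by kind. In the sketch below the entry type and the function names are stand-ins, and treating an empty Kind as docker is an assumption that follows the registry default described earlier in this PR.

```go
package main

import "fmt"

// entry is a hypothetical, reduced registry entry.
type entry struct {
	Name string
	Kind string // "" or "docker" for Docker services, "binary" otherwise
}

// isDocker treats a missing Kind as docker, the registry default.
func isDocker(e entry) bool { return e.Kind == "" || e.Kind == "docker" }

// servicesToRecover keeps only Docker-kind entries.
func servicesToRecover(all []entry) []entry {
	var out []entry
	for _, e := range all {
		if isDocker(e) {
			out = append(out, e)
		}
	}
	return out
}

// needsVM reports whether booting the container VM is warranted at all.
func needsVM(all []entry) bool { return len(servicesToRecover(all)) > 0 }

func main() {
	binaryOnly := []entry{{Name: "s3", Kind: "binary"}}
	mixed := append([]entry{{Name: "db"}}, binaryOnly...)
	fmt.Println(needsVM(binaryOnly), needsVM(mixed))
}
```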
resolveKind is the single place that decides whether a given service
name is docker- or binary-backed. It also rejects pre-existing
docker-shaped registry entries for names that are now binary services
so migrations do not silently overwrite state.

New addBinary function downloads the service's binary, registers
it with Kind=binary and Enabled=true, and signals the running
daemon to reconcile. When no daemon is running, the entry is
persisted and will be picked up by the next pv start.

Docker path extracted into addDocker so the RunE closure is now
a thin dispatcher keyed off resolveKind.

Set Enabled on the registry entry and SignalDaemon so the daemon's
reconcile loop spawns or stops the supervised process without
restarting FrankenPHP.

The 'all services' branch of each command now skips binary-kind
entries (the daemon owns their lifecycle via Reconcile).

remove unregisters and deletes the binary but keeps data. destroy
also removes config.ServiceDataDir. Both signal the daemon so the
supervised process is stopped.

service:status shows Kind/Registered/Enabled/Running/PID for
binary services. service:list merges docker and binary rows so
users see a unified view.

Binary services write stdout+stderr to ~/.pv/logs/<binary>.log via
the supervisor. service:logs dumps existing content and follows
appends, exiting on context cancellation.

After the tool loop, iterate registered binary services and
compare installed vs latest upstream version. Newer binaries are
downloaded; user is advised to cycle the service (or pv restart)
to load them since the running process keeps the old binary via
its open file descriptor.

Exercises service:add, service:stop, service:start, service:destroy
against a real RustFS download. Verifies the binary is written,
the daemon-status file advertises the supervised process, and
port 9000 is reachable / silent at the expected moments.

exec.Cmd.Wait is documented as safe to call only once; calling it from
both Stop and the watcher goroutine tripped the race detector on every
run. Each managed process now carries a done channel that the watcher
closes when it returns, and Stop awaits that channel instead of
spawning a second Wait goroutine.
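
The single-owner pattern for cmd.Wait might be sketched like this. It is a simplified illustration under assumed names (managed, watch, stop) that leaves out restart handling and locking; it uses /bin/sh as a stand-in binary, so it runs on Unix-like systems only.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// managed (hypothetical) pairs a child process with a done channel.
type managed struct {
	cmd     *exec.Cmd
	done    chan struct{} // closed by watch once cmd.Wait has returned
	waitErr error         // written before done is closed
}

// start launches the command and hands Wait ownership to the watcher.
func start(name string, args ...string) (*managed, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	m := &managed{cmd: cmd, done: make(chan struct{})}
	go m.watch()
	return m, nil
}

// watch is the only place that calls cmd.Wait.
func (m *managed) watch() {
	m.waitErr = m.cmd.Wait()
	close(m.done)
}

// stop sends SIGTERM, waits on the done channel for up to grace, then
// escalates to SIGKILL. It never calls cmd.Wait itself.
func (m *managed) stop(grace time.Duration) {
	_ = m.cmd.Process.Signal(syscall.SIGTERM)
	select {
	case <-m.done:
	case <-time.After(grace):
		_ = m.cmd.Process.Kill()
		<-m.done
	}
}

// demoStop starts a long sleep, stops it, and reports whether the watcher
// had observed the exit by the time stop returned.
func demoStop() bool {
	m, err := start("/bin/sh", "-c", "exec sleep 30")
	if err != nil {
		return false
	}
	m.stop(2 * time.Second)
	select {
	case <-m.done:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println("stopped cleanly:", demoStop())
}
```

Because closing the channel happens after Wait returns, any number of callers can block on done without touching the exec.Cmd.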

Also absorbs a cosmetic gofmt realignment in internal/server/dns.go
picked up earlier in this branch (no functional change).

Critical:

- Supervisor race: Stop and watch share one exec.Cmd with only one
  cmd.Wait owner at a time, but a concurrent Stop during respawn can
  orphan the newly spawned child. Recheck m.stopped after spawn under
  the lock; SIGKILL newM and bail out if Stop was called while we were
  spawning.
  Added TestSupervisor_RestartsCrashedProcess and
  TestSupervisor_GivesUpAfterBudget, both pass under -race.

- cmd/update.go nil-panic: registry.Load and binaries.LoadVersions can
  return (nil, err); the old code discarded the error and would panic
  on reg.Services[name]. Handle errors explicitly and print a subtle
  warning instead of crashing.

- service:destroy swallowed "not registered": check reg.Services[name]
  up front and return a proper error instead of silently deleting the
  binary and data when the entry never existed.

- Reconcile Phase 2 error swallowed: binary-reconcile failures are now
  folded into the Reconcile return value alongside secondary-instance
  errors so service:add surfaces "RustFS failed to start" instead of
  reporting a false success.

- service:start/stop dishonest messaging: when SignalDaemon fails, the
  old code printed both "Could not signal daemon" and "daemon
  reconciled" on the same path. Now prints an accurate failure with a
  "run pv restart" hint.

- ReadyCheck invariant unenforced: buildReadyFunc silently treated a
  zero-value as "instantly ready". Now returns an error on zero-value
  and on both-set config; error propagates through buildSupervisorProcess.
  Added TestBuildReadyFunc_RejectsZeroValue / RejectsBothSet.
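
The invariant from the last item (exactly one readiness probe must be configured) could be enforced along these lines. The ReadyCheck fields TCPPort and HTTPEndpoint are assumptions; the source only says that a zero value and a both-set value are rejected and that TCP readiness exists.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"strconv"
	"time"
)

// ReadyCheck is a hypothetical config: exactly one field must be set.
type ReadyCheck struct {
	TCPPort      int
	HTTPEndpoint string
}

// buildReadyFunc validates the config and returns a single-shot probe.
func buildReadyFunc(rc ReadyCheck) (func(context.Context) error, error) {
	switch {
	case rc.TCPPort == 0 && rc.HTTPEndpoint == "":
		return nil, errors.New("ready check: neither TCPPort nor HTTPEndpoint is set")
	case rc.TCPPort != 0 && rc.HTTPEndpoint != "":
		return nil, errors.New("ready check: TCPPort and HTTPEndpoint are mutually exclusive")
	case rc.TCPPort != 0:
		addr := net.JoinHostPort("127.0.0.1", strconv.Itoa(rc.TCPPort))
		return func(ctx context.Context) error {
			var d net.Dialer
			conn, err := d.DialContext(ctx, "tcp", addr)
			if err != nil {
				return err
			}
			return conn.Close()
		}, nil
	default:
		return func(ctx context.Context) error {
			req, err := http.NewRequestWithContext(ctx, http.MethodGet, rc.HTTPEndpoint, nil)
			if err != nil {
				return err
			}
			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				return err
			}
			defer resp.Body.Close()
			if resp.StatusCode >= 400 {
				return fmt.Errorf("ready check: status %d", resp.StatusCode)
			}
			return nil
		}, nil
	}
}

// probeLocalListener opens a throwaway listener and probes it over TCP.
func probeLocalListener() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()

	ready, err := buildReadyFunc(ReadyCheck{TCPPort: ln.Addr().(*net.TCPAddr).Port})
	if err != nil {
		return err
	}
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return ready(ctx)
}

func main() {
	_, err := buildReadyFunc(ReadyCheck{})
	fmt.Println("zero value:", err)
	fmt.Println("tcp probe:", probeLocalListener())
}
```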

Important:

- Replaced Python3 dependency in manager_test.go with a compiled Go
  helper at internal/server/testdata/fakebinary/main.go. The helper is
  built once per test run and staged as the fake rustfs binary — no
  more hidden python3 requirement on CI runners or dev machines.

- addBinary now calls updateLinkedProjectsEnvBinary so adding s3 writes
  AWS_* keys into linked Laravel projects' .env files, matching what
  the docker path has always done. Added
  UpdateProjectEnvForBinaryService in internal/laravel/env.go and two
  tests.

Also strengthened TestBuildSupervisorProcess_RustFS to assert
--console-enable, --console-address :9001, and --address :9000 all end
up in the final supervisor.Process Args — catches regressions at the
wrapper layer even if the service's Args() stays correct.
munezaclovis merged commit 41f81a6 into main on Apr 14, 2026
1 check failed
munezaclovis deleted the feat/rustfs-native-binary branch on April 22, 2026, 16:32