Migrate S3 service to RustFS supervised as a native binary#60
Merged
munezaclovis merged 21 commits intomainfrom Apr 14, 2026
Merged
Migrate S3 service to RustFS supervised as a native binary#60munezaclovis merged 21 commits intomainfrom
munezaclovis merged 21 commits intomainfrom
Conversation
Verified 2026-04-14 against rustfs 1.0.0-alpha.93: - Correct env vars are RUSTFS_ACCESS_KEY / RUSTFS_SECRET_KEY, not RUSTFS_ROOT_USER / RUSTFS_ROOT_PASSWORD. - --console-enable is required to bind port 9001. - Linux assets carry a -gnu / -musl variant suffix; pv uses -gnu. - /releases/latest returns 404 because RustFS marks every release as a prerelease, so FetchLatestVersion for rustfs uses /releases?per_page=1 and parses [0].tag_name.
Mirror the Mago/Composer binary-descriptor pattern for RustFS. Download URL + latest-version URL wired through manager.go. Rustfs is not added to Tools() because it is a backing service, not a user-facing CLI tool. LatestVersionURL points at /releases?per_page=1 because RustFS marks every release as a prerelease, making /releases/latest return 404. FetchLatestVersion gains a rustfs branch that parses the array response and picks [0].tag_name. Linux assets carry a -gnu / -musl variant suffix; pv uses the -gnu build for broadest compatibility.
ExtractZip mirrors ExtractTarGz for the zip format used by RustFS releases. installRustfs follows the installMago pattern: download the .zip, extract the binary, delete the archive, chmod 0755. Wired into InstallBinary via a new rustfs case.
Kind is "docker" (default) or "binary". Enabled is only used for binary services; nil means enabled=true so existing entries written by older pv versions keep working unchanged. Image field gains omitempty so binary entries do not emit an empty image string.
Parallel abstraction to the existing Docker Service interface. Registry is populated by init() functions in per-service files; for now it stays empty. LookupBinary / AllBinary match the existing Service API surface.
RustFS registers itself as "s3" in the binary registry. The old Docker-backed S3 implementation is removed — there is no longer a docker path for s3. Available() now returns the union of Docker and binary service names so CLI help messages stay correct. Uses verified RUSTFS_ACCESS_KEY / RUSTFS_SECRET_KEY env vars and --console-enable flag per rustfs 1.0.0-alpha.93 verification.
Supervisor spawns binaries, waits for ReadyCheck, restarts on crash (with a 5-in-60s budget), and stops cleanly with SIGTERM->SIGKILL. Tests exercise the happy path plus ready-timeout, StopAll, TCP readiness, and log-file writes using /bin/sh as a stand-in binary.
buildSupervisorProcess translates a BinaryService into a supervisor.Process by resolving paths via internal/config and creating data + log directories. writeDaemonStatus captures the supervisor snapshot for CLI readers; ReadDaemonStatus rejects stale files when the recorded PID is dead.
ServerManager now owns a *supervisor.Supervisor alongside the FrankenPHP instances. Reconcile runs the existing FrankenPHP phase and then a new phase that diffs the binary registry against the supervisor state, starting/stopping processes as needed. NewServerManager signature changes to accept the supervisor. Call site in process.go is updated with nil supervisor temporarily; Task 9 wires in a real supervisor.
server.Start() creates a supervisor and hands it to NewServerManager. Colima boot is now gated on the existence of Docker-kind services so a registry with only binary services does not trigger an unnecessary VM boot. ServicesToRecover filters out Kind="binary" entries so the colima-recovery code never tries to "dockerize" a binary service.
resolveKind is the single place that decides whether a given service name is docker- or binary-backed. It also rejects pre-existing docker-shaped registry entries for names that are now binary services so migrations do not silently overwrite state.
New addBinary function downloads the service's binary, registers it with Kind=binary and Enabled=true, and signals the running daemon to reconcile. When no daemon is running, the entry is persisted and will be picked up by the next pv start. Docker path extracted into addDocker so the RunE closure is now a thin dispatcher keyed off resolveKind.
Set Enabled on the registry entry and SignalDaemon so the daemon's reconcile loop spawns or stops the supervised process without restarting FrankenPHP. The 'all services' branch of each command now skips binary-kind entries (the daemon owns their lifecycle via Reconcile).
remove unregisters and deletes the binary but keeps data. destroy also removes config.ServiceDataDir. Both signal the daemon so the supervised process is stopped.
service:status shows Kind/Registered/Enabled/Running/PID for binary services. service:list merges docker and binary rows so users see a unified view.
Binary services write stdout+stderr to ~/.pv/logs/<binary>.log via the supervisor. service:logs dumps existing content and follows appends, exiting on context cancellation.
After the tool loop, iterate registered binary services and compare installed vs latest upstream version. Newer binaries are downloaded; user is advised to cycle the service (or pv restart) to load them since the running process keeps the old binary via its open file descriptor.
Exercises service:add, service:stop, service:start, service:destroy against a real RustFS download. Verifies the binary is written, the daemon-status file advertises the supervised process, and port 9000 is reachable / silent at the expected moments.
exec.Cmd.Wait is documented as safe to call only once; calling it from both Stop and the watcher goroutine tripped the race detector on every run. Each managed process now carries a done channel that the watcher closes when it returns, and Stop awaits that channel instead of spawning a second Wait goroutine. Also absorbs a cosmetic gofmt realignment in internal/server/dns.go picked up earlier in this branch (no functional change).
Critical: - Supervisor race: Stop + watch both call cmd.Wait on the same exec.Cmd only one owner at a time, but concurrent Stop during respawn can orphan the newly-spawned child. Recheck m.stopped after spawn under the lock; SIGKILL newM and bail out if Stop was called while we were spawning. Added TestSupervisor_RestartsCrashedProcess and TestSupervisor_GivesUpAfterBudget, both pass under -race. - cmd/update.go nil-panic: registry.Load and binaries.LoadVersions can return (nil, err); the old code discarded the error and would panic on reg.Services[name]. Handle errors explicitly and print a subtle warning instead of crashing. - service:destroy swallowed "not registered": check reg.Services[name] up front and return a proper error instead of silently deleting the binary and data when the entry never existed. - Reconcile Phase 2 error swallowed: binary-reconcile failures are now folded into the Reconcile return value alongside secondary-instance errors so service:add surfaces "RustFS failed to start" instead of reporting a false success. - service:start/stop dishonest messaging: when SignalDaemon fails, the old code printed both "Could not signal daemon" and "daemon reconciled" on the same path. Now prints an accurate failure with a "run pv restart" hint. - ReadyCheck invariant unenforced: buildReadyFunc silently treated a zero-value as "instantly ready". Now returns an error on zero-value and on both-set config; error propagates through buildSupervisorProcess. Added TestBuildReadyFunc_RejectsZeroValue / RejectsBothSet. Important: - Replaced Python3 dependency in manager_test.go with a compiled Go helper at internal/server/testdata/fakebinary/main.go. The helper is built once per test run and staged as the fake rustfs binary — no more hidden python3 requirement on CI runners or dev machines. - addBinary now calls updateLinkedProjectsEnvBinary so adding s3 writes AWS_* keys into linked Laravel projects' .env files, matching what the docker path has always done. Added UpdateProjectEnvForBinaryService in internal/laravel/env.go and two tests. Also strengthened TestBuildSupervisorProcess_RustFS to assert --console-enable, --console-address :9001, and --address :9000 all end up in the final supervisor.Process Args — catches regressions at the wrapper layer even if the service's Args() stays correct.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replaces the Docker-backed S3 service with RustFS supervised as a native binary by the pv daemon, and establishes the
BinaryService+supervisorinfrastructure that future services (mail, etc.) will reuse.Implements the plan at
docs/superpowers/plans/2026-04-14-rustfs-native-binary.mdagainst the spec atdocs/superpowers/specs/2026-04-14-rustfs-native-binary-design.md.Architecture
BinaryServiceinterface (internal/services/binary.go) parallel to the existing DockerServiceinterface.RustFSregisters itself as"s3"inbinaryRegistry.supervisorpackage (internal/supervisor/) spawns/watches/restarts child processes. Crash-recovery budget: 5 restarts per 60s window. SIGTERM → SIGKILL on Stop.ServerManager.Reconcile()gains a binary-service phase that diffs registry state against supervisor state. Service commands write to the registry and callserver.SignalDaemon()— the daemon reconciles in place. No new IPC beyond the existing SIGHUP channel.daemon-status.jsonis written byReconcileand read byservice:status/service:listso the CLI can report PID and running state without talking to the daemon directly.Spec corrections verified during Task 1
Against
rustfs 1.0.0-alpha.93:RUSTFS_ACCESS_KEY/RUSTFS_SECRET_KEY(the earlierRUSTFS_ROOT_USER/RUSTFS_ROOT_PASSWORDnames are invalid).--console-enableflag is required; without it port 9001 stays closed.-gnu/-muslvariant suffix (pv ships-gnu)./releases/latestreturns 404 because every release is marked prerelease;FetchLatestVersionfor rustfs uses/releases?per_page=1and picks[0].tag_name.Spec amended in commit
38d35fe.Key commits
38d35feAmend rustfs spec with verified CLI / asset facts1b2391eAdd Rustfs binary descriptor839b01fAdd ExtractZip + installRustfsc614e9fAdd Kind and Enabled fields to registry.ServiceInstance6632b4dAdd BinaryService interfacedbd5efbReplace Docker S3 with RustFS BinaryServicee08cb6eAdd supervisor package3c26550Add supervisor-process builder + daemon-status.jsone8509f0Extend ServerManager.Reconcile with binary-service phase60a041dWire supervisor into daemon start + filter Colima boot72e40a8Add service command kind dispatcher949f84c,927893f,86d226a,44364db,f96bca9Service CLI binary paths271cc29Refresh binary-service binaries in pv update2ce7853Add E2E phase for S3 binary service lifecycle3f5710bFix supervisor Stop/watch race on cmd.WaitTest plan
go test ./... -count=1— full suite passesgo test ./internal/supervisor/ -race -count=1— all supervisor tests clean under the race detector (fixed acmd.Waitrace inStop+watchduring final review)go vet ./...cleangofmtclean (the one outstanding file,internal/setup/shell.go, is pre-existing on main and not touched here)scripts/e2e/s3-binary.shwill run on CI (macos-26) — exercisesservice:add→service:stop→service:start→service:destroyagainst a real RustFS download.Known follow-ups (not blockers)
cmd/update.go"updated binaries" hint names onlybinaryUpdated[0]as the stop/start example; if two binaries update it is misleading but harmless.service:start/service:stop"act on all services" paths skip binary-kind entries correctly, but do not callSignalDaemonafter the docker loop. In practice a binary-only reconcile can be forced by naming the service. Worth tightening when Mailpit lands.TestDNSServer_Shutdowntrips the race detector — pre-existing on main; out of scope here.