Skip to content

feat(agent): Improve nginx vhost generation, CLI help, and service discovery#160

Merged
nfebe merged 8 commits into
mainfrom
feat/albacore-quick-wins
Jun 28, 2026
Merged

feat(agent): Improve nginx vhost generation, CLI help, and service discovery#160
nfebe merged 8 commits into
mainfrom
feat/albacore-quick-wins

Conversation

@nfebe

@nfebe nfebe commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

The simplest, self-contained items from the Albacore umbrella (#158), one commit each.

Fixes

CLI (#130)

cmd/agent now uses cobra. Top-level help lists every subcommand (serve, setup, update, version), which the hand-rolled flag usage never did. A bare flatrun-agent prints that help instead of trying to start the daemon and failing on a missing default config. Passing --config (or the new serve command) still starts the agent, so the systemd launch (flatrun-agent --config ...) is unchanged and the installer needs no update. cobra was already in the module graph, so this adds no new external dependency.

nginx vhost generation (#156)

  • WebSocket proxy read/send timeout is configurable per domain (proxy_timeout), defaulting to 60s, so long-lived sockets aren't closed after 60s idle.
  • Connection: upgrade is sent only when a client requests an upgrade, via a managed map snippet, instead of unconditionally on every request. The snippet is written into the conf.d include dir so existing installs pick it up without re-applying the base config.
  • ssl_stapling is emitted only when the certificate advertises an OCSP responder, removing the per-reload warning on certs without one. It stays enabled when the cert can't be read.
  • Proxy setup probes each target through the same path nginx resolves and surfaces a non-fatal warning when a service or port isn't reachable, instead of leaving a silently dead route. The probe stays silent when it genuinely can't check (no probe tool, container down).

Closes #110, #111, #112, #130, #156.

nfebe added 3 commits June 28, 2026 13:08
Compose updates synced service metadata to disk but discarded any error
from the save, so the compose file could update while service metadata
silently went stale, leaving the two inconsistent. The save error is now
surfaced and the update fails instead of diverging.
In a multi-service deployment the primary service was chosen by Go map
iteration order, which is random, so the selected service could vary
between runs and pick an unintended one. A service named app or web that
exposes ports is now preferred, falling back to the previous first-with-
ports behaviour, making the choice deterministic.
ACME challenge requests, especially 404s for challenges already cleaned
up, flooded the nginx access and error logs and buried real errors.
Logging is now turned off on the challenge locations so the noise no
longer obscures genuine problems.
@sourceant

sourceant Bot commented Jun 28, 2026

Copy link
Copy Markdown

Code Review Summary

This PR improves the agent's robustness and CLI usability. Key changes include a move to Cobra, deterministic service discovery, and much-improved Nginx vhost generation (WebSocket support, OCSP stapling logic, and service probing).

🚀 Key Improvements

  • Deterministic primary service detection in internal/docker/discovery.go removes randomness in multi-service deployments.
  • Conditional Connection: upgrade header via Nginx maps prevents breaking non-WebSocket traffic that might be sensitive to the Connection header.
  • OCSP stapling is now dynamically enabled only when supported by the cert, silencing unnecessary Nginx warnings.

💡 Minor Suggestions

  • Standardize directory permissions constants.
  • Improve error handling in the discovery loop to avoid total sync failure on single-service metadata issues.

@sourceant sourceant Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review complete. See the overview comment for a summary.

existing.Networking.Service = newMeta.Networking.Service
}
d.SaveMetadata(name, existing)
if err := d.SaveMetadata(name, existing); err != nil {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Propagating this error is a good safety improvement. However, since this typically runs inside a discovery loop, consider if a failure to save one service's metadata should crash the entire sync or just be logged as a warning to allow other services to be discovered.

Suggested change
if err := d.SaveMetadata(name, existing); err != nil {
if err := d.SaveMetadata(name, existing); err != nil {
log.Printf("error: failed to save metadata for %s: %v", name, err)
continue
}

Comment thread internal/nginx/manager.go
return false, false
}

probe := fmt.Sprintf("nc -z -w2 %s %d", service, port)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using nc -z is efficient for probing. However, if the nginx container is heavily hardened (e.g., using a distroless or minimal alpine image), nc might not be present. While interpretProbe handles the missing tool, you might consider using /dev/tcp if the shell is bash for broader compatibility in some environments.

Suggested change
probe := fmt.Sprintf("nc -z -w2 %s %d", service, port)
probe := fmt.Sprintf("timeout 2 sh -c 'cat < /dev/null > /dev/tcp/%s/%d'", service, port)

nfebe added 5 commits June 28, 2026 14:38
Top-level help only showed the root flags, so the update, setup, and
version subcommands were undiscoverable, and a bare invocation tried to
start the server and failed on a missing default config. The CLI now uses
cobra: help lists every subcommand, a bare invocation prints that help
instead of failing, and passing a config (or the serve command) still
starts the agent. The single-dash -config and -version forms keep working,
so existing launch commands are unaffected.
Generated vhosts pinned the proxy read and send timeout at 60s, which is
too short for long-lived WebSocket connections: an idle socket with no
frames for 60s was closed, causing reconnect loops on quiet connections.
A domain can now set its own timeout (defaulting to 60s), so WebSocket
routes can hold connections open.
Every generated location sent Connection: upgrade on every request, even
on vhosts proxying plain HTTP, which disabled upstream keep-alive and
sent a stray header. The header is now driven by a managed map so it is
only sent when a client actually requests a WebSocket upgrade; ordinary
HTTP requests keep their connection reusable.
ssl_stapling was emitted for every SSL vhost, so certificates without an
OCSP responder logged a warning on every config test and reload, burying
real errors in the output. Stapling is now enabled only when the domain's
certificate actually advertises a responder, and stays on when the cert
cannot be read so behaviour is unchanged on error.
A domain could be mapped to a service or port where nothing listens, and
the vhost was generated and reloaded with no signal, leaving a silently
dead route. Proxy setup now probes each target through the same path the
proxy resolves and surfaces a warning when it is unreachable, without
failing the setup or warning when the target simply can't be probed.
@nfebe nfebe force-pushed the feat/albacore-quick-wins branch from 9ba3ffa to b05710c Compare June 28, 2026 13:39

@sourceant sourceant Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review complete. No specific code suggestions were generated. See the overview comment for a summary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

enh: Propagate SaveMetadata errors during compose updates

1 participant