Skip to content

Implement RFD-72: Ephemeral deployments#745

Merged
evanphx merged 9 commits into
mainfrom
mir-992-implement-rfd-72
Apr 24, 2026
Merged

Implement RFD-72: Ephemeral deployments#745
evanphx merged 9 commits into
mainfrom
mir-992-implement-rfd-72

Conversation

@evanphx
Copy link
Copy Markdown
Contributor

@evanphx evanphx commented Apr 9, 2026

Summary

Implements RFD-72: Ephemeral Deployments — app versions tagged with a label and TTL that are accessed via subdomain routing without affecting the active version.

  • miren deploy --ephemeral <label> --ttl <duration> creates an ephemeral version accessible at <label>.app.example.com
  • miren app versions lists all versions with status (active/ephemeral/replaced), supports --ephemeral filter and --format json
  • Ephemeral subdomains work with normal routes — no wildcard route entity required (just wildcard DNS)
  • Deploying with an existing label replaces the old version; per-app limit of 10 with nearest-to-expiry eviction
  • GC controller automatically cleans up expired versions and their sandbox pools every 5 minutes
  • Activator needs zero changes — ingress resolves the ephemeral AppVersion and passes it directly to AcquireLease

Key design decisions

  • Ephemeral deploys skip deployment record creation and locking entirely — they're independent of the normal deployment lifecycle
  • Ingress detects ephemeral subdomains by stripping the first DNS label and matching the remainder against known routes (lookupEphemeralRoute)
  • Server-side label validation ensures only DNS-safe labels are stored, even from direct API callers
  • getAccessInfo resolves ephemeral hostnames server-side so the CLI displays the correct URL

Test plan

  • Unit tests: label normalization (10 cases), validation, manage functions (lookup, replace, enforce limit, delete expired), GC controller (5 tests), deployment server ephemeral path (3 tests), subdomain extraction (7 cases)
  • Blackbox test: full e2e — deploy app, deploy ephemeral, verify active version unchanged, verify app versions output, verify HTTP activation via subdomain, verify replace-on-same-label
  • Lint clean, all existing tests pass
  • CI

@evanphx evanphx requested a review from a team as a code owner April 9, 2026 20:23
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

RPCs for build and deploy now accept ephemeral_label and ephemeral_ttl. The AppVersion schema and model gained EphemeralLabel, EphemeralTtl, and EphemeralExpiresAt. A new pkg/ephemeral provides label normalization, lookup, replacement, limit enforcement, and deletion. A controllers/ephemeral GC removes expired ephemeral versions. Build and deploy servers record ephemeral metadata and skip activation for ephemeral versions. Ingress and access resolution support ephemeral subdomains. The CLI adds miren app versions and deploy flags --ephemeral/--ttl. Tests and documentation were added or updated.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/deployment/rpc.yml (1)

174-212: ⚠️ Potential issue | 🟡 Minor

Update the DeployVersion contract for ephemeral mode.

The method doc and deployment result doc still say this call creates a deployment record and activates the version. With ephemeral_label, that is no longer true, so the generated API contract is wrong for the new path.

Suggested doc fix
-        doc: Deploy an existing app version (used for rollback and redeploy). Server handles everything — creates deployment record, activates version, marks previous deployment.
+        doc: Deploy an existing app version (used for rollback and redeploy). Normal deploys create/update deployment records and activate the version; ephemeral deploys skip activation and deployment-record creation.

@@
-          - name: deployment
-            type: DeploymentInfo
-            doc: The newly created deployment record
+          - name: deployment
+            type: DeploymentInfo
+            doc: Deployment info for the requested version. For ephemeral deploys this may be non-persisted/synthetic because no deployment record is created.
As per coding guidelines, "RPC interfaces defined in schemas are generated into implementations using `pkg/rpc/cmd/rpcgen`."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/deployment/rpc.yml` around lines 174 - 212, The DeployVersion RPC
contract needs its docs updated to reflect ephemeral mode: change the method
description for DeployVersion to state that when ephemeral_label is set the
server creates a transient/ephemeral deployment record and does NOT activate the
version (and uses ephemeral_ttl default "24h" if not provided), and update the
deployment result doc to clarify that the returned deployment may be
ephemeral/non-activated when ephemeral_label is present; keep other parameter
docs (is_rollback, env_vars, ephemeral_ttl) but ensure ephemeral_label and
ephemeral_ttl behavior is explicitly described and that lock_info/access_info
remain accurate for both normal and ephemeral flows.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/commands/deploy.go`:
- Around line 200-212: The current ephemeral branch prefixes ephemeralLabel to
each value from result.AccessInfo().Hostnames(), producing duplicates; instead,
when result.AccessInfo().HasHostnames() use the hostnames exactly as returned
(print "https://<hostname>") and do not prefix ephemeralLabel, and if
Hostnames() is empty but AccessInfo().ClusterHostname() is present, fall back to
constructing the URL as "https://<ephemeralLabel>.<clusterHostname>". Update the
block around result.AccessInfo(), Hostnames(), and ClusterHostname() (where
ephemeralLabel, ephemeralTTL, versionDisplay and ctx.Printf are used) to
implement this behavior.

In `@pkg/ephemeral/manage.go`:
- Around line 130-155: The deleteEphemeralVersion function currently proceeds to
delete the AppVersion even if listing or deleting sandbox pools failed; change
it to abort and return an error when pool cleanup fails. Specifically, in
deleteEphemeralVersion, after calling eac.List(entity.Ref(...)) return a
descriptive error instead of only logging when List fails, and inside the loop
over poolResp.Values() return an error (propagating the underlying err) if any
eac.Delete(poolID) fails rather than continuing; keep the existing logs for
context and only call eac.Delete for the AppVersion and log.Info("Deleted
ephemeral version", ...) when all pool operations succeeded.

In `@servers/build/build.go`:
- Around line 1449-1459: After parsing ephemeral.ttl with time.ParseDuration in
the isEphemeral branch, validate that ttlDuration is strictly positive; if
ttlDuration <= 0 return an error (e.g. "invalid ephemeral TTL %q: must be > 0")
instead of proceeding. Keep the existing ephemeralx.ValidateLabel check and only
set mrv.EphemeralLabel, mrv.EphemeralTtl and mrv.EphemeralExpiresAt after the
TTL passes this positive-duration validation so expired/zero TTLs are rejected
server-side.
- Around line 1461-1469: The calls to ephemeralx.ReplaceExisting and
ephemeralx.EnforceLimit currently only log warnings and allow the deploy to
continue; instead, fail the deploy when either returns an error by
propagating/returning a wrapped error from the surrounding function (do not
swallow it). Locate the ReplaceExisting and EnforceLimit calls (they use
ephemeralx.ReplaceExisting(ctx, b.ec.EAC(), appRec.ID, ephemeral.label, b.Log)
and ephemeralx.EnforceLimit(ctx, b.ec.EAC(), appRec.ID,
ephemeralx.DefaultMaxEphemeral, b.Log)) and change the handling so that on
non-nil err you return fmt.Errorf (or errors.Wrap) with contextual text like
"replace existing ephemeral failed" or "enforce ephemeral limit failed"
including err, rather than calling b.Log.Warn and continuing. Ensure callers
handle the returned error path so the deploy aborts.

In `@servers/deployment/server.go`:
- Around line 821-828: The code currently ignores errors from d.EAC.Get and
proceeds as if the revision was set; modify the block handling the Get in the
deployment flow so that if getErr != nil you log the error (use d.Log.Error with
the error and context including appVersion.ID), call results.SetError with a
descriptive message (e.g., "failed to read current version for ephemeral
update") and return the error/early return instead of continuing; keep the
existing SetRevision/Put logic when Get succeeds (i.e., in the branch where
getErr == nil call updateEntity.SetRevision(current.Entity().Revision()) and
d.EAC.Put and handle putErr as already done).

---

Outside diff comments:
In `@api/deployment/rpc.yml`:
- Around line 174-212: The DeployVersion RPC contract needs its docs updated to
reflect ephemeral mode: change the method description for DeployVersion to state
that when ephemeral_label is set the server creates a transient/ephemeral
deployment record and does NOT activate the version (and uses ephemeral_ttl
default "24h" if not provided), and update the deployment result doc to clarify
that the returned deployment may be ephemeral/non-activated when ephemeral_label
is present; keep other parameter docs (is_rollback, env_vars, ephemeral_ttl) but
ensure ephemeral_label and ephemeral_ttl behavior is explicitly described and
that lock_info/access_info remain accurate for both normal and ephemeral flows.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5c0ac918-207d-42e9-bc9f-a0d509456bda

📥 Commits

Reviewing files that changed from the base of the PR and between f7a0ddf and c2b04f1.

📒 Files selected for processing (24)
  • api/build/build_v1alpha/rpc.gen.go
  • api/build/rpc.yml
  • api/core/core_v1alpha/schema.gen.go
  • api/core/schema.yml
  • api/deployment/deployment_v1alpha/rpc.gen.go
  • api/deployment/rpc.yml
  • api/ingress/client.go
  • api/ingress/client_test.go
  • blackbox/ephemeral_test.go
  • cli/commands/app_rollback.go
  • cli/commands/app_versions.go
  • cli/commands/commands.go
  • cli/commands/deploy.go
  • components/coordinate/coordinate.go
  • controllers/ephemeral/gc.go
  • controllers/ephemeral/gc_test.go
  • pkg/ephemeral/label.go
  • pkg/ephemeral/label_test.go
  • pkg/ephemeral/manage.go
  • pkg/ephemeral/manage_test.go
  • servers/build/build.go
  • servers/deployment/server.go
  • servers/deployment/server_test.go
  • servers/httpingress/httpingress.go

Comment thread cli/commands/deploy.go
Comment thread pkg/ephemeral/manage.go
Comment thread servers/build/build.go
Comment thread servers/build/build.go
Comment thread servers/deployment/server.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
cli/commands/deploy.go (2)

205-215: Possible duplicate URL display when both Hostnames and ClusterHostname are present.

If the server returns both resolved hostnames (already containing the ephemeral label) and a ClusterHostname, the CLI will print both:

  • The resolved hostnames from info.Hostnames() (correct)
  • A fallback URL constructed as ephemeralLabel.ClusterHostname (line 213)

When routes exist, the server-resolved hostnames are authoritative. The ClusterHostname fallback should only display when no hostnames are returned. Consider guarding the ClusterHostname display:

♻️ Proposed fix to avoid redundant URL display
 				if result.HasAccessInfo() && result.AccessInfo() != nil {
 					info := result.AccessInfo()
 					if info.HasHostnames() {
 						for _, h := range *info.Hostnames() {
 							ctx.Printf("  URL:   https://%s\n", h)
 						}
-					}
-					if info.HasClusterHostname() && info.ClusterHostname() != "" {
+					} else if info.HasClusterHostname() && info.ClusterHostname() != "" {
+						// Fallback: construct ephemeral URL from cluster hostname when no routes exist
 						ctx.Printf("  URL:   https://%s.%s\n", ephemeralLabel, info.ClusterHostname())
 					}
 				}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/commands/deploy.go` around lines 205 - 215, The CLI currently prints both
resolved hostnames and a constructed cluster fallback URL; update the logic in
the access-info block (use result.HasAccessInfo(), result.AccessInfo(),
info.HasHostnames(), info.Hostnames(), info.HasClusterHostname(),
info.ClusterHostname(), ephemeralLabel, and ctx.Printf) so that the
ClusterHostname fallback is only printed when there are no hostnames returned by
info.Hostnames() (i.e., if info.HasHostnames() is false or info.Hostnames() is
empty), otherwise skip the ClusterHostname print to avoid duplicate URLs. Ensure
you keep existing nil/Has* checks and only fall back to printing the
ephemeralLabel.ClusterHostname when hostnames are absent.

875-885: Same redundant URL display concern in build path.

This block has the same pattern as lines 205-215 where both resolved hostnames and ClusterHostname-based URLs may be displayed simultaneously. Consider applying the same fix to use ClusterHostname only as a fallback.

♻️ Proposed fix
 		if results.HasAccessInfo() && results.AccessInfo() != nil {
 			info := results.AccessInfo()
 			if info.HasHostnames() {
 				for _, h := range *info.Hostnames() {
 					ctx.Printf("  URL:   https://%s\n", h)
 				}
-			}
-			if info.HasClusterHostname() && info.ClusterHostname() != "" {
+			} else if info.HasClusterHostname() && info.ClusterHostname() != "" {
 				ctx.Printf("  URL:   https://%s.%s\n", ephemeralLabel, info.ClusterHostname())
 			}
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/commands/deploy.go` around lines 875 - 885, The URL printing currently
shows both resolved hostnames and a ClusterHostname-based URL which can
duplicate outputs; update the block that uses
results.HasAccessInfo()/results.AccessInfo() and
info.HasHostnames()/info.HasClusterHostname() so that you prefer printing
hostnames from info.Hostnames() when present, and only use ctx.Printf("  URL:  
https://%s.%s\n", ephemeralLabel, info.ClusterHostname()) as a fallback when
info.HasHostnames() is false or the hostnames slice is empty (also ensure
ClusterHostname is non-empty); keep the existing ctx.Printf usage and
ephemeralLabel reference but guard the ClusterHostname path so it does not print
alongside explicit hostnames.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@cli/commands/deploy.go`:
- Around line 205-215: The CLI currently prints both resolved hostnames and a
constructed cluster fallback URL; update the logic in the access-info block (use
result.HasAccessInfo(), result.AccessInfo(), info.HasHostnames(),
info.Hostnames(), info.HasClusterHostname(), info.ClusterHostname(),
ephemeralLabel, and ctx.Printf) so that the ClusterHostname fallback is only
printed when there are no hostnames returned by info.Hostnames() (i.e., if
info.HasHostnames() is false or info.Hostnames() is empty), otherwise skip the
ClusterHostname print to avoid duplicate URLs. Ensure you keep existing nil/Has*
checks and only fall back to printing the ephemeralLabel.ClusterHostname when
hostnames are absent.
- Around line 875-885: The URL printing currently shows both resolved hostnames
and a ClusterHostname-based URL which can duplicate outputs; update the block
that uses results.HasAccessInfo()/results.AccessInfo() and
info.HasHostnames()/info.HasClusterHostname() so that you prefer printing
hostnames from info.Hostnames() when present, and only use ctx.Printf("  URL:  
https://%s.%s\n", ephemeralLabel, info.ClusterHostname()) as a fallback when
info.HasHostnames() is false or the hostnames slice is empty (also ensure
ClusterHostname is non-empty); keep the existing ctx.Printf usage and
ephemeralLabel reference but guard the ClusterHostname path so it does not print
alongside explicit hostnames.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 39bdc33a-f758-4ca5-b7dd-847cfc156e72

📥 Commits

Reviewing files that changed from the base of the PR and between c2b04f1 and 1f2815f.

📒 Files selected for processing (9)
  • cli/commands/deploy.go
  • docs/command-sidebar.json
  • docs/docs/command/app-versions.md
  • docs/docs/command/app.md
  • docs/docs/command/deploy.md
  • docs/docs/commands.md
  • pkg/ephemeral/manage.go
  • servers/build/build.go
  • servers/deployment/server.go
✅ Files skipped from review due to trivial changes (5)
  • docs/docs/command/app.md
  • docs/command-sidebar.json
  • docs/docs/commands.md
  • docs/docs/command/app-versions.md
  • docs/docs/command/deploy.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/ephemeral/manage.go

Copy link
Copy Markdown
Contributor

@phinze phinze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clegon and Phenkman here. We spent some quality time with these ephemeral apparitions, tracking their movements from manifestation (deploy) through haunting (ingress routing) to exorcism (GC). The spectral containment system is well-built. A few field notes from our investigation.

CodeRabbit's earlier sightings (pool cleanup ordering, TTL validation, error propagation) were all banished in follow-up commits. Our notes focus on phenomena the automated detectors missed.

👻 Who you gonna call? miren deploy --ephemeral

Comment thread controllers/ephemeral/gc.go
Comment thread servers/httpingress/httpingress.go
Comment thread servers/httpingress/httpingress.go
Comment thread pkg/ephemeral/manage.go
"miren.dev/runtime/pkg/entity"
)

const DefaultMaxEphemeral = 10
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This package is part ghost taxonomy (label normalization, pure strings) and part ghost wrangling (entity CRUD with an injected client). Slightly split personality, but not worth restructuring now. MIR-441 will naturally clarify its identity by pulling the orchestration into saga actions.

@evanphx evanphx force-pushed the mir-992-implement-rfd-72 branch from b6115f6 to 73c3452 Compare April 15, 2026 18:32
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/ephemeral/manage.go`:
- Around line 145-152: The loop currently deletes every pool via eac.Delete(ctx,
poolID) which will remove pools still referenced by other versions; change it to
first inspect the pool's referenced list (e.g., p.ReferencedByVersions() or
similar) and only call eac.Delete when that list contains only this versionID;
if the pool is referenced by multiple versions, remove this versionID from the
pool's ReferencedByVersions and persist that change via the appropriate API
(e.g., an update/patch call on the pool) instead of deleting; keep using
pool.Id(), eac.Delete, and versionID as the identifying symbols when
implementing the conditional delete vs. reference-removal flow.
- Around line 186-191: The current loop in manage.go returns nil when it
encounters an expired ephemeral version (checking av.EphemeralExpiresAt against
now), which lets an expired row shadow a live one; change the logic inside the
loop that currently does "return nil, nil" for expired matches to instead skip
that record and continue scanning (e.g., use continue) so the loop can still
find and return a live &av later; ensure the function still returns nil only if
no valid non-expired match is found after the loop completes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b3e507c-dad3-45af-a6cf-dfc613c8ac67

📥 Commits

Reviewing files that changed from the base of the PR and between 1f2815f and b6115f6.

📒 Files selected for processing (2)
  • pkg/ephemeral/manage.go
  • pkg/ephemeral/manage_test.go

Comment thread pkg/ephemeral/manage.go
Comment thread pkg/ephemeral/manage.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
servers/httpingress/httpingress.go (1)

526-542: ⚠️ Potential issue | 🟠 Major

Surface lookupEphemeralRoute failures instead of falling through.

If lookupEphemeralRoute returns an error here, the else if condition drops it and continues into the default-route/404 branch. That can misroute an ephemeral hostname to the default app instead of returning a 500.

🩹 Suggested handling
-	} else if label, baseRoute, err := h.lookupEphemeralRoute(ctx, onlyHost); err == nil && baseRoute != nil {
-		// No exact or wildcard match, but stripping the first subdomain label
-		// matched an existing route — this is an ephemeral subdomain request.
-		route = baseRoute
-		targetAppId = baseRoute.App
-		routeType = "route"
-		ephemeralLabel = label
-		h.Log.Debug("detected ephemeral subdomain", "host", onlyHost, "label", label, "app", targetAppId)
+	} else {
+		label, baseRoute, err := h.lookupEphemeralRoute(ctx, onlyHost)
+		if err != nil {
+			h.Log.Error("error looking up ephemeral route", "error", err, "host", onlyHost)
+			http.Error(w, fmt.Sprintf("error looking up http route: %s", onlyHost), http.StatusInternalServerError)
+			return
+		}
+		if baseRoute != nil {
+			// No exact or wildcard match, but stripping the first subdomain label
+			// matched an existing route — this is an ephemeral subdomain request.
+			route = baseRoute
+			targetAppId = baseRoute.App
+			routeType = "route"
+			ephemeralLabel = label
+			h.Log.Debug("detected ephemeral subdomain", "host", onlyHost, "label", label, "app", targetAppId)
 
-		if !entity.Empty(baseRoute.OidcProvider) {
-			oidcWrapped := h.oidcMiddleware(route, func(w http.ResponseWriter, r *http.Request) {
-				h.serveAuthenticatedRequest(w, r, targetAppId, routeType, ephemeralLabel, appName)
-			})
-			oidcWrapped(w, req)
-			return
-		}
-	} else {
+			if !entity.Empty(baseRoute.OidcProvider) {
+				oidcWrapped := h.oidcMiddleware(route, func(w http.ResponseWriter, r *http.Request) {
+					h.serveAuthenticatedRequest(w, r, targetAppId, routeType, ephemeralLabel, appName)
+				})
+				oidcWrapped(w, req)
+				return
+			}
+		} else {
 		// No route found, try to find a default route
 		h.Log.Debug("no http route found, checking for default route", "host", onlyHost)
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@servers/httpingress/httpingress.go` around lines 526 - 542, Call
h.lookupEphemeralRoute(ctx, onlyHost) separately and if it returns a non-nil
error surface it (log and write HTTP 500 / return) instead of letting the
else-if drop the error; only proceed to set route, targetAppId, routeType,
ephemeralLabel and run oidcMiddleware/serveAuthenticatedRequest when err == nil
and baseRoute != nil. Ensure you reference the same variables used now
(lookupEphemeralRoute, baseRoute, route, targetAppId, oidcMiddleware,
serveAuthenticatedRequest) and return immediately after writing the 500 to avoid
falling through to the default-route/404 branch.
♻️ Duplicate comments (4)
cli/commands/deploy.go (1)

200-215: ⚠️ Potential issue | 🟠 Major

Use the server-resolved ephemeral hostnames as-is here.

Both branches already have authoritative URLs in AccessInfo.Hostnames(). Appending https://<label>.<clusterHostname> unconditionally can invent an invalid URL for apps that only have explicit host routes, and in the build path Line 925 then prints a second access block on top of this one. Please centralize this rendering and only fall back to <label>.<clusterHostname> when no hostnames were returned for a default-route case.

Also applies to: 867-885

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/commands/deploy.go` around lines 200 - 215, The code prints an extra
constructed URL using ephemeralLabel and AccessInfo.ClusterHostname() even when
the server already returned authoritative hostnames via
result.AccessInfo().Hostnames(), causing duplicate/invalid URLs; update the
rendering in the ephemeral branch (the block that checks ephemeralLabel,
ephemeralTTL and result.HasAccessInfo()) to: prefer and print each hostname from
result.AccessInfo().Hostnames() exactly as returned, and only if no hostnames
are present but AccessInfo().ClusterHostname() is non-empty, then print the
fallback https://<ephemeralLabel>.<clusterHostname> URL; centralize this logic
so both places (the ephemeral branch and the other access-printing block) use
the same rule and avoid printing duplicate access blocks.
servers/httpingress/httpingress.go (1)

503-514: ⚠️ Potential issue | 🟠 Major

Don't infer ephemeral labels from matched wildcard routes.

Line 509 makes every matched *.host route behave like an ephemeral namespace. If no ephemeral version exists for that first label, normal wildcard traffic now 404s instead of reaching the route's active version. The strip-first-label fallback below already implements the intended “normal route + wildcard DNS” behavior, so this extra extraction regresses existing wildcard routes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@servers/httpingress/httpingress.go` around lines 503 - 514, The code is
incorrectly treating any matched wildcard route as an ephemeral subdomain by
calling ingress.ExtractSubdomainLabel and setting ephemeralLabel (in the route
handling block for variable route and onlyHost); remove that behavior so matched
wildcard routes are treated as normal routes: delete or skip the ephemeralLabel
extraction and associated log branch inside the route != nil block (the lines
using ephemeralLabel, ingress.ExtractSubdomainLabel, and the Debug that declares
"detected ephemeral subdomain via wildcard route"), and rely on the existing
strip-first-label fallback logic to detect ephemeral namespaces instead.
pkg/ephemeral/manage.go (2)

186-189: ⚠️ Potential issue | 🟠 Major

Skip expired matches instead of returning immediately.

Ephemeral deploys bypass locking, so duplicate appID + label rows are possible. Returning nil on the first expired match lets an expired row shadow a live one later in the result set and turns a valid hostname into a false 404.

🩹 Suggested fix
 		// Reject expired versions immediately rather than waiting for GC
 		if !av.EphemeralExpiresAt.IsZero() && now.After(av.EphemeralExpiresAt) {
-			return nil, nil
+			continue
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/ephemeral/manage.go` around lines 186 - 189, The code currently returns
nil when encountering an expired ephemeral row (checking av.EphemeralExpiresAt
and now), which can prematurely hide later valid matches; instead, inside the
loop that iterates matches (the block referencing av and now), skip the expired
entry by replacing the early "return nil, nil" with "continue" so the loop keeps
searching for a non-expired row; ensure the function still returns the
appropriate result after the loop if no valid match is found.

145-152: ⚠️ Potential issue | 🟠 Major

Don't delete pools that are still referenced by other versions.

SandboxPoolReferencedByVersions is plural. This loop deletes every pool that mentions versionID, so a reused pool can be torn down while another version still references it. Only delete pools exclusively owned by this version; otherwise remove this version's reference and keep the pool. Based on learnings: in components/activator/activator.go, reused pools can temporarily remain referenced by multiple versions via ReferencedByVersions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/ephemeral/manage.go` around lines 145 - 152, The loop currently deletes
every pool that lists versionID; instead, check each pool's
SandboxPoolReferencedByVersions (or equivalent field) for multiple references:
for each p from poolResp.Values() get poolID and referenced :=
p.SandboxPoolReferencedByVersions() (or its accessor) and if the slice/set has
only this versionID call eac.Delete(ctx, poolID) and append errors to poolErrs
as before; if the pool is referenced by other versions, remove this versionID
from the pool's reference list and update the pool (do not call eac.Delete),
logging the reference-removal path and preserving pools shared by other
versions. Ensure you still log and append errors from either the Delete or the
update operation.
🧹 Nitpick comments (1)
servers/deployment/server_test.go (1)

1150-1285: Assert the contracts that distinguish ephemeral deploys from normal deploys.

This test covers TTL persistence, but it would still pass if the server started creating real deployment records again or stopped replacing same-label previews. Please add assertions for both behaviors: deployment count should stay unchanged across an ephemeral deploy, and a second deploy with the same label should leave exactly one matching ephemeral version (or otherwise prove the first one was replaced).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@servers/deployment/server_test.go` around lines 1150 - 1285, Update
TestDeployVersionEphemeral to assert that ephemeral deploys do not create real
Deployment records and that repeated ephemeral deploys with the same label
replace the previous preview: before the first client.DeployVersion call capture
the current deployment count via inmem.Client.List (or the in-memory store list
API) and assert the count is unchanged after the ephemeral deploy (use
server/client deployment types from deployment_v1alpha and the DeployVersion
result.Deployment when present); then perform a second ephemeral
client.DeployVersion with the same ephemeral label and TTL and assert there is
exactly one AppVersion with EphemeralLabel == "feat-preview" (or query versions
via inmem.Client.List/Get) to prove the prior preview was replaced rather than
duplicated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@servers/deployment/server.go`:
- Around line 779-839: The code currently mutates the existing AppVersion
(appVersion) in place to set EphemeralLabel/EphemeralTtl/EphemeralExpiresAt;
instead, clone the source AppVersion into a new AppVersion entity (e.g., copy
struct, clear ID/revision), set the ephemeral fields on the clone, persist the
clone via the entity API (create a new entity with SetAttrs from clone and call
d.EAC.Put/Create), and use that new version ID for ephemeral lifecycle calls
(ephemeralx.ReplaceExisting, ephemeralx.EnforceLimit) and results, leaving the
original appVersion unchanged; ensure you do not reuse
updateEntity.SetId(string(appVersion.ID)) or set the revision from current —
create a new entity id/revision for the cloned version so the active version is
not modified.

In `@servers/httpingress/httpingress.go`:
- Around line 679-719: The lease cache must be scoped to the resolved version
ID, not just the app ID: after resolving the AppVersion into av (via
ephemeralx.LookupByLabel or h.eac.Get and av.Decode), compute a lease key
derived from av.ID (e.g., targetAppId.String()+":"+string(av.ID) or similar) and
replace any earlier calls that used useLease(targetAppId.String()) /
retainLease(targetAppId.String()) with the new version-scoped key; ensure
release/invalidate paths also use the same version-scoped key so ephemeral and
active versions never share the same lease.

---

Outside diff comments:
In `@servers/httpingress/httpingress.go`:
- Around line 526-542: Call h.lookupEphemeralRoute(ctx, onlyHost) separately and
if it returns a non-nil error surface it (log and write HTTP 500 / return)
instead of letting the else-if drop the error; only proceed to set route,
targetAppId, routeType, ephemeralLabel and run
oidcMiddleware/serveAuthenticatedRequest when err == nil and baseRoute != nil.
Ensure you reference the same variables used now (lookupEphemeralRoute,
baseRoute, route, targetAppId, oidcMiddleware, serveAuthenticatedRequest) and
return immediately after writing the 500 to avoid falling through to the
default-route/404 branch.

---

Duplicate comments:
In `@cli/commands/deploy.go`:
- Around line 200-215: The code prints an extra constructed URL using
ephemeralLabel and AccessInfo.ClusterHostname() even when the server already
returned authoritative hostnames via result.AccessInfo().Hostnames(), causing
duplicate/invalid URLs; update the rendering in the ephemeral branch (the block
that checks ephemeralLabel, ephemeralTTL and result.HasAccessInfo()) to: prefer
and print each hostname from result.AccessInfo().Hostnames() exactly as
returned, and only if no hostnames are present but
AccessInfo().ClusterHostname() is non-empty, then print the fallback
https://<ephemeralLabel>.<clusterHostname> URL; centralize this logic so both
places (the ephemeral branch and the other access-printing block) use the same
rule and avoid printing duplicate access blocks.

In `@pkg/ephemeral/manage.go`:
- Around line 186-189: The code currently returns nil when encountering an
expired ephemeral row (checking av.EphemeralExpiresAt and now), which can
prematurely hide later valid matches; instead, inside the loop that iterates
matches (the block referencing av and now), skip the expired entry by replacing
the early "return nil, nil" with "continue" so the loop keeps searching for a
non-expired row; ensure the function still returns the appropriate result after
the loop if no valid match is found.
- Around line 145-152: The loop currently deletes every pool that lists
versionID; instead, check each pool's SandboxPoolReferencedByVersions (or
equivalent field) for multiple references: for each p from poolResp.Values() get
poolID and referenced := p.SandboxPoolReferencedByVersions() (or its accessor)
and if the slice/set has only this versionID call eac.Delete(ctx, poolID) and
append errors to poolErrs as before; if the pool is referenced by other
versions, remove this versionID from the pool's reference list and update the
pool (do not call eac.Delete), logging the reference-removal path and preserving
pools shared by other versions. Ensure you still log and append errors from
either the Delete or the update operation.

In `@servers/httpingress/httpingress.go`:
- Around line 503-514: The code is incorrectly treating any matched wildcard
route as an ephemeral subdomain by calling ingress.ExtractSubdomainLabel and
setting ephemeralLabel (in the route handling block for variable route and
onlyHost); remove that behavior so matched wildcard routes are treated as normal
routes: delete or skip the ephemeralLabel extraction and associated log branch
inside the route != nil block (the lines using ephemeralLabel,
ingress.ExtractSubdomainLabel, and the Debug that declares "detected ephemeral
subdomain via wildcard route"), and rely on the existing strip-first-label
fallback logic to detect ephemeral namespaces instead.

---

Nitpick comments:
In `@servers/deployment/server_test.go`:
- Around line 1150-1285: Update TestDeployVersionEphemeral to assert that
ephemeral deploys do not create real Deployment records and that repeated
ephemeral deploys with the same label replace the previous preview: before the
first client.DeployVersion call capture the current deployment count via
inmem.Client.List (or the in-memory store list API) and assert the count is
unchanged after the ephemeral deploy (use server/client deployment types from
deployment_v1alpha and the DeployVersion result.Deployment when present); then
perform a second ephemeral client.DeployVersion with the same ephemeral label
and TTL and assert there is exactly one AppVersion with EphemeralLabel ==
"feat-preview" (or query versions via inmem.Client.List/Get) to prove the prior
preview was replaced rather than duplicated.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3a17c4b1-45b7-49b6-b107-48ef7a870817

📥 Commits

Reviewing files that changed from the base of the PR and between b6115f6 and 73c3452.

📒 Files selected for processing (29)
  • api/build/build_v1alpha/rpc.gen.go
  • api/build/rpc.yml
  • api/core/core_v1alpha/schema.gen.go
  • api/core/schema.yml
  • api/deployment/deployment_v1alpha/rpc.gen.go
  • api/deployment/rpc.yml
  • api/ingress/client.go
  • api/ingress/client_test.go
  • blackbox/ephemeral_test.go
  • cli/commands/app_rollback.go
  • cli/commands/app_versions.go
  • cli/commands/commands.go
  • cli/commands/deploy.go
  • components/coordinate/coordinate.go
  • controllers/ephemeral/gc.go
  • controllers/ephemeral/gc_test.go
  • docs/command-sidebar.json
  • docs/docs/command/app-versions.md
  • docs/docs/command/app.md
  • docs/docs/command/deploy.md
  • docs/docs/commands.md
  • pkg/ephemeral/label.go
  • pkg/ephemeral/label_test.go
  • pkg/ephemeral/manage.go
  • pkg/ephemeral/manage_test.go
  • servers/build/build.go
  • servers/deployment/server.go
  • servers/deployment/server_test.go
  • servers/httpingress/httpingress.go
✅ Files skipped from review due to trivial changes (9)
  • docs/docs/commands.md
  • docs/command-sidebar.json
  • docs/docs/command/app.md
  • api/ingress/client_test.go
  • docs/docs/command/app-versions.md
  • pkg/ephemeral/label.go
  • pkg/ephemeral/label_test.go
  • docs/docs/command/deploy.md
  • api/core/core_v1alpha/schema.gen.go
🚧 Files skipped from review as they are similar to previous changes (8)
  • api/deployment/rpc.yml
  • cli/commands/commands.go
  • api/core/schema.yml
  • api/ingress/client.go
  • controllers/ephemeral/gc_test.go
  • cli/commands/app_versions.go
  • api/deployment/deployment_v1alpha/rpc.gen.go
  • servers/build/build.go

Comment thread servers/deployment/server.go
Comment thread servers/httpingress/httpingress.go
evanphx added 7 commits April 17, 2026 15:29
Add ephemeral deployments — app versions tagged with a label and TTL
that are never set as the active version. Each ephemeral version is
accessed via a subdomain prefix on the app's existing route (e.g.,
feat-x.app.example.com). Pools spin up on demand and spin down when
idle. Versions are automatically deleted when their TTL expires.

Key changes:

Schema: Add ephemeral_label (indexed), ephemeral_ttl, and
ephemeral_expires_at (indexed) fields to the AppVersion entity.

CLI: Add --ephemeral and --ttl flags to `miren deploy`. Add new
`miren app versions` command with --ephemeral filter and JSON output.
Ephemeral deploys skip deployment record creation and locking.

Build & deployment servers: When ephemeral, set version fields, skip
SetActiveVersion, handle replace-on-same-label, enforce per-app limit
(max 10), and validate labels server-side.

HTTP ingress: New lookupEphemeralRoute strips the first subdomain
label and matches against existing routes — no wildcard route entity
required. Resolves ephemeral AppVersion by label and passes it to
AcquireLease (activator needs zero changes).

GC controller: Periodic cleanup (every 5m) of expired ephemeral
versions and their associated sandbox pools. Wired into coordinator.

Closes MIR-992
- Fix double-prefixing of ephemeral hostnames in --version deploy path:
  deployment server's getAccessInfo now resolves wildcard/normal routes
  into concrete ephemeral hostnames when given a label, so the CLI can
  print them as-is. Adds ClusterHostname fallback.

- Abort ephemeral version deletion when sandbox pool cleanup fails so
  pools aren't leaked. The version entity is retained for GC retry.

- Reject non-positive ephemeral TTLs (ttl <= 0) in both build and
  deployment server paths — prevents versions that expire immediately.

- Fail the deploy when ReplaceExisting or EnforceLimit fails rather
  than logging and continuing. Silent failures could violate label
  uniqueness or the per-app limit.

- Fail loudly when fetching the app version revision fails during
  ephemeral update. Previously the Put was silently skipped and the
  user saw success while the ephemeral fields were never persisted.
LookupByLabel now checks EphemeralExpiresAt and returns nil for
versions past their TTL. This stops routing traffic to expired
versions immediately rather than waiting up to 5 minutes for the
next GC sweep.
- LookupByLabel: change `return nil` to `continue` on expired match
  so later valid entries aren't shadowed by a stale one in the result set.

- deleteEphemeralVersion: before deleting a sandbox pool, check if it's
  shared (ReferencedByVersions has multiple entries). For shared pools,
  remove only the current version's reference instead of deleting the
  entire pool, preventing disruption to other versions sharing it.
- DeployVersion: clone the source AppVersion into a new entity instead
  of mutating it in place. Deploying an active version as ephemeral no
  longer corrupts the live version with ephemeral fields.

- HTTP ingress: scope the lease cache key by ephemeral label
  (appID:eph:label) so ephemeral and active versions never share
  cached leases. Previously all versions of an app shared one cache
  entry, which could proxy ephemeral traffic to active sandboxes or
  vice versa.
The DeploymentLauncher creates pools when App.ActiveVersion changes,
but ephemeral versions skip SetActiveVersion so no pool exists when
the first request arrives.

Add a PoolCreator interface to the activator. When requestPoolCapacity
can't find a pool after retries and a PoolCreator is registered, it
calls CreatePoolForVersion to create one on demand. The Launcher
implements this interface, reusing its existing ensurePoolForService
logic.

The coordinator wires the Launcher into the activator via
SetPoolCreator at startup.
@evanphx evanphx force-pushed the mir-992-implement-rfd-72 branch from d0dcc25 to 006f069 Compare April 17, 2026 22:29
Copy link
Copy Markdown
Contributor

@phinze phinze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clegon and Phenkman checking back in. We dug through the delta and ended up doing some git archaeology, because the on-demand pool creation turned out to have a richer backstory than the diff suggests.

Most of the delta is tight. The clone-instead-of-mutate fix (09b799bb) and the lease-cache scoping are genuinely important, both were silent correctness failures for --version <active> --ephemeral. The shared-pool cleanup (d8bcb91b) is the right call. The LookupByLabel expiry check fully addresses our earlier "apparition past its expiry" note.

The piece Evan flagged for re-review (pool creation on the activation path, 006f0690) is where we ended up spending most of our time.

The backstory: bf97ec7d (Oct 2025) introduced a sentinel pattern in poolState specifically to prevent duplicate pool creation under concurrent requests. Evan reviewed and refined it in b630ea32, the commit that added the concurrency doc block still sitting at the top of activator.go. 040329dd (RFD-39 Phase 4) then moved pool creation out to DeploymentLauncher; the sentinel fields stayed but stopped guarding anything. This PR's on-demand branch reintroduces pool creation in the activator without re-wiring the sentinel, so concurrent requests to a cold ephemeral URL can both fall through to CreatePoolForVersion. Since findMatchingPool is list-then-create (not atomic), both can succeed.

Suggested pragmatic fix: re-animate the sentinel for the on-demand branch. The fields are still there, just unused. Mark inProgress: true and open a done channel before releasing the lock, close it after creation finishes. One related cleanup either way: the top-of-file doc block at lines 21-25 still claims requestPoolCapacity uses the Sentinel Pattern, which hasn't been true since 040329dd. Restore the sentinel and the doc is correct again; skip the sentinel and the doc should lose that section.

The bigger frustration, which Paul has been sitting with for a while: the activator's coordination story keeps slowly eroding as features thread through it. MIR-596 tracks the direction already, and we added a comment there pointing at this PR as fresh evidence. That's the refactor pile, not a blocker.

Comment thread components/activator/activator.go
Comment thread components/activator/activator.go Outdated
Two concurrent requests to a cold ephemeral URL could both fall out of
the findPoolInStore retry loop and both call CreatePoolForVersion,
creating duplicate pools for the same (version, service) because
findMatchingPool is list-then-create (not atomic).

Reuse the existing inProgress/done/err sentinel infrastructure in
poolState: under the write lock, install the sentinel before releasing
to do the slow creation. Concurrent callers find the sentinel, wait on
done, then loop back to re-check state. The sentinel is replaced with
the real pool state on success, or deleted on failure.

Also update the top-of-file doc block to describe the sentinel pattern
accurately for the on-demand creation case, and clarify the error
message when findPoolInStore returns nil after ensurePoolForService
reused an existing pool (poolID empty).
@evanphx evanphx merged commit cd90d34 into main Apr 24, 2026
14 checks passed
@evanphx evanphx deleted the mir-992-implement-rfd-72 branch April 24, 2026 21:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants