
Kubernetes Caddy Deployment Patching & GCP Redis Configuration #59

Merged
universe-ops merged 13 commits into main from staging
Dec 22, 2025


@universe-ops
Contributor

Overview

Enhanced Kubernetes deployment patching for Caddy services and improved GCP Redis configuration handling. This PR introduces native Kubernetes client-based patching to bypass Pulumi's DeploymentPatch validation limitations, enabling reliable annotation updates for Caddy deployments.

Key Changes

1. Native Kubernetes Deployment Patching

  • New: PatchDeployment() function using native Kubernetes Go client instead of Pulumi resources
  • Benefit: Bypasses Pulumi's DeploymentPatch validation issues that prevented annotation updates
  • Implementation: Uses ApplyTWithContext to resolve Pulumi outputs before executing the patch
  • Timeout Handling: 15-second timeout, with graceful degradation when the parent context is cancelled
  • Non-critical: Patch failures are logged as warnings and don't block the deployment

2. Caddy Deployment Naming Consistency

  • New: GenerateCaddyDeploymentName() exported function for consistent naming across packages
  • Pattern: Uses a single-dash convention (caddy-staging, not caddy--staging)
  • Support: Allows a DeploymentName override in the Caddy config for adoption scenarios
  • Files: kubernetes/naming.go, kubernetes/kube_run.go, gcp/gke_autopilot_stack.go
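The naming rule can be sketched as below. The real GenerateCaddyDeploymentName() signature isn't shown in the PR, so caddyDeploymentName(env, override) is a hypothetical stand-in. Trimming stray dashes from the environment is one assumed way to guarantee caddy-staging rather than caddy--staging, and returning plain "caddy" for an empty environment is likewise an assumption.

```go
package main

import "strings"

// caddyDeploymentName builds a "caddy-{environment}" name (hypothetical helper).
// A non-empty override, such as the name of an existing Deployment being
// adopted, wins over the generated name. Leading and trailing dashes on the
// environment are trimmed so joining never yields a double dash.
func caddyDeploymentName(env, override string) string {
	if o := strings.TrimSpace(override); o != "" {
		return o
	}
	env = strings.Trim(strings.TrimSpace(env), "-")
	if env == "" {
		// Assumption: avoid a dangling "caddy-" when no environment is set.
		return "caddy"
	}
	return "caddy-" + env
}
```

For example, caddyDeploymentName("staging", "") yields caddy-staging, while caddyDeploymentName("staging", "caddy-legacy") keeps the adopted name.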

3. Improved Caddy Annotation Updates

  • Timestamp: The caddy-updated-at annotation now records the actual deployment time (RFC3339 format)
  • Hash: The Caddyfile hash is computed from the actual entry content
  • Provider Casting: Proper Kubernetes provider casting for patch operations
  • Logging: Enhanced debug and success logging for patch operations

4. GCP Redis Configuration

  • Enhancement: Added Adopt and ClusterName fields to support adoption of existing Redis instances
  • Validation: Proper error handling for configuration conversion
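Only the Adopt and ClusterName field names come from the PR; the RedisConfig struct, its JSON tags, and the toRedisConfig helper below are hypothetical. The sketch shows one way to convert a loosely typed config value (for example, decoded YAML) into a typed struct while surfacing conversion errors instead of silently falling back to zero values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RedisConfig is a hypothetical shape; only Adopt and ClusterName are from the PR.
type RedisConfig struct {
	Adopt       bool   `json:"adopt"`       // adopt an existing Redis instance
	ClusterName string `json:"clusterName"` // name of the instance to adopt
}

// toRedisConfig converts a loosely typed value into RedisConfig via a JSON
// round-trip, wrapping any failure (e.g. "adopt": "yes") in a descriptive error.
func toRedisConfig(raw any) (*RedisConfig, error) {
	data, err := json.Marshal(raw)
	if err != nil {
		return nil, fmt.Errorf("marshal redis config: %w", err)
	}
	var cfg RedisConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("convert redis config: %w", err)
	}
	return &cfg, nil
}
```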

5. GKE Autopilot Stack Updates

  • Caddy Naming: Uses new GenerateCaddyDeploymentName() for consistency
  • Deployment Override: Supports DeploymentName config for adopted Caddy deployments
  • Provider Configuration: Enabled EnableServerSideApply for Kubernetes provider (required for DeploymentPatch)

6. Dependencies

  • Updated: Go module dependencies with Kubernetes SDK v4 compatibility
  • Added: time import for RFC3339 timestamp formatting
  • Added: sdkK8s import for Kubernetes provider type casting

Technical Details

Deployment Patch Architecture:

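// Native K8s client patch with Pulumi output resolution
result := sdk.All(kubeconfig, annotations).ApplyTWithContext(ctx.Context(),
    func(goCtx context.Context, inputs []interface{}) (string, error) {
        // Outputs are resolved here; bound the patch with the 15-second timeout
        patchCtx, cancel := context.WithTimeout(goCtx, 15*time.Second)
        defer cancel()
        // Execute patch with native client (helper assumed to return (string, error))
        return patchDeploymentWithK8sClient(patchCtx, inputs)
    })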

Caddy Deployment Naming:

  • Consistent naming: caddy-{environment} across all cloud providers
  • Adoption support: Can override with existing deployment name
  • Backwards compatible: Maintains existing naming conventions

Testing

  • ✅ All builds pass: go build ./...
  • ✅ All linters clean: welder run fmt
  • ✅ Kubernetes provider properly configured for patch operations
  • ✅ Caddy annotation updates include timestamps and content hashes

Breaking Changes

None. All changes are backward compatible.

Related Issues

  • Fixes Caddy annotation patching for Kubernetes deployments
  • Enables adoption of existing GCP Redis instances
  • Improves GKE Autopilot Caddy deployment consistency

@universe-ops universe-ops requested a review from smecsia December 22, 2025 20:09
@universe-ops universe-ops merged commit 9804ca4 into main Dec 22, 2025
4 checks passed
@universe-ops universe-ops deleted the staging branch December 22, 2025 20:25
Cre-eD added a commit that referenced this pull request Apr 10, 2026
… restarts

time.Now() was called at pulumi eval time, so caddy-updated-at changed on
every pulumi up, even when the Caddyfile was identical. This dirtied the
pod template on every app deployment and caused a Caddy rolling restart
each time, which triggered Cloudflare 521 errors because persistent
connections were dropped before Cloudflare rerouted them.

History: the original value was the static string "latest" (PR #59 changed
it to time.Now() as an "improvement"). The intent was informational — not a
rollout trigger.

Fix: derive caddy-updated-at from the Caddyfile content hash (same source as
caddy-update-hash). The annotation value is now stable across pulumi ups when
the Caddyfile hasn't changed, so K8s sees no pod template diff → no rollout.
Caddy still rolls when the Caddyfile actually changes (different hash).

Confirmed root cause via GCP Cloud Logging: all three Caddy patch events on
2026-04-10 had identical hash (03709a04d391d8ac) but different timestamps,
proving time.Now() was the sole cause of every rollout.
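The fix amounts to making every pod-template annotation a pure function of the Caddyfile content. In the sketch below, SHA-256 truncated to 16 hex characters is an assumption (it matches the length of the logged hash 03709a04d391d8ac, but the commit does not name the algorithm), as is reusing the hash verbatim as the caddy-updated-at value; both helper names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// stableCaddyAnnotations derives both annotations from the Caddyfile content
// alone, so an unchanged Caddyfile always yields an identical pod template.
func stableCaddyAnnotations(caddyfile string) map[string]string {
	sum := sha256.Sum256([]byte(caddyfile))
	hash := hex.EncodeToString(sum[:])[:16]
	return map[string]string{
		"caddy-update-hash": hash,
		"caddy-updated-at":  hash, // assumption: hash reused as the value
	}
}

// templateChanged reports whether applying newA over oldA would modify the
// pod-template annotations, i.e. whether Kubernetes would start a rollout.
func templateChanged(oldA, newA map[string]string) bool {
	if len(oldA) != len(newA) {
		return true
	}
	for k, v := range newA {
		if ov, ok := oldA[k]; !ok || ov != v {
			return true
		}
	}
	return false
}
```

Because identical input yields identical annotations, templateChanged returns false across repeated pulumi ups with an unchanged Caddyfile, while any edit to the Caddyfile changes the hash and still produces a rollout.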
