fix(k8s): correct manifests and documentation for kubernetes deployment #4347

NathanFlurry merged 2 commits into main
Conversation
🚅 Deployed to the rivet-pr-4347 environment in rivet-frontend
PR Review: fix(k8s): correct manifests and documentation for kubernetes deployment

This is a solid fix for the broken Kubernetes deployment. The critical crash fix and domain reference cleanup are well executed. Below are my findings, organized by severity.

Critical / Blockers

1. Singleton deployment removed without confirming this is safe

The PR removes the separate singleton deployment, so there is now only one deployment running. If the engine handles singleton coordination internally (e.g., via distributed leader election), this is fine, but it should be documented or confirmed in the PR.

Medium Issues

2. The deployment manifest at …

3. 2-node NATS cluster with …

The NATS cluster is reduced from 3 to 2 replicas. Options: …
The README describes this as "2-node HA", which is slightly misleading.

Minor Issues

4. Ordering hazard in docs

The setup guide correctly tells users to create the namespace and secret (step 4) before running …

5. NATS readiness probe missing
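A probe along these lines would address this; a minimal sketch, assuming the NATS container starts nats-server with its built-in HTTP monitoring enabled on the default port 8222 (timings are illustrative):

```yaml
# Illustrative readiness probe for the NATS container. Assumes
# nats-server is started with HTTP monitoring enabled (e.g. via
# `--http_port 8222`), which serves a /healthz endpoint.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8222
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3
```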
6. Namespace manifest stripped of all labels

The new namespace manifest …

What Is Good

Summary

The PR fixes a real deployment-blocking bug and is well-structured overall. The main open questions are: (1) whether removing the singleton deployment is architecturally safe, (2) whether …

🤖 Generated with Claude Code
- Fixed broken singleton deployment concept that caused engine crash (`--except-services singleton` is not a valid service name)
- Removed unused NATS manifests (engine uses PostgreSQL NOTIFY for pub/sub)
- Reduced resource requests to match production (2000m/2Gi request, 3000m/4Gi limit)
- Added preStop lifecycle hook (15s graceful drain before SIGTERM)
- Added PodDisruptionBudget to prevent all pods evicting at once
- Cleaned up datacenter labels and renamed "local" to "default"
- Fixed rivet.gg references to rivet.dev in docs and Dockerfile
- Fixed health check endpoint (port 6421, not 6420)
- Renumbered manifests to be sequential (removed gap at 06)
- Updated all documentation to match working manifests
- Added config update instructions to README and docs

Tested deployment in k3d cluster; engine pods start and become ready.
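The preStop hook and PodDisruptionBudget mentioned above could look roughly like this; a sketch, not the PR's actual manifests, with the names (`rivet-engine`) and label selector assumed:

```yaml
# Sketch of a 15s graceful drain: the kubelet runs preStop before
# sending SIGTERM, so terminationGracePeriodSeconds must exceed 15s.
lifecycle:
  preStop:
    exec:
      command: ["sleep", "15"]
---
# Sketch of a PodDisruptionBudget so voluntary disruptions (node
# drains, cluster upgrades) never evict every engine pod at once.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: rivet-engine
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: rivet-engine
```

Note that `minAvailable: 1` only provides protection when the deployment runs at least two replicas.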
Force-pushed from b09501a to 8136944
…simplified docs

Add NATS cluster (2-node HA) to k8s deployment, add admin token secret support, move manifests to self-host/k8s/, simplify the Kubernetes docs to use giget for downloading manifests, and update engine.sh for local dev.
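For reference, a 2-node NATS cluster of this kind is typically wired together with explicit routes; an illustrative nats-server config fragment, assuming a headless Service named `nats` in a `rivet` namespace (all hostnames here are assumptions):

```
# Illustrative NATS cluster block; hostnames assume a headless
# Service `nats` in namespace `rivet` fronting a 2-pod StatefulSet.
cluster {
  name: rivet-nats
  port: 6222
  routes: [
    nats://nats-0.nats.rivet.svc.cluster.local:6222
    nats://nats-1.nats.rivet.svc.cluster.local:6222
  ]
}
```

Core NATS clustering can survive the loss of one of two nodes, but anything quorum-based (such as JetStream replication) cannot, which is likely why the review above calls "2-node HA" slightly misleading.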
Summary
Fixed critical issues with the Kubernetes manifests and documentation that prevented deployment. The engine was crashing due to an invalid `--except-services singleton` argument (`singleton` is not a real service name). Also removed unused NATS manifests, corrected resource sizing to match production values, and added production-grade reliability features.

Changes
Manifests: Fixed broken singleton deployment, removed unused NATS, reduced resource requests to 2000m/2Gi (request) and 3000m/4Gi (limits), added preStop lifecycle hook for graceful shutdown, added PodDisruptionBudget, cleaned up datacenter labels, renamed "local" to "default", renumbered files sequentially.
Documentation: Updated all docs to match working manifests, fixed health check port from 6420 to 6421, fixed rivet.gg references to rivet.dev, added config update instructions.
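Put together, the sizing and health-check changes above would look roughly like this in the engine container spec; a sketch only — the container name and probe path are assumptions, while the port and resource values come from the PR:

```yaml
containers:
  - name: rivet-engine      # container name is an assumption
    resources:
      requests:
        cpu: "2000m"
        memory: 2Gi
      limits:
        cpu: "3000m"
        memory: 4Gi
    readinessProbe:
      httpGet:
        path: /health       # actual path not stated in the PR
        port: 6421          # corrected health check port
```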
Testing
Tested deployment in a k3d cluster; engine pods start and become ready, and health checks pass on port 6421.
🤖 Generated with Claude Code