wiki(automation-gcp-gke): expand PgBouncer budget + Pending PVC cleanup
Two operational sections fleshed out beyond the install-page brief:
PgBouncer connection budget (replaces the prior 2-paragraph stub):
- Layer diagram showing app sessions -> PgBouncer -> Cloud SQL.
- Knobs table listing pgbouncer_pool_size, pgbouncer_max_client_conn,
pgbouncer_replicas, pgbouncer_pool_mode, Cloud SQL max_connections,
and per-pod inflight, plus where each one lives.
- Cloud SQL tier defaults table (e.g. ~50 connections on db-g1-small).
- Math:
    cloud_sql_max_connections >= pgbouncer_replicas * pgbouncer_pool_size + reserved_admin
    pgbouncer_replicas * pgbouncer_max_client_conn >= sum(per-pod inflight across app pods)
- Sizing checklist: size backend connections first, then client
capacity; bump PgBouncer before the Cloud SQL tier; max_connections
is a database flag, not a hard tier limit.
- Diagnostics: query pg_stat_activity from a server pod; run PgBouncer
SHOW POOLS and SHOW STATS via the admin port; read cl_waiting /
maxwait to spot throttling (clients queued for a server connection).
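The two inequalities in the Math bullet can be checked mechanically. This is a minimal Python sketch, not code from the wiki page: the function name check_budget and the default reserved_admin=5 are illustrative assumptions, and the other parameter names mirror the knobs listed above.

```python
def check_budget(cloud_sql_max_connections, pgbouncer_replicas,
                 pgbouncer_pool_size, pgbouncer_max_client_conn,
                 app_inflight, reserved_admin=5):
    """Evaluate both connection-budget inequalities.

    app_inflight is a list with one in-flight session count per app pod.
    reserved_admin=5 is an assumed placeholder, not a documented value.
    """
    # Backend side: server connections PgBouncer may open against Cloud SQL.
    backend_demand = pgbouncer_replicas * pgbouncer_pool_size + reserved_admin
    # Client side: sessions the PgBouncer fleet will accept from the apps.
    client_capacity = pgbouncer_replicas * pgbouncer_max_client_conn
    client_demand = sum(app_inflight)
    return {
        "backend_ok": cloud_sql_max_connections >= backend_demand,
        "backend_headroom": cloud_sql_max_connections - backend_demand,
        "client_ok": client_capacity >= client_demand,
        "client_headroom": client_capacity - client_demand,
    }
```

For example, 2 replicas with a pool size of 20 fit under a 50-connection instance with 5 connections of headroom, while a third replica at the same pool size would exceed the budget by 15.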
Stale Pending PVC cleanup (replaces the prior 2-command snippet):
- 5-step safe-cleanup recipe: inventory, confirm no pod / deploy / sts
/ job references the PVC, snapshot before delete, delete one at a
time without force-removing finalizers, verify.
- jq filters to find references in pod specs and workload templates.
- Explanation of why a non-empty VOLUME field means the PVC was bound
at some point and should not be deleted blindly.
- Root-cause callout: stale Pending PVCs come from running kubectl apply
with kind-profile manifests on GKE; the Helm chart's dynamically
provisioned PVCs avoid the issue.
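The "confirm no pod / deploy / sts / job references the PVC" step can also be sketched in Python rather than jq. This is an illustrative sketch, not the filters on the wiki page: it assumes the objects were loaded from the items array of `kubectl get pods,deploy,sts,jobs -A -o json`, and the helper names are hypothetical.

```python
def pod_spec_of(obj):
    """Return the pod spec of a Pod, or the pod template spec of a workload."""
    spec = obj.get("spec") or {}
    if obj.get("kind") == "Pod":
        return spec
    # Deployment, StatefulSet and Job all nest it under spec.template.spec.
    return (spec.get("template") or {}).get("spec") or {}


def pvc_references(objects, namespace, claim_name):
    """List 'Kind/name' for every object in namespace that uses claim_name."""
    refs = []
    for obj in objects:
        meta = obj.get("metadata") or {}
        if meta.get("namespace") != namespace:
            continue
        label = f'{obj.get("kind")}/{meta.get("name")}'
        # Direct references through spec.volumes[].persistentVolumeClaim.
        for vol in pod_spec_of(obj).get("volumes") or []:
            pvc = vol.get("persistentVolumeClaim") or {}
            if pvc.get("claimName") == claim_name:
                refs.append(label)
        # StatefulSets own PVCs named <template>-<statefulset>-<ordinal>.
        if obj.get("kind") == "StatefulSet":
            for tpl in (obj.get("spec") or {}).get("volumeClaimTemplates") or []:
                tpl_name = (tpl.get("metadata") or {}).get("name", "")
                prefix = f'{tpl_name}-{meta.get("name")}-'
                if claim_name.startswith(prefix) and claim_name[len(prefix):].isdigit():
                    refs.append(label)
    return refs
```

An empty result is necessary but not sufficient for deletion: the recipe above still calls for a snapshot first and a check of the VOLUME field.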
wiki(automation-gcp-gke): add GKE Helm install + clarify kind/GKE KEDA split
New page automation-gcp-gke.md documenting the GKE install path:
- Topology (Cloud SQL + PgBouncer + Helm NATS + chart-templated KEDA).
- Prerequisites: gcloud / kubectl / helm, GCP project setup, one-time
KEDA + cert-manager installs.
- Install via the noetl_gke_fresh_stack.yaml playbook with the
frequently-edited workload variables called out.
- Upgrade flow (helm upgrade --reuse-values), including the gotcha
that --reuse-values does not merge new chart defaults (the PR #116
migration hit this).
- Verify checklist including KEDA scaledobject, NATS durable consumer,
Cloud SQL connectivity through PgBouncer, and a smoke run.
- Tuning section for KEDA, PgBouncer connection budget, Cloud SQL HA.
- Common pitfalls: two autoscalers fighting (HPA conflict that drove
ops #115), live-patching the autoscaler (the anti-pattern that
drove ops #116), worker durable consumer drift (noetl #600), and
stale Pending PVCs.
Update Home.md to add an Automation playbooks row for the new page.
Update _Sidebar.md with an Automation section.
Update manifests-keda.md to state the kind/GKE split explicitly: the
existing page sample (account: NOETL, nats.nats.svc:8222) is the
kind-cluster artifact; the GKE artifact is chart-rendered with
account: $G and nats-headless. Add a profile-note callout near
the top and a cross-link in Related.
Cross-references:
- ai-meta decision doc 2026-05-24-gke-postgres-topology (Option A).
- ops PRs #115, #116 (HPA conflict + KEDA chart promotion).
- noetl PR #600 (worker consumer self-heal).