fix(k8s): Service for BoJ to ClusterIP (HCG tier-2 E1 prereq) #131
Merged
Per ADR-0004 §1 (http-capability-gateway tier-2 placement) and the Phase E rollout-runbook §1.4 prereq #8, BoJ MUST NOT be externally addressable when fronted by HCG. Previously `k8s/service.yaml` declared `type: LoadBalancer`, exposing all four BoJ ports (REST 7700, gRPC 7701, GraphQL 7702, SSE 7703) externally, which defeats the runbook §3.4 decommission invariant before Phase E even starts.

This is action item #8 from the Phase E consumer-side audit posted on hyperpolymath/standards#100, and the companion to action #6 (boj-server#130, Cowboy bind tightening). The two layers together give defence in depth: the BoJ process itself binds loopback (#130) AND the k8s Service does not expose it externally (this PR).

Changes:

- `k8s/service.yaml` type: LoadBalancer -> ClusterIP.
- Add `hyperpolymath.dev/exposure: "internal-only"` and `hyperpolymath.dev/external-via: "http-capability-gateway (tier-2)"` annotations so the posture is discoverable from `kubectl describe` without grepping for ADR references.
- Add a header comment explaining the rationale, pointing to ADR-0004 and the rollout-runbook, and showing the kustomize/helm override recipe for legacy/standalone deployments that need BoJ exposed externally (no HCG in front).
- Keep ports 7700-7703 declared forward-compatibly (current BoJ binds 7700 only; 7701/7702/7703 reserved for gRPC/GraphQL/SSE).

DRAFT: breaking for anyone running this manifest as-is with a LoadBalancer in production. Owner gates merge on confirming no live LoadBalancer-fronted BoJ deployment exists that this change would break. If one exists, the migration path is HCG-in-front (per the rollout-runbook).

Refs hyperpolymath/standards#100
Refs hyperpolymath/standards#91

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
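For orientation, a Service with the posture described in the commit message might look roughly like the sketch below. This is not the actual `k8s/service.yaml`: the Service name `boj-server` is taken from the `kubectl describe svc/boj-server` command in the test plan, while the selector label and the port names are assumptions made for illustration.

```yaml
# Sketch only; not the repository's k8s/service.yaml.
apiVersion: v1
kind: Service
metadata:
  name: boj-server
  annotations:
    hyperpolymath.dev/exposure: "internal-only"
    hyperpolymath.dev/external-via: "http-capability-gateway (tier-2)"
spec:
  # ClusterIP gives the Service a cluster-internal virtual IP only;
  # no external load balancer is provisioned.
  type: ClusterIP
  selector:
    app: boj-server        # assumed pod label
  ports:
    - name: rest           # port names are assumptions
      port: 7700
      targetPort: 7700
    - name: grpc           # reserved; BoJ currently binds 7700 only
      port: 7701
      targetPort: 7701
    - name: graphql        # reserved
      port: 7702
      targetPort: 7702
    - name: sse            # reserved
      port: 7703
      targetPort: 7703
```

With `type: ClusterIP`, the annotations are purely informational; the Service type is what controls external reachability.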
🔍 Hypatia Security Scan
Findings: 30 issues detected
View findings
[
{
"reason": "Stale AI session file -- delete",
"type": "stale",
"file": "GEMINI.md",
"action": "delete",
"rule_module": "root_hygiene",
"severity": "medium"
},
{
"reason": "Issue in quality.yml",
"type": "missing_workflow",
"file": "quality.yml",
"action": "create",
"rule_module": "workflow_audit",
"severity": "high"
},
{
"reason": "Issue in security-policy.yml",
"type": "missing_workflow",
"file": "security-policy.yml",
"action": "create",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Action hyperpolymath/standards/.github/workflows/governance-reusable.yml@main needs attention",
"type": "unpinned_action",
"file": "governance.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "high"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/sanctify-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/academic-workflow-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/fireflag-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/ephapax-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/bofig-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/hesiod-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
}
]
Powered by Hypatia Neurosymbolic CI/CD Intelligence
This was referenced May 20, 2026
hyperpolymath added a commit that referenced this pull request on May 20, 2026
…#132)

## Summary

Tightens three sites that feed the Zig adapter binary's `--host` flag in production deployments, materialising the ADR-0004 §1 invariant that BoJ's back-side bind is not externally routable when fronted by `http-capability-gateway` (HCG tier-2).

This is **action item #7** from the [Phase E consumer-side audit](hyperpolymath/standards#100 (comment)). The scope expanded during implementation from one site (the audit named `stapeln.toml`) to three sites: `entrypoint.sh` and `compose.prod.yaml` had the same `[::]` default that the audit missed. All three sites feed into the same Zig-adapter `--host` flag, so they need to flip together for the change to actually take effect at runtime.

Companion PR to **#130** (Cowboy bind tightening in the Elixir path) and **#131** (k8s Service ClusterIP). Together the three give defence in depth: Elixir Cowboy binds loopback **AND** Zig adapter binds loopback **AND** k8s Service is internal-only.

`Refs hyperpolymath/standards#100` (NOT Closes; joint-close is owner-only). `Refs hyperpolymath/standards#91`.

## What's in

| File | Change |
|---|---|
| `stapeln.toml` `[targets.production]` | `APP_HOST = "[::]"` → `APP_HOST = "127.0.0.1"` + comment block. |
| `container/entrypoint.sh` line 40 (log) + line 140 (`exec` invocation) | `${APP_HOST:-[::]}` → `${APP_HOST:-127.0.0.1}` + comment at exec line. |
| `container/compose.prod.yaml` `services.boj-rest.environment` | `APP_HOST: "[::]"` → `APP_HOST: "127.0.0.1"` + comment block. |
| `CHANGELOG.md` | New `### Changed` entry under `[Unreleased]`. |

## Override path for legacy/standalone use

Deployments without HCG in front: set `APP_HOST=0.0.0.0` (IPv4 all-interfaces) or `APP_HOST=::` (IPv6 all-interfaces) in your deployment config. The in-repo defaults remain loopback.

## Audit-residue follow-ups deliberately NOT in this PR

- `container/Containerfile` line 125: `ENV PHX_HOST=0.0.0.0` is **vestigial**. Nothing in the codebase reads `PHX_HOST` (verified by `grep -rn "PHX_HOST\|phx_host" --include="*.ex" ...` returning empty). Leftover from a former Phoenix incarnation. Safe to leave alone; can be removed in a hygiene PR if desired.
- Unifying `APP_HOST` (Zig adapter) and `BOJ_BIND_IP` (Elixir Cowboy from #130) into one envelope is broader scope. The divergence exists because they feed different binaries built by different toolchains. If it proves annoying in operation, file a separate issue.

## Why DRAFT

Same reason as #131: this is a behaviour change for anyone running the stapeln-built production container or `compose.prod.yaml` as-is and relying on the default `[::]` for external access. Owner gates merge on confirming no such reliance, or on coordinating with anyone who needs the migration path (HCG-in-front, or explicit `APP_HOST=0.0.0.0` override).

## Test plan

- [x] `stapeln.toml` parses as valid TOML (syntax preserved).
- [x] `container/entrypoint.sh` runs through `sh -n` without syntax error (no syntax change, just literal substitution).
- [x] `container/compose.prod.yaml` parses as valid YAML.
- [ ] CI green: governance / hypatia / a2ml / k9 / dogfooding all pass.
- [ ] Owner: confirm no live deployment depends on `[::]` default; flip from DRAFT to ready.
- [ ] Post-merge manual: a stapeln-built production container, with no env override, refuses connections from non-loopback peers.

## Risk

**Low for the codebase, medium for ops.** No Elixir / Zig / Idris2 / cartridge logic touched; CI should not show any regressions. Ops risk: anyone whose runbook assumes the container exposes BoJ on all interfaces by default will need to set `APP_HOST=0.0.0.0` explicitly. Reasonable default for the Phase E posture; documented override path.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
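As an illustration of the override path for standalone use described in the commit message above, a small Compose override file could set `APP_HOST` without touching `compose.prod.yaml`. The service name `boj-rest` and the variable come from the commit message; the file name `compose.standalone.yaml` is hypothetical.

```yaml
# compose.standalone.yaml (hypothetical file name)
# Sketch of an override for deployments WITHOUT HCG in front.
services:
  boj-rest:
    environment:
      # Bind all IPv4 interfaces instead of the loopback default.
      APP_HOST: "0.0.0.0"
```

Assuming a Compose implementation that merges multiple `-f` files, passing `container/compose.prod.yaml` first and this file second should leave the in-repo loopback default intact while overriding it for the one deployment that needs external access.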
hyperpolymath added a commit that referenced this pull request on May 20, 2026
…ds#100/#91) (#138)

## Summary

Adds a second 2026-05-20 entry to `.machine_readable/6a2/STATE.a2ml` `[session-history]` documenting the afternoon HCG Phase E first-session output. The morning Tier C entry (already in main) stays in place; this new entry sits **above** it per the newest-first convention.

`Refs hyperpolymath/standards#100` (Phase E), `Refs hyperpolymath/standards#91` (HCG tier-2 channel parent). **NOT Closes**.

## What's in

A single new TOML entry in `[session-history] entries = [ ... ]` summarising the afternoon's deliverables:

- PR `#128` (MERGED): `docs/integration/hcg-tier2-rollout-runbook.md` (E5 rollout-and-rollback runbook, 308 lines, `!OWNER:` markers in §1.3 + §4)
- PR `#130` (MERGED): Cowboy bind `127.0.0.1` default + `BOJ_BIND_IP` env override (audit #6)
- PR `#131` (MERGED): k8s Service `LoadBalancer → ClusterIP` (audit #8)
- PR `#132` (MERGED): container `APP_HOST` defaults across `stapeln.toml` + `entrypoint.sh` + `compose.prod.yaml` (audit #7)
- Issue `#135` (filed): k8s NetworkPolicy follow-up (Low priority, Phase E acceptance non-critical)
- Defence in depth: 3 independent loopback layers (Elixir Cowboy + Zig adapter + k8s Service)
- Phase C §3 invariant 3 correction: confirmed via `git log` that the deny clause landed in `boj-server#106 (40e46f6)`; the channel-status comment claiming it was owner-gated was stale.

The entry also records the **Phase E gating posture**: E1/E2/E3/E4 wiring + Trustfile `PENDING → DEPLOYED` flip are all explicitly gated on Phase D-3 (regression alert armed) + D-4 (real baseline numbers populated), per the runbook §1.1. The afternoon session shipped only the Phase-D-independent artefacts.

## Why a separate PR (not amended into another)

All four code PRs (#128/#130/#131/#132) are already merged. The STATE.a2ml entry parallels the morning Tier C entry (already in main from the morning session), and the convention is per-session per-entry. Keeping this as its own doc PR is the cleanest record.

## Verification

- TOML syntax: valid (single new `{ date = "...", description = "..." }` entry prepended).
- Linting: `validate-a2ml` action will run on PR.

## Risk

**Negligible.** Doc-only; no code or workflow changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
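The NetworkPolicy follow-up filed as issue `#135` (restricting BoJ pod ingress to only the HCG pod) is not part of any of these PRs. Purely as a sketch of what such a policy could look like: the policy name and both pod labels below are assumptions, both pods are assumed to run in the same namespace, and the cluster's network plugin is assumed to enforce NetworkPolicy.

```yaml
# Sketch only; issue #135 may settle on a different shape.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: boj-allow-hcg-only           # hypothetical name
spec:
  # Pods this policy applies to (assumed BoJ pod label).
  podSelector:
    matchLabels:
      app: boj-server
  policyTypes:
    - Ingress
  ingress:
    # Allow traffic only from pods carrying the assumed HCG label.
    - from:
        - podSelector:
            matchLabels:
              app: http-capability-gateway
      ports:
        - protocol: TCP
          port: 7700                 # REST; the only port BoJ currently binds
```

Once a pod is selected by a policy with `Ingress` in `policyTypes`, ingress that no rule allows is dropped, which is what would make this a layer beyond the ClusterIP Service type.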
## Summary

Per ADR-0004 §1 (http-capability-gateway tier-2 placement) and the Phase E rollout-runbook §1.4 prereq #8, BoJ MUST NOT be externally addressable when fronted by HCG. Previously `k8s/service.yaml` declared `type: LoadBalancer`, exposing all four BoJ ports (REST 7700, gRPC 7701, GraphQL 7702, SSE 7703) externally, which defeats the runbook §3.4 decommission invariant before Phase E even starts.

This is action item #8 from the Phase E consumer-side audit, and the companion to action #6 (`boj-server#130`, Cowboy bind tightening). The two layers together give defence in depth: BoJ binds loopback (#130) AND the k8s Service does not expose it externally (this PR).

`Refs hyperpolymath/standards#100` (NOT Closes; joint-close is owner-only). `Refs hyperpolymath/standards#91`.

## What's in

- `k8s/service.yaml`: `type: LoadBalancer` → `type: ClusterIP`. Header comment explaining the Phase E posture. New annotations: `hyperpolymath.dev/exposure: "internal-only"`, `hyperpolymath.dev/external-via: "http-capability-gateway (tier-2)"`. Ports 7700–7703 retained forward-compatibly.
- `CHANGELOG.md`: `### Changed` entry under `[Unreleased]`.

Estate cross-check: `hypatia/*`, `rsr-certifier`, and `opsm-service` all use `ClusterIP` for backend Services. The estate's gateway-fronted-backend pattern is `Service: ClusterIP`. The only `LoadBalancer` in the estate's k8s manifests today (besides the BoJ Service this PR fixes) is `svalinn-gateway`, i.e. the gateway tier, not a backend. This PR brings BoJ's Service in line with the estate pattern.

## Why DRAFT

Breaking for anyone running this manifest as-is with a LoadBalancer in production. I don't know whether a live LoadBalancer-fronted BoJ deployment exists. Marking draft so the owner gates merge on confirming no such deployment will break. If one exists, the migration path is HCG-in-front (per the rollout-runbook §2).

For ad-hoc / dev / non-HCG-fronted deployments that need BoJ exposed externally, the header comment in `service.yaml` shows the kustomize/helm overlay recipe, so the canonical manifest does not need to be edited.

## Test plan

- `kubectl apply -f k8s/service.yaml --dry-run=client` accepts the spec; `kubectl describe svc/boj-server` shows the new annotations.

## Risk

Medium for the manifest, low for the codebase. The codebase is untouched; only the k8s manifest changes. Breakage scope is limited to k8s operators relying on the LoadBalancer external IP for BoJ; they get a fixable Service-type override. No mix.exs / Elixir / Zig / Idris2 / cartridge changes; CI should not show any logic regressions.

## Follow-ups not in this PR (audit residue)

- `stapeln.toml` `APP_HOST = "[::]"` → loopback (separate concern: packager). Can pick up next.
- `NetworkPolicy` restricting BoJ pod ingress to only the HCG pod. Defence-in-depth beyond ClusterIP. Not strictly required for Phase E acceptance (Service-type is the controlling lever), but worth filing as a follow-up issue.

🤖 Generated with Claude Code
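For the standalone / non-HCG case discussed in the DRAFT rationale, an overlay could flip the Service type back without editing the canonical manifest. The following is a minimal kustomize sketch and not the recipe from the `service.yaml` header comment: the overlay path is hypothetical, and it assumes `k8s/` contains its own `kustomization.yaml` that lists `service.yaml`.

```yaml
# overlays/standalone/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Assumes k8s/ is itself a kustomize base listing service.yaml.
  - ../../k8s
patches:
  # Strategic-merge patch: only spec.type is overridden; ports and
  # annotations from the canonical manifest are left as they are.
  - patch: |-
      apiVersion: v1
      kind: Service
      metadata:
        name: boj-server
      spec:
        type: LoadBalancer
```

An overlay of this kind would still carry the `internal-only` exposure annotation from the base, so a real standalone deployment would probably want to patch that annotation as well to keep `kubectl describe` output truthful.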