OTel Phase 6 — Build-time manifest + egress wiring #107

> Part of the **Observability — OpenTelemetry Tracing v1** initiative (master tracking: #108). Effort: **S (2–3 engineer-days)**. Risk: **medium** (deployment-time semantics, two enforcement layers, dynamic egress). Depends on: **Phase 2** (#103).

## Goal

A built/deployed agent can reach its collector without baking an environment-specific URL into the image. The endpoint is supplied at **deploy time** via env; the allowlist resolves to whatever was injected, with zero drift.

## Files

| File | Change |
|---|---|
| `forge-cli/build` egress generation | When `tracing.enabled`, add a **dynamic** entry `$OTEL_EXPORTER_OTLP_ENDPOINT` to `egress_allowlist.json`, source `otel` — reusing the existing `$VAR` dynamic-egress mechanism (same as a skill's `$K8S_API_DOMAIN`) |
| `forge-core/security` egress resolver | When expanding the `otel` dynamic entry, **host-extract** from the value (the env holds a full URL, the matcher needs a host) |
| `forge-cli/build` manifest generation | When `tracing.enabled`, inject env *references* (ConfigMap/Secret, `optional: true`) into the Deployment; emit a ConfigMap stub |

## Two enforcement layers — keep them distinct

### 1. Forge in-process EgressEnforcer

The layer that would otherwise silently drop the exporter. Solved by the dynamic `$VAR` entry: the build emits a placeholder, the runtime resolver expands it from the same env var the exporter uses. **Destination and allowlist derive from one variable, so they cannot drift.**

### 2. K8s NetworkPolicy (static, network-level)

A NetworkPolicy is static: it cannot expand env vars at deploy time. So for an **external** collector with a deploy-injected host, the NetworkPolicy egress rule is the **deployer's / Platform's responsibility**, the same actor that injects the env. `$K8S_API_DOMAIN` already carries the same limitation; this is **not a new gap**.

**Recommended sidestep**: run the collector as a **sidecar** (`http://localhost:4318` — no NetworkPolicy egress rule at all) or an **in-cluster service** (a same-cluster egress rule, not internet). Then the agent's external allowlist / NetworkPolicy is untouched and the collector owns forwarding to the real backend.
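
To make the three collector placements concrete, here is a minimal sketch of how an endpoint value could be classified to decide whether the deployer owes an internet-facing NetworkPolicy egress rule. The function names are hypothetical, and the in-cluster check assumes the collector is addressed by a cluster DNS name ending in `.svc` or `.svc.cluster.local`; a short Service name would need a different rule.

```python
from urllib.parse import urlsplit


def collector_placement(endpoint: str) -> str:
    """Classify an OTLP endpoint as 'unset', 'sidecar', 'in-cluster' or 'external'."""
    value = endpoint.strip()
    if not value:
        return "unset"
    # Assumption: a value without a scheme ("host:4318") is read as host[:port].
    if "://" not in value:
        value = "//" + value
    host = urlsplit(value).hostname or ""
    if host == "localhost":
        return "sidecar"  # same pod, no NetworkPolicy egress rule involved
    # Assumption: in-cluster collectors are addressed by cluster DNS names.
    if host.endswith(".svc") or host.endswith(".svc.cluster.local"):
        return "in-cluster"  # same-cluster egress rule, not internet
    return "external"  # deployer / Platform must permit this host


def needs_internet_egress_rule(endpoint: str) -> bool:
    """True only when the NetworkPolicy must allow traffic leaving the cluster."""
    return collector_placement(endpoint) == "external"
```

A build or preflight step could use a check like this to warn when an external endpoint is combined with `--prod`, since that is the case where the exporter can be silently blocked.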

## Egress entry

```json
{ "domain": "$OTEL_EXPORTER_OTLP_ENDPOINT", "source": "otel" }
```

The resolver expands the variable at startup, then extracts the host:

```
https://otel.initializ.ai:4318/v1/traces → otel.initializ.ai
```

Skip the entry when the expanded host is `localhost` (sidecar) or the var is empty. **Do not** introduce a second `_HOST` var — host-parse the one endpoint var.
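
The build-time placeholder and the runtime host extraction described above can be sketched as follows. This is an illustration in Python, not the Forge resolver itself: `otel_egress_entry` and `resolve_otel_host` are hypothetical names, and treating a scheme-less value as `host[:port]` is an assumption the issue does not specify.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit

OTEL_VAR = "OTEL_EXPORTER_OTLP_ENDPOINT"


def otel_egress_entry(tracing_enabled: bool) -> Optional[dict]:
    """Build time: emit the $VAR placeholder, never a resolved host."""
    if not tracing_enabled:
        return None
    return {"domain": f"${OTEL_VAR}", "source": "otel"}


def resolve_otel_host(entry: Mapping[str, str], env: Mapping[str, str]) -> Optional[str]:
    """Runtime: expand the placeholder and reduce the URL to a bare host.

    Returns None when the entry should be skipped (empty var or localhost).
    """
    var_name = entry["domain"].lstrip("$")
    value = env.get(var_name, "").strip()
    if not value:
        return None  # var unset or empty: nothing to allowlist
    # Assumption: a value without a scheme ("host:4318") is read as host[:port].
    if "://" not in value:
        value = "//" + value
    host = urlsplit(value).hostname  # drops scheme, port and path
    if not host or host == "localhost":
        return None  # sidecar collector: do not allowlist localhost
    return host
```

Because the resolver reads the same variable the exporter uses, there is no second value that could disagree with it, which is the reason to avoid a separate `_HOST` var.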

## Deployment env injection

When `tracing.enabled`, emit into the Deployment container (references, not literals — so deploy-time override works and missing config degrades to no-op):

```yaml
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    valueFrom:
      configMapKeyRef: { name: forge-otel, key: endpoint, optional: true }
  - name: OTEL_EXPORTER_OTLP_HEADERS
    valueFrom:
      secretKeyRef: { name: forge-otel-auth, key: headers, optional: true }
```

`optional: true` is **load-bearing**: absent ConfigMap → env unset → gate fails (per Phase 2) → no-op, pod still healthy. `OTEL_SERVICE_NAME` defaults to `agent_id`. Also emit `forge-otel.configmap.example.yaml` (a stub) so the operator sees exactly what to populate.

The Initializ Platform populates this ConfigMap automatically at deploy; self-managed operators fill it via kustomize / helm / GitOps.
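
As a rough sketch of the manifest-generation step, the function below appends the two env references to a container spec held as a plain dict. The function names, the dict-based representation, and the contents of the stub (an empty `endpoint` key) are assumptions for illustration; only the env names, the `forge-otel` / `forge-otel-auth` object names, the keys, and `optional: true` come from this issue.

```python
import copy


def inject_otel_env(container: dict, tracing_enabled: bool) -> dict:
    """Return a copy of the container spec with OTel env references appended."""
    out = copy.deepcopy(container)
    if not tracing_enabled:
        return out  # tracing disabled: no otel env in the Deployment
    refs = [
        {
            "name": "OTEL_EXPORTER_OTLP_ENDPOINT",
            "valueFrom": {
                "configMapKeyRef": {"name": "forge-otel", "key": "endpoint", "optional": True}
            },
        },
        {
            "name": "OTEL_EXPORTER_OTLP_HEADERS",
            "valueFrom": {
                "secretKeyRef": {"name": "forge-otel-auth", "key": "headers", "optional": True}
            },
        },
    ]
    env = out.setdefault("env", [])
    existing = {e["name"] for e in env}
    # References only, never literal `value:` entries, so deploy-time override works.
    env.extend(r for r in refs if r["name"] not in existing)
    return out


def configmap_stub() -> dict:
    """Stub the operator fills in; the empty endpoint is an assumed placeholder."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "forge-otel"},
        "data": {"endpoint": ""},
    }
```

Calling `inject_otel_env({"name": "agent"}, True)` yields a container whose `env` holds both references with `optional` set, so a missing ConfigMap or Secret leaves the variables unset rather than failing the pod.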

## Build mode interactions

- **`--slim`**: skips manifests + allowlist entirely → no otel env, no otel egress entry. Ops wires everything.
- **`--prod`**: rejects the dev-open egress profile. Because the otel entry is dynamic, the in-process enforcer still resolves it correctly at runtime — but for an **external** collector the NetworkPolicy must permit it (deploy-owned) or the exporter is silently blocked in prod. State this loudly; recommend the in-cluster collector for prod.

## Verify

```bash
# Build an agent with tracing.enabled (no endpoint committed in forge.yaml):
forge build
jq '.[] | select(.source=="otel")' .forge-output/egress_allowlist.json
# dynamic $OTEL_EXPORTER_OTLP_ENDPOINT entry present

grep -n 'configMapKeyRef\|forge-otel' .forge-output/k8s/*deployment*.yaml
# env reference + optional:true present

ls .forge-output/k8s/ | grep -i 'forge-otel.*example'
# ConfigMap stub emitted

# Runtime host-extraction:
OTEL_EXPORTER_OTLP_ENDPOINT=https://otel.example.com:4318/v1/traces \
  forge run --tracing --port 8095 &
# confirm enforcer allowlists host "otel.example.com" (not the full URL)
# check egress_allowed audit/log line

# Build with tracing disabled: NO otel egress entry, NO otel env in Deployment.
```

## Anti-patterns to avoid

- Resolving the endpoint host at **build time** (it's deploy-time — emit the `$VAR` placeholder, don't bake a host).
- Adding a `--otel-endpoint` flag (redundant — the dynamic entry supersedes it).
- A second `_HOST` env var (host-parse the one endpoint var).
- Literal env `value:` instead of `valueFrom` reference (kills deploy-time override).
- Allowlisting `localhost`.
- Expecting the static NetworkPolicy to resolve `$VAR` for external collectors.

