Skip to content

fix: Defer env expressions to runtime, add globalArguments validation#351

Merged
stack72 merged 1 commit into
mainfrom
fix/defer-env-expressions-to-runtime
Feb 14, 2026
Merged

fix: Defer env expressions to runtime, add globalArguments validation#351
stack72 merged 1 commit into
mainfrom
fix/defer-env-expressions-to-runtime

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented Feb 14, 2026

Summary

Closes #323 and #329.

Problem: Environment variable expressions (${{ env.* }}) were resolved at persist time, unlike vault expressions which are deferred to runtime. This caused two issues:

  1. Secret leakage (Environment variables exposed to CEL expressions via buildEnvContext() #323): Sensitive environment variables (e.g., AWS_SECRET_ACCESS_KEY, GITHUB_TOKEN) referenced via ${{ env.* }} were resolved during expression evaluation and written as plaintext YAML to .swamp/definitions-evaluated/ on disk.

  2. Schema validation bypass (Schema validation bypassed when any field contains a CEL expression #329): When ANY field in a definition contained a CEL expression, validation of ALL static fields was completely skipped. Because env expressions were resolved at persist time but the validation gap existed, there was no point where a fully-resolved definition could be schema-validated.

Fix: Defer env expressions to runtime (matching vault behavior), then add runtime globalArguments schema validation after all expressions are resolved. This fixes both issues by design — at runtime, the definition is fully resolved and can be completely validated.

What changes for users

  • Env expressions are no longer persisted as plaintext. Evaluated definition files (.swamp/definitions-evaluated/) will now contain the raw ${{ env.HOST }} template instead of the resolved value. This means sensitive env vars are never written to disk.
  • Runtime validation catches invalid globalArguments. If a model's globalArguments schema requires port to be a number but the resolved value is a string, execution will now fail with a clear validation error instead of silently proceeding.
  • No breaking changes to workflow behavior. Env expressions still resolve correctly — just at runtime instead of persist time. The resolved values are identical.

Changes

  • Expression detection: Added containsEnvExpression() and containsRuntimeExpression() as a unified check for vault OR env references
  • Persist-time skip: Replaced containsVaultExpression with containsRuntimeExpression at all 7 persist-time skip checks across expression_evaluation_service.ts, execution_service.ts, and workflow_evaluate.ts
  • Runtime resolution: Added resolveRuntimeExpressionsInDefinition() and resolveRuntimeExpressionsInData() that handle both vault AND env resolution in a single pass. Old resolveVaultExpressionsIn* methods kept as deprecated aliases for backward compatibility.
  • Schema validation: Added globalArguments schema validation in executeWorkflow() after definition upgrade and runtime expression resolution
  • Tests: Added unit tests for new detection functions and globalArguments validation. Updated 2 integration tests to verify env expressions are left raw at persist time and resolved at runtime.

Test plan

  • deno check — type checking passes
  • deno lint — no lint errors
  • deno fmt — formatting clean
  • deno run test — all 1697 tests pass (0 failures)
  • Verify env expressions in definitions are not persisted as plaintext
  • Verify models with invalid globalArguments after resolution are rejected

Co-authored-by: Magistr umag@users.noreply.github.com

🤖 Generated with Claude Code

Env expressions (${{ env.* }}) were previously resolved at persist time,
which caused two problems: sensitive environment variables (API keys,
tokens) were written as plaintext to evaluated definition files on disk,
and schema validation was completely bypassed when ANY field contained
a CEL expression.

This change defers env expressions to runtime (matching vault behavior)
and adds runtime globalArguments schema validation after all expressions
are resolved.

Changes:
- Add containsEnvExpression() and containsRuntimeExpression() detection
- Replace containsVaultExpression with containsRuntimeExpression at all
  persist-time skip checks (7 call sites across 3 files)
- Add resolveRuntimeExpressionsInDefinition/Data methods that handle
  both vault AND env resolution in a single pass
- Keep old resolveVaultExpressionsIn* methods as deprecated aliases
- Add globalArguments schema validation in executeWorkflow() after
  definition upgrade and runtime resolution
- Update integration tests to verify env expressions are left raw at
  persist time and resolved at runtime

Co-authored-by: Magistr <umag@users.noreply.github.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: PR #351 - Defer env expressions to runtime, add globalArguments validation

Overall Assessment: APPROVE

This PR addresses two security and correctness issues (#323, #329) with a clean, well-designed solution. The changes follow domain-driven design principles and maintain good separation of concerns.


Blocking Issues

None identified.


Summary of Changes Reviewed

Security Fix (#323): Environment variable expressions (${{ env.* }}) are now deferred to runtime instead of being evaluated at persist time. This prevents sensitive environment variables from being written as plaintext to .swamp/definitions-evaluated/.

Correctness Fix (#329): Added globalArguments schema validation in executeWorkflow() after runtime expression resolution, ensuring type constraints are enforced on fully-resolved values.


Positive Observations

  1. Good abstraction: The containsRuntimeExpression() function cleanly unifies vault and env detection, replacing 7 separate checks with a single semantic concept.

  2. Backward compatibility: Deprecated aliases (resolveVaultExpressionsInDefinition, resolveVaultExpressionsInData) allow existing code to continue working while signaling the migration path.

  3. Comprehensive test coverage:

    • Unit tests for containsEnvExpression() and containsRuntimeExpression()
    • Unit tests for globalArguments validation (valid, invalid, and missing schema cases)
    • Integration tests updated to verify two-phase resolution (persist-time vs runtime)
  4. DDD compliance: Changes stay within the appropriate domain boundaries:

    • ExpressionEvaluationService handles expression detection and resolution (domain service)
    • DefaultMethodExecutionService handles execution-time validation (domain service)
    • No leaking of persistence concerns into domain logic
  5. Clear code comments: The PR updates comments consistently to reflect the new "runtime expressions" terminology.


Suggestions (Non-blocking)

  1. Consider a type alias for clarity (optional):

    type RuntimeExpressionChecker = (celExpression: string) => boolean;

    Could make the pattern more discoverable, but current implementation is clear enough.

  2. Edge case coverage (minor): The ENV_PATTERN uses word boundary (\benv\.) which correctly rejects myenv.FOO but would match _env.FOO since _ is a word character. This seems like the right behavior for variable names, just noting it.

  3. Error message enhancement (minor): The globalArguments validation error message could include which field(s) failed validation for easier debugging. Current implementation uses formatZodError which does provide path information, so this is already reasonably good.


Files Reviewed

  • src/domain/expressions/expression_evaluation_service.ts - Core changes
  • src/domain/expressions/expression_evaluation_service_test.ts - Unit tests
  • src/domain/models/method_execution_service.ts - Validation addition
  • src/domain/models/method_execution_service_test.ts - Validation tests
  • src/domain/workflows/execution_service.ts - Updated to use containsRuntimeExpression
  • src/cli/commands/model_method_run.ts - Updated method call
  • src/cli/commands/workflow_evaluate.ts - Updated to use containsRuntimeExpression
  • integration/cel_data_access_test.ts - Updated integration test
  • integration/definition_lifecycle_test.ts - Updated integration test

All changes follow the project's code style guidelines (TypeScript strict mode, named exports, no any types) and include the required AGPLv3 license headers.

@stack72 stack72 merged commit 74ca98d into main Feb 14, 2026
4 checks passed
@stack72 stack72 deleted the fix/defer-env-expressions-to-runtime branch February 14, 2026 22:32
adamstac pushed a commit to adamstac/swamp that referenced this pull request May 14, 2026
…42) (systeminit#1383)

## Summary

Removes the legacy `registries.<kind>.failures[]` dual failure surface
and makes `aggregateState.sourceDetails[]` the single authoritative
source of extension failure state in `swamp doctor extensions`. Closes
the W1-W6 rearchitecture story — W7 removes the last vestige of the old
failure-reporting model.

### What changed

- **Shared failure recorder**
(`src/domain/extensions/source_failure_recorder.ts`) — extracted from
reconcile's catch block. Both reconcile and buildIndex call the same
helper, eliminating the consistency gap where buildIndex bypassed the
aggregate.
- **buildIndex failure routing** — warm-start catch routes failures
through `recordSourceFailure` → `repository.saveAll()`. Cold-start
failures also routed, with `originalError` preserving `ValidationError`
type for correct state-tag discrimination.
- **`markCatalogValidationFailed` removed** — validation failures now
throw `ValidationError`, caught by the shared helper. No more direct
catalog upserts that bypass the aggregate.
- **`registries.failures[]` removed** — per-registry status derived from
`sourceDetails[]` failure-state tags (`ValidationFailed`,
`BundleBuildFailed`, `EntryPointUnreadable`).
- **`extension_load_warnings.ts` split** — kept
`emitTypeExtractionFailure` for stderr regex-mismatch warnings; removed
`recordLoadFailures`, `getExtensionLoadWarnings`, and the warnings array
accumulation.
- **`last_error` catalog column** — schema migration adds `last_error
TEXT` to `bundle_types`. Error messages persist across process
invocations. Clears on recovery to Indexed.
- **SQL state-preservation fix** — `upsert` ON CONFLICT UPDATE now
conditionally excludes `state` and `last_error` when callers don't
provide them, preventing `populateCatalogFromDir` from overwriting
reconcile's failure states with `Indexed`.

### JSON schema changes (doctor extensions --json)

1. **Removed**: `registries.<kind>.failures[]` — the field no longer
exists on `DoctorRegistryResult`
2. **Added**: `sourceDetails[].lastError?: string` — populated for
failure states (`ValidationFailed`, `BundleBuildFailed`,
`EntryPointUnreadable`), absent for healthy states

### Migration note

Pre-migration catalog rows in failure states get empty `lastError` until
the next reconcile pass. Self-heals on first run after upgrade — not a
bug, just a transient state.

### Known behavior changes (follow-ups filed)

- **Type-extraction warnings** (systeminit#351): models with non-literal `type`
fields (e.g. `const TYPE = "..."`) produce stderr warnings but no longer
flip doctor exit code. They don't register and don't appear in
`sourceDetails[]`. Previously surfaced via the removed
`registries.failures[]` pipeline.
- **Streaming protocol** (systeminit#352): `kind-completed` events carry
preliminary `status: "pass"` during the scan phase; final status is
computed from aggregate state in the `completed` event. Internal-only
(renderer is the sole consumer).

## Test plan

- [x] 5908 unit/integration tests pass (0 failures)
- [x] `deno check` / `deno lint` / `deno fmt` clean
- [x] 24-point binary verification matrix against compiled `./swamp`:
  - Healthy model → Indexed, exit 0
  - Schema-invalid model → failure state, exit 1, lastError populated
  - Missing version → ValidationFailed, exit 1
  - Recovery: broken → fixed → Indexed, lastError clears
  - Rebundle loop guard: stable broken source converges (3 runs)
  - Mixed healthy + broken: healthy registers, broken surfaces
  - JSON contract: no `failures` field, `lastError` present
  - Vault failure surfaces in sourceDetails with correct registry status
  - lastError persists across separate process invocations
  - lastError absent from sourceDetails after recovery to Indexed
- [x] SQL regression tests: upsert without state preserves both state
and last_error; upsert with explicit state overwrites both
- [x] Integration test: lastError lifecycle (populates on failure,
clears on recovery)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Environment variables exposed to CEL expressions via buildEnvContext()

1 participant