Skip to content

[Security] DurableState revision tampering via JSON-body manipulation #116

@pathosDev

Description

@pathosDev

Severity / Size

  • Severity: CRITICAL — bypasses optimistic-concurrency control on DurableState; opens a write/read corruption path for any deployment that doesn't enable client-side encryption.
  • Size: M (~1.5d).
  • Threat model: an attacker with write access to the object-storage backend (shared S3 bucket with overly permissive IAM, co-tenant on a filesystem-backed store, insider, compromised CI pipeline pushing to the bucket). TCP-level cluster access is not required.

Affected files

  • src/persistence/durable-state-stores/ObjectStorageDurableStateStore.ts:75-98load() parses { revision, state, timestamp } from the body and trusts the in-body revision for both the return value and the etagCache.
  • src/persistence/durable-state-stores/ObjectStorageDurableStateStore.ts:100-185upsert() uses the cached revision for the CAS check; tampered cache → bypass.
  • src/persistence/object-storage/BodyCodec.ts — the wire format. AES-GCM's auth-tag covers the ciphertext, but for unencrypted bodies there is no integrity field.

Background

The framework's ObjectStorage-backed DurableStateStore wraps any S3/MinIO/filesystem backend. Records are JSON-encoded into the wire format defined in BodyCodec.ts:

ATS1 | flags | [keyVersion] | [iv] | payload

The payload for DurableState is JSON.stringify({ revision: N, state: <user>, timestamp: T }). When encryption is enabled (mode: 'client-aes256-gcm'), AES-GCM's auth tag covers the whole ciphertext; flipping any byte in the body — including the revision digits — invalidates the tag and the decode throws.

For the mode: 'none' setup (the default), there is no integrity check. decodeBody() returns the bytes; load() JSON.parses and trusts the result. This is the gap the exploit walks through.

Exploit walkthrough

Setup: a DurableState-backed actor (BankAccount say) with encryption: 'none', stored on a shared S3 bucket where the attacker has PutObject permission.

Step 1 — legitimate write: the actor upserts state { balance: 100 } with expectedRevision=4. The store writes a body containing JSON { revision: 5, state: { balance: 100 }, timestamp: 1715... }. Backend returns etag "abc". The store caches { etag: "abc", revision: 5 }.

Step 2 — attacker tamper: attacker downloads the object, edits the JSON to { revision: 999, state: { balance: 100 }, timestamp: 1715... } and PUTs it back. Backend assigns a new etag "def" (we never see this).

Step 3 — actor cold-start (or any other process that calls load() without a stale cache): load() fetches, parses, and caches { etag: "def", revision: 999 }. Returns { revision: 999, state: { balance: 100 } } to the caller.

Step 4 — actor logic relies on the revision counter for ordering decisions (audit log, idempotency, billing). It now reasons about an artificially-inflated revision. Worst case (banking-style): attacker writes revision: 7 instead of 999. Next legitimate upsert(expectedRevision=5) compares cached.revision (7) ≠ expected (5) → throws CAS error. Operator sees a phantom "concurrent writer" and may re-issue with expectedRevision=7. That succeeds — and overwrites with whatever state the attacker chose (because the attacker's payload also kept the state field).

The attacker doesn't need to be in the cluster. They don't need encryption keys. They just need write access to the bucket.

How the 8 already-landed security fixes inform this

Two existing fixes are closely related and should set the pattern:

  • Snapshot seq corruption (99de741): in PersistentActor.recover() we now cross-check snapshot.sequenceNr against journal.highestSeq(pid). A snapshot from outer space (MAX_SAFE_INTEGER) is rejected because the journal can't corroborate it. → For DurableState there is no journal to cross-check against; revisions are self-contained. Need a different defense.
  • Memcached CRLF / Frame-DoS / Path-Traversal: each added a small input-validation helper at the boundary. → For revision-tampering the equivalent is an integrity-MAC at the codec boundary.

The eight existing fixes are all in the body of the commit history (d454079..4cac92a). Follow the same style: exploit-test (reproduces the issue), defense-test (verifies fix), regression-test (legitimate path unchanged).

Fix design

Two-track approach because the encrypted vs unencrypted paths have different integrity stories.

Track 1 — encrypted bodies: lift revision into the manifest header.

Move revision out of the JSON payload and into the ATS1 manifest, just like keyVersion. AES-GCM's additionalAuthenticatedData parameter binds the manifest to the ciphertext: tampering with revision in the header invalidates the auth tag.

ATS1 | flags | [keyVersion] | revision: u64-LE | [iv] | ciphertext-with-AAD

The aesGcmEncrypt(subkey, iv, plaintext, aad) call gets a new aad parameter; the manifest's revision-byte-range is the AAD. decode() parses revision from the manifest and passes the same byte-range as AAD.

This is the strongest defense — it costs ~8 bytes of header per record and adds one subtle.encrypt argument.

Track 2 — unencrypted bodies: opt-in HMAC-SHA256.

Add an IntegrityConfig:

export type IntegrityConfig =
  | { readonly mode: 'none' }
  | { readonly mode: 'hmac-sha256';
      readonly integrityKey: Uint8Array;  // 32 bytes
    };

When set, the codec computes HMAC over (revision-bytes || timestamp-bytes || state-bytes) with the integrity key and appends an 8-byte HMAC-tag (truncated SHA-256, since we just need collision-resistance not cryptographic-strength signature). A new flag bit (FLAG_INTEGRITY_HMAC) signals presence in the manifest.

The integrity key is separate from the encryption master key — it's only used for HMAC, and is allowed to live alongside the data (the threat is tampering, not confidentiality). Operators can rotate it with a similar sweep helper to the existing re-encryption sweep (reEncryptObjectStorage).

Default: integrity: { mode: 'none' } — back-compat. Bodies written before this fix have FLAG_INTEGRITY_HMAC unset and decode normally without the HMAC check (legacy-safe).

Track 3 — revise the etagCache.

Even with integrity in place, load()'s etagCache should not trust an unverified revision. Refactor so the cache is populated AFTER the integrity check passes.

API surface

// src/persistence/PersistenceOptions.ts
export interface DurableStateOptions {
  // ... existing
  readonly integrity?: IntegrityConfig;
}

// New helper for legitimate sweep + key rotation:
// src/persistence/object-storage/integrityKeyRotation.ts
export async function rotateIntegrityKey(
  backend: ObjectStorageBackend,
  opts: { keyPrefix: string; oldKey: Uint8Array; newKey: Uint8Array; onProgress?: ... }
): Promise<{ rewrote: number }>;

Backward compatibility

  • Legacy bodies (no FLAG_INTEGRITY_HMAC) decode unchanged — the integrity check is opt-in.
  • Operators enable integrity: { mode: 'hmac-sha256' } in their DurableStateStore config; existing records get re-validated on next write.
  • A migration helper (sweep) rewrites legacy records with the HMAC. Pattern matches the existing reEncryptObjectStorage.

Test plan

Three tests, mirroring the established Security-Fix style (exploit / defense / regression):

  1. Exploit test (tests/unit/persistence/durable-state-stores/integrity-tampering.test.ts): write a record with {revision: 5}, manually edit the on-disk JSON to {revision: 999}, load() returns the tampered value (proves the issue exists pre-fix). After the fix lands, this test asserts that load throws when integrity is configured.

  2. Defense test: same record, same tamper, but DurableStateStore({integrity: { mode: 'hmac-sha256', integrityKey }}); load() throws JournalError: integrity check failed. Subsequent legitimate writes still work.

  3. Regression test: legacy body (written without integrity, FLAG_INTEGRITY_HMAC unset) decodes cleanly under a config that has integrity enabled — back-compat preserved.

  4. Encrypted-AAD test: with encryption enabled, tamper with revision byte in the manifest → aesGcmDecrypt throws (AAD mismatch). No new code needed beyond the AAD wiring.

  5. Concurrency-test regression: existing ObjectStorageDurableStateStore concurrency tests still pass.

Acceptance criteria

  • IntegrityConfig exposed in PersistenceOptions.ts.
  • BodyCodec carries an optional integrity argument on encode + verifies on decode; new flag bit reserved.
  • aesGcmEncrypt / aesGcmDecrypt accept an aad parameter; manifest revision flows through as AAD when encryption is active.
  • ObjectStorageDurableStateStore.load() checks integrity before populating etagCache.
  • Exploit-test demonstrates the pre-fix vulnerability and the post-fix block (4 tests above).
  • Backward-compat: existing tests in tests/unit/persistence/durable-state-stores/ all green.
  • Plan-doc + README "Known security caveats" entry updated when this lands (move out of "remaining" into "fixed").

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingpriority: highTop priority — high impact, plan nextsecuritySecurity-relevant — see severity label for impact tierseverity: criticalConfidentiality/integrity bypass with system-wide impact

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions