sec: harden kest-core v0.3.0 — cache isolation, baggage purity, JWT guard & architecture research#13

Merged
eterna2 merged 2 commits into main from feat/python-backend-and-security
Apr 11, 2026

@eterna2 eterna2 commented Apr 11, 2026

Overview

This PR delivers two related bodies of work on top of the initial v0.3.0 release:

  1. Production security hardening — four targeted fixes for issues discovered during architecture review and live integration testing.
  2. Architecture deep-dive — researched improvement proposals for the three non-obvious design decisions documented in §5 of the learnings, each tracked as a follow-on GitHub issue.

🔐 Security Fixes

B-01 · Policy Decision Cache — cross-request identity collision

The _PolicyDecisionCache key was (entry_id, frozenset(policies)). A policy decision for one identity (user/agent/task) could be served to a different identity within the same TTL window.

  • Fix: Key expanded to (user, agent, task, frozenset(policies), frozenset(context))
  • New env var: KEST_POLICY_CACHE_TTL (float, default 5.0; set 0 to disable)
  • New public API: invalidate_policy_cache() for revocation use cases
  • Tests: policy_decision_cache_test.py — 5 tests
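The expanded key can be illustrated with a small TTL cache. This is a minimal sketch, not the kest-core implementation: the class name `IdentityScopedDecisionCache`, its methods, and the choice to treat `context` as a flat dict of hashable values are all assumptions made for illustration.

```python
import time
from typing import Dict, Hashable, Iterable, Mapping, Optional, Tuple


class IdentityScopedDecisionCache:
    """Hypothetical stand-in for _PolicyDecisionCache (illustration only)."""

    def __init__(self, ttl: float = 5.0, clock=time.monotonic) -> None:
        self._ttl = ttl          # assumed to come from KEST_POLICY_CACHE_TTL
        self._clock = clock
        self._entries: Dict[tuple, Tuple[float, bool]] = {}

    @staticmethod
    def make_key(
        user: str,
        agent: str,
        task: str,
        policies: Iterable[str],
        context: Mapping[str, Hashable],
    ) -> tuple:
        # Identity fields are part of the key, so two callers can never
        # share an entry unless user, agent and task all match.
        return (user, agent, task, frozenset(policies), frozenset(context.items()))

    def get(self, key: tuple) -> Optional[bool]:
        if self._ttl <= 0:       # TTL of 0 disables caching entirely
            return None
        hit = self._entries.get(key)
        if hit is None:
            return None
        stored_at, decision = hit
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]   # expired entry
            return None
        return decision

    def put(self, key: tuple, decision: bool) -> None:
        if self._ttl > 0:
            self._entries[key] = (self._clock(), decision)

    def invalidate(self) -> None:
        # Drop every cached decision, e.g. after a revocation event.
        self._entries.clear()
```

With this shape, a decision stored for `("alice", "agent-1", "task-1", ...)` is a miss for `("bob", "agent-1", "task-1", ...)`, which is the isolation property B-01 is after.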

R-02 · Baggage purity — decouple _get_baggage() from global lab state

_get_baggage() had a secondary fallback to a module-level _LAB_BAGGAGE_STORE dict, creating a data race under async concurrency (no per-request isolation).

  • Fix: _get_baggage() reads exclusively from baggage.get_baggage() (OTel context)
  • Tests: decorators_baggage_test.py — 4 tests
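The difference between a module-level store and per-request context can be shown with the standard library alone. The sketch below uses `contextvars` as a stand-in for the OTel context (it is not the OTel API), and the handler names are hypothetical.

```python
import asyncio
import contextvars

# Anti-pattern: one dict shared by every in-flight request.
_GLOBAL_STORE = {}

# Stand-in for context-scoped baggage: each asyncio task gets its own copy.
_REQUEST_BAGGAGE = contextvars.ContextVar("request_baggage")


async def handle_with_global(user: str) -> str:
    _GLOBAL_STORE["kest.user"] = user
    await asyncio.sleep(0)  # yield; a concurrent request may overwrite the store
    return _GLOBAL_STORE["kest.user"]


async def handle_with_context(user: str) -> str:
    _REQUEST_BAGGAGE.set({"kest.user": user})
    await asyncio.sleep(0)  # yield; other tasks cannot see this task's value
    return _REQUEST_BAGGAGE.get()["kest.user"]


async def run_pair(handler):
    # gather() wraps each coroutine in its own task with its own context copy.
    return await asyncio.gather(handler("alice"), handler("bob"))
```

Running `asyncio.run(run_pair(handle_with_global))` lets the second request overwrite the first one's identity, while `handle_with_context` keeps the two requests apart.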

R-03 · JWT verification guard in KestIdentityMiddleware

Without a jwks_uri, the middleware silently decoded JWTs with verify=False, so tokens were accepted without any signature check.

  • Fix: RuntimeError raised on first request unless KEST_INSECURE_NO_VERIFY=true is explicitly set
  • Tests: ext_test.py — 3 tests (raises without jwks, insecure flag accepted, spec key names)
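The guard logic can be sketched as a small fail-closed check. This is an illustration under assumptions: the function name, the error message, and the case-insensitive comparison of the flag value are hypothetical and not taken from KestIdentityMiddleware.

```python
from typing import Mapping, Optional


def signature_verification_enabled(
    jwks_uri: Optional[str], env: Mapping[str, str]
) -> bool:
    """Return True if signatures will be verified, False if the explicit
    insecure opt-out is active. Raise if neither is configured."""
    if jwks_uri:
        return True
    # Assumption: the flag is compared case-insensitively against "true".
    if env.get("KEST_INSECURE_NO_VERIFY", "").strip().lower() == "true":
        return False
    raise RuntimeError(
        "No jwks_uri configured: refusing to decode JWTs without signature "
        "verification. Set KEST_INSECURE_NO_VERIFY=true to opt out explicitly."
    )
```

In a real service the `env` argument would typically be `os.environ`; passing it in keeps the check easy to test.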

D-01 · Baggage key naming aligned with SPEC-v0.3.0 §8.4

| Old key | New key |
|---|---|
| kest.principal_user | kest.user |
| kest.workload_agent | kest.agent |
| kest.scope (inconsistent) | kest.task |

⚠️ Rolling upgrade: Old containers emit old key names. Rebuild all services together.
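One way to check whether a service is still emitting the old names is to scan captured baggage for them. The helper below is a hypothetical audit aid, not part of kest-core; only the key names come from the rename table.

```python
from typing import Dict, Mapping

# Old key -> new key, as listed in the rename table.
LEGACY_TO_SPEC_KEYS = {
    "kest.principal_user": "kest.user",
    "kest.workload_agent": "kest.agent",
    "kest.scope": "kest.task",
}


def find_legacy_keys(baggage: Mapping[str, str]) -> Dict[str, str]:
    """Return the legacy keys present in `baggage`, mapped to their replacements."""
    return {old: new for old, new in LEGACY_TO_SPEC_KEYS.items() if old in baggage}
```

An empty result means the captured baggage already uses the spec key names.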

D-02 · Three-tier baggage propagation formalised (kest.passport_z)

Compressed inline baggage was an undocumented implementation extension. Now normative in SPEC-v0.3.0 §8.6:

| Tier | Key | Condition |
|---|---|---|
| 1 · Inline | kest.passport | ≤ 4 KB uncompressed |
| 2 · Compressed Inline | kest.passport_z | zlib-compressed ≤ 4 KB |
| 3 · Claim Check | kest.claim_check | Exceeds both thresholds |

A 10-hop chain (~5 KB raw) compresses to ~1.5 KB, pushing the claim-check threshold from hop ~8 to hop ~52.
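The tier selection can be sketched as follows. This is a minimal illustration under several assumptions: the passport is handled as a JSON string, the 4 KB limit is applied to the encoded value, the compressed form is zlib plus base64url, and the claim-check store is a plain dict keyed by a SHA-256 digest. The function names are hypothetical.

```python
import base64
import hashlib
import zlib
from typing import Dict, Tuple

MAX_INLINE_BYTES = 4 * 1024  # the 4 KB limit from the tier table


def select_baggage_tier(
    passport_json: str, claim_store: Dict[str, str]
) -> Tuple[str, str]:
    """Return (baggage key, baggage value) for the smallest tier that fits."""
    raw = passport_json.encode("utf-8")

    # Tier 1: the payload fits inline as-is.
    if len(raw) <= MAX_INLINE_BYTES:
        return "kest.passport", passport_json

    # Tier 2: compress, then encode so the value is header-safe.
    packed = base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")
    if len(packed) <= MAX_INLINE_BYTES:
        return "kest.passport_z", packed

    # Tier 3: store the payload out of band and propagate only a reference.
    claim_id = hashlib.sha256(raw).hexdigest()
    claim_store[claim_id] = passport_json
    return "kest.claim_check", claim_id


def read_compressed(packed: str) -> str:
    """Inverse of the tier 2 encoding."""
    return zlib.decompress(base64.urlsafe_b64decode(packed)).decode("utf-8")
```

Highly repetitive payloads, such as chains of similarly shaped hop entries, tend to land in tier 2, which is why compression delays the switch to claim checks.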


🔬 Architecture Research (§5 LEARNINGS.md)

A-01 · asyncio.to_thread — unbounded pool risk

async_wrapper uses asyncio.to_thread which spawns threads without bound. Under spike load this can exhaust OS thread limits.

A-02 · Rust backend GIL cliff — root cause identified

The Rust sign_entry releases the GIL with py.allow_threads for canonicalization but then immediately re-acquires it through Python::with_gil to call sign_payload (a Python callback). Under 8-thread contention this causes ~94% throughput degradation.

A-03 · Passport cache — algorithmic inefficiencies

Three compounding issues: O(n) list-snapshot comparison on every property read, per-read taint accumulation, no __slots__.
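The version-counter and `__slots__` ideas can be sketched together. The class below is hypothetical and much smaller than the real Passport: `PassportSketch`, its fields, and the meaning given to "taint" (a union of per-signature labels) are assumptions for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple


class PassportSketch:
    """Caches a derived value and invalidates it with an O(1) version check."""

    __slots__ = (
        "_signatures",
        "_version",
        "_cached_version",
        "_cached_taint",
        "recomputations",
    )

    def __init__(self) -> None:
        self._signatures: List[Tuple[str, FrozenSet[str]]] = []
        self._version = 0            # bumped on every mutation
        self._cached_version = -1    # version the cache was built from
        self._cached_taint: FrozenSet[str] = frozenset()
        self.recomputations = 0      # instrumentation for the example only

    def add_signature(self, signer: str, taints: Iterable[str]) -> None:
        self._signatures.append((signer, frozenset(taints)))
        self._version += 1

    @property
    def taint(self) -> FrozenSet[str]:
        # Integer comparison instead of comparing a snapshot of the list.
        if self._cached_version != self._version:
            merged = set()
            for _, labels in self._signatures:
                merged |= labels
            self._cached_taint = frozenset(merged)
            self._cached_version = self._version
            self.recomputations += 1
        return self._cached_taint
```

Repeated reads between mutations reuse the cached value, so the cost of a property read no longer grows with the number of signatures.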


📝 Documentation

| File | Change |
|---|---|
| README.md | Added Production Notes section: backend selection, cache TTL, JWT guard, full baggage key table |
| CHANGELOG.md | Added Security Hardening (2026-04-11) section + Known Limitations table |
| website/content/developer/04_middleware.md | Three-tier baggage table, JWT guard note, updated Mermaid sequence diagram |
| website/content/developer/08_testing.md | Test count 124→136, security hardening test table, T-09 pattern note |
| website/content/reference/02_models.md | BaggageManager three-tier table, integer ORIGIN_TRUST_MAP, Passport property table |
| website/content/reference/04_context.md | Full baggage key table with Set-by column, all three passport tiers, rename warning |
| spec/learnings/v0.3.0/LEARNINGS.md | §5 expanded with improvement proposals + tracking issue links |
| spec/learnings/v0.3.0/ARCHIVES.md | Created; houses resolved false alarms (D-03, T-07, L-04 original) |

✅ Test Coverage

| File | Tests | Covers |
|---|---|---|
| policy_decision_cache_test.py | 5 | B-01: cache key isolation by user/agent/task, TTL=0, invalidation |
| decorators_baggage_test.py | 4 | R-02: baggage reads from OTel context only |
| ext_test.py (additions) | 3 | D-01: spec key names; R-03: JWT verification guard |

Total unit tests: 124 → 136 ✅ All passing.


🔗 Related Issues

eterna2 added 2 commits April 11, 2026 10:58
…uard, spec alignment

## Security Fixes

### B-01: Policy Decision Cache — cross-request identity collision
- Expand _PolicyDecisionCache key from (entry_id, policies) to
  (user, agent, task, policies, context_items) — decisions are now
  strictly isolated per identity tuple
- Add KEST_POLICY_CACHE_TTL env var (default 5.0s); set to 0 to disable
- Expose invalidate_policy_cache() publicly for revocation use cases
- Tests: policy_decision_cache_test.py (5 tests: key isolation, TTL=0,
  invalidation)

### R-02: Baggage purity — decouple _get_baggage() from global lab state
- _get_baggage() now reads exclusively from OTel baggage context
- _LAB_BAGGAGE_STORE used only by KestHttpxInterceptor for outbound injection
- Tests: decorators_baggage_test.py (4 tests)

### R-03: JWT verification guard in KestIdentityMiddleware
- Raise RuntimeError on first request if jwks_uri=None and
  KEST_INSECURE_NO_VERIFY is not set — eliminates silent unverified decode
- Tests: ext_test.py (3 tests: raises without jwks, accepts insecure flag,
  spec key names)

### D-01: Baggage keys aligned with SPEC-v0.3.0 §8.4
- Rename kest.principal_user → kest.user
- Rename kest.workload_agent → kest.agent
- kest.task replaces inconsistent kest.scope usage

### D-02: Three-tier baggage propagation formalised (kest.passport_z)
- Inline → Compressed Inline (kest.passport_z, zlib+base64url) →
  Claim Check; pushes claim-check threshold from hop ~8 to hop ~52

## Architecture Research (§5 LEARNINGS.md)

### A-01: asyncio.to_thread unbounded pool risk documented
- Improvement proposal: bounded ThreadPoolExecutor via run_in_executor
- Tracked: #10

### A-02: Rust GIL cliff — root cause precisely identified
- py.allow_threads releases GIL for canonicalization but
  Python::with_gil re-acquires for sign_payload callback → 94% cliff
- Fix path: port LocalEd25519Provider to ed25519-dalek in Rust (A-02-A)
- Tracked: #11

### A-03: Passport cache inefficiencies documented with fix options
- O(n) list-snapshot comparison → O(1) version counter (A-03-I)
- Per-read taint accumulation → incremental on add_signature (A-03-II)
- __slots__ = True for high-throughput services (A-03-III)
- Tracked: #12

## Spec Updates
- SPEC-v0.3.0: add F-CP-07/08 (kest.passport_z normative), update §8.6
  three-tier priority table
- spec/learnings/v0.3.0/LEARNINGS.md: full §5 expansion with improvement
  proposals, issue cross-references, ARCHIVES.md for resolved findings

## Documentation
- README.md: Production Notes section (KEST_BACKEND, cache TTL, JWT guard,
  baggage key reference table with rolling-upgrade warning)
- CHANGELOG.md: Security Hardening (2026-04-11) section under [0.3.0]
  with B-01/R-02/R-03/D-01/D-02 details; Known Limitations table → #10/11/12
- website/content/developer/04_middleware.md: three-tier baggage table,
  updated JWT guard note, updated sequence diagram
- website/content/developer/08_testing.md: test count 124→136, security
  hardening test table, T-09 testing pattern note
- website/content/reference/02_models.md: fix duplicate para, BaggageManager
  three-tier table, integer ORIGIN_TRUST_MAP, Passport property table
- website/content/reference/04_context.md: extended baggage key table with
  Set-by column, all three passport tiers, hardening rename warning

## Code Style
- ruff-format applied across decorators.py, _core_py.py, ext.py, models.py,
  identity_test.py, all bench scripts, and test files
@github-actions

🔍 Site Preview Deployed

| Deployment | URL |
|---|---|
| This PR | https://eterna2.github.io/kest/preview/feat-python-backend-and-security/ |
| stable (current main) | https://eterna2.github.io/kest/stable/ |
| All versions | https://eterna2.github.io/kest/ |
| Branch | feat/python-backend-and-security |
| Commit | 07ae5123220d21f6a9b1b61f835f5891cab3d572 |

Preview updates automatically on every push to this branch. It will be removed when this PR is closed.

@eterna2 eterna2 merged commit 8d3ceee into main Apr 11, 2026
4 checks passed
@eterna2 eterna2 deleted the feat/python-backend-and-security branch April 11, 2026 11:27