Threat-model hardening pass: envelope, meta HMAC, bridge, process, adapter by jgowdy-godaddy · Pull Request #68 · godaddy/libenclaveapp

jgowdy-godaddy · 2026-04-17T16:52:24Z

Summary

Large threat-model hardening pass spanning most of the crate tree. Each commit is scoped to a single mitigation area and is reviewable independently. The consumer projects (awsenc, sshenc, sso-jwt, npmenc) have companion PRs that depend on this PR landing first.

Security mitigations added

Cache envelope (APL1) — plaintext fed to EncryptionStorage::encrypt is wrapped in [4B "APL1"][32B SHA-256(header)][8B BE u64 counter][payload], binding the unencrypted cache header to the ciphertext and embedding a monotonic rollback counter guarded by a sibling .counter sidecar under an fs4 flock. Trait signature did not change — all backends get the protection uniformly.
Meta HMAC sidecar on the keyring/software backend. Per-app random HMAC key in the system keyring; load_meta_with_hmac rejects mismatched .meta with a hard meta_hmac_verify error. Hardware backends skip — they enforce policy on the chip regardless of .meta claims.
Bridge protocol cleanup — legacy biometric: bool wire field removed; access_policy is the only encoding. Locked in destroy↔delete alias with a compat-guarantee test.
Authenticode-presence check on the WSL bridge binary before spawn. Opt-out via ENCLAVEAPP_BRIDGE_ALLOW_UNSIGNED=1 for dev / CI.
Process-wide bridge-session lock — serializes concurrent client-side bridge calls so two threads don't race two Windows Hello prompts or contend TPM key slots.
Windows SetProcessMitigationPolicy safe subset in harden_process(): StrictHandleCheck, ExtensionPointDisable, ImageLoad (NoRemoteImages + NoLowMandatoryLabelImages).
Zeroize extension to keyring/software P-256 private-key bytes and ECIES derived AES keys.
Typed SecretRead return on adapter read path; "<redacted>" is no longer sentinel-vulnerable.
Opt-in LaunchRequest::with_env_scrub(patterns) + per-child setrlimit(RLIMIT_CORE, 0) pre-exec hook.
Apple keychain UX fix — SecKeychainOpen on the real user's login keychain (via getpwuid), bypassing CFPreferences / $HOME lookups that broke in isolated contexts (integration tests, launchd, cron).
Swift-bridge contract tightening on SE_ERR_BUFFER_TOO_SMALL; Rust-side retry caps at 4, refuses non-growing length reports, validates pub_key_len ≤ 65.
Absolute /usr/bin/xcrun + xcrun --find for Swift bridge build tooling — $PATH poisoning on the dev machine can no longer substitute a poisoned swiftc / ar.
const assert on NCRYPT_UI_POLICY struct size — future windows-rs layout drift fails the build instead of shipping a wrong-sized cbInput.
Rename Keychain service prefix com.enclaveapp.* → com.libenclaveapp.* to match the newly-registered libenclaveapp.com domain.

Docs

DESIGN.md, THREAT_MODEL.md, and fix-macos.md updated to reflect all of the above. The stale "AAD binding is deferred" section in DESIGN.md is gone; the new envelope is documented in both DESIGN.md and THREAT_MODEL.md.

Test plan

cargo check clean on macOS (default-host), Linux (cross), Windows (cross).
cargo fmt --check clean.
cargo clippy --workspace --all-targets --features <all> -- -D warnings clean.
cargo test --workspace --features <all> — 687 tests passing, 0 failing.
Specific new/changed tests:
- cache::envelope::* (wrap/unwrap roundtrip, header-tamper rejection, rollback rejection, counter sidecar)
- core::metadata::meta_hmac_* (HMAC roundtrip, tamper rejection, wrong-key rejection, legacy passthrough)
- bridge::client::concurrent_call_bridge_serializes_via_session_lock
- bridge::client::pe_has_authenticode_table_* + require_signed_*
- bridge::client::bridge_init_encodes_access_policy_only
- tpm-bridge::destroy_and_delete_are_aliases
- apple::keychain_wrap::* (29 Keychain roundtrip tests against the real login keychain)
- app-adapter::launcher::scrub_removes_inherited_env_from_child_and_own_process

Out of scope

Items that need code changes on the consumer side (awsenc, sshenc, sso-jwt, npmenc) to actually take effect are in those projects' PRs.

…e, cache Sweep of DESIGN.md against the code on main: - Workspace layout diagram was missing enclaveapp-app-adapter, enclaveapp-cache, enclaveapp-tpm-bridge, enclaveapp-build-support. Added. - 'Three integration types' while the following paragraphs define four. Fixed; Type 4 (CredentialSource) was already described. - Level 5 'Linux musl plaintext' shown as a production backend on two security-level tables. Actually the only plaintext backend is enclaveapp-test-software (explicitly marked 'NOT for production'), and CLAUDE.md states musl is not supported. Tables now show Level 4 as the terminal glibc-keyring row and reference the test-only crate as an unnumbered out-of-band entry. - 'macOS signed vs. unsigned' section described an auto-detecting two-path runtime that doesn't exist. There is one code path; SE is always in it; Path 1 (entitled) is deferred per fix-macos.md. Rewrote as 'macOS path in practice (signed and unsigned)' framed around Keychain prompt UX. New sections documenting features that were in code but not in DESIGN: - Process hardening (harden_process, PR_SET_DUMPABLE, PR_SET_NO_NEW_PRIVS, RLIMIT_CORE=0, mlock_buffer). - Shared infrastructure crates (app-adapter, cache, tpm-bridge, build-support). - WSL bridge discovery (fixed paths only; PATH fallback was removed; 64 KB cap; ENCLAVEAPP_BRIDGE_TIMEOUT_SECS; BridgeSession::Drop). - Credential cache file tamper (consumer-layer max(header, config) mitigation; AAD binding deferred). Consumer mapping table expanded with shipped binary names, including gitenc and npxenc which were previously invisible.

# Conflicts: # DESIGN.md

…unter Wraps plaintext fed to EncryptionStorage::encrypt in [4B "APL1"][32B SHA-256(header)][8B BE u64 counter][payload] so the unencrypted cache header is bound to the ciphertext and older-ciphertext replay is rejected. - envelope::wrap_plaintext / unwrap_plaintext do the framing; the header SHA covers whatever bytes the caller decides are authoritative. - counter_path / read_counter / write_counter manage a sibling <cache>.counter sidecar guarded by an fs4 exclusive flock. - next_counter(sidecar, prior_observed) takes the max so deleting the sidecar cannot rewind the sequence (prior_observed is re-seeded from the highest counter inside any successfully-decrypted ciphertext). - Legacy plaintext without the APL1 magic is accepted as Unwrapped::Legacy for migration. The first write after upgrade lands in the new format. - Trait signature of EnclaveEncryptor did NOT change; backends (SE, CNG, Linux TPM, keyring, WSL bridge) inherit the protection uniformly through the existing encrypt/decrypt path. Consumed by awsenc and sso-jwt (separate repos).

metadata::save_meta_with_hmac and load_meta_with_hmac write and verify a sidecar `<label>.meta.hmac` containing hex HMAC-SHA256 of the .meta JSON. The HMAC is computed inline (RFC 2104 over sha2) so enclaveapp-core picks up a single small sha2 dep and no new HMAC crate. enclaveapp-keyring::meta_hmac_key loads or generates a per-app random 32-byte key in the system keyring at account "__meta_hmac_key__". The key is wrapped in Zeroizing<Vec<u8>> and the local intermediate array is zeroized after copy. enclaveapp-app-storage::ensure_key calls load_meta_with_hmac on Linux when the keyring hands back a key; a meta_hmac_verify mismatch is a hard error and refuses key init. Non-HMAC load errors (missing file, deserialize) fall through to the legacy load_meta path for back-compat. Hardware backends (Apple SE, Windows CNG, Linux TPM) keep the plain save_meta path — .meta tamper on those backends is UI-deception only because the chip enforces the access policy regardless of what .meta claims. delete_key_files now also removes the .meta.hmac extension so key deletion is complete.

- load_private_key_bytes and decrypt_private_key return Zeroizing<Vec<u8>> so the caller-held copy is wiped on drop. - save_encrypted wraps the random KEK in Zeroizing after filling the intermediate [u8; KEK_SIZE] (which is also zeroized via the Zeroize trait on the local array). - generate_and_save holds secret_key.to_bytes() as Zeroizing<Vec<u8>>. - derive_key in both the keyring-backed and test-software ECIES paths now returns Zeroizing<[u8; 32]> so the AES-GCM symmetric key is wiped after each encrypt / decrypt op. This closes the previous gap where plaintext P-256 bytes loaded from the keyring or software store could linger on the Rust heap after the relevant SecretKey was dropped.

- BridgeParams no longer carries `biometric: bool` on the wire. access_policy is the only accepted encoding. Stray `biometric` keys in received payloads are ignored by the deserializer and cannot influence the effective policy. Closes the silent-downgrade path where a server that honored only `biometric` could serve a client's BiometricOnly request as None. - effective_access_policy() is kept as a method returning access_policy for source-compatibility with call sites that used to reconcile the two fields. - A new destroy_and_delete_are_aliases test in enclaveapp-tpm-bridge asserts that both wire names route to identical semantics (neither produces the unknown-method error) — the "mitigation: bridge servers should accept both" note in the threat model is upgraded to a compat guarantee.

Two related bridge-client hardening changes that both live in client.rs: 1. require_bridge_is_authenticode_signed is now called before every BridgeSession::spawn. It parses the PE header's IMAGE_DIRECTORY_ENTRY_SECURITY slot (via pe_has_authenticode_table) and refuses binaries that carry no signature block at all. Opt-out: ENCLAVEAPP_BRIDGE_ALLOW_UNSIGNED=1 for dev / CI. Full WinVerifyTrust chain verification is still out of scope from the WSL side — an admin-on-Windows attacker who plants a validly-signed-but-malicious binary is the acknowledged residual. Non-.exe paths (test shell scripts) bypass the check so the existing test harness keeps working. 2. A process-wide BRIDGE_SESSION_LOCK (Mutex<()>) is held across the full spawn → request → shutdown lifetime of every bridge call. Two threads in the same client process no longer race to spawn independent bridge children against the same TPM, which would otherwise fire Windows Hello twice back-to-back and contend for the server-side key slot. Mutex poisoning is recovered with into_inner() so one crashed session cannot wedge the client for the process lifetime. 3. The renamed bridge_init_encodes_access_policy_only test enforces that the biometric field never leaks onto the wire, aligning with the earlier protocol.rs commit that removed the legacy field. New tests: - pe_has_authenticode_table_detects_signed_pe32 / _pe32plus / _unsigned / _rejects_non_pe - require_signed_skips_non_exe_paths / _rejects_unsigned_exe / _honors_allow_unsigned_env - concurrent_call_bridge_serializes_via_session_lock

Security.framework's default-keychain lookup goes through CFPreferences, which is keyed off the process's $HOME. Callers that override $HOME (integration tests via assert_cmd, awsenc serve under a launchd sandbox, cron jobs) got errSecNoDefaultKeychain back, which surfaced as the system-modal "A keychain cannot be found to store 'cache-key'" alert — blocking tests and leaving users confused. bridge.swift now resolves the login keychain explicitly via getpwuid(getuid())->pw_dir + "/Library/Keychains/login.keychain-db" (falling back to the older .keychain extension for migrated installs) and passes the resulting SecKeychain handle via kSecUseKeychain / kSecMatchSearchList in every SecItem query. The lookup bypasses CFPreferences / $HOME entirely. Unsigned-build UX is preserved: the first-run "Always Allow" ACL prompt is a SecTrust decision driven by the SecItemAdd itself, not by default-keychain lookup, so it still fires normally. keychain_delete on the Rust side now treats SE_ERR_KEYCHAIN_NOT_FOUND (12) as idempotent success so uninstall / cleanup flows stay quiet when HOME is isolated. Rename service prefix com.enclaveapp.* → com.libenclaveapp.* to match the newly-registered libenclaveapp.com domain. Pre-release rename, no legacy-entry migration path.

generate_key_with_retry now caps retries at MAX_RESIZE_RETRIES = 4 and refuses to resize when the Swift-reported length does not grow past what we sent. If the FFI ever starts returning SE_ERR_BUFFER_TOO_SMALL for something other than a genuine buffer-sizing shortfall, the Rust side surfaces it as 'Swift bridge contract violation' instead of spinning in a retry loop or masking the real failure. Also validates post-call pub_key_len ≤ 65 (uncompressed P-256 SEC1 is fixed-size); an out-of-range report is a contract violation. Paired with a bridge.swift doc comment asserting that SE_ERR_BUFFER_TOO_SMALL is ONLY used for buffer-sizing failures.

build.rs invokes the system xcrun at its absolute path /usr/bin/xcrun (system-managed, not user-writable without sudo) and discovers swiftc and ar via `xcrun --find <tool>`. The resolved paths sit inside the active Xcode developer directory (xcode-select -p) rather than walking $PATH. A shadowed xcrun / swiftc / ar earlier on the developer's $PATH can no longer substitute a poisoned Swift object into the static bridge that ends up linked into the binary. Release-tooling PATH hygiene is no longer load-bearing for this crate.

Adds a module-level `const _: () = assert!(size_of::<NCRYPT_UI_POLICY>() == EXPECTED_NCRYPT_UI_POLICY_SIZE, ...)` so a future windows-rs release that silently changes the struct (e.g. reorders LPCWSTR fields or pads differently) fails the build rather than shipping a wrong-sized cbInput to NCryptSetProperty / NCryptGetProperty. Expected size is 32 bytes on x64 (4 + 4 + 3×8) and 20 bytes on x86 (4 + 4 + 3×4).

harden_process() on Windows now applies three low-risk mitigations at startup: - ProcessStrictHandleCheckPolicy with RaiseExceptionOnInvalidHandleReference + HandleExceptionsPermanentlyEnabled — turns latent handle-confusion bugs into STATUS_INVALID_HANDLE exceptions instead of silently operating on the wrong object. - ProcessExtensionPointDisablePolicy with DisableExtensionPoints — blocks AppInit_DLLs, AppCertDlls, shim engines, IMEs, and winevent hooks from loading into the process. - ProcessImageLoadPolicy with NoRemoteImages + NoLowMandatoryLabelImages — refuses DLL loads from UNC paths and from files at the low-mandatory integrity label. Deliberately not applied: BinarySignaturePolicy.MicrosoftSignedOnly (breaks unsigned cargo builds), DynamicCodePolicy / ACG (breaks some JIT / crypto providers), SystemCallDisablePolicy.DisallowWin32kSystemCalls (breaks any GUI-surface process). Each call is best-effort — failure on older Windows builds is traced via tracing::warn! and does not abort startup. Workspace Cargo.toml adds Win32_System_Threading + Win32_Security to the windows crate's feature list.

Three related adapter changes: 1. SecretStore::get_read returns a typed SecretRead { Present(String), Redacted, Absent } enum. The read-only inspection store surfaces Redacted directly — it no longer round-trips through the "<redacted>" string sentinel, so a stored secret whose bytes happen to equal "<redacted>" is returned as Present("<redacted>") and cannot be misclassified. Legacy SecretStore::get is retained for back-compat and still produces Some(REDACTED_PLACEHOLDER) from the read-only store. MemorySecretStore gains a test-only mark_redacted() helper so tests can inject Redacted without going through the sentinel string. 2. LaunchRequest::with_env_scrub(patterns) — opt-in list of exact variable names ("NPM_TOKEN") or *-suffixed prefix patterns ("NPM_TOKEN_*", "AWS_*"). Matching variables are removed from both the child's Command and our own std::env (so later subprocess spawns without env_clear don't re-inherit), and our owned String copies are zeroized before drop. Case-insensitive on Windows because Windows env names are case-insensitive. Opt-in — existing callers with env_scrub_patterns: Vec::new() behave identically. 3. disable_core_dumps_in_child installs a pre_exec hook on Unix that calls setrlimit(RLIMIT_CORE, 0) before execve. The spawned child (e.g. npm under npmenc) inherits a zero core limit regardless of system-level core_pattern, so a crash of the Type 2 target can no longer dump its interpolated NPM_TOKEN_* / AWS_* environment.

DESIGN.md: - Rewrite 'Credential cache file tamper' section around the new APL1 envelope (SHA-256(header) + rollback counter) — the old section claimed AAD binding was deferred. - New 'Metadata .meta tamper' section documenting the .meta.hmac sidecar on the keyring backend. - 'Process hardening' extended with the Windows mitigation subset. - 'app-adapter' line extended with SecretRead, with_env_scrub, and per-child RLIMIT_CORE. - Mention Authenticode-presence check and client-side session mutex in the bridge section. - Keychain wrap service name updated to com.libenclaveapp.<app>. THREAT_MODEL.md: - Bridge 'method-name confusion' downgraded from threat to compat guarantee (the destroy_and_delete_are_aliases test locks it in). - Bridge 'serialization' rewritten around the new process-wide BRIDGE_SESSION_LOCK. - App-adapter 'Launcher env inheritance' rewritten around the new opt-in with_env_scrub helper. - Keychain-wrap service name updated throughout. fix-macos.md: - Keychain-wrap service name updated throughout. Cargo.lock reflects the new sha2 dep on enclaveapp-core and the windows feature additions.

Resolves conflicts where origin/main's #66/#67 docs PRs touched the same DESIGN.md/THREAT_MODEL.md regions I rewrote for the hardening pass. Our side supersedes — the new sections already include all of main's refinements plus the envelope, HMAC sidecar, Windows mitigations, SecretRead, env-scrub, bridge mutex, and Authenticode notes. fix-macos.md: accept main's deletion (#67 folded the findings into THREAT_MODEL.md).

Clippy on Linux flagged `&store.path_for(&id)` as needless — path_for already returns an owned PathBuf. Fix applies to the test-only get_read_on_read_only_store_returns_redacted_for_existing_entry case.

windows-rs 0.58 splits the Windows process-mitigation API across two modules: the PROCESS_MITIGATION_*_POLICY structs live in Win32::System::SystemServices while the SetProcessMitigationPolicy function and the PROCESS_MITIGATION_POLICY enum discriminants (ProcessStrictHandleCheckPolicy, etc.) live in Win32::System::Threading. Also add the Win32_System_SystemServices feature to the workspace's windows-crate feature list so the structs are actually available.

windows-rs 0.58 exposes PROCESS_MITIGATION_* struct layouts in Win32::System::SystemServices but does not generate bitfield setter methods on the inner _0_0 anonymous struct. Instead of calling set_RaiseExceptionOnInvalidHandleReference() etc., write the union's Flags: u32 word directly. Bit positions match the Win32 headers. Also drops redundant std::mem:: qualification on size_of to satisfy -D unused-qualifications.

std::ptr::from_ref is only stable since Rust 1.76 and the workspace pins MSRV at 1.75. std::ptr::addr_of! has been stable since 1.51 and produces the same *const T result without the MSRV floor bump.

The helper is only consumed by #[cfg(unix)] PE-parsing tests, so on Windows rustc reports it as dead code and the -D warnings build fails. Mirror the gate on the helper.

After #68 the Swift bridge opens the login keychain by explicit absolute path and routes all `SecItem*` calls through it. On GitHub Actions `macos-latest` runners that keychain is locked between jobs (it was unlocked once with an empty password at image-provision time). With the explicit handle in hand the next `SecItemAdd` blocks indefinitely waiting for a GUI unlock prompt that never arrives, hanging the job. Fix: after `SecKeychainOpen`, attempt a silent unlock with an empty password. Interactive dev sessions already have the keychain unlocked so this is a no-op; CI runners get their keychain unlocked without any prompt. If the keychain has a real password the unlock silently fails and the subsequent operation surfaces the real error — we deliberately do not mask it with a fallback.

After #68 the Swift bridge opens the login keychain by explicit absolute path and routes all `SecItem*` calls through it. On GitHub Actions `macos-latest` runners that keychain is locked between jobs (it was unlocked once with an empty password at image-provision time). With the explicit handle in hand the next `SecItemAdd` blocks indefinitely waiting for a GUI unlock prompt that never arrives, hanging the job. Fix: after `SecKeychainOpen`, attempt a silent unlock with an empty password. Interactive dev sessions already have the keychain unlocked so this is a no-op; CI runners get their keychain unlocked without any prompt. If the keychain has a real password the unlock silently fails and the subsequent operation surfaces the real error — we deliberately do not mask it with a fallback. Co-authored-by: Jay Gowdy <jay@gowdy.me>

PR #68 made the Swift bridge unconditionally open the login keychain by absolute path and pin every SecItem* op to it via kSecUseKeychain / kSecMatchSearchList. That fixed the "Keychain Not Found" modal dialog on local dev runs that override $HOME in tests, but it made GitHub Actions `macos-latest` jobs hang: SecItemAdd against an explicitly-pinned legacy keychain blocks on a same-binary ACL confirmation prompt the headless runner cannot answer. Fix: check SecKeychainCopyDefault first. When a default keychain is reachable (normal interactive sessions and CI runners), fall through to Security.framework's implicit routing, which reaches the Data Protection keychain on unsigned builds and does not trigger the legacy ACL prompt — restoring the pre-#68 CI behaviour. Only when no default is reachable ($HOME-overridden tests, launchd sandboxes) do we open the login keychain by explicit path and constrain ops to it — preserving #68's dialog-avoidance fix for those contexts.

PR #68 made the Swift bridge unconditionally open the login keychain by absolute path and pin every SecItem* op to it via kSecUseKeychain / kSecMatchSearchList. That fixed the "Keychain Not Found" modal dialog on local dev runs that override $HOME in tests, but it made GitHub Actions `macos-latest` jobs hang: SecItemAdd against an explicitly-pinned legacy keychain blocks on a same-binary ACL confirmation prompt the headless runner cannot answer. Fix: check SecKeychainCopyDefault first. When a default keychain is reachable (normal interactive sessions and CI runners), fall through to Security.framework's implicit routing, which reaches the Data Protection keychain on unsigned builds and does not trigger the legacy ACL prompt — restoring the pre-#68 CI behaviour. Only when no default is reachable ($HOME-overridden tests, launchd sandboxes) do we open the login keychain by explicit path and constrain ops to it — preserving #68's dialog-avoidance fix for those contexts. Co-authored-by: Jay Gowdy <jay@gowdy.me>

jgowdy added 14 commits April 17, 2026 03:53

Merge remote-tracking branch 'origin/main' into docs/design-cleanup

5bfc57a

# Conflicts: # DESIGN.md

jgowdy added 6 commits April 17, 2026 10:02

fix(test): remove unnecessary borrow in get_read_on_read_only_store test

fe5a81e

Clippy on Linux flagged `&store.path_for(&id)` as needless — path_for already returns an owned PathBuf. Fix applies to the test-only get_read_on_read_only_store_returns_redacted_for_existing_entry case.

fix(windows): use addr_of! instead of ptr::from_ref for MSRV 1.75

faa07b9

std::ptr::from_ref is only stable since Rust 1.76 and the workspace pins MSRV at 1.75. std::ptr::addr_of! has been stable since 1.51 and produces the same *const T result without the MSRV floor bump.

fix(bridge): gate make_pe_bytes test helper to unix

6f8c7d8

The helper is only consumed by #[cfg(unix)] PE-parsing tests, so on Windows rustc reports it as dead code and the -D warnings build fails. Mirror the gate on the helper.

jgowdy-godaddy merged commit 9281e76 into main Apr 17, 2026
3 checks passed

jgowdy-godaddy mentioned this pull request Apr 17, 2026

apple: silent empty-password unlock of login keychain on open #69

Merged

4 tasks

jgowdy-godaddy mentioned this pull request Apr 17, 2026

apple: prefer implicit default keychain; explicit path as fallback #70

Merged

4 tasks

jgowdy-godaddy mentioned this pull request Apr 17, 2026

app-storage: ENCLAVEAPP_MOCK_STORAGE env var forces mock backend #71

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Threat-model hardening pass: envelope, meta HMAC, bridge, process, adapter#68

Threat-model hardening pass: envelope, meta HMAC, bridge, process, adapter#68
jgowdy-godaddy merged 20 commits intomainfrom
docs/design-cleanup

jgowdy-godaddy commented Apr 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jgowdy-godaddy commented Apr 17, 2026

Summary

Security mitigations added

Docs

Test plan

Out of scope

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants