Threat-model hardening pass: envelope, meta HMAC, bridge, process, adapter#68
Merged
jgowdy-godaddy merged 20 commits intomainfrom Apr 17, 2026
Merged
Threat-model hardening pass: envelope, meta HMAC, bridge, process, adapter#68jgowdy-godaddy merged 20 commits intomainfrom
jgowdy-godaddy merged 20 commits intomainfrom
Conversation
…e, cache Sweep of DESIGN.md against the code on main: - Workspace layout diagram was missing enclaveapp-app-adapter, enclaveapp-cache, enclaveapp-tpm-bridge, enclaveapp-build-support. Added. - 'Three integration types' while the following paragraphs define four. Fixed; Type 4 (CredentialSource) was already described. - Level 5 'Linux musl plaintext' shown as a production backend on two security-level tables. Actually the only plaintext backend is enclaveapp-test-software (explicitly marked 'NOT for production'), and CLAUDE.md states musl is not supported. Tables now show Level 4 as the terminal glibc-keyring row and reference the test-only crate as an unnumbered out-of-band entry. - 'macOS signed vs. unsigned' section described an auto-detecting two-path runtime that doesn't exist. There is one code path; SE is always in it; Path 1 (entitled) is deferred per fix-macos.md. Rewrote as 'macOS path in practice (signed and unsigned)' framed around Keychain prompt UX. New sections documenting features that were in code but not in DESIGN: - Process hardening (harden_process, PR_SET_DUMPABLE, PR_SET_NO_NEW_PRIVS, RLIMIT_CORE=0, mlock_buffer). - Shared infrastructure crates (app-adapter, cache, tpm-bridge, build-support). - WSL bridge discovery (fixed paths only; PATH fallback was removed; 64 KB cap; ENCLAVEAPP_BRIDGE_TIMEOUT_SECS; BridgeSession::Drop). - Credential cache file tamper (consumer-layer max(header, config) mitigation; AAD binding deferred). Consumer mapping table expanded with shipped binary names, including gitenc and npxenc which were previously invisible.
# Conflicts: # DESIGN.md
…unter Wraps plaintext fed to EncryptionStorage::encrypt in [4B "APL1"][32B SHA-256(header)][8B BE u64 counter][payload] so the unencrypted cache header is bound to the ciphertext and older-ciphertext replay is rejected. - envelope::wrap_plaintext / unwrap_plaintext do the framing; the header SHA covers whatever bytes the caller decides are authoritative. - counter_path / read_counter / write_counter manage a sibling <cache>.counter sidecar guarded by an fs4 exclusive flock. - next_counter(sidecar, prior_observed) takes the max so deleting the sidecar cannot rewind the sequence (prior_observed is re-seeded from the highest counter inside any successfully-decrypted ciphertext). - Legacy plaintext without the APL1 magic is accepted as Unwrapped::Legacy for migration. The first write after upgrade lands in the new format. - Trait signature of EnclaveEncryptor did NOT change; backends (SE, CNG, Linux TPM, keyring, WSL bridge) inherit the protection uniformly through the existing encrypt/decrypt path. Consumed by awsenc and sso-jwt (separate repos).
metadata::save_meta_with_hmac and load_meta_with_hmac write and verify a sidecar `<label>.meta.hmac` containing hex HMAC-SHA256 of the .meta JSON. The HMAC is computed inline (RFC 2104 over sha2) so enclaveapp-core picks up a single small sha2 dep and no new HMAC crate. enclaveapp-keyring::meta_hmac_key loads or generates a per-app random 32-byte key in the system keyring at account "__meta_hmac_key__". The key is wrapped in Zeroizing<Vec<u8>> and the local intermediate array is zeroized after copy. enclaveapp-app-storage::ensure_key calls load_meta_with_hmac on Linux when the keyring hands back a key; a meta_hmac_verify mismatch is a hard error and refuses key init. Non-HMAC load errors (missing file, deserialize) fall through to the legacy load_meta path for back-compat. Hardware backends (Apple SE, Windows CNG, Linux TPM) keep the plain save_meta path — .meta tamper on those backends is UI-deception only because the chip enforces the access policy regardless of what .meta claims. delete_key_files now also removes the .meta.hmac extension so key deletion is complete.
- load_private_key_bytes and decrypt_private_key return Zeroizing<Vec<u8>> so the caller-held copy is wiped on drop. - save_encrypted wraps the random KEK in Zeroizing after filling the intermediate [u8; KEK_SIZE] (which is also zeroized via the Zeroize trait on the local array). - generate_and_save holds secret_key.to_bytes() as Zeroizing<Vec<u8>>. - derive_key in both the keyring-backed and test-software ECIES paths now returns Zeroizing<[u8; 32]> so the AES-GCM symmetric key is wiped after each encrypt / decrypt op. This closes the previous gap where plaintext P-256 bytes loaded from the keyring or software store could linger on the Rust heap after the relevant SecretKey was dropped.
- BridgeParams no longer carries `biometric: bool` on the wire. access_policy is the only accepted encoding. Stray `biometric` keys in received payloads are ignored by the deserializer and cannot influence the effective policy. Closes the silent-downgrade path where a server that honored only `biometric` could serve a client's BiometricOnly request as None. - effective_access_policy() is kept as a method returning access_policy for source-compatibility with call sites that used to reconcile the two fields. - A new destroy_and_delete_are_aliases test in enclaveapp-tpm-bridge asserts that both wire names route to identical semantics (neither produces the unknown-method error) — the "mitigation: bridge servers should accept both" note in the threat model is upgraded to a compat guarantee.
Two related bridge-client hardening changes that both live in client.rs: 1. require_bridge_is_authenticode_signed is now called before every BridgeSession::spawn. It parses the PE header's IMAGE_DIRECTORY_ENTRY_SECURITY slot (via pe_has_authenticode_table) and refuses binaries that carry no signature block at all. Opt-out: ENCLAVEAPP_BRIDGE_ALLOW_UNSIGNED=1 for dev / CI. Full WinVerifyTrust chain verification is still out of scope from the WSL side — an admin-on-Windows attacker who plants a validly-signed-but-malicious binary is the acknowledged residual. Non-.exe paths (test shell scripts) bypass the check so the existing test harness keeps working. 2. A process-wide BRIDGE_SESSION_LOCK (Mutex<()>) is held across the full spawn → request → shutdown lifetime of every bridge call. Two threads in the same client process no longer race to spawn independent bridge children against the same TPM, which would otherwise fire Windows Hello twice back-to-back and contend for the server-side key slot. Mutex poisoning is recovered with into_inner() so one crashed session cannot wedge the client for the process lifetime. 3. The renamed bridge_init_encodes_access_policy_only test enforces that the biometric field never leaks onto the wire, aligning with the earlier protocol.rs commit that removed the legacy field. New tests: - pe_has_authenticode_table_detects_signed_pe32 / _pe32plus / _unsigned / _rejects_non_pe - require_signed_skips_non_exe_paths / _rejects_unsigned_exe / _honors_allow_unsigned_env - concurrent_call_bridge_serializes_via_session_lock
Security.framework's default-keychain lookup goes through CFPreferences, which is keyed off the process's $HOME. Callers that override $HOME (integration tests via assert_cmd, awsenc serve under a launchd sandbox, cron jobs) got errSecNoDefaultKeychain back, which surfaced as the system-modal "A keychain cannot be found to store 'cache-key'" alert — blocking tests and leaving users confused. bridge.swift now resolves the login keychain explicitly via getpwuid(getuid())->pw_dir + "/Library/Keychains/login.keychain-db" (falling back to the older .keychain extension for migrated installs) and passes the resulting SecKeychain handle via kSecUseKeychain / kSecMatchSearchList in every SecItem query. The lookup bypasses CFPreferences / $HOME entirely. Unsigned-build UX is preserved: the first-run "Always Allow" ACL prompt is a SecTrust decision driven by the SecItemAdd itself, not by default-keychain lookup, so it still fires normally. keychain_delete on the Rust side now treats SE_ERR_KEYCHAIN_NOT_FOUND (12) as idempotent success so uninstall / cleanup flows stay quiet when HOME is isolated. Rename service prefix com.enclaveapp.* → com.libenclaveapp.* to match the newly-registered libenclaveapp.com domain. Pre-release rename, no legacy-entry migration path.
generate_key_with_retry now caps retries at MAX_RESIZE_RETRIES = 4 and refuses to resize when the Swift-reported length does not grow past what we sent. If the FFI ever starts returning SE_ERR_BUFFER_TOO_SMALL for something other than a genuine buffer-sizing shortfall, the Rust side surfaces it as 'Swift bridge contract violation' instead of spinning in a retry loop or masking the real failure. Also validates post-call pub_key_len ≤ 65 (uncompressed P-256 SEC1 is fixed-size); an out-of-range report is a contract violation. Paired with a bridge.swift doc comment asserting that SE_ERR_BUFFER_TOO_SMALL is ONLY used for buffer-sizing failures.
build.rs invokes the system xcrun at its absolute path /usr/bin/xcrun (system-managed, not user-writable without sudo) and discovers swiftc and ar via `xcrun --find <tool>`. The resolved paths sit inside the active Xcode developer directory (xcode-select -p) rather than walking $PATH. A shadowed xcrun / swiftc / ar earlier on the developer's $PATH can no longer substitute a poisoned Swift object into the static bridge that ends up linked into the binary. Release-tooling PATH hygiene is no longer load-bearing for this crate.
Adds a module-level `const _: () = assert!(size_of::<NCRYPT_UI_POLICY>() == EXPECTED_NCRYPT_UI_POLICY_SIZE, ...)` so a future windows-rs release that silently changes the struct (e.g. reorders LPCWSTR fields or pads differently) fails the build rather than shipping a wrong-sized cbInput to NCryptSetProperty / NCryptGetProperty. Expected size is 32 bytes on x64 (4 + 4 + 3×8) and 20 bytes on x86 (4 + 4 + 3×4).
harden_process() on Windows now applies three low-risk mitigations at startup: - ProcessStrictHandleCheckPolicy with RaiseExceptionOnInvalidHandleReference + HandleExceptionsPermanentlyEnabled — turns latent handle-confusion bugs into STATUS_INVALID_HANDLE exceptions instead of silently operating on the wrong object. - ProcessExtensionPointDisablePolicy with DisableExtensionPoints — blocks AppInit_DLLs, AppCertDlls, shim engines, IMEs, and winevent hooks from loading into the process. - ProcessImageLoadPolicy with NoRemoteImages + NoLowMandatoryLabelImages — refuses DLL loads from UNC paths and from files at the low-mandatory integrity label. Deliberately not applied: BinarySignaturePolicy.MicrosoftSignedOnly (breaks unsigned cargo builds), DynamicCodePolicy / ACG (breaks some JIT / crypto providers), SystemCallDisablePolicy.DisallowWin32kSystemCalls (breaks any GUI-surface process). Each call is best-effort — failure on older Windows builds is traced via tracing::warn! and does not abort startup. Workspace Cargo.toml adds Win32_System_Threading + Win32_Security to the windows crate's feature list.
Three related adapter changes:
1. SecretStore::get_read returns a typed SecretRead { Present(String),
Redacted, Absent } enum. The read-only inspection store surfaces
Redacted directly — it no longer round-trips through the
"<redacted>" string sentinel, so a stored secret whose bytes
happen to equal "<redacted>" is returned as Present("<redacted>")
and cannot be misclassified. Legacy SecretStore::get is retained
for back-compat and still produces Some(REDACTED_PLACEHOLDER) from
the read-only store. MemorySecretStore gains a test-only
mark_redacted() helper so tests can inject Redacted without going
through the sentinel string.
2. LaunchRequest::with_env_scrub(patterns) — opt-in list of exact
variable names ("NPM_TOKEN") or *-suffixed prefix patterns
("NPM_TOKEN_*", "AWS_*"). Matching variables are removed from
both the child's Command and our own std::env (so later subprocess
spawns without env_clear don't re-inherit), and our owned String
copies are zeroized before drop. Case-insensitive on Windows
because Windows env names are case-insensitive. Opt-in — existing
callers with env_scrub_patterns: Vec::new() behave identically.
3. disable_core_dumps_in_child installs a pre_exec hook on Unix that
calls setrlimit(RLIMIT_CORE, 0) before execve. The spawned child
(e.g. npm under npmenc) inherits a zero core limit regardless of
system-level core_pattern, so a crash of the Type 2 target can no
longer dump its interpolated NPM_TOKEN_* / AWS_* environment.
DESIGN.md: - Rewrite 'Credential cache file tamper' section around the new APL1 envelope (SHA-256(header) + rollback counter) — the old section claimed AAD binding was deferred. - New 'Metadata .meta tamper' section documenting the .meta.hmac sidecar on the keyring backend. - 'Process hardening' extended with the Windows mitigation subset. - 'app-adapter' line extended with SecretRead, with_env_scrub, and per-child RLIMIT_CORE. - Mention Authenticode-presence check and client-side session mutex in the bridge section. - Keychain wrap service name updated to com.libenclaveapp.<app>. THREAT_MODEL.md: - Bridge 'method-name confusion' downgraded from threat to compat guarantee (the destroy_and_delete_are_aliases test locks it in). - Bridge 'serialization' rewritten around the new process-wide BRIDGE_SESSION_LOCK. - App-adapter 'Launcher env inheritance' rewritten around the new opt-in with_env_scrub helper. - Keychain-wrap service name updated throughout. fix-macos.md: - Keychain-wrap service name updated throughout. Cargo.lock reflects the new sha2 dep on enclaveapp-core and the windows feature additions.
This was referenced Apr 17, 2026
Merged
Resolves conflicts where origin/main's #66/#67 docs PRs touched the same DESIGN.md/THREAT_MODEL.md regions I rewrote for the hardening pass. Our side supersedes — the new sections already include all of main's refinements plus the envelope, HMAC sidecar, Windows mitigations, SecretRead, env-scrub, bridge mutex, and Authenticode notes. fix-macos.md: accept main's deletion (#67 folded the findings into THREAT_MODEL.md).
Clippy on Linux flagged `&store.path_for(&id)` as needless — path_for already returns an owned PathBuf. Fix applies to the test-only get_read_on_read_only_store_returns_redacted_for_existing_entry case.
windows-rs 0.58 splits the Windows process-mitigation API across two modules: the PROCESS_MITIGATION_*_POLICY structs live in Win32::System::SystemServices while the SetProcessMitigationPolicy function and the PROCESS_MITIGATION_POLICY enum discriminants (ProcessStrictHandleCheckPolicy, etc.) live in Win32::System::Threading. Also add the Win32_System_SystemServices feature to the workspace's windows-crate feature list so the structs are actually available.
windows-rs 0.58 exposes PROCESS_MITIGATION_* struct layouts in Win32::System::SystemServices but does not generate bitfield setter methods on the inner _0_0 anonymous struct. Instead of calling set_RaiseExceptionOnInvalidHandleReference() etc., write the union's Flags: u32 word directly. Bit positions match the Win32 headers. Also drops redundant std::mem:: qualification on size_of to satisfy -D unused-qualifications.
std::ptr::from_ref is only stable since Rust 1.76 and the workspace pins MSRV at 1.75. std::ptr::addr_of! has been stable since 1.51 and produces the same *const T result without the MSRV floor bump.
The helper is only consumed by #[cfg(unix)] PE-parsing tests, so on Windows rustc reports it as dead code and the -D warnings build fails. Mirror the gate on the helper.
jgowdy-godaddy
pushed a commit
that referenced
this pull request
Apr 17, 2026
After #68 the Swift bridge opens the login keychain by explicit absolute path and routes all `SecItem*` calls through it. On GitHub Actions `macos-latest` runners that keychain is locked between jobs (it was unlocked once with an empty password at image-provision time). With the explicit handle in hand the next `SecItemAdd` blocks indefinitely waiting for a GUI unlock prompt that never arrives, hanging the job. Fix: after `SecKeychainOpen`, attempt a silent unlock with an empty password. Interactive dev sessions already have the keychain unlocked so this is a no-op; CI runners get their keychain unlocked without any prompt. If the keychain has a real password the unlock silently fails and the subsequent operation surfaces the real error — we deliberately do not mask it with a fallback.
4 tasks
jgowdy-godaddy
added a commit
that referenced
this pull request
Apr 17, 2026
After #68 the Swift bridge opens the login keychain by explicit absolute path and routes all `SecItem*` calls through it. On GitHub Actions `macos-latest` runners that keychain is locked between jobs (it was unlocked once with an empty password at image-provision time). With the explicit handle in hand the next `SecItemAdd` blocks indefinitely waiting for a GUI unlock prompt that never arrives, hanging the job. Fix: after `SecKeychainOpen`, attempt a silent unlock with an empty password. Interactive dev sessions already have the keychain unlocked so this is a no-op; CI runners get their keychain unlocked without any prompt. If the keychain has a real password the unlock silently fails and the subsequent operation surfaces the real error — we deliberately do not mask it with a fallback. Co-authored-by: Jay Gowdy <jay@gowdy.me>
jgowdy-godaddy
pushed a commit
that referenced
this pull request
Apr 17, 2026
PR #68 made the Swift bridge unconditionally open the login keychain by absolute path and pin every SecItem* op to it via kSecUseKeychain / kSecMatchSearchList. That fixed the "Keychain Not Found" modal dialog on local dev runs that override $HOME in tests, but it made GitHub Actions `macos-latest` jobs hang: SecItemAdd against an explicitly-pinned legacy keychain blocks on a same-binary ACL confirmation prompt the headless runner cannot answer. Fix: check SecKeychainCopyDefault first. When a default keychain is reachable (normal interactive sessions and CI runners), fall through to Security.framework's implicit routing, which reaches the Data Protection keychain on unsigned builds and does not trigger the legacy ACL prompt — restoring the pre-#68 CI behaviour. Only when no default is reachable ($HOME-overridden tests, launchd sandboxes) do we open the login keychain by explicit path and constrain ops to it — preserving #68's dialog-avoidance fix for those contexts.
jgowdy-godaddy
pushed a commit
that referenced
this pull request
Apr 17, 2026
PR #68 made the Swift bridge unconditionally open the login keychain by absolute path and pin every SecItem* op to it via kSecUseKeychain / kSecMatchSearchList. That fixed the "Keychain Not Found" modal dialog on local dev runs that override $HOME in tests, but it made GitHub Actions `macos-latest` jobs hang: SecItemAdd against an explicitly-pinned legacy keychain blocks on a same-binary ACL confirmation prompt the headless runner cannot answer. Fix: check SecKeychainCopyDefault first. When a default keychain is reachable (normal interactive sessions and CI runners), fall through to Security.framework's implicit routing, which reaches the Data Protection keychain on unsigned builds and does not trigger the legacy ACL prompt — restoring the pre-#68 CI behaviour. Only when no default is reachable ($HOME-overridden tests, launchd sandboxes) do we open the login keychain by explicit path and constrain ops to it — preserving #68's dialog-avoidance fix for those contexts.
4 tasks
jgowdy-godaddy
added a commit
that referenced
this pull request
Apr 17, 2026
PR #68 made the Swift bridge unconditionally open the login keychain by absolute path and pin every SecItem* op to it via kSecUseKeychain / kSecMatchSearchList. That fixed the "Keychain Not Found" modal dialog on local dev runs that override $HOME in tests, but it made GitHub Actions `macos-latest` jobs hang: SecItemAdd against an explicitly-pinned legacy keychain blocks on a same-binary ACL confirmation prompt the headless runner cannot answer. Fix: check SecKeychainCopyDefault first. When a default keychain is reachable (normal interactive sessions and CI runners), fall through to Security.framework's implicit routing, which reaches the Data Protection keychain on unsigned builds and does not trigger the legacy ACL prompt — restoring the pre-#68 CI behaviour. Only when no default is reachable ($HOME-overridden tests, launchd sandboxes) do we open the login keychain by explicit path and constrain ops to it — preserving #68's dialog-avoidance fix for those contexts. Co-authored-by: Jay Gowdy <jay@gowdy.me>
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Large threat-model hardening pass spanning most of the crate tree. Each commit is scoped to a single mitigation area and is reviewable independently. The consumer projects (awsenc, sshenc, sso-jwt, npmenc) have companion PRs that depend on this PR landing first.
Security mitigations added
EncryptionStorage::encryptis wrapped in[4B "APL1"][32B SHA-256(header)][8B BE u64 counter][payload], binding the unencrypted cache header to the ciphertext and embedding a monotonic rollback counter guarded by a sibling.countersidecar under anfs4flock. Trait signature did not change — all backends get the protection uniformly.load_meta_with_hmacrejects mismatched.metawith a hardmeta_hmac_verifyerror. Hardware backends skip — they enforce policy on the chip regardless of.metaclaims.biometric: boolwire field removed;access_policyis the only encoding. Locked indestroy↔deletealias with a compat-guarantee test.ENCLAVEAPP_BRIDGE_ALLOW_UNSIGNED=1for dev / CI.SetProcessMitigationPolicysafe subset inharden_process(): StrictHandleCheck, ExtensionPointDisable, ImageLoad (NoRemoteImages + NoLowMandatoryLabelImages).SecretReadreturn on adapter read path;"<redacted>"is no longer sentinel-vulnerable.LaunchRequest::with_env_scrub(patterns)+ per-childsetrlimit(RLIMIT_CORE, 0)pre-exec hook.SecKeychainOpenon the real user's login keychain (viagetpwuid), bypassingCFPreferences/$HOMElookups that broke in isolated contexts (integration tests, launchd, cron).SE_ERR_BUFFER_TOO_SMALL; Rust-side retry caps at 4, refuses non-growing length reports, validatespub_key_len ≤ 65./usr/bin/xcrun+xcrun --findfor Swift bridge build tooling —$PATHpoisoning on the dev machine can no longer substitute a poisonedswiftc/ar.constassert onNCRYPT_UI_POLICYstruct size — future windows-rs layout drift fails the build instead of shipping a wrong-sizedcbInput.com.enclaveapp.*→com.libenclaveapp.*to match the newly-registeredlibenclaveapp.comdomain.Docs
DESIGN.md, THREAT_MODEL.md, and fix-macos.md updated to reflect all of the above. The stale "AAD binding is deferred" section in DESIGN.md is gone; the new envelope is documented in both DESIGN.md and THREAT_MODEL.md.
Test plan
cargo checkclean on macOS (default-host), Linux (cross), Windows (cross).cargo fmt --checkclean.cargo clippy --workspace --all-targets --features <all> -- -D warningsclean.cargo test --workspace --features <all>— 687 tests passing, 0 failing.cache::envelope::*(wrap/unwrap roundtrip, header-tamper rejection, rollback rejection, counter sidecar)core::metadata::meta_hmac_*(HMAC roundtrip, tamper rejection, wrong-key rejection, legacy passthrough)bridge::client::concurrent_call_bridge_serializes_via_session_lockbridge::client::pe_has_authenticode_table_*+require_signed_*bridge::client::bridge_init_encodes_access_policy_onlytpm-bridge::destroy_and_delete_are_aliasesapple::keychain_wrap::*(29 Keychain roundtrip tests against the real login keychain)app-adapter::launcher::scrub_removes_inherited_env_from_child_and_own_processOut of scope
Items that need code changes on the consumer side (awsenc, sshenc, sso-jwt, npmenc) to actually take effect are in those projects' PRs.