ENG-3564: SecretProvider abstraction and AWS Secrets Manager provider#8051
Dependency Review
✅ No vulnerabilities found.
Snapshot Warnings: Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.
Scanned Files: None
…ider Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1b80f65 to 4e95bee
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8051      +/-   ##
==========================================
+ Coverage   84.98%   85.04%   +0.06%
==========================================
  Files         633      639       +6
  Lines       41737    41919     +182
  Branches     4886     4898      +12
==========================================
+ Hits        35471    35652     +181
- Misses       5157     5158       +1
  Partials     1109     1109
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…alue, widen Dict[str, Any] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/code-review
Code Review: SecretProvider Abstraction and AWS Secrets Manager Provider
The overall structure here is solid — the ABC + concrete providers + factory + settings approach is the right design, and there are several commendable security-conscious details: SecretValue.__repr__/__str__ redacting credentials, the from None suppression on JSONDecodeError to prevent .doc from leaking raw secrets into the exception chain, and the TTL + stale-while-revalidate + circuit breaker combination covering the most important operational failure modes. Test coverage is good.
Two issues should be resolved before this is production-safe:
Critical
1. Binary secrets raise KeyError with a misleading error message (see inline comment on _fetch)
response["SecretString"] raises KeyError for SecretBinary secrets, which surfaces to the caller as "no cached value available" with no indication of the root cause. An explicit check before the key access fixes this cleanly.
2. Lockless fast-path relies on CPython GIL semantics (see inline comment on lines 63–69)
The unsynchronized reads of entry.value and entry.fetched_at (two separate attributes, written in two separate statements under the lock) are safe today in CPython but constitute a data race under PEP 703 free-threaded mode. At minimum this should be documented with a comment so future maintainers don't enable --disable-gil without revisiting this. A cleaner fix is to unify the two fields into a single Optional[Tuple[SecretValue, float]] snapshot that can be atomically read as one reference.
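The two critical items above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `SecretFetchError`, `extract_secret_string`, and `CacheEntry` are hypothetical names, and the cache stores a plain `str` instead of `SecretValue` to stay self-contained. The only library fact relied on is that a Secrets Manager `get_secret_value` response carries either a `SecretString` or a `SecretBinary` key.

```python
import threading
import time
from typing import Any, Dict, Optional, Tuple


class SecretFetchError(Exception):
    """Hypothetical error type; the PR's real exception class may differ."""


def extract_secret_string(response: Dict[str, Any], secret_id: str) -> str:
    """Return SecretString, or fail with a message that names the root cause."""
    if "SecretString" not in response:
        kind = "binary" if "SecretBinary" in response else "empty"
        raise SecretFetchError(
            f"Secret {secret_id!r} has no SecretString ({kind} secret); "
            "only string secrets are supported"
        )
    return response["SecretString"]


class CacheEntry:
    """Keeps (value, fetched_at) in one tuple so readers never see a torn pair."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.snapshot: Optional[Tuple[str, float]] = None

    def store(self, value: str) -> None:
        with self.lock:
            # One reference assignment: a reader sees the old pair or the new one.
            self.snapshot = (value, time.monotonic())

    def fresh_value(self, ttl_seconds: float) -> Optional[str]:
        snap = self.snapshot  # single unlocked read of one reference
        if snap is None:
            return None
        value, fetched_at = snap
        if time.monotonic() - fetched_at < ttl_seconds:
            return value
        return None
```

With this shape, the fast path reads one reference and unpacks it locally, so the value and its timestamp always belong to the same write. Whether a single reference read is sufficient under a free-threaded build is still something to confirm against the interpreter's documented guarantees.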
Suggestions
- observed_fetched_at thundering-herd check is redundant with the inner TTL re-check and has a subtle edge case when invalidate() has run (see inline comment).
- Stale window provides no age protection post-invalidate(): the comment says "let TTL sort it out" but TTL cannot do that when the timestamp was zeroed. The behavior may be intentional but should be documented (see inline comment).
- assert_never instead of unreachable raise at the end of create_secret_provider: the Literal type already guarantees exhaustiveness, so this should use assert_never to make that explicit to the type checker.
- __hash__ silently set to None by defining __eq__ without __hash__ on SecretValue: make the intent explicit.
- No region format validation or TTL ordering invariants in AWSSecretsManagerSettings: misconfiguration surfaces only at the first API call rather than at startup.
- VersionStage="AWSCURRENT" is the default and can be removed.
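Two of the suggestions above (assert_never and the explicit __hash__) might look something like the sketch below. The provider names in the `Literal`, the `describe_provider` function, and the body of `SecretValue` are assumptions made for illustration; the actual factory and wrapper in the PR may differ. `typing.assert_never` exists from Python 3.11, so a small stand-in is defined for older interpreters.

```python
from typing import Literal, NoReturn

try:
    from typing import assert_never  # Python 3.11+
except ImportError:  # older interpreters: minimal stand-in with the same behavior
    def assert_never(value: NoReturn) -> NoReturn:
        raise AssertionError(f"Unhandled value: {value!r}")

# Assumed provider names; the real Literal in the PR may list others.
ProviderName = Literal["static", "aws_secrets_manager"]


def describe_provider(name: ProviderName) -> str:
    """Stand-in for the factory: every Literal member gets its own branch."""
    if name == "static":
        return "StaticSecretProvider"
    if name == "aws_secrets_manager":
        return "AWSSecretsManagerProvider"
    # A type checker reports an error here if a new Literal member is unhandled.
    assert_never(name)


class SecretValue:
    """Illustrative wrapper: redacted repr, equality, explicitly unhashable."""

    def __init__(self, value: str) -> None:
        self._value = value

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SecretValue):
            return NotImplemented
        return self._value == other._value

    # Defining __eq__ alone already sets __hash__ to None; spelling it out
    # documents that secrets are not meant to be dict keys or set members.
    __hash__ = None  # type: ignore[assignment]

    def __repr__(self) -> str:
        return "SecretValue(**redacted**)"
```

If hashing is actually wanted, the alternative is to define `__hash__` explicitly alongside `__eq__`; either way the choice becomes visible in the class body.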
Nit
- The times = iter([...]) pattern in test_ttl_recheck_inside_lock_via_time_progression hard-codes the exact time.monotonic() call count and will fail silently on any refactor that changes the call order. Direct entry.fetched_at manipulation (as used elsewhere in the test file) would be more robust.
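As a rough illustration of the approach this nit recommends, the sketch below ages a cache entry by editing its timestamp directly instead of scripting a fixed sequence of clock readings. `TinyTTLCache` and `Entry` are hypothetical stand-ins, not the provider under review; only the testing pattern is the point.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Entry:
    value: str
    fetched_at: float


class TinyTTLCache:
    """Hypothetical stand-in for the provider's cache, just enough to test."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self.entries: Dict[str, Entry] = {}

    def get(self, key: str) -> str:
        entry = self.entries.get(key)
        if entry is not None and time.monotonic() - entry.fetched_at < self._ttl:
            return entry.value
        value = self._fetch(key)
        self.entries[key] = Entry(value, time.monotonic())
        return value


def test_expired_entry_is_refetched() -> None:
    calls: List[str] = []

    def fetch(key: str) -> str:
        calls.append(key)
        return f"v{len(calls)}"

    cache = TinyTTLCache(fetch, ttl_seconds=300.0)
    assert cache.get("db") == "v1"
    assert cache.get("db") == "v1"  # still fresh, served from cache

    # Age the entry past the TTL without touching the clock at all.
    cache.entries["db"].fetched_at -= 301.0

    assert cache.get("db") == "v2"
    assert calls == ["db", "db"]
```

Because the test never patches `time.monotonic()`, it keeps passing if the implementation later reads the clock more or fewer times per call.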
…alue, widen Dict[str, Any], coverage fixes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ent fixes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Nit: I suggest reverting this so git history doesn't show this changed for whitespace only
    self, secret_id: str, entry: _CacheEntry, exc: Exception
) -> SecretValue:
    """Serve stale value if within grace period, otherwise raise."""
    entry.last_failed_at = time.monotonic()
Nit: We're setting now = time.monotonic() further down, on L170.
Do you think it's better to set it here at the first usage, so it's consistent? Claude suggests it's more of a code-cleanliness concern than a functional one, given the likely nanosecond-order difference. I'll leave it up to you.
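A minimal sketch of what capturing the timestamp once could look like. The entry fields, the `stale_grace_seconds` parameter, and the exact staleness rule are assumptions for illustration; the PR's method takes different arguments and returns a `SecretValue`.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    """Hypothetical entry shape; field names follow the review comments."""
    value: Optional[str] = None
    fetched_at: float = 0.0
    last_failed_at: Optional[float] = None


def serve_stale_or_raise(
    entry: CacheEntry, exc: Exception, stale_grace_seconds: float
) -> str:
    """Serve stale value if within grace period, otherwise raise."""
    now = time.monotonic()       # read the clock once...
    entry.last_failed_at = now   # ...and reuse it for the failure stamp
    if entry.value is not None and now - entry.fetched_at <= stale_grace_seconds:
        return entry.value       # ...and for the staleness check
    raise RuntimeError("no cached value available") from exc
```

Both uses then agree on a single instant, which makes the failure stamp and the age calculation easier to reason about in tests.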
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…from warning logs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JadeCara left a comment
Looks really good!
Not blocking: The only potential test gap I see is the stale invalidated fetch failure (when fetched_at = 0) - might be worth adding a test for future regression proofing?
… fetch failure Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ticket ENG-3564
Description Of Changes
Implements the secret provider layer from the design doc (PR #8016). This PR covers only the provider classes, config section, and tests — no DB engine wiring yet.
Code Changes
- SecretValue wrapper with redacted str()/repr() to prevent credential leakage
- SecretProvider ABC with get_secret() and invalidate() interface
- StaticSecretProvider for existing static credential behavior (env vars / TOML)
- AWSSecretsManagerProvider with TTL cache, stale-while-revalidate, thundering-herd protection, and circuit breaker
- SecretsSettings config section wired into FidesConfig (secrets.provider, secrets.aws_secrets_manager.*)
- create_secret_provider() factory function
Steps to Confirm
- pytest tests/config/secrets/ -v via nox (CI runs this through the misc-unit test group)
- --noconftest to avoid unrelated fixture conflicts
Pre-Merge Checklist
- CHANGELOG.md updated