Skip to content

ENG-3564: SecretProvider abstraction and AWS Secrets Manager provider#8051

Open
erosselli wants to merge 15 commits intomainfrom
erosselli/ENG-3564-implementation
Open

ENG-3564: SecretProvider abstraction and AWS Secrets Manager provider#8051
erosselli wants to merge 15 commits intomainfrom
erosselli/ENG-3564-implementation

Conversation

@erosselli
Copy link
Copy Markdown
Contributor

@erosselli erosselli commented Apr 28, 2026

Ticket ENG-3564

Description Of Changes

Implements the secret provider layer from the design doc (PR #8016). This PR covers only the provider classes, config section, and tests — no DB engine wiring yet.

Code Changes

  • SecretValue wrapper with redacted str()/repr() to prevent credential leakage
  • SecretProvider ABC with get_secret() and invalidate() interface
  • StaticSecretProvider for existing static credential behavior (env vars / TOML)
  • AWSSecretsManagerProvider with TTL cache, stale-while-revalidate, thundering-herd protection, and circuit breaker
  • SecretsSettings config section wired into FidesConfig (secrets.provider, secrets.aws_secrets_manager.*)
  • create_secret_provider() factory function
  • 40 unit tests using moto and time mocking

Steps to Confirm

  1. Run pytest tests/config/secrets/ -v via nox (CI runs this through the misc-unit test group)
  2. For local/manual runs outside nox, use --noconftest to avoid unrelated fixture conflicts
  3. Review design doc in ENG-3564: Design doc for dynamic DB credentials via AWS Secrets Manager #8016 for context on the approach

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • No UX review needed
  • Followup issues:
    • Followup issues created
  • Database migrations:
    • No migrations
  • Documentation:
    • No documentation updates required

@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented Apr 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
fides-plus-nightly Ignored Ignored Preview May 4, 2026 5:03pm
fides-privacy-center Ignored Ignored May 4, 2026 5:03pm

Request Review

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 28, 2026

Dependency Review

✅ No vulnerabilities found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 7922e75.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

@erosselli erosselli changed the base branch from main to erosselli/ENG-3564-design-doc April 28, 2026 20:14
Comment thread tests/config/secrets/test_aws_secrets_manager_provider.py Fixed
Comment thread tests/config/secrets/test_aws_secrets_manager_provider.py Fixed
Comment thread tests/config/secrets/test_secret_value.py Fixed
Base automatically changed from erosselli/ENG-3564-design-doc to main April 29, 2026 15:05
erosselli and others added 2 commits April 29, 2026 13:41
…ider

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@erosselli erosselli force-pushed the erosselli/ENG-3564-implementation branch from 1b80f65 to 4e95bee Compare April 29, 2026 16:42
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.04%. Comparing base (5496449) to head (7922e75).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8051      +/-   ##
==========================================
+ Coverage   84.98%   85.04%   +0.06%     
==========================================
  Files         633      639       +6     
  Lines       41737    41919     +182     
  Branches     4886     4898      +12     
==========================================
+ Hits        35471    35652     +181     
- Misses       5157     5158       +1     
  Partials     1109     1109              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread tests/config/secrets/test_aws_secrets_manager_provider.py Fixed
Comment thread tests/config/secrets/test_aws_secrets_manager_provider.py Fixed
erosselli and others added 2 commits April 30, 2026 10:53
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread tests/config/secrets/test_aws_secrets_manager_provider.py Fixed
…alue, widen Dict[str, Any]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread tests/config/secrets/test_aws_secrets_manager_provider.py Fixed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@erosselli erosselli marked this pull request as ready for review April 30, 2026 15:12
@erosselli erosselli requested a review from a team as a code owner April 30, 2026 15:12
@erosselli erosselli requested review from dsill-ethyca and removed request for a team April 30, 2026 15:12
@erosselli
Copy link
Copy Markdown
Contributor Author

/code-review

@erosselli erosselli requested a review from a team April 30, 2026 15:13
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: SecretProvider Abstraction and AWS Secrets Manager Provider

The overall structure here is solid — the ABC + concrete providers + factory + settings approach is the right design, and there are several commendable security-conscious details: SecretValue.__repr__/__str__ redacting credentials, the from None suppression on JSONDecodeError to prevent .doc from leaking raw secrets into the exception chain, and the TTL + stale-while-revalidate + circuit breaker combination covering the most important operational failure modes. Test coverage is good.

Two issues should be resolved before this is production-safe:


Critical

1. Binary secrets raise KeyError with a misleading error message (see inline comment on _fetch)

response["SecretString"] raises KeyError for SecretBinary secrets, which surfaces to the caller as "no cached value available" with no indication of the root cause. An explicit check before the key access fixes this cleanly.

2. Lockless fast-path relies on CPython GIL semantics (see inline comment on lines 63–69)

The unsynchronized reads of entry.value and entry.fetched_at (two separate attributes, written in two separate statements under the lock) are safe today in CPython but constitute a data race under PEP 703 free-threaded mode. At minimum this should be documented with a comment so future maintainers don't enable --disable-gil without revisiting this. A cleaner fix is to unify the two fields into a single Optional[Tuple[SecretValue, float]] snapshot that can be atomically read as one reference.


Suggestions

  • observed_fetched_at thundering-herd check is redundant with the inner TTL re-check and has a subtle edge case when invalidate() has run (see inline comment).
  • Stale window provides no age protection post-invalidate() — the comment says "let TTL sort it out" but TTL cannot do that when the timestamp was zeroed. The behavior may be intentional but should be documented (see inline comment).
  • assert_never instead of unreachable raise at the end of create_secret_provider — the Literal type already guarantees exhaustiveness, so this should use assert_never to make that explicit to the type checker.
  • __hash__ silently set to None by defining __eq__ without __hash__ on SecretValue — make the intent explicit.
  • No region format validation or TTL ordering invariants in AWSSecretsManagerSettings — misconfiguration surfaces only at the first API call rather than at startup.
  • VersionStage="AWSCURRENT" is the default and can be removed.

Nit

  • The times = iter([...]) pattern in test_ttl_recheck_inside_lock_via_time_progression hard-codes the exact time.monotonic() call count and will fail silently on any refactor that changes the call order. Direct entry.fetched_at manipulation (as used elsewhere in the test file) would be more robust.

🔬 Codegraph: unavailable


💡 Write /code-review in a comment to re-run this review.

Comment thread src/fides/config/secrets/aws_secrets_manager_provider.py Outdated
Comment thread src/fides/config/secrets/aws_secrets_manager_provider.py
Comment thread src/fides/config/secrets/aws_secrets_manager_provider.py Outdated
Comment thread src/fides/config/secrets/aws_secrets_manager_provider.py Outdated
Comment thread src/fides/config/secrets/aws_secrets_manager_provider.py
Comment thread src/fides/config/secrets/base.py
Comment thread src/fides/config/secrets/factory.py
Comment thread src/fides/config/secrets_settings.py
Comment thread tests/config/secrets/test_aws_secrets_manager_provider.py Outdated
erosselli and others added 4 commits April 30, 2026 13:17
…alue, widen Dict[str, Any], coverage fixes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ent fixes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I suggest reverting this so git history doesn't show this changed for whitespace only

@erosselli erosselli requested a review from JadeCara May 4, 2026 14:18
self, secret_id: str, entry: _CacheEntry, exc: Exception
) -> SecretValue:
"""Serve stale value if within grace period, otherwise raise."""
entry.last_failed_at = time.monotonic()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We're setting now - time.monotonic() down on L170

Do you think its better to set it here at the first usage, so it's consistent? Claude suggests its more of a code cleanliness than a functional concern with the likely nanoseconds-order difference. I'll leave it up to you

erosselli and others added 2 commits May 4, 2026 11:48
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…from warning logs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@JadeCara JadeCara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good!
Not blocking: The only potential test gap I see is the stale invalidated fetch failure (when fetched_at = 0) - might be worth adding a test for future regression proofing?

… fetch failure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants