ENG-2852 Update encryption mechanism for Organization columns#7554
ENG-2852 Update encryption mechanism for Organization columns#7554
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
1 Skipped Deployment
|
There was a problem hiding this comment.
I've checked this one and the implementation plan into version control since there's quite a number of tickets for the encryption epic
Greptile SummaryThis PR migrates the Key changes:
Confidence Score: 4/5
Important Files Changed
Last reviewed commit: 9ad728c |
...lembic/migrations/versions/xx_2026_03_03_1400_bf12f05ef8eb_migrate_organization_to_aesgcm.py
Outdated
Show resolved
Hide resolved
...lembic/migrations/versions/xx_2026_03_03_1400_bf12f05ef8eb_migrate_organization_to_aesgcm.py
Outdated
Show resolved
Hide resolved
|
@greptile re-review |
...lembic/migrations/versions/xx_2026_03_03_1400_bf12f05ef8eb_migrate_organization_to_aesgcm.py
Show resolved
Hide resolved
📝 WalkthroughWalkthroughThis PR migrates Organization's encrypted columns (controller, data_protection_officer, representative) from PostgreSQL pgcrypto to AES-GCM using StringEncryptedType. The change includes documentation, a database migration script, updated model definitions, and validation tests. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
src/fides/api/models/sql_models.py (1)
476-503: Reduce repeated encryption type declarations inOrganization.These three column definitions duplicate the same
StringEncryptedType(...)configuration and can drift. Extract a small helper that returns the type and reuse it.♻️ Proposed refactor
+def _organization_contact_encrypted_type() -> StringEncryptedType: + return StringEncryptedType( + JSONTypeOverride, + CONFIG.security.app_encryption_key, + AesGcmEngine, + "pkcs5", + ) + class Organization(Base, FidesBase): @@ - controller = Column( - StringEncryptedType( - JSONTypeOverride, - CONFIG.security.app_encryption_key, - AesGcmEngine, - "pkcs5", - ), - nullable=True, - ) + controller = Column(_organization_contact_encrypted_type(), nullable=True) @@ - data_protection_officer = Column( - StringEncryptedType( - JSONTypeOverride, - CONFIG.security.app_encryption_key, - AesGcmEngine, - "pkcs5", - ), - nullable=True, - ) + data_protection_officer = Column( + _organization_contact_encrypted_type(), nullable=True + ) @@ - representative = Column( - StringEncryptedType( - JSONTypeOverride, - CONFIG.security.app_encryption_key, - AesGcmEngine, - "pkcs5", - ), - nullable=True, - ) + representative = Column(_organization_contact_encrypted_type(), nullable=True)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fides/api/models/sql_models.py` around lines 476 - 503, The three Organization column definitions (controller, data_protection_officer, representative) duplicate the same StringEncryptedType(...) config; create a small helper function (e.g., _encrypted_json_type or get_encrypted_json_type) in the same module that returns StringEncryptedType(JSONTypeOverride, CONFIG.security.app_encryption_key, AesGcmEngine, "pkcs5") and then replace the repeated constructors with Column(_encrypted_json_type(), nullable=True) for each of those fields so the encryption config is centralized and reusable.src/fides/api/alembic/migrations/versions/xx_2026_03_03_1400_bf12f05ef8eb_migrate_organization_to_aesgcm.py (1)
73-77: Avoidfetchall()to reduce plaintext residency in memory.Iterate directly over the cursor result instead of materializing all rows at once.
♻️ Proposed change
- rows = bind.execute( + rows = bind.execute( text(f"SELECT fides_key, {null_coalesce} FROM ctl_organizations"), {"key": CONFIG.user.encryption_key}, - ).fetchall() + ) @@ - rows = bind.execute( + rows = bind.execute( text( "SELECT fides_key, controller, data_protection_officer, representative " "FROM ctl_organizations" ) - ).fetchall() + )Also applies to: 121-127
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fides/api/alembic/migrations/versions/xx_2026_03_03_1400_bf12f05ef8eb_migrate_organization_to_aesgcm.py` around lines 73 - 77, The code currently calls bind.execute(...).fetchall() and assigns to rows, which materializes all plaintext rows in memory; change to iterate the execute() result directly (e.g., for fides_key, col in bind.execute(text(...), {"key": CONFIG.user.encryption_key}): ...) instead of using .fetchall(), and update the analogous block referenced around lines 121-127 to also iterate the execute() cursor/result iterator rather than calling fetchall(), ensuring processing is done row-by-row and no large list of plaintext rows is retained in memory.tests/ctl/models/test_sql_models.py (1)
52-70: Strengthen this test to verify encryption at rest, not only round-trip behavior.The current assertions can still pass if values are stored plaintext and simply deserialized correctly. Add a raw SQL read of the stored columns and assert plaintext markers are not present.
🧪 Proposed test hardening
+import json import pytest +from sqlalchemy import text @@ def test_encrypted_fields_round_trip(self, db, contact_details): @@ reloaded = ( db.query(Organization).filter_by(fides_key="test_encryption_org").first() ) assert reloaded.controller == contact_details assert reloaded.data_protection_officer == contact_details assert reloaded.representative == contact_details + + raw = db.execute( + text( + "SELECT controller, data_protection_officer, representative " + "FROM ctl_organizations WHERE fides_key = :fides_key" + ), + {"fides_key": "test_encryption_org"}, + ).mappings().one() + + # quick at-rest sanity checks (raw ciphertext should not include plaintext JSON tokens) + assert '"name": "Jane Controller"' not in raw["controller"] + assert '"name": "Jane Controller"' not in raw["data_protection_officer"] + assert '"name": "Jane Controller"' not in raw["representative"]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ctl/models/test_sql_models.py` around lines 52 - 70, The test_encrypted_fields_round_trip currently only verifies deserialization, so add a raw SQL read of the stored columns to confirm values are encrypted at rest: after creating Organization (Organization.create) and reloading, run a raw SELECT on the organizations table for the row with fides_key "test_encryption_org" to fetch the raw persisted values of controller, data_protection_officer, and representative (using the same db/session used in the test), then assert those raw stored values do not contain plaintext markers from contact_details (e.g., contact_details["email"] or contact_details["name"]) and are not equal to the JSON/text representation of contact_details; keep the existing round-trip assertions as well.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/fides/docs/encryption/implementation-plan.md`:
- Around line 25-46: The fenced diagram blocks in the markdown use plain
triple-backticks without a language tag (e.g., the KEK/DEK diagram and the other
fenced sections noted around lines 50-82 and 352-368); update each opening fence
from ``` to ```text so the blocks declare a language and satisfy MD040 (leave
the closing ``` unchanged); search for the triple-backtick fences in
docs/fides/docs/encryption/implementation-plan.md (the diagram blocks and the
other listed fenced areas) and add the language identifier "text" to each
opening fence.
In `@docs/fides/docs/encryption/overview.md`:
- Around line 16-17: Update the overview to mark PGEncryptedString as
legacy/migration-only and reflect that Organization columns now use
StringEncryptedType with AesGcmEngine; specifically, change the table entry
referencing PGEncryptedString and the rows mentioning Organization model fields
to indicate "Legacy / Migration-only" (or similar) and add a note that current
behavior uses StringEncryptedType(AesGcmEngine) for Organization columns; search
for occurrences of PGEncryptedString, AesEngine, AesGcmEngine, and
StringEncryptedType to update lines 16-17 and the other locations referenced
(around lines 94-100 and 146-147).
In
`@src/fides/api/alembic/migrations/versions/xx_2026_03_03_1400_bf12f05ef8eb_migrate_organization_to_aesgcm.py`:
- Around line 51-76: In upgrade(), add explicit pre-checks for the required
encryption keys (CONFIG.security.app_encryption_key and
CONFIG.user.encryption_key) before adding columns or reading rows: validate both
keys are present/non-empty and raise a clear exception (e.g., RuntimeError with
a descriptive message) to abort the migration if either is missing; do this
before creating encryptor, before executing bind.execute/select, and reference
the symbols upgrade(), CONFIG.security.app_encryption_key,
CONFIG.user.encryption_key, encryptor, and bind so the guard runs early and
prevents any data rewrite when keys are absent.
---
Nitpick comments:
In
`@src/fides/api/alembic/migrations/versions/xx_2026_03_03_1400_bf12f05ef8eb_migrate_organization_to_aesgcm.py`:
- Around line 73-77: The code currently calls bind.execute(...).fetchall() and
assigns to rows, which materializes all plaintext rows in memory; change to
iterate the execute() result directly (e.g., for fides_key, col in
bind.execute(text(...), {"key": CONFIG.user.encryption_key}): ...) instead of
using .fetchall(), and update the analogous block referenced around lines
121-127 to also iterate the execute() cursor/result iterator rather than calling
fetchall(), ensuring processing is done row-by-row and no large list of
plaintext rows is retained in memory.
In `@src/fides/api/models/sql_models.py`:
- Around line 476-503: The three Organization column definitions (controller,
data_protection_officer, representative) duplicate the same
StringEncryptedType(...) config; create a small helper function (e.g.,
_encrypted_json_type or get_encrypted_json_type) in the same module that returns
StringEncryptedType(JSONTypeOverride, CONFIG.security.app_encryption_key,
AesGcmEngine, "pkcs5") and then replace the repeated constructors with
Column(_encrypted_json_type(), nullable=True) for each of those fields so the
encryption config is centralized and reusable.
In `@tests/ctl/models/test_sql_models.py`:
- Around line 52-70: The test_encrypted_fields_round_trip currently only
verifies deserialization, so add a raw SQL read of the stored columns to confirm
values are encrypted at rest: after creating Organization (Organization.create)
and reloading, run a raw SELECT on the organizations table for the row with
fides_key "test_encryption_org" to fetch the raw persisted values of controller,
data_protection_officer, and representative (using the same db/session used in
the test), then assert those raw stored values do not contain plaintext markers
from contact_details (e.g., contact_details["email"] or contact_details["name"])
and are not equal to the JSON/text representation of contact_details; keep the
existing round-trip assertions as well.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d223b098-d550-4fbc-b77c-0ca7293889cf
📒 Files selected for processing (6)
changelog/7554-migrate-organization-encryption-to-aesgcm.yamldocs/fides/docs/encryption/implementation-plan.mddocs/fides/docs/encryption/overview.mdsrc/fides/api/alembic/migrations/versions/xx_2026_03_03_1400_bf12f05ef8eb_migrate_organization_to_aesgcm.pysrc/fides/api/models/sql_models.pytests/ctl/models/test_sql_models.py
| ``` | ||
| ┌──────────────────────────────────────────────────────────────┐ | ||
| │ KEK (Key Encryption Key) │ | ||
| │ ───────────────────────── │ | ||
| │ Owned by the operator. Lives in an env var (local provider) │ | ||
| │ or never leaves the KMS (AWS KMS provider). │ | ||
| │ │ | ||
| │ Wraps / unwraps the DEK. │ | ||
| └──────────────────────┬───────────────────────────────────────┘ | ||
| │ wraps | ||
| ▼ | ||
| ┌──────────────────────────────────────────────────────────────┐ | ||
| │ DEK (Data Encryption Key) │ | ||
| │ ───────────────────────── │ | ||
| │ = the current app_encryption_key value │ | ||
| │ Stored encrypted (wrapped) by the provider. │ | ||
| │ Unwrapped at runtime, cached in memory. │ | ||
| │ │ | ||
| │ Used by all StringEncryptedType columns, AES-GCM utils, │ | ||
| │ JWE tokens, etc. — the entire existing encryption surface. │ | ||
| └──────────────────────────────────────────────────────────────┘ | ||
| ``` |
There was a problem hiding this comment.
Add language identifiers to fenced blocks to satisfy markdownlint (MD040).
The diagram/code fences should declare a language (e.g., text) so lint checks pass consistently.
🧹 Proposed lint fix
-```
+```text
┌──────────────────────────────────────────────────────────────┐
...
-```
+```
-```
+```text
┌──────────────────┐
...
-```
+```
-```
+```text
PR 0a (migrate Organization) PR 0b (migrate UserRegistration)
...
-```
+```Also applies to: 50-82, 352-368
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 25-25: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/fides/docs/encryption/implementation-plan.md` around lines 25 - 46, The
fenced diagram blocks in the markdown use plain triple-backticks without a
language tag (e.g., the KEK/DEK diagram and the other fenced sections noted
around lines 50-82 and 352-368); update each opening fence from ``` to ```text
so the blocks declare a language and satisfy MD040 (leave the closing ```
unchanged); search for the triple-backtick fences in
docs/fides/docs/encryption/implementation-plan.md (the diagram blocks and the
other listed fenced areas) and add the language identifier "text" to each
opening fence.
| | **PGEncryptedString** | PGP symmetric | PostgreSQL `pgcrypto` | Organization model fields | | ||
| | **AesEngine** | AES-CBC | `sqlalchemy-utils` | Legacy (UserRegistration only) | |
There was a problem hiding this comment.
This overview still presents PGEncryptedString as active current-state behavior.
After this PR, Organization columns are migrated to StringEncryptedType(AesGcmEngine). Please mark PGEncryptedString as legacy/historical (or migration-only) to avoid stale operational guidance.
📝 Suggested doc adjustment
-| **PGEncryptedString** | PGP symmetric | PostgreSQL `pgcrypto` | Organization model fields |
+| **PGEncryptedString** | PGP symmetric | PostgreSQL `pgcrypto` | Legacy mechanism (historical / migration downgrade path) |
@@
-### PostgreSQL pgcrypto Encryption via `PGEncryptedString`
+### PostgreSQL pgcrypto Encryption via `PGEncryptedString` (Legacy)
@@
-| `Organization` | `controller`, `data_protection_officer`, `representative` |
+| _No active model columns_ | Legacy historical usage only |
@@
-Additionally, `CONFIG.user.encryption_key` (from `UserSettings`) is used exclusively for `PGEncryptedString`.
+Additionally, `CONFIG.user.encryption_key` (from `UserSettings`) remains relevant for legacy pgcrypto compatibility/migration paths.Also applies to: 94-100, 146-147
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/fides/docs/encryption/overview.md` around lines 16 - 17, Update the
overview to mark PGEncryptedString as legacy/migration-only and reflect that
Organization columns now use StringEncryptedType with AesGcmEngine;
specifically, change the table entry referencing PGEncryptedString and the rows
mentioning Organization model fields to indicate "Legacy / Migration-only" (or
similar) and add a note that current behavior uses
StringEncryptedType(AesGcmEngine) for Organization columns; search for
occurrences of PGEncryptedString, AesEngine, AesGcmEngine, and
StringEncryptedType to update lines 16-17 and the other locations referenced
(around lines 94-100 and 146-147).
| def upgrade() -> None: | ||
| for col in COLUMNS: | ||
| op.add_column( | ||
| "ctl_organizations", | ||
| sa.Column(f"{col}_new", sa.Text(), nullable=True), | ||
| ) | ||
|
|
||
| bind = op.get_bind() | ||
|
|
||
| encryptor = StringEncryptedType( | ||
| type_in=sa.Text(), | ||
| key=CONFIG.security.app_encryption_key, | ||
| engine=AesGcmEngine, | ||
| padding="pkcs5", | ||
| ) | ||
|
|
||
| null_coalesce = ", ".join( | ||
| f"CASE WHEN {c} IS NOT NULL " | ||
| f"THEN pgp_sym_decrypt({c}, :key) " | ||
| f"ELSE NULL END AS {c}" | ||
| for c in COLUMNS | ||
| ) | ||
| rows = bind.execute( | ||
| text(f"SELECT fides_key, {null_coalesce} FROM ctl_organizations"), | ||
| {"key": CONFIG.user.encryption_key}, | ||
| ).fetchall() |
There was a problem hiding this comment.
Fail fast on missing encryption keys before rewriting data.
This migration rewrites encrypted data but does not explicitly validate key presence first. Add explicit guards so the migration aborts early with a clear error before touching rows.
🛡️ Proposed safeguard
COLUMNS = ("controller", "data_protection_officer", "representative")
+def _validate_required_keys() -> None:
+ if not CONFIG.user.encryption_key:
+ raise RuntimeError(
+ "Missing CONFIG.user.encryption_key required for pgcrypto decrypt/encrypt."
+ )
+ if not CONFIG.security.app_encryption_key:
+ raise RuntimeError(
+ "Missing CONFIG.security.app_encryption_key required for AES-GCM encryption."
+ )
+
+
def _is_json_null(value: str) -> bool:
@@
def upgrade() -> None:
+ _validate_required_keys()
for col in COLUMNS:
@@
def downgrade() -> None:
+ _validate_required_keys()
for col in COLUMNS:Also applies to: 105-126
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@src/fides/api/alembic/migrations/versions/xx_2026_03_03_1400_bf12f05ef8eb_migrate_organization_to_aesgcm.py`
around lines 51 - 76, In upgrade(), add explicit pre-checks for the required
encryption keys (CONFIG.security.app_encryption_key and
CONFIG.user.encryption_key) before adding columns or reading rows: validate both
keys are present/non-empty and raise a clear exception (e.g., RuntimeError with
a descriptive message) to abort the migration if either is missing; do this
before creating encryptor, before executing bind.execute/select, and reference
the symbols upgrade(), CONFIG.security.app_encryption_key,
CONFIG.user.encryption_key, encryptor, and bind so the guard runs early and
prevents any data rewrite when keys are absent.
johnewart
left a comment
There was a problem hiding this comment.
Seems good - the question of other tables needing to be migrated like this was answered offline and is "no"
Ticket ENG-2852
Description Of Changes
This PR migrates the
Organizationmodel's 3 encrypted columns (controller,data_protection_officer, andrepresentative) from the legacyPGEncryptedString(PostgreSQLpgcrypto) toStringEncryptedType(AesGcmEngine), the encryption mechanism used by the rest of the app.It includes an Alembic migration that decrypts existing data with
pgp_sym_decryptand re-encrypts with AES-GCM, correctly converting pgcrypto-encrypted JSON "null" values to true SQL NULL. The PR also removes the now-unusedPGEncryptedStringclass.Steps to Confirm
OrganizationsAdmin UI viewSELECT pgp_sym_decrypt(controller, 'test_encryption_key') FROM ctl_organizations;to make sure the value was re-encrypted properlyPre-Merge Checklist
CHANGELOG.mdupdatedmaindowngrade()migration is correct and worksSummary by CodeRabbit
Documentation
Chores