Skip to content

ENG-564 (1/4): SaaS dataset backend validation and protected field restoration#7686

Merged
Linker44 merged 29 commits into
mainfrom
ENG-564-saas-dataset-backend-validation
Apr 23, 2026
Merged

ENG-564 (1/4): SaaS dataset backend validation and protected field restoration#7686
Linker44 merged 29 commits into
mainfrom
ENG-564-saas-dataset-backend-validation

Conversation

@Linker44
Copy link
Copy Markdown
Contributor

@Linker44 Linker44 commented Mar 18, 2026

Ticket ENG-564

Description Of Changes

Adds backend validation for SaaS dataset editing that restores protected fields instead of rejecting edits. This supports the node-based dataset editor in PR #7685.

Protection Summary

Subject Field protection Edit protection
Primary key fields Cant remove primary_key flag restored if removed; other metadata editable
Identity No Restored if changed/removed; other metadata editable
References No Restored if changed/removed; other metadata editable
SaaS config-referenced fields Cant remove N/A
Collections Can't add or remove N/A
Top-level metadata N/A Immutable (name, description, etc.)

Key changes

  • Immutable top-level fields (name, description, organization_fides_key, etc.) are silently restored with warnings
  • Collections cannot be added/removed — violations are auto-corrected and warned
  • Protected fields (referenced by SaaS config param values) are restored if deleted
  • Primary key fields cannot be deleted; the primary_key flag cannot be removed
  • Identity and references (set by SaaS config) cannot be changed or removed
  • New GET /connection/{key}/protected-fields endpoint returns immutable fields and SaaS config-referenced collection fields
  • BulkPutDataset response now includes warnings: DatasetFieldWarning[] with structured action field (restored, removed, failed)
  • DatasetValidator context carries warnings through validation steps
  • CtlDataset path raises ValidationError if SaaS validation issues exist (instead of silently dropping warnings)
  • Uses connection_config.get_saas_config() instead of manual unpacking
  • Moved find_field_by_name and resolve_field_path to fides.api.graph.utils
  • Replaced direct db.query(CtlDataset) with _get_ctl_dataset from dataset_service
  • Uses deepcopy when restoring fields to avoid shared-reference aliasing

Code Changes

  • dataset_config_endpoints.pyget_dataset_protected_fields is now connection-scoped (no dataset_key)
  • dataset_config_service.pyget_protected_fields() with SaaS/non-SaaS branching; raises ValidationError on CtlDataset path
  • validation_steps/saas.pySaaSValidationStep with restore_immutable_fields, restore_protected_structure, restore_primary_key_fields, restore_identity_and_references
  • graph/utils.py — New module with find_field_by_name and resolve_field_path helpers
  • dataset_validator.pyDatasetValidator with warning context passed through validation steps
  • merge_configs_util.pyget_saas_config_referenced_field_paths for dot-path field references from SaaS config
  • dataset.pyDatasetFieldWarning with action: Literal["restored", "removed", "failed"]
  • urn_registry.pyDATASET_PROTECTED_FIELDS now connection-scoped

Steps to Confirm

  1. Verify GET /connection/{key}/protected-fields returns correct immutable fields and SaaS config-referenced fields
  2. Verify PATCH dataset save restores protected fields and returns warnings in response
  3. Verify primary key fields cannot be deleted or have their primary_key flag removed
  4. Verify identity and references cannot be changed or removed
  5. Run unit tests: nox -s "pytest(ops-unit)" -- tests/ops/service/dataset/test_saas_validation_step.py

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • No UX review needed
  • Followup issues:
    • No followup issues
  • Database migrations:
    • No migrations
  • Documentation:
    • No documentation updates required

Add backend validation for SaaS dataset editing that restores protected
fields instead of rejecting edits. This allows the UI to freely edit
datasets while the backend ensures structural integrity.

Changes:
- SaaS validation step restores immutable top-level fields (fides_key,
  name, description, organization_fides_key) with warnings
- Collections cannot be added/removed; violations are auto-corrected
- Protected fields (referenced by SaaS config) are restored if deleted
- New GET endpoint for protected fields metadata
- DatasetFieldWarning schema and warnings[] on BulkPutDataset response
- get_saas_config_referenced_field_paths utility for full dot-path refs
- Comprehensive unit tests for restore/warning behavior
@Linker44 Linker44 requested a review from a team as a code owner March 18, 2026 14:35
@Linker44 Linker44 requested review from erosselli and removed request for a team March 18, 2026 14:36
@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented Mar 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
fides-plus-nightly Ignored Ignored Preview Apr 23, 2026 6:03pm
fides-privacy-center Ignored Ignored Apr 23, 2026 6:03pm

Request Review

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 18, 2026

Greptile Summary

This PR adds backend validation for SaaS dataset editing that silently restores protected fields (immutable top-level metadata, SaaS-config-referenced fields, and collection structure) instead of rejecting edits outright, and exposes a new GET /connection/{key}/dataset/{dataset_key}/protected-fields endpoint. The approach is well-structured — warnings are threaded cleanly through the validator context into the BulkPutDataset response — but there are a few correctness and code quality issues worth addressing before merge.

  • dataset_key path parameter on the new endpoint is declared but never used, causing the endpoint to return data for non-existent datasets without a 404.
  • fides_key is listed in _IMMUTABLE_DATASET_FIELDS but can never trigger a restoration (the DB query finds records by fides_key, so the comparison is always equal; and _validate_saas_dataset already rejects mismatched keys before this code runs).
  • When a required collection is missing from both the incoming dataset and the existing DB record, the restoration is silently skipped with no warning emitted, which could result in an incomplete dataset being saved without any feedback.
  • Three imports are placed inside the get_dataset_protected_fields function body instead of at the module top level, violating the project's import placement conventions.
  • get_saas_config_referenced_field_paths in merge_configs_util.py is a near-verbatim copy of get_saas_config_referenced_fields; the two should be unified to avoid duplication and future drift.
  • This PR exceeds 500 lines of changes, which violates the team's PR size guideline.

Confidence Score: 2/5

  • Not safe to merge as-is — the unused dataset_key parameter and the silent collection-restoration failure introduce real correctness gaps in the new endpoint and validation logic.
  • Two P1 logic issues exist: the dataset_key path parameter is completely ignored (making the endpoint callable for any key, including non-existent datasets), and silent data loss is possible when a required collection cannot be restored. The dead fides_key entry in _IMMUTABLE_DATASET_FIELDS is also a logic error. These need to be resolved before the feature behaves as described in the PR summary.
  • src/fides/api/v1/endpoints/dataset_config_endpoints.py (unused path param + inline imports) and src/fides/service/dataset/validation_steps/saas.py (dead fides_key entry, silent restoration failure).

Important Files Changed

Filename Overview
src/fides/api/v1/endpoints/dataset_config_endpoints.py Adds GET /protected-fields endpoint; contains imports inside the function body (violating project conventions) and an unused dataset_key path parameter that allows the endpoint to return data for non-existent datasets.
src/fides/service/dataset/validation_steps/saas.py Core PR logic adding field/collection restoration; fides_key in _IMMUTABLE_DATASET_FIELDS is dead code, and silent data loss occurs when a required collection cannot be restored from the existing dataset.
src/fides/service/connection/merge_configs_util.py Adds get_saas_config_referenced_field_paths; the new function is nearly a verbatim copy of the existing get_saas_config_referenced_fields and should be refactored to eliminate duplication.
src/fides/service/dataset/dataset_config_service.py Correctly threads warnings from the validator through create_or_update_dataset_config and up into the BulkPutDataset response; the guard if warnings and not isinstance(dataset, DatasetConfigCtlDataset) is sound.
src/fides/api/schemas/dataset.py Adds DatasetFieldWarning, ProtectedCollectionField, and DatasetProtectedFields schemas, and extends BulkPutDataset and ValidateDatasetResponse with a warnings list; changes are clean and well-structured.
src/fides/service/dataset/dataset_validator.py Minimal change: adds warnings list to DatasetValidationContext and includes it in the ValidateDatasetResponse; straightforward and correct.
tests/ops/service/dataset/test_saas_validation_step.py New test file with good unit coverage for _restore_immutable_fields and _restore_protected_structure; uses mock DB sessions correctly. Missing test for the silent-skip path where a removed collection has no existing record to restore from.
src/fides/common/urn_registry.py Single-line addition registering the new DATASET_PROTECTED_FIELDS URL constant; correct and consistent with surrounding patterns.

Comments Outside Diff (2)

  1. src/fides/api/v1/endpoints/dataset_config_endpoints.py, line 105-111 (link)

    P2 Imports inside function body

    These three imports are placed inside the function body rather than at the top of the module. This violates the project's import placement conventions. There is no circular import issue that would justify keeping them here — all three modules (SaaSConfig, get_saas_config_referenced_field_paths, and _IMMUTABLE_DATASET_FIELDS) are already imported at the top of other files in the same package.

    Move them to the top-level import block alongside the existing imports:

    from fides.api.schemas.saas.saas_config import SaaSConfig
    from fides.service.connection.merge_configs_util import (
        get_saas_config_referenced_field_paths,
    )
    from fides.service.dataset.validation_steps.saas import (
        _IMMUTABLE_DATASET_FIELDS,
    )

    Rule Used: Move imports to the top of the module rather than ... (source)

    Learnt From
    ethyca/fidesplus#2441

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  2. src/fides/api/v1/endpoints/dataset_config_endpoints.py, line 95-131 (link)

    P1 Unused path parameter dataset_key

    The dataset_key path parameter is declared in the function signature but is never referenced in the function body. The function always returns protected fields derived solely from connection_config, regardless of which dataset is named in the URL.

    Two problems arise from this:

    1. The endpoint returns a 200 response even when the named dataset does not exist on this connection, making the parameter misleading.
    2. API callers may assume the response is scoped to the specified dataset, when it is actually connection-scoped.

    The function should at minimum look up the DatasetConfig by dataset_key and return a 404 if it does not exist for this connection, or remove the dataset_key segment from the URL entirely if dataset-level scoping is not required.

Last reviewed commit: "ENG-564: Add SaaS da..."

Comment thread src/fides/service/connection/merge_configs_util.py
Comment thread src/fides/service/dataset/validation_steps/saas.py Outdated
Comment thread src/fides/service/dataset/validation_steps/saas.py
@Linker44 Linker44 removed the request for review from erosselli March 18, 2026 17:45
@Linker44 Linker44 marked this pull request as draft March 18, 2026 17:45
…return type, simplify branching

- Move SaaSConfig, get_saas_config_referenced_field_paths, and
  _IMMUTABLE_DATASET_FIELDS imports to module level in endpoint file
- Return DatasetProtectedFields model instead of Dict[str, Any]
- Collapse redundant isinstance checks into if/elif
- Add defensive guard in _restore_protected_structure for None saas_config
@Linker44
Copy link
Copy Markdown
Contributor Author

Done — added dataset existence validation to GET /protected-fields using the same pattern as other endpoints in this file. The endpoint now returns 404 if no DatasetConfig exists for the given dataset_key on this connection.

Linker44 and others added 2 commits March 23, 2026 11:58
- Remove fides_key from _IMMUTABLE_DATASET_FIELDS (unreachable comparison)
- Add dataset existence check to GET /protected-fields endpoint (404 on missing)
Linker44 and others added 2 commits March 23, 2026 12:46
Replace the hardcoded _IMMUTABLE_DATASET_FIELDS tuple with a
_MUTABLE_DATASET_FIELDS set containing only "collections". All other
top-level Dataset fields (including fides_key, which was previously
missing) are now automatically treated as immutable for SaaS datasets.
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — ENG-564: SaaS Dataset Validation and Protected Field Restoration

Overall this is a well-structured feature. The approach of auto-restoring protected fields with warnings (rather than rejecting edits outright) is a good UX decision, and the recursive dot-path field resolution is cleanly implemented. The unit test coverage for the private helpers is thorough. A few issues worth addressing before merge:

Issues

1. Private symbol leaking across layers (dataset_config_endpoints.py:46)
_MUTABLE_DATASET_FIELDS is imported from the validation step module into the endpoint layer. This couples the API layer to an internal implementation detail of the validation layer. Should be exposed via a public function or moved to the schema layer.

2. ORM-to-Pydantic conversion may raise uncaught exception (saas.py:308)
FideslangDataset.model_validate(existing_record) on a SQLAlchemy object relies on from_attributes=True being set on the model. If the ORM object's attributes don't align perfectly with the Pydantic model's expectations, this raises a PydanticValidationError that isn't caught here, producing a 500. A try/except to fall back to existing_dataset = None would make this more resilient.

3. Misleading warning when a container field is wholly restored (saas.py:152)
When an intermediate container field (e.g., address) is removed entirely, _restore_nested_field re-adds the whole existing_container subtree — including all its child fields, not just the protected leaf. The warning emitted by the caller (e.g., "Restored field 'address.street'") doesn't reflect this; the user isn't told that address.city, address.zip, etc. were also silently restored.

4. Silent skip when a protected field's collection was removed (saas.py:260)
The continue when a collection is missing means a protected field inside a removed collection gets no dedicated warning. The collection-removal warning is generated separately, but users aren't explicitly told the collection contained a protected field that couldn't be individually recovered. Depends on product intent, but worth considering an additional warning.

5. fides_key restoration is unreachable (saas.py:67)
_validate_saas_dataset already raises ValidationError if fides_key mismatches, so by the time _restore_immutable_fields runs, fides_key is guaranteed to match. The field iterates over it unnecessarily. Either document this invariant in a comment or explicitly exclude fides_key from the loop.

6. Mutation side-effects run unnecessarily on the CtlDataset path (dataset_config_service.py:97)
SaaSValidationStep mutates dataset_to_validate in place even on the DatasetConfigCtlDataset path, and those mutations are then discarded. The workaround (clearing warnings) is documented, but the cleaner fix is to not run the restoration steps at all on this path (e.g., via an additional skip_steps entry).

Minor

Schema overlap (dataset.py:40)
ProtectedCollectionField and DatasetFieldWarning share similar shapes. Not necessarily a problem, but worth a quick design check to see if one should extend the other or if the duplication is intentional given they serve different API surfaces.

No integration tests
All new tests exercise private functions directly. An integration test hitting the new GET /protected-fields endpoint and the dataset PUT flow (verifying that warnings appear in the response and protected fields are actually restored in the DB) would give stronger confidence in the end-to-end behavior.

Comment thread src/fides/api/v1/endpoints/dataset_config_endpoints.py Outdated
Comment thread src/fides/service/dataset/validation_steps/saas.py Outdated
Comment thread src/fides/service/dataset/validation_steps/saas.py
Comment thread src/fides/service/dataset/validation_steps/saas.py
Comment thread src/fides/service/dataset/validation_steps/saas.py
Comment thread src/fides/service/dataset/dataset_config_service.py
Comment thread src/fides/api/schemas/dataset.py
@Linker44 Linker44 marked this pull request as draft April 1, 2026 16:02
- Rename _MUTABLE_DATASET_FIELDS to MUTABLE_DATASET_FIELDS (public API)
- Add comment clarifying SaaSValidationStep is no-op for non-SaaS connections
- Add warning when collection missing from existing dataset during restoration
- Wrap FideslangDataset.model_validate in try/except for graceful fallback
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 82.47863% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.94%. Comparing base (4d36209) to head (942b126).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
src/fides/service/dataset/validation_steps/saas.py 82.53% 16 Missing and 13 partials ⚠️
...rc/fides/service/dataset/dataset_config_service.py 52.17% 9 Missing and 2 partials ⚠️
...fides/api/v1/endpoints/dataset_config_endpoints.py 66.66% 1 Missing ⚠️

❌ Your patch status has failed because the patch coverage (82.47%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (84.94%) is below the target coverage (85.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7686      +/-   ##
==========================================
- Coverage   84.97%   84.94%   -0.03%     
==========================================
  Files         631      632       +1     
  Lines       41239    41460     +221     
  Branches     4787     4834      +47     
==========================================
+ Hits        35041    35220     +179     
- Misses       5113     5139      +26     
- Partials     1085     1101      +16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread src/fides/service/dataset/validation_steps/saas.py
Comment thread src/fides/service/dataset/validation_steps/saas.py
Comment thread src/fides/service/dataset/validation_steps/saas.py Outdated
Comment thread src/fides/api/v1/endpoints/dataset_config_endpoints.py Outdated
Comment thread src/fides/service/dataset/validation_steps/saas.py
Comment thread src/fides/service/dataset/validation_steps/saas.py Outdated
Comment thread tests/ops/service/dataset/test_saas_validation_step.py
Comment thread src/fides/service/dataset/validation_steps/saas.py Outdated
Comment thread src/fides/service/dataset/validation_steps/saas.py Outdated
Comment thread changelog/7686-saas-dataset-validation-protected-fields.yaml
- Use deepcopy when restoring fields to avoid shared-reference aliasing
- Raise ValidationError on CtlDataset path when SaaS validation issues exist
- Use connection_config.get_saas_config() instead of manual unpacking
- Add structured action field to DatasetFieldWarning (restored/removed/failed)
- Consolidate duplicate warning branches into _emit_failed_collection_warning
- Add comment explaining MUTABLE_DATASET_FIELDS invariant
- Push 404 logic into service's get_protected_fields method
- Add warning when restore_nested_field fails silently
- Replace direct db.query with _get_ctl_dataset from dataset_service
- Move find_field_by_name and resolve_field_path to fides.api.graph.utils
- Parameterize immutable field tests
- Fix log level for removed collections (info -> warning)
- Surface behavior change in changelog description
The protected fields are a property of the connection's SaaS config,
not any individual dataset. Removed dataset_key from the URL path
(/connection/{key}/protected-fields instead of
/connection/{key}/dataset/{dataset_key}/protected-fields).
Primary key fields (fides_meta.primary_key == True) are now protected
on SaaS datasets:
- Deleted primary key fields are restored from the existing dataset
- Removing the primary_key flag restores the full fides_meta
- Protected-fields endpoint includes primary key fields alongside
  SaaS config-referenced fields
The protected-fields endpoint should only return SaaS config-referenced
fields. Primary key protection is enforced by the validation step only.
When a user edits metadata on a primary key field (e.g. data_type),
only the primary_key flag is restored — other metadata changes persist.
Identity and references are set by the SaaS config and cannot be
modified or removed by users. If changed, only those specific values
are restored — other metadata edits on the same field persist.
…ributes

Merged restore_primary_key_fields and restore_identity_and_references
into one function that walks the existing dataset's field tree once,
protecting primary_key, identity, and references in a single pass.
- Replace _collect_protected_meta_fields + _protect_primary_key with
  a single _restore_protected_meta function driven by PROTECTED_META_ATTRS
- Protect primary_key, identity, and references from changes in either
  direction (removal, modification, and addition)
- primary_key=True fields additionally cannot be deleted
- Adding a new protected attr is now a one-line config entry
- Consolidate and trim tests from 48 to 41 cases
@Linker44 Linker44 requested a review from Vagoasdf April 23, 2026 12:18
Resolve conflicts in dataset_config_service.py by keeping both:
- SaaS validation warnings and protected field restoration (PR branch)
- Dataset audit event tracking (main)
Comment thread src/fides/api/v1/endpoints/dataset_config_endpoints.py Fixed
- Add 'items' endpoint to minimal SaaS config so the SaaS validation
  step doesn't strip the 'items' collection from the test dataset
- Update 2-tuple unpacking to 3-tuple to match the new return signature
  of create_or_update_dataset_config (which now includes warnings)
SaaS validation treats 'name' as immutable (only 'collections' is mutable),
so the name change gets silently reverted. Update the test to only change
collections and assert on collection-level diff instead.
Copy link
Copy Markdown
Contributor

@Vagoasdf Vagoasdf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looking well. One small nitpick but nothing to worry about

Comment thread src/fides/service/dataset/validation_steps/saas.py Outdated
…d import

Use frozenset to prevent accidental mutation of the module-level constant.
Remove unused ConnectionType import from dataset_config_endpoints.
@Linker44 Linker44 added this pull request to the merge queue Apr 23, 2026
Merged via the queue into main with commit ad1b907 Apr 23, 2026
68 of 71 checks passed
@Linker44 Linker44 deleted the ENG-564-saas-dataset-backend-validation branch April 23, 2026 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants