ENG-564 (1/4): SaaS dataset backend validation and protected field restoration #7686
Add backend validation for SaaS dataset editing that restores protected fields instead of rejecting edits. This allows the UI to freely edit datasets while the backend ensures structural integrity.

Changes:
- SaaS validation step restores immutable top-level fields (fides_key, name, description, organization_fides_key) with warnings
- Collections cannot be added/removed; violations are auto-corrected
- Protected fields (referenced by SaaS config) are restored if deleted
- New GET endpoint for protected fields metadata
- DatasetFieldWarning schema and warnings[] on BulkPutDataset response
- get_saas_config_referenced_field_paths utility for full dot-path refs
- Comprehensive unit tests for restore/warning behavior
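As a rough illustration of the restore-with-warnings behavior described above, here is a minimal sketch that works on plain dicts. The function name, the warning shape, and the `IMMUTABLE_FIELDS` tuple are assumptions made for the example; they are not the PR's actual implementation, which operates on Fides dataset models.

```python
from copy import deepcopy
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for the top-level fields treated as immutable.
IMMUTABLE_FIELDS = ("fides_key", "name", "description", "organization_fides_key")


def restore_immutable_fields(
    existing: Dict[str, Any], incoming: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[Dict[str, str]]]:
    """Return a corrected copy of `incoming` plus one warning per restored field."""
    corrected = deepcopy(incoming)
    warnings: List[Dict[str, str]] = []
    for field in IMMUTABLE_FIELDS:
        if corrected.get(field) != existing.get(field):
            # Put the stored value back instead of rejecting the whole edit.
            corrected[field] = deepcopy(existing.get(field))
            warnings.append(
                {"field": field, "message": f"Restored immutable field '{field}'"}
            )
    return corrected, warnings


existing = {"fides_key": "stripe", "name": "Stripe", "description": "Payments", "collections": []}
edited = {"fides_key": "stripe", "name": "Renamed", "description": "Payments", "collections": []}

corrected, warnings = restore_immutable_fields(existing, edited)
print(corrected["name"])               # the stored name, not the edited one
print([w["field"] for w in warnings])  # which fields were restored
```

The caller's payload is left untouched; only the corrected copy carries the restored values, and the warnings list is what a response could surface to the UI.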
Greptile Summary
This PR adds backend validation for SaaS dataset editing that silently restores protected fields (immutable top-level metadata, SaaS-config-referenced fields, and collection structure) instead of rejecting edits outright, and exposes a new
Confidence Score: 2/5
Important Files Changed
…lution, skip warnings for CtlDataset path
…return type, simplify branching
- Move SaaSConfig, get_saas_config_referenced_field_paths, and _IMMUTABLE_DATASET_FIELDS imports to module level in endpoint file
- Return DatasetProtectedFields model instead of Dict[str, Any]
- Collapse redundant isinstance checks into if/elif
- Add defensive guard in _restore_protected_structure for None saas_config
Done — added dataset existence validation to |
- Remove fides_key from _IMMUTABLE_DATASET_FIELDS (unreachable comparison)
- Add dataset existence check to GET /protected-fields endpoint (404 on missing)
Replace the hardcoded _IMMUTABLE_DATASET_FIELDS tuple with a _MUTABLE_DATASET_FIELDS set containing only "collections". All other top-level Dataset fields (including fides_key, which was previously missing) are now automatically treated as immutable for SaaS datasets.
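The allow-list idea in this commit can be sketched as follows: instead of enumerating immutable fields, everything outside the mutable set is compared. The helper name `changed_immutable_fields` and the dict-based datasets are hypothetical and only serve the illustration.

```python
from typing import Any, Dict, List

# Only "collections" is editable; every other top-level key is implicitly immutable.
MUTABLE_DATASET_FIELDS = {"collections"}


def changed_immutable_fields(
    existing: Dict[str, Any], incoming: Dict[str, Any]
) -> List[str]:
    """List the top-level keys outside the mutable set whose values differ."""
    keys = (set(existing) | set(incoming)) - MUTABLE_DATASET_FIELDS
    return sorted(k for k in keys if existing.get(k) != incoming.get(k))


existing = {"fides_key": "a", "name": "A", "meta": None, "collections": [1]}
incoming = {"fides_key": "a", "name": "B", "meta": {"x": 1}, "collections": []}

changed = changed_immutable_fields(existing, incoming)
print(changed)
```

With this shape, a top-level field added to the dataset schema later is protected without touching the constant, which is the property the commit message points to.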
Code Review — ENG-564: SaaS Dataset Validation and Protected Field Restoration
Overall this is a well-structured feature. The approach of auto-restoring protected fields with warnings (rather than rejecting edits outright) is a good UX decision, and the recursive dot-path field resolution is cleanly implemented. The unit test coverage for the private helpers is thorough. A few issues worth addressing before merge:
Issues
1. Private symbol leaking across layers (dataset_config_endpoints.py:46)
_MUTABLE_DATASET_FIELDS is imported from the validation step module into the endpoint layer. This couples the API layer to an internal implementation detail of the validation layer. Should be exposed via a public function or moved to the schema layer.
2. ORM-to-Pydantic conversion may raise uncaught exception (saas.py:308)
FideslangDataset.model_validate(existing_record) on a SQLAlchemy object relies on from_attributes=True being set on the model. If the ORM object's attributes don't align perfectly with the Pydantic model's expectations, this raises a PydanticValidationError that isn't caught here, producing a 500. A try/except to fall back to existing_dataset = None would make this more resilient.
3. Misleading warning when a container field is wholly restored (saas.py:152)
When an intermediate container field (e.g., address) is removed entirely, _restore_nested_field re-adds the whole existing_container subtree — including all its child fields, not just the protected leaf. The warning emitted by the caller (e.g., "Restored field 'address.street'") doesn't reflect this; the user isn't told that address.city, address.zip, etc. were also silently restored.
4. Silent skip when a protected field's collection was removed (saas.py:260)
The continue when a collection is missing means a protected field inside a removed collection gets no dedicated warning. The collection-removal warning is generated separately, but users aren't explicitly told the collection contained a protected field that couldn't be individually recovered. Depends on product intent, but worth considering an additional warning.
5. fides_key restoration is unreachable (saas.py:67)
_validate_saas_dataset already raises ValidationError if fides_key mismatches, so by the time _restore_immutable_fields runs, fides_key is guaranteed to match. The loop iterates over it unnecessarily. Either document this invariant in a comment or explicitly exclude fides_key from the loop.
6. Mutation side-effects run unnecessarily on the CtlDataset path (dataset_config_service.py:97)
SaaSValidationStep mutates dataset_to_validate in place even on the DatasetConfigCtlDataset path, and those mutations are then discarded. The workaround (clearing warnings) is documented, but the cleaner fix is to not run the restoration steps at all on this path (e.g., via an additional skip_steps entry).
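Issue 3 above can be made concrete with a small sketch. It assumes a field tree made of plain dicts with `name` and optional `fields` keys, and the helpers `find_field`, `leaf_paths`, and `restore_path` are hypothetical names, not the PR's `_restore_nested_field`. The point is that a restore can report every leaf it re-added, so a warning can mention the siblings too.

```python
from copy import deepcopy
from typing import Any, Dict, List, Optional

Field = Dict[str, Any]


def find_field(fields: List[Field], name: str) -> Optional[Field]:
    """Return the field with the given name, or None if it is absent."""
    return next((f for f in fields if f["name"] == name), None)


def leaf_paths(field: Field, prefix: str = "") -> List[str]:
    """Collect the dot-paths of every leaf under `field`."""
    path = f"{prefix}{field['name']}"
    children = field.get("fields") or []
    if not children:
        return [path]
    paths: List[str] = []
    for child in children:
        paths.extend(leaf_paths(child, prefix=f"{path}."))
    return paths


def restore_path(existing: List[Field], incoming: List[Field], dot_path: str) -> List[str]:
    """Restore `dot_path` into `incoming` in place; return every leaf path re-added."""
    head, _, rest = dot_path.partition(".")
    source = find_field(existing, head)
    if source is None:
        return []  # nothing to restore from
    current = find_field(incoming, head)
    if current is None:
        # The whole subtree is missing, so all of its children come back with it.
        incoming.append(deepcopy(source))
        return leaf_paths(source)
    if not rest:
        return []  # the field is still present
    if current.get("fields") is None:
        current["fields"] = []
    restored = restore_path(source.get("fields") or [], current["fields"], rest)
    return [f"{head}.{p}" for p in restored]


existing = [{"name": "address", "fields": [{"name": "street"}, {"name": "city"}, {"name": "zip"}]}]

container_removed: List[Field] = []
all_restored = restore_path(existing, container_removed, "address.street")

leaf_removed: List[Field] = [{"name": "address", "fields": [{"name": "city"}]}]
one_restored = restore_path(existing, leaf_removed, "address.street")

print(all_restored)
print(one_restored)
```

When only the leaf is missing, a single path comes back; when the container is missing, the returned list includes the sibling fields that were restored alongside the protected one.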
Minor
Schema overlap (dataset.py:40)
ProtectedCollectionField and DatasetFieldWarning share similar shapes. Not necessarily a problem, but worth a quick design check to see if one should extend the other or if the duplication is intentional given they serve different API surfaces.
No integration tests
All new tests exercise private functions directly. An integration test hitting the new GET /protected-fields endpoint and the dataset PUT flow (verifying that warnings appear in the response and protected fields are actually restored in the DB) would give stronger confidence in the end-to-end behavior.
- Rename _MUTABLE_DATASET_FIELDS to MUTABLE_DATASET_FIELDS (public API)
- Add comment clarifying SaaSValidationStep is no-op for non-SaaS connections
- Add warning when collection missing from existing dataset during restoration
- Wrap FideslangDataset.model_validate in try/except for graceful fallback
Codecov Report
❌ Your patch status has failed because the patch coverage (82.47%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Coverage Diff

| | main | #7686 | +/- |
|---|---|---|---|
| Coverage | 84.97% | 84.94% | -0.03% |
| Files | 631 | 632 | +1 |
| Lines | 41239 | 41460 | +221 |
| Branches | 4787 | 4834 | +47 |
| Hits | 35041 | 35220 | +179 |
| Misses | 5113 | 5139 | +26 |
| Partials | 1085 | 1101 | +16 |
- Use deepcopy when restoring fields to avoid shared-reference aliasing
- Raise ValidationError on CtlDataset path when SaaS validation issues exist
- Use connection_config.get_saas_config() instead of manual unpacking
- Add structured action field to DatasetFieldWarning (restored/removed/failed)
- Consolidate duplicate warning branches into _emit_failed_collection_warning
- Add comment explaining MUTABLE_DATASET_FIELDS invariant
- Push 404 logic into service's get_protected_fields method
- Add warning when restore_nested_field fails silently
- Replace direct db.query with _get_ctl_dataset from dataset_service
- Move find_field_by_name and resolve_field_path to fides.api.graph.utils
- Parameterize immutable field tests
- Fix log level for removed collections (info -> warning)
- Surface behavior change in changelog description
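The shared-reference aliasing mentioned in the first item can be reproduced with plain dicts. This is only a sketch of the general Python behavior; `restore_field` is a hypothetical helper, not a function from the PR.

```python
from copy import deepcopy
from typing import Any, Dict


def restore_field(target: Dict[str, Any], source_field: Dict[str, Any], copy: bool) -> None:
    """Append `source_field` to target['fields'], optionally as a deep copy."""
    target["fields"].append(deepcopy(source_field) if copy else source_field)


existing_field = {"name": "email", "fides_meta": {"identity": "email"}}

# Without a copy, the restored field and the stored field are the same object,
# so a later edit to the restored one leaks back into the stored one.
aliased: Dict[str, Any] = {"fields": []}
restore_field(aliased, existing_field, copy=False)
aliased["fields"][0]["fides_meta"]["identity"] = "edited"
leaked = existing_field["fides_meta"]["identity"]

existing_field["fides_meta"]["identity"] = "email"  # reset for the second run

# With deepcopy, the restored field is independent of the stored one.
isolated: Dict[str, Any] = {"fields": []}
restore_field(isolated, existing_field, copy=True)
isolated["fields"][0]["fides_meta"]["identity"] = "edited"
kept = existing_field["fides_meta"]["identity"]

print(leaked, kept)
```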
The protected fields are a property of the connection's SaaS config,
not any individual dataset. Removed dataset_key from the URL path
(/connection/{key}/protected-fields instead of
/connection/{key}/dataset/{dataset_key}/protected-fields).
Primary key fields (fides_meta.primary_key == True) are now protected on SaaS datasets:
- Deleted primary key fields are restored from the existing dataset
- Removing the primary_key flag restores the full fides_meta
- Protected-fields endpoint includes primary key fields alongside SaaS config-referenced fields
The protected-fields endpoint should only return SaaS config-referenced fields. Primary key protection is enforced by the validation step only.
When a user edits metadata on a primary key field (e.g. data_type), only the primary_key flag is restored — other metadata changes persist.
Identity and references are set by the SaaS config and cannot be modified or removed by users. If changed, only those specific values are restored — other metadata edits on the same field persist.
…ributes Merged restore_primary_key_fields and restore_identity_and_references into one function that walks the existing dataset's field tree once, protecting primary_key, identity, and references in a single pass.
- Replace _collect_protected_meta_fields + _protect_primary_key with a single _restore_protected_meta function driven by PROTECTED_META_ATTRS
- Protect primary_key, identity, and references from changes in either direction (removal, modification, and addition)
- primary_key=True fields additionally cannot be deleted
- Adding a new protected attr is now a one-line config entry
- Consolidate and trim tests from 48 to 41 cases
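A config-driven restore of this kind might look like the sketch below. It assumes fields are plain dicts with an optional `fides_meta` dict, and the warning shape and the use of the `restored`/`removed` action values are guesses made for illustration; the PR's `_restore_protected_meta` may differ in all of these details.

```python
from copy import deepcopy
from typing import Any, Dict, List

# Adding another protected attribute would be a one-line change here.
PROTECTED_META_ATTRS = ("primary_key", "identity", "references")


def restore_protected_meta(
    existing_field: Dict[str, Any], incoming_field: Dict[str, Any], path: str
) -> List[Dict[str, str]]:
    """Revert protected fides_meta attributes in place; leave other edits alone."""
    existing_meta = existing_field.get("fides_meta") or {}
    incoming_meta = dict(incoming_field.get("fides_meta") or {})
    warnings: List[Dict[str, str]] = []
    for attr in PROTECTED_META_ATTRS:
        if existing_meta.get(attr) == incoming_meta.get(attr):
            continue
        if attr in existing_meta:
            # Removed or modified by the edit: put the stored value back.
            incoming_meta[attr] = deepcopy(existing_meta[attr])
            action = "restored"
        else:
            # Added by the edit: drop it again.
            incoming_meta.pop(attr, None)
            action = "removed"
        warnings.append({"field": path, "attribute": attr, "action": action})
    if warnings:
        incoming_field["fides_meta"] = incoming_meta
    return warnings


existing = {"name": "id", "fides_meta": {"primary_key": True, "data_type": "string"}}
incoming = {"name": "id", "fides_meta": {"data_type": "integer", "identity": "email"}}

meta_warnings = restore_protected_meta(existing, incoming, "customer.id")
print(incoming["fides_meta"])
print([(w["attribute"], w["action"]) for w in meta_warnings])
```

In this run the `data_type` edit survives, the `primary_key` flag comes back, and the added `identity` is dropped, which matches the "either direction" behavior listed above.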
Resolve conflicts in dataset_config_service.py by keeping both:
- SaaS validation warnings and protected field restoration (PR branch)
- Dataset audit event tracking (main)
- Add 'items' endpoint to minimal SaaS config so the SaaS validation step doesn't strip the 'items' collection from the test dataset
- Update 2-tuple unpacking to 3-tuple to match the new return signature of create_or_update_dataset_config (which now includes warnings)
SaaS validation treats 'name' as immutable (only 'collections' is mutable), so the name change gets silently reverted. Update the test to only change collections and assert on collection-level diff instead.
Vagoasdf left a comment
All looking good. One small nitpick, but nothing to worry about.
…d import Use frozenset to prevent accidental mutation of the module-level constant. Remove unused ConnectionType import from dataset_config_endpoints.
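For reference, the difference the frozenset change makes is plain Python behavior and can be checked in a few lines. `try_to_mutate` is a throwaway helper for the demonstration.

```python
from typing import AbstractSet

MUTABLE_DATASET_FIELDS = frozenset({"collections"})


def try_to_mutate(fields: AbstractSet[str]) -> bool:
    """Return True if an element could be added to `fields` in place."""
    try:
        fields.add("name")  # type: ignore[attr-defined]
    except AttributeError:
        # frozenset has no add(), so the module-level constant cannot drift.
        return False
    return True


frozen_result = try_to_mutate(MUTABLE_DATASET_FIELDS)
plain_result = try_to_mutate({"collections"})
print(frozen_result, plain_result)
```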
Ticket ENG-564
Description Of Changes
Adds backend validation for SaaS dataset editing that restores protected fields instead of rejecting edits. This supports the node-based dataset editor in PR #7685.
Protection Summary
primary_key flag restored if removed; other metadata editable

Key changes
- Immutable top-level fields (name, description, organization_fides_key, etc.) are silently restored with warnings
- primary_key flag cannot be removed
- GET /connection/{key}/protected-fields endpoint returns immutable fields and SaaS config-referenced collection fields
- BulkPutDataset response now includes warnings: DatasetFieldWarning[] with structured action field (restored, removed, failed)
- DatasetValidator context carries warnings through validation steps
- CtlDataset path raises ValidationError if SaaS validation issues exist (instead of silently dropping warnings)
- Uses connection_config.get_saas_config() instead of manual unpacking
- Moved find_field_by_name and resolve_field_path to fides.api.graph.utils
- Replaced direct db.query(CtlDataset) with _get_ctl_dataset from dataset_service
- Uses deepcopy when restoring fields to avoid shared-reference aliasing

Code Changes
- dataset_config_endpoints.py: get_dataset_protected_fields is now connection-scoped (no dataset_key)
- dataset_config_service.py: get_protected_fields() with SaaS/non-SaaS branching; raises ValidationError on CtlDataset path
- validation_steps/saas.py: SaaSValidationStep with restore_immutable_fields, restore_protected_structure, restore_primary_key_fields, restore_identity_and_references
- graph/utils.py: New module with find_field_by_name and resolve_field_path helpers
- dataset_validator.py: DatasetValidator with warning context passed through validation steps
- merge_configs_util.py: get_saas_config_referenced_field_paths for dot-path field references from SaaS config
- dataset.py: DatasetFieldWarning with action: Literal["restored", "removed", "failed"]
- urn_registry.py: DATASET_PROTECTED_FIELDS now connection-scoped

Steps to Confirm
- GET /connection/{key}/protected-fields returns correct immutable fields and SaaS config-referenced fields
- primary_key flag removed
- nox -s "pytest(ops-unit)" -- tests/ops/service/dataset/test_saas_validation_step.py

Pre-Merge Checklist
- CHANGELOG.md updated