Conversation
* Restricted to SaaS integrations
Codecov Report
❌ Your patch status has failed because the patch coverage (93.75%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files:
@@ Coverage Diff @@
## main #7964 +/- ##
==========================================
- Coverage 85.04% 84.95% -0.09%
==========================================
Files 631 630 -1
Lines 41217 41139 -78
Branches 4807 4775 -32
==========================================
- Hits 35053 34951 -102
- Misses 5070 5106 +36
+ Partials 1094 1082 -12
☔ View full report in Codecov by Sentry.
/code-review
Dataset Auditing for SaaS Integrations — Code Review
Overall this is a clean, well-structured addition. The pattern mirrors existing connection/taxonomy audit event work closely, test coverage is thorough, and the failure-isolation (audit errors never bubble up to callers) is handled correctly. A few things to address before merging:
Bug
- Wrong PR number in changelog (`changelog/7964-add-dataset-auditng.yaml:3`): `pr: 1964` should be `pr: 7964`. The filename also has a typo (`auditng` → `auditing`), cosmetic but worth fixing.
Minor Issues
- Unused import (`tests/util/test_event_audit_util.py:4`): `FideslangDataset` is imported but never referenced in the new test class or anywhere else in the file.
- TOCTOU in `create_or_update_dataset_config` (`dataset_config_service.py:135-151`): The "is this a create or update?" pre-check is a separate DB round-trip from the actual `create_or_update`, so a concurrent insert between the two could cause a `dataset_created` event to be emitted for what is functionally an update. Low probability in practice, but worth at minimum a comment explaining the known race. A cleaner fix would be to infer create-vs-update from the result of `create_or_update` rather than a preflight query.
- Lost atomicity in `put_dataset_configs` (`dataset_config_endpoints.py:175-179`): The original single-SQL batch `DELETE` was atomic; the new per-key loop commits each deletion independently. An unexpected mid-loop failure (not `DatasetNotFoundException`) would leave partial state. The `DatasetNotFoundException` swallowing for concurrent deletes is correct, but it's worth noting the changed semantics in a comment.
What looks good
- `EventAuditType` values (`dataset.created`, `dataset.updated`, `dataset.deleted`) follow the existing naming convention.
- `_create_dataset_audit_event` correctly no-ops for non-SaaS connections and when no audit service is injected (see the sketch below).
- `delete_dataset_config` emits the audit event after the delete succeeds, so no phantom events for failed deletes.
- `get_dataset_config_service` dependency ordering in `deps.py` is correct.
- Test coverage is comprehensive: create, update, delete, non-SaaS no-op, broken audit service, missing audit service, concurrent delete, and non-existent key all have dedicated test cases.
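As a rough illustration of the failure-isolation and no-op pattern praised above (the function and `create_event` call here are illustrative assumptions, not the PR's exact code):

```python
import logging

logger = logging.getLogger(__name__)

def emit_audit_event_safely(audit_service, event_type, details) -> None:
    """Emit an audit event without letting audit failures reach the caller."""
    if audit_service is None:
        return  # no audit service injected: deliberate no-op
    try:
        audit_service.create_event(event_type, details)
    except Exception:  # broad catch is intentional: auditing must never break callers
        logger.warning("Failed to write audit event %s", event_type, exc_info=True)
```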
@@ -0,0 +1,4 @@
type: Added
description: Adds Dataset Auditing to Saas integrations
pr: 1964
`changelog/7964-add-dataset-auditng.yaml:3`
Wrong PR number: `pr: 1964` should be `pr: 7964`.
Also a minor nit: the filename itself has a typo, `auditng` instead of `auditing`.
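For reference, the corrected entry (in a file renamed to `changelog/7964-add-dataset-auditing.yaml`) would read:

```yaml
type: Added
description: Adds Dataset Auditing to Saas integrations
pr: 7964
```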
| """Tests for event_audit_util.py functions.""" | ||
|
|
||
| import pytest | ||
| from fideslang.models import Dataset as FideslangDataset |
`tests/util/test_event_audit_util.py:4`
`FideslangDataset` is imported here but never used anywhere in the new `TestGenerateDatasetAuditEventDetails` tests (or elsewhere in the file). This import can be removed.
# Determine create vs. update before persisting
fides_key = data_dict["fides_key"]
existing = DatasetConfig.filter(
    db=self.db,
    conditions=(
        (DatasetConfig.connection_config_id == connection_config.id)
        & (DatasetConfig.fides_key == fides_key)
    ),
).first()
event_type = (
    EventAuditType.dataset_updated
    if existing
    else EventAuditType.dataset_created
)

# Create or update using unified method
dataset_config = DatasetConfig.create_or_update(self.db, data=data_dict)
`src/fides/service/dataset/dataset_config_service.py:135-151`
There's a TOCTOU (time-of-check / time-of-use) window here: the `existing` pre-check and the actual `create_or_update` are two separate DB operations, so a concurrent request that inserts the record in between would cause this request to emit `dataset_created` for what is effectively an update.
In practice this is a low-frequency edge case, but it's worth noting. A cleaner alternative would be to inspect the result of `DatasetConfig.create_or_update` (e.g. check if the returned object has a freshly-generated `created_at` equal to `updated_at`, or have the model method return a created/updated flag) instead of the pre-flight query. Even adding a comment explaining the known race would help set expectations for future readers.
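A minimal sketch of that alternative, assuming `create_or_update` were extended to return a created/updated flag (it does not today; the tuple return is hypothetical):

```python
# Hypothetical: create_or_update returns (record, created) in the style of
# Django's get_or_create, so the event type is derived from the same DB
# operation that persisted the record, closing the TOCTOU window.
dataset_config, created = DatasetConfig.create_or_update(self.db, data=data_dict)
event_type = (
    EventAuditType.dataset_created if created else EventAuditType.dataset_updated
)
```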
I can see how that can happen. Including it.
for key in config_keys_to_remove:
    try:
        dataset_config_service.delete_dataset_config(connection_config, key)
    except DatasetNotFoundException:
        pass  # already deleted; desired state reached
`src/fides/api/v1/endpoints/dataset_config_endpoints.py:175-179`
The original code deleted all `config_keys_to_remove` in a single SQL `DELETE` + one `db.commit()`, which was atomic. The new loop calls `delete_dataset_config` once per key, each of which commits independently. If an unexpected exception (i.e. something other than `DatasetNotFoundException`) occurs partway through the loop, the earlier keys will already be deleted while later ones are not, leaving partial state.
This is unlikely to matter in normal operation (each delete is independent and the `DatasetNotFoundException` swallowing handles the concurrent-delete case correctly), but it's a subtle behavioral change worth documenting in a comment. If strict all-or-nothing semantics are required, consider wrapping the loop in a single transaction or reverting to a bulk delete when no audit events are needed.
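If all-or-nothing semantics were ever needed, one option looks roughly like this (a sketch only; the `defer_commit` flag is hypothetical, since the service method commits per call today):

```python
# Hypothetical defer_commit flag: each delete joins the enclosing transaction
# instead of committing itself, so a mid-loop failure rolls everything back.
try:
    for key in config_keys_to_remove:
        try:
            dataset_config_service.delete_dataset_config(
                connection_config, key, defer_commit=True
            )
        except DatasetNotFoundException:
            pass  # already deleted; desired state reached
    db.commit()
except Exception:
    db.rollback()
    raise
```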
The plan was to separate them so we could properly log an audit for each delete; otherwise, these would be drowned out in the noise and we could only log one dataset erasure event.
I'm not sure if this is a valid comment, but it's something to take into consideration, yes. Will consider an update if the reviewer agrees.
Base implementation looks good, but I worry about the usefulness of the logging. Users will be able to see that something changed but not exactly what changed, which differs from how we handle other EventAudit records. I'm inclined to think this needs a refactor to include that information.
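One way the audit details could capture exactly what changed (a sketch; the helper name and event-details shape are assumptions, not existing code):

```python
from typing import Any

def diff_dataset_fields(old: dict[str, Any], new: dict[str, Any]) -> dict[str, Any]:
    """Return only the fields whose values differ, with before/after pairs."""
    return {
        key: {"before": old.get(key), "after": new.get(key)}
        for key in set(old) | set(new)
        if old.get(key) != new.get(key)
    }

# Example: attach only the changed fields to a dataset.updated event's details.
changes = diff_dataset_fields(
    {"name": "Stripe", "description": "old"},
    {"name": "Stripe", "description": "new"},
)
# -> {"description": {"before": "old", "after": "new"}}
```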
Linker44 left a comment
lgtm. I would rather that on update we only log exactly what changed on the dataset instead of all of it, but for now this is good; we can improve upon it later.
Ticket 2503
Description Of Changes
Adds Audit Events to the editing of SaaS Datasets. Now whenever we create, update, or delete a SaaS Dataset, we register a new audit event.
Code Changes
Steps to Confirm
Pre-Merge Checklist
- CHANGELOG.md updated
- Rebased against main
- downgrade() migration is correct and works