
Eng 2503 add auditing dataseets #7964

Merged
Vagoasdf merged 30 commits into main from ENG-2503-add-auditing-dataseets
Apr 22, 2026

Conversation

@Vagoasdf
Contributor

@Vagoasdf Vagoasdf commented Apr 20, 2026

Ticket 2503

Description Of Changes

Adds audit events for SaaS datasets. Now whenever we create, update, or delete a SaaS dataset, we register a new audit event.

Code Changes

  • New event audit objects for datasets
  • Added a generate_dataset_audit_event_details() function to the event audit utils
  • Implemented a _create_dataset_audit_event() function on the dataset service
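The helper pair above can be sketched roughly as below. This is a hypothetical illustration of the pattern, not the exact implementation in this PR; the event type values match those named in the review, but the payload fields are assumptions.

```python
# Sketch of the dataset audit helpers described above. The EventAuditType
# values come from the review comments; the details payload is illustrative.
from enum import Enum
from typing import Any, Dict, Optional


class EventAuditType(str, Enum):
    dataset_created = "dataset.created"
    dataset_updated = "dataset.updated"
    dataset_deleted = "dataset.deleted"


def generate_dataset_audit_event_details(
    event_type: EventAuditType,
    dataset: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the serializable details payload stored on the event_audit row."""
    details: Dict[str, Any] = {"event_type": event_type.value}
    if dataset is not None:
        details["fides_key"] = dataset.get("fides_key")
        details["name"] = dataset.get("name")
        details["collection_count"] = len(dataset.get("collections", []))
    return details
```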

Steps to Confirm

  1. Create a SaaS dataset via PATCH /api/v1/connection/{saas_connection_key}/datasets → expect dataset.created row in event_audit
  2. Update it (re-send PATCH with a change) → expect dataset.updated
  3. Delete it via DELETE /api/v1/connection/{saas_connection_key}/dataset/{key} → expect dataset.deleted
  4. Implicit bulk delete via PUT /api/v1/connection/{saas_connection_key}/dataset-configs with an empty/shorter list → expect one dataset.deleted per removed config (this was the bug fixed in this branch)
  5. Non-SaaS connection — create/delete on a Postgres connection → expect no rows in event_audit (SaaS-only guard)
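The expected outcomes of the five steps can be summarized with a toy in-memory simulation. This does not call the real fides endpoints; the names are illustrative, and only the SaaS-only guard and the one-event-per-operation expectation are taken from the steps above.

```python
# Toy simulation of the audit expectations above; illustrative only, does not
# touch the real fides service layer or database.
audit_log = []

def record(event_type: str, fides_key: str, saas: bool = True) -> None:
    # SaaS-only guard: non-SaaS connections produce no event_audit rows.
    if saas:
        audit_log.append({"event_type": event_type, "fides_key": fides_key})

record("dataset.created", "stripe")                    # step 1
record("dataset.updated", "stripe")                    # step 2
record("dataset.deleted", "stripe")                    # step 3
record("dataset.deleted", "stripe_backup")             # step 4: one event per removed config
record("dataset.created", "postgres_ds", saas=False)   # step 5: no row expected
```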

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

@vercel
Contributor

vercel Bot commented Apr 20, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
fides-plus-nightly Ignored Ignored Preview Apr 21, 2026 9:20pm
fides-privacy-center Ignored Ignored Apr 21, 2026 9:20pm


@codecov

codecov Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 93.75000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.95%. Comparing base (a52d50a) to head (22d483a).
⚠️ Report is 18 commits behind head on main.

Files with missing lines Patch % Lines
...rc/fides/service/dataset/dataset_config_service.py 89.18% 4 Missing ⚠️

❌ Your patch status has failed because the patch coverage (93.75%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (84.95%) is below the target coverage (85.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7964      +/-   ##
==========================================
- Coverage   85.04%   84.95%   -0.09%     
==========================================
  Files         631      630       -1     
  Lines       41217    41139      -78     
  Branches     4807     4775      -32     
==========================================
- Hits        35053    34951     -102     
- Misses       5070     5106      +36     
+ Partials     1094     1082      -12     

☔ View full report in Codecov by Sentry.

@Vagoasdf Vagoasdf marked this pull request as ready for review April 20, 2026 20:14
@Vagoasdf Vagoasdf requested a review from a team as a code owner April 20, 2026 20:14
@Vagoasdf Vagoasdf requested review from vcruces and removed request for a team April 20, 2026 20:14
@Vagoasdf
Contributor Author

/code-review

Contributor

@claude claude Bot left a comment


Dataset Auditing for SaaS Integrations — Code Review

Overall this is a clean, well-structured addition. The pattern mirrors existing connection/taxonomy audit event work closely, test coverage is thorough, and the failure-isolation (audit errors never bubble up to callers) is handled correctly. A few things to address before merging:

Bug

  • Wrong PR number in changelog (changelog/7964-add-dataset-auditng.yaml:3): pr: 1964 should be pr: 7964. The filename also has a typo: auditng should be auditing (cosmetic, but worth fixing).

Minor Issues

  • Unused import (tests/util/test_event_audit_util.py:4): FideslangDataset is imported but never referenced in the new test class or anywhere else in the file.

  • TOCTOU in create_or_update_dataset_config (dataset_config_service.py:135-151): The "is this a create or update?" pre-check is a separate DB round-trip from the actual create_or_update, so a concurrent insert between the two could cause a dataset_created event to be emitted for what is functionally an update. Low probability in practice, but worth at minimum a comment explaining the known race. A cleaner fix would be to infer create-vs-update from the result of create_or_update rather than a preflight query.

  • Lost atomicity in put_dataset_configs (dataset_config_endpoints.py:175-179): The original single-SQL batch DELETE was atomic; the new per-key loop commits each deletion independently. An unexpected mid-loop failure (not DatasetNotFoundException) would leave partial state. The DatasetNotFoundException swallowing for concurrent deletes is correct, but it's worth noting the changed semantics in a comment.
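The cleaner fix suggested for the TOCTOU issue (derive create-vs-update from the persistence call itself, not a preflight query) can be sketched with an in-memory stand-in for DatasetConfig. The names here are hypothetical; the real model method would need to return a created/updated flag as the review suggests.

```python
# Sketch: infer create-vs-update from the upsert call itself, closing the
# TOCTOU window. The dict stands in for the real DatasetConfig table.
from typing import Any, Dict, Tuple

_store: Dict[str, Dict[str, Any]] = {}

def create_or_update(data: Dict[str, Any]) -> Tuple[Dict[str, Any], bool]:
    """Upsert by fides_key; return (record, created) from a single operation."""
    key = data["fides_key"]
    created = key not in _store
    _store[key] = {**_store.get(key, {}), **data}
    return _store[key], created

_record, was_created = create_or_update({"fides_key": "stripe", "name": "Stripe"})
event_type = "dataset.created" if was_created else "dataset.updated"
```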

What looks good

  • EventAuditType values (dataset.created, dataset.updated, dataset.deleted) follow the existing naming convention.
  • _create_dataset_audit_event correctly no-ops for non-SaaS connections and when no audit service is injected.
  • delete_dataset_config emits the audit event after the delete succeeds, so no phantom events for failed deletes.
  • get_dataset_config_service dependency ordering in deps.py is correct.
  • Test coverage is comprehensive: create, update, delete, non-SaaS no-op, broken audit service, missing audit service, concurrent delete, and non-existent key all have dedicated test cases.
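The failure-isolation property praised above (audit errors never bubble up to callers) usually amounts to a deliberately broad try/except around the emit call. A minimal sketch, assuming the real method logs and swallows; the function names are illustrative, not the fides API:

```python
# Sketch of best-effort audit emission: failures are logged, never re-raised,
# so the surrounding dataset operation always succeeds or fails on its own merits.
import logging

logger = logging.getLogger(__name__)

def safe_emit_audit_event(emit, *args, **kwargs) -> bool:
    """Call the audit emitter; return False instead of raising on failure."""
    try:
        emit(*args, **kwargs)
        return True
    except Exception:  # deliberately broad: auditing is best-effort
        logger.exception("Failed to write dataset audit event")
        return False

def broken_emitter(*_args, **_kwargs):
    raise RuntimeError("audit backend down")

# The caller proceeds even though the audit write failed.
emitted = safe_emit_audit_event(broken_emitter, "dataset.created", fides_key="stripe")
```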


Comment thread changelog/7964-add-dataset-auditng.yaml Outdated
@@ -0,0 +1,4 @@
type: Added
description: Adds Dataset Auditing to Saas integrations
pr: 1964
Contributor


changelog/7964-add-dataset-auditng.yaml:3

Wrong PR number: pr: 1964 should be pr: 7964.

Also a minor nit: the filename itself has a typo — auditng instead of auditing.

Comment thread tests/util/test_event_audit_util.py Outdated
"""Tests for event_audit_util.py functions."""

import pytest
from fideslang.models import Dataset as FideslangDataset
Contributor


tests/util/test_event_audit_util.py:4

FideslangDataset is imported here but never used anywhere in the new TestGenerateDatasetAuditEventDetails tests (or elsewhere in the file). This import can be removed.

Comment on lines 135 to 151
# Determine create vs. update before persisting
fides_key = data_dict["fides_key"]
existing = DatasetConfig.filter(
db=self.db,
conditions=(
(DatasetConfig.connection_config_id == connection_config.id)
& (DatasetConfig.fides_key == fides_key)
),
).first()
event_type = (
EventAuditType.dataset_updated
if existing
else EventAuditType.dataset_created
)

# Create or update using unified method
dataset_config = DatasetConfig.create_or_update(self.db, data=data_dict)
Contributor


src/fides/service/dataset/dataset_config_service.py:135-151

There's a TOCTOU (time-of-check / time-of-use) window here: the existing pre-check and the actual create_or_update are two separate DB operations, so a concurrent request that inserts the record in between would cause this request to emit dataset_created for what is effectively an update.

In practice this is a low-frequency edge case, but it's worth noting. A cleaner alternative would be to inspect the result of DatasetConfig.create_or_update (e.g. check if the returned object has a freshly-generated created_at equal to updated_at, or have the model method return a created/updated flag) instead of the pre-flight query. Even adding a comment explaining the known race would help set expectations for future readers.

Contributor Author


I can see how that could happen. I'll include it.

Comment on lines +175 to +179
for key in config_keys_to_remove:
try:
dataset_config_service.delete_dataset_config(connection_config, key)
except DatasetNotFoundException:
pass # already deleted; desired state reached
Contributor


src/fides/api/v1/endpoints/dataset_config_endpoints.py:175-179

The original code deleted all config_keys_to_remove in a single SQL DELETE + one db.commit(), which was atomic. The new loop calls delete_dataset_config once per key, each of which commits independently. If an unexpected exception (i.e. something other than DatasetNotFoundException) occurs partway through the loop, the earlier keys will already be deleted while later ones are not, leaving partial state.

This is unlikely to matter in normal operation (each delete is independent and the DatasetNotFoundException swallowing handles the concurrent-delete case correctly), but it's a subtle behavioral change worth documenting in a comment. If strict all-or-nothing semantics are required, consider wrapping the loop in a single transaction or reverting to a bulk delete when no audit events are needed.
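The all-or-nothing alternative mentioned here (single transaction, while still keeping one audit event per key) can be sketched with a toy session stand-in. This is not the real SQLAlchemy usage in fides; FakeSession and the exception type are illustrative.

```python
# Sketch: delete each key and queue its audit event, but commit only once,
# restoring atomicity while keeping one audit event per removed config.
class FakeSession:
    def __init__(self, rows):
        self.rows = dict(rows)       # committed state
        self._pending = dict(rows)   # uncommitted working copy

    def delete(self, key):
        if key not in self._pending:
            raise KeyError(key)      # stands in for DatasetNotFoundException
        del self._pending[key]

    def commit(self):
        self.rows = dict(self._pending)

    def rollback(self):
        self._pending = dict(self.rows)

def delete_configs_atomically(db, keys):
    events = []
    try:
        for key in keys:
            try:
                db.delete(key)
            except KeyError:
                continue  # already deleted; desired state reached
            events.append(("dataset.deleted", key))
        db.commit()  # single commit: all deletions land together or not at all
    except Exception:
        db.rollback()
        raise
    return events
```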

Contributor Author


The plan was to separate them so we could properly log an audit event for each delete; otherwise these would be drowned out in the noise and we could only log one dataset erasure event.
I'm not sure this comment is strictly needed, but it's something to take into consideration, yes. I'll consider an update if the reviewer agrees.

@Linker44
Contributor

Linker44 commented Apr 21, 2026

Base implementation looks good, but I worry about the usefulness of the logging. Users will be able to see that something changed but not exactly what changed. This differs from how we audit connection secrets and SaaS configs, where we log exactly what changed and mask any sensitive values.

I'm inclined to think this needs a refactor to include that information.
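The change-level logging described here (log exactly what changed, mask sensitive values) could be sketched as a shallow diff over the dataset dict. The sensitive field names and mask string are purely illustrative:

```python
# Sketch: shallow field-level diff for an updated dataset, masking sensitive
# keys the way connection-secret auditing is described as doing.
SENSITIVE_FIELDS = {"api_key", "password", "token"}  # illustrative list
MASK = "**********"

def diff_dataset(old: dict, new: dict) -> dict:
    """Return {field: {"old": ..., "new": ...}} for changed top-level fields."""
    changes = {}
    for field in old.keys() | new.keys():
        if old.get(field) != new.get(field):
            if field in SENSITIVE_FIELDS:
                changes[field] = {"old": MASK, "new": MASK}
            else:
                changes[field] = {"old": old.get(field), "new": new.get(field)}
    return changes
```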

Contributor

@Linker44 Linker44 left a comment


LGTM. I would rather that on update we log only what changed on the dataset instead of all of it, but for now this is good; we can improve upon it later.

@Vagoasdf Vagoasdf added this pull request to the merge queue Apr 22, 2026
Merged via the queue into main with commit 2350cce Apr 22, 2026
66 of 69 checks passed
@Vagoasdf Vagoasdf deleted the ENG-2503-add-auditing-dataseets branch April 22, 2026 14:55
