ENG-3593: Eliminate redundant System fetches in /system/upsert #8080
Conversation
Each upsert call previously issued four `Fetching resource` (`get_resource`) round-trips per row on the UPDATE path:

1. `upsert_system`: existence check
2. `update_system`: re-fetch with relationships
3. `update_resource`: pre-UPDATE sanity (inside `crud.update_resource`)
4. `update_resource`: post-UPDATE return (inside `crud.update_resource`)

Three of those are removable without changing behavior:

- Thread the System loaded by `upsert_system`'s existence check through to `update_system` as an optional `existing_system` parameter (eliminates #2). The direct `PUT /system` endpoint caller continues to work; the new parameter is optional and falls back to the original fetch when absent.
- Inline the UPDATE in `update_system` rather than calling `crud.update_resource`, using the ORM object we already hold and a single `db.refresh()` to pick up any DB-side coercions (eliminates #3 and #4).

Result: 4 -> 1 fetch per updated row. `db.refresh()` does not flow through `crud.get_resource`, so it does not emit `Fetching resource` log lines.

Also adds per-axis change-detection logging in `_audit_system_changes` (general / privacy_declarations / data_flow). DeepDiff calls are now captured once into named variables and reused for both the existing `SystemHistory` write gate and a new `System change detection` debug log line. No behavior change in the audit path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
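The pass-through shape described above can be sketched with stdlib stand-ins. Here `System`, `get_resource`, `update_system`, and `upsert_system` are simplified stand-ins for the real functions in `src/fides/api/db/system.py`, and the counter only illustrates the round-trip arithmetic:

```python
import asyncio
from typing import List, Optional

FETCH_COUNT = 0  # counts simulated get_resource round-trips

class System:
    def __init__(self, fides_key: str) -> None:
        self.fides_key = fides_key

async def get_resource(fides_key: str) -> System:
    # Stand-in for crud.get_resource; each call is one DB round-trip.
    global FETCH_COUNT
    FETCH_COUNT += 1
    return System(fides_key)

async def update_system(
    resource: dict, existing_system: Optional[System] = None
) -> System:
    # Fall back to a fetch only when the caller did not pre-load the row,
    # so the direct PUT /system path keeps working unchanged.
    return existing_system or await get_resource(resource["fides_key"])

async def upsert_system(resources: List[dict]) -> None:
    for resource in resources:
        existing = await get_resource(resource["fides_key"])  # existence check
        # UPDATE path: thread the already-loaded row through, no re-fetch.
        await update_system(resource, existing_system=existing)

asyncio.run(upsert_system([{"fides_key": "a"}, {"fides_key": "b"}]))
assert FETCH_COUNT == 2  # one fetch per row instead of two
```

Because `existing_system` defaults to `None`, callers that never pre-loaded the row (the direct endpoint path) behave exactly as before.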
Codecov Report

✅ All modified and coverable lines are covered by tests.
❌ Your project status has failed because the head coverage (84.99%) is below the target coverage (85.00%). You can increase the head coverage or adjust the target coverage.

```
@@ Coverage Diff @@
##             main    #8080   +/-   ##
=======================================
  Coverage   84.98%   84.99%
=======================================
  Files         633      633
  Lines       41737    41753     +16
  Branches     4886     4887      +1
=======================================
+ Hits        35471    35487     +16
- Misses       5157     5158      +1
+ Partials     1109     1108      -1
```
- Fix mypy narrowing of `existing_system: Optional[System]` in
`update_system` by introducing a typed local `system: System` populated
from either the passed-in argument or `get_resource`. Avoids `assert`
in app code and keeps the function body unchanged in shape.
- Rename changelog fragment to `8080-...yaml` (filename must start with
PR number per `noxfiles/changelog_nox.py:121`) and add the required
`pr: 8080` field.
- Add three regression tests in
`tests/ctl/core/test_system_history.py::TestUpsertSystemFetchOptimization`:
- `test_upsert_passes_existing_system_to_update` — mock-based,
verifies `upsert_system` threads the loaded System through.
- `test_update_system_with_existing_system_persists_changes` —
functional, verifies the new code path actually persists + audits.
- `test_upsert_system_emits_one_fetch_per_updated_system` — uses
`loguru_caplog` to assert exactly one `Fetching resource` log per
system on the UPDATE path. Guards against accidental re-introduction
of a redundant `get_resource` call.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
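The mypy narrowing fix from the first bullet can be sketched as follows. The types and function bodies are illustrative stand-ins; only the typed-local pattern itself mirrors the change:

```python
from typing import Optional

class System:
    """Stand-in for the ORM model; illustration only."""

def get_resource(fides_key: str) -> System:
    # Stand-in for crud.get_resource.
    return System()

def update_system(
    fides_key: str, existing_system: Optional[System] = None
) -> System:
    # A typed local narrows Optional[System] -> System for mypy
    # without putting an `assert` in application code.
    system: System = (
        existing_system if existing_system is not None else get_resource(fides_key)
    )
    return system
```

The rest of the function body can then use `system` unconditionally, keeping its original shape.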
The test re-used the same SystemSchema instances across the prime and measure passes, but `update_system` mutates its inputs by stripping `privacy_declarations` after upserting them. The second pass then hit `AttributeError: 'System' object has no attribute 'privacy_declarations'` inside `validate_privacy_declarations`.

Build a fresh payload for each pass while keeping the fides_keys stable so both passes hit the same DB rows.

Verified by running the test suite in the local fides container — all 9 tests in tests/ctl/core/test_system_history.py pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
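The fix can be sketched as below. `build_payload` is a hypothetical helper and `fake_update_system` stands in for the real `update_system`'s input mutation; only the fresh-payload-per-pass idea mirrors the actual change:

```python
def build_payload(n: int = 3) -> list:
    # Fresh objects on every call; the fides_keys stay stable so the
    # prime and measure passes target the same DB rows.
    return [
        {"fides_key": f"system_{i}", "privacy_declarations": [{"name": "decl"}]}
        for i in range(n)
    ]

def fake_update_system(resource: dict) -> None:
    # Stand-in for the mutation in the real update_system, which strips
    # privacy_declarations after upserting them.
    resource.pop("privacy_declarations", None)

prime = build_payload()
for resource in prime:
    fake_update_system(resource)

# Re-using `prime` here would hand update_system resources that are
# already missing privacy_declarations; a fresh payload avoids that
# while still hitting the same fides_keys.
measure = build_payload()
```
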
/code-review
Code Review: Eliminate redundant System fetches in /system/upsert
The optimization goal is clear and well-motivated. The existing_system pass-through in upsert_system is correct, and pre-computing the three DeepDiff results in _audit_system_changes (avoiding triple re-computation) is a nice cleanup. The test coverage is a good addition.
A few items to address before merge:
Bug — test fixture teardown is a no-op (both fixtures)
delete(System).where(...) in both system fixtures (line 37 pre-existing, line 164 new) constructs but never executes the delete statement — rows accumulate in the test DB across runs. See inline comment on line 164.
Atomicity gap between UPDATE and audit write
The system data UPDATE and the SystemHistory audit entries commit in separate transactions. A failure between them leaves the system updated with no history record. This was structurally the same before, but since the UPDATE is now inlined it would be straightforward to merge both operations into one async with db.begin() block. See inline comment on lines 239–251.
Missing SQLAlchemyError guard on inline UPDATE
The removed update_resource caught SQLAlchemyError and raised a domain-level QueryError. The inlined replacement lets raw SQLAlchemy errors propagate. See inline comment on lines 239–248.
Log-string coupling in regression test
test_upsert_system_emits_one_fetch_per_updated_system infers fetch counts by scanning for "Fetching resource" in logs. If the log message changes this gives a false pass. Patching get_resource and counting calls is a more reliable approach. See inline comment on line 247.
Minor: return type annotation improvement
update_system now correctly returns Tuple[System, bool] (was Tuple[Dict, bool]). The route handler at src/fides/api/v1/endpoints/system.py:277 still annotates -> Dict but returns the System ORM object — that inconsistency predates this PR, but it's worth a follow-up cleanup.
```python
async with db.begin():
    log.debug(
        "Updating resource",
        sql_model="System",
        fides_key=resource.fides_key,
    )
    await db.execute(
        sql_update(System.__table__)
        .where(System.fides_key == resource.fides_key)
        .values(resource_dict)
    )
```
src/fides/api/db/system.py:239-248
The old update_resource wrapped the execute() in a try/except SQLAlchemyError and re-raised a domain-level QueryError with a structured log message. The inlined version lets raw SQLAlchemy exceptions propagate to callers unchecked. Worth adding the same guard:
```python
from sqlalchemy.exc import SQLAlchemyError
from fides.api.util import errors

try:
    await db.execute(
        sql_update(System.__table__)
        .where(System.fides_key == resource.fides_key)
        .values(resource_dict)
    )
except SQLAlchemyError as e:
    log.exception(f"Failed to update System with error: '{e}'")
    raise errors.QueryError()
```

```python
fides_key=fk,
organization_fides_key="default_organization",
name=f"Fetch count test {i}",
system_type="test",
```
tests/ctl/core/test_system_history.py:247
test_upsert_system_emits_one_fetch_per_updated_system relies on the literal string "Fetching resource" appearing in log output to count DB fetches. This is a fragile coupling to crud.get_resource's internal log message — if that string ever changes or the log level is raised, the assertion len(matching) == 1 will always pass even if the regression is re-introduced.
A more robust approach would be to patch get_resource itself and count invocations:
```python
with patch("fides.api.db.system.get_resource", wraps=get_resource) as mock_get:
    await upsert_system(resources=build_payload(), db=async_session, ...)

# Each system should only be fetched once (existence check in upsert_system).
assert mock_get.call_count == len(fides_keys)
```

This directly measures what the optimization is meant to guarantee rather than inferring it through log messages.
- Add SQLAlchemyError -> errors.QueryError guard around the inline UPDATE in update_system, mirroring the guard the prior crud.update_resource call had. Without this the inlined UPDATE lets raw SQLAlchemy errors propagate, a small behavior regression.
- Refactor test_upsert_system_emits_one_fetch_per_updated_system to patch crud.get_resource at the call site (with `wraps=` so behavior is unchanged) and assert mock.call_count, instead of scanning loguru output for the literal 'Fetching resource' string. Directly measures the metric the optimization is meant to guarantee and isn't coupled to log message wording.

Skipped on purpose to keep scope tight:

- Fixture teardown delete() that's never executed -- the same no-op pattern exists in the pre-existing fixture at line 37; out of scope.
- Atomicity gap between the inline UPDATE and the SystemHistory write in separate `async with db.begin()` blocks -- structurally pre-existing (was the same shape via crud.update_resource); tracked for follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
```python
except SQLAlchemyError as exc:
    # Mirrors the guard the prior `crud.update_resource` call had.
    log.exception(f"Failed to update System with error: '{exc}'")
    raise errors.QueryError()
```
These lines are uncovered by tests; can we add a test for the exception case?
[Claude]: added in f447f65941 — test_update_system_raises_query_error_on_sqlalchemy_error patches async_session.execute to raise SQLAlchemyError only on the System UPDATE and asserts errors.QueryError propagates.
Addresses erosselli's review on PR #8080: the inlined UPDATE's `except SQLAlchemyError` branch was uncovered. New test patches `async_session.execute` to raise on the System UPDATE only and asserts `errors.QueryError` propagates, matching the guard the prior `crud.update_resource` call provided. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
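The shape of that exception-path test can be sketched with stdlib stand-ins. `SQLAlchemyError` and `QueryError` below are local stand-ins for `sqlalchemy.exc.SQLAlchemyError` and `fides.api.util.errors.QueryError`, and `inline_update` mimics only the guard, not the real `update_system`:

```python
import asyncio
from unittest.mock import AsyncMock

class SQLAlchemyError(Exception):
    """Stand-in for sqlalchemy.exc.SQLAlchemyError."""

class QueryError(Exception):
    """Stand-in for fides.api.util.errors.QueryError."""

async def inline_update(db) -> None:
    # Shape of the guarded inline UPDATE in update_system.
    try:
        await db.execute("UPDATE systems ...")  # hypothetical statement
    except SQLAlchemyError:
        raise QueryError()

async def run_test() -> bool:
    db = AsyncMock()
    # Patch execute to raise, as the real test does on the System UPDATE.
    db.execute.side_effect = SQLAlchemyError("boom")
    try:
        await inline_update(db)
    except QueryError:
        return True  # the guard converted the driver error
    return False
```
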
Ticket ENG-3593
Description Of Changes
Reduces per-row `get_resource` calls in `/system/upsert` from 4 → 1 on the UPDATE path, and adds per-axis change-detection logging in `_audit_system_changes` so we can quantify the steady-state no-op rate before considering a "skip-unchanged" optimization.

Each updated system on the UPDATE path was previously issuing four `Fetching resource` round-trips:

1. `upsert_system`: existence check
2. `update_system`: re-fetch with relationships
3. `update_resource`: pre-UPDATE sanity (inside `crud.update_resource`)
4. `update_resource`: post-UPDATE return (inside `crud.update_resource`)

Three are pure dedup: none of them inspect the row for anything except "does it exist" or "what is its current state," both of which the caller already has by step (1).
This was identified as a contributor to the system memory baseline during the recent `system/upsert` perf investigation; reducing fetches lowers both wall time and peak memory on this endpoint.

Code Changes

- `upsert_system` keeps the `System` returned by its existence check and threads it to `update_system` via a new optional `existing_system=` parameter. The direct `PUT /system` endpoint caller (no pre-loaded System) continues to work: the parameter defaults to `None` and falls back to the original fetch.
- `update_system` replaces its call to `crud.update_resource` with an inline `sql_update(System.__table__).where(...).values(resource_dict)` plus `db.refresh(existing_system)`. The refresh re-reads the row so DB-side coercions (timestamps, JSONB normalization, etc.) are still picked up; the `_audit_system_changes` comparison stays equivalent to the prior behavior.
- `_audit_system_changes` captures each `DeepDiff` once into named variables (`general_diff`, `privacy_diff`, `data_flow_diff`) and emits a structured `System change detection` debug log per system with `general_changed`, `privacy_declarations_changed`, `data_flow_changed`, `any_changed`, plus affected-paths counts per axis. No behavior change: same `if`-gates, same `SystemHistory` writes; this is purely observability.
- `db.refresh()` does not flow through `crud.get_resource`, so it does not emit a `Fetching resource` log line. The `Fetching resource` count on the same payload drops from ~4N to ~N (the remaining one is `upsert_system`'s existence check, which a follow-up could batch).

Performance verification (local benchmark)
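The compute-once pattern in `_audit_system_changes` can be sketched as follows; a shallow dict compare stands in for `DeepDiff`, and the function names and field layout are simplified stand-ins:

```python
def shallow_diff(before: dict, after: dict) -> dict:
    # Stand-in for DeepDiff: map of changed keys -> (old, new) pairs.
    keys = before.keys() | after.keys()
    return {
        k: (before.get(k), after.get(k))
        for k in keys
        if before.get(k) != after.get(k)
    }

def audit_system_changes(before: dict, after: dict) -> dict:
    # Each diff is computed once into a named variable and reused for
    # both the history-write gate and the change-detection log payload.
    general_diff = shallow_diff(before["general"], after["general"])
    privacy_diff = shallow_diff(
        before["privacy_declarations"], after["privacy_declarations"]
    )
    data_flow_diff = shallow_diff(before["data_flow"], after["data_flow"])
    detection = {
        "general_changed": bool(general_diff),
        "privacy_declarations_changed": bool(privacy_diff),
        "data_flow_changed": bool(data_flow_diff),
    }
    detection["any_changed"] = any(detection.values())
    return detection
```
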
reproduce_upsert_perf.py
Methodology: build a payload of N Systems × M declarations, prime once (INSERT path), then upsert the same payload three times to exercise the UPDATE path on every row with zero data drift. Run on `main` and this branch in turn against the same local backend / DB. Backend at DEBUG log level so `Fetching resource` / `Updating resource` lines emit. Reproduction script will be attached as a comment on this PR.

Payload: 180 systems × 3 declarations each.
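The prime-then-measure loop can be sketched as below. `run_benchmark` and its `upsert` callable are hypothetical; the real script drives `/system/upsert` over HTTP:

```python
import time

def run_benchmark(upsert, payload, passes: int = 3) -> list:
    # Prime once so every subsequent pass takes the UPDATE path.
    upsert(payload)
    timings = []
    for _ in range(passes):
        start = time.perf_counter()
        upsert(payload)  # identical payload -> zero data drift
        timings.append(time.perf_counter() - start)
    return timings
```
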
| | `main` | this branch |
| --- | --- | --- |
| `Fetching resource` per call | | |
| `Updating resource` per call | | |
| `System change detection` per call | | |

The fetch reduction matches the call-graph trace exactly. The wall-time reduction (~46%) is less than the fetch-count reduction (~75%), meaning roughly half of the original UPDATE-path wall time is non-fetch work (DeepDiff, model_dump, deepcopy of the existing snapshot) — which is exactly the headroom available for the higher-risk "skip-unchanged" follow-up.
Steps to Confirm
- `tests/ctl/core/test_system_history.py`: covers `update_system` + `_audit_system_changes` directly. The critical case is `test_no_changes` (idempotent upsert ⇒ zero `SystemHistory` rows). All six cases should pass: `test_system_information_changes`, `test_privacy_declaration_changes`, `test_ingress_egress_changes`, `test_multiple_changes`, `test_no_changes`, `test_automatic_system_update`.
- `tests/ctl/core/test_system.py` and `tests/ops/api/v1/endpoints/test_system.py` for broader coverage and to exercise the `PUT /system` path (which uses the `existing_system=None` fallback).
- Run the benchmark: `Fetching resource` logs per measure call drop (down from ~720), wall time roughly halved on local Postgres, and one `System change detection` line per row with `any_changed=False` on idempotent payloads.

Pre-Merge Checklist
- `CHANGELOG.md` updated
- Up to date with `main`
- `downgrade()` migration is correct and works

Followups
- Replace `upsert_system`'s per-row existence check with a single batched `SELECT ... WHERE fides_key IN (...)` plus `selectinload(System.privacy_declarations)`. Collapses the remaining N fetches to ~1.
- "Skip-unchanged" optimization (would change `updated_at` semantics) — the `System change detection` logs added here will quantify the opportunity first. The benchmark above shows ~1.65s/call of non-fetch wall time still on the table.
- Guard any such change with (1) regression tests (`test_system_history.py`), and (2) DB-call-count assertions to prevent silent regression of the fetch-count reduction.
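The round-trip arithmetic behind the batching follow-up can be sketched with dict stand-ins. In SQLAlchemy the batched version would be a single `select(System).where(System.fides_key.in_(keys))` with `selectinload(System.privacy_declarations)`; the counter below only illustrates the N-to-1 collapse:

```python
FETCH_QUERIES = 0  # counts simulated DB round-trips

def get_resource(table: dict, fides_key: str):
    # Per-row lookup: one round-trip per key.
    global FETCH_QUERIES
    FETCH_QUERIES += 1
    return table.get(fides_key)

def get_resources_batched(table: dict, fides_keys: list) -> dict:
    # One round-trip for the whole payload, the shape a
    # `SELECT ... WHERE fides_key IN (...)` query would take.
    global FETCH_QUERIES
    FETCH_QUERIES += 1
    return {k: table[k] for k in fides_keys if k in table}

table = {f"s{i}": {"fides_key": f"s{i}"} for i in range(180)}
keys = list(table)
per_row = [get_resource(table, k) for k in keys]  # 180 simulated queries
batched = get_resources_batched(table, keys)      # 1 simulated query
```
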