Skip to content

ENG-3593: Eliminate redundant System fetches in /system/upsert#8080

Merged
adamsachs merged 6 commits intomainfrom
adam/eng-3593-eliminate-redundant-system-fetches
May 4, 2026
Merged

ENG-3593: Eliminate redundant System fetches in /system/upsert#8080
adamsachs merged 6 commits intomainfrom
adam/eng-3593-eliminate-redundant-system-fetches

Conversation

@adamsachs
Copy link
Copy Markdown
Contributor

@adamsachs adamsachs commented Apr 30, 2026

Ticket ENG-3593

Description Of Changes

Reduces per-row get_resource calls in /system/upsert from 4 → 1 on the UPDATE path, and adds per-axis change-detection logging in _audit_system_changes so we can quantify the steady-state no-op rate before considering a "skip-unchanged" optimization.

Each updated system on the UPDATE path was previously issuing four Fetching resource round-trips:

  1. upsert_system — existence check
  2. update_system — re-fetch with relationships
  3. update_resource — pre-UPDATE sanity (inside crud.update_resource)
  4. update_resource — post-UPDATE return (inside crud.update_resource)

Three are pure dedup: none of them inspect the row for anything except "does it exist" or "what is its current state," both of which the caller already has by step (1).

This was identified as a contributor to the system memory baseline during the recent system/upsert perf investigation; reducing fetches lowers both wall time and peak memory on this endpoint.

Code Changes

  • upsert_system keeps the System returned by its existence check and threads it to update_system via a new optional existing_system= parameter. The direct PUT /system endpoint caller (no pre-loaded System) continues to work — the parameter defaults to None and falls back to the original fetch.
  • update_system replaces its call to crud.update_resource with an inline sql_update(System.__table__).where(...).values(resource_dict) plus db.refresh(existing_system). The refresh re-reads the row so DB-side coercions (timestamps, JSONB normalization, etc.) are still picked up; the _audit_system_changes comparison stays equivalent to the prior behavior.
  • _audit_system_changes captures each DeepDiff once into named variables (general_diff, privacy_diff, data_flow_diff) and emits a structured System change detection debug log per system with general_changed, privacy_declarations_changed, data_flow_changed, any_changed, plus affected-paths counts per axis. No behavior change — same if-gates, same SystemHistory writes; this is purely observability.
  • db.refresh() does not flow through crud.get_resource, so it does not emit a Fetching resource log line. The Fetching resource count on the same payload drops from ~4N to ~N (the remaining one is upsert_system's existence check, which a follow-up could batch).

Performance verification (local benchmark)

reproduce_upsert_perf.py

Methodology: build a payload of N Systems × M declarations, prime once (INSERT path), then upsert the same payload three times to exercise the UPDATE path on every row with zero data drift. Run on main and this branch in turn against the same local backend / DB. Backend at DEBUG log level so Fetching resource / Updating resource lines emit. Reproduction script will be attached as a comment on this PR.

Payload: 180 systems × 3 declarations each.

Metric main This PR Δ
Wall time min 2.92s 1.51s −48%
Wall time avg (n=3) 3.05s 1.65s −46%
Wall time max 3.19s 1.83s −43%
Fetching resource per call 720 180 −75% (4N → N, exactly as predicted)
Updating resource per call 180 180 unchanged ✓
System change detection per call 0 180 new observability ✓

The fetch reduction matches the call-graph trace exactly. The wall-time reduction (~46%) is less than the fetch-count reduction (~75%), meaning roughly half of the original UPDATE-path wall time is non-fetch work (DeepDiff, model_dump, deepcopy of existing snapshot) — which is exactly the headroom available for the higher-risk "skip-unchanged" follow-up.

Steps to Confirm

  1. Run tests/ctl/core/test_system_history.py — covers update_system + _audit_system_changes directly. The critical case is test_no_changes (idempotent upsert ⇒ zero SystemHistory rows). All six cases should pass: test_system_information_changes, test_privacy_declaration_changes, test_ingress_egress_changes, test_multiple_changes, test_no_changes, test_automatic_system_update.
  2. Run tests/ctl/core/test_system.py and tests/ops/api/v1/endpoints/test_system.py for broader coverage and to exercise the PUT /system path (which uses the existing_system=None fallback).
  3. Reproduce the perf measurement above using the script attached as a PR comment. With 180 systems × 3 declarations × 3 measure runs, expect ~180 Fetching resource logs per measure call (down from ~720), wall time roughly halved on local Postgres, and one System change detection line per row with any_changed=False on idempotent payloads.

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration This indicates that a change includes a database migration label to the entry if your change includes a DB migration
    • Add a high-risk This issue suggests changes that have a high-probability of breaking existing code label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

Followups

  • Replace upsert_system's per-row existence check with a single batched SELECT ... WHERE fides_key IN (...) plus selectinload(System.privacy_declarations). Collapses the remaining N fetches to ~1.
  • Skip the UPDATE + audit pipeline entirely for unchanged systems. Higher correctness risk (equivalence checks, updated_at semantics) — the System change detection logs added here will quantify the opportunity first. The benchmark above shows ~1.65s/call of non-fetch wall time still on the table.
  • Add automated regression tests: (1) functional coverage on the upsert/update endpoints (mostly already present in test_system_history.py), and (2) DB-call-count assertions to prevent silent regression of the fetch-count reduction.

Each upsert call previously issued four `Fetching resource` (`get_resource`)
round-trips per row on the UPDATE path:

  1. upsert_system:   existence check
  2. update_system:   re-fetch with relationships
  3. update_resource: pre-UPDATE sanity (inside crud.update_resource)
  4. update_resource: post-UPDATE return (inside crud.update_resource)

Three of those are removable without changing behavior:

- Thread the System loaded by upsert_system's existence check through to
  update_system as an optional `existing_system` parameter (eliminates #2).
  The direct `PUT /system` endpoint caller continues to work; the new
  parameter is optional and falls back to the original fetch when absent.

- Inline the UPDATE in update_system rather than calling crud.update_resource,
  using the ORM object we already hold and a single db.refresh() to pick up
  any DB-side coercions (eliminates #3 and #4).

Result: 4 -> 1 fetch per updated row. db.refresh() does not flow through
crud.get_resource, so it does not emit `Fetching resource` log lines.

Also adds per-axis change-detection logging in _audit_system_changes
(general / privacy_declarations / data_flow). DeepDiff calls are now
captured once into named variables and reused for both the existing
SystemHistory write gate and a new `System change detection` debug log
line. No behavior change in the audit path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adamsachs adamsachs requested a review from a team as a code owner April 30, 2026 20:13
@adamsachs adamsachs requested review from thabofletcher and removed request for a team April 30, 2026 20:13
@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented Apr 30, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
fides-plus-nightly Ignored Ignored Preview May 4, 2026 2:59pm
fides-privacy-center Ignored Ignored May 4, 2026 2:59pm

Request Review

@adamsachs adamsachs marked this pull request as draft April 30, 2026 20:14
@adamsachs adamsachs requested review from JadeCara and erosselli and removed request for thabofletcher April 30, 2026 20:14
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.99%. Comparing base (5496449) to head (f447f65).
⚠️ Report is 6 commits behind head on main.

❌ Your project status has failed because the head coverage (84.99%) is below the target coverage (85.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8080   +/-   ##
=======================================
  Coverage   84.98%   84.99%           
=======================================
  Files         633      633           
  Lines       41737    41753   +16     
  Branches     4886     4887    +1     
=======================================
+ Hits        35471    35487   +16     
- Misses       5157     5158    +1     
+ Partials     1109     1108    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

adamsachs and others added 3 commits May 1, 2026 16:09
- Fix mypy narrowing of `existing_system: Optional[System]` in
  `update_system` by introducing a typed local `system: System` populated
  from either the passed-in argument or `get_resource`. Avoids `assert`
  in app code and keeps the function body unchanged in shape.
- Rename changelog fragment to `8080-...yaml` (filename must start with
  PR number per `noxfiles/changelog_nox.py:121`) and add the required
  `pr: 8080` field.
- Add three regression tests in
  `tests/ctl/core/test_system_history.py::TestUpsertSystemFetchOptimization`:
  - `test_upsert_passes_existing_system_to_update` — mock-based,
    verifies `upsert_system` threads the loaded System through.
  - `test_update_system_with_existing_system_persists_changes` —
    functional, verifies the new code path actually persists + audits.
  - `test_upsert_system_emits_one_fetch_per_updated_system` — uses
    `loguru_caplog` to assert exactly one `Fetching resource` log per
    system on the UPDATE path. Guards against accidental re-introduction
    of a redundant `get_resource` call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The test re-used the same SystemSchema instances across the prime and
measure passes, but `update_system` mutates inputs by stripping
`privacy_declarations` after upserting them. The second pass then hit
`AttributeError: 'System' object has no attribute 'privacy_declarations'`
inside `validate_privacy_declarations`.

Build a fresh payload for each pass while keeping the fides_keys stable
so both passes hit the same DB rows. Verified by running the test suite
in the local fides container — all 9 tests in
tests/ctl/core/test_system_history.py pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adamsachs
Copy link
Copy Markdown
Contributor Author

/code-review

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Eliminate redundant System fetches in /system/upsert

The optimization goal is clear and well-motivated. The existing_system pass-through in upsert_system is correct, and pre-computing the three DeepDiff results in _audit_system_changes (avoiding triple re-computation) is a nice cleanup. The test coverage is a good addition.

A few items to address before merge:

Bug — test fixture teardown is a no-op (both fixtures)

delete(System).where(...) in both system fixtures (line 37 pre-existing, line 164 new) constructs but never executes the delete statement — rows accumulate in the test DB across runs. See inline comment on line 164.

Atomicity gap between UPDATE and audit write

The system data UPDATE and the SystemHistory audit entries commit in separate transactions. A failure between them leaves the system updated with no history record. This was structurally the same before, but since the UPDATE is now inlined it would be straightforward to merge both operations into one async with db.begin() block. See inline comment on lines 239–251.

Missing SQLAlchemyError guard on inline UPDATE

The removed update_resource caught SQLAlchemyError and raised a domain-level QueryError. The inlined replacement lets raw SQLAlchemy errors propagate. See inline comment on lines 239–248.

Log-string coupling in regression test

test_upsert_system_emits_one_fetch_per_updated_system infers fetch counts by scanning for "Fetching resource" in logs. If the log message changes this gives a false pass. Patching get_resource and counting calls is a more reliable approach. See inline comment on line 247.

Minor: return type annotation improvement

update_system now correctly returns Tuple[System, bool] (was Tuple[Dict, bool]). The route handler at src/fides/api/v1/endpoints/system.py:277 still annotates -> Dict but returns the System ORM object — that inconsistency predates this PR, but it's worth a follow-up cleanup.


🔬 Codegraph: unavailable


💡 Write /code-review in a comment to re-run this review.

Comment thread tests/ctl/core/test_system_history.py
Comment thread src/fides/api/db/system.py
Comment thread src/fides/api/db/system.py Outdated
Comment on lines +239 to +248
async with db.begin():
log.debug(
"Updating resource",
sql_model="System",
fides_key=resource.fides_key,
)
await db.execute(
sql_update(System.__table__)
.where(System.fides_key == resource.fides_key)
.values(resource_dict)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/fides/api/db/system.py:239-248

The old update_resource wrapped the execute() in a try/except SQLAlchemyError and re-raised a domain-level QueryError with a structured log message. The inlined version lets raw SQLAlchemy exceptions propagate to callers unchecked. Worth adding the same guard:

from sqlalchemy.exc import SQLAlchemyError
from fides.api.util import errors

try:
    await db.execute(
        sql_update(System.__table__)
        .where(System.fides_key == resource.fides_key)
        .values(resource_dict)
    )
except SQLAlchemyError as e:
    log.exception(f"Failed to update System with error: '{e}'")
    raise errors.QueryError()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated in 618c8d7

fides_key=fk,
organization_fides_key="default_organization",
name=f"Fetch count test {i}",
system_type="test",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests/ctl/core/test_system_history.py:247

test_upsert_system_emits_one_fetch_per_updated_system relies on the literal string "Fetching resource" appearing in log output to count DB fetches. This is a fragile coupling to crud.get_resource's internal log message — if that string ever changes or the log level is raised, the assertion len(matching) == 1 will always pass even if the regression is re-introduced.

A more robust approach would be to patch get_resource itself and count invocations:

with patch("fides.api.db.system.get_resource", wraps=get_resource) as mock_get:
    await upsert_system(resources=build_payload(), db=async_session, ...)

# Each system should only be fetched once (existence check in upsert_system).
assert mock_get.call_count == len(fides_keys)

This directly measures what the optimization is meant to guarantee rather than inferring it through log messages.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjusted to use the mocking in 618c8d7

- Add SQLAlchemyError -> errors.QueryError guard around the inline
  UPDATE in update_system, mirroring the guard the prior
  crud.update_resource call had. Without this the inlined UPDATE lets
  raw SQLAlchemy errors propagate, a small behavior regression.

- Refactor test_upsert_system_emits_one_fetch_per_updated_system to
  patch crud.get_resource at the call site (with `wraps=` so behavior
  is unchanged) and assert mock.call_count, instead of scanning loguru
  output for the literal 'Fetching resource' string. Directly measures
  the metric the optimization is meant to guarantee and isn't coupled
  to log message wording.

Skipped on purpose to keep scope tight:
- Fixture teardown delete() that's never executed -- the same no-op
  pattern exists in the pre-existing fixture at line 37; out of scope.
- Atomicity gap between the inline UPDATE and the SystemHistory write
  in separate `async with db.begin()` blocks -- structurally
  pre-existing (was the same shape via crud.update_resource); tracked
  for follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adamsachs adamsachs marked this pull request as ready for review May 4, 2026 13:58
Copy link
Copy Markdown
Contributor

@erosselli erosselli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with an ask

Comment on lines +253 to +256
except SQLAlchemyError as exc:
# Mirrors the guard the prior `crud.update_resource` call had.
log.exception(f"Failed to update System with error: '{exc}'")
raise errors.QueryError()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these lines are uncovered by tests, can we add a test for the exception case ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude]: added in f447f65941test_update_system_raises_query_error_on_sqlalchemy_error patches async_session.execute to raise SQLAlchemyError only on the System UPDATE and asserts errors.QueryError propagates.

Addresses erosselli's review on PR #8080: the inlined UPDATE's
`except SQLAlchemyError` branch was uncovered. New test patches
`async_session.execute` to raise on the System UPDATE only and
asserts `errors.QueryError` propagates, matching the guard the
prior `crud.update_resource` call provided.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adamsachs adamsachs added this pull request to the merge queue May 4, 2026
Merged via the queue into main with commit d6e1f0b May 4, 2026
68 of 69 checks passed
@adamsachs adamsachs deleted the adam/eng-3593-eliminate-redundant-system-fetches branch May 4, 2026 16:07
adamsachs added a commit that referenced this pull request May 4, 2026
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants