Skip to content

ENG-3517: Foundation - attachment_user_provided model + repository#8110

Open
mikeGarifullin wants to merge 1 commit intomainfrom
ENG-3517-2
Open

ENG-3517: Foundation - attachment_user_provided model + repository#8110
mikeGarifullin wants to merge 1 commit intomainfrom
ENG-3517-2

Conversation

@mikeGarifullin
Copy link
Copy Markdown
Contributor

@mikeGarifullin mikeGarifullin commented May 5, 2026

Ticket ENG-3517

Description Of Changes

Data layer for the data-subject file-upload feature: attachment_user_provided table + repository primitives. The upload service + endpoint live in fidesplus and consume this PR.

The table tracks an upload's lifecycle independently of attachment so a user can upload before a privacy request exists. Lifecycle: uploaded → promoted | deleted. deleted rows are kept for audit. The (field_name, property_id, policy_key) triple binds an upload to the privacy-center config it was issued under so re-validation at submission can refuse mismatches. storage_key is a plain string (no FK) so rows survive storage-config churn.

Code Changes

  • models/attachment.py: AttachmentUserProvidedStatus enum + AttachmentUserProvided model. (field_name, property_id, policy_key) are required at insert. Composite index (status, created_at) for the orphan-sweep predicate.
  • alembic/.../9b449105864d_add_attachment_user_provided.py: table + enum + index. down_revision = "3a91e5d4f7b2". Reversible.
  • service/privacy_request_attachments/privacy_request_attachments_repository.py:
    • create_uploaded — insert + flush + refresh, returns AttachmentUserProvidedRecord
    • lock_by_idsSELECT … FOR UPDATE ordered by id (deadlock-safe under overlapping batches), returns dict[str, row] so callers can detect missing ids
    • assert_all_uploaded — fail-fast precondition
    • mark_promoted — pure mutation, caller owns the session
    • mark_deleted — in-session mutation, no commit
    • list_uploaded_older_than — orphan-sweep, exclusive cutoff
  • service/privacy_request_attachments/privacy_request_attachments_entities.py: AttachmentUserProvidedRecord detach-safe dataclass.
  • service/privacy_request_attachments/privacy_request_attachments_exceptions.py: AttachmentsServiceError base + InvalidAttachmentStateError. Other domain errors will land with their consumers.
  • tests/ops/service/privacy_request_attachments/test_repository.py: lifecycle suite covering every public repo method.
  • .fides/db_dataset.yml: dataset entries for the new table.

Steps to Confirm

  1. alembic upgrade head — revision 9b449105864d applies.
  2. \d attachment_user_provided in psql — columns, attachmentuserprovidedstatus enum, ix_attachment_user_provided_status_created_at index all present.
  3. alembic downgrade -1 — table, enum, and index drop cleanly.
  4. alembic upgrade head again to land on the new revision.
  5. nox -s "pytest(ops-unit-non-api)" -- tests/ops/service/privacy_request_attachments/test_repository.py — all green.

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration This indicates that a change includes a database migration label to the entry if your change includes a DB migration
    • Add a high-risk This issue suggests changes that have a high-probability of breaking existing code label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented May 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
fides-plus-nightly Ignored Ignored Preview May 7, 2026 2:52pm
fides-privacy-center Ignored Ignored May 7, 2026 2:52pm

Request Review

@mikeGarifullin mikeGarifullin added the db-migration This indicates that a change includes a database migration label May 5, 2026
@mikeGarifullin mikeGarifullin force-pushed the ENG-3517-2 branch 3 times, most recently from 2db5bf4 to bd91f16 Compare May 5, 2026 18:28
@codecov
Copy link
Copy Markdown

codecov Bot commented May 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.26%. Comparing base (292d8b2) to head (7655326).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8110      +/-   ##
==========================================
+ Coverage   85.23%   85.26%   +0.03%     
==========================================
  Files         638      641       +3     
  Lines       42011    42091      +80     
  Branches     4937     4941       +4     
==========================================
+ Hits        35807    35888      +81     
  Misses       5096     5096              
+ Partials     1108     1107       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@mikeGarifullin mikeGarifullin marked this pull request as ready for review May 5, 2026 19:46
@mikeGarifullin mikeGarifullin requested a review from a team as a code owner May 5, 2026 19:46
@mikeGarifullin mikeGarifullin requested review from johnewart and removed request for a team May 5, 2026 19:46
mikeGarifullin added a commit that referenced this pull request May 6, 2026
Schema + storage + service-extension half of the upload foundation. Pairs
with the data-layer PR (#8110); the upload service and endpoint live in
fidesplus and consume both.

Adds the Privacy Center config variant for file fields, a magic-byte sniff
catalog so the upload route can verify a payload's claimed type at byte
zero (client-supplied Content-Type is not trusted), two config knobs for
the global upload ceilings, and two PrivacyRequestService hook points so
fidesplus can plug attachment-resolve and attachment-promote into the
existing submission flow without overriding create_privacy_request
wholesale.

* schemas/privacy_center_config.py: FileUploadCustomPrivacyRequestField
  discriminated-union variant (max_size_bytes > 0, allowed_file_types
  non-empty + must be a subset of AllowedFileType).
* schemas/attachment.py: PrivacyRequestAttachment response payload.
* service/storage/util.py: FilesMagicBytes (per-extension signatures +
  candidates / extensions_without_magic), AllowedFileType helpers,
  MIME_TO_EXTENSION, extension_for_mime, FileUploadConstraints
  (self-validated bag).
* service/privacy_request/privacy_request_service.py:
  - _resolve_attachment_state(req, action) and
    _promote_attachment_state(privacy_request, state) — overridable
    no-op hooks.
  - create_privacy_request resolves the action once, forwards to
    validators + the resolve hook, then promotes after request creation.
    Promotion failures delete the just-created row and rewrap with a
    generic message (sensitive detail goes to logs / __cause__, not the
    user-facing error).
* common/urn_registry.py: route constant for the future fidesplus upload
  endpoint.
* config/security_settings.py: request_attachment_max_bytes ceiling.
* config/execution_settings.py: attachment_orphan_ttl_seconds.
* Tests: schema variant, magic-byte catalog, FileUploadConstraints,
  attachment-state hook defaults, create_privacy_request orchestration
  (hook receives the same action sentinel), promotion-failure rollback
  (request deleted, leaky detail sanitized, delete-failure still
  surfaces promotion exception, hook-raised PrivacyRequestError isn't
  double-rewrapped).
mikeGarifullin added a commit that referenced this pull request May 6, 2026
Schema + storage + service-extension half of the upload foundation. Pairs
with the data-layer PR (#8110); the upload service and endpoint live in
fidesplus and consume both.

Adds the Privacy Center config variant for file fields, a magic-byte sniff
catalog so the upload route can verify a payload's claimed type at byte
zero (client-supplied Content-Type is not trusted), two config knobs for
the global upload ceilings, and two PrivacyRequestService hook points so
fidesplus can plug attachment-resolve and attachment-promote into the
existing submission flow without overriding create_privacy_request
wholesale.

* schemas/privacy_center_config.py: FileUploadCustomPrivacyRequestField
  discriminated-union variant (max_size_bytes > 0, allowed_file_types
  non-empty + must be a subset of AllowedFileType).
* schemas/attachment.py: PrivacyRequestAttachment response payload.
* service/storage/util.py: FilesMagicBytes (per-extension signatures +
  candidates / extensions_without_magic), AllowedFileType helpers,
  MIME_TO_EXTENSION, extension_for_mime, FileUploadConstraints
  (self-validated bag).
* service/privacy_request/privacy_request_service.py:
  - _resolve_attachment_state(req, action) and
    _promote_attachment_state(privacy_request, state) — overridable
    no-op hooks.
  - create_privacy_request resolves the action once, forwards to
    validators + the resolve hook, then promotes after request creation.
    Promotion failures delete the just-created row and rewrap with a
    generic message (sensitive detail goes to logs / __cause__, not the
    user-facing error).
* common/urn_registry.py: route constant for the future fidesplus upload
  endpoint.
* config/security_settings.py: request_attachment_max_bytes ceiling.
* config/execution_settings.py: attachment_orphan_ttl_seconds.
* Tests: schema variant, magic-byte catalog, FileUploadConstraints,
  attachment-state hook defaults, create_privacy_request orchestration
  (hook receives the same action sentinel), promotion-failure rollback
  (request deleted, leaky detail sanitized, delete-failure still
  surfaces promotion exception, hook-raised PrivacyRequestError isn't
  double-rewrapped).
mikeGarifullin added a commit that referenced this pull request May 6, 2026
Schema + storage + service-extension half of the upload foundation. Pairs
with the data-layer PR (#8110); the upload service and endpoint live in
fidesplus and consume both.

Adds the Privacy Center config variant for file fields, a magic-byte sniff
catalog so the upload route can verify a payload's claimed type at byte
zero (client-supplied Content-Type is not trusted), two config knobs for
the global upload ceilings, and two PrivacyRequestService hook points so
fidesplus can plug attachment-resolve and attachment-promote into the
existing submission flow without overriding create_privacy_request
wholesale.

* schemas/privacy_center_config.py: FileUploadCustomPrivacyRequestField
  discriminated-union variant (max_size_bytes > 0, allowed_file_types
  non-empty + must be a subset of AllowedFileType).
* schemas/attachment.py: PrivacyRequestAttachment response payload.
* service/storage/util.py: FilesMagicBytes (per-extension signatures +
  candidates / extensions_without_magic), AllowedFileType helpers,
  MIME_TO_EXTENSION, extension_for_mime, FileUploadConstraints
  (self-validated bag).
* service/privacy_request/privacy_request_service.py:
  - _resolve_attachment_state(req, action) and
    _promote_attachment_state(privacy_request, state) — overridable
    no-op hooks.
  - create_privacy_request resolves the action once, forwards to
    validators + the resolve hook, then promotes after request creation.
    Promotion failures delete the just-created row and rewrap with a
    generic message (sensitive detail goes to logs / __cause__, not the
    user-facing error).
* common/urn_registry.py: route constant for the future fidesplus upload
  endpoint.
* config/security_settings.py: request_attachment_max_bytes ceiling.
* config/execution_settings.py: attachment_orphan_ttl_seconds.
* Tests: schema variant, magic-byte catalog, FileUploadConstraints,
  attachment-state hook defaults, create_privacy_request orchestration
  (hook receives the same action sentinel), promotion-failure rollback
  (request deleted, leaky detail sanitized, delete-failure still
  surfaces promotion exception, hook-raised PrivacyRequestError isn't
  double-rewrapped).
@mikeGarifullin mikeGarifullin force-pushed the ENG-3517-2 branch 5 times, most recently from dc2b693 to 5959e6c Compare May 7, 2026 14:37
Data layer for the data-subject file-upload feature: `attachment_user_provided`
table + repository primitives. Upload service + endpoint live in fidesplus
and consume this PR.

Lifecycle: uploaded → promoted | deleted (deleted rows kept for audit).
The (field_name, property_id, policy_key) triple binds an upload to the
privacy-center config it was issued under so re-validation at submission
can refuse mismatches. storage_key is intentionally non-FK so audit rows
survive storage-config churn.

* Model: AttachmentUserProvidedStatus enum + AttachmentUserProvided model
  with composite (status, created_at) index for the orphan-sweep predicate.
  field_name/property_id/policy_key are intentionally non-FK so audit rows
  survive policy/property/field rename or delete between upload and
  submission.
* Migration 9b449105864d, reversible.
* Repository:
  - create_uploaded: insert + flush + refresh, returns AttachmentUserProvidedRecord
  - lock_by_ids: SELECT … FOR UPDATE ordered by id (deadlock-safe under
    overlapping batches), returns dict[str, row] keyed by id; missing ids
    absent. session is required (no @with_optional_sync_session) since the
    lock is only meaningful inside a caller-owned transaction.
  - assert_all_uploaded: lifecycle precondition check
  - mark_promoted: pure mutation, caller owns the session boundary
  - mark_deleted: in-session only, no commit
  - list_uploaded_older_than: orphan-sweep query, exclusive cutoff
* Domain entities (AttachmentUserProvidedRecord) + exceptions
  (AttachmentsServiceError + InvalidAttachmentStateError).
* Test suite covers every public method incl. lifecycle invariants,
  empty-list short-circuit, missing-id detection, cutoff exclusivity.
* .fides/db_dataset.yml: dataset entries for the new table.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

db-migration This indicates that a change includes a database migration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant