
Attachment refactor#7286

Merged
galvana merged 30 commits into main from attachment-refactor
Feb 20, 2026

Conversation

@galvana
Contributor

@galvana galvana commented Feb 2, 2026

Description Of Changes

Refactors the attachment storage system to use a proper service layer architecture with the Strategy and Factory design patterns. This improves code organization, testability, and maintainability by separating storage concerns from the data model.

Key architectural improvements:

  • Introduces a unified StorageProvider abstract base class that defines a consistent interface for all storage backends (S3, GCS, Local)
  • Implements the Factory pattern via StorageProviderFactory for clean provider instantiation based on configuration
  • Creates a new AttachmentService in the service layer to handle all storage operations, following the project's layered architecture (API routes → Services → Models)
  • Removes storage logic from the Attachment model, keeping it focused on data representation only
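
The Strategy and Factory pieces above can be sketched as follows. The class names (StorageProvider, StorageProviderFactory, LocalStorageProvider) come from this PR, but the method signatures and the in-memory "local" provider below are illustrative assumptions, not the actual implementation:

```python
from abc import ABC, abstractmethod


class StorageProvider(ABC):
    """Unified interface for storage backends. Method names here are
    illustrative; the real interface lives in providers/base.py."""

    @abstractmethod
    def upload(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def generate_presigned_url(self, key: str) -> str: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class LocalStorageProvider(StorageProvider):
    """In-memory stand-in for the real local filesystem provider."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def upload(self, key: str, data: bytes) -> None:
        self._files[key] = data

    def generate_presigned_url(self, key: str) -> str:
        # The local provider returns a path-like string, not a real HTTP URL
        return f"/local/{key}"

    def delete(self, key: str) -> None:
        self._files.pop(key, None)


class StorageProviderFactory:
    """Factory pattern: map a configured storage type to a provider class."""

    _registry: dict[str, type[StorageProvider]] = {"local": LocalStorageProvider}

    @classmethod
    def get_provider(cls, storage_type: str) -> StorageProvider:
        try:
            return cls._registry[storage_type]()
        except KeyError:
            raise ValueError(f"Unsupported storage type: {storage_type}")
```

A caller obtains a provider from configuration rather than constructing one directly, which is what keeps the Attachment model free of backend-specific logic.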

Code Changes

  • Added StorageProvider abstract base class (src/fides/api/service/storage/providers/base.py) defining unified interface for upload, download, delete, presigned URLs, and file operations
  • Added S3StorageProvider implementation (src/fides/api/service/storage/providers/s3_provider.py)
  • Added GCSStorageProvider implementation (src/fides/api/service/storage/providers/gcs_provider.py)
  • Added LocalStorageProvider implementation (src/fides/api/service/storage/providers/local_provider.py)
  • Added StorageProviderFactory (src/fides/api/service/storage/providers/factory.py) for creating providers from configuration
  • Added AttachmentService (src/fides/service/attachment_service.py) for attachment storage operations with methods for upload, retrieve, delete, and reference management
  • Refactored Attachment model to remove embedded storage logic
  • Updated external_data_storage.py to use new provider architecture
  • Updated polling_attachment_handler.py to use AttachmentService
  • Added comprehensive test suite for storage providers including contract tests and characterization tests

Steps to Confirm

  1. Verify attachment uploads work correctly for S3, GCS, and local storage configurations
  2. Verify attachment downloads return correct presigned URLs
  3. Verify attachment deletion removes files from storage and cleans up database records
  4. Verify privacy request attachments still work end-to-end
  5. Verify manual task attachments still work correctly
  6. Run the new storage provider tests: pytest tests/ops/service/storage/providers/
  7. Run existing attachment tests: pytest tests/ctl/models/test_attachment.py

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

@galvana galvana requested a review from JadeCara February 2, 2026 02:36


@galvana galvana marked this pull request as ready for review February 6, 2026 22:37
@galvana galvana requested a review from a team as a code owner February 6, 2026 22:37
@greptile-apps
Contributor

greptile-apps bot commented Feb 6, 2026

Greptile Overview

Greptile Summary

This PR refactors the attachment storage system using the Strategy and Factory design patterns, creating a proper service layer architecture. The refactor successfully separates storage concerns from the data model and provides a unified interface for S3, GCS, and local storage backends.

Key Improvements:

  • Introduced StorageProvider abstract base class defining a consistent interface for all storage backends
  • Implemented StorageProviderFactory using the Factory pattern for clean provider instantiation
  • Created AttachmentService in the service layer to handle all storage operations
  • Removed storage logic from the Attachment model, keeping it focused on data representation
  • Updated ExternalDataStorageService and PollingAttachmentHandler to use the new architecture
  • Added comprehensive contract tests ensuring all providers satisfy the same behavioral interface
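
A provider contract test in that spirit might look like the following sketch. The provider class and method names here are illustrative stand-ins, not the repository's actual test code:

```python
class InMemoryProvider:
    """Stand-in provider; the real contract tests would exercise the
    S3, GCS, and local backends against the same assertions."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def upload(self, key: str, data: bytes) -> None:
        self._files[key] = data

    def download(self, key: str) -> bytes:
        return self._files[key]

    def delete(self, key: str) -> None:
        self._files.pop(key, None)

    def exists(self, key: str) -> bool:
        return key in self._files


def check_provider_contract(provider) -> None:
    """Assert the upload/download/delete round-trip every provider must honor."""
    provider.upload("contract/test.txt", b"payload")
    assert provider.exists("contract/test.txt")
    assert provider.download("contract/test.txt") == b"payload"
    provider.delete("contract/test.txt")
    assert not provider.exists("contract/test.txt")
```

Running the same check function against every provider is what guarantees the backends stay behaviorally interchangeable.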

Architecture Quality:
The refactor demonstrates solid software engineering principles with proper separation of concerns, the Adapter pattern for backward compatibility, and thorough testing. The streaming imports in S3StorageProvider correctly use lazy loading inside methods (lines 298, 326) to avoid circular dependencies.

Process Note:
This PR exceeds the repository's size guidelines (38 files, 3327 insertions vs. recommended 15 files/500 lines per custom rule 32b2f192-9cdf-48f0-be68-5c0eaa04fc70). While the refactor is well-structured, consider splitting large architectural changes into smaller, more reviewable PRs in the future.

Confidence Score: 4/5

  • This PR is safe to merge with careful testing of all storage backends
  • Score of 4 reflects a well-architected refactor with comprehensive testing, but docked one point due to the large scope (38 files, 3327 insertions) which increases risk and makes thorough review challenging. The code quality is high with proper design patterns, backward compatibility via the Adapter pattern, and contract tests ensuring consistent behavior across providers. All storage operations have been properly abstracted and the service layer follows the project's architectural patterns.
  • No files require special attention - the refactor is consistently implemented across all components

Important Files Changed

  • src/fides/api/service/storage/providers/base.py: New abstract base class for storage providers with comprehensive interface definition
  • src/fides/api/service/storage/providers/factory.py: Factory pattern implementation for creating storage provider instances
  • src/fides/service/attachment_service.py: Service layer for attachment operations with proper separation of concerns
  • src/fides/api/service/storage/providers/s3_provider.py: S3 storage provider implementation wrapping existing functions, includes streaming support with lazy imports
  • src/fides/api/service/storage/providers/gcs_provider.py: GCS storage provider implementation with proper error handling
  • src/fides/api/service/storage/providers/local_provider.py: Local filesystem storage provider for development and testing
  • src/fides/api/models/attachment.py: Attachment model simplified with storage logic removed
  • src/fides/api/service/external_data_storage.py: Updated to use new provider factory pattern for external data storage

Contributor

@greptile-apps greptile-apps bot left a comment

8 files reviewed, no comments

Comment on lines +153 to +154
"""Creates a new attachment record in the database."""
return super().create(db=db, data=data, check_name=check_name)
Contributor

❤️

Comment on lines +38 to +41
# Mapping of storage types to their uploader functions
# These functions handle DSR-specific data transformation before upload
_DSR_UPLOADERS: Dict[str, Any] = {} # Populated lazily

Contributor

Does this do anything? I didn't see it referenced anywhere in the changes or code.

Contributor Author

Good catch, it was unused. The actual routing happens inline in _get_uploader_from_config_type. Removed it.

db = self._require_db()
return Attachment.get_by_key_or_id(db, data={"id": attachment_id})

def _get_provider_and_bucket(self, attachment: Attachment) -> Tuple[Any, str]:
Contributor

Is Any right or should it be StorageProvider?

Suggested change
def _get_provider_and_bucket(self, attachment: Attachment) -> Tuple[Any, str]:
def _get_provider_and_bucket(self, attachment: Attachment) -> Tuple[StorageProvider, str]:

Contributor Author

Updated to Tuple[StorageProvider, str] and added the import. Also changed the return type of retrieve_url from Tuple[int, AnyHttpUrlString] to Tuple[int, str] since generate_presigned_url returns str across all providers (the local provider returns a file path, not a valid HTTP URL).


try:
blob.delete()
except (AttributeError, RuntimeError) as e:
Contributor

The GCS SDK has some specific exceptions for not finding a file. I think it's google.api_core.exceptions.NotFound, or the broader google.api_core.exceptions.GoogleAPICallError. Could we use those here so we aren't accidentally swallowing other unexpected errors?

Contributor Author

Good call. Replaced (AttributeError, RuntimeError) with NotFound and GoogleAPICallError from google.api_core.exceptions in delete, delete_prefix, and generate_presigned_url. NotFound is logged at debug level since it's expected for cleanup operations.
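
That exception handling can be sketched as below. The stand-in exception classes let the sketch run without the GCS SDK installed; the real code imports NotFound and GoogleAPICallError from google.api_core.exceptions, and the delete_blob helper name is illustrative:

```python
import logging

logger = logging.getLogger(__name__)


# Stand-ins for google.api_core.exceptions.NotFound / GoogleAPICallError so
# this sketch runs without the GCS SDK; the real module imports the SDK types.
class GoogleAPICallError(Exception): ...


class NotFound(GoogleAPICallError): ...


def delete_blob(blob) -> None:
    """Delete a GCS blob, treating a missing object as an expected no-op."""
    try:
        blob.delete()
    except NotFound:
        # Expected during cleanup: the object is already gone.
        logger.debug("Blob already deleted: %s", getattr(blob, "name", "<unknown>"))
    except GoogleAPICallError as exc:
        # Anything else is a real API failure and should surface to the caller.
        logger.error("Failed to delete blob: %s", exc)
        raise
```

Catching the SDK's own exception hierarchy avoids the earlier problem of (AttributeError, RuntimeError) silently absorbing unrelated bugs.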

Contributor

@JadeCara JadeCara left a comment

Tested DSRs with attachments using S3 and worked as expected.

"""
S3 storage operations.

DEPRECATION NOTICE:
Contributor

I'm confused. Aren't you using this module in the storage s3 implementation? How can this be deprecated if you're using it in your new s3 implementation of your storage service abstraction? In any case, we could say we're making this private and other places should not use it directly but only through the storage service.
Maybe I'm being pedantic and "deprecated" can mean this too.

Contributor Author

Good call — "deprecated" was misleading since S3StorageProvider still uses these internally. Updated the module docstring to say "INTERNAL MODULE" instead, clarifying these are implementation details that external callers shouldn't use directly. Kept the @deprecated decorators on individual functions since they're useful as migration guidance for anyone who encounters them.
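
For reference, a minimal @deprecated decorator in this spirit might look like the sketch below. This is an assumption about the shape of such a decorator; the project's actual decorator may differ:

```python
import functools
import warnings


def deprecated(reason: str):
    """Mark an internal function as deprecated, emitting a DeprecationWarning
    at each call site as migration guidance."""

    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is internal/deprecated: {reason}",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not the wrapper
            )
            return func(*args, **kwargs)

        return inner

    return wrap
```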

except Exception as e:
logger.error("Failed to upload attachment: {}", e)
# Delete the DB record since upload failed
db.delete(attachment)
Contributor

Why only db.delete(attachment) ?
Any reason we don’t do full cleanup the same way you already do in AttachmentService.delete() (storage + refs + attachment row)? Or even use that method.
If upload succeeded but ref creation fails, we’ll orphan the file/object/blob/whatever (and possibly leave committed refs). Is that right?

Contributor Author

You're right — if upload succeeded but ref creation failed, we'd orphan the file in storage. Updated create_and_upload to do full cleanup on failure: attempts to delete from storage (swallowing errors if the file wasn't uploaded yet), removes any refs that were created, then deletes the DB record. This mirrors the cleanup logic in delete().
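
That compensating cleanup can be sketched roughly as follows. The db and storage collaborators and all method names here are illustrative, not the actual AttachmentService API:

```python
def create_and_upload(db, storage, attachment, data, refs):
    """Upload an attachment and create its references; on any failure, undo
    everything so neither the database nor storage is left with orphans."""
    uploaded = False
    try:
        storage.upload(attachment["key"], data)
        uploaded = True
        for ref in refs:
            db.create_ref(attachment, ref)
        return attachment
    except Exception:
        if uploaded:
            try:
                storage.delete(attachment["key"])
            except Exception:
                pass  # best-effort: the object may never have landed
        db.delete_refs(attachment)
        db.delete(attachment)
        raise
```

The key point is ordering: storage cleanup is attempted first (and its own errors swallowed), then references, then the row itself, mirroring the logic in delete().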

logger.debug("LocalStorageProvider.delete_prefix: prefix={}", prefix)

# Construct the directory path
dir_path = os.path.join(self._base_path, prefix.rstrip("/"))
Contributor

@nreyes-dev nreyes-dev Feb 18, 2026

The delete method above uses _get_file_path() which calls get_local_filename() which does some checks, and I quote:

This extra security checks are to prevent directory traversal attacks and "complete business and technical destruction".

Thanks Claude

😂 (see docstring).

But here in delete_prefix we're constructing the path with no checks whatsoever, if I'm not mistaken.
Maybe we should take the same precautions here as well to avoid deleting the wrong stuff? Not sure how important this is.

Contributor Author

Good catch. Added a _get_validated_dir_path() helper that mirrors the security checks in get_local_filename() — normalizes the path and verifies the resolved absolute path stays within the base upload directory. Both delete_prefix and list_objects now use it instead of raw os.path.join.
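
The validation described can be sketched like this. This is an illustrative version of the helper, not the exact code in local_provider.py:

```python
import os


def get_validated_dir_path(base_path: str, prefix: str) -> str:
    """Resolve base_path/prefix and refuse any result that escapes base_path,
    guarding against '../' traversal and absolute-path injection."""
    base = os.path.realpath(base_path)
    candidate = os.path.realpath(os.path.join(base, prefix.rstrip("/")))
    # commonpath returns base only when candidate is inside it
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"Path escapes base directory: {prefix!r}")
    return candidate
```

Using realpath before the containment check matters: it normalizes symlinks and ".." segments so the comparison is against the path that would actually be deleted.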

- s3.py: Change module docstring from "DEPRECATION NOTICE" to "INTERNAL MODULE"
  to clarify these functions are internal details used by S3StorageProvider,
  not going away. Keep @deprecated decorators as migration guidance.
- attachment_service.py: Full cleanup on create_and_upload failure — delete
  from storage, remove refs, then delete DB record to prevent orphaned files
  when upload succeeds but ref creation fails.
- local_provider.py: Add path traversal protection to delete_prefix and
  list_objects via _get_validated_dir_path helper, matching the security
  checks already present in _get_file_path / get_local_filename.

Co-authored-by: Cursor <cursoragent@cursor.com>
@galvana galvana requested a review from nreyes-dev February 19, 2026 23:09
@galvana galvana enabled auto-merge February 20, 2026 02:42
@galvana galvana dismissed nreyes-dev’s stale review February 20, 2026 03:54

Addressed concerns

@galvana galvana added this pull request to the merge queue Feb 20, 2026
Merged via the queue into main with commit 48d3a94 Feb 20, 2026
53 of 54 checks passed
@galvana galvana deleted the attachment-refactor branch February 20, 2026 04:12