Conversation
Greptile Summary

This PR refactors the attachment storage system using the Strategy and Factory design patterns, creating a proper service layer architecture. The refactor successfully separates storage concerns from the data model and provides a unified interface for S3, GCS, and local storage backends.

Confidence Score: 4/5

Important Files Changed
src/fides/api/models/attachment.py
Outdated
| """Creates a new attachment record in the database.""" | ||
| return super().create(db=db, data=data, check_name=check_name) |
| # Mapping of storage types to their uploader functions | ||
| # These functions handle DSR-specific data transformation before upload | ||
| _DSR_UPLOADERS: Dict[str, Any] = {} # Populated lazily | ||
|
|
Reviewer: Does this do anything? I didn't see it referenced anywhere in the changes or code.

Author: Good catch, it was unused. The actual routing happens inline in `_get_uploader_from_config_type`. Removed it.
    db = self._require_db()
    return Attachment.get_by_key_or_id(db, data={"id": attachment_id})

    def _get_provider_and_bucket(self, attachment: Attachment) -> Tuple[Any, str]:
Reviewer: Is `Any` right, or should it be `StorageProvider`?

Suggested change:

    - def _get_provider_and_bucket(self, attachment: Attachment) -> Tuple[Any, str]:
    + def _get_provider_and_bucket(self, attachment: Attachment) -> Tuple[StorageProvider, str]:
Author: Updated to `Tuple[StorageProvider, str]` and added the import. Also changed the return type of `retrieve_url` from `Tuple[int, AnyHttpUrlString]` to `Tuple[int, str]`, since `generate_presigned_url` returns `str` across all providers (the local provider returns a file path, not a valid HTTP URL).
    try:
        blob.delete()
    except (AttributeError, RuntimeError) as e:
Reviewer: The GCS SDK has specific exceptions for a missing file: I think it's `google.api_core.exceptions.NotFound`, or the broader `google.api_core.exceptions.GoogleAPICallError`. Could we use those here so we aren't accidentally swallowing other unexpected errors?
Author: Good call. Replaced `(AttributeError, RuntimeError)` with `NotFound` and `GoogleAPICallError` from `google.api_core.exceptions` in `delete`, `delete_prefix`, and `generate_presigned_url`. `NotFound` is logged at debug level since it's expected for cleanup operations.
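A minimal sketch of the narrowed handling. `delete_blob` is a hypothetical standalone helper, not the actual fides method; it assumes the `google-cloud-storage` SDK and defines stand-in exception classes when the SDK is absent:

```python
import logging

# Assumes google-cloud-storage; stand-ins keep the sketch self-contained.
try:
    from google.api_core.exceptions import GoogleAPICallError, NotFound
except ImportError:
    class GoogleAPICallError(Exception): ...
    class NotFound(GoogleAPICallError): ...

logger = logging.getLogger(__name__)


def delete_blob(blob) -> bool:
    """Delete a GCS blob, treating a missing object as a no-op."""
    try:
        blob.delete()
        return True
    except NotFound:
        # Expected during cleanup: the object is already gone.
        logger.debug("Blob already deleted: %s", getattr(blob, "name", "?"))
        return False
    except GoogleAPICallError:
        # Unexpected API failure: re-raise instead of swallowing it.
        logger.error("Failed to delete blob: %s", getattr(blob, "name", "?"))
        raise
```

Catching `NotFound` before the broader `GoogleAPICallError` matters, since `NotFound` is a subclass of it.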
JadeCara: Tested DSRs with attachments using S3; everything worked as expected.
src/fides/api/service/storage/s3.py
Outdated
| """ | ||
| S3 storage operations. | ||
|
|
||
| DEPRECATION NOTICE: |
Reviewer: I'm confused. Aren't you using this module in the storage S3 implementation? How can it be deprecated if your new S3 implementation of the storage service abstraction still uses it? In any case, we could say we're making it private and that other places should not use it directly, only through the storage service. Maybe I'm being pedantic and "deprecated" can mean this too.
Author: Good call. "Deprecated" was misleading, since `S3StorageProvider` still uses these internally. Updated the module docstring to say "INTERNAL MODULE" instead, clarifying that these are implementation details external callers shouldn't use directly. Kept the `@deprecated` decorators on individual functions, since they're useful migration guidance for anyone who encounters them.
    except Exception as e:
        logger.error("Failed to upload attachment: {}", e)
        # Delete the DB record since upload failed
        db.delete(attachment)
Reviewer: Why only `db.delete(attachment)`? Any reason we don't do full cleanup the same way `AttachmentService.delete()` already does (storage + refs + attachment row), or even call that method? If the upload succeeded but ref creation fails, we'll orphan the file/object/blob (and possibly leave committed refs). Is that right?
Author: You're right. If the upload succeeded but ref creation failed, we'd orphan the file in storage. Updated `create_and_upload` to do full cleanup on failure: it attempts to delete from storage (swallowing errors if the file wasn't uploaded yet), removes any refs that were created, then deletes the DB record. This mirrors the cleanup logic in `delete()`.
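The rollback order described here can be sketched as follows. All names (`provider`, `create_refs`, the `db` handle) are illustrative stand-ins, not the actual fides signatures:

```python
def create_and_upload(db, provider, attachment, document, create_refs):
    """Upload an attachment; roll back storage, refs, and the DB row on failure."""
    refs = []
    try:
        provider.upload(attachment, document)
        refs = create_refs(db, attachment)
        return attachment
    except Exception:
        # Best-effort storage cleanup: the file may never have been uploaded.
        try:
            provider.delete(attachment)
        except Exception:
            pass
        # Remove any refs created before the failure, then the record itself.
        for ref in refs:
            db.delete(ref)
        db.delete(attachment)
        raise
```

Re-raising at the end keeps the caller's error handling intact while guaranteeing no orphaned file or dangling refs survive the failure.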
    logger.debug("LocalStorageProvider.delete_prefix: prefix={}", prefix)

    # Construct the directory path
    dir_path = os.path.join(self._base_path, prefix.rstrip("/"))
Reviewer: The `delete` method above uses `_get_file_path()`, which calls `get_local_filename()`, which does some checks. Quoting its docstring, the extra security checks are there to prevent directory traversal attacks and "complete business and technical destruction" (thanks, Claude 😂). But here in `delete_prefix` we're building the path with no checks whatsoever, if I'm not mistaken. Maybe we should take the same precautions here as well to avoid deleting the wrong stuff? Not sure how important this is.
Author: Good catch. Added a `_get_validated_dir_path()` helper that mirrors the security checks in `get_local_filename()`: it normalizes the path and verifies the resolved absolute path stays within the base upload directory. Both `delete_prefix` and `list_objects` now use it instead of raw `os.path.join`.
- s3.py: Change module docstring from "DEPRECATION NOTICE" to "INTERNAL MODULE" to clarify these functions are internal details used by `S3StorageProvider`, not going away. Keep `@deprecated` decorators as migration guidance.
- attachment_service.py: Full cleanup on `create_and_upload` failure: delete from storage, remove refs, then delete the DB record to prevent orphaned files when upload succeeds but ref creation fails.
- local_provider.py: Add path traversal protection to `delete_prefix` and `list_objects` via a `_get_validated_dir_path` helper, matching the security checks already present in `_get_file_path` / `get_local_filename`.

Co-authored-by: Cursor <cursoragent@cursor.com>
Description Of Changes
Refactors the attachment storage system to use a proper service layer architecture with the Strategy and Factory design patterns. This improves code organization, testability, and maintainability by separating storage concerns from the data model.
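The Strategy and Factory split can be sketched minimally as below. The class bodies and the in-memory `LocalStorageProvider` are illustrative only, not the actual fides implementations:

```python
from abc import ABC, abstractmethod


class StorageProvider(ABC):
    """Strategy: one interface, one concrete class per storage backend."""

    @abstractmethod
    def upload(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class LocalStorageProvider(StorageProvider):
    """Toy in-memory stand-in for the local backend."""

    def __init__(self) -> None:
        self._store: dict = {}

    def upload(self, key: str, data: bytes) -> None:
        self._store[key] = data

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


class StorageProviderFactory:
    """Factory: map a configured storage type to a provider instance."""

    _registry = {"local": LocalStorageProvider}

    @classmethod
    def create(cls, storage_type: str) -> StorageProvider:
        try:
            return cls._registry[storage_type]()
        except KeyError:
            raise ValueError(f"Unsupported storage type: {storage_type!r}")
```

A service-layer class like `AttachmentService` then depends only on the `StorageProvider` interface, never on a concrete backend, which is what makes the backends swappable and testable in isolation.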
Key architectural improvements:
- `StorageProvider` abstract base class that defines a consistent interface for all storage backends (S3, GCS, Local)
- `StorageProviderFactory` for clean provider instantiation based on configuration
- `AttachmentService` in the service layer to handle all storage operations, following the project's layered architecture (API routes → Services → Models)
- Storage logic removed from the `Attachment` model, keeping it focused on data representation only

Code Changes
- New `StorageProvider` abstract base class (src/fides/api/service/storage/providers/base.py) defining a unified interface for upload, download, delete, presigned URLs, and file operations
- New `S3StorageProvider` implementation (src/fides/api/service/storage/providers/s3_provider.py)
- New `GCSStorageProvider` implementation (src/fides/api/service/storage/providers/gcs_provider.py)
- New `LocalStorageProvider` implementation (src/fides/api/service/storage/providers/local_provider.py)
- New `StorageProviderFactory` (src/fides/api/service/storage/providers/factory.py) for creating providers from configuration
- New `AttachmentService` (src/fides/service/attachment_service.py) for attachment storage operations, with methods for upload, retrieve, delete, and reference management
- Refactored the `Attachment` model to remove embedded storage logic
- Updated external_data_storage.py to use the new provider architecture
- Updated polling_attachment_handler.py to use `AttachmentService`

Steps to Confirm
- `pytest tests/ops/service/storage/providers/`
- `pytest tests/ctl/models/test_attachment.py`

Pre-Merge Checklist
- `CHANGELOG.md` updated
- Up to date with `main`
- `downgrade()` migration is correct and works