
LJ-530 Attachments have access to google cloud storage#6161

Merged
JadeCara merged 6 commits into main from
ENG-530-update-attachments-with-google-cloud-storage-capabilities
May 23, 2025

Conversation

@JadeCara
Contributor

@JadeCara JadeCara commented May 21, 2025

Closes LJ-530

Description Of Changes

Google Cloud Storage was added as a storage option after attachments were introduced, and because we do not yet have a single unified storage service, this option needed to be added independently. The delete statements for all storage types were updated to be more thorough, but there should be no change in functionality.

Code Changes

  • Updated attachments to work with Google Cloud Storage.
  • Updated tests
  • (Unrelated clean-up) removed Error handler service because we are using the fastapi version and this is unused code 🧽
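The GCS delete path described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual fides implementation: the client is injected and is assumed to behave like a `google.cloud.storage.Client`, and the function and parameter names are placeholders.

```python
def delete_gcs_attachment(client, bucket_name: str, file_key: str) -> None:
    """Delete one object, or every object under a trailing-'/' folder prefix.

    `client` is assumed to be a google.cloud.storage.Client (illustrative
    wiring; the real fides code may differ).
    """
    if file_key.endswith("/"):
        # list_blobs with a prefix returns every object in the "folder"
        for blob in client.list_blobs(bucket_name, prefix=file_key):
            blob.delete()
    else:
        client.bucket(bucket_name).blob(file_key).delete()
```

Treating a trailing `/` as a folder prefix mirrors the S3 delete logic discussed in the review threads below.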

Steps to Confirm

  • All tests should pass
  • With fidesplus pointed at this branch, set up a default storage config pointing at the GCS prj-sandbox-55855-test-bucket. Instructions for doing that can be found here
  • Create a DSR
  • Add an attachment to the DSR
    Screenshot 2025-05-21 at 10 43 31 AM

It should show up in the test bucket
Screenshot 2025-05-21 at 10 44 46 AM

  • Verify you can retrieve the attachment by id.
    Screenshot 2025-05-21 at 10 50 36 AM

  • Verify deleting the attachment works as expected. Before running the delete command, if you have opened the created folder to check the attachment, make sure you are no longer in it; a folder that is being accessed will not auto-delete.

  • Verify the attachment is no longer in gcs.
    Screenshot 2025-05-21 at 10 54 39 AM

  • Attachments should work the same way for manual webhook steps and there should be no change in functionality for s3 or local storage.

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a `db-migration` label to the entry if your change includes a DB migration
    • Add a `high-risk` label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

@vercel
Contributor

vercel Bot commented May 21, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

2 Skipped Deployments
Name Status Preview Comments Updated (UTC)
fides-plus-nightly ⬜️ Ignored (Inspect) Visit Preview May 22, 2025 11:14pm
fides-privacy-center ⬜️ Ignored (Inspect) Visit Preview May 22, 2025 11:14pm

@codecov

codecov Bot commented May 21, 2025

Codecov Report

Attention: Patch coverage is 78.72340% with 10 lines in your changes missing coverage. Please review.

Project coverage is 87.01%. Comparing base (62ea246) to head (fa6b1cd).
Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
src/fides/api/service/storage/s3.py 0.00% 6 Missing and 1 partial ⚠️
src/fides/api/models/attachment.py 92.50% 2 Missing and 1 partial ⚠️

❌ Your patch status has failed because the patch coverage (78.72%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6161      +/-   ##
==========================================
+ Coverage   86.92%   87.01%   +0.09%     
==========================================
  Files         423      422       -1     
  Lines       26161    26170       +9     
  Branches     2842     2849       +7     
==========================================
+ Hits        22741    22773      +32     
+ Misses       2800     2775      -25     
- Partials      620      622       +2     

☔ View full report in Codecov by Sentry.

@JadeCara JadeCara requested a review from galvana May 21, 2025 22:14
if os.path.exists(folder_path):
    import shutil

    shutil.rmtree(folder_path)
Contributor

What does the folder_path evaluate to? I want to make sure we don't delete the entire contents of fides_uploads

Contributor Author

I just went and double checked because ... paranoia...

get_local_filename() is called with f"{self.id}/{self.file_name}".

In src/fides/api/service/storage/util.py - get_local_filename() adds LOCAL_FIDES_UPLOAD_DIRECTORY to the start of the path, and LOCAL_FIDES_UPLOAD_DIRECTORY == fides_uploads.

os.path.dirname() is called on this path, which returns the directory portion of the path.
So for an attachment with:
id = att_123
file_name = example.pdf
The folder_path would evaluate to: fides_uploads/att_123

So this should be safe:
  • It only deletes the specific folder for this attachment (named by its ID)
  • It's contained within the fides_uploads directory
  • Each attachment gets its own subdirectory named by its ID; this follows the pattern I am using for GCS and S3
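The path logic described above can be sketched as follows. This is a minimal illustration based on this thread's description, not the exact fides source; the constant and helper names are taken from the explanation above.

```python
import os

LOCAL_FIDES_UPLOAD_DIRECTORY = "fides_uploads"  # as described in this thread

def get_local_filename(file_key: str) -> str:
    # Prepends the upload directory to the key, per the explanation above
    return os.path.join(LOCAL_FIDES_UPLOAD_DIRECTORY, file_key)

# Attachment with id "att_123" and file_name "example.pdf"
path = get_local_filename("att_123/example.pdf")
folder_path = os.path.dirname(path)  # fides_uploads/att_123
```

Since `folder_path` always carries the `fides_uploads/<attachment_id>` shape, `shutil.rmtree(folder_path)` removes only that attachment's own subdirectory.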

Contributor

Perfect, thanks for checking!

Comment thread: src/fides/api/service/storage/s3.py (Outdated)
# If the file_key ends with a '/', it's a folder prefix
if file_key.endswith("/"):
    # List all objects with the prefix
    objects_to_delete = s3_client.list_objects_v2(
Contributor

I don't know how likely this would be, but if there are more than 1000 files, then we would need to paginate to get the next set of files https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/list_objects_v2.html#S3.Client.list_objects_v2. We probably don't need to do this, but just thought I'd call it out

Contributor Author

I updated to use the paginate option just in case.
If we want to optimize this further, we could batch the deletes using delete_objects with multiple keys per page, but that would make the code more complex.

The current implementation prioritizes reliability and simplicity over performance, which is probably the right tradeoff given that:
  • This is a deletion operation that doesn't need to be super fast
  • The number of files per attachment is likely to be small
  • The simpler code is easier to maintain and less prone to bugs
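The paginated prefix delete under discussion can be sketched like this. It is a simplified illustration, not the merged fides code: the client is injected and assumed to be a boto3-style S3 client, and `bucket_name`/`file_key` are placeholder names.

```python
def delete_prefix(s3_client, bucket_name: str, file_key: str) -> None:
    """Delete one object, or every object under a trailing-'/' prefix."""
    if file_key.endswith("/"):
        # The list_objects_v2 paginator transparently follows continuation
        # tokens, so prefixes with more than 1000 keys are handled
        paginator = s3_client.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket_name, Prefix=file_key):
            for obj in page.get("Contents", []):
                s3_client.delete_object(Bucket=bucket_name, Key=obj["Key"])
    else:
        s3_client.delete_object(Bucket=bucket_name, Key=file_key)
```

Batching with `delete_objects` (up to 1000 keys per call) would cut the number of API round trips, at the cost of the extra complexity noted above.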

Comment thread: src/fides/api/models/attachment.py (Outdated)
Contributor

@galvana galvana left a comment

Thanks for making the changes, looks good!

if os.path.exists(folder_path):
    import shutil

    shutil.rmtree(folder_path)
Contributor

Perfect, thanks for checking!

# If the file_key ends with a '/', it's a folder prefix
if file_key.endswith("/"):
    # List all objects with the prefix, handling pagination
    paginator = s3_client.get_paginator("list_objects_v2")
Contributor

Wow, so easy! 👍

@JadeCara JadeCara merged commit c4663eb into main May 23, 2025
39 of 40 checks passed
@JadeCara JadeCara deleted the ENG-530-update-attachments-with-google-cloud-storage-capabilities branch May 23, 2025 14:50
@cypress

cypress Bot commented May 23, 2025

fides    Run #12942

Run Properties:
  • Status: Passed, run #12942
  • Project: fides
  • Branch: main
  • Run duration: 00m 52s
  • Commit: c4663eb9d2: LJ-530 Attachments have access to google cloud storage (#6161)
  • Committer: JadeWibbels

Test results
  • Failures: 0
  • Flaky: 0
  • Pending (skipped via .skip): 0
  • Skipped (mocha hook failure): 0
  • Passing: 5
