
LJ-530 Attachments have access to google cloud storage#6161

Merged
JadeCara merged 6 commits into main from
ENG-530-update-attachments-with-google-cloud-storage-capabilities
May 23, 2025

Conversation

@JadeCara
Contributor

@JadeCara JadeCara commented May 21, 2025

Closes LJ-530

Description Of Changes

Google Cloud Storage was added as a storage option after attachments were introduced, and because we do not yet have a single unified storage service, this option needed to be added independently. The delete statements for all storage types were updated to be more thorough, but there should be no change in functionality.

Code Changes

  • Updated attachments to work with Google Cloud Storage.
  • Updated tests
  • (Unrelated clean-up) removed Error handler service because we are using the fastapi version and this is unused code 🧽
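The GCS delete path described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual fides implementation: the client is injected and is assumed to behave like a `google.cloud.storage.Client`, and the function and parameter names are placeholders.

```python
def delete_gcs_attachment(client, bucket_name: str, file_key: str) -> None:
    """Delete one object, or every object under a trailing-'/' folder prefix.

    `client` is assumed to be a google.cloud.storage.Client (illustrative
    wiring; the real fides code may differ).
    """
    if file_key.endswith("/"):
        # list_blobs with a prefix returns every object in the "folder"
        for blob in client.list_blobs(bucket_name, prefix=file_key):
            blob.delete()
    else:
        client.bucket(bucket_name).blob(file_key).delete()
```

Treating a trailing `/` as a folder prefix mirrors the S3 delete logic discussed in the review threads below.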

Steps to Confirm

  • All tests should pass
  • With fidesplus pointed at this branch, set up a default storage config pointing at the GCS prj-sandbox-55855-test-bucket. Instructions for doing that can be found here
  • Create a DSR
  • Add an attachment to the DSR
    Screenshot 2025-05-21 at 10 43 31 AM

It should show up in the test bucket
Screenshot 2025-05-21 at 10 44 46 AM

  • Verify you can retrieve the attachment by id.
    Screenshot 2025-05-21 at 10 50 36 AM

  • Verify deleting the attachment works as expected. Before running the delete command, if you have opened the created folder to check the attachment, make sure you are no longer in it; a folder that is being accessed will not auto-delete.

  • Verify the attachment is no longer in gcs.
    Screenshot 2025-05-21 at 10 54 39 AM

  • Attachments should work the same way for manual webhook steps and there should be no change in functionality for s3 or local storage.

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a `db-migration` label to the entry if your change includes a DB migration
    • Add a `high-risk` label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

@vercel
Contributor

vercel Bot commented May 21, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

2 Skipped Deployments
Name Status Preview Comments Updated (UTC)
fides-plus-nightly ⬜️ Ignored (Inspect) Visit Preview May 22, 2025 11:14pm
fides-privacy-center ⬜️ Ignored (Inspect) Visit Preview May 22, 2025 11:14pm

@codecov

codecov Bot commented May 21, 2025

Codecov Report

Attention: Patch coverage is 78.72340% with 10 lines in your changes missing coverage. Please review.

Project coverage is 87.01%. Comparing base (62ea246) to head (fa6b1cd).
Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
src/fides/api/service/storage/s3.py 0.00% 6 Missing and 1 partial ⚠️
src/fides/api/models/attachment.py 92.50% 2 Missing and 1 partial ⚠️

❌ Your patch status has failed because the patch coverage (78.72%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6161      +/-   ##
==========================================
+ Coverage   86.92%   87.01%   +0.09%     
==========================================
  Files         423      422       -1     
  Lines       26161    26170       +9     
  Branches     2842     2849       +7     
==========================================
+ Hits        22741    22773      +32     
+ Misses       2800     2775      -25     
- Partials      620      622       +2     

☔ View full report in Codecov by Sentry.

@JadeCara JadeCara requested a review from galvana May 21, 2025 22:14
if os.path.exists(folder_path):
    import shutil

    shutil.rmtree(folder_path)
Contributor

What does the folder_path evaluate to? I want to make sure we don't delete the entire contents of fides_uploads

Contributor Author

I just went and double checked because ... paranoia...

get_local_filename() is called with f"{self.id}/{self.file_name}".

In src/fides/api/service/storage/util.py - get_local_filename() adds LOCAL_FIDES_UPLOAD_DIRECTORY to the start of the path, and LOCAL_FIDES_UPLOAD_DIRECTORY == fides_uploads.

os.path.dirname() is called on this path, which returns the directory portion of the path.
So for an attachment with:
id = att_123
file_name = example.pdf
The folder_path would evaluate to: fides_uploads/att_123

So this should be safe:
  • It only deletes the specific folder for this attachment (named by its ID)
  • It's contained within the fides_uploads directory
  • Each attachment gets its own subdirectory named by its ID; this follows the pattern I am using for GCS and S3
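The path logic described above can be sketched as follows. This is a minimal illustration based on this thread's description, not the exact fides source; the constant and helper names are taken from the explanation above.

```python
import os

LOCAL_FIDES_UPLOAD_DIRECTORY = "fides_uploads"  # as described in this thread

def get_local_filename(file_key: str) -> str:
    # Prepends the upload directory to the key, per the explanation above
    return os.path.join(LOCAL_FIDES_UPLOAD_DIRECTORY, file_key)

# Attachment with id "att_123" and file_name "example.pdf"
path = get_local_filename("att_123/example.pdf")
folder_path = os.path.dirname(path)  # fides_uploads/att_123
```

Since `folder_path` always carries the `fides_uploads/<attachment_id>` shape, `shutil.rmtree(folder_path)` removes only that attachment's own subdirectory.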

Contributor

Perfect, thanks for checking!

Comment thread: src/fides/api/service/storage/s3.py (Outdated)
# If the file_key ends with a '/', it's a folder prefix
if file_key.endswith("/"):
    # List all objects with the prefix
    objects_to_delete = s3_client.list_objects_v2(
Contributor

I don't know how likely this would be, but if there are more than 1000 files, then we would need to paginate to get the next set of files https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/list_objects_v2.html#S3.Client.list_objects_v2. We probably don't need to do this, but just thought I'd call it out

Contributor Author

I updated to use the paginate option just in case.
If we want to optimize this further, we could batch the deletes using delete_objects with multiple keys per page, but that would make the code more complex.

The current implementation prioritizes reliability and simplicity over performance, which is probably the right tradeoff given that:
  • This is a deletion operation that doesn't need to be super fast
  • The number of files per attachment is likely to be small
  • The simpler code is easier to maintain and less prone to bugs
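The paginated prefix delete under discussion can be sketched like this. It is a simplified illustration, not the merged fides code: the client is injected and assumed to be a boto3-style S3 client, and `bucket_name`/`file_key` are placeholder names.

```python
def delete_prefix(s3_client, bucket_name: str, file_key: str) -> None:
    """Delete one object, or every object under a trailing-'/' prefix."""
    if file_key.endswith("/"):
        # The list_objects_v2 paginator transparently follows continuation
        # tokens, so prefixes with more than 1000 keys are handled
        paginator = s3_client.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket_name, Prefix=file_key):
            for obj in page.get("Contents", []):
                s3_client.delete_object(Bucket=bucket_name, Key=obj["Key"])
    else:
        s3_client.delete_object(Bucket=bucket_name, Key=file_key)
```

Batching with `delete_objects` (up to 1000 keys per call) would cut the number of API round trips, at the cost of the extra complexity noted above.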

Comment thread: src/fides/api/models/attachment.py (Outdated)
Contributor

@galvana galvana left a comment

Thanks for making the changes, looks good!

if os.path.exists(folder_path):
    import shutil

    shutil.rmtree(folder_path)
Contributor

Perfect, thanks for checking!

# If the file_key ends with a '/', it's a folder prefix
if file_key.endswith("/"):
    # List all objects with the prefix, handling pagination
    paginator = s3_client.get_paginator("list_objects_v2")
Contributor

Wow, so easy! 👍

@JadeCara JadeCara merged commit c4663eb into main May 23, 2025
39 of 40 checks passed
@JadeCara JadeCara deleted the ENG-530-update-attachments-with-google-cloud-storage-capabilities branch May 23, 2025 14:50
@cypress

cypress Bot commented May 23, 2025

fides    Run #12942

Run Properties:
  • Status: Passed, run #12942
  • Project: fides
  • Branch: main
  • Run duration: 00m 52s
  • Commit: c4663eb9d2: LJ-530 Attachments have access to google cloud storage (#6161)
  • Committer: JadeWibbels

Test results
  • Failures: 0
  • Flaky: 0
  • Pending (skipped via .skip): 0
  • Skipped (mocha hook failure): 0
  • Passing: 5
