Conversation
Vercel: 1 Skipped Deployment
Codecov Report

❌ Your patch status has failed because the patch coverage (37.73%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

@@            Coverage Diff             @@
##             main    #6199       +/-   ##
===========================================
- Coverage   87.14%   76.69%   -10.45%
===========================================
  Files         433      439        +6
  Lines       26863    27278      +415
  Branches     2935     2969       +34
===========================================
- Hits        23410    20922     -2488
- Misses       2820     5638     +2818
- Partials      633      718       +85
JadeCara left a comment:
A few suggestions, no blockers though :) Nice work and so many tests!
`return f"{self.model_class}/{instance_id}/{self.field_name}/{timestamp}.txt"`
`def __get__(self, instance: Any, owner: Type) -> Any:`
I don't see owner being used anywhere in this function - can probably be removed from the signature?
This is one of the methods that are part of the descriptor protocol; we need to keep it even if we don't use it: https://docs.python.org/3/reference/datamodel.html#object.__get__
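For context, Python itself always invokes the hook as `__get__(instance, owner)`, passing the owning class even when the descriptor ignores it, so the parameter has to stay in the signature. A minimal sketch with a hypothetical `Logged` descriptor (not the PR's code):

```python
class Logged:
    """Minimal data descriptor. Python always calls __get__ with
    (instance, owner), so the signature must accept both, even if
    owner goes unused in the body."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:  # accessed on the class itself
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class Model:
    field = Logged()


m = Model()
m.field = 42
print(m.field)  # 42
```

Accessing `Model.field` on the class (instance is `None`) returns the descriptor object itself, which is the conventional behavior.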
I am not sure if this belongs in models top level? It kind of blends in with the regular models.
I moved it under /models/field_types
`return is_large`
These top two functions might be useful in other areas as well. I wonder if putting them into a data_size.py type util would make them easier to find/use?
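A hypothetical `data_size.py` along those lines might look like this (the module name, helper names, and threshold are illustrative; the PR's actual helpers may differ):

```python
# data_size.py -- hypothetical shared util module; names and the
# threshold value are illustrative, not the PR's actual implementation.
import json
from typing import Any

# PostgreSQL caps a single column value at 1 GB; trigger offload well below it.
LARGE_DATA_THRESHOLD_BYTES = 640 * 1024 * 1024  # illustrative threshold


def calculate_data_size(data: Any) -> int:
    """Approximate the serialized size of `data` in bytes."""
    return len(json.dumps(data, default=str).encode("utf-8"))


def is_large_data(data: Any, threshold: int = LARGE_DATA_THRESHOLD_BYTES) -> bool:
    """True when `data` would exceed the external-storage threshold."""
    return calculate_data_size(data) > threshold
```

Centralizing the threshold in one module also keeps the "how big is too big" decision in a single testable place.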
# Get storage config
if storage_key:
    storage_config = (
        db.query(StorageConfig)
        .filter(StorageConfig.key == storage_key)
        .first()
    )
    if not storage_config:
        msg = f"Storage configuration with key '{storage_key}' not found"
        logger.error(msg)
        raise ExternalDataStorageError(msg)
else:
    storage_config = get_active_default_storage_config(db)
    if not storage_config:
        msg = "No active default storage configuration available for large data"
        logger.error(msg)
        raise ExternalDataStorageError(msg)
This function is doing a number of things - this top part might be a good candidate to pull out into its own testable function.
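One way to make that top chunk independently testable is to extract it and inject the two lookups, so the branching logic can be unit-tested without a database session. A hypothetical sketch (the function and parameter names are illustrative, not the PR's code):

```python
# Hypothetical extraction of the storage-config lookup. The two lookups
# are injected as callables so the branch logic is testable without a
# real SQLAlchemy session.
from typing import Callable, Optional


class ExternalDataStorageError(Exception):
    """Raised when no usable storage configuration can be found."""


def resolve_storage_config(
    lookup_by_key: Callable[[str], Optional[object]],
    get_default: Callable[[], Optional[object]],
    storage_key: Optional[str],
):
    """Return the storage config to use, or raise ExternalDataStorageError."""
    if storage_key:
        config = lookup_by_key(storage_key)
        if config is None:
            raise ExternalDataStorageError(
                f"Storage configuration with key '{storage_key}' not found"
            )
        return config
    config = get_default()
    if config is None:
        raise ExternalDataStorageError(
            "No active default storage configuration available for large data"
        )
    return config
```

With this shape, both branches (explicit key vs. active default) and both failure modes are four small unit tests.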
Cypress run summary:

| Project | fides |
|---|---|
| Branch Review | main |
| Run duration | 00m 53s |
| Committer | Adrian Galvan |
| Test results | 0 / 0 / 0 / 0 / 5 |
Closes ENG-684
Description Of Changes
This PR introduces an automatic external storage fallback system to handle data that exceeds PostgreSQL's 1 GB column limit. The solution centers around a new `EncryptedLargeDataDescriptor` that can be seamlessly applied to SQLAlchemy model columns. This descriptor maintains backward compatibility by storing small data directly in the database as before, while automatically detecting large datasets and transparently offloading them to encrypted external storage through configurable backends (local, S3, or GCS).

EncryptedLargeDataDescriptor
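The offload-on-write descriptor pattern described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: a dict stands in for encrypted external storage, the threshold is tiny for demonstration, and there is no encryption or SQLAlchemy integration here.

```python
# Minimal sketch of the external-storage-fallback descriptor pattern.
import json

EXTERNAL_STORE = {}      # stand-in for encrypted external storage
THRESHOLD_BYTES = 1024   # illustrative; the real check guards the 1 GB limit


class LargeDataDescriptor:
    def __set_name__(self, owner, name):
        self.name = name

    def __set__(self, instance, value):
        raw = json.dumps(value)
        if len(raw.encode("utf-8")) > THRESHOLD_BYTES:
            # Large: offload the payload and keep only a pointer inline.
            key = f"{type(instance).__name__}/{id(instance)}/{self.name}"
            EXTERNAL_STORE[key] = raw
            instance.__dict__[self.name] = {"external_ref": key}
        else:
            # Small: store inline, exactly as before.
            instance.__dict__[self.name] = value

    def __get__(self, instance, owner):
        if instance is None:
            return self
        stored = instance.__dict__.get(self.name)
        if isinstance(stored, dict) and "external_ref" in stored:
            return json.loads(EXTERNAL_STORE[stored["external_ref"]])
        return stored


class RequestTask:
    access_results = LargeDataDescriptor()


task = RequestTask()
task.access_results = ["small"]      # below threshold: stays inline
task.access_results = ["x" * 2000]   # above threshold: offloaded transparently
```

Callers read and write `task.access_results` identically in both cases, which is what makes the fallback backward compatible.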
Applied to `RequestTask.access_results`, `RequestTask.data_for_erasures`, and `PrivacyRequest.filtered_final_upload`.

AES GCM Encryption Utilities
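The thread doesn't show the PR's actual encryption utilities, but AES-GCM encrypt/decrypt helpers of this kind can be sketched with the `cryptography` package. The helper names `encrypt_bytes`/`decrypt_bytes` are assumptions for illustration:

```python
# Sketch of AES-GCM helpers using the `cryptography` package; the PR's
# exact utility names and key handling are not shown in this thread.
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def encrypt_bytes(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt with AES-GCM, prepending the random 96-bit nonce so the
    returned blob is self-contained for decryption."""
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)


def decrypt_bytes(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce and decrypt; raises InvalidTag on tampering."""
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)


key = AESGCM.generate_key(bit_length=256)
blob = encrypt_bytes(key, b"large serialized payload")
assert decrypt_bytes(key, blob) == b"large serialized payload"
```

Because GCM is authenticated, decryption fails loudly if the stored blob is corrupted or tampered with, which matters for data parked in external storage.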
ExternalStorageService
Additional Code Changes
Steps to Confirm
- In `.env`: `FIDES__EXECUTION__USE_DSR_3_0=true`
- html
- Large Data Test Connector. The connection test will fail, that's ok
- jane@example.com
- fidesplus/fides_uploads
- This could take a while, it took ~13 minutes from beginning to end
Pre-Merge Checklist
- `CHANGELOG.md` updated
- main
- `downgrade()` migration is correct and works