Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move System.privacy_declarations embedded objects into their own DB table #3036

Closed
adamsachs opened this issue Apr 11, 2023 · 6 comments · Fixed by #3098
Closed

Move System.privacy_declarations embedded objects into their own DB table #3036

adamsachs opened this issue Apr 11, 2023 · 6 comments · Fixed by #3098
Assignees

Comments

@adamsachs
Copy link
Contributor

Is your feature request related to a specific problem?

Currently, System.privacy_declarations is a JSON column that contains a list of "privacy declarations" (otherwise known as "system data uses") associated with the given system. As these objects become more central to our application, having them stored as embedded JSON objects makes manipulating and referencing them significantly more cumbersome.

Specifically, we have upcoming work in fidesplus to expand the custom fields functionality to integrate with these objects. With the current data structures, that would take a lot of refactoring of the custom fields functionality to support referencing these embedded objects; splitting the embedded objects into their own table, however, would make this a trivial update from the "custom fields" side.

More generally, if we split these declarations out into their own table, we can index and query them based on their attributes, rather than having to always access them through individual systems, which can quickly become unsustainable if/when we have many systems and want to do any sort of aggregate reporting on the declarations.

Describe the solution you'd like

  • A new sqlalchemy class and corresponding DB table to store "privacy declarations" (need to decide on a name)
    • should have a FK link to the System that the given privacy declaration is associated with
    • TBD on whether there's a FK link to the DataUse and other taxonomy elements referenced by the declaration. Currently they're just string literals, is there a reason we can't have a "hard" link to the other elements?
  • Perform a data migration to create proper records for each embedded privacy declaration object that exists currently. We need to take care here: this will likely be a complex, and very important, migration.
  • Make any API updates needed to support necessary CRUD operations on privacy declarations for a given system
    • I don't think full backward compatibility is really feasible here - i.e. i think the CRUD operations that can currently be used to add declarations to systems are not going to continue to work. But i think we need to ensure that new APIs have functional parity with whatever we supported before for "privacy declarations", through operations on the System
    • Make corresponding UI updates to use new APIs for operations on privacy declarations
  • Make necessary adjustments to any pieces of the BE application code that reference privacy declarations
    • specifically the export code (used for datamap and CLI export)
    • anything else?

Describe alternatives you've considered, if any

Leave things as they are. We'd then need to make changes on the custom field side to support what's needed.

Additional context

Similar to #3034 in many ways - these came from the same discussion. I think the issue here is a bit more pressing, per the custom fields initiative

@adamsachs adamsachs self-assigned this Apr 11, 2023
@allisonking
Copy link
Contributor

Oh this is great! just leaving a note for the UI portion that we'll be able to fix this pesky thing once privacy declarations have their own ID

// This isn't a perfect key as privacy declarations don't have an enforced key right now.
// The closest is 'data_use' but that is only enforced on the frontend and can change
// This results in the "Saved" indicator not appearing if you change the 'data_use' in the form
// The fix would be to enforce a key, either on the backend, or through a significant workaround on the frontend
key={dec.id}

@adamsachs
Copy link
Contributor Author

hey @allisonking and @TheAndrewJackson, i'm looking for some perspective from the FE - as i rip this up a bit and put PrivacyDeclarations into their own table, i'm at a point where i could reasonably rework the API a bit for those privacy declaration records.

  • one goal is to keep the existing "System" API backward compatible with the old behavior, so that essentially allowed upserting the System record (PUT) with the full set of associated .privacy_declarations declared "inline", as a nested object. i think i should be able to maintain that behavior relatively easily - and from what i can tell, that's basically all that the FE is currently using to create/update/delete the "privacy declarations" associated with a system
  • but now that there will be a proper table (and primary keys, etc.) associated with these privacy declaration records, i'm wondering whether a CRUD API targeted specifically at the PrivacyDeclaration records/table would make things simpler on the FE
    • we don't necessarily need to build this in immediately, since it'd require refactoring on the FE and additional BE work, but we could at least chart it out as a future improvement. or, if it aligns with any existing work, then we could decide to just tackle it now and add this in.

thoughts? please let me know if i haven't explained myself clearly! and an extra cc to @ThomasLaPiana just to keep him in the loop on these changes to the System and PrivacyDeclaration constructs on the BE.

@allisonking
Copy link
Contributor

hiya @adamsachs , I believe you're right that privacy declarations will continue working just fine if they continue to be nested in the system. I think a CRUD for privacy declarations would make things cleaner in some ways, but agree that it's not urgent.

that said, our privacy declaration UI components are a little bit of a mess right now. once you have a working branch, I'd be happy to go in and test and make sure the privacy declarations are still behaving!

@TheAndrewJackson
Copy link
Contributor

I agree with @allisonking. CRUD endpoints would be nice but it's not urgent if the existing system API can be reused.

We can make a note of it and take a look again in the future. Possibly when the Privacy declaration components are merged into one.

@adamsachs
Copy link
Contributor Author

that sounds good - i'll focus on the backward compatibility for now, and we can ensure nothing breaks. and then we can create a follow up item for refactoring.

thanks both for the input!

@ThomasLaPiana
Copy link
Contributor

thank you for the CC @adamsachs ! This is a great improvement, and I have a feeling this will happen elsewhere (this isn't the only nested-type object that is stored in JSON)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants