Skip to content
This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Replace fidesctl package with fideslang package #454

Merged
merged 3 commits into from
Jun 7, 2022
Merged

Replace fidesctl package with fideslang package #454

merged 3 commits into from
Jun 7, 2022

Conversation

sanders41
Copy link
Contributor

Purpose

Replace the fidesctl package with fideslang

Changes

  • Remove fidesctl from requirements.txt
  • Add fideslang to requirements.txt

Checklist

  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram.
  • If docs updated (select one):
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
    • documentation issue created (tag docs-team to complete issue separately)
  • Good unit test/integration test coverage
  • This PR contains a DB migration. If checked, the reviewer should confirm with the author that the down_revision correctly references the previous migration before merging
  • The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket

Fixes #426

@seanpreston seanpreston self-assigned this May 4, 2022
@sanders41
Copy link
Contributor Author

sanders41 commented May 4, 2022

When I run the tests locally they all pass. Any ideas why they would be failing here?

I got them to fail locally now. Looking into it.

@sanders41
Copy link
Contributor Author

I am looking into this and think I know what is happening. The version of fidesctl that is pinned is 0.9.8.4 which is an older version. There seems to have been updates made to fideslang since this release.

For example, when I look at this failure, the key is it checking is here-is-an-invalid-key with an expected message of FidesKey must only contain alphanumeric characters, '.' or '_'.. The validation is done in FidesOpsKey which inherits from FidesKey and this allows for - characters so the key isn't invalid. So I am thinking this was a change made since the 0.9.8.4?

With the example here I could change the test key to something like here*is*an*invalid*key, and update the FidesOpsKey message. Before I make these updates I wanted it get some thoughts here if this is appropriate way to go about it, and make sure there isn't something else I should be thinking about that I am missing.

@seanpreston
Copy link
Contributor

Great detective work @sanders41. When we integrated the shared FidesKey, we extended it to force keys into snake_case, this was part of this investigation. It stemmed from some util behaviour we added right at the start to slugify object names directly to keys so they could be referenced automatically in URLs.

With the example here I could change the test key to something like hereisaninvalidkey, and update the FidesOpsKey message. Before I make these updates I wanted it get some thoughts here if this is appropriate way to go about it, and make sure there isn't something else I should be thinking about that I am missing.

I'd be happy to go forward with this suggestion if all the tests continue to pass, I'll check your fork out locally and run the unsafe integration tests too.

@sanders41
Copy link
Contributor Author

sanders41 commented May 6, 2022

I have made some progress here, all tests but 1 are now passing. The test_nested_dataset_validation test is still failing. I have worked out where it is failing, but I'm still trying to figure out why.

@sanders41
Copy link
Contributor Author

I have found the reason test_nested_dataset_validation is failing, but I'm stuck in a bit of a loop trying to find a good solution.

The FidesopsDatasetField inherits from DatasetField. In fidesctl 0.9.8.4 this model looks like:

class DatasetField(BaseModel):
    """
    The DatasetField resource model.
    This resource is nested within a DatasetCollection.
    """

    name: str
    description: Optional[str]
    data_categories: Optional[List[FidesKey]]
    data_qualifier: FidesKey = Field(
        default="aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified",
    )

Then in the latest fideslang there are some new fields added:

class DatasetField(BaseModel):
    """
    The DatasetField resource model.
    This resource is nested within a DatasetCollection.
    """

    name: str = name_field
    description: Optional[str] = description_field
    data_categories: Optional[List[FidesKey]] = Field(
        description="Arrays of Data Categories, identified by `fides_key`, that applies to this field.",
    )
    data_qualifier: FidesKey = Field(
        default="aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified",
        description="A Data Qualifier that applies to this field. Note that this field holds a single value, therefore, the property name is singular.",
    )
    retention: Optional[str] = Field(
        description="An optional string to describe the retention policy for a dataset. This field can also be applied more granularly at either the Collection or field level of a Dataset.",
    )
    fields: Optional[List[DatasetField]] = Field(
        description="An optional array of objects that describe hierarchical/nested fields (typically found in NoSQL databases).",
    )

Specifically the issue is that FidesopsDatasetField contains fields, but the new DatasetField also contains fields. If I remove the fields from FidesopsDatasetField the test_nested_dataset_validation test starts passing, but others start failing.

@ThomasLaPiana
Copy link
Contributor

We updated DatasetField to support recursively nested fields (such as for a NoSQL datastore), it might be possible to update fidesops to use the fideslang model with minimal changes to the fidesops code?

@sanders41
Copy link
Contributor Author

sanders41 commented May 9, 2022

The fidesops FidesopsDatasetField model adds a fidesops_meta field, I was hoping to be able to keep that field and use the reset from the fideslang DatasetField, but when I tried this I started getting a lot of test failures.

@ThomasLaPiana
Copy link
Contributor

maybe there are validators on the fideslang dataset model that are conflicting with the new fields field on the fidesops dataset model? This is a pretty tricky pydantic issue, I'd have to spend some time seeing what fidesops is doing to be able to meaningfully debug it

@sanders41
Copy link
Contributor Author

There are validators, but removing them didn't fix the issue.

@sanders41
Copy link
Contributor Author

I still haven't found a solution, but I think I have made some progress on figuring out the issue. In Fideslang the model is:

class DatasetField(BaseModel):
    ...
    fields: Optional[List[DatasetField]] = Field(
        description="An optional array of objects that describe hierarchical/nested fields (typically found in NoSQL databases).",
    )

And in Fidesops it is:

class FidesopsDatasetField(DatasetField):
    ...
    fields: Optional[List["FidesopsDatasetField"]] = []

So I think the problem is that that both model's fields are trying to recursively reference back to themselves? So if you remove fields from FidesopsDatasetField then it is trying to recursively reference DatasetField from the nested fields which isn't what is needed. I'm not positive this is the issue yet, but it is currently my best guess.

@sanders41
Copy link
Contributor Author

So I think the problem is that that both model's fields are trying to recursively reference back to themselves? So if you remove fields from FidesopsDatasetField then it is trying to recursively reference DatasetField from the nested fields which isn't what is needed. I'm not positive this is the issue yet, but it is currently my best guess.

I have confirmed this is the issue. I made a FidesopsDatasetField BaseModel that contains all the fields from the fideslang but does not inherit from it. With this model all tests pass.

So now the question is what is the best way to go about this? We could go with the custom model that doesn't inherit from fideslang, but it seems like this runs the risk of getting us into a similar problem in the future where the models are out of sync. However, unfortunately this is the only solution I can think of.

@pattisdr
Copy link
Contributor

I agree, a custom model makes sense to me

@ThomasLaPiana
Copy link
Contributor

does fidesops use anything from fideslang besides the dataset?

@sanders41
Copy link
Contributor Author

does fidesops use anything from fideslang besides the dataset?

It also uses DEFAULT_TAXONOMY, FideslangDataCategory, and extends FidesKey

@NevilleS
Copy link
Contributor

🖐️ I did this (added the fidesops_meta field by extending the Dataset from fideslang)

I totally see the recursion issue here, which wasn't a problem at the time. I'd propose a cleaner solution- we could add fidesops_meta to the underlying fideslang model, since it's definitely an official field type. That way you should actually change the DatasetField type without trying to override it and get too messy.

@NevilleS
Copy link
Contributor

FWIW I think we could define fidesops_meta as just a simple object field in fideslang, and then supply all the other validations (of things like references, identity, primary_key, etc.) in the fidesops model validations, which seems like a good interim step. In theory it seems nice to have more of that core logic in the fideslang repo, but I'd consider that a nice-to-have requirement and likely a harder thing to forklift out

@ThomasLaPiana
Copy link
Contributor

FWIW I think we could define fidesops_meta as just a simple object field in fideslang, and then supply all the other validations (of things like references, identity, primary_key, etc.) in the fidesops model validations, which seems like a good interim step. In theory it seems nice to have more of that core logic in the fideslang repo, but I'd consider that a nice-to-have requirement and likely a harder thing to forklift out

On principle I also object to injecting fidesops-specific logic into fideslang but at some point the rubber does meet the road

@sanders41
Copy link
Contributor Author

@seanpreston the DatasetField swap with DatasetFieldBase is done so this is ready for another review now.

@ThomasLaPiana
Copy link
Contributor

🎉

Copy link
Contributor

@seanpreston seanpreston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this very valuable increment @sanders41

@seanpreston seanpreston merged commit c11ad1d into ethyca:main Jun 7, 2022
@sanders41 sanders41 deleted the fideslang branch June 7, 2022 14:45
sanders41 added a commit that referenced this pull request Sep 22, 2022
* Replace fidesctl package with fideslang package

* Update tests and error message in FidesKey to allow -

* Replace DatasetField with DatasetFieldBase
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Misc DevOps] Replace fidesctl import with fideslang pypi package
5 participants