Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] ValidationConfigStore #9523

Merged
merged 39 commits into from Feb 29, 2024
Merged

Conversation

cdkini
Copy link
Member

@cdkini cdkini commented Feb 23, 2024

  • Description of PR changes above includes a link to an existing GitHub issue
  • PR title is prefixed with one of: [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], [CONTRIB]
  • Code is linted - run invoke lint (uses black + ruff)
  • Appropriate tests and docs have been updated

For more information about contributing, see Contribute.

After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!

Copy link

netlify bot commented Feb 23, 2024

Deploy Preview for niobium-lead-7998 ready!

Name Link
🔨 Latest commit a82f189
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/65e0c200ba66c7000858f348
😎 Deploy Preview https://deploy-preview-9523.docs.greatexpectations.io
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@@ -48,6 +48,7 @@ class SerializableDataContext(AbstractDataContext):
DataContextConfigDefaults.EXPECTATIONS_BASE_DIRECTORY.value,
DataContextConfigDefaults.PLUGINS_BASE_DIRECTORY.value,
DataContextConfigDefaults.PROFILERS_BASE_DIRECTORY.value,
DataContextConfigDefaults.VALIDATION_CONFIGS_BASE_DIRECTORY.value,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates to enable proper filesystem scaffolding

@@ -133,6 +133,7 @@ class GXCloudStoreBackend(StoreBackend, metaclass=ABCMeta):
GXCloudRESTResource.PROFILER: "profiler",
GXCloudRESTResource.RENDERED_DATA_DOC: "rendered_data_doc",
GXCloudRESTResource.VALIDATION_RESULT: "result",
GXCloudRESTResource.VALIDATION_CONFIG: "validation_config",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates to enable proper Cloud integration (purely mocking for now)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the enum can be modified/restructured if we have to list all these enums and define a new value for them. This makes it appear that our enum/abstraction is wrong.

runtime_environment=runtime_environment,
)

def get_key(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementing the bare minimum required in the factory. Tests should reflect this as well

Comment on lines +24 to +27
arbitrary_types_allowed = True
json_encoders = {
ExpectationSuite: lambda v: expectationSuiteSchema.dump(v),
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concurrently using Marshmallow and Pydantic results in something like this. We need custom encoders/decoders but this should work as intended

Comment on lines 628 to 633
@property
def validation_config_store_name(self) -> str:
return DataContextConfigDefaults.DEFAULT_VALIDATION_CONFIG_STORE_NAME.value

def validaiton_config_store(self) -> ValidationConfigStore:
return self.stores[self.validation_config_store_name]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't sure about this but deciding to not make validation_config_store_name a context variable. I've wavered on it but I'm now confident that store naming just causes more harm than benefit.

An argument against this is that it is a deviation from the existing pattern and perhaps consistency is more important?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm usually not pro-deviation, but I think in this case it's probably reasonable. It'd require minimal changes if we changed our mind later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know this key will always be present?
What is the harm you see from making this a context variable? By context variable, do you mean a top level property?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By adding it as a context variable, its another field that a user can modify.

These top level store names are dangerous in that they can enable the following:

context.checkpoint_store_name = "my_nonexistent_store"
...

context.checkpoint_store # Raises an error
context.checkpoints.add(...) # Raises an error
...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We allow customization of the names but don't automatically update the stores dict, resulting in the possibility of orphaning a store by losing its name.

Seems relatively unlikely since we don't advertise this much in docs/public API.

We should always have a ValidationConfigStore but the name may vary.

def serialize(self, value):
if self.cloud_mode:
data = value.dict()
data["suite"] = data["suite"].to_json_dict()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another necessity due to Marshmallow

Copy link
Contributor

@billdirks billdirks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. I don't have the best context with respect to stores. I've asked some questions inline.

Comment on lines 628 to 633
@property
def validation_config_store_name(self) -> str:
return DataContextConfigDefaults.DEFAULT_VALIDATION_CONFIG_STORE_NAME.value

def validaiton_config_store(self) -> ValidationConfigStore:
return self.stores[self.validation_config_store_name]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know this key will always be present?
What is the harm you see from making this a context variable? By context variable, do you mean a top level property?

self, name: str, id: str | None = None
) -> GXCloudIdentifier | StringKey:
"""Given a name and optional ID, build the correct key for use in the ValidationConfigStore."""
if self.cloud_mode:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR but we need to get away from these cloud mode checks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. It's poorly encapsulated but I couldn't think of a better way to get both file and cloud working

Copy link
Contributor

@billdirks billdirks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good. I've asked 1 question and added a couple of nits that don't need to be solved now. There is also the outstanding property comment you've made.

@@ -1207,4 +1207,4 @@ def _convert_uuids_to_str(self, data, **kwargs):
return data


expectationSuiteSchema = ExpectationSuiteSchema()
expectationSuiteSchema: ExpectationSuiteSchema = ExpectationSuiteSchema()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This type can be inferred and is the same as the constructor so you don't need this annotation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weirdly enough I got a mypy error without this (when using it in the custom Pydantic encoder below)

def _validate_suite(cls, v):
if isinstance(v, dict):
return ExpectationSuite(**expectationSuiteSchema.load(v))
return v
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check if v is an expectationSuiteSchema and raise if it's not in the fall through case? Or does pydantic do this for us since it won't be the right type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pydantic validation will happen after this return - I've updated this to be a bit more explicit but it should work as-is

@@ -133,6 +133,7 @@ class GXCloudStoreBackend(StoreBackend, metaclass=ABCMeta):
GXCloudRESTResource.PROFILER: "profiler",
GXCloudRESTResource.RENDERED_DATA_DOC: "rendered_data_doc",
GXCloudRESTResource.VALIDATION_RESULT: "result",
GXCloudRESTResource.VALIDATION_CONFIG: "validation_config",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the enum can be modified/restructured if we have to list all these enums and define a new value for them. This makes it appear that our enum/abstraction is wrong.

@@ -157,6 +158,7 @@ class GXCloudStoreBackend(StoreBackend, metaclass=ABCMeta):
GXCloudRESTResource.PROFILER: "profilers",
GXCloudRESTResource.RENDERED_DATA_DOC: "rendered_data_docs",
GXCloudRESTResource.VALIDATION_RESULT: "validation_results",
GXCloudRESTResource.VALIDATION_CONFIG: "validation_configs",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here.

Copy link
Contributor

@tyler-hoffman tyler-hoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some nits, but this LGTM

Comment on lines 628 to 633
@property
def validation_config_store_name(self) -> str:
return DataContextConfigDefaults.DEFAULT_VALIDATION_CONFIG_STORE_NAME.value

def validaiton_config_store(self) -> ValidationConfigStore:
return self.stores[self.validation_config_store_name]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm usually not pro-deviation, but I think in this case it's probably reasonable. It'd require minimal changes if we changed our mind later.

class Config:
arbitrary_types_allowed = True
json_encoders = {
ExpectationSuite: lambda v: expectationSuiteSchema.dump(v),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we should add a comment on the why here in the code.

id: Union[str, None] = None

@validator("suite", pre=True)
def _validate_suite(cls, v):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above - I think it's worth a docstring explaining this is just because the suite isn't pydantic.

)


@pytest.mark.parametrize("store_fixture", ["ephemeral_store", "file_backed_store"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why use strings for the stores and then call getfixturevalue? Can't you just reference the fixture directly?

Copy link
Member

@Kilo59 Kilo59 Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the fixture won't be properly instantiated unless you do it this way.
Fixtures only get loaded by pytest if you include them in the test constructor.
Although this is a common complaint because it prevents you from using fixtures as part of parametrize.

There are packages out there like pytest-lazy-fixture that essentially do this work for you... but you still have to reference it by the string.
https://github.com/tvorog/pytest-lazy-fixture

Copy link
Contributor

@billdirks billdirks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

)
if config["class_name"] not in configured_stores:
# Create ephemeral store config
store_configs[name] = {"class_name": config["class_name"]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please write a warning if we use the default store.

@cdkini cdkini added this pull request to the merge queue Feb 29, 2024
Merged via the queue into develop with commit 7b87f43 Feb 29, 2024
67 checks passed
@cdkini cdkini deleted the m/v1-178/validation_config_store branch February 29, 2024 20:48

@pytest.mark.parametrize("store_fixture", ["ephemeral_store", "file_backed_store"])
@pytest.mark.unit
def test_add(request, store_fixture: str, validation_config: ValidationConfig):
Copy link
Member

@Kilo59 Kilo59 Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_add(request, store_fixture: str, validation_config: ValidationConfig):
def test_add(request: pytest.FixtureRequest, store_fixture: str, validation_config: ValidationConfig):

Annoting request will developers understand what's going on.

https://docs.pytest.org/en/stable/reference/reference.html#request

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants