[FEATURE] V1 Validation scaffolding #9508

Merged (10 commits) on Feb 23, 2024
great_expectations/core/__init__.py (10 changes: 6 additions, 4 deletions)

```diff
@@ -18,23 +18,25 @@
 from .id_dict import IDDict
 from .run_identifier import RunIdentifier, RunIdentifierSchema
 from .urn import ge_urn
+from .validation import Validation
 
 __all__ = [
     "Domain",
     "ExpectationSuite",
     "ExpectationSuiteSchema",
-    "expectationSuiteSchema",
     "ExpectationSuiteValidationResult",
     "ExpectationSuiteValidationResultSchema",
     "ExpectationValidationResult",
     "ExpectationValidationResultSchema",
-    "expectationSuiteValidationResultSchema",
-    "expectationValidationResultSchema",
-    "get_metric_kwargs_id",
     "IDDict",
     "RunIdentifier",
     "RunIdentifierSchema",
+    "Validation",
+    "expectationSuiteSchema",
+    "expectationSuiteValidationResultSchema",
+    "expectationValidationResultSchema",
     "ge_urn",
+    "get_metric_kwargs_id",
 ]
 
 logger = logging.getLogger(__name__)
```

great_expectations/core/factory/__init__.py (1 change: 1 addition, 0 deletions)

```diff
@@ -1,2 +1,3 @@
 from .checkpoint_factory import CheckpointFactory
 from .suite_factory import SuiteFactory
+from .validation_factory import ValidationFactory
```

great_expectations/core/factory/validation_factory.py (new file: 52 additions, 0 deletions)

```python
from __future__ import annotations

from great_expectations._docs_decorators import public_api
from great_expectations.compatibility.typing_extensions import override
from great_expectations.core.factory.factory import Factory
from great_expectations.core.validation import Validation


# TODO: Add analytics as needed
class ValidationFactory(Factory[Validation]):
    def __init__(self, store) -> None:
        # TODO: Update type hints when new ValidationStore is implemented
        self._store = store
```

Member Author

I think I'm blocked on this. We have an existing ValidationsStore, but isn't that one for results?

I think we need to do the following:

  • Rename the old store to ValidationResultsStore
  • Create a new ValidationsStore (or ValidationStore; our inconsistent plurality should be resolved)
  • Plug in the new one here and have CRUD start working

Member Author

Working on the store now: #9515


```python
    @public_api
    @override
    def add(self, validation: Validation) -> Validation:
        """Add a Validation to the collection.

        Parameters:
            validation: Validation to add

        Raises:
            DataContextError if Validation already exists
        """
        raise NotImplementedError

    @public_api
    @override
    def delete(self, validation: Validation) -> Validation:
        """Delete a Validation from the collection.

        Parameters:
            validation: Validation to delete

        Raises:
            DataContextError if Validation doesn't exist
        """
        raise NotImplementedError

    @public_api
    @override
    def get(self, name: str) -> Validation:
        """Get a Validation from the collection by name.

        Parameters:
            name: Name of Validation to get

        Raises:
            DataContextError when Validation is not found.
        """
        raise NotImplementedError
```
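
Once a name-keyed store exists (the work referenced in #9515), the stubbed `add`/`delete`/`get` methods could delegate to it roughly as follows. This is a minimal sketch under assumptions: `InMemoryValidationStore` and its `has_key`/`get`/`set`/`remove_key` methods are hypothetical, not the real GX store interface, and `Validation` and `DataContextError` are simplified local stand-ins. It only illustrates the error cases the docstrings above describe.

```python
from __future__ import annotations

from dataclasses import dataclass


class DataContextError(Exception):
    """Local stand-in for great_expectations.exceptions.DataContextError."""


@dataclass
class Validation:
    """Simplified stand-in for the Validation model: only the name matters here."""

    name: str


class InMemoryValidationStore:
    """Hypothetical name-keyed store (NOT the real GX store interface)."""

    def __init__(self) -> None:
        self._data: dict[str, Validation] = {}

    def has_key(self, key: str) -> bool:
        return key in self._data

    def get(self, key: str) -> Validation:
        return self._data[key]

    def set(self, key: str, value: Validation) -> None:
        self._data[key] = value

    def remove_key(self, key: str) -> None:
        del self._data[key]


class ValidationFactory:
    """Sketch of CRUD delegation that mirrors the documented DataContextError cases."""

    def __init__(self, store: InMemoryValidationStore) -> None:
        self._store = store

    def add(self, validation: Validation) -> Validation:
        # Adding a name that is already stored is an error.
        if self._store.has_key(validation.name):
            raise DataContextError(f"Validation '{validation.name}' already exists.")
        self._store.set(validation.name, validation)
        return validation

    def delete(self, validation: Validation) -> Validation:
        # Deleting a name that is not stored is an error.
        if not self._store.has_key(validation.name):
            raise DataContextError(f"Validation '{validation.name}' doesn't exist.")
        self._store.remove_key(validation.name)
        return validation

    def get(self, name: str) -> Validation:
        # Looking up an unknown name is an error.
        if not self._store.has_key(name):
            raise DataContextError(f"Validation '{name}' not found.")
        return self._store.get(name)
```

For example, `ValidationFactory(InMemoryValidationStore()).add(Validation(name="nightly"))` stores the validation, and a second `add` with the same name raises `DataContextError`. Keeping the existence checks in the factory means any store that can answer "is this key present?" can be plugged in.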

great_expectations/core/validation.py (new file: 31 additions, 0 deletions)

```python
from __future__ import annotations

from typing import TYPE_CHECKING

from great_expectations._docs_decorators import public_api
from great_expectations.compatibility.pydantic import BaseModel

if TYPE_CHECKING:
    from great_expectations.core.batch_config import BatchConfig
    from great_expectations.core.expectation_suite import ExpectationSuite
    from great_expectations.datasource.fluent.interfaces import DataAsset


class Validation(BaseModel):
    """
    Responsible for running a suite against data and returning a validation result.

    Args:
        name: The name of the validation.
        data: An asset or batch config to validate.
        suite: A grouping of expectations to validate against the data.

    """

    name: str
    data: DataAsset | BatchConfig
```

Contributor

I know we've talked about this some, but did we land on this being a union with DataAsset? I know it can be done with discriminated unions, but do we currently have a good story around serialization there? I'm a bit concerned about the ergonomics of accessing validation.data and having to do isinstance checks. Did we rule out a BatchConfig that represents the whole DataAsset?

Contributor

We could make this a BatchConfig and use pydantic coercion to allow a user to instantiate one with an asset.
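
A minimal sketch of that coercion idea, assuming simplified stand-ins: in pydantic this would typically be a pre-validator on the `data` field, but here a dataclass `__post_init__` plays that role so the example runs without pydantic. `DataAsset`, `BatchConfig`, and the `asset_name` field are hypothetical, not the GX classes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class DataAsset:
    """Stand-in for a fluent DataAsset; only its name is modeled."""

    name: str


@dataclass
class BatchConfig:
    """Stand-in BatchConfig; `asset_name` is a hypothetical field."""

    asset_name: str


@dataclass
class Validation:
    name: str
    data: Union[DataAsset, BatchConfig]

    def __post_init__(self) -> None:
        # Coerce a bare asset into a BatchConfig that covers the whole asset,
        # so code reading `validation.data` only ever sees a BatchConfig.
        if isinstance(self.data, DataAsset):
            self.data = BatchConfig(asset_name=self.data.name)
```

The appeal is that downstream code never branches on type; the cost, in this sketch, is that the original asset object is not kept on the model.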

Contributor

We've decided to define a protocol and allow this to be a union.
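
One way to picture that decision: keep the union as the field type, and have both members satisfy a shared protocol so callers can use `validation.data` without `isinstance` checks. The protocol name `BatchSource`, its `build_batch_request` method, and the dict return value are assumptions for illustration; the classes are simplified stand-ins, not the GX implementations.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol, Union, runtime_checkable


@runtime_checkable
class BatchSource(Protocol):
    """Hypothetical protocol shared by DataAsset and BatchConfig."""

    def build_batch_request(self) -> dict: ...


@dataclass
class DataAsset:
    name: str

    def build_batch_request(self) -> dict:
        # A whole asset: no batch options narrow it down.
        return {"asset": self.name, "options": {}}


@dataclass
class BatchConfig:
    asset: DataAsset
    options: dict = field(default_factory=dict)

    def build_batch_request(self) -> dict:
        # A subset of the asset, described by its options.
        return {"asset": self.asset.name, "options": dict(self.options)}


@dataclass
class Validation:
    name: str
    data: Union[DataAsset, BatchConfig]

    def batch_request(self) -> dict:
        # No type branching: every member of the union satisfies BatchSource.
        return self.data.build_batch_request()
```

Serialization still has to distinguish the two members (for example with a discriminator), but read access goes through one shared method.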

```python
    suite: ExpectationSuite
```

Member Author

If I make a change to a suite, how can I ensure that the change cascades through our persistence layer?

Contributor

Do you mean that a user has access to both a validation and a suite, they update the suite, and we need to guarantee they see that update in the validation?

Could we make use of a property and have it make an external call each time someone accesses it?

Member Author

I'm concerned that a deletion or material change to a suite would result in downstream errors if the validations store didn't stay in sync. Would we need to check every time a suite or batch config was updated or deleted?
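
A minimal sketch of the property-based approach discussed above, assuming the validation persists only the suite's name and resolves it from a store on every access. `SuiteStore`, `suite_name`, the simplified `ExpectationSuite`, and the local `DataContextError` are hypothetical stand-ins, not the GX API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class DataContextError(Exception):
    """Local stand-in for great_expectations.exceptions.DataContextError."""


@dataclass
class ExpectationSuite:
    name: str
    expectations: list = field(default_factory=list)


class SuiteStore:
    """Hypothetical in-memory suite store keyed by suite name."""

    def __init__(self) -> None:
        self._suites: dict[str, ExpectationSuite] = {}

    def save(self, suite: ExpectationSuite) -> None:
        self._suites[suite.name] = suite

    def delete(self, name: str) -> None:
        self._suites.pop(name, None)

    def get(self, name: str) -> ExpectationSuite:
        try:
            return self._suites[name]
        except KeyError:
            raise DataContextError(f"ExpectationSuite '{name}' not found.") from None


class Validation:
    def __init__(self, name: str, suite_name: str, suite_store: SuiteStore) -> None:
        self.name = name
        # Persist only a reference (the suite's name), never a copy of the suite.
        self._suite_name = suite_name
        self._suite_store = suite_store

    @property
    def suite(self) -> ExpectationSuite:
        # Resolved on every access: updates to the stored suite are visible
        # immediately, and a deleted suite raises here instead of going stale.
        return self._suite_store.get(self._suite_name)
```

The trade-off is one store lookup per access and lazy detection of deletions; eager checks on every suite or batch config update/delete would surface broken references earlier, at the cost of tracking who references what.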


```python
    @public_api
    def run(self):
        raise NotImplementedError
```

```diff
@@ -61,7 +61,11 @@
     _RuntimeEnvironmentConfigurationProvider,
 )
 from great_expectations.core.expectation_validation_result import get_metric_kwargs_id
-from great_expectations.core.factory import CheckpointFactory, SuiteFactory
+from great_expectations.core.factory import (
+    CheckpointFactory,
+    SuiteFactory,
+    ValidationFactory,
+)
 from great_expectations.core.id_dict import BatchKwargs
 from great_expectations.core.serializer import (
     AbstractConfigSerializer,
@@ -329,6 +333,9 @@ def _init_factories(self) -> None:
             context=self,
         )
 
+        # TODO: Update to follow existing pattern once new ValidationStore is implemented
+        self._validations: ValidationFactory | None = None
+
     def _init_analytics(self) -> None:
         init_analytics(
             data_context_id=uuid.UUID(self._data_context_id),
@@ -555,6 +562,14 @@ def checkpoints(self) -> CheckpointFactory:
             )
         return self._checkpoints
 
+    @property
+    def validations(self) -> ValidationFactory:
+        if not self._validations:
+            raise gx_exceptions.DataContextError(
+                "DataContext requires a configured ValidationStore to persist Validations."
+            )
+        return self._validations
+
     @property
     def expectations_store_name(self) -> Optional[str]:
         return self.variables.expectations_store_name
```