Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 #4531

Merged
Show file tree
Hide file tree
Changes from 21 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
aa42f9b
Implement ParameterBuilder evaluation and validation dependencies wit…
Mar 23, 2022
8840ded
Merge branch 'develop' into feature/GREAT-464/GREAT-688/GREAT-691/ale…
Mar 23, 2022
90c1629
type hints
Mar 23, 2022
b59582f
type hints
Mar 23, 2022
fd70e39
type hints
Mar 23, 2022
fb44b70
merge
Mar 23, 2022
6b55bc2
Merge branch 'develop' into feature/GREAT-464/GREAT-688/GREAT-691/ale…
alexsherstinsky Mar 23, 2022
c88451f
Merge remote-tracking branch 'upstream/develop' into feature/GREAT-46…
Mar 23, 2022
28dd2fc
Merge branch 'develop' into feature/GREAT-464/GREAT-688/GREAT-691/ale…
alexsherstinsky Mar 23, 2022
9029833
Implement ParameterBuilder evaluation and validation dependencies wit…
Mar 23, 2022
6335b6e
Merge branch 'develop' into feature/GREAT-464/GREAT-688/GREAT-691/ale…
Mar 23, 2022
619de93
Merge branch 'feature/GREAT-464/GREAT-688/GREAT-691/alexsherstinsky/r…
Mar 23, 2022
1594d97
Merge remote-tracking branch 'upstream/feature/GREAT-464/GREAT-688/GR…
Mar 23, 2022
8c8e632
merge
Mar 23, 2022
72628eb
typo
Mar 23, 2022
c83d51e
docstring
Mar 23, 2022
3f0a9e9
Merge branch 'develop' into feature/GREAT-464/GREAT-688/GREAT-691/ale…
Mar 23, 2022
b5a5460
clean up
Mar 23, 2022
8bd778d
Merge branch 'develop' into feature/GREAT-464/GREAT-688/GREAT-691/ale…
Mar 23, 2022
00e40d4
Empty test commit (make pull request).
Mar 24, 2022
66d1692
Merge remote-tracking branch 'upstream/develop' into feature/GREAT-46…
Mar 24, 2022
0f1a97c
Merge branch 'develop' into feature/GREAT-464/GREAT-688/GREAT-691/ale…
Mar 24, 2022
0a80670
fix tests
Mar 24, 2022
9982c44
Merge branch 'develop' into feature/GREAT-464/GREAT-688/GREAT-691/ale…
Mar 24, 2022
9d28e26
Merge branch 'develop' into feature/GREAT-464/GREAT-688/GREAT-691/ale…
Mar 24, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
Expand Up @@ -245,7 +245,7 @@ def get_validation_dependencies(
metric_value_kwargs=None,
)
#
# NOTE 20201117 - JPC - Would prefer not to include partition_metric_configuraiton here,
# NOTE 20201117 - JPC - Would prefer not to include partition_metric_configuration here,
# since we have already evaluated it, and its result is in the kwargs for the histogram.
# However, currently the dependencies' configurations are not passed to the _validate method
#
Expand Down
36 changes: 32 additions & 4 deletions great_expectations/rule_based_profiler/config/base.py
Expand Up @@ -155,6 +155,7 @@ def __init__(
name: str,
class_name: str,
module_name: Optional[str] = None,
evaluation_parameter_builder_configs: Optional[list] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know what this list contains?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdkini Yes, it contains ParameterBuilder configurations. Since we did not expand this deeper in the signature of the constructor of the RuleBasedProfilerConfig, I did not do that here either. We can do it (in a separate tiny PR). Thanks.

json_serialize: bool = True,
batch_request: Optional[Union[dict, str]] = None,
**kwargs,
Expand All @@ -167,6 +168,11 @@ def __init__(
if class_name is not None:
self.class_name = class_name

if evaluation_parameter_builder_configs:
self.evaluation_parameter_builder_configs = (
evaluation_parameter_builder_configs
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really would prefer not to conditionally set attributes. ALL instances of a class should have the same shape, with individual instances having different values.

By conditionally setting this attr, we violate a core principle of class-based development.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same with class_name and module_name above? I agree, and if done for serialization purposes we should use schema to suppress writing attributes with None values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdkini I think that this is totally fine here -- we do not want empty (None) elements having to watch for in all tests. We use this pattern in other places. If we did not, the very great_expectations.yml would be full of keys with no value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdkini @donaldheppner I am making an attempt to accomplish the same via Schema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@donaldheppner @cdkini This approach is not yet possible, because we do not always use Schema.dump() for serializing (we sometimes use to_json_dict(), which can be custom). This is because we do not yet have a tight Marshmallow Schema for all configuration objects — for example, ExpectationConfigurationBuilder (and, correspondingly, ExpectationConfigurationBuilderConfig) can allow any kwargs. With Marshmallow Schema.dump() all of them would be lost. Once we implement a better Schema (with inheritance), this will be possible. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@donaldheppner @cdkini @anthonyburdi Seems like I was able to get it to work with having to fix only one test. Thanks! P.S.: We still have to address the broader deficiency in Marshmallow Schema usage, alluded to above.

self.json_serialize = json_serialize

if batch_request is not None:
Expand Down Expand Up @@ -201,6 +207,15 @@ class Meta:
required=True,
allow_none=False,
)
evaluation_parameter_builder_configs = fields.List(
cls_or_instance=fields.Nested(
lambda: ParameterBuilderConfigSchema(),
required=True,
allow_none=False,
),
required=False,
allow_none=True,
)
json_serialize = fields.Boolean(
required=False,
allow_none=True,
Expand All @@ -219,6 +234,7 @@ def __init__(
class_name: str,
module_name: Optional[str] = None,
meta: Optional[dict] = None,
validation_parameter_builder_configs: Optional[list] = None,
batch_request: Optional[Union[dict, str]] = None,
**kwargs,
):
Expand All @@ -233,6 +249,11 @@ def __init__(
if meta is not None:
self.meta = meta

if validation_parameter_builder_configs:
self.validation_parameter_builder_configs = (
validation_parameter_builder_configs
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same notes here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdkini I think that this is totally fine here -- we do not want empty (None) elements having to watch for in all tests. We use this pattern in other places. If we did not, the very great_expectations.yml would be full of keys with no value.

if batch_request is not None:
self.batch_request = batch_request

Expand Down Expand Up @@ -275,6 +296,15 @@ class Meta:
required=False,
allow_none=True,
)
validation_parameter_builder_configs = fields.List(
cls_or_instance=fields.Nested(
lambda: ParameterBuilderConfigSchema(),
required=True,
allow_none=False,
),
required=False,
allow_none=True,
)
batch_request = fields.Raw(
required=False,
allow_none=True,
Expand Down Expand Up @@ -381,12 +411,10 @@ def __init__(
commented_map: Optional[CommentedMap] = None,
):
self.name = name
self.module_name = module_name
self.class_name = class_name
self.config_version = config_version
self.rules = rules
if class_name is not None:
self.class_name = class_name
if module_name is not None:
self.module_name = module_name
self.variables = variables

super().__init__(commented_map=commented_map)
Expand Down
Expand Up @@ -67,6 +67,7 @@ def __init__(
expectation_type: str,
meta: Optional[Dict[str, Any]] = None,
condition: Optional[str] = None,
validation_parameter_builder_configs: Optional[List[dict]] = None,
batch_list: Optional[List[Batch]] = None,
batch_request: Optional[
Union[str, BatchRequest, RuntimeBatchRequest, dict]
Expand All @@ -80,6 +81,8 @@ def __init__(
meta: the "meta" argument of "ExpectationConfiguration" object to be emitted.
condition: Boolean statement (expressed as string and following specified grammar), which controls whether
or not underlying logic should be executed and thus resulting "ExpectationConfiguration" emitted.
validation_parameter_builder_configs: ParameterBuilder configurations, having whose outputs available (as
fully-qualified parameter names) is pre-requisite for present ExpectationConfigurationBuilder instance.
batch_list: explicitly passed Batch objects for parameter computation (take precedence over batch_request).
batch_request: specified in ParameterBuilder configuration to get Batch objects for parameter computation.
data_context: DataContext
Expand All @@ -88,6 +91,7 @@ def __init__(

super().__init__(
expectation_type=expectation_type,
validation_parameter_builder_configs=validation_parameter_builder_configs,
batch_list=batch_list,
batch_request=batch_request,
data_context=data_context,
Expand Down
Expand Up @@ -4,6 +4,12 @@

from great_expectations.core.batch import Batch, BatchRequest, RuntimeBatchRequest
from great_expectations.core.expectation_configuration import ExpectationConfiguration
from great_expectations.rule_based_profiler.config import ParameterBuilderConfig
from great_expectations.rule_based_profiler.helpers.util import (
init_rule_parameter_builders,
set_batch_list_or_batch_request_on_builder,
)
from great_expectations.rule_based_profiler.parameter_builder import ParameterBuilder
from great_expectations.rule_based_profiler.types import (
Builder,
Domain,
Expand All @@ -15,9 +21,16 @@


class ExpectationConfigurationBuilder(Builder, ABC):
exclude_field_names: Set[str] = Builder.exclude_field_names | {
"validation_parameter_builders",
}

def __init__(
self,
expectation_type: str,
validation_parameter_builder_configs: Optional[
List[ParameterBuilderConfig]
] = None,
batch_list: Optional[List[Batch]] = None,
batch_request: Optional[
Union[str, BatchRequest, RuntimeBatchRequest, dict]
Expand All @@ -30,6 +43,8 @@ def __init__(

Args:
expectation_type: the "expectation_type" argument of "ExpectationConfiguration" object to be emitted.
validation_parameter_builder_configs: ParameterBuilder configurations, having whose outputs available (as
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand this docstring as written - these configs are optional right? And optionally instead of specifying them here they can be specified in the parameter_builder section of the config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anthonyburdi Let's discuss -- I would like to offer a clarification, which I feel might be useful. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed clarifying this in a follow-on PR.

fully-qualified parameter names) is pre-requisite for present ExpectationConfigurationBuilder instance.
batch_list: explicitly passed Batch objects for parameter computation (take precedence over batch_request).
batch_request: specified in ParameterBuilder configuration to get Batch objects for parameter computation.
data_context: DataContext
Expand All @@ -44,6 +59,11 @@ def __init__(

self._expectation_type = expectation_type

self._validation_parameter_builders = init_rule_parameter_builders(
parameter_builder_configs=validation_parameter_builder_configs,
data_context=self._data_context,
)
Comment on lines +62 to +65
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could init_rule_parameter_builders live on a class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdkini Please see this comment/plan: #4524 (comment) -- thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for closing the loop on this @alexsherstinsky. I actually didn't know that you could directly link comments :)


"""
Since ExpectationConfigurationBuilderConfigSchema allows arbitrary fields (as ExpectationConfiguration kwargs)
to be provided, they must be all converted to public property accessors and/or public fields in order for all
Expand All @@ -65,10 +85,43 @@ def build_expectation_configuration(
variables: Optional[ParameterContainer] = None,
parameters: Optional[Dict[str, ParameterContainer]] = None,
) -> ExpectationConfiguration:
self._resolve_validation_dependencies(
parameter_container=parameter_container,
domain=domain,
variables=variables,
parameters=parameters,
)

return self._build_expectation_configuration(
domain=domain, variables=variables, parameters=parameters
)

def _resolve_validation_dependencies(
self,
parameter_container: ParameterContainer,
domain: Domain,
variables: Optional[ParameterContainer] = None,
parameters: Optional[Dict[str, ParameterContainer]] = None,
) -> None:
validation_parameter_builders: List[ParameterBuilder] = (
self.validation_parameter_builders or []
)

validation_parameter_builder: ParameterBuilder
for validation_parameter_builder in validation_parameter_builders:
set_batch_list_or_batch_request_on_builder(
builder=validation_parameter_builder,
batch_list=self.batch_list,
batch_request=self.batch_request,
force_batch_data=False,
)
validation_parameter_builder.build_parameters(
parameter_container=parameter_container,
domain=domain,
variables=variables,
parameters=parameters,
)

@abstractmethod
def _build_expectation_configuration(
self,
Expand All @@ -81,3 +134,7 @@ def _build_expectation_configuration(
@property
def expectation_type(self) -> str:
return self._expectation_type

@property
def validation_parameter_builders(self) -> Optional[List[ParameterBuilder]]:
return self._validation_parameter_builders
67 changes: 67 additions & 0 deletions great_expectations/rule_based_profiler/helpers/util.py
Expand Up @@ -19,10 +19,12 @@
from great_expectations.data_context.util import instantiate_class_from_config
from great_expectations.execution_engine.execution_engine import MetricDomainTypes
from great_expectations.rule_based_profiler.types import (
PARAMETER_KEY,
VARIABLES_PREFIX,
Builder,
Domain,
ParameterContainer,
get_fully_qualified_parameter_names,
get_parameter_value_by_fully_qualified_parameter_name,
is_fully_qualified_parameter_name_literal_string_format,
)
Expand Down Expand Up @@ -461,6 +463,71 @@ def init_expectation_configuration_builder(
return expectation_configuration_builder


def resolve_evaluation_dependencies(
parameter_builder: "ParameterBuilder", # noqa: F821
parameter_container: ParameterContainer,
domain: Domain,
variables: Optional[ParameterContainer] = None,
parameters: Optional[Dict[str, ParameterContainer]] = None,
) -> None:
Comment on lines +466 to +472
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a docstring here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdkini I will add it (in the next commit).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdkini I added a docstring. Thanks!

"""
This method computes ("resolves") pre-requisite ("evaluation") dependencies (i.e., results of executing other
"ParameterBuilder" objects), whose output(s) are needed by specified "ParameterBuilder" object to fulfill its goals.
"""

# Step 1: Check if any "evaluation_parameter_builders" are configured for specified "ParameterBuilder" object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loving the comments (super helpful in reading the code) but also wondering if this could be refactored into a few smaller methods. Similar thougts with get_metric() actually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Shinnnyshinshin I would not want to disturb get_metric() now. In terms of this one, perhaps the comments make it parseable in small enough parts. It is pretty much a single thought in that method.

evaluation_parameter_builders: List[
"ParameterBuilder" # noqa: F821
] = parameter_builder.evaluation_parameter_builders
if not evaluation_parameter_builders:
return

# Step 2: Obtain all fully-qualified parameter names ("variables" and "parameter" keys) in namespace of "Domain"
# (fully-qualified parameter names are stored in "ParameterNode" objects of "ParameterContainer" of "Domain"
# whenever "ParameterBuilder.build_parameters()" is executed for "ParameterBuilder.fully_qualified_parameter_name").
fully_qualified_parameter_names: List[str] = get_fully_qualified_parameter_names(
domain=domain,
variables=variables,
parameters=parameters,
)

# Step 3: Check for presence of fully-qualified parameter names of "ParameterBuilder" objects, obtained by iterating
# over evaluation dependencies. "Execute ParameterBuilder.build_parameters()" if absent from "Domain" scoped list.
evaluation_parameter_builder: "ParameterBuilder" # noqa: F821
for evaluation_parameter_builder in evaluation_parameter_builders:
fully_qualified_evaluation_parameter_builder_name: str = (
f"{PARAMETER_KEY}{evaluation_parameter_builder.name}"
)

if (
fully_qualified_evaluation_parameter_builder_name
not in fully_qualified_parameter_names
):
set_batch_list_or_batch_request_on_builder(
builder=evaluation_parameter_builder,
batch_list=parameter_builder.batch_list,
batch_request=parameter_builder.batch_request,
force_batch_data=False,
)

evaluation_parameter_builder.build_parameters(
parameter_container=parameter_container,
domain=domain,
variables=variables,
parameters=parameters,
)

# Step 4: Any "ParameterBuilder" object, including members of "evaluation_parameter_builders" list may be
# configured with its own "evaluation_parameter_builders" list. Recursive call handles such situations.
resolve_evaluation_dependencies(
parameter_builder=evaluation_parameter_builder,
parameter_container=parameter_container,
domain=domain,
variables=variables,
parameters=parameters,
)


def set_batch_list_or_batch_request_on_builder(
builder: Builder,
batch_list: Optional[List[Batch]] = None,
Expand Down
Expand Up @@ -42,6 +42,7 @@ def __init__(
null_count_parameter_builder_name: Optional[str] = None,
metric_domain_kwargs: Optional[Union[str, dict]] = None,
metric_value_kwargs: Optional[Union[str, dict]] = None,
evaluation_parameter_builder_configs: Optional[List[dict]] = None,
json_serialize: Union[str, bool] = True,
batch_list: Optional[List[Batch]] = None,
batch_request: Optional[
Expand All @@ -60,6 +61,8 @@ def __init__(
null_count_parameter_builder_name: name of parameter that computes null_count (of domain values in Batch).
metric_domain_kwargs: used in MetricConfiguration
metric_value_kwargs: used in MetricConfiguration
evaluation_parameter_builder_configs: ParameterBuilder configurations, executing and making whose respective
ParameterBuilder objects' outputs available (as fully-qualified parameter names) is pre-requisite.
json_serialize: If True (default), convert computed value to JSON prior to saving results.
batch_list: explicitly passed Batch objects for parameter computation (take precedence over batch_request).
batch_request: specified in ParameterBuilder configuration to get Batch objects for parameter computation.
Expand All @@ -73,6 +76,7 @@ def __init__(
enforce_numeric_metric=True,
replace_nan_with_zero=True,
reduce_scalar_metric=True,
evaluation_parameter_builder_configs=evaluation_parameter_builder_configs,
json_serialize=json_serialize,
batch_list=batch_list,
batch_request=batch_request,
Expand Down
Expand Up @@ -33,6 +33,7 @@ def __init__(
enforce_numeric_metric: Union[str, bool] = False,
replace_nan_with_zero: Union[str, bool] = False,
reduce_scalar_metric: Union[str, bool] = True,
evaluation_parameter_builder_configs: Optional[List[dict]] = None,
json_serialize: Union[str, bool] = True,
batch_list: Optional[List[Batch]] = None,
batch_request: Optional[
Expand All @@ -52,13 +53,16 @@ def __init__(
replace_nan_with_zero: if False (default), then if the computed metric gives NaN, then exception is raised;
otherwise, if True, then if the computed metric gives NaN, then it is converted to the 0.0 (float) value.
reduce_scalar_metric: if True (default), then reduces computation of 1-dimensional metric to scalar value.
evaluation_parameter_builder_configs: ParameterBuilder configurations, executing and making whose respective
ParameterBuilder objects' outputs available (as fully-qualified parameter names) is pre-requisite.
json_serialize: If True (default), convert computed value to JSON prior to saving results.
batch_list: explicitly passed Batch objects for parameter computation (take precedence over batch_request).
batch_request: specified in ParameterBuilder configuration to get Batch objects for parameter computation.
data_context: DataContext
"""
super().__init__(
name=name,
evaluation_parameter_builder_configs=evaluation_parameter_builder_configs,
json_serialize=json_serialize,
batch_list=batch_list,
batch_request=batch_request,
Expand Down
Expand Up @@ -71,6 +71,7 @@ def __init__(
truncate_values: Optional[
Union[str, Dict[str, Union[Optional[int], Optional[float]]]]
] = None,
evaluation_parameter_builder_configs: Optional[List[dict]] = None,
json_serialize: Union[str, bool] = True,
batch_list: Optional[List[Batch]] = None,
batch_request: Optional[
Expand Down Expand Up @@ -100,6 +101,8 @@ def __init__(
output. If omitted, then no rounding is performed, unless the computed value is already an integer.
truncate_values: user-configured directive for whether or not to allow the computed parameter values
(i.e., lower_bound, upper_bound) to take on values outside the specified bounds when packaged on output.
evaluation_parameter_builder_configs: ParameterBuilder configurations, executing and making whose respective
ParameterBuilder objects' outputs available (as fully-qualified parameter names) is pre-requisite.
json_serialize: If True (default), convert computed value to JSON prior to saving results.
batch_list: explicitly passed Batch objects for parameter computation (take precedence over batch_request).
batch_request: specified in ParameterBuilder configuration to get Batch objects for parameter computation.
Expand All @@ -113,6 +116,7 @@ def __init__(
enforce_numeric_metric=enforce_numeric_metric,
replace_nan_with_zero=replace_nan_with_zero,
reduce_scalar_metric=reduce_scalar_metric,
evaluation_parameter_builder_configs=evaluation_parameter_builder_configs,
json_serialize=json_serialize,
batch_list=batch_list,
batch_request=batch_request,
Expand Down