# How to Build an RBP
* This notebook demonstrates the steps needed to build a simple Rule-Based Profiler (RBP) by initializing its components in memory.

* We start from a new Great Expectations context (i.e. the `great_expectations` folder created by running `great_expectations init`), begin by adding the Datasource, and then progressively add more components.


In [59]:
import great_expectations as ge
from great_expectations.data_context.util import file_relative_path
from ruamel import yaml
from great_expectations.core.batch import BatchRequest
from great_expectations.core import ExpectationSuite
from great_expectations.rule_based_profiler.domain_builder import (
    DomainBuilder,
    SimpleColumnSuffixDomainBuilder,
    SimpleSemanticTypeColumnDomainBuilder,
    ColumnDomainBuilder,
    TableDomainBuilder,
    MapMetricColumnDomainBuilder,
    CategoricalColumnDomainBuilder
)
from great_expectations.rule_based_profiler.parameter_builder import (
    MetricMultiBatchParameterBuilder,
    MeanUnexpectedMapMetricMultiBatchParameterBuilder,
)
from great_expectations.rule_based_profiler.expectation_configuration_builder import (
    DefaultExpectationConfigurationBuilder,
)
from great_expectations.rule_based_profiler.types import ParameterContainer

from great_expectations.rule_based_profiler.rule.rule import Rule
from great_expectations.rule_based_profiler.rule_based_profiler import RuleBasedProfiler


In [60]:
data_context: ge.DataContext = ge.get_context()

## Set-up: Adding taxi_data datasource
* Add taxi_data as a new datasource
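* We use an `InferredAssetFilesystemDataConnector` to connect to the data in the `test_sets/taxi_yellow_tripdata_samples` folder and get one DataAsset (`yellow_tripdata_sample_2018`) that has 12 batches (one batch per month). A second data connector, `default_inferred_data_connector_name_all_years`, groups the files for all years into a single DataAsset (`yellow_tripdata_sample`).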
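
To make the role of `group_names` and `pattern` in the datasource configuration concrete, here is a minimal sketch using only Python's `re` module. It is not how the data connector is implemented; it only illustrates how a regex with capture groups can split a file name into a data asset name and a month. The use of `re.match` and the short file list are assumptions made for this illustration.

```python
import re

# Same pattern and group names as the first data connector in the configuration.
pattern = re.compile(r"(yellow_tripdata_sample_2018)-(\d.*)\.csv")
group_names = ["data_asset_name", "month"]

# Illustrative directory listing (hypothetical subset of the real folder).
file_names = [
    "yellow_tripdata_sample_2018-01.csv",
    "yellow_tripdata_sample_2018-02.csv",
    ".DS_Store",
]

matched = {}
unmatched = []
for file_name in file_names:
    match = pattern.match(file_name)
    if match is None:
        # Files that do not fit the pattern are reported as unmatched.
        unmatched.append(file_name)
    else:
        # Pair each capture group with its configured group name.
        matched[file_name] = dict(zip(group_names, match.groups()))

print(matched)
print(unmatched)
```

Each matched file contributes one batch to the data asset named by the first capture group, which is why one CSV file per month yields one batch per month.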

In [61]:
data_path: str = "../../../../test_sets/taxi_yellow_tripdata_samples"

datasource_config = {
    "name": "taxi_multibatch_datasource_other_possibility",  # must match `datasource_name` in the BatchRequests below
    "class_name": "Datasource",
    "module_name": "great_expectations.datasource",
    "execution_engine": {
        "module_name": "great_expectations.execution_engine",
        "class_name": "PandasExecutionEngine",
    },
    "data_connectors": {
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetFilesystemDataConnector",
            "base_directory": data_path,
            "default_regex": {
                "group_names": ["data_asset_name", "month"],
                "pattern": "(yellow_tripdata_sample_2018)-(\\d.*)\\.csv",
            },
        },
        "default_inferred_data_connector_name_all_years": {
            "class_name": "InferredAssetFilesystemDataConnector",
            "base_directory": data_path,
            "default_regex": {
                "group_names": ["data_asset_name", "year", "month"],
                "pattern": "(yellow_tripdata_sample)_(\\d.*)-(\\d.*)\\.csv",
            },
        },
    },
}

data_context.test_yaml_config(yaml.dump(datasource_config))
data_context.add_datasource(**datasource_config)

Attempting to instantiate class from config...
	Instantiating as a Datasource, since class_name is Datasource
	Successfully instantiated Datasource


ExecutionEngine class name: PandasExecutionEngine
Data Connectors:
	default_inferred_data_connector_name : InferredAssetFilesystemDataConnector

	Available data_asset_names (1 of 1):
		yellow_tripdata_sample_2018 (3 of 12): ['yellow_tripdata_sample_2018-01.csv', 'yellow_tripdata_sample_2018-02.csv', 'yellow_tripdata_sample_2018-03.csv']

	Unmatched data_references (3 of 29):['.DS_Store', 'first_3_files', 'random_subsamples']

	default_inferred_data_connector_name_all_years : InferredAssetFilesystemDataConnector

	Available data_asset_names (1 of 1):
		yellow_tripdata_sample (3 of 36): ['yellow_tripdata_sample_2018-01.csv', 'yellow_tripdata_sample_2018-02.csv', 'yellow_tripdata_sample_2018-03.csv']

	Unmatched data_references (3 of 5):['.DS_Store', 'first_3_files', 'random_subsamples']



<great_expectations.datasource.new_datasource.Datasource at 0x7fab7c137eb0>

# Example 1:  RBP with just a `DomainBuilder` and `ExpectationConfigurationBuilder`

## Build a `DomainBuilder`

* When building a `RuleBasedProfiler`, one of the first components to build and test is the `DomainBuilder`, which returns the domains (in this case, columns of our data) that the Profiler will run on. Which `DomainBuilder` you need is partly driven by the Expectation you want the Profiler to output, but a few `DomainBuilders` also allow more sophisticated filtering of your data, depending on the column name, cardinality, or type.

These include:

 * `SimpleColumnSuffixDomainBuilder`, which allows you to choose columns based on their suffix. Our example uses this `DomainBuilder` so that the Profiler outputs the columns whose names end in "`_amount`".
 * `CategoricalColumnDomainBuilder`, which allows you to choose columns based on their cardinality (number of unique values).
 * `SimpleSemanticTypeColumnDomainBuilder`, which allows you to choose columns based on their semantic types (such as numeric or text).

The following `DomainBuilders` do not do any additional filtering, but are required by the Expectations that the RBP builds:
 * `ColumnDomainBuilder`: Outputs Column Domains, which are required by `ColumnExpectations` like `expect_column_median_to_be_between`.
 * `TableDomainBuilder`: Outputs a Table Domain, which is required by Expectations that act on tables, like `expect_table_row_count_to_equal` or `expect_table_columns_to_match_set`.
 
 * `MapMetricColumnDomainBuilder`, which allows you to choose columns based on MapMetrics like `column_values.nonnull`. The question it answers is how well a column satisfies the metric across batches: a column that satisfies it in all of, say, 100 batches is a good candidate, whereas a column that satisfies it in only 50% of them is not.


#### `SimpleColumnSuffixDomainBuilder`

In [4]:
batch_request: BatchRequest = BatchRequest(
    datasource_name="taxi_multibatch_datasource_other_possibility",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="yellow_tripdata_sample_2018",
    data_connector_query={"index": -1},
)
domain_builder: DomainBuilder = SimpleColumnSuffixDomainBuilder(
    data_context=data_context,
    batch_request=batch_request,
    column_name_suffixes=["_amount"],
)
domains: list = domain_builder.get_domains()


Calculating Metrics:   0%|          | 0/2 [00:00<?, ?it/s]

In [5]:
# assert that the domains we get are the ones we expect
assert len(domains) == 4
assert domains == [
    {"domain_type": "column", "domain_kwargs": {"column": "fare_amount"}},
    {"domain_type": "column", "domain_kwargs": {"column": "tip_amount"}},
    {"domain_type": "column", "domain_kwargs": {"column": "tolls_amount"}},
    {"domain_type": "column", "domain_kwargs": {"column": "total_amount"}},
]
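
The suffix-based selection that produced these four domains can be pictured with plain Python. This is only a sketch of the idea, not the `SimpleColumnSuffixDomainBuilder` implementation; the helper name `columns_to_domains` and the shortened column list are hypothetical.

```python
from typing import Dict, List


def columns_to_domains(column_names: List[str], suffixes: List[str]) -> List[Dict]:
    """Keep the columns whose name ends with one of the suffixes, as column domains."""
    return [
        {"domain_type": "column", "domain_kwargs": {"column": name}}
        for name in column_names
        if name.endswith(tuple(suffixes))
    ]


# Hypothetical subset of the taxi table's columns, for illustration only.
column_names = [
    "vendor_id",
    "pickup_datetime",
    "fare_amount",
    "tip_amount",
    "tolls_amount",
    "total_amount",
]
domains_sketch = columns_to_domains(column_names, ["_amount"])
print([d["domain_kwargs"]["column"] for d in domains_sketch])
```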

To continue our example, we will keep building a `RuleBasedProfiler` using our `SimpleColumnSuffixDomainBuilder`.

# Build Rule
* The first `Rule` that we build will output `expect_column_values_to_not_be_null`, because this Expectation takes no additional information other than the Domain. We will add `ParameterBuilders` in a subsequent example.

In [6]:

default_expectation_configuration_builder = DefaultExpectationConfigurationBuilder(
    expectation_type="expect_column_values_to_not_be_null",
    column="$domain.domain_kwargs.column", # domain kwargs that are retrieved from DomainBuilder
)

In [7]:
simple_rule: Rule = Rule(
    name="rule_with_no_variables_no_parameters",
    domain_builder=domain_builder,
    expectation_configuration_builders=[default_expectation_configuration_builder],
)

# Create RuleBasedProfiler and add Rule
* We create a simple `RuleBasedProfiler` and add the `Rule` that we built in the previous step to it. When we run the Profiler, the output is an `ExpectationSuite` with 4 `Expectations`, which is what we expect.

In [8]:
my_rbp: RuleBasedProfiler = RuleBasedProfiler(
    name="my_simple_rbp", data_context=data_context, config_version=1.0
)

In [9]:
my_rbp.add_rule(rule=simple_rule)


In [10]:
res: ExpectationSuite = my_rbp.run()

Could not convert existing variables to dict: 'NoneType' object is not subscriptable


Calculating Metrics:   0%|          | 0/2 [00:00<?, ?it/s]

In [11]:
assert len(res.expectations) == 4

In [12]:
res.expectations

[{"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "fare_amount"}, "meta": {}},
 {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "tip_amount"}, "meta": {}},
 {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "tolls_amount"}, "meta": {}},
 {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "total_amount"}, "meta": {}}]
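
Conceptually, a `Rule` without `ParameterBuilders` pairs every domain with one expectation configuration. The sketch below mimics that pairing with plain dictionaries so the shape of the result above is easy to trace; the function name `configurations_for_domains` is hypothetical and the code is not taken from the profiler itself.

```python
from typing import Dict, List


def configurations_for_domains(domains: List[Dict], expectation_type: str) -> List[Dict]:
    """Emit one expectation configuration per column domain."""
    return [
        {
            "expectation_type": expectation_type,
            "kwargs": {"column": domain["domain_kwargs"]["column"]},
            "meta": {},
        }
        for domain in domains
    ]


# The four column domains returned by the DomainBuilder in this example.
example_domains = [
    {"domain_type": "column", "domain_kwargs": {"column": name}}
    for name in ["fare_amount", "tip_amount", "tolls_amount", "total_amount"]
]
configurations = configurations_for_domains(
    example_domains, "expect_column_values_to_not_be_null"
)
print(len(configurations))
```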

# Example 2: RBP with `DomainBuilder`, `ParameterBuilder`, and `ExpectationConfigurationBuilder`

## Build a DomainBuilder
* Using same `SimpleColumnSuffixDomainBuilder` from previous example.

In [13]:
batch_request: BatchRequest = BatchRequest(
    datasource_name="taxi_multibatch_datasource_other_possibility",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="yellow_tripdata_sample_2018",
    data_connector_query={"index": -1},
)
domain_builder: DomainBuilder = SimpleColumnSuffixDomainBuilder(
    data_context=data_context,
    batch_request=batch_request,
    column_name_suffixes=["_amount"],
)
domains: list = domain_builder.get_domains()


Calculating Metrics:   0%|          | 0/2 [00:00<?, ?it/s]

## Build a ParameterBuilder

`ParameterBuilders` help calculate "reasonable" parameters for Expectations based on data that is specified by a `BatchRequest`.

The largest categories include:
- `metric_multi_batch_parameter_builder`, which can calculate a numeric metric (like `column.min`) across multiple batches (or just one batch).
- `value_set_multi_batch_parameter_builder`, which can build a value set across multiple batches (or just one batch).

In special cases, there is a better way to build a value set, using regex or dates:
- `regex_pattern_string_parameter_builder`, which contains a set of default regex patterns and builds a value set of the best-matching patterns. Users can also pass in new patterns as a parameter.
- `simple_date_format_string_parameter_builder`, which contains a set of default date-time format patterns and builds a value set of the best-matching patterns. Users can also pass in new patterns as a parameter.

Across multiple batches, we can build more sophisticated parameters by using sampling methods:
- `numeric_metric_range_multi_batch_parameter_builder`, which can provide range estimates across batches using sampling methods. For instance, if we expect a table's row count to change between batches, we could calculate the min/max values of the row count by using the `NumericMetricRangeMultiBatchParameterBuilder`. These parameters could then be used by `ExpectTableRowCountToBeBetween`.
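
The first two categories can be illustrated without Great Expectations at all. The sketch below assumes each batch is simply a dictionary of column values; the data and the helper names `metric_per_batch` and `value_set` are hypothetical and only show what "a metric across batches" and "a value set across batches" mean.

```python
from typing import Callable, Dict, List

# Hypothetical data: two batches, each holding a few values per column.
batches: List[Dict[str, list]] = [
    {"vendor_id": [1, 2, 1], "fare_amount": [5.0, 12.5, 7.0]},
    {"vendor_id": [2, 4], "fare_amount": [-80.0, 9.5]},
]


def metric_per_batch(batches: List[Dict[str, list]], column: str, metric: Callable) -> list:
    """Evaluate `metric` on `column` once per batch, preserving batch order."""
    return [metric(batch[column]) for batch in batches]


def value_set(batches: List[Dict[str, list]], column: str) -> list:
    """Union of the distinct values of `column` over all batches, sorted."""
    distinct = set()
    for batch in batches:
        distinct.update(batch[column])
    return sorted(distinct)


column_min_values = metric_per_batch(batches, "fare_amount", min)
vendor_ids = value_set(batches, "vendor_id")
print(column_min_values)  # one minimum per batch
print(vendor_ids)
```

With a single batch, both helpers degrade gracefully to the single-batch case, which is the situation in the examples of this notebook that request only the last batch.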


In [14]:
numeric_range_parameter_builder: MetricMultiBatchParameterBuilder = (
    MetricMultiBatchParameterBuilder(
        data_context=data_context,
        batch_request=batch_request,
        metric_name="column.min",
        metric_domain_kwargs="$domain.domain_kwargs",  # domain kwarg values are accessible using fully qualified parameters
        name="my_column_min",
    )
)

## Build an ExpectationConfigurationBuilder

The `ExpectationConfigurationBuilder` is built for `expect_column_values_to_be_greater_than`, which uses the `column.min` values calculated by the `ParameterBuilder`. These values are accessible through the fully qualified parameter `$parameter.my_column_min.value[-1]`. For more details on why this is the case, see the Appendix.

In [15]:
config_builder: DefaultExpectationConfigurationBuilder = (
    DefaultExpectationConfigurationBuilder(
        expectation_type="expect_column_values_to_be_greater_than",
        value="$parameter.my_column_min.value[-1]", # the parameter is accessible using a fully qualified parameter
        column="$domain.domain_kwargs.column", # domain kwarg values are accessible using fully qualified parameters
        name="my_column_min",
    )
)
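
To show what a fully qualified parameter name refers to, here is a minimal resolver written for this notebook. It is an illustration under simplifying assumptions, not the library's resolution logic: the `namespaces` dictionary, its flattened layout, and the `resolve` function are all hypothetical.

```python
import re
from typing import Any, Dict

# One path segment: a key optionally followed by list indexes, e.g. "value[-1]".
_SEGMENT = re.compile(r"^(\w+)((?:\[-?\d+\])*)$")


def resolve(name: Any, namespaces: Dict[str, Any]) -> Any:
    """Resolve a "$"-prefixed dotted name against nested dictionaries and lists."""
    if not (isinstance(name, str) and name.startswith("$")):
        return name  # literal values are passed through unchanged
    node: Any = namespaces
    for segment in name[1:].split("."):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"Malformed segment: {segment!r}")
        key, indexes = match.groups()
        node = node[key]
        for index in re.findall(r"\[(-?\d+)\]", indexes):
            node = node[int(index)]
    return node


# Simplified stand-ins for one domain and its computed parameters.
namespaces = {
    "domain": {"domain_type": "column", "domain_kwargs": {"column": "fare_amount"}},
    "parameter": {"my_column_min": {"value": [-80.0]}},
}

print(resolve("$domain.domain_kwargs.column", namespaces))
print(resolve("$parameter.my_column_min.value[-1]", namespaces))
```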

## Build a `Rule`, `RuleBasedProfiler`, and run 

We now combine the `DomainBuilder`, `ParameterBuilder`, and `ExpectationConfigurationBuilder` into a `Rule`, add the `Rule` to a new `RuleBasedProfiler`, and run the Profiler.

In [16]:
simple_rule: Rule = Rule(
    name="rule_with_variables_and_parameters",
    domain_builder=domain_builder,
    parameter_builders=[numeric_range_parameter_builder],
    expectation_configuration_builders=[config_builder],
)

In [17]:
my_rbp = RuleBasedProfiler(
    name="my_rbp", data_context=data_context, config_version=1.0
)


In [18]:
my_rbp.add_rule(rule=simple_rule)

In [19]:
res: ExpectationSuite = my_rbp.run()

Could not convert existing variables to dict: 'NoneType' object is not subscriptable


Calculating Metrics:   0%|          | 0/2 [00:00<?, ?it/s]

Calculating Metrics:   0%|          | 0/4 [00:00<?, ?it/s]

Calculating Metrics:   0%|          | 0/4 [00:00<?, ?it/s]

Calculating Metrics:   0%|          | 0/4 [00:00<?, ?it/s]

Calculating Metrics:   0%|          | 0/4 [00:00<?, ?it/s]

In [20]:
assert len(res.expectations) == 4

In [21]:
res.expectations

[{"expectation_type": "expect_column_values_to_be_greater_than", "kwargs": {"name": "my_column_min", "value": -80.0, "column": "fare_amount"}, "meta": {}},
 {"expectation_type": "expect_column_values_to_be_greater_than", "kwargs": {"name": "my_column_min", "value": 0.0, "column": "tip_amount"}, "meta": {}},
 {"expectation_type": "expect_column_values_to_be_greater_than", "kwargs": {"name": "my_column_min", "value": 0.0, "column": "tolls_amount"}, "meta": {}},
 {"expectation_type": "expect_column_values_to_be_greater_than", "kwargs": {"name": "my_column_min", "value": -80.3, "column": "total_amount"}, "meta": {}}]

The resulting `ExpectationSuite` now contains values (`-80.0`, `0.0`, etc.) that were calculated from the `Batch` of data defined by the `BatchRequest`.


# Appendix
* Additional examples that were left out of the main walkthrough to keep it easy to follow.

## `DomainBuilders`

#### `ColumnDomainBuilder`

In [22]:
batch_request: BatchRequest = BatchRequest(
    datasource_name="taxi_multibatch_datasource_other_possibility",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="yellow_tripdata_sample_2018",
)

In [23]:
domain_builder: DomainBuilder = ColumnDomainBuilder(
    data_context=data_context,
    batch_request=batch_request,
)
domains: list = domain_builder.get_domains()
assert len(domains) == 18

Calculating Metrics:   0%|          | 0/2 [00:00<?, ?it/s]

#### `TableDomainBuilder`

In [24]:
domain_builder: DomainBuilder = TableDomainBuilder(
    data_context=data_context,
    batch_request=batch_request,
)
domains: list = domain_builder.get_domains()
domains

[{
   "domain_type": "table"
 }]

#### `MapMetricColumnDomainBuilder`

In [25]:
domain_builder: DomainBuilder = MapMetricColumnDomainBuilder(
    data_context=data_context,
    batch_request=batch_request,
    map_metric_name="column_values.nonnull"
)
domains: list = domain_builder.get_domains()
len(domains) == 17 # filtered 1 column that was all Null

Calculating Metrics:   0%|          | 0/2 [00:00<?, ?it/s]

Calculating Metrics:   0%|          | 0/12 [00:00<?, ?it/s]

Calculating Metrics:   0%|          | 0/468 [00:00<?, ?it/s]

True

#### `CategoricalColumnDomainBuilder`

batch_request: BatchRequest = BatchRequest(
    datasource_name="taxi_multibatch_datasource_other_possibility",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="yellow_tripdata_sample_2018",
)

In [26]:
domain_builder: DomainBuilder = CategoricalColumnDomainBuilder(
    batch_request=batch_request,
    data_context=data_context,
    limit_mode="very_few", # VERY_FEW = 10 or less
)

* The `CategoricalColumnDomainBuilder` will take in various `limit_modes` for cardinality. For a list of valid modes, along with the associated values, please refer to:

https://github.com/great-expectations/great_expectations/blob/dd5aee4a1c47c4d2ab2352e330b31fa686fdaf10/great_expectations/rule_based_profiler/helpers/cardinality_checker.py#L38

In [27]:
domains: list = domain_builder.get_domains()
assert len(domains) == 9

Calculating Metrics:   0%|          | 0/2 [00:00<?, ?it/s]

Calculating Metrics:   0%|          | 0/252 [00:00<?, ?it/s]

#### `SimpleSemanticTypeColumnDomainBuilder`


In [28]:
domain_builder: DomainBuilder = SimpleSemanticTypeColumnDomainBuilder(
    batch_request=batch_request,
    data_context=data_context,
    semantic_types=['numeric']
)

In [29]:
domains: list = domain_builder.get_domains()
assert len(domains) == 15

Calculating Metrics:   0%|          | 0/2 [00:00<?, ?it/s]

Calculating Metrics:   0%|          | 0/1 [00:00<?, ?it/s]

## `ParameterBuilders`

`ParameterBuilders` work under the hood by populating a `ParameterContainer`, which can be shared by multiple `ParameterBuilders`. Building a parameter requires a domain and a `metric_name`; the `domain_kwargs` produced by the `DomainBuilder` are accessible using the fully qualified parameter `$domain.domain_kwargs`.



#### `MetricMultiBatchParameterBuilder`

A Single/Multi-Batch implementation for obtaining a resolved (evaluated) metric, using domain_kwargs, value_kwargs, and metric_name as arguments.

In [30]:
single_batch_batch_request: BatchRequest = BatchRequest(
    datasource_name="taxi_multibatch_datasource_other_possibility",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="yellow_tripdata_sample_2018",
    data_connector_query={"index": -1},
)
domain_builder: DomainBuilder = SimpleColumnSuffixDomainBuilder(
    data_context=data_context,
    batch_request=single_batch_batch_request,
    column_name_suffixes=["_amount"],
)

In [31]:
domains = domain_builder.get_domains()

Calculating Metrics:   0%|          | 0/2 [00:00<?, ?it/s]

In [32]:
domain_to_build_param = domains[0]

In [33]:
numeric_range_parameter_builder: MetricMultiBatchParameterBuilder = (
    MetricMultiBatchParameterBuilder(
        data_context=data_context,
        batch_request=single_batch_batch_request,
        metric_name="column.min",
        metric_domain_kwargs="$domain.domain_kwargs",
        name="my_column_min",
    )
)

In [34]:
parameter_container: ParameterContainer = ParameterContainer(parameter_nodes=None)

In [35]:
# Here we are just passing in domain
numeric_range_parameter_builder.build_parameters(domain=domain_to_build_param, parameter_container=parameter_container)
# we check the parameter container
print(parameter_container.parameter_nodes)


Calculating Metrics:   0%|          | 0/4 [00:00<?, ?it/s]

{'parameter': {'parameter': {'my_column_min': {'value': [-80.0], 'details': {'metric_configuration': {'metric_name': 'column.min', 'domain_kwargs': {'column': 'fare_amount'}, 'metric_value_kwargs': None, 'metric_dependencies': None}, 'num_batches': 1}}}}}


This means that the `ExpectationConfigurationBuilder` can access the parameter through the fully qualified parameter `$parameter.my_column_min.value[-1]`: `value` is a list (here with a single entry, since `num_batches` is 1), and `[-1]` selects its last element.

#### `ValueSetMultiBatchParameterBuilder`

In [102]:
from great_expectations.rule_based_profiler.types.domain import Domain
from great_expectations.execution_engine.execution_engine import MetricDomainTypes
from great_expectations.rule_based_profiler.parameter_builder.value_set_multi_batch_parameter_builder import (
    ValueSetMultiBatchParameterBuilder,
    _get_unique_values_from_nested_collection_of_sets,
)
from great_expectations.rule_based_profiler.parameter_builder.regex_pattern_string_parameter_builder import RegexPatternStringParameterBuilder
from great_expectations.rule_based_profiler.parameter_builder.simple_date_format_string_parameter_builder import SimpleDateFormatStringParameterBuilder
from great_expectations.rule_based_profiler.parameter_builder.numeric_metric_range_multi_batch_parameter_builder import NumericMetricRangeMultiBatchParameterBuilder



In [68]:
single_batch_batch_request: BatchRequest = BatchRequest(
    datasource_name="taxi_multibatch_datasource_other_possibility",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="yellow_tripdata_sample_2018",
    data_connector_query={"index": -1},
)
domain: Domain = Domain(domain_type=MetricDomainTypes.COLUMN, domain_kwargs = {'column': 'vendor_id'})

In [69]:
parameter_container: ParameterContainer = ParameterContainer(parameter_nodes=None)

In [70]:
single_batch_request_dict: dict = {
    "datasource_name":"taxi_multibatch_datasource_other_possibility",
    "data_connector_name": "default_inferred_data_connector_name",
    "data_asset_name":"yellow_tripdata_sample_2018",
}
    

In [71]:
value_set_parameter_builder: ValueSetMultiBatchParameterBuilder = (
    ValueSetMultiBatchParameterBuilder(
        data_context=data_context,
        batch_request=single_batch_batch_request,
        metric_domain_kwargs=domain.domain_kwargs,
        name="my_value_set",
    )
)

In [72]:
value_set_parameter_builder.build_parameters(
    parameter_container=parameter_container,
    domain=domain,
    parameters = {domain.id: parameter_container}
)

Calculating Metrics:   0%|          | 0/4 [00:00<?, ?it/s]

In [73]:
print(parameter_container.parameter_nodes)


{'parameter': {'parameter': {'my_value_set': {'value': [1, 2, 4], 'details': {'metric_configuration': {'metric_name': 'column.distinct_values', 'domain_kwargs': {'column': 'vendor_id'}, 'metric_value_kwargs': None, 'metric_dependencies': None}, 'num_batches': 1}}}}}


#### `RegexPatternStringParameterBuilder`

In [74]:
single_batch_batch_request: BatchRequest = BatchRequest(
    datasource_name="taxi_multibatch_datasource_other_possibility",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="yellow_tripdata_sample_2018",
    data_connector_query={"index": -1},
)
domain: Domain = Domain(domain_type=MetricDomainTypes.COLUMN, domain_kwargs = {'column': 'vendor_id'})

In [75]:
parameter_container: ParameterContainer = ParameterContainer(parameter_nodes=None)

In [76]:
regex_parameter_builder: RegexPatternStringParameterBuilder = (
    RegexPatternStringParameterBuilder(
        data_context=data_context,
        batch_request=single_batch_batch_request,
        metric_domain_kwargs=domain.domain_kwargs,
        name="my_value_set",
    )
)

In [80]:
regex_parameter_builder.build_parameters(
    parameter_container=parameter_container,
    domain=domain,
    parameters = {domain.id: parameter_container}
)

Calculating Metrics:   0%|          | 0/6 [00:00<?, ?it/s]

Calculating Metrics:   0%|          | 0/25 [00:00<?, ?it/s]



In [81]:
print(parameter_container.parameter_nodes)


{'parameter': {'parameter': {'my_value_set': {'value': [], 'details': {'evaluated_regexes': {'^\\s+/': 0.0, '\\s+/$': 0.0, '\\b[0-9a-fA-F]{8}\\b-[0-9a-fA-F]{4}-[0-5][0-9a-fA-F]{3}-[089ab][0-9a-fA-F]{3}-\\b[0-9a-fA-F]{12}\\b ': 0.0, '/https?:\\/\\/(www\\.)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%_\\+.~#()?&//=]*)/': 0.0, '/\\d+/': 0.0, '/[A-Za-z0-9\\.,;:!?()\\"\'%\\-]+/': 0.0, '/<\\/?(?:p|a|b|img)(?: \\/)?>/': 0.0, '/-?\\d+/': 0.0, '/-?\\d+(\\.\\d*)?/': 0.0, '/(?:[A-Fa-f0-9]){0,4}(?: ?:? ?(?:[A-Fa-f0-9]){0,4}){0,7}/': 0.0, '/(?:25[0-5]|2[0-4]\\d|[01]\\d{2}|\\d{1,2})(?:.(?:25[0-5]|2[0-4]\\d|[01]\\d{2}|\\d{1,2})){3}/': 0.0}, 'threshold': 1.0}}}}}
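
In the output above, every candidate regex scored `0.0` against a `threshold` of `1.0`, which is why `value` is an empty list. The sketch below shows one way such a score could be computed: the fraction of values that match each candidate, with only candidates at or above the threshold kept. The function `evaluate_regexes`, the sample values, and the use of `re.search` are assumptions for illustration and are not taken from the `RegexPatternStringParameterBuilder` source.

```python
import re
from typing import Dict, List, Tuple


def evaluate_regexes(
    values: List[str], candidate_regexes: List[str], threshold: float = 1.0
) -> Tuple[List[str], Dict[str, float]]:
    """Score each regex by the fraction of values it matches; keep those >= threshold."""
    ratios: Dict[str, float] = {}
    for regex in candidate_regexes:
        compiled = re.compile(regex)
        hits = sum(1 for value in values if compiled.search(value) is not None)
        ratios[regex] = hits / len(values) if values else 0.0
    selected = [regex for regex, ratio in ratios.items() if ratio >= threshold]
    return selected, ratios


# Hypothetical column values and candidate patterns.
sample_values = ["12", "-7", "abc"]
candidates = [r"^\d+$", r"^-?\d+$", r"^[a-z]+$"]

strict_selection, ratios = evaluate_regexes(sample_values, candidates, threshold=1.0)
loose_selection, _ = evaluate_regexes(sample_values, candidates, threshold=0.6)
print(strict_selection)
print(loose_selection)
```

With the strict threshold no pattern covers all three values, so nothing is selected; lowering the threshold lets the signed-integer pattern through.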


#### `SimpleDateFormatStringParameterBuilder`

In [91]:
single_batch_batch_request: BatchRequest = BatchRequest(
    datasource_name="taxi_multibatch_datasource_other_possibility",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="yellow_tripdata_sample_2018",
    data_connector_query={"index": -1},
)
domain: Domain = Domain(domain_type=MetricDomainTypes.COLUMN, domain_kwargs = {'column': 'pickup_datetime'})

In [92]:
parameter_container: ParameterContainer = ParameterContainer(parameter_nodes=None)

In [93]:
simple_date_format_string_parameter_builder: SimpleDateFormatStringParameterBuilder = (
    SimpleDateFormatStringParameterBuilder(
        data_context=data_context,
        batch_request=single_batch_batch_request,
        metric_domain_kwargs=domain.domain_kwargs,
        name="my_value_set",
    )
)

In [95]:
simple_date_format_string_parameter_builder.build_parameters(
    parameter_container=parameter_container,
    domain=domain,
    parameters = {domain.id: parameter_container}
)

Calculating Metrics:   0%|          | 0/6 [00:00<?, ?it/s]

Calculating Metrics:   0%|          | 0/121 [00:00<?, ?it/s]

In [96]:
print(parameter_container.parameter_nodes)

{'parameter': {'parameter': {'my_value_set': {'value': '%Y-%m-%d %H:%M:%S', 'details': {'success_ratio': 1.0, 'candidate_strings': ['%H:%M:%S', '%H:%M:%S,%f', '%H:%M:%S.%f', '%Y %b %d %H:%M:%S.%f', '%Y %b %d %H:%M:%S.%f %Z', '%Y %b %d %H:%M:%S.%f*%Z', '%Y%m%d %H:%M:%S.%f', '%Y-%m-%d', '%Y-%m-%d %H:%M:%S', '%Y-%m-%d %H:%M:%S %z', '%Y-%m-%d %H:%M:%S%z', '%Y-%m-%d %H:%M:%S,%f', '%Y-%m-%d %H:%M:%S,%f%z', '%Y-%m-%d %H:%M:%S.%f', '%Y-%m-%d %H:%M:%S.%f%z', "%Y-%m-%d'T'%H:%M:%S", "%Y-%m-%d'T'%H:%M:%S%z", "%Y-%m-%d'T'%H:%M:%S'%z'", "%Y-%m-%d'T'%H:%M:%S.%f", "%Y-%m-%d'T'%H:%M:%S.%f'%z'", '%Y-%m-%d*%H:%M:%S', '%Y-%m-%d*%H:%M:%S:%f', '%Y-%m-%dT%z', '%Y/%m/%d', '%Y/%m/%d*%H:%M:%S', '%b %d %H:%M:%S', '%b %d %H:%M:%S %Y', '%b %d %H:%M:%S %z', '%b %d %H:%M:%S %z %Y', '%b %d %Y %H:%M:%S', '%b %d, %Y %H:%M:%S %p', '%d %b %Y %H:%M:%S', '%d %b %Y %H:%M:%S*%f', '%d-%b-%Y %H:%M:%S', '%d-%b-%Y %H:%M:%S.%f', '%d-%m-%Y', '%d/%b %H:%M:%S,%f', '%d/%b/%Y %H:%M:%S', '%d/%b/%Y:%H:%M:%S', '%d/%b/%Y:%H:%M:%S %z', '%d
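
The `success_ratio` and `candidate_strings` entries in this output suggest that each candidate format is tried against the column values and scored. A minimal sketch of that idea using `datetime.strptime` follows; the helper names, the sample timestamps, and the short candidate list are hypothetical, and the real builder may score candidates differently.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def success_ratio(values: List[str], date_format: str) -> float:
    """Fraction of values that parse completely with the given strptime format."""
    if not values:
        return 0.0
    parsed = 0
    for value in values:
        try:
            datetime.strptime(value, date_format)
            parsed += 1
        except ValueError:
            pass  # this value does not fit the format
    return parsed / len(values)


def best_format(
    values: List[str], candidate_formats: List[str], threshold: float = 1.0
) -> Tuple[Optional[str], Dict[str, float]]:
    """Return the highest-scoring format if it reaches the threshold, else None."""
    ratios = {fmt: success_ratio(values, fmt) for fmt in candidate_formats}
    fmt, ratio = max(ratios.items(), key=lambda item: item[1])
    return (fmt if ratio >= threshold else None), ratios


# Hypothetical pickup timestamps and a few of the candidate formats.
pickup_values = ["2018-12-01 00:15:32", "2018-12-31 23:59:59"]
formats = ["%Y-%m-%d", "%H:%M:%S", "%Y-%m-%d %H:%M:%S"]
chosen_format, format_ratios = best_format(pickup_values, formats)
print(chosen_format)
```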

#### `NumericMetricRangeMultiBatchParameterBuilder`

* A Multi-Batch implementation for obtaining the range estimation bounds for a resolved (evaluated) numeric metric, using domain_kwargs, value_kwargs, metric_name, and false_positive_rate (tolerance) as arguments.  This Multi-Batch ParameterBuilder is general in the sense that any metric that computes numbers can be accommodated.
On the other hand, it is specific in the sense that the parameter names will always have the semantics of numeric ranges, which will incorporate the requirements, imposed by the configured false_positive_rate tolerances.

The implementation supports two methods of estimating parameter values from data:
* bootstrapped (default) -- a statistical technique (see "https://en.wikipedia.org/wiki/Bootstrapping_(statistics)")
* one-shot -- assumes that metric values, computed on batch data, are normally distributed and computes the mean
  and the standard error using the queried batches as the single sample of the distribution (fast, but inaccurate).
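
The difference between the two estimation methods can be sketched with the standard library alone. The code below is an illustration, not the `NumericMetricRangeMultiBatchParameterBuilder` algorithm: the per-batch metric values are invented, the one-shot variant uses the sample standard deviation as its measure of spread, and the bootstrap variant averages the lower and upper quantiles over resamples. All of these choices are assumptions of this sketch.

```python
import math
import random
import statistics
from typing import List, Tuple


def quantile(sorted_values: List[float], q: float) -> float:
    """Linearly interpolated quantile of an already sorted list."""
    position = q * (len(sorted_values) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def one_shot_range(metric_values: List[float], false_positive_rate: float) -> Tuple[float, float]:
    """Assume normally distributed metric values and return mean +/- z * spread."""
    mean = statistics.fmean(metric_values)
    spread = statistics.stdev(metric_values)
    z = statistics.NormalDist().inv_cdf(1.0 - false_positive_rate / 2.0)
    return mean - z * spread, mean + z * spread


def bootstrap_range(
    metric_values: List[float], false_positive_rate: float, n_resamples: int = 2000, seed: int = 0
) -> Tuple[float, float]:
    """Resample with replacement and average the tail quantiles over all resamples."""
    rng = random.Random(seed)
    lowers, uppers = [], []
    for _ in range(n_resamples):
        resample = sorted(rng.choices(metric_values, k=len(metric_values)))
        lowers.append(quantile(resample, false_positive_rate / 2.0))
        uppers.append(quantile(resample, 1.0 - false_positive_rate / 2.0))
    return statistics.fmean(lowers), statistics.fmean(uppers)


# Hypothetical `column.mean` values, one per monthly batch.
monthly_means = [15.2, 15.9, 16.1, 15.5, 16.4, 16.0, 15.8, 16.3, 15.7, 16.2, 15.6, 16.5]
one_shot_bounds = one_shot_range(monthly_means, false_positive_rate=1.0e-2)
bootstrap_bounds = bootstrap_range(monthly_means, false_positive_rate=1.0e-2)
print(one_shot_bounds)
print(bootstrap_bounds)
```

The one-shot bounds depend entirely on the normality assumption, whereas the bootstrap bounds are derived from the observed values themselves and therefore never leave the observed minimum and maximum in this sketch.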

In [99]:
multi_batch_batch_request: BatchRequest = BatchRequest(
    datasource_name="taxi_multibatch_datasource_other_possibility",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="yellow_tripdata_sample_2018",
)  # no `data_connector_query`, so all batches of the asset are requested
domain: Domain = Domain(domain_type=MetricDomainTypes.COLUMN, domain_kwargs={"column": "total_amount"})

In [103]:
numeric_metric_range_parameter_builder: NumericMetricRangeMultiBatchParameterBuilder = NumericMetricRangeMultiBatchParameterBuilder(
    name="column_mean_range",  # describes the metric being estimated (`column.mean`)
    metric_name="column.mean",
    sampling_method="bootstrap",
    metric_domain_kwargs=domain.domain_kwargs,
    false_positive_rate=1.0e-2,
    round_decimals=0,
    data_context=data_context,
    batch_request=multi_batch_batch_request,
)

In [104]:
# Use a fresh container so that results from earlier builders are not mixed in.
parameter_container: ParameterContainer = ParameterContainer(parameter_nodes=None)
numeric_metric_range_parameter_builder.build_parameters(
    parameter_container=parameter_container,
    domain=domain,
    parameters={domain.id: parameter_container},
)

In [105]:
print(parameter_container.parameter_nodes)