-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MAINTENANCE] Update to MultiBatch Notebook to include Inferred - Sql #5958
[MAINTENANCE] Update to MultiBatch Notebook to include Inferred - Sql #5958
Conversation
✅ Deploy Preview for niobium-lead-7998 ready!
To edit notification comments on pull requests, go to your Netlify site settings. |
👇 Click on the image for a new way to code review
Legend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Thank you for clearly laying this out. I have some questions and suggested changes. Happy to discuss if helpful.
...xtures/rule_based_profiler/example_notebooks/MultiBatchExample_InferredAssetSQLExample.ipynb
Outdated
Show resolved
Hide resolved
...xtures/rule_based_profiler/example_notebooks/MultiBatchExample_InferredAssetSQLExample.ipynb
Outdated
Show resolved
Hide resolved
...xtures/rule_based_profiler/example_notebooks/MultiBatchExample_InferredAssetSQLExample.ipynb
Outdated
Show resolved
Hide resolved
...xtures/rule_based_profiler/example_notebooks/MultiBatchExample_InferredAssetSQLExample.ipynb
Outdated
Show resolved
Hide resolved
{ | ||
"data": { | ||
"text/plain": [ | ||
"[<great_expectations.core.batch.Batch at 0x7f7d61cd5f40>,\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this showing 12 batches if it is called a single_batch_batch_request?
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"validator.save_expectation_suite()" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should look at the expectation stored in the suite here (and see the params stored, discuss that they are from the min/max median value of all batches)?
"id": "477381f5", | ||
"metadata": {}, | ||
"source": [ | ||
"Now the ExpectationSuite can be used to validate single batches using a Checkpoint. In our example, let's validate a different table, `yellow_tripdata_sample_2020_02`, using the `ExpectationSuite` we built from `yellow_tripdata_sample_2020_01`." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure where these tables are coming from - I thought we only had one table yellow_tripdata_sample_2020
and we are pulling different sets of batches from that?
"id": "477381f5", | ||
"metadata": {}, | ||
"source": [ | ||
"Now the ExpectationSuite can be used to validate single batches using a Checkpoint. In our example, let's validate a different table, `yellow_tripdata_sample_2020_02`, using the `ExpectationSuite` we built from `yellow_tripdata_sample_2020_01`." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Now the ExpectationSuite can be used to validate single batches using a Checkpoint. In our example, let's validate a different table, `yellow_tripdata_sample_2020_02`, using the `ExpectationSuite` we built from `yellow_tripdata_sample_2020_01`." | |
"Now the ExpectationSuite we built using all batches can be used to validate single batches using a Checkpoint. For example, we can run this checkpoint on new data when it comes in next month. In our example, let's validate a different table, `yellow_tripdata_sample_2020_02`, using the `ExpectationSuite` we built from `yellow_tripdata_sample_2020_01`." |
{ | ||
"data": { | ||
"text/plain": [ | ||
"False" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be True if we created the expectation based on all of the batches?
"The signature of the `InferredAssetSqlDataConnector` also contains the following required parameters:\n", | ||
"- `name`: string describing the name of this DataConnector.\n", | ||
"- `datasource_name`: the name of the Datasource that contains it.\n", | ||
"- `execution_engine`: an ExecutionEngine.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"- `execution_engine`: an ExecutionEngine.\n", | |
"- `execution_engine`: The type of ExecutionEngine to use.\n", |
…i-batch-notebook * develop: [Maintenance] Randomize the non-comprehensive tests (#5968) [RELEASE] 0.15.22 (#5973) [BUGFIX] Spark column.distinct_values no longer returns entire table distinct values (#5969) [MAINTENANCE] Use DataContext to ignore progress bars (#5959) [MAINTENANCE] Bump `Marshmallow` upper bound to work with Airflow operator (#5952) [MAINTENANCE] Add x-fails to flaky Cloud tests for purposes of 0.15.22 (#5964) [BUGFIX] Prevent "division by zero" errors in Rule-Based Profiler calculations when Batch has zero rows (#5960) [FEATURE] Improve slack error condition (#5818)
…ok' of https://github.com/great-expectations/great_expectations into m/GREAT-1226/GREAT-1228/inferred-sql-multi-batch-notebook * 'm/GREAT-1226/GREAT-1228/inferred-sql-multi-batch-notebook' of https://github.com/great-expectations/great_expectations: [MAINTENANCE] Expectation suite new unit tests for add_citation (#5966) [MAINTENANCE] Expectation suite init unit tests + types (#5957) [MAINTENANCE] DatasourceStore refactoring (#5941)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just some tiny nits
"metadata": {}, | ||
"source": [ | ||
"### Typical Workflow\n", | ||
"A `batch_list` becomes really useful when you are calculating parameters for auto-initializing Expectations, as they us a `RuleBasedProfiler` under-the-hood to calculate parameters." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"A `batch_list` becomes really useful when you are calculating parameters for auto-initializing Expectations, as they us a `RuleBasedProfiler` under-the-hood to calculate parameters." | |
"A `batch_list` becomes really useful when you are calculating parameters for auto-initializing Expectations, as they use a `RuleBasedProfiler` under-the-hood to calculate parameters." |
"id": "f18d096f", | ||
"metadata": {}, | ||
"source": [ | ||
"The observed value for our `yellow_tripdata_sample_2020` table where `trip_distance` is going to be `1.75`, which means the Expectation fails. We guessed wrong - but we can do better!\"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The observed value for our `yellow_tripdata_sample_2020` table where `trip_distance` is going to be `1.75`, which means the Expectation fails. We guessed wrong - but we can do better!\"" | |
"The observed value for our `yellow_tripdata_sample_2020` table where `trip_distance` is going to be `1.75`, which means the Expectation fails. We guessed wrong - but we can do better!" |
"id": "477381f5", | ||
"metadata": {}, | ||
"source": [ | ||
"Now the ExpectationSuite we built using all batches can be used to validate single batches using a Checkpoint. For example, we can run this checkpoint on new data when it comes in next month. In our example, let's validate a different table, `yellow_tripdata_sample_2020_02`, using the `ExpectationSuite` we built from `yellow_tripdata_sample_2020`." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Now the ExpectationSuite we built using all batches can be used to validate single batches using a Checkpoint. For example, we can run this checkpoint on new data when it comes in next month. In our example, let's validate a different table, `yellow_tripdata_sample_2020_02`, using the `ExpectationSuite` we built from `yellow_tripdata_sample_2020`." | |
"Now the ExpectationSuite we built using all batches can be used to validate single batches using a Checkpoint. For example, we can run this checkpoint on new data when it comes in next month. In our example, let's validate a different batch from February 2020, using the `ExpectationSuite` we built from `yellow_tripdata_sample_2020`." |
"The signature of the `InferredAssetSqlDataConnector` also contains the following required parameters:\n", | ||
"- `name`: string describing the name of this DataConnector.\n", | ||
"- `datasource_name`: the name of the Datasource that contains it.\n", | ||
"- `execution_engine`: the type of ExecutionEngine to use\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"- `execution_engine`: the type of ExecutionEngine to use\n", | |
"- `execution_engine`: the type of ExecutionEngine to use.\n", |
"- `excluded_tables`: A list of tables to ignore when inferring data asset_names\n", | ||
"- `included_tables`: If not `None`, only include tables in this list when inferring data asset_names\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"- `excluded_tables`: A list of tables to ignore when inferring data asset_names\n", | |
"- `included_tables`: If not `None`, only include tables in this list when inferring data asset_names\n", | |
"- `excluded_tables`: A list of tables to ignore when inferring data asset_names.\n", | |
"- `included_tables`: If not `None`, only include tables in this list when inferring data asset_names.\n", |
"- `included_tables`: If not `None`, only include tables in this list when inferring data asset_names\n", | ||
"- `skip_inapplicable_tables`: If `True`, tables that can't be successfully queried using sampling and splitter methods are excluded from inferred data_asset_names. If `False`, the class will throw an error during initialization if any such tables are encountered.\n", | ||
"- `batch_spec_passthrough`: dictionary with keys that will be added directly to batch_spec.\n", | ||
"- `introspection_directives`: Arguments passed to the introspection method to guide introspection\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"- `introspection_directives`: Arguments passed to the introspection method to guide introspection\n", | |
"- `introspection_directives`: Arguments passed to the introspection method to guide introspection.\n", |
…i-batch-notebook * develop: [MAINTENANCE] Enhance unit tests for ExpectationSuite.isEquivalentTo (#5979)
* develop: [BUGFIX] Fix failing `run_profiler_notebook` test (#5983) [FEATURE] Refactor PartitionParameterBuilder into dedicated ValueCountsParameterBuilder and HistogramParameterBuilder (#5975) [MAINTENANCE] Add reverse assertion for isEquivalentTo tests (#5982) [MAINTENANCE] Update to MultiBatch Notebook to include Inferred - Sql (#5958) [MAINTENANCE] Update to MultiBatch Notebook to include Configured - Sql (#5945) [MAINTENANCE] Remove unused fixtures from test suite (#5965) [MAINTENANCE] Enhance unit tests for ExpectationSuite.isEquivalentTo (#5979) [MAINTENANCE] Unit tests for `CheckpointStore` (#5967) [FEATURE] do not require expectation_suite_name in DataAssistantResult.show_expectations_by...() methods (#5976) [MAINTENANCE] Updated release schedule (#5977) [BUGFIX] Addresses issue with ExpectCompoundColumnsToBeUnique renderer (#5970) [MAINTENANCE] Expectation suite new unit tests for add_citation (#5966) [MAINTENANCE] Expectation suite init unit tests + types (#5957) [MAINTENANCE] DatasourceStore refactoring (#5941)
Changes proposed in this pull request:
InferredAssetSqlDataConnector
.postgres
). Notebooks takes tables corresponding to 2020 Taxi data and do the following:DataConnector
with 2assets
that facilitate loading the data as a singleBatch
or split into multipleBatches
.BatchRequests
that can load data as single or multiple batchesbatch_list
can be used withself-initializing
Expectations to estimate parameters.ExpectationSuite
can be used to run aSimpleCheckpoint
.Note
Definition of Done