
[MAINTENANCE] Update to MultiBatch Notebook to include Inferred - Sql #5958

Conversation

@Shinnnyshinshin Shinnnyshinshin (Contributor) commented Sep 8, 2022

Changes proposed in this pull request:

  • Follow-up to PR [MAINTENANCE] Certify InferredAssetSqlDataConnector and ConfiguredAssetSqlDataConnector #5847, which certified InferredAssetSqlDataConnector.
  • Adds a multi-batch example notebook for the SqlDataConnector (Postgres). The notebook takes tables corresponding to 2020 taxi data and does the following:
    • Adds a Datasource that contains one DataConnector with two assets, which allow the data to be loaded as a single Batch or split into multiple Batches.
    • Provides examples of BatchRequests that load the data as a single Batch or as multiple Batches (see the sketch after this list).
    • Shows how the resulting batch_list can be used with self-initializing Expectations to estimate parameters.
    • Shows how the resulting ExpectationSuite can be used to run a SimpleCheckpoint.
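
For illustration, here is a minimal sketch of the two kinds of BatchRequest described above; the datasource, data connector, and asset names are hypothetical stand-ins rather than the notebook's exact values.

import great_expectations as gx
from great_expectations.core.batch import BatchRequest

context = gx.get_context()

# Load the whole 2020 taxi table as a single Batch (hypothetical names).
single_batch_batch_request = BatchRequest(
    datasource_name="taxi_postgres_datasource",
    data_connector_name="inferred_data_connector_name",
    data_asset_name="yellow_tripdata_sample_2020",
)

# Load the same data as multiple Batches (for example, one per month) via a
# second asset on the same DataConnector that is configured with a splitter.
multi_batch_batch_request = BatchRequest(
    datasource_name="taxi_postgres_datasource",
    data_connector_name="inferred_data_connector_name",
    data_asset_name="yellow_tripdata_sample_2020_split_by_month",
)

batch_list = context.get_batch_list(batch_request=multi_batch_batch_request)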

Note

  • Added an Appendix to the end of the notebook, which describes the parameters and their function.

Definition of Done

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

netlify bot commented Sep 8, 2022

Deploy Preview for niobium-lead-7998 ready!

🔨 Latest commit: 86311a1
🔍 Latest deploy log: https://app.netlify.com/sites/niobium-lead-7998/deploys/631b739d9db7970009cc30d8
😎 Deploy Preview: https://deploy-preview-5958--niobium-lead-7998.netlify.app

@Shinnnyshinshin Shinnnyshinshin self-assigned this Sep 8, 2022
@Shinnnyshinshin Shinnnyshinshin requested a review from a team September 8, 2022 01:52

@anthonyburdi anthonyburdi (Member) left a comment

This is great! Thank you for clearly laying this out. I have some questions and suggested changes. Happy to discuss if helpful.

{
"data": {
"text/plain": [
"[<great_expectations.core.batch.Batch at 0x7f7d61cd5f40>,\n",

Why is this showing 12 batches if it is called a single_batch_batch_request?

"metadata": {},
"outputs": [],
"source": [
"validator.save_expectation_suite()"

Maybe we should look at the Expectation stored in the suite here (and see the stored params, noting that they come from the min/max of the median values across all batches)?
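
As a hedged illustration of that suggestion (variable names assumed from the notebook's context, not verbatim), the stored Expectation and its estimated parameters could be inspected like this:

# Inspect the Expectations saved to the suite and the parameters the
# profiler estimated from all Batches (e.g. min_value/max_value bounds).
suite = validator.get_expectation_suite(discard_failed_expectations=False)
for expectation in suite.expectations:
    print(expectation.expectation_type, expectation.kwargs)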

"id": "477381f5",
"metadata": {},
"source": [
"Now the ExpectationSuite can be used to validate single batches using a Checkpoint. In our example, let's validate a different table, `yellow_tripdata_sample_2020_02`, using the `ExpectationSuite` we built from `yellow_tripdata_sample_2020_01`."

Not sure where these tables are coming from - I thought we only had one table yellow_tripdata_sample_2020 and we are pulling different sets of batches from that?

"id": "477381f5",
"metadata": {},
"source": [
"Now the ExpectationSuite can be used to validate single batches using a Checkpoint. In our example, let's validate a different table, `yellow_tripdata_sample_2020_02`, using the `ExpectationSuite` we built from `yellow_tripdata_sample_2020_01`."

Suggested change
"Now the ExpectationSuite can be used to validate single batches using a Checkpoint. In our example, let's validate a different table, `yellow_tripdata_sample_2020_02`, using the `ExpectationSuite` we built from `yellow_tripdata_sample_2020_01`."
"Now the ExpectationSuite we built using all batches can be used to validate single batches using a Checkpoint. For example, we can run this checkpoint on new data when it comes in next month. In our example, let's validate a different table, `yellow_tripdata_sample_2020_02`, using the `ExpectationSuite` we built from `yellow_tripdata_sample_2020_01`."

{
"data": {
"text/plain": [
"False"

Shouldn't this be True if we created the expectation based on all of the batches?

"The signature of the `InferredAssetSqlDataConnector` also contains the following required parameters:\n",
"- `name`: string describing the name of this DataConnector.\n",
"- `datasource_name`: the name of the Datasource that contains it.\n",
"- `execution_engine`: an ExecutionEngine.\n",

Suggested change
"- `execution_engine`: an ExecutionEngine.\n",
"- `execution_engine`: The type of ExecutionEngine to use.\n",

Will Shin added 6 commits September 8, 2022 17:13
…i-batch-notebook

* develop:
  [Maintenance] Randomize the non-comprehensive tests (#5968)
  [RELEASE] 0.15.22 (#5973)
  [BUGFIX] Spark column.distinct_values no longer returns entire table distinct values (#5969)
  [MAINTENANCE] Use DataContext to ignore progress bars (#5959)
  [MAINTENANCE] Bump `Marshmallow` upper bound to work with Airflow operator (#5952)
  [MAINTENANCE] Add x-fails to flaky Cloud tests for purposes of 0.15.22 (#5964)
  [BUGFIX] Prevent "division by zero" errors in Rule-Based Profiler calculations when Batch has zero rows (#5960)
  [FEATURE] Improve slack error condition (#5818)
…ok' of https://github.com/great-expectations/great_expectations into m/GREAT-1226/GREAT-1228/inferred-sql-multi-batch-notebook

* 'm/GREAT-1226/GREAT-1228/inferred-sql-multi-batch-notebook' of https://github.com/great-expectations/great_expectations:
  [MAINTENANCE] Expectation suite new unit tests for add_citation (#5966)
  [MAINTENANCE] Expectation suite init unit tests + types (#5957)
  [MAINTENANCE] DatasourceStore refactoring (#5941)
@anthonyburdi anthonyburdi (Member) left a comment

LGTM! Just some tiny nits

"metadata": {},
"source": [
"### Typical Workflow\n",
"A `batch_list` becomes really useful when you are calculating parameters for auto-initializing Expectations, as they us a `RuleBasedProfiler` under-the-hood to calculate parameters."

Suggested change
"A `batch_list` becomes really useful when you are calculating parameters for auto-initializing Expectations, as they us a `RuleBasedProfiler` under-the-hood to calculate parameters."
"A `batch_list` becomes really useful when you are calculating parameters for auto-initializing Expectations, as they use a `RuleBasedProfiler` under-the-hood to calculate parameters."

"id": "f18d096f",
"metadata": {},
"source": [
"The observed value for our `yellow_tripdata_sample_2020` table where `trip_distance` is going to be `1.75`, which means the Expectation fails. We guessed wrong - but we can do better!\""

Suggested change
"The observed value for our `yellow_tripdata_sample_2020` table where `trip_distance` is going to be `1.75`, which means the Expectation fails. We guessed wrong - but we can do better!\""
"The observed value for our `yellow_tripdata_sample_2020` table where `trip_distance` is going to be `1.75`, which means the Expectation fails. We guessed wrong - but we can do better!"

"id": "477381f5",
"metadata": {},
"source": [
"Now the ExpectationSuite we built using all batches can be used to validate single batches using a Checkpoint. For example, we can run this checkpoint on new data when it comes in next month. In our example, let's validate a different table, `yellow_tripdata_sample_2020_02`, using the `ExpectationSuite` we built from `yellow_tripdata_sample_2020`."

Suggested change
"Now the ExpectationSuite we built using all batches can be used to validate single batches using a Checkpoint. For example, we can run this checkpoint on new data when it comes in next month. In our example, let's validate a different table, `yellow_tripdata_sample_2020_02`, using the `ExpectationSuite` we built from `yellow_tripdata_sample_2020`."
"Now the ExpectationSuite we built using all batches can be used to validate single batches using a Checkpoint. For example, we can run this checkpoint on new data when it comes in next month. In our example, let's validate a different batch from February 2020, using the `ExpectationSuite` we built from `yellow_tripdata_sample_2020`."

"The signature of the `InferredAssetSqlDataConnector` also contains the following required parameters:\n",
"- `name`: string describing the name of this DataConnector.\n",
"- `datasource_name`: the name of the Datasource that contains it.\n",
"- `execution_engine`: the type of ExecutionEngine to use\n",

Suggested change
"- `execution_engine`: the type of ExecutionEngine to use\n",
"- `execution_engine`: the type of ExecutionEngine to use.\n",

Comment on lines 1145 to 1146
"- `excluded_tables`: A list of tables to ignore when inferring data asset_names\n",
"- `included_tables`: If not `None`, only include tables in this list when inferring data asset_names\n",

Suggested change
"- `excluded_tables`: A list of tables to ignore when inferring data asset_names\n",
"- `included_tables`: If not `None`, only include tables in this list when inferring data asset_names\n",
"- `excluded_tables`: A list of tables to ignore when inferring data asset_names.\n",
"- `included_tables`: If not `None`, only include tables in this list when inferring data asset_names.\n",

"- `included_tables`: If not `None`, only include tables in this list when inferring data asset_names\n",
"- `skip_inapplicable_tables`: If `True`, tables that can't be successfully queried using sampling and splitter methods are excluded from inferred data_asset_names. If `False`, the class will throw an error during initialization if any such tables are encountered.\n",
"- `batch_spec_passthrough`: dictionary with keys that will be added directly to batch_spec.\n",
"- `introspection_directives`: Arguments passed to the introspection method to guide introspection\n",

Suggested change
"- `introspection_directives`: Arguments passed to the introspection method to guide introspection\n",
"- `introspection_directives`: Arguments passed to the introspection method to guide introspection.\n",

Will Shin added 3 commits September 9, 2022 09:45
…i-batch-notebook

* develop:
  [MAINTENANCE] Enhance unit tests for ExpectationSuite.isEquivalentTo (#5979)
@Shinnnyshinshin Shinnnyshinshin enabled auto-merge (squash) September 9, 2022 17:10
@Shinnnyshinshin Shinnnyshinshin merged commit 8f0eb08 into develop Sep 9, 2022
@Shinnnyshinshin Shinnnyshinshin deleted the m/GREAT-1226/GREAT-1228/inferred-sql-multi-batch-notebook branch September 9, 2022 17:21
Shinnnyshinshin pushed a commit that referenced this pull request Sep 9, 2022
* develop:
  [BUGFIX] Fix failing `run_profiler_notebook` test (#5983)
  [FEATURE] Refactor PartitionParameterBuilder into dedicated ValueCountsParameterBuilder and HistogramParameterBuilder (#5975)
  [MAINTENANCE] Add reverse assertion for isEquivalentTo tests (#5982)
  [MAINTENANCE] Update to MultiBatch Notebook to include Inferred - Sql  (#5958)
  [MAINTENANCE] Update to MultiBatch Notebook to include Configured - Sql (#5945)
  [MAINTENANCE] Remove unused fixtures from test suite (#5965)
  [MAINTENANCE] Enhance unit tests for ExpectationSuite.isEquivalentTo (#5979)
  [MAINTENANCE] Unit tests for `CheckpointStore` (#5967)
  [FEATURE] do not require expectation_suite_name in DataAssistantResult.show_expectations_by...() methods (#5976)
  [MAINTENANCE] Updated release schedule (#5977)
  [BUGFIX] Addresses issue with ExpectCompoundColumnsToBeUnique renderer (#5970)
  [MAINTENANCE] Expectation suite new unit tests for add_citation (#5966)
  [MAINTENANCE] Expectation suite init unit tests + types (#5957)
  [MAINTENANCE] DatasourceStore refactoring (#5941)