Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Implementing Fluent Datasources Support for Checkpoint #7697

Conversation

alexsherstinsky
Copy link
Contributor

@alexsherstinsky alexsherstinsky commented Apr 22, 2023

Scope

  • Regular Checkpoint now supports Fluent Datasources (including Usage Statistics)
  • Cleanup of typographical errors in Regular Checkpoint test module (for Block/V3 Datasources)

Next TODO

  • Verify SimpleCheckpoint supports for Fluent Datasources
  • Add tests for Checkpoint and SimpleCheckpoint utilizing Spark and SQL style Fluent Datasources.

Please annotate your PR title to describe what the PR does, then give a brief bulleted description of your PR below. PR titles should begin with [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], or [CONTRIB]. If a new feature introduces breaking changes for the Great Expectations API or configuration files, please also add [BREAKING]. You can read about the tags in our contributor checklist.

Changes proposed in this pull request:

  • JIRA: DX-469/DX-470

After submitting your PR, CI checks will run and @cla-bot will check for your CLA signature.

For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.

In a design review, we aim to ensure that the PR is consistent with our relationship to the open source community, with our software architecture and abstractions, and with our users' needs and expectations. That review often starts well before a PR, for example in GitHub issues or Slack, so please link to relevant conversations in notes below to help reviewers understand and approve your PR more quickly (e.g. closes #123).

Previous Design Review notes:

Definition of Done

Please delete options that are not relevant.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

Thank you for submitting!

Alex Sherstinsky and others added 20 commits April 20, 2023 09:18
…/link/assess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_19-4
…ess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_19-4
…heckpoint_operation_with_fluent_datasource_batch_request-2023_04_19-4' into maintenance/DX-469/DX-470/alexsherstinsky/link/assess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_20-5
…sherstinsky/link/assess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_19-4' into maintenance/DX-469/DX-470/alexsherstinsky/link/assess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_20-5
…/link/assess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_20-6
…/link/assess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_20-6
…heckpoint_operation_with_fluent_datasource_batch_request-2023_04_20-6' into maintenance/DX-469/DX-470/alexsherstinsky/link/assess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_20-7
@netlify
Copy link

netlify bot commented Apr 22, 2023

Deploy Preview for niobium-lead-7998 canceled.

Name Link
🔨 Latest commit 22c4eee
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/64443f9e59280f000880362d

@@ -349,7 +349,7 @@ def _convert_fluent_datasources_loaded_from_yaml_to_internal_object_representati
config: Dict[str, Any], _allow_empty: bool = False
) -> Dict[str, Any]:
if _FLUENT_DATASOURCES_KEY in config:
fluent_datasources: dict = config[_FLUENT_DATASOURCES_KEY]
fluent_datasources: dict = config[_FLUENT_DATASOURCES_KEY] or {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kilo59 This is needed, because great_expectations.yml may come with an empty fluent_datasources section.

@ghost
Copy link

ghost commented Apr 22, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

@@ -15,6 +15,7 @@
BATCH_REQUEST_OPTIONAL_TOP_LEVEL_KEYS: Set[str] = {
"data_connector_query",
"runtime_parameters",
"options",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows Usage Statistics to work with Checkpoints using Fluent Datasources out of the box.

@@ -59,6 +59,7 @@ class _FilePathDataAsset(DataAsset):
"batching_regex", # file_path argument
"kwargs", # kwargs need to be unpacked and passed separately
"batch_metadata",
"connect_options",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kilo59 This is needed -- otherwise, instantiating an ExecutionEngine says that "options is not recognized by init().

@@ -25,6 +25,36 @@ def titanic_pandas_data_context_stats_enabled_and_expectation_suite_with_one_exp
return context


@pytest.fixture
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might embellish this fixture later (e.g., if additional Fluent Datasources and/or DataAssets are needed).

@@ -127,7 +127,7 @@ def test_basic_checkpoint_config_validation(
caplog,
capsys,
):
context: FileDataContext = empty_data_context_stats_enabled
data_context: FileDataContext = empty_data_context_stats_enabled
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are only beautification focused and correcting typographical errors. No logic has been changed.

@@ -0,0 +1,1164 @@
import logging
Copy link
Contributor Author

@alexsherstinsky alexsherstinsky Apr 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add tests for Checkpoint and SimpleCheckpoint utilizing Spark and SQL style Fluent Datasources.

@alexsherstinsky alexsherstinsky marked this pull request as ready for review April 22, 2023 01:36
@alexsherstinsky alexsherstinsky enabled auto-merge (squash) April 22, 2023 01:41
@alexsherstinsky alexsherstinsky requested review from a team April 22, 2023 01:42
github-actions bot and others added 3 commits April 22, 2023 14:42
…ess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_20-7
…ess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_20-7
donaldheppner and others added 8 commits April 22, 2023 13:27
…/link/assess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_20-7
…sherstinsky/link/assess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_20-7' into maintenance/DX-469/DX-470/alexsherstinsky/link/assess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_20-7
…sherstinsky/link/assess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_20-7' into maintenance/DX-469/DX-470/alexsherstinsky/link/assess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_20-7
…/link/assess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_20-7
@alexsherstinsky alexsherstinsky enabled auto-merge (squash) April 22, 2023 19:44
Alex Sherstinsky added 2 commits April 22, 2023 12:50
@alexsherstinsky alexsherstinsky force-pushed the maintenance/DX-469/DX-470/alexsherstinsky/link/assess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_20-7 branch from 4ce73ee to 22c4eee Compare April 22, 2023 20:12
@alexsherstinsky alexsherstinsky merged commit 3fe2ec4 into develop Apr 22, 2023
@alexsherstinsky alexsherstinsky deleted the maintenance/DX-469/DX-470/alexsherstinsky/link/assess_checkpoint_operation_with_fluent_datasource_batch_request-2023_04_20-7 branch April 22, 2023 21:09
Comment on lines -957 to -964
# Next two fields are for LegacyCheckpoint configuration
"validation_operator_name": validation_operator_name,
"batches": batches,
# the following four keys are used by SimpleCheckpoint
"site_names": site_names,
"slack_webhook": slack_webhook,
"notify_on": notify_on,
"notify_with": notify_with,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've subclassed a SimpleCheckpoint and this change has broken inheritance

@alexsherstinsky
Copy link
Contributor Author

@lukedyer-peak Thank you very much for reporting this. Is there any chance for you to update your class without too much difficulty? If so, that would be of great help; otherwise, please do let me know as soon as possible, and I will try to figure out a solution on our side. Again, apologies that this change broke your workflow, and please let me know. Thanks again.

@lukedyer-peak
Copy link
Contributor

Note: conversation moved to #7790

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants