Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Replace UserConfigurableProfiler with OnboardingDataAssistant in "CLI suite new --profile" Jupyter Notebooks #5236

Conversation

alexsherstinsky
Copy link
Contributor

Scope

  • Replace UserConfigurableProfiler with OnboardingDataAssistant run() invocation in Jupyter notebooks associated with executing "CLI suite new --profile" on the command line.
  • Update unit tests.
  • Clean up BaseNotebookRenderer interfaces and extend these corrections to the subclasses thereof.
  • Fix the bug in processing the column-type directives, allowed as part of DataAssistant run() invocations.
  • Fix type hints usage in CLI-Suite implementation.
  • Replace UserConfigurableProfiler with OnboardingDataAssistant run() invocation in DocuSaurus "Getting Started" Jupyter notebook.
  • Minor clean up (typos, etc.).

Please annotate your PR title to describe what the PR does, then give a brief bulleted description of your PR below. PR titles should begin with [BUGFIX], [FEATURE], [DOCS], or [MAINTENANCE]. If a new feature introduces breaking changes for the Great Expectations API or configuration files, please also add [BREAKING]. You can read about the tags in our contributor checklist.

Changes proposed in this pull request:

  • JIRA: GREAT-761/GREAT-932

After submitting your PR, CI checks will run and @cla-bot will check for your CLA signature.

For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.

In a design review, we aim to ensure that the PR is consistent with our relationship to the open source community, with our software architecture and abstractions, and with our users' needs and expectations. That review often starts well before a PR, for example in github issues or slack, so please link to relevant conversations in notes below to help reviewers understand and approve your PR more quickly (e.g. closes #123).

Previous Design Review notes:

Definition of Done

Please delete options that are not relevant.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

Thank you for submitting!

Alex Sherstinsky added 3 commits June 2, 2022 16:37
…ky/rule_based_profiler/replace_user_configurable_profiler_with_onboarding_data_assistant_in_cli_suite_new-2022_06_02-158
@netlify
Copy link

netlify bot commented Jun 3, 2022

Deploy Preview for niobium-lead-7998 ready!

Name Link
🔨 Latest commit 3616e8f
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/6299af584a0dd10008b2ade1
😎 Deploy Preview https://deploy-preview-5236--niobium-lead-7998.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@ghost
Copy link

ghost commented Jun 3, 2022

👇 Click on the image for a new way to code review
  • Make big changes easier — review code in small groups of related files

  • Know where to start — see the whole change at a glance

  • Take a code tour — explore the change with an interactive tour

  • Make comments and review — all fully sync’ed with github

    Try it now!

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map Legend

Alex Sherstinsky added 2 commits June 2, 2022 17:35
@alexsherstinsky alexsherstinsky enabled auto-merge (squash) June 3, 2022 00:44
Copy link
Contributor

@Shinnnyshinshin Shinnnyshinshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thank you sir! Appreciate the synchronous review.

@@ -3,8 +3,8 @@
import great_expectations as ge
from great_expectations.checkpoint import SimpleCheckpoint
from great_expectations.core.batch import BatchRequest
from great_expectations.profile.user_configurable_profiler import (
UserConfigurableProfiler,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

batch_request: Optional[
Union[str, Dict[str, Union[str, int, Dict[str, Any]]]]
] = None,
batch_request: Optional[Union[str, Dict[str, Any]]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This brings me joy

Union[str, Dict[str, Union[str, int, Dict[str, Any]]]]
] = None,
) -> nbformat.NotebookNode:
def render(self, **kwargs: dict) -> nbformat.NotebookNode:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are kwargs because subclasses require different parameters

meta={
"profiler_details": f"{column_values_attribute_mean_unexpected_value_multi_batch_parameter_builder_for_validations.fully_qualified_parameter_name}.{FULLY_QUALIFIED_PARAMETER_NAME_METADATA_KEY}",
},
)

# Step-5: Instantiate and return "Rule" object, comprised of "variables", "domain_builder", "parameter_builders", and "expectation_configuration_builders" components.

variables: dict = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the variable that has been introduced :) Thank you for this

@@ -116,7 +116,7 @@ def _build_parameters(

one_batch_table_columns_names_value: MetricValue
multi_batch_table_columns_names_sets_as_list: List[Set[str]] = [
set(one_batch_table_columns_names_value.tolist())
set(one_batch_table_columns_names_value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already a list :) bugfixed

semantic_types_dict=None,
table_expectations_only=False,
value_set_threshold="MANY",
data_assistant_result: DataAssistantResult = context.assistants.onboarding.run(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now includes all settings, including settings that are commented out. To be used as a starting point for documentation.

value_set_threshold="MANY",
)
suite = profiler.build_suite()
data_assistant_result: DataAssistantResult = context.assistants.onboarding.run(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar story here. Including all settings even if they are not used.. to help documentation.

@alexsherstinsky alexsherstinsky enabled auto-merge (squash) June 3, 2022 01:10
Alex Sherstinsky added 2 commits June 2, 2022 22:29
@alexsherstinsky alexsherstinsky enabled auto-merge (squash) June 3, 2022 05:33
@alexsherstinsky alexsherstinsky enabled auto-merge (squash) June 3, 2022 06:51
@alexsherstinsky alexsherstinsky merged commit 141dc2f into develop Jun 3, 2022
@alexsherstinsky alexsherstinsky deleted the feature/GREAT-761/GREAT-932/alexsherstinsky/rule_based_profiler/replace_user_configurable_profiler_with_onboarding_data_assistant_in_cli_suite_new-2022_06_02-158 branch June 3, 2022 08:05
Shinnnyshinshin pushed a commit that referenced this pull request Jun 7, 2022
…https://github.com/great-expectations/great_expectations into feature/GREAT-953/migration-work-to-abc-data-context

* 'feature/GREAT-953/migration-work-to-abc-data-context' of https://github.com/great-expectations/great_expectations:
  undo commit 2b0ca85 (#5253)
  Don't use jupyter-client 7.3.2 (#5252)
  Unpin cryptography upper bound (#5249)
  [DOCS] In the reference section of the ToC remove duplicates and update category pages  (#5248)
  Fix Links (#5246)
  - updates formatting of class docs created by the auto API docs script to use bullet points for listing links to Public Methods. (#5247)
  [FEATURE] add new expectation on validating hexadecimals (#5188)
  [DOCS] Auto API documentation script (#4964)
  [BUGFIX] Hive temporary tables creation fix (#4956)
  [FEATURE] `DatasourceStore` (#5206)
  [FEATURE] Replace UserConfigurableProfiler with OnboardingDataAssistant in "CLI suite new --profile" Jupyter Notebooks (#5236)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants