[MAINTENANCE] Make partitioning directives of PartitionParameterBuilder configurable #5810

Conversation

alexsherstinsky
Contributor

Scope

  • Expose the partitioning directives (bins, n_bins, and allow_relative_error) at the PartitionParameterBuilder constructor level, with the same defaults as those used inside the underlying "column.partition" metric. This has two benefits:
    PartitionParameterBuilder becomes more generally applicable, because no arguments are hard-coded; and
    the given defaults ensure fast operation and thus improve the performance of OnboardingDataAssistant.
  • Test a richer configuration of PartitionParameterBuilder to demonstrate the more customizable behavior.
  • Verify speedup of OnboardingDataAssistant in a Jupyter notebook.
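
The pattern described in the Scope bullets can be sketched as follows. This is a minimal, self-contained illustration and not the Great Expectations implementation: the class name PartitionDirectivesSketch and its metric_value_kwargs() method are hypothetical, and the default values shown ("uniform", 10, False) are assumptions about what the "column.partition" metric uses. Only the three directive names (bins, n_bins, allow_relative_error) come from this PR.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass
class PartitionDirectivesSketch:
    """Hypothetical stand-in for a parameter builder; NOT the real class."""

    name: str
    # Assumed defaults, chosen to mirror the underlying metric's own defaults.
    bins: str = "uniform"
    n_bins: int = 10
    allow_relative_error: Union[bool, str] = False

    def metric_value_kwargs(self) -> Dict[str, Any]:
        # Forward the constructor-level directives instead of hard-coding them.
        return {
            "bins": self.bins,
            "n_bins": self.n_bins,
            "allow_relative_error": self.allow_relative_error,
        }


# Relying on defaults keeps the common (fast) path unchanged.
default_builder = PartitionDirectivesSketch(name="partition")

# A richer configuration overrides only the directives it cares about.
custom_builder = PartitionDirectivesSketch(name="partition", bins="auto", n_bins=20)
```

Because the directives are ordinary constructor arguments with defaults, existing callers that pass nothing keep the current behavior, while tests can exercise non-default combinations.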

Changes proposed in this pull request:

  • JIRA: GREAT-761/GREAT-1199

Definition of Done

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

Alex Sherstinsky added 30 commits August 15, 2022 18:08
…sky/core/util/enable_iddict_to_support_comparison_operations-2022_08_10-213
…stinsky/core/json_serialize_row_condition_and_enable_iddict_to_support_comparison_operations-2022_08_10-213' into develop
…rule_based_profiler/performance_improvements-2022_08_22-218' into feature/GREAT-761/GREAT-1196/alexsherstinsky/rule_based_profiler/parameter_builder/support_metric_multi_and_single_batch_parameter_builder-2022_08_22-218
…exsherstinsky/rule_based_profiler/performance_improvements-2022_08_22-218
…rule_based_profiler/performance_improvements-2022_08_22-218' into feature/GREAT-761/GREAT-1196/alexsherstinsky/rule_based_profiler/parameter_builder/support_metric_multi_and_single_batch_parameter_builder-2022_08_22-218
…_profiler/parameter_builder/support_metric_multi_and_single_batch_parameter_builder-2022_08_22-218' into pre_pr-prototype/maintenance/GREAT-761/alexsherstinsky/rule_based_profiler/performance_improvements-2022_08_22-218
…1/GREAT-1196/alexsherstinsky/rule_based_profiler/parameter_builder/support_metric_multi_and_single_batch_parameter_builder-2022_08_22-218
Alex Sherstinsky and others added 16 commits August 22, 2022 14:32
…sky/rule_based_profiler/parameter_builder/support_metric_multi_and_single_batch_parameter_builder-2022_08_22-218
…_profiler/parameter_builder/support_metric_multi_and_single_batch_parameter_builder-2022_08_22-218' into pre_pr-prototype/maintenance/GREAT-761/alexsherstinsky/rule_based_profiler/performance_improvements-2022_08_22-218
…emoving calls to copy.deepcopy() in ValidationGraph
…ore/metrics/remove_str_calls_from_iddict_remove_deepcopy_calls_from_validation_graph-2022_08_22-219
…stinsky/core/metrics/remove_str_calls_from_iddict_remove_deepcopy_calls_from_validation_graph-2022_08_22-219
…exsherstinsky/rule_based_profiler/performance_improvements-2022_08_22-218
…etrics/remove_str_calls_from_iddict_remove_deepcopy_calls_from_validation_graph-2022_08_22-219' into pre_pr-prototype/maintenance/GREAT-761/alexsherstinsky/rule_based_profiler/performance_improvements-2022_08_22-218
…98/alexsherstinsky/core/metrics/remove_str_calls_from_iddict_remove_deepcopy_calls_from_validation_graph-2022_08_22-219' into pre_pr-prototype/maintenance/GREAT-761/alexsherstinsky/rule_based_profiler/performance_improvements-2022_08_22-218
… are involved. With the change to PartitionParameterBuilder, IDDict can remain the same.
…stinsky/core/metrics/remove_str_calls_from_iddict_remove_deepcopy_calls_from_validation_graph-2022_08_22-219
…exsherstinsky/rule_based_profiler/performance_improvements-2022_08_22-218
…etrics/remove_str_calls_from_iddict_remove_deepcopy_calls_from_validation_graph-2022_08_22-219' into pre_pr-prototype/maintenance/GREAT-761/alexsherstinsky/rule_based_profiler/performance_improvements-2022_08_22-218

netlify bot commented Aug 23, 2022

Deploy Preview for niobium-lead-7998 ready!

Name Link
🔨 Latest commit a7fed09
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/6304e11f9f6c000009f9df88
😎 Deploy Preview https://deploy-preview-5810--niobium-lead-7998.netlify.app

@alexsherstinsky alexsherstinsky requested a review from a team August 23, 2022 02:38
@alexsherstinsky alexsherstinsky enabled auto-merge (squash) August 23, 2022 02:39

ghost commented Aug 23, 2022

Review these changes using an interactive CodeSee Map

…ule_based_profiler/parameter_builder/make_partition_parameter_builder_configurable_for_column_partition_options_bins_n_bins_allow_relative_error-2022_08_22-220
…ule_based_profiler/parameter_builder/make_partition_parameter_builder_configurable_for_column_partition_options_bins_n_bins_allow_relative_error-2022_08_22-220
Contributor

@NathanFarmer NathanFarmer left a comment

LGTM

@alexsherstinsky alexsherstinsky merged commit 19a3ee8 into develop Aug 23, 2022
@alexsherstinsky alexsherstinsky deleted the maintenance/GREAT-761/GREAT-1199/alexsherstinsky/rule_based_profiler/parameter_builder/make_partition_parameter_builder_configurable_for_column_partition_options_bins_n_bins_allow_relative_error-2022_08_22-220 branch August 23, 2022 14:44
Shinnnyshinshin pushed a commit that referenced this pull request Aug 23, 2022
…k-spark

* develop: (21 commits)
  [MAINTENANCE] Write E2E Cloud test for RuleBasedProfiler creation and retrieval
  [MAINTENANCE] Make partitioning directives of PartitionParameterBuilder configurable (#5810)
  Add vectorized is_between for common numpy dtypes (#5711)
  [MAINTENANCE] Remove "copy.deepcopy()" calls from ValidationGraph (#5809)
  [FEATURE] Inline `ExpectationSuite` Rendering (#5726)
  [FEATURE] Support single-batch mode in MetricMultiBatchParameterBuilder (#5808)
  Fix how to create custom table expectation (#5807)
  [MAINTENANCE] Remove `ge_cloud_id` from `DataContext.add_profiler()` signature (#5804)
  Refactor convert_dictionary_to_parameter_node (#5805)
  [MAINTENANCE] Clean up `ge_cloud_id` reference from `DataContext` `ExpectationSuite` CRUD (#5791)
  Add v2_api flag for v2_api related tests (#5803)
  [MAINTENANCE] Refactor `save_profiler` to remove explicit `name` and `ge_cloud_id` args (#5792)
  [FEATURE] Enhance execution time measurement utility, and save `DomainBuilder` execution time per Rule of Rule-Based Profiler (#5796)
  [FEATURE] `query.pair_column` Metric (#5743)
  [MAINTENANCE] build-gallery enhancements (#5616)
  Remove xfail markers on cloud tests (#5793)
  Add square brackets to make the SA select work (#5780)
  [RELEASE] 0.15.19 (#5785)
  [MAINTENANCE] Temporarily xfail E2E Cloud tests due to Azure env var issues
  [BUGFIX] Enable "DataAssistantResult.__repr__()" to work properly in its subclasses. (#5786)
  ...
Shinnnyshinshin pushed a commit that referenced this pull request Aug 23, 2022
…k-pandas


2 participants