[MAINTENANCE] Defining Common Test Fixtures for DataAssistant Testing #4959

@alexsherstinsky alexsherstinsky commented Apr 26, 2022

Please annotate your PR title to describe what the PR does, then give a brief bulleted description of your PR below. PR titles should begin with [BUGFIX], [FEATURE], [DOCS], or [MAINTENANCE]. If a new feature introduces breaking changes for the Great Expectations API or configuration files, please also add [BREAKING]. You can read about the tags in our contributor checklist.

Changes proposed in this pull request:

  • JIRA: GREAT-735/GREAT-736 (prerequisite)

After submitting your PR, CI checks will run and @cla-bot will check for your CLA signature.

For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.

In a design review, we aim to ensure that the PR is consistent with our relationship to the open source community, with our software architecture and abstractions, and with our users' needs and expectations. That review often starts well before a PR, for example in GitHub issues or Slack, so please link to relevant conversations in notes below to help reviewers understand and approve your PR more quickly (e.g. closes #123).

Previous Design Review notes:

Definition of Done

Please delete options that are not relevant.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

Thank you for submitting!

Alex Sherstinsky added 2 commits April 26, 2022 07:59
…5/GREAT-736/alexsherstinsky/rule_based_profiler/data_assistant_to_be_registered_and_exercised_from_data_context-2022_04_25-107
netlify bot commented Apr 26, 2022

Deploy Preview for niobium-lead-7998 ready!

  • 🔨 Latest commit: 2862a22
  • 🔍 Latest deploy log: https://app.netlify.com/sites/niobium-lead-7998/deploys/62680ef392d7e20009d123fb
  • 😎 Deploy Preview: https://deploy-preview-4959--niobium-lead-7998.netlify.app

@cdkini cdkini left a comment

LGTM

@NathanFarmer NathanFarmer left a comment

LGTM!

…5/GREAT-736/alexsherstinsky/rule_based_profiler/data_assistant_to_be_registered_and_exercised_from_data_context-2022_04_25-107
@alexsherstinsky alexsherstinsky enabled auto-merge (squash) April 26, 2022 15:49
@alexsherstinsky alexsherstinsky merged commit b907c7d into develop Apr 26, 2022
@alexsherstinsky alexsherstinsky deleted the feature/GREAT-735/GREAT-736/alexsherstinsky/rule_based_profiler/data_assistant_to_be_registered_and_exercised_from_data_context-2022_04_25-107 branch April 26, 2022 16:13
cdkini added a commit that referenced this pull request Apr 28, 2022
* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipeliens

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant  (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo  (#4974)

* [FEATURE] Enable self-intializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details . (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>
cdkini added a commit that referenced this pull request May 5, 2022
…f `develop` and `main` (#5042)

* [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531)

* [BUGFIX] Moves testing dependencies out of core reqs (#4522)

* removing upper bound on mistune

* remove deprecated depedencies

* adds untracked dependency

* adds untracked dependency

* adds untracked dependency

* moving dependencies

* removes dependencies added to lite from core | adds missing dependencies

Co-authored-by: Chetan Kini <chetan@superconductive.com>

* [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547)

* [MAINTENANCE] Don't return from validate configuration methods (#4545)

* Add validate_configuration to 2 core Expectations that are passing all their tests

* Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests

* Update all validate_configuration methods to have type hints and return None

* Update all doc snippet references that were effected

* [DOCS] technical term tags connect to data cloud docs (#4414)

* - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.)
- Some additional editing was done to bring documents in line with the documentation and how-to guide standards.

* - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop.

* - Update to include technical term tags. (#4462)

- Minor updates to correct formatting and spelling issues.

* - Moved docs related to contributing integrations under contributing in the ToC (#4551)

- Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term).

* - Adds new image files for the intro page (#4540)

- Updates the image file link for the overview image on the intro page

* [DOCS] clarifications on execution engines and scalability (#4539)

* - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines.

* - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas.

* - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines.

* [DOCS] technical terms for validate data advanced (#4535)

* - add support for technical term tags.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* [DOCS] technical terms for validate data actions docs (#4518)

* - Edits to bring docs up to documentation and how-to guide standards.

* - add technical term tags to documents.
- minor formatting edits (technical terms missing capitalization, etc).

* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553)

* [MAINTENANCE] Refactor global `conftest` (#4534)

* chore: use black directives to temporarily disable linting

* chore: more black directives to temporarily disable linting

* chore: finish remaining

* refactor: start cleaning up conftest

* refactor: more refactoring of conftest

* refactor: even more refactoring of conftest

* [FEATURE] Improve diagnostic checklist details (#4548)

* Update library_metadata check to provide details when it doesn't pass

* In linting check, if snake_case doesn't match filename, show computed snake_case

* Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr

* Update convert_to_json_serializable to handle bytes

* Update build_gallery.py script to convert diagnostics to JSON in separate try/except

* Update build_gallery.py script to write expectation_library_v2.json file with indenting

* Update _check_input_validation to tell if custom assert statements are used in validate_configuration

* clean up (#4554)

* minor touch up (#4558)

* [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485)

* feat: init commit

* refactor: shift all logic over to base class

* feat: start impl of anonymize on Anonymizer

* feat: get ProfilerRunAnonymizer working

* refactor: remove constructor from ProfilerRunAnonymizer

* refactor: start on CheckpointRunAnonymizer

* fix: clean up broken checkpoint tests

* fix: ensure *args and **kwargs are propogated through

* refactor: start work on datasource anonymizers

* refactor: remove all anonymizers except Anonymizer from usage stats attrs

* fix: update isinstance checks

* refactor: move helper into checkpoint_run_anonymizer

* refactor: move helper into datasource_anonymizer

* refactor: make anonymize string private and place in strategy

* refactor: make anonymize batch info private and place in strategy

* refactor: move build_init_payload to Anonymizer

* refactor: make remainder of anonymize methods private

* refactor: add store info to strategy

* refactor: add dataconnector info to strategy

* refactor: consolidate profiler info and profiler run anonymization

* refactor: remove *args from signatures

* refactor: updates around checkpoint anonymization

* chore: misc cleanup of Anonymizer

* feat: final touch up before review

* chore: remove 'else' statements

* fix: ensure appropriate checkpoint method gets called

* chore: misc updates from review

* refactor: move init_payload back to usage stats

* chore: misc type hinting

* refactor: start using individual classes again

* chore: continue updating individual anonymizer classes

* feat: further updates to child classes

* feat: update anonymize_init_payload

* fix: get checkpoint payloads working

* refactor: ensure all methods have obj

* fix: misc fixes

* fix: make misc updates to conditional checks for obj

* refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer

* refactor: rename Checkpoint and Profiler anonymizers

* feat: leverage aggregate anonymizer downstream

* feature: conditionally create aggregate_anonymizer in constructor

* feat: add cache retrieve or instantiate util

* chore: add batch_request can_handle

* feat: ensure that salt has a default value in anonymizers

* refactor: require aggregate anonymizer in constructor

* refactor: instantiate all strategies in aggregate

* fix: fix broken tests

* refactor: rename internal getter

Co-authored-by: Don Heppner <donald.heppner@gmail.com>

* [MAINTENANCE] Remove duplicate mistune dependency

* [MAINTENANCE] Run PEP-273 checks on a schedule or release cut

* [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573)

* -Corrected the line references and added <snippet> tags to source code for Spark version of guide.

* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide.

* -lint reformat w/black

* -correcting line numbers after lint formatting.

* [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546)

Usage stats instrumentation of package dependencies

* [MAINTENANCE] Add DevRel team to GitHub auto-label action

* [MAINTENANCE] Add GitHub action to conditionally auto-update PR's  (#4574)

* feat: add new action

* chore: add conditions

* [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577)

* chore: bump version

* chore: test change

* chore: update all instances of black

* chore: new test changes

* chore: revert test changes

* Update overview.md (#4556)

* Add missing links.
* Fix some typos
* Simplify flow and grammar in a few places

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* - corrected broken link in admonition box. (#4585)

- updated links in admonition box to point to current technical documentation rather than old core concepts documents.

* [MAINTENANCE] Minor clean-up (#4571)

Little bit of cleanup in our execution engine and validator

* [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590)

* fix : misconfigured ExpectationConfigurationBuilder

* pushing fix

* clean up before submitting for review

* bugfix : remove sorting

* remove extra line

* [MAINTENANCE] Instrument package dependencies (#4583)

* Add dependencies to data_context.__init__ event

* [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599)

* release candidate for 0.14.13

* revert to 0.14.12 state

* [RELEASE] 0.15.3 (#4981)

* chore: revert azure pipeline

* chore: revert more files

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
cdkini added a commit that referenced this pull request May 5, 2022
* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipeliens

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting the 'return_only_gallery_examples' arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme an enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant  (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo  (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep (#4980)

* [FEATURE] Splitting data assets into batches using timestamp columns in spark (#4973)

* [BUGFIX] Use `monkeypatch` to ensure consistent bootstrap seed for additional probabilistic test (#4983)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* [FEATURE] Splitting data assets into batches using datetime columns in pandas (#4982)

* [BUGFIX] Patch the remainder of probabilistic `RuleBasedProfiler` tests with consistent bootstrap seed (#4989)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* fix: patch remaining tests

* chore: add docstring

* [FEATURE] Provide Semantic Type Domain Interpretation Utility For Use Within ParameterBuilder Classes (#4993)

* [MAINTENANCE] Splitter cleanup and enhancements (#4984)

* Update action.md (#4967)

* [FEATURE] Add support for Interpolation Method for Quantile Statistic Used by Estimators in NumericMetricRangeMultiBatchParameterBuilder (#4997)

* [FEATURE] Enable self-initializing `ExpectColumnMeanToBeBetween` (#4986)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* [FEATURE] Enable self-initializing `ExpectColumnMedianToBeBetween` (#4987)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* chore: update GH action (#5001)

* [FEATURE] Enable self-initializing `ExpectColumnSumToBeBetween` (#4988)

* feat: init commit

* test: write integration test

* feat: add interpolation field

* [MAINTENANCE] Move `DataAssistant` registry capabilities into `DataAssistantRegistry` to enable user aliasing (#4991)

* refactor: move registry dict to dispatcher

* chore: misc cleanup

* chore: misc updates after review

* chore: misc cleanup

* chore: update error message

* Fix continuous partition example (#4939)

When calling json.dumps() method, the weights change.

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] RBP Profiling Dataset ProgressBar Fix #4999

* [MAINTENANCE] Preliminary refactors for data samplers. (#4996)

* MetricSingleBatchParameterBuilder with unit and integration tests. (#5003)

* [DOCS] Update slack notification guide to not use validation operators. (#4978)

* - Removed references to validation_operator
- Edited config yaml examples
- Grouped config options into tab groups for webhook vs app.

* - Added technical term tag to Checkpoint reference.
- Minor edit to document format.

* corrects number in step header

* [MAINTENANCE] Clean up unused imports and enforce through `flake8` in CI/CD (#5005)

* maintenance: clean up codebase

* chore: add to pipelines

* fix: ensure that flake8 is installed first

* chore: rename CI/CD stage

* chore: update type hint threshold

* parameter builder tests should utilize polymorphism (#5007)

* [MAINTENANCE] Clean up type hints in CLI (#5006)

* chore: first pass

* chore: more updates

* chore: more annotations

* chore: more annotations

* [FEATURE] Enable Pandas DataFrame and Series as MetricValues Output of Metric ParameterBuilder Classes (#5008)

* logging and exception handling (#5009)

* [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010)

* [FEATURE] Histogram/Partition Single-Batch ParameterBuilder (#5011)

* [FEATURE] Update `DataAssistantResult.plot()` return value to emit `PlotResult` wrapper dataclass (#4962)

* feat: init commit

* test: write all tests (except 1)

* test: write last test

* test: remove theme test

* refactor: add custom Chart dataclass

* chore: make Chart immutable

* chore: add docstring

* feat: misc updates per team convo

* chore: delete comments

* refactor: delete vconcat methods and consolidate helpers

* refactor: further consolidation through helpers

* chore: update subtitle styling

* chore: add padding

* [ENHANCEMENT] Condense column-level `vconcat` plots into one interactive plot (#5002)

* Clone vertically concatenated chart for interactive starting point

* Working interactive chart

* Update ColorPalette names

* Update ordinal palette

* Tooltip now working

* Change legend color

* Add y-axis titles

* Align y-axis vertically for both charts

* Add highlight line

* Change batch_id to Batch ID

* Improve legend title and tooltip titles

* New layer for starting point showing one line

* Detail title updating appropriately

* Column name shows up with empty selection

* Use variable for alt.value(light_gray)

* Allow selection by mouseover on lines

* Anomaly encoded lines

* Move column selector to top left

* Format input_dropdown name

* Working with expectation kwargs

* Add predicate logic for strict_min and strict_max

* Add subtitle to prescriptive return charts

* WIP

* Overcame merge conflicts in descriptive mode

* Overcame merge conflicts in prescriptive mode

* Column charts are in their own list index

* [MAINTENANCE] Update version of black in pre-commit config

* [MAINTENANCE] Improve tooltips and formatting for distinct column values chart in VolumeDataAssistantResult (#5017)

* Correct type hints

* Improve tooltips

* Improve docstrings

* Fix return object indexing

* Return list length 1 instead of chart

* [FEATURE] Limit samplers work with supported sqlalchemy backends (#5014)

* [BUGFIX] Fix DataAssistantResult serialization issue (#5020)

* [MAINTENANCE] Enhance configuring serialization for DotDict type classes (#5023)

* [FEATURE] trino support (#5021)

* clean up SQL statements for handling subqueries properly

* use formal sqlalchemy for reflection

* TRINO WIP POC-COMPLETED

* Add trino package as a dependency and update imports

* Add docker-compose.yml for starburst database in assets/docker/

* Update _create_trino_engine to accept a hostname and schema_name

* Update get_dataset to have a block for trino

* Add ability to use data_alt in test_definitions JSON files (for Trino quirks)

* Minor update to get_test_validator_with_data to make debugging easier

* Add trino to sqla_keys dict in setup.py

* Update 3 test_definition json files with trino things

* Add table_selectable workaround for trino

* Add requirements-dev-trino to test_packaging.py

* Add trino to various azure-pipelines yml files

* Skip test_expectation__get_renderers

* Skip test__get_test_results

* Skip test__generate_expectation_tests__with_no_test_backends

Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>

* Pyarrow upper bound (#5028)

* release prep v0.15.4 (#5029)

* [MAINTENANCE] Use temporary branch to attempt to align git histories of `develop` and `main` (#5042)

* [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531)

* [BUGFIX] Moves testing dependencies out of core reqs (#4522)

* removing upper bound on mistune

* remove deprecated dependencies

* adds untracked dependency

* adds untracked dependency

* adds untracked dependency

* moving dependencies

* removes dependencies added to lite from core | adds missing dependencies

Co-authored-by: Chetan Kini <chetan@superconductive.com>

* [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547)

* [MAINTENANCE] Don't return from validate configuration methods (#4545)

* Add validate_configuration to 2 core Expectations that are passing all their tests

* Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests

* Update all validate_configuration methods to have type hints and return None

* Update all doc snippet references that were affected

* [DOCS] technical term tags connect to data cloud docs (#4414)

* - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.)
- Some additional editing was done to bring documents in line with the documentation and how-to guide standards.

* - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop.

* - Update to include technical term tags. (#4462)

- Minor updates to correct formatting and spelling issues.

* - Moved docs related to contributing integrations under contributing in the ToC (#4551)

- Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term).

* - Adds new image files for the intro page (#4540)

- Updates the image file link for the overview image on the intro page

* [DOCS] clarifications on execution engines and scalability (#4539)

* - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines.

* - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas.

* - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines.

* [DOCS] technical terms for validate data advanced (#4535)

* - add support for technical term tags.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* [DOCS] technical terms for validate data actions docs (#4518)

* - Edits to bring docs up to documentation and how-to guide standards.

* - add technical term tags to documents.
- minor formatting edits (technical terms missing capitalization, etc).

* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553)

* [MAINTENANCE] Refactor global `conftest` (#4534)

* chore: use black directives to temporarily disable linting

* chore: more black directives to temporarily disable linting

* chore: finish remaining

* refactor: start cleaning up conftest

* refactor: more refactoring of conftest

* refactor: even more refactoring of conftest

* [FEATURE] Improve diagnostic checklist details (#4548)

* Update library_metadata check to provide details when it doesn't pass

* In linting check, if snake_case doesn't match filename, show computed snake_case

* Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr

* Update convert_to_json_serializable to handle bytes

* Update build_gallery.py script to convert diagnostics to JSON in separate try/except

* Update build_gallery.py script to write expectation_library_v2.json file with indenting

* Update _check_input_validation to tell if custom assert statements are used in validate_configuration

* clean up (#4554)

* minor touch up (#4558)

* [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485)

* feat: init commit

* refactor: shift all logic over to base class

* feat: start impl of anonymize on Anonymizer

* feat: get ProfilerRunAnonymizer working

* refactor: remove constructor from ProfilerRunAnonymizer

* refactor: start on CheckpointRunAnonymizer

* fix: clean up broken checkpoint tests

* fix: ensure *args and **kwargs are propagated through

* refactor: start work on datasource anonymizers

* refactor: remove all anonymizers except Anonymizer from usage stats attrs

* fix: update isinstance checks

* refactor: move helper into checkpoint_run_anonymizer

* refactor: move helper into datasource_anonymizer

* refactor: make anonymize string private and place in strategy

* refactor: make anonymize batch info private and place in strategy

* refactor: move build_init_payload to Anonymizer

* refactor: make remainder of anonymize methods private

* refactor: add store info to strategy

* refactor: add dataconnector info to strategy

* refactor: consolidate profiler info and profiler run anonymization

* refactor: remove *args from signatures

* refactor: updates around checkpoint anonymization

* chore: misc cleanup of Anonymizer

* feat: final touch up before review

* chore: remove 'else' statements

* fix: ensure appropriate checkpoint method gets called

* chore: misc updates from review

* refactor: move init_payload back to usage stats

* chore: misc type hinting

* refactor: start using individual classes again

* chore: continue updating individual anonymizer classes

* feat: further updates to child classes

* feat: update anonymize_init_payload

* fix: get checkpoint payloads working

* refactor: ensure all methods have obj

* fix: misc fixes

* fix: make misc updates to conditional checks for obj

* refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer

* refactor: rename Checkpoint and Profiler anonymizers

* feat: leverage aggregate anonymizer downstream

* feature: conditionally create aggregate_anonymizer in constructor

* feat: add cache retrieve or instantiate util

* chore: add batch_request can_handle

* feat: ensure that salt has a default value in anonymizers

* refactor: require aggregate anonymizer in constructor

* refactor: instantiate all strategies in aggregate

* fix: fix broken tests

* refactor: rename internal getter

Co-authored-by: Don Heppner <donald.heppner@gmail.com>

* [MAINTENANCE] Remove duplicate mistune dependency

* [MAINTENANCE] Run PEP-273 checks on a schedule or release cut

* [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573)

* -Corrected the line references and added <snippet> tags to source code for Spark version of guide.

* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide.

* -lint reformat w/black

* -correcting line numbers after lint formatting.

* [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546)

Usage stats instrumentation of package dependencies

* [MAINTENANCE] Add DevRel team to GitHub auto-label action

* [MAINTENANCE] Add GitHub action to conditionally auto-update PR's  (#4574)

* feat: add new action

* chore: add conditions

* [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577)

* chore: bump version

* chore: test change

* chore: update all instances of black

* chore: new test changes

* chore: revert test changes

* Update overview.md (#4556)

* Add missing links.
* Fix some typos
* Simplify flow and grammar in a few places

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* - corrected broken link in admonition box. (#4585)

- updated links in admonition box to point to current technical documentation rather than old core concepts documents.

* [MAINTENANCE] Minor clean-up (#4571)

Little bit of cleanup in our execution engine and validator

* [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590)

* fix : misconfigured ExpectationConfigurationBuilder

* pushing fix

* clean up before submitting for review

* bugfix : remove sorting

* remove extra line

* [MAINTENANCE] Instrument package dependencies (#4583)

* Add dependencies to data_context.__init__ event

* [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599)

* release candidate for 0.14.13

* revert to 0.14.12 state

* [RELEASE] 0.15.3 (#4981)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting the 'return_only_gallery_examples' arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme an enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant  (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo  (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>

* chore: revert azure pipeline

* chore: revert more files

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>

* [BUGFIX] Patch broken usage stats test around dependency tracking

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: andyjessen <62343929+andyjessen@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>
cdkini added a commit that referenced this pull request May 6, 2022
* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list
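A minimal sketch of the "check that requirements is a list, but don't crash if it's not" behavior from #4922. It assumes the requirements live under a `requirements` key of a hypothetical `library_metadata` dict. `get_contrib_requirements_sketch` is an illustrative name, not a function in the codebase.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def get_contrib_requirements_sketch(library_metadata: Dict[str, Any]) -> List[str]:
    """Return the declared requirements, or [] if they are not a list (illustrative)."""
    requirements: Any = library_metadata.get("requirements", [])
    if not isinstance(requirements, list):
        # Log the problem and continue instead of aborting the whole check.
        logger.warning(
            "'requirements' should be a list; got %s.", type(requirements).__name__
        )
        return []
    return requirements
```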

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant  (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys
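The #4968 change to `get_extras_require` (strip comments, add SQLAlchemy for some keys) can be sketched roughly like this. Both function names are hypothetical, and the sketch reads requirement-file *contents* from a dict rather than from disk so it stays self-contained. Which extras count as SQLAlchemy keys is passed in by the caller; the real list is not shown here.

```python
import re
from typing import Dict, Iterable, List


def strip_requirement_comments(text: str) -> List[str]:
    """Drop blank lines, full-line comments, and trailing '# ...' comments."""
    requirements: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if line:
            requirements.append(line)
    return requirements


def get_extras_require_sketch(
    requirement_texts: Dict[str, str], sqla_keys: Iterable[str]
) -> Dict[str, List[str]]:
    """Hypothetical: map each extra name to its cleaned requirement list."""
    sqla_key_set = set(sqla_keys)
    extras: Dict[str, List[str]] = {}
    for key, text in requirement_texts.items():
        reqs = strip_requirement_comments(text)
        # Bare package names, with version specifiers and extras removed.
        names = {re.split(r"[<>=!~;\[ ]", r, maxsplit=1)[0].lower() for r in reqs}
        # SQL-dialect extras also need SQLAlchemy itself.
        if key in sqla_key_set and "sqlalchemy" not in names:
            reqs.append("sqlalchemy")
        extras[key] = reqs
    return extras
```

Splitting on `#` would also cut `#egg=` fragments from URL requirements, so a production parser would need to handle that case separately.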

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep (#4980)

* [FEATURE] Splitting data assets into batches using timestamp columns in spark (#4973)

* [BUGFIX] Use `monkeypatch` to ensure consistent bootstrap seed for additional probabilistic test (#4983)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* [FEATURE] Splitting data assets into batches using datetime columns in pandas (#4982)

* [BUGFIX] Patch the remainder of probabilistic `RuleBasedProfiler` tests with consistent bootstrap seed (#4989)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* fix: patch remaining tests

* chore: add docstring

* [FEATURE] Provide Semantic Type Domain Interpretation Utility For Use Within ParameterBuilder Classes (#4993)

* [MAINTENANCE] Splitter cleanup and enhancements (#4984)

* Update action.md (#4967)

Update action.md (#4967)

* [FEATURE] Add support for Interpolation Method for Quantile Statistic Used by Estimators in NumericMetricRangeMultiBatchParameterBuilder (#4997)

* [FEATURE] Enable self-initializing `ExpectColumnMeanToBeBetween` (#4986)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* [FEATURE] Enable self-initializing `ExpectColumnMedianToBeBetween` (#4987)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* chore: update GH action (#5001)

* [FEATURE] Enable self-initializing `ExpectColumnSumToBeBetween` (#4988)

* feat: init commit

* test: write integration test

* feat: add interpolation field

* [MAINTENANCE] Move `DataAssistant` registry capabilities into `DataAssistantRegistry` to enable user aliasing (#4991)

* refactor: move registry dict to dispatcher

* chore: misc cleanup

* chore: misc updates after review

* chore: misc cleanup

* chore: update error message
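The idea behind #4991, a registry object that maps names and user-chosen aliases to `DataAssistant` classes, can be sketched as below. Everything here is hypothetical: `DataAssistant` and `VolumeDataAssistant` are minimal stubs rather than the real classes, and `DataAssistantRegistrySketch` and its methods are illustrative names, not the project's API.

```python
from typing import Dict, Optional, Type


class DataAssistant:
    """Hypothetical minimal stub standing in for the real base class."""

    name: Optional[str] = None


class DataAssistantRegistrySketch:
    """Maps registered names (and user-chosen aliases) to DataAssistant classes."""

    def __init__(self) -> None:
        self._registry: Dict[str, Type[DataAssistant]] = {}

    def register(self, name: str, cls: Type[DataAssistant]) -> None:
        if name in self._registry:
            raise ValueError(f"'{name}' is already registered.")
        self._registry[name] = cls

    def add_alias(self, alias: str, name: str) -> None:
        # An alias is just another key pointing at the class registered under `name`.
        self.register(alias, self.get(name))

    def get(self, name: str) -> Type[DataAssistant]:
        try:
            return self._registry[name]
        except KeyError:
            raise ValueError(f"No DataAssistant registered under '{name}'.") from None


class VolumeDataAssistant(DataAssistant):
    """Hypothetical stub; only its registered name matters here."""

    name = "volume"
```

Keeping the mapping in its own object, rather than a module-level dict, lets tests create isolated registries.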

* Fix continuous partition example (#4939)

When calling the json.dumps() method, the weights change.

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] RBP Profiling Dataset ProgressBar Fix #4999

* [MAINTENANCE] Preliminary refactors for data samplers. (#4996)

* MetricSingleBatchParameterBuilder with unit and integration tests. (#5003)

* [DOCS] Update slack notification guide to not use validation operators. (#4978)

* - Removed references to validation_operator
- Edited config yaml examples
- Grouped config options into tab groups for webhook vs app.

* - Added technical term tag to Checkpoint reference.
- Minor edit to document format.

* corrects number in step header

* [MAINTENANCE] Clean up unused imports and enforce through `flake8` in CI/CD (#5005)

* maintenance: clean up codebase

* chore: add to pipelines

* fix: ensure that flake8 is installed first

* chore: rename CI/CD stage

* chore: update type hint threshold

* parameter builder tests should utilize polymorphism (#5007)

* [MAINTENANCE] Clean up type hints in CLI (#5006)

* chore: first pass

* chore: more updates

* chore: more annotations

* chore: more annotations

* [FEATURE] Enable Pandas DataFrame and Series as MetricValues Output of Metric ParameterBuilder Classes (#5008)

* logging and exception handling (#5009)

* [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010)

* [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010)

* [FEATURE] Histogram/Partition Single-Batch ParameterBuilder (#5011)

* [FEATURE] Update `DataAssistantResult.plot()` return value to emit `PlotResult` wrapper dataclass (#4962)

* feat: init commit

* test: write all tests (except 1)

* test: write last test

* test: remove theme test

* refactor: add custom Chart dataclass

* chore: make Chart immutable

* chore: add docstring

* feat: misc updates per team convo

* chore: delete comments

* refactor: delete vconcat methods and consolidate helpers

* refactor: further consolidation through helpers

* chore: update subtitle styling

* chore: add padding

* [ENHANCEMENT] Condense column-level `vconcat` plots into one interactive plot (#5002)

* Clone vertically concatenated chart for interactive starting point

* Working interactive chart

* Update ColorPalette names

* Update ordinal palette

* Tooltip now working

* Change legend color

* Add y-axis titles

* Align y-axis vertically for both charts

* Add highlight line

* Change batch_id to Batch ID

* Improve legend title and tooltip titles

* New layer for starting point showing one line

* Detail title updating appropriately

* Column name shows up with empty selection

* Use variable for alt.value(light_gray)

* Allow selection by mouseover on lines

* Anomaly encoded lines

* Move column selector to top left

* Format input_dropdown name

* Working with expectation kwargs

* Add predicate logic for strict_min and strict_max

* Add subtitle to prescriptive return charts

* WIP

* Overcame merge conflicts in descriptive mode

* Overcame merge conflicts in prescriptive mode

* Column charts are in their own list index

* [MAINTENANCE] Update version of black in pre-commit config

* [MAINTENANCE] Improve tooltips and formatting for distinct column values chart in VolumeDataAssistantResult (#5017)

* Correct type hints

* Improve tooltips

* Improve docstrings

* Fix return object indexing

* Return list length 1 instead of chart

* [FEATURE] Limit samplers work with supported sqlalchemy backends (#5014)

* [BUGFIX] Fix DataAssistantResult serialization issue (#5020)

* [MAINTENANCE] Enhance configuring serialization for DotDict type classes (#5023)

* [FEATURE] trino support (#5021)

* clean up SQL statements for handling subqueries properly

* use formal sqlalchemy for reflection

* TRINO WIP POC-COMPLETED

* Add trino package as a dependency and update imports

* Add docker-compose.yml for starburst database in assets/docker/

* Update _create_trino_engine to accept a hostname and schema_name

* Update get_dataset to have a block for trino

* Add ability to use data_alt in test_definitions JSON files (for Trino quirks)

* Minor update to get_test_validator_with_data to make debugging easier

* Add trino to sqla_keys dict in setup.py

* Update 3 test_definition json files with trino things

* Add table_selectable workaround for trino

* Add requirements-dev-trino to test_packaging.py

* Add trino to various azure-pipelines yml files

* Skip test_expectation__get_renderers

* Skip test__get_test_results

* Skip test__generate_expectation_tests__with_no_test_backends

Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>

* Pyarrow upper bound (#5028)

* release prep v0.15.4 (#5029)

* [MAINTENANCE] Use temporary branch to attempt to align git histories of `develop` and `main` (#5042)

* [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531)

* [BUGFIX] Moves testing dependencies out of core reqs (#4522)

* removing upper bound on mistune

* remove deprecated dependencies

* adds untracked dependency

* adds untracked dependency

* adds untracked dependency

* moving dependencies

* removes dependencies added to lite from core | adds missing dependencies

Co-authored-by: Chetan Kini <chetan@superconductive.com>

* [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547)

* [MAINTENANCE] Don't return from validate configuration methods (#4545)

* Add validate_configuration to 2 core Expectations that are passing all their tests

* Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests

* Update all validate_configuration methods to have type hints and return None

* Update all doc snippet references that were affected

* [DOCS] technical term tags connect to data cloud docs (#4414)

* - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.)
- Some additional editing was done to bring documents in line with the documentation and how-to guide standards.

* - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop.

* - Update to include technical term tags. (#4462)

- Minor updates to correct formatting and spelling issues.

* - Moved docs related to contributing integrations under contributing in the ToC (#4551)

- Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term).

* - Adds new image files for the intro page (#4540)

- Updates the image file link for the overview image on the intro page

* [DOCS] clarifications on execution engines and scalability (#4539)

* - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines.

* - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas.

* - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines.

* [DOCS] technical terms for validate data advanced (#4535)

* - add support for technical term tags.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* [DOCS] technical terms for validate data actions docs (#4518)

* - Edits to bring docs up to documentation and how-to guide standards.

* - add technical term tags to documents.
- minor formatting edits (technical terms missing capitalization, etc).

* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553)

* [MAINTENANCE] Refactor global `conftest` (#4534)

* chore: use black directives to temporarily disable linting

* chore: more black directives to temporarily disable linting

* chore: finish remaining

* refactor: start cleaning up conftest

* refactor: more refactoring of conftest

* refactor: even more refactoring of conftest

* [FEATURE] Improve diagnostic checklist details (#4548)

* Update library_metadata check to provide details when it doesn't pass

* In linting check, if snake_case doesn't match filename, show computed snake_case

* Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr

* Update convert_to_json_serializable to handle bytes

* Update build_gallery.py script to convert diagnostics to JSON in separate try/except

* Update build_gallery.py script to write expectation_library_v2.json file with indenting

* Update _check_input_validation to tell if custom assert statements are used in validate_configuration
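One way a JSON-serialization helper could handle `bytes` (per the #4548 commit "Update convert_to_json_serializable to handle bytes") is sketched below. The encoding choice is an assumption, since the commit does not say how bytes are converted: this sketch decodes UTF-8 and falls back to base64. `to_json_serializable_sketch` is a hypothetical name and covers only a few container types.

```python
import base64
import json
from typing import Any


def to_json_serializable_sketch(data: Any) -> Any:
    """Recursively convert `data` into JSON-compatible values (illustrative only)."""
    if isinstance(data, bytes):
        try:
            return data.decode("utf-8")
        except UnicodeDecodeError:
            # Non-UTF-8 payloads are base64-encoded so they survive json.dumps.
            return base64.b64encode(data).decode("ascii")
    if isinstance(data, dict):
        return {str(k): to_json_serializable_sketch(v) for k, v in data.items()}
    if isinstance(data, (list, tuple, set)):
        return [to_json_serializable_sketch(v) for v in data]
    return data
```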

* clean up (#4554)

* minor touch up (#4558)

* [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485)

* feat: init commit

* refactor: shift all logic over to base class

* feat: start impl of anonymize on Anonymizer

* feat: get ProfilerRunAnonymizer working

* refactor: remove constructor from ProfilerRunAnonymizer

* refactor: start on CheckpointRunAnonymizer

* fix: clean up broken checkpoint tests

* fix: ensure *args and **kwargs are propagated through

* refactor: start work on datasource anonymizers

* refactor: remove all anonymizers except Anonymizer from usage stats attrs

* fix: update isinstance checks

* refactor: move helper into checkpoint_run_anonymizer

* refactor: move helper into datasource_anonymizer

* refactor: make anonymize string private and place in strategy

* refactor: make anonymize batch info private and place in strategy

* refactor: move build_init_payload to Anonymizer

* refactor: make remainder of anonymize methods private

* refactor: add store info to strategy

* refactor: add dataconnector info to strategy

* refactor: consolidate profiler info and profiler run anonymization

* refactor: remove *args from signatures

* refactor: updates around checkpoint anonymization

* chore: misc cleanup of Anonymizer

* feat: final touch up before review

* chore: remove 'else' statements

* fix: ensure appropriate checkpoint method gets called

* chore: misc updates from review

* refactor: move init_payload back to usage stats

* chore: misc type hinting

* refactor: start using individual classes again

* chore: continue updating individual anonymizer classes

* feat: further updates to child classes

* feat: update anonymize_init_payload

* fix: get checkpoint payloads working

* refactor: ensure all methods have obj

* fix: misc fixes

* fix: make misc updates to conditional checks for obj

* refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer

* refactor: rename Checkpoint and Profiler anonymizers

* feat: leverage aggregate anonymizer downstream

* feature: conditionally create aggregate_anonymizer in constructor

* feat: add cache retrieve or instantiate util

* chore: add batch_request can_handle

* feat: ensure that salt has a default value in anonymizers

* refactor: require aggregate anonymizer in constructor

* refactor: instantiate all strategies in aggregate

* fix: fix broken tests

* refactor: rename internal getter

Co-authored-by: Don Heppner <donald.heppner@gmail.com>
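The Strategy-pattern refactor in #4485 (per-type anonymizers exposing `can_handle`, dispatched by an aggregate anonymizer with a default salt) can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's classes. All class names are hypothetical. The stub types stand in for real Checkpoint and Datasource objects. Salted SHA-256 is an assumed hashing choice.

```python
import hashlib
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class CheckpointStub:
    name: str


@dataclass
class DatasourceStub:
    name: str
    class_name: str


class BaseAnonymizerSketch(ABC):
    def __init__(self, salt: str = "default-salt") -> None:
        self._salt = salt

    def _anonymize_string(self, value: str) -> str:
        # Salted one-way hash: same input and salt always give the same output.
        return hashlib.sha256((self._salt + value).encode("utf-8")).hexdigest()

    @staticmethod
    @abstractmethod
    def can_handle(obj: Any) -> bool: ...

    @abstractmethod
    def anonymize(self, obj: Any) -> Dict[str, str]: ...


class CheckpointAnonymizerSketch(BaseAnonymizerSketch):
    @staticmethod
    def can_handle(obj: Any) -> bool:
        return isinstance(obj, CheckpointStub)

    def anonymize(self, obj: Any) -> Dict[str, str]:
        return {"anonymized_name": self._anonymize_string(obj.name)}


class DatasourceAnonymizerSketch(BaseAnonymizerSketch):
    @staticmethod
    def can_handle(obj: Any) -> bool:
        return isinstance(obj, DatasourceStub)

    def anonymize(self, obj: Any) -> Dict[str, str]:
        # The class name is kept as-is; only the user-chosen name is hashed.
        return {"anonymized_name": self._anonymize_string(obj.name), "parent_class": obj.class_name}


class AggregateAnonymizerSketch:
    """Instantiates every strategy and dispatches to the first that can handle obj."""

    def __init__(self, salt: str = "default-salt") -> None:
        self._strategies: List[BaseAnonymizerSketch] = [
            CheckpointAnonymizerSketch(salt),
            DatasourceAnonymizerSketch(salt),
        ]

    def anonymize(self, obj: Any) -> Dict[str, str]:
        for strategy in self._strategies:
            if strategy.can_handle(obj):
                return strategy.anonymize(obj)
        raise TypeError(f"No anonymizer strategy can handle {type(obj).__name__}.")
```

Adding support for a new object type then means adding one strategy class to the aggregate's list, without touching the dispatch logic.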

* [MAINTENANCE] Remove duplicate mistune dependency

* [MAINTENANCE] Run PEP-273 checks on a schedule or release cut

* [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573)

* -Corrected the line references and added <snippet> tags to source code for Spark version of guide.

* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide.

* -lint reformat w/black

* -correcting line numbers after lint formatting.

* [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546)

Usage stats instrumentation of package dependencies

* [MAINTENANCE] Add DevRel team to GitHub auto-label action

* [MAINTENANCE] Add GitHub action to conditionally auto-update PRs (#4574)

* feat: add new action

* chore: add conditions

* [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577)

* chore: bump version

* chore: test change

* chore: update all instances of black

* chore: new test changes

* chore: revert test changes

* Update overview.md (#4556)

* Add missing links.
* Fix some typos
* Simplify flow and grammar in a few places

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* - corrected broken link in admonition box. (#4585)

- updated links in admonition box to point to current technical documentation rather than old core concepts documents.

* [MAINTENANCE] Minor clean-up (#4571)

Little bit of cleanup in our execution engine and validator

* [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590)

* fix : misconfigured ExpectationConfigurationBuilder

* pushing fix

* clean up before submitting for review

* bugfix : remove sorting

* remove extra line

* [MAINTENANCE] Instrument package dependencies (#4583)

* Add dependencies to data_context.__init__ event

* [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599)

* release candidate for 0.14.13

* revert to 0.14.12 state

* [RELEASE] 0.15.3 (#4981)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant  (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details. (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>

* chore: revert azure pipeline

* chore: revert more files

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>

* [BUGFIX] Patch broken usage stats test around dependency tracking

* [FEATURE] Add subset operation to Domain class (#5049)

* [FEATURE] In DataAssistant: Use Domain instead of domain_type as key for Metrics Parameter Builders (#5057)

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: andyjessen <62343929+andyjessen@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>
cdkini added a commit that referenced this pull request May 6, 2022
* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details. (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep (#4980)

* [FEATURE] Splitting data assets into batches using timestamp columns in spark (#4973)

* [BUGFIX] Use `monkeypatch` to ensure consistent bootstrap seed for additional probabilistic test (#4983)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* [FEATURE] Splitting data assets into batches using datetime columns in pandas (#4982)

* [BUGFIX] Patch the remainder of probabilistic `RuleBasedProfiler` tests with consistent bootstrap seed (#4989)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* fix: patch remaining tests

* chore: add docstring

* [FEATURE] Provide Semantic Type Domain Interpretation Utility For Use Within ParameterBuilder Classes (#4993)

* [MAINTENANCE] Splitter cleanup and enhancements (#4984)

* Update action.md (#4967)

* [FEATURE] Add support for Interpolation Method for Quantile Statistic Used by Estimators in NumericMetricRangeMultiBatchParameterBuilder (#4997)

* [FEATURE] Enable self-initializing `ExpectColumnMeanToBeBetween` (#4986)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* [FEATURE] Enable self-initializing `ExpectColumnMedianToBeBetween` (#4987)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* chore: update GH action (#5001)

* [FEATURE] Enable self-initializing `ExpectColumnSumToBeBetween` (#4988)

* feat: init commit

* test: write integration test

* feat: add interpolation field

* [MAINTENANCE] Move `DataAssistant` registry capabilities into `DataAssistantRegistry` to enable user aliasing (#4991)

* refactor: move registry dict to dispatcher

* chore: misc cleanup

* chore: misc updates after review

* chore: misc cleanup

* chore: update error message

* Fix continuous partition example (#4939)

When calling json.dumps() method, the weights change.

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] RBP Profiling Dataset ProgressBar Fix #4999

* [MAINTENANCE] Preliminary refactors for data samplers. (#4996)

* MetricSingleBatchParameterBuilder with unit and integration tests. (#5003)

* [DOCS] Update slack notification guide to not use validation operators. (#4978)

* - Removed references to validation_operator
- Edited config yaml examples
- Grouped config options into tab groups for webhook vs app.

* - Added technical term tag to Checkpoint reference.
- Minor edit to document format.

* corrects number in step header

* [MAINTENANCE] Clean up unused imports and enforce through `flake8` in CI/CD (#5005)

* maintenance: clean up codebase

* chore: add to pipelines

* fix: ensure that flake8 is installed first

* chore: rename CI/CD stage

* chore: update type hint threshold

* parameter builder tests should utilize polymorphism (#5007)

* [MAINTENANCE] Clean up type hints in CLI (#5006)

* chore: first pass

* chore: more updates

* chore: more annotations

* chore: more annotations

* [FEATURE] Enable Pandas DataFrame and Series as MetricValues Output of Metric ParameterBuilder Classes (#5008)

* logging and exception handling (#5009)

* [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010)

* [FEATURE] Histogram/Partition Single-Batch ParameterBuilder (#5011)

* [FEATURE] Update `DataAssistantResult.plot()` return value to emit `PlotResult` wrapper dataclass (#4962)

* feat: init commit

* test: write all tests (except 1)

* test: write last test

* test: remove theme test

* refactor: add custom Chart dataclass

* chore: make Chart immutable

* chore: add docstring

* feat: misc updates per team convo

* chore: delete comments

* refactor: delete vconcat methods and consolidate helpers

* refactor: further consolidation through helpers

* chore: update subtitle styling

* chore: add padding

* [ENHANCEMENT] Condense column-level `vconcat` plots into one interactive plot (#5002)

* Clone vertically concatenated chart for interactive starting point

* Working interactive chart

* Update ColorPalette names

* Update ordinal palette

* Tooltip now working

* Change legend color

* Add y-axis titles

* Align y-axis vertically for both charts

* Add highlight line

* Change batch_id to Batch ID

* Improve legend title and tooltip titles

* New layer for starting point showing one line

* Detail title updating appropriately

* Column name shows up with empty selection

* Use variable for alt.value(light_gray)

* Allow selection by mouseover on lines

* Anomaly encoded lines

* Move column selector to top left

* Format input_dropdown name

* Working with expectation kwargs

* Add predicate logic for strict_min and strict_max

* Add subtitle to prescriptive return charts

* WIP

* Overcame merge conflicts in descriptive mode

* Overcame merge conflicts in prescriptive mode

* Column charts are in their own list index

* [MAINTENANCE] Update version of black in pre-commit config

* [MAINTENANCE] Improve tooltips and formatting for distinct column values chart in VolumeDataAssistantResult (#5017)

* Correct type hints

* Improve tooltips

* Improve docstrings

* Fix return object indexing

* Return list length 1 instead of chart

* [FEATURE] Limit samplers work with supported sqlalchemy backends (#5014)

* [BUGFIX] Fix DataAssistantResult serialization issue (#5020)

* [MAINTENANCE] Enhance configuring serialization for DotDict type classes (#5023)

* [FEATURE] trino support (#5021)

* clean up SQL statements for handling subqueries properly

* use formal sqlalchemy for reflection

* TRINO WIP POC-COMPLETED

* Add trino package as a dependency and update imports

* Add docker-compose.yml for starburst database in assets/docker/

* Update _create_trino_engine to accept a hostname and schema_name

* Update get_dataset to have a block for trino

* Add ability to use data_alt in test_definitions JSON files (for Trino quirks)

* Minor update to get_test_validator_with_data to make debugging easier

* Add trino to sqla_keys dict in setup.py

* Update 3 test_definition json files with trino things

* Add table_selectable workaround for trino

* Add requirements-dev-trino to test_packaging.py

* Add trino to various azure-pipelines yml files

* Skip test_expectation__get_renderers

* Skip test__get_test_results

* Skip test__generate_expectation_tests__with_no_test_backends

Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>

* Pyarrow upper bound (#5028)

* release prep v0.15.4 (#5029)

* [MAINTENANCE] Use temporary branch to attempt to align git histories of `develop` and `main` (#5042)

* [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531)

* [BUGFIX] Moves testing dependencies out of core reqs (#4522)

* removing upper bound on mistune

* remove deprecated dependencies

* adds untracked dependency

* adds untracked dependency

* adds untracked dependency

* moving dependencies

* removes dependencies added to lite from core | adds missing dependencies

Co-authored-by: Chetan Kini <chetan@superconductive.com>

* [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547)

* [MAINTENANCE] Don't return from validate configuration methods (#4545)

* Add validate_configuration to 2 core Expectations that are passing all their tests

* Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests

* Update all validate_configuration methods to have type hints and return None

* Update all doc snippet references that were affected

* [DOCS] technical term tags connect to data cloud docs (#4414)

* - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.)
- Some additional editing was done to bring documents in line with the documentation and how-to guide standards.

* - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop.

* - Update to include technical term tags. (#4462)

- Minor updates to correct formatting and spelling issues.

* - Moved docs related to contributing integrations under contributing in the ToC (#4551)

- Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term).

* - Adds new image files for the intro page (#4540)

- Updates the image file link for the overview image on the intro page

* [DOCS] clarifications on execution engines and scalability (#4539)

* - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines.

* - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas.

* - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines.

* [DOCS] technical terms for validate data advanced (#4535)

* - add support for technical term tags.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* [DOCS] technical terms for validate data actions docs (#4518)

* - Edits to bring docs up to documentation and how-to guide standards.

* - add technical term tags to documents.
- minor formatting edits (technical terms missing capitalization, etc).

* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553)

* [MAINTENANCE] Refactor global `conftest` (#4534)

* chore: use black directives to temporarily disable linting

* chore: more black directives to temporarily disable linting

* chore: finish remaining

* refactor: start cleaning up conftest

* refactor: more refactoring of conftest

* refactor: even more refactoring of conftest

* [FEATURE] Improve diagnostic checklist details (#4548)

* Update library_metadata check to provide details when it doesn't pass

* In linting check, if snake_case doesn't match filename, show computed snake_case

* Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr

* Update convert_to_json_serializable to handle bytes

* Update build_gallery.py script to convert diagnostics to JSON in separate try/except

* Update build_gallery.py script to write expectation_library_v2.json file with indenting

* Update _check_input_validation to tell if custom assert statements are used in validate_configuration

* clean up (#4554)

* minor touch up (#4558)

* [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485)

* feat: init commit

* refactor: shift all logic over to base class

* feat: start impl of anonymize on Anonymizer

* feat: get ProfilerRunAnonymizer working

* refactor: remove constructor from ProfilerRunAnonymizer

* refactor: start on CheckpointRunAnonymizer

* fix: clean up broken checkpoint tests

* fix: ensure *args and **kwargs are propagated through

* refactor: start work on datasource anonymizers

* refactor: remove all anonymizers except Anonymizer from usage stats attrs

* fix: update isinstance checks

* refactor: move helper into checkpoint_run_anonymizer

* refactor: move helper into datasource_anonymizer

* refactor: make anonymize string private and place in strategy

* refactor: make anonymize batch info private and place in strategy

* refactor: move build_init_payload to Anonymizer

* refactor: make remainder of anonymize methods private

* refactor: add store info to strategy

* refactor: add dataconnector info to strategy

* refactor: consolidate profiler info and profiler run anonymization

* refactor: remove *args from signatures

* refactor: updates around checkpoint anonymization

* chore: misc cleanup of Anonymizer

* feat: final touch up before review

* chore: remove 'else' statements

* fix: ensure appropriate checkpoint method gets called

* chore: misc updates from review

* refactor: move init_payload back to usage stats

* chore: misc type hinting

* refactor: start using individual classes again

* chore: continue updating individual anonymizer classes

* feat: further updates to child classes

* feat: update anonymize_init_payload

* fix: get checkpoint payloads working

* refactor: ensure all methods have obj

* fix: misc fixes

* fix: make misc updates to conditional checks for obj

* refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer

* refactor: rename Checkpoint and Profiler anonymizers

* feat: leverage aggregate anonymizer downstream

* feature: conditionally create aggregate_anonymizer in constructor

* feat: add cache retrieve or instantiate util

* chore: add batch_request can_handle

* feat: ensure that salt has a default value in anonymizers

* refactor: require aggregate anonymizer in constructor

* refactor: instantiate all strategies in aggregate

* fix: fix broken tests

* refactor: rename internal getter

Co-authored-by: Don Heppner <donald.heppner@gmail.com>

* [MAINTENANCE] Remove duplicate mistune dependency

* [MAINTENANCE] Run PEP-273 checks on a schedule or release cut

* [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573)

* -Corrected the line references and added <snippet> tags to source code for Spark version of guide.

* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide.

* -lint reformat w/black

* -correcting line numbers after lint formatting.

* [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546)

Usage stats instrumentation of package dependencies

* [MAINTENANCE] Add DevRel team to GitHub auto-label action

* [MAINTENANCE] Add GitHub action to conditionally auto-update PRs (#4574)

* feat: add new action

* chore: add conditions

* [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577)

* chore: bump version

* chore: test change

* chore: update all instances of black

* chore: new test changes

* chore: revert test changes

* Update overview.md (#4556)

* Add missing links.
* Fix some typos
* Simplify flow and grammar in a few places

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* - corrected broken link in admonition box. (#4585)

- updated links in admonition box to point to current technical documentation rather than old core concepts documents.

* [MAINTENANCE] Minor clean-up (#4571)

Little bit of cleanup in our execution engine and validator

* [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590)

* fix : misconfigured ExpectationConfigurationBuilder

* pushing fix

* clean up before submitting for review

* bugfix : remove sorting

* remove extra line

* [MAINTENANCE] Instrument package dependencies (#4583)

* Add dependencies to data_context.__init__ event

* [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599)

* release candidate for 0.14.13

* revert to 0.14.12 state

* [RELEASE] 0.15.3 (#4981)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant  (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo  (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details. (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>

* chore: revert azure pipeline

* chore: revert more files

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>

* [BUGFIX] Patch broken usage stats test around dependency tracking

* [FEATURE] Add subset operation to Domain class (#5049)

* [FEATURE] In DataAssistant: Use Domain instead of domain_type as key for Metrics Parameter Builders (#5057)

* [MAINTENANCE] Save output of usage stats schema script in repo (#5053)

* chore: add output

* chore: update

* chore: docstring

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: andyjessen <62343929+andyjessen@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>
kenwade4 added a commit that referenced this pull request May 11, 2022
* [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531)

* [BUGFIX] Moves testing dependencies out of core reqs (#4522)

* removing upper bound on mistune

* remove deprecated dependencies

* adds untracked dependency

* adds untracked dependency

* adds untracked dependency

* moving dependencies

* removes dependencies added to lite from core | adds missing dependencies

Co-authored-by: Chetan Kini <chetan@superconductive.com>

* [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547)

* [MAINTENANCE] Don't return from validate configuration methods (#4545)

* Add validate_configuration to 2 core Expectations that are passing all their tests

* Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests

* Update all validate_configuration methods to have type hints and return None

* Update all doc snippet references that were affected

* [DOCS] technical term tags connect to data cloud docs (#4414)

* - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.)
- Some additional editing was done to bring documents in line with the documentation and how-to guide standards.

* - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop.

* - Update to include technical term tags. (#4462)

- Minor updates to correct formatting and spelling issues.

* - Moved docs related to contributing integrations under contributing in the ToC (#4551)

- Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term).

* - Adds new image files for the intro page (#4540)

- Updates the image file link for the overview image on the intro page

* [DOCS] clarifications on execution engines and scalability (#4539)

* - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines.

* - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas.

* - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines.

* [DOCS] technical terms for validate data advanced (#4535)

* - add support for technical term tags.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* [DOCS] technical terms for validate data actions docs (#4518)

* - Edits to bring docs up to documentation and how-to guide standards.

* - add technical term tags to documents.
- minor formatting edits (technical terms missing capitalization, etc).

* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553)

* [MAINTENANCE] Refactor global `conftest` (#4534)

* chore: use black directives to temporarily disable linting

* chore: more black directives to temporarily disable linting

* chore: finish remaining

* refactor: start cleaning up conftest

* refactor: more refactoring of conftest

* refactor: even more refactoring of conftest

* [FEATURE] Improve diagnostic checklist details (#4548)

* Update library_metadata check to provide details when it doesn't pass

* In linting check, if snake_case doesn't match filename, show computed snake_case

* Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr

* Update convert_to_json_serializable to handle bytes

* Update build_gallery.py script to convert diagnostics to JSON in separate try/except

* Update build_gallery.py script to write expectation_library_v2.json file with indenting

* Update _check_input_validation to tell if custom assert statements are used in validate_configuration

* clean up (#4554)

* minor touch up (#4558)

* [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485)

* feat: init commit

* refactor: shift all logic over to base class

* feat: start impl of anonymize on Anonymizer

* feat: get ProfilerRunAnonymizer working

* refactor: remove constructor from ProfilerRunAnonymizer

* refactor: start on CheckpointRunAnonymizer

* fix: clean up broken checkpoint tests

* fix: ensure *args and **kwargs are propagated through

* refactor: start work on datasource anonymizers

* refactor: remove all anonymizers except Anonymizer from usage stats attrs

* fix: update isinstance checks

* refactor: move helper into checkpoint_run_anonymizer

* refactor: move helper into datasource_anonymizer

* refactor: make anonymize string private and place in strategy

* refactor: make anonymize batch info private and place in strategy

* refactor: move build_init_payload to Anonymizer

* refactor: make remainder of anonymize methods private

* refactor: add store info to strategy

* refactor: add dataconnector info to strategy

* refactor: consolidate profiler info and profiler run anonymization

* refactor: remove *args from signatures

* refactor: updates around checkpoint anonymization

* chore: misc cleanup of Anonymizer

* feat: final touch up before review

* chore: remove 'else' statements

* fix: ensure appropriate checkpoint method gets called

* chore: misc updates from review

* refactor: move init_payload back to usage stats

* chore: misc type hinting

* refactor: start using individual classes again

* chore: continue updating individual anonymizer classes

* feat: further updates to child classes

* feat: update anonymize_init_payload

* fix: get checkpoint payloads working

* refactor: ensure all methods have obj

* fix: misc fixes

* fix: make misc updates to conditional checks for obj

* refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer

* refactor: rename Checkpoint and Profiler anonymizers

* feat: leverage aggregate anonymizer downstream

* feature: conditionally create aggregate_anonymizer in constructor

* feat: add cache retrieve or instantiate util

* chore: add batch_request can_handle

* feat: ensure that salt has a default value in anonymizers

* refactor: require aggregate anonymizer in constructor

* refactor: instantiate all strategies in aggregate

* fix: fix broken tests

* refactor: rename internal getter

Co-authored-by: Don Heppner <donald.heppner@gmail.com>

* [MAINTENANCE] Remove duplicate mistune dependency

* [MAINTENANCE] Run PEP-273 checks on a schedule or release cut

* [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573)

* -Corrected the line references and added <snippet> tags to source code for Spark version of guide.

* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide.

* -lint reformat w/black

* -correcting line numbers after lint formatting.

* [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546)

Usage stats instrumentation of package dependencies

* [MAINTENANCE] Add DevRel team to GitHub auto-label action

* [MAINTENANCE] Add GitHub action to conditionally auto-update PR's  (#4574)

* feat: add new action

* chore: add conditions

* [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577)

* chore: bump version

* chore: test change

* chore: update all instances of black

* chore: new test changes

* chore: revert test changes

* Update overview.md (#4556)

* Add missing links.
* Fix some typos
* Simplify flow and grammar in a few places

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* - corrected broken link in admonition box. (#4585)

- updated links in admonition box to point to current technical documentation rather than old core concepts documents.

* [MAINTENANCE] Minor clean-up (#4571)

Little bit of cleanup in our execution engine and validator

* [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590)

* fix : misconfigured ExpectationConfigurationBuilder

* pushing fix

* clean up before submitting for review

* bugfix : remove sorting

* remove extra line

* [MAINTENANCE] Instrument package dependencies (#4583)

* Add dependencies to data_context.__init__ event

* [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599)

* release candidate for 0.14.13

* revert to 0.14.12 state

* [RELEASE] 0.15.3 (#4981)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant  (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo  (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details. (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>

* [RELEASE] 0.15.4 (#5051)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant  (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment
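The commits above pin the bootstrap seed so that probabilistic profiler tests return the same numbers on every run. The sketch below illustrates that idea with the standard library only. `bootstrap_mean_interval` is a hypothetical helper, not the Rule-Based Profiler's estimator, and a simple percentile bootstrap of the mean stands in for whatever statistic the profiler actually computes.

```python
import random
from typing import List, Optional, Sequence, Tuple


def bootstrap_mean_interval(
    data: Sequence[float],
    n_resamples: int = 1000,
    false_positive_rate: float = 0.05,
    seed: Optional[int] = None,
) -> Tuple[float, float]:
    """Percentile-bootstrap interval for the mean (hypothetical helper)."""
    # A private Random instance keeps the seed local to this call, so the
    # global `random` state used elsewhere is left untouched.
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    means: List[float] = sorted(
        sum(rng.choices(data, k=n)) / n for _ in range(n_resamples)
    )
    # Split the false positive rate evenly between the two tails.
    lower_idx = int((false_positive_rate / 2) * (n_resamples - 1))
    upper_idx = int((1 - false_positive_rate / 2) * (n_resamples - 1))
    return means[lower_idx], means[upper_idx]
```

With the same `seed`, two calls produce identical intervals; with `seed=None`, they generally differ. In a pytest suite, the same determinism can be obtained by letting the `monkeypatch` fixture override whichever module attribute supplies the seed, e.g. `monkeypatch.setattr(some_module, "SEED", 0)`; the module and attribute names there are illustrative only.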

* pin cryptography package (#4963)

pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys
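The `get_extras_require` change described above strips comments from requirements files and adds SQLAlchemy to certain extras. The sketch below shows one way to do that; `build_extras_require`, `SQLA_EXTRAS`, and the input shape are assumptions for illustration, not the project's `setup.py`. Comment stripping follows pip's rule that `#` starts a comment only at the beginning of a line or after whitespace, so URL fragments such as `#egg=` survive.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical set of extras that should also install SQLAlchemy;
# the real mapping in the project's setup.py may differ.
SQLA_EXTRAS = {"postgresql", "redshift", "snowflake", "bigquery"}


def strip_requirement_comments(lines: Iterable[str]) -> List[str]:
    """Drop full-line and trailing '#' comments as well as blank lines."""
    requirements: List[str] = []
    for line in lines:
        # '#' begins a comment only at line start or after whitespace.
        requirement = re.sub(r"(^|\s+)#.*$", "", line).strip()
        if requirement:
            requirements.append(requirement)
    return requirements


def requirement_name(requirement: str) -> str:
    """Return the bare distribution name, e.g. 'sqlalchemy' from 'SQLAlchemy>=1.3'."""
    return re.split(r"[<>=!~;\[\s]", requirement, maxsplit=1)[0].lower()


def build_extras_require(sources: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each extra to its cleaned requirement list (hypothetical helper)."""
    extras: Dict[str, List[str]] = {}
    for extra, lines in sources.items():
        requirements = strip_requirement_comments(lines)
        names = {requirement_name(r) for r in requirements}
        # Add SQLAlchemy only where it is needed and not already listed.
        if extra in SQLA_EXTRAS and "sqlalchemy" not in names:
            requirements.append("sqlalchemy")
        extras[extra] = requirements
    return extras
```

Comparing bare names rather than string prefixes keeps a package such as `snowflake-sqlalchemy` from being mistaken for SQLAlchemy itself.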

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints
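The commits in #4946 describe how out-of-range `false_positive_rate` values are handled: warn and substitute `NP_EPSILON` at or below 0, raise a custom `ProfilerExecutionError` at or above 1, and use `1 - NP_EPSILON` as the upper bound. The sketch below is one reading of those commit messages, not the library's code. `NP_EPSILON` is assumed to be machine epsilon for 64-bit floats, `ProfilerExecutionError` is a local stand-in class, and `sanitize_false_positive_rate` is a hypothetical name.

```python
import sys
import warnings

# Stand-in for the NP_EPSILON constant named in the commits; assumed here to
# equal machine epsilon for 64-bit floats (the same value as numpy.finfo(float).eps).
NP_EPSILON: float = sys.float_info.epsilon


class ProfilerExecutionError(Exception):
    """Local stand-in for the library's custom exception (hypothetical here)."""


def sanitize_false_positive_rate(false_positive_rate: float) -> float:
    """Map a requested false_positive_rate into the open interval (0, 1).

    A sketch of one reading of #4946, not the library's implementation.
    """
    # Accept ints such as 0 or 1 but work with floats throughout.
    rate = float(false_positive_rate)
    if rate >= 1.0:
        # A rate of 1 or more is meaningless for a range estimator: fail loudly.
        raise ProfilerExecutionError(f"false_positive_rate must be < 1.0 (got {rate}).")
    if rate <= 0.0:
        # Zero or negative: warn and fall back to the smallest positive rate.
        warnings.warn(
            f"false_positive_rate {rate} <= 0; using NP_EPSILON ({NP_EPSILON}) instead."
        )
        return NP_EPSILON
    if rate > 1.0 - NP_EPSILON:
        # Too close to 1 to be numerically useful: clamp to the upper bound.
        warnings.warn(
            f"false_positive_rate {rate} is too close to 1; using 1 - NP_EPSILON instead."
        )
        return 1.0 - NP_EPSILON
    return rate
```

Warning rather than raising on the lower bound keeps profiling running with the most conservative usable value, while a rate of 1 or more signals a configuration error worth surfacing immediately.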

* [MAINTENANCE] fix a typo (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep (#4980)

* [FEATURE] Splitting data assets into batches using timestamp columns in spark (#4973)

* [BUGFIX] Use `monkeypatch` to ensure consistent bootstrap seed for additional probabilistic test (#4983)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* [FEATURE] Splitting data assets into batches using datetime columns in pandas (#4982)

* [BUGFIX] Patch the remainder of probabilistic `RuleBasedProfiler` tests with consistent bootstrap seed (#4989)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* fix: patch remaining tests

* chore: add docstring

* [FEATURE] Provide Semantic Type Domain Interpretation Utility For Use Within ParameterBuilder Classes (#4993)

* [MAINTENANCE] Splitter cleanup and enhancements (#4984)

* Update action.md (#4967)

Update action.md (#4967)

* [FEATURE] Add support for Interpolation Method for Quantile Statistic Used by Estimators in NumericMetricRangeMultiBatchParameterBuilder (#4997)

* [FEATURE] Enable self-initializing `ExpectColumnMeanToBeBetween` (#4986)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* [FEATURE] Enable self-initializing `ExpectColumnMedianToBeBetween` (#4987)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* chore: update GH action (#5001)

* [FEATURE] Enable self-initializing `ExpectColumnSumToBeBetween` (#4988)

* feat: init commit

* test: write integration test

* feat: add interpolation field

* [MAINTENANCE] Move `DataAssistant` registry capabilities into `DataAssistantRegistry` to enable user aliasing (#4991)

* refactor: move registry dict to dispatcher

* chore: misc cleanup

* chore: misc updates after review

* chore: misc cleanup

* chore: update error message

* Fix continuous partition example (#4939)

When calling json.dumps() method, the weights change.

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] RBP Profiling Dataset ProgressBar Fix #4999

* [MAINTENANCE] Preliminary refactors for data samplers. (#4996)

* MetricSingleBatchParameterBuilder with unit and integration tests. (#5003)

* [DOCS] Update slack notification guide to not use validation operators. (#4978)

* - Removed references to validation_operator
- Edited config yaml examples
- Grouped config options into tab groups for webhook vs app.

* - Added technical term tag to Checkpoint reference.
- Minor edit to document format.

* corrects number in step header

* [MAINTENANCE] Clean up unused imports and enforce through `flake8` in CI/CD (#5005)

* maintenance: clean up codebase

* chore: add to pipelines

* fix: ensure that flake8 is installed first

* chore: rename CI/CD stage

* chore: update type hint threshold

* parameter builder tests should utilize polymorphism (#5007)

* [MAINTENANCE] Clean up type hints in CLI (#5006)

* chore: first pass

* chore: more updates

* chore: more annotations

* chore: more annotations

* [FEATURE] Enable Pandas DataFrame and Series as MetricValues Output of Metric ParameterBuilder Classes (#5008)

* logging and exception handling (#5009)

* [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010)

* [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010)

* [FEATURE] Histogram/Partition Single-Batch ParameterBuilder (#5011)

* [FEATURE] Update `DataAssistantResult.plot()` return value to emit `PlotResult` wrapper dataclass (#4962)

* feat: init commit

* test: write all tests (except 1)

* test: write last test

* test: remove theme test

* refactor: add custom Chart dataclass

* chore: make Chart immutable

* chore: add docstring

* feat: misc updates per team convo

* chore: delete comments

* refactor: delete vconcat methods and consolidate helpers

* refactor: further consolidation through helpers

* chore: update subtitle styling

* chore: add padding

* [ENHANCEMENT] Condense column-level `vconcat` plots into one interactive plot (#5002)

* Clone vertically concatenated chart for interactive starting point

* Working interactive chart

* Update ColorPalette names

* Update ordinal palette

* Tooltip now working

* Change legend color

* Add y-axis titles

* Align y-axis vertically for both charts

* Add highlight line

* Change batch_id to Batch ID

* Improve legend title and tooltip titles

* New layer for starting point showing one line

* Detail title updating appropriately

* Column name shows up with empty selection

* Use variable for alt.value(light_gray)

* Allow selection by mouseover on lines

* Anomaly encoded lines

* Move column selector to top left

* Format input_dropdown name

* Working with expectation kwargs

* Add predicate logic for strict_min and strict_max

* Add subtitle to prescriptive return charts

* WIP

* Overcame merge conflicts in descriptive mode

* Overcame merge conflicts in prescriptive mode

* Column charts are in their own list index

* [MAINTENANCE] Update version of black in pre-commit config

* [MAINTENANCE] Improve tooltips and formatting for distinct column values chart in VolumeDataAssistantResult (#5017)

* Correct type hints

* Improve tooltips

* Improve docstrings

* Fix return object indexing

* Return list length 1 instead of chart

* [FEATURE] Limit samplers work with supported sqlalchemy backends (#5014)

* [BUGFIX] Fix DataAssistantResult serialization issue (#5020)

* [MAINTENANCE] Enhance configuring serialization for DotDict type classes (#5023)

* [FEATURE] trino support (#5021)

* clean up SQL statements for handling subqueries properly

* use formal sqlalchemy for reflection

* TRINO WIP POC-COMPLETED

* Add trino package as a dependency and update imports

* Add docker-compose.yml for starburst database in assets/docker/

* Update _create_trino_engine to accept a hostname and schema_name

* Update get_dataset to have a block for trino

* Add ability to use data_alt in test_definitions JSON files (for Trino quirks)

* Minor update to get_test_validator_with_data to make debugging easier

* Add trino to sqla_keys dict in setup.py

* Update 3 test_definition json files with trino things

* Add table_selectable workaround for trino

* Add requirements-dev-trino to test_packaging.py

* Add trino to various azure-pipelines yml files

* Skip test_expectation__get_renderers

* Skip test__get_test_results

* Skip test__generate_expectation_tests__with_no_test_backends

Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>

* Pyarrow upper bound (#5028)

* release prep v0.15.4 (#5029)

* [MAINTENANCE] Use temporary branch to attempt to align git histories of `develop` and `main` (#5042)

* [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531)

* [BUGFIX] Moves testing dependencies out of core reqs (#4522)

* removing upper bound on mistune

* remove deprecated dependencies

* adds untracked dependency

* adds untracked dependency

* adds untracked dependency

* moving dependencies

* removes dependencies added to lite from core | adds missing dependencies

Co-authored-by: Chetan Kini <chetan@superconductive.com>

* [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547)

* [MAINTENANCE] Don't return from validate configuration methods (#4545)

* Add validate_configuration to 2 core Expectations that are passing all their tests

* Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests

* Update all validate_configuration methods to have type hints and return None

* Update all doc snippet references that were affected

* [DOCS] technical term tags connect to data cloud docs (#4414)

* - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.)
- Some additional editing was done to bring documents in line with the documentation and how-to guide standards.

* - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop.

* - Update to include technical term tags. (#4462)

- Minor updates to correct formatting and spelling issues.

* - Moved docs related to contributing integrations under contributing in the ToC (#4551)

- Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term).

* - Adds new image files for the intro page (#4540)

- Updates the image file link for the overview image on the intro page

* [DOCS] clarifications on execution engines and scalability (#4539)

* - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines.

* - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas.

* - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines.

* [DOCS] technical terms for validate data advanced (#4535)

* - add support for technical term tags.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* [DOCS] technical terms for validate data actions docs (#4518)

* - Edits to bring docs up to documentation and how-to guide standards.

* - add technical term tags to documents.
- minor formatting edits (technical terms missing capitalization, etc).

* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553)

* [MAINTENANCE] Refactor global `conftest` (#4534)

* chore: use black directives to temporarily disable linting

* chore: more black directives to temporarily disable linting

* chore: finish remaining

* refactor: start cleaning up conftest

* refactor: more refactoring of conftest

* refactor: even more refactoring of conftest

* [FEATURE] Improve diagnostic checklist details (#4548)

* Update library_metadata check to provide details when it doesn't pass

* In linting check, if snake_case doesn't match filename, show computed snake_case

* Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr

* Update convert_to_json_serializable to handle bytes

* Update build_gallery.py script to convert diagnostics to JSON in separate try/except

* Update build_gallery.py script to write expectation_library_v2.json file with indenting

* Update _check_input_validation to tell if custom assert statements are used in validate_configuration
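The linting-check commit above reports the computed snake_case name when it does not match the Expectation's filename. The sketch below shows the idea; `camel_to_snake` uses a common two-pass regex recipe, and `check_class_matches_filename` is a hypothetical helper, not the diagnostics code in the library.

```python
import re
from pathlib import Path
from typing import Tuple


def camel_to_snake(name: str) -> str:
    """Convert CamelCase to snake_case, e.g. 'ExpectColumnValuesToBeValidIpv4'."""
    # Pass 1: insert '_' before each capitalized word that follows any character.
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Pass 2: insert '_' between a lowercase letter or digit and a capital.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


def check_class_matches_filename(class_name: str, file_path: str) -> Tuple[bool, str]:
    """Compare a class name to its module filename (hypothetical linting helper)."""
    expected = camel_to_snake(class_name)
    actual = Path(file_path).stem
    if expected == actual:
        return True, f"{class_name} matches {actual}.py"
    # Surface the computed snake_case so the mismatch is easy to fix.
    return False, (
        f"Class name {class_name} implies filename {expected}.py, "
        f"but the file is {actual}.py"
    )
```

Showing the computed name makes the cause of a failure obvious: with this recipe, a class spelled `...ValidIPv4` maps to `..._valid_i_pv4`, which cannot match a file named `..._valid_ipv4.py`.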

* clean up (#4554)

* minor touch up (#4558)

* [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485)

* feat: init commit

* refactor: shift all logic over to base class

* feat: start impl of anonymize on Anonymizer

* feat: get ProfilerRunAnonymizer working

* refactor: remove constructor from ProfilerRunAnonymizer

* refactor: start on CheckpointRunAnonymizer

* fix: clean up broken checkpoint tests

* fix: ensure *args and **kwargs are propagated through

* refactor: start work on datasource anonymizers

* refactor: remove all anonymizers except Anonymizer from usage stats attrs

* fix: update isinstance checks

* refactor: move helper into checkpoint_run_anonymizer

* refactor: move helper into datasource_anonymizer

* refactor: make anonymize string private and place in strategy

* refactor: make anonymize batch info private and place in strategy

* refactor: move build_init_payload to Anonymizer

* refactor: make remainder of anonymize methods private

* refactor: add store info to strategy

* refactor: add dataconnector info to strategy

* refactor: consolidate profiler info and profiler run anonymization

* refactor: remove *args from signatures

* refactor: updates around checkpoint anonymization

* chore: misc cleanup of Anonymizer

* feat: final touch up before review

* chore: remove 'else' statements

* fix: ensure appropriate checkpoint method gets called

* chore: misc updates from review

* refactor: move init_payload back to usage stats

* chore: misc type hinting

* refactor: start using individual classes again

* chore: continue updating individual anonymizer classes

* feat: further updates to child classes

* feat: update anonymize_init_payload

* fix: get checkpoint payloads working

* refactor: ensure all methods have obj

* fix: misc fixes

* fix: make misc updates to conditional checks for obj

* refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer

* refactor: rename Checkpoint and Profiler anonymizers

* feat: leverage aggregate anonymizer downstream

* feature: conditionally create aggregate_anonymizer in constructor

* feat: add cache retrieve or instantiate util

* chore: add batch_request can_handle

* feat: ensure that salt has a default value in anonymizers

* refactor: require aggregate anonymizer in constructor

* refactor: instantiate all strategies in aggregate

* fix: fix broken tests

* refactor: rename internal getter

Co-authored-by: Don Heppner <donald.heppner@gmail.com>

* [MAINTENANCE] Remove duplicate mistune dependency

* [MAINTENANCE] Run PEP-273 checks on a schedule or release cut

* [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573)

* -Corrected the line references and added <snippet> tags to source code for Spark version of guide.

* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide.

* -lint reformat w/black

* -correcting line numbers after lint formatting.

* [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546)

Usage stats instrumentation of package dependencies

* [MAINTENANCE] Add DevRel team to GitHub auto-label action

* [MAINTENANCE] Add GitHub action to conditionally auto-update PRs (#4574)

* feat: add new action

* chore: add conditions

* [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577)

* chore: bump version

* chore: test change

* chore: update all instances of black

* chore: new test changes

* chore: revert test changes

* Update overview.md (#4556)

* Add missing links.
* Fix some typos
* Simplify flow and grammar in a few places

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* - corrected broken link in admonition box. (#4585)

- updated links in admonition box to point to current technical documentation rather than old core concepts documents.

* [MAINTENANCE] Minor clean-up (#4571)

Little bit of cleanup in our execution engine and validator

* [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590)

* fix : misconfigured ExpectationConfigurationBuilder

* pushing fix

* clean up before submitting for review

* bugfix : remove sorting

* remove extra line

* [MAINTENANCE] Instrument package dependencies (#4583)

* Add dependencies to data_context.__init__ event

* [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599)

* release candidate for 0.14.13

* revert to 0.14.12 state

* [RELEASE] 0.15.3 (#4981)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list
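The #4922 commits add a check that an Expectation's requirements are a list without crashing when they are not. The sketch below shows that report-don't-raise pattern. It assumes the requirements live under a `requirements` key of a `library_metadata` dict, and `check_requirements_metadata` is a hypothetical name rather than the library's diagnostic.

```python
from typing import Any, Dict, List, Tuple


def check_requirements_metadata(library_metadata: Dict[str, Any]) -> Tuple[bool, List[str]]:
    """Report whether library_metadata['requirements'] is a list of strings.

    Hypothetical diagnostic helper: it never raises, it only reports.
    """
    messages: List[str] = []
    # A missing key is treated as "no requirements", which is valid.
    requirements = library_metadata.get("requirements", [])
    if not isinstance(requirements, list):
        messages.append(
            f"requirements should be a list, got {type(requirements).__name__}"
        )
        return False, messages
    non_strings = [r for r in requirements if not isinstance(r, str)]
    if non_strings:
        messages.append(f"requirements contains non-string entries: {non_strings!r}")
    return not messages, messages
```

Returning a pass/fail flag plus messages lets a diagnostics run continue and list every problem instead of stopping at the first malformed Expectation.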

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>

* chore: revert azure pipeline

* chore: revert more files

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>

* [BUGFIX] Patch broken usage stats test around dependency tracking

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: andyjessen <62343929+andyjessen@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>

* [MAINTENANCE] Sync `main` and `develop` branches (#5060)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant  (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo  (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep (#4980)

* [FEATURE] Splitting data assets into batches using timestamp columns in spark (#4973)

* [BUGFIX] Use `monkeypatch` to ensure consistent bootstrap seed for additional probabilistic test (#4983)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* [FEATURE] Splitting data assets into batches using datetime columns in pandas (#4982)

* [BUGFIX] Patch the remainder of probabilistic `RuleBasedProfiler` tests with consistent bootstrap seed (#4989)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* fix: patch remaining tests

* chore: add docstring

* [FEATURE] Provide Semantic Type Domain Interpretation Utility For Use Within ParameterBuilder Classes (#4993)

* [MAINTENANCE] Splitter cleanup and enhancements (#4984)

* Update action.md (#4967)

Update action.md (#4967)

* [FEATURE] Add support for Interpolation Method for Quantile Statistic Used by Estimators in NumericMetricRangeMultiBatchParameterBuilder (#4997)

* [FEATURE] Enable self-initializing `ExpectColumnMeanToBeBetween` (#4986)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* [FEATURE] Enable self-initializing `ExpectColumnMedianToBeBetween` (#4987)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* chore: update GH action (#5001)

* [FEATURE] Enable self-initializing `ExpectColumnSumToBeBetween` (#4988)

* feat: init commit

* test: write integration test

* feat: add interpolation field

* [MAINTENANCE] Move `DataAssistant` registry capabilities into `DataAssistantRegistry` to enable user aliasing (#4991)

* refactor: move registry dict to dispatcher

* chore: misc cleanup

* chore: misc updates after review

* chore: misc cleanup

* chore: update error message

* Fix continuous partition example (#4939)

When calling json.dumps() method, the weights change.

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] RB…
cdkini added a commit that referenced this pull request May 12, 2022
* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Stop accepting 'return_only_gallery_examples' arg in run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant  (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo  (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep (#4980)

* [FEATURE] Splitting data assets into batches using timestamp columns in spark (#4973)

* [BUGFIX] Use `monkeypatch` to ensure consistent bootstrap seed for additional probabilistic test (#4983)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* [FEATURE] Splitting data assets into batches using datetime columns in pandas (#4982)

* [BUGFIX] Patch the remainder of probabilistic `RuleBasedProfiler` tests with consistent bootstrap seed (#4989)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* fix: patch remaining tests

* chore: add docstring

* [FEATURE] Provide Semantic Type Domain Interpretation Utility For Use Within ParameterBuilder Classes (#4993)

* [MAINTENANCE] Splitter cleanup and enhancements (#4984)

* Update action.md (#4967)

Update action.md (#4967)

* [FEATURE] Add support for Interpolation Method for Quantile Statistic Used by Estimators in NumericMetricRangeMultiBatchParameterBuilder (#4997)

* [FEATURE] Enable self-initializing `ExpectColumnMeanToBeBetween` (#4986)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* [FEATURE] Enable self-initializing `ExpectColumnMedianToBeBetween` (#4987)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* chore: update GH action (#5001)

* [FEATURE] Enable self-initializing `ExpectColumnSumToBeBetween` (#4988)

* feat: init commit

* test: write integration test

* feat: add interpolation field

* [MAINTENANCE] Move `DataAssistant` registry capabilities into `DataAssistantRegistry` to enable user aliasing (#4991)

* refactor: move registry dict to dispatcher

* chore: misc cleanup

* chore: misc updates after review

* chore: misc cleanup

* chore: update error message

* Fix continuous partition example (#4939)

When calling json.dumps() method, the weights change.

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] RBP Profiling Dataset ProgressBar Fix #4999

* [MAINTENANCE] Preliminary refactors for data samplers. (#4996)

* MetricSingleBatchParameterBuilder with unit and integration tests. (#5003)

* [DOCS] Update slack notification guide to not use validation operators. (#4978)

* - Removed references to validation_operator
- Edited config yaml examples
- Grouped config options into tab groups for webhook vs app.

* - Added technical term tag to Checkpoint reference.
- Minor edit to document format.

* corrects number in step header

* [MAINTENANCE] Clean up unused imports and enforce through `flake8` in CI/CD (#5005)

* maintenance: clean up codebase

* chore: add to pipelines

* fix: ensure that flake8 is installed first

* chore: rename CI/CD stage

* chore: update type hint threshold

* parameter builder tests should utilize polymorphism (#5007)

* [MAINTENANCE] Clean up type hints in CLI (#5006)

* chore: first pass

* chore: more updates

* chore: more annotations

* chore: more annotations

* [FEATURE] Enable Pandas DataFrame and Series as MetricValues Output of Metric ParameterBuilder Classes (#5008)

* logging and exception handling (#5009)

* [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010)

* [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010)

* [FEATURE] Histogram/Partition Single-Batch ParameterBuilder (#5011)

* [FEATURE] Update `DataAssistantResult.plot()` return value to emit `PlotResult` wrapper dataclass (#4962)

* feat: init commit

* test: write all tests (except 1)

* test: write last test

* test: remove theme test

* refactor: add custom Chart dataclass

* chore: make Chart immutable

* chore: add docstring

* feat: misc updates per team convo

* chore: delete comments

* refactor: delete vconcat methods and consolidate helpers

* refactor: further consolidation through helpers

* chore: update subtitle styling

* chore: add padding

* [ENHANCEMENT] Condense column-level `vconcat` plots into one interactive plot (#5002)

* Clone vertically concatenated chart for interactive starting point

* Working interactive chart

* Update ColorPalette names

* Update ordinal palette

* Tooltip now working

* Change legend color

* Add y-axis titles

* Align y-axis vertically for both charts

* Add highlight line

* Change batch_id to Batch ID

* Improve legend title and tooltip titles

* New layer for starting point showing one line

* Detail title updating appropriately

* Column name shows up with empty selection

* Use variable for alt.value(light_gray)

* Allow selection by mouseover on lines

* Anomaly encoded lines

* Move column selector to top left

* Format input_dropdown name

* Working with expectation kwargs

* Add predicate logic for strict_min and strict_max

* Add subtitle to prescriptive return charts

* WIP

* Overcame merge conflicts in descriptive mode

* Overcame merge conflicts in prescriptive mode

* Column charts are in their own list index

* [MAINTENANCE] Update version of black in pre-commit config

* [MAINTENANCE] Improve tooltips and formatting for distinct column values chart in VolumeDataAssistantResult (#5017)

* Correct type hints

* Improve tooltips

* Improve docstrings

* Fix return object indexing

* Return list length 1 instead of chart

* [FEATURE] Limit samplers work with supported sqlalchemy backends (#5014)

* [BUGFIX] Fix DataAssistantResult serialization issue (#5020)

* [MAINTENANCE] Enhance configuring serialization for DotDict type classes (#5023)

* [FEATURE] trino support (#5021)

* clean up SQL statements for handling subqueries properly

* use formal sqlalchemy for reflection

* TRINO WIP POC-COMPLETED

* Add trino package as a dependency and update imports

* Add docker-compose.yml for starburst database in assets/docker/

* Update _create_trino_engine to accept a hostname and schema_name

* Update get_dataset to have a block for trino

* Add ability to use data_alt in test_definitions JSON files (for Trino quirks)

* Minor update to get_test_validator_with_data to make debugging easier

* Add trino to sqla_keys dict in setup.py

* Update 3 test_definition json files with trino things

* Add table_selectable workaround for trino

* Add requirements-dev-trino to test_packaging.py

* Add trino to various azure-pipelines yml files

* Skip test_expectation__get_renderers

* Skip test__get_test_results

* Skip test__generate_expectation_tests__with_no_test_backends

Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>

* Pyarrow upper bound (#5028)

* release prep v0.15.4 (#5029)

* [MAINTENANCE] Use temporary branch to attempt to align git histories of `develop` and `main` (#5042)

* [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531)

* [BUGFIX] Moves testing dependencies out of core reqs (#4522)

* removing upper bound on mistune

* remove deprecated dependencies

* adds untracked dependency

* adds untracked dependency

* adds untracked dependency

* moving dependencies

* removes dependencies added to lite from core | adds missing dependencies

Co-authored-by: Chetan Kini <chetan@superconductive.com>

* [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547)

* [MAINTENANCE] Don't return from validate configuration methods (#4545)

* Add validate_configuration to 2 core Expectations that are passing all their tests

* Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests

* Update all validate_configuration methods to have type hints and return None

* Update all doc snippet references that were affected

* [DOCS] technical term tags connect to data cloud docs (#4414)

* - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.)
- Some additional editing was done to bring documents in line with the documentation and how-to guide standards.

* - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop.

* - Update to include technical term tags. (#4462)

- Minor updates to correct formatting and spelling issues.

* - Moved docs related to contributing integrations under contributing in the ToC (#4551)

- Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term).

* - Adds new image files for the intro page (#4540)

- Updates the image file link for the overview image on the intro page

* [DOCS] clarifications on execution engines and scalability (#4539)

* - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines.

* - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas.

* - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines.

* [DOCS] technical terms for validate data advanced (#4535)

* - add support for technical term tags.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* [DOCS] technical terms for validate data actions docs (#4518)

* - Edits to bring docs up to documentation and how-to guide standards.

* - add technical term tags to documents.
- minor formatting edits (technical terms missing capitalization, etc).

* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553)

* [MAINTENANCE] Refactor global `conftest` (#4534)

* chore: use black directives to temporarily disable linting

* chore: more black directives to temporarily disable linting

* chore: finish remaining

* refactor: start cleaning up conftest

* refactor: more refactoring of conftest

* refactor: even more refactoring of conftest

* [FEATURE] Improve diagnostic checklist details (#4548)

* Update library_metadata check to provide details when it doesn't pass

* In linting check, if snake_case doesn't match filename, show computed snake_case

* Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr

* Update convert_to_json_serializable to handle bytes

* Update build_gallery.py script to convert diagnostics to JSON in separate try/except

* Update build_gallery.py script to write expectation_library_v2.json file with indenting

* Update _check_input_validation to tell if custom assert statements are used in validate_configuration

* clean up (#4554)

* minor touch up (#4558)

* [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485)

* feat: init commit

* refactor: shift all logic over to base class

* feat: start impl of anonymize on Anonymizer

* feat: get ProfilerRunAnonymizer working

* refactor: remove constructor from ProfilerRunAnonymizer

* refactor: start on CheckpointRunAnonymizer

* fix: clean up broken checkpoint tests

* fix: ensure *args and **kwargs are propagated through

* refactor: start work on datasource anonymizers

* refactor: remove all anonymizers except Anonymizer from usage stats attrs

* fix: update isinstance checks

* refactor: move helper into checkpoint_run_anonymizer

* refactor: move helper into datasource_anonymizer

* refactor: make anonymize string private and place in strategy

* refactor: make anonymize batch info private and place in strategy

* refactor: move build_init_payload to Anonymizer

* refactor: make remainder of anonymize methods private

* refactor: add store info to strategy

* refactor: add dataconnector info to strategy

* refactor: consolidate profiler info and profiler run anonymization

* refactor: remove *args from signatures

* refactor: updates around checkpoint anonymization

* chore: misc cleanup of Anonymizer

* feat: final touch up before review

* chore: remove 'else' statements

* fix: ensure appropriate checkpoint method gets called

* chore: misc updates from review

* refactor: move init_payload back to usage stats

* chore: misc type hinting

* refactor: start using individual classes again

* chore: continue updating individual anonymizer classes

* feat: further updates to child classes

* feat: update anonymize_init_payload

* fix: get checkpoint payloads working

* refactor: ensure all methods have obj

* fix: misc fixes

* fix: make misc updates to conditional checks for obj

* refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer

* refactor: rename Checkpoint and Profiler anonymizers

* feat: leverage aggregate anonymizer downstream

* feature: conditionally create aggregate_anonymizer in constructor

* feat: add cache retrieve or instantiate util

* chore: add batch_request can_handle

* feat: ensure that salt has a default value in anonymizers

* refactor: require aggregate anonymizer in constructor

* refactor: instantiate all strategies in aggregate

* fix: fix broken tests

* refactor: rename internal getter

Co-authored-by: Don Heppner <donald.heppner@gmail.com>

* [MAINTENANCE] Remove duplicate mistune dependency

* [MAINTENANCE] Run PEP-273 checks on a schedule or release cut

* [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573)

* -Corrected the line references and added <snippet> tags to source code for Spark version of guide.

* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide.

* -lint reformat w/black

* -correcting line numbers after lint formatting.

* [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546)

Usage stats instrumentation of package dependencies

* [MAINTENANCE] Add DevRel team to GitHub auto-label action

* [MAINTENANCE] Add GitHub action to conditionally auto-update PR's  (#4574)

* feat: add new action

* chore: add conditions

* [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577)

* chore: bump version

* chore: test change

* chore: update all instances of black

* chore: new test changes

* chore: revert test changes

* Update overview.md (#4556)

* Add missing links.
* Fix some typos
* Simplify flow and grammar in a few places

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* - corrected broken link in admonition box. (#4585)

- updated links in admonition box to point to current technical documentation rather than old core concepts documents.

* [MAINTENANCE] Minor clean-up (#4571)

Little bit of cleanup in our execution engine and validator

* [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590)

* fix : misconfigured ExpectationConfigurationBuilder

* pushing fix

* clean up before submitting for review

* bugfix : remove sorting

* remove extra line

* [MAINTENANCE] Instrument package dependencies (#4583)

* Add dependencies to data_context.__init__ event

* [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599)

* release candidate for 0.14.13

* revert to 0.14.12 state

* [RELEASE] 0.15.3 (#4981)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_examples' arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme an enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant  (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan's review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] Fix a typo (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>

* chore: revert azure pipeline

* chore: revert more files

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>

* [BUGFIX] Patch broken usage stats test around dependency tracking

* [FEATURE] Add subset operation to Domain class (#5049)

* [FEATURE] In DataAssistant: Use Domain instead of domain_type as key for Metrics Parameter Builders (#5057)

* [MAINTENANCE] Save output of usage stats schema script in repo (#5053)

* chore: add output

* chore: update

* chore: docstring

* [MAINTENANCE] Apply Altair custom themes to return objects (#5044)

* Apply theme changes to dropdown input CSS

* Refactor to apply theme to return charts as well

* Bug in return charts missing theme

* Test for ensuring custom config is applied to return object

* [BUGFIX] self_check() now also checks for aws_config_file #5040

* [FEATURE] Self-initializing `ExpectColumnStddevToBeBetween` (#5065)

* feat: make expectation self-init

* test: write integration test

* [MAINTENANCE] Introduce RuleBasedProfilerResult -- neither an expectation suite name nor an expectation suite needs to be passed to RuleBasedProfiler.run() (#5061)

* [BUGFIX] multi_batch_rule_based_profiler test up to date with RBP changes

* [BUGFIX] Splitting Support at Asset level (#5026)

* [MAINTENANCE] Refactor `DataAssistant` plotting to leverage utility dataclasses (#5022)

* refactor: create util dataclass

* refactor: move helper to separate file

* refactor: refactor into individual plot utils

* chore: misc updates

* chore: update method signature

* refactor: move more logic to parent

* chore: use getter internally

* docs: write docstrs

* docs: remove whitespace in docstrs

* chore: convert to frozen dataclass

* refactor: rename to PlotComponent

* chore: rename _get_expect_domain_values

* chore: fix typo

* refactor: move subtitle to domain object

* feat: expectation kwarg subclass

* docs: add docstring

* corrections to self-initializing expectations (#5068)

* Check that a passed string is parseable as an integer. (#5071)

* fix(sqlalchemy-dataset): databricks engine create temporary view (#4994)

* Clean up mssql limit sampling code path and comments. (#5074)

* [FEATURE] Enum used by DateSplitter able to be represented as YAML (#5073)

* chore : fix and note from discussion

* feature: push of fix and cleaning up old comments

* update docstring

* stray comments

* [MAINTENANCE] Make saving bootstraps histogram for NumericMetricRangeMultiBatchParameterBuilder optional (absent by default) (#5075)

* [MAINTENANCE] Make self-initializing expectations return estimated kwargs with auto-generation timestamp and Great Expectations version (#5076)

* - Update configurations and instructions to reflect Checkpoint process, not Validation Operator. (#5039)

* [DOCS] Update process and config examples in Opsgenie guide (#5037)

* - Update configurations and instructions to reflect Checkpoint process, not Validation Operator.

* Update docs/guides/validation/validation_actions/how_to_trigger_opsgenie_notifications_as_a_validation_action.md

Co-authored-by: talagluck <tal@superconductive.com>

Co-authored-by: talagluck <tal@superconductive.com>

* [DOCS] Correct name of `openlineage-integration-common` package (#5041)

Documentation pointed to non-existing package `openlineage-common`. Update docs to proper `openlineage-integration-common`.

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* - Correction to Additional Information (`notify_with` is configured in Checkpoint config, not `great_expectations.yml`) (#5034)

- Update 2nd tab group to use different ID (so that switching tabs in that group will not also switch tabs in the first group, causing the page to scroll.)
- Minor formatting edits to bring in line with the documentation standards.

* [FEATURE] Implementation of auto-complete for DataAssistant class names in Jupyter notebooks (#5077)

* [FEATURE] Provide display ("friendly") names for batch identifiers (#5086)

* [MAINTENANCE] Adding a unit test for batch_id mapping to batch display names (#5087)

* [DOCS] Update process and configuration examples in email Action guide. (#5036)

* - Update configurations and instructions to reflect Checkpoint process, not Validation Operator.
- Minor formatting edits to bring in line with the documentation standards.

* - removed comment so example is consistent with other docs.

* [DOCS] Update Docusaurus version (#5063)

* - Updates docusaurus version and dependencies.
- Updates docusaurus configuration to new version's standard.
- Removes unused terminology plugin

* - Corrects invalid tab default values that were preventing Docusaurus build.

* - testing theme updates

* - resolves non-deterministic route warning for contributing/contributing by changing the file name and setting the url by id value.
- updated relative paths to contributing/contributing.md to point to the renamed contributing/contributing_overview.md
- updates sidebars.js to point to the id path of contributing/contributing_overview.md -- and also set that page to display as the category page for the contributing category, rather than a nested overview page.
- removes override in :root of custom.css that was breaking the image for the scroll to top button.
- minor cleanup of docusaurus.config.js to make it more consistent with how the config is laid out in the Docusaurus codebase for the current version of Docusaurus.

* -replaces the `.png` version of the logo (which was not resizing smoothly after the update) with a `.svg` version.

* -reverted sidebars.js to point to a contributing overview page rather than linking the page to the category (having only a single category causes an odd visual since linked categories have a larger drop down arrow).  All categories with overview pages will have those pages linked to the category itself in a future PR.

Co-authored-by: William Shin <will@superconductive.com>

* [MAINTENANCE] `pypandoc` version constraint added (`< 1.8`) (#5093)

* pypandoc constraint added and note added

* more pandoc removal

* [BUGFIX] Patch broken Expectation gallery script (#5090)

* fix: patch broken gallery

* chore: add comment

* chore: trigger gallery build

* chore: change pipeline yaml again

* chore: update pipeline

* chore: update pipeline

* chore: revert changes to get to mergeable state

* [BUGFIX] Sampling support at asset level (#5092)

* [MAINTENANCE] Utilize Rule objects in Profiler construction in DataAssistant (#5089)

* [FEATURE] Onboarding DataAssistant -- Initial Rule Implementations (Data Aspects) (#5101)

* [MAINTENANCE] Turn off metric calculation progress bars in `RuleBasedProfiler` and `DataAssistant` workflows (#5080)

* feat: add property on Validator to configure progress bars

* chore: patch deprecation warning

* [MAINTENANCE] A small refactor of ParameterBuilder management used in DataAssistant classes (#5102)

* [MAINTENANCE] Convenience method refactor for Onboarding DataAssistant (#5103)

* [FEATURE] OnboardingDataAssistant: Implement Nullity/Non-nullity Rules and Associated Metrics (#5104)

* release prep 0.15.5 (#5105)

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: andyjessen <62343929+andyjessen@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>
Co-authored-by: Gonzalo Villafañe Tapia <gvillafanetapia@gmail.com>
Co-authored-by: talagluck <tal@superconductive.com>
Co-authored-by: Maciej Obuchowski <obuchowski.maciej@gmail.com>