[MAINTENANCE] Sync `main` and `develop` branches #5060

cdkini · 2022-05-06T13:49:18Z

Sync main and develop after branches diverged as part of the v0.15.4 release.

…ailure (#4921) * chore: update pipelines * chore: remove scope check from pipeline

* Add check that requirements is a list, but don't crash if it's not * Make requirements for icd_ten_category expectation a list
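The first change above (checking that `requirements` is a list without crashing when it is not) might look roughly like the sketch below. The function name `normalize_requirements`, the dict shape of `library_metadata`, and the choice to warn and fall back to an empty list are all assumptions for illustration; this is not the code from the PR.

```python
import warnings
from typing import Any, Dict, List


def normalize_requirements(library_metadata: Dict[str, Any]) -> List[str]:
    """Return the `requirements` entry as a list, warning instead of raising.

    Hypothetical helper: the name and fallback behavior are assumptions.
    """
    requirements = library_metadata.get("requirements", [])
    if isinstance(requirements, list):
        return requirements
    # Not a list: report the problem but keep going rather than crash.
    warnings.warn(
        "'requirements' should be a list, got "
        f"{type(requirements).__name__}; ignoring it"
    )
    return []
```

With this shape, an expectation that declares `"requirements": "pandas"` (a bare string) would produce a warning and an empty list, which matches the "don't crash" intent; the second change in the commit then fixes the offending metadata at the source.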

…ectations (#4906) * Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests * Remove accepting 'return_only_gallery_example's arg from run_diagnostics method * Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list * Use phrase 'Has a valid library_metadata object' * Update ExpectationTestDiagnostics to have include_in_gallery * Update _get_metric_list to accept expectation_config instead of executed_test_cases * Update ExpectationTestDiagnostics to include validation_result and error_diagnostics * Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase * Reformat with black * Update run_diagnostics to determine maturity level based on checks passed * Update evaluate_json_test_cfe to accept raise_exception and return a tuple * Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe * Add backend_test_result_counts to ExpectationDiagnostics and use in helpers * Reformat with black * Remove unused imports (flake8) * Fix fix tests * Update asserts at end of creating_custom_expectations/expect_xxx.py * Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem * test setup * fixes diagnostics for multi-table expectations * wrap tmp_dir -> abspath in func * apply to test_expectations/test_expectations_cfe * docstring Co-authored-by: Ken Wade <ken@superconductive.com> Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

…ion and from Builder Constructor Arguments (#4927)

* Comments on altair documentation * Predicate BinaryExpression type hint * Make default theme and enum as well * Pass custom config to altair * Bugfix using nested_update * Add tests that test notebook execution * Add failing test * Move opacity into theme, rename variable * Vanquish tooltip and point_color_condition parameters

* [FEATURE] code for new checksum expectation * [FEATURE] code for new checksum expectation * initial code for checksum expectation * linting & library_metadata updates Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com> Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> Co-authored-by: Austin Robinson <austin@superconductive.com>

…ersion < 1.4 (#4660) * Update helper to add explicit alias to subqueries for SQLA version < 1.4 Implicit conversion of a nested select into a subquery failed when running on SQLA 1.3 against Postgres - update the existing helper to also handle older supported versions of SQLA. * Update util.py Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

…ipelines

…n objects into flexible ExpectationSuite containers (#4943)

…4947)

Updating some language in Slack Guidelines

…event false positive build failures

…ortionOfUniqueValuesToBeBetween` (#4929) * feat: init commit * chore: misc changes per convo with Alex * feat: finish initial impl * feat: finish impl after convo with Alex * chore: update after review

… `VolumeDataAssistant` (#4930) * feat: init commit * feat: continue chugging along * feat: get both types of charts to work * chore: only update relevant kwargs in df * feat: add subtitle support * feat: create predicate helper func * chore: update type hint * chore: bold subtitle * chore: work on cleaning up vconcat * feat: continue impl * feat: get both prescriptive and descriptive working * chore: delete unnecessary import * refactor: further cleanup * chore: shrink charts some more * refactor: rename private method * chore: add docstrings * feat: add include/exclude column names lists * fix: correct method calls * fix: fix assertion around include/exclude columns * chore: update styling of charts * chore: misc changes per Nathan review

#4960) * feat: start impl * chore: finishing touches * fix: remedy typo in test * feat: update test * chore: revert changes in utils * chore: add comment

pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

…from DataContext by registered name (#4966)

…P 484) (#4969) * feat: run script to type annotate * chore: update threshold

* chore: first pass * chore: more updates * chore: more annotations * chore: more annotations

…f Metric ParameterBuilder Classes (#5008)

* [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010)

…lotResult` wrapper dataclass (#4962) * feat: init commit * test: write all tests (except 1) * test: write last test * test: remove theme test * refactor: add custom Chart dataclass * chore: make Chart immutable * chore: add docstring * feat: misc updates per team convo * chore: delete comments * refactor: delete vconcat methods and consolidate helpers * refactor: further consolidation through helpers * chore: update subtitle styling * chore: add padding

…ive plot (#5002) * Clone vertically concatenated chart for interactive starting point * Working interactive chart * Update ColorPalette names * Update ordinal palette * Tooltip now working * Change legend color * Add y-axis titles * Align y-axis vertically for both charts * Add highlight line * Change batch_id to Batch ID * Improve legend title and tooltip titles * New layer for starting point showing one line * Detail title updating appropriately * Column name shows up with empty selection * Use variable for alt.value(light_gray) * Allow selection by mouseover on lines * Anomaly encoded lines * Move column selector to top left * Format input_dropdown name * Working with expectation kwargs * Add predicate logic for strict_min and strict_max * Add subtitle to prescriptive return charts * WIP * Overcame merge conflicts in descriptive mode * Overcame merge conflicts in prescriptive mode * Column charts are in their own list index

…ues chart in VolumeDataAssistantResult (#5017) * Correct type hints * Improve tooltips * Improve docstrings * Fix return object indexing * Return list length 1 instead of chart

…ses (#5023)

* clean up SQL statements for handling subqueries properly * use formal sqlalchemy for reflection * TRINO WIP POC-COMPLETED * Add trino package as a dependency and update imports * Add docker-compose.yml for starburst database in assets/docker/ * Update _create_trino_engine to accept a hostname and schema_name * Update get_dataset to have a block for trino * Add ability to use data_alt in test_definitions JSON files (for Trino quirks) * Minor update to get_test_validator_with_data to make debugging easier * Add trino to sqla_keys dict in setup.py * Update 3 test_definition json files with trino things * Add table_selectable workaround for trino * Add requirements-dev-trino to test_packaging.py * Add trino to various azure-pipelines yml files * Skip test_expectation__get_renderers * Skip test__get_test_results * Skip test__generate_expectation_tests__with_no_test_backends Co-authored-by: Alex Sherstinsky <alex@superconductive.com> Co-authored-by: James Campbell <james.p.campbell@gmail.com>

…f `develop` and `main` (#5042) * [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531) * [BUGFIX] Moves testing dependencies out of core reqs (#4522) * removing upper bound on mistune * remove deprecated depedencies * adds untracked dependency * adds untracked dependency * adds untracked dependency * moving dependencies * removes dependencies added to lite from core | adds missing dependencies Co-authored-by: Chetan Kini <chetan@superconductive.com> * [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547) * [MAINTENANCE] Don't return from validate configuration methods (#4545) * Add validate_configuration to 2 core Expectations that are passing all their tests * Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests * Update all validate_configuration methods to have type hints and return None * Update all doc snippet references that were effected * [DOCS] technical term tags connect to data cloud docs (#4414) * - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.) - Some additional editing was done to bring documents in line with the documentation and how-to guide standards. * - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop. * - Update to include technical term tags. (#4462) - Minor updates to correct formatting and spelling issues. * - Moved docs related to contributing integrations under contributing in the ToC (#4551) - Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term). 
* - Adds new image files for the intro page (#4540) - Updates the image file link for the overview image on the intro page * [DOCS] clarifications on execution engines and scalability (#4539) * - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines. * - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas. * - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines. * [DOCS] technical terms for validate data advanced (#4535) * - add support for technical term tags. * - added technical term tags. - Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming. - NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR. * - added technical term tags. - Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming. - NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR. * [DOCS] technical terms for validate data actions docs (#4518) * - Edits to bring docs up to documentation and how-to guide standards. * - add technical term tags to documents. - minor formatting edits (technical terms missing capitalization, etc). 
* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553) * [MAINTENANCE] Refactor global `conftest` (#4534) * chore: use black directives to temporarily disable linting * chore: more black directives to temporarily disable linting * chore: finish remaining * refactor: start cleaning up conftest * refactor: more refactoring of conftest * refactor: even more refactoring of conftest * [FEATURE] Improve diagnostic checklist details (#4548) * Update library_metadata check to provide details when it doesn't pass * In linting check, if snake_case doesn't match filename, show computed snake_case * Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr * Update convert_to_json_serializable to handle bytes * Update build_gallery.py script to convert diagnostics to JSON in separate try/except * Update build_gallery.py script to write expectation_library_v2.json file with indenting * Update _check_input_validation to tell if custom assert statements are used in validate_configuration * clean up (#4554) * minor touch up (#4558) * [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485) * feat: init commit * refactor: shift all logic over to base class * feat: start impl of anonymize on Anonymizer * feat: get ProfilerRunAnonymizer working * refactor: remove constructor from ProfilerRunAnonymizer * refactor: start on CheckpointRunAnonymizer * fix: clean up broken checkpoint tests * fix: ensure *args and **kwargs are propogated through * refactor: start work on datasource anonymizers * refactor: remove all anonymizers except Anonymizer from usage stats attrs * fix: update isinstance checks * refactor: move helper into checkpoint_run_anonymizer * refactor: move helper into datasource_anonymizer * refactor: make anonymize string private and place in strategy * refactor: make anonymize batch info private and place in strategy * refactor: move 
build_init_payload to Anonymizer * refactor: make remainder of anonymize methods private * refactor: add store info to strategy * refactor: add dataconnector info to strategy * refactor: consolidate profiler info and profiler run anonymization * refactor: remove *args from signatures * refactor: updates around checkpoint anonymization * chore: misc cleanup of Anonymizer * feat: final touch up before review * chore: remove 'else' statements * fix: ensure appropriate checkpoint method gets called * chore: misc updates from review * refactor: move init_payload back to usage stats * chore: misc type hinting * refactor: start using individual classes again * chore: continue updating individual anonymizer classes * feat: further updates to child classes * feat: update anonymize_init_payload * fix: get checkpoint payloads working * refactor: ensure all methods have obj * fix: misc fixes * fix: make misc updates to conditional checks for obj * refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer * refactor: rename Checkpoint and Profiler anonymizers * feat: leverage aggregate anonymizer downstream * feature: conditionally create aggregate_anonymizer in constructor * feat: add cache retrieve or instantiate util * chore: add batch_request can_handle * feat: ensure that salt has a default value in anonymizers * refactor: require aggregate anonymizer in constructor * refactor: instantiate all strategies in aggregate * fix: fix broken tests * refactor: rename internal getter Co-authored-by: Don Heppner <donald.heppner@gmail.com> * [MAINTENANCE] Remove duplicate mistune dependency * [MAINTENANCE] Run PEP-273 checks on a schedule or release cut * [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573) * -Corrected the line references and added <snippet> tags to source code for Spark version of guide. 
* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide. * -lint reformat w/black * -correcting line numbers after lint formatting. * [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546) Usage stats instrumentation of package dependencies * [MAINTENANCE] Add DevRel team to GitHub auto-label action * [MAINTENANCE] Add GitHub action to conditionally auto-update PR's (#4574) * feat: add new action * chore: add conditions * [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577) * chore: bump version * chore: test change * chore: update all instances of black * chore: new test changes * chore: revert test changes * Update overview.md (#4556) * Add missing links. * Fix some typos * Simplify flow and grammar in a few places Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> * - corrected broken link in admonition box. (#4585) - updated links in admonition box to point to current technical documentation rather than old core concepts documents. * [MAINTENANCE] Minor clean-up (#4571) Little bit of cleanup in our execution engine and validator * [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590) * fix : misconfigured ExpectationConfigurationBuilder * pushing fix * clean up before submitting for review * bugfix : remove sorting * remove extra line * [MAINTENANCE] Instrument package dependencies (#4583) * Add dependencies to data_context.__init__ event * [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599) * release candidate for 0.14.13 * revert to 0.14.12 state * [RELEASE] 0.15.3 (#4981) * ProgressBar for DataAssistant RuleBasedProfiler computations. 
(#4918) * [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921) * chore: update pipeliens * chore: remove scope check from pipeline * [BUGFIX] check contrib requirements (#4922) * Add check that requirements is a list, but don't crash if it's not * Make requirements for icd_ten_category expectation a list * [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906) * Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests * Remove accepting 'return_only_gallery_example's arg from run_diagnostics method * Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list * Use phrase 'Has a valid library_metadata object' * Update ExpectationTestDiagnostics to have include_in_gallery * Update _get_metric_list to accept expectation_config instead of executed_test_cases * Update ExpectationTestDiagnostics to include validation_result and error_diagnostics * Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase * Reformat with black * Update run_diagnostics to determine maturity level based on checks passed * Update evaluate_json_test_cfe to accept raise_exception and return a tuple * Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe * Add backend_test_result_counts to ExpectationDiagnostics and use in helpers * Reformat with black * Remove unused imports (flake8) * Fix fix tests * Update asserts at end of creating_custom_expectations/expect_xxx.py * Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem * test setup * fixes diagnostics for multi-table expectations * wrap tmp_dir -> abspath in func * apply to test_expectations/test_expectations_cfe * docstring Co-authored-by: Ken Wade <ken@superconductive.com> Co-authored-by: kenwade4 
<95714847+kenwade4@users.noreply.github.com> * [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927) * [BUGFIX] Add missing events to schema (#4917) * [MAINTENANCE] Improve Altair plotting extensibility (#4923) * Comments on altair documentation * Predicate BinaryExpression type hint * Make default theme and enum as well * Pass custom config to altair * Bugfix using nested_update * Add tests that test notebook execution * Add failing test * Move opacity into theme, rename variable * Vanquish tooltip and point_color_condition parameters * [FEATURE] new checksum expectation (#4657) * [FEATURE] code for new checksum expectation * [FEATURE] code for new checksum expectation * initial code for checksum expectation * linting & library_metadata updates Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com> Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> Co-authored-by: Austin Robinson <austin@superconductive.com> * [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660) * Update helper to add explicit alias to subqueries for SQLA version < 1.4 Implicit conversion of a nested select into a subquery failed when running on SQLA 1.3 against Postgres - update the existing helper to also handle older supported versions of SQLA. 
* Update util.py Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> * [BUGFIX] Fix clickhouse same-alias issue (#4389) * Fix broken link for checklist (#4932) * [MAINTENANCE] Remove DataContext from DataAssistant (#4931) * [MAINTENANCE] Add condition for custom checks in great_expectations pipelines * Move general data splitting tasks to abstract base class (#4942) * [MAINTENANCE] Add test to check for missing usage events (#4933) * [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943) * [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947) * Update SLACK_GUIDELINES.md Updating some language in Slack Guidelines * added how to ask a question link * cleanup (#4949) * [MAINTENANCE] Rearrange modules for better reusability (#4955) * [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures * [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929) * feat: init commit * chore: misc changes per convo with Alex * feat: finish initial impl * feat: finish impl after convo with Alex * chore: update after review * clean up (#4959) * [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930) * feat: init commit * feat: continue chugging along * feat: get both types of charts to work * chore: only update relevant kwargs in df * feat: add subtitle support * feat: create predicate helper func * chore: update type hint * chore: bold subtitle * chore: work on cleaning up vconcat * feat: continue impl * feat: get both prescriptive and descriptive working * chore: delete unnecessary import * refactor: further cleanup * chore: shrink charts some more * refactor: rename private method * chore: add docstrings * feat: add include/exclude column names lists * fix: correct method calls * fix: fix assertion around 
include/exclude columns * chore: update styling of charts * chore: misc changes per Nathan review * [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960) * feat: start impl * chore: finishing touches * fix: remedy typo in test * feat: update test * chore: revert changes in utils * chore: add comment * pin cryptography package (#4963) pin cryptography package (#4963) * [FEATURE] BigQuery Temp Table Support (#4925) * [FEATURE] BigQuery Temp Table Support (#4925) * [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966) * [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969) * feat: run script to type annotate * chore: update threshold * Enable RuleBasedProfiler components to be serializable. (#4972) * [BUGFIX] extras_require (#4968) * Remove azure from requirements-dev-sqlalchemy.txt * Update get_extras_require func to strip comments and include sqlalchemy for some keys * [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946) * Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1 * Add failing tests for corner cases * Passing tests for 0 and 1 false_positive_rate * Add tests for very small false_positive_rates * Return type is already validated as float * Use custom ProfilerExecutionError rather than ValueError * Use 1-NP_EPSILON as an upper bound * Pass variables to quentin fixture to set random seed * Bugfix setting wrong parameter * Set object attribute as well * Unable to access the actual false_positive_rate used as it is private * Use floats instead of ints * Update type hints * [MAINTENANCE] fix a typo (#4974) * [FEATURE] Enable self-intializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958) * feat: start impl * test: start writing alice test * feat: misc updates per discussion with Alex * test: 
update test regexes * feat: update other expectation * chore: update fixtures * chore: type hint * [BUGFIX] Fix broken packaging test and update dgtest-overrides * [FEATURE] Provide "estimation histogram" ParameterBuilder output details . (#4975) * [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat * release prep Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com> Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com> Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> Co-authored-by: Ken Wade <ken@superconductive.com> Co-authored-by: Anthony Burdi <anthony@superconductive.com> Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com> Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com> Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com> Co-authored-by: Austin Robinson <austin@superconductive.com> Co-authored-by: Douglas Cook <dugup@hotmail.co.uk> Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com> Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com> Co-authored-by: William Shin <will@superconductive.com> * chore: revert azure pipeline * chore: revert more files Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com> Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com> Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com> Co-authored-by: Don Heppner <donald.heppner@gmail.com> Co-authored-by: Anthony Burdi <anthony@superconductive.com> Co-authored-by: Abe Gong <abegong@users.noreply.github.com> Co-authored-by: William Shin <will@superconductive.com> Co-authored-by: Ben Horkley <horkley@superconductive.com> Co-authored-by: Allen Sallinger <allen@superconductive.com> Co-authored-by: Ken Wade <ken@superconductive.com> Co-authored-by: Nathan Farmer 
<NathanFarmer@users.noreply.github.com> Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com> Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com> Co-authored-by: Austin Robinson <austin@superconductive.com> Co-authored-by: Douglas Cook <dugup@hotmail.co.uk> Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com> Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>

…for Metrics Parameter Builders (#5057)

…tions into maintenance/sync-main-and-develop

netlify · 2022-05-06T13:49:23Z

✅ Deploy Preview for niobium-lead-7998 ready!

Name	Link
🔨 Latest commit	`ea5fb5e`
🔍 Latest deploy log	https://app.netlify.com/sites/niobium-lead-7998/deploys/62752761154ead00084a14c0
😎 Deploy Preview	https://deploy-preview-5060--niobium-lead-7998.netlify.app

alexsherstinsky

LGTM

(#4918) * [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921) * chore: update pipelines * chore: remove scope check from pipeline * [BUGFIX] check contrib requirements (#4922) * Add check that requirements is a list, but don't crash if it's not * Make requirements for icd_ten_category expectation a list * [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906) * Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests * Remove accepting 'return_only_gallery_example's arg from run_diagnostics method * Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list * Use phrase 'Has a valid library_metadata object' * Update ExpectationTestDiagnostics to have include_in_gallery * Update _get_metric_list to accept expectation_config instead of executed_test_cases * Update ExpectationTestDiagnostics to include validation_result and error_diagnostics * Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase * Reformat with black * Update run_diagnostics to determine maturity level based on checks passed * Update evaluate_json_test_cfe to accept raise_exception and return a tuple * Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe * Add backend_test_result_counts to ExpectationDiagnostics and use in helpers * Reformat with black * Remove unused imports (flake8) * Fix fix tests * Update asserts at end of creating_custom_expectations/expect_xxx.py * Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem * test setup * fixes diagnostics for multi-table expectations * wrap tmp_dir -> abspath in func * apply to test_expectations/test_expectations_cfe * docstring Co-authored-by: Ken Wade <ken@superconductive.com> Co-authored-by: kenwade4 
<95714847+kenwade4@users.noreply.github.com> * [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927) * [BUGFIX] Add missing events to schema (#4917) * [MAINTENANCE] Improve Altair plotting extensibility (#4923) * Comments on altair documentation * Predicate BinaryExpression type hint * Make default theme and enum as well * Pass custom config to altair * Bugfix using nested_update * Add tests that test notebook execution * Add failing test * Move opacity into theme, rename variable * Vanquish tooltip and point_color_condition parameters * [FEATURE] new checksum expectation (#4657) * [FEATURE] code for new checksum expectation * [FEATURE] code for new checksum expectation * initial code for checksum expectation * linting & library_metadata updates Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com> Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> Co-authored-by: Austin Robinson <austin@superconductive.com> * [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660) * Update helper to add explicit alias to subqueries for SQLA version < 1.4 Implicit conversion of a nested select into a subquery failed when running on SQLA 1.3 against Postgres - update the existing helper to also handle older supported versions of SQLA. 
* Update util.py Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> * [BUGFIX] Fix clickhouse same-alias issue (#4389) * Fix broken link for checklist (#4932) * [MAINTENANCE] Remove DataContext from DataAssistant (#4931) * [MAINTENANCE] Add condition for custom checks in great_expectations pipelines * Move general data splitting tasks to abstract base class (#4942) * [MAINTENANCE] Add test to check for missing usage events (#4933) * [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943) * [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947) * Update SLACK_GUIDELINES.md Updating some language in Slack Guidelines * added how to ask a question link * cleanup (#4949) * [MAINTENANCE] Rearrange modules for better reusability (#4955) * [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures * [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929) * feat: init commit * chore: misc changes per convo with Alex * feat: finish initial impl * feat: finish impl after convo with Alex * chore: update after review * clean up (#4959) * [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930) * feat: init commit * feat: continue chugging along * feat: get both types of charts to work * chore: only update relevant kwargs in df * feat: add subtitle support * feat: create predicate helper func * chore: update type hint * chore: bold subtitle * chore: work on cleaning up vconcat * feat: continue impl * feat: get both prescriptive and descriptive working * chore: delete unnecessary import * refactor: further cleanup * chore: shrink charts some more * refactor: rename private method * chore: add docstrings * feat: add include/exclude column names lists * fix: correct method calls * fix: fix assertion around 
include/exclude columns * chore: update styling of charts * chore: misc changes per Nathan review * [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960) * feat: start impl * chore: finishing touches * fix: remedy typo in test * feat: update test * chore: revert changes in utils * chore: add comment * pin cryptography package (#4963) * [FEATURE] BigQuery Temp Table Support (#4925) * [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966) * [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969) * feat: run script to type annotate * chore: update threshold * Enable RuleBasedProfiler components to be serializable. (#4972) * [BUGFIX] extras_require (#4968) * Remove azure from requirements-dev-sqlalchemy.txt * Update get_extras_require func to strip comments and include sqlalchemy for some keys * [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946) * Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1 * Add failing tests for corner cases * Passing tests for 0 and 1 false_positive_rate * Add tests for very small false_positive_rates * Return type is already validated as float * Use custom ProfilerExecutionError rather than ValueError * Use 1-NP_EPSILON as an upper bound * Pass variables to quentin fixture to set random seed * Bugfix setting wrong parameter * Set object attribute as well * Unable to access the actual false_positive_rate used as it is private * Use floats instead of ints * Update type hints * [MAINTENANCE] fix a typo (#4974) * [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958) * feat: start impl * test: start writing alice test * feat: misc updates per discussion with Alex * test: 
update test regexes * feat: update other expectation * chore: update fixtures * chore: type hint * [BUGFIX] Fix broken packaging test and update dgtest-overrides * [FEATURE] Provide "estimation histogram" ParameterBuilder output details. (#4975) * [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat * release prep Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com> Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com> Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> Co-authored-by: Ken Wade <ken@superconductive.com> Co-authored-by: Anthony Burdi <anthony@superconductive.com> Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com> Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com> Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com> Co-authored-by: Austin Robinson <austin@superconductive.com> Co-authored-by: Douglas Cook <dugup@hotmail.co.uk> Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com> Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com> Co-authored-by: William Shin <will@superconductive.com> * [RELEASE] 0.15.4 (#5051) * ProgressBar for DataAssistant RuleBasedProfiler computations. 
(#4918) * [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921) * chore: update pipelines * chore: remove scope check from pipeline * [BUGFIX] check contrib requirements (#4922) * Add check that requirements is a list, but don't crash if it's not * Make requirements for icd_ten_category expectation a list * [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906) * Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests * Remove accepting 'return_only_gallery_example's arg from run_diagnostics method * Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list * Use phrase 'Has a valid library_metadata object' * Update ExpectationTestDiagnostics to have include_in_gallery * Update _get_metric_list to accept expectation_config instead of executed_test_cases * Update ExpectationTestDiagnostics to include validation_result and error_diagnostics * Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase * Reformat with black * Update run_diagnostics to determine maturity level based on checks passed * Update evaluate_json_test_cfe to accept raise_exception and return a tuple * Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe * Add backend_test_result_counts to ExpectationDiagnostics and use in helpers * Reformat with black * Remove unused imports (flake8) * Fix fix tests * Update asserts at end of creating_custom_expectations/expect_xxx.py * Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem * test setup * fixes diagnostics for multi-table expectations * wrap tmp_dir -> abspath in func * apply to test_expectations/test_expectations_cfe * docstring Co-authored-by: Ken Wade <ken@superconductive.com> Co-authored-by: kenwade4 
<95714847+kenwade4@users.noreply.github.com> * [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927) * [BUGFIX] Add missing events to schema (#4917) * [MAINTENANCE] Improve Altair plotting extensibility (#4923) * Comments on altair documentation * Predicate BinaryExpression type hint * Make default theme and enum as well * Pass custom config to altair * Bugfix using nested_update * Add tests that test notebook execution * Add failing test * Move opacity into theme, rename variable * Vanquish tooltip and point_color_condition parameters * [FEATURE] new checksum expectation (#4657) * [FEATURE] code for new checksum expectation * [FEATURE] code for new checksum expectation * initial code for checksum expectation * linting & library_metadata updates Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com> Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> Co-authored-by: Austin Robinson <austin@superconductive.com> * [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660) * Update helper to add explicit alias to subqueries for SQLA version < 1.4 Implicit conversion of a nested select into a subquery failed when running on SQLA 1.3 against Postgres - update the existing helper to also handle older supported versions of SQLA. 
* Update util.py Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> * [BUGFIX] Fix clickhouse same-alias issue (#4389) * Fix broken link for checklist (#4932) * [MAINTENANCE] Remove DataContext from DataAssistant (#4931) * [MAINTENANCE] Add condition for custom checks in great_expectations pipelines * Move general data splitting tasks to abstract base class (#4942) * [MAINTENANCE] Add test to check for missing usage events (#4933) * [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943) * [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947) * Update SLACK_GUIDELINES.md Updating some language in Slack Guidelines * added how to ask a question link * cleanup (#4949) * [MAINTENANCE] Rearrange modules for better reusability (#4955) * [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures * [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929) * feat: init commit * chore: misc changes per convo with Alex * feat: finish initial impl * feat: finish impl after convo with Alex * chore: update after review * clean up (#4959) * [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930) * feat: init commit * feat: continue chugging along * feat: get both types of charts to work * chore: only update relevant kwargs in df * feat: add subtitle support * feat: create predicate helper func * chore: update type hint * chore: bold subtitle * chore: work on cleaning up vconcat * feat: continue impl * feat: get both prescriptive and descriptive working * chore: delete unnecessary import * refactor: further cleanup * chore: shrink charts some more * refactor: rename private method * chore: add docstrings * feat: add include/exclude column names lists * fix: correct method calls * fix: fix assertion around 
include/exclude columns * chore: update styling of charts * chore: misc changes per Nathan review * [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960) * feat: start impl * chore: finishing touches * fix: remedy typo in test * feat: update test * chore: revert changes in utils * chore: add comment * pin cryptography package (#4963) * [FEATURE] BigQuery Temp Table Support (#4925) * [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966) * [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969) * feat: run script to type annotate * chore: update threshold * Enable RuleBasedProfiler components to be serializable. (#4972) * [BUGFIX] extras_require (#4968) * Remove azure from requirements-dev-sqlalchemy.txt * Update get_extras_require func to strip comments and include sqlalchemy for some keys * [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946) * Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1 * Add failing tests for corner cases * Passing tests for 0 and 1 false_positive_rate * Add tests for very small false_positive_rates * Return type is already validated as float * Use custom ProfilerExecutionError rather than ValueError * Use 1-NP_EPSILON as an upper bound * Pass variables to quentin fixture to set random seed * Bugfix setting wrong parameter * Set object attribute as well * Unable to access the actual false_positive_rate used as it is private * Use floats instead of ints * Update type hints * [MAINTENANCE] fix a typo (#4974) * [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958) * feat: start impl * test: start writing alice test * feat: misc updates per discussion with Alex * test: 
update test regexes * feat: update other expectation * chore: update fixtures * chore: type hint * [BUGFIX] Fix broken packaging test and update dgtest-overrides * [FEATURE] Provide "estimation histogram" ParameterBuilder output details. (#4975) * [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat * release prep (#4980) * [FEATURE] Splitting data assets into batches using timestamp columns in spark (#4973) * [BUGFIX] Use `monkeypatch` to ensure consistent bootstrap seed for additional probabilistic test (#4983) * feat: start impl * chore: finishing touches * fix: remedy typo in test * feat: update test * chore: revert changes in utils * chore: add comment * fix: use monkeypatch on test * [FEATURE] Splitting data assets into batches using datetime columns in pandas (#4982) * [BUGFIX] Patch the remainder of probabilistic `RuleBasedProfiler` tests with consistent bootstrap seed (#4989) * feat: start impl * chore: finishing touches * fix: remedy typo in test * feat: update test * chore: revert changes in utils * chore: add comment * fix: use monkeypatch on test * fix: patch remaining tests * chore: add docstring * [FEATURE] Provide Semantic Type Domain Interpretation Utility For Use Within ParameterBuilder Classes (#4993) * [MAINTENANCE] Splitter cleanup and enhancements (#4984) * Update action.md (#4967) * [FEATURE] Add support for Interpolation Method for Quantile Statistic Used by Estimators in NumericMetricRangeMultiBatchParameterBuilder (#4997) * [FEATURE] Enable self-initializing `ExpectColumnMeanToBeBetween` (#4986) * feat: init commit * test: write integration test * chore: add sigfigs * feat: add interpolation field * [FEATURE] Enable self-initializing `ExpectColumnMedianToBeBetween` (#4987) * feat: init commit * test: write integration test * chore: add sigfigs * feat: add interpolation field * chore: update GH action (#5001) * [FEATURE] Enable self-initializing `ExpectColumnSumToBeBetween` (#4988) * feat: 
init commit * test: write integration test * feat: add interpolation field * [MAINTENANCE] Move `DataAssistant` registry capabilities into `DataAssistantRegistry` to enable user aliasing (#4991) * refactor: move registry dict to dispatcher * chore: misc cleanup * chore: misc updates after review * chore: misc cleanup * chore: update error message * Fix continuous partition example (#4939) When calling json.dumps() method, the weights change. Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> * [BUGFIX] RBP Profiling Dataset ProgressBar Fix #4999 * [MAINTENANCE] Preliminary refactors for data samplers. (#4996) * MetricSingleBatchParameterBuilder with unit and integration tests. (#5003) * [DOCS] Update slack notification guide to not use validation operators. (#4978) * - Removed references to validation_operator - Edited config yaml examples - Grouped config options into tab groups for webhook vs app. * - Added technical term tag to Checkpoint reference. - Minor edit to document format. 
* corrects number in step header * [MAINTENANCE] Clean up unused imports and enforce through `flake8` in CI/CD (#5005) * maintenance: clean up codebase * chore: add to pipelines * fix: ensure that flake8 is installed first * chore: rename CI/CD stage * chore: update type hint threshold * parameter builder tests should utilize polymorphism (#5007) * [MAINTENANCE] Clean up type hints in CLI (#5006) * chore: first pass * chore: more updates * chore: more annotations * chore: more annotations * [FEATURE] Enable Pandas DataFrame and Series as MetricValues Output of Metric ParameterBuilder Classes (#5008) * logging and exception handling (#5009) * [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010) * [FEATURE] Histogram/Partition Single-Batch ParameterBuilder (#5011) * [FEATURE] Update `DataAssistantResult.plot()` return value to emit `PlotResult` wrapper dataclass (#4962) * feat: init commit * test: write all tests (except 1) * test: write last test * test: remove theme test * refactor: add custom Chart dataclass * chore: make Chart immutable * chore: add docstring * feat: misc updates per team convo * chore: delete comments * refactor: delete vconcat methods and consolidate helpers * refactor: further consolidation through helpers * chore: update subtitle styling * chore: add padding * [ENHANCEMENT] Condense column-level `vconcat` plots into one interactive plot (#5002) * Clone vertically concatenated chart for interactive starting point * Working interactive chart * Update ColorPalette names * Update ordinal palette * Tooltip now working * Change legend color * Add y-axis titles * Align y-axis vertically for both charts * Add highlight line * Change batch_id to Batch ID * Improve legend title and tooltip titles * New layer for starting point showing one line * Detail title updating appropriately * Column name shows up with empty selection * Use variable for alt.value(light_gray) * Allow selection 
by mouseover on lines * Anomaly encoded lines * Move column selector to top left * Format input_dropdown name * Working with expectation kwargs * Add predicate logic for strict_min and strict_max * Add subtitle to prescriptive return charts * WIP * Overcame merge conflicts in descriptive mode * Overcame merge conflicts in prescriptive mode * Column charts are in their own list index * [MAINTENANCE] Update version of black in pre-commit config * [MAINTENANCE] Improve tooltips and formatting for distinct column values chart in VolumeDataAssistantResult (#5017) * Correct type hints * Improve tooltips * Improve docstrings * Fix return object indexing * Return list length 1 instead of chart * [FEATURE] Limit samplers work with supported sqlalchemy backends (#5014) * [BUGFIX] Fix DataAssistantResult serialization issue (#5020) * [MAINTENANCE] Enhance configuring serialization for DotDict type classes (#5023) * [FEATURE] trino support (#5021) * clean up SQL statements for handling subqueries properly * use formal sqlalchemy for reflection * TRINO WIP POC-COMPLETED * Add trino package as a dependency and update imports * Add docker-compose.yml for starburst database in assets/docker/ * Update _create_trino_engine to accept a hostname and schema_name * Update get_dataset to have a block for trino * Add ability to use data_alt in test_definitions JSON files (for Trino quirks) * Minor update to get_test_validator_with_data to make debugging easier * Add trino to sqla_keys dict in setup.py * Update 3 test_definition json files with trino things * Add table_selectable workaround for trino * Add requirements-dev-trino to test_packaging.py * Add trino to various azure-pipelines yml files * Skip test_expectation__get_renderers * Skip test__get_test_results * Skip test__generate_expectation_tests__with_no_test_backends Co-authored-by: Alex Sherstinsky <alex@superconductive.com> Co-authored-by: James Campbell <james.p.campbell@gmail.com> * Pyarrow upper bound (#5028) * release prep 
v0.15.4 (#5029) * [MAINTENANCE] Use temporary branch to attempt to align git histories of `develop` and `main` (#5042) * [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531) * [BUGFIX] Moves testing dependencies out of core reqs (#4522) * removing upper bound on mistune * remove deprecated dependencies * adds untracked dependency * adds untracked dependency * adds untracked dependency * moving dependencies * removes dependencies added to lite from core | adds missing dependencies Co-authored-by: Chetan Kini <chetan@superconductive.com> * [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547) * [MAINTENANCE] Don't return from validate configuration methods (#4545) * Add validate_configuration to 2 core Expectations that are passing all their tests * Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests * Update all validate_configuration methods to have type hints and return None * Update all doc snippet references that were affected * [DOCS] technical term tags connect to data cloud docs (#4414) * - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.) - Some additional editing was done to bring documents in line with the documentation and how-to guide standards. * - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop. * - Update to include technical term tags. (#4462) - Minor updates to correct formatting and spelling issues. * - Moved docs related to contributing integrations under contributing in the ToC (#4551) - Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term). 
* - Adds new image files for the intro page (#4540) - Updates the image file link for the overview image on the intro page * [DOCS] clarifications on execution engines and scalability (#4539) * - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines. * - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas. * - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines. * [DOCS] technical terms for validate data advanced (#4535) * - add support for technical term tags. * - added technical term tags. - Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming. - NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR. * - added technical term tags. - Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming. - NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR. * [DOCS] technical terms for validate data actions docs (#4518) * - Edits to bring docs up to documentation and how-to guide standards. * - add technical term tags to documents. - minor formatting edits (technical terms missing capitalization, etc). 
* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553) * [MAINTENANCE] Refactor global `conftest` (#4534) * chore: use black directives to temporarily disable linting * chore: more black directives to temporarily disable linting * chore: finish remaining * refactor: start cleaning up conftest * refactor: more refactoring of conftest * refactor: even more refactoring of conftest * [FEATURE] Improve diagnostic checklist details (#4548) * Update library_metadata check to provide details when it doesn't pass * In linting check, if snake_case doesn't match filename, show computed snake_case * Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr * Update convert_to_json_serializable to handle bytes * Update build_gallery.py script to convert diagnostics to JSON in separate try/except * Update build_gallery.py script to write expectation_library_v2.json file with indenting * Update _check_input_validation to tell if custom assert statements are used in validate_configuration * clean up (#4554) * minor touch up (#4558) * [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485) * feat: init commit * refactor: shift all logic over to base class * feat: start impl of anonymize on Anonymizer * feat: get ProfilerRunAnonymizer working * refactor: remove constructor from ProfilerRunAnonymizer * refactor: start on CheckpointRunAnonymizer * fix: clean up broken checkpoint tests * fix: ensure *args and **kwargs are propagated through * refactor: start work on datasource anonymizers * refactor: remove all anonymizers except Anonymizer from usage stats attrs * fix: update isinstance checks * refactor: move helper into checkpoint_run_anonymizer * refactor: move helper into datasource_anonymizer * refactor: make anonymize string private and place in strategy * refactor: make anonymize batch info private and place in strategy * refactor: move 
build_init_payload to Anonymizer * refactor: make remainder of anonymize methods private * refactor: add store info to strategy * refactor: add dataconnector info to strategy * refactor: consolidate profiler info and profiler run anonymization * refactor: remove *args from signatures * refactor: updates around checkpoint anonymization * chore: misc cleanup of Anonymizer * feat: final touch up before review * chore: remove 'else' statements * fix: ensure appropriate checkpoint method gets called * chore: misc updates from review * refactor: move init_payload back to usage stats * chore: misc type hinting * refactor: start using individual classes again * chore: continue updating individual anonymizer classes * feat: further updates to child classes * feat: update anonymize_init_payload * fix: get checkpoint payloads working * refactor: ensure all methods have obj * fix: misc fixes * fix: make misc updates to conditional checks for obj * refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer * refactor: rename Checkpoint and Profiler anonymizers * feat: leverage aggregate anonymizer downstream * feature: conditionally create aggregate_anonymizer in constructor * feat: add cache retrieve or instantiate util * chore: add batch_request can_handle * feat: ensure that salt has a default value in anonymizers * refactor: require aggregate anonymizer in constructor * refactor: instantiate all strategies in aggregate * fix: fix broken tests * refactor: rename internal getter Co-authored-by: Don Heppner <donald.heppner@gmail.com> * [MAINTENANCE] Remove duplicate mistune dependency * [MAINTENANCE] Run PEP-273 checks on a schedule or release cut * [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573) * -Corrected the line references and added <snippet> tags to source code for Spark version of guide. 
* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide. * -lint reformat w/black * -correcting line numbers after lint formatting. * [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546) Usage stats instrumentation of package dependencies * [MAINTENANCE] Add DevRel team to GitHub auto-label action * [MAINTENANCE] Add GitHub action to conditionally auto-update PR's (#4574) * feat: add new action * chore: add conditions * [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577) * chore: bump version * chore: test change * chore: update all instances of black * chore: new test changes * chore: revert test changes * Update overview.md (#4556) * Add missing links. * Fix some typos * Simplify flow and grammar in a few places Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> * - corrected broken link in admonition box. (#4585) - updated links in admonition box to point to current technical documentation rather than old core concepts documents. * [MAINTENANCE] Minor clean-up (#4571) Little bit of cleanup in our execution engine and validator * [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590) * fix : misconfigured ExpectationConfigurationBuilder * pushing fix * clean up before submitting for review * bugfix : remove sorting * remove extra line * [MAINTENANCE] Instrument package dependencies (#4583) * Add dependencies to data_context.__init__ event * [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599) * release candidate for 0.14.13 * revert to 0.14.12 state * [RELEASE] 0.15.3 (#4981) * ProgressBar for DataAssistant RuleBasedProfiler computations. 
(#4918) * [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921) * chore: update pipeliens * chore: remove scope check from pipeline * [BUGFIX] check contrib requirements (#4922) * Add check that requirements is a list, but don't crash if it's not * Make requirements for icd_ten_category expectation a list * [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906) * Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests * Remove accepting 'return_only_gallery_example's arg from run_diagnostics method * Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list * Use phrase 'Has a valid library_metadata object' * Update ExpectationTestDiagnostics to have include_in_gallery * Update _get_metric_list to accept expectation_config instead of executed_test_cases * Update ExpectationTestDiagnostics to include validation_result and error_diagnostics * Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase * Reformat with black * Update run_diagnostics to determine maturity level based on checks passed * Update evaluate_json_test_cfe to accept raise_exception and return a tuple * Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe * Add backend_test_result_counts to ExpectationDiagnostics and use in helpers * Reformat with black * Remove unused imports (flake8) * Fix fix tests * Update asserts at end of creating_custom_expectations/expect_xxx.py * Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem * test setup * fixes diagnostics for multi-table expectations * wrap tmp_dir -> abspath in func * apply to test_expectations/test_expectations_cfe * docstring Co-authored-by: Ken Wade <ken@superconductive.com> Co-authored-by: kenwade4 
<95714847+kenwade4@users.noreply.github.com> * [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927) * [BUGFIX] Add missing events to schema (#4917) * [MAINTENANCE] Improve Altair plotting extensibility (#4923) * Comments on altair documentation * Predicate BinaryExpression type hint * Make default theme and enum as well * Pass custom config to altair * Bugfix using nested_update * Add tests that test notebook execution * Add failing test * Move opacity into theme, rename variable * Vanquish tooltip and point_color_condition parameters * [FEATURE] new checksum expectation (#4657) * [FEATURE] code for new checksum expectation * [FEATURE] code for new checksum expectation * initial code for checksum expectation * linting & library_metadata updates Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com> Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> Co-authored-by: Austin Robinson <austin@superconductive.com> * [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660) * Update helper to add explicit alias to subqueries for SQLA version < 1.4 Implicit conversion of a nested select into a subquery failed when running on SQLA 1.3 against Postgres - update the existing helper to also handle older supported versions of SQLA. 
* Update util.py Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> * [BUGFIX] Fix clickhouse same-alias issue (#4389) * Fix broken link for checklist (#4932) * [MAINTENANCE] Remove DataContext from DataAssistant (#4931) * [MAINTENANCE] Add condition for custom checks in great_expectations pipelines * Move general data splitting tasks to abstract base class (#4942) * [MAINTENANCE] Add test to check for missing usage events (#4933) * [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943) * [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947) * Update SLACK_GUIDELINES.md Updating some language in Slack Guidelines * added how to ask a question link * cleanup (#4949) * [MAINTENANCE] Rearrange modules for better reusability (#4955) * [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures * [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929) * feat: init commit * chore: misc changes per convo with Alex * feat: finish initial impl * feat: finish impl after convo with Alex * chore: update after review * clean up (#4959) * [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930) * feat: init commit * feat: continue chugging along * feat: get both types of charts to work * chore: only update relevant kwargs in df * feat: add subtitle support * feat: create predicate helper func * chore: update type hint * chore: bold subtitle * chore: work on cleaning up vconcat * feat: continue impl * feat: get both prescriptive and descriptive working * chore: delete unnecessary import * refactor: further cleanup * chore: shrink charts some more * refactor: rename private method * chore: add docstrings * feat: add include/exclude column names lists * fix: correct method calls * fix: fix assertion around 
include/exclude columns * chore: update styling of charts * chore: misc changes per Nathan review * [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960) * feat: start impl * chore: finishing touches * fix: remedy typo in test * feat: update test * chore: revert changes in utils * chore: add comment * pin cryptography package (#4963) pin cryptography package (#4963) * [FEATURE] BigQuery Temp Table Support (#4925) * [FEATURE] BigQuery Temp Table Support (#4925) * [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966) * [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969) * feat: run script to type annotate * chore: update threshold * Enable RuleBasedProfiler components to be serializable. (#4972) * [BUGFIX] extras_require (#4968) * Remove azure from requirements-dev-sqlalchemy.txt * Update get_extras_require func to strip comments and include sqlalchemy for some keys * [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946) * Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1 * Add failing tests for corner cases * Passing tests for 0 and 1 false_positive_rate * Add tests for very small false_positive_rates * Return type is already validated as float * Use custom ProfilerExecutionError rather than ValueError * Use 1-NP_EPSILON as an upper bound * Pass variables to quentin fixture to set random seed * Bugfix setting wrong parameter * Set object attribute as well * Unable to access the actual false_positive_rate used as it is private * Use floats instead of ints * Update type hints * [MAINTENANCE] fix a typo (#4974) * [FEATURE] Enable self-intializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958) * feat: start impl * test: start writing alice test * feat: misc updates per discussion with Alex * test: 
update test regexes * feat: update other expectation * chore: update fixtures * chore: type hint * [BUGFIX] Fix broken packaging test and update dgtest-overrides * [FEATURE] Provide "estimation histogram" ParameterBuilder output details . (#4975) * [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat * release prep Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com> Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com> Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> Co-authored-by: Ken Wade <ken@superconductive.com> Co-authored-by: Anthony Burdi <anthony@superconductive.com> Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com> Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com> Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com> Co-authored-by: Austin Robinson <austin@superconductive.com> Co-authored-by: Douglas Cook <dugup@hotmail.co.uk> Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com> Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com> Co-authored-by: William Shin <will@superconductive.com> * chore: revert azure pipeline * chore: revert more files Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com> Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com> Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com> Co-authored-by: Don Heppner <donald.heppner@gmail.com> Co-authored-by: Anthony Burdi <anthony@superconductive.com> Co-authored-by: Abe Gong <abegong@users.noreply.github.com> Co-authored-by: William Shin <will@superconductive.com> Co-authored-by: Ben Horkley <horkley@superconductive.com> Co-authored-by: Allen Sallinger <allen@superconductive.com> Co-authored-by: Ken Wade <ken@superconductive.com> Co-authored-by: Nathan Farmer 
<NathanFarmer@users.noreply.github.com> Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com> Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com> Co-authored-by: Austin Robinson <austin@superconductive.com> Co-authored-by: Douglas Cook <dugup@hotmail.co.uk> Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com> Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com> * [BUGFIX] Patch broken usage stats test around dependency tracking Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com> Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com> Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> Co-authored-by: Ken Wade <ken@superconductive.com> Co-authored-by: Anthony Burdi <anthony@superconductive.com> Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com> Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com> Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com> Co-authored-by: Austin Robinson <austin@superconductive.com> Co-authored-by: Douglas Cook <dugup@hotmail.co.uk> Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com> Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com> Co-authored-by: William Shin <will@superconductive.com> Co-authored-by: Abe Gong <abegong@users.noreply.github.com> Co-authored-by: andyjessen <62343929+andyjessen@users.noreply.github.com> Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com> Co-authored-by: Alex Sherstinsky <alex@superconductive.com> Co-authored-by: James Campbell <james.p.campbell@gmail.com> Co-authored-by: Don Heppner <donald.heppner@gmail.com> Co-authored-by: Ben Horkley <horkley@superconductive.com> Co-authored-by: Allen Sallinger <allen@superconductive.com> * [MAINTENANCE] Sync `main` and `develop` branches (#5060) * ProgressBar for DataAssistant RuleBasedProfiler computations. 
(#4918) * [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921) * chore: update pipeliens * chore: remove scope check from pipeline * [BUGFIX] check contrib requirements (#4922) * Add check that requirements is a list, but don't crash if it's not * Make requirements for icd_ten_category expectation a list * [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906) * Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests * Remove accepting 'return_only_gallery_example's arg from run_diagnostics method * Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list * Use phrase 'Has a valid library_metadata object' * Update ExpectationTestDiagnostics to have include_in_gallery * Update _get_metric_list to accept expectation_config instead of executed_test_cases * Update ExpectationTestDiagnostics to include validation_result and error_diagnostics * Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase * Reformat with black * Update run_diagnostics to determine maturity level based on checks passed * Update evaluate_json_test_cfe to accept raise_exception and return a tuple * Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe * Add backend_test_result_counts to ExpectationDiagnostics and use in helpers * Reformat with black * Remove unused imports (flake8) * Fix fix tests * Update asserts at end of creating_custom_expectations/expect_xxx.py * Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem * test setup * fixes diagnostics for multi-table expectations * wrap tmp_dir -> abspath in func * apply to test_expectations/test_expectations_cfe * docstring Co-authored-by: Ken Wade <ken@superconductive.com> Co-authored-by: kenwade4 
<95714847+kenwade4@users.noreply.github.com> * [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927) * [BUGFIX] Add missing events to schema (#4917) * [MAINTENANCE] Improve Altair plotting extensibility (#4923) * Comments on altair documentation * Predicate BinaryExpression type hint * Make default theme and enum as well * Pass custom config to altair * Bugfix using nested_update * Add tests that test notebook execution * Add failing test * Move opacity into theme, rename variable * Vanquish tooltip and point_color_condition parameters * [FEATURE] new checksum expectation (#4657) * [FEATURE] code for new checksum expectation * [FEATURE] code for new checksum expectation * initial code for checksum expectation * linting & library_metadata updates Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com> Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> Co-authored-by: Austin Robinson <austin@superconductive.com> * [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660) * Update helper to add explicit alias to subqueries for SQLA version < 1.4 Implicit conversion of a nested select into a subquery failed when running on SQLA 1.3 against Postgres - update the existing helper to also handle older supported versions of SQLA. 
* Update util.py Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> * [BUGFIX] Fix clickhouse same-alias issue (#4389) * Fix broken link for checklist (#4932) * [MAINTENANCE] Remove DataContext from DataAssistant (#4931) * [MAINTENANCE] Add condition for custom checks in great_expectations pipelines * Move general data splitting tasks to abstract base class (#4942) * [MAINTENANCE] Add test to check for missing usage events (#4933) * [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943) * [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947) * Update SLACK_GUIDELINES.md Updating some language in Slack Guidelines * added how to ask a question link * cleanup (#4949) * [MAINTENANCE] Rearrange modules for better reusability (#4955) * [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures * [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929) * feat: init commit * chore: misc changes per convo with Alex * feat: finish initial impl * feat: finish impl after convo with Alex * chore: update after review * clean up (#4959) * [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930) * feat: init commit * feat: continue chugging along * feat: get both types of charts to work * chore: only update relevant kwargs in df * feat: add subtitle support * feat: create predicate helper func * chore: update type hint * chore: bold subtitle * chore: work on cleaning up vconcat * feat: continue impl * feat: get both prescriptive and descriptive working * chore: delete unnecessary import * refactor: further cleanup * chore: shrink charts some more * refactor: rename private method * chore: add docstrings * feat: add include/exclude column names lists * fix: correct method calls * fix: fix assertion around 
include/exclude columns * chore: update styling of charts * chore: misc changes per Nathan review * [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960) * feat: start impl * chore: finishing touches * fix: remedy typo in test * feat: update test * chore: revert changes in utils * chore: add comment * pin cryptography package (#4963) pin cryptography package (#4963) * [FEATURE] BigQuery Temp Table Support (#4925) * [FEATURE] BigQuery Temp Table Support (#4925) * [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966) * [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969) * feat: run script to type annotate * chore: update threshold * Enable RuleBasedProfiler components to be serializable. (#4972) * [BUGFIX] extras_require (#4968) * Remove azure from requirements-dev-sqlalchemy.txt * Update get_extras_require func to strip comments and include sqlalchemy for some keys * [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946) * Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1 * Add failing tests for corner cases * Passing tests for 0 and 1 false_positive_rate * Add tests for very small false_positive_rates * Return type is already validated as float * Use custom ProfilerExecutionError rather than ValueError * Use 1-NP_EPSILON as an upper bound * Pass variables to quentin fixture to set random seed * Bugfix setting wrong parameter * Set object attribute as well * Unable to access the actual false_positive_rate used as it is private * Use floats instead of ints * Update type hints * [MAINTENANCE] fix a typo (#4974) * [FEATURE] Enable self-intializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958) * feat: start impl * test: start writing alice test * feat: misc updates per discussion with Alex * test: 
update test regexes * feat: update other expectation * chore: update fixtures * chore: type hint * [BUGFIX] Fix broken packaging test and update dgtest-overrides * [FEATURE] Provide "estimation histogram" ParameterBuilder output details . (#4975) * [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat * release prep (#4980) * [FEATURE] Splitting data assets into batches using timestamp columns in spark (#4973) * [BUGFIX] Use `monkeypatch` to ensure consistent bootstrap seed for additional probabilistic test (#4983) * feat: start impl * chore: finishing touches * fix: remedy typo in test * feat: update test * chore: revert changes in utils * chore: add comment * fix: use monkeypatch on test * [FEATURE] Splitting data assets into batches using datetime columns in pandas (#4982) * [BUGFIX] Patch the remainder of probabilistic `RuleBasedProfiler` tests with consistent bootstrap seed (#4989) * feat: start impl * chore: finishing touches * fix: remedy typo in test * feat: update test * chore: revert changes in utils * chore: add comment * fix: use monkeypatch on test * fix: patch remaining tests * chore: add docstring * [FEATURE] Provide Semantic Type Domain Interpretation Utility For Use Within ParameterBuilder Classes (#4993) * [MAINTENANCE] Splitter cleanup and enhancements (#4984) * Update action.md (#4967) Update action.md (#4967) * [FEATURE] Add support for Interpolation Method for Quantile Statistic Used by Estimators in NumericMetricRangeMultiBatchParameterBuilder (#4997) * [FEATURE] Enable self-initializing `ExpectColumnMeanToBeBetween` (#4986) * feat: init commit * test: write integration test * chore: add sigfigs * feat: add interpolation field * [FEATURE] Enable self-initializing `ExpectColumnMedianToBeBetween` (#4987) * feat: init commit * test: write integration test * chore: add sigfigs * feat: add interpolation field * chore: update GH action (#5001) * [FEATURE] Enable self-initializing `ExpectColumnSumToBeBetween` (#4988) * feat: 
init commit * test: write integration test * feat: add interpolation field * [MAINTENANCE] Move `DataAssistant` registry capabilities into `DataAssistantRegistry` to enable user aliasing (#4991) * refactor: move registry dict to dispatcher * chore: misc cleanup * chore: misc updates after review * chore: misc cleanup * chore: update error message * Fix continuous partition example (#4939) When calling json.dumps() method, the weights change. Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com> * [BUGFIX] RB…

alexsherstinsky and others added 30 commits April 21, 2022 10:51

ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

7cadee5

[MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

404b979

* chore: update pipeliens * chore: remove scope check from pipeline

[BUGFIX] check contrib requirements (#4922)

fa64f1a

* Add check that requirements is a list, but don't crash if it's not * Make requirements for icd_ten_category expectation a list

[MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

a9fc2c6

[BUGFIX] Add missing events to schema (#4917)

4075137

[BUGFIX] Fix clickhouse same-alias issue (#4389)

42c643c

Fix broken link for checklist (#4932)

52c30b3

[MAINTENANCE] Remove DataContext from DataAssistant (#4931)

2eb16f2

[MAINTENANCE] Add condition for custom checks in great_expectations pipelines

af03302

Move general data splitting tasks to abstract base class (#4942)

61fad2a

[MAINTENANCE] Add test to check for missing usage events (#4933)

2257f66

[FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

81ec5cb

[MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

562874b

Update SLACK_GUIDELINES.md

262c767

Updating some language in Slack Guidelines

added how to ask a question link

fa5271d

cleanup (#4949)

e478d09

[MAINTENANCE] Rearrange modules for better reusability (#4955)

595245e

[MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

13e11d2

[FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

b048544

* feat: init commit * chore: misc changes per convo with Alex * feat: finish initial impl * feat: finish impl after convo with Alex * chore: update after review

clean up (#4959)

b907c7d

[BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

4f791e1

* feat: start impl * chore: finishing touches * fix: remedy typo in test * feat: update test * chore: revert changes in utils * chore: add comment

pin cryptography package (#4963)

b56e08b

pin cryptography package (#4963)

[FEATURE] BigQuery Temp Table Support (#4925)

9c47d72

* [FEATURE] BigQuery Temp Table Support (#4925)

[FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

12baf7a

[MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

7e53b61

* feat: run script to type annotate * chore: update threshold

alexsherstinsky and others added 21 commits May 3, 2022 20:36

parameter builder tests should utilize polymorphism (#5007)

7a9c3c7

[MAINTENANCE] Clean up type hints in CLI (#5006)

48af6e1

* chore: first pass * chore: more updates * chore: more annotations * chore: more annotations

[FEATURE] Enable Pandas DataFrame and Series as MetricValues Output of Metric ParameterBuilder Classes (#5008)

a91c306

logging and exception handling (#5009)

33c1efb

[FEATURE] Notebook for VolumeDataAssistant Example (#5010)

f52aab3

* [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010)

[FEATURE] Histogram/Partition Single-Batch ParameterBuilder (#5011)

2fac5ba

[MAINTENANCE] Update version of black in pre-commit config

8dff260

[MAINTENANCE] Improve tooltips and formatting for distinct column values chart in VolumeDataAssistantResult (#5017)

20ac63b

* Correct type hints * Improve tooltips * Improve docstrings * Fix return object indexing * Return list length 1 instead of chart

[FEATURE] Limit samplers work with supported sqlalchemy backends (#5014)

5c60daf

[BUGFIX] Fix DataAssistantResult serialization issue (#5020)

53f13f0

[MAINTENANCE] Enhance configuring serialization for DotDict type classes (#5023)

a07329a

Pyarrow upper bound (#5028)

344dce9

release prep v0.15.4 (#5029)

ca62bbc

[BUGFIX] Patch broken usage stats test around dependency tracking

7880263

[FEATURE] Add subset operation to Domain class (#5049)

131fc4c

[FEATURE] In DataAssistant: Use Domain instead of domain_type as key for Metrics Parameter Builders (#5057)

58f2637

Merge branch 'develop' of github.com:great-expectations/great_expectations into maintenance/sync-main-and-develop

ea5fb5e

github-actions bot added the core-team label May 6, 2022

cdkini changed the base branch from develop to main May 6, 2022 13:49

cdkini self-assigned this May 6, 2022

alexsherstinsky approved these changes May 6, 2022

cdkini merged commit 9ca11fa into main May 6, 2022

cdkini deleted the maintenance/sync-main-and-develop branch May 6, 2022 15:39

cdkini mentioned this pull request May 12, 2022

[RELEASE] 0.15.5 #5106

Merged

Conversation

cdkini commented May 6, 2022 • edited

netlify bot commented May 6, 2022 • edited

✅ Deploy Preview for niobium-lead-7998 ready!

alexsherstinsky left a comment
