main-cleanup (#5097)
* [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531)

* [BUGFIX] Moves testing dependencies out of core reqs (#4522)

* removing upper bound on mistune

* remove deprecated dependencies

* adds untracked dependency

* adds untracked dependency

* adds untracked dependency

* moving dependencies

* removes dependencies added to lite from core | adds missing dependencies

Co-authored-by: Chetan Kini <chetan@superconductive.com>

* [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547)

* [MAINTENANCE] Don't return from validate configuration methods (#4545)

* Add validate_configuration to 2 core Expectations that are passing all their tests

* Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests

* Update all validate_configuration methods to have type hints and return None

* Update all doc snippet references that were affected

* [DOCS] technical term tags connect to data cloud docs (#4414)

* - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.)
- Some additional editing was done to bring documents in line with the documentation and how-to guide standards.

* - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop.

* - Update to include technical term tags. (#4462)

- Minor updates to correct formatting and spelling issues.

* - Moved docs related to contributing integrations under contributing in the ToC (#4551)

- Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term).

* - Adds new image files for the intro page (#4540)

- Updates the image file link for the overview image on the intro page

* [DOCS] clarifications on execution engines and scalability (#4539)

* - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines.

* - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas.

* - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines.

* [DOCS] technical terms for validate data advanced (#4535)

* - add support for technical term tags.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* [DOCS] technical terms for validate data actions docs (#4518)

* - Edits to bring docs up to documentation and how-to guide standards.

* - add technical term tags to documents.
- minor formatting edits (technical terms missing capitalization, etc).

* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553)

* [MAINTENANCE] Refactor global `conftest` (#4534)

* chore: use black directives to temporarily disable linting

* chore: more black directives to temporarily disable linting

* chore: finish remaining

* refactor: start cleaning up conftest

* refactor: more refactoring of conftest

* refactor: even more refactoring of conftest

* [FEATURE] Improve diagnostic checklist details (#4548)

* Update library_metadata check to provide details when it doesn't pass

* In linting check, if snake_case doesn't match filename, show computed snake_case

* Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr

* Update convert_to_json_serializable to handle bytes

* Update build_gallery.py script to convert diagnostics to JSON in separate try/except

* Update build_gallery.py script to write expectation_library_v2.json file with indenting

* Update _check_input_validation to tell if custom assert statements are used in validate_configuration
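
The bytes handling mentioned for `convert_to_json_serializable` can be illustrated with a minimal sketch. This is not the library's implementation: the helper name `to_json_serializable` and the base64 fallback for non-UTF-8 data are assumptions made for illustration only.

```python
import base64
import json
from typing import Any


def to_json_serializable(value: Any) -> Any:
    """Hypothetical helper: recursively convert a value so json.dumps accepts it."""
    if isinstance(value, (bytes, bytearray)):
        raw = bytes(value)
        try:
            # Prefer a readable string when the payload is valid UTF-8.
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Assumed fallback: base64 keeps arbitrary binary data lossless.
            return base64.b64encode(raw).decode("ascii")
    if isinstance(value, dict):
        return {str(key): to_json_serializable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [to_json_serializable(item) for item in value]
    return value


if __name__ == "__main__":
    print(json.dumps(to_json_serializable({"name": b"abc", "blob": [b"\xff"]})))
```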

* clean up (#4554)

* minor touch up (#4558)

* [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485)

* feat: init commit

* refactor: shift all logic over to base class

* feat: start impl of anonymize on Anonymizer

* feat: get ProfilerRunAnonymizer working

* refactor: remove constructor from ProfilerRunAnonymizer

* refactor: start on CheckpointRunAnonymizer

* fix: clean up broken checkpoint tests

* fix: ensure *args and **kwargs are propagated through

* refactor: start work on datasource anonymizers

* refactor: remove all anonymizers except Anonymizer from usage stats attrs

* fix: update isinstance checks

* refactor: move helper into checkpoint_run_anonymizer

* refactor: move helper into datasource_anonymizer

* refactor: make anonymize string private and place in strategy

* refactor: make anonymize batch info private and place in strategy

* refactor: move build_init_payload to Anonymizer

* refactor: make remainder of anonymize methods private

* refactor: add store info to strategy

* refactor: add dataconnector info to strategy

* refactor: consolidate profiler info and profiler run anonymization

* refactor: remove *args from signatures

* refactor: updates around checkpoint anonymization

* chore: misc cleanup of Anonymizer

* feat: final touch up before review

* chore: remove 'else' statements

* fix: ensure appropriate checkpoint method gets called

* chore: misc updates from review

* refactor: move init_payload back to usage stats

* chore: misc type hinting

* refactor: start using individual classes again

* chore: continue updating individual anonymizer classes

* feat: further updates to child classes

* feat: update anonymize_init_payload

* fix: get checkpoint payloads working

* refactor: ensure all methods have obj

* fix: misc fixes

* fix: make misc updates to conditional checks for obj

* refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer

* refactor: rename Checkpoint and Profiler anonymizers

* feat: leverage aggregate anonymizer downstream

* feature: conditionally create aggregate_anonymizer in constructor

* feat: add cache retrieve or instantiate util

* chore: add batch_request can_handle

* feat: ensure that salt has a default value in anonymizers

* refactor: require aggregate anonymizer in constructor

* refactor: instantiate all strategies in aggregate

* fix: fix broken tests

* refactor: rename internal getter

Co-authored-by: Don Heppner <donald.heppner@gmail.com>
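
The shape of the Strategy refactor described in the bullets above (per-type anonymizers with a `can_handle` check, an aggregate that instantiates every strategy, and a default salt) might look roughly like the sketch below. The class names, the default salt value, and the use of SHA-256 are assumptions for illustration; they are not taken from the actual code.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class BaseAnonymizer(ABC):
    """Hypothetical strategy base class; real class names may differ."""

    def __init__(self, salt: Optional[str] = None) -> None:
        # Assumed default so that a salt is always available.
        self._salt = salt if salt is not None else "default-salt"

    def _anonymize_string(self, value: str) -> str:
        # Private helper shared by all strategies.
        return hashlib.sha256((self._salt + value).encode("utf-8")).hexdigest()

    @abstractmethod
    def can_handle(self, obj: Any) -> bool:
        """Return True if this strategy knows how to anonymize obj."""

    @abstractmethod
    def anonymize(self, obj: Any) -> Any:
        """Return an anonymized representation of obj."""


class StringAnonymizer(BaseAnonymizer):
    def can_handle(self, obj: Any) -> bool:
        return isinstance(obj, str)

    def anonymize(self, obj: Any) -> Any:
        return self._anonymize_string(obj)


class DictAnonymizer(BaseAnonymizer):
    def can_handle(self, obj: Any) -> bool:
        return isinstance(obj, dict)

    def anonymize(self, obj: Any) -> Any:
        # Keys are kept; only values are hashed in this sketch.
        return {key: self._anonymize_string(str(val)) for key, val in obj.items()}


class AggregateAnonymizer:
    """Instantiates every strategy and dispatches to the first that can handle obj."""

    def __init__(self, salt: Optional[str] = None) -> None:
        self._strategies: List[BaseAnonymizer] = [
            StringAnonymizer(salt),
            DictAnonymizer(salt),
        ]

    def anonymize(self, obj: Any) -> Any:
        for strategy in self._strategies:
            if strategy.can_handle(obj):
                return strategy.anonymize(obj)
        raise TypeError(f"No anonymizer strategy for {type(obj).__name__}")
```

Callers only touch the aggregate, so adding a new object type means adding one strategy class rather than another `isinstance` branch.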

* [MAINTENANCE] Remove duplicate mistune dependency

* [MAINTENANCE] Run PEP-273 checks on a schedule or release cut

* [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573)

* -Corrected the line references and added <snippet> tags to source code for Spark version of guide.

* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide.

* -lint reformat w/black

* -correcting line numbers after lint formatting.

* [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546)

Usage stats instrumentation of package dependencies

* [MAINTENANCE] Add DevRel team to GitHub auto-label action

* [MAINTENANCE] Add GitHub action to conditionally auto-update PRs (#4574)

* feat: add new action

* chore: add conditions

* [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577)

* chore: bump version

* chore: test change

* chore: update all instances of black

* chore: new test changes

* chore: revert test changes

* Update overview.md (#4556)

* Add missing links.
* Fix some typos
* Simplify flow and grammar in a few places

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* - corrected broken link in admonition box. (#4585)

- updated links in admonition box to point to current technical documentation rather than old core concepts documents.

* [MAINTENANCE] Minor clean-up (#4571)

Little bit of cleanup in our execution engine and validator

* [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590)

* fix : misconfigured ExpectationConfigurationBuilder

* pushing fix

* clean up before submitting for review

* bugfix : remove sorting

* remove extra line

* [MAINTENANCE] Instrument package dependencies (#4583)

* Add dependencies to data_context.__init__ event

* [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599)

* release candidate for 0.14.13

* revert to 0.14.12 state

* [RELEASE] 0.15.3 (#4981)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys
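
Stripping comments from requirements lines, as the `get_extras_require` bullet describes, could be sketched as follows. The function name `parse_requirement_lines` and the exact rules (skip blank lines, full-line comments, and pip option lines such as `-r`) are assumptions, not the contents of `setup.py`.

```python
from typing import List


def parse_requirement_lines(text: str) -> List[str]:
    """Hypothetical helper: return requirement specifiers without comments."""
    requirements: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, full-line comments, and pip options like "-r file.txt".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        # An inline comment starts at " #"; a bare "#" may be part of a URL fragment.
        line = line.split(" #", 1)[0].strip()
        if line:
            requirements.append(line)
    return requirements


if __name__ == "__main__":
    sample = "# dev deps\nsqlalchemy>=1.3.18  # sqla\n\n-r other.txt\npyarrow\n"
    print(parse_requirement_lines(sample))
```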

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints
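
One reading of the edge-case handling listed above is: reject values outside [0, 1] with a custom error, and clamp values at or very near the bounds with a warning. The sketch below follows that reading only. `EPSILON` stands in for `NP_EPSILON`, the local `ProfilerExecutionError` class stands in for the library's exception, and the function name is hypothetical.

```python
import sys
import warnings

# Stand-in for NP_EPSILON (assumed to be machine epsilon for floats).
EPSILON: float = sys.float_info.epsilon


class ProfilerExecutionError(Exception):
    """Local stand-in for the library's custom exception."""


def clamp_false_positive_rate(false_positive_rate: float) -> float:
    """Hypothetical helper: keep false_positive_rate strictly inside (0, 1)."""
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ProfilerExecutionError(
            f"false_positive_rate must be in [0, 1]; got {false_positive_rate}"
        )
    if false_positive_rate <= EPSILON:
        warnings.warn("false_positive_rate too close to 0; using EPSILON instead")
        return EPSILON
    if false_positive_rate >= 1.0 - EPSILON:
        warnings.warn("false_positive_rate too close to 1; using 1 - EPSILON instead")
        return 1.0 - EPSILON
    return false_positive_rate
```

Using floats throughout (0.0 and 1.0 rather than 0 and 1) matches the "Use floats instead of ints" bullet and keeps the return type uniform.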

* [MAINTENANCE] fix a typo (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>

* [RELEASE] 0.15.4 (#5051)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep (#4980)

* [FEATURE] Splitting data assets into batches using timestamp columns in spark (#4973)

* [BUGFIX] Use `monkeypatch` to ensure consistent bootstrap seed for additional probabilistic test (#4983)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* [FEATURE] Splitting data assets into batches using datetime columns in pandas (#4982)

* [BUGFIX] Patch the remainder of probabilistic `RuleBasedProfiler` tests with consistent bootstrap seed (#4989)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* fix: patch remaining tests

* chore: add docstring

* [FEATURE] Provide Semantic Type Domain Interpretation Utility For Use Within ParameterBuilder Classes (#4993)

* [MAINTENANCE] Splitter cleanup and enhancements (#4984)

* Update action.md (#4967)

* [FEATURE] Add support for Interpolation Method for Quantile Statistic Used by Estimators in NumericMetricRangeMultiBatchParameterBuilder (#4997)

* [FEATURE] Enable self-initializing `ExpectColumnMeanToBeBetween` (#4986)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* [FEATURE] Enable self-initializing `ExpectColumnMedianToBeBetween` (#4987)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* chore: update GH action (#5001)

* [FEATURE] Enable self-initializing `ExpectColumnSumToBeBetween` (#4988)

* feat: init commit

* test: write integration test

* feat: add interpolation field

* [MAINTENANCE] Move `DataAssistant` registry capabilities into `DataAssistantRegistry` to enable user aliasing (#4991)

* refactor: move registry dict to dispatcher

* chore: misc cleanup

* chore: misc updates after review

* chore: misc cleanup

* chore: update error message

* Fix continuous partition example (#4939)

When calling json.dumps() method, the weights change.

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] RBP Profiling Dataset ProgressBar Fix #4999

* [MAINTENANCE] Preliminary refactors for data samplers. (#4996)

* MetricSingleBatchParameterBuilder with unit and integration tests. (#5003)

* [DOCS] Update slack notification guide to not use validation operators. (#4978)

* - Removed references to validation_operator
- Edited config yaml examples
- Grouped config options into tab groups for webhook vs app.

* - Added technical term tag to Checkpoint reference.
- Minor edit to document format.

* corrects number in step header

* [MAINTENANCE] Clean up unused imports and enforce through `flake8` in CI/CD (#5005)

* maintenance: clean up codebase

* chore: add to pipelines

* fix: ensure that flake8 is installed first

* chore: rename CI/CD stage

* chore: update type hint threshold

* parameter builder tests should utilize polymorphism (#5007)

* [MAINTENANCE] Clean up type hints in CLI (#5006)

* chore: first pass

* chore: more updates

* chore: more annotations

* chore: more annotations

* [FEATURE] Enable Pandas DataFrame and Series as MetricValues Output of Metric ParameterBuilder Classes (#5008)

* logging and exception handling (#5009)

* [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010)

* [FEATURE] Histogram/Partition Single-Batch ParameterBuilder (#5011)

* [FEATURE] Update `DataAssistantResult.plot()` return value to emit `PlotResult` wrapper dataclass (#4962)

* feat: init commit

* test: write all tests (except 1)

* test: write last test

* test: remove theme test

* refactor: add custom Chart dataclass

* chore: make Chart immutable

* chore: add docstring

* feat: misc updates per team convo

* chore: delete comments

* refactor: delete vconcat methods and consolidate helpers

* refactor: further consolidation through helpers

* chore: update subtitle styling

* chore: add padding

* [ENHANCEMENT] Condense column-level `vconcat` plots into one interactive plot (#5002)

* Clone vertically concatenated chart for interactive starting point

* Working interactive chart

* Update ColorPalette names

* Update ordinal palette

* Tooltip now working

* Change legend color

* Add y-axis titles

* Align y-axis vertically for both charts

* Add highlight line

* Change batch_id to Batch ID

* Improve legend title and tooltip titles

* New layer for starting point showing one line

* Detail title updating appropriately

* Column name shows up with empty selection

* Use variable for alt.value(light_gray)

* Allow selection by mouseover on lines

* Anomaly encoded lines

* Move column selector to top left

* Format input_dropdown name

* Working with expectation kwargs

* Add predicate logic for strict_min and strict_max

* Add subtitle to prescriptive return charts

* WIP

* Overcame merge conflicts in descriptive mode

* Overcame merge conflicts in prescriptive mode

* Column charts are in their own list index

* [MAINTENANCE] Update version of black in pre-commit config

* [MAINTENANCE] Improve tooltips and formatting for distinct column values chart in VolumeDataAssistantResult (#5017)

* Correct type hints

* Improve tooltips

* Improve docstrings

* Fix return object indexing

* Return list length 1 instead of chart

* [FEATURE] Limit samplers work with supported sqlalchemy backends (#5014)

* [BUGFIX] Fix DataAssistantResult serialization issue (#5020)

* [MAINTENANCE] Enhance configuring serialization for DotDict type classes (#5023)

* [FEATURE] trino support (#5021)

* clean up SQL statements for handling subqueries properly

* use formal sqlalchemy for reflection

* TRINO WIP POC-COMPLETED

* Add trino package as a dependency and update imports

* Add docker-compose.yml for starburst database in assets/docker/

* Update _create_trino_engine to accept a hostname and schema_name

* Update get_dataset to have a block for trino

* Add ability to use data_alt in test_definitions JSON files (for Trino quirks)

* Minor update to get_test_validator_with_data to make debugging easier

* Add trino to sqla_keys dict in setup.py

* Update 3 test_definition json files with trino things

* Add table_selectable workaround for trino

* Add requirements-dev-trino to test_packaging.py

* Add trino to various azure-pipelines yml files

* Skip test_expectation__get_renderers

* Skip test__get_test_results

* Skip test__generate_expectation_tests__with_no_test_backends

Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>

* Pyarrow upper bound (#5028)

* release prep v0.15.4 (#5029)

* [MAINTENANCE] Use temporary branch to attempt to align git histories of `develop` and `main` (#5042)

* [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531)

* [BUGFIX] Moves testing dependencies out of core reqs (#4522)

* removing upper bound on mistune

* remove deprecated dependencies

* adds untracked dependency

* adds untracked dependency

* adds untracked dependency

* moving dependencies

* removes dependencies added to lite from core | adds missing dependencies

Co-authored-by: Chetan Kini <chetan@superconductive.com>

* [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547)

* [MAINTENANCE] Don't return from validate configuration methods (#4545)

* Add validate_configuration to 2 core Expectations that are passing all their tests

* Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests

* Update all validate_configuration methods to have type hints and return None

* Update all doc snippet references that were affected

* [DOCS] technical term tags connect to data cloud docs (#4414)

* - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.)
- Some additional editing was done to bring documents in line with the documentation and how-to guide standards.

* - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop.

* - Update to include technical term tags. (#4462)

- Minor updates to correct formatting and spelling issues.

* - Moved docs related to contributing integrations under contributing in the ToC (#4551)

- Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term).

* - Adds new image files for the intro page (#4540)

- Updates the image file link for the overview image on the intro page

* [DOCS] clarifications on execution engines and scalability (#4539)

* - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines.

* - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas.

* - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines.

* [DOCS] technical terms for validate data advanced (#4535)

* - add support for technical term tags.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* [DOCS] technical terms for validate data actions docs (#4518)

* - Edits to bring docs up to documentation and how-to guide standards.

* - add technical term tags to documents.
- minor formatting edits (technical terms missing capitalization, etc).

* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553)

* [MAINTENANCE] Refactor global `conftest` (#4534)

* chore: use black directives to temporarily disable linting

* chore: more black directives to temporarily disable linting

* chore: finish remaining

* refactor: start cleaning up conftest

* refactor: more refactoring of conftest

* refactor: even more refactoring of conftest

* [FEATURE] Improve diagnostic checklist details (#4548)

* Update library_metadata check to provide details when it doesn't pass

* In linting check, if snake_case doesn't match filename, show computed snake_case

* Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr

* Update convert_to_json_serializable to handle bytes

* Update build_gallery.py script to convert diagnostics to JSON in separate try/except

* Update build_gallery.py script to write expectation_library_v2.json file with indenting

* Update _check_input_validation to tell if custom assert statements are used in validate_configuration

* clean up (#4554)

* minor touch up (#4558)

* [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485)

* feat: init commit

* refactor: shift all logic over to base class

* feat: start impl of anonymize on Anonymizer

* feat: get ProfilerRunAnonymizer working

* refactor: remove constructor from ProfilerRunAnonymizer

* refactor: start on CheckpointRunAnonymizer

* fix: clean up broken checkpoint tests

* fix: ensure *args and **kwargs are propagated through

* refactor: start work on datasource anonymizers

* refactor: remove all anonymizers except Anonymizer from usage stats attrs

* fix: update isinstance checks

* refactor: move helper into checkpoint_run_anonymizer

* refactor: move helper into datasource_anonymizer

* refactor: make anonymize string private and place in strategy

* refactor: make anonymize batch info private and place in strategy

* refactor: move build_init_payload to Anonymizer

* refactor: make remainder of anonymize methods private

* refactor: add store info to strategy

* refactor: add dataconnector info to strategy

* refactor: consolidate profiler info and profiler run anonymization

* refactor: remove *args from signatures

* refactor: updates around checkpoint anonymization

* chore: misc cleanup of Anonymizer

* feat: final touch up before review

* chore: remove 'else' statements

* fix: ensure appropriate checkpoint method gets called

* chore: misc updates from review

* refactor: move init_payload back to usage stats

* chore: misc type hinting

* refactor: start using individual classes again

* chore: continue updating individual anonymizer classes

* feat: further updates to child classes

* feat: update anonymize_init_payload

* fix: get checkpoint payloads working

* refactor: ensure all methods have obj

* fix: misc fixes

* fix: make misc updates to conditional checks for obj

* refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer

* refactor: rename Checkpoint and Profiler anonymizers

* feat: leverage aggregate anonymizer downstream

* feature: conditionally create aggregate_anonymizer in constructor

* feat: add cache retrieve or instantiate util

* chore: add batch_request can_handle

* feat: ensure that salt has a default value in anonymizers

* refactor: require aggregate anonymizer in constructor

* refactor: instantiate all strategies in aggregate

* fix: fix broken tests

* refactor: rename internal getter

Co-authored-by: Don Heppner <donald.heppner@gmail.com>

* [MAINTENANCE] Remove duplicate mistune dependency

* [MAINTENANCE] Run PEP-273 checks on a schedule or release cut

* [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573)

* -Corrected the line references and added <snippet> tags to source code for Spark version of guide.

* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide.

* -lint reformat w/black

* -correcting line numbers after lint formatting.

* [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546)

Usage stats instrumentation of package dependencies

* [MAINTENANCE] Add DevRel team to GitHub auto-label action

* [MAINTENANCE] Add GitHub action to conditionally auto-update PR's  (#4574)

* feat: add new action

* chore: add conditions

* [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577)

* chore: bump version

* chore: test change

* chore: update all instances of black

* chore: new test changes

* chore: revert test changes

* Update overview.md (#4556)

* Add missing links.
* Fix some typos
* Simplify flow and grammar in a few places

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* - corrected broken link in admonition box. (#4585)

- updated links in admonition box to point to current technical documentation rather than old core concepts documents.

* [MAINTENANCE] Minor clean-up (#4571)

Little bit of cleanup in our execution engine and validator

* [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590)

* fix : misconfigured ExpectationConfigurationBuilder

* pushing fix

* clean up before submitting for review

* bugfix : remove sorting

* remove extra line

* [MAINTENANCE] Instrument package dependencies (#4583)

* Add dependencies to data_context.__init__ event

* [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599)

* release candidate for 0.14.13

* revert to 0.14.12 state

* [RELEASE] 0.15.3 (#4981)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant  (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints
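
The `false_positive_rate` commits above describe a validate-then-clamp pattern. A minimal sketch of one reading of the final behavior follows; it is not the project's code. The helper name `normalize_false_positive_rate` is hypothetical, `ProfilerExecutionError` is redefined locally as a stand-in, and `NP_EPSILON` is assumed to be machine epsilon for a 64-bit float.

```python
import sys
import warnings

# Assumption: NP_EPSILON in the commit messages is machine epsilon for a
# 64-bit float; sys.float_info.epsilon is used here as a stdlib stand-in.
NP_EPSILON = sys.float_info.epsilon


class ProfilerExecutionError(Exception):
    """Illustrative stand-in for the custom error named in the commits."""


def normalize_false_positive_rate(false_positive_rate: float) -> float:
    """Hypothetical helper: reject out-of-range rates, nudge rates off the bounds."""
    # Values outside [0, 1] are treated as configuration errors.
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ProfilerExecutionError(
            f"false_positive_rate must be in [0, 1]; got {false_positive_rate}"
        )
    # Values at or extremely close to 0 are replaced by the smallest usable rate.
    if false_positive_rate <= NP_EPSILON:
        warnings.warn(f"false_positive_rate too close to 0; using {NP_EPSILON}")
        return NP_EPSILON
    # Values at or extremely close to 1 are capped just below 1.
    upper_bound = 1.0 - NP_EPSILON
    if false_positive_rate >= upper_bound:
        warnings.warn(f"false_positive_rate too close to 1; using {upper_bound}")
        return upper_bound
    return false_positive_rate
```

Clamping with a warning keeps a run alive when a rate sits on a boundary, while a rate that is out of range still fails loudly.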

* [MAINTENANCE] fix a typo  (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details . (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>

* chore: revert azure pipeline

* chore: revert more files

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>

* [BUGFIX] Patch broken usage stats test around dependency tracking

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: andyjessen <62343929+andyjessen@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>

* [MAINTENANCE] Sync `main` and `develop` branches (#5060)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant  (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo  (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details . (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep (#4980)

* [FEATURE] Splitting data assets into batches using timestamp columns in spark (#4973)

* [BUGFIX] Use `monkeypatch` to ensure consistent bootstrap seed for additional probabilistic test (#4983)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* [FEATURE] Splitting data assets into batches using datetime columns in pandas (#4982)

* [BUGFIX] Patch the remainder of probabilistic `RuleBasedProfiler` tests with consistent bootstrap seed (#4989)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* fix: patch remaining tests

* chore: add docstring

* [FEATURE] Provide Semantic Type Domain Interpretation Utility For Use Within ParameterBuilder Classes (#4993)

* [MAINTENANCE] Splitter cleanup and enhancements (#4984)

* Update action.md (#4967)

Update action.md (#4967)

* [FEATURE] Add support for Interpolation Method for Quantile Statistic Used by Estimators in NumericMetricRangeMultiBatchParameterBuilder (#4997)

* [FEATURE] Enable self-initializing `ExpectColumnMeanToBeBetween` (#4986)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* [FEATURE] Enable self-initializing `ExpectColumnMedianToBeBetween` (#4987)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* chore: update GH action (#5001)

* [FEATURE] Enable self-initializing `ExpectColumnSumToBeBetween` (#4988)

* feat: init commit

* test: write integration test

* feat: add interpolation field

* [MAINTENANCE] Move `DataAssistant` registry capabilities into `DataAssistantRegistry` to enable user aliasing (#4991)

* refactor: move registry dict to dispatcher

* chore: misc cleanup

* chore: misc updates after review

* chore: misc cleanup

* chore: update error message

* Fix continuous partition example (#4939)

When calling json.dumps() method, the weights change.

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] RB…
21 people committed May 11, 2022
1 parent 590d0df commit 1f92dc8
Showing 378 changed files with 22,732 additions and 7,499 deletions.
4 changes: 2 additions & 2 deletions .github/pull_request_template.md
@@ -1,12 +1,12 @@
-Please annotate your PR title to describe what the PR does, then give a brief bulleted description of your PR below. PR titles should begin with [BUGFIX], [FEATURE], [DOCS], or [MAINTENANCE]. If a new feature introduces breaking changes for the Great Expectations API or configuration files, please also add [BREAKING]. You can read about the tags in our [contributor checklist](https://docs.greatexpectations.io/en/latest/contributing/contribution_checklist.html).
+Please annotate your PR title to describe what the PR does, then give a brief bulleted description of your PR below. PR titles should begin with [BUGFIX], [FEATURE], [DOCS], or [MAINTENANCE]. If a new feature introduces breaking changes for the Great Expectations API or configuration files, please also add [BREAKING]. You can read about the tags in our [contributor checklist](https://docs.greatexpectations.io/docs/contributing/contributing_checklist).

Changes proposed in this pull request:
-
-
-


After submitting your PR, CI checks will run and @ge-cla-bot will check for your CLA signature.
After submitting your PR, CI checks will run and @cla-bot will check for your CLA signature.

For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.
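
The title convention described in this template can be checked mechanically. The following is a minimal sketch, not part of the repository: the function names `primary_tag` and `is_breaking` are hypothetical, and it assumes the primary tag must be the very first thing in the title while `[BREAKING]` may appear anywhere.

```python
import re
from typing import Optional

# The four primary tags named in the PR template.
PRIMARY_TAG = re.compile(r"^\[(BUGFIX|FEATURE|DOCS|MAINTENANCE)\]")


def primary_tag(title: str) -> Optional[str]:
    """Return the leading primary tag of a PR title, or None if it has none."""
    match = PRIMARY_TAG.match(title)
    return match.group(1) if match else None


def is_breaking(title: str) -> bool:
    """Report whether the title carries the additional [BREAKING] tag."""
    return "[BREAKING]" in title


print(primary_tag("[MAINTENANCE] Splitter cleanup and enhancements (#4984)"))
print(primary_tag("chore: update GH action (#5001)"))
```

Titles taken from the commit list above, such as `[MAINTENANCE] Splitter cleanup and enhancements (#4984)`, pass this check, while untagged entries such as `chore: update GH action (#5001)` do not.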

6 changes: 3 additions & 3 deletions .github/workflows/autoupdate.yml
@@ -1,9 +1,9 @@
# https://github.com/marketplace/actions/auto-update
name: autoupdate
on:
pull_request:
types:
- auto_merge_enabled
push:
branches:
- develop # Whenever the base changes, this action should run

jobs:
autoupdate:
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -16,7 +16,7 @@ repos:
- id: isort
exclude: venv/.*\|docs/.*|tests/test_sets/broken_excel_file\.xls
- repo: https://github.com/psf/black
rev: 22.1.0
rev: 22.3.0
hooks:
- id: black
exclude: docs/.*|tests/.*.fixture|.*.ge_store_backend_id
13 changes: 5 additions & 8 deletions SLACK_GUIDELINES.md
@@ -11,16 +11,16 @@ If you post in off hours be patient, Someone will get back to you once the sun c
## Asking for help

- Do your best to try and solve the problem first as your efforts will help us more easily answer the question.
- [Read "How to write a good question in Slack"](https://github.com/great-expectations/great_expectations/discussions/4951)
- Head over to our [Documentation](https://docs.greatexpectations.io/en/latest/)
- Checkout [Discuss](https://discuss.greatexpectations.io/) this is where we want most of our problem solving, discussion, updates, etc to go because it helps keep a more visible record for GE users.
- Checkout [GitHub Discussions](https://github.com/great-expectations/great_expectations/discussions) this is where we want most of our problem solving, discussion, updates, etc to go because it helps keep a more visible record for GE users.

#### Asking your question in Slack

**Know your support channel:**
<ul>
<li>#beginners: If you are just getting started this is your place to be!</li>
<li>#support: Having trouble with customizing your expectations, an integration or anything else .beyond just getting started? Post here.</li>
<li>#expectation-requests: Have a good idea for an expectation? Post it here. </li>
<li>#support: Having trouble with customizing your Expectations, an integration or anything else .beyond just getting started? Post here.</li>
<li>#feature-requests: Have a good idea for an Expectation or a feature? Post it here. </li>
<li>#contributors-contributing: For previous, current and prospective contributors to talk about potential contributions and to help each other with contributions.</li>
</ul>
## Use Public Channels, Not Private Groups
@@ -44,13 +44,10 @@ Great Expectations is a piece of the puzzle when it comes to being a data practi
**\#contributors-contributing**<br/>
For previous, current and prospective contributors to talk about potential contributions and to help each other with contributions.

**\#beginners**<br/>
Judgement free question zone! If you’re having trouble getting started with GE this is a perfect place to ask. Community help is encouraged here :)

**\#job-openings**<br/>
Looking to hire someone in the community? Post your job here:

**\#expectation-requests**<br/>
**\#feature-requests**<br/>
Have a good idea for an expectation? Post it here.

**\#support**<br/>
6 changes: 6 additions & 0 deletions assets/docker/starburst/docker-compose.yml
@@ -0,0 +1,6 @@
version: '3.2'
services:
starburst_db:
image: starburstdata/starburst-enterprise:373-e
ports:
- "8088:8080"
2 changes: 2 additions & 0 deletions assets/scripts/build_gallery.py
@@ -109,6 +109,8 @@ def get_contrib_requirements(filepath: str) -> Dict:
if "library_metadata" in target_ids:
library_metadata = ast.literal_eval(node.value)
requirements = library_metadata.get("requirements", [])
if type(requirements) == str:
requirements = [requirements]
requirements_info[current_class] = requirements
requirements_info["requirements"] += requirements
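
The two added lines in this hunk wrap a bare string in a list before it is accumulated. The sketch below illustrates why that matters, assuming (as the `+=` in the hunk suggests) that the accumulator is a list: extending a list with a string adds one element per character. The helper name `normalize_requirements` and the sample metadata are hypothetical and are not taken from `build_gallery.py`.

```python
import ast
from typing import List, Union


def normalize_requirements(value: Union[str, List[str], None]) -> List[str]:
    """Return requirements as a list, wrapping a bare string in a one-item list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


# Without the wrap, += on a list iterates over the string's characters.
broken: List[str] = []
broken += "scipy"
print(broken)  # ['s', 'c', 'i', 'p', 'y']

# Hypothetical library_metadata literal where "requirements" is a bare string.
library_metadata = ast.literal_eval('{"requirements": "scipy>=1.0"}')
fixed: List[str] = []
fixed += normalize_requirements(library_metadata.get("requirements", []))
print(fixed)  # ['scipy>=1.0']
```

The sketch uses `isinstance(value, str)` where the hunk uses `type(requirements) == str`; the two behave the same for plain strings, and `isinstance` additionally accepts `str` subclasses.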

80 changes: 70 additions & 10 deletions azure-pipelines-dependency-graph-testing.yml
@@ -41,6 +41,10 @@ resources:
MSSQL_PID: Developer
ports:
- 1433:1433
- container: trino
image: trinodb/trino:379
ports:
- 8088:8080

variables:
GE_USAGE_STATISTICS_URL: "https://qa.stats.greatexpectations.io/great_expectations/v1/usage_statistics"
@@ -118,6 +122,14 @@ stages:
- bash: python scripts/check_docstring_coverage.py
name: DocstringChecker

- job: unused_import_checker
steps:
- script: |
pip install flake8
# https://www.flake8rules.com/rules/F401.html - Prunes the dgtest graph to improve accuracy
flake8 --select F401 great_expectations tests
name: UnusedImportChecker
- stage: import_ge
dependsOn: [lint]
pool:
@@ -153,7 +165,7 @@ stages:
displayName: 'Import Great Expectations'
- stage: required
dependsOn: [scope_check, lint, import_ge]
dependsOn: [scope_check, lint, import_ge, custom_checks]
pool:
vmImage: 'ubuntu-18.04'

@@ -318,15 +330,17 @@ stages:
# These are tests that are missed by the dgtest dependency graph.
# In order to ensure coverage, we run them during each CI/CD cycle.
pytest \
tests/test_deprecation.py \
tests/core/test_urn.py \
tests/core/usage_statistics/test_package_dependencies.py \
tests/data_asset/test_data_asset_util.py \
tests/datasource/test_batch_kwargs.py \
tests/datasource/test_sqlalchemy_datasource_workarounds.py \
tests/data_context/test_templates.py \
tests/data_context/test_data_context_store_configs.py \
tests/core/test_urn.py \
tests/data_context/test_templates.py \
tests/dataset/test_pandas_dataset_conditionals.py \
tests/expectations/test_generate_diagnostic_checklist.py
tests/datasource/test_batch_kwargs.py \
tests/datasource/test_sqlalchemy_datasource_workarounds.py \
tests/expectations/test_generate_diagnostic_checklist.py \
tests/test_deprecation.py \
tests/test_packaging.py
displayName: 'dgtest-overrides'
@@ -343,7 +357,7 @@ stages:
reportDirectory: '$(System.DefaultWorkingDirectory)/**/htmlcov'

- stage: usage_stats_integration
dependsOn: [scope_check, lint, import_ge]
dependsOn: [scope_check, lint, import_ge, custom_checks]
pool:
vmImage: 'ubuntu-latest'

@@ -379,7 +393,7 @@ stages:
pool:
vmImage: 'ubuntu-latest'

dependsOn: [scope_check, lint, import_ge]
dependsOn: [scope_check, lint, import_ge, custom_checks]

jobs:
- job: mysql
@@ -471,8 +485,54 @@ stages:
displayName: 'dgtest'
- job: trino
condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GEChanged'], true)

services:
trino: trino

variables:
python.version: '3.8'

steps:
- task: UsePythonVersion@0
inputs:
versionSpec: '$(python.version)'
displayName: 'Use Python $(python.version)'

- bash: python -m pip install --upgrade pip==21.3.1
displayName: 'Update pip'

- script: |
printf 'Waiting for Trino database to accept connections'
sleep 30
# until trino --execute "SHOW CATALOGS"; do
# printf '.'
# sleep 1;
# done;
displayName: Wait for database to initialise

- script: |
pip install --requirement requirements-dev-test.txt --requirement requirements-dev-sqlalchemy.txt --constraint constraints-dev.txt
pip install --requirement requirements.txt
pip install .
displayName: 'Install dependencies'
- script: |
# Install dependencies
pip install --requirement requirements.txt
pip install pytest pytest-cov pytest-azurepipelines
git clone https://github.com/superconductive/dgtest.git
pip install -e dgtest
# Run dgtest
dgtest run great_expectations --ignore 'tests/cli' --ignore 'tests/integration/usage_statistics' \
--trino --napoleon-docstrings --junitxml=junit/test-results.xml --cov=. --cov-report=xml --cov-report=html
displayName: 'dgtest'
- stage: cli_integration
dependsOn: [scope_check, lint, import_ge]
dependsOn: [scope_check, lint, import_ge, custom_checks]
pool:
vmImage: 'ubuntu-latest'

7 changes: 6 additions & 1 deletion azure-pipelines-docs-integration.yml
@@ -28,6 +28,10 @@ resources:
MSSQL_PID: Developer
ports:
- 1433:1433
- container: trino
image: trinodb/trino:379
ports:
- 8088:8080

variables:
isMain: $[eq(variables['Build.SourceBranch'], 'refs/heads/main')]
@@ -85,6 +89,7 @@ stages:
postgres: postgres
mysql: mysql
mssql: mssql
trino: trino

steps:
- task: UsePythonVersion@0
@@ -137,7 +142,7 @@ stages:
- script: |
pip install pytest pytest-azurepipelines
pytest -v --docs-tests -m integration --mysql --bigquery --mssql --spark --postgresql --aws tests/integration/test_script_runner.py
pytest -v --docs-tests -m integration --mysql --bigquery --mssql --spark --postgresql --trino --aws tests/integration/test_script_runner.py
displayName: 'pytest'
env:
# snowflake credentials
44 changes: 44 additions & 0 deletions azure-pipelines-os-integration.yml
@@ -35,6 +35,10 @@ resources:
MSSQL_PID: Developer
ports:
- 1433:1433
- container: trino
image: trinodb/trino:379
ports:
- 8088:8080

variables:
GE_USAGE_STATISTICS_URL: "https://qa.stats.greatexpectations.io/great_expectations/v1/usage_statistics"
@@ -236,3 +240,43 @@ stages:
pip install pytest pytest-cov pytest-azurepipelines
pytest --postgresql --napoleon-docstrings --junitxml=junit/test-results.xml --cov=. --cov-report=xml --cov-report=html --ignore=tests/cli --ignore=tests/integration/usage_statistics
displayName: 'pytest'
- job: trino
condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GEChanged'], true)

services:
trino: trino

variables:
python.version: '3.8'

steps:
- task: UsePythonVersion@0
inputs:
versionSpec: '$(python.version)'
displayName: 'Use Python $(python.version)'

- bash: python -m pip install --upgrade pip==20.2.4
displayName: 'Update pip'

- script: |
printf 'Waiting for Trino database to accept connections'
sleep 30
# until trino --execute "SHOW CATALOGS"; do
# printf '.'
# sleep 1;
# done;
displayName: Wait for database to initialise

- script: |
pip install --requirement requirements-dev-test.txt --requirement requirements-dev-sqlalchemy.txt
# Install latest sqlalchemy version
pip install --upgrade SQLAlchemy
pip install --requirement requirements.txt
pip install .
displayName: 'Install dependencies'
- script: |
pip install pytest pytest-cov pytest-azurepipelines
pytest --trino --napoleon-docstrings --junitxml=junit/test-results.xml --cov=. --cov-report=xml --cov-report=html --ignore=tests/cli --ignore=tests/integration/usage_statistics
displayName: 'pytest'
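
The "Wait for database to initialise" step in the `trino` jobs sleeps for a fixed 30 seconds and leaves its polling loop commented out. As a rough alternative, the sketch below polls the TCP port until it accepts a connection or a deadline passes. It is an illustration under stated assumptions, not the pipeline's implementation: the function name `wait_for_port` is hypothetical, and it assumes the container is reachable on `localhost` at the mapped host port 8088 shown in the `ports` entries. An open port only shows that something is listening; it does not prove that Trino is ready to serve queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a listener is up on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly, then retry.
            time.sleep(interval)
    return False


# Example call (assumed host and port, taken from the 8088:8080 mapping):
# ready = wait_for_port("localhost", 8088, timeout=120.0)
```

Compared with a fixed sleep, polling returns as soon as the listener is up and fails explicitly when the deadline passes, so a slow container start surfaces as a clear timeout and not as a later test error.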
