[MAINTENANCE] Sync main and develop branches #5060

Merged
cdkini merged 79 commits into main from maintenance/sync-main-and-develop on May 6, 2022

Member

@cdkini cdkini commented May 6, 2022

  • Sync main and develop after branches diverged as part of the v0.15.4 release.

[Image: Screen Shot 2022-05-06 at 9 51 00 AM]

alexsherstinsky and others added 30 commits April 21, 2022 10:51
…ailure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline
* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list
…ectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters
* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
…ersion < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
…n objects into flexible ExpectationSuite containers (#4943)
Updating some language in Slack Guidelines
…ortionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review
… `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review
#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment
pin cryptography package (#4963)
* [FEATURE] BigQuery Temp Table Support (#4925)
…P 484) (#4969)

* feat: run script to type annotate

* chore: update threshold
alexsherstinsky and others added 21 commits May 3, 2022 20:36
* chore: first pass

* chore: more updates

* chore: more annotations

* chore: more annotations
* [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010)
…lotResult` wrapper dataclass (#4962)

* feat: init commit

* test: write all tests (except 1)

* test: write last test

* test: remove theme test

* refactor: add custom Chart dataclass

* chore: make Chart immutable

* chore: add docstring

* feat: misc updates per team convo

* chore: delete comments

* refactor: delete vconcat methods and consolidate helpers

* refactor: further consolidation through helpers

* chore: update subtitle styling

* chore: add padding
…ive plot (#5002)

* Clone vertically concatenated chart for interactive starting point

* Working interactive chart

* Update ColorPalette names

* Update ordinal palette

* Tooltip now working

* Change legend color

* Add y-axis titles

* Align y-axis vertically for both charts

* Add highlight line

* Change batch_id to Batch ID

* Improve legend title and tooltip titles

* New layer for starting point showing one line

* Detail title updating appropriately

* Column name shows up with empty selection

* Use variable for alt.value(light_gray)

* Allow selection by mouseover on lines

* Anomaly encoded lines

* Move column selector to top left

* Format input_dropdown name

* Working with expectation kwargs

* Add predicate logic for strict_min and strict_max

* Add subtitle to prescriptive return charts

* WIP

* Overcame merge conflicts in descriptive mode

* Overcame merge conflicts in prescriptive mode

* Column charts are in their own list index
…ues chart in VolumeDataAssistantResult (#5017)

* Correct type hints

* Improve tooltips

* Improve docstrings

* Fix return object indexing

* Return list length 1 instead of chart
* clean up SQL statements for handling subqueries properly

* use formal sqlalchemy for reflection

* TRINO WIP POC-COMPLETED

* Add trino package as a dependency and update imports

* Add docker-compose.yml for starburst database in assets/docker/

* Update _create_trino_engine to accept a hostname and schema_name

* Update get_dataset to have a block for trino

* Add ability to use data_alt in test_definitions JSON files (for Trino quirks)

* Minor update to get_test_validator_with_data to make debugging easier

* Add trino to sqla_keys dict in setup.py

* Update 3 test_definition json files with trino things

* Add table_selectable workaround for trino

* Add requirements-dev-trino to test_packaging.py

* Add trino to various azure-pipelines yml files

* Skip test_expectation__get_renderers

* Skip test__get_test_results

* Skip test__generate_expectation_tests__with_no_test_backends

Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>
…f `develop` and `main` (#5042)

* [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531)

* [BUGFIX] Moves testing dependencies out of core reqs (#4522)

* removing upper bound on mistune

* remove deprecated dependencies

* adds untracked dependency

* adds untracked dependency

* adds untracked dependency

* moving dependencies

* removes dependencies added to lite from core | adds missing dependencies

Co-authored-by: Chetan Kini <chetan@superconductive.com>

* [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547)

* [MAINTENANCE] Don't return from validate configuration methods (#4545)

* Add validate_configuration to 2 core Expectations that are passing all their tests

* Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests

* Update all validate_configuration methods to have type hints and return None

* Update all doc snippet references that were affected

* [DOCS] technical term tags connect to data cloud docs (#4414)

* - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.)
- Some additional editing was done to bring documents in line with the documentation and how-to guide standards.

* - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop.

* - Update to include technical term tags. (#4462)

- Minor updates to correct formatting and spelling issues.

* - Moved docs related to contributing integrations under contributing in the ToC (#4551)

- Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term).

* - Adds new image files for the intro page (#4540)

- Updates the image file link for the overview image on the intro page

* [DOCS] clarifications on execution engines and scalability (#4539)

* - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines.

* - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas.

* - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines.

* [DOCS] technical terms for validate data advanced (#4535)

* - add support for technical term tags.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* [DOCS] technical terms for validate data actions docs (#4518)

* - Edits to bring docs up to documentation and how-to guide standards.

* - add technical term tags to documents.
- minor formatting edits (technical terms missing capitalization, etc).

* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553)

* [MAINTENANCE] Refactor global `conftest` (#4534)

* chore: use black directives to temporarily disable linting

* chore: more black directives to temporarily disable linting

* chore: finish remaining

* refactor: start cleaning up conftest

* refactor: more refactoring of conftest

* refactor: even more refactoring of conftest

* [FEATURE] Improve diagnostic checklist details (#4548)

* Update library_metadata check to provide details when it doesn't pass

* In linting check, if snake_case doesn't match filename, show computed snake_case

* Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr

* Update convert_to_json_serializable to handle bytes

* Update build_gallery.py script to convert diagnostics to JSON in separate try/except

* Update build_gallery.py script to write expectation_library_v2.json file with indenting

* Update _check_input_validation to tell if custom assert statements are used in validate_configuration

* clean up (#4554)

* minor touch up (#4558)

* [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485)

* feat: init commit

* refactor: shift all logic over to base class

* feat: start impl of anonymize on Anonymizer

* feat: get ProfilerRunAnonymizer working

* refactor: remove constructor from ProfilerRunAnonymizer

* refactor: start on CheckpointRunAnonymizer

* fix: clean up broken checkpoint tests

* fix: ensure *args and **kwargs are propagated through

* refactor: start work on datasource anonymizers

* refactor: remove all anonymizers except Anonymizer from usage stats attrs

* fix: update isinstance checks

* refactor: move helper into checkpoint_run_anonymizer

* refactor: move helper into datasource_anonymizer

* refactor: make anonymize string private and place in strategy

* refactor: make anonymize batch info private and place in strategy

* refactor: move build_init_payload to Anonymizer

* refactor: make remainder of anonymize methods private

* refactor: add store info to strategy

* refactor: add dataconnector info to strategy

* refactor: consolidate profiler info and profiler run anonymization

* refactor: remove *args from signatures

* refactor: updates around checkpoint anonymization

* chore: misc cleanup of Anonymizer

* feat: final touch up before review

* chore: remove 'else' statements

* fix: ensure appropriate checkpoint method gets called

* chore: misc updates from review

* refactor: move init_payload back to usage stats

* chore: misc type hinting

* refactor: start using individual classes again

* chore: continue updating individual anonymizer classes

* feat: further updates to child classes

* feat: update anonymize_init_payload

* fix: get checkpoint payloads working

* refactor: ensure all methods have obj

* fix: misc fixes

* fix: make misc updates to conditional checks for obj

* refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer

* refactor: rename Checkpoint and Profiler anonymizers

* feat: leverage aggregate anonymizer downstream

* feature: conditionally create aggregate_anonymizer in constructor

* feat: add cache retrieve or instantiate util

* chore: add batch_request can_handle

* feat: ensure that salt has a default value in anonymizers

* refactor: require aggregate anonymizer in constructor

* refactor: instantiate all strategies in aggregate

* fix: fix broken tests

* refactor: rename internal getter

Co-authored-by: Don Heppner <donald.heppner@gmail.com>

* [MAINTENANCE] Remove duplicate mistune dependency

* [MAINTENANCE] Run PEP-273 checks on a schedule or release cut

* [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573)

* -Corrected the line references and added <snippet> tags to source code for Spark version of guide.

* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide.

* -lint reformat w/black

* -correcting line numbers after lint formatting.

* [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546)

Usage stats instrumentation of package dependencies

* [MAINTENANCE] Add DevRel team to GitHub auto-label action

* [MAINTENANCE] Add GitHub action to conditionally auto-update PR's  (#4574)

* feat: add new action

* chore: add conditions

* [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577)

* chore: bump version

* chore: test change

* chore: update all instances of black

* chore: new test changes

* chore: revert test changes

* Update overview.md (#4556)

* Add missing links.
* Fix some typos
* Simplify flow and grammar in a few places

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* - corrected broken link in admonition box. (#4585)

- updated links in admonition box to point to current technical documentation rather than old core concepts documents.

* [MAINTENANCE] Minor clean-up (#4571)

Little bit of cleanup in our execution engine and validator

* [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590)

* fix : misconfigured ExpectationConfigurationBuilder

* pushing fix

* clean up before submitting for review

* bugfix : remove sorting

* remove extra line

* [MAINTENANCE] Instrument package dependencies (#4583)

* Add dependencies to data_context.__init__ event

* [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599)

* release candidate for 0.14.13

* revert to 0.14.12 state

* [RELEASE] 0.15.3 (#4981)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant  (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints
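
The commits above describe how out-of-range and near-bound `false_positive_rate` values are handled. A minimal sketch of one possible reading follows. It assumes `NP_EPSILON` is machine epsilon, uses a hypothetical helper name `clamp_false_positive_rate`, and defines a stand-in `ProfilerExecutionError`; none of this is taken from the library's actual code.

```python
import sys
import warnings

# Assumption: NP_EPSILON in the commits refers to machine epsilon for floats.
NP_EPSILON = sys.float_info.epsilon


class ProfilerExecutionError(Exception):
    """Stand-in for the custom error named in the commits (hypothetical definition)."""


def clamp_false_positive_rate(false_positive_rate: float) -> float:
    """Reject values outside [0, 1]; nudge values at or near the bounds inward."""
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ProfilerExecutionError(
            f"false_positive_rate must be in [0, 1]; got {false_positive_rate}"
        )
    if false_positive_rate <= NP_EPSILON:
        # Too close to 0: warn and use the smallest usable value instead.
        warnings.warn("false_positive_rate too close to 0; using NP_EPSILON")
        return NP_EPSILON
    if false_positive_rate >= 1.0 - NP_EPSILON:
        # Too close to 1: warn and use 1 - NP_EPSILON as the upper bound.
        warnings.warn("false_positive_rate too close to 1; using 1 - NP_EPSILON")
        return 1.0 - NP_EPSILON
    return false_positive_rate
```

Clamping instead of failing at exactly 0 or 1 keeps downstream quantile calculations defined, while values clearly outside the range still surface as errors.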

* [MAINTENANCE] fix a typo  (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details . (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>

* chore: revert azure pipeline

* chore: revert more files

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
…tions into maintenance/sync-main-and-develop

netlify bot commented May 6, 2022

Deploy Preview for niobium-lead-7998 ready!

Name Link
🔨 Latest commit ea5fb5e
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/62752761154ead00084a14c0
😎 Deploy Preview https://deploy-preview-5060--niobium-lead-7998.netlify.app

@cdkini cdkini changed the base branch from develop to main May 6, 2022 13:49
@cdkini cdkini self-assigned this May 6, 2022
Contributor

@alexsherstinsky alexsherstinsky left a comment

LGTM

@cdkini cdkini merged commit 9ca11fa into main May 6, 2022
@cdkini cdkini deleted the maintenance/sync-main-and-develop branch May 6, 2022 15:39
kenwade4 added a commit that referenced this pull request May 11, 2022
* [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531)

* [BUGFIX] Moves testing dependencies out of core reqs (#4522)

* removing upper bound on mistune

* remove deprecated dependencies

* adds untracked dependency

* adds untracked dependency

* adds untracked dependency

* moving dependencies

* removes dependencies added to lite from core | adds missing dependencies

Co-authored-by: Chetan Kini <chetan@superconductive.com>

* [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547)

* [MAINTENANCE] Don't return from validate configuration methods (#4545)

* Add validate_configuration to 2 core Expectations that are passing all their tests

* Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests

* Update all validate_configuration methods to have type hints and return None

* Update all doc snippet references that were affected

* [DOCS] technical term tags connect to data cloud docs (#4414)

* - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.)
- Some additional editing was done to bring documents in line with the documentation and how-to guide standards.

* - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop.

* - Update to include technical term tags. (#4462)

- Minor updates to correct formatting and spelling issues.

* - Moved docs related to contributing integrations under contributing in the ToC (#4551)

- Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term).

* - Adds new image files for the intro page (#4540)

- Updates the image file link for the overview image on the intro page

* [DOCS] clarifications on execution engines and scalability (#4539)

* - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines.

* - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas.

* - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines.

* [DOCS] technical terms for validate data advanced (#4535)

* - add support for technical term tags.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* [DOCS] technical terms for validate data actions docs (#4518)

* - Edits to bring docs up to documentation and how-to guide standards.

* - add technical term tags to documents.
- minor formatting edits (technical terms missing capitalization, etc).

* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553)

* [MAINTENANCE] Refactor global `conftest` (#4534)

* chore: use black directives to temporarily disable linting

* chore: more black directives to temporarily disable linting

* chore: finish remaining

* refactor: start cleaning up conftest

* refactor: more refactoring of conftest

* refactor: even more refactoring of conftest

* [FEATURE] Improve diagnostic checklist details (#4548)

* Update library_metadata check to provide details when it doesn't pass

* In linting check, if snake_case doesn't match filename, show computed snake_case

* Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr

* Update convert_to_json_serializable to handle bytes

* Update build_gallery.py script to convert diagnostics to JSON in separate try/except

* Update build_gallery.py script to write expectation_library_v2.json file with indenting

* Update _check_input_validation to tell if custom assert statements are used in validate_configuration

* clean up (#4554)

* minor touch up (#4558)

* [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485)

* feat: init commit

* refactor: shift all logic over to base class

* feat: start impl of anonymize on Anonymizer

* feat: get ProfilerRunAnonymizer working

* refactor: remove constructor from ProfilerRunAnonymizer

* refactor: start on CheckpointRunAnonymizer

* fix: clean up broken checkpoint tests

* fix: ensure *args and **kwargs are propagated through

* refactor: start work on datasource anonymizers

* refactor: remove all anonymizers except Anonymizer from usage stats attrs

* fix: update isinstance checks

* refactor: move helper into checkpoint_run_anonymizer

* refactor: move helper into datasource_anonymizer

* refactor: make anonymize string private and place in strategy

* refactor: make anonymize batch info private and place in strategy

* refactor: move build_init_payload to Anonymizer

* refactor: make remainder of anonymize methods private

* refactor: add store info to strategy

* refactor: add dataconnector info to strategy

* refactor: consolidate profiler info and profiler run anonymization

* refactor: remove *args from signatures

* refactor: updates around checkpoint anonymization

* chore: misc cleanup of Anonymizer

* feat: final touch up before review

* chore: remove 'else' statements

* fix: ensure appropriate checkpoint method gets called

* chore: misc updates from review

* refactor: move init_payload back to usage stats

* chore: misc type hinting

* refactor: start using individual classes again

* chore: continue updating individual anonymizer classes

* feat: further updates to child classes

* feat: update anonymize_init_payload

* fix: get checkpoint payloads working

* refactor: ensure all methods have obj

* fix: misc fixes

* fix: make misc updates to conditional checks for obj

* refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer

* refactor: rename Checkpoint and Profiler anonymizers

* feat: leverage aggregate anonymizer downstream

* feature: conditionally create aggregate_anonymizer in constructor

* feat: add cache retrieve or instantiate util

* chore: add batch_request can_handle

* feat: ensure that salt has a default value in anonymizers

* refactor: require aggregate anonymizer in constructor

* refactor: instantiate all strategies in aggregate

* fix: fix broken tests

* refactor: rename internal getter

Co-authored-by: Don Heppner <donald.heppner@gmail.com>

* [MAINTENANCE] Remove duplicate mistune dependency

* [MAINTENANCE] Run PEP-273 checks on a schedule or release cut

* [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573)

* -Corrected the line references and added <snippet> tags to source code for Spark version of guide.

* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide.

* -lint reformat w/black

* -correcting line numbers after lint formatting.

* [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546)

Usage stats instrumentation of package dependencies

* [MAINTENANCE] Add DevRel team to GitHub auto-label action

* [MAINTENANCE] Add GitHub action to conditionally auto-update PRs (#4574)

* feat: add new action

* chore: add conditions

* [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577)

* chore: bump version

* chore: test change

* chore: update all instances of black

* chore: new test changes

* chore: revert test changes

* Update overview.md (#4556)

* Add missing links.
* Fix some typos
* Simplify flow and grammar in a few places

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* - corrected broken link in admonition box. (#4585)

- updated links in admonition box to point to current technical documentation rather than old core concepts documents.

* [MAINTENANCE] Minor clean-up (#4571)

Little bit of cleanup in our execution engine and validator

* [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590)

* fix : misconfigured ExpectationConfigurationBuilder

* pushing fix

* clean up before submitting for review

* bugfix : remove sorting

* remove extra line

* [MAINTENANCE] Instrument package dependencies (#4583)

* Add dependencies to data_context.__init__ event

* [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599)

* release candidate for 0.14.13

* revert to 0.14.12 state

* [RELEASE] 0.15.3 (#4981)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme an enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>

* [RELEASE] 0.15.4 (#5051)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme an enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep (#4980)

* [FEATURE] Splitting data assets into batches using timestamp columns in spark (#4973)

* [BUGFIX] Use `monkeypatch` to ensure consistent bootstrap seed for additional probabilistic test (#4983)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* [FEATURE] Splitting data assets into batches using datetime columns in pandas (#4982)

* [BUGFIX] Patch the remainder of probabilistic `RuleBasedProfiler` tests with consistent bootstrap seed (#4989)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* fix: patch remaining tests

* chore: add docstring

* [FEATURE] Provide Semantic Type Domain Interpretation Utility For Use Within ParameterBuilder Classes (#4993)

* [MAINTENANCE] Splitter cleanup and enhancements (#4984)

* Update action.md (#4967)

* [FEATURE] Add support for Interpolation Method for Quantile Statistic Used by Estimators in NumericMetricRangeMultiBatchParameterBuilder (#4997)

* [FEATURE] Enable self-initializing `ExpectColumnMeanToBeBetween` (#4986)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* [FEATURE] Enable self-initializing `ExpectColumnMedianToBeBetween` (#4987)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* chore: update GH action (#5001)

* [FEATURE] Enable self-initializing `ExpectColumnSumToBeBetween` (#4988)

* feat: init commit

* test: write integration test

* feat: add interpolation field

* [MAINTENANCE] Move `DataAssistant` registry capabilities into `DataAssistantRegistry` to enable user aliasing (#4991)

* refactor: move registry dict to dispatcher

* chore: misc cleanup

* chore: misc updates after review

* chore: misc cleanup

* chore: update error message

* Fix continuous partition example (#4939)

When calling json.dumps() method, the weights change.

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] RBP Profiling Dataset ProgressBar Fix (#4999)

* [MAINTENANCE] Preliminary refactors for data samplers. (#4996)

* MetricSingleBatchParameterBuilder with unit and integration tests. (#5003)

* [DOCS] Update slack notification guide to not use validation operators. (#4978)

* - Removed references to validation_operator
- Edited config yaml examples
- Grouped config options into tab groups for webhook vs app.

* - Added technical term tag to Checkpoint reference.
- Minor edit to document format.

* corrects number in step header

* [MAINTENANCE] Clean up unused imports and enforce through `flake8` in CI/CD (#5005)

* maintenance: clean up codebase

* chore: add to pipelines

* fix: ensure that flake8 is installed first

* chore: rename CI/CD stage

* chore: update type hint threshold

* parameter builder tests should utilize polymorphism (#5007)

* [MAINTENANCE] Clean up type hints in CLI (#5006)

* chore: first pass

* chore: more updates

* chore: more annotations

* chore: more annotations

* [FEATURE] Enable Pandas DataFrame and Series as MetricValues Output of Metric ParameterBuilder Classes (#5008)

* logging and exception handling (#5009)

* [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010)

* [FEATURE] Histogram/Partition Single-Batch ParameterBuilder (#5011)

* [FEATURE] Update `DataAssistantResult.plot()` return value to emit `PlotResult` wrapper dataclass (#4962)

* feat: init commit

* test: write all tests (except 1)

* test: write last test

* test: remove theme test

* refactor: add custom Chart dataclass

* chore: make Chart immutable

* chore: add docstring

* feat: misc updates per team convo

* chore: delete comments

* refactor: delete vconcat methods and consolidate helpers

* refactor: further consolidation through helpers

* chore: update subtitle styling

* chore: add padding

* [ENHANCEMENT] Condense column-level `vconcat` plots into one interactive plot (#5002)

* Clone vertically concatenated chart for interactive starting point

* Working interactive chart

* Update ColorPalette names

* Update ordinal palette

* Tooltip now working

* Change legend color

* Add y-axis titles

* Align y-axis vertically for both charts

* Add highlight line

* Change batch_id to Batch ID

* Improve legend title and tooltip titles

* New layer for starting point showing one line

* Detail title updating appropriately

* Column name shows up with empty selection

* Use variable for alt.value(light_gray)

* Allow selection by mouseover on lines

* Anomaly encoded lines

* Move column selector to top left

* Format input_dropdown name

* Working with expectation kwargs

* Add predicate logic for strict_min and strict_max

* Add subtitle to prescriptive return charts

* WIP

* Overcame merge conflicts in descriptive mode

* Overcame merge conflicts in prescriptive mode

* Column charts are in their own list index

* [MAINTENANCE] Update version of black in pre-commit config

* [MAINTENANCE] Improve tooltips and formatting for distinct column values chart in VolumeDataAssistantResult (#5017)

* Correct type hints

* Improve tooltips

* Improve docstrings

* Fix return object indexing

* Return list length 1 instead of chart

* [FEATURE] Limit samplers work with supported sqlalchemy backends (#5014)

* [BUGFIX] Fix DataAssistantResult serialization issue (#5020)

* [MAINTENANCE] Enhance configuring serialization for DotDict type classes (#5023)

* [FEATURE] trino support (#5021)

* clean up SQL statements for handling subqueries properly

* use formal sqlalchemy for reflection

* TRINO WIP POC-COMPLETED

* Add trino package as a dependency and update imports

* Add docker-compose.yml for starburst database in assets/docker/

* Update _create_trino_engine to accept a hostname and schema_name

* Update get_dataset to have a block for trino

* Add ability to use data_alt in test_definitions JSON files (for Trino quirks)

* Minor update to get_test_validator_with_data to make debugging easier

* Add trino to sqla_keys dict in setup.py

* Update 3 test_definition json files with trino things

* Add table_selectable workaround for trino

* Add requirements-dev-trino to test_packaging.py

* Add trino to various azure-pipelines yml files

* Skip test_expectation__get_renderers

* Skip test__get_test_results

* Skip test__generate_expectation_tests__with_no_test_backends

Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>

* Pyarrow upper bound (#5028)

* release prep v0.15.4 (#5029)

* [MAINTENANCE] Use temporary branch to attempt to align git histories of `develop` and `main` (#5042)

* [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531)

* [BUGFIX] Moves testing dependencies out of core reqs (#4522)

* removing upper bound on mistune

* remove deprecated dependencies

* adds untracked dependency

* adds untracked dependency

* adds untracked dependency

* moving dependencies

* removes dependencies added to lite from core | adds missing dependencies

Co-authored-by: Chetan Kini <chetan@superconductive.com>

* [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547)

* [MAINTENANCE] Don't return from validate configuration methods (#4545)

* Add validate_configuration to 2 core Expectations that are passing all their tests

* Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests

* Update all validate_configuration methods to have type hints and return None

* Update all doc snippet references that were affected

* [DOCS] technical term tags connect to data cloud docs (#4414)

* - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.)
- Some additional editing was done to bring documents in line with the documentation and how-to guide standards.

* - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop.

* - Update to include technical term tags. (#4462)

- Minor updates to correct formatting and spelling issues.

* - Moved docs related to contributing integrations under contributing in the ToC (#4551)

- Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term).

* - Adds new image files for the intro page (#4540)

- Updates the image file link for the overview image on the intro page

* [DOCS] clarifications on execution engines and scalability (#4539)

* - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines.

* - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas.

* - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines.

* [DOCS] technical terms for validate data advanced (#4535)

* - add support for technical term tags.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* [DOCS] technical terms for validate data actions docs (#4518)

* - Edits to bring docs up to documentation and how-to guide standards.

* - add technical term tags to documents.
- minor formatting edits (technical terms missing capitalization, etc).

* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553)

* [MAINTENANCE] Refactor global `conftest` (#4534)

* chore: use black directives to temporarily disable linting

* chore: more black directives to temporarily disable linting

* chore: finish remaining

* refactor: start cleaning up conftest

* refactor: more refactoring of conftest

* refactor: even more refactoring of conftest

* [FEATURE] Improve diagnostic checklist details (#4548)

* Update library_metadata check to provide details when it doesn't pass

* In linting check, if snake_case doesn't match filename, show computed snake_case

* Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr

* Update convert_to_json_serializable to handle bytes

* Update build_gallery.py script to convert diagnostics to JSON in separate try/except

* Update build_gallery.py script to write expectation_library_v2.json file with indenting

* Update _check_input_validation to tell if custom assert statements are used in validate_configuration

* clean up (#4554)

* minor touch up (#4558)

* [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485)

* feat: init commit

* refactor: shift all logic over to base class

* feat: start impl of anonymize on Anonymizer

* feat: get ProfilerRunAnonymizer working

* refactor: remove constructor from ProfilerRunAnonymizer

* refactor: start on CheckpointRunAnonymizer

* fix: clean up broken checkpoint tests

* fix: ensure *args and **kwargs are propagated through

* refactor: start work on datasource anonymizers

* refactor: remove all anonymizers except Anonymizer from usage stats attrs

* fix: update isinstance checks

* refactor: move helper into checkpoint_run_anonymizer

* refactor: move helper into datasource_anonymizer

* refactor: make anonymize string private and place in strategy

* refactor: make anonymize batch info private and place in strategy

* refactor: move build_init_payload to Anonymizer

* refactor: make remainder of anonymize methods private

* refactor: add store info to strategy

* refactor: add dataconnector info to strategy

* refactor: consolidate profiler info and profiler run anonymization

* refactor: remove *args from signatures

* refactor: updates around checkpoint anonymization

* chore: misc cleanup of Anonymizer

* feat: final touch up before review

* chore: remove 'else' statements

* fix: ensure appropriate checkpoint method gets called

* chore: misc updates from review

* refactor: move init_payload back to usage stats

* chore: misc type hinting

* refactor: start using individual classes again

* chore: continue updating individual anonymizer classes

* feat: further updates to child classes

* feat: update anonymize_init_payload

* fix: get checkpoint payloads working

* refactor: ensure all methods have obj

* fix: misc fixes

* fix: make misc updates to conditional checks for obj

* refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer

* refactor: rename Checkpoint and Profiler anonymizers

* feat: leverage aggregate anonymizer downstream

* feature: conditionally create aggregate_anonymizer in constructor

* feat: add cache retrieve or instantiate util

* chore: add batch_request can_handle

* feat: ensure that salt has a default value in anonymizers

* refactor: require aggregate anonymizer in constructor

* refactor: instantiate all strategies in aggregate

* fix: fix broken tests

* refactor: rename internal getter

Co-authored-by: Don Heppner <donald.heppner@gmail.com>

* [MAINTENANCE] Remove duplicate mistune dependency

* [MAINTENANCE] Run PEP-273 checks on a schedule or release cut

* [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573)

* -Corrected the line references and added <snippet> tags to source code for Spark version of guide.

* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide.

* -lint reformat w/black

* -correcting line numbers after lint formatting.

* [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546)

Usage stats instrumentation of package dependencies

* [MAINTENANCE] Add DevRel team to GitHub auto-label action

* [MAINTENANCE] Add GitHub action to conditionally auto-update PRs (#4574)

* feat: add new action

* chore: add conditions

* [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577)

* chore: bump version

* chore: test change

* chore: update all instances of black

* chore: new test changes

* chore: revert test changes

* Update overview.md (#4556)

* Add missing links.
* Fix some typos
* Simplify flow and grammar in a few places

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* - corrected broken link in admonition box. (#4585)

- updated links in admonition box to point to current technical documentation rather than old core concepts documents.

* [MAINTENANCE] Minor clean-up (#4571)

Little bit of cleanup in our execution engine and validator

* [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590)

* fix : misconfigured ExpectationConfigurationBuilder

* pushing fix

* clean up before submitting for review

* bugfix : remove sorting

* remove extra line

* [MAINTENANCE] Instrument package dependencies (#4583)

* Add dependencies to data_context.__init__ event

* [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599)

* release candidate for 0.14.13

* revert to 0.14.12 state

* [RELEASE] 0.15.3 (#4981)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details. (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>

* chore: revert azure pipeline

* chore: revert more files

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>

* [BUGFIX] Patch broken usage stats test around dependency tracking

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: andyjessen <62343929+andyjessen@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>

* [MAINTENANCE] Sync `main` and `develop` branches (#5060)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_example's arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details. (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep (#4980)

* [FEATURE] Splitting data assets into batches using timestamp columns in spark (#4973)

* [BUGFIX] Use `monkeypatch` to ensure consistent bootstrap seed for additional probabilistic test (#4983)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* [FEATURE] Splitting data assets into batches using datetime columns in pandas (#4982)

* [BUGFIX] Patch the remainder of probabilistic `RuleBasedProfiler` tests with consistent bootstrap seed (#4989)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* fix: patch remaining tests

* chore: add docstring

* [FEATURE] Provide Semantic Type Domain Interpretation Utility For Use Within ParameterBuilder Classes (#4993)

* [MAINTENANCE] Splitter cleanup and enhancements (#4984)

* Update action.md (#4967)

* [FEATURE] Add support for Interpolation Method for Quantile Statistic Used by Estimators in NumericMetricRangeMultiBatchParameterBuilder (#4997)

* [FEATURE] Enable self-initializing `ExpectColumnMeanToBeBetween` (#4986)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* [FEATURE] Enable self-initializing `ExpectColumnMedianToBeBetween` (#4987)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* chore: update GH action (#5001)

* [FEATURE] Enable self-initializing `ExpectColumnSumToBeBetween` (#4988)

* feat: init commit

* test: write integration test

* feat: add interpolation field

* [MAINTENANCE] Move `DataAssistant` registry capabilities into `DataAssistantRegistry` to enable user aliasing (#4991)

* refactor: move registry dict to dispatcher

* chore: misc cleanup

* chore: misc updates after review

* chore: misc cleanup

* chore: update error message

* Fix continuous partition example (#4939)

When calling json.dumps() method, the weights change.

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] RB…
@cdkini cdkini mentioned this pull request May 12, 2022