[MAINTENANCE] Align main with develop (#5062)
* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_examples' arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys
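
The comment-stripping step described in the item above can be sketched as follows. This is a minimal illustration, not the project's actual `get_extras_require` code: the helper name `parse_requirements` and the sample requirement lines are hypothetical, and the sketch assumes that everything after a `#` on a line is a comment.

```python
from typing import List


def parse_requirements(text: str) -> List[str]:
    """Hypothetical helper: return requirement specifiers without comments or blank lines."""
    requirements = []
    for raw_line in text.splitlines():
        # Drop everything after the first '#', then trim surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if line:  # skip blank lines and comment-only lines
            requirements.append(line)
    return requirements


# Sample input; the package pins below are made up for illustration.
sample = "sqlalchemy>=1.3.18  # core backend\n\n# comment only\ntrino>=0.310.0\n"
print(parse_requirements(sample))
```

One caveat of this simplification: a requirement that legitimately contains `#` (for example a URL fragment) would be truncated, so a real implementation would need a stricter rule for where a comment starts.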

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints
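
One possible reading of the bounds handling listed above, as a minimal sketch. The function name `clamp_false_positive_rate` is hypothetical, `ProfilerExecutionError` is redefined locally as a stand-in, and `NP_EPSILON` is assumed here to be the float machine epsilon; the actual implementation may differ on which cases warn and which raise.

```python
import sys
import warnings

# Assumption: NP_EPSILON is the machine epsilon for Python floats.
NP_EPSILON = sys.float_info.epsilon


class ProfilerExecutionError(Exception):
    """Local stand-in for the custom error type named in the list above."""


def clamp_false_positive_rate(false_positive_rate: float) -> float:
    """Hypothetical helper: reject values outside [0, 1] and nudge values at the bounds."""
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ProfilerExecutionError(
            f"false_positive_rate must be in [0, 1]; got {false_positive_rate}"
        )
    if false_positive_rate <= NP_EPSILON:
        # Too close to the lower bound: warn and fall back to NP_EPSILON.
        warnings.warn("false_positive_rate is too close to 0; using NP_EPSILON")
        return NP_EPSILON
    if false_positive_rate >= 1.0 - NP_EPSILON:
        # Too close to the upper bound: warn and fall back to 1 - NP_EPSILON.
        warnings.warn("false_positive_rate is too close to 1; using 1 - NP_EPSILON")
        return 1.0 - NP_EPSILON
    return false_positive_rate
```

Keeping the result strictly inside the open interval (0, 1) avoids degenerate quantiles when the rate is later used to derive estimation bounds.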

* [MAINTENANCE] fix a typo (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep (#4980)

* [FEATURE] Splitting data assets into batches using timestamp columns in spark (#4973)

* [BUGFIX] Use `monkeypatch` to ensure consistent bootstrap seed for additional probabilistic test (#4983)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* [FEATURE] Splitting data assets into batches using datetime columns in pandas (#4982)

* [BUGFIX] Patch the remainder of probabilistic `RuleBasedProfiler` tests with consistent bootstrap seed (#4989)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* fix: use monkeypatch on test

* fix: patch remaining tests

* chore: add docstring

* [FEATURE] Provide Semantic Type Domain Interpretation Utility For Use Within ParameterBuilder Classes (#4993)

* [MAINTENANCE] Splitter cleanup and enhancements (#4984)

* Update action.md (#4967)

Update action.md (#4967)

* [FEATURE] Add support for Interpolation Method for Quantile Statistic Used by Estimators in NumericMetricRangeMultiBatchParameterBuilder (#4997)

* [FEATURE] Enable self-initializing `ExpectColumnMeanToBeBetween` (#4986)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* [FEATURE] Enable self-initializing `ExpectColumnMedianToBeBetween` (#4987)

* feat: init commit

* test: write integration test

* chore: add sigfigs

* feat: add interpolation field

* chore: update GH action (#5001)

* [FEATURE] Enable self-initializing `ExpectColumnSumToBeBetween` (#4988)

* feat: init commit

* test: write integration test

* feat: add interpolation field

* [MAINTENANCE] Move `DataAssistant` registry capabilities into `DataAssistantRegistry` to enable user aliasing (#4991)

* refactor: move registry dict to dispatcher

* chore: misc cleanup

* chore: misc updates after review

* chore: misc cleanup

* chore: update error message

* Fix continuous partition example (#4939)

When calling json.dumps() method, the weights change.

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] RBP Profiling Dataset ProgressBar Fix #4999

* [MAINTENANCE] Preliminary refactors for data samplers. (#4996)

* MetricSingleBatchParameterBuilder with unit and integration tests. (#5003)

* [DOCS] Update slack notification guide to not use validation operators. (#4978)

* - Removed references to validation_operator
- Edited config yaml examples
- Grouped config options into tab groups for webhook vs app.

* - Added technical term tag to Checkpoint reference.
- Minor edit to document format.

* corrects number in step header

* [MAINTENANCE] Clean up unused imports and enforce through `flake8` in CI/CD (#5005)

* maintenance: clean up codebase

* chore: add to pipelines

* fix: ensure that flake8 is installed first

* chore: rename CI/CD stage

* chore: update type hint threshold

* parameter builder tests should utilize polymorphism (#5007)

* [MAINTENANCE] Clean up type hints in CLI (#5006)

* chore: first pass

* chore: more updates

* chore: more annotations

* chore: more annotations

* [FEATURE] Enable Pandas DataFrame and Series as MetricValues Output of Metric ParameterBuilder Classes (#5008)

* logging and exception handling (#5009)

* [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010)

* [FEATURE] Notebook for `VolumeDataAssistant` Example (#5010)

* [FEATURE] Histogram/Partition Single-Batch ParameterBuilder (#5011)

* [FEATURE] Update `DataAssistantResult.plot()` return value to emit `PlotResult` wrapper dataclass (#4962)

* feat: init commit

* test: write all tests (except 1)

* test: write last test

* test: remove theme test

* refactor: add custom Chart dataclass

* chore: make Chart immutable

* chore: add docstring

* feat: misc updates per team convo

* chore: delete comments

* refactor: delete vconcat methods and consolidate helpers

* refactor: further consolidation through helpers

* chore: update subtitle styling

* chore: add padding

* [ENHANCEMENT] Condense column-level `vconcat` plots into one interactive plot (#5002)

* Clone vertically concatenated chart for interactive starting point

* Working interactive chart

* Update ColorPalette names

* Update ordinal palette

* Tooltip now working

* Change legend color

* Add y-axis titles

* Align y-axis vertically for both charts

* Add highlight line

* Change batch_id to Batch ID

* Improve legend title and tooltip titles

* New layer for starting point showing one line

* Detail title updating appropriately

* Column name shows up with empty selection

* Use variable for alt.value(light_gray)

* Allow selection by mouseover on lines

* Anomaly encoded lines

* Move column selector to top left

* Format input_dropdown name

* Working with expectation kwargs

* Add predicate logic for strict_min and strict_max

* Add subtitle to prescriptive return charts

* WIP

* Overcame merge conflicts in descriptive mode

* Overcame merge conflicts in prescriptive mode

* Column charts are in their own list index

* [MAINTENANCE] Update version of black in pre-commit config

* [MAINTENANCE] Improve tooltips and formatting for distinct column values chart in VolumeDataAssistantResult (#5017)

* Correct type hints

* Improve tooltips

* Improve docstrings

* Fix return object indexing

* Return list length 1 instead of chart

* [FEATURE] Limit samplers work with supported sqlalchemy backends (#5014)

* [BUGFIX] Fix DataAssistantResult serialization issue (#5020)

* [MAINTENANCE] Enhance configuring serialization for DotDict type classes (#5023)

* [FEATURE] trino support (#5021)

* clean up SQL statements for handling subqueries properly

* use formal sqlalchemy for reflection

* TRINO WIP POC-COMPLETED

* Add trino package as a dependency and update imports

* Add docker-compose.yml for starburst database in assets/docker/

* Update _create_trino_engine to accept a hostname and schema_name

* Update get_dataset to have a block for trino

* Add ability to use data_alt in test_definitions JSON files (for Trino quirks)

* Minor update to get_test_validator_with_data to make debugging easier

* Add trino to sqla_keys dict in setup.py

* Update 3 test_definition json files with trino things

* Add table_selectable workaround for trino

* Add requirements-dev-trino to test_packaging.py

* Add trino to various azure-pipelines yml files

* Skip test_expectation__get_renderers

* Skip test__get_test_results

* Skip test__generate_expectation_tests__with_no_test_backends

Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>

* Pyarrow upper bound (#5028)

* release prep v0.15.4 (#5029)

* [MAINTENANCE] Use temporary branch to attempt to align git histories of `develop` and `main` (#5042)

* [FEATURE] Introduce ParameterBuilder Evaluation Dependencies and Validation Dependencies 2022 03 23 66 (#4531)

* [BUGFIX] Moves testing dependencies out of core reqs (#4522)

* removing upper bound on mistune

* remove deprecated dependencies

* adds untracked dependency

* adds untracked dependency

* adds untracked dependency

* moving dependencies

* removes dependencies added to lite from core | adds missing dependencies

Co-authored-by: Chetan Kini <chetan@superconductive.com>

* [FEATURE] Convert Existing Self-Initializing Expectations to Make ExpectationConfigurationBuilder Self-Contained with its own validation_parameter_builder settings (#4547)

* [MAINTENANCE] Don't return from validate configuration methods (#4545)

* Add validate_configuration to 2 core Expectations that are passing all their tests

* Comment out examples for expect_column_values_to_match_regex.py... its test_definitions JSON has many more tests

* Update all validate_configuration methods to have type hints and return None

* Update all doc snippet references that were affected

* [DOCS] technical term tags connect to data cloud docs (#4414)

* - Adds technical tags to all documents in the Connect to data: Cloud section of the docs. (Note, the term in the <WhereToRunCode /> imported component was tagged in a different PR.)
- Some additional editing was done to bring documents in line with the documentation and how-to guide standards.

* - Fixed extra </Tabs> and </TabItem> closures from prior commit to resolve conflicts with develop.

* - Update to include technical term tags. (#4462)

- Minor updates to correct formatting and spelling issues.

* - Moved docs related to contributing integrations under contributing in the ToC (#4551)

- Minor edit to title of "How to write integration documentation" to conform to ToC standards (not title cased unless containing a Technical Term).

* - Adds new image files for the intro page (#4540)

- Updates the image file link for the overview image on the intro page

* [DOCS] clarifications on execution engines and scalability (#4539)

* - DOC-184: Specify in the tutorial that Spark and SqlAlchemy are also supported Execution Engines.

* - DOC-183: In the Execution Engine technical term page, list the class names for Execution Engines and specify that spark is supported as a scalable alternative to Pandas.

* - DOC-182: In the connect to data: overview section for "configuring your datasource's execution engine" list the class names for execution engines.

* [DOCS] technical terms for validate data advanced (#4535)

* - add support for technical term tags.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* - added technical term tags.
- Updated how to validate data without a checkpoint to mention the replacement workflow and that additional documentation is forthcoming.
- NOTE: how_to_implement_custom_notifications needs to be rewritten, and was not edited as part of this PR.

* [DOCS] technical terms for validate data actions docs (#4518)

* - Edits to bring docs up to documentation and how-to guide standards.

* - add technical term tags to documents.
- minor formatting edits (technical terms missing capitalization, etc).

* [MAINTENANCE] Rule-Based Profiler: Refactor utilities into appropriate modules/classes for better separation of concerns (#4553)

* [MAINTENANCE] Refactor global `conftest` (#4534)

* chore: use black directives to temporarily disable linting

* chore: more black directives to temporarily disable linting

* chore: finish remaining

* refactor: start cleaning up conftest

* refactor: more refactoring of conftest

* refactor: even more refactoring of conftest

* [FEATURE] Improve diagnostic checklist details (#4548)

* Update library_metadata check to provide details when it doesn't pass

* In linting check, if snake_case doesn't match filename, show computed snake_case

* Change class name for expect_column_values_to_be_valid_ipv4.py and remove package attr

* Update convert_to_json_serializable to handle bytes

* Update build_gallery.py script to convert diagnostics to JSON in separate try/except

* Update build_gallery.py script to write expectation_library_v2.json file with indenting

* Update _check_input_validation to tell if custom assert statements are used in validate_configuration

* clean up (#4554)

* minor touch up (#4558)

* [MAINTENANCE] Refactor Anonymizer utilizing the Strategy design pattern (#4485)

* feat: init commit

* refactor: shift all logic over to base class

* feat: start impl of anonymize on Anonymizer

* feat: get ProfilerRunAnonymizer working

* refactor: remove constructor from ProfilerRunAnonymizer

* refactor: start on CheckpointRunAnonymizer

* fix: clean up broken checkpoint tests

* fix: ensure *args and **kwargs are propagated through

* refactor: start work on datasource anonymizers

* refactor: remove all anonymizers except Anonymizer from usage stats attrs

* fix: update isinstance checks

* refactor: move helper into checkpoint_run_anonymizer

* refactor: move helper into datasource_anonymizer

* refactor: make anonymize string private and place in strategy

* refactor: make anonymize batch info private and place in strategy

* refactor: move build_init_payload to Anonymizer

* refactor: make remainder of anonymize methods private

* refactor: add store info to strategy

* refactor: add dataconnector info to strategy

* refactor: consolidate profiler info and profiler run anonymization

* refactor: remove *args from signatures

* refactor: updates around checkpoint anonymization

* chore: misc cleanup of Anonymizer

* feat: final touch up before review

* chore: remove 'else' statements

* fix: ensure appropriate checkpoint method gets called

* chore: misc updates from review

* refactor: move init_payload back to usage stats

* chore: misc type hinting

* refactor: start using individual classes again

* chore: continue updating individual anonymizer classes

* feat: further updates to child classes

* feat: update anonymize_init_payload

* fix: get checkpoint payloads working

* refactor: ensure all methods have obj

* fix: misc fixes

* fix: make misc updates to conditional checks for obj

* refactor: rename ExpectationAnonymizer to ExpectationSuiteAnonymizer

* refactor: rename Checkpoint and Profiler anonymizers

* feat: leverage aggregate anonymizer downstream

* feature: conditionally create aggregate_anonymizer in constructor

* feat: add cache retrieve or instantiate util

* chore: add batch_request can_handle

* feat: ensure that salt has a default value in anonymizers

* refactor: require aggregate anonymizer in constructor

* refactor: instantiate all strategies in aggregate

* fix: fix broken tests

* refactor: rename internal getter

Co-authored-by: Don Heppner <donald.heppner@gmail.com>

* [MAINTENANCE] Remove duplicate mistune dependency

* [MAINTENANCE] Run PEP-273 checks on a schedule or release cut

* [DOCS] correct code reference line numbers and snippet tags for how to create a batch of data from an in memory data frame (#4573)

* -Corrected the line references and added <snippet> tags to source code for Spark version of guide.

* -Corrected the line references and added <snippet> tags to source code for Pandas version of guide.

* -lint reformat w/black

* -correcting line numbers after lint formatting.

* [MAINTENANCE] Package dependencies usage stats instrumentation - part 1 (#4546)

Usage stats instrumentation of package dependencies

* [MAINTENANCE] Add DevRel team to GitHub auto-label action

* [MAINTENANCE] Add GitHub action to conditionally auto-update PRs (#4574)

* feat: add new action

* chore: add conditions

* [MAINTENANCE] Bump version of `black` in response to hotfix for Click v8.1.0 (#4577)

* chore: bump version

* chore: test change

* chore: update all instances of black

* chore: new test changes

* chore: revert test changes

* Update overview.md (#4556)

* Add missing links.
* Fix some typos
* Simplify flow and grammar in a few places

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* - corrected broken link in admonition box. (#4585)

- updated links in admonition box to point to current technical documentation rather than old core concepts documents.

* [MAINTENANCE] Minor clean-up (#4571)

Little bit of cleanup in our execution engine and validator

* [BUGFIX] Adjust output of datetime `ParameterBuilder` to match Expectation (#4590)

* fix : misconfigured ExpectationConfigurationBuilder

* pushing fix

* clean up before submitting for review

* bugfix : remove sorting

* remove extra line

* [MAINTENANCE] Instrument package dependencies (#4583)

* Add dependencies to data_context.__init__ event

* [MAINTENANCE] Standardize DomainBuilder Constructor Arguments Ordering (#4599)

* release candidate for 0.14.13

* revert to 0.14.12 state

* [RELEASE] 0.15.3 (#4981)

* ProgressBar for DataAssistant RuleBasedProfiler computations. (#4918)

* [MAINTENANCE] Ensure that code style scripts in CI/CD exit early on failure (#4921)

* chore: update pipelines

* chore: remove scope check from pipeline

* [BUGFIX] check contrib requirements (#4922)

* Add check that requirements is a list, but don't crash if it's not

* Make requirements for icd_ten_category expectation a list

* [BUGFIX] Enables successful parsing of test cases for multi-table expectations (#4906)

* Remove pointless _generate_expectation_tests wrapper method and update docstring on generate_expectation_tests

* Remove accepting 'return_only_gallery_examples' arg from run_diagnostics method

* Update build_gallery.py script to receive --no-core --no-contrib and arbitrary Expectation list

* Use phrase 'Has a valid library_metadata object'

* Update ExpectationTestDiagnostics to have include_in_gallery

* Update _get_metric_list to accept expectation_config instead of executed_test_cases

* Update ExpectationTestDiagnostics to include validation_result and error_diagnostics

* Delete _execute_test_examples, _choose_example, _instantiate_example_validation_results, and ExecutedExpectationTestCase

* Reformat with black

* Update run_diagnostics to determine maturity level based on checks passed

* Update evaluate_json_test_cfe to accept raise_exception and return a tuple

* Update _get_test_results to include more in ExpectationErrorDiagnostics via evaluate_json_test_cfe

* Add backend_test_result_counts to ExpectationDiagnostics and use in helpers

* Reformat with black

* Remove unused imports (flake8)

* Fix tests

* Update asserts at end of creating_custom_expectations/expect_xxx.py

* Add some print statements to generate_expectation_tests when get_test_validator_with_data has a problem

* test setup

* fixes diagnostics for multi-table expectations

* wrap tmp_dir -> abspath in func

* apply  to test_expectations/test_expectations_cfe

* docstring

Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>

* [MAINTENANCE] Remove BatchRequest from Rule-Based Profiler Configuration and from Builder Constructor Arguments (#4927)

* [BUGFIX] Add missing events to schema (#4917)

* [MAINTENANCE] Improve Altair plotting extensibility (#4923)

* Comments on altair documentation

* Predicate BinaryExpression type hint

* Make default theme and enum as well

* Pass custom config to altair

* Bugfix using nested_update

* Add tests that test notebook execution

* Add failing test

* Move opacity into theme, rename variable

* Vanquish tooltip and point_color_condition parameters

* [FEATURE] new checksum expectation (#4657)

* [FEATURE] code for new checksum expectation

* [FEATURE] code for new checksum expectation

* initial code for checksum expectation

* linting & library_metadata updates

Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>

* [BUGFIX] Update helper to add explicit alias to subqueries for SQLA version < 1.4 (#4660)

* Update helper to add explicit alias to subqueries for SQLA version < 1.4

Implicit conversion of a nested select into a subquery failed when
running on SQLA 1.3 against Postgres - update the existing helper to
also handle older supported versions of SQLA.

* Update util.py

Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>

* [BUGFIX] Fix clickhouse same-alias issue (#4389)

* Fix broken link for checklist (#4932)

* [MAINTENANCE] Remove DataContext from DataAssistant (#4931)

* [MAINTENANCE] Add condition for custom checks in great_expectations pipelines

* Move general data splitting tasks to abstract base class (#4942)

* [MAINTENANCE] Add test to check for missing usage events (#4933)

* [FEATURE] Provide ability to combine lists of ExpectationConfiguration objects into flexible ExpectationSuite containers (#4943)

* [MAINTENANCE] Move splitter related taxi integration test fixtures (#4947)

* Update SLACK_GUIDELINES.md

Updating some language in Slack Guidelines

* added how to ask a question link

* cleanup (#4949)

* [MAINTENANCE] Rearrange modules for better reusability (#4955)

* [MAINTENANCE] Add timeout to great_expectations pipeline stages to prevent false positive build failures

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnProportionOfUniqueValuesToBeBetween` (#4929)

* feat: init commit

* chore: misc changes per convo with Alex

* feat: finish initial impl

* feat: finish impl after convo with Alex

* chore: update after review

* clean up (#4959)

* [FEATURE] Enable support for plotting both Table and Column charts in `VolumeDataAssistant` (#4930)

* feat: init commit

* feat: continue chugging along

* feat: get both types of charts to work

* chore: only update relevant kwargs in df

* feat: add subtitle support

* feat: create predicate helper func

* chore: update type hint

* chore: bold subtitle

* chore: work on cleaning up vconcat

* feat: continue impl

* feat: get both prescriptive and descriptive working

* chore: delete unnecessary import

* refactor: further cleanup

* chore: shrink charts some more

* refactor: rename private method

* chore: add docstrings

* feat: add include/exclude column names lists

* fix: correct method calls

* fix: fix assertion around include/exclude columns

* chore: update styling of charts

* chore: misc changes per Nathan review

* [BUGFIX] Use `monkeypatch` to set a consistent bootstrap seed in tests (#4960)

* feat: start impl

* chore: finishing touches

* fix: remedy typo in test

* feat: update test

* chore: revert changes in utils

* chore: add comment

* pin cryptography package (#4963)

pin cryptography package (#4963)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] BigQuery Temp Table Support (#4925)

* [FEATURE] Registry for DataAssistant classes with ability to execute from DataContext by registered name (#4966)

* [MAINTENANCE] Type annotate relevant functions with `-> None` (per PEP 484) (#4969)

* feat: run script to type annotate

* chore: update threshold

* Enable RuleBasedProfiler components to be serializable. (#4972)

* [BUGFIX] extras_require (#4968)

* Remove azure from requirements-dev-sqlalchemy.txt

* Update get_extras_require func to strip comments and include sqlalchemy for some keys

* [MAINTENANCE] Handle edge cases where `false_positive_rate` is not in range [0, 1] or very close to bounds (#4946)

* Warn and use NP_EPSILON if false_positive_rate <= 0, raise ValueError if false_positive_rate >= 1

* Add failing tests for corner cases

* Passing tests for 0 and 1 false_positive_rate

* Add tests for very small false_positive_rates

* Return type is already validated as float

* Use custom ProfilerExecutionError rather than ValueError

* Use 1-NP_EPSILON as an upper bound

* Pass variables to quentin fixture to set random seed

* Bugfix setting wrong parameter

* Set object attribute as well

* Unable to access the actual false_positive_rate used as it is private

* Use floats instead of ints

* Update type hints

* [MAINTENANCE] fix a typo (#4974)

* [FEATURE] Enable self-initializing capabilities for `ExpectColumnValuesToMatchRegex`/`ExpectColumnValuesToNotMatchRegex` (#4958)

* feat: start impl

* test: start writing alice test

* feat: misc updates per discussion with Alex

* test: update test regexes

* feat: update other expectation

* chore: update fixtures

* chore: type hint

* [BUGFIX] Fix broken packaging test and update dgtest-overrides

* [FEATURE] Provide "estimation histogram" ParameterBuilder output details (#4975)

* [FEATURE] Enable self-initializing ExpectColumnValuesToMatchStrftimeFormat

* release prep

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>

* chore: revert azure pipeline

* chore: revert more files

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>

* [BUGFIX] Patch broken usage stats test around dependency tracking

* [FEATURE] Add subset operation to Domain class (#5049)

* [FEATURE] In DataAssistant: Use Domain instead of domain_type as key for Metrics Parameter Builders (#5057)

* [MAINTENANCE] Save output of usage stats schema script in repo (#5053)

* chore: add output

* chore: update

* chore: docstring

Co-authored-by: Alex Sherstinsky <alexsherstinsky@users.noreply.github.com>
Co-authored-by: kenwade4 <95714847+kenwade4@users.noreply.github.com>
Co-authored-by: Austin Ziech Robinson <44794138+austiezr@users.noreply.github.com>
Co-authored-by: Ken Wade <ken@superconductive.com>
Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Co-authored-by: Nathan Farmer <NathanFarmer@users.noreply.github.com>
Co-authored-by: Yashavant Dudhe <ydudhe@gmail.com>
Co-authored-by: Yashavant-Dudhe <Yashavant.Dudhe@kyndryl.com>
Co-authored-by: Austin Robinson <austin@superconductive.com>
Co-authored-by: Douglas Cook <dugup@hotmail.co.uk>
Co-authored-by: serg-music <99654151+serg-music@users.noreply.github.com>
Co-authored-by: Kyle Eaton <kyle@superconductivehealth.com>
Co-authored-by: William Shin <will@superconductive.com>
Co-authored-by: Abe Gong <abegong@users.noreply.github.com>
Co-authored-by: andyjessen <62343929+andyjessen@users.noreply.github.com>
Co-authored-by: Rachel-Reverie <94694058+Rachel-Reverie@users.noreply.github.com>
Co-authored-by: Alex Sherstinsky <alex@superconductive.com>
Co-authored-by: James Campbell <james.p.campbell@gmail.com>
Co-authored-by: Don Heppner <donald.heppner@gmail.com>
Co-authored-by: Ben Horkley <horkley@superconductive.com>
Co-authored-by: Allen Sallinger <allen@superconductive.com>
22 people committed May 6, 2022
1 parent 9ca11fa commit fa66eae
Showing 3 changed files with 5,530 additions and 2 deletions.
23 changes: 21 additions & 2 deletions great_expectations/core/usage_statistics/schemas.py
@@ -1189,8 +1189,27 @@
     ],
 }
 
-if __name__ == "__main__":
+
+def write_schema_to_file(target_dir: str) -> None:
+    """Utility to write schema to disk.
+    The file name will always be "usage_statistics_record_schema.json" but the target directory can be specified.
+    Args:
+        target_dir (str): The dir you wish to write the schema to.
+    """
     import json
+    import os
 
+    file: str = "usage_statistics_record_schema.json"
+    out: str = os.path.join(target_dir, file)
+
-    with open("usage_statistics_record_schema.json", "w") as outfile:
+    with open(out, "w") as outfile:
         json.dump(anonymized_usage_statistics_record_schema, outfile, indent=2)
+
+
+if __name__ == "__main__":
+    import sys
+
+    target_dir = sys.argv[1] if len(sys.argv) >= 2 else "."
+    write_schema_to_file(target_dir)
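
The change above can be tried in isolation with a small standalone sketch. This is not the code from `schemas.py`: `example_schema` is a hypothetical stand-in for `anonymized_usage_statistics_record_schema`, the function takes the schema as an explicit argument and returns the output path (both are assumptions made so the block runs on its own), and `resolve_target_dir` is an illustrative name for the `sys.argv` fallback shown in the diff.

```python
import json
import os
import tempfile

# Hypothetical, tiny stand-in for anonymized_usage_statistics_record_schema.
example_schema = {"type": "object", "properties": {"event": {"type": "string"}}}


def write_schema_to_file(schema: dict, target_dir: str) -> str:
    """Write `schema` as indented JSON into `target_dir` and return the file path."""
    # The file name is fixed; only the directory varies, as in the commit.
    out = os.path.join(target_dir, "usage_statistics_record_schema.json")
    with open(out, "w") as outfile:
        json.dump(schema, outfile, indent=2)
    return out


def resolve_target_dir(argv: list) -> str:
    """Use the first CLI argument if present, otherwise the current directory."""
    return argv[1] if len(argv) >= 2 else "."


# Write into a temporary directory and read the file back to check the round trip.
with tempfile.TemporaryDirectory() as tmp:
    path = write_schema_to_file(example_schema, resolve_target_dir(["schemas.py", tmp]))
    with open(path) as infile:
        round_tripped = json.load(infile)
```

Taking the directory as a parameter lets the same helper be used both from the command line and from a script that saves the schema into the repository, which appears to be the motivation stated in the commit title (#5053).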
