[FEATURE] Implement the "column.standard_deviation" metric for sqlite database #5338

alexsherstinsky
Contributor

Please annotate your PR title to describe what the PR does, then give a brief bulleted description of your PR below. PR titles should begin with [BUGFIX], [FEATURE], [DOCS], or [MAINTENANCE]. If a new feature introduces breaking changes for the Great Expectations API or configuration files, please also add [BREAKING]. You can read about the tags in our contributor checklist.

Changes proposed in this pull request:

  • JIRA: GREAT-586

After submitting your PR, CI checks will run and @cla-bot will check for your CLA signature.

For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.

In a design review, we aim to ensure that the PR is consistent with our relationship to the open source community, with our software architecture and abstractions, and with our users' needs and expectations. That review often starts well before a PR, for example in github issues or slack, so please link to relevant conversations in notes below to help reviewers understand and approve your PR more quickly (e.g. closes #123).

Previous Design Review notes:

Definition of Done

Please delete options that are not relevant.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

Thank you for submitting!

Alex Sherstinsky and others added 30 commits June 15, 2022 15:41
…4/GREAT-999/alexsherstinsky/rule_based_profiler/data_assistant/data_assistant_result/move_getting_expectation_suite_to_expectation_suite_class-2022_06_15-170
…/alexsherstinsky/rule_based_profiler/remove_json_serialize_directive_and_add_raw_parameter_builder_computation_results_to_json_serialized_results-2022_06_15-170
…/alexsherstinsky/rule_based_profiler/remove_json_serialize_directive_and_add_raw_parameter_builder_computation_results_to_json_serialized_results-2022_06_15-170
…/alexsherstinsky/rule_based_profiler/remove_json_serialize_directive_and_add_raw_parameter_builder_computation_results_to_json_serialized_results-2022_06_15-170
…ror-changelog' into maintenance/GREAT-467/GREAT-464/GREAT-999/alexsherstinsky/rule_based_profiler/remove_json_serialize_directive_and_add_raw_parameter_builder_computation_results_to_json_serialized_results-2022_06_15-170
…ky/rule_based_profiler/remove_json_serialize_directive_and_add_raw_parameter_builder_computation_results_to_json_serialized_results-2022_06_15-170' into pre_pr-prototype/maintenance/GREAT-467/GREAT-464/GREAT-1000/alexsherstinsky/rule_based_profiler/data_assistant/onboarding_data_assistant/performance_improvements-2022_06_15-171
…exsherstinsky/rule_based_profiler/enable_numeric_metric_range_multibatch_parameter_builder_to_use_evaluation_dependencies-2022_06_16-171
…EAT-464/GREAT-1000/alexsherstinsky/rule_based_profiler/data_assistant/onboarding_data_assistant/performance_improvements-2022_06_15-171
…rule_based_profiler/enable_numeric_metric_range_multibatch_parameter_builder_to_use_evaluation_dependencies-2022_06_16-171' into pre_pr-prototype/maintenance/GREAT-467/GREAT-464/GREAT-1000/alexsherstinsky/rule_based_profiler/data_assistant/onboarding_data_assistant/performance_improvements-2022_06_15-171
…/maintenance/GREAT-467/GREAT-464/GREAT-1000/alexsherstinsky/rule_based_profiler/data_assistant/onboarding_data_assistant/performance_improvements-2022_06_15-171
…EAT-464/GREAT-1000/alexsherstinsky/rule_based_profiler/data_assistant/onboarding_data_assistant/performance_improvements-2022_06_15-171
…EAT-464/GREAT-1000/alexsherstinsky/rule_based_profiler/data_assistant/onboarding_data_assistant/performance_improvements-2022_06_15-171
@ghost

ghost commented Jun 17, 2022

Review these changes using an interactive CodeSee Map

@netlify

netlify bot commented Jun 17, 2022

Deploy Preview for niobium-lead-7998 ready!

Name Link
🔨 Latest commit 246be4d
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/62ae2905651ae700087e0164
😎 Deploy Preview https://deploy-preview-5338--niobium-lead-7998.netlify.app

@@ -25,7 +25,7 @@ def _pandas(cls, column, **kwargs):
 def _sqlalchemy(cls, column, **kwargs):
     """SqlAlchemy Mean Implementation"""
     # column * 1.0 needed for correct calculation of avg in MSSQL
-    return sa.func.avg(column * 1.0)
+    return sa.func.avg(1.0 * column)
Contributor

Just curious: does this make a difference in the results of the calculation?

Contributor Author

@Shinnnyshinshin No -- only a sense of aesthetics on my part. Thanks!
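
The exchange above can be checked quickly. The following is a minimal sketch, not the SQLAlchemy expression from the PR: it uses Python's built-in `sqlite3` module, and the table `t` and column `x` are made up for illustration. SQLite's own `AVG` already returns a float, so an integer `SUM(x) / COUNT(x)` stands in here for the truncation that the code comment attributes to MSSQL. It suggests that the `1.0` factor is what matters, while its position relative to the column does not.

```python
import sqlite3

# Hypothetical table and column names, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (2,), (4,)])

# Integer arithmetic: 7 / 3 is truncated because both operands are integers.
(int_avg,) = conn.execute("SELECT SUM(x) / COUNT(x) FROM t").fetchone()

# Multiplying by 1.0 promotes each value to a float before summing.
(avg_right,) = conn.execute("SELECT SUM(x * 1.0) / COUNT(x) FROM t").fetchone()
(avg_left,) = conn.execute("SELECT SUM(1.0 * x) / COUNT(x) FROM t").fetchone()

print(int_avg, avg_right, avg_left)
```

Both float variants produce the same value, which is consistent with the author's reply that the reordering is cosmetic.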

Alex Sherstinsky added 2 commits June 17, 2022 14:11
@alexsherstinsky alexsherstinsky enabled auto-merge (squash) June 17, 2022 21:14
…_standard_deviation_metric_for_sqlite_database-2022_06_17-176
@alexsherstinsky alexsherstinsky enabled auto-merge (squash) June 17, 2022 21:18
Contributor

@Shinnnyshinshin Shinnnyshinshin left a comment

Really appreciate you digging into this, @alexsherstinsky. Thank you very, very much. LGTM

Comment on lines +45 to 52
elif _dialect.name.lower() == "sqlite":
    mean = _metrics["column.mean"]
    nonnull_row_count = _metrics["column_values.null.unexpected_count"]
    standard_deviation = sa.func.sqrt(
        sa.func.sum((1.0 * column - mean) * (1.0 * column - mean))
        / ((1.0 * nonnull_row_count) - 1.0)
    )
else:
Contributor

❤️
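
The SQLite branch quoted above appears to compute the sample standard deviation by hand, as the square root of the summed squared deviations from the mean divided by (n - 1), where n is the non-null row count. The sketch below reproduces that two-pass calculation with Python's built-in `sqlite3` module rather than with SQLAlchemy or Great Expectations. The table `t`, the column `x`, and the sample values are assumptions made for illustration, and the square root is taken in Python because a SQL `sqrt` function is not available in every SQLite build.

```python
import math
import sqlite3
import statistics

# Hypothetical sample data; the NULL row checks that nulls are ignored.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x REAL)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(v,) for v in values])
conn.execute("INSERT INTO t (x) VALUES (NULL)")

# First pass: mean and non-null row count (AVG and COUNT(x) skip NULLs).
mean, nonnull_row_count = conn.execute(
    "SELECT AVG(1.0 * x), COUNT(x) FROM t"
).fetchone()

# Second pass: sum of squared deviations from the mean.
(sum_sq,) = conn.execute(
    "SELECT SUM((1.0 * x - ?) * (1.0 * x - ?)) FROM t", (mean, mean)
).fetchone()

# Sample standard deviation, dividing by (n - 1) as in the PR snippet.
sample_stdev = math.sqrt(sum_sq / (1.0 * nonnull_row_count - 1.0))

# Cross-check against the standard library's sample standard deviation.
print(sample_stdev, statistics.stdev(values))
```

Note that the denominator is zero when the column has exactly one non-null value, so a production implementation would likely need to guard that case.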

@@ -1496,7 +1496,6 @@ def candidate_test_is_on_temporary_notimplemented_list_cfe(context, expectation_
         "expect_column_values_to_be_dateutil_parseable",
         "expect_column_values_to_be_json_parseable",
         "expect_column_values_to_match_json_schema",
-        "expect_column_stdev_to_be_between",
Contributor

Whoohoo. Just explicitly stating that this enables the expect_column_stdev_to_be_between Expectation tests for SQL dialects (including sqlite), so additional unit tests are not needed for this PR.

Contributor

@kenwade4 kenwade4 left a comment

Looks great and everything is passing for expect_column_stdev_to_be_between in the build_gallery.py script! Thanks!

@alexsherstinsky alexsherstinsky enabled auto-merge (squash) June 18, 2022 05:26
@alexsherstinsky alexsherstinsky enabled auto-merge (squash) June 18, 2022 21:27
@alexsherstinsky alexsherstinsky merged commit f9a2127 into develop Jun 18, 2022
@alexsherstinsky alexsherstinsky deleted the feature/GREAT-586/alexsherstinsky/implement_column_standard_deviation_metric_for_sqlite_database-2022_06_17-176 branch June 18, 2022 22:15
Shinnnyshinshin pushed a commit that referenced this pull request Jun 21, 2022
…ture/GREAT-953/migration-part2-refactor-usage-stats-opt

* feature/GREAT-953/migration-part1-move-to-abc:
  Update base_data_context.py
  [MAINTENANCE] Update release schedule JSON (#5349)
  [DOCS] DOC-337 automate updates to the version information displayed in the getting started tutorial. (#5348)
  Maintenance/great 761/great 1010/great 1011/alexsherstinsky/rule based profiler/data assistant/include only essential public methods in data assistant dispatcher class 2022 06 21 177 (#5351)
  [FEATURE] Implement the "column.standard_deviation" metric for sqlite database (#5338)
  [BUGFIX] Fix for failing Expectation test in `cloud_db_integration` pipeline (#5321)
3 participants