[MAINTENANCE] Remove DataAssistants #9859

Merged: 24 commits into develop on May 2, 2024

@cdkini (Member) commented May 1, 2024

Assistants and their components should be deleted if possible. If not, we should make them private.

Changes:

  • Deleted all assistants
  • Deleted profiler store and references throughout codebase
  • Moved rule_based_profiler/ directory into experimental/ (some components are relied on there for CDM)

We could probably trim more from the RBP directory, but I've removed tests, so we should be good.

  • Description of PR changes above includes a link to an existing GitHub issue
  • PR title is prefixed with one of: [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], [CONTRIB]
  • Code is linted - run invoke lint (uses ruff format + ruff check)
  • Appropriate tests and docs have been updated

netlify bot commented May 1, 2024

Deploy Preview for niobium-lead-7998 canceled.

Name | Link
🔨 Latest commit | 76ca808
🔍 Latest deploy log | https://app.netlify.com/sites/niobium-lead-7998/deploys/6633b15e7c66dc0008a50f30

codecov bot commented May 2, 2024

Codecov Report

Attention: Patch coverage is 86.36364%, with 27 lines in your changes missing coverage. Please review.

Project coverage is 78.61%. Comparing base (e08b025) to head (76ca808).

Files | Patch % | Lines
...ns/data_context/data_context/cloud_data_context.py | 77.77% | 4 Missing ⚠️
...ions/experimental/rule_based_profiler/rule/rule.py | 62.50% | 3 Missing ⚠️
...ns/experimental/rule_based_profiler/config/base.py | 50.00% | 2 Missing ⚠️
...er/estimators/bootstrap_numeric_range_estimator.py | 50.00% | 2 Missing ⚠️
...profiler/estimators/kde_numeric_range_estimator.py | 50.00% | 2 Missing ⚠️
...sed_profiler/estimators/numeric_range_estimator.py | 0.00% | 2 Missing ⚠️
...rule_based_profiler/helpers/runtime_environment.py | 0.00% | 2 Missing ⚠️
...imental/rule_based_profiler/rule_based_profiler.py | 81.81% | 2 Missing ⚠️
...rule_based_profiler/attributed_resolved_metrics.py | 0.00% | 1 Missing ⚠️
...omain_builder/categorical_column_domain_builder.py | 75.00% | 1 Missing ⚠️
... and 6 more
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9859      +/-   ##
===========================================
- Coverage    81.18%   78.61%   -2.57%     
===========================================
  Files          500      484      -16     
  Lines        44481    42544    -1937     
===========================================
- Hits         36112    33447    -2665     
- Misses        8369     9097     +728     
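
The percentages in the diff above follow from the Hits and Lines rows. A minimal sketch of that arithmetic, assuming coverage is hits divided by lines and that the result is truncated (not rounded) to two decimals. The truncation is an assumption, chosen because it reproduces the reported figures, and `coverage_hundredths` is a hypothetical helper name:

```python
def coverage_hundredths(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated (assumed rounding mode)."""
    return hits * 10_000 // lines


# Figures copied from the coverage diff: (hits, misses, lines).
base_hits, base_misses, base_lines = 36112, 8369, 44481
head_hits, head_misses, head_lines = 33447, 9097, 42544

# Every tracked line is either a hit or a miss.
assert base_hits + base_misses == base_lines
assert head_hits + head_misses == head_lines

base = coverage_hundredths(base_hits, base_lines)
head = coverage_hundredths(head_hits, head_lines)
delta = head - base

print(f"base  {base / 100:.2f}%")
print(f"head  {head / 100:.2f}%")
print(f"delta {delta / 100:+.2f}%")
```

With these inputs the sketch gives 81.18%, 78.61%, and -2.57%, which matches the Coverage row of the report.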
Flag | Coverage | Δ
3.10 | 64.51% <86.36%> | (-1.40%) ⬇️
3.10 athena or clickhouse or openpyxl or pyarrow or project or sqlite or aws_creds | ? |
3.10 aws_deps | ? |
3.10 big | ? |
3.10 databricks | ? |
3.10 filesystem | ? |
3.10 mssql | ? |
3.10 mysql | ? |
3.10 postgresql | ? |
3.10 snowflake | ? |
3.10 spark | ? |
3.10 trino | ? |
3.11 | 64.51% <86.36%> | (-1.40%) ⬇️
3.11 athena or clickhouse or openpyxl or pyarrow or project or sqlite or aws_creds | 53.91% <80.30%> | (+0.87%) ⬆️
3.11 aws_deps | 44.87% <80.30%> | (+0.46%) ⬆️
3.11 big | 55.99% <80.30%> | (-3.41%) ⬇️
3.11 databricks | 46.02% <80.30%> | (+0.53%) ⬆️
3.11 filesystem | 61.22% <86.36%> | (-1.04%) ⬇️
3.11 mssql | 48.89% <80.30%> | (+0.64%) ⬆️
3.11 mysql | 48.95% <80.30%> | (+0.64%) ⬆️
3.11 postgresql | 52.80% <80.30%> | (+0.82%) ⬆️
3.11 snowflake | 46.65% <80.30%> | (+0.55%) ⬆️
3.11 spark | 56.94% <86.36%> | (-1.29%) ⬇️
3.11 trino | 50.80% <80.30%> | (-0.49%) ⬇️
3.8 | 64.53% <86.36%> | (-1.39%) ⬇️
3.8 athena or clickhouse or openpyxl or pyarrow or project or sqlite or aws_creds | 53.92% <80.30%> | (+0.87%) ⬆️
3.8 aws_deps | 44.89% <80.30%> | (+0.46%) ⬆️
3.8 big | 56.01% <80.30%> | (-3.40%) ⬇️
3.8 databricks | 46.04% <80.30%> | (+0.53%) ⬆️
3.8 filesystem | 61.23% <86.36%> | (-1.04%) ⬇️
3.8 mssql | 48.87% <80.30%> | (+0.64%) ⬆️
3.8 mysql | 48.93% <80.30%> | (+0.64%) ⬆️
3.8 postgresql | 52.79% <80.30%> | (+0.82%) ⬆️
3.8 snowflake | 46.67% <80.30%> | (+0.55%) ⬆️
3.8 spark | ? |
3.8 trino | 50.79% <80.30%> | (-0.50%) ⬇️
3.9 | 64.53% <86.36%> | (-1.39%) ⬇️
3.9 athena or clickhouse or openpyxl or pyarrow or project or sqlite or aws_creds | ? |
3.9 aws_deps | ? |
3.9 big | ? |
3.9 databricks | ? |
3.9 filesystem | ? |
3.9 mssql | ? |
3.9 mysql | ? |
3.9 postgresql | ? |
3.9 snowflake | ? |
3.9 spark | ? |
3.9 trino | ? |
cloud | 0.00% <0.00%> | (ø)
docs-basic | 49.02% <1.51%> | (-2.56%) ⬇️
docs-creds-needed | 50.10% <1.51%> | (-2.52%) ⬇️
docs-spark | 48.24% <1.51%> | (-2.60%) ⬇️

Flags with carried forward coverage won't be shown.


@cdkini changed the title from "[MAINTENANCE] Rename rule_based_profiler/ directory to _data_assistants/" to "[MAINTENANCE] Delete RuleBasedProfiler, DataAssistants, and ProfilerStore" on May 2, 2024
@cdkini changed the title from "[MAINTENANCE] Delete RuleBasedProfiler, DataAssistants, and ProfilerStore" to "[MAINTENANCE] Remove DataAssistants" on May 2, 2024
-from great_expectations.rule_based_profiler.config import ParameterBuilderConfig
-from great_expectations.rule_based_profiler.data_assistant import DataAssistant
-from great_expectations.rule_based_profiler.data_assistant_result import (
+from great_expectations.experimental.rule_based_profiler.config import ParameterBuilderConfig
Member Author

Moved RBP into experimental since it is referenced there by metrics_repository
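
For code that has to run against releases on either side of this move, one possible approach is to try both module paths in turn. A minimal sketch, assuming only the two import paths that appear in the diff lines above. `first_importable_attr` is a hypothetical helper, and nothing here guarantees that either path exists in any given release:

```python
import importlib
from typing import Any, Iterable, Optional

# Module paths copied from the diff: new location first, old location second.
CANDIDATE_MODULES = (
    "great_expectations.experimental.rule_based_profiler.config",
    "great_expectations.rule_based_profiler.config",
)


def first_importable_attr(modules: Iterable[str], attr: str) -> Optional[Any]:
    """Return `attr` from the first module that imports and defines it, else None."""
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # This path is not available in the installed release.
        if hasattr(module, attr):
            return getattr(module, attr)
    return None


# None when neither path is importable (for example, the package is not installed).
ParameterBuilderConfig = first_importable_attr(
    CANDIDATE_MODULES, "ParameterBuilderConfig"
)
```

Callers would need to handle the `None` case explicitly, since the sketch deliberately does not raise when both paths are missing.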

@@ -39,7 +39,6 @@
 "checkpoint_store",
 "suite_parameter_store",
 "validation_results_store",
-"profiler_store",
Member Author

No more need for a profiler store

@@ -1,104 +0,0 @@
-"""Example Script: How to create an Expectation Suite with the Missingness Data Assistant
Member Author

All assistants have been deleted

@@ -178,7 +178,7 @@ class Meta:
 module_name = fields.String(
     required=False,
     allow_none=True,
-    missing="great_expectations.rule_based_profiler.domain_builder",
+    load_default="great_expectations.rule_based_profiler.domain_builder",
Member Author

Throws deprecation warning if we don't use new keyword arg
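
The mechanism behind that comment can be illustrated without the library itself. A minimal sketch of how a renamed keyword argument usually produces a deprecation warning, using only the standard `warnings` module. `string_field` is a hypothetical stand-in written for this example and is not the actual `fields.String` implementation; only the keyword names `missing` and `load_default` and the default value come from the diff:

```python
import warnings


def string_field(*, load_default=None, missing=None):
    """Hypothetical stand-in for a field constructor; not a real library API."""
    if missing is not None:
        # The old keyword still works, but each use emits a warning.
        warnings.warn(
            "'missing' is deprecated; use 'load_default' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if load_default is None:
            load_default = missing
    return {"load_default": load_default}


DEFAULT_MODULE = "great_expectations.rule_based_profiler.domain_builder"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # Make sure nothing is filtered out.
    old_style = string_field(missing=DEFAULT_MODULE)
    new_style = string_field(load_default=DEFAULT_MODULE)

deprecation_count = sum(
    issubclass(w.category, DeprecationWarning) for w in caught
)
```

In this sketch both calls produce the same field description, and only the call using the old keyword is counted as a deprecation, which is the behavior the one-line change in the diff avoids.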

@tyler-hoffman (Contributor) left a comment

Nice! If CI is happy I'm happy!

@cdkini cdkini enabled auto-merge May 2, 2024 14:30
@cdkini cdkini added this pull request to the merge queue May 2, 2024
Merged via the queue into develop with commit 69c1b2c May 2, 2024
68 checks passed
@cdkini cdkini deleted the m/v1-24/rbp_and_assistant_cleanup branch May 2, 2024 16:06
victorrgez commented May 3, 2024

Hi @cdkini, I was wondering if you could help me a bit. If this is not the right place to ask, please point me to the right one.

I'm new to Great Expectations. As far as I understand, there used to be data profilers that helped create new Expectation Suites from scratch. They were then removed, according to [1].

I thought they had been replaced by DataAssistants, since I could not find any public information beyond [1]. Now I realise that, according to this thread [2], DataAssistants are also being removed.

Is there a new Great Expectations feature that is going to replace DataAssistants, or is there a particular reason they are being removed?

Thanks!
