Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MAINTENANCE] General cleanup/refactor of DataAssistantResult #5198

Merged
merged 5 commits into from
May 26, 2022

Conversation

cdkini
Copy link
Member

@cdkini cdkini commented May 25, 2022

Changes proposed in this pull request:

  • General refactors in preparation for OnboardingDataAssistant plotting

Definition of Done

Please delete options that are not relevant.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have run any local integration tests and made sure that nothing is broken.

@netlify
Copy link

netlify bot commented May 25, 2022

Deploy Preview for niobium-lead-7998 ready!

Name Link
🔨 Latest commit 470bde9
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/628f6f18d6f01000081c6002
😎 Deploy Preview https://deploy-preview-5198--niobium-lead-7998.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@@ -38,6 +39,8 @@
)
from great_expectations.types import ColorPalettes, Colors, SerializableDictDot

ColumnDataFrame = namedtuple("ColumnDataFrame", ["column", "df"])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to provide some more clarity, let's associate our column-df tuple with names.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really really like this adjustment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it!

Comment on lines -351 to +354
theme: Dict[str, Any] = DataAssistantResult._get_theme(theme=theme)
theme = DataAssistantResult._get_theme(theme=theme)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already declared above so the type annotation is not necessary.

@@ -440,15 +443,15 @@ def get_expect_domain_values_to_be_between_chart(
column
for column in df.columns
if column
not in [
not in {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor but set vs list lookup

Comment on lines +1634 to +1638
column_based_expectation_configurations_by_type: Dict[
str, List[ExpectationConfiguration]
] = self._filter_expectation_configurations_by_column_type(
expectation_configurations, include_column_names, exclude_column_names
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helper method to filter relevant column-based expectation configurations from the list - the result is a dictionary with keys representing expectation name and values representing the list of relevant configs.

Comment on lines +1647 to +1669
for (
column_based_expectation_configurations
) in column_based_expectation_configurations_by_type.values():
display_charts_for_expectation: List[
alt.VConcatChart
] = self._create_display_chart_for_column_domain_expectation(
expectation_configurations=column_based_expectation_configurations,
attributed_metrics=attributed_metrics_by_column_domain,
plot_mode=plot_mode,
sequential=sequential,
)
display_charts.extend(display_charts_for_expectation)

for expectation_configuration in column_based_expectation_configurations:
return_chart: alt.Chart = (
self._create_return_chart_for_column_domain_expectation(
expectation_configuration=expectation_configuration,
attributed_metrics=attributed_metrics_by_column_domain,
plot_mode=plot_mode,
sequential=sequential,
)
)
return_charts.append(return_chart)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For each column-based expectation, create a layer chart (this is only one for the VolumeDataAssistant but will be more numerous for the OnboardingDataAssistant). I've also moved the _create_return_chart calls to be within the same loop to save on work.

Comment on lines -1714 to +1754
if plot_mode is PlotMode.PRESCRIPTIVE:
if metric_name in implemented_metrics:
if metric_name in implemented_metrics:
if plot_mode is PlotMode.PRESCRIPTIVE:
plot_impl = self.get_expect_domain_values_to_be_between_chart
elif plot_mode is PlotMode.DESCRIPTIVE:
if metric_name in implemented_metrics:
elif plot_mode is PlotMode.DESCRIPTIVE:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flip around conditionals to save on some code (don't need to use if metric_name in implemented_metrics both times).

Comment on lines +1869 to +1871

if metric_name == "column_distinct_values_count":
if plot_mode is PlotMode.PRESCRIPTIVE:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is functionally equivalent but please call me out if not!

        # OLD 
        if plot_mode is PlotMode.PRESCRIPTIVE:
            if metric_name == "column_distinct_values_count":
                plot_impl = (
                    self.get_interactive_detail_expect_column_values_to_be_between_chart
                )
        else:
            if metric_name == "column_distinct_values_count":
                plot_impl = self.get_interactive_detail_multi_chart

       # NEW
        if metric_name == "column_distinct_values_count":
            if plot_mode is PlotMode.PRESCRIPTIVE:
                plot_impl = (
                    self.get_interactive_detail_expect_column_values_to_be_between_chart
                )
            else:
                plot_impl = self.get_interactive_detail_multi_chart

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note with the similar refactor above!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but as we continue to add more metrics/expectations we will need possibly rethink if the metric names can get pulled from EXPECTATION_METRIC_MAP or if this get's pushed down into child classes.

Copy link
Contributor

@alexsherstinsky alexsherstinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@Shinnnyshinshin Shinnnyshinshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love it. Thank you @cdkini

@@ -38,6 +39,8 @@
)
from great_expectations.types import ColorPalettes, Colors, SerializableDictDot

ColumnDataFrame = namedtuple("ColumnDataFrame", ["column", "df"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really really like this adjustment

@cdkini cdkini requested a review from NathanFarmer May 26, 2022 12:14
Copy link
Contributor

@NathanFarmer NathanFarmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -38,6 +39,8 @@
)
from great_expectations.types import ColorPalettes, Colors, SerializableDictDot

ColumnDataFrame = namedtuple("ColumnDataFrame", ["column", "df"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it!

@@ -564,7 +567,7 @@ def get_interactive_detail_multi_chart(
batch_name: str = "batch"
batch_identifiers: List[str] = [
column
for column in column_dfs[0][1].columns
for column in column_dfs[0].df.columns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️


domain = domains_by_column_name[domain_kwargs["column"]]
domain = domains_by_column_name[column_name]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦🏻

Comment on lines +1869 to +1871

if metric_name == "column_distinct_values_count":
if plot_mode is PlotMode.PRESCRIPTIVE:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but as we continue to add more metrics/expectations we will need possibly rethink if the metric names can get pulled from EXPECTATION_METRIC_MAP or if this get's pushed down into child classes.

@cdkini cdkini merged commit 4a71e72 into develop May 26, 2022
@cdkini cdkini deleted the maintenance/refactor-data-assistant-result branch May 26, 2022 14:04
Shinnnyshinshin pushed a commit that referenced this pull request May 26, 2022
…' of https://github.com/great-expectations/great_expectations into feature/GREAT-933/data-context-new-hierarchy-with-stubs

* 'feature/GREAT-933/data-context-new-hierarchy-with-stubs' of https://github.com/great-expectations/great_expectations:
  [MAINTENANCE] suppressing type hints in ill-defined situations (#5213)
  Bugfix for initial position of bar charts with selections (#5212)
  [RELEASE] 0.15.7 (#5210)
  typo (#5207)
  [MAINTENANCE] General cleanup/refactor of `DataAssistantResult` (#5198)
  [BUGFIX] RuleBasedProfiler: Ensure that run() method runtime environment directives are handled correctly when existing setting is None (by default) (#5202)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants