[MAINTENANCE] General cleanup/refactor of `DataAssistantResult` #5198

cdkini · 2022-05-25T20:22:34Z

Changes proposed in this pull request:

General refactors in preparation for OnboardingDataAssistant plotting

Definition of Done

Please delete options that are not relevant.

My code follows the Great Expectations style guide
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have run any local integration tests and made sure that nothing is broken.

netlify · 2022-05-25T20:22:38Z

✅ Deploy Preview for niobium-lead-7998 ready!

Name	Link
🔨 Latest commit	`470bde9`
🔍 Latest deploy log	https://app.netlify.com/sites/niobium-lead-7998/deploys/628f6f18d6f01000081c6002
😎 Deploy Preview	https://deploy-preview-5198--niobium-lead-7998.netlify.app

cdkini · 2022-05-25T20:23:13Z

great_expectations/rule_based_profiler/types/data_assistant_result/data_assistant_result.py

@@ -38,6 +39,8 @@
 )
 from great_expectations.types import ColorPalettes, Colors, SerializableDictDot

+ColumnDataFrame = namedtuple("ColumnDataFrame", ["column", "df"])


Just to provide some more clarity, let's associate our column-df tuple with names.

really really like this adjustment
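To illustrate the naming change, here is a minimal sketch of how the named fields read compared with positional indexing. The `ColumnDataFrame` definition is the one from the diff; `FakeFrame` and the column names are hypothetical stand-ins (a real caller would hold a pandas DataFrame in `df`).

```python
from collections import namedtuple

# Same definition as in the diff above.
ColumnDataFrame = namedtuple("ColumnDataFrame", ["column", "df"])


class FakeFrame:
    """Hypothetical stand-in for a pandas DataFrame; only `.columns` is modeled."""

    def __init__(self, columns):
        self.columns = list(columns)


# Hypothetical column names, for illustration only.
column_dfs = [
    ColumnDataFrame(column="passenger_count", df=FakeFrame(["batch_id", "value"])),
    ColumnDataFrame(column="fare_amount", df=FakeFrame(["batch_id", "value"])),
]

# Positional access still works, but the named form states the intent.
by_index = column_dfs[0][1].columns
by_name = column_dfs[0].df.columns
```

Because a namedtuple is still a tuple, existing code that unpacks or indexes the pair keeps working while new code can use `.column` and `.df`.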

cdkini · 2022-05-25T20:23:26Z

great_expectations/rule_based_profiler/types/data_assistant_result/data_assistant_result.py

-        theme: Dict[str, Any] = DataAssistantResult._get_theme(theme=theme)
+        theme = DataAssistantResult._get_theme(theme=theme)


Already declared above so the type annotation is not necessary.

cdkini · 2022-05-25T20:23:44Z

great_expectations/rule_based_profiler/types/data_assistant_result/data_assistant_result.py

@@ -440,15 +443,15 @@ def get_expect_domain_values_to_be_between_chart(
            column
            for column in df.columns
            if column
-            not in [
+            not in {


Very minor, but a set gives faster membership lookup than a list.
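A small sketch of the same pattern, assuming hypothetical column names since the actual excluded names are not shown in this hunk. Membership tests against a set are constant time on average, while a list is scanned linearly; for a handful of names the difference is negligible, which is why this is only a minor point.

```python
# Hypothetical column names; the real ones are elided in the diff.
columns = ["batch", "year", "month", "min_value", "max_value", "strict_min"]

# A set literal: average O(1) membership tests instead of a linear list scan.
excluded = {"min_value", "max_value", "strict_min"}

# Same comprehension shape as in the diff: keep columns not in the excluded set.
kept = [column for column in columns if column not in excluded]
```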

cdkini · 2022-05-25T20:24:41Z

great_expectations/rule_based_profiler/types/data_assistant_result/data_assistant_result.py

+        column_based_expectation_configurations_by_type: Dict[
+            str, List[ExpectationConfiguration]
+        ] = self._filter_expectation_configurations_by_column_type(
+            expectation_configurations, include_column_names, exclude_column_names
+        )


Helper method to filter relevant column-based expectation configurations from the list - the result is a dictionary with keys representing expectation name and values representing the list of relevant configs.
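The body of the helper is not shown in this hunk, so the following is only a sketch of what such a filter could look like. `FakeExpectationConfiguration` and `filter_by_column_type` are hypothetical names, and the include/exclude semantics are an assumption rather than the actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakeExpectationConfiguration:
    """Hypothetical, simplified stand-in for ExpectationConfiguration."""

    expectation_type: str
    kwargs: dict = field(default_factory=dict)


def filter_by_column_type(
    configs: List[FakeExpectationConfiguration],
    include_column_names: Optional[List[str]] = None,
    exclude_column_names: Optional[List[str]] = None,
) -> Dict[str, List[FakeExpectationConfiguration]]:
    """Group column-based configs by expectation type (assumed semantics)."""
    grouped = defaultdict(list)
    for config in configs:
        column = config.kwargs.get("column")
        if column is None:
            continue  # not a column-based expectation
        if include_column_names is not None and column not in include_column_names:
            continue
        if exclude_column_names is not None and column in exclude_column_names:
            continue
        grouped[config.expectation_type].append(config)
    return dict(grouped)


configs = [
    FakeExpectationConfiguration("expect_table_row_count_to_be_between", {"min_value": 1}),
    FakeExpectationConfiguration(
        "expect_column_unique_value_count_to_be_between", {"column": "a"}
    ),
    FakeExpectationConfiguration(
        "expect_column_unique_value_count_to_be_between", {"column": "b"}
    ),
]
result = filter_by_column_type(configs, exclude_column_names=["b"])
```

The keys of `result` are expectation names and the values are the matching configs, which is the shape described in the comment above.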

cdkini · 2022-05-25T20:25:48Z

great_expectations/rule_based_profiler/types/data_assistant_result/data_assistant_result.py

+        for (
+            column_based_expectation_configurations
+        ) in column_based_expectation_configurations_by_type.values():
+            display_charts_for_expectation: List[
+                alt.VConcatChart
+            ] = self._create_display_chart_for_column_domain_expectation(
+                expectation_configurations=column_based_expectation_configurations,
+                attributed_metrics=attributed_metrics_by_column_domain,
+                plot_mode=plot_mode,
+                sequential=sequential,
+            )
+            display_charts.extend(display_charts_for_expectation)
+
+            for expectation_configuration in column_based_expectation_configurations:
+                return_chart: alt.Chart = (
+                    self._create_return_chart_for_column_domain_expectation(
+                        expectation_configuration=expectation_configuration,
+                        attributed_metrics=attributed_metrics_by_column_domain,
+                        plot_mode=plot_mode,
+                        sequential=sequential,
+                    )
+                )
+                return_charts.append(return_chart)


For each column-based expectation, create a layer chart (there is only one for the VolumeDataAssistant, but there will be more for the OnboardingDataAssistant). I've also moved the _create_return_chart calls into the same loop to save on work.
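The loop structure can be sketched in isolation as follows. Strings stand in for Altair charts, and `create_display_charts` and `create_return_chart` are hypothetical placeholders for the private methods in the diff; the point is only that both chart lists are filled in a single pass over the grouped configurations.

```python
from typing import Dict, List


def create_display_charts(configs: List[str]) -> List[str]:
    """Hypothetical stand-in: one combined display chart per expectation type."""
    return [f"display({', '.join(configs)})"]


def create_return_chart(config: str) -> str:
    """Hypothetical stand-in: one return chart per expectation configuration."""
    return f"return({config})"


# Keys are expectation names; values are stand-ins for the configs of that type.
configs_by_type: Dict[str, List[str]] = {
    "expect_column_unique_value_count_to_be_between": ["col_a", "col_b"],
}

display_charts: List[str] = []
return_charts: List[str] = []

for configs in configs_by_type.values():
    # Display charts are built once per expectation type.
    display_charts.extend(create_display_charts(configs))
    # Return charts are built per configuration, inside the same outer loop.
    for config in configs:
        return_charts.append(create_return_chart(config))
```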

cdkini · 2022-05-25T20:27:01Z

great_expectations/rule_based_profiler/types/data_assistant_result/data_assistant_result.py

-        if plot_mode is PlotMode.PRESCRIPTIVE:
-            if metric_name in implemented_metrics:
+        if metric_name in implemented_metrics:
+            if plot_mode is PlotMode.PRESCRIPTIVE:
                plot_impl = self.get_expect_domain_values_to_be_between_chart
-        elif plot_mode is PlotMode.DESCRIPTIVE:
-            if metric_name in implemented_metrics:
+            elif plot_mode is PlotMode.DESCRIPTIVE:


Flip around conditionals to save on some code (don't need to use if metric_name in implemented_metrics both times).

cdkini · 2022-05-25T20:33:24Z

great_expectations/rule_based_profiler/types/data_assistant_result/data_assistant_result.py

+
+        if metric_name == "column_distinct_values_count":
+            if plot_mode is PlotMode.PRESCRIPTIVE:


I believe this is functionally equivalent but please call me out if not!

    # OLD
    if plot_mode is PlotMode.PRESCRIPTIVE:
        if metric_name == "column_distinct_values_count":
            plot_impl = (
                self.get_interactive_detail_expect_column_values_to_be_between_chart
            )
    else:
        if metric_name == "column_distinct_values_count":
            plot_impl = self.get_interactive_detail_multi_chart

    # NEW
    if metric_name == "column_distinct_values_count":
        if plot_mode is PlotMode.PRESCRIPTIVE:
            plot_impl = (
                self.get_interactive_detail_expect_column_values_to_be_between_chart
            )
        else:
            plot_impl = self.get_interactive_detail_multi_chart

Same note with the similar refactor above!

This looks good, but as we continue to add more metrics/expectations we will possibly need to rethink whether the metric names can be pulled from EXPECTATION_METRIC_MAP or whether this gets pushed down into child classes.
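One way to check the equivalence claim is to enumerate every combination of plot mode and metric name and compare the two versions. This is a sketch under assumptions: the `PlotMode` member values, the string stand-ins for the chart methods, and the initial `plot_impl = None` are not taken from the source.

```python
from enum import Enum
from itertools import product


class PlotMode(Enum):
    # Member names come from the diff; the values are assumptions.
    DESCRIPTIVE = "descriptive"
    PRESCRIPTIVE = "prescriptive"


TARGET = "column_distinct_values_count"


def old_dispatch(plot_mode, metric_name):
    plot_impl = None  # assumed default when nothing matches
    if plot_mode is PlotMode.PRESCRIPTIVE:
        if metric_name == TARGET:
            plot_impl = "between_chart"  # stand-in for the prescriptive method
    else:
        if metric_name == TARGET:
            plot_impl = "multi_chart"  # stand-in for the descriptive method
    return plot_impl


def new_dispatch(plot_mode, metric_name):
    plot_impl = None  # assumed default when nothing matches
    if metric_name == TARGET:
        if plot_mode is PlotMode.PRESCRIPTIVE:
            plot_impl = "between_chart"
        else:
            plot_impl = "multi_chart"
    return plot_impl


# Exhaustively compare both versions over every mode and a non-matching metric.
mismatches = [
    (mode, name)
    for mode, name in product(PlotMode, [TARGET, "some_other_metric"])
    if old_dispatch(mode, name) != new_dispatch(mode, name)
]
```

An empty `mismatches` list means the two orderings of the conditionals select the same implementation for every input in this small domain.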

alexsherstinsky

LGTM

Shinnnyshinshin

love it. Thank you @cdkini

Shinnnyshinshin · 2022-05-25T21:09:22Z

great_expectations/rule_based_profiler/types/data_assistant_result/data_assistant_result.py

@@ -38,6 +39,8 @@
 )
 from great_expectations.types import ColorPalettes, Colors, SerializableDictDot

+ColumnDataFrame = namedtuple("ColumnDataFrame", ["column", "df"])


really really like this adjustment

NathanFarmer

LGTM!

NathanFarmer · 2022-05-26T13:24:47Z

great_expectations/rule_based_profiler/types/data_assistant_result/data_assistant_result.py

@@ -38,6 +39,8 @@
 )
 from great_expectations.types import ColorPalettes, Colors, SerializableDictDot

+ColumnDataFrame = namedtuple("ColumnDataFrame", ["column", "df"])


NathanFarmer · 2022-05-26T13:25:28Z

great_expectations/rule_based_profiler/types/data_assistant_result/data_assistant_result.py

@@ -564,7 +567,7 @@ def get_interactive_detail_multi_chart(
        batch_name: str = "batch"
        batch_identifiers: List[str] = [
            column
-            for column in column_dfs[0][1].columns
+            for column in column_dfs[0].df.columns


NathanFarmer · 2022-05-26T13:27:37Z

great_expectations/rule_based_profiler/types/data_assistant_result/data_assistant_result.py


-        domain = domains_by_column_name[domain_kwargs["column"]]
+        domain = domains_by_column_name[column_name]


NathanFarmer · 2022-05-26T13:29:29Z

great_expectations/rule_based_profiler/types/data_assistant_result/data_assistant_result.py

+
+        if metric_name == "column_distinct_values_count":
+            if plot_mode is PlotMode.PRESCRIPTIVE:


This looks good, but as we continue to add more metrics/expectations we will possibly need to rethink whether the metric names can be pulled from EXPECTATION_METRIC_MAP or whether this gets pushed down into child classes.

refactor: clean up data assistant result

3a729ab

github-actions bot added the core-team label May 25, 2022

cdkini commented May 25, 2022

chore: change up implemented metrics

08049ba

alexsherstinsky approved these changes May 25, 2022

cdkini added 2 commits May 25, 2022 17:02

Merge branch 'develop' of https://github.com/great-expectations/great…

f5e2ba4

…_expectations into maintenance/refactor-data-assistant-result

refactor: use get() in type_ check

e81f5d3

Shinnnyshinshin approved these changes May 25, 2022

cdkini requested a review from NathanFarmer May 26, 2022 12:14

Merge branch 'develop' into maintenance/refactor-data-assistant-result

470bde9

NathanFarmer approved these changes May 26, 2022

cdkini merged commit 4a71e72 into develop May 26, 2022

cdkini deleted the maintenance/refactor-data-assistant-result branch May 26, 2022 14:04
