[BUGFIX] Fix run_diagnostics for contrib expectations #3096

Merged

Conversation

talagluck
Contributor

@talagluck talagluck commented Jul 22, 2021

Changes proposed in this pull request:

  • This fixes a bug in the check_json_test_result function whereby the result of list.sort(), which is None, was being assigned to a variable when it should have been sorted(list) (see the sketch after this list).
  • In addition to switching from .sort() to sorted(), this also adds a key to sorted() so that sorting is robust when the items being compared have different data types.
  • As a result of fixing this function, a couple of failing tests came to light that had previously not been caught, so this addresses those as well. In the case of the test for expect_column_values_to_be_between, I removed the unexpected_list because of differences in timestamp parsing between pandas and sqlalchemy.
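
A minimal sketch of the difference, using illustrative values rather than the actual check_json_test_result code:

```python
value = ["b", ["a"], "c"]  # illustrative mixed-type list, not from the test suite

# Before: list.sort() sorts in place and returns None, so assigning its
# result silently replaced the variable with None.
# value = value.sort()  # value would now be None

# After: sorted() returns a new list. The str() key keeps the comparison
# from raising a TypeError when the items have mixed types (here a list
# alongside strings).
value = sorted(value, key=lambda x: str(x))
print(value)  # [['a'], 'b', 'c']
```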

Thanks to @manyshapes for surfacing!

Definition of Done

Please delete options that are not relevant.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

Thank you for submitting!

@netlify

netlify bot commented Jul 22, 2021

✔️ Deploy Preview for knoxpod ready!

🔨 Explore the source changes: 4891cf8

🔍 Inspect the deploy log: https://app.netlify.com/sites/knoxpod/deploys/60f99a57fb302a000733cf45

😎 Browse the preview: https://deploy-preview-3096--knoxpod.netlify.app

@github-actions
Contributor

HOWDY! This is your friendly 🤖 CHANGELOG bot 🤖

Please don't forget to add a clear and succinct description of your change under the Develop header in docs/changelog.rst, if applicable. This will help us with the release process. See the Contribution checklist in the Great Expectations documentation for the type of labels to use!

Thank you!

Member

@cdkini cdkini left a comment


I left a few minor comments but none of them are blocking so LGTM 🚀

This is more of a general software engineering question, but if we find bugs that haven't been caught by our test suite, should we add new tests to cover the missing functionality when fixing the bug? So in the case of None being returned by sort(), should we add a test that ensures we properly handle the cases that failed before the change to sorted()?

Comment on lines +1907 to +1910
value = sorted(value, key=lambda x: str(x))
result["result"]["unexpected_list"] = sorted(
result["result"]["unexpected_list"], key=lambda x: str(x)
)
Member


Ah, sort() vs sorted() is always annoying. Great catch!

What are the types of value and unexpected_list here? Do we need the custom comparator for sorting? I think it's perfectly fine, but I'm just asking for my own knowledge.

Contributor Author


The types for both of these should be list. We do a check on L1905 to see if value is a list, and we only get here if that resolves to True. We also assume that unexpected_list is a list, and I think that's just a fundamental part of expectations (I'm not sure whether we test that anywhere). I can add some typing here as well - I'm not going to add it everywhere in this module, so it may look a bit funny, but if adding it here makes it clearer, I can do that.
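
A paraphrased sketch of that control flow, based on the description above (the function name and signature are hypothetical, not the actual source):

```python
from typing import Any, Dict

def _sort_for_comparison(value: Any, result: Dict[str, Any]) -> Any:
    # Hypothetical wrapper: only sort once we know the value is a list,
    # mirroring the isinstance check mentioned above.
    if isinstance(value, list):
        value = sorted(value, key=lambda x: str(x))
        result["result"]["unexpected_list"] = sorted(
            result["result"]["unexpected_list"], key=lambda x: str(x)
        )
    return value
```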

Member


Nope I think that explanation is more than sufficient for me :) thanks for clarifying.

Contributor Author


Ah, and the custom comparator is there because of an issue that arose when the values in the list were of different types. If one of the values was a list but the rest were strings, sorting would raise an error when trying to compare items of different types.
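
An illustrative example of that failure mode (the values are made up):

```python
items = ["2001-01-01", ["nested"], "1970-01-01"]

# Without a key, Python refuses to compare a list with a string.
try:
    sorted(items)
except TypeError as exc:
    print(exc)  # '<' not supported between instances of 'list' and 'str'

# Coercing every item to its string representation makes them comparable.
print(sorted(items, key=str))  # ['1970-01-01', '2001-01-01', ['nested']]
```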

Comment on lines 199 to 200
"out": {
"unexpected_list": ["Jan 01 2001 12:00:01"],
"unexpected_index_list": [0, 9],
Member


Could you clarify what this removal is doing exactly?

Contributor Author


Essentially, this json test is running a test for this expectation for both pandas and sqlalchemy. I believe that parse_strings_as_datetimes functions differently for each of these. In the case of pandas, the unexpected list would look like what was there (though it was previously missing the 1970 date from L9). However, in the case of sqlalchemy, this date actually gets parsed into "2001-01-01 12:00:01" or thereabouts. (I also might be mixing up pandas and sqlalchemy here.) Though the expectation result is correct for both backends, the unexpected list for each would be slightly different. I could split out the section completely for the two backends, but that adds a fair amount of complexity, and we get most of the functionality from unexpected_index_list, which tells us the indices of the unexpected dates. So I decided to go with that approach.
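
A hedged illustration of the change, written as a Python dict rather than the JSON fixture itself (values mirror the snippet quoted above):

```python
case_out = {
    # "unexpected_list": ["Jan 01 2001 12:00:01"],  # removed: the rendering of the
    # parsed timestamp differs between the pandas and sqlalchemy backends
    "unexpected_index_list": [0, 9],  # still pins down which rows were unexpected
}
```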

Member


That makes a lot of sense as well. Do you think we should add a ticket about bifurcating these backends to our backlog? I think that split might be helpful but I'm not sure if the value it'll provide is worth the effort.

Contributor Author


My gut tells me that the bifurcation would not be worthwhile. I think it would really only help in very severe edge cases, like this one, and it would introduce a bunch of complexity and duplicated code.

It might be worth revisiting this testing infrastructure entirely to make sure that it's effective for V3. When we initially set it up, it was basically a port of V2 functionality to V3. I'm not sure how much additional thought and work has gone into it since then, but it could probably stand some updating.

@talagluck
Contributor Author

Thanks for the review, @cdkini! To your point about adding test coverage - the way I'm looking at this, we thought we had test coverage here, and we didn't, due to the sorted/sort issue. That is fixed now, so we should catch any errors not previously caught (which is the case in the two other files amended here). Adding additional test coverage feels like we would be testing our testing capability. Am I thinking about this right? Happy to chat.

@cdkini
Member

cdkini commented Jul 22, 2021

Hmm I think that makes sense. My question was more general and less specific to this PR so it may have been better to have discussed this externally. Sorry if it's off-topic.

The change you made is a very internal part of the method and so, I think isolating that specific behavior in a test would be difficult. Plus, more tests are not always the answer. I more so asked the question to get a better sense of when and when not to add tests but I think the points you've made are valid and I'm happy to have this merged as is!

@talagluck talagluck enabled auto-merge (squash) July 22, 2021 16:19
@talagluck talagluck merged commit a3f5f8f into develop Jul 22, 2021
@talagluck talagluck deleted the bugfix/fix_run_diagnostics_for_contrib_expectations branch July 22, 2021 16:53
Shinnnyshinshin pushed a commit that referenced this pull request Jul 23, 2021