[BUGFIX] Fix conditional for pyspark compatibility #8108
Conversation
LGTM, thanks!
great_expectations/core/batch.py (Outdated)

```diff
@@ -690,9 +690,9 @@ def head(self, *args, **kwargs) -> pd.DataFrame:
         return pd.DataFrame({})

-    if pyspark:
+    if pyspark_DataFrame:  # type: ignore[truthy-function]
```
@anthonyburdi This does not sound right -- per your own idea, I thought that we could actually make `pyspark.DataFrame` the way of passing the `DataFrame` (and I got to appreciate that idea!). Happy to discuss. Thanks.
Hi @alexsherstinsky, the issue with the code as-is was that the `if pyspark:` as written was checking the truthiness of the `compatibility/pyspark.py` module, not whether pyspark was installed. We could have changed it to `if pyspark.pyspark:`, but since only the dataframe is imported, we thought this might be cleaner.
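The bug described above can be demonstrated in a few lines. This is a hypothetical stand-in shim (the name `compat_pyspark` and the `None` sentinel are illustrative, not the actual great_expectations layout): a module object is always truthy once imported, so `if <module>:` cannot tell you whether the optional dependency behind it is installed.

```python
import types

# Hypothetical stand-in for a compatibility shim module; it imports
# fine even when the optional dependency (pyspark) is missing.
compat = types.ModuleType("compat_pyspark")

# Simulate the shim's guarded import when pyspark is NOT installed:
# the exported name falls back to a falsy placeholder (None here; the
# real code base may use a dedicated sentinel).
compat.DataFrame = None

# The buggy check: a module object is always truthy, so this branch
# would run even though pyspark is missing.
assert bool(compat) is True

# The fixed check: test the imported name itself, which is falsy
# when the optional import failed.
assert bool(compat.DataFrame) is False
```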
@anthonyburdi You got me to believe that using the pattern `if pyspark.pyspark:` or `if pyspark.DataFrame:` would work -- and it did! So I utilized it uniformly; the benefit was full scoping (no need to create special variables). I would vote for that approach, because if `pyspark.DataFrame` does not exist, `SPARK_NOT_IMPORTED` will produce the relevant error message. Thanks!
That makes sense @alexsherstinsky; the issue here, though, is that the `if` statement was effectively not checking for `SPARK_NOT_IMPORTED` but instead for the existence of the `pyspark` compatibility module. That is a real bug, and it is what this PR fixes.
@alexsherstinsky fixed, thanks for the discussion
@anthonyburdi I do not yet see why this change is needed. Happy to discuss. Thanks!
great_expectations/core/batch.py (Outdated)

```diff
@@ -690,9 +690,9 @@ def head(self, *args, **kwargs) -> pd.DataFrame:
         return pd.DataFrame({})

-    if pyspark:
+    if pyspark_DataFrame:  # type: ignore[truthy-function]
```
@anthonyburdi can you add a comment to this type ignore explaining why the `truthy-function` suppression is "okay"?
Thanks for the catch!
LGTM! Thank you!
…m/great-expectations/great_expectations into m/_/sqlalchemy2-pandas2-follow-up

* 'm/_/sqlalchemy2-pandas2-follow-up' of https://github.com/great-expectations/great_expectations:
  * [BUGFIX] Robust Handling Of Column Types And Empty DataFrames For DataBricks/Spark Environment (#8115)
  * [BUGFIX] respect result format bool only for validators and checkpoints (#8111)
  * [MAINTENANCE] Update build_in_memory_runtime_context to accept which datasources to include (#8017)
  * [BUGFIX] Import dataframe directly (#8108)
  * [MAINTENANCE] Cleanup generate_expectation_tests (#8019)
  * [MAINTENANCE] Ensure that new usage statistics schema changes are backwards compatible (#8109)
Looks like we were checking the compatibility module instead of the actual pyspark import. The issue is that the truthiness of the module does not change whether or not pyspark is installed, which is what the `if pyspark:` check was trying to test.
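A hedged sketch of the fixed pattern (the helper `head_sketch` and its return values are illustrative; only the shape of the conditional mirrors the PR): import the specific attribute with a falsy fallback, then branch on the imported name rather than on the module.

```python
try:
    # Import the one name we need; this fails cleanly when pyspark is absent.
    from pyspark.sql import DataFrame as pyspark_DataFrame
except ImportError:
    # Falsy fallback; the real code base may use a dedicated sentinel instead.
    pyspark_DataFrame = None

def head_sketch(data):
    """Illustrative helper: choose a code path based on pyspark availability."""
    # The corrected conditional tests the imported name, not the module,
    # so it is falsy whenever pyspark is not installed.
    if pyspark_DataFrame and isinstance(data, pyspark_DataFrame):
        return "spark path"
    return "pandas path"
```

Because `pyspark_DataFrame` short-circuits the `and`, the `isinstance` call is never reached when pyspark is missing, so the function degrades gracefully either way.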