
[MAINTENANCE] Adding serialization tests for Spark #5897

Conversation

@Shinnnyshinshin (Contributor) commented Aug 31, 2022

Changes proposed in this pull request:

  • Two tests added to test_serialization for serializing objects with SparkDFExecutionEngine:
  • one is a Checkpoint test
  • one is a Datasource test
  • Both are set up to allow the serialization to happen automatically with a Spark schema object.

After submitting your PR, CI checks will run and @cla-bot will check for your CLA signature.

For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.

In a design review, we aim to ensure that the PR is consistent with our relationship to the open source community, with our software architecture and abstractions, and with our users' needs and expectations. That review often starts well before a PR, for example in GitHub issues or Slack, so please link to relevant conversations in the notes below to help reviewers understand and approve your PR more quickly (e.g. closes #123).

Previous Design Review notes:

Definition of Done

Please delete options that are not relevant.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing
  • I have run any local integration tests and made sure that nothing is broken

Thank you for submitting!

@netlify
netlify bot commented Aug 31, 2022

Deploy Preview for niobium-lead-7998 ready!

🔨 Latest commit: e987124
🔍 Latest deploy log: https://app.netlify.com/sites/niobium-lead-7998/deploys/630fd29f293f30000876de12
😎 Deploy Preview: https://deploy-preview-5897--niobium-lead-7998.netlify.app


@@ -696,6 +704,133 @@ def test_checkpoint_config_and_nested_objects_are_serialized(
)


@pytest.mark.unit
def test_checkpoint_config_and_nested_objects_are_serialized_spark(spark_session):
Member commented:

By nature of having a Spark session here, I think this is an integration test

Contributor Author replied:

You're right. Updated.

)


def test_serialization_of_datasource_with_nested_objects_spark(spark_session):
Member commented:

Please annotate with mark 🙇🏽

Comment on lines +801 to +824
expected_serialized_datasource_config: dict = {
"data_connectors": {
"configured_asset_connector": {
"assets": {
"my_asset": {
"batch_spec_passthrough": {"reader_options": {"header": True}},
"class_name": "Asset",
"module_name": "great_expectations.datasource.data_connector.asset",
}
},
"class_name": "ConfiguredAssetFilesystemDataConnector",
"module_name": "great_expectations.datasource.data_connector.configured_asset_filesystem_data_connector",
}
},
"execution_engine": {
"class_name": "SparkDFExecutionEngine",
"module_name": "great_expectations.execution_engine.sparkdf_execution_engine",
},
"module_name": "great_expectations.datasource",
"class_name": "Datasource",
"name": "taxi_data",
}

observed_dump = datasourceConfigSchema.dump(obj=datasource_config)
Member commented:

I think we could take all of this and combine it into one test (test_serialization_something_about_spark) and parameterize the schema and expected return values.
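The parameterization suggested here could look roughly like the sketch below. This is illustrative only: the stub schema and config classes are hypothetical stand-ins for the real marshmallow schemas (checkpointConfigSchema, datasourceConfigSchema) and config objects, and an actual test would drive the cases through @pytest.mark.parametrize with the appropriate markers.

```python
# Hypothetical sketch of parameterizing the serialization tests.
# _StubSchema/_StubConfig stand in for the real marshmallow schemas
# and config objects used in Great Expectations' test suite.

class _StubSchema:
    """Minimal stand-in for a marshmallow schema exposing .dump()."""

    def __init__(self, fields):
        self._fields = fields

    def dump(self, obj):
        # Serialize only the declared fields, mirroring schema.dump(obj=...)
        return {name: getattr(obj, name) for name in self._fields}


class _StubConfig:
    """Minimal stand-in for a CheckpointConfig/DatasourceConfig object."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


# (schema, config, expected) triples -- with pytest these would feed
# @pytest.mark.parametrize("schema, config, expected", SERIALIZATION_CASES)
SERIALIZATION_CASES = [
    (
        _StubSchema(["class_name", "name"]),
        _StubConfig(class_name="Datasource", name="taxi_data"),
        {"class_name": "Datasource", "name": "taxi_data"},
    ),
    (
        _StubSchema(["class_name"]),
        _StubConfig(class_name="Checkpoint"),
        {"class_name": "Checkpoint"},
    ),
]


def run_serialization_cases(cases):
    """Check each schema's dump output against its expected dict."""
    for schema, config, expected in cases:
        assert schema.dump(obj=config) == expected
    return len(cases)
```

In the real suite, each case would also carry its own pytest marker (e.g. an integration marker for cases that need a Spark session, per the earlier review comment).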

Contributor Author replied:

For this one I don't feel as comfortable, because we are testing two separate objects: CheckpointConfig and DatasourceConfig. The rest of the tests keep them separated.

Contributor Author added:

As a follow-up to this PR, I'm planning to add two more tests (one CheckpointConfig test and one DatasourceConfig test) that exercise the pre_dump() logic on the schema object. I think that would be a more appropriate way to parameterize the values... what do you think?

Member replied:

works for me! thank you 🙇🏽

…rialization-checkpoint-and-datasource-spark
@Shinnnyshinshin Shinnnyshinshin merged commit 28df031 into develop Aug 31, 2022
@Shinnnyshinshin Shinnnyshinshin deleted the m/GREAT-465/GREAT-1204/adding-test-for-serialization-checkpoint-and-datasource-spark branch August 31, 2022 22:45
Shinnnyshinshin pushed a commit that referenced this pull request Sep 1, 2022
…in-spark' of https://github.com/great-expectations/great_expectations into f/GREAT-465/GREAT-1204/adding-serialization-for-schema-in-spark

* 'f/GREAT-465/GREAT-1204/adding-serialization-for-schema-in-spark' of https://github.com/great-expectations/great_expectations:
  [BUGFIX] Patch issue with `checkpoint_identifier` within `Checkpoint.run` workflow (#5894)
  [MAINTENANCE] Adding serialization tests for Spark (#5897)
  [MAINTENANCE] Add slow pytest marker to config and sort them alphabetically. (#5892)