[MAINTENANCE] Adding serialization tests for Spark #5897
Conversation
✅ Deploy Preview for niobium-lead-7998 ready!
@@ -696,6 +704,133 @@ def test_checkpoint_config_and_nested_objects_are_serialized(
    )


@pytest.mark.unit
def test_checkpoint_config_and_nested_objects_are_serialized_spark(spark_session):
By nature of having a Spark session here, I think this is an integration test.
You're right. Updated.
    )


def test_serialization_of_datasource_with_nested_objects_spark(spark_session):
Please annotate with mark 🙇🏽
    expected_serialized_datasource_config: dict = {
        "data_connectors": {
            "configured_asset_connector": {
                "assets": {
                    "my_asset": {
                        "batch_spec_passthrough": {"reader_options": {"header": True}},
                        "class_name": "Asset",
                        "module_name": "great_expectations.datasource.data_connector.asset",
                    }
                },
                "class_name": "ConfiguredAssetFilesystemDataConnector",
                "module_name": "great_expectations.datasource.data_connector.configured_asset_filesystem_data_connector",
            }
        },
        "execution_engine": {
            "class_name": "SparkDFExecutionEngine",
            "module_name": "great_expectations.execution_engine.sparkdf_execution_engine",
        },
        "module_name": "great_expectations.datasource",
        "class_name": "Datasource",
        "name": "taxi_data",
    }

    observed_dump = datasourceConfigSchema.dump(obj=datasource_config)
I think we could take all of this, combine it into one test (test_serialization_something_about_spark), and parameterize the schema and expected return values.
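The suggestion could look something like the following sketch. The test name, tuple contents, and fixtures are placeholders, not the actual objects from this PR:

```python
import pytest

# Hypothetical sketch: a single Spark serialization test parameterized over
# (schema, config, expected) tuples instead of one test per object type.
@pytest.mark.parametrize(
    "schema, config, expected",
    [
        # (checkpointConfigSchema, checkpoint_config, expected_checkpoint_dict),
        # (datasourceConfigSchema, datasource_config, expected_datasource_dict),
    ],
)
def test_serialization_spark(schema, config, expected, spark_session):
    assert schema.dump(obj=config) == expected
```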
For this one I don't feel as comfortable, because we are testing two separate objects: CheckpointConfig and DatasourceConfig. I feel like the rest of the tests keep them separated.
As a follow-up to this PR, I'm planning on adding 2 more tests (one as a CheckpointConfig test and one as a DatasourceConfig test), where we exercise the pre_dump() logic on the schema object. I think that would be a more appropriate way to parameterize the values. What do you think?
Works for me! Thank you 🙇🏽
…rialization-checkpoint-and-datasource-spark

Merge branch 'f/GREAT-465/GREAT-1204/adding-serialization-for-schema-in-spark' of https://github.com/great-expectations/great_expectations, bringing in:
- [BUGFIX] Patch issue with `checkpoint_identifier` within `Checkpoint.run` workflow (#5894)
- [MAINTENANCE] Adding serialization tests for Spark (#5897)
- [MAINTENANCE] Add slow pytest marker to config and sort them alphabetically. (#5892)
Changes proposed in this pull request:
- test_serialization for serializing objects with SparkDFExecutionEngine:
  - Checkpoint test
  - Datasource test

After submitting your PR, CI checks will run and @cla-bot will check for your CLA signature.
For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.
In a design review, we aim to ensure that the PR is consistent with our relationship to the open source community, with our software architecture and abstractions, and with our users' needs and expectations. That review often starts well before a PR, for example in GitHub issues or Slack, so please link to relevant conversations in notes below to help reviewers understand and approve your PR more quickly (e.g. closes #123).

Previous Design Review notes:
Definition of Done
Please delete options that are not relevant.
Thank you for submitting!