[FEATURE] Allowing schema to be passed in as batch_spec_passthrough in Spark #5900
Conversation
@pre_dump
def prepare_dump(self, data, **kwargs):
`batch_spec_passthrough` can be at the `Asset`-level
@pre_dump
def prepare_dump(self, data, **kwargs):
`batch_spec_passthrough` can be at the `DataConnector`-level
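The two review threads above make the same point at different levels of the configuration. As an illustrative, stdlib-only sketch (plain dicts rather than the real great_expectations config objects; all names here are hypothetical), `batch_spec_passthrough` can sit on the data connector or on an individual asset:

```python
# Hypothetical sketch, NOT the real great_expectations API: shows the two
# places a batch_spec_passthrough block can live in a datasource config.
datasource_config = {
    "name": "my_spark_datasource",
    "class_name": "Datasource",
    "execution_engine": {"class_name": "SparkDFExecutionEngine"},
    "data_connectors": {
        "my_data_connector": {
            "class_name": "ConfiguredAssetFilesystemDataConnector",
            "batch_spec_passthrough": {          # DataConnector-level
                "reader_options": {"header": True},
            },
            "assets": {
                "taxi_data": {
                    "batch_spec_passthrough": {  # Asset-level (more specific)
                        "reader_options": {"header": True},
                    },
                },
            },
        },
    },
}

# Both placements are reachable by walking the config:
connector = datasource_config["data_connectors"]["my_data_connector"]
assert "batch_spec_passthrough" in connector
assert "batch_spec_passthrough" in connector["assets"]["taxi_data"]
```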
67a0f5e to 22f5c01
…on-for-schema-in-spark
@pytest.mark.parametrize(
    "checkpoint_config,expected_serialized_checkpoint_config",
    [
        pytest.param(
the same tests are now parameterized :)
@pytest.mark.integration
def test_serialization_of_datasource_with_nested_objects_spark(spark_session):
    datasource_config: DatasourceConfig = DatasourceConfig(
        name="taxi_data",
        class_name="Datasource",
        module_name="great_expectations.datasource",
        execution_engine=ExecutionEngineConfig(
the same tests are now parameterized :)
…in-spark' of https://github.com/great-expectations/great_expectations into f/GREAT-465/GREAT-1204/adding-serialization-for-schema-in-spark

* 'f/GREAT-465/GREAT-1204/adding-serialization-for-schema-in-spark' of https://github.com/great-expectations/great_expectations:
  [BUGFIX] Patch issue with `checkpoint_identifier` within `Checkpoint.run` workflow (#5894)
  [MAINTENANCE] Adding serialization tests for Spark (#5897)
  [MAINTENANCE] Add slow pytest marker to config and sort them alphabetically. (#5892)
This reverts commit 41dd255.
Nice work! LGTM!
Love it!
if schema and isinstance(schema, StructType):
    data["batch_spec_passthrough"]["reader_options"][
        "schema"
    ] = schema.jsonValue()
❤️
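The heart of the change is the conversion in the snippet above: a pyspark `StructType` is not JSON-serializable, but the dict returned by its `jsonValue()` method is. A self-contained sketch of that idea, using a minimal stand-in class instead of pyspark (so `FakeStructType` and its field layout are my assumptions, not code from this PR):

```python
import json


class FakeStructType:
    """Minimal stand-in for pyspark.sql.types.StructType: not directly
    JSON-serializable, but exposes jsonValue() like the real class."""

    def __init__(self, fields):
        self.fields = fields

    def jsonValue(self):
        return {"type": "struct", "fields": self.fields}


def prepare_dump(data):
    # Mirrors the logic in the diff: if a schema object is present under
    # batch_spec_passthrough.reader_options, replace it with its JSON form.
    reader_options = data.get("batch_spec_passthrough", {}).get("reader_options", {})
    schema = reader_options.get("schema")
    if schema and isinstance(schema, FakeStructType):
        data["batch_spec_passthrough"]["reader_options"]["schema"] = schema.jsonValue()
    return data


data = {
    "batch_spec_passthrough": {
        "reader_options": {
            "schema": FakeStructType(
                [{"name": "a", "type": "integer", "nullable": True}]
            )
        }
    }
}

# Before the hook, json.dumps(data) would raise TypeError; afterwards it succeeds.
serialized = json.dumps(prepare_dump(data))
```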
tests/core/test_serialization.py
"schema": StructType(
    [
        StructField("a", IntegerType(), True, None),
        StructField("b", IntegerType(), True, None),
    ]
This is leveling up huge!!
…on-for-schema-in-spark
…on-for-schema-in-spark
* develop: [MAINTENANCE] Remove xfails from passing tests in preparation for 0.15.21 release (#5908)
…on-for-schema-in-spark
* develop: [DOCS] DOC-368 spelling correction (#5912)
Changes proposed in this pull request:

- Spark schemas are `StructType` objects, which are not JSON-serializable. This PR adds a `schema.jsonValue()` call, which translates the object into `json`, whenever a schema is passed in as a `batch_spec_passthrough` parameter.
- `schema` can be passed in at the `Asset`-level and the `DataConnector`-level. `schema` can also be passed in as part of a `BatchRequest` that is part of a `Checkpoint`.
- Updates `prepare_dump` and `convert_to_json_serializable()` and adds integration tests to `test_serialization.py`.
- Related: `SparkDFExecutionEngine` able to use `schema` (#5917)

Definition of Done
Thank you for submitting!
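As a closing check on what this change buys: once a schema is in JSON form, a `BatchRequest` carrying it inside `batch_spec_passthrough` survives a JSON round trip, which is what `Checkpoint` serialization needs. A stdlib-only sketch (the schema dict mimics the shape pyspark's `StructType.jsonValue()` returns, and the request field values are illustrative assumptions, not taken from this PR):

```python
import json

# A schema already converted to its JSON form. The layout below is modeled
# on what pyspark's StructType.jsonValue() returns; treat it as an assumption.
schema_json = {
    "type": "struct",
    "fields": [
        {"name": "a", "type": "integer", "nullable": True, "metadata": {}},
        {"name": "b", "type": "integer", "nullable": True, "metadata": {}},
    ],
}

# A batch request carrying the schema via batch_spec_passthrough
# (datasource/connector/asset names are illustrative).
batch_request = {
    "datasource_name": "taxi_data",
    "data_connector_name": "my_data_connector",
    "data_asset_name": "taxi_csv",
    "batch_spec_passthrough": {"reader_options": {"schema": schema_json}},
}

# The config survives a JSON round trip unchanged, which is exactly the
# property the StructType -> jsonValue() conversion provides.
assert json.loads(json.dumps(batch_request)) == batch_request
```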