[MAINTENANCE] FDS Documentation - Creating ExpectationSuite with Domain Knowledge #7852
Conversation
* develop: (79 commits)
  * [DOCS] Removing datasource centric test_yaml_config doc (#7836)
  * [FEATURE] Splitters work with Spark Fluent Datasources (#7832)
  * [MAINTENANCE] New PR template (#7710)
  * [DOCS] Update how to create a checkpoint with Test YAML config (#7835)
  * [DOCS] Updating Checkpoint terms page (#7722)
  * [BUGFIX] Adding support for Fluent Batch Requests to context.get_validator (#7808)
  * [FEATURE] Plumbing of validation_result_url from cloud response (#7809)
  * [MAINTENANCE] Enable S3/Spark Connecting To Your Data tests (#7828)
  * [BUGFIX] Handle "persist" directive in "SparkDFExecutionEngine" properly. (#7830)
  * [DOCS] Update docs for how_to_initialize_a_filesystem_data_context_in_python (#7831)
  * [MAINTENANCE] Clean up: Remove duplicated fixture and utilize deeper filtering mechanism for configuration assertions. (#7825)
  * [MAINTENANCE] Enable Spark-S3 Integration tests on Azure CI/CD (#7819)
  * [MAINTENANCE] FDS - Datasources can rebuild their own asset data_connectors (#7826)
  * [DOCS] Prerequisites Cleanup (#7811)
  * [BUGFIX] Azure Package Presence/Absence Tests Strengthening (#7818)
  * [RELEASE] 0.16.11 (#7824)
  * [MAINTENANCE] Fix pin count. (#7823)
  * [BUGFIX] Upper bound `pyathena` due to breaking API in V3 (#7821)
  * [MAINTENANCE] Fix linting error. (#7820)
  * [BUGFIX] Cloud - Fix FDS Asset has no attribute `_data_connector` (#7813)
  * ...
* develop:
  * [MAINTENANCE] Prevent TCH001 warnings for pydantic model annotations (#7846)
  * [BUGFIX] FDS - Deletes not immediately reflected in `great_expectations.yml` (#7843)
  * [MAINTENANCE] Boto import pattern established (#7796)
  * [DOCS] Creating a Checkpoint from an In-Memory Dataframe (#7701)
  * [MAINTENANCE] ruff `.0.262 -> .0.265` (#7829)
…ithub.com/great-expectations/great_expectations into d/dx-441/new_guide_edit_expectation_suite

* 'd/dx-441/new_guide_edit_expectation_suite' of https://github.com/great-expectations/great_expectations:
  * [MAINTENANCE] Clean up version checker message formatting (#7838)
  * [BUGFIX] `batching_regex` tags are now correctly rendered in docs (#7845)
  * [MAINTENANCE] Adding docs link checker to invoke (#7841)
  * [MAINTENANCE] Pin altair (#7849)
Just a few small items to address.
...o_create_and_edit_expectations_based_on_domain_knowledge_without_inspecting_data_directly.md
### 2. Use an existing Data Asset to create a Batch Request

Add the following method to retrieve a previously configured Data Asset from the Data Context you initialized, and create a Batch Request to identify the Batch of data that you'll use to validate your Expectations:

```python name="tests/integration/docusaurus/expectations/how_to_create_and_edit_an_expectationsuite_domain_knowledge.py get_data_asset_and_build_batch_request"
```

:::info Limit the Batches returned by a Batch Request

You can provide a dictionary as the `options` parameter of `build_batch_request()` to limit the Batches returned by a Batch Request. If you leave the `options` parameter empty, your Batch Request will include all the Batches configured in the corresponding Data Asset. For more information about Batch Requests, see [How to request data from a Data Asset](/docs/guides/connecting_to_your_data/fluent/batch_requests/how_to_request_data_from_a_data_asset).

:::
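To make the filtering behavior concrete, here is a small pure-Python sketch (this is not Great Expectations code; the batch metadata keys and values are invented for illustration) of how an `options`-style dictionary acts as a filter over the Batches a Data Asset exposes:

```python
# Illustrative sketch only: models how an options dict narrows a set of Batches.
# The batch metadata below (year/month keys) is invented for this example.

def filter_batches(batches, options=None):
    """Return batches whose metadata matches every key/value pair in options.

    An empty or missing options dict matches all batches, mirroring how
    build_batch_request() with no options includes every configured Batch.
    """
    options = options or {}
    return [
        batch for batch in batches
        if all(batch.get(key) == value for key, value in options.items())
    ]

# Hypothetical batch metadata, e.g. one Batch per month of a CSV asset.
batches = [
    {"year": "2019", "month": "01"},
    {"year": "2019", "month": "02"},
    {"year": "2020", "month": "01"},
]

print(len(filter_batches(batches)))                    # no options: all batches
print(len(filter_batches(batches, {"year": "2019"})))  # narrowed to 2019 batches
```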
Since we are not using the batch to create an expectation suite, would it make sense to remove this section (and re-number the rest)?
Ohhhhh I understand now. Yes, you're right. I think I had modeled this a bit "too" closely with the validator-based guide :)
```python
context.sources.add_pandas(name="my_datasource").add_csv_asset(
    name="my_data_asset", filepath_or_buffer="./data/yellow_tripdata_sample_2019-01.csv"
)

# <snippet name="tests/integration/docusaurus/expectations/how_to_create_and_edit_an_expectationsuite_domain_knowledge.py get_data_asset_and_build_batch_request">
data_asset = context.get_datasource("my_datasource").get_asset("my_data_asset")
batch_request = data_asset.build_batch_request()
# </snippet>
```
Since we aren't using a batch, can we get rid of this section?
yup, you're absolutely right
```python
)
suite.add_expectation(expectation_configuration=expectation_configuration)
# </snippet>
```
How about an assertion about the content of the expectation suite after adding these expectations? I think that would be valuable to make sure that the suite contains all the expectations we added.
great idea. Do you mean in the script only, or in the guide too?
Good question! Just in the script so it's not just a smoke test. With this assert we will know that as long as this test passes that someone following the guide will get the end result we expect.
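A sketch of what such an assertion could look like, using plain dictionaries in place of real `ExpectationConfiguration` objects (the expectation types shown are standard Great Expectations names, but the specific columns and bounds are invented for illustration):

```python
# Sketch only: verify a suite contains exactly the expectations the guide adds.
# Plain dicts stand in for ExpectationConfiguration objects; column names and
# kwargs here are hypothetical.

suite_expectations = [
    {
        "expectation_type": "expect_column_values_to_not_be_null",
        "kwargs": {"column": "passenger_count"},
    },
    {
        "expectation_type": "expect_column_values_to_be_between",
        "kwargs": {"column": "passenger_count", "min_value": 1, "max_value": 6},
    },
]

expected_types = {
    "expect_column_values_to_not_be_null",
    "expect_column_values_to_be_between",
}

# Assert the suite holds every expectation we added, so the integration test
# fails loudly if the guide's steps stop producing the expected end state.
actual_types = {e["expectation_type"] for e in suite_expectations}
assert actual_types == expected_types, (
    f"Suite is missing expectations: {expected_types - actual_types}"
)
print("Suite contains all expected expectations.")
```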
Thank you for the review @anthonyburdi. The PR has been updated and is ready for re-review.
LGTM! Thanks for making those changes 🙇
* develop:
  * [FEATURE] Add Spark DeltaAsset type (#7872)
  * [DOCS] CLI Edits (#7865)
  * [FEATURE] Spark directory asset types (#7873)
  * [MAINTENANCE] FDS Documentation - Creating ExpectationSuite with Domain Knowledge (#7852)
  * [DOCS] Update "How to use Great Expectations with Databricks" (#7762)
  * [DOCS] removes remaining Block-config Datasource guides (#7870)
  * [FEATURE] Spark file reader support for fluent datasources (#7844)
What does this PR propose?
Updates "How to create and edit Expectations based on domain knowledge, without inspecting data directly" to be in line with FDS changes, removing references to the CLI.
Checklist

- Description of PR changes above includes a link to an existing GitHub issue
- PR title is prefixed with one of: [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], [CONTRIB]
- Code is linted
- Appropriate tests and docs have been updated
For more details, see our Contribution Checklist, Coding style guide, and Documentation style guide.
After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!