Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MAINTENANCE] FDS Documentation - Creating ExpectationSuite with Domain Knowledge #7852

Merged
merged 19 commits into from
May 12, 2023

Conversation

Shinnnyshinshin
Copy link
Contributor

@Shinnnyshinshin Shinnnyshinshin commented May 9, 2023

What does this PR propose?

Checklist

  • Description of PR changes above includes a link to an existing GitHub issue

  • PR title is prefixed with one of: [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], [CONTRIB]

  • Code is linted

    black .
    
    ruff . --fix
    
  • Appropriate tests and docs have been updated

For more details, see our Contribution Checklist, Coding style guide, and Documentation style guide.

After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!

anthonyburdi and others added 7 commits April 26, 2023 18:09
* develop: (79 commits)
  [DOCS] Removing datasource centric test_yaml_config doc (#7836)
  [FEATURE] Splitters work with Spark Fluent Datasources (#7832)
  [MAINTENANCE] New PR template (#7710)
  [DOCS] Update how to create a checkpoint with Test YAML config (#7835)
  [DOCS] Updating Checkpoint terms page (#7722)
  [BUGFIX] Adding support for Fluent Batch Requests to context.get_validator (#7808)
  [FEATURE] Plumbing of validation_result_url from cloud response (#7809)
  [MAINTENANCE] Enable S3/Spark Connecting To Your Data tests (#7828)
  [BUGFIX] Handle "persist" directive in "SparkDFExecutionEngine" properly. (#7830)
  [DOCS] Update docs for how_to_initialize_a_filesystem_data_context_in_python (#7831)
  [MAINTENANCE] Clean up: Remove duplicated fixture and utilize deeper filtering mechanism for configuration assertions. (#7825)
  [MAINTENANCE] Enable Spark-S3 Integration tests on Azure CI/CD (#7819)
  [MAINTENANCE] FDS - Datasources can rebuild their own asset data_connectors (#7826)
  [DOCS] Prerequisites Cleanup (#7811)
  [BUGFIX] Azure Package Presence/Absence Tests Strengthening (#7818)
  [RELEASE] 0.16.11 (#7824)
  [MAINTENANCE] Fix pin count. (#7823)
  [BUGFIX] Upper bound `pyathena` due to breaking API in V3 (#7821)
  [MAINTENANCE] Fix linting error. (#7820)
  [BUGFIX] Cloud - Fix FDS Asset has no attribute `_data_connector` (#7813)
  ...
* develop:
  [MAINTENANCE] Prevent TCH001 warnings for pydantic model annotations (#7846)
  [BUGFIX] FDS - Deletes not immediately reflected in `great_expectations.yml` (#7843)
  [MAINTENANCE] Boto import pattern established (#7796)
  [DOCS] Creating a Checkpoint from an In-Memory Dataframe (#7701)
  [MAINTENANCE] ruff `.0.262 -> .0.265` (#7829)
@netlify
Copy link

netlify bot commented May 9, 2023

Deploy Preview for niobium-lead-7998 ready!

Name Link
🔨 Latest commit 0d83b0a
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/645e6145375c8d00084b49c7
😎 Deploy Preview https://deploy-preview-7852--niobium-lead-7998.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@ghost
Copy link

ghost commented May 9, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

…ithub.com/great-expectations/great_expectations into d/dx-441/new_guide_edit_expectation_suite

* 'd/dx-441/new_guide_edit_expectation_suite' of https://github.com/great-expectations/great_expectations:
  [MAINTENANCE] Clean up version checker message formatting (#7838)
  [BUGFIX] `batching_regex` tags are now correctly rendered in docs (#7845)
  [MAINTENANCE] Adding docs link checker to invoke (#7841)
  [MAINTENANCE] Pin altair (#7849)
@Shinnnyshinshin Shinnnyshinshin changed the title [MAINTENANCE] FDS Documentation - Editing ExpectationSuite with Domain Knowledge [MAINTENANCE] FDS Documentation - Creating ExpectationSuite with Domain Knowledge May 10, 2023
@Shinnnyshinshin Shinnnyshinshin marked this pull request as ready for review May 10, 2023 16:38
@Shinnnyshinshin Shinnnyshinshin self-assigned this May 10, 2023
Copy link
Member

@anthonyburdi anthonyburdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small items to address.

Comment on lines 95 to 106
### 2. Use an existing Data Asset to create a Batch Request

Add the following method to retrieve a previously configured Data Asset from the Data Context you initialized and create a Batch Request to identify the Batch of data that you'll use to validate your Expectations:

```python name="tests/integration/docusaurus/expectations/how_to_create_and_edit_an_expectationsuite_domain_knowledge.py get_data_asset_and_build_batch_request"
```

:::info Limit the Batches returned by a Batch Request

You can provide a dictionary as the `options` parameter of `build_batch_request()` to limit the Batches returned by a Batch Request. If you leave the `options` parameter empty, your Batch Request will include all the Batches configured in the corresponding Data Asset. For more information about Batch Requests, see [How to request data from a Data Asset](/docs/guides/connecting_to_your_data/fluent/batch_requests/how_to_request_data_from_a_data_asset).

:::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are not using the batch to create an expectation suite, would it make sense to remove this section (and re-number the rest)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhhhh I understand now. Yes, you're right. I think I had modeled this a bit "too" closely with the validator-based guide :)

Comment on lines 23 to 30
context.sources.add_pandas(name="my_datasource",).add_csv_asset(
name="my_data_asset", filepath_or_buffer="./data/yellow_tripdata_sample_2019-01.csv"
)

# <snippet name="tests/integration/docusaurus/expectations/how_to_create_and_edit_an_expectationsuite_domain_knowledge.py get_data_asset_and_build_batch_request">
data_asset = context.get_datasource("my_datasource").get_asset("my_data_asset")
batch_request = data_asset.build_batch_request()
# </snippet>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we aren't using a batch, can we get rid of this section?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, you're absolutely right

)
suite.add_expectation(expectation_configuration=expectation_configuration)
# </snippet>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about an assertion about the content of the expectation suite after adding these expectations? I think that would be valuable to make sure that the suite contains all the expectations we added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great idea. Do you mean in the script only, or in the guide too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! Just in the script so it's not just a smoke test. With this assert we will know that as long as this test passes that someone following the guide will get the end result we expect.

tests/integration/test_script_runner.py Outdated Show resolved Hide resolved
@Shinnnyshinshin
Copy link
Contributor Author

Thank you for the review @anthonyburdi. PR has been updated and ready for re-review

@Shinnnyshinshin Shinnnyshinshin requested review from anthonyburdi and a team May 11, 2023 22:40
Copy link
Member

@anthonyburdi anthonyburdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for making those changes 🙇

@Shinnnyshinshin Shinnnyshinshin enabled auto-merge (squash) May 12, 2023 15:54
@Shinnnyshinshin Shinnnyshinshin merged commit c4ed2e1 into develop May 12, 2023
@Shinnnyshinshin Shinnnyshinshin deleted the d/dx-441/new_guide_edit_expectation_suite branch May 12, 2023 16:43
Shinnnyshinshin added a commit that referenced this pull request May 12, 2023
* develop:
  [FEATURE] Add Spark DeltaAsset type (#7872)
  [DOCS] CLI Edits (#7865)
  [FEATURE] Spark directory asset types (#7873)
  [MAINTENANCE] FDS Documentation - Creating ExpectationSuite with Domain Knowledge (#7852)
  [DOCS] Update "How to use Great Expectations with Databricks" (#7762)
  [DOCS] removes remaining Block-config Datasource guides (#7870)
  [FEATURE] Spark file reader support for fluent datasources (#7844)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants