This repository contains PyDeequ code examples that you can use to test your data quality at scale on AWS.
It covers all the different components of PyDeequ:
- Metrics Computation
- Analyzers
- Profilers
- Constraint Suggestions
- Constraint Verification
- Metrics Repositories: mainly examples of storing metrics in S3 and reading them back from S3.
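
As a rough illustration of how constraint verification and an S3-backed metrics repository fit together, here is a minimal sketch. It is not taken from the notebooks: the helper names `metrics_path` and `run_checks`, the column name `id`, the `dataset` tag, and the bucket layout are all assumptions made for this example.

```python
def metrics_path(bucket, prefix, filename="metrics.json"):
    """Build the s3a:// URI of the metrics file (hypothetical bucket layout)."""
    return f"s3a://{bucket}/{prefix.strip('/')}/{filename}"


def run_checks(spark, df, repository_path):
    """Verify two constraints on `df` and store the computed metrics.

    `spark` is a SparkSession with the Deequ jar on its classpath and
    `df` is a Spark DataFrame that has an `id` column (assumed here).
    """
    # Imported inside the function so that this module can be loaded
    # on a machine without Spark or PyDeequ installed.
    from pydeequ.checks import Check, CheckLevel
    from pydeequ.repository import FileSystemMetricsRepository, ResultKey
    from pydeequ.verification import VerificationSuite, VerificationResult

    # The repository writes metrics as JSON to the given path (local or S3).
    repository = FileSystemMetricsRepository(spark, repository_path)
    result_key = ResultKey(
        spark, ResultKey.current_milli_time(), {"dataset": "example"}
    )

    # Constraints: `id` must have no nulls and no duplicate values.
    check = (
        Check(spark, CheckLevel.Error, "basic checks")
        .isComplete("id")
        .isUnique("id")
    )

    result = (
        VerificationSuite(spark)
        .onData(df)
        .addCheck(check)
        .useRepository(repository)
        .saveOrAppendResult(result_key)
        .run()
    )
    # One row per constraint, with its status and message.
    return VerificationResult.checkResultsAsDataFrame(spark, result)
```

A call would look something like `run_checks(spark, df, metrics_path("my-bucket", "pydeequ"))`, where `my-bucket` is a placeholder. The Spark session needs the Deequ package configured, for example by setting `spark.jars.packages` to `pydeequ.deequ_maven_coord`, and recent PyDeequ releases may also expect the `SPARK_VERSION` environment variable to be set before `pydeequ` is imported.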
Refer to the following notebooks:
- pydeequ-on-local.ipynb: For running PyDeequ on your local workstation.
- Pydeequ-on-EMR.ipynb: For running PyDeequ on EMR.
These notebooks provide working examples that you can download and experiment with.
For more detail on each PyDeequ component and when to use it for different use cases, see my blog post here.