guptaakashdeep/pydeequ-on-aws

pydeequ-on-aws

This repository contains code examples for PyDeequ that can be used to test your data quality at scale on AWS. It covers all the different components of PyDeequ:

  • Metrics Computation
    • Analyzers
    • Profilers
  • Constraint Suggestions
  • Constraint Verification
  • Metrics Repositories: This mainly includes examples for storing metrics in S3 and reading metrics from S3.
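As a rough illustration of how these components fit together, here is a minimal sketch (not taken from the notebooks) that computes metrics with analyzers, stores them in an S3-backed metrics repository, runs a constraint verification, and reads the stored metrics back. The bucket, prefix, column name `order_id`, tag values, the `SPARK_VERSION` value, and the helper names `metrics_file_uri` and `run_quality_checks` are all assumptions made for this example.

```python
import os

# PyDeequ reads SPARK_VERSION when it is imported; "3.3" is an assumed value,
# set it to match your own cluster.
os.environ.setdefault("SPARK_VERSION", "3.3")


def metrics_file_uri(bucket: str, prefix: str, name: str = "metrics.json") -> str:
    """Build the S3 URI of the metrics file (hypothetical helper)."""
    parts = [p.strip("/") for p in (bucket, prefix, name)]
    return "s3://" + "/".join(p for p in parts if p)


def run_quality_checks(spark, df, metrics_uri):
    """Run analyzers and checks on df, storing analyzer metrics at metrics_uri."""
    # Imported here because pydeequ needs pyspark and a JVM to load.
    from pydeequ.analyzers import AnalysisRunner, AnalyzerContext, Completeness, Size
    from pydeequ.checks import Check, CheckLevel
    from pydeequ.repository import FileSystemMetricsRepository, ResultKey
    from pydeequ.verification import VerificationResult, VerificationSuite

    # Metrics repository: a JSON file on S3 (or any file system Spark can reach).
    repository = FileSystemMetricsRepository(spark, metrics_uri)
    result_key = ResultKey(spark, ResultKey.current_milli_time(), {"dataset": "orders"})

    # Metrics computation: run analyzers and append the results to the repository.
    analysis = (
        AnalysisRunner(spark)
        .onData(df)
        .addAnalyzer(Size())
        .addAnalyzer(Completeness("order_id"))  # "order_id" is an assumed column
        .useRepository(repository)
        .saveOrAppendResult(result_key)
        .run()
    )
    metrics_df = AnalyzerContext.successMetricsAsDataFrame(spark, analysis)

    # Constraint verification: declare expectations and evaluate them.
    check = (
        Check(spark, CheckLevel.Error, "orders checks")
        .isComplete("order_id")
        .isUnique("order_id")
    )
    verification = VerificationSuite(spark).onData(df).addCheck(check).run()
    checks_df = VerificationResult.checkResultsAsDataFrame(spark, verification)

    # Read every metric stored so far back from the repository.
    history_df = (
        repository.load()
        .before(ResultKey.current_milli_time())
        .getSuccessMetricsAsDataFrame()
    )
    return metrics_df, checks_df, history_df
```

To use it, start a SparkSession with the Deequ jar on its classpath (the PyDeequ README passes `pydeequ.deequ_maven_coord` to `spark.jars.packages`), then call `run_quality_checks(spark, df, metrics_file_uri("my-bucket", "dq/orders"))`. Profilers and constraint suggestions follow a similar runner pattern and are shown in the notebooks.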

Refer to:

  • pydeequ-on-local.ipynb: For running PyDeequ on your local workstation.
  • Pydeequ-on-EMR.ipynb: For running PyDeequ on EMR.

These notebooks provide working examples that you can download and experiment with.

More details on the PyDeequ components, and on which component to use for which use case, can be found in my blog post.
