
nhat2008/cloud-data-quality

 
 


Cloud Data Quality Engine


Introduction

CloudDQ is a cloud-native, declarative, and scalable Data Quality validation Command-Line Interface (CLI) application for Google BigQuery.

It takes as input Data Quality validation tests defined in a flexible and reusable YAML configuration language. For each Rule Binding definition in the YAML configs, CloudDQ creates a corresponding SQL view in your Data Warehouse. It then executes the view and collects the data quality validation outputs into a summary table for reporting and visualization.
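
The flow described above (rule binding, then SQL view, then summary table) can be sketched roughly as follows. This is a minimal illustration only, not CloudDQ's actual implementation or config schema: the `RULE_BINDING` dictionary, its field names, the `dq_views` dataset, and both helper functions are hypothetical, and the generated SQL is only assumed to resemble what a tool of this kind might emit for BigQuery.

```python
# Hypothetical rule binding. The field names are illustrative assumptions,
# not the CloudDQ YAML schema.
RULE_BINDING = {
    "id": "customer_email_not_null",
    "table": "my_project.my_dataset.customers",
    "column": "email",
    # Each rule maps an id to a SQL predicate template.
    "rules": {"not_null": "{column} IS NOT NULL"},
}


def render_view_sql(binding: dict, view_dataset: str) -> str:
    """Render a CREATE VIEW statement that evaluates every rule for each row."""
    selects = []
    for rule_id, condition in binding["rules"].items():
        predicate = condition.format(column=binding["column"])
        selects.append(
            f"SELECT '{binding['id']}' AS rule_binding_id, "
            f"'{rule_id}' AS rule_id, "
            f"({predicate}) AS passed "
            f"FROM `{binding['table']}`"
        )
    body = "\nUNION ALL\n".join(selects)
    return f"CREATE OR REPLACE VIEW `{view_dataset}.{binding['id']}` AS\n{body}"


def render_summary_sql(binding: dict, view_dataset: str) -> str:
    """Render a query that aggregates the view into per-rule pass/fail counts."""
    return (
        "SELECT rule_binding_id, rule_id, "
        "COUNT(*) AS rows_validated, "
        "COUNTIF(passed) AS success_count, "
        "COUNTIF(NOT passed) AS failed_count "
        f"FROM `{view_dataset}.{binding['id']}` "
        "GROUP BY rule_binding_id, rule_id"
    )


if __name__ == "__main__":
    # Only prints the SQL; running it against BigQuery is left out of this sketch.
    print(render_view_sql(RULE_BINDING, "my_project.dq_views"))
    print(render_summary_sql(RULE_BINDING, "my_project.dq_views"))
```

The sketch only builds SQL strings, so it runs without any Google Cloud credentials. In a real deployment, the results of the summary query would be appended to a reporting table rather than printed.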

CloudDQ currently supports in-place validation of BigQuery data.

  • For a high-level overview of the purpose of CloudDQ, an explanation of the concepts and how it works, and how to consume the outputs, see the Overview.
  • For tutorials on how to use CloudDQ, example use cases, deployment best practices, and example dashboards, see the User Manual.
  • For the configuration specification and the library reference, see the Reference Guide.
  • For more advanced rules covering more specific requirements, see the Advanced Rules User Manual.

Note: This project is currently in beta status and may still change in breaking ways.

Contributions

We welcome all community contributions, whether by opening GitHub issues, updating the documentation, or updating the code directly. Please consult the contribution guide for details on how to contribute.

Before opening a pull request to suggest a feature change, please open a GitHub issue to discuss the use case and feature proposal with the project maintainers.

Feedback / Questions

For any feedback or questions, please feel free to get in touch at clouddq at google.com.

License

CloudDQ is licensed under the Apache License version 2.0. This is not an official Google product.
