ryancollingwood/datacontract_models

datacontract_models

Overview

Data Quality – garbage in, garbage out. We all know that, but it’s hard to know what to do about it. Documenting everything is well-intentioned, but it’s often too detailed or too tedious to be effective. So what’s the solution?

Code.

Using code to spell out the expectations between data producers and data teams can help to ensure that everyone is on the same page and that the data is of the highest quality. This approach is ambitious, but it has the potential to change the way we collaborate and instil confidence in our data.
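As a rough illustration of what "expectations as code" can look like, here is a minimal sketch. It is not this repo's Meta-Schema: the `OrderLine` record, its fields, and the `validate_rows` helper are all hypothetical, and it uses only standard-library dataclasses so that it runs without extra dependencies.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class OrderLine:
    """Hypothetical contract for one row a producer sends to the data team."""

    order_id: str
    sku: str
    quantity: int
    unit_price: float
    order_date: date

    def __post_init__(self) -> None:
        # The agreed expectations live here, next to the field definitions.
        if not self.order_id:
            raise ValueError("order_id must not be empty")
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.unit_price < 0:
            raise ValueError("unit_price must not be negative")


def validate_rows(rows):
    """Split raw dict rows into valid records and (row, error) rejects."""
    valid, rejected = [], []
    for row in rows:
        try:
            valid.append(OrderLine(**row))
        except (TypeError, ValueError) as exc:
            # TypeError covers missing or unexpected fields and
            # values that cannot be compared as numbers.
            rejected.append((row, str(exc)))
    return valid, rejected
```

Rejected rows come back with the reason they failed, which gives producers and the data team something concrete to discuss. Dataclasses do not check field types on their own, so a fuller version would likely lean on a validation library for that.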

At PyCon Australia 2023 I shared some thoughts on an approach to improving data quality, drawn from my role as the Head of Data and Analytics for a fashion retailer: taking the principles and collaborative spirit of Data Contracts and applying them pragmatically, in a way that works within my reality. A recording of the presentation is available at https://youtu.be/L9mEGb31snk and the slides are available at https://docs.google.com/presentation/d/1AJKHWJ4_qX-FgsfUjlmnW1jfwNdJJq7PojEiJ2WznoA/edit

TODO:

  • Better explain the why of this repo with more documentation
  • Document the Meta-Schema
  • Generate concrete Pydantic classes from the Meta-Schema
  • Identify other artifacts worth generating (diagrams as code, dbt models)
