Referential integrity output needs to be less verbose and more informative #1198

Closed
vijaykiran opened this issue Mar 28, 2022 · 1 comment

Comments

@vijaykiran
Contributor

Given this SodaCL check:

checks for fact_product_inventory:
  - reference from (product_key) to dim_product (product_key)

Running soda scan produces 111455 lines of output, because a very large number of rows in fact_product_inventory have a product_key that does not exist in dim_product. This makes it very hard to see which keys are actually missing.

...
483         20140616 2014-06-16    '$36.80'  0        0         4            
483         20140617 2014-06-17    '$36.62'  0        0         4            
483         20140618 2014-06-18    '$36.78'  0        0         4            
483         20140619 2014-06-19    '$36.88'  0        0         4            
483         20140620 2014-06-20    '$36.86'  0        0         4            
483         20140621 2014-06-21    '$36.84'  0        0         4            
483         20140622 2014-06-22    '$37.01'  0        0         4            
483         20140623 2014-06-23    '$37.13'  0        0         4            
483         20140624 2014-06-24    '$37.29'  0        0         4            
483         20140625 2014-06-25    '$37.19'  0        0         4            
483         20140626 2014-06-26    '$37.13'  0        0         4            
483         20140627 2014-06-27    '$37.13'  0        0         4            
483         20140628 2014-06-28    '$37.26'  0        0         4            
483         20140629 2014-06-29    '$37.16'  0        0         4            
483         20140630 2014-06-30    '$37.22'  0        0         4   
...

Suggestion: only show the distinct values of the referencing column(s) (in this case product_key) that violate the referential integrity check, instead of every failing row.
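To illustrate the suggestion, here is a minimal sketch of how the distinct missing keys could be computed, using Python's built-in sqlite3 module and a tiny made-up data set. This is not how Soda implements the check; the function name `missing_reference_keys`, the sample rows, and the extra per-key row count are assumptions made for illustration only.

```python
import sqlite3


def missing_reference_keys(conn, source_table, source_column,
                           target_table, target_column):
    """Return (key, row_count) for each distinct source key with no match in target.

    Hypothetical helper for illustration. Table and column names are
    interpolated into the SQL, so pass trusted identifiers only.
    """
    query = f"""
        SELECT s.{source_column} AS missing_key, COUNT(*) AS row_count
        FROM {source_table} AS s
        LEFT JOIN {target_table} AS t
          ON s.{source_column} = t.{target_column}
        WHERE t.{target_column} IS NULL
          AND s.{source_column} IS NOT NULL
        GROUP BY s.{source_column}
        ORDER BY s.{source_column}
    """
    return conn.execute(query).fetchall()


# Made-up sample data: keys 483 and 484 are absent from dim_product.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER);
    CREATE TABLE fact_product_inventory (product_key INTEGER, date_key INTEGER);
    INSERT INTO dim_product VALUES (1), (2);
    INSERT INTO fact_product_inventory VALUES
        (1, 20140616), (483, 20140616), (483, 20140617), (484, 20140616);
""")

for key, row_count in missing_reference_keys(
        conn, "fact_product_inventory", "product_key",
        "dim_product", "product_key"):
    print(f"product_key {key} missing from dim_product ({row_count} rows)")
```

With output shaped like this, the report grows with the number of distinct missing keys rather than with the number of failing rows.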

Context: https://soda-community.slack.com/archives/C032BDZN3HV/p1645463084048419

@tombaeyens
Contributor

This will be tackled by #1237 Samples upload to Soda Cloud.
